**Q1. Load the "titanic" dataset using the load_dataset function of seaborn. Use Plotly express to plot a scatter plot for age and fare columns in the titanic dataset.**

A scatter plot is a type of graph used to display the relationship between two continuous variables. It consists of a set of points, where each point represents the values of two variables, one plotted along the horizontal axis (X-axis) and the other plotted along the vertical axis (Y-axis).

Scatter plots are useful for identifying patterns or trends in data, as well as for detecting the presence of outliers. They can also be used to identify the strength and direction of the relationship between the two variables, such as a positive or negative correlation.

In a scatter plot, the independent (explanatory) variable is conventionally plotted on the X-axis and the dependent (response) variable on the Y-axis. Each point represents the values of both variables for one observation in the dataset. By examining how the points are distributed, it is possible to draw conclusions about the relationship between the two variables being studied.
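The strength and direction of a linear relationship seen in a scatter plot are commonly summarized by Pearson's correlation coefficient r, which ranges from -1 (perfect negative) to +1 (perfect positive). A minimal NumPy sketch, using made-up toy arrays rather than the titanic data; the helper name `pearson_r` is hypothetical:

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Toy data (hypothetical): y rises with x, y falls with x, and a noisy upward trend
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])
r_neg = pearson_r(x, [10, 8, 6, 4, 2])
r_noisy = pearson_r(x, [2, 1, 4, 3, 5])

print(round(r_pos, 3), round(r_neg, 3), round(r_noisy, 3))  # prints: 1.0 -1.0 0.8
```

For a DataFrame, pandas' `Series.corr` computes the same coefficient and skips rows where either value is missing, which matters for the titanic age column (for example `a1["age"].corr(a1["fare"])`).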

The example below loads the titanic dataset with Seaborn's load_dataset function and then draws a scatter plot of the age and fare columns with Plotly:

In [None]:
import seaborn as sns
a1 = sns.load_dataset('titanic')
a1

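| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.0 | 0 | 0 | 13.0000 | S | Second | man | True | NaN | Southampton | no | True |
| 887 | 1 | 1 | female | 19.0 | 0 | 0 | 30.0000 | S | First | woman | False | B | Southampton | yes | True |
| 888 | 0 | 3 | female | NaN | 1 | 2 | 23.4500 | S | Third | woman | False | NaN | Southampton | no | False |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | First | man | True | C | Cherbourg | yes | True |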


In [None]:
import plotly.express as px
# The question asks for Plotly Express: px.scatter draws one point per passenger,
# with age on the X-axis and fare on the Y-axis
fig = px.scatter(a1, x="age", y="fare")
fig.show()

**Q2. Using the tips dataset in the Plotly library, plot a box plot using Plotly express.**

A box plot, also known as a box-and-whisker plot, is a graphical representation of a dataset that displays its distribution and summary statistics. It is a standardized way of displaying the distribution of data based on five key statistics: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

In a box plot, a rectangular box spans the middle 50% of the data, from the first quartile (Q1) to the third quartile (Q3), with a line inside the box marking the median. The distance between the first and third quartiles is called the interquartile range (IQR = Q3 − Q1). Two lines, called whiskers, extend from the box towards the minimum and maximum. In the common convention (Tukey's rule), each whisker stops at the most extreme data point that lies within 1.5 × IQR of its quartile, and any points beyond that are drawn individually as outliers; the whisker ends then mark the minimum and maximum of the non-outlying data.
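The quantities a box plot draws can be computed directly. The sketch below assumes the 1.5 × IQR rule for the whisker fences and NumPy's default linear interpolation for the quartiles; other quartile methods give slightly different values, and the helper name `box_summary` and the data are hypothetical:

```python
import numpy as np


def box_summary(values):
    """Statistics a box plot draws, assuming the 1.5 * IQR rule for whisker fences."""
    data = np.sort(np.asarray(values, dtype=float))
    # Quartiles with NumPy's default linear interpolation between data points
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = data[(data >= lower_fence) & (data <= upper_fence)]
    outliers = data[(data < lower_fence) | (data > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "iqr": float(iqr),
        # Whiskers end at the most extreme points that are not outliers
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])  # toy data with one extreme value
print(summary)
```

Here Q1 = 3.25 and Q3 = 7.75, so the upper fence is 7.75 + 1.5 × 4.5 = 14.5; the value 40 is reported as an outlier and the upper whisker stops at 9.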

Box plots are useful for visually summarizing the distribution of a dataset, and they can be used to identify skewness, outliers, and other features of the data. They are commonly used in statistical analysis, data visualization, and data exploration.

The example below draws a box plot of the total_bill column of the tips dataset with Plotly Express:

In [None]:
import plotly.express as px
df = px.data.tips()
fig = px.box(df, y="total_bill")
fig.show()

**Q3. Using the tips dataset in the Plotly library, plot a histogram for the x="sex" and y="total_bill" columns in the tips dataset. Also, use the "smoker" column with the pattern_shape parameter and the "day" column with the color parameter.**

A histogram is a graphical representation of the distribution of data that displays the frequencies of observations within different intervals, or bins. The x-axis shows the range of values of the variable being measured, and the y-axis shows how many observations fall within each bin; the height of each bar is that frequency. Histograms are commonly used in statistics, finance, and data analysis to see the shape of a distribution, for example to spot outliers or a skew towards one end of the range.

In this question, x="sex" is categorical, so each category acts as its own bin. Because y="total_bill" is also given, Plotly Express aggregates the y values with its histfunc parameter, which defaults to "sum" when y is supplied, so each bar shows the total of the bills in that category rather than a count of rows. The color="day" and pattern_shape="smoker" parameters then split each bar into segments by those columns.
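A rough sketch of the arithmetic behind a histogram, using NumPy for numeric bins and pandas for a categorical x like the one in this question. The small DataFrame is hypothetical toy data shaped like the tips columns, not the real dataset, and the groupby calls only mimic what Plotly Express computes (assuming its default histfunc="sum" when y is given); they are not Plotly's implementation:

```python
import numpy as np
import pandas as pd

# Numeric bins: count how many values fall in each interval
values = [1.0, 2.5, 2.7, 4.1, 4.9, 5.0]
freq, edges = np.histogram(values, bins=2, range=(0, 6))
# freq holds the bar heights, edges the bin boundaries [0, 3, 6]

# Hypothetical toy rows shaped like the tips dataset (not real data)
df = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male"],
    "day": ["Sun", "Sun", "Sat", "Sat", "Sun"],
    "smoker": ["No", "Yes", "No", "No", "No"],
    "total_bill": [10.0, 20.0, 15.0, 5.0, 25.0],
})

# Categorical x with no y: each category is a bin and the bar height is a row count
counts = df.groupby("sex").size()

# Categorical x with y="total_bill": with histfunc="sum",
# each bar height becomes the summed total_bill for that category
sums = df.groupby("sex")["total_bill"].sum()

# color="day" and pattern_shape="smoker" split each bar into one segment
# per (day, smoker) combination present in that category
segments = df.groupby(["sex", "day", "smoker"])["total_bill"].sum()

print(freq.tolist(), edges.tolist())
print(counts.to_dict())
print(sums.to_dict())
print(segments)
```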

The example below draws this histogram with Plotly Express:

In [None]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="sex", y="total_bill", pattern_shape="smoker", color="day")
fig.show()

**Q4. Using the iris dataset in the Plotly library, plot a scatter matrix plot, using the "species" column for the color parameter.**

**Note: Use "sepal_length", "sepal_width", "petal_length", "petal_width" columns only with the dimensions parameter.**

A scatter matrix plot is a type of data visualization that allows us to visualize the pairwise relationships between multiple variables at the same time. It is essentially a matrix of scatter plots, where each variable is plotted against every other variable in the dataset.

A scatter matrix plot is a useful tool for exploring multivariate datasets, as it can help identify patterns and correlations between variables that might not be apparent when looking at each variable individually. It can also be used to detect outliers or anomalies in the data.

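In many implementations, such as pandas' scatter_matrix and Seaborn's pairplot, the diagonal shows a histogram or density curve of each variable, giving a univariate view of the data, while the off-diagonal cells show scatter plots of two variables against each other, giving a bivariate view of their relationship. Plotly Express's scatter_matrix instead fills each diagonal cell by plotting the variable against itself, which appears as a straight line of points; these cells can be hidden with fig.update_traces(diagonal_visible=False).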

A scatter matrix plot is a powerful way to visualize and explore multivariate data, but it can become cluttered and hard to interpret if there are too many variables. Therefore, it is important to carefully select the variables to be included in the scatter matrix plot, based on the research question and the goals of the analysis.
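The clutter warning above follows from simple counting: with n variables the matrix has n × n cells and n(n − 1)/2 distinct pairs of variables, each of which appears twice (once above and once below the diagonal). A small standard-library sketch; the helper name `matrix_size` and the generated variable names are hypothetical:

```python
from itertools import combinations


def matrix_size(dimensions):
    """Return (total cells, distinct off-diagonal variable pairs) for a scatter matrix."""
    cells = len(dimensions) ** 2               # every variable against every variable
    pairs = list(combinations(dimensions, 2))  # each unordered pair once
    return cells, pairs


dims = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
cells, pairs = matrix_size(dims)
print(cells, len(pairs))  # prints: 16 6

# The same count for 10 variables shows how fast the grid grows
cells_10, pairs_10 = matrix_size([f"var_{i}" for i in range(10)])
print(cells_10, len(pairs_10))  # prints: 100 45
```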

The example below draws a scatter matrix of the four measurement columns of the iris dataset with Plotly Express, colored by species:

In [None]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_matrix(df,
    dimensions=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    color="species")
fig.show()

**Q5. What is Distplot? Using Plotly express, plot a distplot.**

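**distplot** is a function in the Python data visualization library **Seaborn** that plots a histogram of a continuous variable together with an estimate of its probability density function (PDF). It has been deprecated since Seaborn 0.11 in favour of **histplot** and **displot**.

The **distplot** function takes a one-dimensional array or a pandas Series as input and produces a histogram of the data with a kernel density estimate (KDE) curve overlaid on top of it. It also provides several options for customizing the plot, such as changing the color of the histogram and density curve, adjusting the histogram bins, and adding a rug plot that marks each individual data point along the axis.

Plotly Express has no distplot function. Plotly's closest equivalent is **plotly.figure_factory.create_distplot**, which combines a histogram, a density curve, and a rug plot; within Plotly Express, a similar view can be built with **px.histogram** and its marginal parameter.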
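The density curve in a distplot is a kernel density estimate: a smooth bump is centred on every observation and the bumps are averaged. A minimal NumPy sketch with a Gaussian kernel and a hand-picked bandwidth; Seaborn and Plotly choose the bandwidth with their own rules, and the helper name `gaussian_kde_1d` and the sample values are hypothetical:

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point (rows) to every sample (columns)
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


samples = [1.0, 2.0, 2.5, 3.0, 7.0]   # toy data: a cluster near 2 and one isolated value
grid = np.linspace(-5.0, 15.0, 2001)  # evaluation points, spacing 0.01
density = gaussian_kde_1d(samples, grid, bandwidth=0.8)

# Riemann-sum check that the estimate is a valid density (area close to 1)
area = density.sum() * (grid[1] - grid[0])
peak = grid[np.argmax(density)]
print(round(area, 3), round(peak, 2))
```

A smaller bandwidth makes the curve follow individual points more closely; a larger one smooths the cluster and the isolated value together.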

The example below approximates a **distplot** with Plotly Express, using px.histogram on the total_bill column, colored by sex, with marginal="rug" adding a rug plot of the individual values:

In [7]:
import plotly.express as px
df = px.data.tips()
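# Histogram of total_bill (bar height = number of bills per bin), one color per sex,
# with a rug plot of the individual bills in the marginal subplot.
# No y column is passed: with y="tip", each bar would show summed tips instead of a distribution.
fig = px.histogram(df, x="total_bill", color="sex", marginal="rug",
                   hover_data=df.columns)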
fig.show()