**What is Seaborn?**

Seaborn is a Python data visualization library built on top of Matplotlib that works closely with pandas data structures. It offers a variety of plot types, including scatter plots, line plots, bar plots, heatmaps, and many more. It also supports statistical visualization, such as regression plots, distribution plots, and categorical plots.

**Installing Seaborn**

In [None]:
# install seaborn with pip
!pip install seaborn

In [None]:
# install seaborn with conda
!conda install -y seaborn

**Sample Datasets**

Seaborn provides several sample datasets that we can use for data visualization and statistical analysis. The `load_dataset()` function downloads them from an online repository, so it needs an internet connection, and returns them as pandas DataFrames, making them easy to use with Seaborn's plotting functions.

In [None]:
import warnings

warnings.filterwarnings('ignore')

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
print(sns.get_dataset_names())

One of the most common datasets, used in many of the official Seaborn examples, is `tips`; it contains information about tips given in restaurants. Here's an example of loading the `tips` dataset and previewing its first rows:

In [None]:
# Load the tips dataset

tips = sns.load_dataset("tips")

# Preview the first five rows

tips.head()

Here is another example of loading the `exercise` dataset.

In [None]:
# Load the exercise dataset

exercise = sns.load_dataset("exercise")

# check the head

exercise.head()

**Seaborn Plot types**

Seaborn provides a wide range of plot types for data visualization and exploratory data analysis. Broadly speaking, most visualizations fall into one of three categories, depending on how many variables they show:

- Univariate – x only (contains one axis of information)
- Bivariate – x and y (contains two axes of information)
- Trivariate – x, y, z (contains three axes of information)

**Seaborn scatter plots**

Scatter plots are used to visualize the relationship between two continuous variables. Each point on the plot represents a single observation, and its position along the x- and y-axes represents the values of the two variables.

The plot can be customized with different colors and markers to help distinguish different groups of data points. In Seaborn, scatter plots can be created using the scatterplot() function. 

In [None]:
tips = sns.load_dataset("tips")

sns.scatterplot(x="total_bill", y="tip", data=tips)

This simple plot can be improved by customizing the `hue` and `size` parameters of the plot. Here’s how:

In [None]:
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

# customize the scatter plot

sns.scatterplot(x="total_bill", y="tip", hue="sex", size="size", sizes=(50, 200), data=tips)

# add labels and title

plt.xlabel("Total Bill")

plt.ylabel("Tip")

plt.title("Relationship between Total Bill and Tip")

# display the plot

plt.show()
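Marker shape is another way to distinguish groups. The sketch below uses a small invented DataFrame (the column names `bill`, `tip`, and `time` are assumptions) and passes the same column to both `hue` and `style`, so each group gets its own color and its own marker:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "bill": [10.0, 15.5, 20.0, 25.5, 30.0, 35.5],
    "tip": [1.5, 2.0, 3.0, 3.5, 5.0, 4.5],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
})

fig, ax = plt.subplots()

# hue sets the color and style sets the marker shape for each group
sns.scatterplot(data=df, x="bill", y="tip", hue="time", style="time", s=80, ax=ax)
ax.set_title("Tip vs. bill by meal time")
```

Using both encodings for one variable keeps the groups distinguishable when the plot is printed in grayscale.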

**Seaborn line plots**

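Line plots are used to visualize trends in data over time or over another continuous variable. In a line plot, consecutive data points are connected by line segments. When there are several observations for the same x value, Seaborn plots their mean by default and shades a confidence interval around it. In Seaborn, line plots can be created using the lineplot() function.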

In [None]:
fmri = sns.load_dataset("fmri")

sns.lineplot(x="timepoint", y="signal", data=fmri)

We can customize this plot by using the `event` and `region` columns from the dataset.

In [None]:
# customize the line plot

sns.lineplot(x="timepoint", y="signal", hue="event", style="region", markers=True, dashes=False, data=fmri)

# add labels and title

plt.xlabel("Timepoint")

plt.ylabel("Signal Intensity")

plt.title("Changes in Signal Intensity over Time")

# display the plot

plt.show()

**Seaborn bar plots**

Bar plots are used to visualize the relationship between a categorical variable and a continuous variable. In a bar plot, each bar represents an aggregate of the continuous variable for one category: the mean by default, or the median or any other aggregation if you choose one. In Seaborn, bar plots can be created using the barplot() function.

In [None]:
titanic = sns.load_dataset("titanic")

sns.barplot(x="class", y="fare", data=titanic)

Let’s customize this plot by including the `sex` column from the dataset.

In [None]:
# customize the bar plot

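# errorbar=None hides the error bars (it replaces the ci parameter, deprecated since seaborn 0.12)
sns.barplot(x="class", y="fare", hue="sex", errorbar=None, palette="muted", data=titanic)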

# add labels and title

plt.xlabel("Class")

plt.ylabel("Fare")

plt.title("Average Fare by Class and Gender on the Titanic")

# display the plot

plt.show()
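To use an aggregation other than the mean, `barplot()` accepts an `estimator` argument. The sketch below is an illustration on invented fares (the column names `cls` and `fare` are assumptions); it plots the median, which is less sensitive to the single extreme value in the first group:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical fares, invented for illustration; "First" has one extreme value
df = pd.DataFrame({
    "cls": ["First"] * 5 + ["Third"] * 5,
    "fare": [30.0, 40.0, 50.0, 60.0, 500.0, 5.0, 7.0, 8.0, 9.0, 11.0],
})

fig, ax = plt.subplots()

# estimator swaps the default mean for the median
sns.barplot(data=df, x="cls", y="fare", order=["First", "Third"],
            estimator=np.median, ax=ax)

# Bar heights, to compare with medians computed in pandas
bar_heights = [patch.get_height() for patch in ax.patches]
medians = df.groupby("cls")["fare"].median()
```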

**Seaborn histograms**

Histograms visualize the distribution of a continuous variable. In a histogram, the data is divided into bins, and the height of each bar represents the number of data points that fall within that bin. In Seaborn, histograms can be created using the histplot() function.

In [None]:
iris = sns.load_dataset("iris")

sns.histplot(x="petal_length", data=iris)

We can customize the histogram by setting the number of bins, overlaying a density curve, and changing the color.

In [None]:
# customize the histogram

sns.histplot(data=iris, x="petal_length", bins=20, kde=True, color="green")

# add labels and title

plt.xlabel("Petal Length (cm)")

plt.ylabel("Frequency")

plt.title("Distribution of Petal Lengths in Iris Flowers")

# display the plot

plt.show()

**Seaborn density plots**

Density plots, also known as kernel density plots, are a type of data visualization that display the distribution of a continuous variable. They are similar to histograms, but instead of representing the data as bars, density plots use a smooth curve to estimate the density of the data. In Seaborn, density plots can be created using the kdeplot() function. 

In [None]:
tips = sns.load_dataset("tips")

sns.kdeplot(data=tips, x="total_bill")

Let’s improve the plot by customizing it.

In [None]:
# Create a density plot of the "total_bill" column from the "tips" dataset

# We use the "hue" parameter to differentiate between "lunch" and "dinner" meal times

# We use the "fill" parameter to fill the area under the curve

# We adjust the "alpha" and "linewidth" parameters to make the plot more visually appealing

sns.kdeplot(data=tips, x="total_bill", hue="time", fill=True, alpha=0.6, linewidth=1.5)

# Add a title and labels to the plot using Matplotlib

plt.title("Density Plot of Total Bill by Meal Time")

plt.xlabel("Total Bill ($)")

plt.ylabel("Density")

# Show the plot

plt.show()

**Seaborn box plots**

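Box plots summarize the distribution of a dataset: the box spans the first to the third quartile, the line inside it marks the median, the whiskers extend to the rest of the data, and points beyond the whiskers are drawn individually as outliers. They are commonly used to compare the distribution of one or more variables across different categories. In Seaborn, box plots can be created using the boxplot() function.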

In [None]:
tips = sns.load_dataset("tips")

sns.boxplot(x="day", y="total_bill", data=tips)

We can customize the box plot by including the `time` column from the dataset.

In [None]:
# create a box plot of total bill by day and meal time, using the "hue" parameter to differentiate between lunch and dinner

# customize the color scheme using the "palette" parameter

# adjust the linewidth and fliersize parameters to make the plot more visually appealing

sns.boxplot(x="day", y="total_bill", hue="time", data=tips, palette="Set3", linewidth=1.5, fliersize=4)

# add a title, xlabel, and ylabel to the plot using Matplotlib functions

plt.title("Box Plot of Total Bill by Day and Meal Time")

plt.xlabel("Day of the Week")

plt.ylabel("Total Bill ($)")

# display the plot

plt.show()
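The elements of a box plot correspond to a handful of summary statistics. The sketch below computes them with pandas for ten invented values (the series name `bill` is an assumption) and then draws the matching box plot; it assumes the default whisker length of 1.5 times the interquartile range:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical bills, invented for illustration; 25 is an unusually large value
bills = pd.Series([2, 4, 4, 5, 6, 7, 8, 9, 10, 25], name="bill")

q1 = bills.quantile(0.25)   # bottom edge of the box
median = bills.median()     # line inside the box
q3 = bills.quantile(0.75)   # top edge of the box
iqr = q3 - q1               # height of the box

# By default, whiskers reach the most extreme points within 1.5 * IQR of the box;
# anything beyond that is drawn as an individual outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

fig, ax = plt.subplots()
sns.boxplot(y=bills, ax=ax)
```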

**Seaborn violin plots**

A violin plot combines aspects of box plots and density plots. It displays a kernel density estimate of the data along with the median and the interquartile range (IQR) in a box plot-like form.

The width of the violin represents the density estimate, with wider parts indicating higher density. By default, a miniature box plot drawn inside the violin marks the median and the IQR. In Seaborn, violin plots can be created using the violinplot() function.

In [None]:
# load the iris dataset from Seaborn

iris = sns.load_dataset("iris")

# create a violin plot of petal length by species

sns.violinplot(x="species", y="petal_length", data=iris)

# display the plot

plt.show()

**Seaborn heatmaps**

A heatmap is a graphical representation of data that uses colors to depict the value of a variable in a two-dimensional space. Heatmaps are commonly used to visualize the correlation between different variables in a dataset.

In [None]:
# Load the dataset

tips = sns.load_dataset('tips')

# Create a heatmap of the correlation between variables

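# numeric_only=True skips the categorical columns (required in pandas 2.0 and later)
corr = tips.corr(numeric_only=True)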

sns.heatmap(corr)

# Show the plot

plt.show()
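A correlation heatmap is easier to read when the color scale is fixed to the range of a correlation coefficient and each cell shows its value. The sketch below is an illustration on an invented DataFrame (all column names are assumptions); it selects the numeric columns explicitly before computing the correlations:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data with one text column, invented for illustration
df = pd.DataFrame({
    "bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.0, 3.0, 4.0, 7.0, 8.0],
    "party": [1, 2, 2, 4, 3],
    "day": ["Thu", "Fri", "Sat", "Sun", "Sat"],
})

# Keep only the numeric columns; the text column cannot be correlated
corr = df.select_dtypes("number").corr()

fig, ax = plt.subplots()

# Fix the color scale to [-1, 1] and print each coefficient in its cell
sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", annot=True, fmt=".2f", ax=ax)
```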

Another example of a heatmap using the `flights` dataset.

We pivot the data to make it suitable for heatmap representation using the .pivot() method. Then, we create a heatmap using the sns.heatmap() function and pass the pivoted flights variable as the argument. 

In [None]:
# Load the dataset

flights = sns.load_dataset('flights')

# Pivot the data

# keyword arguments are required in pandas 2.0 and later
flights = flights.pivot(index='month', columns='year', values='passengers')

# Create a heatmap

sns.heatmap(flights, cmap='Blues', annot=True, fmt='d')

# Set the title and axis labels

plt.title('Passengers per month')

plt.xlabel('Year')

plt.ylabel('Month')

# Show the plot

plt.show()
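The pivot step turns long-form data (one row per observation) into the wide form a heatmap expects (one cell per row and column pair). The sketch below shows the reshaping on a small invented table that only mimics the layout of `flights`; the numbers are made up:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical long-form data: one row per (month, year) pair
long_df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "year": [2020, 2021, 2020, 2021, 2020, 2021],
    "passengers": [100, 120, 110, 130, 150, 170],
})

# Wide form: one row per month, one column per year
wide_df = long_df.pivot(index="month", columns="year", values="passengers")

# With plain string labels, pivot sorts the months alphabetically,
# so restore calendar order
wide_df = wide_df.loc[["Jan", "Feb", "Mar"]]

fig, ax = plt.subplots()
sns.heatmap(wide_df, annot=True, fmt="d", cmap="Blues", ax=ax)
```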

**Seaborn pair plots**

Pair plots are a type of visualization in which multiple pairwise scatter plots are displayed in a matrix format. Each scatter plot shows the relationship between two variables, while the diagonal plots show the distribution of the individual variables.

In [None]:
# Load iris dataset

iris = sns.load_dataset("iris")

# Create pair plot

sns.pairplot(data=iris)

# Show plot

plt.show()


We can customize this plot by using the `hue` and `diag_kind` parameters.

In [None]:
# Create pair plot with custom settings

g = sns.pairplot(data=iris, hue="species", diag_kind="kde", palette="husl")

# Set a figure-level title (plt.title would only label the last subplot)

g.figure.suptitle("Iris Dataset Pair Plot", y=1.02)

# Show plot

plt.show()

**Seaborn joint plots**

A joint plot combines two kinds of plot in one figure: a scatter plot in the center and a histogram along each margin. The scatter plot shows the relationship between two variables, while the marginal histograms show the distribution of each variable on its own. This gives a more complete picture of the data, since the relationship between the two variables and their individual distributions are visible at the same time.

Here is a simple example of building a seaborn joint plot using the iris dataset:

In [None]:
# load iris dataset

iris = sns.load_dataset("iris")

# plot a joint plot of sepal length and sepal width

sns.jointplot(x="sepal_length", y="sepal_width", data=iris)

# display the plot

plt.show()

**Seaborn facet grids**

FacetGrid is a powerful seaborn tool that allows you to visualize the distribution of one variable as well as the relationship between two variables, across levels of additional categorical variables. 

FacetGrid creates a grid of subplots based on the unique values in the categorical variable specified.

In [None]:
# load the tips dataset

tips = sns.load_dataset('tips')

# create a FacetGrid for day vs total_bill

g = sns.FacetGrid(tips, col="day")

# plot histogram for total_bill in each day

g.map(sns.histplot, "total_bill")

**Customizing Seaborn plots**

Seaborn provides numerous ways to customize the appearance of plots, and customization is often what makes a visualization both meaningful and easy to read.

Here are some examples of customizing Seaborn plots:

**Changing Color Palettes**

Here is an example of how you can change the color palette of your Seaborn plots:

In [None]:
# Load sample dataset

tips = sns.load_dataset("tips")

# Create a scatter plot with color palette

sns.scatterplot(x="total_bill", y="tip", hue="day", data=tips, palette="Set2")

# Customize plot

plt.title("Total Bill vs Tip")

plt.xlabel("Total Bill ($)")

plt.ylabel("Tip ($)")

plt.show()
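A palette can also be inspected and reused as plain data. The sketch below is a minimal illustration: it retrieves four colors from `Set2`, builds a dictionary that pins one color to each day (the day labels are those of the `tips` dataset), and changes the default palette for later plots:

```python
import seaborn as sns

# color_palette returns a list-like of RGB tuples
palette = sns.color_palette("Set2", n_colors=4)

# A dict assigns a fixed color to each category level
day_colors = dict(zip(["Thur", "Fri", "Sat", "Sun"], palette))

# set_palette changes the default color cycle for plots created afterwards
sns.set_palette("Set2")
```

Passing `palette=day_colors` to a plotting function together with `hue="day"` keeps the colors of the days consistent across figures.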

**Adjusting Figure Size**

To adjust the figure size of a Seaborn plot, create a Matplotlib figure with the desired `figsize` before plotting:

In [None]:
# Load sample dataset

iris = sns.load_dataset("iris")

# Create a violin plot with adjusted figure size

plt.figure(figsize=(8,6))

sns.violinplot(x="species", y="petal_length", data=iris)

# Customize plot

plt.title("Petal Length Distribution by Species")

plt.xlabel("Species")

plt.ylabel("Petal Length (cm)")

plt.show()
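How the size is set depends on the kind of function. The sketch below is an illustration on a small invented DataFrame (the column names are assumptions); it contrasts an axes-level function, which draws on a Matplotlib Axes you create, with a figure-level function, which creates its own figure and is sized through `height` and `aspect`:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical measurements, invented for illustration
df = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "length": [1.0, 1.5, 2.0, 4.0, 4.5, 5.5],
})

# Axes-level functions such as violinplot draw on an existing Axes,
# so the size comes from the Matplotlib figure
fig, ax = plt.subplots(figsize=(8, 6))
sns.violinplot(data=df, x="species", y="length", ax=ax)

# Figure-level functions such as catplot create their own figure;
# height (in inches) and aspect (width / height) size each facet
g = sns.catplot(data=df, x="species", y="length", kind="violin",
                height=4, aspect=1.5)
```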

**Adding Annotations**

Annotations can help to make your visualizations easier to read.

In [None]:
# Load sample dataset

diamonds = sns.load_dataset("diamonds")

# Create a scatter plot with annotations

sns.scatterplot(x="carat", y="price", data=diamonds)

# Add annotations

plt.text(1, 18000, "Large, Expensive Diamonds", fontsize=12, color="red")

plt.text(2.5, 5000, "Small, Affordable Diamonds", fontsize=12, color="blue")

# Customize plot

plt.title("Diamond Prices by Carat")

plt.xlabel("Carat (ct)")

plt.ylabel("Price ($)")

plt.show()
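When a label should point at a specific observation rather than a general region, Matplotlib's `annotate()` draws an arrow from the text to the point. The sketch below is an illustration on a small invented table (the values are made up, not taken from `diamonds`); it labels the most expensive row:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical prices, invented for illustration
df = pd.DataFrame({
    "carat": [0.3, 0.5, 0.9, 1.2, 1.5, 2.0],
    "price": [500, 1200, 3500, 6000, 9000, 15000],
})

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="carat", y="price", ax=ax)

# Locate the most expensive row and point an arrow at it
top = df.loc[df["price"].idxmax()]
note = ax.annotate(
    "Most expensive",
    xy=(top["carat"], top["price"]),            # the point being annotated
    xytext=(top["carat"] - 1.0, top["price"]),  # where the label text starts
    arrowprops={"arrowstyle": "->"},
    va="center",
)
```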

Reference Link for Seaborn Documentation:
    https://seaborn.pydata.org/
