# Introduction To Seaborn’s API

### Overview Of Seaborn’s Capabilities And Syntax
**Seaborn is a Python data visualization library that offers an interface to create a wide variety of statistical graphics easily. It’s built on top of Matplotlib and integrates well with pandas data structures. Here’s an overview of what Seaborn brings to the table:**

**Enhanced Visualization:**  Seaborn extends the capabilities of Matplotlib by providing a rich set of options for visualizing statistical data. It includes functions for creating complex plots like heatmaps, pair plots, and facet grids with minimal coding.

**Simplified Syntax:**  One of the key features of Seaborn is its high-level interface, which simplifies the process of creating elaborate visualizations. For example, you can create a histogram with KDE (Kernel Density Estimate) overlay with just one line of code.

**Advanced Plot Types:**  Seaborn specializes in statistical visualizations and offers plot types that are not natively available in Matplotlib, such as violin plots and swarm plots.

**Attractive Default Styles:**  Seaborn comes with aesthetically pleasing default styles that make the visualizations more appealing without needing extensive customization.

**Easy Integration with Pandas:**  Seaborn works seamlessly with pandas DataFrames, making it easier to visualize data directly from CSVs, Excel spreadsheets, or SQL databases.

**Customization and Styling:**  While offering beautiful defaults, Seaborn also provides extensive customization options for color palettes, themes, and plot styles to suit different requirements and preferences.



### Understanding The Difference Between Seaborn And Matplotlib


**While Seaborn Is Built On Matplotlib, There Are Distinct Differences Between The Two Libraries:**

**Purpose and Design Philosophy:**  Matplotlib is a versatile library designed for creating a wide array of graphs with a lot of room for customization. Seaborn, on the other hand, focuses more on providing a streamlined approach for statistical data visualization.

**Ease of Use:**  Seaborn simplifies the creation of many common plot types. For instance, generating a box plot or a violin plot is more straightforward in Seaborn than in Matplotlib.

**Customization:**  While Matplotlib excels in the flexibility of customizing plots, Seaborn offers more coherent and easy-to-set aesthetics and themes.

**Data Handling:**  Seaborn integrates more closely with pandas DataFrames, making it more convenient for data analysis tasks straight from DataFrame structures.

**Statistical Functionality:**  Seaborn provides built-in functions for creating complex statistical visualizations, whereas Matplotlib relies more on the user’s knowledge to create these from scratch.

Overall, Seaborn is often preferred for statistical data visualization for its simplicity and attractive visuals, while Matplotlib is the go-to for more customized and intricate graphic designs.
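
The ease-of-use and data-handling differences can be sketched by drawing the same grouped box plot with each library. The `df` DataFrame and its `group` and `value` columns are hypothetical illustration data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up illustration data (an assumption): one numeric value per group label
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 50),
    "value": rng.normal(loc=0.0, scale=1.0, size=150),
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the column into one array per group by hand
groups = sorted(df["group"].unique())
arrays = [df.loc[df["group"] == g, "value"].to_numpy() for g in groups]
ax_mpl.boxplot(arrays)
ax_mpl.set_xticks(range(1, len(groups) + 1))
ax_mpl.set_xticklabels(groups)
ax_mpl.set_title("Matplotlib")

# Seaborn: name the columns and let the library do the grouping
sns.boxplot(data=df, x="group", y="value", ax=ax_sns)
ax_sns.set_title("Seaborn")
```

Because Seaborn returns ordinary Matplotlib Axes, the two libraries can be mixed freely in one figure.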



**✅ Knowledge Check:**  Understanding Seaborn’s API

**After exploring the introduction to Seaborn’s API, let’s test your understanding with a few questions:**

How does Seaborn’s purpose and design philosophy differ from Matplotlib?

What makes Seaborn’s syntax simpler and more user-friendly compared to Matplotlib, especially for statistical data visualization?

Name at least two plot types that Seaborn offers which are not natively available in Matplotlib.

How does Seaborn’s integration with pandas DataFrames enhance its usability for data analysis?

Discuss the customization options in Seaborn. How does it provide more attractive default styles compared to Matplotlib?



# Distribution Plots
In this section, we delve into creating and customizing distribution plots using Seaborn. These plots are fundamental for exploring and understanding the distribution of your data, whether it’s univariate (one variable) or bivariate (two variables).



### Creating Histograms
Histograms are one of the most common ways to visualize the distribution of a dataset. Seaborn makes creating and customizing histograms straightforward.

**Basic Histogram:**  To create a basic histogram, you can use the histplot function.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Load sample data
data = sns.load_dataset('tips')
sns.histplot(data['total_bill'])
plt.title('Histogram of Total Bill')
plt.show()


**Customization:**  You can customize the number of bins, color, and more, to better understand the data distribution.

In [None]:
sns.histplot(data['total_bill'], bins=20, color='skyblue')
plt.title('Histogram of Total Bill with Custom Bins and Color')
plt.show()


### Kernel Density Estimation (KDE) Plots
KDE plots are useful for visualizing the probability density of a continuous variable. Seaborn’s kdeplot function allows you to create these plots easily.

**Creating a KDE Plot:**

In [None]:
sns.kdeplot(data['total_bill'], fill=True)
plt.title('KDE of Total Bill')
plt.show()


**Customization:**  Adjust the bandwidth, color, or add a histogram to your KDE plot.

In [None]:
sns.kdeplot(data['total_bill'], bw_adjust=0.5, color='red', fill=True)  # fill replaces the deprecated shade
plt.title('Customized KDE of Total Bill')
plt.show()


### Exploring Univariate And Bivariate Distributions
Seaborn provides functions to explore both univariate and bivariate distributions.

**Univariate Distribution:**  Using displot, you can create a histogram with a KDE overlay for univariate distribution.

In [None]:
sns.displot(data['total_bill'], color='navy', kde=True, alpha=0.6)
plt.title('Univariate Distribution of Total Bill')
plt.show()

**Bivariate Distribution:**  jointplot is used to visualize bivariate distributions, showing the relationship between two different variables.

In [None]:
sns.jointplot(x='total_bill', y='tip', data=data, kind='scatter')
plt.title('Bivariate Distribution of Total Bill and Tips', pad=70)
plt.show()


### Customizing Plot Styles And Colors
Seaborn allows extensive customization to suit your needs or preferences. You can change the overall theme, color palettes, and more.

**Changing Themes:**  Use set_style to change the background and grid style.

In [None]:
sns.set_style('darkgrid')
sns.histplot(data['total_bill'])
plt.show()

**Color Palettes:**  Leverage Seaborn’s rich palette options to make your plots more visually appealing.

In [None]:
sns.set_palette('bright')
sns.kdeplot(data['total_bill'], fill=True)
plt.show()


✅ Knowledge Check

**Now that you have learned about distribution plots in Seaborn, let’s check your understanding:**

What is the primary purpose of a histogram in data visualization?

A. To show relationships between two variables.

B. To visualize the frequency distribution of a single variable.

C. To display the median and quartiles of a dataset.

How does a KDE plot differ from a histogram?

A. A KDE plot shows categorical data, while a histogram shows numerical data.

B. A KDE plot is used for bivariate distributions only, while a histogram is not.

C. A KDE plot provides a smooth estimation of a dataset’s probability density function, while a histogram shows discrete bins.

What does the fill parameter (named shade in older Seaborn versions) do in a KDE plot?

A. It changes the color of the plot.

B. It fills the area under the KDE curve.

C. It adjusts the bandwidth of the KDE.


## 🚀 Challenge
**Scenario:**  You are a data analyst working with a dataset that contains information on restaurant tips.

**Tasks:**

**1. Create a Histogram:**  Generate a histogram for the size column (the party size) to understand its distribution.

**2. Overlay a KDE Plot:**  On the same histogram, overlay a KDE plot to provide a smooth estimation of the distribution.

**3. Customize Your Plot:**  Customize your plot by adding an appropriate title, changing the color of the histogram, and adjusting the number of bins.

**4. Create a Joint Plot with Tip Size:**  Generate a separate plot to show the effect of party size on the size of the tip.

Share your code and briefly explain your observations from the plots.



In [None]:
data.head()

In [None]:
sns.displot(data=data, x='size', kde=True)
plt.show()

In [None]:
# Pass data by keyword: older Seaborn versions reject it as a positional argument
sns.jointplot(data=data, x='size', y='tip')


### Categorical Data Visualization
In this section, we explore how Seaborn enables effective visualization of categorical data. Seaborn simplifies the creation of complex visualizations for categorical data analysis, such as bar plots, box plots, and violin plots. Understanding these plots and customizing them can provide deeper insights into your data.

### Generating Bar Plots, Box Plots, And Violin Plots
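**1. Bar Plots:**  Bar plots compare a numeric summary across the levels of a categorical variable. Seaborn’s barplot draws one bar per category whose height is an estimate of the y variable (the mean by default), with an error bar showing the uncertainty of that estimate. To show the count of observations in each category instead, use countplot.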

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
titanic = sns.load_dataset('titanic')

In [None]:
# Creating a bar plot
sns.barplot(x='class', y='survived', data=titanic)
plt.title('Survival Rate by Class')
plt.show()

In [None]:
titanic.head()

In [None]:
ti = titanic[titanic['class'] == 'First']

In [None]:
ti['survived'].sum()/len(ti)
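
The cells above check one bar by hand. The same check for every category can be sketched with groupby. The `df` below is made-up illustration data that only borrows the titanic column names.

```python
import pandas as pd
import seaborn as sns

# Made-up illustration data (an assumption, not the titanic dataset)
df = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Third", "Third", "Third"],
    "survived": [1, 1, 0, 1, 0, 0, 0, 1],
})

# Mean of `survived` per class, i.e. the survival rate
rates = df.groupby("class")["survived"].mean()
print(rates)

# barplot's default estimator is the mean, so the bar heights match `rates`
ax = sns.barplot(data=df, x="class", y="survived")
heights = [bar.get_height() for bar in ax.patches]
```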


**2. Box Plots:**  Box plots provide a visual summary of the central tendency, dispersion, and skewness of the data, and can also highlight outliers.

In [None]:
# Creating a box plot
sns.boxplot(x='class', y='age', data=titanic)
plt.title('Age Distribution by Class')
plt.show()


**3. Violin Plots:**  Violin plots combine features of box plots and density plots. They are useful for displaying the distribution of the data and its probability density.

In [None]:
# Creating a violin plot
sns.violinplot(x='class', y='age', data=titanic)
plt.title('Age Distribution by Class')
plt.show()


### Understanding The Nuances Of Each Plot
Bar Plots are best when you need to compare a single summary value, such as a mean or a count, across categories.

Box Plots are ideal for visualizing the distribution of data, making it easy to see the median, quartiles, and outliers.

Violin Plots are useful when the shape of the distribution, such as skew or multiple peaks, is of interest in addition to its summary statistics.


### Customizing Categorical Plots For Better Insights
**Seaborn allows extensive customization to make these plots more informative:**

**Customizing Bar Plots:**  Add hue parameter for nested grouping, change bar colors, or add annotations to show the value on each bar.

**Customizing Box Plots:**  Adjust whisker length, add swarm plots for individual data points, or change color palettes.

**Customizing Violin Plots:**  Split the violins to compare distributions across a second categorical variable, adjust the bandwidth of the KDE, or combine with swarm plots.

By customizing these plots, you can make your categorical data visualizations more intuitive and insightful, which is essential for effective data analysis.
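
A few of the customizations listed above can be sketched as follows. The `df` DataFrame is made-up illustration data with titanic-like column names, and the specific colors and sizes are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up illustration data (an assumption, not the titanic dataset)
rng = np.random.default_rng(4)
n = 120
df = pd.DataFrame({
    "class": rng.choice(["First", "Second", "Third"], size=n),
    "sex": rng.choice(["male", "female"], size=n),
    "age": rng.normal(loc=30, scale=12, size=n).clip(1, 80),
})

order = ["First", "Second", "Third"]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar plot: hue adds a nested grouping within each class
sns.barplot(data=df, x="class", y="age", hue="sex", order=order, ax=axes[0])

# Box plot: whis sets the whisker reach in IQR units;
# the swarm plot on top shows each individual observation
sns.boxplot(data=df, x="class", y="age", order=order, whis=1.0,
            color="lightgray", ax=axes[1])
sns.swarmplot(data=df, x="class", y="age", order=order, size=3,
              color="black", ax=axes[1])

# Violin plot: split=True draws one half per hue level (needs exactly two levels)
sns.violinplot(data=df, x="class", y="age", hue="sex", split=True,
               order=order, ax=axes[2])

fig.tight_layout()
```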


### Scatter Plots And Joint Plots
In this section, we explore how to create and utilize scatter plots and joint plots using Seaborn, emphasizing their use in displaying relationships between two variables and visualizing their individual distributions.

### Scatter Plots With Regression Lines
Scatter plots are a fundamental tool in statistical analysis, allowing us to visualize the relationship between two continuous variables. Seaborn simplifies the process of adding regression lines to these plots, providing immediate insights into any linear relationships.

### Creating A Scatter Plot With A Regression Line
**To create a scatter plot with a regression line in Seaborn:**


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Loading example dataset
data = sns.load_dataset('tips')

# Creating the scatter plot
sns.lmplot(x='total_bill', y='tip', data=data)

# Adding plot title and labels
plt.title('Scatter Plot with Regression Line')
plt.xlabel('Total Bill')
plt.ylabel('Tip')

# Displaying the plot
plt.show()


This code will produce a scatter plot of total_bill against tip from the tips dataset, with a regression line fitted to the data.
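
Seaborn draws the fitted line but does not return its coefficients. If the numbers are needed, one option is to fit the line separately. The sketch below uses np.polyfit on made-up data with a known slope; the `df` DataFrame is an illustration, not the tips dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up illustration data (an assumption): tip rises by about 0.1 per unit of bill
rng = np.random.default_rng(5)
total_bill = rng.uniform(5, 50, size=100)
tip = 1.0 + 0.1 * total_bill + rng.normal(scale=0.5, size=100)
df = pd.DataFrame({"total_bill": total_bill, "tip": tip})

# regplot is the axes-level counterpart of lmplot
ax = sns.regplot(data=df, x="total_bill", y="tip")

# Estimate the line's coefficients with an ordinary least-squares fit
slope, intercept = np.polyfit(df["total_bill"], df["tip"], deg=1)
print(f"tip ~ {intercept:.2f} + {slope:.2f} * total_bill")
```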


### Using Joint Plots To Explore Relationships And Distributions
Joint plots are particularly useful for a comprehensive view of how two variables relate to each other and their individual distribution characteristics.

**To create a joint plot in Seaborn:**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Loading example dataset
data = sns.load_dataset('tips')

# Creating the joint plot
sns.jointplot(x='total_bill', y='tip', data=data, kind='reg')

# Adding a title
plt.subplots_adjust(top=0.9)
plt.suptitle('Joint Plot of Total Bill and Tip')

# Displaying the plot
plt.show()


In this example, kind='reg' adds a linear regression fit to the scatter plot, giving an immediate view of the relationship between total_bill and tip. The histograms on the top and right show the distributions of total_bill and tip, respectively.

### Insights From Scatter And Joint Plots
**Scatter Plots:**  Ideal for observing the relationship between two continuous variables. The addition of a regression line helps in understanding the nature of their relationship, be it linear or non-linear.

**Joint Plots:**  Extend the capabilities of scatter plots by including histograms, giving a more complete picture of the data. They are especially useful in exploratory data analysis to understand both relationships and distributions simultaneously.

By mastering scatter and joint plots in Seaborn, you enhance your ability to conduct detailed exploratory analysis, uncovering insights that might be missed with simpler visualizations.


### Pair Plots And Heatmaps
#### Utilizing Pair Plots For Exploring Pairwise Relationships In A Dataset
Pair plots are a comprehensive way to visualize the relationships between each pair of variables in a dataset. Seaborn’s pairplot function is a powerful tool for this purpose, providing a grid of Axes where each variable is shared across the y-axes in a single row and the x-axes in a single column.

**The following Python code demonstrates how to create a pair plot using Seaborn:**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Load a sample dataset
iris = sns.load_dataset('iris')

In [None]:
iris.head()

In [None]:
iris['species'].unique()

In [None]:
# Create a pair plot
sns.pairplot(iris, hue='species', markers=["o", "s", "D"])
plt.show()

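Example flower measurements: sepal_length=4.5, sepal_width=3, petal_length=1.8, petal_width=0.7

In [None]:
# Simple threshold rules on petal length (pl) and petal width (pw).
# Each rule returns a species code that to_species maps to a name.
def firstpredict(pl):
    if pl < 2:
        return 1
    elif pl < 5:
        return 2
    else:
        return 3

def secondpredict(pw):
    if pw < 0.8:
        return 1
    elif pw < 1.7:
        return 2
    else:
        return 3

to_species = {1: "setosa", 2: "versicolor", 3: "virginica"}

from_species = {"setosa": 1, "versicolor": 2, "virginica": 3}

# Weighted combination of the two rule outputs (weights 0.4 and 0.6)
# for a petal length of 4.5 and a petal width of 1.75
res = firstpredict(4.5)*0.4 + secondpredict(1.75)*0.6
print(res)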




In this example, iris is a pandas DataFrame. The hue parameter specifies which column in the DataFrame should be used for color encoding, adding another dimension to the plot. This method allows for a quick and comprehensive exploration of the pairwise relationships in the dataset.


### Heatmaps
#### Generating Heatmaps For Visualizing Correlation Matrices And Complex Data Grids
Heatmaps are an effective visualization tool for representing complex data matrices, especially useful for displaying correlation matrices. In Seaborn, creating a heatmap can help identify relationships between multiple variables easily.

**A correlation matrix is a table showing correlation coefficients between variables. Each cell in the table shows the correlation between two variables. The value is in the range of -1 to 1. If two variables have a correlation close to -1 or +1, it indicates a strong linear relationship between them. Here’s a breakdown of what the correlation coefficients mean:**

**Coefficient Value Close to +1:**  Indicates a strong positive relationship between the two variables. As one variable increases, the other variable tends to also increase.

**Coefficient Value Close to -1:**  Indicates a strong negative relationship. As one variable increases, the other variable tends to decrease.

**Coefficient Value Close to 0:**  Suggests a weak or no linear relationship between the variables.

A correlation matrix is used to measure and display the degree of linear relationship between pairs of variables in a dataset. It helps in identifying how closely connected different variables are, aiding in understanding their relationships and dependencies.
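
To make the coefficient concrete, here is a small worked sketch. It assumes the Pearson coefficient, which is the default of pandas' DataFrame.corr(), and compares a hand computation with the library result. The `df` values and the helper `pearson_r` are illustrative, not part of the lesson.

```python
import numpy as np
import pandas as pd

# Made-up illustration data (an assumption)
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with x
    "z": [5.0, 4.1, 2.9, 2.2, 1.0],  # falls as x rises
})

def pearson_r(a, b):
    """Sum of centered products divided by the root of the product of centered sums of squares."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

corr = df.corr()
print(corr.round(3))
print(round(pearson_r(df["x"], df["y"]), 3))
```

Here x and y give a coefficient close to +1, while x and z give one close to -1.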

**To visualize a correlation matrix with a heatmap in Seaborn, you can use the following Python code:**

In [None]:
# Calculating the correlation matrix
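# Keep only numeric columns: recent pandas versions raise an error on the text column 'species'
correlation_matrix = iris.select_dtypes('number').corr()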

In [None]:
# Create a heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title("Correlation Matrix Heatmap")
plt.show()

The corr() method calculates the correlation matrix of the DataFrame, and the heatmap function in Seaborn is used to visualize this matrix. The annot=True argument annotates the heatmap with the correlation coefficients, and cmap='coolwarm' specifies the color palette used for visualization.

Heatmaps are particularly useful for spotting patterns and understanding the strength and direction of the relationship between multiple variables, making them a key tool in exploratory data analysis.
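
Since a correlation matrix is symmetric, a common refinement is to hide the upper triangle and pin the color scale to the full -1 to 1 range. The following is a sketch on made-up data; the `df` columns are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up illustration data (an assumption)
rng = np.random.default_rng(7)
base = rng.normal(size=100)
df = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.5, size=100),   # positively related to a
    "c": -base + rng.normal(scale=0.5, size=100),  # negatively related to a
    "d": rng.normal(size=100),                     # unrelated noise
})
corr = df.corr()

# True marks cells to hide: everything above the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

ax = sns.heatmap(
    corr, mask=mask, annot=True, fmt=".2f",
    cmap="coolwarm", vmin=-1, vmax=1, center=0, square=True,
)
ax.set_title("Lower-Triangle Correlation Heatmap")
```

Fixing vmin, vmax, and center keeps the colors comparable between heatmaps, so the same shade always means the same correlation.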

### Facet Grids And Pair Grids
In this part of the lesson, we focus on two powerful aspects of Seaborn - Facet Grids and Pair Grids. These features allow for the creation of complex grid layouts that can be used to visualize data in detailed and structured ways.


### Facet Grids
Facet grids are an excellent way to explore and visualize data across multiple subplots. They allow you to create a grid of plots based on the values of certain variables, enabling a comprehensive comparison across different subsets of your dataset.

To create a facet grid, use Seaborn’s FacetGrid class. It takes a DataFrame and the names of the variables that will form the rows and columns of the grid. Once the grid is set up, you can map a plotting function onto each facet.

**Example of Creating a Facet Grid:**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Load a sample dataset
data = sns.load_dataset("tips")

In [None]:
# Create a facet grid
g = sns.FacetGrid(data, col="time", row="smoker")
g.map(sns.scatterplot, "total_bill", "tip")

plt.show()


In this example, total_bill vs. tip scatter plots are created for different times of the day and smoker/non-smoker categories.

### Pair Grids
Pair grids are another powerful feature in Seaborn used for complex pairwise relationship analysis. They are especially useful when you want to visualize the relationships between multiple variables in a dataset.

PairGrid allows for much more customization compared to pairplot. You can control the kinds of plots to use in the diagonal, upper triangle, and lower triangle of the grid.
It is a great tool for exploratory data analysis, letting you see both the distribution of single variables and relationships between two variables.

**Example of Creating a Custom Pair Grid:**

In [None]:
# Load the dataset
iris = sns.load_dataset("iris")

# Create a pair grid
g = sns.PairGrid(iris, hue="species")

In [None]:
# Map different plots to different sections of the grid
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot, cmap="Blues_d")
g.map_diag(sns.histplot)

In [None]:
plt.show()


### Advanced Customization In Seaborn


#### Adjusting Plot Aesthetics For Publication-Quality Graphics
**Seaborn provides extensive customization options that allow you to create publication-quality graphics. Here are some ways to refine your plots:**

**1. Setting the Context:**  Seaborn allows you to set the context with sns.set_context(), which can be 'paper', 'notebook', 'talk', or 'poster'. This scales various elements of the figure to be appropriate for different presentation settings.

In [None]:
sns.set_context('paper')  # Ideal for articles and reports

**2. Customizing Style:**  With sns.set_style(), you can customize the background and axes of your plots. Styles include 'darkgrid', 'whitegrid', 'dark', 'white', and 'ticks'.

In [None]:
sns.set_style('whitegrid')

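**3. Scaling Plot Elements:**  Use sns.set() (an alias of sns.set_theme()) to configure plot defaults globally, for example the figure size through the rc parameter. Because this call also resets the style and context to Seaborn’s defaults, call it before sns.set_style() or sns.set_context(), or pass the style and context arguments to it directly.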

In [None]:
sns.set(rc={"figure.figsize": (8, 4)})  # Custom figure size

**4. Fine-tuning with Despine:**  Remove axes spines using sns.despine() for a cleaner look, especially useful in cases with a white background.

In [None]:
sns.despine()


#### Using Seaborn’s Themes And Color Palettes
**Seaborn’s theming capabilities are one of its most powerful features:**

**1. Themes:**  Use sns.set_theme() to quickly apply a default theme to your plots.

In [None]:
sns.set_theme()

**2. Color Palettes:**  Seaborn has a rich variety of color palettes. You can use functions like sns.color_palette() to customize the color scheme of your plots. Palettes can be qualitative, sequential, or diverging.

In [None]:
sns.set_palette('pastel')

**3. Palette Customization:**  Customize your own palettes using Seaborn’s functions like sns.light_palette() or sns.dark_palette().

In [None]:
custom_palette = sns.light_palette("navy", reverse=True)
sns.set_palette(custom_palette)
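
The three palette families mentioned above can be sketched side by side. The palette names are built into Seaborn or Matplotlib; the color counts are arbitrary choices for illustration.

```python
import seaborn as sns

# Qualitative: distinct hues for unordered categories
qualitative = sns.color_palette("pastel")

# Sequential: a light-to-dark ramp for ordered or numeric values
sequential = sns.light_palette("navy", n_colors=6)

# Diverging: two hues meeting at a neutral midpoint, for values around a center
diverging = sns.color_palette("coolwarm", n_colors=7)

# Each palette behaves like a list of RGB tuples
print(len(qualitative), len(sequential), len(diverging))
print(qualitative[0])
```

In a notebook, leaving a palette as the last expression of a cell displays its color swatches.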


#### Customizing And Controlling The Layout Of Complex Plots
**Managing complex plot layouts is simplified with Seaborn:**

**1. Subplots:**  Create complex layouts with plt.subplots() and pass the axes objects to Seaborn plotting functions.

In [None]:
# 'var1' and 'var2' are placeholder column names
fig, ax = plt.subplots(2, 2, figsize=(10, 8))
sns.histplot(data=data, x='var1', ax=ax[0, 0])
sns.boxplot(data=data, x='var2', ax=ax[0, 1])

**2. FacetGrid:**  Use sns.FacetGrid() for creating a grid of plots based on a categorical variable.

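In [None]:
# 'category' and 'value' are placeholder column names; col_wrap=4 starts a new row after four facets
g = sns.FacetGrid(data, col='category', col_wrap=4)
g.map(sns.histplot, 'value')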

**3. PairGrid:**  sns.PairGrid() is perfect for pairwise relationships in a dataset.

In [None]:
g = sns.PairGrid(data)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot)

By mastering these advanced customization techniques, you can elevate your data visualizations to be both visually appealing and highly informative, suitable for professional publications and presentations.


### Integrating Seaborn With Pandas


#### Leveraging Pandas Data Structures For Seaborn Plots
Seaborn is designed to work seamlessly with pandas DataFrames, allowing for straightforward integration of complex datasets into your visualizations. This compatibility is a significant advantage as it enables the direct use of DataFrame columns for plotting various types of visualizations.



#### Using DataFrame Columns For Plotting
**In Seaborn, you can directly reference pandas DataFrame columns to define the axes of your plots. This integration simplifies the process of creating plots from DataFrame data. For example:**

In [None]:
import seaborn as sns
import pandas as pd

# Load a sample dataset
data = pd.read_csv('your_dataset.csv')

# Simple scatter plot using DataFrame columns
sns.scatterplot(x='column_x', y='column_y', data=data)


Here, 'column_x' and 'column_y' are column names in the data DataFrame, and Seaborn uses these columns to plot the data points.

#### Exploring Data With Seaborn’s Advanced Plots

Seaborn’s advanced plotting functions, such as pairplot and jointplot, are particularly useful with pandas DataFrames. They provide insights into the relationships between multiple columns at once.

**For instance, pairplot can be used to visualize pairwise relationships in a dataset:**

In [None]:
# Pairplot using the entire DataFrame
sns.pairplot(data)

This function creates a grid of Axes such that each variable in data is shared across the y-axes across a single row and the x-axes across a single column.



#### Using Seaborn Effectively With DataFrame Operations
Combining pandas’ powerful data manipulation capabilities with Seaborn’s visualization tools can lead to more insightful analyses.

**Filtering and Plotting**

**You can filter or manipulate your DataFrame with pandas operations and then plot the resulting data. For example, if you want to visualize data that meets certain criteria:**

In [None]:
# Filtering the data ('column' and threshold_value are placeholders for your own column and cutoff)
filtered_data = data[data['column'] > threshold_value]

# Visualizing the filtered data
sns.histplot(filtered_data['relevant_column'])

**Grouping and Aggregation for Visualization**

**Pandas’ groupby and aggregation functions can be used to preprocess data for grouped visualizations in Seaborn:**

In [None]:
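# Grouping and aggregating data (select the column first so non-numeric columns are not averaged)
grouped_data = data.groupby('grouping_column', as_index=False)['aggregated_column'].mean()

# Plotting the aggregated data
sns.barplot(x='grouping_column', y='aggregated_column', data=grouped_data)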


In this case, the data is first grouped by ‘grouping_column’, and then the mean of each group is calculated. The resulting aggregated data is then visualized using a bar plot.
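
The same pattern extends to more than one grouping column. The sketch below aggregates by two keys and uses hue for the second one. The `df` DataFrame, including its `note` column, is made-up illustration data.

```python
import pandas as pd
import seaborn as sns

# Made-up illustration data (an assumption)
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Lunch", "Lunch", "Dinner", "Dinner"],
    "tip": [2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 6.0, 4.0],
    "note": ["a", "b", "c", "d", "e", "f", "g", "h"],  # text column that must not be averaged
})

# Select the numeric column before aggregating;
# as_index=False keeps the grouping keys as ordinary columns
summary = df.groupby(["day", "time"], as_index=False)["tip"].mean()
print(summary)

# One bar per (day, time) pair, colored by time
ax = sns.barplot(data=summary, x="day", y="tip", hue="time")
```

Pre-aggregating with pandas is optional here, since barplot computes the mean itself when given the raw rows, but it is useful when the summary is needed elsewhere or is more involved than a single estimator.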



**🚀 Challenge:**  Analyzing Sales Data With Seaborn And Pandas

**Objective:**  Utilize Seaborn and pandas to analyze and visualize sales_data.csv, a dataset containing sales information of a fictional company.

**Task Overview:**

The dataset includes columns date, product, category, price, quantity, and revenue. You can find the dataset on the [GitHub repository of intern2grow](https://github.com/intern2grow/sales-data-analysis).

**Challenge Steps:**

**1. Load and Inspect Data:**

- Use pandas to load sales_data.csv.

- Inspect the data to understand its structure.

**2. Data Preprocessing:**

- Handle missing values and convert data types as needed.

**3. Sales Trends Over Time:**

- Create a line plot showing the trend of total revenue over time.

**4. Product Category Comparison:**

- Generate a bar plot to compare the total revenue across different categories.

**5. Relationship Between Price and Quantity Sold:**

- Use a scatter plot to explore the relationship between price and quantity.

**6. Product Performance Analysis:**

- Employ a pairplot to investigate pairwise relationships involving price, quantity, and revenue.

**7. Aggregated Category Insights:**

- Aggregate data by category and calculate average price.

- Visualize this using a suitable Seaborn plot.