
Visualize Distributions With Seaborn

Seaborn is a Python visualization library built on top of Matplotlib. In this tutorial it is used to visualize random distributions.

In [None]:
%pip install seaborn

Distplots

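Distplot stands for distribution plot. It takes an array as input and plots a curve corresponding to the distribution of the points in the array. Note that distplot() has been deprecated since seaborn 0.11 in favor of histplot(), kdeplot(), and displot(); versions that still include it emit a deprecation warning.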



In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

This tutorial uses sns.distplot(arr, hist=False) to visualize random distributions.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.distplot([0, 1, 2, 3, 4, 5])

plt.show()

Visualization of Binomial Distribution

Example

Parameters:

n: The number of trials.

p: The probability of success on each trial.

In [None]:
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns

sns.distplot(random.binomial(n=10, p=0.5, size=1000), hist=True, kde=False)

plt.show()

random.binomial(n=10, p=0.5, size=1000):

This generates 1000 random samples from a binomial distribution.

n=10: The number of trials in each experiment.

p=0.5: The probability of success in each trial.

size=1000: The number of random samples to generate.

hist=True: Plots a histogram of the samples.

kde=False: Prevents a kernel density estimate (KDE) curve from being overlaid on the histogram. The kde argument controls whether this smoothed KDE curve is drawn along with the histogram.

Difference Between Normal and Binomial Distribution

The main difference is that the normal distribution is continuous, whereas the binomial distribution is discrete. However, when the number of trials n is large enough, the binomial distribution closely resembles a normal distribution with a matching mean (loc) and standard deviation (scale).

Example

In [None]:

from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns

sns.distplot(random.normal(loc=50, scale=5, size=1000), hist=False, label='normal')
sns.distplot(random.binomial(n=100, p=0.5, size=1000), hist=False, label='binomial')

plt.legend()  # display the 'normal' and 'binomial' labels
plt.show()

Normal Distribution:

 Use when dealing with continuous data that is approximately symmetric and bell-shaped.
Binomial Distribution:

 Use when counting the number of successes in a fixed number of independent trials with the same probability of success.

 Note: In some cases, the binomial distribution can be approximated by the normal distribution if the conditions are met (n is large and p is not too close to 0 or 1). This is known as the normal approximation to the binomial distribution.
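For Binomial(n, p) the mean is n*p and the standard deviation is sqrt(n*p*(1-p)); these are the loc and scale of the approximating normal distribution. With n=100 and p=0.5 they come out to 50 and 5, the same values used in the normal vs. binomial comparison above. The rule-of-thumb threshold of 5 in the sketch is an assumption (textbooks use 5 or 10).

```python
import math

import numpy as np

n, p = 100, 0.5

# Mean and standard deviation of Binomial(n, p); these are the loc and
# scale of the normal distribution that approximates it
loc = n * p
scale = math.sqrt(n * p * (1 - p))
print(loc, scale)  # 50.0 5.0

# Rule of thumb (threshold assumed to be 5 here; some texts use 10)
approximation_ok = n * p >= 5 and n * (1 - p) >= 5
print(approximation_ok)  # True

# Compare with statistics of simulated binomial samples
rng = np.random.default_rng(seed=1)
samples = rng.binomial(n=n, p=p, size=100_000)
print(samples.mean(), samples.std())  # close to 50 and 5
```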


Continuous Data:

Measurements: Height, weight, temperature, length, time, concentration, etc.
Scores: Test scores (e.g., IQ, SAT), exam scores, ratings (e.g., customer satisfaction).
Errors: Measurement errors, experimental errors, random noise.
Natural Phenomena: Many natural phenomena, such as the distribution of heights in a population.

Examples:

The distribution of heights in a group of people.

The distribution of IQ scores in a population.

The distribution of errors in a measurement process.

Binomial Distribution

Discrete Data (counts):

Yes/No Events: Coin flips (heads/tails), dice rolls (success/failure), quality control (defective/non-defective), survey responses (agree/disagree).
Counting Events: Number of successes in a fixed number of trials.

Examples:

The number of heads in 100 coin flips.

The number of defective items in a batch of 100 products.

The number of people who prefer a certain brand of soda in a survey.

Here are the key concepts of Seaborn, along with examples and explanations:

1. Relational Plots
Relational plots are used to visualize relationships between two variables. Seaborn provides two primary functions: scatterplot() and lineplot().

Example: Scatter Plot

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Load an example dataset
tips = sns.load_dataset("tips")

# Create a scatter plot
sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.show()

Explanation: The scatterplot() function shows the relationship between the total_bill and tip columns in the tips dataset. Scatter plots are great for visualizing correlation or trends.

Example: Line Plot


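Explanation: The lineplot() function connects data points with a line, which makes it well suited to time series or other ordered data. When several rows share the same x value, as with total_bill here, lineplot() plots their mean and shades a 95% confidence interval around it by default.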

In [None]:

sns.lineplot(x="total_bill", y="tip", data=tips)
plt.show()

2. Categorical Plots

Categorical plots are used to visualize data that can be divided into categories. Seaborn provides functions like barplot(), countplot(), boxplot(), violinplot(), and stripplot().


Explanation: The barplot() function shows the mean value of total_bill for each day in the tips dataset. Each bar's height is the mean, and by default the error bars show a 95% confidence interval for that mean.

In [None]:
sns.barplot(x="day", y="total_bill", data=tips)
plt.show()


Explanation: The boxplot() function shows the distribution of the total_bill for each day. Box plots display the median, quartiles, and potential outliers.

In [None]:
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()
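The numbers behind each box can be computed with pandas. In this sketch on hypothetical data, the box spans the first to third quartile, the inner line is the median, and points beyond 1.5 * IQR from the box (seaborn's default whis=1.5) are drawn as individual outliers, so the Thursday value 40.0 appears as an outlier.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical bills; 40.0 is far above the other Thursday values
df = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "total_bill": [10.0, 12.0, 14.0, 16.0, 40.0, 15.0, 18.0, 21.0, 24.0, 27.0],
})

# Quartiles per day: the box spans q1..q3 and the inner line is the median
q = df.groupby("day")["total_bill"].quantile([0.25, 0.5, 0.75]).unstack()
q.columns = ["q1", "median", "q3"]
q["iqr"] = q["q3"] - q["q1"]

# Points above this fence (or below q1 - 1.5 * iqr) are drawn as outliers
q["upper_fence"] = q["q3"] + 1.5 * q["iqr"]
print(q)

sns.boxplot(data=df, x="day", y="total_bill")
plt.show()
```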

3. Distribution Plots

Distribution plots help to visualize the distribution of a single variable. Seaborn provides functions like histplot(), kdeplot(), and the figure-level displot(). The older distplot() is deprecated and has been superseded by these functions.

Explanation: The histplot() function shows the distribution of the total_bill column as a histogram. The kde=True argument adds a Kernel Density Estimate (KDE) curve to show the smooth distribution.

In [None]:
sns.histplot(tips["total_bill"], kde=True)
plt.show()


Explanation: The kdeplot() function shows the smoothed distribution of total_bill using a kernel density estimate.

In [None]:
sns.kdeplot(tips["total_bill"])
plt.show()

4. Matrix Plots
Matrix plots are used to display data in a matrix format. These plots are useful for visualizing correlation matrices or any other grid-based data.

Example: Heatmap

Explanation: The heatmap() function visualizes the correlation matrix, where annot=True adds the correlation coefficients in each cell. The cmap argument controls the color scheme.

In [None]:

corr = tips.corr(numeric_only=True)  # Correlation matrix of the numeric columns only
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()
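A correlation matrix is symmetric, so half of it is redundant. This sketch, on hypothetical numeric data with tips-like column names, hides the upper triangle with the mask parameter and pins the color scale to the full -1 to 1 range of correlations.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric data standing in for the numeric tips columns
rng = np.random.default_rng(seed=3)
bill = rng.uniform(5, 50, size=100)
df = pd.DataFrame({
    "total_bill": bill,
    "tip": 0.15 * bill + rng.normal(0, 1, size=100),
    "size": rng.integers(1, 7, size=100),
})

corr = df.corr(numeric_only=True)

# True marks cells to hide: the upper triangle, including the diagonal of 1s
mask = np.triu(np.ones_like(corr, dtype=bool))

sns.heatmap(corr, mask=mask, annot=True, cmap="coolwarm", vmin=-1, vmax=1, center=0)
plt.show()
```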

5. Pair Plots
The pairplot() function is used to create a grid of scatter plots and histograms that show pairwise relationships between variables in a dataset.

Example: Pair Plot

Explanation: The pairplot() function creates a scatter plot for every pair of numeric variables in the dataset, with a histogram of each variable on the diagonal. This is useful for quickly visualizing relationships between multiple variables.

In [None]:


sns.pairplot(tips)
plt.show()

6. Facet Grids
Facet grids allow for creating subplots based on different levels of a categorical variable. FacetGrid and catplot() are useful for visualizing complex relationships.

Example: Facet Grid


Explanation: FacetGrid creates one subplot for each combination of the sex (columns) and smoker (rows) categories, and g.map() draws a scatter plot of total_bill vs. tip in each subplot.

In [None]:

g = sns.FacetGrid(tips, col="sex", row="smoker")
g.map(sns.scatterplot, "total_bill", "tip")
plt.show()
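catplot() is a figure-level function that builds the facet grid for you: col (or row) splits the subplots by a category, and kind chooses the categorical plot drawn in each one. A sketch on hypothetical rows with tips-like column names:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows with the same column names as tips
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Thur", "Fri", "Thur", "Fri", "Thur", "Fri"],
    "sex": ["Male", "Male", "Female", "Female", "Male", "Male", "Female", "Female"],
    "total_bill": [12.0, 18.5, 9.8, 22.1, 15.3, 30.0, 11.2, 17.4],
})

# One box-plot panel per value of "sex", arranged in columns
g = sns.catplot(data=df, x="day", y="total_bill", col="sex", kind="box")
plt.show()
```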

7. Regression Plots
Seaborn provides functions to easily visualize linear regression fits: regplot() and lmplot().

Example: Regression Plot


Explanation: The regplot() function adds a linear regression line to the scatter plot of total_bill vs. tip, showing the trend and relationship.

In [None]:

sns.regplot(x="total_bill", y="tip", data=tips)
plt.show()
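lmplot() is the figure-level counterpart of regplot() and accepts hue to fit one line per group. The fitted lines are ordinary least-squares fits, so np.polyfit(x, y, 1) gives their slope and intercept. This sketch uses hypothetical data in which the tip is assumed to be about 15% of the bill plus noise.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(seed=5)
bill = rng.uniform(5, 50, size=80)
df = pd.DataFrame({
    "total_bill": bill,
    # Hypothetical relationship: tip is about 15% of the bill plus noise
    "tip": 0.15 * bill + rng.normal(0, 0.5, size=80),
    "smoker": rng.choice(["Yes", "No"], size=80),
})

# One regression line (with confidence band) per smoker group
g = sns.lmplot(data=df, x="total_bill", y="tip", hue="smoker")
plt.show()

# Least-squares slope and intercept for each group's line
slopes = {}
for name, group in df.groupby("smoker"):
    slope, intercept = np.polyfit(group["total_bill"], group["tip"], deg=1)
    slopes[name] = slope
    print(f"smoker={name}: slope={slope:.3f}, intercept={intercept:.3f}")
```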

8. Joint Plots
The jointplot() function creates a plot that combines a scatter plot with marginal histograms or KDE plots to show the relationship between two variables.

Example: Joint Plot

Explanation: The jointplot() function shows a scatter plot of total_bill vs. tip with histograms on the margins to display the distribution of each variable.

In [None]:

sns.jointplot(x="total_bill", y="tip", data=tips, kind="scatter")
plt.show()

9. Swarm Plots
The swarmplot() function shows all the points along a categorical axis, avoiding overlap by adjusting the positions of data points.

Example: Swarm Plot

Explanation: The swarmplot() function shows all data points, spread out to avoid overlap, allowing you to see the exact distribution of values.

In [None]:

sns.swarmplot(x="day", y="total_bill", data=tips)
plt.show()
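On larger datasets swarmplot() may be unable to place every point without overlap and can emit a warning. stripplot(), listed earlier among the categorical plots, spreads points with random jitter instead. A sketch on hypothetical data:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 400 bills spread over four days
rng = np.random.default_rng(seed=7)
df = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=400),
    "total_bill": rng.uniform(5, 50, size=400),
})

# Random jitter instead of a non-overlapping layout; alpha shows dense regions
ax = sns.stripplot(data=df, x="day", y="total_bill", jitter=0.25, alpha=0.5)
plt.show()
```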

10. Plot Styling
Seaborn makes it easy to change the appearance of plots with functions like set_style() and set_palette().

Example: Setting Style

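Explanation: The set_style() function changes the overall appearance of the plot. In this case, whitegrid gives the plot a white background with grid lines. The style stays in effect for all later plots until it is changed again.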

In [None]:

sns.set_style("whitegrid")
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()
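set_palette() works the same way for colors, changing the default color cycle for later plots. For a one-off change, axes_style() can be used as a context manager so the style applies only to axes created inside the with block. A minimal sketch; the final set_theme() call restores seaborn's defaults so the rest of the notebook is unaffected.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# set_palette() changes the default color cycle for later plots
sns.set_palette("pastel")
current = sns.color_palette()  # the palette that is now active

# axes_style() as a context manager styles only the axes created inside it
with sns.axes_style("darkgrid"):
    fig, ax = plt.subplots()
    ax.plot(np.arange(10), np.arange(10) ** 2)
plt.show()

# Return to seaborn's default theme (style, palette, and context)
sns.set_theme()
```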