### Plotting Distributions with Seaborn

Seaborn's strength is in visualizing statistical calculations. It includes several plots for graphing univariate distributions, including KDE plots, box plots, and violin plots. Work through the notebook cells below to see how each plot works.

First, we'll read in four datasets. To plot them in Seaborn, we'll join them with NumPy's `np.concatenate()` function and store the result in a Pandas DataFrame.

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# Take in the data from the CSVs as NumPy arrays:
set_one = np.genfromtxt(r"C:\Users\andrew.morris\Documents\GitHub\murry_code\Codecademy Lesson Notes\test data\07b_test_dataset1.csv", delimiter=",")
set_two = np.genfromtxt(r"C:\Users\andrew.morris\Documents\GitHub\murry_code\Codecademy Lesson Notes\test data\07b_test_dataset2.csv", delimiter=",")
set_three = np.genfromtxt(r"C:\Users\andrew.morris\Documents\GitHub\murry_code\Codecademy Lesson Notes\test data\07b_test_dataset3.csv", delimiter=",")
set_four = np.genfromtxt(r"C:\Users\andrew.morris\Documents\GitHub\murry_code\Codecademy Lesson Notes\test data\07b_test_dataset4.csv", delimiter=",")

# Creating a Pandas DataFrame:
# The label column assumes each dataset holds exactly n values.
n = 500
df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
    "value": np.concatenate([set_one, set_two, set_three, set_four])
})
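
The hard-coded `n` only works when every CSV has the same number of rows. As a minimal sketch of a variant that derives the labels from the array lengths instead, the following uses randomly generated stand-in arrays (an assumption, not the lesson's CSV data) and a hypothetical `demo_df` name:

```python
import numpy as np
import pandas as pd

# Stand-in arrays (assumption): random data in place of the CSV files.
rng = np.random.default_rng(0)
sets = {
    "set_one": rng.normal(0, 15, 500),
    "set_two": rng.normal(0, 5, 500),
    "set_three": rng.normal(0, 10, 480),  # deliberately a different length
    "set_four": rng.normal(0, 8, 500),
}

# Repeat each label once per value so the two columns always line up.
labels = np.repeat(list(sets.keys()), [len(v) for v in sets.values()])
values = np.concatenate(list(sets.values()))

demo_df = pd.DataFrame({"label": labels, "value": values})
print(demo_df.groupby("label").size())
```

Because the label counts come from `len()`, a dataset with a different number of rows no longer breaks the DataFrame construction.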


Explore the dataset using `print(df)`:

In [None]:
print(df)

#### Bar Charts (they can hide information)


The following code plots the DataFrame created earlier as a bar chart:

In [None]:
sns.barplot(data = df,
            x = 'label',
            y = 'value')
plt.show()

We can get a lot of information from these bar charts, but we can’t get everything. For example, what are the minimum and maximum values of these datasets? How spread out is this data?

While we may not see this information in our bar chart, these differences might be significant and worth understanding better.
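
One way to recover what the bars leave out is to compute the summary statistics directly. This is a small sketch on made-up stand-in data (the `narrow` and `wide` labels and the `demo_df` name are assumptions for illustration), showing two groups with nearly the same mean but very different spreads:

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): two groups centred on 0 with different spreads.
rng = np.random.default_rng(1)
demo_df = pd.DataFrame({
    "label": ["narrow"] * 500 + ["wide"] * 500,
    "value": np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 10, 500)]),
})

# The bar height corresponds to the mean; min, max, and std are what it hides.
summary = demo_df.groupby("label")["value"].agg(["mean", "min", "max", "std"])
print(summary)
```

The same `groupby` call should work on the lesson's `df`, since it uses the same `label` and `value` column names.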

#### KDE Plots - What are they?

Bar plots can tell us what the mean of our dataset is, but they don’t give us any hints as to the distribution of the dataset values. For all we know, the data could be clustered around the mean or spread out evenly across the entire range.

To find out more about each of these datasets, we’ll need to examine their distributions. A common way of doing so is by plotting the data as a histogram, but histograms have their drawbacks as well.

Seaborn offers another option for graphing distributions: KDE Plots.

KDE stands for Kernel Density Estimator. A KDE plot gives us a sense of the distribution of a univariate dataset as a smooth curve. A univariate dataset has only one variable and is also referred to as one-dimensional, as opposed to bivariate or two-dimensional datasets, which have two variables.
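
To make the idea of a kernel density estimate concrete, here is a minimal sketch that builds one by hand with NumPy: it places one Gaussian bump on each data point and averages them. The helper name `gaussian_kde_curve`, the stand-in data, and the fixed bandwidth of 0.4 are all assumptions for illustration; Seaborn chooses its bandwidth automatically and its internals are not reproduced here.

```python
import numpy as np

def gaussian_kde_curve(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)
    # Distance from every grid point to every sample, in bandwidth units.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Stand-in data (assumption): 200 draws from a standard normal distribution.
rng = np.random.default_rng(2)
samples = rng.normal(loc=0.0, scale=1.0, size=200)

grid = np.linspace(-6, 6, 601)
density = gaussian_kde_curve(samples, grid, bandwidth=0.4)

# A density curve should enclose an area of roughly 1.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth widens each bump and gives a smoother curve; a smaller one follows the individual data points more closely.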

KDE plots are often preferable to histograms. Depending on how you group the data into bins and how wide those bins are, a histogram can lead you to wildly different conclusions about the shape of the data. A KDE plot mitigates this issue because it smooths the data, lets us generalize about its overall shape, and isn’t beholden to specific data points.
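
The effect of bin choice can be checked numerically. The sketch below uses made-up bimodal data (an assumption, not one of the lesson's datasets) and counts the same values with two different binnings:

```python
import numpy as np

# Stand-in data (assumption): two clusters centred at -3 and +3.
rng = np.random.default_rng(3)
samples = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])

# Two wide bins: the counts look almost identical, hiding the two peaks.
coarse_counts, coarse_edges = np.histogram(samples, bins=2, range=(-6, 6))

# Twelve bins of width 1: the dip between the clusters becomes visible.
fine_counts, fine_edges = np.histogram(samples, bins=12, range=(-6, 6))

print(coarse_counts)
print(fine_counts)
```

With two bins the data looks evenly spread, while twelve bins reveal a nearly empty region around zero between two peaks.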

#### KDE Plots in Seaborn

To plot a KDE in Seaborn, we use the function `sns.kdeplot()`.

A KDE plot takes the following arguments:

* data - the univariate dataset being visualized, like a Pandas DataFrame, Python list, or NumPy array
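* fill - a boolean that determines whether or not the space underneath the curve is shaded (older Seaborn releases call this parameter shade, which recent releases deprecate in favor of fill)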

##### example code:
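```python
# dataset1, dataset2, and dataset3 are placeholders for one-dimensional data.
# fill=True shades the area under each curve (shade=True in older Seaborn releases).
# label gives plt.legend() a name to show for each curve.
sns.kdeplot(dataset1, fill=True, label="dataset1")
sns.kdeplot(dataset2, fill=True, label="dataset2")
sns.kdeplot(dataset3, fill=True, label="dataset3")
plt.legend()
plt.show()
```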

Let’s examine the KDE plots of our four datasets:

In [None]:
# using the datasets loaded earlier
# (fill replaces the shade parameter of older Seaborn releases)

sns.kdeplot(set_one, fill=True)
sns.kdeplot(set_two, fill=True)
sns.kdeplot(set_three, fill=True)
sns.kdeplot(set_four, fill=True)
plt.show()


#### Box Plots - What are they?
While a KDE plot can tell us about the shape of the data, it’s cumbersome to compare multiple KDE plots at once. They also can’t tell us other statistical information, like the values of outliers.

The box plot (also known as a box-and-whisker plot) can’t tell us about the shape of our dataset’s distribution the way a KDE plot can. But it shows us the range of our dataset, gives us an idea of where a significant portion of our data lies, and reveals whether any outliers are present.

Let’s examine how we interpret a box plot:

* The box represents the interquartile range, so its edges are the first and third quartiles
* The line inside the box is the median
* The end lines (the whiskers) mark the minimum and maximum values, excluding outliers
* The diamonds show outliers
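
The quantities a box plot draws can be computed by hand. The following sketch uses a small hand-made sample (an assumption for illustration) and the common rule that points more than 1.5 times the interquartile range beyond the box count as outliers, which matches the default `whis=1.5` setting of `sns.boxplot()`:

```python
import numpy as np

# Small hand-made sample (assumption) with one obvious outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)

# Box edges and the median line.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers end at the most extreme points that are still inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(q1, median, q3)              # 3.25 5.5 7.75
print(whisker_low, whisker_high)   # 1.0 9.0
print(outliers)                    # [50.]
```

Here the value 50 lies far above the upper fence of 14.5, so it would be drawn as a separate outlier marker while the upper whisker stops at 9.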

#### Box Plots in Seaborn
One advantage of the box plot over the KDE plot is that in Seaborn, it is easy to plot multiples and compare distributions.

The box plot does a good job of showing certain differences, such as the difference between Dataset 1 and Dataset 2; however, it does not show that Dataset 3 is bimodal.

To plot a box plot in Seaborn, we use the function `sns.boxplot()`.

A box plot takes the following arguments:

* data - the dataset we’re plotting, like a DataFrame, list, or an array
* x - a one-dimensional set of values, like a Series, list, or array, or the name of a column in data
* y - a second set of one-dimensional data, given in the same way as x

##### example code:
```python
sns.boxplot(data=df, x='label', y='value')
plt.show()
```

If you use a Pandas Series for the x and y values, the Series names will also supply the axis labels. For example, if you pass the value Series as your y data, Seaborn will automatically use that name as the y-axis label.
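
The labelling behavior can be checked by passing the Series directly and reading the labels back from the returned Axes. This is a sketch on made-up stand-in data; the `demo_df` and `axis_labels` names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Stand-in data (assumption): two labelled groups of random values.
rng = np.random.default_rng(6)
demo_df = pd.DataFrame({
    "label": ["set_a"] * 100 + ["set_b"] * 100,
    "value": rng.normal(0, 1, 200),
})

# Pass the Series themselves rather than column names plus data=.
ax = sns.boxplot(x=demo_df["label"], y=demo_df["value"])

# The axis labels are taken from the names of the Series.
axis_labels = (ax.get_xlabel(), ax.get_ylabel())
print(axis_labels)
plt.show()
```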

In [None]:
# using the datasets from earlier:

sns.boxplot(data = df, x = 'label', y = 'value')
plt.show()


#### Violin Plots

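Violin plots are a powerful graphing tool that allows you to compare multiple distributions at once. Each violin pairs a mirrored KDE curve with a small box plot drawn inside it, so it shows both the shape of the data and its summary statistics.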

To plot a violin plot in Seaborn, use the function `sns.violinplot()`.

A violin plot takes the following arguments:

* data - the dataset that we’re plotting, such as a list, DataFrame, or array
* x, y, and hue - a one-dimensional set of data, such as a Series, list, or array
* many of the parameters that `sns.boxplot()` also accepts, such as order and palette

##### example code:

```python
sns.violinplot(data=df, x="label", y="value")
plt.show()
```
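
To see what the violin shape adds, the following sketch draws a box plot and a violin plot of the same data side by side. The bimodal stand-in data and the `demo_df` name are assumptions for illustration, not one of the lesson's datasets.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Stand-in data (assumption): one group with two clusters at -3 and +3.
rng = np.random.default_rng(5)
demo_df = pd.DataFrame({
    "label": ["bimodal"] * 600,
    "value": np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)]),
})

# Same data on two axes that share a y scale.
fig, (box_ax, violin_ax) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
sns.boxplot(data=demo_df, x="label", y="value", ax=box_ax)
sns.violinplot(data=demo_df, x="label", y="value", ax=violin_ax)
box_ax.set_title("Box plot")
violin_ax.set_title("Violin plot")
plt.show()
```

The box plot shows a single wide box, while the violin narrows in the middle and bulges around each cluster, making the two modes visible.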

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# Setting styles:
sns.set_style("darkgrid")
sns.set_palette("pastel")

# Add your code below:
sns.violinplot(data=df, x="label", y="value")
plt.show()