* <a href="https://colab.research.google.com/github/4dsolutions/clarusway_data_analysis/blob/main/DVwPY_S4/4-DVwPy_S2_Seaborn.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a><br/>
* <a href="https://nbviewer.org/github/4dsolutions/clarusway_data_analysis/blob/main/DVwPY_S4/4-DVwPy_S2_Seaborn.ipynb"><img align="left" src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg" alt="Open in nbviewer" title="Open and View using nbviewer"></a>

___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:100%; text-align:center; border-radius:10px 10px;">WAY TO REINVENT YOURSELF</p>

![image.png](https://i.ibb.co/hg2Kd1X/seabornlogo.png)

Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.

Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.

* [Seaborn Home Page](https://seaborn.pydata.org/)
* [Seaborn Intro](https://seaborn.pydata.org/introduction.html)

In [None]:
from IPython.display import YouTubeVideo

In [None]:
YouTubeVideo("GcXcSZ0gQps")

In [None]:
YouTubeVideo('6GUZXDef2U0')

In [None]:
YouTubeVideo('Pkvdc2Z6eBg')

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:center; border-radius:10px 10px;">CONTENT</p>

* [IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK](#0)
* [COUNTPLOT](#1)
    * ["hue" Parameter](#1.1)
    * [Extra Information](#1.2)
* [BARPLOT](#2)
    * ["ci" Parameter](#2.1)
    * ["estimator" Parameter](#2.2)    
* [BOXPLOT](#3)
    * ["width" Parameter](#3.1)
    * [Optional Boxplot Examples](#3.2)
    * ["orient" Parameter](#3.3)
    * [Changing x & y](#3.4)
* [VIOLINPLOT](#4)
    * [Optional Violinplot Example](#4.1)
    * ["split" Parameter](#4.2)
    * ["inner" Parameter](#4.3)
    * ["bandwidth" Parameter](#4.4)
    * [Changing x & y](#4.5)
* [SWARMPLOT](#5) 
    * [Optional Swarmplot Example](#5.1)
    * ["dodge" Parameter](#5.2)
* [BOXENPLOT (LETTER-VALUE PLOT)](#6)        
* [LINEPLOT](#7)
* [THE END OF THE SEABORN SESSION 02](#8)    

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:center; border-radius:10px 10px;">IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK</p>

<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline
import scipy
import seaborn as sns

# import warnings
# warnings.filterwarnings('ignore') 

In [None]:
print(sns.get_dataset_names())

In [None]:
tips = sns.load_dataset("tips")
tips.head()

In [None]:
tips.info()

In [None]:
# tips.describe().T
tips.describe(include=np.number).T
# df.describe: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html

In [None]:
tips.describe(include='category').T

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:CENTER; border-radius:10px 10px;">COUNTPLOT</p>

<a id="1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A countplot is a simple plot: it shows the total number of rows in each category.

In [None]:
tips['day'].value_counts()

In [None]:
plt.figure(figsize=(6, 4))
sns.countplot(x='day', data=tips); # palette='bone'); # in case of penguins :-D

For contrast, let's generate a similar bar chart by talking directly to matplotlib.

Resources:

* [About Color](https://matplotlib.org/stable/gallery/color/colormap_reference.html)
* [Pyplot Bar](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html)

In [None]:
s = tips['day'].value_counts()
s = s.reindex(index = ['Thur', 'Fri', 'Sat', 'Sun'])
plt.bar(x=s.index, height=s.values, color=['tab:blue', 'tab:orange', 'tab:green', 'tab:red']);

**How to annotate?**

[matplotlib.axes.Axes.annotate](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.annotate.html)

Now that we have seen Seaborn's high-level interface, let's combine it with the matplotlib layer underneath.

In [None]:
fig, ax = plt.subplots()

sns.countplot(x='day', data=tips, ax=ax)  # talk to the ax
# ax = sns.countplot(x='day', data=tips)
ax.set_ylim(0,100)
for p in ax.patches:  # the rectangles
    ax.annotate("%d" % p.get_height(), (p.get_x() + 0.3, p.get_height() + 1));

In [None]:
list(ax.patches)

In [None]:
# fig, ax = plt.subplots()
sns.countplot(x='day', data=tips, palette="tab10");

In [None]:
sns.countplot(x='day', data=tips, palette="tab10")
ax = plt.gca()      # another good one to know about: get current axes
ax.set_ylim(0,100)  # try this above too
for p in ax.patches:
    ax.annotate("{:^2.0f}".format(p.get_height()), (p.get_x()+0.34, p.get_height()+1));
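
The annotation loops above place each label by hand. As a hedged alternative, matplotlib 3.4 and later provide `Axes.bar_label`, which labels every bar in a container in one call. The sketch below talks to matplotlib directly and uses illustrative counts in place of `tips['day'].value_counts()`; the names `days`, `counts`, `bars`, and `labels` are made up for this example.

```python
import matplotlib.pyplot as plt

# Illustrative counts, standing in for tips['day'].value_counts()
days = ["Thur", "Fri", "Sat", "Sun"]
counts = [62, 19, 87, 76]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(days, counts)                        # returns a BarContainer
labels = ax.bar_label(bars, fmt="%d", padding=2)   # one Text object per bar
ax.set_ylim(0, 100)                                # head room for the labels
```

Because `bar_label` reads the bar heights itself, there are no x and y offsets to tune when the figure size changes.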

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"hue" Parameter</p>

<a id="1.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
sns.set()

In [None]:
day = tips.groupby("day")["day"].count() 
day

In [None]:
day = tips.groupby("day").count().index
day

The code below reminds us of everything we know how to do in pandas and matplotlib.

In [None]:
# Optional hard way to do it (REMEMBER matplotlib Session 02)

day = tips.groupby("day").count().index   
day_of_total_bill= tips.groupby("day")["total_bill"].sum()
day_tip = tips.groupby("day")["tip"].sum()

fig, ax = plt.subplots(figsize=(4, 5))

p = np.arange(len(day))
width = 0.20

ax.bar(p - width/2, day_of_total_bill, width, label="total_bill")
ax.bar(p + width/2, day_tip,           width, label="tip")

ax.set_xticks(p)
ax.set_xticklabels(day)

plt.legend();
# plt.show()

In [None]:
groups = tips.groupby("day").agg('count')
sns.barplot(x=groups.index, y="total_bill", data=groups);
plt.ylabel("Count");

In [None]:
fig, ax = plt.subplots()

ax = sns.countplot(x='day', data=tips, hue="sex")
ax.set_ylim(0, 65) # more 'head room'

for p in ax.patches:
    ax.annotate(int((p.get_height())), (p.get_x()+0.15, p.get_height()+1));

In [None]:
fig, ax = plt.subplots()

sns.countplot(x='day', data=tips, hue="sex", ax=ax)

for p in ax.patches:
    ax.annotate((p.get_height()), (p.get_x()+0.1, p.get_height()+1));

#### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">Extra Information</p>

<a id="1.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
# extra information

fig, ax = plt.subplots()

ax = sns.countplot(x='day', data=tips, hue="sex")

for p in ax.patches:
    ax.annotate((p.get_height()), (p.get_x()+0.1, p.get_height()+1))
    # print(tips.day.count()) # -- total records, could be outside loop
    ax.text(p.get_x()+0.05, p.get_height()-3, str(round(p.get_height()/tips.day.count(), 2))) # added each bar's share of all rows

In [None]:
59 / 244

In [None]:
mpg = sns.load_dataset('mpg')

mpg.head()

In [None]:
sns.countplot(x='cylinders', data=mpg);

In [None]:
sns.countplot(x='model_year', data=mpg);

In [None]:
sns.countplot(x='origin', data=mpg);

In [None]:
sns.countplot(x='model_year', hue='origin', data=mpg);

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:CENTER; border-radius:10px 10px;">BARPLOT</p>

<a id="2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

So far the y axis has defaulted to a count (similar to a .groupby(x_axis).count() call in pandas). We can expand our visualizations by specifying a continuous feature for the y axis. Be careful with these plots: each bar reduces a category to a single summary value, so the chart may imply continuity along the y axis where there is none.

[sns.barplot](https://seaborn.pydata.org/generated/seaborn.barplot.html)

In [None]:
sns.barplot(x="sex", y="total_bill", data=tips);  # defaults to estimator = mean

In [None]:
sns.barplot(x="sex", y="tip", data=tips);

In [None]:
sns.barplot(x="day", y="total_bill", data=tips, hue="sex", errorbar=None);

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"ci" Parameter</p>

<a id="2.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let's Talk About Bootstrapping:

* [Towards Data Science](https://towardsdatascience.com/why-bootstrap-sampling-is-the-badass-tool-of-probabilistic-thinking-5d8c7343fb67)
* [Coding Disciple](https://codingdisciple.com/bootstrap-hypothesis-testing.html)

Bootstrapping involves resampling the same N data points with replacement, meaning duplicates are OK. Because of replacement, each resample is likely a little different, and so is its mean. Compute the mean thousands of times. What range of mean values encompasses 95% of them? This is your confidence interval.

What if you want a margin of error around something other than the mean, such as the standard deviation? Choose a different estimator in that case. By default, the confidence interval in these plots is drawn around the mean.
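
The resampling procedure described above can be sketched in a few lines of NumPy. This is a minimal percentile bootstrap under stated assumptions, not seaborn's internal code: the helper `bootstrap_ci`, its parameters, and the synthetic `sample` are all hypothetical names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic sample standing in for a column such as tips["total_bill"]
sample = rng.gamma(shape=4.0, scale=5.0, size=244)

def bootstrap_ci(data, estimator=np.mean, n_boot=1000, level=95, generator=rng):
    """Percentile bootstrap interval for `estimator` on a 1-D array."""
    n = len(data)
    # Resample n points with replacement and recompute the estimate each time
    estimates = np.array([
        estimator(generator.choice(data, size=n, replace=True))
        for _ in range(n_boot)
    ])
    tail = (100 - level) / 2
    low, high = np.percentile(estimates, [tail, 100 - tail])
    return low, high

low, high = bootstrap_ci(sample)                         # 95% interval for the mean
low60, high60 = bootstrap_ci(sample, level=60)           # narrower 60% interval
sd_low, sd_high = bootstrap_ci(sample, estimator=np.std) # interval for the std instead

print(f"mean = {sample.mean():.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
print(f"60% CI = [{low60:.2f}, {high60:.2f}]")
print(f"std  = {sample.std():.2f}, 95% CI = [{sd_low:.2f}, {sd_high:.2f}]")
```

Lowering `level` shrinks the interval, and swapping `estimator` changes which statistic the interval surrounds.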

In [None]:
YouTubeVideo("655X9eZGxls")

In [None]:
sns.barplot(x="sex", y="total_bill", data=tips, errorbar='sd');  # 'sd' draws the standard deviation, not a confidence interval

In [None]:
sns.barplot(x="sex", y="total_bill", data=tips, errorbar='ci');  # ci = confidence interval; the default level is 95

**Correlation Between Height and Weight**<br>
At the beginning of the Spring 2017 semester a sample of World Campus students were surveyed and asked for their height and weight. In the sample, Pearson's r = 0.487. A 95% confidence interval was computed of [0.410, 0.559].

The correct interpretation of this confidence interval is that we are 95% confident that the correlation between height and weight in the population of all World Campus students is between 0.410 and 0.559.

In [None]:
sns.barplot(x="sex", y="total_bill", data=tips, errorbar=('ci', 95));

In [None]:
sns.barplot(x="sex", y="total_bill", data=tips, errorbar=('ci', 60));

In [None]:
sns.barplot(x="sex", y="total_bill", data=tips, errorbar='sd', hue="smoker" );

In [None]:
sns.barplot(x='day', y="total_bill", data=tips);

In [None]:
sns.barplot(x='day', y="total_bill", data=tips, hue='sex'); 

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"estimator" Parameter</p>

<a id="2.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

"estimator" params (np.mean, np.median, np.sum, np.max, np.min, np.count_nonzero)

In [None]:
sns.barplot(x='day', y="total_bill", data=tips, estimator=np.sum); 
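
To see what the estimator does to the bar heights, it can help to compute the same summaries with pandas. The sketch below uses a small made-up table (`toy`) that borrows the tips column names; each column of `heights` is the set of bar heights one estimator would produce.

```python
import pandas as pd

# Made-up rows that reuse the tips column names
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri"],
    "total_bill": [10.0, 20.0, 5.0, 15.0, 40.0],
})

# One row per day, one column per candidate estimator
heights = toy.groupby("day")["total_bill"].agg(["mean", "median", "sum", "max", "min"])
print(heights)
```

With a skewed group such as the made-up Friday rows, the mean (20.0) and the median (15.0) differ noticeably, which is one reason to pick the estimator deliberately.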

In [None]:
sns.barplot(x='day', y="total_bill", data=tips, hue='sex', estimator=np.sum); 

In [None]:
tips.groupby(["day", "sex"])["total_bill"].mean()

In [None]:
plt.figure(figsize=(10, 8))

sns.barplot(x='day', y="total_bill", data=tips, hue='sex'); 

**Differences between barplot and countplot**

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(12, 4))

sns.barplot(x="day", y="total_bill", data=tips, ax = axs[0])
sns.countplot(x="day", data=tips, ax = axs[1])

plt.tight_layout()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(12, 4))

sns.barplot(x="day", y="total_bill", data=tips, ax = ax[0], estimator=np.sum)   # estimator param (np.mean, np.median, np.sum, np.max, np.min, np.count_nonzero)
sns.countplot(x="day", data=tips, ax = ax[1])

plt.tight_layout()

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:CENTER; border-radius:10px 10px;">BOXPLOT</p>

<a id="3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

[SOURCE 01](https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-data-statistics/cc-6th-box-whisker-plots/v/constructing-a-box-and-whisker-plot) & [SOURCE 02](https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51)

As described in the videos, a boxplot displays a distribution through its quartiles and uses the IQR to flag outliers.
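
The quartile and IQR arithmetic behind a boxplot can be checked by hand. The sketch below uses a made-up array (`bills`) with one deliberately extreme value and assumes the common 1.5 × IQR rule for the outlier fences.

```python
import numpy as np

# Made-up bills; the last value is deliberately extreme
bills = np.array([8.0, 10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 20.0, 22.0, 60.0])

q1, median, q3 = np.percentile(bills, [25, 50, 75])  # box edges and centerline
iqr = q3 - q1                                        # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are drawn individually as outliers
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]
print(q1, median, q3, iqr, outliers)
```

Here only the value 60.0 falls outside the fences, so it would appear as a separate point above the upper whisker.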

Let's also recall [plt.legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html).

* [Adjusting Legend Location](https://youtu.be/CSY-sMPAHzQ)
* [Another Tutorial](https://youtu.be/lnfGvdCqGYs)

In [None]:
sns.boxplot(x='day', y="total_bill", data=tips);

In [None]:
# Orientation

plt.figure(figsize=(14, 5))

sns.boxplot(y='day', x="total_bill", data=tips, hue="sex", orient='h');

In [None]:
sns.boxplot(x='day', y="total_bill", data=tips, hue="sex")

# plt.legend(loc='best');
plt.legend(bbox_to_anchor=(1.1, .1), loc=3);

In [None]:
sns.boxplot(x='day', y="total_bill", data=tips, hue="sex")

plt.legend(bbox_to_anchor=(1.05, 1), loc=2);

In [None]:
plt.figure(figsize=(14, 5))

sns.boxplot(x='day', y="total_bill", data=tips, hue="sex");

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">"width" Parameter</p>

<a id="3.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
# width

plt.figure(figsize=(10, 5))

sns.boxplot(x='day', y='total_bill', data=tips, hue='sex', width=0.3);

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">Optional Boxplot Examples</p>

<a id="3.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
df = pd.read_csv("StudentsPerformance.csv")
df.head()

In [None]:
df.info()

In [None]:
sns.boxplot(y='math score', data=df);

In [None]:
sns.boxplot(x='parental level of education', 
            y='math score', data=df);

In [None]:
sns.boxplot(y='parental level of education', 
            x='math score', data=df, orient='h');

In [None]:
plt.figure(figsize=(16, 6))

sns.boxplot(x='parental level of education', 
            y='math score', data=df);

In [None]:
plt.figure(figsize=(12, 6))

sns.boxplot(x='parental level of education', 
            y='math score', data=df, hue='gender');

In [None]:
plt.figure(figsize=(12, 6))

sns.boxplot(x='parental level of education', 
            y='math score', data=df, hue='gender')

# Optional move the legend outside
plt.legend(bbox_to_anchor=(1.05, 1), 
           loc=2, borderaxespad=0.);

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"orient" Parameter</p>

<a id="3.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
sns.boxplot(x='math score', 
            y='parental level of education', 
            data=df, 
            orient='h');

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">Changing x & y</p>

<a id="3.4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
plt.figure(figsize=(12, 6))

sns.boxplot(x='parental level of education', 
            y='math score', 
            data=df, 
            hue='gender', 
            width=1.0);

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:center; border-radius:10px 10px;">VIOLINPLOT</p>

<a id="4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.

In [None]:
sns.violinplot(x='day', y="total_bill", data=tips);

![](https://www.researchgate.net/profile/Jonathan-Chambers-3/publication/329035470/figure/fig15/AS:695026912870412@1542718737802/Explanation-of-Violin-plot-Densities-are-estimated-using-a-Gaussian-kernel-density.png)

In [None]:
sns.violinplot(x='day', y="total_bill", data=tips, hue="smoker");

In [None]:
# orientation: just swap x and y; you don't need the orient param

plt.figure(figsize=(9, 6))

sns.violinplot(y='day', x="total_bill", data=tips, hue="sex");   

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">Optional Violinplot Example</p>

<a id="4.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='parental level of education', 
               y='math score', data=df);

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='parental level of education', 
               y='math score', data=df, hue='gender');

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"split" Parameter</p>

<a id="4.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

When using hue nesting with a variable that takes two levels, setting split to True will draw half of a violin for each level. This can make it easier to directly compare the distributions.

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='day', y='total_bill', 
               data=tips, hue='sex', split=True);

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='parental level of education', 
               y='math score', data=df, hue='gender', 
               split=True);

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"inner" Parameter</p>

<a id="4.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

default: inner='box'

Representation of the datapoints in the violin interior. If box, draw a miniature boxplot. If quartiles, draw the quartiles of the distribution. If point or stick, show each underlying datapoint. Using None will draw unadorned violins.

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='day', y="total_bill", 
               data=tips, hue="sex", inner=None);

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='day', y="total_bill", 
               data=tips, hue="sex", inner="quartile");

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='parental level of education', 
               y='math score', data=df, inner=None);

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='parental level of education', 
               y='math score', data=df, inner='quartile');

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='parental level of education', 
               y='math score', data=df, inner='stick');

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"bandwidth" Parameter</p>

<a id="4.4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Similar to bandwidth argument for kdeplot
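
To illustrate what the bandwidth controls, the sketch below fits two Gaussian kernel density estimates to the same made-up bimodal sample with `scipy.stats.gaussian_kde`. It uses SciPy directly instead of seaborn because the violinplot keyword for bandwidth differs between seaborn releases (`bw` in older versions, `bw_method` and `bw_adjust` in newer ones), so check the documentation for the installed version. The names `data`, `grid`, `narrow`, and `wide` are assumptions for this example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Made-up bimodal sample: two clusters centred on 10 and 20
data = np.concatenate([rng.normal(10, 1, 200), rng.normal(20, 1, 200)])
grid = np.linspace(0, 30, 301)   # grid[150] == 15.0, the valley between clusters

# A scalar bw_method scales the kernel width relative to the data's spread
narrow = gaussian_kde(data, bw_method=0.1)(grid)   # small bandwidth: follows the data
wide = gaussian_kde(data, bw_method=1.0)(grid)     # large bandwidth: heavily smoothed

print(f"density at 15 -> narrow: {narrow[150]:.4f}, wide: {wide[150]:.4f}")
print(f"peak density  -> narrow: {narrow.max():.4f}, wide: {wide.max():.4f}")
```

The narrow bandwidth keeps the two clusters separate with almost no density between them, while the wide bandwidth fills in the valley and flattens the peaks. A violin drawn from each estimate would differ in the same way.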

In [None]:
## bandwidth
## Similar to bandwidth argument for kdeplot

plt.figure(figsize=(12, 6))

sns.violinplot(x='day', y="total_bill", data=tips);

In [None]:
plt.figure(figsize=(12, 6))

sns.violinplot(x='parental level of education', 
               y='math score', data=df);

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">Changing x & y</p>

<a id="4.5"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
# Put the continuous variable on x and the categorical variable on y for horizontal violins

sns.violinplot(x='math score', 
               y='parental level of education', 
               data=df, orient="h");

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:center; border-radius:10px 10px;">SWARMPLOT</p>

<a id="5"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
sns.swarmplot(x='total_bill', data=tips);

In [None]:
sns.swarmplot(x='total_bill', data=tips, size=7);

In [None]:
sns.swarmplot(x='total_bill', y="smoker", data=tips, size=7);

In [None]:
sns.swarmplot(x='total_bill', y="smoker", 
              hue="sex", data=tips, size=7);

In [None]:
sns.swarmplot(x='total_bill', y="smoker", 
              hue="sex", dodge=True, data=tips, size=7);

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">Optional Swarmplot Example</p>

<a id="5.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
# warning -- default size too big
sns.swarmplot(x='math score', data=df ); # default size=5
# sns.stripplot(x='math score', data=df ); # default size=5

In [None]:
plt.figure(figsize=(12, 6))
sns.swarmplot(x='math score', data=df ); # default size=5

In [None]:
sns.swarmplot(x='math score', data=df, size=3);

In [None]:
sns.swarmplot(x='math score', y='race/ethnicity', 
              data=df, size=3);

In [None]:
sns.stripplot(x='race/ethnicity', y='math score', 
              data=df, size=2);

In [None]:
plt.figure(figsize=(15, 6))

sns.swarmplot(x='race/ethnicity', y='math score', 
              data=df, hue='gender', dodge=True);

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:left; border-radius:10px 10px;">"dodge" Parameter</p>

<a id="5.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
plt.figure(figsize=(12, 6))

sns.swarmplot(x='race/ethnicity', y='math score', 
              data=df, hue='gender', dodge=True);

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:CENTER; border-radius:10px 10px;">BOXENPLOT (LETTER-VALUE PLOT)</p>

<a id="6"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Official Paper on this plot: [Official Paper](https://vita.had.co.nz/papers/letter-value-plot.html)

This style of plot was originally named a “letter value” plot because it shows a large number of quantiles that are defined as “letter values”. It is similar to a box plot in plotting a nonparametric representation of a distribution in which all features correspond to actual observations. By plotting more quantiles, it provides more information about the shape of the distribution, particularly in the tails.

In [None]:
sns.boxplot(x='math score', y='race/ethnicity', data=df);

In [None]:
sns.boxenplot(x='math score', y='race/ethnicity', data=df);

In [None]:
sns.boxenplot(x='race/ethnicity', y='math score', data=df);

In [None]:
plt.figure(figsize=(12, 6))

sns.boxenplot(x='race/ethnicity', 
              y='math score', data=df, hue='gender');

The box plot shows the median as the centerline (50th percentile), then the 25th and 75th percentile as the box boundaries. Then the IQR method is used to calculate outlier boundaries (1.5 * IQR + Q3 for the upper boundary, for example). Q3 is the 3rd quartile, or 75th percentile of the data (75% of the data is below this value). Outliers outside of the outlier whiskers are shown as distinct points.

Boxenplots (actually called letter-value plots in the original paper and in the lvplot R package) show the distribution differently and are better for bigger datasets. Classic boxplots can have too many outliers and don't show as much information about the distribution. Letter-value plots (boxenplots) start with the median (Q2, 50th percentile) as the centerline. Each successive level outward contains half of the remaining data. So the first two sections out from the centerline contain 50% of the data. After that, the next two sections contain 25% of the data. This continues until we are at the outlier level. Each level out is shaded lighter. There are 4 methods for calculating outliers (described in the paper and available in seaborn). The default is to end up with around 5-8 outliers in each tail.
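
The halving scheme described above can be approximated with plain quantiles. This is a rough sketch, not the exact depth-based definition from the paper or seaborn's implementation; `letter_values` is a hypothetical helper and `scores` is synthetic data standing in for a column such as `df['math score']`.

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(loc=66, scale=15, size=1000)  # synthetic stand-in data

def letter_values(data, depth=4):
    """Return (lower, upper) quantile pairs: fourths, eighths, sixteenths, ..."""
    pairs = []
    for k in range(2, depth + 2):
        tail = 0.5 ** k                       # 1/4, 1/8, 1/16, ...
        lower, upper = np.quantile(data, [tail, 1 - tail])
        pairs.append((lower, upper))
    return pairs

boxes = letter_values(scores)
coverage = [np.mean((scores >= lo) & (scores <= hi)) for lo, hi in boxes]

for name, (lo, hi), share in zip(["fourths", "eighths", "sixteenths", "thirty-seconds"],
                                 boxes, coverage):
    print(f"{name:>15}: [{lo:6.1f}, {hi:6.1f}] covers {share:.1%} of the data")
```

Each pair encloses the previous one, and the coverage grows from about 50% to 75% to 87.5%, matching the description of every level adding half of the remaining data.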

![Capture13.PNG](https://i.ibb.co/YfpTDg9/Capture13.png)

[SOURCE 01](https://vita.had.co.nz/papers/letter-value-plot.html) & [SOURCE 02](https://stackoverflow.com/questions/52403381/how-boxen-plot-is-different-from-box-plot)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:CENTER; border-radius:10px 10px;">LINEPLOT</p>

<a id="7"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

[Line Plot](https://seaborn.pydata.org/generated/seaborn.lineplot.html)

"By default, the plot aggregates over multiple y values at each value of x and shows an estimate of the central tendency and a confidence interval for that estimate."

So when month is on the x axis, each point aggregates that month across all years (e.g. May: all years averaged).
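
The aggregation that the quoted sentence describes amounts to a group-by on the x variable. The sketch below shows the central-tendency part only (the default mean, without the confidence band) on a made-up long-form table shaped like the flights dataset; `toy`, `per_year`, and `per_month` are names invented for this example.

```python
import pandas as pd

# Made-up long-form rows shaped like the flights dataset
toy = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950, 1951, 1951],
    "month": ["Jan", "May", "Jan", "May", "Jan", "May"],
    "passengers": [112, 121, 115, 125, 145, 172],
})

# One point per x value: the mean of all y values that share it
per_year = toy.groupby("year")["passengers"].mean()    # x="year": averages over months
per_month = toy.groupby("month")["passengers"].mean()  # x="month": averages over years

print(per_year)
print(per_month)
```

Which variable sits on the x axis therefore decides what gets averaged away: months when plotting by year, years when plotting by month.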

In [None]:
print(sns.get_dataset_names())

flights = sns.load_dataset("flights")
flights

In [None]:
sns.lineplot(data=flights.groupby(["month"]).mean(), x="month", y="passengers");

In [None]:
sns.scatterplot(data=flights.groupby(["month"]).mean(), x="month", y="passengers");

In [None]:
plt.figure(figsize=(20, 6))

sns.lineplot(x='year', y='passengers', data=flights);

In [None]:
sns.scatterplot(x='year', y='passengers', data=flights);

In [None]:
f_sum = flights.groupby(["year", "month"]).sum()

plt.figure(figsize=(20, 6))

sns.lineplot(y=f_sum.passengers , 
             x=f_sum.reset_index().index);

In [None]:
f_sum

In [None]:
f_sum1 = flights.groupby(["year", "month"]).sum().reset_index()
f_sum1

In [None]:
plt.figure(figsize=(20, 6))

sns.lineplot(x=f_sum1.year , y=f_sum1.passengers);

In [None]:
sns.scatterplot(x=f_sum1.year , y=f_sum1.passengers);

In [None]:
plt.figure(figsize=(20, 6))

sns.lineplot(y=f_sum1.passengers , x=f_sum1["month"], errorbar='ci');

In [None]:
plt.figure(figsize=(20, 6))
sns.scatterplot(y=f_sum1.passengers , x=f_sum1["month"]);

In [None]:
plt.figure(figsize=(20, 6))

sns.lineplot(x='year', y='passengers', 
             data=flights[flights.month=="May"]);

In [None]:
flights

In [None]:
flights_wide = flights.pivot(index="year", columns="month", values="passengers")  # keyword arguments are required in pandas 2.x
flights_wide

In [None]:
sns.lineplot(data=flights_wide)
plt.legend(loc=(1.04, 0));

In [None]:
sns.lineplot(data=flights, x="year", y="passengers", 
             hue="month")
plt.legend(loc=(1.04, 0));

In [None]:
flights_wide.T

In [None]:
plt.figure(figsize=(16, 6))
sns.lineplot(data=flights_wide.T)
plt.legend(loc=(1.04, 0));

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:center; border-radius:10px 10px;">THE END OF THE SEABORN SESSION 02</p>

<a id="8"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

___
