<center>
<h1>Python and Pandas - Introduction</h1>
<h2>
Charts - part 2
</h2>
Based on "Applied Plotting, Charting & Data Representation in Python" (Coursera)
</center>

## Subplots

In [None]:
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

plt.subplot?

In [None]:
plt.figure()
plt.subplot(1, 2, 1)
linear_data = np.arange(1, 9)
plt.plot(linear_data, '-o')

In [None]:
exponential_data = linear_data**2 

# 1 row, 2 columns; make the 2nd subplot the current axes and plot into it
plt.subplot(1, 2, 2)
plt.plot(exponential_data, '-o')
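The two cells above rely on pyplot's *current axes*: `plt.subplot(1, 2, k)` activates the k-th cell of a 1 x 2 grid (counted from 1, left to right, then top to bottom), and the next `plt.plot` draws into it. As a minimal sketch, here is the same figure built with the object-oriented API, where each `Axes` is addressed explicitly instead of through that hidden state:

```python
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.arange(1, 9)
exponential_data = linear_data**2

# plt.subplots returns the figure plus one Axes object per grid cell
fig, (ax_left, ax_right) = plt.subplots(1, 2)
ax_left.plot(linear_data, '-o')        # equivalent to plt.subplot(1, 2, 1) + plt.plot(...)
ax_right.plot(exponential_data, '-o')  # equivalent to plt.subplot(1, 2, 2) + plt.plot(...)
```

Holding on to `ax_left` and `ax_right` means later changes (titles, limits) can target either panel without re-activating it first.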

Pass `sharey` to give both subplots the same y-axis scale:

In [None]:
plt.figure()
ax1 = plt.subplot(1, 2, 1)
plt.plot(linear_data, '-o')

ax2 = plt.subplot(1, 2, 2, sharey=ax1)
plt.plot(exponential_data, '-x')
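Because `ax2` shares its y axis with `ax1`, both panels are autoscaled together, which is why the linear data looks almost flat next to the squared values. A small sketch that checks this by reading back the limits (it rebuilds the same data so it runs on its own):

```python
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.arange(1, 9)
exponential_data = linear_data**2

fig = plt.figure()
ax1 = plt.subplot(1, 2, 1)
ax1.plot(linear_data, '-o')
ax2 = plt.subplot(1, 2, 2, sharey=ax1)  # ax2 uses the same y axis limits as ax1
ax2.plot(exponential_data, '-x')

# shared axes report one common y range that covers both data sets
print(ax1.get_ylim(), ax2.get_ylim())
```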


### Grid of subplots

In [None]:
fig, ((ax1,ax2,ax3), (ax4,ax5,ax6), (ax7,ax8,ax9)) = plt.subplots(3, 3, sharex=True, sharey=True)

ax5.plot(linear_data, '-')
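Unpacking nine names gets verbose. `plt.subplots` also returns the grid as a 2-D NumPy array of `Axes`, indexed `[row, column]`, so the centre panel (`ax5` above) is `axes[1, 1]`. A sketch; the per-panel titles are only there to make the numbering visible:

```python
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.arange(1, 9)

fig, axes = plt.subplots(3, 3, sharex=True, sharey=True)
print(axes.shape)  # rows, then columns

center = axes[1, 1]  # row 2, column 2: the same panel as ax5
center.plot(linear_data, '-')

# axes.flat walks the panels row by row, matching the ax1..ax9 naming
for number, ax in enumerate(axes.flat, start=1):
    ax.set_title(f'ax{number}', fontsize=8)
```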

### Task 1
Prepare a chart with two subplots (bar plots) that present the `data1` and `data2` lists.
- Subplots shall be arranged vertically (the second one below the first one)
- Subplots shall share the x axis
- Add y labels ("Data 1" and "Data 2") to the subplots

In [None]:
data1 = [2, 3, 5, 1, 2, 5, 2]
data2 = [3, 4, 1, -2, -1, 2, 1]

In [None]:
# Enter your code here

## Histograms

In [None]:
plt.figure()
sample = np.random.normal(loc=0.0, scale=1.0, size=10000)
ax1 = plt.subplot(121)
plt.hist(sample)
ax2 = plt.subplot(122, sharey=ax1)
plt.hist(sample, bins = 20);
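`plt.hist` splits the range of the data into equal-width bins (10 by default) and counts the samples in each, so with 20 bins every bar covers half the width and holds fewer samples. The counting step can be reproduced with `np.histogram`, which `plt.hist` uses internally; the seeded generator is an assumption added only so the counts are reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducible counts
demo_sample = rng.normal(loc=0.0, scale=1.0, size=10000)

counts_by_bins = {}
for bins in (10, 20):
    counts, edges = np.histogram(demo_sample, bins=bins)
    counts_by_bins[bins] = counts
    # every sample lands in exactly one bin, so the counts add up to the sample size
    print(bins, 'bins: total', counts.sum(), 'tallest bar', counts.max(),
          'bin width', round(edges[1] - edges[0], 3))
```

Both bin settings cover the same range, so each of the 10 wide bins is exactly two of the 20 narrow ones; this is why the tallest bar shrinks when the bin count goes up.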

## Box plots

##### Prepare and analyze data

In [None]:
import pandas as pd
normal_sample = np.random.normal(loc=0.0, scale=1.0, size=10000)
random_sample = np.random.random(size=10000)
gamma_sample = np.random.gamma(2, size=10000)

df = pd.DataFrame({'normal': normal_sample, 
                   'random': random_sample, 
                   'gamma': gamma_sample})

df.describe()

### Task 2
Prepare a chart with 3 subplots (histograms) that show the data from the `df` DataFrame.
- Histograms shall be arranged horizontally (in one row)
- Histograms shall contain 25 bins
- Histograms shall share the y axis
- Histograms shall have titles

In [None]:
# Enter your code here

In [None]:
df.describe()
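The box in a box plot is built from the same statistics that `describe()` reports: it spans the `25%` and `75%` rows (first and third quartiles) and has a line at the `50%` row (the median). A small check, assuming the default linear interpolation in both pandas and NumPy; it draws its own seeded sample so the notebook's `df` is left untouched:

```python
import numpy as np
import pandas as pd

values = np.random.default_rng(1).normal(loc=0.0, scale=1.0, size=10000)
series = pd.Series(values, name='normal')

summary = series.describe()
q1, median, q3 = np.percentile(values, [25, 50, 75])

# describe() and np.percentile agree on the quartiles that bound the box
print(summary['25%'], q1)
print(summary['50%'], median)
print(summary['75%'], q3)
```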

##### Box Plot

In [None]:
plt.figure()
# whis=(0, 100) puts the whiskers at the 0th and 100th percentiles, i.e. the min and max
plt.boxplot(df['normal'], whis=(0, 100));

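By default, each whisker extends to the most extreme data point that lies within 1.5 * IQR (interquartile range, Q3 - Q1) of the box; points beyond the whiskers are drawn individually as outliers (fliers).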

In [None]:
plt.figure()
plt.boxplot(df['normal']);
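The default 1.5 * IQR rule can be reproduced by hand: find the quartiles, go 1.5 * IQR beyond the box, stop each whisker at the most extreme data point still inside that limit, and treat everything further out as an outlier. A NumPy sketch on a freshly drawn normal sample (the seed is an assumption for reproducibility):

```python
import numpy as np

values = np.random.default_rng(42).normal(loc=0.0, scale=1.0, size=10000)

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1                   # height of the box
low_limit = q1 - 1.5 * iqr      # furthest a whisker may reach below the box
high_limit = q3 + 1.5 * iqr     # furthest a whisker may reach above the box

# whiskers end at the most extreme data points inside the limits
whisker_low = values[values >= low_limit].min()
whisker_high = values[values <= high_limit].max()

# everything outside the whiskers is drawn as an individual outlier point
outliers = values[(values < whisker_low) | (values > whisker_high)]
print(f'IQR={iqr:.3f}, whiskers=({whisker_low:.3f}, {whisker_high:.3f}), '
      f'outliers={outliers.size}')
```

For normally distributed data this rule leaves well under 1% of the points as outliers, which matches the small number of circles in the default plot.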

Whiskers can also be set as percentiles of the data: `whis=[0.1, 99.9]` extends them to the 0.1 and 99.9 percentiles.

In [None]:
plt.figure()
plt.boxplot(df['normal'], whis=[0.1,99.9]);
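With percentile whiskers, roughly 0.1% of the points fall below the lower whisker and 0.1% above the upper one. `plt.boxplot` delegates these computations to `matplotlib.cbook.boxplot_stats`, which can be called directly to inspect them; a sketch on a new seeded sample:

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

values = np.random.default_rng(7).normal(loc=0.0, scale=1.0, size=10000)

# one dict of statistics per data set; whiskers follow the 0.1 and 99.9 percentiles
stats = boxplot_stats(values, whis=[0.1, 99.9])[0]
low, high = np.percentile(values, [0.1, 99.9])

print(f"whiskers: {stats['whislo']:.3f} .. {stats['whishi']:.3f}")
print(f"0.1 / 99.9 percentiles: {low:.3f} .. {high:.3f}")
print(f"outliers: {len(stats['fliers'])} of {values.size}")
```

The whisker ends are actual data points (the most extreme ones inside the percentile limits), so they sit just inside `low` and `high` rather than exactly on them.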

Box plots are often used to compare several distributions side by side.

In [None]:
plt.figure()
# whiskers span the full range (min to max) of each distribution
plt.boxplot([df['normal'], df['random'], df['gamma']], whis=(0, 100));

### Task 3
Add a title and x tick labels to the chart above. Use the column names of the `df` DataFrame as x tick labels.

In [None]:
# Enter your code here

## Pandas

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

In [None]:
plt.style.available

In [None]:
# use the colorblind-friendly seaborn style (named 'seaborn-colorblind' before matplotlib 3.6)
plt.style.use('seaborn-v0_8-colorblind')

### DataFrame.plot

In [None]:
np.random.seed(123)

df = pd.DataFrame({'A': np.random.randn(365).cumsum(0), 
                   'B': np.random.randn(365).cumsum(0) + 20,
                   'C': np.random.randn(365).cumsum(0) - 20}, 
                  index=pd.date_range('1/1/2019', periods=365))
df.head()     

In [None]:
df.plot(); 

You can choose the type of plot with the `kind` parameter.

In [None]:
df.plot(x='A', y='B', kind='scatter');

#### Types of plots
- Pandas offers different types of plots.
- Instead of the `kind` argument, you can call the plot type as a method of `DataFrame.plot`, e.g. `df.plot.scatter(...)` instead of `df.plot(kind='scatter')`.

##### List of available types:
- `'line'` : line plot (default)
- `'bar'` : vertical bar plot
- `'barh'` : horizontal bar plot
- `'hist'` : histogram
- `'box'` : boxplot
- `'kde'` : Kernel Density Estimation plot
- `'density'` : same as 'kde'
- `'area'` : area plot
- `'pie'` : pie plot
- `'scatter'` : scatter plot
- `'hexbin'` : hexbin plot
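Each type in the list works both as a `kind=` value and as a method of `df.plot`, and both forms return the matplotlib `Axes` they drew on. A sketch with a small made-up frame whose columns `A` and `B` stand in for the ones above:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

demo = pd.DataFrame({'A': np.arange(10), 'B': np.arange(10) ** 2})

ax_kind = demo.plot(x='A', y='B', kind='scatter')  # type passed as an argument
ax_method = demo.plot.scatter(x='A', y='B')        # same plot via the accessor method

# each call draws one scatter collection holding the 10 (A, B) points
print(type(ax_kind).__name__, len(ax_kind.collections[0].get_offsets()))
print(type(ax_method).__name__, len(ax_method.collections[0].get_offsets()))
```

The method form has the advantage that each plot type's specific parameters show up in its own docstring (for example `df.plot.scatter?`).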

Create a scatter plot from columns `A` (x axis) and `C` (y axis) that sets the color and size of each dot from column `B`.

In [None]:
df.plot.scatter('A', 'C', c='B', s=df['B'], colormap='viridis')

`DataFrame.plot` draws with matplotlib and returns the matplotlib `Axes` it drew on, so you can modify `DataFrame.plot` charts in the same way as `pyplot` charts.

In [None]:
ax = df.plot.scatter('A', 'C', c='B', s=df['B'], colormap='viridis')
ax.set_aspect('equal')

### More `df.plot` examples 

In [None]:
df.plot.box();

In [None]:
df.plot.hist(alpha=0.7);

[Kernel density estimation plots](https://en.wikipedia.org/wiki/Kernel_density_estimation) are useful for deriving a smooth continuous function from a given sample.

In [None]:
df.plot.kde();
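A Gaussian KDE places a normal "bump" of width h (the bandwidth) on every sample and averages them: f(x) = 1/(n*h) * sum_i K((x - x_i) / h), where K is the standard normal density. `df.plot.kde()` delegates this to SciPy's `gaussian_kde`, which picks h with Scott's rule by default. The NumPy-only sketch below uses Scott's rule for one dimension (sample standard deviation times n^(-1/5)); it illustrates the idea and is not pandas' own code:

```python
import numpy as np

def gaussian_kde_1d(samples, x):
    """Evaluate a Gaussian kernel density estimate of `samples` at the points `x`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)          # Scott's rule bandwidth in 1-D
    z = (x[:, None] - samples[None, :]) / h          # scaled distance of each x to each sample
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)  # standard normal density
    return kernel.sum(axis=1) / (n * h)              # average of the bumps, rescaled by 1/h

rng = np.random.default_rng(0)
walk = rng.normal(size=365).cumsum()  # a random walk similar to column 'A' above
spread = walk.std()
x_grid = np.linspace(walk.min() - 3 * spread, walk.max() + 3 * spread, 500)
density = gaussian_kde_1d(walk, x_grid)

# a density should integrate to about 1 over a wide enough grid (Riemann sum)
area = density.sum() * (x_grid[1] - x_grid[0])
print(round(area, 3))
```

A larger h gives a smoother but blurrier curve; a smaller h follows individual samples more closely.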

### pandas.plotting

[Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set)

In [None]:
iris = pd.read_csv('iris.csv')
iris.head()

In [None]:
pd.plotting.scatter_matrix(iris);

### Task 4
Prepare a pie plot that displays how many irises are in each group (setosa, versicolor, virginica). 

**Note:** Before you plot the chart, you may want to prepare (group and aggregate) the data.

In [None]:
# Enter your code here

# Seaborn

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib notebook

In [None]:
np.random.seed(1234)

v1 = pd.Series(np.random.normal(0,10,1000), name='v1')
v2 = pd.Series(2*v1 + np.random.normal(60,15,1000), name='v2')

In [None]:
plt.figure()
plt.hist(v1, alpha=0.7, bins=np.arange(-50,150,5), label='v1')
plt.hist(v2, alpha=0.7, bins=np.arange(-50,150,5), label='v2')
plt.legend();

## Joint plots

In [None]:
sns.jointplot(x=v1, y=v2, alpha=0.4);

`sns.jointplot` returns a seaborn `JointGrid` whose panels are ordinary matplotlib `Axes`, so you can adjust `seaborn` charts with `matplotlib` code.

In [None]:
grid = sns.jointplot(x=v1, y=v2, alpha=0.4);
grid.ax_joint.set_aspect('equal')

#### Hexbin plot

In [None]:
sns.jointplot(x=v1, y=v2, kind='hex');
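A hexbin plot tiles the plane with hexagons and colors each one by the number of points inside it, which keeps dense regions readable where a scatter plot would overplot. The `kind='hex'` joint plot draws its central panel with matplotlib's `hexbin`; a sketch calling it directly, where `gridsize` (the number of hexagons across the x range) and the colormap are illustrative choices:

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1234)
v1 = np.random.normal(0, 10, 1000)
v2 = 2*v1 + np.random.normal(60, 15, 1000)

fig, ax = plt.subplots()
hb = ax.hexbin(v1, v2, gridsize=25, cmap='Blues')  # 25 hexagons across the x range
fig.colorbar(hb, ax=ax, label='points per hexagon')

counts = hb.get_array()  # number of points that fell into each hexagon
print(counts.sum(), counts.max())
```

Every point lands in exactly one hexagon, so the counts add up to the number of points.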

In [None]:
sns.jointplot(x=v1, y=v2, kind='kde', space=0);

In [None]:
iris = pd.read_csv('iris.csv')
iris.head()

## Pairplot

In [None]:
sns.pairplot(iris, hue='Name', diag_kind='kde', height=2);

## Violin plot

In [None]:
plt.figure(figsize=(8,6))
plt.subplot(121)
sns.swarmplot(x='Name', y='PetalLength', data=iris);
plt.subplot(122)
sns.violinplot(x='Name', y='PetalLength', data=iris);

In [None]:
f = plt.figure()
sns.violinplot(y='PetalLength', data=iris);

### Task 5
Prepare a jointplot (`kde` type) that shows the relation between the length and width of petals for the entire iris dataset.

In [None]:
# Enter your code here