# Plotting

* Choose your plot type based on the question you are answering and the data type(s) you are working with
* Use pandas one-liners to iterate through plots quickly
* Try modifying the plot defaults
* Creating plots involves decision-making

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

import matplotlib.pyplot as plt
%matplotlib inline

## Data visualization

In [None]:
df = pd.read_csv('../data/ted.csv')

In [None]:
df.head(3)

In [None]:
df.dtypes

In [None]:
dir(df.dtypes)

In [None]:
# get_dtype_counts() was removed in pandas 1.0; value_counts() gives the same summary
df.dtypes.value_counts()
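The dtype summary is a useful first step when choosing a plot type, since numeric columns suit scatter plots and histograms while text columns usually need to be counted first. The sketch below splits columns by dtype with `select_dtypes`. The `toy` frame and its values are made up for illustration and only stand in for the TED data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the TED data (values are invented).
toy = pd.DataFrame({
    'comments': [10, 250, 31],
    'views': [1000, 52000, 4100],
    'title': ['a', 'b', 'c'],
})

# Numeric columns are candidates for scatter plots and histograms.
numeric_cols = toy.select_dtypes(include='number').columns.tolist()

# Everything else (here: text) usually needs counting or grouping before plotting.
other_cols = toy.select_dtypes(exclude='number').columns.tolist()

# One row per dtype, with the number of columns of that dtype.
dtype_counts = toy.dtypes.value_counts()
print(numeric_cols, other_cols)
print(dtype_counts)
```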

### Scatter Plot

In [None]:
# basic scatter plot
df.plot.scatter(x='comments', y='views');

To plot several column pairs on a single axes, call the plot method again and pass the existing axes as `ax`.
<br>
Give each call its own `color` and `label` so the groups can be told apart.

In [None]:
ax = df.plot.scatter(x='comments', y='views', color='DarkBlue', label='Group 1');

df.plot.scatter(x='languages', y='num_speaker', color='DarkGreen', label='Group 2', ax=ax);
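As a self-contained sketch of the same overlay pattern, the cell below uses random data with hypothetical column names `a` to `d`. Because the second call reuses the first axes, its column names overwrite the axis labels, so the sketch sets neutral labels explicitly afterwards.

```python
import numpy as np
import pandas as pd

# Random toy data; the column names a-d are placeholders, not TED columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((50, 4)), columns=['a', 'b', 'c', 'd'])

# The first call creates the axes, the second draws onto it via ax=ax.
ax = toy.plot.scatter(x='a', y='b', color='DarkBlue', label='a vs b')
toy.plot.scatter(x='c', y='d', color='DarkGreen', label='c vs d', ax=ax)

# The shared axes now holds both groups, so use labels that fit both.
ax.set_xlabel('x value')
ax.set_ylabel('y value')
```

Overlaying only makes sense when both groups share comparable scales; otherwise one group gets squashed against an axis.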

In [None]:
# The keyword c may be given as the name of a column to provide colors for each point:

df.plot.scatter(x='comments', y='views', c='languages', s=50);
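When `c` names a numeric column, the values are mapped through a colormap and pandas adds a colorbar by default. A minimal sketch on invented data, choosing the colormap explicitly through matplotlib's `cmap` keyword:

```python
import pandas as pd

# Invented values that only mimic the shape of the TED columns.
toy = pd.DataFrame({
    'comments': [10, 250, 31, 90],
    'views': [1000, 52000, 4100, 8000],
    'languages': [5, 40, 12, 26],
})

# Color each point by its 'languages' value using the viridis colormap.
ax = toy.plot.scatter(x='comments', y='views', c='languages',
                      cmap='viridis', s=50)
```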

You can pass other keywords supported by matplotlib scatter.
<br>
The example below shows a bubble chart using a column of the DataFrame as the bubble size.

In [None]:
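# take the sizes from the same 500 rows that are plotted, so s lines up with x and y
subset = df.head(500)
subset.plot.scatter(x='comments', y='views', s=subset['languages'] * 100);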
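A bubble chart needs exactly one size per plotted row, so the sizes should be computed from the same subset that supplies `x` and `y`. The sketch below shows this on invented data; the scale factor of 10 and the `alpha` value are arbitrary choices for readability.

```python
import pandas as pd

# Invented values standing in for the TED columns.
toy = pd.DataFrame({
    'comments': [10, 250, 31, 90],
    'views': [1000, 52000, 4100, 8000],
    'languages': [5, 40, 12, 26],
})

# Slice first, then derive the sizes from the slice so the lengths match.
subset = toy.head(3)
sizes = subset['languages'] * 10

# alpha makes overlapping bubbles easier to see.
ax = subset.plot.scatter(x='comments', y='views', s=sizes, alpha=0.5)
```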

### Histogram

In [None]:
# histogram shows the frequency distribution of a single numeric variable
df.comments.plot(kind='hist')

In [None]:
# modify the plot to be more informative
df[df.comments < 1000].comments.plot(kind='hist')

In [None]:
# check how many observations we removed from the plot
df[df.comments >= 1000].shape

In [None]:
# can also write this using the query method
df.query('comments < 1000').comments.plot(kind='hist')

In [None]:
# can also write this using the loc accessor
df.loc[df.comments < 1000, 'comments'].plot(kind='hist')

In [None]:
# increase the number of bins to see more detail
df.loc[df.comments < 1000, 'comments'].plot(kind='hist', bins=20)
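The filter-then-plot steps above can be sketched on synthetic data. The lognormal sample below is an assumption chosen only because it has a long right tail like a comment count often does; it is not the TED data.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed stand-in for a 'comments' column.
rng = np.random.default_rng(0)
comments = pd.Series(
    rng.lognormal(mean=4.5, sigma=1.0, size=1000).astype(int),
    name='comments',
)

# Trim the long tail and record how many observations were dropped.
trimmed = comments[comments < 1000]
n_removed = len(comments) - len(trimmed)

# More bins reveal more detail in the bulk of the distribution.
ax = trimmed.plot(kind='hist', bins=20)
ax.set_xlabel('comments')
print(f'removed {n_removed} of {len(comments)} observations')
```

Reporting the number of removed rows alongside the plot keeps the trimming honest.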

### Pie Plot

In [None]:
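# each wedge is one talk's share of the total comments across the first 40 talks
# (subplots is only needed for DataFrames, so it is omitted for this Series)
comments_sample = df['comments'].head(40)
comments_sample.plot.pie(figsize=(7, 7));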
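Pie charts tend to read best when they show the share of a few categories rather than many raw values. The sketch below counts a categorical column with `value_counts` before plotting. The `event` series and its values are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical categorical column; the labels are invented for illustration.
events = pd.Series(
    ['TED2010', 'TEDx', 'TED2010', 'TEDGlobal', 'TEDx', 'TED2010'],
    name='event',
)

# Count each category, then draw one wedge per category.
counts = events.value_counts()
ax = counts.plot.pie(figsize=(5, 5), autopct='%1.0f%%')

# Drop the default y-label, which only repeats the series name.
ax.set_ylabel('')
```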