# Plotting data from pandas dataframes

Lesson website: https://swcarpentry.github.io/python-novice-gapminder/09-plotting.html

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

#### pandas plot()
Plot data directly from a pandas dataframe.

Before plotting, we convert the column headings from strings to integers, since they represent numerical values (years). We use `str.replace()` to remove the `gdpPercap_` prefix and then `astype(int)` to convert the resulting strings (`['1952', '1957', ..., '2007']`) to integers (`[1952, 1957, ..., 2007]`).
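
To see the renaming step in isolation, here is a minimal sketch on a tiny made-up dataframe. The name `df_demo`, the country labels, and the numbers are all invented for illustration; only the `gdpPercap_(year)` column pattern comes from the lesson data.

```python
import pandas as pd

# Made-up values for illustration only (not gapminder data)
df_demo = pd.DataFrame(
    {'gdpPercap_1952': [100.0, 200.0], 'gdpPercap_1957': [110.0, 210.0]},
    index=['Country A', 'Country B'],
)

# Remove the prefix from every column name, leaving strings such as '1952'
years = df_demo.columns.str.replace('gdpPercap_', '')

# Convert the strings to integers and store them as the new column names
df_demo.columns = years.astype(int)

print(df_demo.columns.tolist())
```

With integer column names, the x-axis of a plot can be treated as numeric years rather than as text labels.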


In [None]:
df_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')

# Extract the year from each column name.
# The current column names are structured as 'gdpPercap_(year)',
# so we keep only the (year) part for clarity when plotting GDP vs. years.
# To do this we use replace() to swap the 'gdpPercap_' prefix for an empty string.
# The column names are held in a pandas Index, so we use its vectorized .str string methods.

years = df_oceania.columns.str.replace('gdpPercap_', '')

# Convert the year values to integers and save them back as the dataframe's column names

df_oceania.columns = years.astype(int)

df_oceania.loc['Australia'].plot()

`.plot()` is a wrapper around matplotlib, a visualization library for Python. You can pass the `kind` argument to change the type of plot displayed.

For example, `df.plot(kind='bar')`. Note that the default kind is 'line'.

|string value|plot type|
|------------|---------|
|'line' | line plot (default)|
|'bar' | vertical bar plot|
|'barh' | horizontal bar plot|
|'hist' | histogram|
|'box' | boxplot|
|'kde' | Kernel Density Estimation plot|
|'density' | same as kde|
|'area' |area plot|
|'pie' | pie plot|
|'scatter' | scatter plot (DataFrame only)|
|'hexbin' | hexbin plot (DataFrame only)|
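
As a rough sketch of how `kind` changes the result, the cell below draws the same small table three ways. The dataframe `df_demo` and its values are made up for illustration; note that `'scatter'` also needs the `x` and `y` column names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only (not gapminder data)
df_demo = pd.DataFrame(
    {'A': [1.0, 2.0, 3.0], 'B': [2.0, 1.5, 4.0]},
    index=[2000, 2005, 2010],
)

ax_line = df_demo.plot()                                  # kind='line' is the default
ax_bar = df_demo.plot(kind='bar')                         # one group of bars per index value
ax_scatter = df_demo.plot(kind='scatter', x='A', y='B')   # column A against column B
```

Each call returns a matplotlib `Axes` object, which can be kept in a variable for further styling.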

You should make sure that your data is oriented the right way. For example, in a line chart each column becomes one line. If your plot looks wrong, try transposing your dataframe with `df.T`.

In [None]:
df_oceania.plot()

In [None]:
df_oceania.T.plot()
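
The effect of transposing can be checked on a small made-up table. In this sketch, `df_demo` and its values are invented for illustration: with countries as rows, each year becomes a line; after `.T`, each country becomes a line.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only: 2 countries (rows) x 3 years (columns)
df_demo = pd.DataFrame(
    {1952: [10.0, 20.0], 1957: [12.0, 24.0], 1962: [15.0, 30.0]},
    index=['Country A', 'Country B'],
)

ax_wide = df_demo.plot()      # one line per column, i.e. one line per year
ax_long = df_demo.T.plot()    # one line per column of the transpose, i.e. per country

print(len(ax_wide.get_lines()), len(ax_long.get_lines()))
```

The second form, with years along the x-axis, is usually the one you want for a trend over time.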

### Style your plot!

You can use matplotlib to style your plot! Here are some examples:

In [None]:
df_oceania.T.plot()
plt.ylabel('GDP per capita')

In [None]:
plt.style.use('ggplot')
df_oceania.T.plot(kind='bar')
plt.ylabel('GDP per capita')
plt.xlabel('Year')
plt.legend(loc='upper left')

'ggplot' is one of many styles you can use. Here is a list with examples: https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html
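
You can also list the styles installed with your copy of matplotlib. The sketch below additionally uses `plt.style.context()`, which applies a style only inside the `with` block instead of changing it for the whole notebook; the plotted numbers are made up.

```python
import matplotlib.pyplot as plt

# Names of all styles available in this matplotlib installation
styles = plt.style.available
print(styles)

# Apply 'ggplot' temporarily, only to the figure created inside this block
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([1952, 1957, 1962], [1.0, 2.0, 3.0])  # made-up values
    ax.set_xlabel('Year')
```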

### Save your plot

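You can use `plt.savefig('filename.png')` to save your plot to an image file. Alternatively, as in the example below, first use the `plt.gcf()` (get current figure) function to get a reference to the figure, then call its `savefig()` method. Do this in the same cell that creates the plot. For example: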

In [None]:
df_oceania.plot(kind='bar')
fig = plt.gcf()
fig.savefig('my_figure.png')
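
Another option, sketched here with made-up data, is to take the figure from the `Axes` object that `.plot()` returns. The temporary output folder, the file name `demo_figure.png`, and the `dpi` and `bbox_inches` settings are illustrative choices, not requirements.

```python
import os
import tempfile

import pandas as pd
import matplotlib.pyplot as plt

# Made-up values for illustration only (not gapminder data)
df_demo = pd.DataFrame(
    {'A': [1.0, 2.0, 3.0], 'B': [2.0, 1.5, 4.0]},
    index=[2000, 2005, 2010],
)

ax = df_demo.plot(kind='bar')
fig = ax.get_figure()          # the figure that contains this plot

# Save into a temporary folder so the sketch does not overwrite your files
out_path = os.path.join(tempfile.mkdtemp(), 'demo_figure.png')
fig.savefig(out_path, dpi=150, bbox_inches='tight')  # 'tight' trims extra whitespace
print(out_path)
```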

## Practice Time!
1. Import new region-specific gapminder data (e.g. Europe) and make a plot showing GDP for that region over time. Start by replicating the plots we just made for Oceania, and then, if you're up for it, try a new `kind` of plot!

2. Import the full gapminder dataset (with population, life expectancy, and GDP), then plot one of the available variables for a specific region by creating a new dataframe with the filtered rows and columns.

3. Go crazy! Get creative! Make a totally new plot using the gapminder data, and play with style and formatting options. You can refer to [COB branding](https://www.boston.gov/departments/innovation-and-technology/brand-guidelines) and the [pandas.DataFrame.plot() documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html).