# Plotting

## Overview
**Teaching:** 15 min

**Exercises:** 15 min

## Questions
- How can I plot my data?

- How can I save my plot for publishing?

## Objectives
- Create a time series plot showing a single data set.

- Create a scatter plot showing the relationship between two data sets.

## Plot data directly from a [Pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

- We can also plot [Pandas dataframes](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).
- This implicitly uses [matplotlib.pyplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot).
- Before plotting, we convert the column headings from a `string` to an `integer` data type, since they represent numerical values. We use `str.replace()` to remove the `gdpPercap_` prefix and then `astype(int)` to convert the series of string values (`['1952', '1957', ..., '2007']`) to a series of integers (`[1952, 1957, ..., 2007]`).

In [None]:
import pandas as pd

data = pd.read_csv("../../../data/gapminder_gdp_oceania.csv", index_col="country")

# Extract the year from each column name.
# The current column names are structured as 'gdpPercap_(year)',
# so we keep only the (year) part for clarity when plotting GDP vs. years.
# To do this we use replace(), which substitutes the 'gdpPercap_' prefix with an empty string.
# replace() works on strings, so we call it through the vectorized .str accessor,
# which applies it to every column name at once.

years = data.columns.str.replace("gdpPercap_", "")

# Convert year values to integers, saving results back to dataframe

data.columns = years.astype(int)

data.loc["Australia"].plot()

## Select and transform data, then plot it.

- By default, [`DataFrame.plot`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html#pandas.DataFrame.plot) plots with the rows as the X axis.
- We can transpose the data in order to plot multiple series.

In [None]:
import matplotlib.pyplot as plt

data.T.plot()
plt.ylabel("GDP per capita")
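
To see why transposing helps, the sketch below applies the same steps to a tiny table. The numbers and the names `Country A` and `Country B` are made up for illustration and are not taken from the Gapminder file. The `Agg` backend is an assumption that lets the sketch run without a display; in a notebook you would normally leave it out.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, made up for illustration only (not the Gapminder data)
toy = pd.DataFrame(
    {1952: [10.0, 11.0], 1957: [12.0, 13.0], 1962: [14.0, 15.0]},
    index=pd.Index(["Country A", "Country B"], name="country"),
)

# Before transposing: one row per country, one column per year
print(toy.shape)         # (2, 3)

# After transposing: one row per year, one column per country
transposed = toy.T
print(transposed.shape)  # (3, 2)

# DataFrame.plot() draws one line per column and puts the index on the x axis,
# so the transposed table gives one line per country plotted against year
ax = transposed.plot()
ax.set_ylabel("GDP per capita")
```

`DataFrame.plot()` returns a matplotlib `Axes` object, so labels can be set on `ax` directly instead of through `plt`.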

### Scatter plot

- Create a scatter plot relating the GDP per capita of Australia to that of New Zealand.

In [None]:
data.T.plot.scatter(x="Australia", y="New Zealand")
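
One possible way to save a plot like this for publishing is matplotlib's `Figure.savefig`. The following is a minimal sketch under stated assumptions, not a prescribed workflow: the data values, the file name `scatter.png`, the temporary output directory, and the `dpi=300` setting are all hypothetical choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, made up for illustration only (not the Gapminder data)
toy = pd.DataFrame(
    {"Country A": [10.0, 12.0, 14.0], "Country B": [11.0, 13.0, 15.0]},
    index=[1952, 1957, 1962],
)

# plot.scatter() returns the matplotlib Axes it drew on
ax = toy.plot.scatter(x="Country A", y="Country B")

# Every Axes belongs to a Figure; the Figure is what gets written to disk
fig = ax.figure

# Hypothetical output location; the file extension selects the image format
out_path = os.path.join(tempfile.mkdtemp(), "scatter.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")

# Release the figure's memory once it has been saved
plt.close(fig)
```

Holding on to the `Figure` object and calling `savefig` on it makes it explicit which plot is written, which matters once a notebook contains several figures.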

## Exercises

See `../exercises/05-plotting_exercises.ipynb`.

## Key Points

- You can plot data directly from a Pandas dataframe.

- Select and transform data, then plot it.

- To create a greater variety of higher-quality plots, and to be able to customize them, you will want to use a Python package created specifically for visualization,

    + e.g. [`matplotlib`](https://matplotlib.org/), seaborn, bokeh, or plotly.



Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/09-plotting/index.html) 2018–2023 by [The Carpentries](https://carpentries.org/)

Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/09-plotting/index.html) 2016–2018 by [Software Carpentry Foundation](https://software-carpentry.org/)