# Data Analysis and Visualization in Python
## Data Ingest & Visualization - Matplotlib & Pandas
Questions
* What other tools can I use to create plots apart from ggplot?
* Why should I use Python to create plots?

Objectives
* Import the pyplot toolbox to create figures in Python.
* Use matplotlib to make adjustments to Pandas plots.

## Obtain data

In [None]:
import pandas as pd

# Load the data into a DataFrame
surveys_df = pd.read_csv('../data/surveys.csv')
species_df = pd.read_csv("../data/species.csv")

## Clean up your data and open it using Python and Pandas

In [None]:
# Common delimiters are ',' for comma, ' ' for space, and '\t' for tab
pd.read_csv?
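
As a minimal sketch of the delimiter options mentioned above, the cell below reads tab-separated text with the `sep` argument of `pd.read_csv`. The text in `tsv_text` is hypothetical and stands in for a file on disk; only the column names are modelled on the surveys data.

```python
import io
import pandas as pd

# Hypothetical tab-separated text standing in for a file on disk
tsv_text = "record_id\tspecies_id\tweight\n1\tDM\t40\n2\tPF\t35\n"

# sep tells read_csv which character separates the columns
demo_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(demo_df)
```

Passing `sep=' '` or leaving the default `sep=','` would handle space- and comma-separated files in the same way.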

In [None]:
df = pd.DataFrame({'1stcolumn':[100,200], '2ndcolumn':[10,20]}) # this just creates a DataFrame for the example!
print('With the old column names:\n') # the \n makes a new line, so it's easier to see
print(df)

In [None]:
df.columns = ['FirstColumn','SecondColumn'] # rename the columns!
print('\n\nWith the new column names:\n')
print(df)
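
Assigning to `df.columns` replaces every column name at once. When only some columns need new names, `DataFrame.rename` is an alternative; the sketch below rebuilds the example DataFrame and renames a single column, leaving the original untouched.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'1stcolumn': [100, 200], '2ndcolumn': [10, 20]})

# rename returns a new DataFrame; columns not listed in the mapping keep their names
renamed_df = df.rename(columns={'1stcolumn': 'FirstColumn'})
print(renamed_df)
```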

## Matplotlib package
A great resource for help with styling your figures is the matplotlib gallery (http://matplotlib.org/gallery.html), which shows plots in many different styles together with the source code that creates them. The simplest plot is the two-dimensional line plot.

### Using the `pyplot` toolbox

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
my_plot = surveys_df.plot("hindfoot_length", "weight", kind="scatter")
plt.show() # not necessary, but looks better in Jupyter Notebooks

### `plt` pyplot versus object-based matplotlib

In [None]:
import numpy as np
sample_data = np.random.normal(0, 0.1, 1000)

In [None]:
plt.hist(sample_data)
plt.show() # not necessary, but looks better in Jupyter Notebooks

In [None]:
fig, ax = plt.subplots()  # create an empty matplotlib Figure and Axes object
ax.hist(sample_data, 30)
plt.show() # not necessary, but looks better in Jupyter Notebooks
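
One practical benefit of the object-based interface is that each Axes can be addressed by name. The sketch below, which assumes a seeded NumPy generator instead of `np.random.normal` so the draw is repeatable, places two histograms of the same data side by side with different bin counts.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: a seeded generator is used here only to make the example repeatable
rng = np.random.default_rng(42)
sample_data = rng.normal(0, 0.1, 1000)

# One figure holding two Axes objects in a single row
fig, (ax_coarse, ax_fine) = plt.subplots(1, 2, figsize=(8, 3))

ax_coarse.hist(sample_data, bins=10)
ax_coarse.set_title("10 bins")

ax_fine.hist(sample_data, bins=30)
ax_fine.set_title("30 bins")
```

With the `plt` interface alone, the same layout would require keeping track of which subplot is currently active.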

In [None]:
fig, ax1 = plt.subplots() # prepare a matplotlib figure
ax1.hist(sample_data, 30)

# Add a plot of a Beta distribution
a = 5
b = 10
beta_draws = np.random.beta(a, b, 1000)  # draw 1000 samples; without a size, a single number is returned
# adapt the labels
ax1.set_ylabel('density')
ax1.set_xlabel('value')

# add additional axes to the figure
ax2 = fig.add_axes([0.125, 0.575, 0.3, 0.3])
#ax2 = fig.add_axes([left, bottom, width, height])  # fractions of the figure size
ax2.hist(beta_draws)

plt.show() # not necessary, but looks better in Jupyter Notebooks
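
The list passed to `fig.add_axes` describes a rectangle as left, bottom, width, and height, each given as a fraction of the figure size. As a small check, the sketch below creates the same inset Axes and reads its position back.

```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()

# [left, bottom, width, height], each as a fraction of the figure size
ax2 = fig.add_axes([0.125, 0.575, 0.3, 0.3])

# bounds returns the rectangle in the same order it was specified
left, bottom, width, height = ax2.get_position().bounds
print(left, bottom, width, height)

# The right and top edges follow from the rectangle
print(left + width, bottom + height)
```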

### Link matplotlib and Pandas

In [None]:
fig, ax1 = plt.subplots() # prepare a matplotlib figure

surveys_df.plot("hindfoot_length", "weight", kind="scatter", ax=ax1)

# Provide further adaptations with matplotlib:
ax1.set_xlabel("Hindfoot length")
ax1.tick_params(labelsize=16, pad=8)
fig.suptitle('Scatter plot of weight versus hindfoot length', fontsize=15)

plt.show() # not necessary, but looks better in Jupyter Notebooks
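
The link works in both directions: the Pandas `plot` method draws onto the Axes passed with `ax=` and also returns an Axes object. The sketch below uses a small hypothetical DataFrame (the values are invented, only the column names match the surveys data) to show that the returned object is the same Axes, so either name can be used for further matplotlib adjustments.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical values; only the column names mirror surveys_df
demo_df = pd.DataFrame({"hindfoot_length": [32, 33, 37, 36],
                        "weight": [40, 48, 51, 46]})

fig, ax = plt.subplots()
returned_ax = demo_df.plot("hindfoot_length", "weight", kind="scatter", ax=ax)

# Adjustments made through the returned object apply to the same plot
returned_ax.set_ylabel("Weight")
print(returned_ax is ax)
```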

### Saving matplotlib figures

In [None]:
fig.savefig("my_plot_name.png")

### Exercise - Saving figures
Save the figure in `pdf` format at 300 dpi.

In [None]:
fig.savefig("my_plot_name.pdf", dpi=300)
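
For raster formats such as PNG, `dpi` together with the figure size in inches determines the pixel dimensions of the saved image. The sketch below illustrates this under a few assumptions: it uses a hypothetical 4 by 3 inch figure, saves into an in-memory buffer instead of a file, and reads the width and height from the PNG header.

```python
import io
import struct
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))  # figure size in inches
ax.plot([0, 1, 2], [0, 1, 4])

# Save into an in-memory buffer instead of a file, purely for illustration
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300)

# A PNG stores its width and height as two big-endian integers at bytes 16-23
width_px, height_px = struct.unpack(">II", buffer.getvalue()[16:24])
print(width_px, height_px)  # inches multiplied by dpi
```

PDF is a vector format, so there the `dpi` setting should mainly matter for any rasterized elements in the figure.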

## Make other types of plots
http://matplotlib.org/users/screenshots.html
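
As one example of another plot type, the sketch below draws a bar chart of counts per category using the same Pandas-plus-matplotlib pattern. The species codes in `demo_df` are hypothetical stand-ins, not values taken from the surveys data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical species codes, one entry per observation
demo_df = pd.DataFrame({"species_id": ["AA", "BB", "AA", "CC", "AA", "BB"]})

# Count the observations per species, then plot the counts as bars
counts = demo_df["species_id"].value_counts()

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Species")
ax.set_ylabel("Number of observations")
```

Changing `kind` to another value supported by Pandas, such as `"line"` or `"box"`, switches the plot type while the matplotlib adjustments stay the same.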

### Homework
Use line plots to visualize the surveys data