# Tutorial 4 - Basic Charting in Pandas
The goal of this tutorial is to get you comfortable creating basic charts such as bar and line charts using pandas dataframes. Visualization is a very helpful tool for exploring and for communicating data. Things we'll cover:
- Bar charts, histograms, line charts
- Axis labels
- Legends

We won't cover these here, but you should be aware that there are many other visualization libraries that can be used in Jupyter notebooks, including:
- [Bokeh](http://bokeh.pydata.org/en/latest/)
- [Seaborn](https://github.com/mwaskom/seaborn)
- [Plotly](https://plot.ly/python/)

In [None]:
import pandas as pd

Let's load in the 911 calls dataset that we used in Problem Set 1. Here's the [dataset](https://github.com/comp-journalism/UMD-J479V-J779V-Spring2017/blob/master/Data/911_Calls_Baltimore_11-2016.csv?raw=true) in case you need to download it again. 

In [None]:
calls_df = pd.read_csv("Data/911_Calls_Baltimore_11-2016.csv", parse_dates=["callDateTime"])

### Charting Examples
Pandas has built-in plotting capabilities. They are built on top of a library called Matplotlib, which is ultimately more flexible. Lots more examples can be found [here](http://pandas.pydata.org/pandas-docs/stable/visualization.html). We'll examine how to do some basic analysis and charting, but first we need to import some things and set some style parameters.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# This magic function allows you to see the charts directly within the notebook. 
%matplotlib inline

# This command will make the plots more attractive by adopting the common style of a different library called ggplot
matplotlib.style.use("ggplot")

In [None]:
calls_df.columns

Let's make a bar chart showing the aggregate number of calls of each type of priority. 

In [None]:
# Here's how we get the counts for each type of priority
calls_df.priority.value_counts()
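
To see what `value_counts()` hands to the plotting call, here is a minimal sketch on a made-up Series (the values below are hypothetical and are not taken from the 911 dataset). It returns a new Series indexed by the distinct values and sorted from most to least frequent, which is why the bars in the charts that follow appear in that order.

```python
import pandas as pd

# Hypothetical toy data standing in for the priority column
priorities = pd.Series(["Low", "High", "Medium", "Low", "Low", "High"])

# Count how often each distinct value occurs, most frequent first
counts = priorities.value_counts()
print(counts)
```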

In [None]:
# Create a bar chart plot
calls_df.priority.value_counts().plot.bar()

Hmmm... It's hard to read those labels vertically, so let's make a horizontal bar chart instead. 

In [None]:
calls_df.priority.value_counts().plot.barh()

We can change the color of the bars if we want, but we need to create the chart a bit differently so we can give it parameters. Note that in the next cell we're passing parameters to the `plot()` function, telling it the kind of plot and the color to use. Here's [the documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html) for all of the parameters that `plot()` can accept.

In [None]:
calls_df.priority.value_counts().plot(kind="barh", color="purple")

You can make the chart bigger with the `figsize` parameter (width and height in inches). You can also set a title and change the font size to improve readability.

In [None]:
calls_df.priority.value_counts().plot(kind="barh", color="purple", figsize=(12,4), title="Number of 911 calls by priority type in Baltimore", fontsize=14)
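
Axis labels can be added to a bar chart in a similar way. The sketch below assumes that `plot()` on a Series returns a Matplotlib Axes object, which is then used to set the labels. The counts are made up for illustration and do not come from the 911 dataset.

```python
import pandas as pd

# Hypothetical counts standing in for calls_df.priority.value_counts()
counts = pd.Series([3, 2, 1], index=["Low", "High", "Medium"])

# plot() returns the Axes the bars were drawn on
ax = counts.plot(kind="barh", color="purple", figsize=(12, 4), title="Toy example")

# On a horizontal bar chart the counts run along the x axis
ax.set_xlabel("Number of calls")
ax.set_ylabel("Priority")
```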

Histograms can be useful mechanisms to summarize the distribution of values. Let's create a histogram of the hour of the day when calls are made. 

In [None]:
# Create a new column which stores the hour of the day
calls_df["hour_of_day"] = pd.DatetimeIndex(calls_df["callDateTime"]).hour
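
An alternative way to get the hour, assuming the column already holds datetime values (as it should after `parse_dates`), is the `.dt` accessor. This is a sketch on a hypothetical three-row frame rather than the real dataset.

```python
import pandas as pd

# Hypothetical timestamps; the real column comes from the CSV file
toy = pd.DataFrame({
    "callDateTime": pd.to_datetime([
        "2016-11-01 00:15:00",
        "2016-11-01 13:40:00",
        "2016-11-02 23:59:00",
    ])
})

# .dt exposes datetime components of a datetime column
toy["hour_of_day"] = toy["callDateTime"].dt.hour
print(toy)
```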

In [None]:
# All priority types aggregated. bins lists the bin edges, so range(25) gives one bin per hour (0-23)
calls_df.hist(column="hour_of_day", bins=range(25), figsize=(16,4))

We'd like to label each of the bins individually though. We can do that because the return value from the `hist()` function is an array of *axes* objects (one per chart) that can be further manipulated to change the presentation of the chart, such as adding an axis label or specifying the labels for tick marks.

In [None]:
# All priority types aggregated. bins lists the bin edges, so range(25) gives one bin per hour (0-23)
axes = calls_df.hist(column="hour_of_day", bins=range(25), figsize=(16,4))
axes[0][0].set_xticks(range(0,24))
axes[0][0].set_xlabel("Hour of Day")
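
A note on bin edges: when `bins` is a sequence, it is read as a list of edges rather than a number of bins, so N+1 edges produce N bins. The sketch below uses NumPy's `histogram()` function directly, with made-up hour values, to show the difference between passing 24 edges and passing 25.

```python
import numpy as np

# Hypothetical hours of the day, including several late-night calls
hours = np.array([0, 5, 22, 22, 23, 23, 23])

# 24 edges (0..23) give only 23 bins; the last bin covers both 22 and 23
counts_24_edges, _ = np.histogram(hours, bins=np.arange(24))

# 25 edges (0..24) give 24 bins, one for each hour
counts_25_edges, _ = np.histogram(hours, bins=np.arange(25))

print(len(counts_24_edges), counts_24_edges[-1])
print(len(counts_25_edges), counts_25_edges[22], counts_25_edges[23])
```

With 24 edges the calls from hours 22 and 23 land in the same bar, which makes the last hour look busier than it is.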

We can also break out histograms by priority type in order to compare.  

In [None]:
calls_df.hist(column="hour_of_day", by="priority", bins=range(25), figsize=(16,8), sharey=True)

In [None]:
calls_df.columns

We can also show the equivalent of the histogram as a line chart by counting the calls in each hour and sorting the counts by hour.

In [None]:
axes = calls_df.hour_of_day.value_counts().sort_index().plot()
axes.set_xticks(range(0,24))
axes.set_xlabel("Hour of Day")

... Or as an area chart.

In [None]:
axes = calls_df.hour_of_day.value_counts().sort_index().plot(kind="area")
axes.set_xticks(range(0,24))
axes.set_xlabel("Hour of Day")

We can also plot different lines on the same set of axes. 

In [None]:
calls_df[calls_df.priority == "Low"].hour_of_day.value_counts().sort_index().plot()
calls_df[calls_df.priority == "Medium"].hour_of_day.value_counts().sort_index().plot()
calls_df[calls_df.priority == "High"].hour_of_day.value_counts().sort_index().plot()

But then we also need a legend to differentiate what's what on the chart. 

In [None]:
calls_df[calls_df.priority == "Low"].hour_of_day.value_counts().sort_index().plot()
calls_df[calls_df.priority == "Medium"].hour_of_day.value_counts().sort_index().plot()
calls_df[calls_df.priority == "High"].hour_of_day.value_counts().sort_index().plot()
plt.legend(["Low", "Medium", "High"], loc="best")
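
Another way to get one line per priority, sketched here on hypothetical toy data, is to count calls by hour and priority and then pivot the priorities into columns. Calling `plot()` on the resulting DataFrame draws one line per column and builds the legend from the column names, so the labels cannot get out of step with the lines.

```python
import pandas as pd

# Hypothetical toy data with the same two columns used above
toy = pd.DataFrame({
    "priority": ["Low", "Low", "High", "Medium", "High", "Low"],
    "hour_of_day": [0, 1, 0, 1, 1, 1],
})

# Rows are hours, columns are priorities, values are call counts
hourly = toy.groupby(["hour_of_day", "priority"]).size().unstack(fill_value=0)

# One line per column; the legend is taken from the column names
ax = hourly.plot()
ax.set_xlabel("Hour of Day")
ax.set_ylabel("Number of calls")
```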