# Module 6: Displaying our Data

In this section we are going to look at ways to display the information that we find from working with our datasets. Drawing plots and graphs can be complicated, so we'll stick to a basic example here. You are free to show your results as a table instead, and you can ask a counselor for help if you have questions.

## Matplotlib

Much like using Pandas for working with the dataset files, we need to use another package, *Matplotlib*, to draw graphs for us. We need a few lines of Python to set things up for using both Pandas and Matplotlib:

In [None]:
import pandas as pd                      # import pandas for data frames
import numpy as np                       # numpy has math/stats functions
import matplotlib.pyplot as plt          # need to plot figures

plt.rcParams['figure.figsize'] = [7,5]   # make the default plot 7" x 5"

Once we have that set up, we can read in our data file. For this example, we're going to use the dataset about how many women work in the Federal government over time. Here we load the data and print the first 5 rows, using `head()`.

In [None]:
df = pd.read_csv('gender-by-quarter.csv', index_col=0)
df.head(5)
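
To see what `index_col=0` does, here is a small sketch that reads a made-up CSV from a string instead of a file. The rows are invented for illustration, and the column name `date` and the value `permanent` are assumptions, not taken from the real file.

```python
import io
import pandas as pd

# Made-up rows for illustration only; this is not the real dataset.
csv_text = """date,toa_group,pct_female
2000-03-31,permanent,44.0
2000-03-31,midlevel-appointees,41.5
2000-06-30,permanent,44.1
"""

# index_col=0 tells pandas to use the first column as the row labels
toy = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(toy.index.name)     # name of the column that became the index
print(list(toy.columns))  # the columns that are left over
print(toy.head(2))        # first 2 rows only
```

Because the first column becomes the index, it no longer shows up in the list of regular columns.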

Notice that the data we have is:

* the date
* whether the numbers are for permanent employees or people appointed by the President
* the total number of employees
* the total number of female employees
* what percentage of employees are female

We can use `describe()` to get a broad picture of what's in the data.


In [None]:
df.describe()

By looking at this output we can see some things that help us understand the data already. For example, if we look at the **count** row, we can see that the dataset has 88 entries.

If we look at the **pct_female** column, we can see that the maximum percentage of female employees is 53%, the minimum is 37%, and the average is 45.29%.
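
If you only want one of these numbers, you can also ask for it directly. The sketch below uses a tiny made-up data frame (the values are invented, not from the real dataset) to show that `describe()` and the single-statistic methods agree.

```python
import pandas as pd

# Made-up percentages for illustration only
toy = pd.DataFrame({"pct_female": [40.0, 45.0, 50.0]})

# describe() returns a table; pick one cell by row label and column name
summary = toy.describe()
print(summary.loc["max", "pct_female"])

# the same statistics, asked for one at a time
print(toy["pct_female"].max())
print(toy["pct_female"].min())
print(toy["pct_female"].mean())
```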

While this is useful, it doesn't really help us understand how things have changed over time.

Let's dig in a little closer and just take a look at the employees that were appointed to the office. These people are usually chosen by the President or their advisor and only work for the current President.

We use a dataframe selection to pick all the entries where the *toa_group* column is set to `midlevel-appointees`. After this, `df2` only has the rows for the appointees and not the permanent employees.

In [None]:
df2 = df[df['toa_group'] == "midlevel-appointees"]
df2
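
It can help to see the selection in two steps. This is a minimal sketch on a made-up data frame; the numbers and the group value `permanent` are assumptions for illustration. The comparison first builds a column of `True`/`False` values, and the square brackets then keep only the rows marked `True`.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "toa_group": ["permanent", "midlevel-appointees",
                  "permanent", "midlevel-appointees"],
    "pct_female": [44.0, 41.5, 44.1, 42.0],
})

# step 1: compare every row's toa_group to the value we want
mask = toy["toa_group"] == "midlevel-appointees"
print(mask.tolist())

# step 2: keep only the rows where the mask is True
selected = toy[mask]
print(selected)
```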

Now let's look at plotting this information in a graph. We would like to show how the percentage of female employees changes over time. We're in luck, since our data is organized by date. 

We only need to pick the column we are interested in, `pct_female`, and plot it on the graph.

Most of the other lines are used to draw labels and titles, or change the formatting.

In [None]:
# create a figure and set of axes to draw on
fig,ax = plt.subplots()                     

# set the title of our graph
plt.title('Mid-level Female Federal Government Employees')

# plot the percentage of female appointees over time into our plot
df2['pct_female'].plot()

# set the y-axis label and range
plt.ylabel('% female employees')
ax.set_ylim(0,70)

# set the x-axis label and have it print the dates nicely
plt.xlabel('Date')
fig.autofmt_xdate()

# show the plot on the screen
plt.show()

Here we can see that the percentage of female appointees changes over time.

From this graph, it looks like the percentage of appointed federal workers who are female was higher under the Clinton and Obama presidencies and lower during the Bush and Trump presidencies.

### Now you try. ###

Go back to the original data frame, `df`, and create a new data frame that only contains data for the permanent employees.

Next, try to make a similar plot that shows the percentage of female employees over time.

Finally, try to plot the total number of employees and the number of female employees over time, not just the percentage. You can plot multiple columns on the same graph just by using multiple `.plot()` statements.
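
As a hint for the last step, here is a rough sketch of drawing two columns on one set of axes. It uses made-up numbers and hypothetical column names (`series_a`, `series_b`), so you will need to swap in your own data frame and columns.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up numbers for illustration only; the column names are hypothetical
toy = pd.DataFrame(
    {"series_a": [10, 12, 15, 14], "series_b": [4, 5, 7, 8]},
    index=["2000-Q1", "2000-Q2", "2000-Q3", "2000-Q4"],
)

# create one figure and one set of axes to share
fig, ax = plt.subplots()

# each .plot() call adds another line to the same axes
toy["series_a"].plot(ax=ax, label="series A")
toy["series_b"].plot(ax=ax, label="series B")

# add a legend so we can tell the lines apart
ax.legend()
```

Passing `ax=ax` makes it explicit that both lines go onto the same graph. Add `plt.show()` at the end, as in the earlier cell, to display the figure.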