<a href="https://colab.research.google.com/github/Shadrock/online-python-course/blob/master/Analyzing_a_Survey_Assignment/Kenya_Counties_create_charts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Creating Charts in Python**
So far we've done some processing on our survey of counties in Kenya, but we can take things further and explore some more features using a Jupyter notebook. We’re going to use a Python library called [Matplotlib](https://matplotlib.org/) to create some graphical charts based on our data.

Try the simple code below and you’ll see a line chart plotted in the notebook, under the code cell. Matplotlib can also output charts in other formats, such as image files, but being able to edit the code and regenerate the chart inline is one of the nice features of Jupyter notebooks.

In [None]:
import matplotlib.pyplot as plt
vals = [3,2,5,0,1]
plt.plot(vals)

If you want to use matplotlib directly from Python instead of via a Notebook, you just need to add one final line to each of your programs: `plt.show()` will display a window with the chart you created, and pause the script until you close it.

All the examples here assume you’re using matplotlib with a Notebook, so remember this additional line if you're not. 
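
As a rough sketch of the image-file route mentioned earlier, a standalone script can write the chart to disk instead of opening a window. The file name `line_chart.png` and the choice of the non-interactive `Agg` backend are assumptions made for this illustration. In a script run on a machine with a display, you could drop the `matplotlib.use` line and call `plt.show()` instead of `plt.savefig()`.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draws to files, opens no window
import matplotlib.pyplot as plt

vals = [3, 2, 5, 0, 1]
plt.plot(vals)

# Hypothetical output file name; savefig picks the image format from the extension
out_path = Path("line_chart.png")
plt.savefig(out_path)
plt.close()  # release the figure now that it has been written
```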

## Load Your Data
Load the survey data using the same code you used previously.

In [None]:
# Upload a local file (such as KEcounty_votes.txt) to Colab. Running this cell creates a "choose file" button for picking files to upload.
from google.colab import files
uploaded = files.upload()

We'll use matplotlib to generate a bar graph to display the vote counts from the Kenya Counties program we wrote.

If you already have the counties program loaded in Colab, you can add the following code to it. Otherwise, we'll need to rerun some of the old code here to recreate the variables it defined, including `counts`, a dictionary mapping each county name to its vote count. The cell below is the code we wrote at the very end of the previous Notebook, where we "munged" our data and split the work into functions to keep the code clean. We'll then use `counts` to plot the vote counts.

In [None]:
# Create an empty dictionary for associating county names with vote counts
counts = {}

# Create an empty list to hold the names of everyone who has voted
voted = []

# Clean up (munge) a string so it's easy to match against other strings
def clean_string(s):
  return s.strip().capitalize().replace("  "," ")

# Check if someone has voted already and return True or False
def has_already_voted(name):
  if name in voted:
    print(name + " has already voted! Fraud!")
    return True
  return False

# Count a vote for the county named 'county'
def count_vote(county):
  if county not in counts:
    # First vote for this county
    counts[county] = 1
  else:
    # Increment the county count
    counts[county] = counts[county] + 1

with open("KEcounty_votes.txt") as file:
  for line in file:
    line = line.strip()
    name, vote = line.split(" - ")
    name = clean_string(name)
    vote = clean_string(vote)
    
    if not has_already_voted(name):
      count_vote(vote)
    voted.append(name)

print("Results:")
print()
for name in counts:
    print(name + ": " + str(counts[name]))
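
For comparison, the same tallying idea can be sketched with `collections.Counter` from the standard library, which starts every missing key at zero. This is an alternative illustration, not the code used in this Notebook. The ballot lines are made-up sample data, and the names `sample_lines`, `sample_counts`, and `seen_voters` are hypothetical, chosen so they don't overwrite `counts` or `voted`.

```python
from collections import Counter

# Same cleaning function as in the cell above
def clean_string(s):
    return s.strip().capitalize().replace("  ", " ")

# Hypothetical ballots in the same "name - vote" format as the survey file
sample_lines = [
    "Alice - nairobi",
    "bob - MOMBASA ",
    "Alice - Kisumu",   # repeat voter: this ballot is skipped
    "Carol - Nairobi",
]

sample_counts = Counter()  # a missing county starts at 0 automatically
seen_voters = set()        # set membership checks are fast

for line in sample_lines:
    name, vote = line.strip().split(" - ")
    name = clean_string(name)
    vote = clean_string(vote)
    if name in seen_voters:
        continue
    seen_voters.add(name)
    sample_counts[vote] += 1

print(dict(sample_counts))  # {'Nairobi': 2, 'Mombasa': 1}
```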

Start by importing two modules. pyplot is one way to plot graph data with Matplotlib; it’s modelled on the way charting works in a popular commercial program, MATLAB. NumPy is a module providing lots of numeric functions for Python.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

This loop processes the dictionary into a format that’s easy to send to matplotlib: a list of county names (for the labels on the bars) and a list of vote counts (for the heights of the bars).

In [None]:
names = []
votes = []
# Split the dictionary of names->votes into two lists, one holding names and the other holding vote counts
for county in counts:
    names.append(county)
    votes.append(counts[county])
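
The split can also be written without an explicit loop, and the same idea extends to ordering the bars by vote count. The sketch below runs on a small made-up dictionary; `example_counts` and the other `example_`/`ranked_` names are hypothetical and are used so the real `names` and `votes` lists stay untouched.

```python
# Hypothetical stand-in for the real `counts` dictionary
example_counts = {"Nairobi": 2, "Mombasa": 1, "Kisumu": 3}

# Dictionaries keep insertion order, so keys and values line up
example_names = list(example_counts)
example_votes = list(example_counts.values())

# Optional: rank the counties from most to fewest votes before plotting
ranked = sorted(example_counts.items(), key=lambda item: item[1], reverse=True)
ranked_names = [name for name, _ in ranked]
ranked_votes = [count for _, count in ranked]

print(ranked_names)  # ['Kisumu', 'Nairobi', 'Mombasa']
print(ranked_votes)  # [3, 2, 1]
```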

We create a range of indexes for the X values in the graph, one for each entry in the `counts` dictionary (i.e. `len(counts)` of them), numbered 0, 1, 2, 3, and so on. This spreads the bars evenly along the X axis of the plot.

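`np.arange` is a NumPy function like Python's `range()` function, except that the result it produces is a “NumPy array”. This is useful because arithmetic on a NumPy array applies to every element at once, so an expression such as `x + 0` in the `plt.xticks()` call further down offsets every position in a single step.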

In [None]:
# The X axis can just be numbered 0,1,2,3...
x = np.arange(len(counts))
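
To make the difference from `range()` concrete, here is a small sketch. The length 4 and the offset 0.5 are arbitrary values picked for illustration.

```python
import numpy as np

plain = list(range(4))   # ordinary Python list: [0, 1, 2, 3]
positions = np.arange(4) # NumPy array holding the same values

# Arithmetic on the array applies to every element at once.
# The same expression on the plain list would raise a TypeError.
shifted = positions + 0.5

print(positions)  # [0 1 2 3]
print(shifted)    # [0.5 1.5 2.5 3.5]
```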

`plt.bar()` creates a bar graph, using the `x` values as the X axis positions and the values in the `votes` list (i.e. the vote counts) as the height of each bar.

`plt.xticks()` sets the positions of the tick marks on the X axis and the labels drawn at them. Here, each position in `x` is labelled with the matching county name.

Finally, `rotation=90` draws the labels sideways (at a 90 degree angle) rather than horizontally. You can experiment with different rotations to create different effects.

In [None]:
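# Draw one bar per county, with heights taken from the vote counts
plt.bar(x, votes)
# Label each bar position with its county name (x + 0 leaves the positions unshifted)
plt.xticks(x + 0, names, rotation=90)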
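
As one way to experiment with rotations, the sketch below draws the same kind of chart from made-up data with the labels at 45 degrees. The sample names and counts are hypothetical. `ha="right"` (horizontal alignment) and `plt.tight_layout()` are optional extras that are not part of the cell above: the first anchors each slanted label at its right end, and the second adjusts the margins so labels are less likely to be cut off.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data standing in for `names` and `votes`
example_names = ["Nairobi", "Mombasa", "Kisumu"]
example_votes = [2, 1, 3]
example_x = np.arange(len(example_names))

plt.figure()  # start a fresh figure so earlier plots are not reused
bars = plt.bar(example_x, example_votes)

# Slanted labels, each aligned by its right-hand end
locs, labels = plt.xticks(example_x, example_names, rotation=45, ha="right")

plt.title("Votes per county (sample data)")
plt.tight_layout()
```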


# **Challenge**
There’s no label on the Y axis showing that it represents the vote count.

Can you update your bar graph code so it does this? Take a look at the [ylabel() function in the pyplot documentation](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.ylabel).