# Qualitative Graphs

## Summary

* Qualitative graphs are visual displays of qualitative (categorical) data.
* The most common display is the bar graph.
* We can graph categorical frequency (the count) or relative frequency (the percent).

In [None]:
import matplotlib.pyplot as plt
import numpy as np  # needed for cluster (grouped) bar chart
from collections import Counter  # makes creating bar charts for raw data much easier

***With all of the tools and techniques available for working with data, why should we bother to obtain a visualization of it?***

## Bar Graphs

A <span style="color:blue">**bar graph**</span> is a visual display of data in which bars are plotted.

* One dimension represents each category.
* The other dimension represents the frequency (or relative frequency) of each category.

### Example 1

Construct a bar graph of the following table.

|Day|Hours Worked|
|:---:|:---:|
|Monday|5|
|Tuesday|8|
|Wednesday|7|
|Thursday|6|
|Friday|9|

In [None]:
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
freq = [5, 8, 7, 6, 9]

fig, ax = plt.subplots()
bar_colors = ['blue', 'green']  # bars can be different colors
chart = ax.bar(
    days,  # the horizontal (x) axis values
    freq,  # the vertical (y) axis values
    color = bar_colors  # don't include this if you want all bars to be the same color
)
ax.set_title("Bar Chart Example 1")  # create a title
ax.set_xlabel("Day of Week")  # label the x-axis
ax.set_ylabel("Frequency")    # label the y-axis
ax.bar_label(chart)  # provides labels for each bar's height

plt.show()

The cell below will create the same bar graph, but the bars will be horizontal instead of vertical. Preference for one style over the other is purely aesthetic. Notice the `ax.barh` method used.

In [None]:
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
freq = [5, 8, 7, 6, 9]

fig, ax = plt.subplots()
bar_colors = ['blue', 'green']  # bars can be different colors
chart = ax.barh(
    days,
    freq,
    color = bar_colors
)
ax.invert_yaxis()  # go from Mon to Fri instead of Fri to Mon
ax.set_title("Horizontal Bar Chart Example 1")  # create a title
ax.set_xlabel("Frequency")  # label the x-axis
ax.set_ylabel("Day of Week")    # label the y-axis
ax.bar_label(chart)  # provides labels for each bar's length

plt.show()

### Relative Frequency Bar Graphs

We can create relative frequency bar graphs, which use proportions (or percentages) for one of the axes rather than the raw frequencies.

The table below lists the percentages for the data in Example 1, which were calculated separately.

|Day|Percent Total|
|:---:|:---:|
|Monday|14.29%|
|Tuesday|22.86%|
|Wednesday|20.00%|
|Thursday|17.14%|
|Friday|25.71%|
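
Each percentage in the table is that day's hours divided by the weekly total of 35 hours. As a worked check for Monday (the symbols $f$ for a category's frequency and $n$ for the total are notation introduced here, not taken from the table):

$$
\text{relative frequency} = \frac{f}{n} \times 100\% = \frac{5}{5 + 8 + 7 + 6 + 9} \times 100\% = \frac{5}{35} \times 100\% \approx 14.29\%
$$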

One way to create a relative frequency bar graph is to use the previously calculated percent values for the frequency list, as shown:

In [None]:
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
freq = [14.29, 22.86, 20.00, 17.14, 25.71]

fig, ax = plt.subplots()
bar_colors = ['blue', 'green']  # bars can be different colors
chart = ax.bar(
    days,  # the horizontal (x) axis values
    freq,  # the vertical (y) axis values
    color = bar_colors  # don't include this if you want all bars to be the same color
)
ax.set_title("Relative Frequency Bar Chart Example 1")  # create a title
ax.set_xlabel("Day of Week")  # label the x-axis
ax.set_ylabel("Percent")    # label the y-axis
ax.bar_label(chart)  # provides labels for each bar's height

plt.show()

However, we can also utilize Python's ability to calculate the percentages internally to produce the same graph without having to manually calculate the percentages:

In [None]:
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
freq = [5, 8, 7, 6, 9]
total_freq = sum(freq)  # calculate the sum of all frequencies

percents = [round(100 * i/total_freq, 2) for i in freq]  # creates a list of relative frequencies

fig, ax = plt.subplots()
bar_colors = ['blue', 'green']  # bars can be different colors
chart = ax.bar(
    days,
    percents,
    color = bar_colors
)
ax.set_title("Alternative Method for Creating Relative Frequency Bar Graphs")  # create a title
ax.set_xlabel("Day of Week")  # label the x-axis
ax.set_ylabel("Percent")    # label the y-axis
ax.bar_label(chart)  # provides labels for each bar's height

plt.show()

### Example 2

One week, a questionnaire was given to hotel guests asking them to rate their satisfaction with their experience. The ratings ranged from 1 (not satisfied) to 5 (very satisfied). Construct a bar graph of the data below:

In [None]:
ratings = [2, 3, 1, 2, 3, 4,
           1, 5, 5, 2, 2, 4,
           5, 3, 2, 5, 3, 4,
           4, 3, 5, 1, 1, 1,
           3, 5, 3, 1, 4, 5]

We could, with a little bit of time, count each of the ratings by hand. However, we can also use the `Counter` class from Python's `collections` module, imported at the beginning of this notebook. This is especially helpful when we have thousands of ratings.
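
As a minimal sketch of what `Counter` returns, the snippet below uses a small made-up list (the names `sample`, `counts`, and `ordered` are illustrative and not part of the hotel data). It also shows that keys come back in first-seen order, so sorting may be needed when the categories should appear in order.

```python
from collections import Counter

# A small hypothetical sample of ratings (not the hotel data)
sample = [3, 1, 3, 2, 1, 3]

counts = Counter(sample)  # maps each unique value to the number of times it appears
print(counts[3])          # count for rating 3
print(counts[5])          # a rating that never appears has a count of 0

# Keys are stored in the order they were first seen (3, 1, 2 here),
# so sort the (rating, count) pairs to list the categories in order
ordered = sorted(counts.items())
print(ordered)
```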

In [None]:
freqs = Counter(ratings)  # this will collect each unique rating and store a count for it
horiz_axis = freqs.keys()  # this is each unique rating
vert_axis = freqs.values()  # this is the count of each unique rating

fig, ax = plt.subplots()
bar_colors = ['blue', 'green', 'red']
chart = ax.bar(horiz_axis, vert_axis, color = bar_colors)
ax.set_title("Example 2: Frequency Bar Graph from Raw Data")  
ax.set_xlabel("Rating")  
ax.set_ylabel("Frequency")    
ax.bar_label(chart)  

plt.show()

### Example 3

Construct a relative frequency bar graph of the hotel ratings.

In [None]:
total_freq = sum(vert_axis)  # use this to find the total frequency

# creates a list of relative frequencies for each class as a percentage
rel_freqs = [round(100*frequency/total_freq, 2) for frequency in vert_axis]
    
fig, ax = plt.subplots()
bar_colors = ['blue', 'green', 'red']
chart = ax.bar(horiz_axis, rel_freqs, color = bar_colors)
ax.set_title("Example 3: Relative Frequency Bar Graph")  
ax.set_xlabel("Rating")  
ax.set_ylabel("Percent")    
ax.bar_label(chart)  

plt.show()

### Clustered and Stacked Bar Graphs

When graphing 2 or more data sets on a bar graph, we can use either clustered or stacked bar graphs.

* Clustered bar graphs will place the data sets side-by-side on the graph.
* Stacked bar graphs will place one data set on top of the other.
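
The side-by-side placement in a clustered bar graph comes down to shifting each data set's bars by a fraction of the bar width. The sketch below uses a hypothetical helper, `cluster_offsets`, written with plain lists so the arithmetic is visible. It is only an illustration of the idea, not a matplotlib function.

```python
def cluster_offsets(n_groups, n_series, width):
    """Return, for each data set, the x position of its bar in every group."""
    positions = []
    for s in range(n_series):
        # shift each data set so the whole cluster is centered on the group index
        shift = (s - (n_series - 1) / 2) * width
        positions.append([g + shift for g in range(n_groups)])
    return positions

# Two data sets, five categories, bars 0.35 units wide
offsets = cluster_offsets(n_groups=5, n_series=2, width=0.35)
print(offsets[0])  # first data set: each bar sits width/2 to the left of center
print(offsets[1])  # second data set: each bar sits width/2 to the right of center
```

With two data sets, the shifts reduce to `-width/2` and `+width/2`.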

### Example 4

After seeing the results of the questionnaires, the hotel made some changes and, the following month, asked 40 new guests to rate their experience. The new results, along with the previous results, are listed below:

|Rating|Freq. (Sample 1)|Freq. (Sample 2)|
|:---:|:---:|:---:|
|One|6|4|
|Two|5|2|
|Three|7|12|
|Four|5|10|
|Five|7|12|

(a) Create a clustered bar graph of the ratings.

In [None]:
ratings = ['1','2','3','4','5']
sample1_freq = [6,5,7,5,7]
sample2_freq = [4,2,12,10,12]  # matches the table above (40 guests in total)

x = np.arange(len(ratings))  # the ratings' label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots()
sample1 = ax.bar(x-width/2, sample1_freq, width, label="Sample 1", color='red')  # need to include label for clustered bar chart
sample2 = ax.bar(x+width/2, sample2_freq, width, label="Sample 2", color='blue')

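ax.set_xticks(x)  # place one tick at the center of each cluster
ax.set_xticklabels(ratings)  # show the rating instead of the tick position
ax.set_title("Example 4: Clustered Bar Graph")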
ax.set_xlabel("Rating")  
ax.set_ylabel("Frequency")
ax.bar_label(sample1)
ax.bar_label(sample2)
ax.legend()  # For knowing individual bar colors

plt.show()

(b) Create a stacked bar graph of the ratings.

In [None]:
ratings = ['1','2','3','4','5']
sample1_freq = [6,5,7,5,7]
sample2_freq = [4,2,12,10,12]  # matches the table above (40 guests in total)

fig, ax = plt.subplots()
sample1 = ax.bar(ratings, sample1_freq, label="Sample 1")  # need to include label for stacked bar chart
sample2 = ax.bar(ratings, sample2_freq, bottom=sample1_freq, label="Sample 2")

ax.set_title("Example 4: Stacked Bar Graph")  
ax.set_xlabel("Rating")  
ax.set_ylabel("Frequency")    
ax.bar_label(sample1, label_type='center')
ax.bar_label(sample2, label_type='center')
ax.bar_label(sample2)  # adds the heights of both bars
ax.legend()  # For knowing individual bar colors

plt.show()
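
In the stacked graph, each Sample 2 bar starts where the Sample 1 bar ends (that is what the `bottom` argument does), so the label at the top of each stack is the sum of the two frequencies. A quick check of those totals, using the frequencies from the Example 4 table (the name `totals` is introduced here for illustration):

```python
sample1_freq = [6, 5, 7, 5, 7]     # Sample 1 column of the Example 4 table
sample2_freq = [4, 2, 12, 10, 12]  # Sample 2 column of the Example 4 table

# height of each full stack = Sample 1 frequency + Sample 2 frequency
totals = [a + b for a, b in zip(sample1_freq, sample2_freq)]
print(totals)
```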

## Pie Charts

Another common type of qualitative graph is the pie chart. Pie charts

* Use area to allow for quick comparison of the part-to-whole nature of percentages.

* Have slices whose central angles are proportional to the percentage each slice is of the whole.

* Are related to a chart called the *donut chart*.
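
To make the proportionality concrete, the sketch below computes the central angle of each slice for the Example 1 data: each angle is the category's share of the total, multiplied by 360 degrees. The names `hours` and `angles` are introduced here for illustration.

```python
hours = {"Mon": 5, "Tue": 8, "Wed": 7, "Thur": 6, "Fri": 9}  # Example 1 data
total = sum(hours.values())  # 35 hours in the week

# central angle (in degrees) = share of the whole * 360
angles = {day: round(360 * h / total, 2) for day, h in hours.items()}
print(angles)
```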

***Warning!! Mr. Bain's Opinion:*** I, personally, don't like pie charts and donut charts. I find using area seems unnecessary when trying to convey information that the height of a bar graph can easily provide (especially when the heights are labeled).

### Example 5

Create a pie chart for the number of hours worked in Example 1. Then determine what percent of the week's working hours fell on Monday and Tuesday.

In [None]:
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
freq = [5, 8, 7, 6, 9]
wedge_colors = ['violet', 'yellow', 'silver', 'orange', 'lightblue']

fig, ax = plt.subplots()
chart = ax.pie(
    x = freq, 
    labels = days,
    colors = wedge_colors,
    autopct='%0.2f%%'  # the digit after the decimal point (here 2) sets how many decimal places the percents display
)
ax.set_title("Pie Chart of Example 1 \n")  # create a title with newline separator \n
ax.axis('equal')  # forces pie to be drawn as a circle

plt.show()
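
The second part of Example 5 can be checked numerically. The sketch below adds the Monday and Tuesday hours and divides by the weekly total (the names `hours_by_day`, `mon_tue`, and `percent_mon_tue` are introduced here for illustration).

```python
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
freq = [5, 8, 7, 6, 9]

hours_by_day = dict(zip(days, freq))  # pair each day with its hours
mon_tue = hours_by_day['Mon'] + hours_by_day['Tue']  # 5 + 8 = 13 hours
percent_mon_tue = round(100 * mon_tue / sum(freq), 2)  # 13 out of 35 hours
print(percent_mon_tue)  # 37.14
```

Adding the rounded wedge labels instead (14.29% + 22.86%) gives 37.15%, which differs in the last digit only because each label was rounded before adding.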

Below is a donut chart of the above pie chart. It's literally just the pie chart with a hole in the middle.

In [None]:
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
freq = [5, 8, 7, 6, 9]
wedge_colors = ['violet', 'yellow', 'silver', 'orange', 'lightblue']

fig, ax = plt.subplots()
chart = ax.pie(
    x = freq, 
    labels = days,
    colors = wedge_colors,
    autopct='%0.2f%%'  # the digit after the decimal point (here 2) sets how many decimal places the percents display
)

hole = plt.Circle(
    xy = (0,0),  # coordinates of the center of the circle
    radius = 0.8,
    color = 'white'  # background color of circle
    )

ax.add_artist(hole)  # draw the white circle on top of the pie's center
ax.set_title("Donut Chart of Example 1 \n")  # create a title with newline separator \n
ax.axis('equal')  # forces pie to be drawn as a circle

plt.show()