# Plot Types - Discrete Data

Discrete data, or categorical data, describes the case where data can only take certain values. This is in contrast to continuous data, where data can take any value. In this notebook, we'll look at some examples of different types of plots for discrete data. We'll keep the complexity of plots relatively simple for now, and build up more complexity in later notebooks.

## Bar Plot

We can use a simple bar plot from Matplotlib. To do this, we first need to import the relevant part of Matplotlib using ```import matplotlib.pyplot```. Assuming we import this as ```plt```, a bar plot can be inserted using ```plt.bar```. The manual page can be found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html). Manual pages for particular plot types can be invaluable for explaining exactly how they work or learning about new options or functionality.

Following that, we can set up our data. For now, we'll just manually specify lists containing the names of the categories and their values. However, this data can be generated from any source, such as the result of a calculation, simulation or analysis of more complex data.

We're using Matplotlib in a fairly simple way here, so we can change the label on the x-axis, the label on the y-axis and the title by calling ```plt.xlabel```, ```plt.ylabel``` and ```plt.title``` respectively. The way we're using Matplotlib in this example, there is only one figure being created, so these calls set the properties of that figure.

Finally, to plot the bar graph, we write ```plt.bar``` and provide to it the names of the categories and the values associated with each as arguments.

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

# Set up the categories
x = ["Maths", "Physics", "Biology", "Chemistry", "Medicine", "Computing"]
# Set up the values
y = [10, 20, 32, 15, 27, 18]

# Set up the label for x-axis
plt.xlabel("Subject")
# Set up the label for the y-axis
plt.ylabel("Number of Students")
# Set up the title of the chart
plt.title("Favourite Subject")

# Plot the bar chart
plt.bar(x, y)
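
The category values need not be typed in by hand. As a minimal sketch, assuming a hypothetical list of raw survey responses (the `responses` data below is invented for illustration), the counts can be computed with `collections.Counter` from the standard library and then passed to ```plt.bar```:

```python
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical raw survey responses (invented for illustration)
responses = ["Maths", "Physics", "Maths", "Biology", "Physics", "Maths", "Computing"]

# Count how many times each subject appears
counts = Counter(responses)

# Separate the categories and their counts into two lists
subjects = list(counts.keys())
totals = list(counts.values())

# Set up the labels and the title
plt.xlabel("Subject")
plt.ylabel("Number of Students")
plt.title("Favourite Subject")

# plt.bar returns a container holding one rectangle per bar
bars = plt.bar(subjects, totals)
```

The bars appear in the order in which each subject is first seen in the list. If a different order is wanted, the two lists can be sorted before plotting.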

## Stacked Bar Charts

Multiple plotting commands such as ```plt.bar``` are superimposed on the same figure. We can use this to stack one bar chart on top of another. The second series of bars is made to appear above the first by setting the ```bottom``` argument of the ```plt.bar``` function call, so that the bottom of the bars representing the conference papers is at the top of the bars representing the journal papers.

When calling the ```plt.bar``` function, we also give each of these data series a name by setting the ```label``` argument. These labels are used when we call the ```legend``` function to instruct Matplotlib to set up a key.

In [None]:
# Import matplotlib and numpy
import matplotlib.pyplot as plt
import numpy as np

# Load the data from a file
data = np.loadtxt("publications.txt")

# Assign columns of the array to variables
year = data[:, 0]
journal = data[:, 1]
conference = data[:, 2]

# Set up the labels and the titles
plt.ylabel("Number of Publications")
plt.xlabel("Year")
plt.title("Publications")

# Create a bar plot for each of the data series
# The "label" argument adds a label to the series to be used in the key
plt.bar(year, journal, label = "Journal Papers")
# When adding the second data set, the "bottom" argument causes the bars to be stacked on top of the first bars
plt.bar(year, conference, label = "Conference Papers", bottom = journal)

# The legend command places a legend on the figure
# Because the legend uses the labels set up in the plt.bar commands, it must appear after these commands
plt.legend()
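
The cell above relies on the file `publications.txt` being available. As a self-contained sketch of the same stacking idea, the version below uses small hypothetical arrays (the numbers are invented for illustration) and checks where the top of each stacked bar ends up:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical publication counts (invented for illustration)
year = np.array([2019, 2020, 2021, 2022])
journal = np.array([3, 5, 4, 6])
conference = np.array([2, 1, 3, 2])

# Set up the labels and the title
plt.xlabel("Year")
plt.ylabel("Number of Publications")
plt.title("Publications")

# The first series sits on the x-axis
journal_bars = plt.bar(year, journal, label="Journal Papers")
# The second series starts where the first one ends
conference_bars = plt.bar(year, conference, label="Conference Papers", bottom=journal)

plt.legend()

# The top of each stacked bar is the total for that year
totals = [bar.get_y() + bar.get_height() for bar in conference_bars]
```

With NumPy arrays, a third series could be stacked on top by passing `bottom=journal + conference`. Note that this does not work with plain Python lists, because `+` joins two lists together rather than adding them element by element.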

## Histograms

Histograms are a type of chart which represent a distribution of values in a number of "bins". The width of a bar relates to the width of a bin and the area of the bar may relate to the number of values in that bin.

A histogram may be created using ```plt.hist``` ([manual entry](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html)). The first argument is a sequence of the individual values which make up the distribution. By default, the y-axis shows the number of entries in each bin, but the function can instead scale the heights so that the area of each bar is proportional to the number of entries in its bin. The example below is deliberately simple - you will have a chance to explore more of the options in the exercise.

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

# Create the data
test_scores = [65, 70, 59, 50, 75, 66, 64, 30, 63, 90, 61, 60, 68, 45, 72, 57, 54, 81, 83, 42]

# Create the histogram
plt.hist(test_scores)
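
To illustrate the difference between heights that show counts and heights that are scaled by area, here is a minimal sketch using the same test scores. The bin edges (every 10 marks from 30 to 100) are an assumption chosen for illustration. The values returned by ```plt.hist``` are used to check that the total area of the bars is 1:

```python
import matplotlib.pyplot as plt
import numpy as np

# The same test scores as in the cell above
test_scores = [65, 70, 59, 50, 75, 66, 64, 30, 63, 90, 61, 60, 68, 45, 72, 57, 54, 81, 83, 42]

# Bin edges every 10 marks from 30 to 100 (this choice of bins is an assumption)
edges = np.arange(30, 101, 10)

# density=True rescales the heights so that the total area of the bars is 1
heights, edges_used, patches = plt.hist(test_scores, bins=edges, density=True)

# The area of each bar is height * width; summing them should give 1
total_area = np.sum(heights * np.diff(edges_used))
```

Here the tallest bar covers the 60-70 bin, which holds 7 of the 20 scores, so its height is 7 / (20 * 10) = 0.035 rather than 7.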

## Pie Charts

Creating a pie chart in Matplotlib requires using the ```plt.pie``` function ([manual entry](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pie.html)). The only required argument is the sequence of values associated with each slice. The ```labels``` argument accepts a sequence of strings, one for each slice.

The ```explode``` argument takes a sequence of offsets, each of which moves the corresponding slice outward from the centre of the pie chart. The order of the offsets corresponds to the order of the slice values.

The ```startangle``` argument sets the angle (in degrees) at which the first slice starts. An angle of zero corresponds to the 3 o'clock position, and increasingly positive values rotate the start anti-clockwise.

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

# Define the data
classifications = ["Fail", "3", "2:2", "2:1", "1"]
students = [2, 5, 30, 50, 30]

# Plot the pie chart
# The first argument sets the values relating to each slice
# labels sets the name of each slice
# explode causes slices to be "exploded" from the centre
# startangle rotates the angle of the first slice of the pie chart
plt.pie(students, labels = classifications, explode = [0.5, 0.1, 0, 0, 0], startangle = 45)
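
The ```explode``` offsets can also be computed rather than typed in. As a minimal sketch using the same data, the version below pulls out only the largest slice and starts the first slice at the 12 o'clock position. The offset of 0.1 and the angle of 90 degrees are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt

# The same data as in the cell above
classifications = ["Fail", "3", "2:2", "2:1", "1"]
students = [2, 5, 30, 50, 30]

# Build the explode list so that only the largest slice is pulled out
# (0.1 is an arbitrary offset, given as a fraction of the radius)
largest = max(students)
explode = [0.1 if value == largest else 0 for value in students]

# startangle=90 starts the first slice at the 12 o'clock position
plt.pie(students, labels=classifications, explode=explode, startangle=90)
```

Because the offsets follow the order of the values, the single non-zero entry is in the fourth position, matching the "2:1" slice.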

## Exercise: A Stacked Histogram

This exercise is designed to allow you to practise using the features we've explored in this notebook, and to practise using the documentation to work out how to produce different effects in a figure.

The heights of 20 males and 20 females have been recorded and are given in the cell below. Using this data, produce a histogram which shows the probability density function of the sample. 

* The male and female distributions should be separately visible, with one stacked on top of the other.
* A legend should act as a key to the male and female data.
* Add an appropriate title and appropriate labels on the x and y axes.
* The histogram should specifically show the probability density function (i.e. the area under the histogram should integrate to 1). For reference, this means the peak of the histogram should be at about 3.5/m.
* The bins of the histogram should each be 5 cm wide and span the range 1.4 m to 2 m (i.e. 1.4 m to 1.45 m, 1.45 m to 1.5 m, ... 1.95 m to 2 m).

You may find the ```hist``` [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) helpful. 

In [None]:
# The heights
male = [1.7, 1.8, 1.75, 1.6, 1.82, 1.78, 1.84, 1.9, 1.79, 1.59, 1.72, 1.68, 1.63, 1.64, 1.75, 1.66, 1.8, 1.71, 1.55, 1.85]
female = [1.5, 1.7, 1.6, 1.55, 1.67, 1.57, 1.62, 1.59, 1.45, 1.57, 1.79, 1.75, 1.64, 1.69, 1.62, 1.52, 1.74, 1.66, 1.67, 1.59]



If you're stuck, have a look at some of the hints below:

* Look at the "x" parameter to find out how to provide multiple data sets to "hist"
* Look at the "bins" parameter to find out how to set the bins of the histogram
* To find out how to stack the histograms, look at the "stacked" parameter
* To find out how to make sure the stacked histograms have a total area of 1, examine the "density" parameter