## CS102-4 - Further Computing

Prof. Götz Pfeiffer<br>
School of Mathematics, Statistics and Applied Mathematics<br>
NUI Galway

### 3. Aspects of Data Visualization

# Week 9: Line Plots, Scatter Plots, Histograms

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Anatomy of a Plot

* For all `Matplotlib` plots, we start by creating a **figure** and an **axes**.

* The **figure** is an instance of the class ``plt.Figure``.
* It can be thought of as a single container holding all the objects that represent axes, graphics, text, and labels.
* A plot has one figure.

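* The **axes** is an instance of the class ``plt.Axes``. Despite the name, an axes is a single plotting region, not the plural of *axis*: it contains an $x$-axis and a $y$-axis.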
* It is a bounding box with ticks and labels, representing an $x,y$-plane.
* It will eventually contain the plot elements that make up our visualization.
* Once we have created an axes, we can use its ``ax.plot`` method to plot some data.
* A figure can have several axes instances.

* We'll commonly use the variable name ``fig`` to refer to a figure instance, and ``ax`` to refer to an axes instance, or a group of axes instances.

* In their simplest form, a figure and axes can be created as follows:

In [None]:
fig = plt.figure()
ax = plt.axes()
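* Since a figure can hold several axes, it is often convenient to create both in one call. A minimal sketch using the standard helper ``plt.subplots`` (not used elsewhere in these notes), which returns a figure together with a grid of axes, here 1-by-2:

```python
import numpy as np
import matplotlib.pyplot as plt

# One figure containing a 1-by-2 grid of axes; axs is a NumPy array of Axes
fig, axs = plt.subplots(1, 2, figsize=(8, 3))

x = np.linspace(0, 10, 100)
axs[0].plot(x, np.sin(x))  # draw on the left axes
axs[1].plot(x, np.cos(x))  # draw on the right axes

# The figure keeps track of all of its axes
print(len(fig.axes))
```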

## Simple Line Plots

* Perhaps the simplest of all plots is the visualization of a single function $y = f(x)$.
* Let's start with a simple sinusoid:

In [None]:
fig = plt.figure()
ax = plt.axes()

x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));

* Note how reasonable defaults are chosen for the dimensions of the axes,
the ticks and the labels.

* The plot is contained in the `fig` object and can be redrawn at any time,
for example in the output of a later cell:

In [None]:
fig
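* The figure contains the axes, and the axes in turn contains the plotted lines. A small sketch making this hierarchy visible: ``ax.plot`` returns a list of ``Line2D`` objects, and the same objects can be reached through ``fig.axes`` and ``ax.lines``.

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)

# ax.plot returns a list of Line2D objects, one per line drawn
lines = ax.plot(x, np.sin(x))
line = lines[0]

# The figure holds the axes, and the axes holds the line
print(fig.axes[0] is ax, len(ax.lines))
print(line.get_ydata()[:3])  # the line keeps a reference to its data
```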

* Alternatively, the state-based `pyplot` interface (`plt.plot` and related functions) can create the figure and axes in the background.

In [None]:
plt.plot(x, np.sin(x))
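* The figure and axes created in the background are still accessible. A short sketch: ``plt.gcf()`` ("get current figure") and ``plt.gca()`` ("get current axes") return the objects that `pyplot` created for us.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)

plt.figure()              # start a fresh figure so the example is self-contained
plt.plot(x, np.sin(x))    # no explicit figure or axes

# pyplot keeps track of the "current" figure and axes it created for us
fig = plt.gcf()
ax = plt.gca()
print(len(fig.axes), len(ax.lines))  # one axes holding one line
```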

* If we want to create a single figure with multiple lines, we can simply 
  call the ``plot`` function multiple times:

In [None]:
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
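* Several lines can also be drawn with a single call, by passing several `x, y` pairs one after another. A brief sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
plt.figure()

# Two (x, y) pairs in one call draw two lines; one Line2D is returned per pair
lines = plt.plot(x, np.sin(x), x, np.cos(x))
print(len(lines))  # → 2
```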

* The same works with the existing `fig` and `ax` objects from above: add a second line to `ax`, then display `fig` again:

In [None]:
ax.plot(x, np.cos(x))
fig

## Adjusting the Plot: Line Colors and Styles

* The ``plt.plot()`` function takes additional arguments that can be used to specify line colors and styles.
* To adjust the color, use the ``color`` keyword, which accepts a string argument specifying a color in a variety of ways.

In [None]:
plt.plot(x, x + 0, color='blue')        # specify color by name
plt.plot(x, x + 1, color='g')           # short color code (rgbcmyk)
plt.plot(x, x + 2, color='0.75')        # Grayscale between 0 and 1
plt.plot(x, x + 3, color='#FFDD44')     # Hex code (RRGGBB from 00 to FF)
plt.plot(x, x + 4, color=(1.0,0.2,0.3)) # RGB tuple, values 0 to 1
plt.plot(x, x + 5, color='chartreuse'); # all HTML color names supported
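* Whatever form a color specification takes, Matplotlib converts it to a common (red, green, blue, alpha) representation. The function ``matplotlib.colors.to_rgba`` exposes this conversion, which is handy for checking what a given string means:

```python
from matplotlib.colors import to_rgba

# Every color specification is converted to an (R, G, B, alpha) tuple of floats in [0, 1]
for spec in ['blue', 'g', '0.75', '#FFDD44', (1.0, 0.2, 0.3), 'chartreuse']:
    print(repr(spec), '->', to_rgba(spec))
```

Note that the single-letter code ``'g'`` is a darker green, (0, 0.5, 0), than the HTML color ``'lime'``.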

* If no color is specified, `Matplotlib` will automatically cycle through a set of default colors for multiple lines.

In [None]:
plt.plot(x, x + 0)
plt.plot(x, x + 1)
plt.plot(x, x + 2)
plt.plot(x, x + 3)
plt.plot(x, x + 4)
plt.plot(x, x + 5);
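* The default colors come from a "property cycle" stored in the ``rcParams`` setting ``'axes.prop_cycle'``. A sketch that reads this cycle and checks that lines drawn without a color pick up its entries in order; the strings ``'C0'``, ``'C1'``, ... refer to the same entries and can be passed as ``color`` explicitly:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

x = np.linspace(0, 10, 1000)

# The default color cycle as a list of color strings
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

plt.figure()
lines = [plt.plot(x, x + i)[0] for i in range(3)]

# Lines drawn without an explicit color take successive colors from the cycle
for line, expected in zip(lines, cycle_colors):
    print(to_hex(line.get_color()), to_hex(expected))
```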

* The line style can be adjusted using the ``linestyle`` keyword:

In [None]:
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');

* As a shorthand, you can use the following codes:

In [None]:
plt.plot(x, x + 4, linestyle='-')  # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':');  # dotted
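* The long names and the short codes are interchangeable. A quick check: ``Line2D.get_linestyle()`` reports the short code, whichever form was passed in.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
plt.figure()

names = ['solid', 'dashed', 'dashdot', 'dotted']
codes = ['-', '--', '-.', ':']

styles = []
for i, (name, code) in enumerate(zip(names, codes)):
    line_by_name, = plt.plot(x, x + i, linestyle=name)      # long name
    line_by_code, = plt.plot(x, x + i + 4, linestyle=code)  # short code
    styles.append((line_by_name.get_linestyle(), line_by_code.get_linestyle()))

print(styles)  # each pair holds the same short code
```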

* These ``linestyle`` and ``color`` codes can be combined into a single non-keyword argument:

In [None]:
plt.plot(x, x + 0, '-g')  # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r');  # dotted red

* These single-character color codes reflect the standard abbreviations in the RGB (**R**ed/**G**reen/**B**lue) and CMYK (**C**yan/**M**agenta/**Y**ellow/blac**K**) color systems, commonly used for digital color graphics.
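* A format string is only an abbreviation for the corresponding keywords. A sketch comparing ``'--c'`` with the explicit ``linestyle`` and ``color`` arguments:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = np.linspace(0, 10, 1000)
plt.figure()

# '--c' bundles a line style ('--') and a single-letter color code ('c')
fmt_line, = plt.plot(x, x, '--c')
# ... and is equivalent to spelling both out as keywords
kw_line, = plt.plot(x, x + 1, linestyle='dashed', color='c')

print(fmt_line.get_linestyle(), to_rgba(fmt_line.get_color()))
```

The code ``'c'`` is the color (0, 0.75, 0.75), a darker shade than the HTML color ``'cyan'``.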

In [None]:
#?plt.plot

## Adjusting the Plot: Axes Limits

* The most basic way to adjust axis limits is to use the ``plt.xlim()`` and ``plt.ylim()`` methods:

In [None]:
plt.plot(x, np.sin(x))

plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
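* With explicit ``fig`` and ``ax`` objects, the corresponding methods are ``ax.set_xlim`` and ``ax.set_ylim``. A sketch; the getters ``ax.get_xlim`` and ``ax.get_ylim`` (or ``plt.xlim()`` called without arguments) return the current limits:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Object-oriented counterparts of plt.xlim / plt.ylim
ax.set_xlim(-1, 11)
ax.set_ylim(-1.5, 1.5)

# The getters return the current limits as (lower, upper) tuples
print(ax.get_xlim(), ax.get_ylim())
print(plt.xlim())  # without arguments, plt.xlim reports the limits of the current axes
```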

* If for some reason you'd like either axis to be displayed in reverse, you can simply reverse the order of the arguments:

In [None]:
plt.plot(x, 0.1*(x-5)**2-1)

plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
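* An axis can also be flipped without choosing explicit limits, using the axes methods ``invert_xaxis`` and ``invert_yaxis``. A short sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, 0.1 * (x - 5)**2 - 1)

# Flip the direction of the y-axis, keeping the automatically chosen range
ax.invert_yaxis()
print(ax.yaxis_inverted(), ax.get_ylim())  # lower limit is now the larger value
```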

* A useful related method is ``plt.axis()`` (**axis** with an **i**).
* The ``plt.axis()`` method allows you to set the ``x`` and ``y`` limits with a single call, by passing a list which specifies ``[xmin, xmax, ymin, ymax]``:

In [None]:
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]);

* The ``plt.axis()`` method goes beyond this, allowing you to do things like automatically tighten the bounds around the current plot:

In [None]:
plt.plot(x, np.sin(x))
plt.axis('tight');

* It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on your screen, one unit in ``x`` is equal to one unit in ``y``:

In [None]:
plt.plot(x, np.sin(x))
plt.axis('equal');
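* The same effect can be requested through the axes object. A sketch, based on our reading of the Matplotlib source, where ``plt.axis('equal')`` amounts to ``ax.set_aspect('equal', adjustable='datalim')``: the aspect ratio is fixed, and the data limits (rather than the box) are adjusted to achieve it.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# One unit in x gets the same on-screen length as one unit in y;
# adjustable='datalim' lets Matplotlib widen the data limits to make this fit
ax.set_aspect('equal', adjustable='datalim')
print(ax.get_aspect(), ax.get_adjustable())
```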

In [None]:
#plt.axis?

## Titles, Labels, Legends

* The functions ``plt.title()``, ``plt.xlabel()`` and ``plt.ylabel()`` quickly set the title and the axis labels:

In [None]:
plt.plot(x, np.sin(x))
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");

* The position, size, and style of these labels can be adjusted using optional keyword arguments to these functions, such as ``loc`` or ``fontsize``.
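* With an explicit axes object, the corresponding methods are ``ax.set_title``, ``ax.set_xlabel`` and ``ax.set_ylabel``. A sketch showing a few of the optional arguments (``loc`` for the title position, ``fontsize`` and ``color`` for the label text):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

ax.set_title("A Sine Curve", loc='left')   # left-aligned title
ax.set_xlabel("x", fontsize=14)            # larger x-label
ax.set_ylabel("sin(x)", color='gray')      # gray y-label

print(ax.get_title(loc='left'), ax.get_xlabel(), ax.xaxis.label.get_fontsize())
```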

* When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type.
* The ``plt.legend()`` function creates a legend.
* The label of each line can be specified by using the ``label`` keyword of the plot function:

In [None]:
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.axis('equal')

plt.legend();

* Note how the ``plt.legend()`` function keeps track of the line style and color, and matches these with the correct label.
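* The legend itself is an object that can be inspected and configured. A sketch using two of its optional arguments, ``loc`` and ``frameon``; it also shows that a line whose label starts with an underscore is left out of the legend:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-g', label='sin(x)')
ax.plot(x, np.cos(x), ':b', label='cos(x)')
ax.plot(x, 0 * x, '-k', label='_zero')  # leading underscore: excluded from the legend

leg = ax.legend(loc='upper left', frameon=False)
print([t.get_text() for t in leg.get_texts()])  # → ['sin(x)', 'cos(x)']
```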

In [None]:
#plt.legend?

## Simple Scatter Plots

* Instead of points being joined by line segments, in a scatter plot the points are represented individually with a dot, circle, or other shape.

* It turns out that the ``plt.plot``/``ax.plot`` function can produce scatter plots as well:

In [None]:
x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.plot(x, y, 'o', color='black');
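* What makes this a scatter plot is that a format string containing only a marker code turns the connecting line off. A sketch inspecting the resulting ``Line2D``:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.figure()

line, = plt.plot(x, y, 'o', color='black')
# Marker 'o' is drawn at each point; the line style is the string 'None'
print(line.get_marker(), line.get_linestyle())
```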

* The third argument in the function call is a character that represents the type of symbol used for the plotting.  
* The full list of available symbols can be found in the documentation of ``plt.plot``; most of the possibilities are fairly intuitive:

In [None]:
rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
plt.legend()
plt.xlim(0, 1.8);

* These character codes can be used together with line and color codes to plot points along with a line connecting them:

In [None]:
plt.plot(x, y, '-ok');

* Additional keyword arguments to ``plt.plot`` specify a wide range of properties of the lines and markers:

In [None]:
plt.plot(x, y, '-p', color='gray',
         markersize=16, linewidth=4,
         markerfacecolor='lightblue',
         markeredgecolor='g',
         markeredgewidth=1)
plt.ylim(-1.2, 1.2);

In [None]:
#?plt.plot

## Scatter Plots with ``plt.scatter``

* A second, more powerful method for creating scatter plots is the ``plt.scatter`` function:

In [None]:
plt.scatter(x, y, marker='o');

* The primary difference of ``plt.scatter`` from ``plt.plot`` is that it can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.
* In order to better see the overlapping results, we'll also use the ``alpha`` keyword to adjust the transparency level:

In [None]:
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)

plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
            cmap='viridis')
plt.colorbar();  # show color scale

* Notice that the color argument is automatically mapped to a color scale (shown here by the ``colorbar()`` command).
* Also note that the size argument ``s`` gives the marker area in points squared (points²), not a diameter in pixels.
* In this way, the color and size of points can be used to convey information in the visualization, in order to visualize multidimensional data.
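* ``plt.scatter`` (and ``ax.scatter``) returns a collection that stores one position, one size and one color value per point. A small sketch; since ``s`` is an area, quadrupling it doubles the marker diameter:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
sc = ax.scatter([0, 1, 2], [0, 0, 0],
                s=[25, 100, 400],          # areas in points²: diameters 5, 10, 20 points
                c=[0.1, 0.5, 0.9], cmap='viridis')

print(sc.get_offsets())  # (x, y) position of each point
print(sc.get_sizes())    # one size per point
print(sc.get_array())    # the values mapped through the colormap
```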

* For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of flowers that has had the size of its petals and sepals carefully measured:

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T

plt.scatter(features[0], features[1], alpha=0.2,
            s=100*features[3], c=iris.target, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1]);

* We see that this scatter plot has given us the ability to simultaneously explore four different dimensions of the data:
the $(x, y)$ location of each point corresponds to the sepal length and width, the size of the point is related to the petal width, and the color is related to the particular species of flower.

## Histograms and Binning

* A simple histogram can be a great first step in understanding a dataset.

In [None]:
data = np.random.randn(1000)
plt.hist(data);
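* Besides drawing, ``plt.hist`` returns what it computed: the count in each bin, the bin edges, and the drawn patches. A sketch (with a seeded generator so it is reproducible); by default 10 bins are used, hence 11 edges:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
data = rng.randn(1000)

plt.figure()
counts, bin_edges, patches = plt.hist(data)
print(len(counts), len(bin_edges))  # → 10 11
print(counts.sum())                 # every data point falls into some bin
```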

* The ``hist()`` function has many options to tune both the calculation and the display; here is an example of a more customized histogram:

In [None]:
plt.hist(data, bins=30, 
         density=True, 
         alpha=0.5,
         histtype='stepfilled', 
         color='steelblue',
         edgecolor='none');
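* With ``density=True`` the bar heights are rescaled so that the histogram approximates a probability density. A sketch checking that the total area of the bars (height times bin width, summed) equals 1:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
data = rng.randn(1000)

plt.figure()
counts, bin_edges, _ = plt.hist(data, bins=30, density=True,
                                histtype='stepfilled', alpha=0.5)

# Total area under the histogram: sum of height * width over all bins
area = np.sum(counts * np.diff(bin_edges))
print(area)
```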

In [None]:
#?plt.hist

* This combination of ``histtype='stepfilled'`` with some transparency ``alpha`` can be very useful when comparing histograms of several distributions:

In [None]:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
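* With ``bins=40``, each call chooses its own 40 bins over the range of that sample, so the three histograms have different bar widths. A sketch of one way to share bins, using ``np.histogram_bin_edges`` on the combined data and passing the edges array as ``bins``:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
x1 = rng.normal(0, 0.8, 1000)
x2 = rng.normal(-2, 1, 1000)
x3 = rng.normal(3, 2, 1000)

# 40 bins spanning all three samples together
edges = np.histogram_bin_edges(np.concatenate([x1, x2, x3]), bins=40)

kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=edges)
plt.figure()
results = [plt.hist(sample, **kwargs) for sample in (x1, x2, x3)]
```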

* If you would like to simply compute the histogram (that is, count the number of points in a given bin) and not display it, the ``np.histogram()`` function is available:

In [None]:
counts, bin_edges = np.histogram(data, bins=5)
print(counts)
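* ``np.histogram`` returns one more edge than there are bins. A sketch that checks this and computes the bin centers, which are useful for plotting the counts yourself, for example with ``plt.plot``:

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(1000)

counts, bin_edges = np.histogram(data, bins=5)
# 5 bins are delimited by 6 edges; every point is counted exactly once
print(len(counts), len(bin_edges), counts.sum())  # → 5 6 1000

# Midpoint of each bin
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
print(centers)
```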