## <center>Scientific Programming - 7MRI0020 - 2021/2022</center>


## <center>Week 05 - Scientific Libraries - Part 02</center>


### <center>School of Biomedical Engineering & Imaging Sciences</center>
### <center>King's College London</center>

### Contents
* Matplotlib
  * Graphs, scatter plots, bar plots
  * Multiple plots
  * Images and surfaces
* Pandas
  * Dataframes
  * DataFrame operations

# Matplotlib

* Very useful plotting library with a few quirks
* We'll go into more depth with some more complicated uses

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

plt.plot(np.sin(np.arange(-np.pi, np.pi, 0.02)) + 1)

* Plotting multiple plots in one image can be done in multiple ways, here's an easy one:

In [None]:
f, ax = plt.subplots(1, 2, figsize=(12, 2))
ax[0].plot(np.arange(-np.pi, np.pi, 0.02))
ax[1].plot(np.cos(np.arange(-np.pi, np.pi, 0.02)) + 1)

* Titles:

In [None]:
f, ax = plt.subplots(1, 2, figsize=(12, 2))

ax[0].plot(np.arange(-np.pi, np.pi, 0.02))
ax[0].set_title("Linear Range")

ax[1].plot(np.cos(np.arange(-np.pi, np.pi, 0.02)) + 1)
ax[1].set_title("Cos + 1")

* `plt.subplots` takes as arguments the number of rows and columns to plot plus other configuration values
* It returns the figure object and the subplot objects
* Each subplot object has methods for plotting, setting titles, axes, etc.

In [None]:
figure, ax = plt.subplots(2, 3, figsize=(12, 4), 
    sharex='col', sharey='row', # share scales
    gridspec_kw={'hspace': 0, 'wspace': 0}) # remove whitespace

ax[0,0].plot(np.cos(np.arange(-np.pi, np.pi, 0.02)**1))
ax[0,1].plot(np.cos(np.arange(-np.pi, np.pi, 0.02)**2))
ax[0,2].plot(np.cos(np.arange(-np.pi, np.pi, 0.02)**3))
ax[1,0].plot(np.sin(np.arange(-np.pi, np.pi, 0.02)**1))
ax[1,1].plot(np.sin(np.arange(-np.pi, np.pi, 0.02)**2))
ax[1,2].plot(np.sin(np.arange(-np.pi, np.pi, 0.02)**3))
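
What `plt.subplots` hands back depends on the grid shape, which is easy to trip over. The following is a minimal sketch of the cases; the variable names (`axes_row`, `axes_sq`, and so on) are chosen here for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np

# 2 rows x 3 columns: `axes` is a 2D Numpy array of Axes objects
fig, axes = plt.subplots(2, 3, figsize=(12, 4))
print(type(axes), axes.shape)

# a single row gives a 1D array, so it is indexed with one number
fig_row, axes_row = plt.subplots(1, 2)
print(axes_row.shape)

# a 1x1 grid returns a bare Axes object unless squeeze=False is passed
fig_one, ax_one = plt.subplots()
fig_sq, axes_sq = plt.subplots(squeeze=False)
print(type(ax_one).__name__, axes_sq.shape)

# .flat visits every subplot regardless of the grid shape
x = np.arange(-np.pi, np.pi, 0.02)
for k, ax in enumerate(axes.flat):
    ax.plot(x, np.sin((k + 1) * x))
    ax.set_title(f"sin({k + 1}x)")
```

Looping over `axes.flat` avoids writing one line per subplot when every panel is filled in the same way.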

* A `Figure` object was explicitly created in the above
* Whenever functions like `plot` are called with the `pyplot` library (shortened to `plt` everywhere here) a hidden `Figure` object is created
* Creating a `Figure` object yourself allows more flexibility in creating complex figures like this one
* This is often unnecessary, however: the intent behind `pyplot` is to replicate Matlab's plotting facilities as closely as possible
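
The hidden `Figure` can be inspected directly. This is a small sketch, assuming a fresh session with no figures open (hence the `plt.close("all")`), which compares the implicit `pyplot` style with the explicit object style:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-np.pi, np.pi, 0.02)

plt.close("all")  # start from a clean state for this illustration

# pyplot style: the figure and axes are created implicitly by the call
plt.plot(x, np.sin(x))
implicit_fig = plt.gcf()  # "get current figure"
implicit_ax = plt.gca()   # "get current axes"

# object style: the same plot with the objects held explicitly
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

print(type(implicit_fig).__name__, type(fig).__name__)
```

Both routes end with the same kinds of objects; the explicit style simply keeps a name for them so they can be modified later.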

* Multiple values can be plotted on the same plot, either with a single call to `plt.plot` or multiple calls:

In [None]:
t = np.arange(0, 50)
plt.plot(t, t ** 0.25 - 2, "bs")  # bs == blue squares
plt.plot(t, np.sin(t * 2), "g^")  # g^ == green triangles
plt.show()
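
The single-call form is not shown in the cell above, so here is a brief sketch of it using the same two data sets. The name `lines` is just an illustrative choice for the returned list.

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0, 50)

# one call: (x, y, format) triples are given one after another
lines = plt.plot(t, t ** 0.25 - 2, "bs", t, np.sin(t * 2), "g^")
print(len(lines))  # one Line2D object per data set
```

Separate calls are usually easier to read once keyword arguments such as `label` are needed for each data set.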

* A wide range of controls exists for how values are plotted, e.g. a log-scale y-axis:

In [None]:
plt.semilogy(t)

* Annotations can also be added

In [None]:
plt.annotate("What's that?!", xy=(10, 10), xytext=(23, 5), 
             arrowprops=dict(facecolor='black', shrink=0.05))
plt.semilogy(t)

* Axis labels, titles, legends:

In [None]:
t = np.arange(0, 50)
plt.plot(t, np.sin(t * 2), "g^", label="Green Triangles")
plt.plot(t, np.cos(t * 2) - 2, "b.", label="Blue Points")  # "." is the point marker
plt.ylabel("Y Axis")
plt.xlabel("X Axis")
plt.title("Pretty")
plt.grid()
plt.legend()

* The grid lines and their labels can be manipulated as well:

In [None]:
plt.plot(t * t, label="tt")
plt.legend()
plt.grid()
plt.yticks([2000], labels=["HERE"])
plt.xticks([0, 10, 20], labels=["zero", "ten", "twenty"])
plt.show()

* Other types of plots exist for bar graphs, scatter plots, images, contours, and vector drawing
* Style for axes, grids, colors, overlays, etc. is controllable with figure methods and library functions
* Read the documentation!

In [None]:
f, ax = plt.subplots(2, 3, figsize=(12, 4), gridspec_kw={"hspace": 0.2, "wspace": 0.2})

x, y = np.mgrid[:200, :300]  # coordinate grid for 200x300 area
im = np.sin(x * np.pi * 0.01) * np.sin(y * np.pi * 0.01) * x * y
levels = np.arange(im.min(), im.max(), (im.max() - im.min()) / 10)

ax[0, 0].plot(np.cos(np.arange(-np.pi, np.pi, 0.02) ** 1))
ax[0, 1].scatter(np.random.randn(100), np.random.randn(100), np.abs(np.random.randn(100)) * 50)  # marker sizes must be non-negative
ax[0, 2].bar(np.arange(10), np.random.randn(10), color=["r", "b"] * 5)
ax[1, 0].imshow(im[::-1])  # imshow puts row 0 at the top, so flip to match the contour plot
ax[1, 1].contour(im, levels=levels)
_ = ax[1, 2].hist(im.flat)

* Plotting facilities can be used to draw pictures instead
* This was seen with the tic-tac-toe board example:

In [None]:
board = np.random.randint(0, 3, size=(3, 3))  # random board

fig = plt.figure(figsize=[4, 4])  # create a figure
fig.patch.set_facecolor((1, 1, 0.8))

ax = fig.add_subplot(111)  # set values and title
ax.set_axis_off()
ax.set_title("Tic Tac Toe!")

* Now draw the lines by plotting straight lines
* Markers will be drawn by plotting individual points with different markers
* This can be done with Polygon instead but this is easier

In [None]:
for x in range(4):
    ax.plot([x, x], [0, 3], "k")  # vertical lines
    ax.plot([0, 3], [x, x], "k")  # horizontal lines

fig  # tell jupyter to draw this again

In [None]:
for i, j in np.ndindex(board.shape):
    val = board[i, j]

    if val in (1, 2):
        marker = "x" if val == 1 else "o"
        color = "b" if val == 1 else "r"

        # column index j gives x, row index i gives y (row 0 drawn at the top)
        _ = ax.plot(0.5 + j, 2.5 - i, marker, markersize=30, markeredgecolor=color,
                    markerfacecolor=(1, 1, .8), markeredgewidth=10)

fig

* 3D plotting is also possible:

In [None]:
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection (only needed on older Matplotlib)

a = np.arange(0.0, 20.0 * np.pi, 0.01)
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
_ = ax.plot(np.sin(a) * a, np.cos(a) * a, a)
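
Surfaces are listed in the contents but the cell above only draws a curve. As a rough sketch, a surface can be drawn with `plot_surface` on the same kind of 3D axes; the function being plotted and the grid size are arbitrary choices made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, only needed on older Matplotlib

# 60x60 coordinate grid over [-3, 3] in both directions
x, y = np.mgrid[-3:3:60j, -3:3:60j]
z = np.sin(x) * np.cos(y)  # height value at every grid point

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
surf = ax.plot_surface(x, y, z, cmap="viridis")
fig.colorbar(surf, shrink=0.6)  # colour scale for the heights
```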

* Images are an important area of matplotlib usage
* Any 2D Numpy array can be plotted as an image
* 3D arrays are expected to have an RGB (or RGBA) third dimension
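
To make the 2D versus 3D distinction concrete without needing an image file, here is a small sketch with synthetic arrays. The 64x64 size and the ramp pattern are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# 2D array: one value per pixel, coloured through a colour map
grey = np.linspace(0, 1, 64 * 64).reshape(64, 64)

# 3D array: the last axis holds red, green, blue values in [0, 1]
rgb = np.zeros((64, 64, 3))
rgb[..., 0] = grey        # red channel follows the ramp
rgb[..., 2] = grey[::-1]  # blue channel is the ramp flipped vertically

fig, ax = plt.subplots(1, 2, figsize=(6, 3))
ax[0].imshow(grey, cmap="gray")
ax[0].set_title("2D array")
ax[1].imshow(rgb)
ax[1].set_title("RGB array")
```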

In [None]:
im = plt.imread("chelsea.png")
print(im.shape, im.min(), im.max())
plt.axis("off")
plt.imshow(im)

* We can just view the red channel:

In [None]:
plt.axis("off")
_ = plt.imshow(im[..., 0])

* A greyscale color map would be better:

In [None]:
plt.axis("off")
_ = plt.imshow(im[..., 0], cmap="gray")  # note spelling

* Images being just Numpy arrays exposes them to all the math operations we have from the library:

In [None]:
plt.axis("off")
_ = plt.imshow(1 - im[80:160, 120:220])  # negative image

* Drawing multiple pictures requires subplots:

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20, 10))
ax[0].axis("off")
ax[0].imshow(1 - im[80:160, 120:220])
ax[1].axis("off")
ax[1].imshow(im[80:160, 120:220])

* It's often better to just stack images together with Numpy:

In [None]:
plt.figure(figsize=(20, 10))  # explicitly create a figure to get the size set
plt.imshow(np.hstack([1 - im[80:160, 120:220], im[80:160, 120:220]]))

# Pandas
* Library for dealing with tabular data in a database-like way
* Provides facilities for manipulating columns, setting types per column, and interfacing with numpy and matplotlib

* A `Series` is a column of data, `DataFrame` a full table
* Tables of numbers can be loaded straight from numpy:

In [None]:
import pandas as pd

dat = pd.DataFrame(np.random.randint(0, 10, size=(6, 6)))

dat

Pandas provides a number of facilities for plotting with matplotlib:

In [None]:
dat.plot.bar()

A number of plots make sense for statistical data in particular:

In [None]:
dat.boxplot()

* Series can be accessed by index:

In [None]:
series = dat[0]
print(type(series), series.name, series.dtype, np.asarray(series))
print(series)

* Rows can be accessed in different ways, e.g. as a `Series` using `loc`:

In [None]:
row1 = dat.loc[1]
print(type(row1), row1.name, row1.dtype, np.asarray(row1))
row1
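
With the default integer index, labels and positions coincide, which hides the difference between the two main row accessors. The sketch below uses a small table with made-up row labels to show `loc` (by label) next to `iloc` (by position):

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["a", "b", "c", "d"],
    index=["r0", "r1", "r2"],
)

by_label = table.loc["r1"]    # row selected by its index label
by_position = table.iloc[1]   # row selected by its integer position
subset = table.loc[["r0", "r2"], ["a", "d"]]  # several rows and columns at once

print(by_label.equals(by_position))
print(subset)
```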

* Column and row names can be set:

In [None]:
pd.DataFrame(
    data=np.random.randint(0, 10, size=(5, 4)),
    columns=["one", "two", "three", "four"],
    index=["r0", "r1", "r2", "r3", "r4"],
    dtype=np.float32,
)

* Columns can be provided as a dictionary mapping column names to arrays:

In [None]:
dmap = {
    "one": np.ones((4,), dtype=np.int8), 
    "two": np.ones((4,), dtype=np.float32) * 2
}

pd.DataFrame(dmap)

* Converting to numpy can be done with the `np.asarray` function
* This converts the columns to a single common type, as Numpy arrays are homogeneous:

In [None]:
np.asarray(pd.DataFrame(dmap))
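
The effect of the conversion shows up in the resulting `dtype`. A short sketch with two invented tables, one all-numeric and one mixing numbers with strings:

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({"ints": [1, 2, 3], "floats": [0.5, 1.5, 2.5]})
mixed = pd.DataFrame({"ints": [1, 2, 3], "names": ["a", "b", "c"]})

print(numeric.dtypes)             # each column keeps its own type
print(np.asarray(numeric).dtype)  # a single common numeric type
print(np.asarray(mixed).dtype)    # falls back to generic Python objects
```

An `object` array loses most of the speed benefits of Numpy, so it is usually worth selecting only the numeric columns before converting.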

* Various functions exist for manipulating data in tables, eg. `Series.map`:

In [None]:
# apply map to column 2
dat[2].map(lambda i: i > 3)

* `DataFrame.applymap` applies a function to every element of a whole table (recent pandas versions name this `DataFrame.map`):

In [None]:
dat.applymap(lambda i: i > 3)
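
For a simple test like this one, the comparison operator gives the same table without calling a Python function per element. The sketch below checks that equivalence; the `getattr` fallback is an assumption made here so the example runs on both older and newer pandas releases, and `elementwise` is a name invented for the illustration.

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame(np.random.randint(0, 10, size=(6, 6)))

# use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = getattr(pd.DataFrame, "map", None) or pd.DataFrame.applymap
by_function = elementwise(dat, lambda i: i > 3)

# the comparison operator does the same job in one vectorised step
by_operator = dat > 3

print(by_function.equals(by_operator))
```

Element-wise functions remain useful when the operation has no vectorised equivalent, such as formatting each value as a string.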

* Loading from CSV files is common for tabular data, eg. load some stock price information:

In [None]:
df = pd.read_csv("stocks.csv")
df

* A new column can be added to hold the difference between the closing and opening prices for each row:

In [None]:
df["diff"] = df["close"] - df["open"]
df

* Operators can be applied to data frames and series; as with Numpy, these produce data structures of results rather than scalar values:

In [None]:
df["diff"] >= 0  # days where stocks went up during trading

* Series like this can be used to select rows in a data frame:

In [None]:
df[df["diff"] >= 0]  # filtered data frame with only positive trading days

* `DataFrame.pivot` is used to reorder data by selecting a column as a row index, a column of identifiers (or any relatively small set of values) to be the new columns, and the values to distribute across those columns:

In [None]:
dfp = df.pivot(index="date", columns="symbol", values="diff")
dfp

* Summing the columns gives the total change in price during trading:

In [None]:
dfp.sum()
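
For readers without `stocks.csv` to hand, the same sequence of steps can be followed on a tiny stand-in table. The prices, dates, and symbols below are invented for illustration and are not taken from the course file; only the column names match the cells above.

```python
import pandas as pd

# invented prices for illustration only, not taken from stocks.csv
df = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"],
    "symbol": ["AAA", "BBB", "AAA", "BBB"],
    "open": [10.0, 20.0, 11.0, 19.0],
    "close": [11.0, 19.0, 13.0, 19.5],
})

df["diff"] = df["close"] - df["open"]  # new column: 1.0, -1.0, 2.0, 0.5
went_up = df["diff"] >= 0              # boolean Series, one entry per row
print(df[went_up])                     # keeps only rows where the mask is True

# one row per date, one column per symbol, cells filled from "diff"
dfp = df.pivot(index="date", columns="symbol", values="diff")
print(dfp)
print(dfp.sum())  # column totals: AAA 3.0, BBB -0.5
```

Note that `pivot` requires each (date, symbol) pair to appear only once, otherwise there is no single value to place in the cell.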

## What We're Not Covering
* Many more Python libraries out there for specific tasks or scientific areas, such as:
  * Seaborn: even prettier graphs with table data and other features
  * Xarray: N-D labeled arrays and datasets 
  * Sympy: symbolic math, reducing expressions, finding equation solutions as expressions of symbols
  * Scipy: large collection of mathematical functions and utilities for image manipulation, signal processing, optimization, interpolation, etc.
  * scikit-learn: machine learning models, datasets, utilities
  * Pytorch, Tensorflow: deep learning
  * Dask: parallel computing
  * SimpleITK, Python-ITK: image analysis

# That's it! Questions?

## Next: Exercises

## In two weeks: Data structures and algorithms