# UBC MRI Research Python Workshop 2

## August 22 2017



1. Higher-dimensional numpy arrays
    * Indexing
    * Slicing
    * Boolean Masks
        * Exercise: Mask one array with another array of the same shape
   
2. Object-oriented programming
    * Writing classes
    * Initializing and manipulating objects
        * Exercise: Create an Image class
        
3. Matplotlib plotting
    * Plotting the object-oriented way
    * Changing plot attributes
    * Subplots
        * Exercise: Plot a 2D image
        
4. Curve fitting
    * Linear transform
        * Exercise: scipy.optimize.curve_fit()
        
5. Pandas
    * Importing and examining dataframes
    * Indexing dataframes
    * Conditional indexing
    * Plotting
        * Exercise: Vancouver Open Data Catalogue
    

## Numpy indexing

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

### Slicing summary: (start:stop:step)
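Here is a quick sketch of `start:stop:step` slicing on small `np.arange` arrays (the arrays are arbitrary examples). `stop` is exclusive, any of the three parts can be omitted, and a negative `step` walks backwards:

```python
import numpy as np

# A 1D array of the integers 0..9
a = np.arange(10)

print(a[2:8])      # [2 3 4 5 6 7]  (start=2, stop=8 is excluded, step=1)
print(a[::2])      # [0 2 4 6 8]    (every second element)
print(a[::-1])     # [9 8 7 6 5 4 3 2 1 0]  (negative step reverses)
print(a[-3:])      # [7 8 9]        (the last three elements)

# In 2D, each axis gets its own start:stop:step, separated by commas
b = np.arange(12).reshape(3, 4)
print(b[0, :])     # first row: [0 1 2 3]
print(b[:, 1])     # second column: [1 5 9]
print(b[1:, ::2])  # rows 1 onward, every second column
```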

### Boolean masks
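Below is a minimal example of boolean masking on a made-up array. A comparison returns a boolean array of the same shape, and that array can then be used as an index:

```python
import numpy as np

arr = np.array([3, -1, 4, -1, 5, -9, 2, 6])

# Comparing an array to a value gives a boolean array of the same shape
mask = arr > 0

# Indexing with the mask keeps only the entries where mask is True
print(arr[mask])                    # [3 4 5 2 6]

# Masks combine with & (and), | (or) and ~ (not); the parentheses are required
print(arr[(arr > 0) & (arr < 5)])   # [3 4 2]

# Assigning through a mask modifies the selected entries in place
arr[~mask] = 0
print(arr)                          # [3 0 4 0 5 0 2 6]
```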

## Exercise: Mask one array with another array of the same shape
* Create two arrays with random digits
* Find all entries in array 1 where array 2 is larger than N
* Take the mean of the result

Optional:
* Investigate the different ways to create random arrays in numpy
* Take the threshold N as the 90th percentile of array 2
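One possible solution is sketched below. The 5x5 shape, the seed, and `N = 5` are arbitrary choices. `rng.integers(0, 10, ...)` draws whole numbers from 0 to 9, because the upper bound is exclusive:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded so results are reproducible

# Two 5x5 arrays of random integers ("digits") from 0 to 9
array1 = rng.integers(0, 10, size=(5, 5))
array2 = rng.integers(0, 10, size=(5, 5))

# Fixed threshold N, or (optional) the 90th percentile of array2
N = 5
# N = np.percentile(array2, 90)

# Entries of array1 at the positions where array2 > N
selected = array1[array2 > N]

# Guard against an empty selection, whose mean would be NaN
if selected.size > 0:
    print(selected.mean())
```

For the optional parts, `rng.random` (uniform floats), `rng.normal` (Gaussian values) and the older `np.random.randint` are other ways to create random arrays. To use the 90th-percentile threshold, uncomment the `np.percentile` line.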

## Higher-dimensional arrays
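Indexing and slicing work the same way in any number of dimensions: you supply one index or slice per axis. The sketch below assumes a hypothetical 4D array laid out as (x, y, z, time) and filled with random values:

```python
import numpy as np

# Hypothetical 4D "fMRI-like" array: 64 x 64 x 30 voxels, 10 timepoints
data = np.random.default_rng(0).random((64, 64, 30, 10))

print(data.ndim)    # 4
print(data.shape)   # (64, 64, 30, 10)

# One voxel's full timeseries
ts = data[32, 32, 15, :]
print(ts.shape)     # (10,)

# One slice along the third axis, at the first timepoint
slc = data[:, :, 15, 0]
print(slc.shape)    # (64, 64)

# Every other slice, all timepoints
sub = data[:, :, ::2, :]
print(sub.shape)    # (64, 64, 15, 10)

# The ellipsis (...) stands in for "all the remaining axes"
print(data[..., 0].shape)   # (64, 64, 30)
```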

## Exercise: Take the mean across the 4th dimension (temporal averaging)
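This sketches one solution, again assuming a hypothetical (x, y, z, time) array. Most numpy reductions take an `axis` argument, and axes are counted from 0:

```python
import numpy as np

# Hypothetical 4D data: x, y, z, time
data = np.random.default_rng(0).random((64, 64, 30, 10))

# The 4th (time) dimension is axis=3; axis=-1 (the last axis) is equivalent here
mean_img = data.mean(axis=3)
print(mean_img.shape)   # (64, 64, 30)
```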

## Classes and Objects

Classes are a smart way to organize your code. Instead of writing loose functions, define a class to describe subjects, timepoints, events, etc., and give the class attributes and methods.

Our first class will describe a subject in our study. The class will be called "Subject", and its only attribute will be the subject's ID.

Let's give the class some information when an object is first created.

`__init__` is a special method that Python calls automatically when an object is created. Its first argument is always `self`, which gives the method access to all the attributes of the object. Any other arguments are the values you pass in when you create the object, such as the subject ID.

With this simple class definition, we can create subject objects, pass subject IDs on creation, then access the subject ID on demand.
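Here is a minimal version of the class described above. The attribute name `subject_id` and the ID strings are illustrative choices:

```python
class Subject:
    """A research subject, identified only by an ID (for now)."""

    def __init__(self, subject_id):
        # __init__ runs automatically when a Subject is created;
        # store the ID on the object itself through self
        self.subject_id = subject_id


# Create two subject objects, passing each an ID on creation
s1 = Subject("sub-001")
s2 = Subject("sub-002")

# Access the attribute on demand
print(s1.subject_id)   # sub-001
print(s2.subject_id)   # sub-002
```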

Let's expand the class with some additional attributes, and a method that modifies those attributes.
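The expanded class might look like the sketch below. The `date` and `data` attributes follow the text that comes next. The `n_points` attribute and the specific job done by `cleandata()` (dropping `None` values) are hypothetical placeholders, chosen only to show a method modifying the object's attributes:

```python
class Subject:
    def __init__(self, subject_id, date, data):
        self.subject_id = subject_id
        self.date = date            # acquisition date, as a string for now
        self.data = data            # list of measured values
        self.n_points = len(data)   # hypothetical derived attribute

    def cleandata(self):
        # Hypothetical cleaning step: drop missing values (None)
        # and update the attribute that depends on the data
        self.data = [x for x in self.data if x is not None]
        self.n_points = len(self.data)


s = Subject("sub-001", "2017-08-22", [1.0, None, 2.5, 3.0])
print(s.n_points)   # 4
s.cleandata()
print(s.data)       # [1.0, 2.5, 3.0]
print(s.n_points)   # 3
```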

Finally, let's do some data validation. 

In the constructor, we'll check whether "data" is a list. If not we'll raise an error.

We'll also convert the "date" string into a date object that python understands.

Let's imagine that there was a calibration error for all data collected in 2016, so we need to increase all data values by 1 for dates in 2016 but not in 2017. We can add this to the cleandata() method.

First, let's make a new subject but pass the wrong type of data:

Now let's create two subjects with identical data, but with acquisition dates in different years:

## Class inheritance

Classes can inherit from each other. So, you can write a general Subject class that contains all the typical attributes of a research subject, then write a sub-class to customize it for your specific study.
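Here is a minimal sketch of a sub-class that adds nothing yet. The name `MRISubject` is hypothetical, and the base class is cut down to just the ID so the block runs on its own:

```python
class Subject:
    def __init__(self, subject_id):
        self.subject_id = subject_id


# MRISubject is a sub-class of Subject: the parent class goes in parentheses
class MRISubject(Subject):
    pass   # nothing new yet; everything is inherited from Subject


s = MRISubject("sub-001")
print(s.subject_id)             # sub-001
print(isinstance(s, Subject))   # True
```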

The above class inherits everything from the Subject class and adds nothing. We can do better! Let's add an attribute that's a list of scans acquired for this subject.

To do this, we need to modify the `__init__` method. We could simply write a new definition of `__init__`, but that would lose the work we did in the base class. Instead, we will define a new `__init__` that also brings in all the attributes from the base class.

Another change is that the new attribute `scans` will be optional. We do this by assigning it a default value in the `__init__` definition. The user can then either set the value of `scans` themselves or leave it blank.
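This sketch shows the sub-class with an optional `scans` attribute, using `super().__init__()` to run the base-class constructor. It uses `None` as the default rather than `scans=[]`. Python evaluates default values once, when the function is defined, so a `[]` default would be a single list shared by every subject that leaves `scans` blank:

```python
class Subject:
    def __init__(self, subject_id):
        self.subject_id = subject_id


class MRISubject(Subject):
    def __init__(self, subject_id, scans=None):
        # Run the base-class __init__ first so we keep its attributes
        super().__init__(subject_id)
        # scans is optional; give each subject its own new empty list by default
        self.scans = scans if scans is not None else []


a = MRISubject("sub-001", scans=["T1w", "T2w"])
b = MRISubject("sub-002")        # scans left blank
print(a.subject_id, a.scans)     # sub-001 ['T1w', 'T2w']
print(b.subject_id, b.scans)     # sub-002 []
```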

## Exercise: Image Object

Create a class that defines a 3D image object. 
* Define a class called something like Image
* Write a method called "generate_image()" or similar that generates a 3D matrix of random values and assigns it as an attribute
* Write a method called "generate_mask()" that generates a 3D matrix of the same size as your first image. The mask should be all zeros except for a region of ones. Your mask can be simple or complex. Assign the mask as an attribute
* Write a method that takes the mean of the image matrix where mask values are 1

Things to think about:
* Which methods should be run automatically, and which should the user call?
* What other methods can we write?
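One possible solution is sketched below. The class name, default shape, `seed` argument and the central box used as the mask are all arbitrary choices. Here the image and mask are generated automatically in `__init__`, but you could equally leave those calls to the user:

```python
import numpy as np


class Image:
    """A 3D image with a binary mask (one possible design, not the only one)."""

    def __init__(self, shape=(32, 32, 16), seed=None):
        self.shape = shape
        self.rng = np.random.default_rng(seed)
        # Generate the image and the mask automatically on creation
        self.generate_image()
        self.generate_mask()

    def generate_image(self):
        # 3D matrix of uniform random values in [0, 1)
        self.image = self.rng.random(self.shape)

    def generate_mask(self):
        # All zeros, except for a box of ones in the centre of the volume
        self.mask = np.zeros(self.shape, dtype=int)
        cx, cy, cz = (s // 2 for s in self.shape)
        self.mask[cx - 4:cx + 4, cy - 4:cy + 4, cz - 2:cz + 2] = 1

    def masked_mean(self):
        # Mean of the image over the voxels where the mask equals 1
        return self.image[self.mask == 1].mean()


img = Image(seed=0)
print(img.image.shape, img.mask.sum())   # (32, 32, 16) 256
print(img.masked_mean())
```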

# Plotting with Matplotlib

There are two ways to interact with Matplotlib: the scripting interface (pyplot) or the object-oriented interface. Both produce the same results and are useful in different situations. This tutorial mostly uses the object-oriented technique because I prefer it, but keep in mind that both exist when you look things up online.
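Below, both interfaces draw the same (arbitrary) sine curve side by side. Some function names differ between them, for example `plt.title()` versus `ax.set_title()`:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Scripting (pyplot) interface: plt functions act on the "current" axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title("pyplot interface")

# Object-oriented interface: call methods on explicit figure and axes objects
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title("object-oriented interface")
```

In a script, call `plt.show()` to display the figures; in a Jupyter notebook they appear automatically.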

There are two main objects in Matplotlib: the figure and the axis (Matplotlib calls it an `Axes` object). Each figure is a separate image. Each axis contains one or more dataset visualizations. A figure can have any number of axes, but each axis belongs to a single figure.

The function `plt.subplots(n)` creates a figure with `n` axes arranged vertically and returns both the figure and its axes. We'll start with one axis and then make it more complicated.

First, let's invent some data. Let's make 1000 evenly spaced points between 0 and 4$\pi$ on the x axis, and a cosine function as the y data:

Now, make the figure and axes objects and plot the data
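The data generation and plotting steps can be sketched as follows. Because `ax.plot()` returns a list of lines, the trailing comma in `cosline, = ...` unpacks the single line object:

```python
import numpy as np
import matplotlib.pyplot as plt

# 1000 evenly spaced points from 0 to 4*pi, and their cosine
x = np.linspace(0, 4 * np.pi, 1000)
y = np.cos(x)

# One figure containing one axis
f, ax = plt.subplots(1)

# ax.plot returns a list of Line2D objects; unpack the single line
cosline, = ax.plot(x, y)
```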

So now we have access to three main objects: the figure (`f`), the axis (`ax`), and the line (`cosline`). We can modify how the plot looks through any of them.
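Here are a few common modifications, grouped by the object they act on. The specific colours, labels and sizes are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 4 * np.pi, 1000)
f, ax = plt.subplots(1)
cosline, = ax.plot(x, np.cos(x))

# Line properties
cosline.set_color("red")
cosline.set_linestyle("--")
cosline.set_linewidth(2)

# Axis properties
ax.set_xlabel("x (radians)")
ax.set_ylabel("cos(x)")
ax.set_title("A cosine")
ax.set_xlim(0, 4 * np.pi)
ax.grid(True)

# Figure properties
f.set_size_inches(8, 4)
f.suptitle("Figure title")
```

To save the result, `f.savefig("filename.png")` writes the figure to disk.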

Let's start again with a new figure with 2 axes, and generate some random data for the second axis.

Note that `ax` is now an array of axes. We access them with `ax[0]` and `ax[1]`.
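This sketches the two-axis figure, using Gaussian random numbers as the (arbitrary) data for the second axis:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 4 * np.pi, 1000)
noise = np.random.default_rng(0).normal(size=1000)   # random data for axis 2

# Two axes stacked vertically; ax is now a numpy array of 2 axes
f, ax = plt.subplots(2)

ax[0].plot(x, np.cos(x))
ax[0].set_title("cosine")

ax[1].plot(x, noise, ".", markersize=2)
ax[1].set_title("random data")

f.tight_layout()   # stop titles and tick labels from overlapping
```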

We can easily arrange subplots in a grid.
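Grid arrangements work as sketched below. With both rows and columns greater than 1, `ax` becomes a 2D array indexed as `ax[row, col]`; `sharex=True` (optional) links the x axes:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 4 * np.pi, 1000)

# 2 rows x 2 columns
f, ax = plt.subplots(2, 2, sharex=True)
ax[0, 0].plot(x, np.cos(x))
ax[0, 1].plot(x, np.sin(x))
ax[1, 0].plot(x, np.cos(2 * x))
ax[1, 1].plot(x, np.sin(2 * x))

# Or side by side: 1 row x 3 columns, with an explicit figure size in inches
f2, ax2 = plt.subplots(1, 3, figsize=(9, 3))
```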

Or make more complicated arrangements.
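One way to build more complicated layouts is `matplotlib.gridspec.GridSpec`, where each axis can span several grid cells. The layout below is just an example:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

x = np.linspace(0, 4 * np.pi, 1000)

f = plt.figure(figsize=(8, 6))
gs = GridSpec(2, 3, figure=f)   # a 2 x 3 grid of cells

# Axes can span several cells by slicing the GridSpec
ax_top = f.add_subplot(gs[0, :])      # the whole top row
ax_left = f.add_subplot(gs[1, :2])    # bottom row, first two columns
ax_right = f.add_subplot(gs[1, 2])    # bottom row, last column

ax_top.plot(x, np.cos(x))
ax_left.plot(x, np.sin(x))
ax_right.hist(np.cos(x), bins=20)
```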

## Exercise: Plot a 2D array with `matshow()`

* Create a 2D or 3D array
* Draw the 2D array (or a slice of the 3D array) with matshow
* Experiment with changing the properties of the plot

Optional:
* Add additional axes with more information
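A possible starting point for the exercise uses a hypothetical random 3D volume. `vmin`/`vmax` fix the colour scale and `cmap` chooses the colour map:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 3D volume; draw one slice along the third axis
vol = np.random.default_rng(0).random((64, 64, 20))
slice2d = vol[:, :, 10]

f, ax = plt.subplots(1)
im = ax.matshow(slice2d, cmap="gray", vmin=0, vmax=1)
ax.set_title("Slice 10")
f.colorbar(im, ax=ax)   # add a colour scale for the displayed values
```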

# Curve Fitting

Let's generate an exponential decay, and add some noise to the data.

Set the "true" values for A and T, then generate some sample data:
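Here is a sketch of the data generation, assuming the model $S = Ae^{-t/T}$. The values `A_true = 10` and `T_true = 2`, the time range and the noise level are arbitrary choices. The noise is kept small so that every sample stays positive, which a log transform requires:

```python
import numpy as np

# "True" parameters of S = A * exp(-t / T)
A_true = 10.0
T_true = 2.0

t = np.linspace(0, 5, 50)
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.05, size=t.size)   # Gaussian noise, sd = 0.05

ydata = A_true * np.exp(-t / T_true) + noise
```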

## Linear transform fit
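If we know the data are exponential, the quickest approach is to log-transform the data and do a linear fit:

$$
S = Ae^{-t/T}
$$
$$
\log{S} = \log{A} - \frac{t}{T}
$$

Therefore, when we plot the (natural) log of the signal against time, we get a straight line with

$$
\text{slope} = -\frac{1}{T}, \qquad \text{intercept} = \log{A}
$$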

Generate some new data from the fitted A and T values, to compare with the measurements.
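The linear-transform fit can be sketched as follows, regenerating the same assumed synthetic data so the block runs on its own. `np.polyfit(..., 1)` returns the slope first, then the intercept, which give $T = -1/\text{slope}$ and $A = e^{\text{intercept}}$:

```python
import numpy as np

# Same assumed synthetic data (A = 10, T = 2, small Gaussian noise)
A_true, T_true = 10.0, 2.0
t = np.linspace(0, 5, 50)
noise = np.random.default_rng(0).normal(scale=0.05, size=t.size)
ydata = A_true * np.exp(-t / T_true) + noise

# Linear fit of log(S) against t: log(S) = log(A) - t/T
# (np.log needs every value of ydata to be positive)
slope, intercept = np.polyfit(t, np.log(ydata), 1)

T_fit = -1 / slope
A_fit = np.exp(intercept)
print(A_fit, T_fit)

# New data generated from the fitted A and T, to plot against ydata
yfit = A_fit * np.exp(-t / T_fit)
```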

## Exercise: curve_fit
Use `scipy.optimize.curve_fit()` to fit the same ydata directly to the exponential decay function that we defined. Plot the fit. Is it better or worse than the linear fit?
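One solution is sketched below. It assumes the model function is called `decay` (the notebook's own function is not shown here) and reuses the assumed synthetic data. `p0` is the starting guess, and the square root of the diagonal of `pcov` gives approximate one-standard-deviation uncertainties:

```python
import numpy as np
from scipy.optimize import curve_fit


def decay(t, A, T):
    # The exponential decay model S = A * exp(-t / T)
    return A * np.exp(-t / T)


A_true, T_true = 10.0, 2.0
t = np.linspace(0, 5, 50)
noise = np.random.default_rng(0).normal(scale=0.05, size=t.size)
ydata = decay(t, A_true, T_true) + noise

# Fit the model directly to the data; p0 is the initial guess for (A, T)
popt, pcov = curve_fit(decay, t, ydata, p0=(ydata[0], 1.0))
A_fit, T_fit = popt
perr = np.sqrt(np.diag(pcov))   # approximate 1-sd uncertainties
print(A_fit, T_fit, perr)
```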

## Exercise: Write an image class that can fit a curve across the time dimension

* Write an image generator method that generates a 4D image containing an exponential decay time series in each voxel. Add some noise to make it realistic
* Keep the mask generator method from last time
* Write a method that computes the decay constant in each masked voxel
* Write a method that computes the mean time constant in masked voxels and assigns it to an attribute
* Write a method that plots some orthogonal slices
* Write a method that plots a histogram of the time constant distribution in masked voxels

Some tips:
* Take one step at a time
* Test frequently
* Use Google! Anything that seems tricky probably has a simple solution
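Here is a sketch of the core step only (not the whole class): fitting every masked voxel at once with the log-linear method. The image size, time points, range of decay constants and noise level are all hypothetical, and the signal is kept well above the noise so that `np.log` never sees a negative value. With noisier data you would need to handle that case, or fit each voxel with `curve_fit` instead:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4D image: 16 x 16 x 8 voxels, 20 timepoints
t = np.linspace(0, 3, 20)
T_map = rng.uniform(2.0, 4.0, size=(16, 16, 8))   # true decay constant per voxel
A = 10.0
img = A * np.exp(-t / T_map[..., np.newaxis])     # broadcast along the time axis
img += rng.normal(scale=0.05, size=img.shape)     # add noise

# Box mask, as in the earlier exercise
mask = np.zeros(T_map.shape, dtype=bool)
mask[4:12, 4:12, 2:6] = True

# Timeseries of the masked voxels: shape (n_voxels, n_timepoints)
ts = img[mask]

# np.polyfit fits each column of a 2D y separately, so transpose to (time, voxel)
coeffs = np.polyfit(t, np.log(ts).T, 1)
T_est = -1 / coeffs[0]   # one decay constant per masked voxel

print(T_est.shape, T_est.mean())
```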

## Pandas
[pandas](http://pandas.pydata.org/) is the main data analysis package for Python. It provides a [DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) object which acts like a spreadsheet. Let's import the package and some data:

The Vancouver Police Department publishes crime data through the City of Vancouver's Open Data Catalogue. Let's import the data (prepared and posted at math.ubc.ca/~pwalls) using the `pandas.read_csv()` function:
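The notebook reads the CSV prepared at math.ubc.ca/~pwalls. That file isn't reproduced here, so the sketch below feeds `read_csv` a tiny made-up CSV through `io.StringIO`. The column names are an assumption about the crime data's layout:

```python
import io

import pandas as pd

# A tiny made-up stand-in for the crime CSV; the real file would be loaded
# with pd.read_csv(<path or URL>). Columns and rows here are invented.
csv_text = """TYPE,YEAR,MONTH,NEIGHBOURHOOD,X,Y
Theft from Vehicle,2016,1,Fairview,491000.1,5457000.2
Mischief,2016,2,Kitsilano,488000.5,5456000.8
Offence Against a Person,2017,3,Sunset,0,0
Theft from Vehicle,2017,4,Fairview,491200.3,5457100.4
"""

# read_csv accepts a filename, a URL, or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)   # (4, 6)
```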

Examine the first few rows of the dataframe with `head()`:

Use the `info()` method to learn about the columns in the dataframe:

Use the `unique()` method on a column (a pandas Series) to see the different types of crimes in the dataset:
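These three steps are sketched below on a small made-up DataFrame (the column names are assumed and the rows are invented):

```python
import pandas as pd

# Made-up stand-in rows; the column names are assumed to match the crime data
df = pd.DataFrame({
    "TYPE": ["Theft from Vehicle", "Mischief", "Offence Against a Person", "Theft from Vehicle"],
    "YEAR": [2016, 2016, 2017, 2017],
    "X": [491000.1, 488000.5, 0.0, 491200.3],
    "Y": [5457000.2, 5456000.8, 0.0, 5457100.4],
})

print(df.head())              # the first 5 rows (df.head(n) for n rows)
df.info()                     # column names, dtypes and non-null counts
print(df["TYPE"].unique())    # distinct values of one column, in order of appearance
```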

Notice that we select columns using brackets and the column name. There are some crimes that do not include the longitude and latitude coordinates due to privacy. Let's do a query and select the rows where the X coordinate is 0:
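Here is the selection on the same kind of made-up DataFrame. The comparison produces a boolean mask, just like the numpy masks earlier; `DataFrame.query()` is an equivalent string-based alternative:

```python
import pandas as pd

# Made-up stand-in rows (assumed column names)
df = pd.DataFrame({
    "TYPE": ["Theft from Vehicle", "Mischief", "Offence Against a Person", "Theft from Vehicle"],
    "X": [491000.1, 488000.5, 0.0, 491200.3],
    "Y": [5457000.2, 5456000.8, 0.0, 5457100.4],
})

# Boolean mask: True where the X coordinate is 0
no_location = df[df["X"] == 0]
print(no_location)

# The same selection written with DataFrame.query
print(df.query("X == 0"))
```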

To access individual cells, we can use the DataFrame indexers `.loc` (by label) or `.iloc` (by integer position):
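This sketch uses the invented DataFrame again. With the default integer index, labels and positions happen to coincide; after filtering or sorting they generally don't, and that is when the difference between `.loc` and `.iloc` matters:

```python
import pandas as pd

# Made-up stand-in rows (assumed column names), with the default index 0..3
df = pd.DataFrame({
    "TYPE": ["Theft from Vehicle", "Mischief", "Offence Against a Person", "Theft from Vehicle"],
    "YEAR": [2016, 2016, 2017, 2017],
    "X": [491000.1, 488000.5, 0.0, 491200.3],
})

print(df.loc[2, "TYPE"])   # label-based: row label 2, column "TYPE"
print(df.iloc[0, 1])       # position-based: first row, second column -> 2016

# .loc also accepts a boolean mask for rows and a list of column names
print(df.loc[df["YEAR"] == 2016, ["TYPE", "X"]])
```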

Plot the data
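The sketch below draws two quick plots of the invented data. The pandas `.plot()` methods use Matplotlib under the hood and return a Matplotlib axis, so the object-oriented techniques from the plotting section still apply:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in rows (assumed column names)
df = pd.DataFrame({
    "TYPE": ["Theft from Vehicle", "Mischief", "Offence Against a Person", "Theft from Vehicle"],
    "X": [491000.1, 488000.5, 0.0, 491200.3],
    "Y": [5457000.2, 5456000.8, 0.0, 5457100.4],
})

# Count incidents of each type and draw a horizontal bar chart
counts = df["TYPE"].value_counts()
ax = counts.plot(kind="barh")
ax.set_xlabel("Number of incidents")

# Scatter plot of locations, excluding rows without coordinates
df[df["X"] != 0].plot(kind="scatter", x="X", y="Y")
```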

We can save a modified dataframe as CSV (or XLS or...)
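Here is how a filtered DataFrame might be saved with `to_csv`; the filename is hypothetical. `to_excel()` works in a similar way but needs an Excel engine such as openpyxl to be installed:

```python
import pandas as pd

# Made-up stand-in rows (assumed column names)
df = pd.DataFrame({
    "TYPE": ["Theft from Vehicle", "Mischief", "Offence Against a Person", "Theft from Vehicle"],
    "YEAR": [2016, 2016, 2017, 2017],
    "X": [491000.1, 488000.5, 0.0, 491200.3],
    "Y": [5457000.2, 5456000.8, 0.0, 5457100.4],
})
located = df[df["X"] != 0]

# index=False stops pandas from writing the row labels as an extra column
located.to_csv("crime_located.csv", index=False)   # hypothetical filename

# Reading the file back gives the same rows and columns
back = pd.read_csv("crime_located.csv")
print(back.shape)   # (3, 4)
```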

## Exercise: Vancouver Open Data

Choose your own dataset from the [Vancouver Open Data Catalogue](http://data.vancouver.ca/datacatalogue/index.htm). Filter and plot the data.