# Lecture 2: Introduction to fMRI data & data types in Python

## Goals

- **Neuroscience / Neuroimaging concepts**
    - Nifti file format for fMRI data
- **Datascience / Coding concepts**
    - Jupyter notebook basics
    - Review of data types and arrays
    - Lists and Tuples
    - Arrays vs. tables    
    - Operations with arrays
    - Loading and saving data
    - Memory management

## Key Terms and Concepts

`command/edit mode`

`int`, `float`, `string`, `boolean`

`list`, `tuple`, `array` 

`mutable/immutable`

`dtype`, `shape`

`transpose`

`np.save`, `np.load`, `nibabel.load`, `nibabel.save`

`memory maps`

# A few words about Jupyter Notebooks

Since we'll be using Jupyter notebooks for the entire course, it is good to become familiar with them as quickly as possible.

You can read a notebook just like any other website, but you can also interact with it. To interact with it, you select one (or more) cells to act on. When a cell is selected, there are two modes the notebook can be in. The first is `command` mode, and the second is `edit` mode.

To get into `command` mode, press `Esc`. You know you're in `command` mode because the box outlining the current cell is **BLUE**. From here, press `h` to see a list of all the possible commands. From this mode you can more easily create, modify and move cells. You probably will not use this as much. 
> Try this now in the cell below! 

To get into `edit` mode, press `Enter`. You know you're in `edit` mode because the box outlining the current cell is **GREEN**. From here you can edit code and text, and run your code. You will be in this mode most of the time.
> Try this now in the cell below!

You are currently reading text that is not code and will have observed that different formats and font sizes have been used. This cell is written in **Markdown**.
> Try selecting this cell and going into edit mode (press `Enter`, green frame). Then press `Shift + Enter` to render the Markdown again

**Tab-completion**: To make your life easier, tab-completion is a feature of notebooks that will try and complete a **name** or **function** that you are typing for you. Once you start typing, you can press `Tab` and the notebook will create a little textbox that has all the possible options that the notebook knows about. This makes it faster to write code.
> Try this out in the cell below!

In [193]:
# click into this cell (it will be green), then press ESC to enter command mode (turns blue).
# Then press ENTER for edit mode (green)

In [194]:
a_name = 'Hello!'

After evaluating the above cell with `Shift + Enter`, the current cell should be blue.
Press `a` to create an empty cell above this one. Then go into *edit mode* with `Enter`.

Then start typing the name created above by pressing: `a` then `_`, and then press the `Tab` key to tab-complete the name. Next type "im" and then press the Tab key. What happens?

### Comments
An important, and often overlooked, part of writing code is including useful English (or other human language) **comments**. They can be used for many reasons, although most commonly they are used to explain in plain human language what the code below the comment does. They can also be used to write reminders that you still need to work on some part of your code.

To write a comment in Python, you simply put a `#` (pronounced hashtag, duh!) before any human language comment you want to write, like this:

In [3]:
# Here's a simple comment

You can even put comments after a line of code, like this:

In [7]:
another_name = 'Comment Practice' # This is another comment to show you they can go after code too!

### Displaying content and the print function

Before we begin, let's review some ways to see what **names** contain.
1. If you write a name at the end of a cell, some content representing it will be displayed
2. If you need to perform more complex display of text, the print function is appropriate. We'll use it throughout this class to print values out. We can use it to print simple things.

In [9]:
# jupyter notebooks display the output of the last operation in a cell
a_name

'Hello!'

In [10]:
# The print function displays the input string
print("We're finally writing some code!")

We're finally writing some code!


In [11]:
print("We're finally writing some code!")
a_name # content of a_name also displays, because it's the last operation. What is the difference?

We're finally writing some code!


'Hello!'

In [13]:
a_name # content of a_name is not displayed
print("We're finally writing some code!")

We're finally writing some code!


We can also use it to print out **names**

In [14]:
my_first_name = 'Michael'
print(my_first_name)

Michael



The print function is powerful and can be used to combine simple strings with **names**. To do this, just put commas in between the simple strings and the **names**. Note that `print` will put a space between the items that are separated by commas.

In [16]:
my_language = 'Python'
print('My name is', my_first_name, 'and I can program in', my_language)

My name is Michael and I can program in Python


# Review of Data Types in Python

So far in the data8 lecture you've learned that Python stores data in several different ways, known as **Data Types**. You've learned about 4 basic data types: 
- Integers
- Floats
- Strings 
- Booleans

plus 2 kinds of collections: 
- Arrays 
- Tables

which are also data types. Here we will do a quick review of all of these data types. We'll also cover 2 new types of containers:

- Lists
- Tuples

You also learned that data can be stored into **names**, so that they can be referenced later. Names are crucial to programming, and we will be using them extensively throughout this class.

## Basic Data Types

### Integers

Integers are a data type that stores whole numbers. They can be positive, negative or zero. 

They are often used to count things or **index** into arrays, which we'll learn about in a little bit.

In [27]:
5 # just writing 5 in a cell leads to output 5

5

In [29]:
type(5) # What is the type of 5? Answer: int (That is what Python calls Integers)

int

In [30]:
five = 5 # Store 5 in the name five

In [31]:
five_plus_neg_five = five + -5 # store the result of five + -5 in the name five_plus_neg_five
print(five_plus_neg_five)

0


In [21]:
type(five_plus_neg_five)

int

In [32]:
# While adding integers up leads to new integers, dividing by them gives something else!
5 / 5

1.0

In [33]:
type(5 / 5)

float

In [34]:
7 / 3

2.3333333333333335

In [35]:
# Because most people intuitively expect 7/3 to yield 2.3333333 (representing 2 + 1/3)
# and not the rounded-down integer version, and this led to many unexpected and hard-to-find
# errors, Python now outputs floats after integer division.
# To obtain integer division, you need to put two slashes:
7 // 3

2
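Here is a small sketch of how `//` behaves, including with negative numbers, where it may surprise you. The names `quotient` and `remainder` are made up for this illustration.

```python
# Floor division rounds down (toward negative infinity), not toward zero
print(7 // 3)       # 2
print(-7 // 3)      # -3

# In contrast, int() simply drops the decimal part of a float
print(int(-7 / 3))  # -2

# divmod() gives the integer quotient and the remainder in one step
quotient, remainder = divmod(7, 3)
print(quotient, remainder)  # 2 1
```

So `//` and `int(... / ...)` agree for positive numbers but differ for negative ones.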

### Floats

Floats, or floating point numbers, are a data type that stores numbers with decimal points. They are accurate to about 15 to 17 **significant digits** in total, counting the digits both before and after the decimal point.

While they are very precise, they are not perfect. Sometimes rounding errors do occur, for example when using numbers with many significant digits or decimal fractions that cannot be stored exactly.
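A classic illustration of such a rounding error is adding 0.1 and 0.2. The sketch below also shows one common way to compare floats safely, using `math.isclose()` from the standard library. The name `total` is just for this example.

```python
import math

total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False, because of a tiny rounding error

# Compare floats with a tolerance instead of with ==
print(math.isclose(total, 0.3))  # True
```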

In [195]:
pi = 3.141592653589
pi

3.141592653589

In [37]:
type(pi)

float

**Functions** can be **called** by using the function name, followed by parentheses with the **argument** in between those parentheses. Here we'll find the absolute value using the `abs()` function.

In [38]:
abs(-543.234)

543.234

In [39]:
type(abs(-543.234))

float

Mathematical operations that use a float and an int will always result in a float.

In [40]:
five_plus_neg_five * pi

0.0

In [41]:
type(five_plus_neg_five * pi)

float

#### Breakout Session

Let's practice making some integer and float **names**.
- Create a **name** that is the sum of 8, 16 and 40 and name it six.
- Now take the base 2 logarithm of this **name**, using the `math.log()` function. Note that this is a function from a new module, so you'll need to import it using `import math`. What is this new value?
- What type is this log value?
- Now divide it by 4. What type is this new value?


In [42]:
### STUDENT ANSWER



6.0
<class 'float'>
<class 'float'>


### Strings

Strings are a data type that store text in the form of a sequence of letters, numbers and punctuation symbols.

In [43]:
"This is a string. Duh!"

'This is a string. Duh!'

In [44]:
class_name = 'Data Science for Cognitive Neuroscience'

In [45]:
type(class_name)

str

In [48]:
five_string = str(five)

In [49]:
five_string, five

('5', 5)

In [50]:
type(five_string)

str

#### Formatting Strings

We will look at two ways to build text that includes the current values of **names**. This can be useful when you want to check the value of a **name**, create a customized filename, or label figure axes dynamically, for example.

The first way is done using the `%` symbol: 

In [51]:
'This is the first way to make a formatted string that prints our class name here: %s' % (class_name)

'This is the first way to make a formatted string that prints our class name here: Data Science for Cognitive Neuroscience'

We see a string defined that has a special character, the `%` symbol, in it, followed by the letter `s`. The `%s` tells Python you want to replace it with the value of a **name**, and the `s` indicates the value will be inserted as a `string`. There are other letters you can use to insert an integer (`i` or `d`) or a float (`f`). Booleans do not have a letter of their own; they are usually inserted with `s`. After that first string, there is another `%` symbol, followed by the **name(s)** you want to insert into the string. When you insert more than one **name**, they must be surrounded by parentheses and separated by commas.
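As a rough sketch of how this looks with more than one value, here we build a filename and a short summary line. The names `subject_id`, `run_name` and `mean_signal`, and their values, are invented for this example.

```python
subject_id = 1            # hypothetical subject number
run_name = 'categories'   # hypothetical run label
mean_signal = 103.4567    # hypothetical measurement

# %02d inserts an integer padded with zeros to 2 digits, %s inserts a string
filename = 's%02d_%s_run.npy' % (subject_id, run_name)
print(filename)  # s01_categories_run.npy

# %.2f inserts a float rounded to 2 decimals
summary = 'mean signal: %.2f' % mean_signal
print(summary)   # mean signal: 103.46
```

Note that the two values for the filename are passed together inside parentheses, in the same order as the `%` placeholders appear in the string.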

The second way is using the `format()` function: 

In [52]:
'This is the second way to make a formatted string that prints our class name here: {0}'.format(class_name)

'This is the second way to make a formatted string that prints our class name here: Data Science for Cognitive Neuroscience'

Here, we use the `{}` (called *curly brackets*) to indicate where we want the **name(s)** inserted. In between the curly brackets we put the position of the argument that should be inserted, counting from zero. We follow the string with a `.` (period) and the `format()` function. The arguments to the `format()` function are the **names** to insert.
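The following sketch shows `format()` with two arguments, one of which is reused, and with a number format after a colon. The name `week` and the example sentences are made up for illustration.

```python
class_name = 'Data Science for Cognitive Neuroscience'
week = 2  # hypothetical value

# {0} and {1} refer to the first and second argument; {0} can be used twice
sentence = 'Week {0} of {1}. Yes, week {0}!'.format(week, class_name)
print(sentence)

# A format specification after the colon controls the number of decimals
pi_text = 'pi is roughly {0:.3f}'.format(3.141592653589)
print(pi_text)  # pi is roughly 3.142
```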

Some examples of formatting numbers:

In [62]:
print("If we divide 7 by 3, we get %f" % (7 / 3))
print("If we divide 8 by 3, we get %1.10f" % (8 / 3)) # We can say how many decimals we'd like
print("If we divide 8 by 3, we get %i" % (8 / 3)) # If we tell it to print an integer, it will drop the decimal part



If we divide 7 by 3, we get 2.333333
If we divide 8 by 3, we get 2.6666666667
If we divide 8 by 3, we get 2


### Booleans

Booleans are a data type that store a simple `True` or `False` value. They are used when doing **comparisons** and in **conditional** (if) statements which you'll learn more about in a couple of weeks. 

In [63]:
False

False

In [64]:
type(False)

bool

In [65]:
pi_is_gt_3 = pi > 3
print(pi_is_gt_3)

True


In [67]:
type(pi_is_gt_3)

bool

#### Breakout Session

Create some Boolean and String **names**, and do some string formatting.
- Use the `==` operator to see if the integer value 5 equals the `five_string` name we created above. Store it in a **name** called `five_string_equals_5`
- Convert the boolean **name** `five_string_equals_5` to a string.
- Format a string that displays the values of the two **names**: `five` and `five_string`. Use both ways of formatting a string to do this.
- Do the two **names** look the same when formatted in a string? Why or why not?

In [196]:
### STUDENT ANSWER


False
five is: 5 and five_string is: 5
five is: 5 and five_string is: 5


## Containers
Python has several built-in ways of representing collections of things. Two of these are `list`s and `tuple`s. Both of them are **sequences**: they store the values they contain in a specific order. 

### Lists
You can make a list out of any number of Python items by simply putting square brackets `[`, `]` around them and separating them with a comma. This creates a list. Lists have two special properties that differentiate them from the other types of containers we will use in this class.
1. Lists may contain objects that are of different types. 
2. Objects can be added to, or removed from, a list. Another way of saying this is that lists are **mutable**.

Although lists can contain different types of objects, they don't have to. Let's create a list of integers:

In [202]:
l1 = [1, 2, 3, 4, 5]
l1

[1, 2, 3, 4, 5]

Now let's create a list of objects with different types

In [203]:
l2 = [5, 'hello', five, five_string, pi]
l2

[5, 'hello', 5, '5', 3.141592653589]

You can append things to lists by calling the `.append` method with the new item:

In [204]:
l2.append(100)
l2

[5, 'hello', 5, '5', 3.141592653589, 100]

You can remove objects from a list by calling the `.remove()` method. This will remove the first appearance of the object you pass in.

In [205]:
l2.remove(5)
l2

['hello', 5, '5', 3.141592653589, 100]

### Tuples

Tuples are very much like lists in that they can store an ordered collection of objects of different types. In contrast, once a tuple has been created, you cannot change it (meaning add objects to or remove objects from it). Thus, tuples are called **immutable**, or not changeable.

Tuples use round brackets (parentheses `()`) instead of the square brackets that lists use. 

Let's make a tuple of different types

In [206]:
t  = (5, 'hello', five, five_string, pi)
t

(5, 'hello', 5, '5', 3.141592653589)

Now let's add another object to this tuple

In [207]:
t.append(100)

AttributeError: 'tuple' object has no attribute 'append'

**Oops! We can't append anything to a tuple because it is immutable!**
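Immutability also means you cannot replace an item inside a tuple. A minimal sketch, using a small made-up tuple, of what happens when you try, and of how to build a *new*, longer tuple instead:

```python
t = (5, 'hello', 3.14)

# Trying to change an item raises a TypeError
try:
    t[0] = 100
except TypeError as error:
    print('Cannot change a tuple:', error)

# Adding two tuples creates a new tuple; the original is left untouched
longer_t = t + (100,)
print(longer_t)  # (5, 'hello', 3.14, 100)
print(t)         # (5, 'hello', 3.14)
```

The trailing comma in `(100,)` is what makes it a tuple with one item rather than just the number 100 in parentheses.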

#### Breakout Session

* Create a list that contains the numbers 1 through 10 in order, and call it `list_ten`. Print the list.
* Now `append` the string `eleven` to the list. Print the list.
* Create a tuple that contains your first name, last name, birthday month, birthday day (of the month), a boolean indicating whether you're a senior, the number pi to as many significant digits as you know, and the `list_ten` name you just created. Call this `tuple_stuff` and print it.
* Now try to `append` the word `'NOPE'` to this tuple and make sure it does what you expect.

In [208]:
list_ten = [1,2,3,4,5,6,7,8,9,10]
print(list_ten)
list_ten.append('eleven')
print(list_ten)
tuple_stuff = ('Samy', 'Abdel-Ghaffar', 3, 30, False, 3.141592653589, list_ten)
print(tuple_stuff)
tuple_stuff.append('NOPE')

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'eleven']
('Samy', 'Abdel-Ghaffar', 3, 30, False, 3.141592653589, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'eleven'])


AttributeError: 'tuple' object has no attribute 'append'

### Arrays

Arrays are a data type that is a **collection** of objects of the same type. They can contain integers, floats, strings and booleans, among other data types. Arrays can be 1-dimensional (1-D), 2-dimensional (2-D) and 3-dimensional (3-D). In fact, they can have any number of dimensions you like, which is usually called N-dimensional. 

#### Arrays vs. Lists (or Tuples)
Although lists, tuples and arrays are all containers that can contain a collection of objects, there are some big differences between how they work and when you should use them. Here's a short list:

1. Contrary to lists and tuples, all of the values in an array are of the **same type**. 
2. Because all the values in an array share one type, arrays are especially suited to mathematical operations, while lists and tuples are not. For example, you can add 1 to all the values of an array in a single operation, but you cannot do this with a list or tuple.
3. Lists and tuples are always 1-Dimensional (although you can have a list of lists, for example), while arrays can have as many dimensions as you like.
4. Lists and arrays are both **mutable**, while tuples are **immutable**. 
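The first two differences can be sketched in a few lines. This uses the `numpy` module, which is introduced in the next section; the names `mixed`, `values_list` and `values_array` are made up for this example.

```python
import numpy as np

# Mixing ints and floats in an array converts everything to one type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

values_list = [1, 2, 3]
values_array = np.array(values_list)

# For a list, + means "stick another list on the end"
list_result = values_list + [1]
print(list_result)   # [1, 2, 3, 1]

# For an array, + means "add to every value"
array_result = values_array + 1
print(array_result)  # [2 3 4]
```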

#### Importing Python Modules

You have learned how to create arrays using the `make_array()` function in the data-8 lecture. This function creates 1-D arrays using the values you pass into it as arguments. While this is a very useful function, working with fMRI data requires the use of 2-D, 3-D and even 4-D arrays, so we will need to learn another way to create arrays. For this we will use the `numpy` module, which we have to `import`. You will learn about this module in data8, too.

In [209]:
import numpy as np

Remember from above: The `Tab` key can give you suggestions for how to complete a word you may have started. This is particularly useful for browsing through the functionality of a package.

Try typing `np.` and hitting the TAB key!

In [210]:
np.

SyntaxError: invalid syntax (<ipython-input-210-df0eca0bfa5c>, line 1)

If you want to find out more about a specific function, you can pull up a **help text** (the Python community refers to these as docstrings) or even its code!

Just add a question mark (?) to the end of the function name to obtain the docstring, which should give you enough instructions to be able to work with the function.

If you add two question marks (??), then IPython will show you the actual Python code of that function. 

In [127]:
np.arange?

In [128]:
np.sum??

### Creating 1-D arrays

Perhaps the simplest way to create a 1-D array in `numpy` is to use the `np.arange()` **function**. As you've seen in lecture, it creates a range of numbers, specified by a starting point, ending point, and increment value.

In [211]:
years_since_millenium = np.arange(2000, 2018)
print(years_since_millenium)

[2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
 2015 2016 2017]


Uh-oh, where's 2018?

In [212]:
years_since_millenium = np.arange(2000, 2019)
print(years_since_millenium)

[2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
 2015 2016 2017 2018]


There we go. With `np.arange()`, the end point you specify is one past the last item you want included. In other words, the start point is **inclusive** and the end point is **exclusive**. This convention goes hand in hand with Python being **zero-indexed**, meaning that it starts counting from 0 instead of 1.

Adding a third argument gives us the possibility of stepping in increments other than 1! Let's get every other year since the new millennium.
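One way to see why an exclusive end point is convenient is sketched below: the number of values is simply end minus start, and two ranges that share a boundary fit together without overlap. The names used here are made up for this illustration.

```python
import numpy as np

start, stop = 2000, 2019
years = np.arange(start, stop)

# The number of values is stop - start
print(len(years) == stop - start)  # True
print(years[0], years[-1])         # 2000 2018

# Ranges that share a boundary join up with no duplicate and no gap
first_part = np.arange(2000, 2010)
second_part = np.arange(2010, 2019)
joined = np.concatenate([first_part, second_part])
print(np.array_equal(joined, years))  # True
```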

In [213]:
everyother_year_since_millenium = np.arange(2000, 2019, 2)
print(everyother_year_since_millenium)

[2000 2002 2004 2006 2008 2010 2012 2014 2016 2018]


But what if you want a range of numbers that has 5 values in it? Then you would use:

In [214]:
fives = np.arange(5)
print(fives)

[0 1 2 3 4]


While this is probably confusing when first learning Python, it makes a lot of sense once you get used to it.

Arrays can also be made from an arbitrary sequence of values, using a **list** or **tuple**:

In [215]:
some_values = np.array([8, 20, 14, 78, 34])
print(some_values)

[ 8 20 14 78 34]


There are also some functions to help create special kinds of arrays containing all zeros or ones.

In [216]:
ten_zeros = np.zeros(10)
ten_zeros

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [217]:
ten_ones = np.ones(10)
print(ten_ones)

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]


What happens if we do not explicitly specify the data type? A default is selected depending on the function. Let's check the data type of the `ten_ones` variable.

In [218]:
ten_ones.dtype

dtype('float64')

Notice that the data type for this array of ones is a float! We can be explicit about the type we want by using `dtype` when we call `np.zeros`.

In [219]:
ten_zeros_int = np.zeros(10, dtype='int')
ten_zeros_int

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Although we can use `dtype` with `np.arange`, there is a shortcut to specify the type we want. If we give `np.arange` a `float`, it will return a sequence of floats, and if we give it an `int`, it will return a sequence of ints.

In [220]:
ten_range = np.arange(10)
ten_range.dtype

dtype('int64')

In [221]:
ten_range_f = np.arange(10.)
ten_range_f.dtype

dtype('float64')
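For completeness, here is a small sketch of the explicit route, passing `dtype` to `np.arange`. It also shows why the data type matters for memory: a `float32` value takes half the space of a `float64` value. The names are made up for this example.

```python
import numpy as np

ten_range_f64 = np.arange(10.)
ten_range_f32 = np.arange(10, dtype='float32')
print(ten_range_f32.dtype)  # float32

# .nbytes reports how many bytes the array's values occupy
print(ten_range_f64.nbytes)  # 80 (10 values x 8 bytes)
print(ten_range_f32.nbytes)  # 40 (10 values x 4 bytes)

# An existing array can be converted to another type with .astype()
ten_range_int = ten_range_f32.astype('int')
print(ten_range_int)
```

For ten numbers the difference is tiny, but for a full fMRI dataset with millions of values it can add up quickly.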

#### Creating 2-D Arrays

Making 2-D arrays can be done by combining several 1-D arrays together. Think of it as copying multiple rows of a spreadsheet together:

In [222]:
# Make the first row
row1 = np.arange(1,10,2)
print('The first row looks like:\n', row1)

# Make the second row
row2 = np.arange(2,11,2)
print('The second row looks like:\n', row2)

# Combine the two rows into a 2-D array
array_2D = np.array([row1, row2])
print('And the whole 2-D array looks like:\n', array_2D)

The first row looks like:
 [1 3 5 7 9]
The second row looks like:
 [ 2  4  6  8 10]
And the whole 2-D array looks like:
 [[ 1  3  5  7  9]
 [ 2  4  6  8 10]]


Note the `\n` in the `print()` statements? The `\` (pronounced backslash) is called an **escape** character, and tells the print function that the symbol after it has a special meaning. In this case, the `\n` tells Python to add a **newline** when printing, so that the array's data is not on the same line as the text.
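A couple more escape sequences, as a quick sketch. The strings here are made up; `\t` inserts a tab and `\\` produces a single backslash.

```python
two_lines = 'first line\nsecond line'
print(two_lines)

# \t inserts a tab, which is handy for lining up columns
print('column1\tcolumn2')

# To print an actual backslash, escape it with another backslash
print('a backslash: \\')

# The newline counts as a single character inside the string
n_lines = len(two_lines.splitlines())
print(n_lines)  # 2
```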

We can also ask `numpy` to create arrays filled with random numbers, all ones, or all zeros:

In [223]:
random_vals_2D = np.random.randn(3, 5)
print(random_vals_2D)

zero_vals_2D = np.zeros((3, 5))
print(zero_vals_2D)

ones_vals_2D = np.ones((3, 5))
print(ones_vals_2D)

[[ 0.07844968  0.79106274  1.47904654  1.61977697 -0.7409853 ]
 [-0.79925064  1.03410983 -0.26874792 -1.47345682  0.46596931]
 [-1.78348283  0.01959603  1.48292404  0.72898018 -0.7408096 ]]
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]


**Note:** The `np.random.randn()` function's parameters are the size of the dimension(s) that you want, while the `np.zeros()` and `np.ones()` functions need a **tuple** which specifies the shape of the array to create. That's why there are two sets of parentheses, `((` and `))`, for those.
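The sketch below illustrates what goes wrong if you forget the inner parentheses: `np.zeros` treats its second argument as the data type, so it complains. The names are made up for this example.

```python
import numpy as np

random_vals = np.random.randn(3, 5)  # sizes passed as separate arguments
zero_vals = np.zeros((3, 5))         # shape passed as one tuple
print(random_vals.shape, zero_vals.shape)  # (3, 5) (3, 5)

# Without the inner parentheses, the 5 is interpreted as a dtype
try:
    np.zeros(3, 5)
except TypeError as error:
    print('np.zeros(3, 5) fails:', error)
```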

It is often important to know the shape of the N-D arrays you are working with. Let's print the shape of the array, which tells us how many dimensions the array has, and how big each of those dimensions are (e.g. number of rows and columns in a 2-D array):

In [224]:
print("This array has shape {}.".format(array_2D.shape))

This array has shape (2, 5).


The first value shown in this **tuple** is the number of rows, and the second value is the number of columns. 
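Because the shape is a tuple, you can store its values in separate **names**. The sketch below rebuilds the same 2-D array and also shows two related attributes, `ndim` and `size`. The names `n_rows` and `n_columns` are made up for this example.

```python
import numpy as np

array_2D = np.array([[1, 3, 5, 7, 9],
                     [2, 4, 6, 8, 10]])

# Unpack the shape tuple into two names
n_rows, n_columns = array_2D.shape
print(n_rows, n_columns)  # 2 5

print(array_2D.ndim)  # 2, the number of dimensions
print(array_2D.size)  # 10, the total number of values
```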

#### Creating 3-D Arrays

Creating 3-D arrays using the `np.random.randn()`, `np.zeros()` and `np.ones()` functions is very similar to how we did it for 2-D arrays. We only need to add one more number indicating the size of the 3rd dimension:

In [225]:
random_vals_3D = np.random.randn(2, 4, 3)
print(random_vals_3D)

zero_vals_3D = np.zeros((2, 4, 3))
print(zero_vals_3D)

ones_vals_3D = np.ones((2, 4, 3))
print(ones_vals_3D)

[[[-0.48200246  0.64284256 -2.15410511]
  [-0.18652133  0.50110257  0.86426513]
  [-0.33539978  0.29995554  0.01905695]
  [ 1.44858276  0.66450506 -1.54264595]]

 [[ 0.59569081 -1.20102947 -1.631401  ]
  [-0.90307751 -0.34718475 -0.72045798]
  [-0.63070443  2.15802752 -0.03875022]
  [ 0.51336359 -1.59044582  0.03439368]]]
[[[ 0.  0.  0.]
  [ 0.  0.  0.]
  [ 0.  0.  0.]
  [ 0.  0.  0.]]

 [[ 0.  0.  0.]
  [ 0.  0.  0.]
  [ 0.  0.  0.]
  [ 0.  0.  0.]]]
[[[ 1.  1.  1.]
  [ 1.  1.  1.]
  [ 1.  1.  1.]
  [ 1.  1.  1.]]

 [[ 1.  1.  1.]
  [ 1.  1.  1.]
  [ 1.  1.  1.]
  [ 1.  1.  1.]]]


### Arrays vs. Tables

At this point you've learned about `Tables` in lecture class, and may be asking yourself: 'What is the difference between 2-D Arrays and Tables?' As you might expect, there are some similarities and some differences. We will outline some of the main similarities and differences below.

**Similarities**
- They are both collections of data that are stored in a 2-D spreadsheet like way, with rows and columns.
- Operations like sorting and selecting can be done on both (although the syntax is different).
- Both can have named columns (although generally N-D Arrays' columns are not named, as will be the case in this class).

**Differences**
- Tables can have columns that contain data of different type (e.g. column 1 is floats and column 2 is strings), while N-D Arrays have the same data type everywhere.
- Tables have advanced functions for selecting subsets of your data, whereas N-D Arrays have a simpler way to do that.
- Tables are useful when you have a group of variables that you've measured across a number of observations (or subjects, or trials, etc.). 
- Since N-D Arrays can have more than 2 dimensions, they are useful in representing higher dimensional data, such as fMRI brain data and time-series astro-physics data, among many others.

So you can see that even the similarities have some caveats which make them kinda differences!

>**We will be using N-D Arrays (and not Tables) throughout this class.**

###  Breakout session

Now you'll practice creating some arrays. Print out the values of each one you create below.

- Create a 1-D range called `A` that contains **even** numbers between 10 and 20, **inclusive** (meaning both 10 and 20 should be in the array)
- Create a 2-D Array called `B` with 4 rows and 2 columns. You can use whatever approach we've outlined so far.
- Create a 3-D array of zeros called `C` that has dimensions of 2 x 3 x 4. Make it be of data type `float32`.

In [70]:
### STUDENT ANSWER


[10 12 14 16 18 20]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]]


### Array Operations

One of the advantages to working with arrays is that we can do mathematical and logical operations on all the values in an array in one line of code. Many of the statistical techniques we'll be using in this class need to do math on all the data at the same time, so being able to do it in one command is very useful!

Let's start by looking at simple arithmetic on a 1-D array:

In [226]:
math_ops_array = np.arange(5)
print('The original array looks like this:\n', math_ops_array)

The original array looks like this:
 [0 1 2 3 4]


Add 5 to all the values in the array

In [227]:
math_ops_array_plus5 = math_ops_array + 5
print('But if we add 5 to it:\n', math_ops_array_plus5)

But if we add 5 to it:
 [5 6 7 8 9]


Remember that we said arrays are useful for mathematical operations, but not lists? Let's see what happens if we try to do the same thing using a list and not an array.

In [228]:
math_ops_list = [1,2,3,4,5]
print(math_ops_list)
math_ops_list_plus5 = math_ops_list + 5
print('But if we add 5 to it:\n', math_ops_list_plus5)

[1, 2, 3, 4, 5]


TypeError: can only concatenate list (not "int") to list

Now let's multiply all the values of `math_ops_array` by 3:

In [230]:
math_ops_array_times3 = math_ops_array * 3
print('Or multiply it by 3:\n', math_ops_array_times3)

Or multiply it by 3:
 [ 0  3  6  9 12]


Or raise all the values to the power 2:

In [231]:
math_ops_array_power2 = math_ops_array ** 2
print('Or raise all the values to the power 2:\n', math_ops_array_power2)

Or raise all the values to the power 2:
 [ 0  1  4  9 16]


Or we can do these operations on two arrays together. Let's create a second array to work with:

In [232]:
math_ops_array2 = np.arange(6, 11)
print('The second array looks like this:\n', math_ops_array2)

The second array looks like this:
 [ 6  7  8  9 10]


When doing operations on two arrays, the operation is applied **element-wise**. Element-wise operations are done on each corresponding pair of items in the two arrays. For example, if we subtract two arrays, then the first items in both arrays are subtracted and put into a new array, and so on for all corresponding items in the two arrays.

**This implies that the two arrays must be the same shape.**

Let's subtract the second array from the first one:

In [233]:
math_ops_array_1minus2 = math_ops_array - math_ops_array2
print('The first array minus the second:', math_ops_array_1minus2)

The first array minus the second: [-6 -6 -6 -6 -6]


Divide the first array by the second

In [234]:
math_ops_array_divide = math_ops_array / math_ops_array2
print('And the first divided by the second:', math_ops_array_divide)

And the first divided by the second: [ 0.          0.14285714  0.25        0.33333333  0.4       ]


So what happens if the two arrays are not the same size?

In [235]:
math_ops_array3 = np.arange(4)
print(math_ops_array)
print(math_ops_array3)
math_ops_array_1minus3 = math_ops_array - math_ops_array3
print('The first array minus the third:', math_ops_array_1minus3)

[0 1 2 3 4]
[0 1 2 3]


ValueError: operands could not be broadcast together with shapes (5,) (4,) 
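The error message mentions **broadcasting**, which is the set of rules `numpy` uses to decide whether two shapes can be combined. We won't go into the details here, but as a rough sketch: shapes `(5,)` and `(4,)` cannot be combined, while a 2-D array of shape `(2, 5)` and a 1-D array of shape `(5,)` can, because the 1-D array is reused for every row. The names below are made up for this example.

```python
import numpy as np

first = np.arange(5)
third = np.arange(4)

# Checking the shapes first avoids the ValueError
print(first.shape == third.shape)  # False

# A (2, 5) array and a (5,) array can be combined:
# the 1-D array is subtracted from each row in turn
table = np.array([[1, 3, 5, 7, 9],
                  [2, 4, 6, 8, 10]])
shifted = table - first
print(shifted)
```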

###  Breakout session

Now we practice using simple mathematical operations on arrays.

- Create a new array that is 10 times the square root of `math_ops_array`. Hint: use `np.sqrt()` to find the square root. Call it `math_ops_array4`.
- Create a new array that is a range from 10 to 20 **exclusive** (meaning it doesn't contain 20) and call it `math_ops_array5`.
- Add `math_ops_array` to `math_ops_array5` and call the result `math_ops_array6`
- Finally, subtract 5 times `math_ops_array` from `math_ops_array2` divided by 3 and store it in a name called `math_ops_array7`.

In [237]:
### STUDENT ANSWER



[  0.          10.          14.14213562  17.32050808  20.        ]
[10 11 12 13 14 15 16 17 18 19]
[  2.          -2.66666667  -7.33333333 -12.         -16.66666667]


**NOTE** Order of operations matters!! What is the order again? Your aunt Sally can tell you...

P - Parentheses - Please

E - Exponents - Excuse

M - Multiplication - My

D - Division - Dear

A - Addition - Aunt
 
S - Subtraction - Sally
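A few plain-number examples, as a sketch of how Python applies these rules. One detail the mnemonic hides: multiplication and division have the *same* priority and are evaluated left to right, and the same goes for addition and subtraction. The names are made up for this example.

```python
# Exponent first, then multiplication, then addition: 2 + (3 * 16)
result_default = 2 + 3 * 4 ** 2
print(result_default)  # 50

# Parentheses change the order: (5 * 4) ** 2
result_grouped = ((2 + 3) * 4) ** 2
print(result_grouped)  # 400

# Division and multiplication go left to right: (12 / 3) * 2
left_to_right = 12 / 3 * 2
print(left_to_right)  # 8.0

# Subtraction and addition also go left to right: (10 - 4) + 3
subtract_then_add = 10 - 4 + 3
print(subtract_then_add)  # 9
```

When in doubt, add parentheses. They cost nothing and make your intention clear to the reader.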

# Loading and Saving Data


In order to store, share and make analyses reproducible, it is necessary to be able to save and load data to files for permanent storage. 

We will be looking at two main types of data storage, each used for different types of data:

1. `numpy` provides many functions for saving and loading N-D Arrays. We will use two of these functions to save and load non-fMRI data arrays.

2. fMRI data is generally stored in a specific format which we can read using the package `nibabel`.

### Loading Arrays from File

`numpy` uses the `.npy` file extension to save N-D Arrays. The `np.load()` function takes a filename and reads in the N-D Array stored in that file. Let's load a sample file that is stored in a shared folder on your datahub account. This file contains an array which indicates the types of stimuli used during an fMRI experiment.

In [245]:
# Download some data
!(mkdir -p /home/jovyan/tmp && wget -O /home/jovyan/tmp/experiment_design_run1.npy https://berkeley.box.com/shared/static/6y3h0bk2fvfdlelvcq1y0mnqzdwl7rrm.npy)

--2018-01-23 22:17:36--  https://berkeley.box.com/shared/static/6y3h0bk2fvfdlelvcq1y0mnqzdwl7rrm.npy
Resolving berkeley.box.com (berkeley.box.com)... 107.152.24.197, 107.152.25.197
Connecting to berkeley.box.com (berkeley.box.com)|107.152.24.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://berkeley.app.box.com/shared/static/6y3h0bk2fvfdlelvcq1y0mnqzdwl7rrm.npy [following]
--2018-01-23 22:17:37--  https://berkeley.app.box.com/shared/static/6y3h0bk2fvfdlelvcq1y0mnqzdwl7rrm.npy
Resolving berkeley.app.box.com (berkeley.app.box.com)... 107.152.25.199
Connecting to berkeley.app.box.com (berkeley.app.box.com)|107.152.25.199|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://public.boxcloud.com/d/1/gher6gth_Zz6nhDye-ZQTZvVgB-A0zzEilvbmbbbs3uJgy0LlCoVdufNGP_y_WFIvWKZd-n_Di-DKcoA1-da1GnyFy-x4dyuPqZZbqFTyggXIBBgHIhjd7wC9Jwm7HQcCv0TwQa5GeOBp6OEPktLccVofKEbvqd6pDBuVbB18HnVszS3HfsFNdp9O6ySvgIPpSTkAmlN8-Tm4VItc3

In [243]:
# Load the experiment design (stimulus) data from a numpy array file
fname_in = '/home/jovyan/tmp/experiment_design_run1.npy'
run1_stimuli_data = np.load(fname_in)
print(run1_stimuli_data.dtype)

float64


The first thing to look at whenever you load N-D Array data is its shape, which tells us the number of dimensions and size of each one. How many dimensions does this data array have?

In [244]:
# Print out the size of the data array we've loaded
print('The stimuli of this run is shaped: ', run1_stimuli_data.shape)

The stimuli of this run is shaped:  (120, 5)


How is this shape interpreted? The first axis indicates there are 120 TRs (or volumes) in this run. The second axis indicates there were 5 types of stimulus shown. Each row is therefore one TR and each column is one stimulus type, and the array uses a 1 to mark that a given stimulus type was shown at a given TR.
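
To make this layout concrete, here is a minimal sketch on a small made-up array. The name `toy_design` and its values are hypothetical (they are not the course data), and the sketch assumes that a 0 means "not shown".

```python
import numpy as np

# Hypothetical toy design array: 6 TRs (rows) x 3 stimulus types (columns).
# Assumption: 1 means "shown at this TR", 0 means "not shown".
toy_design = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],  # no stimulus at this TR
    [0, 1, 0],
], dtype=np.float64)

n_trs, n_stim_types = toy_design.shape

# Summing over axis 0 (time) counts the TRs at which each stimulus type was shown.
trs_per_stimulus = toy_design.sum(axis=0)

# np.flatnonzero gives the row indices (TRs) where a column is nonzero.
trs_with_stimulus_1 = np.flatnonzero(toy_design[:, 1])

print(n_trs, n_stim_types)
print(trs_per_stimulus)
print(trs_with_stimulus_1)
```

The same two lines (`.sum(axis=0)` and `np.flatnonzero`) should work unchanged on `run1_stimuli_data`, if its entries follow the same 0/1 convention.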

### Saving Arrays to File
Now that we've extracted the first run's stimuli array, let's save it out to permanent file storage using the `np.save()` function.

In [246]:
# save out the float data
fname_out = '/home/jovyan/experiment_design_run1.npy'
np.save(fname_out, run1_stimuli_data)
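
If you want to convince yourself that `np.save()` and `np.load()` preserve an array exactly, a round trip on a small array is a quick check. This is only a sketch: the array, the file name `toy_array.npy`, and the temporary directory are made up for illustration.

```python
import os
import tempfile

import numpy as np

# A small made-up array standing in for real data.
original = np.arange(12, dtype=np.float64).reshape(4, 3)

# Write to a temporary directory so nothing is left behind afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    fname = os.path.join(tmp_dir, 'toy_array.npy')
    np.save(fname, original)   # write the array to disk
    restored = np.load(fname)  # read it back into memory

# Values, shape, and dtype should all survive the round trip.
round_trip_ok = (np.array_equal(original, restored)
                 and original.shape == restored.shape
                 and original.dtype == restored.dtype)
print(round_trip_ok)
```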

>Now check your datahub file list to see if the save worked!

## Loading fMRI Data


In general, fMRI data is not stored as numpy arrays. In fact, there are many different file formats in which fMRI data can be saved. The scanner often saves each volume as a so-called DICOM file. Usually, as a first step of fMRI data analysis, these DICOM files are collected together into so-called NIfTI files. Fortunately, there is a Python package called `nibabel` which can read many neuroimaging file formats. Let's import the `nibabel` module here.

In [247]:
import nibabel

To read in a NIfTI file, we first use the `nibabel.load()` function, giving it the filename of the file to load.

In [249]:
fname = '/data/cogneuro/fMRI/categories/s01_categories_01.nii.gz'
nii = nibabel.load(fname)

This object stores the information *about* the fMRI data stored in the file, but not the actual fMRI data itself.
This metadata can be accessed via attributes of the `nii` object.

In [250]:
print('nii.in_memory :', nii.in_memory)
print('nii.shape :', nii.shape)
print('voxel sizes :', nii.header.get_zooms())

nii.in_memory : False
nii.shape : (100, 100, 30, 120)
voxel sizes : (2.24, 2.24, 4.1300001, 1.0)


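As we expected, the data is not yet in memory. To load the data into memory, we call a method named `.get_data()`, this time on the **name** returned by the `load()` function. (In `nibabel` 3.0 and later, `.get_data()` is deprecated in favor of `.get_fdata()`, which always returns floating-point values.)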

In [251]:
data = nii.get_data()
print('data.dtype: ', data.dtype)
print('data.shape: ', data.shape)

data.dtype:  float32
data.shape:  (100, 100, 30, 120)


As we said above, the first thing to do when loading data is to look at its shape. For this data, we happen to know (since we collected the data ourselves!) that the MRI scanner saves the data such that its dimensions are (X, Y, Z, T), where X, Y, and Z are the three dimensions in space, and T is time, in volumes. Thus, there are 120 volumes (120 time points). Each volume has 30 horizontal (AKA axial) slices of 100 x 100 voxels.

<img src="figures/slices.png" style="height: 200px;">

In [57]:
print('Each volume has {0} entries on the X axis, {1} on Y, {2} on Z. There are {3} volumes.'.format(
        data.shape[0], data.shape[1], data.shape[2], data.shape[3]))

Each volume has 100 entries on the X axis, 100 on Y, 30 on Z. There are 120 volumes.


### Transposing 4D fMRI data for convenience

When we work with fMRI data, it is generally more convenient to have the data in (T, Z, Y, X) format. The reasons why this convention is more convenient (such as easier syntax and shortcuts when averaging over time or converting data to standard units) will become more obvious as we go.

Fortunately there is a simple command that will do this transformation for us. It is called **transpose**. Before we apply the transpose to fMRI data, let's learn exactly what it does on some sample data.

#### How to Transpose Arrays

The function to transpose a `numpy` array is `transpose()`. However, since taking the transpose of an array is a very common operation in `numpy`, there is a shortcut to do it! Simply add `.T` after the **name** of your array.

First, let's remind ourselves what's in the sample 2-D array we created earlier, and what its shape is.

In [65]:
print(array_2D)
print("This array has shape {}.".format(array_2D.shape))

[[ 1  3  5  7  9]
 [ 2  4  6  8 10]]
This array has shape (2, 5).


Now let's take the transpose of that array, store it in a new name, and see what its new shape is.

In [66]:
array_2D_T = array_2D.T
print(array_2D_T)
print("This transposed array has shape {}.".format(array_2D_T.shape))

[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]
This transposed array has shape (5, 2).


So what happened when you transposed this 2-D array?

And what do you think would happen if you transposed a 3-D array? Let's again remind ourselves of our 3-D array.

In [None]:
print(random_vals_3D)
print("This array has shape {}.".format(random_vals_3D.shape))

And now transpose that 3-D array.

In [68]:
random_vals_3D_T = random_vals_3D.T
print(random_vals_3D_T)
print("This transposed array has shape {}.".format(random_vals_3D_T.shape))

[[[ 0.76699569  0.42237766]
  [-1.89359676 -0.60820399]
  [-1.09358981  0.52084066]
  [ 1.26547644 -2.07877421]]

 [[-0.26987398  0.13515866]
  [-2.15653545  1.06297985]
  [-1.33739895 -1.29031273]
  [ 0.31060741  0.73958927]]

 [[ 0.53433737  0.48883929]
  [-0.69236487  0.02864403]
  [-0.38852138 -1.28465898]
  [ 1.19655109  1.08904781]]]
This transposed array has shape (3, 4, 2).


What we've seen is that the **transpose** operator reverses the order of the dimensions of an array (e.g. the last dimension becomes the first dimension, and the first becomes the last).
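
The axis reversal can be checked directly on a small array. In this sketch the name `toy` and its contents are made up; `np.transpose` with an explicit axis order and `np.shares_memory` are standard `numpy` functions.

```python
import numpy as np

# A made-up 3-D array with shape (2, 4, 3), filled with 0..23.
toy = np.arange(24).reshape(2, 4, 3)
toy_T = toy.T

# The shape is reversed: (2, 4, 3) becomes (3, 4, 2).
print(toy.shape, toy_T.shape)

# .T is the same as asking np.transpose for the axes in reverse order.
same_as_explicit = np.array_equal(toy_T, np.transpose(toy, (2, 1, 0)))

# The element at index [i, j, k] moves to index [k, j, i].
element_matches = toy_T[2, 1, 0] == toy[0, 1, 2]

# The transpose is a view: no data is copied, so it costs almost no memory.
is_view = np.shares_memory(toy, toy_T)

print(same_as_explicit, element_matches, is_view)
```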

#### Transpose the fMRI Data 
Our aim in learning how to transpose an array is to transform fMRI data into a more convenient format, which we can do using `.T`. Remember that the axes of the fMRI data we loaded are currently ordered (X, Y, Z, T), and the ordering we want is (T, Z, Y, X).
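
As a rough illustration of why time-first ordering is handy, the sketch below uses a small made-up array in place of the real data (the names `toy_xyzt` and `toy_tzyx` are hypothetical). With time on the first axis, a single index selects a whole volume, and `len()` gives the number of volumes.

```python
import numpy as np

# Made-up data in (X, Y, Z, T) order: 4 x 5 x 3 voxels, 6 time points.
toy_xyzt = np.random.rand(4, 5, 3, 6)
toy_tzyx = toy_xyzt.T  # now (T, Z, Y, X) = (6, 3, 5, 4)

# One index picks out an entire volume, shaped (Z, Y, X).
first_volume = toy_tzyx[0]

# In the original ordering the same volume needs an ellipsis, then a transpose.
same_volume = np.array_equal(first_volume, toy_xyzt[..., 0].T)

# Averaging over time is a reduction over axis 0.
mean_volume = toy_tzyx.mean(axis=0)

n_volumes = len(toy_tzyx)
print(first_volume.shape, mean_volume.shape, n_volumes, same_volume)
```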

#### Breakout Session

Now let's transpose the fMRI data we've loaded and explore its shape.

- Create the transposed array of the 4-D fMRI data array that we loaded (`data`). Call it `data_T`.
- Print the dimensions of `data`.
- What do you think the dimensions of `data_T` will be?
- Print out the dimensions of `data_T`.

In [90]:
### STUDENT ANSWER
data_T = data.T
print('The shape of the data is: {0}'.format(data.shape))

# I think the dimensions will be (120,30,100,100)

print('The shape of data_T is: {0}'.format(data_T.shape))
print('There are {0} volumes. Each volume has {1} entries on the Z axis, {2} on Y, {3} on X. '.format(
        data_T.shape[0], data_T.shape[1], data_T.shape[2], data_T.shape[3]))

The shape of the data is: (100, 100, 30, 120)
The shape of data_T is: (120, 30, 100, 100)
There are 120 volumes. Each volume has 30 entries on the Z axis, 100 on Y, 100 on X. 


# Memory management [Optional - Time Permitting]

A big difficulty in data science in general is how to deal with large data sets. Let's look at a couple of simple ways of reducing the amount of memory (RAM) that is used when dealing with large arrays.

### Python digression: Floating point vs integer numbers

`numpy` stores numbers in several different formats: numbers can be stored as boolean values (True or False), as integers (0, 1, 2...), or as floating-point numbers (2.3256..., 3.63212..., etc). This is a common aspect of programming languages that deal with images or numbers. Different formats for numbers use different amounts of memory. For data types that allow decimals (e.g. `numpy`'s `float32` and `float64`), the more bits that are used to store each number in an array, the more precisely each number is represented and the more memory the array takes up: a `float32` takes 4 bytes and a `float64` takes 8.

Thus, converting to a less precise format (`np.float32`) can save memory, if precision is not critically important.

In [20]:
print(np.float64(np.pi))
print(np.float32(np.pi))

3.14159265359
3.14159


In [21]:
r64 = np.random.rand(30,100,100)
r32 = r64.astype(np.float32)
print('data type of `r64` is: ', r64.dtype)
print('data type of `r32` is: ', r32.dtype)

data type of `r64` is:  float64
data type of `r32` is:  float32
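
To see the memory saving in numbers, every `numpy` array reports its size through the `.itemsize` (bytes per element) and `.nbytes` (total bytes) attributes. The sketch below uses made-up arrays of the same shape as `r64` and `r32`; the names `vol64` and `vol32` are hypothetical.

```python
import numpy as np

# Same shape as r64 above: 30 x 100 x 100 = 300,000 values.
vol64 = np.zeros((30, 100, 100), dtype=np.float64)
vol32 = vol64.astype(np.float32)

# Bytes per element: 8 for float64, 4 for float32.
print(vol64.itemsize, vol32.itemsize)

# Total bytes: halving the bytes per element halves the array's memory.
print(vol64.nbytes, vol32.nbytes)
```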


### Breakout Session

Take a look at the top right of your notebook. It should say **Mem:** xxx/1024 (MB). This tells you about the amount of memory available to you on datahub and the amount of it that you are using. If you cannot see this, go to the **View** menu and select *Toggle Toolbar*. Then it should appear.

In this breakout session, you are going to construct an array that breaks your memory limits.

1. Subtract the used memory from 1024, and multiply it by `(1024 * 1024)`. You have that many bytes free.
2. Divide this value by 8. You can store that many `float64` values, since each one takes 8 bytes.
3. Multiply that value by 0.75 and create an array of that many `ones`.
4. Watch the memory go up.
5. Make another array of that size and watch the kernel die.

In [None]:
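# This example assumes 751 MB used out of a 4096 MB limit.
# Substitute the numbers shown in your own notebook.
a = (4096 - 751) * (1024 * 1024)  # free memory in bytes
print(a)
b = a // 8  # number of float64 values (8 bytes each) that fit
print(b)
c = int(b * 0.75)  # use 75% of the free memory
d = np.ones(c)  # memory usage goes up
e = np.ones(c)  # a second array of this size exceeds the limit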

3507486720
438435840
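
Outside of this exercise, you usually want to know whether an array will fit *before* you create it. A minimal sketch, assuming a hypothetical helper named `estimated_megabytes` (it is not part of `numpy`), multiplies the number of elements by the bytes per element:

```python
import numpy as np

def estimated_megabytes(shape, dtype):
    """Estimate the memory an array would need, without allocating it."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    n_bytes = n_elements * np.dtype(dtype).itemsize
    return n_bytes / (1024 * 1024)

# Shape of the fMRI run loaded earlier in this lecture.
fmri_shape = (100, 100, 30, 120)
mb32 = estimated_megabytes(fmri_shape, np.float32)
mb64 = estimated_megabytes(fmri_shape, np.float64)

print('float32: {0:.1f} MB, float64: {1:.1f} MB'.format(mb32, mb64))
```

Comparing the estimate against the free memory shown in the notebook toolbar tells you whether the allocation is safe.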


## Memory maps
When data files are many gigabytes in size, too large for computer memory, we have to come up with ways of not having all of the data in memory at once. This is such a common problem that modern operating systems provide ways of doing this efficiently, without the programs even knowing that not all of the data they are working with is in live memory. One of the mechanisms by which this can happen is **memory-mapped arrays**: these are arrays that are stored on the hard drive, with only the relevant chunks loaded into memory as they are needed.
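
Plain `.npy` files can be memory mapped too, by passing `mmap_mode` to `np.load()`. The sketch below is only an illustration: the array and the file name `toy_big.npy` are made up, and the file is far smaller than the data sets where memory mapping really pays off.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    fname = os.path.join(tmp_dir, 'toy_big.npy')
    # Made-up data: 100 x 100 x 100 values counting up from 0.
    np.save(fname, np.arange(1000000, dtype=np.float32).reshape(100, 100, 100))

    # mmap_mode='r' maps the file read-only instead of reading it all into RAM.
    mapped = np.load(fname, mmap_mode='r')
    is_memmap = isinstance(mapped, np.memmap)

    # Indexing touches only the requested chunk; np.array copies it into RAM.
    one_slice = np.array(mapped[10])
    slice_shape = one_slice.shape
    first_value = float(one_slice[0, 0])

    # Release the mapping before the temporary directory is removed.
    del mapped

print(is_memmap, slice_shape, first_value)
```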

It turns out that if we **unzip** our nifti file, then nibabel can read it as a memory map:


In [253]:
!gunzip -c /data/cogneuro/fMRI/categories/s01_categories_01.nii.gz > /home/jovyan/s01_categories_01.nii


In [256]:
nii_nozip = nibabel.load("/home/jovyan/s01_categories_01.nii")
data_nozip = nii_nozip.get_data()

In [257]:
data_nozip

memmap([[[[  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ],
         [  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ],
         [  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ],
         ..., 
         [  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ],
         [  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ],
         [  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ]],

        [[  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ],
         [  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.        ],
         [  0.        ,   0.        ,   0.        , ...,   0.        ,
            0.        ,   0.   