# Lecture 1: Introduction to fMRI data & data types in Python

## Goals for today

We will go over some important concepts of data manipulation and visualization in fMRI, including: 

- Announcements
    - HW \#1 will be posted after class today.
    - Quiz \#1 will be given at the beginning of class next week.
    - Next week is the last class before the drop deadline. We will get your quizzes graded before the drop deadline to give you some idea of the difficulty of the course.
- Neuroscience/imaging concepts
    - Review of simple neural anatomy and fMRI jargon terms (from Lecture 1)
    - Overview of fMRI and the BOLD signal
    - Pros & Cons of fMRI
- Coding concepts
    - Loading and saving data
    - Review of data types and arrays
    - Working with arrays
- Data science concepts
    - Multi-dimensional arrays vs. tables    
    - Plotting histograms

## Short overview of fMRI and the BOLD signal

### What is fMRI?
Functional Magnetic Resonance Imaging, or fMRI, is a neuroimaging technique used to measure brain activity over time. fMRI data is acquired using an MRI machine, just like the ones used for many medical purposes. MRI machines use various **pulse sequences** to take 3-dimensional images, and different **pulse sequences** capture different types of tissue or fluids in the human (or animal) body. fMRI uses a special **pulse sequence** designed to measure changes in the magnetic properties of the blood flowing through the brain. Here's a picture of an MRI machine:

<img src="figures/mri_scanner.png" style="height: 400px;">

### The BOLD Signal
The functional signal we measure with fMRI is *not* an electrical neural signal (as in EEG, ECoG, or electrophysiology). It is a magnetic signal related to the properties of brain tissue and fluids, and it is dominated by the differences in magnetic fields created by blood flow. Blood flow is related to neural activity, because firing neurons need oxygen. The process of neural firing involves letting electrically charged ions into a cell and actively pumping them back out again. This is a metabolically demanding process. So once a region of the brain becomes active (once the neurons start firing), metabolism in that region is high and initiates a complex set of processes to increase blood flow to the electrically active area. When blood comes into the brain it is oxygenated, meaning it has oxygen bound to the hemoglobin molecules contained in the blood. The neurons "take" the oxygen from these hemoglobin molecules to use in their metabolic process, leaving the hemoglobin in a deoxygenated state. When hemoglobin is oxygenated it is close to magnetically neutral (weakly diamagnetic), but when it is deoxygenated it is magnetic (paramagnetic). So the size of the magnetic disturbance induced by blood depends on the ratio of oxygenated to deoxygenated hemoglobin in the blood. The MRI machine is measuring a signal that reflects this ratio of oxygenated to deoxygenated blood across the brain. This signal is called the **Blood Oxygenation Level Dependent response**, or the **BOLD signal**.

<img src="figures/deoxyhemoglobin.png" style="height: 400px;">

The vascular (blood) system of the brain responds to neural activity by flooding the region with way more oxygenated blood than is necessary, to be sure that the brain has all it needs. Thus, we are looking for an **INCREASE** in the BOLD signal (increase in ratio of oxygenated to deoxygenated hemoglobin) as an **indirect signal** of neural activity. The specific mechanisms that lead from neural activity to changes in blood flow are (a) not well understood, and (b) beyond the scope of this class. For now, just know that there are several ways to measure functional responses with MRI, and the specific one that we work with is the BOLD response.

### fMRI Images

One fMRI image (fMRI volume) is acquired for a given unit of time called a **repetition time (TR)**. A TR is typically 1-2 seconds *(usually 1.0, 1.5, or 2.0 seconds)*. Every image records the activity in the brain at a given point in time. The following image shows a single volume (or TR) of fMRI data (one two-second snapshot of brain activity).

<img src="figures/fig1.png" style="height: 400px;">

The dimensions of the brain volume measured by fMRI can vary, but are typically somewhere around 20-40 **slices** tall, with each slice being ~100 x 100 pixels. Each individual fMRI measurement unit is called a **voxel**, which is short for volumetric pixel. The voxels in the data we will be using in this course are about 2.4 x 2.4 x 4.0 mm$^3$ (X x Y x Z) in size.
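To get a feel for these numbers, here is a rough back-of-the-envelope sketch. The 30-slice, 100 x 100 volume is an assumed example that falls inside the range quoted above, and all the names are made up for illustration.

```python
# Assumed example volume: 30 slices of 100 x 100 pixels (within the typical range)
n_slices, n_rows, n_cols = 30, 100, 100
voxels_per_volume = n_slices * n_rows * n_cols

# Voxel size quoted for this course's data, in millimeters (X, Y, Z)
voxel_size_mm = (2.4, 2.4, 4.0)
voxel_volume_mm3 = voxel_size_mm[0] * voxel_size_mm[1] * voxel_size_mm[2]

print('Voxels in one volume:', voxels_per_volume)
print('Volume of one voxel (mm^3):', round(voxel_volume_mm3, 2))
```

So a single snapshot of the brain already contains hundreds of thousands of numbers, which is why we need arrays to handle it.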

### fMRI Activity is Time Lagged

Once a neural event is triggered by a stimulus presentation, the vascular system needs to respond to the need for glucose and oxygen in that specific brain area. This can take up to 1-2 seconds. Hence the hemodynamic response lags the triggered event by 1-2 seconds, and it peaks around 5 seconds after the stimulus onset.
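The idea of a delayed peak can be sketched with a toy curve. This is only an illustration: the formula below is an assumption chosen because it happens to peak at 5 seconds, and it is not a real model of the hemodynamic response.

```python
import numpy as np

# Time in seconds after stimulus onset, in half-second steps
t = np.arange(0, 20, 0.5)

# Toy response curve (NOT a real hemodynamic model): t**5 * exp(-t) peaks at t = 5
toy_response = t ** 5 * np.exp(-t)
toy_response = toy_response / toy_response.max()  # scale so the peak equals 1

# Find the time at which the toy curve is largest
peak_time = t[np.argmax(toy_response)]
print('The toy response peaks at', peak_time, 'seconds')
```

The curve starts at zero, rises slowly, and only reaches its maximum several seconds after the "stimulus" at time 0.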

<img src="figures/lagged_activity.png" style="height: 400px;">

### Pros and Cons of fMRI

While fMRI is a powerful tool that has enabled a massive number of new research questions to be answered, it is far from a perfect technology. It is a complex technology, and all this complexity should always be a source of humility for anyone working with fMRI or trying to interpret fMRI results. Here we will outline a few of the largest **pros** and **cons** of this neuroimaging technique.

#### Cons
Let's start with the bad.
- It is an **indirect, slow measure** with poor **temporal resolution** - and these considerations strongly constrain the kinds of experiments you can do with fMRI and the conclusions you can draw from those experiments. For example, you can't say much about the timing of neural events, as 2 seconds is an eternity when it comes to the firing patterns of neurons.

<img src="figures/imaging_modalities.png" style="height: 400px;">

                                      Sejnowski et al., Nature Neurosci., 2014
- It is still very expensive, which means that many studies are grossly underpowered. This can lead to many issues, such as the reproducibility crisis we're seeing in Psychology.
- Whatever experiment you want to do, it has to be done in a big, constraining, loud MRI scanner. This limits the types of experiments that can be done.

#### Pros
It's not all bad news though. There is a reason so many people use fMRI to do cognitive neuroscience research:
- It is non-invasive! Meaning, we don't have to cut open people's skulls to measure their brain activity. Just think about that for a minute. We can actually put people into this machine and see what their brain is doing! It is seriously magical.
- It has much better spatial resolution than previous non-invasive neuroimaging modalities (e.g. EEG or MEG), though it is still fairly crude. Each value we get from the MRI scanner is the aggregated signal of up to 100,000 neurons! So this is a *relative Pro*!
- There are many different things the MRI scanner can measure when it comes to brain activity. The BOLD signal we mentioned is an indirect measure of neural activity, but MRI scanners can also be used to:
    - Measure the **white matter** tracts (i.e. the "wires" of the brain) using **Diffusion Tensor Imaging (DTI)**
    - Measure metabolic activity using **MR spectroscopy**, aka **Chemical Shift Imaging (CSI)**.
    - Measure when groups of neurons fire similarly, allowing for the mapping of brain **networks**. This is called **functional connectivity**.
    - Measure changes in brain tissue mass, called **MR morphology**

## Review of Data Types in Python

So far in lecture you've learned that Python stores data in several different ways, known as **Data Types**. You've learned about 4 basic data types (Integers, Floats, Strings & Booleans) plus 2 kinds of collections (Arrays and Tables), which are also data types. Here we will do a quick review of all of these data types.

You also learned that data can be stored into **names**, so that they can be referenced later. Names are crucial to programming, and we will be using them extensively throughout this class.

### A few words about Jupyter Notebooks

Since we'll be using Jupyter notebooks for the entire course, I just want to mention a couple of things about them.

You can read a notebook just like any other website, but you can also interact with it. To interact with it, you select one (or more) cells to act on. When a cell is selected, there are two modes the notebook can be in. The first is `command` mode, and the second is `edit` mode.

To get into `command` mode, press `Esc`. You know you're in `command` mode because the box outlining the current cell is **BLUE**. From here, press `h` to see a list of all the possible commands. From this mode you can more easily create, modify and move cells. You probably will not use this as much. 
> Try this now in the cell below!

To get into `edit` mode, press `Enter`. You know you're in `edit` mode because the box outlining the current cell is **GREEN**. From here you can edit code and text, and run your code. You will be in this mode most of the time.
> Try this now in the cell below!

**Tab-completion**: To make your life easier, tab-completion is a feature of notebooks that will try and complete a **name** or **function** that you are typing for you. Once you start typing, you can press `Tab` and the notebook will create a little textbox that has all the possible options that the notebook knows about. This makes it faster to write code.
> Try this out in the cell below!

In [None]:
a_name = 'Hello!'

In [None]:
# start typing the name created above by pressing: "a" then "_", 
# and then press the Tab key to tab-complete the name.

# now type "im" and then press the Tab key. What happens?

### Comments
An important, and often overlooked, part of writing code is including useful English (or other human language) **comments**. They can be used for many reasons, although most commonly they are used to explain in plain human language what the code below the comment does. They can also be used to write reminders that you still need to work on some part of your code.

To write a comment in Python, you simply put a `#` (pronounced hashtag, duh!) before any human language comment you want to write, like this:

In [None]:
# Here's a simple comment

You can even put comments after a line of code, like this:

In [None]:
another_name = 'Comment_Practice' # This is another comment to show you they can go after code too!

### The print function

Before we begin, let's review the print function, which we'll use throughout this class to print values out, so we can see what **names** contain. We can use it to print simple things:

In [None]:
print('We\'re finally writing some code!')

We can also use it to print out **names**

In [None]:
myFirstName = 'Sam'
print(myFirstName)


The print function is powerful and can be used to combine simple strings with **names**. To do this, just put commas in between the simple strings and the **names**. Note that `print()` will put a space between each of the items separated by commas.

In [None]:
myLanguage = 'Python'
print('My name is:', myFirstName, 'and I can program in:', myLanguage)

### Integers

Integers are a data type that stores whole numbers. They can be positive, negative or zero. 

They are often used to count things or **index** into arrays, which we'll learn about in a little bit.

In [None]:
5

In [None]:
type(5)

In [None]:
five = 5

In [None]:
five_plus_neg_five = five + -5
print(five_plus_neg_five)

In [None]:
type(five_plus_neg_five)

In [None]:
5 / 5

In [None]:
type(5 / 5)

### Floats

Floats, or floating point numbers, are a data type that stores numbers with decimal points. They can store about 15 to 16 **significant digits** (the digits that carry meaning, counted from the first non-zero digit).

While they are very precise, they are not perfect. Sometimes rounding errors do occur when using numbers with many significant digits.
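A classic small example of such a rounding error is adding 0.1 and 0.2. The sketch below uses only built-in Python and the standard `math` module; the name `total` is just for illustration.

```python
import math

# 0.1 and 0.2 cannot be stored exactly in binary, so their sum is slightly off
total = 0.1 + 0.2
print(total)          # 0.30000000000000004
print(total == 0.3)   # False

# A safer way to compare floats is to check that they are very close
print(math.isclose(total, 0.3))   # True
```

The error is tiny, but it is the reason you should avoid testing floats for exact equality.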

In [None]:
pi = 3.141592653589
pi

In [None]:
type(pi)

**Functions** can be **called** by using the function name, followed by parentheses with the **argument** in between those parentheses. Here we'll find the absolute value using the abs() function.

In [None]:
abs(-543.234)

In [None]:
type(abs(-543.234))

Mathematical operations that use a float and an int will always result in a float:

In [None]:
five_plus_neg_five * pi

In [None]:
type(five_plus_neg_five * pi)

#### Breakout Session

Let's practice making some integer and float **names**.
- Create a **name** that is the sum of 8, 16 and 40 and name it six.
- Now take the base 2 logarithm of this **name**, using the `math.log()` function. Keep in mind this is a new module, so you'll need to import it using `import math`. What is this new value?
- What type is this log value?
- Now divide it by 4. What type is this new value?


### Strings

Strings are a data type that stores text in the form of a sequence of letters, numbers and punctuation symbols.

In [None]:
"This is a string. Duh!"

In [None]:
class_name = 'Data Science for Cognitive Neuroscience'

In [None]:
type(class_name)

In [None]:
five_string = str(five)

In [None]:
type(five_string)

#### Formatting Strings

There are two ways to build text that includes the current values of **names**. This can be useful when you want to check the value of a **name**, create a customized filename, or label figure axes dynamically, for example.

The first way is done using the `%` symbol: 

In [None]:
'This is the first way to make a formatted string that prints our class name here: %s' % (class_name)

We see a string defined that has a special character, the `%` symbol, in it, followed by the letter `s`. The `%` character tells Python you want to replace the `%s` with the value of a **name**, and the `s` indicates the value will be inserted as a `string`. There are other letters you can use to insert an integer (`i` or `d`) or a float (`f`). Booleans have no letter of their own, but `%s` will insert them as the text `True` or `False`. After that first string, there is another `%` symbol, followed by the **name(s)** you want to insert into the string. Those **name(s)** are surrounded by parentheses.
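As a small illustration of the other letters, here is a sketch that inserts an integer and a float into the same string. The names `n_volumes` and `tr_seconds` are made up for this example.

```python
n_volumes = 120     # an integer
tr_seconds = 2.0    # a float

# %i inserts an integer, %f a float; %.1f keeps one digit after the decimal point
message = 'This run has %i volumes, one every %.1f seconds' % (n_volumes, tr_seconds)
print(message)
```

When there is more than one value to insert, the values are matched to the `%` placeholders in order, from left to right.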

The second way is using the `format()` function: 

In [None]:
'This is the second way to make a formatted string that prints our class name here: {0}'.format(class_name)

Here, we use the `{}` (pronounced curly-brackets) to indicate where we want the **name(s)** inserted. We put the position of the **name** that should be inserted in between the curly-brackets, counting from zero. We follow the string with a `.` (period) and the `format()` function. The arguments to the `format()` function are the **names** to insert.
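To see how the numbering inside the curly-brackets works with more than one value, here is a short sketch that builds a filename. The names `subject` and `run_number`, and the filename pattern, are hypothetical.

```python
subject = 's01'
run_number = 1

# {0} is replaced by the first argument to format(), {1} by the second
fname = '{0}_run{1}.npy'.format(subject, run_number)
print(fname)

# The same position can be reused more than once
repeated = '{0} and {0} again, then {1}'.format(subject, run_number)
print(repeated)
```

Being able to reuse or reorder positions is one practical difference from the `%` style.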

### Booleans

Booleans are a data type that stores a simple True or False value. They are used when doing **comparisons** and in **conditional** (if) statements, which you'll learn more about in a couple of weeks.

In [None]:
False

In [None]:
type(False)

In [None]:
pi_is_pos = pi > 3
print(pi_is_pos)

In [None]:
type(pi_is_pos)

#### Breakout Session

Create some Boolean and String **names**, and do some string formatting.
- Use the `==` operator to see if the integer value 5 equals the `five_string` name we created above. Store it in a **name** called `fiveEquals5`
- Convert the boolean **name** `fiveEquals5` to a string.
- Format a string that displays the values of the two **names**: `five` and `five_string`. Use both ways of formatting a string to do this.
- Do the two **names** look the same when formatted in a string? Why or why not?

### Arrays

Arrays are a data type that is a **collection** of other values. They can contain integers, floats, strings and booleans, among other data types. All of the values in an array are of the same type, however. They are **sequences**, meaning they store the values they contain in a specific order. Arrays can be 1-dimensional (1-D), 2-dimensional (2-D) and 3-dimensional (3-D). In fact, they can have any number of dimensions you like, which is usually called N-dimensional.
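The "same type" rule can be seen in a quick sketch using `numpy` arrays (the names below are made up). When integers and floats are mixed, `numpy` converts everything to floats so that the array has a single type.

```python
import numpy as np

ints_only = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # the integers are converted to floats

# dtype tells us the single data type shared by every value in the array
print(ints_only.dtype)
print(mixed.dtype)
print(mixed)
```

The exact integer type printed for `ints_only` can differ between computers, but `mixed` will be a float array.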

You've learned how to create arrays using the `make_array()` function in lecture. This function creates 1-D arrays using the values you pass into it as arguments. While this is a very useful function, working with fMRI data requires the use of 2-D, 3-D and even 4-D arrays, so we will need to learn another way to create arrays. For this we will use the `numpy` module, which we have to `import`. You've learned a little about this module when you saw some of the **functions** that work on arrays, like `np.sum`.

In [None]:
import numpy as np

#### Creating 1-D arrays
Perhaps the simplest way to create a 1-D array in `numpy` is to use the `np.arange()` **function**. As you've seen in lecture, it creates a range of numbers, specified by a starting point, ending point, and increment value.

In [None]:
years_since_millenium = np.arange(2000,2017)
print(years_since_millenium)

Uh-oh, where's 2017?

In [None]:
years_since_millenium = np.arange(2000,2018)
print(years_since_millenium)

There we go. With `np.arange()`, the end point you specify is one past the last item you want included. This "stop just before the end point" convention goes hand in hand with Python being **zero-indexed**, meaning that it starts counting from 0 instead of 1. So if you want a range of numbers that has 5 values in it, for example, you would use:

In [None]:
fives = np.arange(5)
print(fives)

While this is probably confusing when first learning Python, it makes a lot of sense once you get used to it.
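One reason the convention is handy: the number of items you get is simply the end point minus the starting point. A minimal sketch (the names `start`, `stop` and `years` are just for illustration):

```python
import numpy as np

start, stop = 2000, 2018
years = np.arange(start, stop)

# The stop value is excluded, so the number of items is simply stop - start
print(len(years))
print(years[0], years[-1])
```

Here the array holds 18 values, running from 2000 up to and including 2017.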

Arrays can also be made from an arbitrary sequence of values:

In [None]:
some_values = np.array([8, 20, 14, 78, 34])
print(some_values)

#### Creating 2-D Arrays

Making 2-D arrays can be done by combining several 1-D arrays together. Think of it as copying multiple rows of a spreadsheet together:

In [None]:
# Make the first row
row1 = np.arange(1,10,2)
print('The first row looks like:\n', row1)

# Make the second row
row2 = np.arange(2,11,2)
print('The second row looks like:\n', row2)

# Combine the two rows into a 2-D array
array_2D = np.array([row1, row2])
print('And the whole 2-D array looks like:\n', array_2D)

Note the `\n` in the `print()` statements? The `\` (pronounced backslash) is called an **escape** character, and tells the print function that the symbol after it has a special meaning. In this case, the `\n` tells Python to add a **newline** when printing, so that the array's data is not on the same line as the text.

We can also ask `numpy` to create arrays filled with random numbers, all ones, or all zeros:

In [None]:
random_vals_2D = np.random.randn(3, 5)
print(random_vals_2D)

zero_vals_2D = np.zeros((3, 5))
print(zero_vals_2D)

ones_vals_2D = np.ones((3, 5))
print(ones_vals_2D)

**Note:** The `np.random.randn()` function's parameters are the size of the dimension(s) that you want, while the `np.zeros()` and `np.ones()` functions need what's called a **tuple**, which is a sequence of values surrounded by `()` (parentheses). That's why there are two sets of parentheses, `((` and `))`, for those.

It is often important to know the size of the N-D arrays you are working with. Let's print the shape of the array, which tells us how many dimensions the array has, and how big each of those dimensions are (e.g. number of rows in a 2-D array):

In [None]:
print("This array has shape {}.".format(array_2D.shape))

The first value shown in this **tuple** is the number of rows, and the second value is the number of columns. 
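Because the shape is a tuple, you can pull the individual sizes out of it. Here is a small sketch on a made-up array called `example`; it also shows two related attributes, `ndim` and `size`.

```python
import numpy as np

example = np.zeros((3, 5))

n_rows, n_cols = example.shape   # unpack the shape tuple into two names
print('Rows:', n_rows, 'Columns:', n_cols)
print('Number of dimensions:', example.ndim)
print('Total number of values:', example.size)
```

The total number of values is just the sizes of all the dimensions multiplied together.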

#### Creating 3-D Arrays

Creating 3-D arrays using the `np.random.randn()`, `np.zeros()` and `np.ones()` functions is very similar to how we did it for 2-D arrays. We only need to add one more number indicating the size of the 3rd dimension:

In [None]:
random_vals_3D = np.random.randn(2, 4, 3)
print(random_vals_3D)

zero_vals_3D = np.zeros((2, 4, 3))
print(zero_vals_3D)

ones_vals_3D = np.ones((2, 4, 3))
print(ones_vals_3D)

### Arrays vs. Tables

At this point you've learned about `Tables` in lecture class, and may be asking yourself: 'What's the difference between 2-D Arrays and Tables?' Well, as you might expect, there are some similarities and some differences. We will outline some of the main similarities and differences below.

**Similarities**
- They are both collections of data that are stored in a 2-D spreadsheet like way, with rows and columns.
- Operations like sorting and selecting can be done on both (although the syntax is different).
- Both can have named columns (although generally N-D Arrays' columns are not named, as will be the case in this class).

**Differences**
- Tables can have columns that contain data of different type (e.g. column 1 is floats and column 2 is strings), while N-D Arrays have the same data type everywhere.
- Tables have advanced functions for selecting subsets of your data, whereas N-D Arrays have a simpler way to do that.
- Tables are useful when you have a group of variables that you've measured across a number of observations (or subjects, or trials, etc.). Since N-D Arrays can have more than 2 dimensions, they are useful in representing higher dimensional data, such as fMRI brain data and time-series astro-physics data, among many others.

So you can see that even the similarities have some caveats which make them kinda differences!

**We will be using N-D Arrays (and not Tables) throughout this class.**

###  Breakout session

Now you'll practice creating some arrays. Print out the values of each one you create below.
- Create a range called A that contains even numbers between 10 and 20, inclusive (meaning both 10 and 20 should be in the array)
- Create a 2-D Array called B with 4 rows and 2 columns. You can use whatever approach we've outlined so far.
- Create a 3-D array of zeros called C that has dimensions of 2 x 3 x 4.

## Loading and Saving Data

In order to store, share and make analyses reproducible, it is necessary to be able to save and load data to files for permanent storage. `numpy` provides many **functions** for saving and loading N-D Arrays. Here we will learn about two of the easiest to use.

### Loading Arrays from File

`numpy` uses the `.npy` file extension to save N-D Arrays. The `np.load()` **function** takes a filename and reads in the N-D Array stored in that file. Let's load a sample file that is stored in a shared folder on your datahub account. This file contains the data from one fMRI **"run"** (also referred to as a scan) that was stored as an N-D Array file.

In [None]:
# Load the fMRI data from a numpy array file
fname_in = '/data/cogneuro/s01_categories_01.npy'
data = np.load(fname_in) 
print(data.dtype)

Once we've loaded the file into memory, we need to make sure the data type is 32-bit floats. Many of the operations we want to use expect 32-bit data, so let's convert it from the 16-bit floats it is stored as into 32-bit floats.

In [None]:
data = data.astype('float32')
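To see what `astype()` does without touching the real data, here is a sketch on a tiny made-up array (the names `small` and `converted` are hypothetical). It shows that the conversion returns a new array and that 32-bit values take twice the memory of 16-bit ones.

```python
import numpy as np

small = np.array([1.5, 2.25, 1000.0], dtype='float16')
converted = small.astype('float32')   # returns a new array; small is unchanged

# nbytes is the memory used by the values: 2 bytes each vs. 4 bytes each
print(small.dtype, small.nbytes)
print(converted.dtype, converted.nbytes)
```

This is why the original cell assigns the result back to `data`: without the assignment, the converted array would be thrown away.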

The first thing to look at whenever you load N-D Array data is its shape, which tells us the number of dimensions and size of each one. How many dimensions does this data have?

In [None]:
# Print out the size of the data array we've loaded
print('This run of fMRI data has shape: ', data.shape)

### Saving Arrays to File
Now that we've converted our N-D Array data into float format, let's save it out to permanent file storage using the `np.save()` function.

In [None]:
# save out the float data
fname_out = '/home/jovyan/s01_categories_01_float.npy'
np.save(fname_out, data)

>Now check your datahub file list to see if the save worked!
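If you want to convince yourself that saving and loading gives back exactly the same array, a round trip like the one below is a quick check. This is a sketch on a small made-up array, written to a temporary folder (an assumption made here so that no real files are overwritten).

```python
import os
import tempfile

import numpy as np

original = np.arange(12, dtype='float32').reshape(3, 4)

# Save into a temporary folder so we do not clutter any real directory
tmp_dir = tempfile.mkdtemp()
fname = os.path.join(tmp_dir, 'round_trip_example.npy')
np.save(fname, original)

# Load the file back and check that nothing changed
reloaded = np.load(fname)
print(reloaded.shape, reloaded.dtype)
print(np.array_equal(original, reloaded))
```

The shape and the data type are both stored in the `.npy` file, so they come back unchanged.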

## Working with Arrays

### Array Operations

One of the advantages to working with arrays is that we can do mathematical and logical operations on all the values in an array in one line of code. Many of the statistical techniques we'll be using in this class need to do math on all the data at the same time, so being able to do it in one command is very useful!

Let's start by looking at simple arithmetic on a 1-D array:

In [None]:
# Create an array to play with
math_ops_array = np.arange(5)
print('The original array looks like this:\n', math_ops_array)

In [None]:
# Add 5 to all the values in the array
print('But if we add 5 to it:\n', math_ops_array + 5)

In [None]:
# Multiply all the values by 3
print('Or multiply it by 3:\n', math_ops_array * 3)


In [None]:
# Raise all the values to the power 2
print('Or raise all the values to the power 2:\n', math_ops_array ** 2)


Or we can do these operations on two arrays together:

In [None]:
# Create a second array to play with
math_ops_array2 = np.arange(6, 11)
print('The second array looks like this:\n', math_ops_array2)

In [None]:
print('The second array minus the first:', math_ops_array2 - math_ops_array)

In [None]:
print('And the first divided by the second:', math_ops_array / math_ops_array2)
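The cells above cover the mathematical operations. Logical operations (comparisons) work on whole arrays in the same way, as this small sketch with made-up arrays shows:

```python
import numpy as np

values = np.arange(5)

# A comparison is applied to every value and returns an array of Booleans
is_big = values > 2
print(is_big)

# Two arrays of the same shape are compared element by element
other = np.array([0, 5, 2, 7, 4])
same = values == other
print(same)
```

Each result has the same shape as the original array, with `True` wherever the comparison holds.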

###  Breakout session

Now we practice using simple mathematical operations on arrays, and get a sneak peek of the next section on the transpose function. 
- Create a new array that is 10 times `math_ops_array` minus the square root of `math_ops_array2`. Hint: use `np.sqrt()` to find the square root. Call it `math_ops_array3`.
- Create a 2-D array out of all 3 of the arrays you've created (`math_ops_array`, `math_ops_array2` and `math_ops_array3`), and call it `math_ops_array_2D`.
- Then, use the `np.transpose` function to transpose that array and call the new array `math_ops_array_2D_T`. 
- Print `math_ops_array_2D`, the shape of `math_ops_array_2D`, `math_ops_array_2D_T` and the shape of `math_ops_array_2D_T`.  
- What does the transpose function do?

### Transposing Arrays

Since taking the transpose of an array is a very common operation in `numpy`, there is a shortcut to do it! Simply add `.T` after the **name** of your array:

In [None]:
# Remind ourselves what's in the sample 2-D array we created earlier
print(array_2D)

# And what its shape is
print("This array has shape {}.".format(array_2D.shape))

In [None]:
# Now take the transpose of this array
array_2D_T = array_2D.T
print(array_2D_T)
print("This transposed array has shape {}.".format(array_2D_T.shape))

What happens when you transpose a 3-D array?

In [None]:
# Remind ourselves what's in the sample 3-D array we created earlier
print(random_vals_3D)

# And what its shape is
print("This array has shape {}.".format(random_vals_3D.shape))

In [None]:
# Now take the transpose of this array
random_vals_3D_T = random_vals_3D.T
print(random_vals_3D_T)
print("This transposed array has shape {}.".format(random_vals_3D_T.shape))

The order of the dimensions is reversed (e.g. the last dimension becomes the first dimension).
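To make this concrete, here is a sketch on a small made-up array that checks where a single value ends up after the transpose:

```python
import numpy as np

a = np.arange(24).reshape(2, 4, 3)
a_T = a.T

print(a.shape, '->', a_T.shape)

# The value at position [i, j, k] moves to position [k, j, i]
print(a[1, 2, 0] == a_T[0, 2, 1])
```

No values are changed or lost by transposing. Only the order in which you index them is different.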

#### Breakout Session

Now let's explore what transposing arrays of higher dimensions does.
- Create the transpose of the 4-D fMRI data array that we loaded (`data`). Call it `data_T`.
- Print the dimensions of `data`.
- What do you think the dimensions of `data_T` will be?
- Print out the dimensions of `data_T`.

We happen to know that the MRI scanner saves the data such that its dimensions are (X, Y, Z, T), where X, Y, and Z are the three dimensions in space, and T is time, in volumes. Thus, there are 120 volumes (120 time points). Each volume has 30 horizontal (also called transverse or axial) slices of 100 x 100 pixels.

<img src="figures/slices.png" style="height: 200px;">

In [None]:
print('Each volume has {0} entries on the X axis, {1} on Y, {2} on Z. There are {3} volumes.'.format(
        data.shape[0], data.shape[1], data.shape[2], data.shape[3]))

#### Transposing 4D fMRI data for convenience

When we work with fMRI data (often called **"images"**), it is in general more convenient to have the data in (T, Z, Y, X) format. The reasons why this convention is more convenient (like easier syntax and shortcuts while averaging over time and transferring data to standard units, for example) will become more obvious as we go.

In [None]:
# Transpose the data into (T, Z, Y, X) order and keep it that way
data_T = data.T
data = data_T

del data_T
print(data.shape)
print('There are {0} volumes. Each volume has {1} entries on the Z axis, {2} on Y, {3} on X. '.format(
        data.shape[0], data.shape[1], data.shape[2], data.shape[3]))
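One of the conveniences mentioned above can be sketched on a hypothetical miniature "run" (random numbers, not real fMRI data). With time as the first dimension, picking out one volume or averaging over time needs only the first axis:

```python
import numpy as np

# Hypothetical miniature run: 10 x 10 pixels, 3 slices, 12 volumes, stored as (X, Y, Z, T)
toy_run = np.random.randn(10, 10, 3, 12)
toy_run = toy_run.T                      # now (T, Z, Y, X)

first_volume = toy_run[0]                # one index picks out a whole volume
mean_volume = toy_run.mean(axis=0)       # average over time (the first axis)

print(toy_run.shape, first_volume.shape, mean_volume.shape)
```

Both results have the shape of a single volume, (Z, Y, X).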

#### Quick Intro to Array Slicing
As we've seen, the data we have loaded is 4-D, with 120 volumes, 30 entries on the Z-axis, 100 on the Y-axis, and 100 on the X-axis. What if we want to look at the data of just one volume (say, the one at index 35), at just one position along the Z-axis (say, index 10)? Remember that Python counts from 0, so index 35 is actually the 36th volume. To do this we use **Array Slicing** (sometimes called **indexing**). We will give a quick example here, because you'll need to do a little bit of this for your homework. We'll go into greater detail in the next lecture.

**Array slicing** is done using a combination of the `[]` operators (pronounced **square-brackets**) and numbers or the `:` operator (pronounced **colon**). To get a slice from an N-D array, you add the **square-brackets** after the **name**. You then indicate which values you want from the data, either using number(s) or the `:` to indicate that you want all of the values. Here is an example: 

In [None]:
axial_slice = data[35,10,:,:]
print('The size of this slice is:', axial_slice.shape)

- What volume did we select? 
- And what values in the Z-Axis, Y-axis and X-axis did we select? 
- What is the shape of this slice of the data?
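If you want to experiment with slicing on something smaller than the real data, here is a sketch on a hypothetical toy array in the same (T, Z, Y, X) order:

```python
import numpy as np

# Hypothetical small array in (T, Z, Y, X) order: 4 volumes, 3 slices, 5 x 6 pixels
toy = np.arange(4 * 3 * 5 * 6).reshape(4, 3, 5, 6)

one_slice = toy[2, 1, :, :]    # volume index 2, Z index 1, all of Y and X
one_row = toy[2, 1, 0, :]      # additionally fix Y at index 0

print(one_slice.shape)
print(one_row.shape)
```

Every dimension where you give a single number disappears from the result, and every dimension where you give a `:` is kept.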

## Exploring the data
So far we've learned that fMRI data is stored in N-D Arrays, how to do simple arithmetic operations on those arrays, and that transposing the fMRI data makes it easier to work with. Now it's time to start exploring the data!

Eventually we'll want to do some pretty cool statistical analyses and tests, and then visualize the findings from those analyses, but we need to learn to crawl before we can run. In general, the first thing you want to do with any data set you'll be working with (in neuroscience, or any other field) is to "look" at it. Sometimes this literally means looking at the raw numbers in an Excel spreadsheet, but that is only for the simplest of cases.

When you have large datasets, such as with fMRI brain data, you need to find ways of exploring that data that are manageable to comprehend. Today we'll look at two ways of doing so, by generating **Descriptive Statistics** and **Histograms** of our data, in order to get a feeling for what it "looks" like.

### Descriptive Statistics of the fMRI data
When dealing with a dataset, there are certain values we can look for that give us an idea of what's in the dataset, or what it "looks" like. 

For example, let's say you are a scientist on a spaceship that is exploring an alien planet, and you discover a new life-form. The first thing you might want to do is to measure the height of a number of specimens that you find (say 100 of them). If you wanted to convey to someone back home the size of these creatures, you might think to describe several things about their heights:
- The shortest one you found.
- The tallest one you found.
- The average height of all of them.

And so on. These kinds of values that we can calculate from a dataset are called **descriptive statistics**, and are a powerful way to get an idea of what your data "looks" like with just a few numbers. You will learn more about these in later Data 8 lectures, but it is useful for our purposes in this connector to discuss these now.

Using the analogy of the shortest alien above, let's see how to find the smallest fMRI value in our dataset using the `np.min()` function:

In [None]:
print('The minimum value of our data is:', np.min(data))
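The same idea can be tried on the alien example from above. The heights below are invented purely for illustration:

```python
import numpy as np

# Made-up heights (in cm) of a few alien specimens
heights = np.array([102.5, 98.0, 110.2, 95.3, 104.0])

# Three numbers that summarize the whole collection
print('Shortest:', np.min(heights))
print('Tallest:', np.max(heights))
print('Average:', np.mean(heights))
```

With only five values you could find these by eye, but the same functions work unchanged on millions of fMRI values.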

#### Breakout Session

Now try to create some other descriptive statistics of our data set.

- Using np.max, print the maximum value of the data.
- Using np.mean, print the average value of the data.
- Using np.median, print the middle value of the data. 

### Plotting Histograms
The old adage goes, "A picture is worth a thousand words". In this case, it may be worth a thousand numbers. You've just learned a bit about plotting histograms in lecture, so let's see how we can use histograms to get a better idea of what our data "looks" like.

First, we need to introduce the `flatten()` function. Let's see what it does:

In [None]:
print('Our example 2D array looks like:\n', array_2D)
print('And our flattened 2D array looks like:\n', array_2D.flatten())

So what does flatten do?

It seems it turns a 2-D array (or any higher dimensional array, as it turns out) into a 1-D array by collapsing all the rows, one after another. 

Flattening our N-D arrays is a necessary first step before plotting a histogram because a histogram requires a 1-D array, so let's do that now:

In [None]:
print('The shape of our original data:', data.shape)
data_flat = data.flatten()
print('The shape of the flattened data:', data_flat.shape)

Many of the plots that we will make in this class will be done using the `matplotlib` Python module. There is a very useful sub-module within `matplotlib` called `pyplot` that we will use now. So let's load it:

In [None]:
import matplotlib.pyplot as plt  # for visualization

We'll also need to set a few values in order to tell Python that we want all of our plots to appear in this Jupyter notebook, and not a separate window:

In [None]:
# Use "Magic" commands to set values that allow matplotlib plotting in the notebook as opposed to in a separate window
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Now we're ready to plot the histogram. We'll use the `plt.hist()` function for this. We'll also set the x-axis (`plt.xlabel()`) and y-axis (`plt.ylabel()`) labels so the plot is interpretable.

In [None]:
plt.hist(data_flat, bins=50)
plt.xlabel('fmri signal')
plt.ylabel('count of measurements')
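A histogram is just a count of how many values fall into each bin. If you want to see those counts as numbers, `np.histogram()` computes them without drawing anything. The sketch below uses a handful of made-up values standing in for real fMRI data:

```python
import numpy as np

# Made-up measurements standing in for flattened fMRI values
values = np.array([0, 0, 0, 1, 2, 2, 7, 8, 9, 10])

# np.histogram counts how many values fall into each bin, without drawing anything
counts, bin_edges = np.histogram(values, bins=5)
print(counts)
print(bin_edges)
```

There is always one more bin edge than there are bins, because each bin needs a left and a right edge. The heights of the bars in a plot made with the same bins would match `counts`.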

### Breakout session:

- What does this tell you about the data? 
- What are the axes on this plot? 
- Why are there so many zeros?