Quick Coffee Break!
---
<img src="https://www.publicdomainpictures.net/pictures/150000/velka/coffee-break-1454539196eJw.jpg">

# Python Basics
---
This notebook is meant to get you started on using Python for wrangling data, with an emphasis on arrays. `Numpy` and `Pandas` are Python libraries for numerical computing and data analysis that are especially useful for dealing with large datasets, like those in neuroimaging. At the end of this tutorial, you should have a general sense of how Python deals with data and what arrays & dictionaries are good for, though you'll probably need to put in some time working with them to get really comfortable using them.

## Why Python?

### Easy to learn
Programming is hard, so, in an absolute sense, no programming language is easy to learn unless you already have prior programming experience. But, comparatively speaking, Python's high-level nature (see next section), readable syntax, and use of semantic whitespace make the language easier to pick up than many others. For example, below is a (deliberately uncommented) definition of a simple Python function that converts a string of English words to (crummy) Pig Latin:

In [1]:
def pig_latin(text):
    ''' Takes in a sequence of words and converts it to (imperfect) pig latin. '''
    
    word_list = text.split(' ')
    output_list = []

    for word in word_list:

        word = word.lower()

        if word.isalpha():
            first_char = word[0]
        
            if first_char in 'aeiou':
                word = word + 'yay'
            else:
                word = word[1:] + first_char + 'ay'

            output_list.append(word)
    
    pygged = ' '.join(output_list)
    return pygged

The above function won't actually produce completely valid pig latin (assuming that there's such a thing as "valid pig latin"), but that's okay. It does something passable:

In [None]:
test1 = pig_latin("let us see if this works")

print(test1)

Pig Latin aside, the code is fairly easy to read ("easy" is relative, of course; I'm not suggesting that a novice programmer with no Python experience should be able to scan the code and immediately understand what's going on at every step!). There are several reasons for this. First, the code is written at a high level of abstraction (more on this below), so that each line of code maps onto a fairly intuitive operation like "take the first character of this word", and not onto a less intuitive lower-level operation like "reserve 1 byte of memory for a character I'm going to hand you in a moment". Second, the control structures (i.e., for-loops, if-then conditionals, etc.) use words like `in`, `and`, and `not`, rather than mysterious-looking operators. Third, Python's strict control of indentation (more on this later) imposes a level of discipline that keeps code readable while also preventing certain very common kinds of errors. And fourth, the Python community's strong emphasis on adhering to style conventions and writing "Pythonic" code means that Python programmers, more so than those working in many other languages, tend to use consistent naming conventions, line lengths, programming idioms, and many other similar features that collectively make it easier to read someone else's code (though admittedly this is more a feature of the community rather than the language itself).

### High-level
Python features a high level of abstraction. Many operations that must be invoked explicitly in lower-level languages (e.g., C or C++) are performed implicitly in Python. For example, you almost never have to explicitly allocate memory or collect garbage in Python—it's all done for you. Put simply, Python lets you write code faster than in many other languages.

### Dynamic
Python code is interpreted at run-time: there's no compilation process (well, this isn't entirely true, but close enough), and code is read line-by-line when executed. The upside of this is it eliminates a common choke-point in development (i.e., waiting for code to compile), and facilitates very fast iteration. It also means variables can be dynamically typed (more on that below). The downside is that, as with other dynamic languages, Python is often considerably slower than compiled languages—at least when performing operations that can't be easily optimized and/or bound to pre-existing code written in a compiled language. (You wouldn't, for example, want to write a 3d game engine in Python.)

### General-purpose
In contrast to many other dynamic programming languages designed to fill specific niches, Python is well suited for a very wide range of applications. It features a comprehensive standard library (i.e., the functionality available out-of-the-box when you install Python) and an enormous ecosystem of third-party packages. It also supports multiple programming paradigms to varying extents (object-oriented, functional, etc.). Consequently, Python is used in many areas of software development (data science, back-end web development, DevOps, scripting engines, etc.).

## Variables and basic types
Now that we've done a bit of evangelizing for Python (we'll do some more at the end!), let's look at the actual mechanics of the language. (If you have a fair bit of experience in other programming languages, you'll probably find the next few sections very basic, and might want to skip ahead.)

### Declaring variables
In Python, we declare a variable by assigning it a value with the `=` sign:

In [None]:
my_favorite_number = 3

Notice that when we initialized the above variable and assigned it a value (`3`), we didn't have to declare its *type* anywhere. In a statically typed language like C++, we'd have to explicitly indicate that the variable holds an integer (e.g., `int my_favorite_number = 3`). In Python, we just assign the value to the variable.

This is known as *dynamic typing*: the type belongs to the value, not to the variable name, so the same name can later be bound to a value of a different type. A closely related idea is *[duck typing](https://en.wikipedia.org/wiki/Duck_typing)*, in reference to the idea that in languages like Python, you don't need to know ahead of time whether something is or isn't a duck: when you see an object that looks like a duck and behaves like a duck, you just assume it's a duck when you interact with it. If something goes wrong, and your interaction fails, then you know the object isn't a duck.
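
To make the duck metaphor a bit more concrete, here's a minimal sketch. The function name `describe_length` is made up for illustration; the point is only that it never checks the type of its argument, and simply tries to use it.

```python
def describe_length(thing):
    """Report how many items `thing` holds, without checking its type first."""
    return f"{type(thing).__name__} of length {len(thing)}"


# Any object that supports len() "quacks" the right way
print(describe_length("banana"))      # a string
print(describe_length([1, 2, 3]))     # a list
print(describe_length({"a": 1}))      # a dictionary

# An integer has no length, so the interaction fails at run time
try:
    describe_length(42)
except TypeError as error:
    print("Not a duck:", error)
```

Nothing goes wrong until we actually try to take the length of the integer, which is exactly when Python tells us it isn't a duck.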

### Printing variables
We can examine the contents of a variable at any time using the built-in `print()` function:

In [None]:
print(my_favorite_number)

If we're working in an interactive Python shell (or an environment wrapped around one, like a Jupyter notebook), we may not even need to call `print()`, as we'll automatically get the output of the last line evaluated by the Python interpreter:

In [None]:
# this line won't be printed, because it isn't the last line in the notebook cell to be evaluated
"this line won't be printed"

# but this one will
my_favorite_number

### Built-in types

If you're coming to Python from another language, you're probably used to working with different types of variables—things like strings, booleans, integers, and so on. Python is no different, and provides us with a large number of [built-in types](https://docs.python.org/3/library/stdtypes.html). Let's take a quick look at some of these. We're assuming a little bit of prior programming experience here, so I won't bother to explain what a string or an integer is; the main thing is to just learn to recognize what different types look like in Python, and how they can be used.

#### Integers

In [None]:
# assign an integer to a variable
age_in_years = 30

In [None]:
# arithmetic works as you would expect
age_in_years / 2
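
One detail worth seeing for yourself: in Python 3, the `/` operator always returns a float, even when the result is a whole number. The short sketch below (variable names are ours, chosen for illustration) contrasts it with floor division and the remainder operator.

```python
age_in_years = 30

true_division = age_in_years / 2     # always a float, even when the result is whole
floor_division = age_in_years // 4   # rounds down to the nearest integer
remainder = age_in_years % 4         # what is left over after floor division

print(true_division, floor_division, remainder)
```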

#### Floats

In [None]:
# A float
almost_pi = 3.14

In [None]:
# arithmetic on floats also works as you'd expect
almost_pi + 10

In [None]:
# round() is a built-in function that rounds numbers.
# notice that it returns an integer and not a float,
# even if the input was a float.
# how can you tell this at a glance?
round(almost_pi)
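
If you want to check the answer to the question in the comments, the result is displayed without a decimal point, which is how Python shows integers. The sketch below confirms this with `type()`, and also shows the optional second argument to `round()`, which gives the number of decimal places to keep.

```python
almost_pi = 3.14

rounded = round(almost_pi)             # no second argument: the result is an int
rounded_to_one = round(almost_pi, 1)   # with a number of digits: the result is a float

print(rounded, type(rounded))
print(rounded_to_one, type(rounded_to_one))
```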

#### Booleans
Booleans operate pretty much the same in Python as in other languages; the main thing to recognize is that they can only take on the values `True` or `False`. Not `true` or `false`, not `'true'` or `'false'`; not `1` or `0`.

In [None]:
enjoying_tutorial = True

As you probably know, we can perform logical operations that will evaluate to a boolean:

In [None]:
# Is the length of the string 'apple' greater than 2?
len('apple') > 2

In [None]:
# Is the product of the first two numbers equal to the third?
719 * 1.0002 == 2000
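
A word of caution when comparing floats with `==`: floating-point arithmetic is only approximate, so two values that "should" be equal may differ in their last digits. A common workaround, sketched here, is to compare within a tolerance using `math.isclose()` from the standard library.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)               # exact comparison
print(math.isclose(total, 0.3))   # comparison within a small tolerance
```

The first comparison evaluates to `False` and the second to `True`, even though both ask what is, mathematically, the same question.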

#### None
In addition to the usual suspects, Python also has a special value called `None` (the only value of the type `NoneType`). `None` indicates that no value has been assigned to a variable or returned by a function. It's roughly equivalent to many other languages' `null` value.

In [None]:
name = None

Note: `None` is NOT the same thing as `False`!

In [None]:
None == False
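
In practice, we usually test for `None` with the `is` operator rather than `==`. Here's a small sketch of that idiom; the `greet()` function is a made-up example of using `None` as a "nothing was passed" default.

```python
name = None

print(name is None)    # identity check: the idiomatic way to test for None
print(name == False)   # None does not equal False...
print(bool(name))      # ...although None counts as false in a truth test


def greet(name=None):
    """Fall back to a generic greeting when no name is given."""
    if name is None:
        return "Hello, stranger!"
    return f"Hello, {name}!"


print(greet())
print(greet("Ada"))
```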

#### Strings
Strings come with a lot of useful built-in methods in Python ([see for yourself](https://docs.python.org/3/library/string.html)!). Let's explore just a few...

In [None]:
# A string
country = "Madagascar"

In [None]:
# How long is the string?
len(country)

In [None]:
# Convert to uppercase
# you can also try lower() or capitalize()
country.upper()

In [None]:
# Count the number of occurrences of the passed substring
country.count('a')

In [None]:
# Replace matching substrings with another value
country.replace('car', 'truck')
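
Notice that none of these methods changes the original string. Strings in Python are immutable, so methods like `replace()` return a new string instead. The following sketch illustrates this, and shows that method calls can be chained together.

```python
country = "Madagascar"

# replace() returns a new string; the original is left untouched
renamed = country.replace('car', 'truck')
print(renamed)
print(country)

# Methods can be chained, because each call returns a new string
print(country.lower().count('a'))
```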

## Collections
Most code we're going to want to write in Python will require more than just strings and integers. We're going to need more complex data structures, or *collections*, that can hold other objects (like strings, integers, etc.) and enable us to easily manipulate them in various ways. Python provides built-in support for many common collections, and others can be found in various modules in the standard library (e.g., [collections](https://docs.python.org/3/library/collections.html)).

### Lists
Lists are the most common collection we'll work with in Python. A list is an ordered, heterogeneous collection of objects.

By *ordered* we mean that a list retains a memory of the position each of its elements was inserted in. The order of elements won't change unless we explicitly change it. This allows us to access individual elements in the list directly, by specifying their *index*.

By *heterogeneous*, we mean that a list can contain elements of different types. A list doesn't have to contain all strings or all integers; it can contain a mix of them, as well as all kinds of other types.

#### List initialization
To create a list, we enclose one or more values between square brackets (`[` and `]`). Elements are separated by commas.

In [None]:
# Notice the different types--lists are heterogeneous!
random_stuff = [11, "apple", 7.14, "banana"]

#### List indexing
To access the $i^{th}$ element in a list, we enclose the index $i$ in square brackets. Note that Python uses 0-based indexing (i.e., the first element in the sequence has index 0), and not 1 as in some other data-centric languages (MATLAB, R, etc.). See this useful page and its [notes on the 0/1 based indexing war](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html#numpy-for-matlab-users-notes).

In [None]:
# Returns the second element in the list
random_stuff[1]

#### List slicing
We can access sub-lists containing multiple contiguous elements using the colon (`:`) operator.

In [None]:
# First number indicates the start position;
# second indicates the end position. Note that
# the start is inclusive and the end is exclusive.
# In this example, we get back the 2nd and 3rd
# elements, but not the 4th.
random_stuff[1:3]
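
Slicing has a few more tricks that are worth knowing. The sketch below re-creates the list so it can run on its own, and shows what happens when we omit the start or the end, use a negative index, or add a step.

```python
random_stuff = [11, "apple", 7.14, "banana"]

print(random_stuff[:2])    # omitted start: begin at the first element
print(random_stuff[2:])    # omitted end: run through the last element
print(random_stuff[-1])    # negative index: count from the end
print(random_stuff[::2])   # third number is a step: every other element
```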

#### Assigning values to list elements
To overwrite an element at a given index, we just assign a value to it:

In [None]:
print("First element before re-assignment:", random_stuff[0])

random_stuff[0] = 14

print("First element after re-assignment:", random_stuff[0])

#### Appending to a list
We can add a single element to a list via the `.append()` method.

In [None]:
# Append an element
random_stuff.append(88)

# Now our list has changed
random_stuff
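
Besides `.append()`, lists have a couple of other methods for adding elements. Here's a brief sketch (again re-creating the list so the example is self-contained) comparing three of them.

```python
random_stuff = [14, "apple", 7.14, "banana"]

random_stuff.append(88)             # adds one element to the end
random_stuff.extend(["kiwi", 2])    # adds each element of another list
random_stuff.insert(0, "first")     # inserts at a given index, shifting the rest

print(random_stuff)
print(len(random_stuff))
```

All three methods modify the list in place rather than returning a new one, which is possible because lists are mutable.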

### Dictionaries (dict)
Dictionaries are another extremely commonly used data structure in Python. A dictionary (or dict) is a mapping from keys to values; we can think of it as a set of key:value pairs, where the keys have to be unique. Many other languages have structures analogous to Python's dictionaries, though they're usually called something like *associative arrays* or *hashtables*.

#### Dictionary initialization
Dictionary initialization looks like this:

In [None]:
fruit_prices = {
    'apple': 0.65,
    'mango': 1.5,
    'strawberry': '$3/lb',
    'durian': 'unavailable',
    5: 'just to make a point'
}

Note that both the keys and values are heterogeneously typed (observe the last pair, where the key is an integer).

#### Dictionary indexing
Dictionaries are indexed by key. The syntax is identical to that used for list indexing:

In [None]:
# Returns the stored value associated with the key 'mango'
fruit_prices['mango']

However, dictionaries *cannot* be indexed by position. Since Python 3.7, dictionaries do remember the order in which key:value pairs were inserted, but whatever appears between the square brackets is always treated as a key, never as a position. This means you can't ask for, e.g., "the 4th key:value pair in the dictionary" using the indexing syntax. The following example fails, with a `KeyError` telling us there is no such key in the dictionary:

In [None]:
fruit_prices[0]
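
If we're not sure whether a key exists, there are gentler ways to ask than triggering a `KeyError`. This sketch uses a shortened copy of the dictionary above and shows the `in` operator and the `.get()` method.

```python
fruit_prices = {'apple': 0.65, 'mango': 1.5, 5: 'just to make a point'}

# `in` tests whether a key exists, without raising an error
print('mango' in fruit_prices)
print(0 in fruit_prices)

# get() returns a default (None unless specified) instead of raising KeyError
print(fruit_prices.get('papaya'))
print(fruit_prices.get('papaya', 'not stocked'))

# Indexing with a missing key raises KeyError
try:
    fruit_prices[0]
except KeyError as error:
    print("No such key:", error)
```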

#### Updating a dictionary
Updating a dictionary uses the same indexing syntax, except we now make an explicit assignment:

In [None]:
# Add a new entry for orange
fruit_prices['orange'] = 0.5

# Overwrite the existing value for mango
fruit_prices['mango'] = 2.25

In [None]:
# Let's look at the dict again...
fruit_prices
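
To round things off, here's a small self-contained sketch of two other everyday dictionary operations: removing an entry with `.pop()`, and looping over all key:value pairs with `.items()`.

```python
fruit_prices = {'apple': 0.65, 'mango': 1.5}

fruit_prices['orange'] = 0.5                 # add a new entry
removed_price = fruit_prices.pop('apple')    # remove an entry and return its value

# items() yields (key, value) pairs in insertion order
for fruit, price in fruit_prices.items():
    print(fruit, price)
```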

### Tuples
Tuples are very similar to lists in Python. The two are easy to confuse, and in practice, you can use a list in most places where you can use a tuple (though there are some important exceptions we won't cover here). The main difference between lists and tuples is that lists are *mutable*, meaning, they can change after initialization. Tuples are *immutable*; once a tuple has been created, it can no longer be modified.

We initialize a tuple in much the same way as a list, except we use parentheses instead of square brackets:

In [None]:
# Tuples are initialized with parentheses, not brackets
my_tuple = ('a', 12, 4.4)
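
Here's a short sketch of what immutability means in practice. It also illustrates one of the exceptions mentioned above: a tuple can serve as a dictionary key, whereas a list cannot. The `voxel_labels` dictionary is a made-up example.

```python
my_tuple = ('a', 12, 4.4)

# Indexing and slicing work just as they do for lists
print(my_tuple[0], my_tuple[1:])

# Assigning to an element fails, because tuples are immutable
try:
    my_tuple[0] = 'b'
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Unpacking assigns each element to its own variable
letter, count, weight = my_tuple
print(letter, count, weight)

# A tuple can be a dictionary key; a list in the same position raises TypeError
voxel_labels = {(10, 20, 30): 'amygdala'}
print(voxel_labels[(10, 20, 30)])
```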

## Everything in Python is an object
The discussion so far might give you the impression that some data types in Python are basic or special in some way. It's natural to think, for example, that strings, integers, and booleans are "primitive" data types—i.e., that they're built into the core of the language, behave in special ways, and can't be duplicated, or modified. And this is true in many other programming languages. For example, in Java, there are exactly 8 primitive data types. If you get bored of them, you're out of luck. You can't just create new ones—say, a new type of string that behaves just like the primitive strings, but adds some additional stuff you think would be kind of cool to have.

Python is different: it doesn't *really* have any primitive data types. Python is a deeply object-oriented programming language, and in Python, *everything is an object*. Strings are objects, integers are objects, booleans are objects. So are collections. So are dictionaries. Everything is an object. We'll explore some of the deeper implications of this later. For now, let's focus on what it means for the way we write Python code. 
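
As a small preview of what this buys us, here's a sketch of the "new type of string" idea from above. The class name `ShoutyStr` and its `shout()` method are invented purely for illustration.

```python
class ShoutyStr(str):
    """A string that behaves like a built-in str, plus one extra method."""

    def shout(self):
        """Return an upper-cased copy followed by an exclamation mark."""
        return self.upper() + '!'


greeting = ShoutyStr("hello world")

print(greeting.shout())             # the method we added
print(greeting.replace('l', 'L'))   # inherited str methods still work
print(isinstance(greeting, str))    # it is still a string

# Integers, strings, and even None are all instances of `object`
print(isinstance(3, object), isinstance("a", object), isinstance(None, object))
```

Don't worry about the `class` syntax for now; the takeaway is just that the built-in string type is an ordinary class that we can extend.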

### The dot notation
Let's start with the dot (`.`) notation we use to indicate that we're accessing data or functionality inside an object. You've probably already noticed that there are two kinds of constructions we've been using in our code to do things with variables. There's the functional syntax, where we pass an object as an argument to a function:

In [None]:
len([2, 4, 1, 9])

And then there's the object-oriented syntax that uses the dot notation, which we saw when looking at some of the functionality implemented in strings:

In [None]:
phrase = "aPpLeS ArE delICIous"

phrase.lower()

If you have some experience in another object-oriented programming language, the dot syntax will be old hat to you. But if you've mostly worked in data-centric languages (e.g., R or Matlab), you might find it puzzling.

What's happening in the above example is that we're calling the method `lower()` *on* the `phrase` object itself. You can think of the `.` as expressing a relationship of belonging, or roughly translating as "look inside of". So, when we write `phrase.lower()`, we're essentially saying, "try to call the `lower()` method that's contained inside of `phrase`." (I'm being a bit sloppy here for the sake of simplicity, but that's the gist of it.)

Note that `lower()` works on strings, but it isn't a built-in function in Python. We can't just call `lower()` on the air around us:

In [None]:
lower()

And neither is `lower()` a method that's available on *all* objects. For example, this won't work:

In [None]:
num = 6

num.lower()

Integers, as it happens, don't contain a method called `lower()`. And neither do most other types. Strings in Python *do* contain a method called `lower()`, and what that method does is return a lower-cased version of the string on which we called the method. But that functionality is a feature of the string type itself, and *not* of the Python language in general.

Later, we'll see how we go about defining new types (or classes), and specifying what methods they have. For the moment, the main point to take away is that almost all functionality in Python is going to be accessed via objects. The dot notation is ubiquitous in Python, so you'll need to get used to it quickly if you're used to a purely functional syntax.

#### Inspecting objects
One implication of everything being an object in Python is that we can always find out exactly what data an object contains, and what methods it implements, by inspecting it in various ways.

We won't look very far under the hood of objects in this tutorial, but it's worth knowing about a couple of ways of interrogating objects that can make your life easier.

First, you can always see the type of an object with the built-in `type()` function:

In [None]:
msg = 'Hello World!'

type(msg)

Second, the built-in `dir()` function will show you all of the attributes and methods implemented on an object. Be warned that this will often be a long list, and that some of the attribute names you see (mainly those that start and end with two underscores) will look a little wonky. We'll talk about those briefly later.

In [None]:
dir(msg)

That's a pretty long list! Any name in that list is available to you as an attribute of the object (e.g., `msg.lower()`, `msg.__class__`, etc.). Notice that the list contains all of the string methods we experimented with earlier (including `lower`), as well as many others.
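
When the full `dir()` listing is too much to take in, a few other built-ins can help narrow things down. The sketch below filters out the underscore-prefixed names, and uses `hasattr()` and `getattr()` to check for and fetch a single attribute by name.

```python
msg = 'Hello World!'

# Keep only the "public" names, i.e. those without leading underscores
public_names = [name for name in dir(msg) if not name.startswith('_')]
print(len(public_names), public_names[:5])

# hasattr() checks for a single attribute without scanning the whole list
print(hasattr(msg, 'lower'))
print(hasattr(6, 'lower'))

# getattr() fetches an attribute by name; here we fetch and call a method
method = getattr(msg, 'lower')
print(method())
```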

## Namespaces and imports
Python is a high-level, dynamic programming language, which people often associate with flexibility and lack of precision (e.g., you don't have to type your variables when you declare them in Python). But in some ways, Python is actually much more of a stickler than most other dynamic languages about the way Python developers write their code. We noted earlier that Python is very serious about how you indent your code. Another thing that's characteristic of Python is that it takes *namespacing* very seriously.

If you're used to languages like, say, R or MATLAB, you might expect to have hundreds of different functions available to call as soon as you fire up an interactive prompt. In Python, the *built-in namespace*—i.e., the set of functions you can invoke when you start running Python—is [very small](https://docs.python.org/3/library/functions.html). This is by design: Python expects you to carefully manage the code you use, and it's particularly serious about making sure you maintain orderly namespaces.

In practice, this means that any time you want to use some code that's not available to you in your current [scope](https://en.wikipedia.org/wiki/Scope_(computer_science)), you need to explicitly *import* it from whatever module it's currently in, via an `import` statement. Python's import system often annoys beginners, because it forces them to write additional lines of code that other languages don't. But once you get used to it, you'll find that it substantially increases code clarity and almost completely eliminates naming conflicts and confusion.

### Importing a module
Conventionally, all import statements in a Python file are consolidated at the very top (though there are some niche situations where this isn't possible). Here's what the most basic usage of `import` looks like:

In [None]:
import numpy as np    # The `as` keyword binds the imported module to a shorter name

By convention, numpy is imported as `np` for brevity. This is a general convention in Python; most widely-used packages have standard abbreviations that everyone in the community uses. While Python itself won't complain if you write, say, `import numpy as my_favorite_numerical_library`, we strongly recommend sticking with the conventional abbreviations, as they make it easier for everyone else to understand what your code is doing at a glance.
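
The `import` statement comes in a few flavors. Here's a sketch of the three most common ones, using only standard-library modules so that it runs without installing anything. Note that the alias `stats` is just our choice for this example, not an established community convention like `np`.

```python
import math                        # import a whole module; access names via math.<name>
import statistics as stats         # import a module under a shorter alias
from collections import Counter    # import a single name directly into our namespace

print(math.sqrt(16))
print(stats.mean([1, 2, 3, 4]))
print(Counter("banana").most_common(1))
```

In each case, it's clear from the code where a given name comes from, which is the whole point of Python's insistence on explicit imports.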

## Numpy

In most disciplines, data analysis largely revolves around tabular data, meaning two-dimensional tables where data are structured into rows and columns, with each observation typically taking up a row, and each column representing a single variable. Neuroimaging data, by contrast, often have more than two dimensions.

For example, consider a typical "resting state" fMRI study, in which participants simply lie in the scanner at rest while the machine around them does its thing. Suppose we have 20 participants, each scanned for roughly 30 minutes, with a repetition time (TR)—i.e., the duration of acquisition of each fMRI volume—of 1 second. If the data are acquired at an isotropic spatial resolution of 2mm (i.e., each brain "voxel", or 3-dimensional pixel is 2 mm along each dimension), then the resulting dataset might have approximately 20 x 1800 x 100 x 100 x 100 = 36 billion observations. That's a lot of data! Moreover, each subject's data has a clear 4 dimensional structure—the 3 spatial dimensions, plus time. If we wanted to, we could also potentially represent subjects as the 5th dimension, though that involves some complications, since at least initially, different subjects' brains won't be aligned with one another—we'd need to spatially register them for that.

It may be helpful to visualize a single subject's data to get a better sense of what the data look like. Of course, most of us mere mortals don't naturally think in 4 dimensions, so we'll need to cheat a little bit. We'll take advantage of the fact that the 3 spatial dimensions have an obvious structure to them, and then we'll concatenate consecutive 3d volumes along a time axis to get the 4th dimension. Here's the idea (image from the [nilearn docs](https://nilearn.github.io/building_blocks/manual_pipeline.html)):

<br />

![](images/niimgs.jpg)

<br />

We will typically want to access this data in pretty specific ways. That is, rather than applying an operation to every single voxel in the brain, at every single point in time, we usually want to pull out specific *slices* of the data, and only apply an operation to those slices. Say for example we're interested in a voxel in the amygdala. How would we access only that voxel, at every time point?

A very naive approach that we could implement in pure Python would be to store all our data as a nested series of Python lists: each element in the first list would be a list containing data for one time point; each element within the list for each time point would itself be a list containing the 2d slices at each x-coordinate; and so on. Then we could write a series of 4 nested for-loops to sequentially access every data point (or voxel) in our array. For each voxel we inspect, we could then determine whether the voxel is one we want to work with, and if so, apply some operation to it.

Basically, we'd have something like this (note that this is just pseudocode, not valid Python code, and you can't execute this snippet!):

```python
for t in time:
    for x in t:
        for y in x:
            for z in y:
                if z is in amygdala:
                    apply_my_function(z)
```
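
For the curious, here's a runnable toy version of the same idea. Everything in it is made up for illustration: the "scan" is a tiny nested list with 2 time points and a 2 x 2 x 2 volume at each one, and `amygdala_voxels` is a hypothetical set of coordinates standing in for a real anatomical mask.

```python
# A tiny made-up "scan": 2 time points, each a 2 x 2 x 2 volume of signal values
scan = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],           # time point 0
    [[[11, 12], [13, 14]], [[15, 16], [17, 18]]],   # time point 1
]

# Hypothetical set of (x, y, z) coordinates that we treat as "the amygdala"
amygdala_voxels = {(1, 0, 1)}

time_series = []
for t, volume in enumerate(scan):
    for x, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for z, value in enumerate(row):
                # Keep the value only if this voxel is one we care about
                if (x, y, z) in amygdala_voxels:
                    time_series.append(value)

print(time_series)
```

Even in this toy case, we visit all 16 values just to keep 2 of them. With a numpy array (assuming time is the first axis), the same selection could be written as a single indexing expression along the lines of `data[:, 1, 0, 1]`, with no explicit loops at all.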

## Where to next?
The material covered above should be sufficient to get you started working with numpy, but we're really only scratching the surface. In a second notebook, we'll cover a few more important concepts, as well as some useful tips and tricks. But for a much more thorough introduction to numpy, we recommend working through the [numpy chapter](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html) of Jake VanderPlas's exceptional [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html), available free online.

### Pandas
Numpy is the core numerical library in Python, and many other widely used scientific computing libraries build directly on its array structures. One of the most popular such libraries is [pandas](https://pandas.pydata.org/), a data analysis library that supplements numpy's array structures with data-oriented structures like data frames, as well as extensive functionality for performing common data processing and analysis operations on tabular data.

We don't have enough time in the schedule to cover pandas, but we strongly recommend working through a tutorial or two online. Here, again, it's hard to do better than the [pandas chapter](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html) in Jake VanderPlas's book.

Just to give you a sense of what's possible in pandas, here's a series of examples illustrating just a sliver of the functionality the library provides:

In [52]:
import pandas as pd

# read_csv() is a workhorse text-reading function that can handle almost
# any kind of tabular plain-text representation of data. Pandas also has
# a range of utilities for reading from other formats, e.g., read_excel(),
# read_sas(), etc.
data = pd.read_csv('data/abide2.tsv', sep='\t')

In [53]:
# Inspect the first 5 rows of the file
data.head(5)

| | site | subject | age | age_resid | sex | group | fsArea_L_V1_ROI | fsArea_L_MST_ROI | fsArea_L_V6_ROI | fsArea_L_V2_ROI | ... | fsCT_R_p47r_ROI | fsCT_R_TGv_ROI | fsCT_R_MBelt_ROI | fsCT_R_LBelt_ROI | fsCT_R_A4_ROI | fsCT_R_STSva_ROI | fsCT_R_TE1m_ROI | fsCT_R_PI_ROI | fsCT_R_a32pr_ROI | fsCT_R_p24_ROI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABIDEII-KKI_1 | 29293 | 8.893151 | 13.642852 | 2.0 | 1.0 | 2750.0 | 306.0 | 354.0 | 2123.0 | ... | 3.362 | 2.827 | 2.777 | 2.526 | 3.202 | 3.024 | 3.354 | 2.629 | 2.699 | 3.179 |
| 1 | ABIDEII-OHSU_1 | 28997 | 12.0 | 16.081732 | 2.0 | 1.0 | 2836.0 | 186.0 | 354.0 | 2261.0 | ... | 2.809 | 3.539 | 2.944 | 2.769 | 3.53 | 3.079 | 3.282 | 2.67 | 2.746 | 3.324 |
| 2 | ABIDEII-GU_1 | 28845 | 8.39 | 12.866264 | 1.0 | 2.0 | 3394.0 | 223.0 | 373.0 | 2827.0 | ... | 2.435 | 3.321 | 2.799 | 2.388 | 3.148 | 3.125 | 3.116 | 2.891 | 2.94 | 3.232 |
| 3 | ABIDEII-NYU_1 | 29210 | 8.3 | 13.698139 | 1.0 | 1.0 | 3382.0 | 266.0 | 422.0 | 2686.0 | ... | 3.349 | 3.344 | 2.694 | 3.03 | 3.258 | 2.774 | 3.383 | 2.696 | 3.014 | 3.264 |
| 4 | ABIDEII-EMC_1 | 29894 | 7.772758 | 14.772459 | 2.0 | 2.0 | 3080.0 | 161.0 | 346.0 | 2105.0 | ... | 2.428 | 2.94 | 2.809 | 2.607 | 3.43 | 2.752 | 2.645 | 3.111 | 3.219 | 4.128 |


In [54]:
# Summarize the numeric columns among the first 6 columns of the dataset
data.iloc[:, :6].describe()

| | subject | age | age_resid | sex | group |
|---|---|---|---|---|---|
| count | 1004.0 | 1004.0 | 1004.0 | 1004.0 | 1004.0 |
| mean | 29278.616534 | 15.101264 | 15.102124 | 1.238048 | 1.538845 |
| std | 374.424343 | 9.433702 | 5.363841 | 0.426101 | 0.498737 |
| min | 28675.0 | 5.128 | -5.390924 | 1.0 | 1.0 |
| 25% | 28974.75 | 9.280137 | 13.215053 | 1.0 | 1.0 |
| 50% | 29247.5 | 11.66758 | 14.909247 | 1.0 | 2.0 |
| 75% | 29542.25 | 18.015 | 16.620996 | 1.0 | 2.0 |
| max | 30167.0 | 64.0 | 44.644232 | 2.0 | 2.0 |


In [55]:
# Select a column by name
data['age']

0        8.893151
1       12.000000
2        8.390000
3        8.300000
4        7.772758
5        8.270000
6       18.750000
7       22.000000
8       13.249315
9       12.900000
10      12.068493
11      12.558904
12      10.810959
13       7.090000
14      12.558904
15      10.019178
16      21.000000
17       6.499658
18      10.000000
19      13.076712
20      12.605479
21      10.663014
22      10.640000
23      51.000000
24      20.000000
25      16.366900
26       5.960000
27      28.083333
28      11.900000
29      15.300000
          ...    
974     19.000000
975     11.090411
976     18.750000
977      8.931507
978     11.821918
979     19.000000
980      9.578082
981     12.764384
982     11.353425
983     10.427397
984      8.723288
985     18.750000
986     20.607800
987     12.000000
988     19.583333
989     14.000000
990     18.000000
991     32.624200
992     15.290000
993     26.000000
994     17.560000
995     12.200000
996      5.603000
997      8.745205
998     15

In [56]:
# Select values at specific rows and columns
data.loc[[2, 5], ['age', 'sex']]

| | age | sex |
|---|---|---|
| 2 | 8.39 | 1.0 |
| 5 | 8.27 | 1.0 |


In [57]:
# Sort data on columns: group first, then age
sorted_data = data.sort_values(['group', 'age'])

sorted_data.head(5)

| | site | subject | age | age_resid | sex | group | fsArea_L_V1_ROI | fsArea_L_MST_ROI | fsArea_L_V6_ROI | fsArea_L_V2_ROI | ... | fsCT_R_p47r_ROI | fsCT_R_TGv_ROI | fsCT_R_MBelt_ROI | fsCT_R_LBelt_ROI | fsCT_R_A4_ROI | fsCT_R_STSva_ROI | fsCT_R_TE1m_ROI | fsCT_R_PI_ROI | fsCT_R_a32pr_ROI | fsCT_R_p24_ROI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 251 | ABIDEII-NYU_2 | 29170 | 5.128 | 13.444107 | 1.0 | 1.0 | 2729.0 | 293.0 | 316.0 | 2536.0 | ... | 2.937 | 2.861 | 2.344 | 2.894 | 3.126 | 2.744 | 3.149 | 2.379 | 2.925 | 2.58 |
| 518 | ABIDEII-NYU_1 | 29224 | 5.22 | 10.618139 | 2.0 | 1.0 | 3495.0 | 182.0 | 495.0 | 3041.0 | ... | 3.426 | 3.747 | 2.585 | 3.092 | 3.261 | 2.683 | 3.393 | 1.878 | 3.398 | 2.965 |
| 291 | ABIDEII-NYU_2 | 29167 | 5.255 | 13.571107 | 1.0 | 1.0 | 2436.0 | 185.0 | 350.0 | 2225.0 | ... | 2.863 | 3.457 | 2.494 | 2.655 | 3.4 | 2.771 | 3.21 | 2.023 | 3.426 | 3.241 |
| 231 | ABIDEII-NYU_2 | 29174 | 5.295 | 13.611107 | 1.0 | 1.0 | 3224.0 | 201.0 | 473.0 | 2500.0 | ... | 2.952 | 3.625 | 2.699 | 3.352 | 3.843 | 3.222 | 3.566 | 2.644 | 3.173 | 3.003 |
| 246 | ABIDEII-NYU_1 | 29189 | 5.32 | 10.718139 | 1.0 | 1.0 | 3261.0 | 222.0 | 423.0 | 2679.0 | ... | 3.285 | 3.797 | 2.858 | 3.092 | 3.538 | 3.307 | 3.15 | 3.283 | 2.925 | 2.927 |


In [58]:
# Mean values of the first five columns whose names contain 'fsArea'
data.filter(like='fsArea').iloc[:, :5].mean()

fsArea_L_V1_ROI     3161.415339
fsArea_L_MST_ROI     226.266932
fsArea_L_V6_ROI      395.445219
fsArea_L_V2_ROI     2613.914343
fsArea_L_V3_ROI     1747.647410
dtype: float64

In [59]:
# Variance of the 'fsCT_R_TGv_ROI' column, grouped separately by
# every combination of sex and group in the dataset
groups = data.groupby(['sex', 'group'])

groups['fsCT_R_TGv_ROI'].var()

sex  group
1.0  1.0      0.263925
     2.0      0.169335
2.0  1.0      0.231037
     2.0      0.213257
Name: fsCT_R_TGv_ROI, dtype: float64

# Resources/further reading
This tutorial provided a high-level look at some of the main features of the Python language, some basic, some more advanced. To really develop a working familiarity with the language, you will, of course, need to roll up your sleeves and start writing some code. One of the best ways to learn is to pick a small problem that actually interests or matters to you in some way (e.g., parsing some text data you have lying around), and google for help every time you run into problems (there's no shame in consulting the internet! All programmers do it!).

If you prefer to have more structure, there are hundreds of excellent, and mostly free, resources online to help you on your way. A few good ones:

* Codecademy offers interactive programming courses for many languages and tools, including [Python](https://www.codecademy.com/learn/learn-python). (The Python 3 course costs money, but the Python 2 course is free, and the changes to the language aren't huge.)
* [A Whirlwind Tour of Python](http://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf) is an excellent intro to Python by [Jake VanderPlas](https://staff.washington.edu/jakevdp/); Jupyter notebooks are available [here](https://github.com/jakevdp/WhirlwindTourOfPython)
* Another excellent and free online book is Allen Downey's ["Think Python"](http://greenteapress.com/wp/think-python-2e/)