# Overview

organized by *Paul Squires and Shannon Tubridy*

thanks to *Todd Gureckis* for providing open licensed materials


- This notebook introduces the _list_ datatype

- _lists_ allow us to store multiple individual values or pieces of information in a single container instead of storing each one in its own variable



## Lists






Lists are a kind of variable that can contain a number of individual sub-elements like a list of numbers, a list of months, a list of phone numbers, and so on.


![lists.png](attachment:lists.png)

Python knows a number of _compound_ data types, which are used to group together other values. The most versatile is the [*list*](https://docs.python.org/3.5/library/stdtypes.html#typesseq-list), which can be written as a sequence of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.

In [None]:
# a list of strings
groceries = ['milk', 'coffee', 'cereal', 'apple', 'donuts']

print(groceries)

In [None]:
# get one of the items from the list:
groceries[-1]

In [None]:
# lists can also contain numbers
list_of_nums = [2,5,1,9]
print(list_of_nums)

In [None]:
list_of_nums[3]

Lists can contain mixed kinds of data like numbers and strings

In [None]:
mixed_list = ['treatmentA', 21, 'june', 26]
print(mixed_list)

# the numbers are still numbers and not strings
print(mixed_list[1])
print(mixed_list[1]/3)


**Lists are indexed**

Similar to strings, lists have individual elements that can be accessed using square bracket indexing but now each index position is a whole element of the list rather than a single character. 

The code cell above included an example that gets 21 out of the variable called mixed_list (`mixed_list[1]`).

The first element in a list is at index 0, the last is at index -1 or `len(list)-1`, and slicing works the same way as it does for strings. 


We can also use the len() function on a list the same way as on a string. For a string it tells you how many characters are in the string; for a list it tells you how many elements are in the list (but not how many characters are in any individual list element).
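
As a small illustration of that difference, the sketch below uses a made-up list (`months` is an assumed name, not one used elsewhere in this notebook) to show `len()` counting elements for a list and characters for a string.

```python
# hypothetical example list, used only for illustration
months = ['january', 'february', 'march']

# len() on the list counts the elements in the list
n_items = len(months)
print(f'months has {n_items} elements')

# len() on one element (a string) counts its characters
n_chars = len(months[0])
print(f'{months[0]} has {n_chars} characters')
```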

In [None]:
exp_conditions = ['treatmentA', 
                  'treatmentB', 
                  'shamA', 
                  'shamB', 
                  'control', 
                  'rand']

print(exp_conditions)

# get the first item in the list
print(exp_conditions[0])



In [None]:
# get the last item using reverse indexing
idx = -1
print(exp_conditions[idx])


In [None]:
length = len(exp_conditions)
print(length)

print(len('some string'))

In [None]:
# get the last item using length
length = len(exp_conditions)

print(f'there are {length} items in exp_conditions')

idx = length-1
print(f'last index position is {idx}')
print(exp_conditions[idx])
print(exp_conditions)

Once we index a specific element of a list we get back, or are returned, whatever that element is. So if the element is a string, we can index into that string to get one of its characters. For example:

In [None]:
exp_conditions

In [None]:
# get an element from the list and put it in a variable:
second_item = exp_conditions[1]
print(f'1: {second_item}')


In [None]:
# use the variable from above to get one of the individual
# elements of the string
third_letter = second_item[2]
print(f'2: {third_letter}')


In [None]:
exp_conditions

In [None]:
string = exp_conditions[1]
string[2]

In [None]:

# or do it all at once:
print(f'3: {exp_conditions[1][2]}')

Example 3 in the cell above works because Python first evaluates `exp_conditions[1]` and gets 'treatmentB' as the result. It then applies the `[2]` to that result and grabs whatever is in index position 2 (the third position), here the letter 'e'.
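
Chained indexing can also be unpacked one step at a time. This is a minimal sketch using a shortened, hypothetical list called `conditions`; the intermediate variable names are assumptions chosen for illustration.

```python
# hypothetical list, shortened for illustration
conditions = ['treatmentA', 'treatmentB', 'shamA']

# step 1: index the list to get one element (a string)
element = conditions[1]

# step 2: index that string to get a single character
letter = element[2]

# chained version: both steps in a single expression
chained = conditions[1][2]

# a slice can be chained the same way
prefix = conditions[1][:9]

print(element, letter, chained, prefix)
```

Reading the chain left to right mirrors the order in which Python evaluates it.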

**Slicing** a list, or getting a range of index positions all at once, works much as it does for strings:

In [None]:
# slice a list
exp_conditions = ['treatmentA', 
                  'treatmentB', 
                  'shamA', 
                  'shamB',
                  'control', 
                  'rand']

# get items in index positions 1 through 3
print(exp_conditions[1:4])

**Reminder**: slicing with a start and stop number gives you the item in the start position and then all positions up to BUT NOT INCLUDING the stop. That's why we have output from index positions 1, 2, and 3 in the cell above this. If the stop is left out (nothing after the colon), the slice runs through and includes the last element.
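
The common slice forms can be compared side by side. The list `letters` below is a hypothetical example, not one of the notebook's variables.

```python
# hypothetical list, used only to show slice boundaries
letters = ['a', 'b', 'c', 'd', 'e', 'f']

middle = letters[1:4]      # positions 1, 2, 3 (the stop, 4, is excluded)
to_end = letters[3:]       # position 3 through the last element
from_start = letters[:2]   # positions 0 and 1
last_two = letters[-2:]    # the final two elements

print(middle)
print(to_end)
print(from_start)
print(last_two)
```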

In [None]:
exp_conditions[3:]

In [None]:
list_of_nums = [6, 8, 8, 3, 44, 2, 15]
print(list_of_nums)

idx=0
print(f'idx={idx}, {list_of_nums[idx]}')

idx=-2
print(f'idx={idx}, {list_of_nums[idx]}')

print(list_of_nums[:4])

**Use index position to assign new values**

In [None]:
a = 'string'
a[0]

In [None]:
list_of_nums = [6, 8, 8, 3, 44, 2, 15]
print(list_of_nums)

list_of_nums[3]=99
print(list_of_nums)

In [3]:
exp_conditions = ['treatmentA', 
                  'treatmentB', 
                  'shamA', 
                  'shamB', 
                  'control', 
                  'rand']
print(exp_conditions)

exp_conditions[1]=99
print(exp_conditions)

['treatmentA', 'treatmentB', 'shamA', 'shamB', 'control', 'rand']
['treatmentA', 99, 'shamA', 'shamB', 'control', 'rand']
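
Assigning by index works because lists are mutable. Strings are not: the same kind of assignment on a string raises a `TypeError`. The sketch below uses made-up variable names (`word`, `letters`, `new_word`) to contrast the two.

```python
word = 'string'
letters = ['s', 't', 'r']

# a list element can be replaced in place
letters[0] = 'S'
print(letters)

# a string character cannot; Python raises a TypeError
try:
    word[0] = 'S'
    assignment_worked = True
except TypeError as err:
    assignment_worked = False
    print(f'TypeError: {err}')

# to "change" a string, build a new string instead
new_word = 'S' + word[1:]
print(new_word)
```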


**Appending items to a list**

To add a new item to a list, use `list.append()`.

It adds whatever is passed to `append()` as a new element at the end of the current list.

In [1]:
days = ['mon', 'tues']
print(days)
print(f'days has {len(days)} elements')

# print('\n') simply prints a new line and I'm 
# using it to make the output separated
print('\n')


# use append to add an entry to the end of the list:
days.append('wed')
print(days)
print(f'days has {len(days)} elements')

['mon', 'tues']
days has 2 elements


['mon', 'tues', 'wed']
days has 3 elements
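
Each call to `append()` adds exactly one element, whatever that element is. A small sketch with a hypothetical list `scores`:

```python
# hypothetical list, used only for illustration
scores = [10, 20]

# append adds exactly one new element at the end
scores.append(30)

# the new element sits at index position len(scores) - 1
last_idx = len(scores) - 1
print(scores[last_idx])

# appending a list adds that whole list as a single element
scores.append([40, 50])
print(scores)
print(len(scores))
```

The list ends up with four elements, not five, because the appended list counts as one element.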


Append is especially useful when you do things in loops. In the next example we are looping over a list of participant IDs one at a time and putting them into a variable `uid`. With each participant ID we make a new filename for them and append it to a list.

Our next unit will dive into for loops and how they work, but for now just take a look at what happens in the next cell:

In [None]:
# make a list of participant numbers:
participants = ['sub-12', 'sub-13', 'sub-23', 'sub-29', 'sub-1000']

# loop over, or iterate through, the items in the list
# each time through the loop we'll grab an element from
# participants and put it in a variable called uid
# and then do whatever is indented underneath the for line
# for each item in the list

for uid in participants:
    print(f'uid: {uid}')
    
print('loop is over')


In [None]:
filenames = []
len(filenames)

In [2]:
participants = ['sub-12', 'sub-13', 'sub-23', 'sub-29']

# first we make an empty list called filenames
# then we can append to it during the loop
filenames = []

print(f'filenames before the loop: {filenames}')

for uid in participants:
    print(f'uid: {uid}')
    
    # make a filename that combines a specific user id
    # with some other info:
    fname = f'{uid}_task-treatmentA_date-101120.txt'
    
    # add the filename for the current uid to the 
    # filenames list variable:
    filenames.append(fname)

# printing \n in a string means make a new line on the output
print('\n')
print('filenames after the loop:')
print(filenames)

print('\n')
print('using indexing to print the first entry in filenames:')
print(filenames[0])


filenames before the loop: []
uid: sub-12
uid: sub-13
uid: sub-23
uid: sub-29


filenames after the loop:
['sub-12_task-treatmentA_date-101120.txt', 'sub-13_task-treatmentA_date-101120.txt', 'sub-23_task-treatmentA_date-101120.txt', 'sub-29_task-treatmentA_date-101120.txt']


using indexing to print the first entry in filenames:
sub-12_task-treatmentA_date-101120.txt


#### Break up a string based on some characters using split()


If we had filenames that looked like the ones we just made, we could extract useful information from them like subject ID, experimental treatment, and date using .split() and taking advantage of the use of _ as a field separator.

Split will break the string up and give us a list containing the pieces.

In [None]:
fname = filenames[0]
print(f'the original name: {fname}')

In [None]:
# use .split() to break fname up wherever
# the underscore appears:
fname_parts = fname.split('_')
print(f'file name parts: {fname_parts}')

`split()` is a _method_, that is, a function attached to an object, in this case an individual string. It takes a separator (a single character or a sequence of characters) and splits the original string wherever that separator appears. The resulting substrings are returned together in a list. The separator itself is dropped from the output.
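
As a quick sketch of both cases, the example below splits one of the filenames from earlier first on a single character and then on a longer separator. The variable names `record`, `parts`, and `id_and_rest` are assumptions chosen for illustration.

```python
# same format as the filenames built earlier in the notebook
record = 'sub-12_task-treatmentA_date-101120.txt'

# split on a single character
parts = record.split('_')
print(parts)

# split on a multi-character separator
id_and_rest = record.split('_task-')
print(id_and_rest)

# the separator never appears in the pieces that come back
print('_task-' in id_and_rest[0], '_task-' in id_and_rest[1])
```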

In [None]:
# fname_parts is a list, elements are id, exp group, and date
print(f'ID is {fname_parts[0]}')
print(f'group is {fname_parts[1]}')
print(f'{fname_parts[2]}')

# use str.replace() to get rid of .txt
date = fname_parts[2]
print(date)
print(date.replace('.txt',''))
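
The same steps can be combined with a loop and `append()` to pull the fields out of every filename. This is only a sketch: the two filenames are copied from the output above, and the names `subject_ids` and `dates` are hypothetical.

```python
# two filenames in the same format as the ones built earlier
filenames = ['sub-12_task-treatmentA_date-101120.txt',
             'sub-13_task-treatmentA_date-101120.txt']

# empty lists to collect the extracted pieces
subject_ids = []
dates = []

for fname in filenames:
    # break the name into its three underscore-separated fields
    parts = fname.split('_')
    subject_ids.append(parts[0])

    # drop the file extension, then the 'date-' label
    date = parts[2].replace('.txt', '').replace('date-', '')
    dates.append(date)

print(subject_ids)
print(dates)
```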

## Further Reading and Resources

This is a collection of useful Python resources including videos and online tutorials. These can help students who have less familiarity with programming in general or with Python specifically.

- A nice, free textbook <a href="https://www.digitalocean.com/community/tutorials/digitalocean-ebook-how-to-code-in-python?refcode=4d2af78748bd&utm_source=ebook&utm_medium=ebook&utm_campaign=pythonebook">"How to Code in Python"</a> by Lisa Tagliaferri
- Codecademy has a variety of courses on data analysis with Python. There is a free tutorial on Python 2. Although this class uses Python 3 and there are minor differences, a beginning programmer who doesn't want to pay for the Codecademy content might benefit from these tutorials on basic Python syntax: [Python 2 tutorial](https://www.codecademy.com/learn/learn-python)
- Microsoft has an [Introduction to Python](https://docs.microsoft.com/en-us/learn/modules/intro-to-python/?WT.mc_id=python-c9-niner) video series. Each video is about 10 minutes long and introduces very basic Python features.
- A six-hour (free) video course on <a href="https://www.youtube.com/watch?v=_uQrJ0TkZlc">basic Python programming</a> on YouTube