# Basic Data Structures

## Topics:
- List review and advanced topics
- Tuples





## More Lists

A __list__ provides a way of storing an ordered series of values in a structure referenced by a single variable. 


We can define a list as a succession of elements separated by commas and enclosed in square brackets __[ ]__. For example, we can create a list of strings as follows:

In [1]:
# How to create a list
li1 = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

Here, we have defined a variable, __li1__, as a list.

In [2]:
# How to access the data in a list
print li1[2]

Wednesday


In [3]:
# See the contents of the whole list
print li1

['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']


In [4]:
# How do I check if li1 is a list?
print type(li1)

<type 'list'>


### Types within types

Lists are interesting because they are containers that can hold arbitrary types inside of them.

Here our lists are holding strings (type str).  However, they can hold any type, and not all entries in a list need to be the same type. We would call this a "heterogeneous" list.


In [5]:
print type(li1[0])

<type 'str'>


In [6]:
mixed_list = ["Beethoven", 27, 2.718]

print type(mixed_list[0])  # Print the type of the first item
print type(mixed_list[1])  # ..................the second item
print type(mixed_list[2])  # etc..

<type 'str'>
<type 'int'>
<type 'float'>


You can also define lists using other variables.  When doing this, it's as if you took whatever was stored in the variable and typed it in instead.

In [37]:
# For example, these both create identical lists

sample_list1 = ['Bach', 'Beethoven', 'Mozart']

# same as

a = 'Bach'
b = 'Beethoven'
c = 'Mozart'

sample_list2 = [a, b, c]

print sample_list1
print sample_list2

['Bach', 'Beethoven', 'Mozart']
['Bach', 'Beethoven', 'Mozart']


Lists have lots of really useful features. 

One is that they are __ordered__, which means the items stay in the order you put them in __unless you explicitly change it__ (this is not true for dictionaries, as we will see later). This means you can access individual items in a list, or entire sections, by indexing or slicing.
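
As a quick refresher, here is a minimal sketch of indexing and slicing. The list `days` and the variable names are made up for illustration, and `print(...)` is called with a single argument so the cell should behave the same under Python 2 and Python 3.

```python
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

first = days[0]       # indexing starts at 0
last = days[-1]       # negative indices count back from the end
midweek = days[1:4]   # slice: indices 1, 2 and 3 (the stop index is excluded)

print(first)
print(last)
print(midweek)
```

Note that a slice always gives back a new list, even when it contains only one item.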

You can also manipulate your list using built-in methods (more on what this means later this week). For example, we can add to the list by using the __append()__ method.

## Adding to a List

In [8]:
#add single item to list - using 'append'
li1 = [4, 8, 15, 16, 23]
print 'Before'
print li1

li1.append(42)

print 'After'
print li1

Before
[4, 8, 15, 16, 23]
After
[4, 8, 15, 16, 23, 42]


In [9]:
#combine two lists - using 'extend'
li1 = ['Monday', 'Tuesday', 'Wednesday', 'Thursday']
li2 = ['Friday', 'Saturday', 'Sunday']

li1.extend(li2)

print "After using extend"
print "li1: ", li1

After using extend
li1:  ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']


### Note: extend vs append

`a.append(5)` increases the length of list `a` by one, putting the int 5 in the final slot

`a.extend(b)` takes all of the elements in `b` and adds them to the end of list `a`.  Here `b` is usually another list, although any iterable (such as a tuple) works

What happens when we use append on two lists? "append" adds its argument to the end of the list as a single element... but what if that element is a list? 

In [1]:
li1 = ['Monday', 'Tuesday', 'Wednesday', 'Thursday']
li2 = ['Friday', 'Saturday', 'Sunday']
li1.append(li2)
print li1

['Monday', 'Tuesday', 'Wednesday', 'Thursday', ['Friday', 'Saturday', 'Sunday']]


Whoa... list-ception! We just put a list in a list.

In [3]:
print li1[0]
print li1[1]

Monday
Tuesday


In [2]:
print li1[4]

['Friday', 'Saturday', 'Sunday']


In [10]:
#combine two list by concatenation
li1 = ['one', 'two']
li2 = ['three', 'four']

li4 = li1 + li2 # Note: This creates a NEW list, so we assign it to a variable

print "li1: ", li1
print "li2: ", li2
print "li4: ", li4

li1:  ['one', 'two']
li2:  ['three', 'four']
li4:  ['one', 'two', 'three', 'four']


## Other useful list methods
You can find more documentation for Python lists [here](https://docs.python.org/2/tutorial/datastructures.html)

The above commands illustrate several of the most common ways to grow lists:

1) The list method __append()__, which adds a single item to the end of a list

2) The list method __extend()__, which adds every item of another list to the end of the list it is called on

3) The list concatenation operator __+__, which stitches two lists together to make a new one, without changing either original list.

The __insert()__ method is another way to add to a list. This method takes two arguments (in order): an index to insert at, and the object to insert. You can also insert by slicing, something like this: __li[2:2] = [var_to_insert]__. The first one is somewhat clearer, so it might be preferred unless you have very particular reasons for doing the other one.
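
To make the comparison concrete, here is a small sketch that performs the same insertion both ways. The two lists and their names (`by_insert`, `by_slice`) are invented for this example.

```python
by_insert = ['Monday', 'Tuesday', 'Thursday']
by_slice = ['Monday', 'Tuesday', 'Thursday']

# Insert 'Wednesday' so that it ends up at index 2
by_insert.insert(2, 'Wednesday')

# Same effect with slice assignment: replace the empty slice [2:2] with a one-item list
by_slice[2:2] = ['Wednesday']

print(by_insert)
print(by_slice)
print(by_insert == by_slice)
```

One practical difference is that slice assignment can insert several items at once, since the right-hand side is itself a list.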

In [6]:
#using insert
li1 = ['Tuesday', 'Wednesday', 'Thursday', 'Friday']

print 'Before Insert'
print li1
print

li1.insert(0,'Monday')

print 'After Insert at index 0'
print li1
print

Before Insert
['Tuesday', 'Wednesday', 'Thursday', 'Friday']

After Insert at index 0
['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']



In [7]:
li1.insert(3,"Wednesday afternoon")
print li1

['Monday', 'Tuesday', 'Wednesday', 'Wednesday afternoon', 'Thursday', 'Friday']


## Removing items from a list

In [38]:
#remove any item from the list with 'del'
li= ['One','Ring','to','rule','them','all']
print li

del li[3]

print 'After using del:'
print li

['One', 'Ring', 'to', 'rule', 'them', 'all']
After using del:
['One', 'Ring', 'to', 'them', 'all']


In [39]:
#remove the last item from the list with pop
print 'Before pop'
print li
word=li.pop()
print 'After pop'
print li
print "Thing we popped:", word

Before pop
['One', 'Ring', 'to', 'them', 'all']
After pop
['One', 'Ring', 'to', 'them']
Thing we popped: all


Here, we are removing things from lists in two ways:

1) The __del__ statement removes a particular item from the list by its index.

2) The list method __pop()__ removes the last item from the list and returns it.
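
Both tools are a little more flexible than the cells above show. The sketch below (with an illustrative list named `words`) uses `del` on a slice to remove several items at once, and passes an index to `pop()` to remove and return an item from somewhere other than the end.

```python
words = ['One', 'Ring', 'to', 'rule', 'them', 'all']

# del also accepts a slice: this removes the items at index 1 and 2
del words[1:3]
print(words)

# pop() takes an optional index; pop(0) removes and returns the first item
first = words.pop(0)
print(first)
print(words)
```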

## Changing lists in place

In addition to adding things to lists and taking them away again, we can also __change lists in place__.

In [23]:
#create list of zeros
noLi = 4*[0]

# This is equivalent to:
#  noLi = [0] + [0] + [0] + [0]

#  Once you understand how the addition operator works, the multiplication operator
#  works in an analogous way.
#  3*4 is the same as adding 3 to itself 4 times, so
#  [3] * 4 is equivalent to adding the list [3] to itself 4 times.

print noLi

[0, 0, 0, 0]


In [24]:
#modify items in list
mice_brain = 10
rat_brain = 20
human_brain = 500

noLi[1] = rat_brain
noLi[2] = human_brain
noLi[3] = mice_brain

print noLi

[0, 20, 500, 10]


In [25]:
#sort list
print 'sorted list!'
noLi.sort()
print noLi

sorted list!
[0, 10, 20, 500]


In [29]:
#reverse order
print 'reverse the list'
noLi.reverse()
print noLi

reverse the list
[500, 20, 10, 0]


In [9]:
# Sorting a list in place vs. returning a sorted value and leaving the original unsorted
noLi = [100, 20, 500, 10]
print 'sorted() vs .sort()'
another_noLi = sorted(noLi)
print "original:",noLi
print "Sorted:",another_noLi
noLi.sort()
print "Original", noLi

sorted() vs .sort()
original: [100, 20, 500, 10]
Sorted: [10, 20, 100, 500]
Original [10, 20, 100, 500]
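
A common slip that follows from this difference is writing `x = some_list.sort()`. Because `.sort()` works in place, it returns `None` rather than the sorted list. A minimal sketch, using a made-up list called `scores`:

```python
scores = [100, 20, 500, 10]

result = scores.sort()   # sorts 'scores' in place...
print(result)            # ...and gives back None, not a list
print(scores)

# sorted() returns a new list, so its result can be assigned directly;
# reverse=True asks for descending order
descending = sorted(scores, reverse=True)
print(descending)
```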


In [40]:
#sort string list
li= ['One','Ring','to','rule','them','all']
li.sort()

print li
# Why is it not sorting in alphabetical order?

['One', 'Ring', 'all', 'rule', 'them', 'to']
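
To answer the question in the cell above: strings are compared character by character using their character codes, and every upper-case letter comes before every lower-case letter. One way to get a case-insensitive ordering, sketched here, is to pass a `key` function so that each word is compared by its lower-case version.

```python
li = ['One', 'Ring', 'to', 'rule', 'them', 'all']

# Default ordering: capitalized words sort ahead of lower-case ones
default_order = sorted(li)
print(default_order)

# Compare the lower-case form of each word instead; the items themselves are unchanged
alphabetical = sorted(li, key=str.lower)
print(alphabetical)
```

The same `key` argument is accepted by the in-place `.sort()` method.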




## Characterizing lists

In [27]:
#figure out how long a list is with 'len'
li= ['One','Ring','to','rule','them','all']
print len(li)

6


In [28]:
# max and min
noLi = [0, 10, 20, 500]
print len(noLi)
print 'Max =', max(noLi)
print 'Min =', min(noLi)

4
Max = 500
Min = 0


In [29]:
#find where something is stored
li= ['One','Ring','to','rule','them','all']
idx = li.index('rule')
print idx
print li[idx]

3
rule


Here, we have started to characterize our lists.

1) The built-in functions __len()__, __max()__ and __min()__ tell us how many items are in the list and the maximum and minimum values in the list.

2) The list method __index()__ tells us where an item is in the list.

3) We can iterate over each item in the list and print it using the syntax __for x in mylist__:
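
One caveat about __index()__: it raises a `ValueError` if the item is not in the list. A small sketch of two ways to guard against that follows; the search word `'hobbit'` is just an example value.

```python
li = ['One', 'Ring', 'to', 'rule', 'them', 'all']
word = 'hobbit'

# Option 1: check membership with 'in' before asking for the index
if word in li:
    position = li.index(word)
else:
    position = None
print(position)

# Option 2: attempt the lookup and handle the error
try:
    li.index(word)
except ValueError:
    print('not in list')
```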

## Lists and loops

We've already seen in a previous lecture that it is easy to loop through a list

In [31]:
months = ["January", "February", "March",
         "April", "May", "June", "July",
         "August", "September", "October",
         "November", "December"]

# This loop prints each month
for x in months:
    print x

January
February
March
April
May
June
July
August
September
October
November
December


What if we want access to the number (position in the list) for each month?

In [40]:
for i in range(len(months)):
    print i+1, months[i]

1 January
2 February
3 March
4 April
5 May
6 June
7 July
8 August
9 September
10 October
11 November
12 December


Another useful shorthand is the enumerate function:

Often in a loop, you might find that you need the number of the element, as well as the element itself

To avoid having to do:
```python
for i in range(len(my_list)):
    my_element = my_list[i]
    # Some code involving 'i' and 'my_element'
```

The **enumerate** function lets you loop over both at once

In [41]:
for i, month in enumerate(months):
    print i+1, month

1 January
2 February
3 March
4 April
5 May
6 June
7 July
8 August
9 September
10 October
11 November
12 December


### Transforming/Filtering lists using list comprehensions
What if you wanted to change the list so that all the letters in the names of the months are in CAPS?

In [35]:
# One way to do this
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

caps_months = []
for month in months:
    caps_months.append(month.upper())  # Remember, if you have a string, then the '.upper()' method will return an all-caps version
    
print caps_months

['JANUARY', 'FEBRUARY', 'MARCH', 'APRIL', 'MAY', 'JUNE', 'JULY', 'AUGUST', 'SEPTEMBER', 'OCTOBER', 'NOVEMBER', 'DECEMBER']


In [36]:
# Another example - create a list of upper-case month names that start with the letter 'J'

j_months = []
for month in months:
    if month[0] == 'J':
        j_months.append(month.upper())
        
print j_months


['JANUARY', 'JUNE', 'JULY']
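
The two loops above can each be written in a single line as a list comprehension. The sketch below rebuilds the same `caps_months` and `j_months` lists that way.

```python
months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Transform: [expression for item in list]
caps_months = [month.upper() for month in months]

# Transform and filter: [expression for item in list if condition]
j_months = [month.upper() for month in months if month[0] == 'J']

print(caps_months)
print(j_months)
```

Which form to use is mostly a matter of readability: comprehensions are compact for simple transformations, while an explicit loop is often clearer once the body grows beyond one expression.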


# Tuples

A tuple is essentially a list that you cannot change. You can index and slice tuples, and add them together to make new tuples, but you cannot use __sort()__ or __reverse()__ on them, or delete or remove items from them. If you ever have a tuple that you want to change, you have to turn it into a list first.
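
Here is a short sketch of the operations that do work on tuples. The tuple `snp` and the extra values `'A'` and `'G'` are placeholders chosen for illustration.

```python
snp = ('chrII', '378445')

chrom = snp[0]                 # indexing works just like a list
extended = snp + ('A', 'G')    # adding two tuples builds a NEW tuple
first_two = extended[:2]       # slicing a tuple gives back a tuple

print(chrom)
print(extended)
print(first_two)

# The built-in sorted() accepts a tuple, but it returns a list
print(sorted(('b', 'c', 'a')))
```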


In [45]:
SNP = ('chrII', '378445')
print type(SNP)
print
print SNP
print SNP[0]

<type 'tuple'>

('chrII', '378445')
chrII



In [46]:
#Let's see if python will let us change the SNP tuple:
SNP[0] = 'chrV'
print SNP

TypeError: 'tuple' object does not support item assignment

In [47]:
#What if we first coerce the tuple to a list?
SNP = list(SNP)
print type(SNP)
SNP[0] = 'chrV'
print SNP

<type 'list'>
['chrV', '378445']


Now that the tuple was converted into a list, we could change the first element without an error. 

## So...what's the point of a Tuple then?

Tuples, like other immutable objects, are lighter-weight than lists. Using tuples for data that never changes allows programmers to optimize their code.

In [1]:
import timeit
# Create 10 million lists
print 'list time: ', timeit.timeit("['Bach', 'Beethoven', 'Mozart']", number=10000000)

# Create 10 million tuples
print 'tuple time: ', timeit.timeit("('Bach', 'Beethoven', 'Mozart')", number=10000000)



list time:  1.10348081589
tuple time:  0.129873991013


In this benchmark, just creating a tuple is substantially faster than creating a list!

Most of the time, you'll just use a list.  However, you need to be aware of tuples because they are commonly used by functions to return multiple values.

In [64]:
def maxmin(numeric_list):
    min_val = min(numeric_list)
    max_val = max(numeric_list)
    return max_val, min_val

out = maxmin([34, 4, 54, 103, 6])

# What exactly is out?
print type(out)

print out[0]
print out[1]

<type 'tuple'>
103
4


In [65]:
# However, even then you usually don't need to work with the tuple directly

largest, smallest = maxmin([34, 4, 54, 103, 6])
print largest
print smallest

103
4


# Summary So Far...

__Lists are:__

1) ordered collections of arbitrary objects.

2) accessible by indexing and slicing.

3) able to grow or shrink in place.

4) mutable (can be changed in place).

5) defined with square brackets, e.g. my_list = [X, Y]

List methods include: 

__append(x)__: Add 'x' to the end of the list  
__extend(Z)__: Add the contents of list 'Z' to the end of the list  
__insert(i, x)__: Insert item 'x' at index 'i' (index 0 puts it at the start of the list)  
__pop()__: Remove an item from the end of the list and return it  
__reverse()__: Reverse the list (in-place)  
__index(x)__: Find the location (index) of x in the list  
__sort()__: Sort the list (in-place)  

Built in functions include: 

__sorted(Z)__:  Get a sorted copy of 'Z' (doesn't modify Z itself)  
__len__:  Get the number of items in the list (i.e., its length)  
__max__:  Get the maximum of all list items  
__min__:  Get the minimum of all list items  
__type__: Get the type of a python variable  

Questions?

## Advanced Type:  List of Lists

Let's first start by making a couple of lists.

In [52]:
# Things related to research
time_wasters = ['facebook', 'xanga', '9gag', 'reddit'] # instead of working, this is what we do
lab_space = ['wet lab', 'cold room', 'shared space'] # potentially where we waste time

print 'time_wasters:', time_wasters
print 'lab_space:', lab_space

time_wasters: ['facebook', 'xanga', '9gag', 'reddit']
lab_space: ['wet lab', 'cold room', 'shared space']


Incidentally, making a list of lists is fairly simple -- we can just create a new list variable and fill it with lists that we've already defined. Another way would be to manually input everything ourselves.

In [53]:
research = [time_wasters, lab_space]

# time_wasters and lab_space are both lists already, so what's in research now?

print research

[['facebook', 'xanga', '9gag', 'reddit'], ['wet lab', 'cold room', 'shared space']]


In [54]:
print 'Number of items in `research`', len(research)
print type(research[0])
print type(research[1])

Number of items in `research` 2
<type 'list'>
<type 'list'>


In [58]:
# Or you could define a list of lists all in one statement

research = [
    ['facebook', 'xanga', '9gag', 'reddit'],
    ['wet lab', 'cold room', 'shared space'],
]
# each list within the main list is contained in its own square brackets
print research

[['facebook', 'xanga', '9gag', 'reddit'], ['wet lab', 'cold room', 'shared space']]


## Retrieving Elements in List of Lists
Getting elements in a list of lists is similar to getting elements in a list. The difference is that we add another index. Let's first see what happens when we try to retrieve the first and second elements of "research".

In [59]:
# Let's get the first list in research
List_a = research[0]
List_b = research[1]

print 'List_a has ', List_a
print 'List_b has ', List_b

List_a has  ['facebook', 'xanga', '9gag', 'reddit']
List_b has  ['wet lab', 'cold room', 'shared space']


Let's now try to retrieve '9gag' and 'cold room' from each list. The natural way, now that we have 2 different lists, is simply to index them, but this can be a pain if you have a lot of lists nested within a list. We can use two sets of indices instead.

In [60]:
# Long way
# Print the 3rd item OF the first list
List_a = research[0]
print(List_a[2])

# Print the 2nd item OF the second list
List_b = research[1]
print(List_b[1])


9gag
cold room


In [61]:
# Faster way without creating new variables for each nested list
a = (research[0])[2]
b = research[1][1]

print a
print b

9gag
cold room


This works for as many levels of nesting as you have; just keep adding indices until you get what you want.

In [1]:
# E.g.

big_List = [
    [
        [1,2,3],
        [6,5,4],
        [7,8,2]
    ],
    [
        [11,12,13],
        [15,15,15],
        [83,94,19]
    ]
]

print big_List[1][2][0]

83
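
Chained indices work for assignment as well as lookup, and `len()` can be applied at each level of nesting. A minimal sketch, using a small made-up list of lists called `grid`:

```python
grid = [
    [1, 2, 3],
    [4, 5, 6]
]

# Change the last item of the second sublist
grid[1][2] = 60
print(grid)

# len() of the outer list counts sublists; len() of a sublist counts its items
n_rows = len(grid)
n_cols = len(grid[0])
print(n_rows)
print(n_cols)
```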


## Operations on Nested Lists
As with regular lists, all the usual list operations still work. Let's add a list of hangout places to *research*.

In [62]:
# Make a list of hangouts
time_wasters = ['facebook', 'xanga', '9gag', 'reddit']
lab_space = ['wet lab', 'cold room', 'shared space']
research = [time_wasters, lab_space]

hangout = ['Jupiter','Gardens','SF']
research.append(hangout) # research should now have a sublist of hangouts

print research
print "Number of items in research:", len(research)

[['facebook', 'xanga', '9gag', 'reddit'], ['wet lab', 'cold room', 'shared space'], ['Jupiter', 'Gardens', 'SF']]
Number of items in research: 3


What happens if you modify the *hangout* sublist under *research*?

In [63]:
# Add Starbucks to the hangout sublist under research
hangout.append('Starbucks')

print research[2]
print hangout

['Jupiter', 'Gardens', 'SF', 'Starbucks']
['Jupiter', 'Gardens', 'SF', 'Starbucks']


Notice how there's a change in hangout as well? That's because the 2 lists are the exact same one, just referenced from 2 different places. If this is a problem, simply copy the elements of the *hangout* list into a new list to add to *research*.

In [64]:
# Using a copy of 'hangout' instead - Method 1
# Reinitialize the original research list
time_wasters = ['facebook', 'xanga', '9gag', 'reddit']
lab_space = ['wet lab', 'cold room', 'shared space']
research = [time_wasters, lab_space]

research.append([]) # add a new empty list to be filled
research[2].append(hangout[0]) # start adding stuff from hangout list
research[2].append(hangout[1])
research[2].append(hangout[2])
research[2].append('Peets') # add a new thing - this ISN'T added to hangout since we created a new list
print hangout
print research[2] # Now they're different!

['Jupiter', 'Gardens', 'SF', 'Starbucks']
['Jupiter', 'Gardens', 'SF', 'Peets']


In [65]:
time_wasters = ['facebook', 'xanga', '9gag', 'reddit']
lab_space = ['wet lab', 'cold room', 'shared space']
research = [time_wasters, lab_space]

# Method 2
research.append(hangout[:])  # Using [:] makes a copy
research[2].append('Peets')
print hangout
print research[2]

['Jupiter', 'Gardens', 'SF', 'Starbucks']
['Jupiter', 'Gardens', 'SF', 'Starbucks', 'Peets']


In this case, we modified *research* without altering *hangout*.
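
One thing to keep in mind is that `[:]` only copies the outermost list. The sketch below uses the `is` operator to check whether two names refer to the same list, and contrasts a slice copy of a nested list with `copy.deepcopy` from the standard library. The variable names (`alias`, `shallow`, `research_shallow`, `research_deep`) are illustrative.

```python
import copy

hangout = ['Jupiter', 'Gardens', 'SF']

alias = hangout        # a second name for the SAME list
shallow = hangout[:]   # a new list with the same items

print(alias is hangout)
print(shallow is hangout)
print(shallow == hangout)

research = [['facebook', 'reddit'], hangout]
research_shallow = research[:]             # new outer list, same inner lists
research_deep = copy.deepcopy(research)    # new outer list AND new inner lists

hangout.append('Starbucks')

print(research_shallow[1])   # still shares 'hangout', so it sees the change
print(research_deep[1])      # fully independent copy, so it does not
```

For a flat list of strings or numbers the slice copy is enough; a deep copy only matters when the items are themselves mutable containers.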

*PythonTutor.com example*

## Nested Loops
These are very useful for generating nested data structures or pulling out data from nested data structures. As implied, these are simply loops within loops. First, let's make 2 lists that we want to work with.

In [67]:
# Create two lists of letters and numbers
letters = ['a','b','c','d']
numbers = [1,2,3,4]

Then we make a function that will create each pairwise combination of letters and numbers

In [68]:
def combo(list_a, list_b):
    for i in list_a:      # 'i' holds the item in list_a
        for j in list_b:  # 'j' holds the item in list_b
            print i, j
            
combo(letters, numbers)

a 1
a 2
a 3
a 4
b 1
b 2
b 3
b 4
c 1
c 2
c 3
c 4
d 1
d 2
d 3
d 4


In this case, changing the order of the lists simply changes the order in which it prints. It's up to you how you want your data to look.

In [69]:
combo(numbers, letters)

1 a
1 b
1 c
1 d
2 a
2 b
2 c
2 d
3 a
3 b
3 c
3 d
4 a
4 b
4 c
4 d
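
The same nested loop can build a data structure instead of printing. As a sketch, the hypothetical function `combo_pairs` below collects each combination as a tuple in a list and returns it, so the result can be reused later.

```python
def combo_pairs(list_a, list_b):
    """Return every (item_a, item_b) combination as a list of tuples."""
    pairs = []
    for i in list_a:          # outer loop: one pass per item in list_a
        for j in list_b:      # inner loop: runs fully for each 'i'
            pairs.append((i, j))
    return pairs

pairs = combo_pairs(['a', 'b'], [1, 2, 3])
print(pairs)
print(len(pairs))   # number of items in list_a times number of items in list_b
```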


## Use With Nested Lists
How does this work with nested data structures? Let's use the original *research* list and find all the lab spaces and ways we can procrastinate.

In [70]:
# reinitialize research
time_wasters = ['facebook', 'xanga', '9gag', 'reddit']
lab_space = ['wet lab', 'cold room', 'shared space']
research = [time_wasters, lab_space]

for i in research[0]:
    for j in research[1]:
        print 'procrastinate with {} in the {}'.format(i, j)
            

procrastinate with facebook in the wet lab
procrastinate with facebook in the cold room
procrastinate with facebook in the shared space
procrastinate with xanga in the wet lab
procrastinate with xanga in the cold room
procrastinate with xanga in the shared space
procrastinate with 9gag in the wet lab
procrastinate with 9gag in the cold room
procrastinate with 9gag in the shared space
procrastinate with reddit in the wet lab
procrastinate with reddit in the cold room
procrastinate with reddit in the shared space
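
Nested loops are also a natural fit for visiting every item in a list of lists, even when the sublists have different lengths. In this sketch the outer loop walks over the sublists and the inner loop walks over the items of each one; `all_items` and `group_number` are names introduced for the example.

```python
time_wasters = ['facebook', 'xanga', '9gag', 'reddit']
lab_space = ['wet lab', 'cold room', 'shared space']
research = [time_wasters, lab_space]

all_items = []
for group_number, sublist in enumerate(research):   # outer loop: each sublist
    for item in sublist:                             # inner loop: each item in it
        all_items.append(item)
        print('{0} {1}'.format(group_number, item))

print(len(all_items))
```

Collecting the items into `all_items` flattens the nested structure into a single list, which can then be sorted or searched like any other list.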
