# Basic Python Containers: Lists, Dictionaries, Sets, Tuples

**Based on lecture materials by Milad Fatenejad, Joshua R. Smith, Will Trimble, and Anthony Scopatz**

Python would be a fairly useless language if it weren't for the compound data types. The main two are **lists** and **dictionaries**, but we'll discuss **sets** and **tuples** as well. I'll also go over reading text data from files. 

## Lists

A list is an ordered, indexable collection of data. Let's say you have collected some current and voltage data that looks like this:

    voltages:
    -2.0
    -1.0
     0.0
     1.0
     2.0

    currents:
    -1.0
    -0.5
     0.0
     0.5
     1.0

You could put that data into two lists like this:

In [18]:
voltages = [-2.0, -1.0, 0.0, 1.0, 2.0]

currents = [-1.0, -0.5, 0.0, 0.5, 1.0]

**voltages** is of type list:

In [2]:
type(voltages)

list

Python lists are indexed from zero.  Therefore, to find the value of the first item in `voltages`:

In [3]:
voltages[0]

-2.0

And to find the value of the third item

In [4]:
voltages[2]

0.0

Lists can be indexed from the back using a negative index. This expression gives us the last element of `currents`:

In [5]:
currents[-1]

1.0

and the next-to-last

In [6]:
currents[-2]

0.5
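
As a small sketch of how negative indices relate to positive ones (the variable names other than `currents` are made up for illustration), a negative index `-k` refers to the same element as index `len(currents) - k`:

```python
# Same sample data as the currents list defined earlier.
currents = [-1.0, -0.5, 0.0, 0.5, 1.0]

n = len(currents)

# A negative index -k refers to the same element as index n - k.
last = currents[-1]
same_as_last = currents[n - 1]

next_to_last = currents[-2]
same_as_next_to_last = currents[n - 2]

# Indexing past either end raises IndexError; it does not wrap around.
try:
    currents[-6]
    out_of_range = False
except IndexError:
    out_of_range = True
```

Here `last` and `same_as_last` are both `1.0`, and asking for `currents[-6]` on a five-element list fails with an `IndexError`.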

You can "slice" items from within a list. Let's say we wanted the second through fourth items from `voltages`:

In [7]:
voltages[1:4]

[-1.0, 0.0, 1.0]

Slicing a list from the beginning up to a given index, or from a given index to the end, 
is so common a task that it has a shortcut.  If you omit the index to the left of 
the colon, it is interpreted as 0.   If you omit the index to the right of the colon, 
it is interpreted as `len(<list>)`.  So the first expression below takes all the elements 
of `voltages` from the third to the end of the list, and the second takes the first two 
elements of `currents`:

In [9]:
print voltages[2:]
print currents[:2]

[0.0, 1.0, 2.0]
[-1.0, -0.5]


In [10]:
len(voltages)

5

Note: when taking slices, the elements returned **include** the **starting** index but **exclude** the **final** index.  

In [11]:
print voltages[0:len(voltages)]  # this expression gives the entire list.
print voltages[:]                # and this is the shortcut for the above.

[-2.0, -1.0, 0.0, 1.0, 2.0]
[-2.0, -1.0, 0.0, 1.0, 2.0]
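
The include-the-start, exclude-the-end rule has some convenient consequences. The following is only an illustrative sketch (the names `middle`, `k`, `rebuilt`, and `clipped` are made up for this example):

```python
voltages = [-2.0, -1.0, 0.0, 1.0, 2.0]

# A slice [start:stop] contains stop - start elements when both are in range.
middle = voltages[1:4]              # [-1.0, 0.0, 1.0]
middle_length = len(middle)         # 3, which is 4 - 1

# Splitting at any index k gives two pieces that rebuild the original list,
# because the element at index k lands in exactly one of them.
k = 2
rebuilt = voltages[:k] + voltages[k:]

# Unlike indexing, slicing never raises IndexError:
# out-of-range bounds are silently clipped to the ends of the list.
clipped = voltages[3:100]           # [1.0, 2.0]
```

Note that `voltages[:]` and `rebuilt` are new lists with the same contents as `voltages`, not the same list object.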


The [Python documentation--An informal Introduction to Python](http://docs.python.org/2/tutorial/introduction.html#lists) contains more examples and more details about how to create and access instances of this utilitarian data structure.

Exercise: print out the sublist -1.0, 0.0, 1.0 of `voltages` using a slice.

In [12]:
# this is a comment
print voltages[1:4]

[-1.0, 0.0, 1.0]


### Append and Extend

Just like strings have methods, lists do too.

In [13]:
dir(list)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__delslice__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__setslice__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

One useful method is `append`. Let's say we want to stick the following data on the end of both our lists.

    voltages:
     3.0
     4.0

    currents:
     1.5
     2.0

If you want to append items to the end of a list, use the append method.

In [19]:
print voltages
voltages.append(3.0)
print voltages

[-2.0, -1.0, 0.0, 1.0, 2.0]
[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]


In [20]:
print voltages
voltages.append(4.0)
print voltages

[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]


In [21]:
voltages

[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]

You can see how that approach might be tedious in certain cases. If you want to concatenate a list onto the end of another one, use extend.

In [22]:
print currents
currents.extend([1.5, 2.0])
print currents

[-1.0, -0.5, 0.0, 0.5, 1.0]
[-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]


In [23]:
currents



[-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

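Note that `append` adds its argument as a single item, so appending a list nests that list inside the original instead of concatenating it:

In [24]: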
l = ['a', 'b']
l.append(['c', 'd'])
print l

['a', 'b', ['c', 'd']]
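
To make the difference between `append` and `extend` concrete, here is a side-by-side sketch using three throwaway lists (`a`, `b`, and `c` are names chosen just for this example):

```python
a = ['a', 'b']
a.append(['c', 'd'])     # adds ONE element, which is itself a list
# a is now ['a', 'b', ['c', 'd']]

b = ['a', 'b']
b.extend(['c', 'd'])     # adds each element of the argument in turn
# b is now ['a', 'b', 'c', 'd']

c = ['a', 'b']
c.extend('cd')           # any iterable works; a string yields its characters
# c is now ['a', 'b', 'c', 'd']

lengths = (len(a), len(b), len(c))   # (3, 4, 4)
```

Both methods modify the list in place and return `None`, so writing `a = a.append(x)` would discard the list.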


### Length of Lists



Sometimes you want to know how many items are in a list. Use the built-in `len` function, which also works on strings and other containers.

In [26]:
print len(voltages)
print len('abcdefg')

7
7


### Heterogeneous Data



Lists can contain heterogeneous data.

In [27]:
data_list = ["experiment: current vs. voltage", 
        "run", 47,
        "temperature", 372.756, 
        "current", [-1.0, -0.5, 0.0, 0.5, 1.0], 
        "voltage", [-2.0, -1.0, 0.0, 1.0, 2.0],
        ]

In [31]:
print data_list
print data_list[6]
print data_list[6][0]

['experiment: current vs. voltage', 'run', 47, 'temperature', 372.756, 'current', [-1.0, -0.5, 0.0, 0.5, 1.0], 'voltage', [-2.0, -1.0, 0.0, 1.0, 2.0]]
[-1.0, -0.5, 0.0, 0.5, 1.0]
-1.0


We've got strings, ints, floats, and even other lists in there. 

Exercise: append 5.0 to both `voltages` and `currents`.

In [32]:
voltages.append(5.0)
currents.append(5.0)
print voltages, currents

[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0] [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 5.0]


Because the nested list is an ordinary list, you can also modify it in place through `data_list`:

In [34]:
data_list[6].append('hi there')
print data_list[6]
print data_list

[-1.0, -0.5, 0.0, 0.5, 1.0, 'hi there']
['experiment: current vs. voltage', 'run', 47, 'temperature', 372.756, 'current', [-1.0, -0.5, 0.0, 0.5, 1.0, 'hi there'], 'voltage', [-2.0, -1.0, 0.0, 1.0, 2.0]]


## Assigning Variables to Other Variables

Something that might cause you headaches in the future is how Python deals with assignment of one variable to another. When you set a variable equal to another, both variables point to the same object. Changing that object through the first variable also changes what you see through the second. Be careful about this fact.

In [35]:
a = [1, 2]

In [41]:
b = a

In [37]:
a.append(10)

In [38]:
b

[1, 2, 10]

To actually make a copy of a list or a dict, use the built-in `list()` and `dict()` functions.

In [39]:
a = [1, 2]
b = list(a)
a.append(10)
print a
print b

[1, 2, 10]
[1, 2]
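
One caveat worth knowing: `list()` and `dict()` make *shallow* copies, so containers nested inside the copy are still shared with the original. The sketch below uses the standard library's `copy.deepcopy` for comparison; the variable names are made up for illustration.

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = list(nested)          # new outer list, same inner lists
deep = copy.deepcopy(nested)    # new outer list and new inner lists

nested.append([5, 6])           # changes only the original outer list
nested[0].append(99)            # changes an inner list shared with `shallow`

# nested  is [[1, 2, 99], [3, 4], [5, 6]]
# shallow is [[1, 2, 99], [3, 4]]
# deep    is [[1, 2], [3, 4]]
```

For flat lists of numbers or strings, like `voltages`, a shallow copy is all you need.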


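As a diagnostic tool, if you are ever uncertain whether an assignment created a new object or just a second name for an existing one, 
you can test whether two variables refer to the same object with the `is` operator. After a plain assignment such as `b = a`, both names refer to the same list:

In [42]:
b = a
a is b

True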

In [43]:
3 is 3.0

False
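
The `is` operator compares identity (same object), while `==` compares values. A short sketch of the difference, with variable names chosen only for this example:

```python
a = [1, 2]
b = a            # a second name for the same list
c = list(a)      # a separate list with equal contents

same_object = a is b        # True
equal_copy = a == c         # True: same contents
distinct_copy = a is c      # False: different objects

x = 3
y = 3.0
equal_numbers = x == y      # True: numerically equal
same_number_object = x is y # False: an int and a float are different objects
```

In everyday code, use `==` to compare values and reserve `is` for identity checks such as `value is None`.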

In [44]:
a[0] = 'hi'
print a

['hi', 2, 10]


There's a ton more to know about lists, but let's press on. Check out [Dive Into Python](http://www.diveintopython.net/toc/index.html) or the [Python documentation](http://docs.python.org/2/) for more info.

## Tuples

Tuples are another of Python's basic container data types. They are very similar to lists but with one major difference. Tuples are **immutable**. Once data is placed into a tuple, the tuple cannot be changed. You define a tuple as follows:

In [45]:
tup = ("red", "white", "blue")

In [46]:
type(tup)

tuple

In [47]:
tup[0] = 'pink'

TypeError: 'tuple' object does not support item assignment

You can slice and index a tuple exactly like you would a list. Tuples are used in the inner workings of Python, and a tuple can be used as a key in a dictionary, whereas a list cannot, as we will see in a moment.

See if you can retrieve the third element of **tup**:

In [48]:
tup[2]

'blue'

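A tuple is actually created by the comma, not the parentheses, so a trailing comma produces a one-element tuple:

In [51]: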
a = 'red',
type(a)

tuple
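
Here is a minimal sketch of the two tuple features mentioned above: use as a dictionary key (dictionaries are covered later in this lesson) and unpacking. The `readings` table and its labels are hypothetical, invented for this example.

```python
# Hypothetical lookup table keyed by (voltage, current) pairs.
readings = {(-2.0, -1.0): "first point", (2.0, 1.0): "last point"}
label = readings[(2.0, 1.0)]       # 'last point'

# A list is mutable and therefore unhashable, so it is rejected as a key.
try:
    readings[[0.0, 0.0]] = "origin"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# A tuple can also be unpacked into separate names in one assignment.
tup = ("red", "white", "blue")
first, second, third = tup
```

After this runs, `list_key_allowed` is `False` and `third` is `'blue'`.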

## Sets



The Python set type is similar to the idea of a mathematical set: it is an unordered collection of unique things. Consider:

In [52]:
fruit = {"apple", "banana", "pear", "banana"}
print fruit

set(['pear', 'banana', 'apple'])


You can create a set with curly braces, as above, or by passing a list (or any other iterable) to `set()`.

Since sets contain only unique items, there's only one banana in the set fruit.

You can do things like intersections, unions, etc. on sets just like in math. Here's an example of an intersection of two sets (the common items in both sets).

In [53]:
bowl1 = {"apple", "banana", "pear", "peach"}

In [54]:
bowl2 = {"peach", "watermelon", "orange", "apple"}

In [55]:
print bowl1
print bowl2

set(['pear', 'banana', 'peach', 'apple'])
set(['orange', 'watermelon', 'apple', 'peach'])


In [56]:
bowl1 & bowl2

{'apple', 'peach'}

In [57]:
bowl1 | bowl2

{'apple', 'banana', 'orange', 'peach', 'pear', 'watermelon'}
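
Beyond intersection (`&`) and union (`|`), sets support a few more operations that mirror their mathematical counterparts. This is only a sketch; the result names are made up for illustration.

```python
bowl1 = {"apple", "banana", "pear", "peach"}
bowl2 = {"peach", "watermelon", "orange", "apple"}

only_in_bowl1 = bowl1 - bowl2      # difference: banana and pear
in_exactly_one = bowl1 ^ bowl2     # symmetric difference: banana, pear, watermelon, orange

has_pear = "pear" in bowl1                  # True
is_subset = {"apple", "peach"} <= bowl1     # True

# set() removes duplicates from any iterable; sorted() gives a predictable order.
unique_sorted = sorted(set(["banana", "apple", "banana"]))   # ['apple', 'banana']

empty_set = set()      # note: {} creates an empty dictionary, not an empty set
```

Because sets are unordered, wrap a set in `sorted()` whenever you need reproducible output.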

You can read more in the [sets documentation](http://docs.python.org/2/library/sets.html).

## Dictionaries

A Python dictionary is an unordered collection of key-value pairs.  Dictionaries are by far the most important data type in Python. The key is a way to name the data, and the value is the data itself. Here's a way to create a dictionary that contains all the data in our data.dat file in a more sensible way than a list.

In [58]:
data_dict = {"experiment": "current vs. voltage",
        "run": 47,
        "temperature": 372.756, 
        "currents": [-1.0, -0.5, 0.0, 0.5, 1.0], 
        "voltages": [-2.0, -1.0, 0.0, 1.0, 2.0],
        }

In [59]:
print data_dict

{'experiment': 'current vs. voltage', 'run': 47, 'temperature': 372.756, 'currents': [-1.0, -0.5, 0.0, 0.5, 1.0], 'voltages': [-2.0, -1.0, 0.0, 1.0, 2.0]}


This model is clearly better because you no longer have to remember that the run number is in the third position of the list (index 2); you just refer directly to "run":

In [60]:
data_dict["run"]

47

If you wanted the first element of the voltage data list:

In [62]:
data_dict["voltages"][0]

-2.0

Or perhaps you wanted the last element of the current data list

In [63]:
data_dict["currents"][-1]

1.0

Once a dictionary has been created, you can change the values of the data if you like.

In [64]:
data_dict["temperature"] = 3275.39

You can also add new keys to the dictionary.  Note that dictionaries are indexed with square brackets, just like lists. The syntax looks the same, even though the two types are very different.

In [66]:
data_dict["user"] = "Johann G. von Ulm"
print data_dict["user"]

Johann G. von Ulm


In [67]:
# print out the element with key of experiment
print data_dict['experiment']

current vs. voltage
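
Looking up a key that is not in the dictionary raises a `KeyError`. As a sketch of the usual ways to guard against that (using a cut-down version of `data_dict`; the `"unknown"` default is an arbitrary choice for this example):

```python
data_dict = {"experiment": "current vs. voltage", "run": 47}

has_run = "run" in data_dict               # True
has_user = "user" in data_dict             # False

# Indexing a missing key raises KeyError ...
try:
    data_dict["user"]
    raised = False
except KeyError:
    raised = True

# ... while get() returns a default instead (None if no default is given).
user = data_dict.get("user", "unknown")    # 'unknown'

# del removes a key-value pair.
del data_dict["run"]
remaining = sorted(data_dict.keys())       # ['experiment']
```

Note that `get` does not add the missing key; the dictionary is unchanged by the lookup.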


Dictionaries, like strings, lists, and all the rest, have built-in methods. Let's say you wanted all the keys from a particular dictionary.

In [68]:
data_dict.keys()

['run', 'temperature', 'voltages', 'experiment', 'user', 'currents']

Similarly, `values` returns all the values:

In [69]:
data_dict.values()

[47,
 3275.39,
 [-2.0, -1.0, 0.0, 1.0, 2.0],
 'current vs. voltage',
 'Johann G. von Ulm',
 [-1.0, -0.5, 0.0, 0.5, 1.0]]

In [70]:
print data_dict

{'run': 47, 'temperature': 3275.39, 'voltages': [-2.0, -1.0, 0.0, 1.0, 2.0], 'experiment': 'current vs. voltage', 'user': 'Johann G. von Ulm', 'currents': [-1.0, -0.5, 0.0, 0.5, 1.0]}


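The `pprint` module prints nested containers in a more readable layout, with the dictionary keys sorted:

In [72]: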
import pprint
pprint.pprint(data_dict)

{'currents': [-1.0, -0.5, 0.0, 0.5, 1.0],
 'experiment': 'current vs. voltage',
 'run': 47,
 'temperature': 3275.39,
 'user': 'Johann G. von Ulm',
 'voltages': [-2.0, -1.0, 0.0, 1.0, 2.0]}
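
Alongside `keys` and `values`, the `items` method returns the key-value pairs together, which is handy for looping. A small sketch with a cut-down `data_dict` (the `lines` list is just an illustrative way to collect the results):

```python
data_dict = {"run": 47, "temperature": 3275.39, "user": "Johann G. von Ulm"}

# items() yields (key, value) tuples; sorted() fixes the order by key,
# since the dictionary itself does not promise any particular order.
lines = []
for key, value in sorted(data_dict.items()):
    lines.append("%s: %s" % (key, value))

# lines is ['run: 47', 'temperature: 3275.39', 'user: Johann G. von Ulm']
```

Each pair is a tuple, so `for key, value in ...` is an instance of the tuple unpacking available on any tuple.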
