# Notebook 3 - Advanced Data Structures

So far, we have seen numbers, strings, and lists. In this notebook, we will learn three more data structures for organizing data: `tuple`, `set`, and `dict` (dictionary).

## Tuples
A tuple is like a list, but it is immutable: once it is created, its items cannot be reassigned, added, or removed.

In [2]:
tup = (1,'a',[1,2])
tup

(1, 'a', [1, 2])

In [3]:
print tup[1]
print tup[2][0]

a
1


In [4]:
tup[1] = 3

TypeError: 'tuple' object does not support item assignment
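Immutability applies to the tuple's own slots, not to the objects stored in them. In this short sketch, `tup` is redefined so the cell runs on its own. The list in slot 2 can still be changed in place:

```python
tup = (1, 'a', [1, 2])

# Slot 2 holds a list object. The tuple cannot be made to point at a
# different object, but the list itself is still mutable.
tup[2].append(3)
print(tup)  # (1, 'a', [1, 2, 3])
```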

In [5]:
# We can turn a tuple into a list, and vice versa
print list(tup)
print tuple(list(tup))

[1, 'a', [1, 2]]
(1, 'a', [1, 2])
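Converting to a list is a common way to get a modified version of a tuple. You copy the items into a list, change the list, and build a new tuple, and the original tuple is left untouched. A minimal sketch:

```python
tup = (1, 'a', [1, 2])

as_list = list(tup)       # copy the items into a mutable list
as_list[1] = 3            # modify the copy
new_tup = tuple(as_list)  # build a new tuple from the modified list

print(new_tup)  # (1, 3, [1, 2])
print(tup)      # (1, 'a', [1, 2]); the original is unchanged
```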


In [6]:
# We can have a tuple with a single item
single_tup = (1,)
print single_tup

(1,)
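The trailing comma is what makes a one-item tuple. Parentheses on their own only group an expression. A quick check:

```python
not_a_tuple = (1)   # just the integer 1 inside grouping parentheses
single_tup = (1,)   # a tuple with one item
also_a_tuple = 1,   # the parentheses are optional here

print(isinstance(not_a_tuple, tuple))  # False
print(isinstance(single_tup, tuple))   # True
print(also_a_tuple == single_tup)      # True
```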


## Sets
A `set` behaves much like a set in mathematics: it is unordered and contains no duplicates. Let's contrast it with the data structures we have seen so far.

* A `list` is an ordered, mutable collection of items
* A `tuple` is an ordered, immutable collection of items
* A `str` is an ordered, immutable collection of characters
* A `set` is an unordered, mutable collection of distinct items

In [10]:
some_numbers = [0,2,1,1,2]
my_set = set(some_numbers) # create a set out of the numbers

print my_set

set([0, 1, 2])


Converting a list into a set automatically removes the duplicates. This works for any iterable, such as a string, as long as its items are hashable. Numbers, strings, and tuples are hashable, but lists are not.

In [11]:
my_string = 'aabbccda'
my_set = set(my_string)

print my_set

set(['a', 'c', 'b', 'd'])
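Sets ignore order and repetition. As a result, two sets are equal whenever they hold the same items, and `len(set(...))` counts the distinct items. The word list below is a made-up example:

```python
# Same distinct characters, different order and repetition
print(set('aabbccda') == set('dcba'))  # True

# Counting distinct items is a common use of sets
words = ['to', 'be', 'or', 'not', 'to', 'be']
print(len(set(words)))  # 4
```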


In [12]:
# Only the value of the last expression in a cell is displayed
'a' in my_set
'a' in my_set and 'e' not in my_set

True
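Sets also support the familiar operations from mathematics through operators. The sketch below uses two example sets. `sorted` is used only to print each result in a fixed order, since sets are unordered:

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

union = evens | small         # items in either set
intersection = evens & small  # items in both sets
difference = evens - small    # items in evens but not in small
symmetric = evens ^ small     # items in exactly one of the sets

print(sorted(union))         # [0, 1, 2, 3, 4, 6]
print(sorted(intersection))  # [0, 2]
print(sorted(difference))    # [4, 6]
print(sorted(symmetric))     # [1, 3, 4, 6]
```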

Suppose we want to remove `'a'` from `my_set` but don't know the syntax for it. 

Fortunately, there are built-in help features.

* In IPython/Jupyter, typing `my_set.` and pressing Tab lists the methods of `my_set`.
* `help(my_set)` lists the methods together with their documentation.

In [13]:
# Press Tab after the dot instead of running the cell;
# running the cell raises a SyntaxError
my_set.

SyntaxError: invalid syntax (<ipython-input-13-527143c3e54b>, line 1)

In [14]:
help(my_set)

Help on set object:

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |  
 |  Build an unordered collection of unique elements.
 |  
 |  Methods defined here:
 |  
 |  __and__(...)
 |      x.__and__(y) <==> x&y
 |  
 |  __cmp__(...)
 |      x.__cmp__(y) <==> cmp(x,y)
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x.
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __gt__(...)
 |      x.__gt__(y) <==> x>y
 |  
 |  __iand__(...)
 |      x.__iand__(y) <==> x&=y
 |  
 |  __init__(...)
 |      x.__init__(...) initializes x; see help(type(x)) for signature
 |  
 |  __ior__(...)
 |      x.__ior__(y) <==> x|=y
 |  
 |  __isub__(...)
 |      x.__isub__(y) <==> x-=y
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __ixor__(...)
 |      x.__ixor__(y) <==> x^=y
 |  
 |  __le__(...)
 |  ...
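The full help output, which is truncated here, also documents the `remove`, `discard`, and `add` methods. `remove` answers the question of how to delete `'a'`. A short sketch:

```python
my_set = set('aabbccda')  # {'a', 'b', 'c', 'd'}

my_set.remove('a')        # removes 'a'; raises KeyError if it is missing
print('a' in my_set)      # False

my_set.discard('e')       # like remove, but does nothing if 'e' is missing
my_set.add('e')           # sets are mutable, so items can be added too
print(sorted(my_set))     # ['b', 'c', 'd', 'e']
```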

## Dictionaries

These are *very* useful!

A list `l` of $n$ values stores them at integer positions, which are accessed as `l[i]` for $i = 0,\ldots,n-1$. 

A *dictionary* instead lets us access values by *keys* of more general types, such as strings. Looking up a key is fast (constant time on average). This helps when designing efficient algorithms and makes code simpler to write.

In [15]:
# Create a dictionary of produce and their prices
product_prices = {} 

# Add produce and prices to the dictionary
product_prices['apple'] = 2
product_prices['banana'] = 2
product_prices['carrot'] = 3

# View the dictionary
print product_prices


{'carrot': 3, 'apple': 2, 'banana': 2}


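Dictionaries support indexing and the `in` operator much like a list, except that both work on keys. `d[key]` looks up a value, and `key in d` checks whether a key is present.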

In [16]:
# Print the price of a banana
print product_prices['banana']

# Check if 'banana' is a key in the dictionary.
print 'banana' in product_prices

# Check if 'donut' is a key in the dictionary.
print 'donut' in product_prices

2
True
False
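Unlike `in`, indexing with a missing key raises a `KeyError`. The `get` method returns a default instead, which is `None` unless another default is given. A sketch using the same prices:

```python
product_prices = {'apple': 2, 'banana': 2, 'carrot': 3}

try:
    product_prices['donut']
except KeyError:
    print('donut is not in the dictionary')

print(product_prices.get('donut'))     # None
print(product_prices.get('donut', 0))  # 0
print(product_prices.get('apple', 0))  # 2
```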


Dictionaries allow us to access their keys and values directly.

In [17]:
# View the keys of the dictionary
produce = product_prices.keys()
print produce

['carrot', 'apple', 'banana']


In [18]:
# In Python 2, keys() returns a list (in Python 3, it returns a dict_keys view)
type(produce)

list

In [19]:
# Using list comprehensions, we can find all produce that
# have 6 characters in their name.
print [name for name in product_prices if len(name) == 6]

# Iterating over a dictionary iterates over its keys,
# so the following syntax is equivalent.
print [name for name in produce if len(name) == 6]

['carrot', 'banana']
['carrot', 'banana']


In [20]:
# We can find all produce that have a price of 2 dollars.
print [name for name in product_prices if product_prices[name] == 2]

['apple', 'banana']
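Instead of looking up `product_prices[name]` inside the comprehension, you can use `items()`, which yields `(key, value)` pairs that unpack directly. The equivalent sketch below sorts the result, since dictionary order is not guaranteed in Python 2:

```python
product_prices = {'apple': 2, 'banana': 2, 'carrot': 3}

# Unpack each (name, price) pair and keep the names priced at 2
cheap = sorted(name for name, price in product_prices.items() if price == 2)
print(cheap)  # ['apple', 'banana']
```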


In [21]:
# Similarly, we can access the values of the dictionary
print product_prices.values()

[3, 2, 2]
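The values can be passed straight to built-ins such as `sum`, `max`, or `set`. For example, with the prices above:

```python
product_prices = {'apple': 2, 'banana': 2, 'carrot': 3}

print(sum(product_prices.values()))          # 7: cost of one of each item
print(max(product_prices.values()))          # 3: the highest price
print(sorted(set(product_prices.values())))  # [2, 3]: the distinct prices
```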


Dictionary keys don't have to be strings. They can also be numbers, and a single dictionary can mix key types.

In [22]:
my_dict = {1: 5, 'abc': '123'}
print my_dict

{1: 5, 'abc': '123'}
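Keys must be hashable, which in practice means immutable types such as numbers, strings, and tuples. Tuples make convenient keys for grid coordinates, while lists are rejected. The coordinates below are a made-up example:

```python
# Map (row, col) grid coordinates to their Manhattan distance from (0, 0)
distance_from_origin = {(0, 0): 0, (0, 1): 1, (1, 1): 2}
print(distance_from_origin[(0, 1)])  # 1

try:
    bad = {[0, 1]: 1}  # a list is mutable, so it cannot be a key
except TypeError as err:
    print(err)
```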


Dictionaries can be created in several ways. We have seen two so far.

* Creating an empty dictionary with `{}`, then adding (key,value) pairs, one at a time.

* Creating a dictionary at once as `{key1:val1, key2:val2, ...}`

There are more ways to create dictionaries, each convenient in different situations. We will see two more:

* Dictionary comprehension
* Combining lists

In [23]:
# Create a dictionary with a key of i^2 and a value of i for each i in 0,...,9
squared_numbers = {i**2: i for i in range(10)}

print 81 in squared_numbers
print squared_numbers[81]

True
9
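Comprehensions can also swap keys and values, or keep only some pairs with an `if` clause. Swapping is safe here only because every value in `squared_numbers` is distinct. With duplicate values, later pairs would overwrite earlier ones.

```python
squared_numbers = {i**2: i for i in range(10)}

# Swap keys and values: map each i back to its square
number_to_square = {i: square for square, i in squared_numbers.items()}
print(number_to_square[9])  # 81

# Keep only even i by adding a condition
even_squares = {i: i**2 for i in range(10) if i % 2 == 0}
print(sorted(even_squares))  # [0, 2, 4, 6, 8]
```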


In [24]:
names = ['alice','bob','cindy']
sports = [['Archery', 'Badminton'], ['Archery', 'Curling'], ['Badminton', 'Diving']]

# Create a dictionary mapping names to sports
print {names[i]:sports[i] for i in range(len(names))}

{'bob': ['Archery', 'Curling'], 'alice': ['Archery', 'Badminton'], 'cindy': ['Badminton', 'Diving']}


In [25]:
# Alternative approach
print dict(zip(names,sports))

{'bob': ['Archery', 'Curling'], 'alice': ['Archery', 'Badminton'], 'cindy': ['Badminton', 'Diving']}
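`zip` pairs up the items of the two lists position by position, stopping at the shorter list. `dict` then turns those pairs into key-value entries. Printing the pairs makes this visible:

```python
names = ['alice', 'bob', 'cindy']
sports = [['Archery', 'Badminton'], ['Archery', 'Curling'], ['Badminton', 'Diving']]

# list() is needed in Python 3, where zip returns an iterator
pairs = list(zip(names, sports))
print(pairs[0])  # ('alice', ['Archery', 'Badminton'])

name_to_sports = dict(pairs)
print(name_to_sports['bob'])  # ['Archery', 'Curling']
```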


## Exercise
### Part 1
Obtain the list of common English words from the 'english_words.txt' file. 

### Part 2
Create a dictionary called `length_to_words` that maps an integer `i` to the list of common English words that have `i` letters.

Example: If the words were `['and','if','the']`, then the dictionary would be `{2: ['if'], 3: ['and','the']}`.

Question: Why is a dictionary the correct choice for this data structure?

### Part 3
Create a dictionary named `length_to_num_words` that maps each length in `length_to_words` to the number of words with that length.

Example: If the words were `['and','if','the']`, then the dictionary would be `{2: 1, 3: 2}`.

In [36]:
# Part 1
file_name = 'english_words.txt'
# `with` closes the file automatically when the block ends
with open(file_name) as f:
    words = [line.rstrip() for line in f]

In [42]:
# Part 2
word_lengths = set([len(word) for word in words])
length_to_words = {wl: [word for word in words if len(word) == wl] for wl in word_lengths}
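The comprehension above scans the whole word list once for every distinct length. An alternative sketch builds the same dictionary in a single pass with `dict.setdefault`. It uses the three example words from Part 2 instead of the file:

```python
words = ['and', 'if', 'the']  # the example words from Part 2

length_to_words = {}
for word in words:
    # setdefault inserts [] for a new length, then returns the stored list
    length_to_words.setdefault(len(word), []).append(word)

print(length_to_words[2])  # ['if']
print(length_to_words[3])  # ['and', 'the']
```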

In [44]:
# Part 3
length_to_num_words = {wl: len(length_to_words[wl]) for wl in word_lengths}
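The standard library's `collections.Counter`, a dictionary subclass, can also produce the counts directly from the words without building the lists first. A sketch on the same example words:

```python
from collections import Counter

words = ['and', 'if', 'the']  # the example words from Part 3

# Counter tallies how many times each length occurs
length_to_num_words = Counter(len(word) for word in words)
print(length_to_num_words[2])  # 1
print(length_to_num_words[3])  # 2
```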

In [45]:
length_to_num_words

{1: 26,
 2: 396,
 3: 672,
 4: 1125,
 5: 1382,
 6: 1509,
 7: 1466,
 8: 1166,
 9: 909,
 10: 610,
 11: 376,
 12: 209,
 13: 101,
 14: 39,
 15: 10,
 16: 3,
 18: 1}