# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
* [Iterable Data Structures](#Iterable-Data-Structures)
	* [Lists](#Lists)
		* [Adding items](#Adding-items)
		* [Concatenation and Arithmetic](#Concatenation-and-Arithmetic)
		* [Empty lists](#Empty-lists)
		* [Indexing](#Indexing)
	* [Sets](#Sets)
		* [Adding items](#Adding-items)
		* [Arithmetic](#Arithmetic)
	* [Dictionaries](#Dictionaries)
		* [Adding keys and values](#Adding-keys-and-values)
		* [Using .items()](#Using-.items%28%29)
		* [Useful member methods](#Useful-member-methods)
		* [len](#len)
		* [sum](#sum)
		* [sorted](#sorted)
		* [min and max](#min-and-max)
	* [Slicing](#Slicing)
* [Choosing Data Structures](#Choosing-Data-Structures)
	* [Nested data structures](#Nested-data-structures)
		* [Aside: Introspection](#Aside:-Introspection)


# Learning Objectives:

After completion of this module, learners should be able to:

* understand and apply Python idioms for iterating over data structures
* understand how to avoid writing *C in Python* loops

# Iterable Data Structures

One of the central tenets of Python philosophy is to *think about the data*. This is in contrast to thinking about the structure. This principle is most evident in working with *container* data structures like `dict`, `list`, `tuple` and `set`. These data structures are also called *iterables* because we can loop over them.

Python is different from many programming languages in that programs less often use the *index* positions of items in a sequence, and instead usually just loop through the items themselves. In some programming languages, such as C, either the only way or the most convenient way to access items in a collection is by naming their index position in the collection. Most Python loops never need to refer to an index at all.
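
To make the contrast concrete, here is a minimal sketch of the same loop written both ways. The list `colors` and the result names are made up for illustration and do not appear elsewhere in this module.

```python
# Hypothetical data, used only for illustration
colors = ["red", "green", "blue"]

# Index-based style, common in C-like languages
by_index = []
for i in range(len(colors)):
    by_index.append(colors[i].upper())

# Looping over the items themselves, with no index in sight
by_item = []
for color in colors:
    by_item.append(color.upper())

# When the position really is needed, enumerate() supplies it
numbered = list(enumerate(colors))

print(by_index == by_item)
print(numbered)
```

Both loops build the same list; the second simply never mentions a position.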

We'll use the word *Pythonic* to describe programs that follow idiomatic Python style, in which there is usually one obvious way to implement an algorithm. *Pythonic* coding can be applied at all levels of a program, not just iteration.

## Lists

Here the numeric types `int`, `float` and `complex` are stored in the list `types`. Unlike in many other languages, these numeric types are objects, not keywords, and can be treated like any other object.

In [None]:
# iterate over elements in lists or tuples
types = [int,float,complex]
for m_type in types:
    print(m_type(1))

### Adding items

New objects of any type can be *appended* to the list using the `.append()` member method. Here I'm appending the object `str`, which is used to make or cast strings.

In [None]:
types.append(str)
for m_type in types:
    print(type(m_type(1)),m_type(1))

### Concatenation and Arithmetic

Lists can be added (concatenated) to other lists with the `+` operator.

In [None]:
new_list = types + [list, set, dict]
for m_type in new_list:
    print(m_type)

### Empty lists

An empty list is generated using `[]`. Afterwards, items can be appended to the list.

In [None]:
another_list = []
another_list.append(42)
another_list.append('string')
another_list

### Indexing

Although not required in order to *iterate* over a `list`, we can index a list using `[]`. All indexing starts at zero. We'll use the `[]` notation again to perform slicing in a later section.

In [None]:
new_list[0]

In [None]:
new_list[2]

In [None]:
# we can use -1 to index the last item
new_list[-1]

In [None]:
# larger negative numbers count further back from the end
# list is 3rd from the end
new_list[-3]

By using the `*` operator a list can be repeated, that is, concatenated with itself a given number of times.

In [None]:
another_list = [0] * 5
for i in another_list:
    print(i)
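
One caveat is worth a small sketch (all names here are made up for illustration): `*` repeats references to the same objects rather than copying them. That is harmless for immutable items such as integers, but when the repeated item is mutable, every slot refers to the same object. The last part uses a list comprehension, which is one common way to get independent inner lists.

```python
# Repeating an immutable value is safe
zeros = [0] * 3
zeros[0] = 1              # only the first slot changes

# Repeating a mutable value repeats references to ONE inner list
shared = [[]] * 3
shared[0].append('x')     # the change is visible through every slot

# A list comprehension builds a separate inner list for each slot
separate = [[] for _ in range(3)]
separate[0].append('x')

print(zeros)
print(shared)
print(separate)
```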

## Sets

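Sets are also iterable. A `set` stores each distinct item only once and has no defined order, so the duplicates in the literal below are dropped.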

In [None]:
numbers = {3,4,3,2,5,8,9,2,3,4,0}
for n in numbers:
    print(n)

Notice that sets don't support indexing. The only way to index a set by integer position would be to first cast it to a list (`list(numbers)`), which in many cases is not necessary.

In [None]:
#this cell will produce a TypeError
numbers[0]

### Adding items

New items can be added to the `set` by using the `.add()` member function. Again, objects of different types can be added, as long as they are hashable (numbers, strings and tuples are; lists are not). Notice that I add the number `2.0` to the set, but I still only see one `2` in the output. This is because `2.0 == 2`, so the set treats them as the same item and keeps the integer that was already there.

In [None]:
numbers.add(2.0)
numbers.add(7)
numbers.add(2.1)
for n in numbers:
    print(n)
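
This behaviour can be checked directly. A set treats two items as the same when they compare equal and have the same hash, which is the case for `2` and `2.0`. A minimal sketch using a throwaway set named `demo` (not part of the example above):

```python
demo = {2}
demo.add(2.0)      # equal to 2 with the same hash, so nothing is added
demo.add(2.1)      # a genuinely new value

print(2 == 2.0, hash(2) == hash(2.0))
print(len(demo))

# The element that was there first is the one that is kept
kept_types = [type(item) for item in demo if item == 2]
print(kept_types)
```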

### Arithmetic

Sets support many operations from [Set Theory](https://en.wikipedia.org/wiki/Set_theory), such as union using the `|` operator. In this cell we are creating the union of names in `users` and `admin`. Note that `Fred` is an administrator, but not a user.

In [None]:
users = {'Dave', 'Bob', 'Alice', 'Doug'}
admin = {'Bob', 'Alice', 'Fred'}
users | admin

We can use the `-` operator to take the difference between two sets. `Dave` and `Doug` are not administrators. We'll see an interesting use of this operator in a later example.

In [None]:
users - admin
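
Union and difference are not the only operators available. As a short sketch on the same two sets, `&` gives the intersection, `^` gives the symmetric difference, and `<=` tests whether one set is a subset of another. The names `both` and `only_one` are introduced here for illustration.

```python
users = {'Dave', 'Bob', 'Alice', 'Doug'}
admin = {'Bob', 'Alice', 'Fred'}

both = users & admin          # intersection: names in both sets
only_one = users ^ admin      # symmetric difference: names in exactly one set
all_admins_are_users = admin <= users   # subset test

print(sorted(both))
print(sorted(only_one))
print(all_admins_are_users)
```

The subset test is `False` here because `Fred` is an administrator but not a user.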

## Dictionaries

Dictionaries provide a key-to-value mapping in which each unique `key`, usually a string, maps to exactly one `value` that can be of any type. The same `value` object, however, can be stored under multiple keys.

Dictionaries are iterable over the `keys`. `.keys()` returns a set-like object.

In [None]:
md = {'state':'MD', 'population':5.796e6}

Dictionaries are indexed using `[]` notation with the `key`.

In [None]:
md['state']

In [None]:
# iterate over the keys of a dictionary
for key in md:
    print(key,md[key])

In [None]:
# equivalent to cell above
for key in md.keys():
    print(key,md[key])

### Adding keys and values

A new `key`, `value` pair can be added simply by assigning it using the `[]` notation.

In [None]:
md['flower'] = 'Black-eyed Susan'
for key in md:
    print(key,md[key])
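
A few related operations are sketched below on a fresh copy of `md`: testing for a key with `in`, reading a key that may be missing with `.get()`, and storing one value under a second key. The keys `'capital'` and `'abbreviation'` are hypothetical and only serve the illustration.

```python
md = {'state': 'MD', 'population': 5.796e6}

has_state = 'state' in md                 # membership tests look at the keys
capital = md.get('capital')               # missing key gives None, not a KeyError
fallback = md.get('capital', 'unknown')   # or a default of our choosing

# The same value can be stored under more than one key
md['abbreviation'] = md['state']

print(has_state, capital, fallback)
print(md['abbreviation'])
```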

### Using .items()

The most convenient and *Pythonic* way to iterate over keys and values of a dictionary is by using the `.items()` member function. `dict.items()` returns a view object that yields a `(key, value)` tuple for each entry. In the `for` loop each tuple is assigned to the names `key` and `value` using tuple unpacking.

In [None]:
# An example of a for loop over dict.items(), 
# which unpacks into a (key, value) pair
goog = {'acquired': '2015-01-15',
        'broker': 'Roberto Cruz',
        'price': 521.78,
        'shares': 100,
        'symbol': 'GOOG'}
for key, value in goog.items():
    print(key.ljust(10), ":", value)

Remember to look at the `help` output of data structures for more useful functions. All of the tuple unpacking rules can be used for the loop variables of a `for ... in` statement.
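
As a sketch of what those rules allow in a `for` loop, the example below uses nested unpacking and starred unpacking. The records in `holdings` and `rows` are invented for illustration; only the `GOOG` figures echo the dictionary above.

```python
# Hypothetical records shaped as (symbol, (shares, price))
holdings = [('GOOG', (100, 521.78)), ('XYZ', (50, 10.5))]

totals = {}
# Nested unpacking mirrors the shape of each item
for symbol, (shares, price) in holdings:
    totals[symbol] = shares * price

# Starred unpacking collects "the rest" of each item into a list
rows = [('a', 1, 2, 3), ('b', 4, 5)]
rest_by_name = {}
for name, *rest in rows:
    rest_by_name[name] = rest

print(totals)
print(rest_by_name)
```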

### Useful member methods

There are several useful functions built in to Python that perform operations on container objects. Three of the most heavily used are `sum()`, `len()` and `sorted()`.

These three functions take the container object as input.

### len

`len` returns the number of elements in the container.

In [None]:
len(numbers)

In [None]:
len(types)

In [None]:
len(md.keys())

In [None]:
# this is equivalent to len(md.keys())
len(md)

### sum

In [None]:
# add all of the numbers from our set object
sum(numbers)

### sorted

`sorted` takes any iterable as input and returns a new `list` with the items in lexical or numeric order. It works on `lists`, `tuples`, `sets` and dictionary keys alike; a `set` itself has no defined order, so `sorted` is the way to get its items in order. Only objects that can be compared with each other using `<` can be sorted.

In [None]:
sorted(md.keys())

In [None]:
# we can mix integers and floats
more_numbers = [20, 423.2, -1.2, 2, 4, 3, 10, 8]
sorted(more_numbers)
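
In practice `sorted` accepts any iterable, including a set, and always hands back a new list. The sketch below also shows its two optional keyword arguments, `reverse` and `key`. The list `words` is made up for illustration.

```python
numbers = {3, 4, 3, 2, 5, 8, 9, 2, 0}

ascending = sorted(numbers)                  # a set goes in, a list comes out
descending = sorted(numbers, reverse=True)   # largest first

words = ['pear', 'fig', 'banana']            # hypothetical data
by_length = sorted(words, key=len)           # order by the result of len()

print(ascending)
print(descending)
print(by_length)
```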

### min and max

The `min()` and `max()` functions can take as input any container whose values can be compared with each other.

In [None]:
max(numbers)

In [None]:
# mixed integers and floats
min(more_numbers)

## Slicing

*Slicing* is a core concept for Python sequences such as lists, tuples and strings. Slicing allows direct access to subsets of a sequence that would only be possible through looping operations in some other languages.

We'll begin by using `range` to make a list of integers.

In [None]:
# here is a list of the integers 0 - 9
integers = list(range(10))
integers

A slice is a pair of index positions, *start* and *end*, separated by a `:` in the `[]` notation we saw earlier. This allows us to select a contiguous region of the list starting from the *start* position and ending one element before the *end* position.

For non-negative positions that lie within the list, the difference between *end* and *start* is the number of elements returned.

**Remember**: the last element returned by the slice is the one at position `end - 1`.

In [None]:
# slice the first three elements
integers[0:3]

In [None]:
integers[2:6]

In [None]:
integers[4:-2]
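
The two rules above can be verified with a short sketch. The names `start`, `end` and `piece` are introduced here for illustration.

```python
integers = list(range(10))

start, end = 2, 6
piece = integers[start:end]

print(piece)
print(len(piece) == end - start)        # number of elements is end - start
print(piece[-1] == integers[end - 1])   # last element sits at position end - 1

# A negative end counts from the back: -2 is position 8 in a list of 10
same_region = integers[4:-2] == integers[4:8]
print(same_region)
```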

We can change the *stride* length with a third integer in our slice. The default stride is `1`.

In [None]:
# return the even positions starting at 0 through position 5
integers[0:6:2]

If we leave an element of the slice blank, it defaults to the beginning of the list for *start*, the end of the list for *end*, and `1` for the stride.

In [None]:
# slice every 3 elements starting at position 2 to the end
integers[2::3]

In [None]:
# start at 0 and slice every 2 elements up to position 7
integers[:7:2]

In [None]:
# stride by 4 from start to finish
integers[::4]

Finally, since we have the ability to index backwards starting at `-1` we can easily reverse a list and do other interesting slices by specifying a negative stride.

In [None]:
# with a negative stride the default start and end are swapped
integers[::-1]

In [None]:
# even positions from 8 to 2 backwards by 2
integers[-2:-10:-2]
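
Slices are not limited to lists. Other sequence types, such as strings and tuples, accept the same notation and return an object of their own type. A brief sketch with made-up values:

```python
word = 'Python'
point = (1, 2, 3, 4)

first_three = word[:3]       # slicing a string returns a string
reversed_word = word[::-1]   # a negative stride reverses it
every_other = point[::2]     # slicing a tuple returns a tuple

print(first_three)
print(reversed_word)
print(every_other)
```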

# Choosing Data Structures

The choice of data structures to express a programming problem has a direct impact on the available options for iteration. Our goal is to choose a data structure that makes iteration easy to read and understand.

One way of thinking about this is to *let the data structure iterate itself*. This approach almost always leads to more readable code and it is what experienced Python programmers expect to see when reading new code.

Let's work through an example data set using the 2016 Federal Budget. The raw data is available from the White House on [GitHub](https://github.com/WhiteHouse/2016-budget-data). Five individual lists of the same length have been created to store contributions to the US Federal Budget from 2010 to 2016.

In [None]:
# Total Federal budget from 2010 to 2016
federal=[3457079000., 3603059000., 3536951000., 3454647000., 3506089000., 3758577000., 3999467000.]

# Subfunction Titles
# Education (elementary through higher education)
edu=[ 94169000.,  67584000.,  59605000.,  41882000.,  60917000., 104189000., 69939000.]

# Basic Research
research=[11730000., 12434000., 12458000., 12479000., 12011000., 12271000., 12824000.]

# Social Security
soc=[706737000., 730811000., 773290000., 813551000., 850533000., 896294000., 944338000.]

# Military
defense=[666703000., 678064000., 650851000., 607795000., 577897000., 567703000., 586479000.]

We might want to know in which years combined education and research spending reached 3% of the total budget. To do that we need another list to store the years.

In [None]:
# We will also need a separate list for the years
years=[2010,2011,2012,2013,2014,2015,2016]

In [None]:
for i in range(len(years)):
    percent = (edu[i] + research[i])/federal[i] * 100
    if percent >= 3:
        print(years[i],percent)

The above choice of data structures and looping can be classified as *writing C in Python.* This is **not recommended** and we'll see a more *Pythonic* way in the next section.

## Nested data structures

Instead of keeping track of 6 separate list objects and requiring that the developer *remember* that they are indexed from 2010 to 2016 we can nest a collection of dictionaries in a list. Our goal is to create a single data structure that can iterate itself without any prior knowledge of the contents or reasons for choosing the data structure.

Here we're introducing `zip`, which takes an arbitrary number of container objects and iterates them collectively. Using `zip` is considered a more *Pythonic* way of iterating in lock-step through multiple container objects at once rather than using a positional indexer.

In [None]:
# Each entry of the list is the yearly budget amounts
budget = []
for year, f, r, s, e, d in zip(years,federal,research,soc,edu,defense):
    # Each year is a dictionary
    budget.append({
        'year':year,
        'federal':f,
        'research':r,
        'soc':s,
        'edu':e,
        'defense':d
    })
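
To see what `zip` produces on its own, here is a minimal sketch on the first three years of the data. One behaviour to keep in mind is that `zip` stops at the shortest input without raising an error, which is why lists of equal length matter here. The names `pairs` and `short` are introduced for illustration.

```python
years = [2010, 2011, 2012]
edu = [94169000., 67584000., 59605000.]

# Each step of zip yields one tuple holding the next item from every input
pairs = list(zip(years, edu))

# zip stops at the shortest input, silently dropping the extra year
short = list(zip(years, [1, 2]))

print(pairs)
print(short)
```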

`budget` is now a list of dictionaries and we can use this single data structure to determine the years where research and education spending reached at least 3% of the total.

In [None]:
for entry in budget:
    percent = (entry['edu'] + entry['research'])/entry['federal'] * 100
    if percent >= 3:
        print(entry['year'],percent)
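
The same question can also be answered with list comprehensions. The following is one possible sketch, not the only idiomatic form. It rebuilds a reduced `budget` with just the keys it needs, and the helper `share` and the name `high_years` are hypothetical names chosen for this example.

```python
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
federal = [3457079000., 3603059000., 3536951000., 3454647000.,
           3506089000., 3758577000., 3999467000.]
edu = [94169000., 67584000., 59605000., 41882000.,
       60917000., 104189000., 69939000.]
research = [11730000., 12434000., 12458000., 12479000.,
            12011000., 12271000., 12824000.]

# Build the list of dictionaries in a single expression
budget = [
    {'year': y, 'federal': f, 'edu': e, 'research': r}
    for y, f, e, r in zip(years, federal, edu, research)
]

def share(entry):
    """Education plus research as a percentage of the federal total."""
    return (entry['edu'] + entry['research']) / entry['federal'] * 100

# Keep only the years whose share reaches 3%
high_years = [entry['year'] for entry in budget if share(entry) >= 3]
print(high_years)
```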

### Aside: Introspection

With data structures like `budget` a Python programmer can use idiomatic Python syntax to inspect the object. The programmer wants to know what type the container is and what it contains. In this way the programmer is letting the data teach him or her about itself. No prior knowledge or assumptions were required.

In [None]:
# What is budget
type(budget)

In [None]:
# What does it contain
for thing in budget:
    print(type(thing))

In [None]:
# Dictionaries generally have keys that mean something.
for thing in budget:
    print(thing.keys())

In [None]:
# All of the dictionaries look the same.
# What do they contain?
for key,value in budget[0].items():
    print("%s: %s" % (key,type(value)))

By running these few lines of code, which could easily be done in a REPL or Jupyter notebook, the programmer now knows that `budget` is a list of dictionaries that contain numerical values for the same set of 6 keys. The best part is that dictionary keys are self-documenting and meaning can be derived from them.
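
These checks can be gathered into a small helper. The function `describe` below is a hypothetical convenience written for this aside, not a standard library tool, and `sample` is a cut-down stand-in for `budget`.

```python
def describe(container):
    """Summarise a list of dictionaries: outer type, item types, key sets."""
    item_types = {type(item).__name__ for item in container}
    # Iterating a dict yields its keys; a frozenset of them can live in a set
    key_sets = {frozenset(item) for item in container}
    return type(container).__name__, item_types, key_sets

# A cut-down stand-in for budget
sample = [{'year': 2010, 'edu': 94169000.},
          {'year': 2011, 'edu': 67584000.}]

outer, inner, keys = describe(sample)
print(outer)
print(inner)
print(len(keys))   # 1 means every dictionary has the same keys
```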