## Iterables

Iterables are objects you can loop over, and they are fundamental in data processing.

## The Python `list`

`[]`

We can use the Python built-in function `range` to make a list of numbers:

In [None]:
x = list(range(100000))

x[:10]

We can see how many items are in our list using the Python built-in `len`:

In [None]:
len(x)

We can see if an item is in our list:

In [None]:
42 in x

A Python list can hold objects of different types:
- this gives the programmer flexibility
- the cost is extra memory, because the list stores a reference to each object rather than the values themselves

Lists also reserve space for more objects than they currently hold, so that appending stays cheap.
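A minimal sketch of that over-allocation, assuming the CPython interpreter (the exact sizes are an implementation detail and differ between versions). The names `items` and `sizes` are made up for this illustration:

```python
import sys

items = []
sizes = []
for n in range(20):
    items.append(n)
    # record the size of the list object itself after each append
    sizes.append(sys.getsizeof(items))

# the size grows in occasional jumps, not on every append,
# because spare capacity is reserved ahead of time
print(sizes)
print(len(set(sizes)), 'distinct sizes for', len(sizes), 'appends')
```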

Another common Python iterable is the numpy array, which will use less memory:
- objects being the same size means you can lay out memory more efficiently

https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

In [None]:
import sys

sys.getsizeof(x)

In [None]:
import numpy as np

sys.getsizeof(np.array(x))
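One caveat when reading the two numbers above: `sys.getsizeof` on a list counts only the list object and its references, not the integer objects those references point to. The sketch below adds the elements back in to get a rough total. It is an approximation (CPython shares small integers between references), and the names `values`, `arr`, `list_container`, `list_total` and `array_data` are made up for this illustration:

```python
import sys

import numpy as np

values = list(range(1000))
arr = np.array(values)

# size of the list object and its references only
list_container = sys.getsizeof(values)

# rough total: add the size of each int object the list refers to
list_total = list_container + sum(sys.getsizeof(v) for v in values)

# size of the array's data buffer: one fixed-size slot per element
array_data = arr.nbytes

print(list_container, list_total, array_data)
```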

Numpy arrays are also quicker due to the operations being written in C:
- numpy = C with Python bindings
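A rough way to check this on your own machine is to time the same operation both ways. This is only a sketch: the timings depend on hardware and array size, so no particular speed-up is promised. The names `values`, `arr`, `list_time` and `arr_time` are made up for this illustration:

```python
import timeit

import numpy as np

values = list(range(10000))
arr = np.array(values)

# the same doubling operation, once in a Python loop and once in numpy
doubled_list = [v * 2 for v in values]
doubled_arr = arr * 2

# time 100 repetitions of each
list_time = timeit.timeit(lambda: [v * 2 for v in values], number=100)
arr_time = timeit.timeit(lambda: arr * 2, number=100)

print(f'list comprehension: {list_time:.4f}s, numpy: {arr_time:.4f}s')
```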

### Iterating over a list

In [None]:
x = [2, 4, 6]

y = []
for item in x:
    y.append(item * 2)
    
y

## List comprehensions

Another example of being Pythonic.  The list comprehension will **return a new list**.

Don't worry if the list comprehension syntax isn't immediately intuitive - you will get it eventually :)

In [None]:
y = [item * 2 for item in x]
y

We can put a conditional inside the list comprehension:

In [None]:
y = [item * 2 for item in x if item == 4]
y

## Iteration turns integral into a sum

In [None]:
from scipy.integrate import quad

def integrand(x):
    return x**2

ans, err = quad(integrand, 0, 1)
ans

In [None]:
import numpy as np

x = np.linspace(0, 1, 1000000)

step = x[1] - x[0]

f = [integrand(v) for v in x]

area = sum([step*v for v in f])
area
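Written out, the cell above is a Riemann sum. With $N$ evenly spaced points $x_i$ on $[0, 1]$ and step $\Delta x = x_1 - x_0$, the integral computed by `quad` is approximated by adding up one rectangle per point:

$$
\int_0^1 x^2 \, dx = \frac{1}{3} \approx \sum_{i=0}^{N-1} f(x_i) \, \Delta x, \qquad f(x) = x^2
$$

Here $f$ is `integrand`, $\Delta x$ is `step`, and the sum is the `sum` over the list comprehension.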

## Common patterns with looping

Appending to an empty list:

In [None]:
from random import gauss

data = []
for _ in range(5):
    data.append(gauss(0, 1))

data

Appending dicts to lists and making a pandas `DataFrame`:

In [None]:
from random import random as uniform

import pandas as pd

data = []
for _ in range(5):
    data.append(
        {'standard-normal': gauss(0, 1),
         'uniform': uniform()}
    )

for d in data:
    print(d)
pd.DataFrame(data)

In [None]:
data = {'standard-normal': [], 'uniform': []}
for _ in range(5):
    data['standard-normal'].append(gauss(0, 1))
    data['uniform'].append(uniform())

    print(data)
    
#pd.DataFrame(data)
data = pd.DataFrame(data, index=list(range(5)))

In [None]:
data

In [None]:
data.shape

In [None]:
for n in range(data.shape[1]):
    print(data.iloc[:, n])
    print(' ')

## `zip()`

Looping over two things at the same time:

In [None]:
f = list(range(0, 6))
s = list(range(6, 12))

assert len(f) == len(s)

for first, second in zip(f, s):
    print(first, second)
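The `assert` above matters because `zip` stops at the end of the shortest iterable without raising an error. A small sketch with two made-up lists, `short` and `long`:

```python
short = [1, 2, 3]
long = ['a', 'b', 'c', 'd', 'e']

# zip stops once the shorter list runs out; 'd' and 'e' are silently dropped
pairs = list(zip(short, long))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```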

## `enumerate()`

`enumerate` gives us an integer index as we iterate:

In [None]:
x = list(range(100, 105))

for idx, item in enumerate(x):
    print(idx, item)

We can also start the index at a value other than zero:

In [None]:
for idx, item in enumerate(x, 2):
    print(idx, item)

In [None]:
data

## List algebra

In Python we can do interesting things with list addition & multiplication:

In [None]:
data = [
    0, 1, 0, 1, 1
]

data * 2

In [None]:
data + data

## Exercise

Create a **Cartesian product** - all the combinations between two lists:

In [None]:
colors = ['white', 'black']
sizes = ['small', 'medium', 'large']

## Indexing

Python uses **zero-based indexing**.

Index the first element at `0`:

In [None]:
x = list(range(100000))

x[0]

And the last at `-1`:

In [None]:
x[-1]

## Slicing

We can select slices using similar notation:

In [None]:
x[4:8]
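A slice `start:stop` includes `start` and excludes `stop`, and either end can be left out. A short sketch on a smaller list so the results are easy to check by eye (the variable names are made up for this illustration):

```python
x = list(range(10))

middle = x[4:8]       # [4, 5, 6, 7] - index 8 is excluded
first_three = x[:3]   # [0, 1, 2]
last_two = x[-2:]     # [8, 9]
every_other = x[::2]  # [0, 2, 4, 6, 8] - a third value sets the step
```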

## Strings are iterables

We can slice them:

In [None]:
gita = 'The ignorant work for their own profit, Arjuna the wise work for the welfare of the world, without thought for themselves - KRISHNA'

gita[:38]

We can also add them together:

In [None]:
bohr = 'Prediction is very difficult, especially if it is about the future - NIELS BOHR'

quotes = gita + ', ' + bohr
quotes

The above is a CSV (comma-separated values) string:

In [None]:
import csv

list(csv.reader([quotes]))

Above we can see a problem: the quotes themselves contain commas, so the reader splits them into more than two fields.
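One way around the comma problem is to let the `csv` module write the row, since `csv.writer` wraps any field containing a comma in double quotes. A minimal sketch using an in-memory buffer and two made-up fields:

```python
import csv
import io

fields = ['one, with a comma', 'two']

# joining by hand loses the field boundaries
naive = fields[0] + ', ' + fields[1]
naive_rows = list(csv.reader([naive]))

# csv.writer quotes the field that contains a comma
buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
line = buffer.getvalue()
print(line)

# reading the quoted line back recovers the two original fields
rows = list(csv.reader(io.StringIO(line)))
print(rows)
```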

## Writing data to files

First - see [using an API]()
- assuming knowledge of context management

We can use Python's `open` built-in to write to a file:

In [None]:
quotes = [bohr, gita]
with open('./quotes.txt', 'w') as dump:
    for line in quotes:
        dump.write(line)
        dump.write('\n')

Run bash commands to print the file (`cat`) and then remove it (`rm`):

In [None]:
!cat quotes.txt
!rm quotes.txt
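The `!` commands rely on a shell that provides `cat` and `rm`. A pure-Python sketch of the same round trip is below. It writes into a temporary directory so that nothing is left behind, and the two lines of text are made up for this illustration:

```python
import os
import tempfile

lines = ['first line', 'second line']

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'quotes.txt')

    # write one line at a time, as in the cell above
    with open(path, 'w') as dump:
        for line in lines:
            dump.write(line + '\n')

    # read the file back, stripping the trailing newline from each line
    with open(path) as source:
        read_back = [line.rstrip('\n') for line in source]

    # remove the file (the equivalent of rm)
    os.remove(path)
    still_there = os.path.exists(path)

print(read_back)
```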