# 1. Iterables and iterators

## What are iterables?


- Any object that you can loop over with a for loop
- Any object that can return its members one at a time
- For instance: lists, tuples, sets, dictionaries, strings, files, generators, etc.

In [8]:
x = ['Anna',8]
for item in x:
    print(item)

Anna
8


## What happens in a for loop?

In a for loop, the `iter()` function is called on the iterable object to obtain an iterator, and then the `next()` function is called on that iterator until a `StopIteration` exception is raised.

In [9]:
x = ['Anna', 28]
x_iterator = iter(x)
while True:
    try:
        print(next(x_iterator))
    except StopIteration:
        break

Anna
28


Let's look at this process step by step:

- Each iterable object has a "double underscore" method, or dunder method, called `__iter__()` that **returns an iterator object**.
- Alternatively, we can use the `iter()` function. Here, it calls the `__iter__()` method of the list.

In [10]:
x_iterator = x.__iter__() 
x_iterator = iter(x)        # Same as above, but more common
type(x_iterator)

list_iterator

- Each **iterator object** has a dunder method called `__next__()` that **returns the next item in the sequence.**
- Alternatively, we can use the `next()` function.
- If we reach the end of the sequence, it raises a `StopIteration` exception.
- A for loop catches this exception automatically and stops iterating, so we never see it there.

In [11]:
x = ['Anna', 28]
x_iterator = iter(x)
print(next(x_iterator))
print(next(x_iterator))
print(next(x_iterator))

Anna
28


StopIteration: 
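
To make the protocol concrete, here is a minimal sketch of a hand-written iterator. The `Countdown` class is a hypothetical name introduced only for illustration; it implements `__iter__()` and `__next__()` itself, so a for loop can use it like any built-in iterator.

```python
class Countdown:
    """Hypothetical example class: counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more items
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(iter(countdown) is countdown)   # True
for number in countdown:
    print(number)                     # 3, then 2, then 1
print(list(countdown))                # [] because the iterator is exhausted
```

Note that an iterator can only be consumed once. A list, by contrast, hands out a fresh iterator every time `iter()` is called on it.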

# 2. Generators

## Motivation

**What happens if the iterable is huge**, e.g. a list of 1 billion numbers?

- It will take up a lot of memory, or maybe not even fit in memory
- It will take a long time to generate the list

**Generators are a helpful alternative to lists in this case.**

- Instead of storing all items in memory, generators produce items one at a time.
- Generators only compute values "on the fly" when needed (_lazy evaluation_).

In [12]:
def square_function(numbers):
    result = []
    for number in numbers:
        result.append(number**2)
    return result

In [21]:
def square_generator(numbers):
    for number in numbers:
        yield number**2

- `square_function` first creates a list of squares and then **returns** the list. If the list is huge, it will take a long time to generate and consume a lot of memory.
- `square_generator` is a generator function. It **yields** the square of each number one at a time. It doesn't store the squares in memory and only computes the next square when it is needed. To get the next square, we call `next()` on the generator object.

In [22]:
results = square_function(range(10))
results

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [23]:
my_generator = square_generator(range(10))
print(next(my_generator))
print(next(my_generator))

0
1
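
The lazy behaviour becomes visible if the generator reports each computation. The following sketch uses a hypothetical variant called `noisy_squares` (not part of the original notebook) that prints a message right before it yields a value.

```python
def noisy_squares(numbers):
    # Hypothetical variant of square_generator that logs each computation
    for number in numbers:
        print(f"computing {number}**2")
        yield number**2


squares = noisy_squares(range(3))   # nothing is printed yet
first = next(squares)               # prints "computing 0**2"
rest = list(squares)                # computes the remaining squares
leftover = list(squares)            # the generator is exhausted now

print(first, rest, leftover)        # 0 [1, 4] []
```

Creating the generator object runs none of the function body. Each value is computed only when it is requested, and once all values have been consumed the generator stays empty.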


## Generator functions

- Generator functions are functions that use the `yield` keyword, typically inside a loop
- Multiple `yield` statements are allowed
- Generator functions are therefore suitable for more complex scenarios compared to generator expressions

In [24]:
def transform(numbers):
    for number in numbers:
        yield number+1

    for number in numbers:
        yield number*100

my_generator = transform([1,2,3])
for item in my_generator:
    print(item)

2
3
4
100
200
300


## Generator expressions

- The syntax of generator expressions is similar to that of list comprehensions, but with parentheses instead of square brackets
- Generator expressions are more concise and easier to read than generator functions
- Generator expressions are suitable for simple scenarios

In [25]:
# List comprehension vs Generator expression
x = [number**2 for number in range(10)]
y = (number**2 for number in range(10))
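
To get a rough feeling for the difference, one can compare the size of both objects with `sys.getsizeof()`. This is only a sketch: the exact numbers depend on the Python version, and `sys.getsizeof()` measures the container itself, not the items it refers to. The variable names are chosen for illustration.

```python
import sys

squares_list = [number**2 for number in range(100_000)]
squares_gen = (number**2 for number in range(100_000))

print(sys.getsizeof(squares_list))  # grows with the number of items
print(sys.getsizeof(squares_gen))   # small, independent of the range size

# A generator expression can be passed directly to functions like sum()
total = sum(squares_gen)
print(total == sum(squares_list))   # True
```

After the call to `sum()`, the generator expression is exhausted, whereas the list can be iterated over again.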

## Use Cases

### A: Streaming Data

In [27]:
import time
import requests

def iss_position_stream():
    url = "http://api.open-notify.org/iss-now.json"
    while True:
        response = requests.get(url)
        if response.status_code == 200:
            data = response.json()
            position = data['iss_position']
            latitude = position['latitude']
            longitude = position['longitude']
            yield (latitude, longitude)
        else:
            yield None
        time.sleep(2) 


In [None]:
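stream = iss_position_stream()
for i, position in enumerate(stream):
    if position is None:   # the request failed, so there is nothing to unpack
        print("ISS position unavailable")
    else:
        print(f"ISS Position: Latitude {position[0]}, Longitude {position[1]}")
    if i == 3:
        break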

### B: Working with huge data

- **Pandas can read data in chunks.**
- **The chunk reader behaves much like a generator:** with `chunksize` set, `pd.read_csv()` returns an iterator that loads only one chunk of data into memory at a time. We can call `next()` on it to load the next chunk, or use a for loop to iterate over all chunks.
- This is useful when working with huge datasets that don't fit into memory.

In [29]:
import pandas as pd

First, we read the entire movie data set at once and then select the movies from before 1910. If this were a huge data set, it would take a long time to load or might not even fit into memory.

In [31]:
movies = pd.read_csv('movies.csv')
movies['year'] = pd.to_numeric(movies.title.str[-5:-1], errors='coerce')
movies[movies.year < 1910]

Unnamed: 0,movieId,title,genres,year
869,82337,Four Heads Are Better Than One (Un homme de tê...,Fantasy,1898.0
1614,129851,Dickson Greeting (1891),(no genres listed),1891.0
1830,32898,"Trip to the Moon, A (Voyage dans la lune, Le) ...",Action|Adventure|Fantasy|Sci-Fi,1902.0
5716,140545,Fantasmagorie (1908),(no genres listed),1908.0
6415,148705,A Hand Shake (1892),(no genres listed),1892.0
6423,98981,"Arrival of a Train, The (1896)",Documentary,1896.0
8810,94951,Dickson Experimental Sound Film (1894),Musical,1894.0
8981,117909,The Kiss (1900),Romance,1900.0


Now, we read the data set in chunks. Each chunk fits comfortably into memory. For each chunk, we select the movies from before 1910 and then append the filtered rows to a new file. This way we can work with huge data sets that don't fit into memory.

In [32]:
chunks = pd.read_csv('movies.csv', chunksize=10)      # The small chunk size is just for demonstration purposes
for chunk in chunks:
    chunk['year'] = pd.to_numeric(chunk.title.str[-5:-1], errors='coerce')
    chunk = chunk[chunk.year < 1910]
    chunk.to_csv('movies_before_1910.csv', mode='a', header=False)
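
The idea behind chunked reading can also be sketched in plain Python with a generator function. This is not how pandas implements it; `read_in_chunks` is a hypothetical helper, and the sample data is made up so that the example runs without a file.

```python
import csv
import io
from itertools import islice


def read_in_chunks(file_obj, chunk_size):
    """Hypothetical helper: yield lists of at most `chunk_size` CSV rows."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return   # no rows left, so the generator ends
        yield chunk


# Made-up sample data standing in for a large file
sample = io.StringIO("title,year\nA,1895\nB,1950\nC,1902\nD,1999\nE,1908\n")

old_titles = []
for chunk in read_in_chunks(sample, chunk_size=2):
    # Only the current chunk is held in memory at this point
    old_titles.extend(row["title"] for row in chunk if int(row["year"]) < 1910)

print(old_titles)   # ['A', 'C', 'E']
```

Only `chunk_size` rows are materialised at any time, so memory usage stays bounded regardless of how long the input is.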

### C: Infinite Iterators

In [33]:
def count():
    i = 0
    while True:
        yield i
        i += 1

In [34]:
for number in count():
    print(number)
    if number > 10:
        break

0
1
2
3
4
5
6
7
8
9
10
11
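
The standard library already ships an infinite counter, `itertools.count()`, and `itertools.islice()` can cut a finite piece out of any iterator without a manual `break`. A short sketch (the variable names are chosen for illustration):

```python
import itertools

# itertools.count() is an infinite iterator: 0, 1, 2, ...
first_five = list(itertools.islice(itertools.count(), 5))
print(first_five)   # [0, 1, 2, 3, 4]

# A start value and a step size can be passed as arguments
evens = list(itertools.islice(itertools.count(start=0, step=2), 4))
print(evens)        # [0, 2, 4, 6]
```

Using `import itertools` rather than `from itertools import count` avoids shadowing the `count()` generator function defined above.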
