# Chapter 9. Python toolbox

# 9.1 Using iterators in PythonLand

## Introduction to iterators

We can loop over objects such as lists, strings and range objects because they are iterables.

- An iterable is an object that has an associated ``iter()`` method (the special method ``__iter__()``).
    - Applying ``iter()`` to an iterable creates an iterator object.
- An iterator is an object that has an associated ``next()`` method (the special method ``__next__()``) that produces consecutive values.
    - To create an iterator from an iterable, pass the iterable to the function ``iter()``.
- Once the iterator is defined, passing it to the function ``next()`` returns the first value.
    - Calling ``next()`` again returns the next value, and so on; once no values are left, it raises a ``StopIteration`` exception.

```
word = 'Da'
it = iter(word)

next(it)
# result = 'D'

next(it)
# result = 'a'

next(it)
# StopIteration error displayed
```
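As a rough sketch of how these pieces fit together, a ``for`` loop can be thought of as calling ``iter()`` once and then ``next()`` repeatedly until ``StopIteration`` is raised. The variable name ``collected`` is only illustrative.

```python
# Manual equivalent of: for letter in word: collected.append(letter)
word = 'Da'
it = iter(word)              # ask the iterable for an iterator
collected = []

while True:
    try:
        letter = next(it)    # fetch the next value
    except StopIteration:    # raised once the iterator is exhausted
        break
    collected.append(letter)

print(collected)
# ['D', 'a']
```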

### Iterating at once with ``*``

- You can also print all values of an iterator in one fell swoop using the star operator, referred to as the splat operator in some circles.
- This star operator unpacks all elements of an iterator or an iterable.
- Be warned, however, once you do so, you cannot do it again as there are no more values to iterate through. We would have to redefine our iterator to do so.

```
word = 'Data'
it = iter(word)

print(*it)
# result
# D a t a

print(*it)
# Prints an empty line: there are no more values to go through!
```
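A small sketch of the exhaustion behaviour described above: once an iterator has been consumed it yields nothing more, and calling ``iter()`` on the original iterable gives a fresh one. The names ``first_pass`` and ``second_pass`` are illustrative.

```python
word = 'Data'
it = iter(word)

first_pass = list(it)     # consumes every value
second_pass = list(it)    # the same iterator is now exhausted

print(first_pass)
# ['D', 'a', 't', 'a']
print(second_pass)
# []

it = iter(word)           # a fresh iterator starts from the beginning
print(*it)
# D a t a
```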

### Iterating over dictionaries

- Dictionaries and file connections are iterables as well.
- To iterate over the key-value pairs of a Python dictionary, we apply the ``items()`` method to the dictionary and unpack each pair in the loop.

```
pythonistas = {'hugo':'bowne-anderson', 'francis':'castro'}
for key, value in pythonistas.items():
    print(key, value)
# results printed below
# hugo bowne-anderson
# francis castro
```
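For comparison, here is a short sketch of what the dictionary yields without ``items()``: iterating over the dictionary itself produces only its keys, and ``values()`` produces only its values.

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

keys = [key for key in pythonistas]    # iterating the dict itself yields keys
values = list(pythonistas.values())    # values() yields only the values

print(keys)
# ['hugo', 'francis']
print(values)
# ['bowne-anderson', 'castro']
```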

### Iterating over file connections

- File connections are iterables as well.

```
# The with statement closes the file when the block ends
with open('file.txt') as file:
    it = iter(file)

    print(next(it))
    # This is the first line

    print(next(it))
    # This is the second line
```
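Because a file connection is an iterable, it can also be used directly in a ``for`` loop. The sketch below writes a small temporary file first so that it is self-contained; the temporary file and the name ``lines`` are assumptions made for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('This is the first line\nThis is the second line\n')
    path = tmp.name

lines = []
with open(path) as file:              # closed automatically on exit
    for line in file:                 # the file yields one line at a time
        lines.append(line.rstrip('\n'))

os.remove(path)                       # clean up the throwaway file
print(lines)
# ['This is the first line', 'This is the second line']
```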

### Iterators as function arguments

- There are also functions that take iterators and iterables as arguments. For example, the list() and sum() functions return a list and the sum of elements, respectively.

```
# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)
```

## Playing with iterators

### 1. Using enumerate()

- ``enumerate()`` is a function that takes any iterable as argument, such as a list, and returns a special enumerate object, which consists of pairs containing the elements of the original iterable, along with their index within the iterable.
- We can use the function list to turn this enumerate object into a list of tuples and print it to see what it contains.

```
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)

print(type(e))
# <class 'enumerate'>

e_list = list(e)
print(e_list)
# [(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]
```

### 2. enumerate() and unpack

- The enumerate object is itself an iterable, so we can loop over it and unpack its elements with the clause ``for index, value in enumerate(iterable)``.
- By default, ``enumerate()`` begins indexing at 0.
    - However, you can alter this with a second argument, ``start``.

```
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers):
    print(index, value)
# 0 hawkeye
# 1 iron man
# 2 thor
# 3 quicksilver
```

```
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers, start=10):
    print(index, value)
# 10 hawkeye
# 11 iron man
# 12 thor
# 13 quicksilver
```

### 3. Using ``zip()``

- ``zip()`` accepts an arbitrary number of iterables and returns an iterator of tuples.

```
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)

print(type(z))
# <class 'zip'>

z_list = list(z)
print(z_list)
# [('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
```
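Two further behaviours of ``zip()`` are worth a quick sketch: it stops as soon as the shortest input runs out, and combining it with the star operator "unzips" a list of tuples. The shortened ``names`` list and the names ``pairs``, ``heroes`` and ``surnames`` are illustrative.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson']    # one name short on purpose

pairs = list(zip(avengers, names))        # stops after the shortest input
print(pairs)
# [('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson')]

heroes, surnames = zip(*pairs)            # "unzip" with the star operator
print(heroes)
# ('hawkeye', 'iron man', 'thor')
print(surnames)
# ('barton', 'stark', 'odinson')
```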

### 4. ``zip()`` and unpack

- We could use a for loop to iterate over the zip object and print the tuples.

```
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
for z1, z2 in zip(avengers, names):
    print(z1,z2)

# hawkeye barton
# iron man stark
# thor odinson
# quicksilver maximoff
```

### 5. Print zip with ``*``

- We could also have used the splat operator to print all the elements!

```
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)
print(*z)

# ('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
```

## Using iterators to load large files into memory

Let's say that you are pulling data from a file, database or API and there's so much of it, just so much data, that you can't hold it in memory.

- One solution is to load the data in chunks: perform the desired operation or operations on each chunk, store the result, discard the chunk and then load the next chunk.
- To surmount this challenge, we are going to use the pandas function ``read_csv()``, which provides an option to load data in chunks and iterate over them.
- All we need to do is specify the chunk size using the argument ``chunksize``.

### 1. Iterating over data

- Use case: a csv with a column called 'x' of numbers and we want to compute the sum of all the numbers in that column.

```
import pandas as pd
result = []
for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(sum(chunk['x']))
total = sum(result)
print(total)
```

- Note that we need not have used a list to store each result: we could have initialized ``total`` to zero before iterating over the file and added each chunk's sum during the iteration.

```
import pandas as pd
total = 0
for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
print(total)
```
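To illustrate the chunking idea without needing pandas or a real file, here is a standard-library sketch. It is not how ``read_csv()`` works internally; it swaps in ``csv.DictReader`` and ``itertools.islice``, and uses an in-memory string (``csv_text``, an assumption for this example) in place of ``data.csv``.

```python
import csv
import io
from itertools import islice

# Stand-in for a large file; real code would use open('data.csv') instead.
csv_text = 'x\n1\n2\n3\n4\n5\n'

reader = csv.DictReader(io.StringIO(csv_text))   # an iterator over rows
total = 0
chunk_sizes = []

while True:
    chunk = list(islice(reader, 2))   # load the next (up to) 2 rows
    if not chunk:                     # nothing left to read
        break
    chunk_sizes.append(len(chunk))
    total += sum(int(row['x']) for row in chunk)
    # the chunk is discarded when the loop moves on to the next one

print(chunk_sizes)
# [2, 2, 1]
print(total)
# 15
```

Only one chunk is held in memory at a time; the running ``total`` is the only state kept between chunks.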

# 9.2 List comprehensions and generators

## List comprehensions

- Building a list with a for loop takes several lines of code and is often slower than the equivalent list comprehension.
- With a list comprehension, you can create a new list (for example, a copy of the old list with 1 added to each number) in one line of code.

### 1. A list comprehension

- List comprehensions collapse for loops for building lists into a single line and the required components are:
    1) an iterable 
    2) an iterator variable that represents the members of the iterable
    3) an output expression.

- Using a for loop

```
nums = [12, 8, 21, 3, 16]
new_nums = []
for num in nums:
    new_nums.append(num+1)
print(new_nums)
```

- Using a list comprehension

```
nums = [12, 8, 21, 3, 16]
new_nums = [num + 1 for num in nums]
print(new_nums)
```

- List comprehension with ``range()``

```
new_nums = [num + 1 for num in range(11)]
print(new_nums)
```

### 2. Nested loops

- Within the square brackets, place the desired output expression followed by the two required for loop clauses.
- While it keeps to a single line of code, we sacrifice some readability of the code as a tradeoff

```
pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
print(pairs_2)
# [(0, 6), (0, 7), (1, 6), (1, 7)]
```
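To make the order of the two ``for`` clauses concrete, the sketch below writes the same computation as explicit nested loops: the first clause corresponds to the outer loop and the second to the inner loop. The names ``pairs_loop`` and ``pairs_comp`` are illustrative.

```python
pairs_loop = []
for num1 in range(0, 2):           # first for clause = outer loop
    for num2 in range(6, 8):       # second for clause = inner loop
        pairs_loop.append((num1, num2))

pairs_comp = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

print(pairs_loop)
# [(0, 6), (0, 7), (1, 6), (1, 7)]
print(pairs_loop == pairs_comp)
# True
```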

- One way to use lists is to represent multi-dimensional objects such as matrices. A matrix can be represented as a list of lists in Python.

```
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)
```

## Advanced comprehensions

### Conditionals in iterables

- We can filter the output of a list comprehension using a conditional on the iterable

```
[num ** 2 for num in range(10) if num % 2 == 0]
# [0, 4, 16, 36, 64]
```

### Conditionals in the output expression

- We can also condition the list comprehension on the output expression.

```
[num ** 2 if num % 2 == 0 else 0 for num in range(10)]
# [0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
```
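The difference between the two forms can be seen by comparing lengths: a conditional on the iterable drops elements, whereas a conditional in the output expression keeps every element and only changes its value. A small sketch, with ``filtered`` and ``replaced`` as illustrative names:

```python
# Condition after the for clause: odd numbers are filtered out
filtered = [num ** 2 for num in range(10) if num % 2 == 0]

# Condition in the output expression: odd numbers are replaced by 0
replaced = [num ** 2 if num % 2 == 0 else 0 for num in range(10)]

print(len(filtered), len(replaced))
# 5 10
```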

### Dictionary comprehensions

- We can also write dictionary comprehensions to create new dictionaries from iterables.
- The syntax is almost the same as in list comprehensions and there are 2 differences.
    1) We use curly braces instead of square brackets.
    2) The key and value are separated by a colon in the output expression.

```
pos_neg = {num: -num for num in range(9)}
print(pos_neg) # dictionary, with {key1: value1,...,key_n: value_n}
```
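As a further sketch, a dictionary comprehension can be combined with ``zip()`` to pair up two lists, and it accepts a filtering condition just like a list comprehension. The names ``secret_ids`` and ``short_names`` are illustrative.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

# key: value pairs built from two lists zipped together
secret_ids = {hero: name for hero, name in zip(avengers, names)}
print(secret_ids['thor'])
# odinson

# A condition on the iterable filters which keys are created
short_names = {hero: len(hero) for hero in avengers if len(hero) <= 4}
print(short_names)
# {'thor': 4}
```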

## Introduction to generator expressions

### 1. Generator expressions

- A generator expression is like a list comprehension except that it does not store the list in memory: it does not construct the list, but creates a generator object we can iterate over to produce the elements as required.
- The square brackets are replaced with round parentheses.
- Example of a list comprehension:

```
[2 * num for num in range(10)]
```

- Example of a generator:

```
(2 * num for num in range(10))
```
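A rough way to see the difference is to compare the two objects directly. The sketch below assumes CPython, where ``sys.getsizeof()`` reports the size of the container object itself; exact sizes vary by version, so only the comparison is shown.

```python
import sys

nums_list = [2 * num for num in range(100000)]   # all values built up front
nums_gen = (2 * num for num in range(100000))    # nothing computed yet

# The generator object stays small no matter how long the range is
print(sys.getsizeof(nums_list) > sys.getsizeof(nums_gen))
# True

first = next(nums_gen)     # values are produced one at a time, on demand
second = next(nums_gen)
print(first, second)
# 0 2
```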

### 2. Printing values from generators

- To see the values a generator produces, we can pass it to the function ``list()``, which creates a list from them.

```
result = (num for num in range(6))
print(list(result))
```

### 3. Conditionals in generator expressions

- Anything we can do in a list comprehension, such as filtering and applying conditionals, we can also do in a generator expression.

```
even_nums = (num for num in range(10) if num % 2 == 0)
print(list(even_nums))
```

### 4. Generator functions

- Generator functions are functions that, when called, produce generator objects.
- They are written with the syntax of any other user-defined function; however, instead of returning values using the keyword ``return``, they yield sequences of values using the keyword ``yield``.

```
# sequence.py

def num_sequence(n):
    """Generate values from 0 to n - 1."""
    i = 0
    while i < n:
        yield i
        i += 1

result = num_sequence(5)
print(type(result))
# <class 'generator'>

for item in result:
    print(item)
# 0
# 1
# 2
# 3
# 4
```
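Since the object returned by a generator function is an iterator, it also works with ``next()`` and with functions such as ``list()`` and ``sum()``. The sketch below repeats ``num_sequence`` so that it runs on its own; the names ``gen``, ``first``, ``second``, ``rest`` and ``total`` are illustrative.

```python
def num_sequence(n):
    """Generate values from 0 to n - 1."""
    i = 0
    while i < n:
        yield i
        i += 1

gen = num_sequence(3)
first = next(gen)       # runs the function body up to the first yield
second = next(gen)      # resumes after the yield and runs to the next one
rest = list(gen)        # collects whatever values are left

print(first, second, rest)
# 0 1 [2]

total = sum(num_sequence(5))   # a new call gives a new generator
print(total)
# 10
```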