# Python Data Science Toolbox (Part 2)

## Chapter 1: Using iterators in PythonLand

### Introduction to iterators

#### Iterating with a for loop
* We can iterate over a list using a for loop

In [53]:
employees = ['Nick', 'Lore', 'Hugo']

for employee in employees:
    print(employee)

Nick
Lore
Hugo


#### Iterators vs. iterables
* Iterable
    * Examples: lists, strings, dictionaries, file connections
    * An object that `iter()` can be applied to (typically because it implements `__iter__()`)
    * Applying `iter()` to an iterable creates an iterator
* Iterator
    * Produces next value with `next()`
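
A minimal sketch of the distinction, using only built-ins: a list is an iterable but not an iterator, so `next()` fails on it until `iter()` is applied. The variable names are illustrative.

```python
employees = ['Nick', 'Lore', 'Hugo']

# A list is iterable, but it is not itself an iterator.
try:
    next(employees)
    list_is_iterator = True
except TypeError:
    list_is_iterator = False

# iter() returns an iterator that tracks its position in the list.
it = iter(employees)
first = next(it)
second = next(it)

# Calling iter() on an iterator returns the same object.
same_object = iter(it) is it

print(list_is_iterator, first, second, same_object)
```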
    
#### Iterating over iterables: next()
* Calling `next()` on the iterator returns the next value; once there is nothing left to return, it raises `StopIteration`

In [1]:
word = 'Da'
it = iter(word)
next(it)

'D'

In [2]:
next(it)

'a'

In [3]:
next(it)

StopIteration: 
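
The `StopIteration` above is how a for loop knows when to stop. As a rough sketch (not the interpreter's actual implementation), a for loop behaves like the `while` loop below; `next()` also accepts an optional default that is returned instead of raising.

```python
word = 'Da'
it = iter(word)
letters = []

# Roughly what a for loop does: call next() until StopIteration is raised.
while True:
    try:
        letters.append(next(it))
    except StopIteration:
        break

# next() also accepts a default to return instead of raising.
fallback = next(it, None)

print(letters, fallback)
```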

#### Iterating at once with *
* You can also print all the values of an iterator in one fell swoop using the star operator
* This is referred to as the "splat" operator in some circles

In [4]:
word = 'Data'
it = iter(word)
print(*it)

D a t a
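
One consequence worth checking: unpacking with `*` consumes the iterator, so nothing is left afterwards. A small sketch using the same `word`:

```python
word = 'Data'
it = iter(word)

print(*it)             # unpacks every remaining value as separate arguments
remaining = list(it)   # the iterator is now exhausted

# The underlying string is unaffected, so a fresh iterator starts over.
again = list(iter(word))
print(remaining, again)
```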


#### Iterating over dictionaries

In [5]:
pythonistas = {'hugo':'bowne-anderson', 'francis':'castro'}
for key, value in pythonistas.items():
    print(key, value)

hugo bowne-anderson
francis castro
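
For comparison, a short sketch of the other ways to iterate over the same dictionary: iterating over the dictionary itself yields its keys, while `.values()` and `.items()` yield values and key-value pairs.

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

keys = list(pythonistas)               # iterating a dict yields its keys
values = list(pythonistas.values())    # values only
pairs = list(pythonistas.items())      # (key, value) tuples

print(keys, values, pairs)
```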


#### Iterating over file connections

In [10]:
file = open('datasets/huck_finn.txt')
it = iter(file)
print(next(it))





In [12]:
print(next(it))

The Project Gutenberg EBook of Adventures of Huckleberry Finn, Complete
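
Each line returned by the file iterator keeps its trailing newline, which is why `print()` shows extra blank lines above. The sketch below illustrates this with a throwaway temporary file (a stand-in, not the `huck_finn.txt` dataset) and uses a `with` block so the file is closed automatically.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    path = tmp.name

# The with statement closes the file even if an error occurs.
with open(path) as file:
    first = next(file)      # a file object is its own iterator
    rest = []
    for line in file:       # continues after the first line
        rest.append(line.rstrip('\n'))

os.remove(path)
print(repr(first), rest)
```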



### Playing with iterators

#### Using enumerate()
* `enumerate()` takes any iterable as an argument and returns a special "enumerate" object, which yields pairs made up of each element's index within the iterable and the element itself

In [13]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)
print(type(e))

<class 'enumerate'>


In [14]:
e_list = list(e)
print(e_list)

[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]


#### enumerate() and unpack
* The `enumerate()` object itself is also an iterable, and we can loop over it while unpacking its elements

In [16]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers):
    print(index, value)

0 hawkeye
1 iron man
2 thor
3 quicksilver


In [17]:
# To change the default start index, use the optional argument 'start'

for index, value in enumerate(avengers, start=10):
    print(index, value)

10 hawkeye
11 iron man
12 thor
13 quicksilver
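
To show what `enumerate()` saves, here is a sketch comparing it with a manual index-based loop. Both produce the same pairs, and `start` changes only the counter.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

# Index-based loop: works, but needs manual indexing.
by_index = []
for i in range(len(avengers)):
    by_index.append((i, avengers[i]))

# enumerate() yields the same (index, value) pairs directly.
by_enumerate = list(enumerate(avengers))

# start shifts only the counter, not which elements are visited.
numbered = list(enumerate(avengers, start=1))

print(by_index == by_enumerate, numbered[0])
```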


#### Using zip()
* `zip()` accepts an arbitrary number of iterables and returns an iterator of tuples

In [58]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)
print(type(z))

<class 'zip'>


In [59]:
z_list = list(z)
z_list

[('hawkeye', 'barton'),
 ('iron man', 'stark'),
 ('thor', 'odinson'),
 ('quicksilver', 'maximoff')]

#### zip() and unpack

In [20]:
for z1, z2 in zip(avengers, names):
    print(z1, z2)

hawkeye barton
iron man stark
thor odinson
quicksilver maximoff


In [22]:
z = zip(avengers, names)
print(*z)

('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
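
Two further behaviours of `zip()` are worth a quick sketch: combining it with `*` "unzips" a list of pairs, and with iterables of unequal length it stops at the shortest one.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

# zip(*pairs) reverses a zip: it regroups the first and second elements.
pairs = list(zip(avengers, names))
heroes, surnames = zip(*pairs)

# With unequal lengths, zip() stops at the shortest iterable.
short = list(zip(avengers, ['barton', 'stark']))

print(heroes == tuple(avengers), surnames == tuple(names), short)
```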


### Using iterators to load large files into memory

#### Loading data in chunks
* There can be too much data to hold in memory
* Solution: load data in chunks!
* Pandas function `read_csv()`
    * Specify the number of rows per chunk with the `chunksize` argument
    
#### Iterating over data

In [24]:
import pandas as pd
result = []

for chunk in pd.read_csv('datasets/titanic.csv', chunksize=1000):
    result.append(sum(chunk['Fare']))

total = sum(result)
total

28693.949299999967

In [25]:
# another way
total = 0

for chunk in pd.read_csv('datasets/titanic.csv', chunksize=1000):
    total += sum(chunk['Fare'])

total

28693.949299999967
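
The same pattern can be tried without the Titanic file. The sketch below assumes pandas is installed and uses a small, made-up in-memory CSV in place of a large file; it also uses `Series.sum()`, which skips missing values by default, whereas the built-in `sum()` would return `nan` if a chunk contained one.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file too large to load at once.
csv_text = "Name,Fare\nA,7.25\nB,71.2833\nC,7.925\nD,53.1\nE,8.05\n"

total = 0.0
rows_seen = 0

# chunksize=2 yields DataFrames of at most 2 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk['Fare'].sum()   # Series.sum() skips missing values by default
    rows_seen += len(chunk)

print(rows_seen, round(total, 4))
```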

## Chapter 2: List comprehensions and generators

### List comprehensions

#### Populate a list with a for loop
* Let's say you have a list and want to add 1 to each element in the list
* You can loop over the values with a for loop and append each result to a new list, but that takes several lines for something a list comprehension expresses in one, and the comprehension is often somewhat faster as well

In [26]:
nums = [12, 8, 21, 3, 16]

new_nums = []

for num in nums:
    new_nums.append(num + 1)
    
print(new_nums)

[13, 9, 22, 4, 17]


#### A list comprehension

In [27]:
nums = [12, 8, 21, 3, 16]
new_nums = [num + 1 for num in nums]
new_nums

[13, 9, 22, 4, 17]

#### List comprehension with range()

In [28]:
result = [num for num in range(11)]
print(result)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


#### List comprehensions
* Collapse for loops that build lists into a single line
* Components
    * Iterable
    * Iterator variable (represents members of the iterable)
    * Output expression
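
As an illustration, the three components can be labelled in the earlier example. The second comprehension is an extra example showing that the iterable does not have to be a list.

```python
nums = [12, 8, 21, 3, 16]

# [output expression  for  iterator variable  in  iterable]
new_nums = [num + 1 for num in nums]

# The iterable can be any iterable, for example a string.
upper_letters = [letter.upper() for letter in 'data']

print(new_nums, upper_letters)
```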

#### Nested loops (1)

In [32]:
pairs_1 = []

for num1 in range(0,2):
    for num2 in range(6,8):
        pairs_1.append((num1, num2))
        
print(pairs_1)

[(0, 6), (0, 7), (1, 6), (1, 7)]


* How to do this with a list comprehension?

#### Nested loops (2)

In [33]:
pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
pairs_2

[(0, 6), (0, 7), (1, 6), (1, 7)]
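
A related sketch: two `for` clauses in one comprehension give a flat list, whereas putting a comprehension inside the output expression gives a nested list (a small matrix here). The sizes are arbitrary.

```python
# Flat result: the for clauses read left to right, like nested loops.
pairs = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

# Nested result: the inner comprehension is the output expression,
# so each outer step produces a whole row.
matrix = [[col for col in range(3)] for row in range(2)]

print(pairs)
print(matrix)
```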

### Advanced comprehensions

#### Conditionals in comprehensions
* Conditionals on the iterable

In [34]:
[num ** 2 for num in range(10) if num % 2 == 0]

[0, 4, 16, 36, 64]

* Conditionals on the output expression

In [61]:
[num ** 2 if num % 2 == 0 else 0 for num in range(10)]

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
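
The two forms do different jobs, which the following sketch tries to make explicit: a trailing `if` filters elements out, while an `if ... else` in the output expression keeps every element and changes what is produced. They can also be combined.

```python
nums = range(10)

# Filter: the trailing if decides which elements are kept.
kept = [num for num in nums if num % 2 == 0]

# Conditional expression: every element is kept, but the output varies.
labels = ['even' if num % 2 == 0 else 'odd' for num in nums]

# Both together: filter first, then choose the output.
combined = [num ** 2 if num % 2 == 0 else -num for num in nums if num > 5]

print(kept, labels[:3], combined)
```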

#### Dict comprehensions
* Create dictionaries
* Use curly braces `{}` instead of brackets `[]`

In [36]:
pos_neg = {num: -num for num in range(9)}
pos_neg

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}

In [37]:
print(type(pos_neg))

<class 'dict'>
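
Dict comprehensions pair naturally with `zip()`. A brief sketch reusing the `avengers` and `names` lists from Chapter 1; the dictionary names are illustrative.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

# key: value on the left, then the usual for clause.
surname_of = {hero: name for hero, name in zip(avengers, names)}

# A condition filters entries just as in a list comprehension.
short_only = {hero: name for hero, name in surname_of.items() if len(hero) <= 4}

print(surname_of['thor'], short_only)
```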


### Introduction to generator expressions

#### Generator expressions
* Recall list comprehension

In [38]:
[2 * num for num in range(10)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

* Use `()` instead of `[]`

In [39]:
(2 * num for num in range(10))

<generator object <genexpr> at 0x10e8faba0>

#### List comprehensions vs. generators
* A generator expression is like a list comprehension except that it does not store the list in memory
* List comprehension - returns a list
* Generator expression - returns a generator object
* Both can be iterated over
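
The memory difference can be observed with `sys.getsizeof`. This is a rough sketch: exact byte counts depend on the Python implementation and version, and `getsizeof` reports only the container object itself, not the integers it refers to.

```python
import sys

n = 1_000_000   # arbitrary size chosen for illustration

as_list = [num for num in range(n)]
as_gen = (num for num in range(n))

list_bytes = sys.getsizeof(as_list)   # grows with n
gen_bytes = sys.getsizeof(as_gen)     # small, and independent of n

print(list_bytes > gen_bytes)
print(sum(as_gen) == sum(as_list))    # both yield the same values
```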

#### Printing values from generators (1)
* Here we can see that looping over a generator expression produces the elements of the analogous list

In [40]:
result = (num for num in range(6))
for num in result:
    print(num)

0
1
2
3
4
5


* We can also pass a generator to the function `list()` to create the list

In [43]:
result = (num for num in range(6))
list(result)

[0, 1, 2, 3, 4, 5]

#### Printing values from generators (2)
* Like any other iterator, a generator can be passed to the function `next()` in order to step through its elements
* This is an example of "lazy evaluation", whereby the evaluation of the expression is delayed until the value is needed
* This helps a great deal when working with extremely large sequences: a list comprehension would store the entire list in memory, whereas a generator produces the elements of the sequence on the fly

In [45]:
result = (num for num in range(6))
print(next(result))

0


In [46]:
print(next(result))

1
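
Because values are produced on demand, a generator can only be consumed once. The sketch below continues the same idea and shows what remains after two calls to `next()`.

```python
result = (num for num in range(6))

first = next(result)          # 0
second = next(result)         # 1
rest = list(result)           # consumes whatever is left
after = list(result)          # nothing remains

done = next(result, 'done')   # a default avoids StopIteration
print(first, second, rest, after, done)
```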


#### Generators vs. list comprehensions

In [48]:
# DON'T RUN THIS WITH BRACKETS: a list comprehension would try to build the entire list in memory
# The generator expression works because it does not create the values up front

(num for num in range(10 ** 1000000))

<generator object <genexpr> at 0x10e9ba1a8>
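
To pull just a few values out of a very large generator, `itertools.islice` from the standard library is one option. In this sketch the bound `10 ** 12` is an arbitrary large number chosen for illustration.

```python
from itertools import islice

# Arbitrary large bound for illustration; no list of this size is ever built.
big = (num * num for num in range(10 ** 12))

first_five = list(islice(big, 5))   # only five values are computed
sixth = next(big)                   # the generator resumes where it stopped

print(first_five, sixth)
```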

#### Conditionals in generator expressions

In [49]:
even_nums = (num for num in range(10) if num % 2 == 0)
list(even_nums)

[0, 2, 4, 6, 8]

#### Generator functions
* Produce a generator object when called
* Defined like a regular function, with `def`
* Yield a sequence of values instead of returning a single value
* Generate each value with the `yield` keyword

#### Build a generator function
* sequence.py

In [50]:
def num_sequence(n):
    """Generate values from 0 up to, but not including, n."""
    i = 0
    while i < n:
        yield i
        i += 1

#### Use a generator function

In [51]:
result = num_sequence(5)
print(type(result))

<class 'generator'>


In [52]:
for item in result:
    print(item)

0
1
2
3
4
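
Generator functions are not limited to counters. As a further sketch, the hypothetical helper `get_lengths` below yields the length of each string it is given. Stepping through it with `next()` shows that the function body pauses at each `yield` and resumes on the next request.

```python
def get_lengths(words):
    """Yield the length of each string in words (hypothetical helper)."""
    for word in words:
        yield len(word)


lengths = get_lengths(['hawkeye', 'thor'])

first = next(lengths)            # runs the body up to the first yield
second = next(lengths)           # resumes after the yield
finished = next(lengths, None)   # the function body has ended

print(first, second, finished)
```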
