# Python Data Science Toolbox (Part 2)

## Chapter 1. Using iterators in PythonLand

## 1. Iterators in Pythonland

### Iterating with a for loop

* We can iterate over a list using a for loop

In [1]:
employees = ['Nick', 'Lore', 'Hugo']

In [2]:
for employee in employees:
    print(employee)

Nick
Lore
Hugo


* We can iterate over a string using a for loop

In [3]:
for letter in 'DataCamp':
    print(letter) 

D
a
t
a
C
a
m
p


* We can iterate over a range object using a for loop

In [4]:
for i in range(4):
    print(i) 

0
1
2
3


### Iterators vs. iterables

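* Iterable
  * Examples: lists, dictionaries, strings, file connections
  * An object that can be passed to `iter()` (typically it defines an `__iter__()` method)
  * Applying `iter()` to an iterable creates an iterator
* Iterator
  * Produces the next value each time it is passed to `next()`
  * Raises `StopIteration` when there are no values left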
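A short sketch of the difference, reusing the `employees` list from above: a list is an iterable but not an iterator, while the object returned by `iter()` is both (calling `iter()` on an iterator returns the same object).

```python
employees = ['Nick', 'Lore', 'Hugo']

# A list is an iterable: iter() returns an iterator over it.
it = iter(employees)
print(next(it))  # Nick
print(next(it))  # Lore

# An iterator is itself iterable: iter() hands back the same object.
print(iter(it) is it)  # True

# A list is not an iterator, so calling next() on it directly fails.
try:
    next(employees)
except TypeError as err:
    print('TypeError:', err)
```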

### Iterating over iterables: `next()`

In [5]:
word = 'Da'
it = iter(word)

In [6]:
print(next(it))

D


In [7]:
print(next(it))

a


```python
In [8]: print(next(it))
```
```
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-9-4d0222393be1> in <module>()
----> 1 print(next(it))

StopIteration: 
```
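Two ways to deal with the end of an iterator, sketched below: `next()` accepts an optional default that it returns instead of raising `StopIteration`, and a `for` loop can be modelled (roughly, not as CPython literally implements it) as a `while` loop that calls `next()` until `StopIteration` is raised.

```python
word = 'Da'

# With a default, next() returns it instead of raising StopIteration.
it = iter(word)
print(next(it, None))  # D
print(next(it, None))  # a
print(next(it, None))  # None

# Simplified model of a for loop: call iter() once, then next()
# repeatedly until StopIteration signals the end.
letters = []
it = iter(word)
while True:
    try:
        letters.append(next(it))
    except StopIteration:
        break
print(letters)  # ['D', 'a']
```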

### Iterating at once with `*`

In [8]:
word = 'Data'
it = iter(word)

print(*it)

D a t a


In [9]:
print(*it) # No more values to go through!




### Iterating over dictionaries

In [10]:
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

for key, value in pythonistas.items():
    print(key, value)

hugo bowne-anderson
francis castro
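Without `.items()`, iterating over a dictionary yields only its keys; `.values()` gives the values. A minimal sketch with the same `pythonistas` dictionary (dictionaries keep insertion order in Python 3.7+):

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

# Iterating over a dict directly yields its keys.
for key in pythonistas:
    print(key)

# .values() and .items() return views that are iterable as well.
print(list(pythonistas.values()))  # ['bowne-anderson', 'castro']
print(list(pythonistas.items()))   # [('hugo', 'bowne-anderson'), ('francis', 'castro')]
```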


### Iterating over file connections

In [11]:
file = open('Python_Data_Science_Toolbox_Part2/file.txt')
it = iter(file)

In [12]:
print(next(it))

This in the first line.



In [13]:
next(it)

'This in the second line.\n'
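Each line a file iterator returns keeps its trailing `'\n'`, which is why the `print()` output above is followed by a blank line. The sketch below writes a made-up two-line file to a temporary directory (the file name and contents are assumptions, not the original `file.txt`) and opens it with `with` so the file is closed automatically.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.txt')

    # Write a small demo file.
    with open(path, 'w') as f:
        f.write('This is the first line.\nThis is the second line.\n')

    # A file object is its own iterator: next() returns one line at a time.
    with open(path) as file:
        first = next(file)
        # The loop continues from where next() left off.
        rest = [line.rstrip('\n') for line in file]

print(repr(first))  # 'This is the first line.\n'
print(rest)         # ['This is the second line.']
```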

## 2. Playing with iterators

### Using `enumerate()`

In [14]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)
print(type(e))

<class 'enumerate'>


In [15]:
e_list = list(e)
print(e_list) 

[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]
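An `enumerate` object is itself an iterator, so, like `it` earlier, it can be consumed only once. A short sketch:

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)

print(next(e))  # (0, 'hawkeye')
print(list(e))  # [(1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]
print(list(e))  # []  (already exhausted)
```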


### `enumerate()` and unpack

In [16]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

In [17]:
for index, value in enumerate(avengers):
    print(index, value) 

0 hawkeye
1 iron man
2 thor
3 quicksilver


In [18]:
for index, value in enumerate(avengers, start=10):
    print(index, value)

10 hawkeye
11 iron man
12 thor
13 quicksilver


### Using `zip()`

In [19]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

In [20]:
z = zip(avengers, names)
print(type(z))

<class 'zip'>


In [21]:
z_list = list(z)
print(z_list)

[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
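When the inputs have different lengths, `zip()` stops at the shortest one; `itertools.zip_longest()` pads the shorter inputs instead. The `powers` list below is hypothetical and deliberately shorter than `avengers`.

```python
from itertools import zip_longest

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
powers = ['archery', 'armor']  # hypothetical, shorter on purpose

# zip() stops as soon as the shortest input is exhausted.
pairs = list(zip(avengers, powers))
print(pairs)  # [('hawkeye', 'archery'), ('iron man', 'armor')]

# zip_longest() keeps going and fills the gaps with fillvalue.
padded = list(zip_longest(avengers, powers, fillvalue=None))
print(padded[-1])  # ('quicksilver', None)
```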


### `zip()` and unpack

In [22]:
for z1, z2 in zip(avengers, names):
    print(z1, z2) 

hawkeye barton
iron man stark
thor odinson
quicksilver maximoff


### Print zip with `*`

In [23]:
z = zip(avengers, names)
print(*z) 

('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
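The same `*` unpacking can be passed back into `zip()` to "unzip" the pairs: each resulting tuple collects one position from every pair. A minimal sketch:

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)

# zip(*z) is zip(('hawkeye', 'barton'), ('iron man', 'stark'), ...)
heroes, surnames = zip(*z)
print(heroes)    # ('hawkeye', 'iron man', 'thor', 'quicksilver')
print(surnames)  # ('barton', 'stark', 'odinson', 'maximoff')
```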


## 3. Using iterators for big data

### Loading data in chunks

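* There can be too much data to hold in memory
* Solution: load data in chunks!
* pandas function: `pd.read_csv()`
    * Specify the chunk size with the `chunksize` argument
    * With `chunksize` set, `read_csv()` returns an iterator of DataFrames instead of a single DataFrame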

### Iterating over data

```python
In [1]: import pandas as pd

In [2]: result = []

In [3]: for chunk in pd.read_csv('data.csv', chunksize=1000):
            result.append(sum(chunk['x']))
    
In [4]: total = sum(result)

In [5]: print(total) 
```
```
        4252532
```
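The same chunked-sum pattern can be sketched with only the standard library: `csv.DictReader` yields rows one at a time and `itertools.islice` groups them into fixed-size chunks. This is a swapped-in technique, not how pandas implements `chunksize`; the helper `read_in_chunks` and the ten-row in-memory CSV are hypothetical stand-ins for `data.csv`.

```python
import csv
import io
from itertools import islice


def read_in_chunks(rows, chunksize):
    """Hypothetical helper: yield lists of at most `chunksize` rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunksize))
        if not chunk:
            return
        yield chunk


# In-memory CSV with a single column 'x' holding the values 1..10.
data = io.StringIO('x\n' + '\n'.join(str(i) for i in range(1, 11)) + '\n')
reader = csv.DictReader(data)

result = []
for chunk in read_in_chunks(reader, chunksize=3):
    # Only one chunk of rows is held in memory at a time.
    result.append(sum(int(row['x']) for row in chunk))

total = sum(result)
print(result)  # [6, 15, 24, 10]
print(total)   # 55
```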

## Chapter 2. List comprehensions and generators

## 4. List comprehensions

### Populate a list with a for loop

In [24]:
nums = [12, 8, 21, 3, 16]
new_nums = []

for num in nums:
    new_nums.append(num + 1)
    
print(new_nums) 

[13, 9, 22, 4, 17]


### A list comprehension

In [25]:
nums = [12, 8, 21, 3, 16]
new_nums = [num + 1 for num in nums]

print(new_nums)

[13, 9, 22, 4, 17]


### `for` loop and list comprehension syntax

In [26]:
new_nums = []

for num in nums:
    new_nums.append(num + 1)
    
print(new_nums)

[13, 9, 22, 4, 17]


In [27]:
new_nums = [num + 1 for num in nums]
print(new_nums)

[13, 9, 22, 4, 17]


### List comprehension with `range()`

In [28]:
result = [num for num in range(11)] 
print(result)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


### List comprehensions

* Collapse for loops for building lists into a single line
* Components
    * Iterable
    * Iterator variable (represents members of the iterable)
    * Output expression

### Nested loops (1)

In [29]:
pairs_1 = []

for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_1.append((num1, num2))
        
print(pairs_1)

[(0, 6), (0, 7), (1, 6), (1, 7)]


* How to do this with a list comprehension?

### Nested loops (2)

In [30]:
pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

print(pairs_2)

[(0, 6), (0, 7), (1, 6), (1, 7)]
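The `for` clauses in a comprehension appear in the same order as the nested loops they replace, outer loop first. A sketch with a hypothetical `matrix`, plus the contrasting case where a comprehension inside the output expression builds nested lists instead of a flat one:

```python
matrix = [[1, 2, 3], [4, 5, 6]]  # hypothetical data

# Nested loops: outer loop over rows, inner loop over values.
flat_loop = []
for row in matrix:
    for value in row:
        flat_loop.append(value)

# Same order of for clauses in the comprehension.
flat_comp = [value for row in matrix for value in row]
print(flat_loop == flat_comp)  # True
print(flat_comp)               # [1, 2, 3, 4, 5, 6]

# A comprehension in the output expression produces a list of lists.
grid = [[row * 3 + col for col in range(3)] for row in range(2)]
print(grid)  # [[0, 1, 2], [3, 4, 5]]
```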


* Tradeoff: the comprehension is more compact, but nested loops on one line can be harder to read

## 5. Advanced comprehensions

### Conditionals in comprehensions

* Conditionals on the iterable

In [31]:
[num ** 2 for num in range(10) if num % 2 == 0] 

[0, 4, 16, 36, 64]

* Python documentation on the % operator:
```
The % (modulo) operator yields the remainder from the division of the first argument by the second.
```

### Conditionals in comprehensions

* Conditionals on the output expression

In [32]:
[num ** 2 if num % 2 == 0 else 0 for num in range(10)] 

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]

### Dict comprehensions

* Create dictionaries
* Use curly braces `{}` instead of brackets `[]`

In [33]:
pos_neg = {num: -num for num in range(9)}

print(pos_neg)

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}
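Dict comprehensions accept the same conditionals as list comprehensions. A sketch with a hypothetical `words` list, mapping each word to its length and keeping only words longer than three characters:

```python
words = ['iterator', 'zip', 'generator', 'map']  # hypothetical data

# key: word, value: len(word); the if clause filters the iterable
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # {'iterator': 8, 'generator': 9}
```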



## 6. Introduction to generators

### Generator expressions

* Recall list comprehension

In [35]:
[2 * num for num in range(10)] 

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [36]:
(2 * num for num in range(10)) 

<generator object <genexpr> at 0x10aec7f48>

### List comprehensions vs. generators

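* List comprehension - returns a list
* Generator expression - returns a generator object
* Both can be iterated over
* A generator produces its values lazily, one at a time, instead of storing them all in memory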
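A rough way to see the memory difference is `sys.getsizeof()`. The exact byte counts are CPython-specific (and for the list they cover only its internal array of references), so none are shown here; the point is that the generator object stays small no matter how large `n` is.

```python
import sys

n = 1_000_000

as_list = [num for num in range(n)]  # every element is built now
as_gen = (num for num in range(n))   # elements are built on demand

print(sys.getsizeof(as_list))  # large, grows with n
print(sys.getsizeof(as_gen))   # small, independent of n

# Both work with functions that accept any iterable.
print(sum(as_list) == sum(as_gen))  # True
```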

### Printing values from generators (1)

In [37]:
result = (num for num in range(6))

for num in result:
    print(num) 

0
1
2
3
4
5


In [38]:
result = (num for num in range(6))

print(list(result))

[0, 1, 2, 3, 4, 5]


### Printing values from generators (2)

In [39]:
result = (num for num in range(6))

print(next(result))
print(next(result))
print(next(result))      # lazy evaluation
print(next(result))
print(next(result))

0
1
2
3
4
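The "lazy evaluation" comment can be made visible by recording when each value is actually computed. In this sketch the helper `square` and the `log` list are hypothetical: nothing runs when the generator is created, each `next()` computes exactly one value, and `list()` drains whatever is left.

```python
log = []


def square(x):
    log.append(x)  # record when the work actually happens
    return x ** 2


squares = (square(num) for num in range(6))
print(log)            # []  (nothing computed yet)

print(next(squares))  # 0
print(next(squares))  # 1
print(log)            # [0, 1]  (only the requested values)

print(list(squares))  # [4, 9, 16, 25]  (resumes where it stopped)
print(list(squares))  # []  (exhausted)
```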


### Conditionals in generator expressions

In [40]:
even_nums = (num for num in range(10) if num % 2 == 0) 

print(list(even_nums))

[0, 2, 4, 6, 8]


### Generator functions

* Produces generator objects when called
* Defined like a regular function - `def`
* Yields a sequence of values instead of returning a single value
* Generates a value with `yield` keyword

### Build a generator function

In [41]:
def num_sequence(n):
    """Generate values from 0 to n - 1."""
    i = 0
    while i < n:
        yield i
        i += 1

### Use a generator function

In [42]:
result = num_sequence(5)

print(type(result))

<class 'generator'>


In [43]:
for item in result:
    print(item)

0
1
2
3
4
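A generator returned by a generator function is exhausted after one pass, just like a generator expression; calling the function again creates a new, independent generator. A short sketch (the function is repeated so the block runs on its own):

```python
def num_sequence(n):
    """Generate values from 0 to n - 1."""
    i = 0
    while i < n:
        yield i  # pause here and hand i to the caller
        i += 1


result = num_sequence(3)
print(next(result))  # 0  (runs until the first yield, then pauses)
print(list(result))  # [1, 2]  (resumes after the yield)
print(list(result))  # []  (exhausted)

# A fresh call starts over from 0.
print(list(num_sequence(3)))  # [0, 1, 2]
```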


## 7. Wrap-up: comprehensions

### Re-cap: list comprehensions

* **Basic**  
`[output expression for iterator variable in iterable]`


* **Advanced**   
`[output expression + conditional on output for iterator variable in iterable + conditional on iterable]`
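Both kinds of conditional can appear in one comprehension: the `if`/`else` in the output expression goes before `for`, the filter on the iterable goes after it. A sketch that keeps the multiples of 3 below 10 and labels each as even or odd:

```python
labeled = [(num, 'even' if num % 2 == 0 else 'odd')  # conditional on output
           for num in range(10)
           if num % 3 == 0]                          # conditional on iterable
print(labeled)  # [(0, 'even'), (3, 'odd'), (6, 'even'), (9, 'odd')]
```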

## Chapter 3. Bringing it all together!

## 8. Case Study

### World bank data

* Data on world economies for over half a century
* Indicators
    * Population
    * Electricity consumption
    * CO2 emissions
    * Literacy rates
    * Unemployment

### Using `zip()`

In [44]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

In [45]:
z = zip(avengers, names)
print(type(z)) 

<class 'zip'>


In [46]:
print(list(z))

[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
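Because `zip()` yields `(key, value)` pairs, its output can be passed straight to `dict()`. A minimal sketch with the same two lists:

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

# dict() accepts any iterable of (key, value) pairs.
avengers_dict = dict(zip(avengers, names))
print(avengers_dict['thor'])  # odinson
```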


### Defining a function

In [47]:
def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa."""
    new_value1 = value1 ** value2
    new_value2 = value2 ** value1
    new_tuple = (new_value1, new_value2)
    return new_tuple
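The tuple returned by `raise_both()` can be unpacked directly into two names, and the function combines naturally with `zip()` inside a comprehension. The input values below are arbitrary examples.

```python
def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa."""
    new_value1 = value1 ** value2
    new_value2 = value2 ** value1
    return (new_value1, new_value2)


# Unpack the returned tuple.
a, b = raise_both(2, 3)
print(a, b)  # 8 9

# Apply the function pairwise to two lists.
powers = [raise_both(x, y) for x, y in zip([2, 3], [3, 2])]
print(powers)  # [(8, 9), (9, 8)]
```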

## 9. Using Python generators for streaming data

### Generators for the large data limit

* Use a generator to load a file line by line
* Works on streaming data!
* Read and process the file until all lines are exhausted
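A minimal sketch of such a generator, assuming the file is already open: `read_lines` is a hypothetical helper, and `io.StringIO` with made-up contents stands in for a large file on disk. Each line is produced only when the caller asks for it.

```python
import io


def read_lines(file_object):
    """Hypothetical helper: yield one line at a time, without the newline."""
    for line in file_object:
        yield line.rstrip('\n')


# In-memory stand-in for a large file (contents are made up).
fake_file = io.StringIO('country,year\nIndia,1960\nChile,1960\n')

lines = read_lines(fake_file)
header = next(lines)  # only the first line has been produced so far
print(header)         # country,year

row_count = sum(1 for _ in lines)  # process the remaining lines one by one
print(row_count)                   # 2
```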

### Build a generator function

In [48]:
def num_sequence(n):
    """Generate values from 0 to n - 1."""
    i = 0
    while i < n:
        yield i
        i += 1

## 10. Using iterators for streaming data

### Reading files in chunks

* Up next:
    * `pd.read_csv()` function and `chunksize` argument
    * Look at specific indicators in specific countries
    * Write a function to generalize tasks