# Python Data Science Toolbox Part II

## Introduction to Iterators

### Iterables vs Iterators

* **Iterable**
  * Example: lists, strings, dictionaries, file connections
  * An object with an associated **\_\_iter\_\_()** method, which the built-in **iter()** function calls
  * Applying **iter()** to an iterable creates an iterator
* **Iterator**
  * An object with an associated **\_\_next\_\_()** method, which the built-in **next()** function calls to produce the next value
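
One way to see the distinction is to check which special methods each object exposes. The sketch below uses a hypothetical `numbers` list that is not part of the notebook; the variable names are illustrative only.

```python
# `numbers` is a hypothetical example list, not taken from the notebook.
numbers = [1, 2, 3]

# A list is an iterable: it defines __iter__ but not __next__.
list_has_iter = hasattr(numbers, '__iter__')
list_has_next = hasattr(numbers, '__next__')

# Calling iter() on the list returns an iterator, which does define __next__.
numbers_iterator = iter(numbers)
iterator_has_next = hasattr(numbers_iterator, '__next__')

# An iterator is itself iterable: iter() on it returns the same object.
same_object = iter(numbers_iterator) is numbers_iterator

print(list_has_iter, list_has_next, iterator_has_next, same_object)
```

So every iterator is an iterable, but not every iterable is an iterator.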
  

### Iterating over iterables: next()

In [1]:
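word = 'Hi'
iterator = iter(word)  # iter() turns the iterable string into an iterator
next(iterator)

'H'

In [2]:
next(iterator)

'i'

In [3]:
# The iterator is exhausted, so next() raises StopIteration
next(iterator)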

StopIteration: 
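
A for loop relies on this exception to know when to stop. The following is a rough sketch of what a for loop does behind the scenes, not the interpreter's actual implementation; `manual_for` is a hypothetical helper introduced only for illustration.

```python
def manual_for(iterable):
    """Collect the values of an iterable the way a for loop would visit them."""
    collected = []
    iterator = iter(iterable)      # step 1: get an iterator
    while True:
        try:
            value = next(iterator)  # step 2: ask for the next value
        except StopIteration:
            break                   # step 3: stop quietly when exhausted
        collected.append(value)
    return collected


print(manual_for('Hi'))

# next() also accepts a default that is returned instead of raising StopIteration.
exhausted = iter('')
fallback = next(exhausted, 'no more values')
print(fallback)
```

Passing a default to `next()` is handy when running out of values is expected rather than an error.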

### Iterating at once with *

In [4]:
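word = 'Hey'
iterator = iter(word)
print(*iterator)  # * unpacks every remaining value in one go

H e y


#### Warning: An iterator can only be consumed once. After you have gone through all of its values there is nothing left to iterate over; call iter() on the original iterable again to get a fresh iterator.
#### Notice how the cell below prints nothing but an empty line

In [5]:
print(*iterator)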
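
Exhausting an iterator does not affect the object it came from. As a minimal sketch (the variable names are illustrative), calling `iter()` on the original string again gives a fresh iterator:

```python
word = 'Hey'

first_pass = iter(word)
first_result = list(first_pass)    # consumes every value
second_result = list(first_pass)   # the same iterator has nothing left

fresh_pass = iter(word)            # the string itself is untouched
fresh_result = list(fresh_pass)

print(first_result, second_result, fresh_result)
```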




### Iterating over Dictionaries

In [6]:
my_dict = {
    'Test': 'User',
    'Chai': 'Grindean'
}
for key, value in my_dict.items():
    print(key, value)

Test User
Chai Grindean
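
The `items()` call is what yields key/value pairs. As a short sketch of the alternatives, iterating over a dictionary directly yields only its keys, and `keys()` and `values()` give the two halves separately:

```python
my_dict = {
    'Test': 'User',
    'Chai': 'Grindean'
}

keys_direct = list(my_dict)       # iterating a dict directly yields its keys
keys = list(my_dict.keys())       # same keys, stated explicitly
values = list(my_dict.values())   # values only
pairs = list(my_dict.items())     # (key, value) tuples

print(keys_direct, keys, values, pairs)
```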


### Iterating over file connections

In [7]:
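import os
# os.path.join builds the path with the correct separator on any operating system
file_path = os.path.join(os.getcwd(), 'sample_data', 'textfile1.txt')

# the with block closes the file when the block ends, even if an error occurs
with open(file_path) as file:
    iterator = iter(file)
    print(next(iterator))
    print(next(iterator))
    print(next(iterator))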

This is the first line

This is the second line

This is the third and last line
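
The blank lines in the output appear because each line read from the file keeps its trailing newline and `print()` adds another. The sketch below writes a throwaway file (the file name and contents are made up for the example) and loops over it directly, stripping the newline from each line:

```python
import os
import tempfile

# Create a temporary file so the example does not depend on sample_data.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')
    with open(demo_path, 'w') as demo_file:
        demo_file.write('first\nsecond\nthird\n')

    with open(demo_path) as demo_file:
        # A file object is its own iterator, so iter() returns it unchanged.
        is_own_iterator = iter(demo_file) is demo_file
        # A for loop (or comprehension) reads one line at a time.
        lines = [line.rstrip('\n') for line in demo_file]

print(is_own_iterator, lines)
```

Because lines are read one at a time, this pattern also works for files that are too large to load in one go.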


## Playing with Iterators

### Using enumerate()
#### Takes any iterable, such as a list, and returns an enumerate object: an iterator of (index, element) pairs, one for each element of the iterable

In [8]:
avengers = ['hawkeye', 'iron man', 'thor']
e = enumerate(avengers)
print(e)

# turn the enumerate object into a list of (index, value) tuples built from avengers
my_list = list(e)
print(my_list)

for index, value in my_list:
    print(index, value)

<enumerate object at 0x000001F1F4076480>
[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor')]
0 hawkeye
1 iron man
2 thor
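
Two details are worth a short sketch: `enumerate()` accepts a `start` argument for the first index, and the enumerate object is itself an iterator, so it can be advanced with `next()` and is consumed as it goes.

```python
avengers = ['hawkeye', 'iron man', 'thor']

# Begin counting at 1 instead of the default 0.
numbered = list(enumerate(avengers, start=1))

# The enumerate object is an iterator: next() pulls one pair at a time.
e = enumerate(avengers)
first = next(e)
rest = list(e)   # only the pairs that have not been consumed yet

print(numbered, first, rest)
```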


### Using zip()
#### Takes any number of iterables and returns a zip object, which is an iterator of tuples. It stops at the shortest iterable, so the unmatched 'maximoff' below is dropped

In [9]:
avengers = ['hawkeye', 'iron man', 'thor']
last_names = ['barton', 'stark', 'odison', 'maximoff']
z = zip(avengers, last_names)
print(z)

# turn the zip object into a list of tuples
my_list = list(z)
print(my_list)

<zip object at 0x000001F1F4079708>
[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odison')]


In [10]:
# We can loop through the list of tuples
for avenger, last_name in my_list:
    print(avenger, last_name)

hawkeye barton
iron man stark
thor odison


In [11]:
# Or we can use the splat (*) operator to unpack a zip object.
# The zip object from In [9] was exhausted by list(), so we create a new one
z = zip(avengers, last_names)
print(*z)

('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odison')
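
If the unmatched items matter, `itertools.zip_longest` from the standard library pads the shorter iterable instead of truncating. The same splat operator can also reverse a zip. The sketch below shows both; the choice of `None` as the fill value is just an assumption for the example.

```python
from itertools import zip_longest

avengers = ['hawkeye', 'iron man', 'thor']
last_names = ['barton', 'stark', 'odison', 'maximoff']

# Pad the shorter list with a fill value instead of dropping 'maximoff'.
padded = list(zip_longest(avengers, last_names, fillvalue=None))

# "Unzip": unpack the pairs back into zip() to recover the two sequences.
firsts, lasts = zip(*zip(avengers, last_names))

print(padded)
print(firsts, lasts)
```

Note that unzipping returns tuples, and only the elements that survived the original truncation.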


## Using iterators for big data

Imagine a scenario where there is too much data to hold in memory at once.

**Solution:** Load data in chunks!

We can use the pandas read_csv() function to load data in chunks by specifying the *chunksize* parameter, which sets the number of rows per chunk

In [12]:
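import pandas as pd

# os was imported in In [7]; os.path.join builds the path portably
file_path = os.path.join(os.getcwd(), 'sample_data', 'data1.csv')
total = 0

# With chunksize set, read_csv() returns an iterator that yields DataFrames of up to 2 rows each
for chunk in pd.read_csv(file_path, chunksize=2):
    chunk_sum = chunk['column_1'].sum()  # compute each chunk's sum once
    total += chunk_sum
    print(chunk_sum)
print(f"Total: {total}")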

60
700
5
64
233
87
321
111
71
967
Total: 2619
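
To make the idea behind chunking concrete, here is a standard-library sketch that uses `csv` and `itertools.islice` instead of pandas. It is an analogue of chunked reading, not how pandas implements it; `read_in_chunks` is a hypothetical helper and the CSV text is made-up data.

```python
import csv
import io
from itertools import islice


def read_in_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:      # the iterator is exhausted
            return
        yield chunk


# Made-up data standing in for a CSV file that is too large for memory.
csv_text = "column_1\n10\n20\n30\n40\n50\n"
reader = csv.DictReader(io.StringIO(csv_text))

chunk_sums = []
for chunk in read_in_chunks(reader, chunk_size=2):
    # Only the rows of the current chunk are held in memory here.
    chunk_sums.append(sum(int(row['column_1']) for row in chunk))

total = sum(chunk_sums)
print(chunk_sums, total)
```

The last chunk may be shorter than `chunk_size`, just as the final DataFrame from `read_csv()` can hold fewer rows than the others.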
