# Lecture 10/03/18
Python Data Science Toolbox II

### Iterators & Iterables

In [46]:
# A list is an ITERABLE (allows you to iterate on it)
for i in [1,2,3]:
    print(i)
    
for i in range(10,21):
    print(i)

1
2
3
10
11
12
13
14
15
16
17
18
19
20


In [4]:
# Example of an ITERATOR: you step through it one value at a time with the next() function.
values = iter(range(10,21))
print(values)

print(next(values))
print(next(values))

<range_iterator object at 0x104606360>
10
11
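
A for loop does the same stepping for you. As a rough sketch (the list `letters` is made-up example data), the loop below calls `next()` by hand until the iterator signals that it is finished by raising `StopIteration`:

```python
# Sketch: what a for loop does behind the scenes.
letters = ['a', 'b', 'c']      # hypothetical example data (an iterable)
it = iter(letters)             # ask the iterable for an iterator

collected = []
while True:
    try:
        collected.append(next(it))   # fetch the next value
    except StopIteration:            # raised once the iterator is exhausted
        break

print(collected)   # ['a', 'b', 'c']

# An exhausted iterator stays empty, but the list itself can be iterated again.
leftover = list(it)
again = list(letters)
print(leftover)    # []
print(again)       # ['a', 'b', 'c']
```

The takeaway: an iterable (the list) can hand out a fresh iterator as often as you like, while each iterator can only be walked through once.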


### Useful Functions, enumerate() & zip()

In [5]:
# Useful functions, enumerate().

some_list = [10, 45, 96, 108]

for index, value in enumerate(some_list):
    print(index, value)

0 10
1 45
2 96
3 108


In [6]:
# Can start your index at any number

some_list = [10, 45, 96, 108]

for index, value in enumerate(some_list, start = 50):
    print(index, value)

50 10
51 45
52 96
53 108


### Zip Function

In [50]:
first_names = ['kevin', 'brett', 'taco','john']
last_names = ['kurek', 'farve', 'salad','smith']

together = zip(first_names, last_names)
print(together)

# A zip object is an iterator, so convert it to a list to see its contents
list_together = list(together)
print(list_together) # This prints a LIST of TUPLES.

<zip object at 0x106b622c8>
[('kevin', 'kurek'), ('brett', 'farve'), ('taco', 'salad'), ('john', 'smith')]


In [52]:
# In order to unpack and print a zip object you can do...

for first, last in zip(first_names, last_names):
    print(first, last)

kevin kurek
brett farve
taco salad
john smith


In [17]:
# Using the "Splat" operator, * , to print an iterator
first_names = ['kevin', 'brett', 'taco']
last_names = ['kurek', 'farve', 'salad']

together = zip(first_names,last_names)
print(*together)

('kevin', 'kurek') ('brett', 'farve') ('taco', 'salad')
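
Two things about zip objects are easy to miss, so here is a small sketch using the same name lists: a zip object can only be consumed once, and the `*` operator combined with `zip()` can split the pairs back into separate sequences.

```python
first_names = ['kevin', 'brett', 'taco']
last_names = ['kurek', 'farve', 'salad']

pairs = zip(first_names, last_names)
first_pass = list(pairs)     # consumes the zip object
second_pass = list(pairs)    # nothing is left the second time
print(first_pass)    # [('kevin', 'kurek'), ('brett', 'farve'), ('taco', 'salad')]
print(second_pass)   # []

# "Unzipping": splat the list of tuples back into zip()
firsts, lasts = zip(*first_pass)
print(firsts)        # ('kevin', 'brett', 'taco')
print(lasts)         # ('kurek', 'farve', 'salad')
```

Note that unzipping gives back tuples, not lists.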


## Useful Reason to use Iterators
Read a CSV that's too large to fit into memory in chunks, and compute the result piece by piece.

#### Let's say I want to find out how much money was given in total across all the loans, BUT the file is too big to fit my DataFrame into local memory.

In [61]:
import pandas as pd

result = []

for chunk in pd.read_csv('/Users/kevin/Dropbox/Github/Classes/IS_485_685/sliced_training.csv', chunksize = 1000):
    result.append(sum(chunk[' Amount']))
    
sum(result)

1995921
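
The same chunking idea can be sketched without pandas, using only the standard library. Everything here is an assumption for illustration: the tiny in-memory CSV stands in for the large file, and the column name `Amount` and the chunk size of 2 are made up.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory CSV standing in for a large file on disk.
fake_file = io.StringIO("Loan ID,Amount\n1,100\n2,250\n3,75\n4,500\n5,25\n")

reader = csv.DictReader(fake_file)   # an iterator that yields one row at a time
chunk_size = 2                       # assumed chunk size, tiny for the demo

partial_sums = []
while True:
    # Pull at most chunk_size rows; only this chunk is held in memory.
    chunk = list(islice(reader, chunk_size))
    if not chunk:                    # an empty chunk means the file is finished
        break
    partial_sums.append(sum(int(row['Amount']) for row in chunk))

total = sum(partial_sums)
print(partial_sums)   # [350, 575, 25]
print(total)          # 950
```

Because the reader is an iterator, each call to `islice` picks up where the previous one stopped, which is what makes chunked processing possible.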

## List Comprehensions
A list comprehension is usually somewhat faster than the equivalent for loop that calls append(). Why? The interpreter compiles a comprehension to a dedicated append instruction, so it skips looking up and calling the `append` method on every iteration. The gain is modest, as the timings below show.

In [64]:
# Example
nums = [5, 10, 21, 51]

new_nums = [num + 10 for num in nums]
new_nums

[15, 20, 31, 61]

#### Timing a Standard For Loop

In [53]:
%%timeit

nums = [5, 10, 21, 51]

empty_list = []

for num in nums:
    empty_list.append(num + 5)

empty_list

The slowest run took 4.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 389 ns per loop


In [28]:
%%timeit

nums = [5, 10, 21, 51]
new_nums = [num + 5 for num in nums]
new_nums

The slowest run took 6.39 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 344 ns per loop


### Compare the "best of 3" times above: the list comprehension is slightly faster (344 ns vs. 389 ns per loop)
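
Outside a notebook, where the `%%timeit` magic isn't available, a comparison like this can be sketched with the standard `timeit` module. The function names `with_loop` and `with_comprehension` are made up for this example, and the timings will differ from machine to machine.

```python
import timeit

nums = [5, 10, 21, 51]

def with_loop():
    # Standard for loop with append
    out = []
    for num in nums:
        out.append(num + 5)
    return out

def with_comprehension():
    # Same work as a list comprehension
    return [num + 5 for num in nums]

# Run each function 100,000 times and report the total elapsed seconds.
loop_time = timeit.timeit(with_loop, number=100_000)
comp_time = timeit.timeit(with_comprehension, number=100_000)

print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```

Both functions return the same list, so any difference in the printed times comes from how the list is built.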

### Nested List Comprehensions
* Benefits: a single line of code gives the result.
* Cons: you sacrifice readability for others and yourself.

#### Objective: print a list of tuples over two different ranges.

In [56]:
# Standard Way

result = []

for num1 in range(0,2):
    for num2 in range(6,8):
        result.append((num1,num2))
        
result

[(0, 6), (0, 7), (1, 6), (1, 7)]

In [35]:
# List Comprehension

result = [(num1, num2) for num1 in range(0,2) for num2 in range(6,8)]
result

[(0, 6), (0, 7), (1, 6), (1, 7)]
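
One detail worth checking: the `for` clauses in a nested comprehension run in the same order as the nested loops, left to right. The sketch below shows what happens when they are swapped, and also uses `itertools.product` from the standard library as an alternative way to get the same pairs (this is not something the lecture covered).

```python
from itertools import product

# Same order as the nested for loops: num1 is the outer loop.
nested = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
print(nested)        # [(0, 6), (0, 7), (1, 6), (1, 7)]

# Swapping the for clauses makes num2 the outer loop, which changes the order.
swapped = [(num1, num2) for num2 in range(6, 8) for num1 in range(0, 2)]
print(swapped)       # [(0, 6), (1, 6), (0, 7), (1, 7)]

# itertools.product yields the same pairs as the first comprehension.
via_product = list(product(range(0, 2), range(6, 8)))
print(via_product)   # [(0, 6), (0, 7), (1, 6), (1, 7)]
```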

### Advanced Comprehensions
Using conditionals within conventional list comprehensions

In [57]:
# Example: collect the cube of each number in the range 0-9 if the number is divisible by 2
result = [num**3 for num in range(10) if num % 2 == 0]
result

[0, 8, 64, 216, 512]

In [39]:
# Example: for numbers from 0-9, add the square to the list if the number is divisible by 2,
# otherwise add 0.
result = [num**2 if num % 2 == 0 else 0 for num in range(10)]
result

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
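
The position of the `if` matters. A short sketch on a small made-up range: an `if` at the end filters items out, while an `if ... else` at the front keeps every item and only chooses which value to store.

```python
nums = range(6)   # 0, 1, 2, 3, 4, 5

# if at the END: a filter, so odd numbers are dropped entirely
filtered = [n for n in nums if n % 2 == 0]
print(filtered)   # [0, 2, 4]

# if ... else at the FRONT: a conditional expression, so every number
# produces an entry and the list keeps its full length
replaced = [n if n % 2 == 0 else 0 for n in nums]
print(replaced)   # [0, 0, 2, 0, 4, 0]

# Both can be combined: filter first, then choose the value
both = [n**2 if n % 4 == 0 else -1 for n in nums if n % 2 == 0]
print(both)       # [0, -1, 16]
```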

### Dict Comprehensions
Almost identical to list comprehensions, with a small syntax difference: use curly braces and write each entry as key: value.

In [40]:
# Example
result = {num: num+2 for num in range(5)}
result

{0: 2, 1: 3, 2: 4, 3: 5, 4: 6}
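
Dict comprehensions pair naturally with `zip()`. A small sketch reusing the name lists from earlier (the variable names `name_lookup` and `lengths` are made up for this example):

```python
first_names = ['kevin', 'brett', 'taco']
last_names = ['kurek', 'farve', 'salad']

# Build a first-name -> last-name lookup from two parallel lists
name_lookup = {first: last for first, last in zip(first_names, last_names)}
print(name_lookup)   # {'kevin': 'kurek', 'brett': 'farve', 'taco': 'salad'}

# Conditionals work the same way as in list comprehensions
lengths = {name: len(name) for name in first_names if len(name) > 4}
print(lengths)       # {'kevin': 5, 'brett': 5}
```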

## Generators
A generator is an iterator: you step through its values one at a time with next() or a for loop.
Benefit over list comprehensions: values are computed lazily, one at a time as you ask for them, so the full result is never held in memory at once.

In [58]:
even_nums = (num for num in range(10) if num % 2 == 0)
print(even_nums)
print(next(even_nums))
print(next(even_nums))

for i in even_nums:
    print(i)

<generator object <genexpr> at 0x106b4e2b0>
0
2
4
6
8
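
To make the memory benefit concrete, here is a rough sketch comparing a list comprehension with the equivalent generator expression. The range of 100,000 is an arbitrary choice for illustration, and the exact byte counts reported by `sys.getsizeof` vary by Python version and platform.

```python
import sys

# The list comprehension builds all 100,000 values right away.
squares_list = [n * n for n in range(100_000)]

# The generator expression only stores the recipe for producing them.
squares_gen = (n * n for n in range(100_000))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)   # the generator is far smaller

# Values are produced one at a time as sum() asks for them.
total = sum(squares_gen)
print(total == sum(squares_list))   # True

# Like any iterator, a generator is used up after one pass.
leftover = list(squares_gen)
print(leftover)   # []
```

The trade-off is that a generator can only be walked through once; if you need the values again, rebuild the generator or store them in a list.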
