# Using Iterators in Python Land

## 4. Loading Large Files

- #### Loading data in chunks
- Imagine your DataFrame has a column "lang" and you want to aggregate over all of its items (for example, count how often each language occurs)
- but the dataset has too many data points, i.e. it is too large to load into memory at once
- you can read the data in chunks and update the running result one chunk at a time


In [45]:
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv("Datasets/tweets.csv", chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)


{'en': 97, 'et': 1, 'und': 2}


- #### A more reusable version: wrap the logic in a function

In [56]:
# Define count_entries()
def count_entries(csv_file, c_size, col_name):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[col_name]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('Datasets/tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)


{'en': 97, 'et': 1, 'und': 2}
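
The same chunk-by-chunk idea can also be sketched with only the standard library, which may help show what the chunking does under the hood. The helper name `count_entries_stdlib` and the small in-memory sample are made up for illustration and are not part of the original notes. `itertools.islice` pulls at most `c_size` rows from the reader on each pass, so only one chunk is held in memory at a time.

```python
import csv
import io
from collections import Counter
from itertools import islice


def count_entries_stdlib(lines, c_size, col_name):
    """Count the values of col_name, reading c_size rows at a time."""
    reader = csv.DictReader(lines)  # lazily yields one dict per row
    counts = Counter()
    while True:
        # Take the next chunk of at most c_size rows from the iterator
        chunk = list(islice(reader, c_size))
        if not chunk:
            break  # the reader is exhausted
        counts.update(row[col_name] for row in chunk)
    return dict(counts)


# Hypothetical sample data standing in for a large CSV file
sample = io.StringIO("id,lang\n1,en\n2,en\n3,et\n4,und\n5,en\n")
sample_counts = count_entries_stdlib(sample, 2, "lang")
print(sample_counts)
```

An open file object can be passed in place of the `io.StringIO` sample, since both are iterables of lines.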


## 3. Zip
- #### Pairs up the elements of two (or more) iterables into tuples

In [46]:
avengers = ["hawkeye", "iron man", "thor", "quicksilver"] 
names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)
print(type(z), '\n')

z_list = list(z)
print(z_list)

<class 'zip'> 

[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
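
One detail worth keeping in mind: `zip()` stops as soon as the shortest iterable runs out. The sketch below uses shortened, made-up lists to show this, and contrasts it with `itertools.zip_longest`, which pads the missing values instead.

```python
from itertools import zip_longest

heroes = ["hawkeye", "thor", "quicksilver"]
surnames = ["barton", "odinson"]  # one name short on purpose

# zip() silently drops 'quicksilver' because surnames runs out first
short_pairs = list(zip(heroes, surnames))
print(short_pairs)

# zip_longest() keeps every item and fills the gap with fillvalue
padded_pairs = list(zip_longest(heroes, surnames, fillvalue="?"))
print(padded_pairs)
```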


- #### Loop over two iterables in parallel

In [47]:
avengers = ["hawkeye", "iron-man", "thor", "quicksilver"] 
names = ['barton', 'stark', 'odinson', 'maximoff']

# zip() can be used to loop over two iterables in parallel
for z1, z2 in zip(avengers, names):
    print(z1, z2)

hawkeye barton
iron-man stark
thor odinson
quicksilver maximoff


- #### Using the star/splat operator
- Unpacking with * consumes (exhausts) all the elements of the iterator, so you have to re-create the zip object if you want to use it again


In [48]:
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

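# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to the original lists
# (both checks print False because a tuple never equals a list)
print(result1, '\n', result2)
print(result1 == mutants)
print(result2 == powers)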


('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')
('charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde') 
 ('telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility')
False
False
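
Both comparisons print `False` because unzipping with `zip(*...)` yields tuples, while `mutants` and `powers` are lists, and a tuple never compares equal to a list. A small sketch with shortened, made-up data shows that converting the tuples back to lists recovers the originals:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

pairs = list(zip(letters, numbers))  # [('a', 1), ('b', 2), ('c', 3)]

# zip(*pairs) transposes the pairs back into two tuples
unzipped_letters, unzipped_numbers = zip(*pairs)

print(unzipped_letters == letters)        # tuple vs list
print(list(unzipped_letters) == letters)  # list vs list
print(list(unzipped_numbers) == numbers)  # list vs list
```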


In [49]:
avengers = ["hawkeye", "iron man", "thor", "quicksilver"] 
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)
print(*z)

# the list will be empty because print(*z) already exhausted z
z_list = list(z)
print(z_list)

('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
[]
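
If the pairs are needed more than once, one option is to materialize the zip object into a list up front. This is only a sketch with shortened lists; it trades the memory savings of a lazy iterator for the ability to reuse the result.

```python
avengers = ["hawkeye", "thor"]
names = ["barton", "odinson"]

# Store the pairs in a list; a list can be iterated any number of times
pairs = list(zip(avengers, names))

print(*pairs)  # unpacking a list does not consume it
print(pairs)   # the pairs are still there
```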


## 2. Enumerate
- Recall that enumerate() returns an enumerate object that produces a sequence of tuples, and each of the tuples is an index-value pair.
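- The first cell below zips three lists together and unpacks the resulting tuples; the cells after it use enumerate() on a list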

In [50]:
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']


# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data, '\n')

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip, '\n')

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')] 

<zip object at 0x10f735820> 

charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pryde shadowcat intangibility


In [51]:
avengers = ["hawkeye", "iron man", "thor", "quicksilver"] 

e = enumerate(avengers)
print(type(e))

e_list = list(e)
print(e_list)

<class 'enumerate'>
[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]


In [52]:
e = enumerate(avengers, start=10)
e_list = list(e)
print(e_list)

[(10, 'hawkeye'), (11, 'iron man'), (12, 'thor'), (13, 'quicksilver')]


In [53]:
for index, value in enumerate(avengers, start=1):
    print(index, value)

1 hawkeye
2 iron man
3 thor
4 quicksilver
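
`enumerate()` and `zip()` can also be combined when both a counter and paired values are needed. The sketch below uses shortened lists, and the `numbered` list is only there to collect the formatted lines; note the parentheses around `(hero, surname)`, which unpack the inner tuple produced by `zip()`.

```python
avengers = ["hawkeye", "thor", "quicksilver"]
names = ["barton", "odinson", "maximoff"]

numbered = []
# enumerate() yields (position, pair); the pair is unpacked in the same step
for position, (hero, surname) in enumerate(zip(avengers, names), start=1):
    line = f"{position}. {hero} ({surname})"
    numbered.append(line)
    print(line)
```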


## 1. Using iterators with iter() and next()

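- #### Definition
- An iterable is an object that can return an iterator via iter(); an iterator produces the next value each time next() is called on it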

In [54]:
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for item in flash:
    print(item)


# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print('\n' + next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))

jay garrick
barry allen
wally west
bart allen

jay garrick
barry allen
wally west
bart allen
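
Calling `next()` once more on an exhausted iterator raises `StopIteration`, which is the signal a `for` loop uses internally to know when to stop. The sketch below, with a shortened list and a made-up fallback string, shows the exception and the optional default argument of `next()` that avoids it.

```python
flash = ['jay garrick', 'barry allen']
superhero = iter(flash)

print(next(superhero))
print(next(superhero))

# The iterator is now exhausted: a bare next() raises StopIteration
try:
    next(superhero)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)

# Passing a default to next() returns it instead of raising
fallback = next(superhero, 'no more heroes')
print(fallback)
```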


- #### Using the splat operator: * unpacks all remaining items of an iterator at once

In [55]:
word = 'data'
it = iter(word)
print(*it)

d a t a
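
The star also works on the left-hand side of an assignment and inside list displays. A brief sketch, reusing the same word, shows both forms and confirms that the iterator is empty after it has been unpacked:

```python
word = 'data'

# A starred target collects the remaining items into a list
first, *rest = iter(word)
print(first)
print(rest)

# Unpacking into a list display consumes the iterator as well
it = iter(word)
letters = [*it]
leftover = list(it)  # nothing is left after unpacking
print(letters)
print(leftover)
```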
