# 1 Using iterators in PythonLand

## 1.1 Introduction to iterators

### 1.1.1 Iterators VS Iterables
 * Iterable
    * Examples : lists, strings, dictionaries, file connections
    * An object with an associated `iter()` method
    * Applying `iter()` to an iterable creates an iterator

 * Iterator
    * Produces next value with `next()`
    * An object with an associated `next()` method that produces consecutive values

 * An iterator is created from an iterable
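
A minimal sketch of the distinction above, using a list as the iterable. The variable names are illustrative and not part of the original notes.

```python
# A list is an iterable: it can hand out iterators, but it is not one itself.
letters = ['a', 'b', 'c']

# iter() asks the iterable for a fresh iterator.
it = iter(letters)

# next() pulls one value at a time from the iterator.
first = next(it)
second = next(it)
print(first, second)

# Calling iter() on an iterator returns the same object,
# whereas calling it on the list returns a new, independent iterator.
same_object = iter(it) is it
fresh = iter(letters)
fresh_first = next(fresh)
print(same_object, fresh_first)
```

The fresh iterator starts again at `'a'`, while `it` keeps its own position in the list.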


### 1.1.2 Iterating over iterables : next()

In [2]:
word = "Da"
it = iter(word)
next(it)

'D'

In [3]:
# The value after 'D'
next(it)

'a'

In [4]:
# No values are left to return, so this raises StopIteration
next(it) 

StopIteration: ignored
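
The error above can be handled rather than left to stop the program. The following is a small sketch of two common options: catching `StopIteration`, or passing a default value to `next()`. The names `collected` and `fallback` are made up for this illustration.

```python
word = "Da"
it = iter(word)

collected = []
try:
    while True:
        collected.append(next(it))
except StopIteration:
    # Raised by next() once the iterator has no values left.
    pass
print(collected)

# next() also accepts a default, which is returned instead of raising.
fallback = next(it, None)
print(fallback)
```

The default form is convenient when an exhausted iterator is an expected case rather than an error.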

### 1.1.3 Iterating at once with *

In [5]:
word = "Data"
it = iter(word)
print(*it)

D a t a


In [6]:
# No more values to go through!
print(*it)




### 1.1.4 Iterating over dictionaries

In [7]:
pythonistas = {'hugo' : 'bowne-anderson', 'francis' : 'castro'}
for key, value in pythonistas.items() :
    print(key, value)

hugo bowne-anderson
francis castro
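
For comparison, a short sketch of the other ways to iterate over the same dictionary. Looping over the dictionary directly yields only its keys, while `.values()` and `.items()` give the values and the key-value pairs.

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

# Looping over the dictionary itself yields its keys.
keys = [key for key in pythonistas]

# .values() yields the values; .items() yields (key, value) tuples.
values = list(pythonistas.values())
pairs = list(pythonistas.items())

print(keys)
print(values)
print(pairs)
```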


### 1.1.5 Examples

In [13]:
# Iterate over a list using a for loop, then create an iterator for the list and access its values.



# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)


jay garrick
barry allen
wally west
bart allen


In [14]:
# Create an iterator for flash: superhero
superhero = iter(flash)


# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))

jay garrick
barry allen
wally west
bart allen


--------------------------------------------------------------------------------

In [21]:
# Create an iterator for range(3): small_value
small_value = iter(range(3)) # yields the integers 0 to 2

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))


0
1
2


In [22]:
# Loop over range(3) and print the values
for i in range(3):
    print(i)


0
1
2
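
The two cells above give the same output because a for loop performs the `iter()` and `next()` calls itself. A rough sketch of that equivalence, written out by hand (the list `seen` is only there to record the values):

```python
values = range(3)

# What "for i in values: ..." does, written out step by step.
it = iter(values)          # 1. ask the iterable for an iterator
seen = []
while True:
    try:
        i = next(it)       # 2. fetch the next value
    except StopIteration:  # 3. stop cleanly once the iterator is exhausted
        break
    seen.append(i)

print(seen)
```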


In [23]:
# Create an iterator for range(10 ** 100): googol
googol = iter(range(10**100)) # googol: 10 to the power of 100

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

0
1
2
3
4
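
This works because a range produces its values on demand instead of storing them. As an alternative to repeated `next()` calls, the sketch below uses `itertools.islice` to take the first few values lazily. The name `first_five` is an illustrative choice.

```python
from itertools import islice

# range() stores only start, stop and step, so this does not
# allocate 10**100 integers.
googol = range(10 ** 100)

# islice() takes the first five values without touching the rest.
first_five = list(islice(googol, 5))
print(first_five)
```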


--------------------------------------------------------------------------------

In [24]:
# Pass a range object to list() and sum() and print the results of the function calls.


# Create a range object: values
values = range(10,21)

# Print the range object
print(values)

range(10, 21)


In [25]:
# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]


In [26]:
# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

165
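
Note that `values` could be passed to both `list()` and `sum()` because a range is an iterable, not an iterator. A small sketch of the difference, with variable names invented for the comparison:

```python
values = range(10, 21)

# A range is an iterable, so each call starts from the beginning.
first_sum = sum(values)
second_sum = sum(values)

# An iterator is consumed by the first pass.
values_it = iter(values)
it_first_sum = sum(values_it)
it_second_sum = sum(values_it)   # nothing left, so the sum is 0

print(first_sum, second_sum)
print(it_first_sum, it_second_sum)
```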


## 1.2 Playing with iterators

### 1.2.1 Using enumerate()

In [28]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)  # pairs each list element with an index
print(type(e))
print(e)

<class 'enumerate'>
<enumerate object at 0x7fda42b1a1c0>


In [29]:
e_list = list(e)
print(e_list)

[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]


### 1.2.2 enumerate() and unpack

In [30]:
# Unpacking
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers) :
    print(index, value)

0 hawkeye
1 iron man
2 thor
3 quicksilver


In [31]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers, start=10) :
    print(index, value)

10 hawkeye
11 iron man
12 thor
13 quicksilver
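
To make explicit what `start` does, here is a sketch comparing `enumerate()` with a hand-maintained counter. The `start` argument changes only the first index; it does not skip any items.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

# Manual bookkeeping: keep a counter next to the loop variable.
manual = []
index = 10
for value in avengers:
    manual.append((index, value))
    index += 1

# enumerate() does the same counting for us.
with_enumerate = list(enumerate(avengers, start=10))

print(manual == with_enumerate)
```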


### 1.2.3 Using zip()

In [32]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)
print(type(z))

<class 'zip'>


In [33]:
z_list = list(z)
print(z_list)

[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]


In [34]:
for z1, z2 in zip(avengers, names) :
    print(z1, z2)

hawkeye barton
iron man stark
thor odinson
quicksilver maximoff


### 1.2.4 Print zip with *

In [35]:
z = zip(avengers, names)
print(*z)

('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
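
The lists above have the same length. The following sketch shows what happens when they do not: `zip()` stops at the shortest input, while `itertools.zip_longest()` pads the missing values. The shortened `names` list and the `'?'` fill value are made up for this example.

```python
from itertools import zip_longest

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson']   # one name short, on purpose

# zip() stops as soon as the shortest input is exhausted.
short = list(zip(avengers, names))

# zip_longest() keeps going and pads the gaps with fillvalue.
padded = list(zip_longest(avengers, names, fillvalue='?'))

print(short)
print(padded)
```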


### 1.2.5 Examples

In [36]:
# Practice using enumerate() by printing a list of tuples and unpacking the tuples using a for loop.

# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]


In [38]:
# Unpack and print the tuple pairs
for index1,value1 in enumerate(mutants):
    print(index1, value1)

0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pryde


In [39]:
# Change the start index
for index2,value2 in enumerate(mutants, start=1):
    print(index2, value2)

1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pryde


-------------------------------------------------------------------------------

In [40]:
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']

aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']

powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']

In [43]:
# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')]


In [44]:
# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

<zip object at 0x7fda42a84740>


In [45]:
# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pryde shadowcat intangibility


--------------------------------------------------------------------------------

In [52]:
# Using * and zip to 'unzip'

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')


In [57]:
# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

print(result1)
print(result2)

('charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde')
('telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility')


In [59]:
print(mutants)
print(powers)

# Compare the unzipped tuples with the original lists (a tuple never equals a list, so both print False)
print(result1 == mutants)
print(result2 == powers)

['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']
['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']
False
False
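
Both comparisons print `False` because `result1` and `result2` are tuples while `mutants` and `powers` are lists, and Python never treats a tuple as equal to a list. A sketch of a like-for-like comparison, converting to a common type first:

```python
mutants = ['charles xavier', 'bobby drake', 'kurt wagner',
           'max eisenhardt', 'kitty pryde']
powers = ['telepathy', 'thermokinesis', 'teleportation',
          'magnetokinesis', 'intangibility']

# 'Unzip' as in the cell above.
result1, result2 = zip(*zip(mutants, powers))

# Tuple versus list: different types, so this is False.
print(result1 == mutants)

# Same contents once both sides have the same type.
print(list(result1) == mutants)
print(list(result2) == powers)
```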


## 1.3 Using iterators to load large files into memory

### 1.3.1 Loading data in chunks
 * There can be too much data to hold in memory
 * Solution : load data in chunks!
 * `pandas` function : `read_csv()`
    * Specify the chunk size : `chunksize`
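
To illustrate the idea of chunking on its own, here is a standard-library sketch that reads rows a few at a time. It is an analogue of the concept, not how `pandas` implements `chunksize`. The helper `read_in_chunks` and the in-memory sample data are hypothetical.

```python
import csv
import io
from itertools import islice

# Stand-in for a large file on disk (made-up sample data).
fake_file = io.StringIO("lang\nen\nen\net\nen\nund\n")

def read_in_chunks(rows, size):
    """Yield lists of at most `size` rows from an iterator of rows."""
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk

# csv.DictReader is an iterator over the rows of the file.
reader = csv.DictReader(fake_file)
chunk_sizes = [len(chunk) for chunk in read_in_chunks(reader, 2)]
print(chunk_sizes)
```

Only one chunk of rows is held in memory at a time, which is the property that makes this approach suitable for large files.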

In [62]:
import pandas as pd
PATH = "/content/drive/MyDrive/KUBIG"




# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv(PATH+"/tweets.csv", chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)


{'en': 97, 'et': 1, 'und': 2}


In [63]:
# Define count_entries()
def count_entries(csv_file,c_size,colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file,chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries(PATH+"/tweets.csv",10,'lang')

# Print result_counts
print(result_counts)


{'en': 97, 'et': 1, 'und': 2}
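
As a possible variation on `count_entries()`, the counting can be delegated to `collections.Counter`. This is a sketch under assumptions: the function name `count_entries_counter` and the small in-memory CSV are invented for illustration and stand in for `tweets.csv`.

```python
import io
from collections import Counter

import pandas as pd

# Made-up sample data standing in for tweets.csv.
sample_csv = io.StringIO("lang\nen\nen\net\nen\nund\n")

def count_entries_counter(csv_file, c_size, colname):
    """Count occurrences of each value in one column, chunk by chunk."""
    counts = Counter()
    for chunk in pd.read_csv(csv_file, chunksize=c_size):
        # Counter.update() adds one count per element it iterates over.
        counts.update(chunk[colname])
    return dict(counts)

result = count_entries_counter(sample_csv, 2, 'lang')
print(result)
```

The chunked loop is unchanged; only the if/else bookkeeping is replaced by `Counter.update()`.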
