# Iterator

In [2]:
a = "Tang"
it = iter(a)
it

<str_iterator at 0x1f48bd8c988>

Here `a` is an iterable, while `it` is an iterator.

Other common iterables include lists, tuples, dictionaries, file connections, and range objects.
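The sketch below shows the difference between the two. `iter()` accepts any of the iterables listed above, but only the iterator it returns supports `next()`. Calling `iter()` on an iterator just returns the same object. The variable names are illustrative.

```python
# Each of these iterables can produce an iterator via iter()
iterables = ["Tang", [1, 2], (3, 4), {"Joseph": "Tang"}, range(3)]
for obj in iterables:
    it = iter(obj)
    print(type(obj).__name__, "->", type(it).__name__, "first item:", next(it))

# A list is iterable but is not itself an iterator, so next() on it fails
try:
    next([1, 2])
except TypeError as exc:
    print("TypeError:", exc)

# iter() on an iterator returns that same iterator
it = iter("Tang")
print(iter(it) is it)  # True
```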

In [3]:
# The first call to next() on an iterator returns the first element of the original iterable
next(it)

'T'

In [4]:
next(it)

'a'

In [5]:
next(it)

'n'

In [6]:
next(it)

'g'

In [7]:
next(it)

StopIteration: 
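A `for` loop performs these same steps for you. The sketch below is a rough equivalent, not CPython's actual implementation. It calls `iter()` once, then calls `next()` repeatedly until `StopIteration` is raised, and it treats that exception as the signal to stop.

```python
a = "Tang"

# The usual way
for ch in a:
    print(ch)

# Roughly what the for loop does behind the scenes
it = iter(a)
collected = []
while True:
    try:
        ch = next(it)
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    collected.append(ch)
    print(ch)
```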

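You can also use `*` to unpack the iterator and print all of its elements at once.

Note that an iterator is consumed as you go. Every element returned by `next()` or unpacked by `*` is gone for good. Once `it` is exhausted, `next(it)` raises `StopIteration` and `print(*it)` prints nothing, so we have to create `it` again here.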

In [9]:
it = iter(a)
print(*it)

T a n g
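After `print(*it)`, the iterator is exhausted. In the sketch below, `next()` gets a second argument, which it returns as a default instead of raising `StopIteration`. Calling `list()` on an exhausted iterator gives an empty list.

```python
a = "Tang"
it = iter(a)
print(*it)              # T a n g  (consumes every element)

# The iterator is now exhausted
print(next(it, None))   # None: the default is returned instead of raising StopIteration
print(list(it))         # []: nothing is left to collect

# Re-creating the iterator starts over from the first element
it = iter(a)
print(next(it))         # T
```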


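Dictionaries and files are also iterable, but iterating over them works a little differently.

Iterating over a dictionary directly yields only its keys. To loop over both keys and values, call `.items()` and unpack each `(key, value)` pair: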

In [11]:
people = {"Joseph":"Tang", "Shiang":"Yang"}
for key, value in people.items():
    print(key, value)

Joseph Tang
Shiang Yang


In [12]:
# items() returns a dict_items view of the dictionary's (key, value) pairs, which can be iterated
type(people.items())

dict_items
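The sketch below uses the same `people` dictionary to show the three ways of iterating over it. Looping over the dictionary itself yields only its keys. `.items()` yields `(key, value)` tuples, which is why the loop above unpacks into `key, value`. `.values()` yields only the values. On Python 3.7+, dictionaries keep insertion order, so the order shown is reliable.

```python
people = {"Joseph": "Tang", "Shiang": "Yang"}

# Looping over the dict itself gives the keys
for key in people:
    print(key)                 # Joseph, then Shiang

# .items() gives (key, value) tuples
pairs = list(people.items())
print(pairs)                   # [('Joseph', 'Tang'), ('Shiang', 'Yang')]

# .values() gives just the values
print(list(people.values()))   # ['Tang', 'Yang']
```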

File connections can also be iterated with `next()` and `*`:

In [13]:
# "with" closes the file automatically when the block ends
with open("test.txt") as file:
    it = iter(file)
    print(*it)

1
 2
 3
 4
 5
 6
 7
 8
 9
 10


In [14]:
with open("test.txt") as file:
    it = iter(file)
    first_line = next(it)
first_line

'1\n'
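Judging from the output above, the lines of test.txt are the numbers 1 to 10. Each line keeps its trailing `'\n'`, and `print(*it)` puts a space between items, so lines 2 to 10 appear with a leading space. The sketch below is self-contained. Instead of relying on test.txt, it writes a hypothetical temporary file with the same contents. It reads the file inside `with` so the file is closed afterward, and it strips the newlines.

```python
import os
import tempfile

# Write a temporary file with the numbers 1 to 10, one per line
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(str(n) for n in range(1, 11)) + "\n")
    path = tmp.name

with open(path) as file:
    it = iter(file)
    first = next(it)                           # '1\n': the newline is kept
    rest = [line.rstrip("\n") for line in it]  # continues where next() left off

os.remove(path)
print(repr(first))   # '1\n'
print(rest)          # ['2', '3', '4', '5', '6', '7', '8', '9', '10']
```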

# Enumerate

`enumerate()` takes any iterable (such as a list) as its argument and returns an enumerate object, which is itself an iterator.

In [30]:
avengers = ['iron man', 'thor', 'captain america']
e = enumerate(avengers)
e

<enumerate at 0x1f48be6b188>

In [26]:
next(e)

(0, 'iron man')

In [27]:
next(e)

(1, 'thor')

In [28]:
next(e)

(2, 'captain america')

In [32]:
e = enumerate(avengers)  # re-create e: the three next() calls above exhausted it
print(*e)

(0, 'iron man') (1, 'thor') (2, 'captain america')


This enumerate object yields three pairs. Each pair is a tuple made up of an index and a value.
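To see where these `(index, value)` tuples come from, here is a rough pure-Python equivalent of `enumerate()`, modeled on the one given in the Python documentation. `my_enumerate` is a hypothetical name used only for illustration. The real built-in is implemented in C.

```python
def my_enumerate(iterable, start=0):
    """Yield (index, value) tuples, counting up from start."""
    n = start
    for elem in iterable:
        yield n, elem   # each pair is a tuple
        n += 1

avengers = ['iron man', 'thor', 'captain america']
print(list(my_enumerate(avengers)))
# [(0, 'iron man'), (1, 'thor'), (2, 'captain america')]
```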

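Like any iterator, `e` is consumed as you go. Once `next()`, `*`, or `list()` has used it up, `e` is empty: `next(e)` raises `StopIteration` and `print(*e)` prints nothing. That is why we create `e` again here.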

In [33]:
e = enumerate(avengers)
list(e)

[(0, 'iron man'), (1, 'thor'), (2, 'captain america')]

In [34]:
for index, value in enumerate(avengers):
    print(index, value)

0 iron man
1 thor
2 captain america


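#### A for loop only needs to pass over the elements once, which is why an iterator can be consumed as it goes: it never has to keep the elements it has already produced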
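Looping twice makes the contrast easy to see. A list can be looped over any number of times, because each loop calls `iter()` on it and gets a fresh iterator. An enumerate object is already an iterator, so after the first pass consumes it, the second pass yields nothing. A minimal sketch:

```python
avengers = ['iron man', 'thor', 'captain america']

# The list is an iterable: every loop gets a brand-new iterator
first_pass = [name for name in avengers]
second_pass = [name for name in avengers]

# The enumerate object is an iterator: the first loop consumes it
e = enumerate(avengers)
e_first = [pair for pair in e]
e_second = [pair for pair in e]

print(len(first_pass), len(second_pass))  # 3 3
print(len(e_first), len(e_second))        # 3 0
```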

You can also change the starting index by passing the `start` argument to `enumerate()`:

In [35]:
for index, value in enumerate(avengers, start = 10):
    print(index, value)

10 iron man
11 thor
12 captain america


---

# Zip

`zip()` combines two (or more) iterables element by element.

In [46]:
student = ["Joseph", "Shiang"]
course = ["BA", "MISDI"]

z = zip(student, course)
type(z)

zip

In [47]:
print(*z)

('Joseph', 'BA') ('Shiang', 'MISDI')


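`zip()` groups the elements that share the same index into pairs, and each pair is a tuple. The two iterables here have the same length. If they did not, `zip()` would stop at the end of the shortest one.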

In [49]:
z = zip(student, course)
list(z)

[('Joseph', 'BA'), ('Shiang', 'MISDI')]
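`zip()` also works with more than two iterables and with iterables of different lengths. When the lengths differ, `zip()` stops at the shortest iterable, while `itertools.zip_longest` pads the missing values instead. In the sketch below, the `units` list is hypothetical and is deliberately one item longer than the others.

```python
from itertools import zip_longest

student = ["Joseph", "Shiang"]
course = ["BA", "MISDI"]
units = [3, 3, 2]  # hypothetical: one more item than the other lists

# Three iterables: each tuple has three elements; the extra value in units is dropped
print(list(zip(student, course, units)))
# [('Joseph', 'BA', 3), ('Shiang', 'MISDI', 3)]

# zip_longest fills the gaps with fillvalue instead of truncating
print(list(zip_longest(student, units, fillvalue="-")))
# [('Joseph', 3), ('Shiang', 3), ('-', 2)]
```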

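Like enumerate objects and the iterators returned by `iter()`, a zip object is consumed as you go. <br/>
Once `next()`, `*`, or `list()` has used it up, `z` is empty: `next(z)` raises `StopIteration` and `print(*z)` prints nothing. That is why `z` was created again above.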

In [50]:
for z1, z2 in zip(student, course):
    print(z1, z2)

Joseph BA
Shiang MISDI


Unzip a zip object 

`zip()` can also undo the pairing. Put `*` in front of a zip object and pass it to `zip()`, and you get back one tuple per original iterable.

In [51]:
z = zip(student, course)

In [52]:
a1, a2 = zip(*z)
print(a1, a2)

('Joseph', 'Shiang') ('BA', 'MISDI')
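Here is why `zip(*z)` works. The `*` unpacks the zip object into separate arguments, so the call becomes `zip(('Joseph', 'BA'), ('Shiang', 'MISDI'))`. `zip()` then groups the first elements of those tuples together, and likewise the second elements, which reverses the original pairing. The sketch below spells this out.

```python
student = ["Joseph", "Shiang"]
course = ["BA", "MISDI"]
pairs = list(zip(student, course))   # [('Joseph', 'BA'), ('Shiang', 'MISDI')]

# zip(*pairs) is the same call as writing the pairs out as separate arguments
unzipped = list(zip(*pairs))
spelled_out = list(zip(('Joseph', 'BA'), ('Shiang', 'MISDI')))
print(unzipped == spelled_out)       # True

# The results are tuples; convert them back to lists if needed
a1, a2 = unzipped
print(list(a1), list(a2))            # ['Joseph', 'Shiang'] ['BA', 'MISDI']
```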


---

## Use iterators to load large files into memory

There can be too much data to hold in memory at once, so sometimes we have to load it in chunks.

First, load a chunk, perform the desired operation or operations on it, store the result, and discard the chunk.

Then load the next chunk and repeat until all the data has been processed.
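The same load, process, and discard loop can be written with only the standard library. This is an illustrative alternative, not how pandas implements `read_csv` internally. `read_in_chunks` is a hypothetical helper that uses `itertools.islice` to pull at most `chunk_size` rows at a time from `csv.DictReader`, so only one chunk is in memory at any point. An in-memory `io.StringIO` stands in for a large file.

```python
import csv
import io
from itertools import islice

def read_in_chunks(file_obj, chunk_size):
    """Hypothetical helper: yield lists of up to chunk_size rows from a CSV file."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))  # load the next chunk
        if not chunk:                             # no rows left: stop
            return
        yield chunk

# Stand-in for a large CSV file with a column named x
data = io.StringIO("x\n" + "\n".join(str(n) for n in range(1, 101)) + "\n")

total = 0
for chunk in read_in_chunks(data, chunk_size=30):
    total += sum(int(row["x"]) for row in chunk)  # process the chunk, then let it go
print(total)  # 5050
```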

### Implementation using Pandas read_csv function

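We can implement this loading pattern by passing the `chunksize` argument to `read_csv`. <br/>
With `chunksize` set, `read_csv` does not return a single DataFrame. Instead it returns an iterator (a `TextFileReader`) that yields DataFrames of up to `chunksize` rows each.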

#### Example
Suppose a CSV file has a column `x`, and we want the sum of all the values in that column. <br/>
The file is too large to fit into memory all at once, so we have to process it in chunks.

In [None]:
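import pandas as pd

result = []
# With chunksize set, read_csv yields DataFrames of up to 1000 rows each
for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(chunk["x"].sum())  # store this chunk's partial sum, then move on
total = sum(result)  # combine the partial sums
print(total)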