## Example 1: Iterators

Accessing elements by index:

In [None]:
lst = [1, 1, 2, "a", 3.14]
for i in range(len(lst)):
    print(lst[i])

Accessing elements through an iterator, using `iter` and `next`:

In [None]:
it = iter(lst)
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))

Calling `next` once more raises a `StopIteration` exception, because the iterator is exhausted:

In [None]:
print(next(it))
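
A minimal sketch of two common ways to deal with an exhausted iterator, assuming the same `lst` as above: catching `StopIteration` explicitly, or passing a default value as the second argument of `next`. The names `caught`, `sentinel`, and `value` are illustrative only.

```python
lst = [1, 1, 2, "a", 3.14]
it = iter(lst)
for _ in range(len(lst)):
    next(it)  # consume all five elements

# Option 1: catch the exception explicitly
caught = False
try:
    next(it)
except StopIteration:
    caught = True
print(caught)

# Option 2: give next() a default to return instead of raising
sentinel = object()
value = next(it, sentinel)
print(value is sentinel)
```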

Traversing with a `for` loop, which uses an iterator behind the scenes:

In [None]:
for e in lst:
    print(e)
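
To make the connection with `iter` and `next` explicit, the loop above behaves roughly like the following hand-written version. This is a sketch for illustration; the list `collected` is a hypothetical helper that only records what was visited.

```python
lst = [1, 1, 2, "a", 3.14]

collected = []
it = iter(lst)           # the for loop calls iter() once
while True:
    try:
        e = next(it)     # then calls next() repeatedly
    except StopIteration:
        break            # and stops silently when the iterator is exhausted
    collected.append(e)
    print(e)
```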

Some collections do not support access by index:

In [None]:
st = set(lst)
print(st)

The following raises an error, because a `set` cannot be indexed:

In [None]:
print(st[0])

But an iterator still works:

In [None]:
it = iter(st)
print(next(it))
print(next(it))
print()

for e in st:
    print(e)

## Example 2: Counting the lines of a file

A file object can be used as an iterator:

In [None]:
file = open("data/UNv1.0.en-zh.zh", encoding="utf-8")
print(next(file))
print(next(file))
file.close()

Iterating over every line of the file:

In [None]:
file = open("data/UNv1.0.en-zh.zh", encoding="utf-8")
count = 0
for line in file:
    count = count + 1
file.close()
print(count)

A more recommended syntax is the `with` statement, which closes the file automatically:

In [None]:
count = 0
with open("data/UNv1.0.en-zh.zh", encoding="utf-8") as file:
    for line in file:
        count = count + 1
print(count)
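
The same count can also be written without an explicit counter variable. The sketch below assumes a small temporary file created only for this illustration (it does not use the data file above), and sums 1 for every line the file iterator yields.

```python
import os
import tempfile

# Create a small temporary file with three lines (illustrative data)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

# Count lines by summing 1 for every line the file iterator yields
with open(path, encoding="utf-8") as file:
    count = sum(1 for _ in file)
print(count)

os.remove(path)  # clean up the temporary file
```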

## Example 3: Sample variance of a sequence

Generate the data:

In [None]:
import math
vec = [math.sin(i + math.exp(i)) for i in range(100)]

The straightforward approach: compute the mean first, then the variance (two passes over the data):

In [None]:
# Sample size
count = 0
# Sum of all elements
vec_sum = 0.0
for e in vec:
    vec_sum = vec_sum + e
    count = count + 1
# Compute the mean
mean = vec_sum / count
# Compute the variance
vec_var = 0.0
for e in vec:
    vec_var = vec_var + (e - mean) ** 2
vec_var = vec_var / (count - 1)
print(vec_var)

One-pass method, where $S$ denotes the sample variance and $n$ the sample size: $(n-1)S=\sum_i (x_i-\bar{x})^2=\sum_i x_i^2-n\bar{x}^2$

In [None]:
count = 0
ss = 0.0
s = 0.0
for e in vec:
    ss = ss + e * e
    s = s + e
    count = count + 1
mean = s / count
vec_var = (ss - count * mean * mean) / (count - 1)
print(vec_var)
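
The one-pass formula subtracts two quantities that can be large and nearly equal, which may lose precision in floating-point arithmetic. A commonly used alternative is Welford's online algorithm, sketched below. Note that this is a different technique from the formula above, shown only for comparison; the names `m2` and `delta` are illustrative, and the cross-check uses the standard library function `statistics.variance`.

```python
import math
import statistics

vec = [math.sin(i + math.exp(i)) for i in range(100)]

count = 0
mean = 0.0
m2 = 0.0  # running sum of squared deviations from the current mean
for e in vec:
    count += 1
    delta = e - mean            # deviation from the old mean
    mean += delta / count       # update the running mean
    m2 += delta * (e - mean)    # uses both the old and the new mean
vec_var = m2 / (count - 1)
print(vec_var)

# Cross-check against the standard library
print(abs(vec_var - statistics.variance(vec)) < 1e-9)
```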

## Example 4: Random sampling

Given weights $w=(w_1,\ldots,w_n)$ and values $v=(v_1,\ldots,v_n)$, draw one value so that each $v_i$ is selected with probability $w_i/\sum_j w_j$.

In [None]:
# Generate the data
import string
wvec = list(range(1, 27))
vvec = list(string.ascii_uppercase)

#### Key point 1: Iterating over two collections at the same time

In [None]:
it = zip(wvec, vvec)
print(next(it))
print(next(it))
print()

for w, v in zip(wvec, vvec):
    # print(w, v)
    print(f"w is {w}, v is {v}")
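
One detail worth knowing: `zip` stops as soon as the shortest input is exhausted. The sketch below uses small illustrative inputs (not the data above) and also shows `itertools.zip_longest`, which pads the shorter input instead.

```python
import itertools

# zip() stops at the shortest input
short = list(zip([1, 2, 3], "ab"))
print(short)   # [(1, 'a'), (2, 'b')]

# zip_longest() pads the shorter input with fillvalue
padded = list(itertools.zip_longest([1, 2, 3], "ab", fillvalue=None))
print(padded)  # [(1, 'a'), (2, 'b'), (3, None)]
```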

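Solution: make a single pass that keeps a running total `D` of the weights, and replace the current choice by the new element with probability `w / D`: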

In [None]:
# Set the random seed
import random
random.seed(123)

D = 0.0
i = 0        # index of the current element in the iterator
loc = 0      # position (index) of the finally selected element of v
item = None  # the finally selected element of v
for w, v in zip(wvec, vvec):
    D = D + w
    prob = w / D
    # Select v with this probability
    if random.random() <= prob:
        loc = i
        item = v
    i = i + 1
print(loc)
print(item)
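
Why this single pass gives the required probabilities can be checked with a short derivation, sketched here under the assumptions that all weights are positive and that the random draws are independent. Write $D_k=\sum_{j\le k} w_j$ (notation introduced only for this note). The element $v_k$ is the final result exactly when it is selected at step $k$ and no later element replaces it:

$$
P(\text{result}=v_k)
=\frac{w_k}{D_k}\prod_{j=k+1}^{n}\left(1-\frac{w_j}{D_j}\right)
=\frac{w_k}{D_k}\prod_{j=k+1}^{n}\frac{D_{j-1}}{D_j}
=\frac{w_k}{D_k}\cdot\frac{D_k}{D_n}
=\frac{w_k}{\sum_j w_j}.
$$

The second equality uses $D_j-w_j=D_{j-1}$, and the product then telescopes. The first element is always selected at step 1, since $w_1/D_1=1$.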

#### Key point 2: Getting the index while iterating over a collection

In [None]:
it = enumerate(wvec)
print(next(it))
print(next(it))
print()
it = enumerate(zip(wvec, vvec))
print(next(it))
print(next(it))

Note how the loop variables are unpacked here:

In [None]:
for i, (w, v) in enumerate(zip(wvec, vvec)):
    print(f"i is {i}, w is {w}, v is {v}")
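
As a small aside, `enumerate` also accepts a `start` argument when counting should not begin at 0. The sketch below uses only the first three letters for brevity; `first_three` is an illustrative name.

```python
import string

vvec = list(string.ascii_uppercase)

# Number the elements from 1 instead of 0
first_three = list(enumerate(vvec[:3], start=1))
print(first_three)  # [(1, 'A'), (2, 'B'), (3, 'C')]
```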

Rewriting the earlier method so that `i` no longer needs to be updated by hand:

In [None]:
def random_select(wvec, vvec):
    D = 0.0
    loc = 0  # position (index) of the finally selected element of v
    item = None  # the finally selected element of v
    for i, (w, v) in enumerate(zip(wvec, vvec)):
        D = D + w
        prob = w / D
        # Select v with this probability
        if random.random() <= prob:
            loc = i
            item = v
    return loc, item

Testing the sampling probabilities:

In [None]:
import collections
random.seed(123)
res = [random_select(wvec, vvec)[1] for i in range(10000)]
# Count how often each element was drawn
elements_count = collections.Counter(res)
for key, value in elements_count.items():
    print(f"{key}: {value}")
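
To judge whether these counts look right, one option is to compare the observed relative frequencies with the target probabilities $w_i/\sum_j w_j$. The sketch below repeats the selection function so that it runs on its own; `n_draws`, `max_dev`, and the printed summary are illustrative choices, not part of the original test.

```python
import collections
import random
import string

wvec = list(range(1, 27))
vvec = list(string.ascii_uppercase)

def random_select(wvec, vvec):
    # Same single-pass selection as in the text
    D = 0.0
    loc, item = 0, None
    for i, (w, v) in enumerate(zip(wvec, vvec)):
        D = D + w
        if random.random() <= w / D:
            loc, item = i, v
    return loc, item

random.seed(123)
n_draws = 10000
counts = collections.Counter(
    random_select(wvec, vvec)[1] for _ in range(n_draws)
)

# Largest gap between observed frequency and target probability
total_w = sum(wvec)
max_dev = 0.0
for w, v in zip(wvec, vvec):
    expected = w / total_w
    observed = counts[v] / n_draws
    max_dev = max(max_dev, abs(observed - expected))
print(max_dev)
```

A small value of `max_dev` relative to the probabilities themselves suggests that the sampler matches the target distribution.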

Sorting by key:

In [None]:
freq = list(elements_count.items())
freq.sort()
print(freq)
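
For comparison, a short sketch on a toy `Counter` (the string `"abbccc"` is illustrative data): `sorted` on the items orders by key, while `Counter.most_common` orders by count.

```python
import collections

counter = collections.Counter("abbccc")

# Sort by key, as above
by_key = sorted(counter.items())
print(by_key)   # [('a', 1), ('b', 2), ('c', 3)]

# Sort by count, largest first
top_two = counter.most_common(2)
print(top_two)  # [('c', 3), ('b', 2)]
```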

Plotting the counts, in key order:

In [None]:
import matplotlib.pyplot as plt
freqs = [v for k, v in freq]
plt.plot(freqs)
plt.show()

## Example 5: Reduce

In [None]:
def add(x, y):
    return x + y

import functools
x = [1, 2, 3, 4, 5]
it = iter(x)
xsum = functools.reduce(add, it)
print(xsum)
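
To see what `reduce` does step by step, the intermediate results can be listed with `itertools.accumulate`. This is a sketch for illustration; `operator.add` plays the role of the `add` function above, and the last partial sum equals the value returned by `reduce`.

```python
import itertools
import operator

x = [1, 2, 3, 4, 5]

# Each entry is the running result after folding in one more element
partial_sums = list(itertools.accumulate(x, operator.add))
print(partial_sums)  # [1, 3, 6, 10, 15]
```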

The third argument of `reduce` sets the initial value of the reduction:

In [None]:
def mult(x, y):
    return x * y

it = iter(x)
xprod = functools.reduce(mult, it, 1)
print(xprod)
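
The initial value matters most when the input may be empty. A minimal sketch: with an initial value, `reduce` returns it unchanged for an empty iterator; without one, it raises a `TypeError`. The names `empty_prod` and `raised` are illustrative.

```python
import functools

def mult(x, y):
    return x * y

# With an initial value, an empty input simply returns that value
empty_prod = functools.reduce(mult, iter([]), 1)
print(empty_prod)  # 1

# Without an initial value, an empty input is an error
raised = False
try:
    functools.reduce(mult, iter([]))
except TypeError:
    raised = True
print(raised)  # True
```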

## Example 6: Filter

`it` is the original iterator:

In [None]:
x = list(range(10))
it = iter(x)
print(next(it))  # 0
print(next(it))  # 1
print(next(it))  # 2

`it_filtered` is the "filtered" iterator:

In [None]:
def is_even(x):
    return x % 2 == 0

it = iter(x)
it_filtered = filter(is_even, it)
print(next(it_filtered))  # 0
print(next(it_filtered))  # 2
print(next(it_filtered))  # 4
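
Note that `filter` is lazy and shares state with its source: it pulls elements from `it` only when asked for the next result. The sketch below makes this visible by reading from `it` directly afterwards; `first_three` and `leftover` are illustrative names.

```python
def is_even(x):
    return x % 2 == 0

x = list(range(10))
it = iter(x)
it_filtered = filter(is_even, it)

first_three = [next(it_filtered) for _ in range(3)]
print(first_three)  # [0, 2, 4]

# The filter pulled 0..4 from `it`, so the source resumes at 5
leftover = next(it)
print(leftover)  # 5
```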

## Example 7: Map

`it` is the original iterator:

In [None]:
x = list(range(10))
it = iter(x)
print(next(it))  # 0
print(next(it))  # 1
print(next(it))  # 2

`it_mapped` is the transformed iterator:

In [None]:
def square(x):
    return x * x

it = iter(x)
it_mapped = map(square, it)
print(next(it_mapped))  # 0
print(next(it_mapped))  # 1
print(next(it_mapped))  # 4
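
For readers who prefer comprehension syntax, `map` and `filter` each have an equivalent generator expression, which is also lazy. A short sketch, with illustrative variable names:

```python
x = list(range(10))

# map(f, it) corresponds to (f(e) for e in it)
squares = list(map(lambda e: e * e, iter(x)))
squares_gen = list(e * e for e in iter(x))
print(squares == squares_gen)  # True

# filter(p, it) corresponds to (e for e in it if p(e))
evens = list(filter(lambda e: e % 2 == 0, iter(x)))
evens_gen = list(e for e in iter(x) if e % 2 == 0)
print(evens == evens_gen)  # True
```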

## Example 8: islice

`it` is the original iterator, and `it_n` is the iterator truncated to a fixed length:

In [None]:
import itertools
x = list(range(100))
it = iter(x)
it_n = itertools.islice(it, 5)
for e in it_n:
    print(e)
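
Two further properties of `islice`, sketched with illustrative names: the source iterator continues from where the slice stopped, and `islice` also accepts `start`, `stop`, and `step` arguments, much like list slicing.

```python
import itertools

it = iter(range(100))
head = list(itertools.islice(it, 5))
print(head)     # [0, 1, 2, 3, 4]

# The source continues where islice stopped
resumed = next(it)
print(resumed)  # 5

# start=10, stop=20, step=3
stepped = list(itertools.islice(range(100), 10, 20, 3))
print(stepped)  # [10, 13, 16, 19]
```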

## Example 9: Combining the tools, with lambda expressions

In [None]:
import itertools
it = iter(range(10000))
it_new = filter(lambda x: x % 2 == 0, it)
it_new = map(lambda x: x * x, it_new)
it_new = itertools.islice(it_new, 10)
print(list(it_new))
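
Because every stage is lazy, the same pipeline also works on a source that never ends. The sketch below swaps the finite `range` for `itertools.count()`, which is an assumption made for illustration; only as many elements as `islice` requests are ever produced.

```python
import itertools

source = itertools.count()  # 0, 1, 2, ... without end
evens = filter(lambda x: x % 2 == 0, source)
squares = map(lambda x: x * x, evens)
first_ten = list(itertools.islice(squares, 10))
print(first_ten)  # [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
```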

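## Exercise

Use functional programming tools such as Map/Reduce to compute the mean of a vector. You may consume the iterator only **once**, you **may not use a for loop**, and the length of the vector is not known in advance.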

In [None]:
import math
vec = [math.sin(i + math.exp(i)) for i in range(100)]
it = iter(vec)
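
One possible approach, offered as a sketch rather than the intended answer: let `reduce` carry a pair `(count, total)` as its accumulator, so that the length and the sum are obtained in the same pass. The helper `step` is a hypothetical name.

```python
import functools
import math

vec = [math.sin(i + math.exp(i)) for i in range(100)]
it = iter(vec)

def step(acc, e):
    # acc is a (count, total) pair; fold one more element into it
    count, total = acc
    return count + 1, total + e

# A single pass over the iterator, with no for loop
count, total = functools.reduce(step, it, (0, 0.0))
mean = total / count
print(mean)
```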

Under the same constraints, how would you compute the sample variance?
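
A possible sketch for the variance, again only one of several ways: `map` turns each element $x$ into the triple $(1, x, x^2)$, and `reduce` adds the triples componentwise, which yields $n$, $\sum_i x_i$, and $\sum_i x_i^2$ in a single pass. The one-pass formula from Example 3 then gives the variance. The names `triples`, `n`, `s`, and `ss` are illustrative, and `statistics.variance` is used only as a cross-check.

```python
import functools
import math
import statistics

vec = [math.sin(i + math.exp(i)) for i in range(100)]
it = iter(vec)

# Map each element to (1, x, x^2), then add the triples componentwise
triples = map(lambda e: (1, e, e * e), it)
n, s, ss = functools.reduce(
    lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]), triples
)

mean = s / n
vec_var = (ss - n * mean * mean) / (n - 1)
print(vec_var)

# Cross-check against the standard library
print(abs(vec_var - statistics.variance(vec)) < 1e-9)
```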