## 1. Basic concepts

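1. Iterable  
   - Has an `__iter__()` method that returns an iterator when called, or implements the legacy sequence protocol `__getitem__(index)` (indexed from 0 until `IndexError` is raised).
   - Common examples: `list`, `tuple`, `str`, `dict`, `set`, `range`, and user-defined classes.

2. Iterator  
   - Implements both:
     - `__iter__()`: returns the iterator itself (`return self`)
     - `__next__()`: returns the next element, or raises `StopIteration` when none remain
   - Iterators are lazy: they produce data only when asked for it.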

3. How a `for` loop works internally (pseudo-process):
```
_iter = iter(obj)           # calls obj.__iter__()
while True:
    try:
        item = next(_iter)  # calls _iter.__next__()
    except StopIteration:
        break
    # use item
```

## 2. Checking whether an object is an iterable or an iterator

In [1]:
from collections.abc import Iterable, Iterator

def inspect(obj):
    print(type(obj), 'Iterable?', isinstance(obj, Iterable), 'Iterator?', isinstance(obj, Iterator))

inspect([1, 2, 3])     # list: Iterable True, Iterator False
inspect(iter([1, 2]))  # list_iterator: Iterable True, Iterator True
inspect(range(5))      # range: Iterable True, Iterator False
inspect((x for x in range(3)))  # generator: Iterable True, Iterator True

<class 'list'> Iterable? True Iterator? False
<class 'list_iterator'> Iterable? True Iterator? True
<class 'range'> Iterable? True Iterator? False
<class 'generator'> Iterable? True Iterator? True

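One caveat about the `isinstance` check: `collections.abc.Iterable` only looks for an `__iter__` method, so a class that relies on the legacy `__getitem__` protocol from section 1 is reported as not iterable even though `iter()` accepts it. A minimal sketch, where `Squares` and `is_iterable` are hypothetical names introduced only for this illustration:

```python
from collections.abc import Iterable

class Squares:
    """Hypothetical sequence-like class that only implements __getitem__."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # signals the end of iteration
        return index * index

sq = Squares(4)
print(isinstance(sq, Iterable))  # False: the class has no __iter__ method
print(list(sq))                  # [0, 1, 4, 9]: iter() falls back to __getitem__

def is_iterable(obj):
    """Try iter() directly instead of relying on the ABC check."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

print(is_iterable(sq), is_iterable(42))  # True False
```

Calling `iter()` inside a `try` block is usually the more reliable test when legacy sequence-style classes may be involved.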

## 3. A custom iterator class

In [None]:
class Countdown:
    """Iterator that counts down from n to 1."""
    def __init__(self, n: int):
        self.current = n

    def __iter__(self):
        return self  # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val

for num in Countdown(5):
    print(num, end=' ')

Note: an iterator can be consumed only once; after it is exhausted, create a new instance to iterate again.
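The one-shot behavior can be checked directly. The sketch below repeats the `Countdown` class so that it runs on its own; the variable names `first`, `second`, and `fresh` are only for illustration.

```python
class Countdown:
    """Same iterator as in the cell above, repeated so this sketch is self-contained."""
    def __init__(self, n):
        self.current = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val

cd = Countdown(3)
first = list(cd)            # consumes the iterator
second = list(cd)           # same object, already exhausted
fresh = list(Countdown(3))  # a new instance starts over
print(first, second, fresh)  # [3, 2, 1] [] [3, 2, 1]
```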

## 4. A class that is iterable but not an iterator (separate iterator object)

In [2]:
class CountdownIterable:
    """Can be iterated repeatedly (every for loop starts over)."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # return a new iterator object on every call
        return CountdownIterator(self.n)

class CountdownIterator:
    def __init__(self, n):
        self.current = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val

c = CountdownIterable(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # works again: [3, 2, 1]

[3, 2, 1]
[3, 2, 1]
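A shorter way to get the same re-iterable behavior, offered as a sketch rather than a replacement for the two-class version, is to write `__iter__` as a generator function (generators are covered in section 5). Each call to `__iter__` then creates a fresh generator, so no separate iterator class is needed.

```python
class CountdownIterable:
    """Re-iterable countdown whose __iter__ is a generator function."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        current = self.n       # local state, so every loop starts from n
        while current > 0:
            yield current
            current -= 1

c = CountdownIterable(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # [3, 2, 1]
```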


## 5. Generators
Generators are the most common way to create an iterator, and the syntax is concise.
### 5.1 Generator functions with `yield`

In [3]:
def countdown(n):
    while n > 0:
        yield n
        n -= 1

print(list(countdown(5)))  # [5, 4, 3, 2, 1]

[5, 4, 3, 2, 1]


Calling a generator function returns a generator object, which is an iterator.
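Calling the function does not run any of its body; execution starts only at the first `next()`. The sketch below makes this visible with a hypothetical `log` list and a `countdown_logged` variant of the function above.

```python
log = []  # records when the generator body actually runs

def countdown_logged(n):
    log.append("body started")
    while n > 0:
        yield n
        n -= 1

gen = countdown_logged(2)
print(log)        # []: calling the function ran none of its body
print(next(gen))  # 2
print(log)        # ['body started']: the body ran up to the first yield
```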

### 5.2 Delegating with `yield from`

In [6]:
def flatten(list_of_lists):
    for sub in list_of_lists:
        yield from sub   # equivalent to: for x in sub: yield x
iter_res = flatten([[1, 2], (3, 4), "ab"])
print(next(iter_res))
print(next(iter_res))
print(next(iter_res))
print(next(iter_res))
print(next(iter_res))
print(next(iter_res))
print(list(flatten([[1, 2], (3, 4), "ab"])))  # [1, 2, 3, 4, 'a', 'b']

1
2
3
4
a
b
[1, 2, 3, 4, 'a', 'b']


### 5.3 Generator expressions

In [None]:
gen = (x * x for x in range(5))
print(next(gen))  # 0
print(list(gen))  # the rest: [1, 4, 9, 16]

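## 6. StopIteration and return values
A generator can finish with `return value`; the value is stored in the `value` attribute of the `StopIteration` exception that ends the iteration.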

In [None]:
def compute_sum(n):
    total = 0
    for i in range(1, n+1):
        total += i
        yield i
    return total

g = compute_sum(5)
for v in g:
    print(v)
# A for loop discards the return value; to read it, catch StopIteration:
g = compute_sum(5)
try:
    while True:
        next(g)
except StopIteration as e:
    print('sum =', e.value)  # sum = 15
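Catching `StopIteration` by hand is rarely necessary. Inside another generator, `yield from` evaluates to the delegated generator's return value. A minimal sketch, where `report` is a hypothetical wrapper added for illustration:

```python
def compute_sum(n):
    total = 0
    for i in range(1, n + 1):
        total += i
        yield i
    return total

def report(n):
    # re-yields 1..n, then receives compute_sum's return value
    total = yield from compute_sum(n)
    yield f"sum = {total}"

print(list(report(5)))  # [1, 2, 3, 4, 5, 'sum = 15']
```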

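## 7. Re-iterable vs. one-shot

- Re-iterable: containers and sequences (`list`, `range`, `dict`, etc.); each `iter()` call returns a new iterator.
- One-shot: generators, file objects, and iterator objects themselves.

Common bug: iterating over a generator twice; the second pass produces nothing.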

In [7]:
g = (x for x in range(3))
print(list(g))  # [0, 1, 2]
print(list(g))  # []

[0, 1, 2]
[]
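Two common ways around this bug, sketched under the assumption that the data fits in memory: materialize the generator into a list once, or split it with `itertools.tee`. Note that `tee` buffers items internally, so when one copy is fully consumed before the other, a plain list is usually simpler.

```python
import itertools

# Option 1: materialize once, then reuse the list freely
g = (x for x in range(3))
items = list(g)
print(sum(items), max(items))  # 3 2

# Option 2: tee into two independent iterators (do not use g afterwards)
g = (x for x in range(3))
a, b = itertools.tee(g)
print(list(a), list(b))  # [0, 1, 2] [0, 1, 2]
```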


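## 8. The two forms of `iter()`
1. `iter(obj)`: returns an iterator for the object.
2. `iter(callable, sentinel)`: calls `callable()` repeatedly and stops when it returns `sentinel` (the sentinel itself is not yielded).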

In [17]:
import random

def roll():
    return random.randint(1, 6)

for v in iter(roll, 6):  # until roll() returns 6
    print(v)

5
5
4
3
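A practical use of the two-argument form is reading a binary stream in fixed-size chunks until `read()` returns an empty bytes object. In this sketch an in-memory `io.BytesIO` stands in for a real file, and the chunk size of 4 is arbitrary.

```python
import io
from functools import partial

stream = io.BytesIO(b"abcdefghij")  # stand-in for a real binary file

# call stream.read(4) repeatedly until it returns b"" (end of stream)
chunks = list(iter(partial(stream.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```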


## 9. The default argument of `next()`

In [21]:
it = iter([1, 2])
print(next(it))               # 1
print(next(it))               # 2
print(next(it, 'END'))        # 'END' (no exception is raised)

1
2
END
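The default argument combines well with a generator expression to express "first match, or a fallback" without a loop. A small sketch with made-up data:

```python
numbers = [3, 8, 15, 22]

# first element satisfying a condition, or None if there is none
first_even = next((x for x in numbers if x % 2 == 0), None)
first_big = next((x for x in numbers if x > 100), None)

print(first_even, first_big)  # 8 None
```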


## 10. Handy tools: the `itertools` module

In [None]:
import itertools as it

# infinite iterators
counter = it.count(start=10, step=2)
print(next(counter), next(counter), next(counter))  # 10 12 14

# cycle / repeat
cyc = it.cycle('AB')
print([next(cyc) for _ in range(5)])  # ['A', 'B', 'A', 'B', 'A']

print(list(it.islice(it.repeat('X'), 5)))  # ['X', 'X', 'X', 'X', 'X']

# chaining
print(list(it.chain([1, 2], ('a', 'b'))))  # [1, 2, 'a', 'b']

# combinations and permutations
print(list(it.combinations([1, 2, 3], 2)))  # [(1, 2), (1, 3), (2, 3)]
print(list(it.permutations('ABC', 2)))      # [('A','B'), ('A','C'), ...]

# grouping (runs of consecutive equal elements)
for k, group in it.groupby('aaabbcaaa'):
    print(k, list(group))
# a ['a','a','a']
# b ['b','b']
# c ['c']
# a ['a','a','a']

# compress: filter by a selector mask
print(list(it.compress('abcdef', [1, 0, 1, 0, 0, 1])))  # ['a','c','f']

# running totals
print(list(it.accumulate([1, 2, 3, 4])))  # [1, 3, 6, 10]

# zip that pads to the longest input (unlike zip, it does not stop at the shortest)
print(list(it.zip_longest([1,2], ['a'], fillvalue='?')))  # [(1,'a'), (2,'?')]
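One point from the `groupby` example deserves a closer look: it only merges consecutive equal keys, which is why `'a'` appears twice in the output above. To get one group per key, sort by the same key first. A sketch with a hypothetical word list:

```python
import itertools as it

words = ["apple", "bob", "avocado", "banana"]
first_letter = lambda w: w[0]

# unsorted input: the key alternates, so every run has length 1
runs = [(k, list(g)) for k, g in it.groupby(words, key=first_letter)]
print(runs)
# [('a', ['apple']), ('b', ['bob']), ('a', ['avocado']), ('b', ['banana'])]

# sorted by the same key: one group per letter
groups = [(k, list(g))
          for k, g in it.groupby(sorted(words, key=first_letter), key=first_letter)]
print(groups)
# [('a', ['apple', 'avocado']), ('b', ['bob', 'banana'])]
```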

## 11. Laziness and memory efficiency
Iterators avoid loading a large dataset into memory all at once. Example: processing a large file line by line.

In [None]:
def read_large_file(path):
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:       # a file object is its own iterator
            yield line.rstrip('\n')

# lazy processing
for line in read_large_file('big.txt'):
    if 'ERROR' in line:
        print(line)
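The memory difference can be observed roughly with `sys.getsizeof`. This is only a sketch: the exact byte counts depend on the Python build, and `getsizeof` measures just the container object, not the integers it refers to.

```python
import sys

squares_list = [x * x for x in range(100_000)]  # all values stored up front
squares_gen = (x * x for x in range(100_000))   # values produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)   # the generator object is far smaller
print(gen_size < list_size)  # True

# both produce the same values
print(sum(squares_gen) == sum(squares_list))  # True
```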

## 12. Pipeline-style iteration

In [22]:
def gen_numbers():
    for i in range(1, 20):
        yield i

def even_only(iterable):
    for x in iterable:
        if x % 2 == 0:
            yield x

def square(iterable):
    for x in iterable:
        yield x * x

pipeline = square(even_only(gen_numbers()))
print(list(pipeline))  # [4, 16, 36, 64, 100, 144, 196, 256, 324]

[4, 16, 36, 64, 100, 144, 196, 256, 324]
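Because every stage is lazy, the same kind of pipeline also works on an infinite source as long as the final consumer stops early. The sketch below uses generator expressions in place of the stage functions and `itertools.islice` to take only the first three results.

```python
import itertools

evens = (x for x in itertools.count(1) if x % 2 == 0)  # infinite source
squares = (x * x for x in evens)

# only as many numbers as needed are ever pulled through the pipeline
first_three = list(itertools.islice(squares, 3))
print(first_three)  # [4, 16, 36]
```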


## 13. How iterable unpacking and the star operator trigger iteration

In [23]:
a, b, *rest = range(6)
print(a, b, rest)  # 0 1 [2,3,4,5]

0 1 [2, 3, 4, 5]


## 14. Combining iterators: enumerate, zip, and friends

In [24]:
for idx, val in enumerate(['a', 'b', 'c'], start=1):
    print(idx, val)

for a, b in zip([1, 2, 3], ['x', 'y', 'z']):
    print(a, b)

1 a
2 b
3 c
1 x
2 y
3 z
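Two details are easy to miss here: `zip` silently stops at the shortest input, and both `zip` and `enumerate` return one-shot iterators. A short sketch with made-up `names` and `scores`:

```python
from itertools import zip_longest

names = ["a", "b", "c"]
scores = [1, 2]

print(list(zip(names, scores)))                       # [('a', 1), ('b', 2)]: 'c' is dropped
print(list(zip_longest(names, scores, fillvalue=0)))  # [('a', 1), ('b', 2), ('c', 0)]

pairs = enumerate(names)
first_pass = list(pairs)
second_pass = list(pairs)       # enumerate objects are exhausted after one pass
print(first_pass, second_pass)  # [(0, 'a'), (1, 'b'), (2, 'c')] []
```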


## 15. Common pitfalls with exhausted iterators

In [25]:
def data():
    print("generating data")
    for i in range(3):
        print(" yield", i)
        yield i

g = data()
# the first pass consumes the generator
sum1 = sum(g)
# the second pass finds it empty
sum2 = sum(g)

print(sum1, sum2)  # 3 0

generating data
 yield 0
 yield 1
 yield 2
3 0


## 20. Worked example: streaming log statistics

In [31]:
from collections import Counter
def read_lines(path):
    print("read_lines:",path)
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            print('read_lines')
            yield line.rstrip('\n')

def filter_error(lines):
    print("filter_error:",lines)
    for line in lines:
        if 'ERROR' in line:
            print('filter_error')
            yield line

def parse_code(lines):
    print("parse_code:",lines)
    for line in lines:
        # assumed format: "[ERROR] <code> message"
        if line.startswith('[ERROR]'):
            parts = line.split()
            print('parse_code')
            yield parts[1]  # 取错误码



def top_errors(path, n=5):
    codes = parse_code(filter_error(read_lines(path)))
    counter = Counter(codes)
    return counter.most_common(n)

print(top_errors('app.log', 10))

parse_code: <generator object filter_error at 0x10f529150>
filter_error: <generator object read_lines at 0x10f4fba60>
read_lines: app.log
read_lines
filter_error
parse_code
read_lines
filter_error
parse_code
read_lines
filter_error
parse_code
read_lines
filter_error
parse_code
read_lines
filter_error
parse_code
read_lines
filter_error
parse_code
[('<111>', 1), ('<222>', 1), ('<333>', 1), ('<444>', 1), ('<555>', 1), ('<666>', 1)]
