Convert all notes to markdown

StdioA · Dec 24, 2018 · 28e6611
1 parent dde871c
commit 28e6611
Showing 21 changed files with 3,053 additions and 0 deletions.
diff --git a/markdown/第01章：Python 数据类型.md b/markdown/第01章：Python 数据类型.md
@@ -0,0 +1,58 @@
+
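+# Python Data Types
+> Guido's sense of the aesthetics of language design is amazing. I've met many fine language designers who could build theoretically beautiful languages that no one would ever use, but Guido is one of those rare people who can build a language that is just slightly less theoretically beautiful but thereby is a joy to write programs in.  
+> -- Jim Hugunin  
+>   Creator of Jython, cocreator of AspectJ, architect of the .NET DLR
+
+One of Python's best qualities is its **consistency**: once you understand the language, you can define **standard interfaces** on your own classes that hook into Python's core language features, and so write "Pythonic" objects.  
+When the Python interpreter encounters special syntax, it invokes special methods (also called magic methods) to perform basic object operations. For example, evaluating `my_c[key]` calls `my_c.__getitem__(key)`. These special method names let your own objects implement, support, and interact with the following language constructs:
+* Iteration
+* Collections
+* Attribute access
+* Operator overloading
+* Function and method invocation
+* Object creation and destruction
+* String representation and formatting
+* Managed contexts (i.e. `with` blocks)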
+
+
+```python
+# Implement magic methods so that built-in functions work on your own objects
+# https://github.com/fluentpython/example-code/blob/master/01-data-model/frenchdeck.py
+import collections
+import random
+
+Card = collections.namedtuple('Card', ['rank', 'suit'])
+
+class FrenchDeck:
+    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
+    suits = 'spades diamonds clubs hearts'.split()
+
+    def __init__(self):
+        self._cards = [Card(rank, suit) for suit in self.suits
+                                        for rank in self.ranks]
+
+    def __len__(self):
+        return len(self._cards)
+
+    def __getitem__(self, position):
+        return self._cards[position]
+
+deck = FrenchDeck()
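+# Implementing __len__ makes len() work
+print(len(deck))
+# Implementing __getitem__ makes indexing and slicing work
+print(deck[1])
+print(deck[5::13])
+# With these two methods in place, standard library functions work on the object directly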
+print(random.choice(deck))
+```
+
+    52
+    Card(rank='3', suit='spades')
+    [Card(rank='7', suit='spades'), Card(rank='7', suit='diamonds'), Card(rank='7', suit='clubs'), Card(rank='7', suit='hearts')]
+    Card(rank='6', suit='diamonds')
+
+
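+For the full list of magic methods that Python supports, see the [Data Model](https://docs.python.org/3/reference/datamodel.html) chapter of the Python documentation.
+
+An important point: don't think of `len`, `str`, and similar built-ins as ordinary methods. Because these operations are used so often, Python gives them special treatment: for built-in types the interpreter takes a shortcut for speed, while for user-defined types it goes through the general interface (the special method) defined on the object. From the programmer's point of view, `len(deck)` and `len([1,2,3])` may be implemented very differently underneath, yet they look exactly the same at the language level.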
diff --git a/markdown/第02章：序列构成的数组.md b/markdown/第02章：序列构成的数组.md
@@ -0,0 +1,137 @@
+
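+# An Array of Sequences
+> As you may have noticed, several of the operations mentioned work equally for texts, lists and tables.  
+> Texts, lists and tables together are called trains. [...] The FOR command also works generically on trains.  
+> -- Geurts, Meertens, and Pemberton  
+>   *ABC Programmer’s Handbook*
+
+* Container sequences  
+    `list`, `tuple`, and `collections.deque` can hold items of different types.
+* Flat sequences  
+    `str`, `bytes`, `bytearray`, `memoryview`, and `array.array` hold items of a single type.
+
+Container sequences hold **references** to the objects they contain, which may be of any type, whereas flat sequences store **the values themselves, not references**. In other words, a flat sequence is a single contiguous block of memory. Flat sequences are therefore more compact, but they can only hold primitive values such as characters, bytes, and numbers.
+
+Sequence types can also be grouped by mutability.
+* Mutable sequences  
+    `list`, `bytearray`, `array.array`, `collections.deque`, and `memoryview`.
+* Immutable sequences  
+    `tuple`, `str`, and `bytes`.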
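
The two classifications above can be checked interactively. The following is a minimal sketch, not part of the original notes; the variable names are made up for illustration.

```python
from array import array

# A container sequence stores references, so mixed types are fine.
mixed = [1, 'two', 3.0]

# A flat sequence stores raw values of a single type ('d' = C double).
floats = array('d', [1.0, 2.0, 3.0])
print(floats.itemsize)  # bytes used per element

try:
    floats.append('four')  # not a float, so the array rejects it
except TypeError as exc:
    print('TypeError:', exc)

# Immutable sequences reject in-place modification.
point = (1, 2)
try:
    point[0] = 10
except TypeError as exc:
    print('TypeError:', exc)
```

Both failed operations leave the original objects unchanged: the array still holds three doubles and the tuple is still `(1, 2)`.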
+
+
+```python
+# List comprehension and generator expression
+symbols = "列表推导式"
+[ord(symbol) for symbol in symbols]
+(ord(symbol) for symbol in symbols)
+```
+
+
+```python
+# Because of packing/unpacking, the position of each element in a tuple carries meaning
+first, *others, last = (1, 2, 3, 4, 5)
+print(first, others, last)
+# Unpacking is not limited to tuples: it works with any iterable
+```
+
+
+```python
+# namedtuple
+from collections import namedtuple
+
+Point = namedtuple('Point', ['x', 'y'])
+p = Point(1, 2)
+print(p, p.x, p.y)
+# _asdict() returns an OrderedDict up to Python 3.7 and a regular dict from Python 3.8 on
+print(p._asdict())
+```
+
+
+```python
+# Why a slice excludes the item at its stop index
+a = list(range(6))
+# The same index splits the list into two parts with no overlap
+print(a[:2], a[2:])
+```
+
+
+```python
+# Ellipsis
+def test(first, xxx, last):
+    print(xxx)
+    print(type(xxx))
+    print(xxx == ...)
+    print(xxx is ...)
+    return first, last
+
+# ... (Ellipsis) is a singleton object, just like None, so `is` comparisons work
+print(test(1, ..., 2))
+```
+
+### bisect: binary search
+
+
+```python
+import bisect
+def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
+    i = bisect.bisect(breakpoints, score)
+    return grades[i]
+
+print([grade(score) for score in [33, 99, 77, 70, 89, 90, 100]])
+
+a = list(range(0, 100, 10))
+# Insert while keeping the list sorted
+bisect.insort(a, 55)
+print(a)
+```
+
+### Array
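+> Although the list type is flexible and easy to use, there may be better options depending on the requirements. For example, if you need to store 10 million floating-point values, an array is much more efficient, because an array does not hold float objects but only the packed bytes representing their machine values, just like an array in the C language. Likewise, if you constantly do first-in, first-out operations on a sequence, a deque (double-ended queue) should be faster.
+
+`array.tofile` and `array.fromfile` write and read an array in binary format. This is much faster than going through a text file, and the resulting file is smaller too.
+
+> Another fast way of serializing numeric data is the pickle module (https://docs.python.org/3/library/pickle.html). pickle.dump handles an array of floats almost as fast as array.tofile. However, pickle can handle almost all built-in types, including complex numbers, nested collections, and even instances of user-defined classes, provided their implementation is not too tricky.
+
+An array has a `type code` that specifies the type of its items; see the [array documentation](https://docs.python.org/3/library/array.html).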
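
A minimal sketch of the binary round trip described above. The file name and the number of items are arbitrary choices made for this illustration.

```python
import os
import tempfile
from array import array

# 'd' is the type code for C double.
floats = array('d', (i / 10 for i in range(1000)))

path = os.path.join(tempfile.mkdtemp(), 'floats.bin')  # hypothetical file name
with open(path, 'wb') as fp:
    floats.tofile(fp)  # raw machine values, no text conversion

loaded = array('d')
with open(path, 'rb') as fp:
    loaded.fromfile(fp, len(floats))  # the caller says how many items to read

print(loaded == floats)
print(os.path.getsize(path) == len(floats) * floats.itemsize)
```

The file size is exactly the number of items times `itemsize`, which is where the space saving over a text file comes from.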
+
+### memoryview
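+> The memoryview.cast method uses a notation similar to that of the array module. It lets you read and write the same block of memory in different ways, without moving the underlying bytes around.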
+
+
+```python
+import array
+
+arr = array.array('h', [1, 2, 3])
+memv_arr = memoryview(arr)
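+# View the memory of the signed shorts ('h') as unsigned chars ('B')
+memv_char = memv_arr.cast('B')
+print('Short', memv_arr.tolist())
+print('Char', memv_char.tolist())
+memv_char[1] = 2  # overwrite the high byte of the first short (on a little-endian machine)
+# the first short is now 0x0201 == 513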
+print(memv_arr.tolist(), arr)
+print('-' * 10)
+bytestr = b'123'
+# bytes objects are immutable
+try:
+    bytestr[1] = '3'
+except TypeError as e:
+    print(repr(e))
+memv_byte = memoryview(bytestr)
+print('Memv_byte', memv_byte.tolist())
+# so a memoryview over them is read-only as well
+try:
+    memv_byte[1] = 1
+except TypeError as e:
+    print(repr(e))
+
+```
+
+### Deque
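+`collections.deque` is a double-ended queue. Appends and pops at either end are efficient (O(1)) and **thread-safe**, whereas inserting or removing at the left end of a `list` is O(n).
+
+Besides `collections`, these standard library modules also implement queues:
+* queue.Queue (for communication between threads)
+* multiprocessing.Queue (for communication between processes)
+* asyncio.Queue
+* heapq (heap-based priority queue functions that operate on a plain list)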
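
A short sketch of typical `deque` operations, added for illustration; the values are arbitrary.

```python
from collections import deque

dq = deque(range(5), maxlen=5)  # bounded deque holding 0..4
dq.appendleft(-1)               # full, so 4 is dropped from the right
dq.append(10)                   # full, so -1 is dropped from the left
print(dq)

dq.rotate(2)                    # move the last two items to the front
print(dq)

first = dq.popleft()            # O(1), unlike list.pop(0)
print(first, dq)
```

With `maxlen` set, adding to a full deque silently discards an item from the opposite end, which makes it a convenient "last N items" buffer.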
diff --git a/markdown/第03章：字典和集合.md b/markdown/第03章：字典和集合.md
@@ -0,0 +1,150 @@
+
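+# Dictionaries and Sets
+
+> Any running Python program has many dictionaries active at the same time, even if the user's program code doesn't explicitly use a dictionary.  
+> -- A. M. Kuchling
+
+A hashable object must implement `__hash__` and `__eq__`.  
+If two hashable objects compare equal, their hash values must also be equal.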
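
A minimal sketch of a class that satisfies this contract. `Point` is a hypothetical class written only for this illustration; the key idea is that `__hash__` uses the same fields that `__eq__` compares.

```python
class Point:
    """A hypothetical 2D point, treated as immutable by convention."""

    def __init__(self, x, y):
        self._x, self._y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self._x, self._y))


a, b = Point(1, 2), Point(1, 2)
print(a == b, hash(a) == hash(b))
print(len({a, b}))   # equal objects collapse into one set member
lookup = {a: 'found'}
print(lookup[b])     # an equal key finds the same entry
```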
+
+
+```python
+# A dict can be constructed in many ways
+a = dict(one=1, two=2, three=3)
+b = {'one': 1, 'two': 2, 'three': 3} 
+c = dict(zip(['one', 'two', 'three'], [1, 2, 3])) 
+d = dict([('two', 2), ('one', 1), ('three', 3)]) 
+e = dict({'three': 3, 'one': 1, 'two': 2})
+a == b == c == d == e
+```
+
+
+```python
+# Dict comprehension
+r = range(5)
+d = {n * 2: n for n in r if n < 3}
+print(d)
+# setdefault
+for n in r:
+    d.setdefault(n, 0)
+print(d)
+```
+
+
+```python
+# defaultdict & __missing__
+class mydefaultdict(dict):
+    def __init__(self, value, value_factory):
+        super().__init__(value)
+        self._value_factory = value_factory
+
+    def __missing__(self, key):
+        # Avoid infinite recursion: store the key first instead of just doing
+        # return self[key]
+        self[key] = self._value_factory()
+        return self[key]
+
+d = mydefaultdict({1:1}, list)
+print(d[1])
+print(d[2])
+d[3].append(1)
+print(d)
+```
+
+### Variations of dict
+* collections.OrderedDict
+* collections.ChainMap (holds several mappings and searches them in order, from first to last, until the key is found)
+* collections.Counter
+* collections.UserDict (a pure Python implementation of dict)
+
+
+```python
+# UserDict
+# When customizing a dict, prefer subclassing UserDict over dict
+from collections import UserDict
+
+class mydict(UserDict):
+    def __getitem__(self, key):
+        print('Getting key', key)
+        return super().__getitem__(key)
+
+d = mydict({1:1})
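+print(d[1])
+# Key 2 is missing: __getitem__ still runs, then KeyError is raised
+try:
+    d[2]
+except KeyError as e:
+    print(repr(e))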
+```
+
+
+```python
+# MappingProxyType builds a read-only view of a mapping
+from types import MappingProxyType
+
+d = {1: 1}
+d_proxy = MappingProxyType(d)
+print(d_proxy[1])
+try:
+    d_proxy[1] = 1
+except Exception as e:
+    print(repr(e))
+
+d[1] = 2
+print(d_proxy[1])
+```
+
+
+```python
+# Set operations
+# Subset & proper subset
+a, b = {1, 2}, {1, 2}
+print(a <= b, a < b)
+
+# discard
+a = {1, 2, 3}
+a.discard(3)
+print(a)
+
+# pop
+print(a.pop(), a.pop())
+try:
+    a.pop()
+except Exception as e:
+    print(repr(e))
+```
+
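+### Set literals
+Apart from the empty set, set literals such as `{1}` and `{1, 2}` look exactly like the math notation. **The empty set must be written as `set()`**, because `{}` creates an empty `dict`.  
+As with `list`, the literal syntax is both faster and more readable than calling the `set` constructor.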
+
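+### How sets and dicts are implemented
+Sets and dicts are implemented with hash tables:
+1. Compute the `hash` of the key and use some of its bits (how many depends on the table size) to locate a slot, then compare the item in that slot with the key.
+2. If the two are equal, the lookup is a hit.
+3. If they differ, a hash collision has occurred, and the algorithm probes another slot (open addressing) and repeats the comparison.
+
+The consequences:
+1. A hashable object must support the `hash` function;
+2. It must support equality testing via `__eq__`;
+3. If `a == b`, then `hash(a) == hash(b)` must hold.
+
+Note: instances of user-defined classes are hashable by default, because their hash value is derived from `id()` and they all compare unequal to each other.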
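
A small sketch of the default behavior noted above, plus one caveat that the note does not mention: a class that defines `__eq__` without `__hash__` gets `__hash__` set to `None` and is no longer hashable. `Plain` and `EqOnly` are hypothetical classes used only for illustration.

```python
class Plain:
    """No __eq__ or __hash__: inherits identity-based defaults."""


class EqOnly:
    """Defines __eq__ only, so Python sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


p1, p2 = Plain(), Plain()
print(p1 == p2)        # compared by identity, so two instances differ
print(len({p1, p2}))   # both fit in a set as distinct members

try:
    hash(EqOnly(1))
except TypeError as exc:
    print('TypeError:', exc)
```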
+
+
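+### Memory overhead of dicts
+Because dicts are implemented with hash tables, which must stay sparse, they are not space-efficient. Replacing `dict` with `tuple` can reduce memory use substantially.  
+That said, memory is cheap; don't reach for this optimization unless you really have to, **because optimization is often at odds with maintainability**.
+
+Adding a key to a dict may trigger a resize of the hash table. In older CPython versions a resize could also change the order of the existing keys; since Python 3.7 insertion order is guaranteed, but adding or removing keys during iteration still raises `RuntimeError`. Either way, **do not modify a dict while iterating over it**.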
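
A minimal sketch of what goes wrong, and of one common workaround: iterate over a snapshot of the keys. The dict contents are arbitrary.

```python
d = {1: 'a', 2: 'b', 3: 'c'}

try:
    for key in d:
        d[key * 10] = None  # adding keys changes the dict's size
except RuntimeError as exc:
    print('RuntimeError:', exc)

# Safer: iterate over a snapshot of the keys.
d = {1: 'a', 2: 'b', 3: 'c'}
for key in list(d):
    d[key * 10] = None
print(d)
```

`list(d)` copies the keys before the loop starts, so the additions cannot disturb the iteration.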
+
+
+```python
+# The order of keys in a dict follows insertion order
+
+keys = [1, 2, 3]
+dict_ = {}
+for key in keys:
+    dict_[key] = None
+
+for key, dict_key in zip(keys, dict_):
+    print(key, dict_key)
+    assert key == dict_key
+
+# Key order does not affect dict equality comparisons
+```