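## Concurrency vs. Parallelism
* Concurrency is typically used for I/O-heavy workloads. For example, when you download several files from a website, the time spent waiting on I/O can be far longer than the time the CPU spends processing.
* Parallelism is better suited to CPU-heavy workloads, such as the parallel computation in MapReduce, where multiple machines or multiple processors are generally used to speed things up.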
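
The I/O-bound case can be illustrated without touching the network. The sketch below is only an approximation: `fake_io_task` is a hypothetical helper that uses `time.sleep` to stand in for a slow download, and the 0.2 second delay is an arbitrary choice.

```python
import concurrent.futures
import time

def fake_io_task(task_id, delay=0.2):
    """Hypothetical stand-in for an I/O-bound call such as a download."""
    time.sleep(delay)  # the thread just waits here, as it would on a network reply
    return task_id

def run_sequentially(n_tasks):
    start = time.perf_counter()
    results = [fake_io_task(i) for i in range(n_tasks)]
    return results, time.perf_counter() - start

def run_concurrently(n_tasks):
    start = time.perf_counter()
    # One worker per task, so all the waits can overlap.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_tasks) as executor:
        results = list(executor.map(fake_io_task, range(n_tasks)))
    return results, time.perf_counter() - start

seq_results, seq_elapsed = run_sequentially(5)
con_results, con_elapsed = run_concurrently(5)
print('sequential: {:.2f}s, threaded: {:.2f}s'.format(seq_elapsed, con_elapsed))
```

Because the tasks spend their time waiting rather than computing, the threaded run should take roughly as long as one task instead of five. For CPU-heavy work, the usual substitution would be `concurrent.futures.ProcessPoolExecutor`, which runs tasks in separate processes.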

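### Comparing single-threaded and multithreaded performance
#### Suppose our task is to download several web pages and print the size of each

* Single-threaded
    1. Iterate over the list of sites;
    2. download the current site;
    3. once that download finishes, do the same for the next site, and so on until the list is exhausted.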

In [2]:

import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    
def download_all(sites):
    for site in sites:
        download_one(site)

def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()

Read 189658 from https://en.wikipedia.org/wiki/Portal:Arts
Read 195629 from https://en.wikipedia.org/wiki/Portal:History
Read 242087 from https://en.wikipedia.org/wiki/Portal:Society
Read 335213 from https://en.wikipedia.org/wiki/Portal:Biography
Read 143121 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 170746 from https://en.wikipedia.org/wiki/Portal:Technology
Read 194867 from https://en.wikipedia.org/wiki/Portal:Geography
Read 160215 from https://en.wikipedia.org/wiki/Portal:Science
Read 344283 from https://en.wikipedia.org/wiki/Computer_science
Read 431421 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 327230 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 491525 from https://en.wikipedia.org/wiki/PHP
Read 183878 from https://en.wikipedia.org/wiki/Node.js
Read 62420 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 322886 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 9.29219

* Multithreaded

In [3]:

import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))


def download_all(sites):
    # Parallel alternative: with concurrent.futures.ProcessPoolExecutor() as executor:
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:  # multithreaded: a pool of up to 5 worker threads
        executor.map(download_one, sites)  # call download_one() concurrently on every element of sites

def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Read 143121 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 195629 from https://en.wikipedia.org/wiki/Portal:History
Read 242087 from https://en.wikipedia.org/wiki/Portal:Society
Read 335213 from https://en.wikipedia.org/wiki/Portal:Biography
Read 189658 from https://en.wikipedia.org/wiki/Portal:Arts
Read 194867 from https://en.wikipedia.org/wiki/Portal:Geography
Read 170746 from https://en.wikipedia.org/wiki/Portal:Technology
Read 160215 from https://en.wikipedia.org/wiki/Portal:Science
Read 344283 from https://en.wikipedia.org/wiki/Computer_science
Read 431421 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 491525 from https://en.wikipedia.org/wiki/PHP
Read 183878 from https://en.wikipedia.org/wiki/Node.js
Read 62420 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 327230 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 322886 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 1.61282
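
Two properties of `executor.map` are easy to miss in the example above, which discards its return value. It yields results in the order of the inputs, not in completion order, and an exception raised inside a worker only surfaces when the corresponding result is read. The sketch below illustrates both with two hypothetical helpers, `slow_square` and `fail_on_three`, in place of real downloads.

```python
import concurrent.futures
import time

def slow_square(n):
    # Smaller inputs sleep longer, so tasks finish in reverse order of submission.
    time.sleep(0.05 * (4 - n))
    return n * n

def fail_on_three(n):
    if n == 3:
        raise ValueError('bad input: {}'.format(n))
    return n

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Results come back in input order even though later inputs finish first.
    ordered = list(executor.map(slow_square, range(5)))

    collected = []
    error = None
    try:
        # The ValueError is re-raised here, when the failing result is read.
        for value in executor.map(fail_on_three, range(5)):
            collected.append(value)
    except ValueError as exc:
        error = exc

print(ordered)  # [0, 1, 4, 9, 16]
print(collected, error)
```

If the iterator returned by `executor.map` is never consumed, as in `download_all` above, a failed download would pass silently.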

### Futures
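In Python, Future classes are found in both concurrent.futures and asyncio; in each case they represent an operation whose result arrives later. A future wraps a pending operation that has been queued for execution. Its state can be queried at any time, and its result or exception can be retrieved once the operation completes.
* __Executor__ class: calling executor.submit(func) schedules func() for execution and returns the newly created future instance, so that you can query it later.
* __done__() reports whether the corresponding operation has finished: True if it has, False if it has not. Note that done() is non-blocking and returns immediately. The related add_done_callback(fn) registers a function fn that is called, with the future as its only argument, once the future completes.
* __result__() returns the operation's result once the future has completed, or re-raises the exception the operation raised. It blocks until then unless a timeout is given.
* __as_completed__(fs) takes an iterable of futures fs and returns an iterator that yields each future as it completes.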
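
The methods listed above can be exercised in a few lines. This is a minimal sketch rather than a recommended pattern: `wait_then_return`, `divide` and `on_done` are hypothetical helpers, and a `threading.Event` is used only to keep the first task pending long enough to observe `done()` returning False.

```python
import concurrent.futures
import threading

release = threading.Event()
finished = []  # filled in by the callback

def wait_then_return(value):
    release.wait()  # stay pending until the main thread sets the event
    return value

def divide(a, b):
    return a / b

def on_done(future):
    # Called with the future itself once it has completed.
    finished.append(future)

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    pending = executor.submit(wait_then_return, 42)
    state_before = pending.done()  # non-blocking; the task is still waiting
    release.set()
    value = pending.result()       # blocks until the task has finished
    state_after = pending.done()

    ok = executor.submit(divide, 6, 3)
    ok.add_done_callback(on_done)

    bad = executor.submit(divide, 1, 0)
    try:
        bad.result()               # re-raises the worker's exception
    except ZeroDivisionError as exc:
        error = exc

print(state_before, value, state_after)  # False 42 True
print(ok.result(), finished == [ok], repr(error))
```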

In [5]:

import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            future = executor.submit(download_one, site)  # schedule the download and keep the returned future in to_do
            to_do.append(future)

        for future in concurrent.futures.as_completed(to_do):  # as_completed() yields each future as soon as it finishes
            future.result()  # re-raises any exception raised inside download_one()

def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Read 195629 from https://en.wikipedia.org/wiki/Portal:History
Read 140754 from https://en.wikipedia.org/wiki/Portal:MathematicsRead 189658 from https://en.wikipedia.org/wiki/Portal:Arts

Read 242087 from https://en.wikipedia.org/wiki/Portal:Society
Read 335213 from https://en.wikipedia.org/wiki/Portal:Biography
Read 160215 from https://en.wikipedia.org/wiki/Portal:Science
Read 170968 from https://en.wikipedia.org/wiki/Portal:Technology
Read 344283 from https://en.wikipedia.org/wiki/Computer_science
Read 194867 from https://en.wikipedia.org/wiki/Portal:Geography
Read 431421 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 62420 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 491525 from https://en.wikipedia.org/wiki/PHP
Read 327230 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 183878 from https://en.wikipedia.org/wiki/Node.js
Read 322886 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 1.65223
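
The output above contains two messages fused onto one line, because several worker threads call `print` at the same time. One common way around this is to have workers return values and do all printing in the main thread, using a dictionary to map each future back to its URL. The sketch below assumes a hypothetical `fake_download` in place of `requests.get`, and the `example.com` URLs are placeholders.

```python
import concurrent.futures
import time

def fake_download(url):
    """Hypothetical stand-in for requests.get(url); returns a fake content length."""
    if 'missing' in url:
        raise ValueError('cannot fetch {}'.format(url))
    time.sleep(0.05)  # simulate network latency
    return len(url)

def download_all(sites):
    sizes, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Remember which URL each future belongs to.
        future_to_url = {executor.submit(fake_download, url): url for url in sites}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                sizes[url] = future.result()
            except ValueError as exc:
                errors[url] = exc
            else:
                # Printing happens only in the main thread, so lines cannot interleave.
                print('Read {} from {}'.format(sizes[url], url))
    return sizes, errors

sites = [
    'https://example.com/a',
    'https://example.com/bb',
    'https://example.com/missing',
]
sizes, errors = download_all(sites)
```

Here `sizes` holds the successful results keyed by URL, and `errors` holds the exception for each URL that failed.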

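## The Global Interpreter Lock
Python's interpreter (CPython) is not thread-safe internally. To avoid the race conditions this would cause, Python introduced the global interpreter lock (GIL), which allows only one thread to execute Python bytecode at any given moment. When a thread blocks on an I/O operation, however, it releases the GIL so that another thread can continue executing.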
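
A rough way to observe this is to time a purely CPU-bound function run twice in a row and then in two threads. The sketch below is an informal experiment, not a benchmark: `count_down` and the loop size `N` are arbitrary choices, and the timings depend on the machine and the interpreter build.

```python
import threading
import time

def count_down(n):
    # Pure Python arithmetic: no I/O, so the thread has no reason to block.
    while n > 0:
        n -= 1

def timed_sequential(n, repeats):
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start

def timed_threaded(n, repeats):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

N = 2_000_000
sequential = timed_sequential(N, 2)
threaded = timed_threaded(N, 2)
print('sequential: {:.3f}s, two threads: {:.3f}s'.format(sequential, threaded))
```

On a CPython build with the GIL enabled, the two timings would be expected to be of the same order, since the threads take turns rather than run simultaneously. This is the opposite of the download examples earlier, where threads spend most of their time blocked on I/O.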