## Multiprocessing

In [2]:
import time 
import logging
import threading
import importlib

importlib.reload(logging)
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s  [%(threadName)s] %(message)s')

In [1]:
import multiprocessing     ## a different library from the one used for threads, but the usage is very similar

In [3]:
def worker():
    logging.info('worker')

In [4]:
p = multiprocessing.Process(target=worker)

In [5]:
p.start()        ## this starts the process

2017-05-04 07:09:01,144 INFO  [MainThread] worker


Reconfigure the logging format so that it shows the process name instead of the thread name

In [38]:
import multiprocessing

In [45]:
importlib.reload(logging)

<module 'logging' from '/root/.pyenv/versions/3.5.2/lib/python3.5/logging/__init__.py'>

In [46]:
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s  [%(processName)s] %(message)s')

In [47]:
def worker():
    logging.info('worker')

In [48]:
p = multiprocessing.Process(target=worker)

In [49]:
p.start()     ## the log now shows the process name

2017-05-04 07:26:13,104 INFO  [Process-8] worker


In [50]:
p = multiprocessing.Process(target=worker, name='worker')  ## give the process a name

In [51]:
p.start()   

2017-05-04 07:27:58,655 INFO  [worker] worker


The daemon / non-daemon argument and the other constructor arguments work the same way as for threads.
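
As a minimal sketch of the point above, the following passes `name` and `daemon` to a process exactly as one would to a thread. The helper name `run_named_daemon` is made up for this illustration, and the `if __name__ == "__main__":` guard is there because some platforms re-import the main module in the child.

```python
import multiprocessing


def worker():
    """Trivial task: report which process it runs in."""
    print("running in", multiprocessing.current_process().name)


def run_named_daemon():
    # name and daemon are passed the same way as for threading.Thread
    p = multiprocessing.Process(target=worker, name="worker", daemon=True)
    p.start()
    p.join()  # wait for the child so that exitcode is available
    return p.name, p.daemon, p.exitcode


if __name__ == "__main__":
    print(run_named_daemon())
```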

In [None]:
## What is different:

In [52]:
p.pid     ## a process has a process ID

11757

In [53]:
p.exitcode   ## a process has an exit code

0

In [54]:
p.terminate()   ## a process can be killed; terminate() is the method that kills it

In [55]:
help(p.terminate)

Help on method terminate in module multiprocessing.process:

terminate() method of multiprocessing.context.Process instance
    Terminate process; sends SIGTERM signal or uses TerminateProcess()



In [None]:
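## Everything else is the same as with threads

Keep in mind that every process has to bring up its own interpreter, so a process costs more than a thread. Processes are therefore usually reserved for long-running tasks!

Running a very short task in its own process is not worth the overhead!

Nearly everything that threads offer is available for processes as well.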

In [None]:
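multiprocessing.Event    multiprocessing.Lock  multiprocessing.Condition  semaphores and so on: all 5 synchronization mechanisms exist for processes too.

Every synchronization mechanism available to threads is also available to processes, with the same usage and the same behavior.

Whether you use processes or threads, the code you write looks much the same.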
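
To illustrate that the primitives behave alike, here is a small sketch using `multiprocessing.Event` the way one would use `threading.Event`. The two-event handshake and the function names are assumptions made for this example only.

```python
import multiprocessing


def waiter(ready, done):
    """Child: block until the parent sets `ready`, then acknowledge."""
    ready.wait()
    done.set()


def signal_child():
    ready = multiprocessing.Event()
    done = multiprocessing.Event()
    p = multiprocessing.Process(target=waiter, args=(ready, done))
    p.start()
    early = done.is_set()            # the child is still blocked on `ready`
    ready.set()                      # same API as threading.Event
    acknowledged = done.wait(timeout=5)
    p.join()
    return early, acknowledged


if __name__ == "__main__":
    print(signal_child())
```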

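**The way they communicate, however, is different.**

Because the processes run in separate interpreters, any data passed between them has to be serialized and deserialized.

The GIL (global interpreter lock) only applies inside a single process, so it gives no protection across processes. The built-in containers are not process-safe, and neither is queue.Queue!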
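
A quick way to see that processes do not share ordinary Python objects is to let a child modify a module-level list. This is only a sketch; `items`, `append_item` and `mutate_in_child` are hypothetical names.

```python
import multiprocessing

items = []  # module-level list; every process works on its own copy


def append_item():
    items.append("from child")
    print("child sees:", items)


def mutate_in_child():
    p = multiprocessing.Process(target=append_item)
    p.start()
    p.join()
    return list(items)  # the parent's list was never touched


if __name__ == "__main__":
    print("parent sees:", mutate_in_child())
```

The child's change never reaches the parent, which is why data has to be sent explicitly (and therefore serialized) when processes need to exchange it.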

If several processes need to share a queue, there is a replacement:

In [56]:
multiprocessing.Queue()   ## this one is process-safe

<multiprocessing.queues.Queue at 0x7f5050271c18>
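
A minimal sketch of how such a queue might be used: a child process puts values on it and the parent reads them back. The `None` sentinel that marks the end of the stream is a convention chosen for this example, not something the queue requires.

```python
import multiprocessing


def producer(q, n):
    """Child: put the squares of 0..n-1 on the queue, then a None sentinel."""
    for i in range(n):
        q.put(i * i)      # every item is pickled on its way to the parent
    q.put(None)


def collect_squares(n):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(q, n))
    p.start()
    results = []
    while True:
        item = q.get()    # blocks until the child has produced something
        if item is None:
            break
        results.append(item)
    p.join()
    return results


if __name__ == "__main__":
    print(collect_squares(4))
```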

In [57]:
from multiprocessing import Manager

In [58]:
mgr = Manager()

In [61]:
d = mgr.dict()   ## dict() creates a dictionary that can be shared safely between processes

In [62]:
mgr.list()

<ListProxy object, typeid 'list' at 0x7f505024c3c8>

In [63]:
ns = mgr.Namespace()

In [64]:
ns.f = 3
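
The manager objects above become useful once they are handed to other processes. The sketch below assumes three short-lived children that each write one key into a shared `mgr.dict()`; the helper names are invented for illustration.

```python
import multiprocessing
from multiprocessing import Manager


def record(d, key, value):
    """Child: write one entry through the dictionary proxy."""
    d[key] = value


def fill_shared_dict():
    with Manager() as mgr:
        d = mgr.dict()
        procs = [
            multiprocessing.Process(target=record, args=(d, i, i * 10))
            for i in range(3)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return dict(d)  # copy the contents out before the manager shuts down


if __name__ == "__main__":
    print(fill_shared_dict())
```

Every access goes through the manager's server process, so this convenience is paid for with the same serialization cost as any other inter-process communication.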

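As a rule, keep data exchange between processes to a minimum,

because serialization and deserialization come at a cost!

master -> worker   Multiprocess programs usually work this way: the master hands out work to the workers, and the workers normally do not communicate with each other.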
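
One possible shape of the master -> worker pattern, sketched with two queues: the master puts tasks on one queue and reads results from the other, and the workers never talk to each other. The squaring task, the function names and the one-sentinel-per-worker shutdown are all assumptions made for the example.

```python
import multiprocessing


def worker(tasks, results):
    """Take numbers from `tasks` until a None sentinel arrives."""
    for n in iter(tasks.get, None):
        results.put((n, n * n))


def master(numbers, n_workers=2):
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for p in procs:
        p.start()
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    # Collect one result per task before joining the workers.
    collected = dict(results.get() for _ in numbers)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(master([1, 2, 3, 4]))
```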

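### Multithreading vs. multiprocessing

When should you use threads, and when processes?

* CPU-bound work: use multiple processes, which can make full use of the CPU
* IO-bound work: use multiple threads, which avoids serialization/deserialization

This is not an absolute rule, though!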
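
The IO-bound case can be sketched with plain threads. Here `time.sleep` stands in for a network or disk wait (an assumption of this example); because the threads wait at the same time, the total run time stays close to a single wait rather than the sum of all of them.

```python
import threading
import time


def fake_io(results, i):
    time.sleep(0.3)   # stands in for waiting on a socket or a disk
    results[i] = i    # each thread writes to its own slot


def run_concurrently(count=4):
    results = [None] * count
    threads = [
        threading.Thread(target=fake_io, args=(results, i))
        for i in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently()
    print(results, round(elapsed, 2))
```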

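The request/response model more often combines the two.

A web application, for example, follows the request/response model.

Typically a master process accepts requests and distributes them to several worker processes; each worker then uses threads for further concurrency and finally returns its result to the master, which sends the response.  (Many well-known programs are built around this master/worker layout, nginx for example, although its workers are event-driven rather than thread-based.)

WSGI containers such as gunicorn follow this model as well.

## New in Python 3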

### concurrent.futures

In [65]:
import concurrent       ## the concurrency package

In [66]:
from concurrent import futures    ## futures is a model for asynchronous programming

In [None]:
futures.ThreadPoolExecutor   ## implements a thread pool

In [67]:
pool = futures.ThreadPoolExecutor(max_workers=5)   ## creates a thread pool of size 5

In [68]:
help(pool.submit)   ## submits a single task (a callable) to the pool

Help on method submit in module concurrent.futures.thread:

submit(fn, *args, **kwargs) method of concurrent.futures.thread.ThreadPoolExecutor instance
    Submits a callable to be executed with the given arguments.
    
    Schedules the callable to be executed as fn(*args, **kwargs) and returns
    a Future instance representing the execution of the callable.
    
    Returns:
        A Future representing the given call.



In [69]:
fut = pool.submit(lambda:1+1)

In [70]:
fut.result()     ## returns the result of the call that ran in the pool thread

2

In [71]:
fut.done()   ## checks whether the task has finished

True
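
The same calls scale to several tasks. The sketch below submits a small hypothetical `add` function five times with positional arguments and reads the results back in submission order.

```python
from concurrent import futures


def add(a, b):
    return a + b


def submit_many():
    with futures.ThreadPoolExecutor(max_workers=5) as pool:
        # Positional arguments for the callable follow it in submit().
        futs = [pool.submit(add, i, i) for i in range(5)]
        # result() blocks until the corresponding task has finished.
        results = [f.result() for f in futs]
    return results, all(f.done() for f in futs)


if __name__ == "__main__":
    print(submit_many())
```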

In [72]:
def worker():
    time.sleep(30)
    logging.info('worker')

In [73]:
fut = pool.submit(worker)

In [74]:
fut.done()   ## checks whether the task has finished

False

In [75]:
fut.cancel()   ## once a task has started running it can no longer be cancelled

False

2017-05-04 08:20:32,585 INFO  [MainProcess] worker


In [76]:
fut.running()   ## it has finished by now

False
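
Cancelling does work for a task that is still waiting in the pool's queue. The following sketch assumes a pool with a single worker so that a second task is guaranteed to stay pending; the two `threading.Event` objects exist only to make the timing deterministic.

```python
import threading
from concurrent import futures


def cancel_demo():
    started = threading.Event()
    release = threading.Event()

    def blocker():
        started.set()      # tell the caller that this task is running
        release.wait()     # keep the only worker thread busy
        return "finished"

    with futures.ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(blocker)
        started.wait()
        pending = pool.submit(lambda: "never runs")  # queued behind blocker
        outcome = {
            "cancel_running": running.cancel(),   # already running
            "cancel_pending": pending.cancel(),   # still waiting in the queue
            "pending_cancelled": pending.cancelled(),
        }
        release.set()
        outcome["running_result"] = running.result()
    return outcome


if __name__ == "__main__":
    print(cancel_demo())
```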

In [77]:
## define another worker

def worker():
    raise Exception('haha')

In [78]:
fut = pool.submit(worker)

In [79]:
fut.exception()     ## returns the exception instance

Exception('haha')
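
`exception()` hands the exception back as a value, whereas `result()` raises it in the calling thread. A small sketch of both, using a hypothetical `failing` function:

```python
from concurrent import futures


def failing():
    raise ValueError("haha")


def inspect_failure():
    with futures.ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(failing)
        exc = fut.exception()   # waits for the task, returns the instance
        try:
            fut.result()        # raises the stored exception instead
        except ValueError as err:
            reraised = err
    return exc, reraised


if __name__ == "__main__":
    exc, reraised = inspect_failure()
    print(repr(exc), reraised is exc)
```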

In [80]:
futures.ProcessPoolExecutor()       ## a process pool is used the same way as a thread pool (the callable and its arguments must be picklable)

<concurrent.futures.process.ProcessPoolExecutor at 0x7f505019dac8>

Process pools and thread pools simplify multithreaded/multiprocess code, and both return values and exceptions can be handled.
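
A sketch of the same workflow with a process pool, assuming `math.factorial` as the task because a standard-library function can be pickled and sent to the worker processes (a lambda cannot). A negative argument makes one task fail, which shows the exception travelling back to the parent.

```python
import math
from concurrent import futures


def factorials(numbers):
    out = []
    with futures.ProcessPoolExecutor(max_workers=2) as pool:
        futs = [pool.submit(math.factorial, n) for n in numbers]
        for n, fut in zip(numbers, futs):
            exc = fut.exception()  # None if the task succeeded
            if exc is None:
                out.append((n, fut.result()))
            else:
                out.append((n, type(exc).__name__))
    return out


if __name__ == "__main__":
    print(factorials([5, -1, 3]))
```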

In [81]:
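help(futures.ThreadPoolExecutor)      ## one drawback in Python 3.5: the thread names cannot be set, but this is rarely a real problem (Python 3.6 added thread_name_prefix)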

Help on class ThreadPoolExecutor in module concurrent.futures.thread:

class ThreadPoolExecutor(concurrent.futures._base.Executor)
 |  This is an abstract base class for concrete asynchronous executors.
 |  
 |  Method resolution order:
 |      ThreadPoolExecutor
 |      concurrent.futures._base.Executor
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, max_workers=None)
 |      Initializes a new ThreadPoolExecutor instance.
 |      
 |      Args:
 |          max_workers: The maximum number of threads that can be used to
 |              execute the given calls.
 |  
 |  shutdown(self, wait=True)
 |      Clean-up the resources associated with the Executor.
 |      
 |      It is safe to call this method several times. Otherwise, no other
 |      methods can be called after this one.
 |      
 |      Args:
 |          wait: If True then shutdown will not return until all running
 |              futures have finished executing and the resources used by the
 | 
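
The help text above is from Python 3.5. On Python 3.6 or later, which this sketch assumes, `ThreadPoolExecutor` also accepts a `thread_name_prefix` argument, so the worker threads can be given recognisable names for logging.

```python
import threading
from concurrent import futures


def worker_thread_names(n_tasks=4):
    # thread_name_prefix requires Python 3.6 or later.
    with futures.ThreadPoolExecutor(
        max_workers=2, thread_name_prefix="worker"
    ) as pool:
        futs = [
            pool.submit(lambda: threading.current_thread().name)
            for _ in range(n_tasks)
        ]
        return sorted({f.result() for f in futs})


if __name__ == "__main__":
    print(worker_thread_names())
```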

In [82]:
help(futures._base.Executor)     ## shows that the with statement is supported (it has __enter__ and __exit__)

Help on class Executor in module concurrent.futures._base:

class Executor(builtins.object)
 |  This is an abstract base class for concrete asynchronous executors.
 |  
 |  Methods defined here:
 |  
 |  __enter__(self)
 |  
 |  __exit__(self, exc_type, exc_val, exc_tb)
 |  
 |  map(self, fn, *iterables, timeout=None, chunksize=1)
 |      Returns an iterator equivalent to map(fn, iter).
 |      
 |      Args:
 |          fn: A callable that will take as many arguments as there are
 |              passed iterables.
 |          timeout: The maximum number of seconds to wait. If None, then there
 |              is no limit on the wait time.
 |          chunksize: The size of the chunks the iterable will be broken into
 |              before being passed to a child process. This argument is only
 |              used by ProcessPoolExecutor; it is ignored by
 |              ThreadPoolExecutor.
 |      
 |      Returns:
 |          An iterator equivalent to: map(func, *iterables) but the call