
# Python Basics: Multiprocessing

## 1. Why multiprocessing

As noted earlier in "[Python Basics: Multithreading](https://github.com/QidiLiu/Python_learning/blob/master/Morvan-python_Notes/Python%E5%9F%BA%E7%A1%80_%E5%A4%9A%E7%BA%BF%E7%A8%8B.ipynb)":

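*Multithreading only appears to run things at the same time, by switching rapidly between threads. Multiprocessing runs them truly in parallel: two processes can exchange data, but they are fundamentally independent of each other.*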

In general, multithreading suits I/O-bound tasks, while multiprocessing suits CPU-bound tasks.
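
A rough way to see the difference is to run the same CPU-bound function through a thread pool and a process pool and time both. The sketch below is an illustration only: `cpu_task` and `timed_map` are hypothetical names, the worker count of 4 and the input size are arbitrary, and the timings depend on the machine. In standard CPython builds the global interpreter lock lets only one thread execute Python bytecode at a time, so the process pool is the one that can spread this work across cores.

```python
import multiprocessing as mp
import time
from multiprocessing.pool import ThreadPool

def cpu_task(n):
    """CPU-bound work (hypothetical example): sum of squares below n."""
    return sum(i * i for i in range(n))

def timed_map(pool_factory, inputs):
    """Map cpu_task over inputs with the given pool type.

    Returns (results, elapsed seconds). Hypothetical helper.
    """
    start = time.perf_counter()
    with pool_factory(4) as pool:  # 4 workers, an arbitrary choice
        results = pool.map(cpu_task, inputs)
    return results, time.perf_counter() - start

if __name__ == '__main__':
    inputs = [1_000_000] * 4
    thread_results, thread_time = timed_map(ThreadPool, inputs)
    process_results, process_time = timed_map(mp.Pool, inputs)
    # both pools compute the same values; only the elapsed time differs
    print(thread_results == process_results)
    print(f'threads:   {thread_time:.2f}s')
    print(f'processes: {process_time:.2f}s')
```

For an I/O-bound task (for example, one that mostly waits on the network) the picture would typically reverse, since threads are cheaper to start and spend most of their time waiting.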

## 2. Adding a process: Process

Usage is very similar to [the Thread object in multithreading](https://github.com/QidiLiu/Python_learning/blob/master/Morvan-python_Notes/Python%E5%9F%BA%E7%A1%80_%E5%A4%9A%E7%BA%BF%E7%A8%8B.ipynb).

In [1]:
import multiprocessing as mp

def job(a, b):
    print(a+b)
    print('blablabla...')

if __name__ == '__main__':
    p1 = mp.Process(target=job, args=(1,2)) # create the process
    p1.start() # start the process
    p1.join() # wait for the process to finish

3
blablabla...
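
To run several processes, a common pattern is to create them in a loop, start them all, and then join them all. The following is a minimal sketch along those lines; `make_processes` is a hypothetical helper and the argument pairs are arbitrary. `exitcode` is `None` until a process has finished and `0` after a normal exit.

```python
import multiprocessing as mp

def job(a, b):
    print(a + b)

def make_processes(pairs):
    """Build one unstarted Process per (a, b) pair (hypothetical helper)."""
    return [mp.Process(target=job, args=pair) for pair in pairs]

if __name__ == '__main__':
    processes = make_processes([(1, 2), (3, 4), (5, 6)])
    for p in processes:
        p.start()  # all processes are now running side by side
    for p in processes:
        p.join()   # wait for each one to finish
    # the order of the printed sums is not guaranteed
    print([p.exitcode for p in processes])
```

Starting everything before joining anything matters: calling `join()` right after each `start()` would run the processes one after another.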


## 3. Process output with Queue

It works much like [Queue in multithreading](https://github.com/QidiLiu/Python_learning/blob/master/Morvan-python_Notes/Python%E5%9F%BA%E7%A1%80_%E5%A4%9A%E7%BA%BF%E7%A8%8B.ipynb), so the details are not repeated here.



In [2]:
import multiprocessing as mp

def job(q):
    output = 0
    for i in range(1000):
        output += i + i**2 + i**3
    q.put(output) # put the result on the queue

if __name__ == '__main__':
    q = mp.Queue()
    p1 = mp.Process(target=job, args=(q,))
    p2 = mp.Process(target=job, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    result_1 = q.get()
    result_2 = q.get()
    print(result_1 + result_2)

499667166000
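
Results arrive on a `Queue` in completion order, not start order, so it can help to tag each result with the worker that produced it. The sketch below is one way to do that under stated assumptions: `partial_sum` and `worker` are hypothetical names, and splitting the range into two halves is arbitrary. It also reads from the queue before calling `join()`, which the `multiprocessing` documentation recommends, because a process that has put items on a queue may not exit until those items have been consumed.

```python
import multiprocessing as mp

def partial_sum(start, stop):
    """Sum i + i**2 + i**3 for i in range(start, stop)."""
    return sum(i + i**2 + i**3 for i in range(start, stop))

def worker(q, worker_id, start, stop):
    # tag the result so the parent knows which worker produced it
    q.put((worker_id, partial_sum(start, stop)))

if __name__ == '__main__':
    q = mp.Queue()
    bounds = [(0, 500), (500, 1000)]  # arbitrary split of range(1000)
    processes = [mp.Process(target=worker, args=(q, i, lo, hi))
                 for i, (lo, hi) in enumerate(bounds)]
    for p in processes:
        p.start()
    # drain the queue first, then join
    results = dict(q.get() for _ in processes)
    for p in processes:
        p.join()
    print(results)
    print(sum(results.values()))
```

Here the two workers split one computation between them instead of each repeating the full loop, so the printed total is the sum over `range(1000)` computed once.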


## 4. Process pool: Pool

Put all the work into a process pool and let Python decide how to distribute it among the processes.

In [3]:
import multiprocessing as mp

def job(x):
    return x**2 # with Pool, the return value comes back directly

def multicore():
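    # the with block shuts the pool down once all results are collected
    with mp.Pool() as pool:
        # map(): hand the whole iterable to the pool in one call
        result = pool.map(job, range(15))
        print(result)
        # apply_async(): submit individual calls asynchronously
        result = pool.apply_async(job, (2,))
        print(result.get())
        multi_result = [pool.apply_async(job, (i,)) for i in range(15)]
        print([r.get() for r in multi_result])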

if __name__ == '__main__':
    multicore()

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]
4
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]
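
`Pool` also accepts a `processes` argument to fix the number of workers, and `starmap()` unpacks argument tuples for functions that take more than one parameter. A small sketch, assuming a hypothetical two-argument function `power` and an arbitrary pool size of 2:

```python
import multiprocessing as mp

def power(base, exponent):
    """Hypothetical two-argument job."""
    return base ** exponent

if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:  # 2 workers, an arbitrary choice
        # starmap() calls power(2, 3), power(3, 2), power(10, 0)
        print(pool.starmap(power, [(2, 3), (3, 2), (10, 0)]))
        # get() accepts a timeout in seconds so a stuck job cannot block forever
        async_result = pool.apply_async(power, (2, 10))
        print(async_result.get(timeout=10))
```

Without `processes`, the pool size defaults to the number of CPUs reported by the system.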


## 5. Shared memory

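Shared memory means that processes running on different CPU cores can all read and write certain variables. I personally think of it as **global variables for multiprocessing**.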

In [4]:
import multiprocessing as mp

value = mp.Value('d', 1) # shared value; 'd' means the data type is double
array = mp.Array('i', [1, 2, 3]) # shared array of signed ints; unlike a numpy array, it can only be one-dimensional

print(value)
print(array)

<Synchronized wrapper for c_double(1.0)>
<SynchronizedArray wrapper for <multiprocessing.sharedctypes.c_int_Array_3 object at 0x7f3e4f6de488>>
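
Printing the wrappers only shows their types. The stored data is reached through `.value` for a `Value` and by indexing or slicing for an `Array`. The sketch below passes both to a child process that modifies them in place; `scale` is a hypothetical function and the factor 2 is arbitrary. `get_lock()` returns the lock that `Value` and `Array` create by default.

```python
import multiprocessing as mp

def scale(value, array, factor):
    """Multiply the shared value and every array element by factor."""
    with value.get_lock():
        value.value *= factor
    with array.get_lock():
        for i in range(len(array)):
            array[i] *= factor

if __name__ == '__main__':
    value = mp.Value('d', 1.5)
    array = mp.Array('i', [1, 2, 3])
    # the shared objects are handed to the child as arguments
    p = mp.Process(target=scale, args=(value, array, 2))
    p.start()
    p.join()
    print(value.value)  # read the number through .value
    print(array[:])     # slicing copies the contents into a list
```

The parent sees the child's changes after `join()` because both processes operate on the same underlying memory, not on copies.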


## 6. Process lock: Lock

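In multiprocess computation without a lock, the processes compete for the variables in shared memory. To keep the computation from going wrong during this competition, use a Lock to guard the part of the computation that touches the shared variable.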

In [5]:
import multiprocessing as mp

BALANCE_1 = mp.Value('i', 0)
BALANCE_2 = mp.Value('i', 0)
lock = mp.Lock()

def change_without_lock(n):
    for i in range(1000000):
        BALANCE_1.value += n
        BALANCE_1.value -= n

def change_with_lock(n):
    with lock: # hold the lock for the whole loop; released automatically on exit
        for i in range(1000000):
            BALANCE_2.value += n
            BALANCE_2.value -= n

if __name__ == '__main__':
    p1 = mp.Process(target=change_without_lock, args=(8,))
    p2 = mp.Process(target=change_without_lock, args=(10,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    p3 = mp.Process(target=change_with_lock, args=(8,))
    p4 = mp.Process(target=change_with_lock, args=(10,))
    p3.start()
    p4.start()
    p3.join()
    p4.join()
    print(f'without lock: {BALANCE_1.value}')
    print(f'with lock: {BALANCE_2.value}')

without lock: 982
with lock: 0


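The unlocked result looks odd at first, but it is the expected symptom of a race condition and is not specific to Google Colab. `BALANCE_1.value += n` is a read followed by a write, not a single atomic operation, so the two processes can overwrite each other's updates, and the final value varies from run to run.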
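
An alternative to a separate `Lock` held for the whole loop is to lock each update individually with the lock built into the `Value`, so the processes can interleave between updates. This is a sketch under assumptions, not the original notebook's method: `change` is a hypothetical function, the shared value is passed as an argument instead of being read from a global, and the round count is reduced to keep the run short.

```python
import multiprocessing as mp

def change(balance, n, rounds):
    """Add and subtract n repeatedly, locking each update separately."""
    for _ in range(rounds):
        with balance.get_lock():  # makes the read-modify-write atomic
            balance.value += n
        with balance.get_lock():
            balance.value -= n

if __name__ == '__main__':
    balance = mp.Value('i', 0)
    # 100_000 rounds per worker, an arbitrary choice
    workers = [mp.Process(target=change, args=(balance, n, 100_000))
               for n in (8, 10)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(f'per-update lock: {balance.value}')
```

Because every `+=` and `-=` happens while the lock is held, no update is lost and the balance should end at 0, at the cost of many more lock acquisitions than locking once around the loop.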