**Parallel processing on multi-core CPUs with multiprocessing**  
@Author: Ray  
@Build time: 2022.08.23  
@Cite: Bilibili -> 莫烦Python  
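@Note: `Multiprocessing code may not run correctly in interactive Python (such as this notebook); save it as a .py file and run that file instead`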

In [66]:
import time
import multiprocessing

# ^ Show the output of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

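## Creating processes & collecting return values with a queue
* Multiprocessing scripts should be run from the command line (here via `!python`)
* The API is very similar to `threading`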
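To make the similarity with `threading` concrete, here is a minimal sketch (the helper names `square_into` and `run_workers` are hypothetical, not from the original) that runs the same worker either through `threading.Thread` + `queue.Queue` or through `multiprocessing.Process` + `multiprocessing.Queue`; only the two class arguments change. Results arrive in completion order, so each worker tags its result with its input.

```python
import multiprocessing
import queue
import threading

def square_into(q, x):
    """Worker: put (input, input squared) onto the shared queue."""
    q.put((x, x * x))

def run_workers(worker_cls, queue_cls, inputs):
    """Start one worker per input with worker_cls (Thread or Process)
    and collect the results through a queue of type queue_cls."""
    q = queue_cls()
    workers = [worker_cls(target=square_into, args=(q, x)) for x in inputs]
    for w in workers:
        w.start()
    # Drain the queue first (one item per worker), then wait for the workers to exit
    results = dict(q.get() for _ in workers)
    for w in workers:
        w.join()
    return results

if __name__ == '__main__':
    print(run_workers(threading.Thread, queue.Queue, [1, 2, 3]))
    print(run_workers(multiprocessing.Process, multiprocessing.Queue, [1, 2, 3]))
```

Both calls should produce the mapping `{1: 1, 2: 4, 3: 9}`, although the printed key order may differ because it follows completion order.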

In [67]:
!cat code1.py

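import multiprocessing

def job(Q):
    print("new process is start\n")
    res = 0
    for i in range(10000000):
        res += i+i**2+i**3
    Q.put(res)  # * put the return value into queue 'Q'
    print("new process is finished\n")

if __name__=='__main__':  # required for multiprocessing (a child process may re-import this file); optional for threading
    Q = multiprocessing.Queue()  # create the queue
    process1 = multiprocessing.Process(target=job, args=(Q,))  # ! a one-argument tuple needs the trailing comma
    process2 = multiprocessing.Process(target=job, args=(Q,))

    process1.start()  # start process 1
    process2.start()  # start process 2

    # Get the results before join(): a process that has put data on a Queue does not
    # exit until that data has been flushed to the pipe, so joining first can deadlock
    res1 = Q.get()
    res2 = Q.get()

    process1.join()   # wait for process 1 to finish
    process2.join()   # wait for process 2 to finish

    print(res1, res2)
    print("主进程结束")  # "main process finished"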

In [68]:
!python code1.py

new process is start

new process is start

new process is finished

new process is finished

2499999833333358333330000000 2499999833333358333330000000
主进程结束


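## Timing comparison: how much faster is multiprocessing?
* For CPU-bound programs, multiprocessing gives a significant speedup (here from about 9.2 s to 5.1 s with two processes);
* In CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound multithreaded code effectively runs on a single core and is no faster than the sequential version (here it is even slightly slower, 9.3 s)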

In [69]:
# Without parallelism (sequential)

start_time = time.time()

def job():
    res = 0
    for i in range(10000000):
        res += i+i**2+i**3
    return res

res1 = job()
res2 = job()
print(res1, res2)

end_time = time.time()
print("cost time: ", end_time-start_time)

2499999833333358333330000000 2499999833333358333330000000
cost time:  9.241755247116089


In [70]:
# With multiprocessing (this timing also includes starting a new Python interpreter)

start_time = time.time()

!python code1.py

end_time = time.time()
print("cost time: ", end_time-start_time)

new process is start

new process is start

new process is finished

new process is finished

2499999833333358333330000000 2499999833333358333330000000
主进程结束
cost time:  5.089699983596802


In [71]:
# With multithreading

import threading
from queue import Queue

start_time = time.time()

def job(Q):
    print("new thread is start\n")
    res = 0
    for i in range(10000000):
        res += i+i**2+i**3
    Q.put(res)  # * put the return value into queue 'Q'
    print("new thread is finished\n")

if __name__=='__main__':  # required for multiprocessing, optional for threading
    Q = Queue()  # create the queue
    thread1 = threading.Thread(target=job, args=(Q,))  # ! a one-argument tuple needs the trailing comma
    thread2 = threading.Thread(target=job, args=(Q,))

    thread1.start()  # start thread 1
    thread2.start()  # start thread 2

    thread1.join()   # wait for thread 1 to finish
    thread2.join()   # wait for thread 2 to finish

    res1 = Q.get()
    res2 = Q.get()
    print(res1, res2)
    print("主线程结束")  # "main thread finished"

end_time = time.time()
print("cost time: ", end_time-start_time)

new thread is start

new thread is start

new thread is finished

new thread is finished

2499999833333358333330000000 2499999833333358333330000000
主线程结束
cost time:  9.338061809539795
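The multiprocessing timing above also includes the start-up of a separate `python code1.py` interpreter. A minimal sketch of an in-process comparison, assuming `time.perf_counter` for timing, a `Pool` of two workers, and a smaller loop size so it finishes quickly (the helpers `sequential`, `parallel` and `timed` are hypothetical). The parallel figure still includes creating the pool, and the numbers depend on the machine, so no output is shown.

```python
import multiprocessing
import time

def job(n):
    """Same CPU-bound loop as above, with the loop size as a parameter."""
    res = 0
    for i in range(n):
        res += i + i**2 + i**3
    return res

def sequential(n):
    return [job(n), job(n)]

def parallel(n):
    # Two worker processes, one task each
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(job, [n, n])

def timed(fn, n):
    """Return (result, elapsed seconds) for fn(n)."""
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start

if __name__ == '__main__':
    n = 1_000_000
    seq, t_seq = timed(sequential, n)
    par, t_par = timed(parallel, n)
    assert seq == par  # both versions must compute the same values
    print(f"sequential: {t_seq:.2f} s, 2 processes: {t_par:.2f} s")
```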


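## Process pool: Pool
* `multiprocessing.Process()` creates a single process; the target's return value is discarded, so results have to be sent back through a `Queue`
* `multiprocessing.Pool()` creates a group of worker processes and distributes tasks among them automatically; the task function can simply `return` its result
* This notebook uses two ways to run tasks in a pool:
    1. `map` turns each element of an iterable into a task and distributes the tasks across the worker processes automatically
    2. `apply_async` submits a single task, which runs in one worker process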

### `pool.map(<function>, <iterable>)`

In [72]:
!cat code2.py

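import multiprocessing
import sys

def job(x):
    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)  # Python 3.11+ limits int->str to 4300 digits by default; lift the limit
    for i in range(10):
        x = x+x**i
    return len(str(x))  # number of decimal digits of the (very large) result

if __name__=='__main__':

    # ^ Create the process pool: by default one worker per CPU (os.cpu_count());
    # multiprocessing.Pool(processes=5) would use 5 worker processes instead.
    # The with-block shuts the pool down when it ends.
    with multiprocessing.Pool() as pool:

        # ^ Run the computation
        # map calls job once for each element of the tuple, spreads the calls over
        # the worker processes, and collects the return values in input order
        results = pool.map(job, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10))  # 10 calls with different initial values
    print("-- job(1)的结果是：")  # "-- result of job(1):"
    print(results[0])
    print("-- job(1-10)的结果是：")  # "-- results of job(1-10):"
    print(results)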

In [73]:
!python code2.py

-- job(1)的结果是：
236125
-- job(1-10)的结果是：
[236125, 294538, 337000, 370393, 397922, 421345, 441729, 459774, 475963, 490642]
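`map` returns its results in input order even when the tasks finish in a different order. A minimal sketch showing this, assuming a hypothetical `slow_square` task in which larger inputs sleep less and therefore tend to finish first:

```python
import multiprocessing
import time

def slow_square(x):
    # Larger inputs sleep less, so they tend to finish first
    time.sleep(0.05 * (5 - x))
    return x * x

def ordered_squares(inputs):
    with multiprocessing.Pool(processes=4) as pool:
        return pool.map(slow_square, inputs)

if __name__ == '__main__':
    print(ordered_squares([1, 2, 3, 4]))  # [1, 4, 9, 16]
```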


### `pool.apply_async(<function>, <tuple of arguments for one task>)`

In [74]:
!cat code3.py

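import multiprocessing
import sys

def job(x):
    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)  # Python 3.11+ limits int->str to 4300 digits by default; lift the limit
    for i in range(10):
        x = x+x**i
    return len(str(x))

if __name__=='__main__':

    # ^ Create the process pool: by default one worker per CPU (os.cpu_count());
    # multiprocessing.Pool(processes=5) would use 5 worker processes instead
    with multiprocessing.Pool() as pool:

        # ^ Run the computation
        # apply_async takes the arguments of ONE task and immediately returns
        # an AsyncResult for that single task
        result = pool.apply_async(job, (1,))
        res = result.get()  # get() blocks until the return value is ready
        print("job(1)的结果是: ")  # "result of job(1):"
        print(res)

        # ^ To get the effect of map with apply_async, submit one task per input
        results = [pool.apply_async(job, (i,)) for i in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
        list_res = [r.get() for r in results]
    print("job(1-10)的结果是: ")  # "results of job(1-10):"
    print(list_res)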


In [75]:
!python code3.py

job(1)的结果是: 
236125
job(1-10)的结果是: 
[236125, 294538, 337000, 370393, 397922, 421345, 441729, 459774, 475963, 490642]
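The argument tuple of `apply_async` can hold several arguments, and an optional `callback` receives each result as soon as its task finishes. A minimal sketch, assuming a hypothetical two-argument task `power`; `get()` returns values in submission order, while the callback sees them in completion order, which is why that list is sorted before returning:

```python
import multiprocessing

def power(base, exp):
    return base ** exp

def run(bases, exp):
    collected = []  # filled by the callback, in completion order
    with multiprocessing.Pool(processes=2) as pool:
        handles = [pool.apply_async(power, (b, exp), callback=collected.append)
                   for b in bases]
        # get() follows submission order; the timeout guards against a hung worker
        values = [h.get(timeout=30) for h in handles]
    return values, sorted(collected)

if __name__ == '__main__':
    values, collected = run(range(5), 2)
    print(values)     # [0, 1, 4, 9, 16]
    print(collected)  # [0, 1, 4, 9, 16]
```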


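## Shared memory
* With threading, threads can share data through global variables
* With multiprocessing this does not work: each process has its own separate memory space, so a change to a global variable in one process is invisible to the others
* To share data, use dedicated mechanisms such as `multiprocessing.Value` and `multiprocessing.Array`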

In [76]:
value = multiprocessing.Value('f', 0.313)  # (<type code>, <initial value>); 'f' is a single-precision C float, hence the rounding below ('d' gives a double)
value.value

0.31299999356269836

In [77]:
array = multiprocessing.Array('i', [1, 2, 3])  # (<type code>, <initial values>); shared arrays can only be one-dimensional
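A minimal sketch of the difference described above, with hypothetical names `worker` and `demo_shared_vs_global`: a child process increments an ordinary global and multiplies a shared `Array` in place. The parent still sees the global as 0, but it does see the updated array.

```python
import multiprocessing

counter = 0  # ordinary module-level global: NOT shared between processes

def worker(shared_arr):
    global counter
    counter += 1                      # changes only the child's own copy
    for i in range(len(shared_arr)):  # changes the shared-memory array in place
        shared_arr[i] *= 10

def demo_shared_vs_global():
    arr = multiprocessing.Array('i', [1, 2, 3])
    p = multiprocessing.Process(target=worker, args=(arr,))
    p.start()
    p.join()
    return counter, arr[:]  # slicing copies the shared array into a list

if __name__ == '__main__':
    print(demo_shared_vs_global())  # (0, [10, 20, 30])
```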

## Process lock: Lock

Demonstration: two processes competing, without a lock, for the same variable in shared memory

In [78]:
!cat code4.py

import multiprocessing
import time

def job(v, num, process_name):
    for _ in range(10):
        time.sleep(0.5)
        v.value += num
        print("{}: {}".format(process_name, v.value))

if __name__=='__main__':
    print("--- 演示多进程在争抢共享内存里的变量v")  # "--- demo: processes competing for the shared variable v"
    v = multiprocessing.Value('i', 0)
    process1 = multiprocessing.Process(target=job, args=(v, 1, 'process 1'))
    process2 = multiprocessing.Process(target=job, args=(v, 100, 'process 2'))
    process1.start()
    process2.start()
    process1.join()
    process2.join()

In [79]:
!python code4.py

--- 演示多进程在争抢共享内存里的变量v
process 2: 100
process 1: 101
process 2: 201
process 1: 202
process 2: 302
process 1: 303
process 2: 403
process 1: 404
process 1: 405process 2: 405

process 2: 505
process 1: 506
process 2: 606
process 1: 607
process 2: 707
process 1: 708
process 2: 808
process 1: 809
process 1: 810
process 2: 910


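Use a `Lock` so that the processes cannot interfere with each other as they did above. The interference shows in the final value: each process runs 10 iterations, so the total should be 10 × 1 + 10 × 100 = 1010, but the run above ends at 910. `v.value += num` reads the value and writes it back as two separate steps, so around the line `process 1: 405process 2: 405` both processes apparently read the same old value and one +100 was lost.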

In [80]:
!cat code5.py

import multiprocessing
import time

def job(v, num, process_name, lock):   # ! note the extra lock parameter
    with lock:  # * acquire the process lock; it is released automatically when the block ends
        for _ in range(10):
            time.sleep(0.5)
            v.value += num
            print("{}: {}".format(process_name, v.value))

if __name__=='__main__':

    print("--- 演示使用进程锁, 防止多进程争抢共享内存里的变量v")  # "--- demo: a process lock stops processes competing for v"
    v = multiprocessing.Value('i', 0)  # create a variable in shared memory
    lock = multiprocessing.Lock()  # ! create the lock inside the main block and pass it to every process

    process1 = multiprocessing.Process(target=job, args=(v, 1, 'process 1', lock))
    process2 = multiprocessing.Process(target=job, args=(v, 100, 'process 2', lock))
    process1.start()
    process2.start()
    process1.join()
    process2.join()

In [81]:
!python code5.py

--- 演示使用进程锁, 防止多进程争抢共享内存里的变量v
process 2: 100
process 2: 200
process 2: 300
process 2: 400
process 2: 500
process 2: 600
process 2: 700
process 2: 800
process 2: 900
process 2: 1000
process 1: 1001
process 1: 1002
process 1: 1003
process 1: 1004
process 1: 1005
process 1: 1006
process 1: 1007
process 1: 1008
process 1: 1009
process 1: 1010
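Holding the lock for the whole loop fixes the result but makes the two processes run strictly one after the other. A finer-grained alternative, sketched here with hypothetical names `add_many` and `run`, locks only the read-modify-write step, using the lock that `multiprocessing.Value` already carries (`get_lock()`):

```python
import multiprocessing

def add_many(v, num, times):
    for _ in range(times):
        # Hold the Value's own lock only around the read-modify-write
        with v.get_lock():
            v.value += num

def run():
    v = multiprocessing.Value('i', 0)
    procs = [multiprocessing.Process(target=add_many, args=(v, num, 10_000))
             for num in (1, 100)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return v.value

if __name__ == '__main__':
    print(run())  # 1010000 = 10_000 * 1 + 10_000 * 100
```

The updates of the two processes now interleave freely, yet no increment is lost.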
