# Processes and Threads

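### Concepts
#### Process
A process is an instance of a program being executed by the operating system (OS). The OS uses the process as its basic unit of resource allocation and scheduling. A program is only a description of instructions, data, and how they are organized; a process is the running entity of that program.
1. Each process has its own **independent** `address space` and `data stack`.
2. A process can create new processes via `fork` or `spawn`.
3. Processes share data through `IPC` (inter-process communication), which includes pipes, signals, sockets, shared memory, and similar mechanisms.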
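As a minimal sketch of the IPC idea in item 3, the following passes one message from a child process to its parent through `multiprocessing.Pipe`. The function name `worker` and the message contents are illustrative assumptions, not part of the original notes.

```python
import os
from multiprocessing import Pipe, Process


def worker(conn):
    """Send this process's PID and a greeting through the pipe, then close it."""
    conn.send({"pid": os.getpid(), "msg": "hello from worker"})
    conn.close()


def main():
    # Pipe() returns two connected endpoints; each process keeps one of them.
    parent_conn, child_conn = Pipe()
    child = Process(target=worker, args=(child_conn,))
    child.start()
    data = parent_conn.recv()  # blocks until the child has sent its message
    child.join()
    print(f"parent PID {os.getpid()} received {data}")


if __name__ == "__main__":
    main()
```

The PID inside the received message should differ from the parent's PID, which shows that the two sides really are separate processes exchanging data rather than sharing a variable.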
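#### Thread
A process can also contain several concurrent threads of execution, that is, several execution units that can each be scheduled on the CPU. These are **threads**.
A process can have many threads. Because they all belong to the same process, they **share its resources and can communicate through that shared state**.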
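To illustrate that threads share their process's memory, here is a small sketch in which several threads append to the same list. The names `run_workers` and `worker` are hypothetical. The example relies on `list.append` being safe to call from several threads in CPython; compound read-modify-write updates would still need a lock.

```python
from threading import Thread


def run_workers(count):
    """Start `count` threads that all append to one list, then return it sorted."""
    results = []  # ordinary object in the process's memory, visible to every thread

    def worker(index):
        results.append(index * index)

    threads = [Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Threads may finish in any order, so sort before returning.
    return sorted(results)


if __name__ == "__main__":
    print(run_workers(5))  # [0, 1, 4, 9, 16]
```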
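#### Python offers three main approaches to concurrent programming: multiple processes, multiple threads, and multiple processes combined with multiple threads.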

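### [Multiprocessing in Python](./多进程下载.py)
A parent process can create a child process with `fork()` from the `os` module. The child is a copy of the parent but has a different PID. `fork()` returns twice: it returns the child's PID in the parent and 0 in the child.
Because Windows has no `fork()` call, cross-platform multiprocess code should create child processes with the `Process` class of the `multiprocessing` module. The module also provides higher-level wrappers, such as a process pool (`Pool`) for starting processes in batches, and `Queue` and `Pipe` for inter-process communication.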

In [9]:
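# Use a simulated file download to show the difference between one process and several
# Single process, single thread: runs from start to finish
from random import randint
from time import sleep, time


def download_task(filename):
    """
    Download a single file; the download time is random
    :param filename: name of the file to download
    :return:
    """
    print(f"Start downloading {filename}")
    time_to_download = randint(2,3)
    sleep(time_to_download)
    print(f"{filename} downloaded in {time_to_download} s")

def main():
    start = time()
    download_task("Python_Chinese_Edition.pdf")
    download_task("video.avi")
    end = time()
    print(f"All downloads finished in {round(end-start,2)} s")

if __name__ == "__main__":
    main()

Start downloading Python_Chinese_Edition.pdf
Python_Chinese_Edition.pdf downloaded in 3 s
Start downloading video.avi
video.avi downloaded in 2 s
All downloads finished in 5.01 s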
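For comparison, a hedged sketch of the same two simulated downloads run in separate processes with `multiprocessing.Process`. The file names and the `min_s`/`max_s` parameters are illustrative additions. Under these assumptions the total time should be close to the longer download plus process start-up cost, rather than the sum of both.

```python
from multiprocessing import Process
from random import randint
from time import sleep, time


def download_task(filename, min_s=2, max_s=3):
    """Simulate one download and return the simulated duration in seconds."""
    print(f"Start downloading {filename}")
    time_to_download = randint(min_s, max_s)
    sleep(time_to_download)
    print(f"{filename} downloaded in {time_to_download} s")
    return time_to_download


def main():
    start = time()
    # One child process per file; both run at the same time.
    p1 = Process(target=download_task, args=("a.pdf",))
    p2 = Process(target=download_task, args=("b.avi",))
    p1.start()
    p2.start()
    # Wait for both children before stopping the clock.
    p1.join()
    p2.join()
    print(f"All downloads finished in {round(time() - start, 2)} s")


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard matters here: on platforms that start children with `spawn`, the child re-imports the main module, and the guard keeps it from launching processes again.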


### [Communication between child processes](./子进程间通信.py)

In [None]:
# author:TYUT-Lmy
# date:2021/12/12
# description:
from multiprocessing import Process
from time import sleep

counter = 0


def sub_task(string):
    global counter
    while counter < 10:
        print(string, end=" ", flush=True)  # flush=True pushes the buffered text to the screen as soon as print returns
        counter += 1
        sleep(0.1)


def main():
    Process(target=sub_task, args=("Ping",)).start()
    Process(target=sub_task, args=("Pong",)).start()


if __name__ == "__main__":
    main()


# Pong Ping Pong Ping Ping Pong Ping Pong Ping Pong Ping Pong Ping Pong Ping Pong Ping Pong Ping Pong

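The program prints Ping and Pong 10 times each. When the child processes are created, each one gets its own copy of `counter`, so the two processes never communicate or share the count.
A fix would have to share the state through an IPC mechanism such as the `Queue` class; that is not solved in this notebook yet.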
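One possible way to make the two processes share the count is sketched below. Note that it swaps in `multiprocessing.Value` (an integer in shared memory with a built-in lock) instead of the `Queue` mentioned above, because a shared counter maps more directly onto `Value`. The `limit` and `delay` parameters and the return value are assumptions added for illustration.

```python
from multiprocessing import Process, Value
from time import sleep


def sub_task(label, counter, limit=10, delay=0.1):
    """Print `label` until the shared counter reaches `limit`; return the print count."""
    printed = 0
    while True:
        # The lock makes "check, then increment" a single atomic step,
        # so two processes can never both claim the last slot.
        with counter.get_lock():
            if counter.value >= limit:
                return printed
            counter.value += 1
        print(label, end=" ", flush=True)
        printed += 1
        sleep(delay)


def main():
    counter = Value("i", 0)  # a C int stored in shared memory
    workers = [Process(target=sub_task, args=(label, counter))
               for label in ("Ping", "Pong")]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(f"\ncounter = {counter.value}")


if __name__ == "__main__":
    main()
```

With this change the two processes together should print 10 words in total instead of 10 each; how the words are split between Ping and Pong depends on scheduling.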

### Multithreading in Python
Multithreaded programming in Python currently relies mainly on the `threading` module.

In [3]:
from random import randint
from threading import Thread
from time import time, sleep

def download(filename):
    print(f"Start downloading {filename}")
    time_to_download = randint(2,3)
    sleep(time_to_download)
    print(f"{filename} took {time_to_download} s to download")

def main():
    start = time()
    # Create the threads and start them
    t1 = Thread(target=download, args=("a.txt",))
    t1.start()
    t2 = Thread(target=download, args=("c.exe",))
    t2.start()
    # Wait for both threads to finish
    t1.join()
    t2.join()

    end = time()
    print(f"Total time: {round((end-start),2)} s")


if __name__ == "__main__":
    main()

Start downloading a.txt
Start downloading c.exe
c.exe took 3 s to downloada.txt took 3 s to download

Total time: 3.01 s


A custom thread can also be created by subclassing the `Thread` class.

In [1]:
from random import randint
from threading import Thread
from time import time, sleep


class DownloadTask(Thread):
    """DownloadTask inherits from the Thread class"""

    def __init__(self, filename):
        super().__init__()
        self._filename = filename

    @property
    def filename(self):
        return self._filename

    def run(self):
        print(f"Start downloading {self.filename}")
        time_to_download = randint(2, 3)
        sleep(time_to_download)
        print(f"{self.filename} downloaded in {time_to_download} s")


def main():
    start = time()
    t1 = DownloadTask("a.txt")
    t1.start()
    t2 = DownloadTask("b.avi")
    t2.start()
    t1.join()
    t2.join()
    end = time()
    print(f"Total time: {round((end - start), 2)} s")


if __name__ == "__main__":
    main()

Start downloading a.txt
Start downloading b.avi
a.txt downloaded in 2 s
b.avi downloaded in 3 s
Total time: 3.01 s


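### Protecting critical resources
#### Multiple threads can share the memory space of their process.
However, when several threads share the same variable (usually called a "resource"), the result can become unpredictable, causing the program to malfunction or even crash.
A resource that several threads compete for is usually called a `critical resource`. Access to a critical resource must be protected; otherwise the resource ends up in an inconsistent state.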


In [3]:
# author:TYUT-Lmy
# date:2021/12/13
# description:
from time import sleep
from threading import Thread


class Account:
    def __init__(self):
        self._balance = 0

    @property
    def balance(self):
        return self._balance

    def deposit(self, amount):
        """
        Deposit money
        :param amount: the amount to deposit
        :return:
        """
        new_balance = self.balance + amount
        sleep(0.1)
        self._balance = new_balance


class AddMoneyThread(Thread):
    def __init__(self, account, amount):
        super().__init__()
        self._account = account
        self._amount = amount

    def run(self):
        self._account.deposit(self._amount)


def main():
    account = Account()
    threads = []
    for _ in range(100):
        t = AddMoneyThread(account, 1)
        threads.append(t)
        t.start() # start() runs the object's run() method in the new thread

    for t in threads:
        t.join()

    print(f"Account balance: {account.balance}") # Account balance: 1


if __name__ == "__main__":
    main()

Account balance: 1


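The run above reports a balance of 1 instead of 100. All 100 threads start almost at the same time, and each one reads the balance (still 0) before any thread has written its result back; the `sleep(0.1)` between the read and the write makes that window very wide. Every thread therefore computes 0 + 1 and stores 1.
The resource needs protection, that is, a `lock`. Only the thread holding the key may access the locked critical resource; all other threads block until they obtain the key. The following code shows this: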

In [5]:
# author:TYUT-Lmy
# date:2021/12/13
# description:
from time import sleep
from threading import Thread,Lock


class Account:
    def __init__(self):
        self._balance = 0
        self._lock = Lock()

    @property
    def balance(self):
        return self._balance

    def deposit(self, amount):
        """
        Deposit money
        :param amount: the amount to deposit
        :return:
        """
        self._lock.acquire() # request the key (acquire the lock)
        try:
            new_balance = self.balance + amount
            sleep(0.01)
            self._balance = new_balance
        finally:
            self._lock.release()



class AddMoneyThread(Thread):
    def __init__(self, account, amount):
        super().__init__()
        self._account = account
        self._amount = amount

    def run(self):
        self._account.deposit(self._amount)


def main():
    account = Account()
    threads = []
    for _ in range(100):
        t = AddMoneyThread(account, 1)
        threads.append(t)
        t.start() # start() runs the object's run() method in the new thread

    for t in threads:
        t.join()

    print(f"Account balance: {account.balance}") # Account balance: 100


if __name__ == "__main__":
    main()

Account balance: 100
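The `acquire()` / `try` / `finally` / `release()` pattern can also be written with a `with` statement, since `threading.Lock` supports the context-manager protocol. The sketch below is a shortened variant of the account example; the helper `run_deposits` and the shorter sleep are assumptions made to keep it compact.

```python
from threading import Lock, Thread
from time import sleep


class Account:
    def __init__(self):
        self._balance = 0
        self._lock = Lock()

    @property
    def balance(self):
        return self._balance

    def deposit(self, amount):
        # `with` acquires the lock on entry and releases it on exit,
        # even if the body raises an exception.
        with self._lock:
            new_balance = self._balance + amount
            sleep(0.001)
            self._balance = new_balance


def run_deposits(count):
    """Deposit 1 from each of `count` threads and return the final balance."""
    account = Account()
    threads = [Thread(target=account.deposit, args=(1,)) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


if __name__ == "__main__":
    print(run_deposits(100))  # 100
```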


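### Asynchronous I/O
In Python, the programming model of a single thread plus asynchronous I/O is implemented with coroutines. Because everything runs in one thread, there is no thread-switching cost of saving and restoring CPU context.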
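A minimal sketch of this model using the standard `asyncio` module: two simulated downloads run concurrently in one thread. The coroutine names, file names, and durations are hypothetical; `asyncio.sleep` stands in for real non-blocking I/O.

```python
import asyncio
from time import perf_counter


async def download(filename, seconds):
    """Pretend to download a file; awaiting hands control back to the event loop."""
    print(f"Start downloading {filename}")
    await asyncio.sleep(seconds)
    print(f"{filename} downloaded in {seconds} s")
    return filename


async def main():
    start = perf_counter()
    # gather() runs both coroutines concurrently and keeps the argument order.
    results = await asyncio.gather(
        download("a.txt", 0.2),
        download("b.avi", 0.3),
    )
    elapsed = perf_counter() - start
    print(f"Total time: {elapsed:.2f} s")
    return results, elapsed


if __name__ == "__main__":
    asyncio.run(main())
```

The total time should be close to the longest single wait rather than the sum of both, because the event loop switches to the other coroutine whenever one is waiting.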

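### Case 1: Move time-consuming tasks into a thread for a better user experience.
Idea: a download takes a long time. In the single-threaded version, once the download button is clicked, the program cannot do anything else until the download finishes.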

In [8]:
import time
import tkinter
import tkinter.messagebox as mb

def download():
    # Simulate a download
    time.sleep(3)
    mb.showinfo("Info","Download complete")

def show_about():
    mb.showinfo("About","Author: Zane")

def main():
    top = tkinter.Tk()
    top.title("Single thread")
    top.geometry("200x150")
    top.wm_attributes('-topmost', True)

    panel = tkinter.Frame(top)
    button1 = tkinter.Button(panel, text='Download', command=download)
    button1.pack(side='left')
    button2 = tkinter.Button(panel, text='About', command=show_about)
    button2.pack(side='right')
    panel.pack(side='bottom')
    tkinter.mainloop()

if __name__ == '__main__':
    main()

This can be solved by moving the download into a separate thread, as in the following code:

In [9]:
import time
import tkinter
import tkinter.messagebox
from threading import Thread


def main():
    class DownloadTaskHandler(Thread):

        def run(self):
            time.sleep(10)
            tkinter.messagebox.showinfo('Info', 'Download complete!')
            # Re-enable the download button
            button1.config(state=tkinter.NORMAL)

    def download():
        # Disable the download button
        button1.config(state=tkinter.DISABLED)
        # daemon=True makes this a daemon thread (it is not kept running once the main program exits)
        # Handle the time-consuming download in a thread
        DownloadTaskHandler(daemon=True).start()

    def show_about():
        tkinter.messagebox.showinfo('About', 'Author: Zane')

    top = tkinter.Tk()
    top.title('Multithreaded')
    top.geometry('200x150')
    top.wm_attributes('-topmost', 1)

    panel = tkinter.Frame(top)
    button1 = tkinter.Button(panel, text='Download', command=download)
    button1.pack(side='left')
    button2 = tkinter.Button(panel, text='About', command=show_about)
    button2.pack(side='right')
    panel.pack(side='bottom')

    tkinter.mainloop()


if __name__ == '__main__':
    main()



### Case 2: Divide and conquer a complex task with multiple threads
Compute the sum of 1 to 1,000,000,000, a CPU-bound task.

#### Single-thread, single-process approach (30.6642 seconds in the recorded run)

In [None]:
from time import time


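def main():
    now = time()
    total = 0  # avoid shadowing the built-in sum()
    for i in range(1, 1000000001):  # 1 .. 1,000,000,000 inclusive
        total += i
    print(total)
    print(f"Program ran for {round(time() - now, 4)} seconds")
    # 500000000500000000
    # Program ran for 30.6642 seconds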

if __name__ == '__main__':
    main()

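#### Multithreaded approach (program ran for 30.6413 seconds)
Because of CPython's global interpreter lock (GIL), only one thread executes Python bytecode at a time, so splitting this CPU-bound loop across threads gives essentially the same running time as the single-threaded version.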

In [19]:
from threading import Thread, Lock
from time import time

SUM = 0
SUM_LOCK = Lock()  # a single lock shared by all threads protects SUM


class CountThread(Thread):
    def __init__(self, start_num):
        super().__init__()
        self._start_num = start_num

    @property
    def start_num(self):
        return self._start_num

    def run(self):
        global SUM
        # Sum this thread's slice into a local variable first
        partial = 0
        for i in range(self.start_num, self.start_num + 100000000):
            partial += i
        # Hold the lock only while updating the shared total
        with SUM_LOCK:
            SUM += partial


def main():
    now = time()
    threads = []
    for i in range(10):  # create 10 threads
        thread = CountThread(1 + 100000000 * i)
        thread.start()

        threads.append(thread)

    for thread in threads:
        thread.join()

    print(SUM)
    print(f"Program ran for {round(time() - now, 4)} seconds")



if __name__ == '__main__':
    main()

500000000500000000
Program ran for 30.6413 seconds


#### [Multiprocess approach](./多进程案例) (16.6763 seconds)

In [None]:
from multiprocessing import Process, Queue
from time import time


def task_handler(start_num, result_queue):
    # Each process sums its own slice of 125,000,000 numbers
    total = 0
    for number in range(start_num, start_num + 125000000):
        total += number
    result_queue.put(total)


def main():
    now = time()
    processes = []
    result_queue = Queue()
    for i in range(8):  # 8 processes x 125,000,000 = 1,000,000,000
        p = Process(target=task_handler,
                    args=(1 + 125000000 * i, result_queue))
        processes.append(p)
        p.start()

    # Collect exactly one result per process before joining;
    # Queue.empty() is not a reliable way to detect the end
    total = 0
    for _ in processes:
        total += result_queue.get()
    for p in processes:
        p.join()

    print(total)
    print(f"Program ran for {round(time() - now, 4)} seconds")



if __name__ == '__main__':
    main()

#### Multiple processes with multiple threads (12.6846 seconds)

In [None]:
# author:TYUT-Lmy
# date:2021/12/13
# description:
from multiprocessing import Process, Queue
from threading import Thread
from time import time


class Task(Thread):

    def __init__(self, start_num):
        super().__init__()
        self._start_num = start_num
        self.SUM = 0  # each thread keeps its own partial sum, so no lock is needed

    @property
    def start_num(self):
        return self._start_num

    def run(self):
        for number in range(self.start_num, self.start_num + 25000000):
            self.SUM += number

    def get_sum(self):
        return self.SUM


def task_handler(start_num, result_queue):
    # Each process runs 5 threads x 25,000,000 numbers = 125,000,000 numbers
    total = 0
    tasks = []
    for i in range(5):
        t = Task(start_num + i * 25000000)
        tasks.append(t)
        t.start()
    for t in tasks:
        t.join()
        total += t.get_sum()

    result_queue.put(total)


def main():
    now = time()
    processes = []
    result_queue = Queue()
    for i in range(8):
        p = Process(target=task_handler,
                    args=(1 + 125000000 * i, result_queue))
        processes.append(p)
        p.start()

    # Collect exactly one result per process before joining;
    # Queue.empty() is not a reliable way to detect the end
    total = 0
    for _ in processes:
        total += result_queue.get()
    for p in processes:
        p.join()

    print(total)
    print(f"Program ran for {round(time() - now, 4)} seconds")
    # 500000000500000000
    # Program ran for 12.6846 seconds


if __name__ == '__main__':
    main()
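The examples above hard-code the size of each slice, which is easy to get wrong. As a hedged sketch of the same divide-and-conquer idea, the following computes the slice boundaries with a helper and checks the result against the closed form n(n+1)/2. It uses `concurrent.futures.ProcessPoolExecutor` from the standard library instead of `Process` and `Queue`; the names `split_range`, `range_sum`, and `parallel_sum` are hypothetical, and `n` is kept small so the demo finishes quickly.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split 1..n into `parts` contiguous (start, stop) pairs; stop is exclusive."""
    size, extra = divmod(n, parts)
    bounds, start = [], 1
    for i in range(parts):
        # The first `extra` slices take one additional number each.
        stop = start + size + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def range_sum(bounds):
    """Sum the integers in one (start, stop) slice."""
    start, stop = bounds
    return sum(range(start, stop))


def parallel_sum(n, parts=8):
    """Sum 1..n by handing one slice to each worker process."""
    with ProcessPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(range_sum, split_range(n, parts)))


if __name__ == "__main__":
    n = 1_000_000  # deliberately much smaller than in the case study
    total = parallel_sum(n)
    assert total == n * (n + 1) // 2
    print(total)
```

For example, `split_range(10, 3)` yields `[(1, 5), (5, 8), (8, 11)]`, so every number from 1 to 10 is covered exactly once even when `n` is not divisible by the number of workers.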
