# About

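- Corresponds to the section "Asynchronous crawlers and simulated login: basic principles of coroutines" in [(2022) Cui Qingcai's Python3 Web Crawler Tutorial](https://cuiqingcai.com/17777.html)
- Corresponds to the "52 lectures" video course, Module 3, Lecture 16: Principles and analysis of asynchronous crawlers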

# Notes

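- Crawlers are I/O-bound tasks

When a site is crawled with the `requests` library, the program must wait for the site to return a response after each request before it can continue. During this **wait for the response**, the whole crawler sits idle and does no useful work. Is there a way to optimize this?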

```python
url = 'https://static4.scrape.cuiqingcai.com/'
```

```python
from time import sleep
import requests
from selenium import webdriver

url = 'https://httpbin.org/delay/5'   # httpbin waits 5 seconds before responding

browser = webdriver.Chrome()
browser.get(url)
sleep(2)
browser.quit()   # close the window and end the WebDriver session
```


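```python
# route requests through a local HTTP proxy listening on port 7890
proxies = {
    'http': 'http://127.0.0.1:7890',
    'https': 'http://127.0.0.1:7890'  # HTTPS traffic also goes through the proxy's http:// address
}

r = requests.get(url, proxies=proxies)
print(r.status_code)
```

- Output

```
200
```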


```python
import requests
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s: %(message)s')

proxies = {
    'http': 'http://127.0.0.1:7890',
    'https': 'http://127.0.0.1:7890'  # HTTPS traffic also goes through the proxy's http:// address
}

TOTAL_NUMBER = 3
URL = 'https://httpbin.org/delay/5'

start_time = time.time()    # current timestamp: seconds since the Unix epoch (1970-01-01), as a float
for _ in range(1, TOTAL_NUMBER + 1):    # requests 1..3, strictly one after another
    logging.info('scraping %s', URL)
    response = requests.get(URL, proxies=proxies)   # blocks until the response arrives
end_time = time.time()
logging.info('total time %s second', end_time - start_time)
```

- Output

```
2022-06-30 15:56:43,534 - INFO: scraping https://httpbin.org/delay/5
2022-06-30 15:56:50,038 - INFO: scraping https://httpbin.org/delay/5
2022-06-30 15:56:56,173 - INFO: scraping https://httpbin.org/delay/5
2022-06-30 15:57:02,280 - INFO: total time 18.746297597885132 second
```

Each request takes about 6 seconds (the 5-second server-side delay plus network overhead), and because the requests run one after another, three of them take about 18.7 seconds in total.
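To see how much of that time is pure waiting, here is a minimal sketch that replaces the HTTP requests with `asyncio.sleep()`. This substitution is an assumption made so the example runs offline and quickly: `fake_request` and the 1-second `DELAY` are hypothetical stand-ins for the 5-second httpbin endpoint. Awaiting the three waits one by one should take about 3 seconds, while starting them together with `asyncio.gather()` should take about 1 second. It uses `asyncio.run()`, so run it as a plain script; inside a Jupyter cell, where a loop is already running, `asyncio.run()` raises an error and you would `await` the coroutines instead.

```python
import asyncio
import time

TOTAL_NUMBER = 3   # same count as the requests loop above
DELAY = 1          # hypothetical, shortened stand-in for httpbin's /delay/5

async def fake_request(i):
    # asyncio.sleep() simulates waiting for a response without blocking the event loop
    await asyncio.sleep(DELAY)
    return i

async def sequential():
    # await each "request" before starting the next one, like the requests loop above
    return [await fake_request(i) for i in range(1, TOTAL_NUMBER + 1)]

async def concurrent():
    # start all "requests" at once and wait for all of them together
    return await asyncio.gather(*(fake_request(i) for i in range(1, TOTAL_NUMBER + 1)))

start = time.perf_counter()
seq_result = asyncio.run(sequential())
seq_time = time.perf_counter() - start

start = time.perf_counter()
con_result = asyncio.run(concurrent())
con_time = time.perf_counter() - start

print(f'sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s')
```

The waits overlap only because `asyncio.sleep()` hands control back to the event loop; a blocking call such as `requests.get()` or `time.sleep()` inside the coroutine would not overlap.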


> [Python time time() method](https://www.runoob.com/python/att-time-time.html)

```Python
import time

print(time.time(), '\n')
print(time.localtime(), '\n')
time.sleep(1)
print(time.localtime(time.time()), '\n')
print(time.asctime(time.localtime(time.time())), '\n')
```

- Output

```
1656576002.1171286 

time.struct_time(tm_year=2022, tm_mon=6, tm_mday=30, tm_hour=16, tm_min=0, tm_sec=2, tm_wday=3, tm_yday=181, tm_isdst=0) 

time.struct_time(tm_year=2022, tm_mon=6, tm_mday=30, tm_hour=16, tm_min=0, tm_sec=3, tm_wday=3, tm_yday=181, tm_isdst=0) 

Thu Jun 30 16:00:03 2022 
```
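The value returned by `time.time()` counts seconds from the Unix epoch. A short sketch of converting it back: `time.gmtime()` turns a timestamp into a `struct_time` in UTC (unlike `time.localtime()` above, which uses the machine's time zone), and `time.strftime()` formats it. The timestamp used is the first value printed in the output above.

```python
import time

# timestamp 0 is the Unix epoch itself
epoch = time.gmtime(0)
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday)   # 1970 1 1

# the first value printed by time.time() in the output above
stamp = 1656576002.1171286
utc = time.gmtime(stamp)                          # struct_time in UTC
print(time.strftime('%Y-%m-%d %H:%M:%S', utc))    # 2022-06-30 08:00:02
```

The UTC result is 8 hours behind the `16:00:02` reported by `localtime()` above, which suggests the notes were run on a machine set to UTC+8.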


## 2. Basics

## 3. Coroutines

### 1. Using coroutines

### 2. Defining coroutines

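```python
import asyncio

async def execute(x):   # async def defines a coroutine function
    print('Number', x)

coroutine = execute(1)  # calling execute() runs nothing; it returns a coroutine object
print('Coroutine:', coroutine)
```

- Output

```
Coroutine: <coroutine object execute at 0x00000205C88655C0>
```

The function body never ran, so `Number 1` is not printed. The kernel also printed a warning pointing at the line `coroutine = execute(1)`; this is most likely Python's `RuntimeWarning` for a coroutine object that was never awaited.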
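A small sketch for inspecting what `async def` produced, using the standard `inspect` and `asyncio` helpers. `inspect.getcoroutinestate()` reports `CORO_CREATED` while the body has not started; the unused coroutine is then closed, which should keep Python from warning that it was never awaited.

```python
import asyncio
import inspect

async def execute(x):   # a coroutine function
    print('Number:', x)
    return x

coroutine = execute(1)                         # creates a coroutine object; prints nothing
state = inspect.getcoroutinestate(coroutine)   # 'CORO_CREATED': the body has not started yet
print(inspect.iscoroutinefunction(execute))    # True
print(asyncio.iscoroutine(coroutine))          # True
print(state)                                   # CORO_CREATED

# close the unused coroutine instead of leaving it un-awaited
coroutine.close()
```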


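```python
import asyncio

async def execute(x):   # a coroutine function named execute
    print('Number:', x)

coroutine = execute(1)  # calling execute() returns a coroutine object
print('Coroutine:', coroutine)
print('After calling execute')

loop = asyncio.get_event_loop()     # get the event loop
loop.run_until_complete(coroutine)  # run the loop until the coroutine has finished
print('After calling loop')
```

- Output

```
Coroutine: <coroutine object execute at 0x00000205C891D0C0>
After calling execute
RuntimeError: This event loop is already running
```

The error comes from the notebook environment, not from the code itself: the Jupyter kernel already runs an asyncio event loop, and `run_until_complete()` cannot start a loop that is already running. Running the same code as a plain `.py` script gives the expected result below; inside a notebook cell, `await coroutine` can be used directly instead, because IPython supports top-level `await`.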
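To check which situation applies, a small helper (the name `loop_is_running` is hypothetical) can call `asyncio.get_running_loop()`, which raises `RuntimeError` when no event loop is running in the current thread. At the top level of a plain script it should report `False`; in a Jupyter cell, where the kernel's loop is already running, it should report `True`, which is exactly the case in which `run_until_complete()` fails.

```python
import asyncio

def loop_is_running():
    # hypothetical helper: True if an event loop is running in this thread
    try:
        asyncio.get_running_loop()
    except RuntimeError:   # raised when no event loop is running
        return False
    return True

async def check():
    # inside a coroutine driven by asyncio.run(), the loop is running
    return loop_is_running()

print(loop_is_running())      # False at the top level of a plain script
print(asyncio.run(check()))   # True
```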

Expected result (as a plain Python script, outside Jupyter):

```
Coroutine: <coroutine object execute at 0x1034cf830>
After calling execute
Number: 1
After calling loop
```
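A sketch of producing that result from a script. It swaps `asyncio.get_event_loop()` for `asyncio.new_event_loop()`, an assumption chosen because calling `get_event_loop()` with no current event loop is deprecated in recent Python versions; the loop is closed explicitly afterwards. `execute()` also returns `x` here, so the value handed back by `run_until_complete()` can be checked.

```python
import asyncio

async def execute(x):
    print('Number:', x)
    return x

coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')

# a fresh event loop created explicitly for this script
loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(coroutine)   # returns the coroutine's return value
finally:
    loop.close()
print('After calling loop')
```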

> **So calling a function defined with `async def` does not run it: it returns a coroutine object, which executes only after it is registered with an event loop.**

```python
import asyncio

async def execute(x):
    print('Number:', x)
    return x

coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')

loop = asyncio.get_event_loop()
# explicitly wrap the coroutine in a Task
task = loop.create_task(coroutine)
print('Task:', task)
loop.run_until_complete(task)
print('Task:', task)
print('After calling loop')
```

- Output

```
Coroutine: <coroutine object execute at 0x000001712DACD2C0>
After calling execute
Task: <Task pending name='Task-3' coro=<execute() running at C:\Users\m1595\AppData\Local\Temp/ipykernel_21668/1158204210.py:3>>
RuntimeError: This event loop is already running
Number: 1
```

The same `RuntimeError` appears, but `Number: 1` is still printed: `create_task()` had already scheduled the task on the kernel's running loop, so that loop ran it after the cell raised.
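A sketch of the `create_task()` version as a plain script, again assuming a fresh loop from `asyncio.new_event_loop()`. The task's repr changes from pending to finished, and `task.result()` returns the coroutine's return value.

```python
import asyncio

async def execute(x):
    print('Number:', x)
    return x

loop = asyncio.new_event_loop()
try:
    task = loop.create_task(execute(1))   # wrap the coroutine in a Task on this loop
    print('Task:', task)                  # repr starts with "<Task pending"
    loop.run_until_complete(task)         # run the loop until the task is done
    print('Task:', task)                  # repr starts with "<Task finished" and includes result=1
    print('Result:', task.result())       # 1
finally:
    loop.close()
```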


```python
import asyncio

async def execute(x):
    print('Number:', x)
    return x

coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')

# ensure_future() also wraps the coroutine in a Task, without a loop object in hand
task = asyncio.ensure_future(coroutine)
print('Task:', task)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
print('Task:', task)
print('After calling loop')
```

- Output

```
Coroutine: <coroutine object execute at 0x000001712DA36F40>
After calling execute
Task: <Task pending name='Task-4' coro=<execute() running at C:\Users\m1595\AppData\Local\Temp/ipykernel_21668/578759774.py:3>>
RuntimeError: This event loop is already running
Number: 1
```

The output follows the same pattern as the `create_task()` cell: the task is scheduled on the kernel's loop, so `run_until_complete()` fails but the task still runs.
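In current Python the usual pattern is to let `asyncio.run()` manage the loop and to create tasks from inside a coroutine. A sketch under that assumption (`main` is a hypothetical name): `asyncio.create_task()` needs a running loop, so it is called inside `main()`, and `asyncio.ensure_future()` returns an existing Task unchanged, which the `assert` checks. In a Jupyter cell, replace the `asyncio.run(main())` call with `result = await main()`, since the kernel's loop is already running.

```python
import asyncio

async def execute(x):
    print('Number:', x)
    return x

async def main():
    # inside a running loop, create_task() schedules the coroutine as a Task
    task = asyncio.create_task(execute(1))
    # ensure_future() hands an existing Task back as-is
    assert asyncio.ensure_future(task) is task
    return await task

result = asyncio.run(main())   # creates a loop, runs main(), then closes the loop
print('Result:', result)
```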
