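# asyncio (asynchronous coroutines)

Async coroutines are a non-blocking mechanism built on a single thread plus an event loop: no new threads are created, and the tasks take turns running within the same thread, switching whenever one of them awaits.
Async coroutines suit I/O-bound work (such as network requests) but not CPU-bound work, because a long computation never yields control back to the event loop.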
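
A minimal sketch of this behaviour, using `asyncio.sleep()` as a stand-in for a network request. The `fake_request` helper and the delays are illustrative assumptions, not part of the crawlers in this notebook. Three simulated requests run on one thread and should finish in roughly the time of the slowest one rather than the sum.

```python
import asyncio
import threading
import time


async def fake_request(name, delay):
    # asyncio.sleep() hands control back to the event loop, like waiting on a socket
    await asyncio.sleep(delay)
    # Report which thread ran this coroutine
    return name, threading.current_thread().name


async def demo():
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_request("a", 0.2),
        fake_request("b", 0.2),
        fake_request("c", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(demo())
print(results)  # every entry names the same thread
print(f"elapsed: {elapsed:.2f}s")  # expected to be near 0.2s rather than 0.6s
```

Replacing `asyncio.sleep()` with a blocking call such as `time.sleep()` would make the three requests run one after another, which is why CPU-bound or blocking code does not benefit from coroutines.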

# Scraping Dream of the Red Chamber (红楼梦) with async coroutines

In [20]:
import asyncio
import aiohttp
import aiofiles
from lxml import etree
import os
import time


headers={'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36'}

if not os.path.exists('./data'):
    os.makedirs('./data')


start_time = time.time()


async def func1(ch_url):
    # Fetch the table-of-contents page and collect a (title, url) pair for every chapter
    async with aiohttp.ClientSession() as session:
        async with session.get(url=ch_url, headers=headers) as response:
            title_res = await response.text()
            ch_tree = etree.HTML(title_res)
            ch_list = ch_tree.xpath('/html/body/div[3]/div[2]/div[3]/div//a')
            data = []
            for a in ch_list:
                # Chapter title
                title = a.text.strip()
                # Absolute URL of the chapter page
                content_url = "https://m.shicimingju.com" + a.get('href')
                # list.append() takes a single argument, so the pair is passed as one tuple
                data.append((title, content_url))
            return data


async def func2(data, lock):
    # Download one chapter and append it to the output file
    title, url = data
    async with aiohttp.ClientSession() as session:
        async with session.get(url=url, headers=headers) as response:
            content_res = await response.text()
            tree = etree.HTML(content_res)
            content = tree.xpath('/html/body/div[3]/div[2]/div[2]/div//text()')
            print('正在爬取：', title, '--', url)
            content = '\n'.join(content)
            # The lock keeps two writes to the shared file from interleaving
            async with lock:
                async with aiofiles.open('./data/hongloumeng.txt', 'a', encoding='utf-8') as fp:
                    await fp.write(title + ':' + '\n' + content + '\n')

async def main():
    ch_url='https://m.shicimingju.com/book/hongloumeng.html'
    data=await func1(ch_url)

    lock=asyncio.Lock()

    # Awaiting each call inside the loop downloads one chapter at a time, in order
    for d in data:
        await func2(d,lock)


asyncio.run(main())

print('爬取完成！')

end_time = time.time()

print(f'花费时间:{end_time - start_time:.2f}s')


正在爬取： 第 一 回 甄士隐梦幻识通灵 贾雨村风尘怀闺秀 -- https://m.shicimingju.com/book/hongloumeng/1.html
正在爬取： 第 二 回 贾夫人仙逝扬州城 冷子兴演说荣国府 -- https://m.shicimingju.com/book/hongloumeng/2.html
正在爬取： 第 三 回 托内兄如海酬训教 接外孙贾母惜孤女 -- https://m.shicimingju.com/book/hongloumeng/3.html
正在爬取： 第 四 回 薄命女偏逢薄命郎 葫芦僧乱判葫芦案 -- https://m.shicimingju.com/book/hongloumeng/4.html
正在爬取： 第 五 回 游幻境指迷十二钗 饮仙醪曲演红楼梦 -- https://m.shicimingju.com/book/hongloumeng/5.html
正在爬取： 第 六 回 贾宝玉初试云雨情 刘姥姥一进荣国府 -- https://m.shicimingju.com/book/hongloumeng/6.html
正在爬取： 第 七 回 送宫花贾琏戏熙凤 宴宁府宝玉会秦钟 -- https://m.shicimingju.com/book/hongloumeng/7.html
正在爬取： 第 八 回 比通灵金莺微露意 探宝钗黛玉半含酸 -- https://m.shicimingju.com/book/hongloumeng/8.html
正在爬取： 第 九 回 恋风流情友入家塾 起嫌疑顽童闹学堂 -- https://m.shicimingju.com/book/hongloumeng/9.html
正在爬取： 第 十 回 金寡妇贪利权受辱 张太医论病细穷源 -- https://m.shicimingju.com/book/hongloumeng/10.html
正在爬取： 第十一回 庆寿辰宁府排家宴 见熙凤贾瑞起淫心 -- https://m.shicimingju.com/book/hongloumeng/11.html
正在爬取： 第十二回 王熙凤毒设相思局 贾天祥正照 -- https://m.shicimingju.com/book/hongloumeng/12.html
正在爬取： 

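If the chapter downloads are run concurrently with asyncio.gather(), they can finish in any order, and because each task appends to the file as soon as it finishes, the chapters end up out of order in the output file.
An asyncio.Lock() ensures that one write is not interrupted by another, but it does not restore chapter order. The version above stays in order only because main() awaits func2() one chapter at a time in a loop; that gives up the benefit of concurrency, so it runs about as slowly as synchronous code.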
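
One way to keep both concurrency and chapter order is to have each task return its text instead of writing it, and to write the results afterwards: `asyncio.gather()` returns results in the order the awaitables were passed in, regardless of which finished first. The sketch below simulates the downloads with random delays; `fetch_chapter`, `crawl` and the chapter count are hypothetical stand-ins for func2 and the real pages.

```python
import asyncio
import random

finished = []  # completion order, recorded only for illustration


async def fetch_chapter(index):
    # Simulated download: a random delay makes the chapters finish out of order
    await asyncio.sleep(random.uniform(0.01, 0.1))
    finished.append(index)
    return f"chapter {index}"


async def crawl(n):
    # gather() returns results in argument order, not completion order
    return await asyncio.gather(*(fetch_chapter(i) for i in range(1, n + 1)))


chapters = asyncio.run(crawl(10))

# Joining (or writing) after gather() returns keeps the text in chapter order
text = "\n".join(chapters)
print("completion order:", finished)
print("result order:", chapters)
```

With the real crawler, the single write after `gather()` also removes the need for the lock, since only one piece of code touches the file.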

# Crawling without preserving chapter order, t=0.58s

In [19]:
import asyncio
import aiohttp
import aiofiles
from lxml import etree
import os
import time



headers={'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36'}

if not os.path.exists('./data'):
    os.makedirs('./data')


start_time = time.time()


async def func1(ch_url):
    # Fetch the table-of-contents page and collect a (title, url) pair for every chapter
    async with aiohttp.ClientSession() as session:
        async with session.get(url=ch_url, headers=headers) as response:
            title_res = await response.text()
            ch_tree = etree.HTML(title_res)
            ch_list = ch_tree.xpath('/html/body/div[3]/div[2]/div[3]/div//a')
            data = []
            for a in ch_list:
                # Chapter title
                title = a.text.strip()
                # Absolute URL of the chapter page
                content_url = "https://m.shicimingju.com" + a.get('href')
                # list.append() takes a single argument, so the pair is passed as one tuple
                data.append((title, content_url))
            return data


async def func2(data):
    # Download one chapter and append it to the output file as soon as it arrives
    title, url = data
    async with aiohttp.ClientSession() as session:
        async with session.get(url=url, headers=headers) as response:
            content_res = await response.text()
            tree = etree.HTML(content_res)
            content = tree.xpath('/html/body/div[3]/div[2]/div[2]/div//text()')
            print('正在爬取：', title, '--', url)
            content = '\n'.join(content)

            async with aiofiles.open('./data/hongloumeng.txt', 'a', encoding='utf-8') as fp:
                await fp.write(title + ':' + '\n' + content + '\n')

async def main():
    ch_url='https://m.shicimingju.com/book/hongloumeng.html'
    data=await func1(ch_url)
    await asyncio.gather(*(func2(d) for d in data))

asyncio.run(main())

print('爬取完成！')

end_time = time.time()

print(f'花费时间:{end_time - start_time:.2f}s')


正在爬取： 第 一 回 甄士隐梦幻识通灵 贾雨村风尘怀闺秀 -- https://m.shicimingju.com/book/hongloumeng/1.html
正在爬取： 第十四回 林如海捐馆扬州城 贾宝玉路谒北静王 -- https://m.shicimingju.com/book/hongloumeng/14.html
正在爬取： 第 五 回 游幻境指迷十二钗 饮仙醪曲演红楼梦 -- https://m.shicimingju.com/book/hongloumeng/5.html
正在爬取： 第 六 回 贾宝玉初试云雨情 刘姥姥一进荣国府 -- https://m.shicimingju.com/book/hongloumeng/6.html
正在爬取： 第十五回 王凤姐弄权铁槛寺 秦鲸卿得趣馒头庵 -- https://m.shicimingju.com/book/hongloumeng/15.html
正在爬取： 第十三回 秦可卿死封龙禁尉 王熙凤协理宁国府 -- https://m.shicimingju.com/book/hongloumeng/13.html
正在爬取： 第 九 回 恋风流情友入家塾 起嫌疑顽童闹学堂 -- https://m.shicimingju.com/book/hongloumeng/9.html
正在爬取： 第 七 回 送宫花贾琏戏熙凤 宴宁府宝玉会秦钟 -- https://m.shicimingju.com/book/hongloumeng/7.html
正在爬取： 第 四 回 薄命女偏逢薄命郎 葫芦僧乱判葫芦案 -- https://m.shicimingju.com/book/hongloumeng/4.html
正在爬取： 第 二 回 贾夫人仙逝扬州城 冷子兴演说荣国府 -- https://m.shicimingju.com/book/hongloumeng/2.html
正在爬取： 第十一回 庆寿辰宁府排家宴 见熙凤贾瑞起淫心 -- https://m.shicimingju.com/book/hongloumeng/11.html
正在爬取： 第十六回 贾元春才选凤藻宫 秦鲸卿夭逝黄泉路 -- https://m.shicimingju.com/book/hongloumeng/16.html
正在爬取

# Downloading Hupu player avatars
#### Single-threaded (sequential)

In [29]:
import os
from curl_cffi import requests
from lxml import etree
import time


if not os.path.exists('./data'):
    os.makedirs("./data")

headers = {
    'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Mobile Safari/537.36',

    'Cookie':'smidV2=2025030919471573ced9ffaf5fe4edfa74b5526409ac50003801d34d210c9c0; new_nba=1; new_nba.sig=slSAJI6uejKCajd3mHOP1-Lssar98CC05plbblZ8sJo; Hm_lvt_6158ac1596b0de37381ffd343b3df24c=1741520843; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%221957abb330911dc-0cc6b090f618dd-1b525636-1484784-1957abb330a2644%22%2C%22first_id%22%3A%22%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22identities%22%3A%22eyIkaWRlbnRpdHlfY29va2llX2lkIjoiMTk1N2FiYjMzMDkxMWRjLTBjYzZiMDkwZjYxOGRkLTFiNTI1NjM2LTE0ODQ3ODQtMTk1N2FiYjMzMGEyNjQ0In0%3D%22%2C%22history_login_id%22%3A%7B%22name%22%3A%22%22%2C%22value%22%3A%22%22%7D%2C%22%24device_id%22%3A%221957abb330911dc-0cc6b090f618dd-1b525636-1484784-1957abb330a2644%22%7D; Hm_lvt_b241fb65ecc2ccf4e7e3b9601c7a50de=1741520848,1741597162; HMACCOUNT=6CB9AF89D45FA5C9; Hm_lvt_4fac77ceccb0cd4ad5ef1be46d740615=1741520848,1741597162; Hm_lvt_a3d34dd67fa1fb34b2b430bbaaa2a5bf=1741526750,1741597162; Hm_lpvt_4fac77ceccb0cd4ad5ef1be46d740615=1741623468; Hm_lpvt_b241fb65ecc2ccf4e7e3b9601c7a50de=1741623468; Hm_lpvt_a3d34dd67fa1fb34b2b430bbaaa2a5bf=1741623468'
    }

url='https://nba.hupu.com/players/{}'

team = {
    'lakers','clippers', 'warriors', 'kings', 'suns',
    'nuggets','timberwolves','thunder','blazers','jazz',
    'mavericks', 'rockets', 'grizzlies', 'pelicans', 'spurs',
    'celtics', 'nets', 'knicks', '76ers', 'raptors',
    'bucks', 'bulls', 'cavaliers', 'pistons', 'pacers',
    'heat', 'hawks', 'hornets', 'magic', 'wizards'
}

start_time=time.time()

for item in team:
    new_url = url.format(item)
    response=requests.get(url=new_url,headers=headers)
    # print(response.text)
    tree=etree.HTML(response.text)

    # soup=BeautifulSoup(response.text,'lxml')
    # print(page)

    # Regex parsing
    # pattern = r'<img\s+src="(https?://.*?\.(?:jpg|jpeg|png|gif))"'
    # pattern = r'<img\s+src="(https://gdc\.hupucdn\.com/gdc/nba/players/uploads/gamespace/players/.*?\.(?:jpg|jpeg|png|gif))"'
    #
    # img_str=re.findall(pattern,page,re.S)
    # # print(img_str)

    # BeautifulSoup parsing
    # img_str = soup.select('.td_padding > a > img ')
    # print(img_str)


    list_tr= tree.xpath('//div[@class="players_right"]/table/tbody/tr')
    # print(f"Found {len(list_tr)} tr elements")  # check whether any tr rows were found


    # Save the images
    for tr in list_tr[1:]:
        img_url= tr.xpath('./td/a/img/@src')[0]
        print(img_url)
        img_data=requests.get(img_url,headers=headers).content
        img_name=img_url.split('/')[-1]
        print(f'正在爬取{img_name}--{img_url}')

        with open(f'./data/{img_name}','wb') as f:
            f.write(img_data)


end_time=time.time()
print(f'花费时间：{end_time-start_time:.2f}s')
print(f'爬取完成')


https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0ff2369d4adc5151176346af1e551873.png
正在爬取0ff2369d4adc5151176346af1e551873.png--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0ff2369d4adc5151176346af1e551873.png
https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
正在爬取brand.jpg--https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/2179f6c213b6e5e3d07841d9e4933d67.jpg
正在爬取2179f6c213b6e5e3d07841d9e4933d67.jpg--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/2179f6c213b6e5e3d07841d9e4933d67.jpg
https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/6db0d0484401e2b302e8d9871104452b.jpg
正在爬取6db0d0484401e2b302e8d9871104452b.jpg--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/6db0d0484401e2b302e8d9871104452b.jpg
https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0abe20cd162576ffccb91bcc5b8bce55.jpg
正在爬取0abe20cd162576ffccb91bcc

#### Thread pool (multiprocessing.dummy.Pool)

In [33]:
import os
from curl_cffi import requests
from lxml import etree
import time
from multiprocessing.dummy import Pool


if not os.path.exists('./data'):
    os.makedirs("./data")

headers = {
    'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Mobile Safari/537.36',

    'Cookie':'smidV2=2025030919471573ced9ffaf5fe4edfa74b5526409ac50003801d34d210c9c0; new_nba=1; new_nba.sig=slSAJI6uejKCajd3mHOP1-Lssar98CC05plbblZ8sJo; Hm_lvt_6158ac1596b0de37381ffd343b3df24c=1741520843; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%221957abb330911dc-0cc6b090f618dd-1b525636-1484784-1957abb330a2644%22%2C%22first_id%22%3A%22%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22identities%22%3A%22eyIkaWRlbnRpdHlfY29va2llX2lkIjoiMTk1N2FiYjMzMDkxMWRjLTBjYzZiMDkwZjYxOGRkLTFiNTI1NjM2LTE0ODQ3ODQtMTk1N2FiYjMzMGEyNjQ0In0%3D%22%2C%22history_login_id%22%3A%7B%22name%22%3A%22%22%2C%22value%22%3A%22%22%7D%2C%22%24device_id%22%3A%221957abb330911dc-0cc6b090f618dd-1b525636-1484784-1957abb330a2644%22%7D; Hm_lvt_b241fb65ecc2ccf4e7e3b9601c7a50de=1741520848,1741597162; HMACCOUNT=6CB9AF89D45FA5C9; Hm_lvt_4fac77ceccb0cd4ad5ef1be46d740615=1741520848,1741597162; Hm_lvt_a3d34dd67fa1fb34b2b430bbaaa2a5bf=1741526750,1741597162; Hm_lpvt_4fac77ceccb0cd4ad5ef1be46d740615=1741623468; Hm_lpvt_b241fb65ecc2ccf4e7e3b9601c7a50de=1741623468; Hm_lpvt_a3d34dd67fa1fb34b2b430bbaaa2a5bf=1741623468'
    }

url='https://nba.hupu.com/players/{}'


start_time=time.time()

def func1(url):
    team = {
    'lakers','clippers', 'warriors', 'kings', 'suns',
    'nuggets','timberwolves','thunder','blazers','jazz',
    'mavericks', 'rockets', 'grizzlies', 'pelicans', 'spurs',
    'celtics', 'nets', 'knicks', '76ers', 'raptors',
    'bucks', 'bulls', 'cavaliers', 'pistons', 'pacers',
    'heat', 'hawks', 'hornets', 'magic', 'wizards'
}
    data=[]
    for item in team:
        new_url = url.format(item)
        response=requests.get(url=new_url,headers=headers)
        # print(response.text)
        tree=etree.HTML(response.text)
        list_tr= tree.xpath('//div[@class="players_right"]/table/tbody/tr')
        # print(f"Found {len(list_tr)} tr elements")  # check whether any tr rows were found

        # Collect (player name, image URL) pairs
        for tr in list_tr[1:]:
            img_url= tr.xpath('./td/a/img/@src')[0]
            img_name=tr.xpath('./td[2]/b/a/text()')[0]
            data.append((img_name,img_url))

    return data


def func2(data):
    img_name,img_url=data
    img_data=requests.get(img_url,headers=headers).content
    print(f'正在爬取{img_name}--{img_url}')

    # 'wb' overwrites the file; 'ab' would append to an existing image when the cell is rerun
    with open(f'./data/{img_name}.png','wb') as f:
        f.write(img_data)

data=func1(url)
pool=Pool(4)
pool.map(func2,data)
pool.close()
pool.join()


end_time=time.time()
print(f'花费时间：{end_time-start_time:.2f}s')
print(f'爬取完成')


正在爬取丹吉洛-拉塞尔--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0ff2369d4adc5151176346af1e551873.png
正在爬取迈尔斯·诺里斯--https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
正在爬取兰德里-沙梅特--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/40c7f5d1e66fcee628d6ae0f5c5206ce.png
正在爬取特里-罗齐尔--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/7f60a66a7e4142a4332ca2eec1f9cfc5.png
正在爬取泰雷斯-马丁--https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
正在爬取艾尔-霍福德--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/1940cf829ae2057e93277b45dda3a527.png
正在爬取马克-威廉姆斯--https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
正在爬取杰伦-布伦森--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/4a16bbdd795d77882486dbc4d14aa8a2.png
正在爬取德鲁·彼得森--https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
正在爬取约什-哈特--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0e62451d534e4450ab38e8504d198bad.png
正在爬取小温德尔-摩尔--https://c2.hoopchina.com.c

#### 异步协程

In [36]:
import asyncio
import aiohttp
import time
import os
import aiofiles
from lxml import etree


if not os.path.exists('./data'):
    os.makedirs("./data")

headers = {
    'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Mobile Safari/537.36',

    'Cookie':'smidV2=2025030919471573ced9ffaf5fe4edfa74b5526409ac50003801d34d210c9c0; new_nba=1; new_nba.sig=slSAJI6uejKCajd3mHOP1-Lssar98CC05plbblZ8sJo; Hm_lvt_6158ac1596b0de37381ffd343b3df24c=1741520843; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%221957abb330911dc-0cc6b090f618dd-1b525636-1484784-1957abb330a2644%22%2C%22first_id%22%3A%22%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22identities%22%3A%22eyIkaWRlbnRpdHlfY29va2llX2lkIjoiMTk1N2FiYjMzMDkxMWRjLTBjYzZiMDkwZjYxOGRkLTFiNTI1NjM2LTE0ODQ3ODQtMTk1N2FiYjMzMGEyNjQ0In0%3D%22%2C%22history_login_id%22%3A%7B%22name%22%3A%22%22%2C%22value%22%3A%22%22%7D%2C%22%24device_id%22%3A%221957abb330911dc-0cc6b090f618dd-1b525636-1484784-1957abb330a2644%22%7D; Hm_lvt_b241fb65ecc2ccf4e7e3b9601c7a50de=1741520848,1741597162; HMACCOUNT=6CB9AF89D45FA5C9; Hm_lvt_4fac77ceccb0cd4ad5ef1be46d740615=1741520848,1741597162; Hm_lvt_a3d34dd67fa1fb34b2b430bbaaa2a5bf=1741526750,1741597162; Hm_lpvt_4fac77ceccb0cd4ad5ef1be46d740615=1741623468; Hm_lpvt_b241fb65ecc2ccf4e7e3b9601c7a50de=1741623468; Hm_lpvt_a3d34dd67fa1fb34b2b430bbaaa2a5bf=1741623468'
    }



start_time=time.time()

async def func1(url):

    team = {
    'lakers','clippers', 'warriors', 'kings', 'suns',
    'nuggets','timberwolves','thunder','blazers','jazz',
    'mavericks', 'rockets', 'grizzlies', 'pelicans', 'spurs',
    'celtics', 'nets', 'knicks', '76ers', 'raptors',
    'bucks', 'bulls', 'cavaliers', 'pistons', 'pacers',
    'heat', 'hawks', 'hornets', 'magic', 'wizards'
}
    data=[]
    for item in team:
        new_url = url.format(item)
        async with aiohttp.ClientSession() as session:
            async with session.get(new_url,headers=headers) as response:
                response=await response.text()
                # print(response.text)
                tree=etree.HTML(response)
                list_tr= tree.xpath('//div[@class="players_right"]/table/tbody/tr')
                # print(f"Found {len(list_tr)} tr elements")  # check whether any tr rows were found
                # Collect (player name, image URL) pairs
                for tr in list_tr[1:]:
                    img_url= tr.xpath('./td/a/img/@src')[0]
                    img_name=tr.xpath('./td[2]/b/a/text()')[0]
                    data.append((img_name,img_url))

    return data


async def func2(data):
    img_name,img_url=data
    async with aiohttp.ClientSession() as session:
        async with session.get(img_url) as response:

            img_data=await response.read()
            print(f'正在爬取{img_name}--{img_url}')

            # 'wb' overwrites the file; 'ab' would append to an existing image when the cell is rerun
            async with aiofiles.open(f'./data/{img_name}.png','wb') as f:
                await f.write(img_data)

async def main():
    url='https://nba.hupu.com/players/{}'
    data=await func1(url)
    task =[func2(d) for d in data]
    await asyncio.gather(*task)


asyncio.run(main())


end_time=time.time()
print(f'花费时间：{end_time-start_time:.2f}s')
print(f'爬取完成')



正在爬取特伦登-沃特福特--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/6d403d7aed5efd43a8d9c7ce25fbc09c.jpg
正在爬取丹吉洛-拉塞尔--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0ff2369d4adc5151176346af1e551873.png
正在爬取宰伊尔-威廉姆斯--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0c0af9b10485988089b9d172bb6e31bb.jpg
正在爬取卡梅伦-托马斯--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/0abe20cd162576ffccb91bcc5b8bce55.jpg
正在爬取泰雷斯-马丁--https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
正在爬取卡梅伦-约翰逊--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/d35224b8a8243c824359788328ee0566.png
正在爬取约什-克里斯托弗--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/852c1dacdaa2c580dc131e9048066eaa.jpg
正在爬取达里克-怀特黑德--https://c2.hoopchina.com.cn/images/gamespace/brand.jpg
正在爬取戴罗恩-夏普--https://gdc.hupucdn.com/gdc/nba/players/uploads/gamespace/players/6db0d0484401e2b302e8d9871104452b.jpg
正在爬取基恩-约翰逊--https://gdc.hupucdn.com/gdc/nba/player

Timings for the image download:
1. Single-threaded (sequential): 86s
2. Thread pool (4 workers): 25s
3. Async coroutines: 5s
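
The ordering of these three timings can be reproduced without touching the network. The sketch below is a simulation under stated assumptions: each download is replaced by a 0.05s sleep, there are only 12 tasks, and all helper names are hypothetical. Absolute numbers will differ from the real crawl, but the ranking should match.

```python
import asyncio
import time
from multiprocessing.dummy import Pool  # thread-based Pool, as in the thread pool cell

N_TASKS = 12
DELAY = 0.05  # assumed latency of one simulated download


def blocking_download(_):
    time.sleep(DELAY)


async def async_download(_):
    await asyncio.sleep(DELAY)


def timed(fn):
    # Return the wall-clock time taken by fn()
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_sequential():
    for i in range(N_TASKS):
        blocking_download(i)


def run_thread_pool():
    with Pool(4) as pool:
        pool.map(blocking_download, range(N_TASKS))


def run_async():
    async def main():
        await asyncio.gather(*(async_download(i) for i in range(N_TASKS)))
    asyncio.run(main())


t_seq = timed(run_sequential)
t_pool = timed(run_thread_pool)
t_async = timed(run_async)
print(f"sequential: {t_seq:.2f}s, thread pool: {t_pool:.2f}s, async: {t_async:.2f}s")
```

The thread pool is bounded by its 4 workers, while the coroutines all wait at the same time, which is consistent with the gap between 25s and 5s measured above.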

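# Summary:
	·async def defines a coroutine function; calling it returns a coroutine object, which does not run until it is awaited or scheduled on the event loop.
	async with is for asynchronous operations that manage a resource, for example:
	•	aiohttp.ClientSession() (network requests)
	•	aiofiles.open() (asynchronous file I/O)
	•	asyncpg.create_pool() (database connection pool)


	·await can only be used inside an async function; it suspends the coroutine until the awaited operation completes and lets the event loop run other tasks in the meantime.
    Common places to use await:
	•	Network requests (await response.text())
	•	File I/O (await f.read(), await f.write())
	•	Database queries (await connection.fetch())
	•	Asynchronous sleep (await asyncio.sleep(1))


	·asyncio.run(coroutine) starts an event loop, runs the coroutine to completion, and returns its result.


	·asyncio.gather() runs several awaitables concurrently and returns their results in the order they were passed in, regardless of which finishes first. It suits batches of independent I/O tasks, such as crawling many pages.
     task = [func2(d) for d in data]
     await asyncio.gather(*task)
     This is a common pattern for running a batch of coroutines concurrently.


	·asyncio.Semaphore(n) limits how many coroutines can hold it at once, so at most n tasks run the guarded section concurrently.
	
	·asyncio.create_task() schedules a coroutine to run in the background as a Task and returns immediately; keep a reference to the task and await it later to collect its result or exception.
	
	·The built-in open() still works inside an async function, but it blocks the event loop while reading or writing; use aiofiles for file I/O that should not stall other tasks.
	

	·Use aiohttp for asynchronous network requests.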
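
As an illustration of the Semaphore and gather() points above, the following sketch caps concurrency at 3 while running 10 simulated downloads. The `limited_download` helper, the limit and the delay are assumptions chosen for the example; in the crawlers above, the sleep would be replaced by the aiohttp request.

```python
import asyncio

LIMIT = 3
running = 0  # coroutines currently inside the guarded section
peak = 0     # highest value of `running` observed


async def limited_download(sem, index):
    global running, peak
    async with sem:  # at most LIMIT coroutines are inside this block at once
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.02)  # stand-in for the network request
        running -= 1
    return index


async def main():
    sem = asyncio.Semaphore(LIMIT)  # created inside the running event loop
    return await asyncio.gather(*(limited_download(sem, i) for i in range(10)))


results = asyncio.run(main())
print(results, "peak concurrency:", peak)
```

Capping concurrency this way is a common courtesy when crawling, since launching one request per item with a bare `gather()` can send hundreds of requests to a site at once.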


