# Using gevent

### Introduction to gevent

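gevent is a coroutine-based Python networking library. It uses greenlet to provide a high-level synchronous API on top of an event loop (libev or libuv in current releases; early versions used libevent). It lets developers write asynchronous I/O code in a synchronous style, without changing their programming habits.

The main building block in gevent is the Greenlet, a lightweight coroutine provided to Python as a C extension module. All greenlets run inside the operating-system process of the main program, but they are scheduled cooperatively.

For I/O-bound workloads, gevent usually performs better than traditional threads, sometimes by a wide margin, because switching between greenlets is much cheaper than switching between OS threads.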

### Some gevent pitfalls

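1. Monkey patching.

With the patch applied, gevent replaces most of the blocking system calls in the standard library, in modules such as socket, ssl, threading, and select, with cooperative versions. However, there is **no guarantee** that code relying on these modules in a complex production environment will not behave strangely once patched.

2. Third-party library support. Any other networking library used in the project must either be pure Python (so that it goes through the patched standard library) or explicitly state that it supports gevent.

3. The monkey patch not taking effect.

Apply the monkey patch at the very start of the program (the entry point), before anything else is imported; otherwise it may not take effect: `from gevent import monkey; monkey.patch_all()`.
Even when applied correctly, the patch can cause odd behavior that is hard to diagnose, so use it with care.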
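
One way to guard against a patch that silently did not take effect is to check for it at startup. The sketch below assumes a gevent release in which `gevent.monkey` provides `is_module_patched` (1.1 or later); the helper name `ensure_patched` and the modules it checks are illustrative choices, not part of gevent.

```python
# Patch first, before any other import that might pull in socket or threading.
from gevent import monkey
monkey.patch_all()

import socket  # imported after patching, so it sees the cooperative versions


def ensure_patched(module_names=("socket", "time")):
    """Return the names of modules gevent has NOT patched (hypothetical helper)."""
    return [name for name in module_names if not monkey.is_module_patched(name)]


missing = ensure_patched()
if missing:
    raise RuntimeError("monkey patch not applied to: {}".format(", ".join(missing)))

# After patching, socket.socket is gevent's cooperative implementation.
print("socket.socket now comes from:", socket.socket.__module__)
```

Failing fast like this turns a hard-to-diagnose blocking bug into an obvious startup error.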

## gevent usage

gevent's coroutine support is essentially greenlet doing the switching.
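
To make that switching visible, here is a minimal sketch that uses the `greenlet` package directly (it is installed as a dependency of gevent). The function names `ping` and `pong` and the `log` list are made up for illustration. Each `switch()` call hands control to the other greenlet explicitly; gevent performs the same kind of switch on your behalf whenever a greenlet blocks on I/O or calls `gevent.sleep()`.

```python
from greenlet import greenlet

log = []


def ping():
    log.append("ping 1")
    gr_pong.switch()      # hand control to pong
    log.append("ping 2")
    gr_pong.switch()      # hand control back to pong so it can finish


def pong():
    log.append("pong 1")
    gr_ping.switch()      # resume ping where it left off
    log.append("pong 2")  # when pong returns, control goes back to the main greenlet


gr_ping = greenlet(ping)
gr_pong = greenlet(pong)
gr_ping.switch()          # start ping from the main greenlet

print(log)  # ['ping 1', 'pong 1', 'ping 2', 'pong 2']
```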

In [2]:
import time
import gevent


def foo():
    print('Running in foo1')
    gevent.sleep(2)
    print('Running in foo2')


def bar():
    print('Running in bar1')
    gevent.sleep(1)
    print('Running in bar2')


def func3():
    print("running in func1")
    gevent.sleep(0)
    print("running in func2")

start_time = time.time()
print('start time: {}'.format(start_time))
gevent.joinall([
    gevent.spawn(foo),
    gevent.spawn(bar),
    gevent.spawn(func3),
])
end_time = time.time()
print('end time: {}'.format(end_time))
print('total time: {}'.format(end_time - start_time))


start time: 1532911238.115201
Running in foo1
Running in bar1
running in func1
running in func2
Running in bar2
Running in foo2
end time: 1532911240.1165957
total time: 2.001394748687744


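The output shows that `gevent.sleep()` stands in for a blocking I/O operation and causes gevent to switch between greenlets automatically. The program takes about 2 seconds in total, the length of the longest sleep, rather than the 3 seconds the sleeps would take if they ran one after another.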
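
The greenlets above only print. To collect results, each `Greenlet` object keeps its function's return value in `.value` once it has finished. The sketch below uses a hypothetical `work` function with shortened delays so it runs quickly.

```python
import gevent


def work(name, delay):
    gevent.sleep(delay)  # simulated I/O; yields to the other greenlets
    return "{} done after {}s".format(name, delay)


jobs = [
    gevent.spawn(work, "foo", 0.2),
    gevent.spawn(work, "bar", 0.1),
    gevent.spawn(work, "func3", 0),
]
gevent.joinall(jobs)

# Results follow the order of the jobs list, not the order of completion.
results = [job.value for job in jobs]
print(results)
```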

### gevent workflow

In [None]:
# spawn() creates a greenlet and schedules it to run; joinall() blocks until
# all of the given greenlets have finished.
from gevent import monkey;monkey.patch_all()
import gevent
import requests


def run_task(url):
    try:
        r = requests.get(url)
        print(url, r.status_code)
    except Exception as e:
        print(e)


if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3']
    greenlets = [gevent.spawn(run_task, url) for url in urls]
    gevent.joinall(greenlets)
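
If the task does not catch its own exceptions, the outcome can still be inspected after `joinall()`. The sketch below replaces the network call with a stand-in `run_task` that sleeps briefly and fails for a made-up URL containing `"bad"`, so it runs without network access; the `report` dictionary is an illustrative choice as well. It relies on the `Greenlet` methods `successful()` and `ready()` and the attributes `value` and `exception`.

```python
import gevent


def run_task(url):
    gevent.sleep(0.01)  # stand-in for the real request
    if "bad" in url:
        raise ValueError("cannot fetch {}".format(url))
    return "fetched {}".format(url)


urls = ["url1", "bad-url", "url3"]
greenlets = [gevent.spawn(run_task, url) for url in urls]
gevent.joinall(greenlets, timeout=5)  # give up waiting after 5 seconds

report = {}
for url, g in zip(urls, greenlets):
    if g.successful():      # finished without raising
        report[url] = g.value
    elif g.ready():         # finished, but with an exception
        report[url] = "failed: {!r}".format(g.exception)
    else:                   # not finished when the timeout expired
        report[url] = "still running after timeout"

print(report)
```

gevent also prints the traceback of the failed greenlet to stderr; the other greenlets are unaffected.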

### Using a greenlet pool

In [None]:
# -*- coding:utf-8 -*-
from gevent import monkey; monkey.patch_all()
import gevent
from gevent.pool import Pool

# Pages already seen; a set makes each membership test O(1) instead of O(n).
with open("xxxxx.txt", "r", encoding="utf-8") as f:
    exist_pages = {int(line.strip()) for line in f}


def get_page_index():
    for page in range(1, 623000):
        yield page


def record_leak_pages(page):
    with open("xxxx.txt", "a", encoding="utf-8") as f:
        f.write("{}\n".format(page))


def check_page_exist(page):
    if page not in exist_pages:
        record_leak_pages(page)


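# At most 200 greenlets run at once; map() blocks until every page is checked.
# Note: monkey patching does not make regular file I/O cooperative, so this
# example demonstrates the Pool API rather than a speed-up.
pool = Pool(200)
try:
    pool.map(check_page_exist, get_page_index())
except Exception as e:
    print(e)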
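
To see the concurrency limit at work, the sketch below runs ten small tasks through a pool of size 3 and records how many are active at once. The names `task`, `active`, and `peak` are made up for illustration. Because greenlets only switch at yield points such as `gevent.sleep()`, the counters can be updated without locks.

```python
import gevent
from gevent.pool import Pool

active = 0  # tasks currently running
peak = 0    # highest value of `active` seen so far


def task(n):
    global active, peak
    active += 1
    peak = max(peak, active)
    gevent.sleep(0.01)  # simulated I/O; other greenlets run here
    active -= 1
    return n * n


pool = Pool(3)
squares = pool.map(task, range(10))  # results come back in input order

print(squares)
print("peak concurrency:", peak)     # never exceeds the pool size
```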

### A simple concurrent web page fetcher with gevent

In [3]:
from urllib import request
import gevent
import time


def fetch(url):
    print("get: {}".format(url))
    resp = request.urlopen(url)
    data = resp.read()
    print("{} bytes received from {}".format(len(data), url))


urls = ["http://sina.com.cn",
        "http://www.cnblogs.com/",
        "https://news.cnblogs.com/"]

time_start = time.time()
for url in urls:
    fetch(url)

print("sequential cost:", time.time()-time_start)

async_time = time.time()
gevent.joinall([gevent.spawn(fetch, url) for url in urls])
print("async cost:", time.time()-async_time)

get: http://sina.com.cn
570962 bytes received from http://sina.com.cn
get: http://www.cnblogs.com/
45740 bytes received from http://www.cnblogs.com/
get: https://news.cnblogs.com/
76225 bytes received from https://news.cnblogs.com/
sequential cost: 0.9758610725402832
get: http://sina.com.cn
570962 bytes received from http://sina.com.cn
get: http://www.cnblogs.com/
45740 bytes received from http://www.cnblogs.com/
get: https://news.cnblogs.com/
76225 bytes received from https://news.cnblogs.com/
async cost: 0.6798131465911865


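The "async" run takes roughly as long as the sequential one, and the order of the output shows that the greenlets actually ran one after another. The concurrency has no effect here because gevent cannot detect the blocking I/O that urllib performs, so a greenlet never yields while it waits on the network. To make these calls cooperative, apply the monkey patch: `from gevent import monkey; monkey.patch_all()`.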

In [6]:
from gevent import monkey
monkey.patch_all()  # patch before importing anything that uses sockets

import gevent
from urllib import request
import time


def fetch(url):
    print("get: {}".format(url))
    resp = request.urlopen(url)
    data = resp.read()
    print("{} bytes received from {}".format(len(data), url))


urls = ["http://sina.com.cn",
        "http://www.cnblogs.com/",
        "https://news.cnblogs.com/"
        ]

time_start = time.time()
for url in urls:
    fetch(url)

print("sequential cost:", time.time()-time_start)

async_time = time.time()
gevent.joinall([gevent.spawn(fetch, url) for url in urls])
print("async cost:", time.time()-async_time)

get: http://sina.com.cn
570962 bytes received from http://sina.com.cn
get: http://www.cnblogs.com/
45678 bytes received from http://www.cnblogs.com/
get: https://news.cnblogs.com/
76229 bytes received from https://news.cnblogs.com/
sequential cost: 0.5046508312225342
get: http://sina.com.cn
get: http://www.cnblogs.com/
get: https://news.cnblogs.com/
76229 bytes received from https://news.cnblogs.com/
570962 bytes received from http://sina.com.cn
45702 bytes received from http://www.cnblogs.com/
async cost: 0.1904904842376709
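
The same effect can be checked without network access, since `monkey.patch_all()` also replaces `time.sleep` with a cooperative version. The following sketch uses a hypothetical `blocking_task` that calls the standard-library `time.sleep`; with the patch applied, three 0.2-second tasks should overlap instead of taking about 0.6 seconds in total.

```python
from gevent import monkey
monkey.patch_all()  # must run before the modules below are used

import time
import gevent


def blocking_task(n):
    time.sleep(0.2)  # patched: yields to other greenlets instead of blocking
    return n


start = time.time()
jobs = [gevent.spawn(blocking_task, n) for n in range(3)]
gevent.joinall(jobs)
elapsed = time.time() - start

print("3 tasks of 0.2s took {:.2f}s".format(elapsed))
```

Without the `monkey.patch_all()` call, each `time.sleep` would block the whole process and the tasks would run back to back.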


