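## The urllib module
urllib provides a set of functions for working with URLs.
1. GET
  - The request module in urllib makes it easy to fetch the content of a URL: it sends a GET request to the given page and returns the HTTP response.
  - To imitate a browser sending the GET request, use a Request object. Adding HTTP headers to the Request object makes the request look as if it came from a browser.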

In [9]:
# Fetch https://www.baidu.com and print the status, headers, and body of the response
from urllib import request

with request.urlopen('https://www.baidu.com') as f:
    data = f.read()
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', data.decode('utf-8'))

Status: 200 OK
Accept-Ranges: bytes
Cache-Control: no-cache
Content-Length: 227
Content-Type: text/html
Date: Mon, 06 May 2019 07:41:38 GMT
Etag: "5cc18225-e3"
Last-Modified: Thu, 25 Apr 2019 09:47:17 GMT
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Pragma: no-cache
Server: BWS/1.1
Set-Cookie: BD_NOT_HTTPS=1; path=/; Max-Age=300
Set-Cookie: BIDUPSID=DA7C47147ABF3F170F274C1F57B56C6F; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1557128498; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Strict-Transport-Security: max-age=0
X-Ua-Compatible: IE=Edge,chrome=1
Connection: close
Data: <html>
<head>
	<script>
		location.replace(location.href.replace("https://","http://"));
	</script>
</head>
<body>
	<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
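
The example above decodes the body with a hard-coded 'utf-8'. A minimal sketch of reading the charset from the Content-Type header instead, falling back to UTF-8 when the header declares none (as in the Baidu response above). The helper name `decode_body` is hypothetical, and an `email.message.Message` stands in for `f.headers` so the snippet runs without a network connection.

```python
from email.message import Message


def decode_body(data, headers, default='utf-8'):
    """Decode response bytes using the charset declared in Content-Type.

    `headers` is expected to behave like `f.headers` from urlopen(),
    which is an email.message.Message subclass. (Hypothetical helper.)
    """
    charset = headers.get_content_charset() or default
    return data.decode(charset, errors='replace')


# Stand-in headers so the example runs offline.
with_charset = Message()
with_charset['Content-Type'] = 'text/html; charset=utf-8'

without_charset = Message()
without_charset['Content-Type'] = 'text/html'

print(with_charset.get_content_charset())     # the declared charset
print(without_charset.get_content_charset())  # None, so the default applies
print(decode_body('豆瓣'.encode('utf-8'), without_charset))
```

With a live response, the call would be `decode_body(f.read(), f.headers)` inside the `with` block.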


In [12]:
from urllib import request

# Build a Request object, then add a mobile-browser User-Agent header to it
req = request.Request('http://www.douban.com/')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))

Status: 200 OK
Date: Mon, 06 May 2019 07:53:20 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
X-Xss-Protection: 1; mode=block
X-Douban-Mobileapp: 0
Expires: Sun, 1 Jan 2006 01:00:00 GMT
Pragma: no-cache
Cache-Control: must-revalidate, no-cache, private
Set-Cookie: talionnav_show_app="0"
Set-Cookie: bid=7SLmnO_oPrw; Expires=Tue, 05-May-20 07:53:20 GMT; Domain=.douban.com; Path=/
X-DOUBAN-NEWBID: 7SLmnO_oPrw
X-DAE-Node: anson7
X-DAE-App: talion
Server: dae
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=15552000;
X-Content-Type-Options: nosniff
Data: 


<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/WebPage" class="ua-safari ua-mobile ">
    <head>
        <meta charset="UTF-8">
        <title>豆瓣(手机版)</title>
        <meta name="google-site-verification" content="ok0wCgT20tBBgo9_zat2iAcimtN4Ftf5ccsh092Xeyw" />
        <meta name="viewport" content="width=device-width, height=device-height, user-sca
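
The header step can be checked without sending anything over the network. A minimal sketch, assuming a shortened placeholder User-Agent string (not the one used above). `Request` stores header names with only the first letter capitalized, so the value is looked up as 'User-agent'.

```python
from urllib import request

# Placeholder User-Agent for illustration only.
ua = 'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)'

req = request.Request('http://www.douban.com/')
req.add_header('User-Agent', ua)

# Request normalizes header names with str.capitalize().
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
print(req.get_method())  # no data attached, so the method is GET
```

Nothing is sent until the Request object is passed to `request.urlopen`.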

2. POST
  - To send a request with POST, pass the data parameter as bytes.
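
The encoding step can be sketched offline before any request is sent. The field values and the `https://example.com/login` URL are placeholders, not real credentials or a real endpoint.

```python
from urllib import request, parse

# Placeholder form fields for illustration only.
form = parse.urlencode([
    ('username', 'alice'),
    ('password', 'p@ss word'),
])
body = form.encode('utf-8')  # the data parameter must be bytes, not str

# Creating the Request does not contact the server.
req = request.Request('https://example.com/login', data=body)

print(form)              # the URL-encoded string
print(type(body))
print(req.get_method())  # attaching data switches the method to POST
```

Special characters in the values are percent-encoded by `parse.urlencode`, and spaces become `+`.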

In [48]:
from urllib import request, parse

print('Login to weibo.cn...')
phone = input('Phone:')
passwd = input('Password:')
login_data = parse.urlencode([
    ('username', phone),
    ('password', passwd),
    ('entry', 'weibo'),
    ('client_id', ''),
    ('savestate', '1'),
    ('ec', ''),
    ('pagerefer', 'https://weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])

req = request.Request('https://weibo.cn/sso/login')
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

# Passing data (as bytes) makes urlopen send the request as a POST
with request.urlopen(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))

Login to weibo.cn...
Phone:[redacted]
Password:[redacted]
Status: 200 OK
Server: Tengine/2.2.2
Date: Mon, 06 May 2019 09:15:12 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
X-Powered-By: PHP/7.2.1
Set-Cookie: MLOGIN=0; expires=Mon, 06-May-2019 10:15:12 GMT; Max-Age=3600; path=/; domain=.weibo.cn
X-Log-Uid: 
Set-Cookie: _T_WM=20127595165; expires=Thu, 16-May-2019 09:15:12 GMT; Max-Age=864000; path=/; domain=.weibo.cn
Set-Cookie: WEIBOCN_FROM=1110003030; path=/; domain=.weibo.cn; HttpOnly
Set-Cookie: M_WEIBOCN_PARAMS=uicode%3D20000174; expires=Mon, 06-May-2019 09:25:12 GMT; Max-Age=600; path=/; domain=.weibo.cn; HttpOnly
PROC_NODE: mweibo-10-41-21-71.dbl.intra.weibo.cn
SSL_NODE: ssl-011.mweibo.dbl.intra.weibo.cn
LB: 39.156.6.57
Data: <!DOCTYPE html>
<html lang="zh-cn">
<head>
    <meta charset="utf-8">
    <link rel="dns-prefetch" href="//h5.sinaimg.cn">
    <meta name="viewport" content="width=device-width,initial-scale=1,us

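# Decode literal \uXXXX escapes into readable text; the raw prefix (rb)
# avoids the invalid-escape warning that a plain b'\u...' literal triggers.
s = rb'\u7cfb\u7edf\u9519\u8bef\uff0c\u8bf7\u7a0d\u540e\u518d\u8bd5'
print(s.decode('unicode-escape'))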
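
Escaped text like this often arrives inside a JSON body, although the source does not say where `s` came from. A minimal sketch under that assumption, with a hypothetical payload whose `msg` field name is invented for illustration: `json.loads` resolves the `\u` escapes on its own, so no separate unicode-escape step is needed.

```python
import json

# Hypothetical response body; the "msg" field name is an assumption.
raw = rb'{"msg": "\u7cfb\u7edf\u9519\u8bef"}'

# The bytes hold literal backslash-u sequences; json.loads decodes them.
payload = json.loads(raw.decode('utf-8'))
print(payload['msg'])
```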