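## Advanced usage of the requests library

This section covers some advanced uses of the requests library, such as uploading files, handling cookies, and setting timeouts.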

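## 1. File upload

The POST requests we have sent so far carried plain text, such as form fields. To send a file to a server, for example when uploading an image or a video, we need to send binary data. File uploads generally use the Content-Type: multipart/form-data body type, which can carry files and ordinary form fields in the same request.

The basic steps for uploading a file with requests:

1) Build the file data by opening the file in binary mode with the open function

2) Build any accompanying form data

3) Send the request, passing the file data through the files parameter and the other body fields through data (when files is given, requests builds the multipart body from files and data, so a json argument is not included)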
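The three steps above can be sketched as follows. This is a minimal sketch that only builds the request and does not send it, so it runs offline. The file name hello.txt, its content, and the form field value are placeholders chosen for illustration.

```python
import io
import requests

# Step 1: the file data. An in-memory file stands in for open('hello.txt', 'rb').
file_obj = io.BytesIO(b"hello upload")
files = {"file": ("hello.txt", file_obj, "text/plain")}

# Step 2: accompanying form fields (placeholder value).
data = {"fileCategory": "personal"}

# Step 3: build the POST request. prepare() encodes the body but sends nothing.
request = requests.Request("POST", "http://httpbin.org/post", files=files, data=data)
prepared = request.prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data with a boundary
print(len(prepared.body), "bytes in the encoded body")
```

Calling requests.post with the same files and data arguments would send this body to the server.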

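## Prerequisites for uploading

1: An upload endpoint is available at the following address: http://xx.xx.xx.xx//upload/stream (our company's service address)

2: The upload endpoint takes the following parameters:

{"parentId":"","fileCategory":"personal","fileSize":179,"fileName":"summer_text_0920.txt","uoType":1}

Two of these parameters need explaining. fileSize is the size of the file in bytes. fileName is the name under which the file is saved after upload, so make sure the extension is correct. The other parameters can be ignored here; when working with your own company's interface, follow its interface documentation.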
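The two parameters can be derived from the local file rather than typed by hand. The helper below is a hypothetical sketch: the function name build_upload_params is an assumption, and the remaining fields simply copy the sample values shown above.

```python
import os
import tempfile

def build_upload_params(path, category="personal"):
    """Build the parameter dictionary for a local file (hypothetical helper)."""
    return {
        "parentId": "",
        "fileCategory": category,
        "fileSize": os.path.getsize(path),   # size in bytes
        "fileName": os.path.basename(path),  # keeps the original extension
        "uoType": 1,
    }

# Demonstration with a temporary file of known size.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"x" * 179)
    demo_path = tmp.name

params = build_upload_params(demo_path)
print(params["fileSize"], params["fileName"])
os.remove(demo_path)
```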

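## Getting started

Uploading a file with requests is straightforward: just pass the files parameter to the post method.

In [None]:
# Besides submitting form data, requests can also upload files to sites that require it. For example:

This example uses the post method to upload a local image to http://httpbin.org/post. For comparison, the first cell sends a POST with no body.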

In [2]:
import requests

response1 = requests.post("http://httpbin.org/post")
print(response1.text)

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "0", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-5e60f28d-569d568842702201148cb157"
  }, 
  "json": null, 
  "origin": "1.80.145.216", 
  "url": "http://httpbin.org/post"
}



In [3]:
import requests

files = {'file': open('favicon.ico', 'rb')}
response = requests.post("http://httpbin.org/post", files=files)
print(response.text)

{
  "args": {}, 
  "data": "", 
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAEBAAAAEAIAAoBQAAJgAAACAgAAABACAAKBQAAE4FAAAoAAAAEAAAACAAAAABACAAAAAAAAAFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABERE3YTExPFDg4OEgAAAAAAAAAADw8PERERFLETExNpAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABQUFJYTExT8ExMU7QAAABkAAAAAAAAAAAAAABgVFRf/FRUX/xERE4UAAAAAAAAAAAAAAAAAAAAAAAAAABEREsETExTuERERHhAQEBAAAAAAAAAAAAAAAAAAAAANExMU9RUVF/8VFRf/EREUrwAAAAAAAAAAAAAAABQUFJkVFRf/BgYRLA4ODlwPDw/BDw8PIgAAAAAAAAAADw8PNBAQEP8VFRf/FRUX/xUVF/8UFBSPAAAAABAQEDAPDQ//AAAA+QEBAe0CAgL/AgIC9g4ODjgAAAAAAAAAAAgICEACAgLrFRUX/xUVF/8VFRf/FRUX/xERES0UFBWcFBQV/wEBAfwPDxH7DQ0ROwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA0NEjoTExTnFRUX/xUVF/8SEhKaExMT2RUVF/8VFRf/ExMTTwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAERERTBUVF/8VFRf/ExMT2hMTFPYVFRf/FBQU8AAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAITExTxFRUX/xMTFPYTExT3FRUX/xQUFOEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFBQU4RUVF/8TExT3FBQU3hUVF/8TExT5Dw8PIQAAAAAAAAAAA

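In [3]:
# Note: this simulates a file upload. The image file must be in the same directory as the current script. Also note that form is empty here,
# which shows that uploaded files are reported in a separate files field.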
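A relative file name such as 'favicon.ico' is resolved against the current working directory, and a file opened with a bare open call stays open until it is closed or garbage collected. A small wrapper can close the file reliably. This is a sketch: the name upload_file and the post parameter (which lets a stand-in replace the real network call) are assumptions for illustration.

```python
import requests

def upload_file(path, url, post=requests.post):
    """Upload one file; `post` can be swapped for a stand-in when testing."""
    # The with-block closes the file even if the request raises.
    with open(path, "rb") as fh:
        return post(url, files={"file": fh}, timeout=10)

# Usage (sends a real request):
# response = upload_file("favicon.ico", "http://httpbin.org/post")
# print(response.status_code)
```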

In [4]:
import requests
files = {'file':open('手指下.png', 'rb')}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)

{
  "args": {}, 
  "data": "", 
  "files": {
    "file": "data:application/octet-stream;base64,iVBORw0KGgoAAAANSUhEUgAAAR8AAAFBCAYAAABO06vKAAAYVGlDQ1BJQ0MgUHJvZmlsZQAAWIWVWQdUVEuT7jt5hjxDzjmD5JxzzhlEYEhDEoYooAgikhRBBQRFRSSqqCigIghiQFGCD0REBJGgooAKKkH2EvS9/d+e3bN9Tt/7TXV1dVV1dag7AHDt942MDEMwAhAeEUN1MDXgd3P34Me+AwhACxiANCD4kqMj9e3srABcfr//e1kaBNDG+7nMhqx/t/+vhck/IJoMAGQHYz//aHI4jK8DgEojR1JjAMCowXSh+JjIDewFY2YqrCCMIzdw0BZO38B+W7h4k8fJwRDGFwHA0fr6UoMAoG+G6fxx5CBYDv0Q3EaM8KdEwKyzMNYhB/v6A8AlDfNIh4fv3sBuMBb3+4ecoP8m0++PTF/foD94y5bNgjOiREeG+e75f7rj/y7hYbG/xxCFK20w1cxhw2bYb0Ohuy03MC2MZyP8bGxhTITxD4r/Jj+MEYTgWDPnLX4ENznaEPYZYIWxnL+vkSWMuWFsEhFmY7VN9wukmJjDGI4QRAIlxtxpu29mQLSx47bMU9TdDra/cSDVUH+772Vf6ua4G/ydsaHO+tvyh4IDzH/L/5YY7OQKYwIASEIcxcUGxvQwZo4OdbTc4kEKJgYb2vzmocY6bOgvDGO1gAhTgy35SK9AqonDNn9kePRve5EZwRRzm21cHBPsZLblH2Qt2XdTf3YYNwZE6Dv/lhMQ7Wb12xb/ACPjLduRzwIinLftRY5Gxhg4bPedjwyz2+ZH4QLCTDfogjDmjI5z3O6L0oqBA3JLPsoqMsbOaUtPlE+Ir4Xdlj6oOGAFDIER4AexcPUDu0EIoDybbZqFf221mABfQAVBIADIbFN+93DdbImAn

## 2. Cookies

Getting cookies

In [6]:
import requests
headers={'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Mobile Safari/537.36'}
response = requests.get("https://www.baidu.com",headers=headers)
print(response.status_code)
print(response.cookies)



200
<RequestsCookieJar[<Cookie BAIDUID=D0387FC04FAA050A66E2AAE475CDDC6D:FG=1 for .baidu.com/>, <Cookie H_WISE_SIDS=142081_141693_142979_142209_139560_142075_142062_142115_128700_135846_141003_142019_141838_142357_140631_139045_143162_140853_142514_138878_140988_141899_142397_142780_142286_136862_131862_140174_131246_141261_138165_140324_138883_133847_141941_127969_140065_140593_143060_141807_138425_141008_141190_142162_143275_141926_131423_141864_141917_107311_142345_138595_142272_138663_136753_110085 for .baidu.com/>, <Cookie rsv_i=3a97OEJIsjV1UU3%2BRvb62%2F4YV2TakPOUQ20fUXOej7Tcku%2BhERfmRCx351BmhmoZp7MzIFetNWB5TsRRkBPVfAAI2EFwV4s for .baidu.com/>, <Cookie BDSVRTM=98 for www.baidu.com/>]>


In [7]:
for key, value in response.cookies.items():
    print(key + '=' + value)

BAIDUID=D0387FC04FAA050A66E2AAE475CDDC6D:FG=1
H_WISE_SIDS=142081_141693_142979_142209_139560_142075_142062_142115_128700_135846_141003_142019_141838_142357_140631_139045_143162_140853_142514_138878_140988_141899_142397_142780_142286_136862_131862_140174_131246_141261_138165_140324_138883_133847_141941_127969_140065_140593_143060_141807_138425_141008_141190_142162_143275_141926_131423_141864_141917_107311_142345_138595_142272_138663_136753_110085
rsv_i=3a97OEJIsjV1UU3%2BRvb62%2F4YV2TakPOUQ20fUXOej7Tcku%2BhERfmRCx351BmhmoZp7MzIFetNWB5TsRRkBPVfAAI2EFwV4s
BDSVRTM=98


In [None]:
# First, the cookies attribute gives us the cookies the server sent. Cookies can also be used directly to keep a logged-in state.
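The object behind response.cookies is a RequestsCookieJar. The sketch below builds one by hand, so it runs without a network connection, and shows two ways of reading it. The cookie named sample_id and its value are made up for illustration.

```python
import requests
from requests.cookies import RequestsCookieJar

# Build a jar by hand; normally this is response.cookies.
jar = RequestsCookieJar()
jar.set("BDSVRTM", "98", domain="www.baidu.com", path="/")
jar.set("sample_id", "abc123", domain=".baidu.com", path="/")

# Plain name -> value dictionary.
as_dict = requests.utils.dict_from_cookiejar(jar)
print(as_dict)

# Each Cookie object also records where it applies.
for cookie in jar:
    print(cookie.name, cookie.value, cookie.domain, cookie.path)
```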

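### <img src="手指下.png">Hands-on practice

Exercise: use the cookies attribute to get the cookies sent by the server, then iterate over all of them.

### Basic usage

Put simply, a cookie is how the client keeps state with the server: it lets the server identify the user, and its values are mostly opaque or encrypted. The main use for us is simulating a login. On sites that require an account and password, such as CSDN, we can send the cookies of a logged-in session to fetch data.

There are generally three ways to send a request with cookies:

Method 1: put the cookie in headers. The example requests my own CSDN blog.

In [None]:
# For example: log in to CSDN and copy the Cookie value from the request headers.
# Then put your own Cookie value into headers and send the request.

Taking my own CSDN blog home page as the example: first find the Cookie and User-Agent of the logged-in session, then copy both into the program.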

In [8]:
import requests
headers={'cookie':'uuid_tt_dd=10_6071013070-1582718337976-180892; dc_session_id=10_1582718337976.106282; TY_SESSION_ID=84b79899-525c-4484-9d56-b723a7b6a851; Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_6071013070-1582718337976-180892!5744*1*weixin_41477876; __gads=ID=8f84570e58407855:T=1582718339:S=ALNI_MYeaI4I6L7qBV8_NGvH0N-ycojkCw; Hm_lvt_e5ef47b9f471504959267fd614d579cd=1583123073; Hm_lpvt_e5ef47b9f471504959267fd614d579cd=1583123073; Hm_ct_e5ef47b9f471504959267fd614d579cd=6525*1*10_6071013070-1582718337976-180892; __yadk_uid=S5LCpPKweFPEZhDART1tEH9SubV1DI6s; SESSION=cf2f3cb2-7f37-4498-b1b9-8aceaa3ae255; UserName=weixin_41477876; UserInfo=7019b561038545b5a988b1794ca99f21; UserToken=7019b561038545b5a988b1794ca99f21; UserNick=weixin_41477876; AU=E2C; UN=weixin_41477876; BT=1583135526331; p_uid=U000000; announcement=%257B%2522isLogin%2522%253Atrue%252C%2522announcementUrl%2522%253A%2522https%253A%252F%252Fblog.csdn.net%252Fblogdevteam%252Farticle%252Fdetails%252F103603408%2522%252C%2522announcementCount%2522%253A0%252C%2522announcementExpire%2522%253A3600000%257D; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1583123585,1583131094,1583135500,1583137539; c_ref=https%3A//blog.csdn.net/weixin_41477876; utm_source=distribute.pc_feed.none-task; dc_tos=q6k4ue; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1583137671'
        , 'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Mobile Safari/537.36'}
r=requests.get("https://blog.csdn.net/weixin_41477876",headers=headers)
print(r.status_code)
print(r.text)

200
<html>
<head>
    <link rel="canonical" href="https://blog.csdn.net/weixin_41477876" />
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta name="referrer" content="always">
    <meta name="viewport"
          content="width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no">
    <meta name="apple-mobile-web-app-status-bar-style" content="yes">
    <meta name="report" content='{"pid":"blog"}'>
    <meta name="shenma-site-verification" content="bf32a58a3df1c3452b75fbe96c7feb42_1497604697">
    <script src='//g.csdnimg.cn/tingyun/1.8.3/blog-m.js' type='text/javascript'></script>
    <script src="https://g.csdnimg.cn/debug/1.0.0/debug.js"></script>
    <script src="https://csdnimg.cn//public/common/libs/jquery/jquery-1.9.1.min.js" type="text/javascript"></script>
    <link rel="stylesheet" href="https://csdnimg.cn//public/common/libs/bootstrap/css/bootstrap.css

In [9]:
with open("csdn.html", "w", encoding="utf-8") as f:
    f.write(r.content.decode())

Method 2: pass a cookie dictionary to the cookies parameter

The example here requests Renren:

In [13]:
url = "http://www.renren.com/967272361/profile"
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Mobile Safari/537.36'}

In [15]:
re1=requests.get(url,headers=headers)
print(re1.text)

<!doctype html><html class="nx-main980" >
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE8" />
<meta name="Description" content="人人网 校内是一个真实的社交网络，联络你和你周围的朋友。 加入人人网校内你可以:联络朋友，了解他们的最新动态；和朋友分享相片、音乐和电影；找到老同学，结识新朋友；用照片和日志记录生活,展示自我。" />
<meta name="Keywords" content="Xiaonei,Renren,校内,大学,同学,同事,白领,个人主页,博客,相册,群组,社区,交友,聊天,音乐,视频,校园,人人,人人网" />
<meta property="qc:admins" content="232517306762562566375" />
<meta property="wb:webmaster" content="f2fdc876b8ba2a5d" />
<meta name="msApplication-ID" content="App" />
<meta name="msApplication-PackageFamilyName" content="57722RenRenpreview.RenrenHD_fknrsfzqca1jw" /><link rel="shortcut icon" type="image/x-icon" href="http://a.xnimg.cn/favicon-rr.ico?ver=3" />
<link rel="apple-touch-icon" href="http://a.xnimg.cn/wap/apple_icon_.png" />
<script type="text/javascript">
XN = {get_check:'',get_check_x:'5149c2a8',env:{domain:'renren.com',shortSiteName:'人人',siteName:'人人网'}};
try

In [17]:
with open("ren.html", "w", encoding="utf-8") as f:
    f.write(re1.content.decode())

In [48]:
# Without cookies this page cannot be accessed.

In [27]:
url = "http://www.renren.com/967272361/profile"
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Mobile Safari/537.36' }
cookies={'anonymid': 'k7a6emymrdaija', '_r01_': '1', 'JSESSIONID': 'abcOBHha_YI7L8DLqrAcx', 'ick_login': '28f01abd-7466-4ab2-8250-02c0eaa8191e', 'taihe_bi_sdk_uid': 'b0d5c39309d5264aadd3924008d0f4d5', 'taihe_bi_sdk_session': 'c9ce3527d754df67715ba9985d0d8f26', 'loginfrom': 'null', 'jebe_key': 'ce041811-ae29-4914-8c94-0849bca1c393%7C6b5705a589a58f667a9538850e69b067%7C1583136206444%7C1%7C1583413352569', 'depovince': 'GW', 't': '12113f10740557b2d5734fe702230ac18', 'societyguester': '12113f10740557b2d5734fe702230ac18', 'id': '973893508', 'xnsid': '5538b9b7', 'jebecookies': '0117e205-332d-43bb-8fd6-40b0cd890353|||||', 'ver': '7.0', 'wp_fold': '0'}
re2=requests.get(url,headers=headers,cookies=cookies)
print(re2.text)





<!Doctype html>
<html class="nx-main860">
<head>
    <meta name="Description" content="人人网 校内是一个真实的社交网络，联络你和你周围的朋友。 加入人人网校内你可以:联络朋友，了解他们的最新动态；和朋友分享相片、音乐和电影；找到老同学，结识新朋友；用照片和日志记录生活,展示自我。"/>
    <meta name="Keywords" content="Xiaonei,Renren,校内,大学,同学,同事,白领,个人主页,博客,相册,群组,社区,交友,聊天,音乐,视频,校园,人人,人人网"/>
    <title>人人网 - 起风了</title>
    <meta charset="utf-8"/>
<link rel="shortcut icon" type="image/x-icon" href="http://a.xnimg.cn/favicon-rr.ico?ver=3" />
<link rel="apple-touch-icon" href="http://a.xnimg.cn/wap/apple_icon_.png" />
<link rel="stylesheet" type="text/css" href="http://s.xnimg.cn/a86614/nx/core/base.css">
<script type="text/javascript">
if(typeof nx === 'undefined'){
var nx = {};
}
nx.log = {
startTime : + new Date()
};
nx.user = {
id : "973893508",
ruid:"973893508",
tinyPic	: "http://head.xiaonei.com/photos/0/0/men_tiny.gif ",
name : "新用户oF0z",
privacy: "99",
requestToken : '797393854',
_rtk : 'fb484218'
};nx.user.isvip = false;nx.user.hidead = false;nx.webpager = nx.webpager || {}

In [28]:
with open("renren.html", "w", encoding="utf-8") as f:
    f.write(re2.content.decode())
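Methods 1 and 2 end up producing the same Cookie request header. The sketch below prepares two requests without sending them and prints the header each one would carry. The cookie token=abc and the variable name demo_url are placeholders for illustration.

```python
import requests

demo_url = "http://httpbin.org/cookies"  # placeholder target; nothing is sent

# Method 1: the raw cookie string goes into the Cookie header.
via_header = requests.Request("GET", demo_url, headers={"Cookie": "token=abc"}).prepare()

# Method 2: a dictionary goes into the cookies parameter.
via_param = requests.Request("GET", demo_url, cookies={"token": "abc"}).prepare()

print(via_header.headers["Cookie"])
print(via_param.headers["Cookie"])
```

The dictionary form is usually easier to maintain, because individual cookies can be added or replaced without editing one long string.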

Supplementary note: converting a browser cookie string into a dictionary

In [58]:
# First, assign the copied cookie string to b
b = 'bid=Qzw9cKnyESM; ll="108288"; __yadk_uid=4YChvgeANLBEh4iV00n1tc0HQ8zpmSl1; __utmc=30149280; __utmc=223695111; _vwo_uuid_v2=D8099FF3ECFE384A3F35BFA190C05A5EE|91f795432cda34bbc17ba6265fb33177; ps=y; dbcl2="169126613:FUpqH/CNWB8"; ck=pyZ7; ap=1; push_noty_num=0; push_doumail_num=0; __utmz=30149280.1520490941.8.7.utmcsr=accounts.douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/login; __utmv=30149280.16912; __utmz=223695111.1520492304.6.4.utmcsr=douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/search; ct=y; __utma=30149280.1712477244.1514880643.1520490941.1520496097.9; __utmb=30149280.0.10.1520496097; __utma=223695111.1169484511.1516955420.1520492304.1520496097.7; __utmb=223695111.0.10.1520496097; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1520496097%2C%22https%3A%2F%2Fwww.douban.com%2Fsearch%3Fsource%3Dsuggest%26q%3D%25E5%2589%258D%25E4%25BB%25BB%22%5D; _pk_ses.100001.4cf6=*; _pk_id.100001.4cf6=21a4461bbb469631.1516955420.7.1520496674.1520492685'

In [23]:
# First, assign the copied cookie string to b
b='anonymid=k7a6emymrdaija; _r01_=1; JSESSIONID=abcOBHha_YI7L8DLqrAcx; ick_login=28f01abd-7466-4ab2-8250-02c0eaa8191e; taihe_bi_sdk_uid=b0d5c39309d5264aadd3924008d0f4d5; taihe_bi_sdk_session=c9ce3527d754df67715ba9985d0d8f26; loginfrom=null; jebe_key=ce041811-ae29-4914-8c94-0849bca1c393%7C6b5705a589a58f667a9538850e69b067%7C1583136206444%7C1%7C1583136206543; depovince=GW; t=12113f10740557b2d5734fe702230ac18; societyguester=12113f10740557b2d5734fe702230ac18; id=973893508; xnsid=5538b9b7; jebecookies=0117e205-332d-43bb-8fd6-40b0cd890353|||||; ver=7.0; wp_fold=0; jebe_key=ce041811-ae29-4914-8c94-0849bca1c393%7C6b5705a589a58f667a9538850e69b067%7C1583136206444%7C1%7C1583413352569'

In [24]:
# Then split the string on semicolons
line = b.split(';')

In [25]:
# Then loop over the pieces, split each one into key and value, and store them in the dictionary cookie
cookie={}
for i in line:
    key,value = i.strip().split('=',1)  # strip() removes the space left after each ';'
    cookie[key] = value

In [26]:
# Then print the result to check it
print(cookie)

{'anonymid': 'k7a6emymrdaija', '_r01_': '1', 'JSESSIONID': 'abcOBHha_YI7L8DLqrAcx', 'ick_login': '28f01abd-7466-4ab2-8250-02c0eaa8191e', 'taihe_bi_sdk_uid': 'b0d5c39309d5264aadd3924008d0f4d5', 'taihe_bi_sdk_session': 'c9ce3527d754df67715ba9985d0d8f26', 'loginfrom': 'null', 'jebe_key': 'ce041811-ae29-4914-8c94-0849bca1c393%7C6b5705a589a58f667a9538850e69b067%7C1583136206444%7C1%7C1583413352569', 'depovince': 'GW', 't': '12113f10740557b2d5734fe702230ac18', 'societyguester': '12113f10740557b2d5734fe702230ac18', 'id': '973893508', 'xnsid': '5538b9b7', 'jebecookies': '0117e205-332d-43bb-8fd6-40b0cd890353|||||', 'ver': '7.0', 'wp_fold': '0'}
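The same conversion can be wrapped in a reusable function that also tolerates a trailing semicolon or a malformed piece. This is a sketch: the name cookie_string_to_dict and the sample string are assumptions for illustration.

```python
def cookie_string_to_dict(cookie_string):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'} (hypothetical helper)."""
    cookies = {}
    for piece in cookie_string.split(";"):
        piece = piece.strip()
        if not piece or "=" not in piece:
            continue  # skip empty segments and malformed pieces
        key, value = piece.split("=", 1)  # split once: values may contain '='
        cookies[key.strip()] = value.strip()
    return cookies

sample = 'anonymid=k7a6emymrdaija; _r01_=1; __gads=ID=0dac:T=158; ver=7.0;'
print(cookie_string_to_dict(sample))
```

Splitting only on the first '=' matters because values such as the __gads one contain further '=' characters.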


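### <img src="手指下.png">Hands-on practice

Log in to Douban in the browser with your own account and password, then use the cookies of that logged-in session to request the page.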

In [32]:
b='ll="118371"; bid=ub8iu801IJ0; __utmc=30149280; _vwo_uuid_v2=D0A969D0181D2160DBA297A0038214695|94d8cc3802a6016913ef359ddd842875; ap_v=0,6.0; __yadk_uid=zqitzFjc2D5K4JWfaL9uKsHLFY2PL9du; push_noty_num=0; push_doumail_num=0; __utmv=30149280.16478; __gads=ID=0dac229478ae8334:T=1583142911:S=ALNI_MYxv27rJqGA9xG30womgVHxhjIdrA; douban-profile-remind=1; ct=y; gr_user_id=fb3b3ab5-cafb-4b6c-b8fe-5827fd217e1d; _pk_ref.100001.8cb4=%5B%22%22%2C%22%22%2C1583147822%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DXySZ2FzLomCJBH5FvKViqDBfpaeDdLb7Cm5qfFJD92EEHdPI9Ej-hgSiE1n6IPGN%26wd%3D%26eqid%3D896d52ad000174f3000000065e5ceb29%22%5D; _pk_ses.100001.8cb4=*; __utma=30149280.1811576128.1583124717.1583142805.1583147822.3; __utmz=30149280.1583147822.3.3.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utmt=1; dbcl2="164789361:M3TKWQS1lXs"; ck=54qd; _pk_id.100001.8cb4=352aa3b3e5f2fa44.1583124715.3.1583148652.1583145329.; __utmb=30149280.21.6.1583148423889'

In [33]:
line = b.split(';')

In [34]:
line

['ll="118371"',
 ' bid=ub8iu801IJ0',
 ' __utmc=30149280',
 ' _vwo_uuid_v2=D0A969D0181D2160DBA297A0038214695|94d8cc3802a6016913ef359ddd842875',
 ' ap_v=0,6.0',
 ' __yadk_uid=zqitzFjc2D5K4JWfaL9uKsHLFY2PL9du',
 ' push_noty_num=0',
 ' push_doumail_num=0',
 ' __utmv=30149280.16478',
 ' __gads=ID=0dac229478ae8334:T=1583142911:S=ALNI_MYxv27rJqGA9xG30womgVHxhjIdrA',
 ' douban-profile-remind=1',
 ' ct=y',
 ' gr_user_id=fb3b3ab5-cafb-4b6c-b8fe-5827fd217e1d',
 ' _pk_ref.100001.8cb4=%5B%22%22%2C%22%22%2C1583147822%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DXySZ2FzLomCJBH5FvKViqDBfpaeDdLb7Cm5qfFJD92EEHdPI9Ej-hgSiE1n6IPGN%26wd%3D%26eqid%3D896d52ad000174f3000000065e5ceb29%22%5D',
 ' _pk_ses.100001.8cb4=*',
 ' __utma=30149280.1811576128.1583124717.1583142805.1583147822.3',
 ' __utmz=30149280.1583147822.3.3.utmcsr=baidu|utmccn=(organic)|utmcmd=organic',
 ' __utmt=1',
 ' dbcl2="164789361:M3TKWQS1lXs"',
 ' ck=54qd',
 ' _pk_id.100001.8cb4=352aa3b3e5f2fa44.1583124715.3.1583148652.1583145329.',
 ' __u

In [35]:
cookie={}
for i in line:
    key,value = i.strip().split('=',1)  # strip() removes the space left after each ';'
    cookie[key] = value

In [36]:
cookie

{'ll': '"118371"',
 'bid': 'ub8iu801IJ0',
 '__utmc': '30149280',
 '_vwo_uuid_v2': 'D0A969D0181D2160DBA297A0038214695|94d8cc3802a6016913ef359ddd842875',
 'ap_v': '0,6.0',
 '__yadk_uid': 'zqitzFjc2D5K4JWfaL9uKsHLFY2PL9du',
 'push_noty_num': '0',
 'push_doumail_num': '0',
 '__utmv': '30149280.16478',
 '__gads': 'ID=0dac229478ae8334:T=1583142911:S=ALNI_MYxv27rJqGA9xG30womgVHxhjIdrA',
 'douban-profile-remind': '1',
 'ct': 'y',
 'gr_user_id': 'fb3b3ab5-cafb-4b6c-b8fe-5827fd217e1d',
 '_pk_ref.100001.8cb4': '%5B%22%22%2C%22%22%2C1583147822%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DXySZ2FzLomCJBH5FvKViqDBfpaeDdLb7Cm5qfFJD92EEHdPI9Ej-hgSiE1n6IPGN%26wd%3D%26eqid%3D896d52ad000174f3000000065e5ceb29%22%5D',
 '_pk_ses.100001.8cb4': '*',
 '__utma': '30149280.1811576128.1583124717.1583142805.1583147822.3',
 '__utmz': '30149280.1583147822.3.3.utmcsr=baidu|utmccn=(organic)|utmcmd=organic',
 '__utmt': '1',
 'dbcl2': '"164789361:M3TKWQS1lXs"',
 'ck': '54qd',
 '_pk_id.100001.8cb4':

In [37]:
import requests
cookies=cookie
headers={'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Mobile Safari/537.36'}

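In [38]:
# Assumption: the cell that pointed url at Douban is not shown in this notebook, so it is set here.
url = 'https://www.douban.com/'
r = requests.get(url, headers=headers, cookies=cookies)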

In [39]:
print(r.text)



<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/WebPage" class="ua-chrome ua-mobile ">
  <head>
      <meta charset="UTF-8">
      <title>豆瓣(手机版)</title>
      <meta name="google-site-verification" content="ok0wCgT20tBBgo9_zat2iAcimtN4Ftf5ccsh092Xeyw" />
      <meta name="viewport" content="width=device-width, height=device-height, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
      <meta name="format-detection" content="telephone=no">
      <meta name="description" content="读书、看电影、涨知识、学穿搭...，加入兴趣小组，获得达人们的高质量生活经验，找到有相同爱好的小伙伴。">
      <meta name="keywords" content="豆瓣,手机豆瓣,豆瓣手机版,豆瓣电影,豆瓣读书,豆瓣同城">
      <link rel="canonical" href="
https://m.douban.com/">
      <link href="https://img3.doubanio.com/f/talion/20f294507038a0d03718cd15b4defe16ea78d05a/css/card/base.css" rel="stylesheet">
      
<script>
  var saveKey = '_t_splash'
  var day = 3
  if (Date.now() - window.localStorage.getItem(saveKey) < 1000 * 60 * 60 * 24 * day) {
    window.location

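### Method 3: send a POST request first to obtain the cookies, then request the post-login pages with them

This uses the Session class. A Session instance has the same request methods as requests, but it also keeps state: just as a browser keeps the cookies after you enter your password once, the session keeps them for later requests.

session = requests.Session()

session.post(url, data=data, headers=headers) # cookies set by the server are stored in the session

session.get(url) # sends the cookies stored in the session, so the request succeeds

With this method you have to submit your own account and password first, and you need to find the URL the login form submits to. How do you find that URL? One common way is to open the browser's developer tools, submit the login form, and look at the request recorded in the Network panel.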


In [11]:
# Simulate login
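A login flow with a Session might look like the sketch below. Everything specific in it is an assumption: the function name login_and_fetch, the form field names username and password, and the example.com URLs in the usage comment. A real site defines its own field names and may require extra fields such as a CSRF token.

```python
import requests

def login_and_fetch(session, login_url, page_url, username, password):
    """Log in with a POST, then fetch a page with the same session."""
    # Field names "username" and "password" are assumptions; use the names
    # that the real login form submits.
    form = {"username": username, "password": password}
    login_response = session.post(login_url, data=form, timeout=10)
    login_response.raise_for_status()
    # The session now holds any cookies the server set during login.
    return session.get(page_url, timeout=10)

# Usage with placeholder URLs (not a real service):
# with requests.Session() as s:
#     page = login_and_fetch(s, "https://example.com/login",
#                            "https://example.com/profile", "me", "secret")
#     print(page.status_code)
```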

In [33]:
import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
response = requests.get('http://httpbin.org/cookies')
print(response.text)

{
  "cookies": {}
}



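This requests a test URL, http://httpbin.org/cookies/set/number/123456789, which sets a cookie named number with the value 123456789. We then request http://httpbin.org/cookies, which returns the cookies sent with the current request. The cookie is not there.

No cookies come back because the two calls to requests.get are independent of each other: setting the cookie and reading it back happen as if in two separate browsers.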

In [63]:
import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response = s.get('http://httpbin.org/cookies')
print(response.text)

{
  "cookies": {
    "number": "123456789"
  }
}



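This time the cookie is returned. The fix is to keep both requests in the same session, which is like opening a new tab in the same browser rather than launching a new browser. A Session does exactly that.

With a Session we can simulate a single browsing session without managing cookies ourselves. It is typically used to log in and then carry out further steps as the logged-in user.

Sessions are used very widely, for example to simulate opening different pages of the same site in one browser. A later chapter covers them in detail.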
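The difference between a session and a standalone request can also be seen without a network connection. In this sketch the cookie is placed in the session by hand, standing in for one that a server response would normally set, and the requests are only prepared, not sent.

```python
import requests

session = requests.Session()
# Stand-in for a cookie that a server response would normally set.
session.cookies.set("number", "123456789")

# prepare_request merges session state into the request without sending it.
request = requests.Request("GET", "http://httpbin.org/cookies")
prepared = session.prepare_request(request)
print(prepared.headers.get("Cookie"))

# A request prepared outside the session knows nothing about that cookie.
standalone = request.prepare()
print(standalone.headers.get("Cookie"))
```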

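## 4. Certificate verification

requests also supports certificate verification. When it sends an HTTPS request it checks the server's SSL certificate, and the verify parameter controls whether this check is performed. If verify is not given it defaults to True, so the certificate is verified automatically.

As mentioned earlier, the certificate of 12306 is not trusted by the official CA authorities, which produces a certificate verification error. Visiting the site shows a certificate warning page, as in Figure 3-8.

<img src="12306.png">

Now let's test it with requests: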

In [67]:
# If this raises an error, apply the fix shown below
import requests

response = requests.get('https://www.12306.cn')
print(response.status_code)

200


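If an SSLError is raised here, certificate verification failed. This is the error you get when requesting an HTTPS site whose certificate cannot be verified. How do we avoid it? Set the verify parameter to False. This turns the check off entirely, so use it only for sites you trust.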

In [34]:
import requests
from requests.packages import urllib3
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

200
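Besides True and False, verify also accepts the path to a CA bundle file, which keeps verification on while trusting an extra certificate authority. The wrapper below is a sketch: the name fetch_status, the get parameter (which lets a stand-in replace the network call), and the bundle path in the usage comment are assumptions.

```python
import requests
from requests.exceptions import SSLError

def fetch_status(url, verify=True, get=requests.get):
    """Return the status code, or None if certificate verification fails."""
    try:
        # verify may be True, False, or a path to a CA bundle file.
        return get(url, verify=verify, timeout=10).status_code
    except SSLError as exc:
        print("certificate verification failed:", exc)
        return None

# Usage (sends real requests; the bundle path is a placeholder):
# print(fetch_status("https://www.12306.cn"))
# print(fetch_status("https://www.12306.cn", verify="/path/to/ca-bundle.pem"))
```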


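## 5. Timeout settings

When the local network is poor, or the server responds slowly or not at all, we may wait a very long time for a response, or end up with an error after receiving nothing. To guard against a server that does not respond in time, set a timeout: if no response arrives within that time, an exception is raised. This is done with the timeout parameter. A single number applies both to establishing the connection and to waiting for the server to send data; it is not a limit on the total download time.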

In [68]:
import requests
from requests.exceptions import Timeout
try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except Timeout:  # covers both ConnectTimeout and ReadTimeout
    print('Timeout')

200


In [72]:
import requests
r=requests.get('https://www.taobao.com',timeout=1)
print(r.status_code)

200
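The timeout parameter also accepts a (connect, read) tuple, so the two phases can have different limits and be reported separately. The function below is a sketch: the name get_with_timeouts, the default values, and the get parameter (which lets a stand-in replace the network call) are assumptions.

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

def get_with_timeouts(url, connect=3.05, read=10, get=requests.get):
    """GET with separate connect/read limits; returns a short status string."""
    try:
        response = get(url, timeout=(connect, read))  # (connect, read) tuple
        return "status %d" % response.status_code
    except ConnectTimeout:
        return "could not connect in time"
    except ReadTimeout:
        return "connected, but the server was too slow to answer"

# Usage (sends a real request):
# print(get_with_timeouts("http://httpbin.org/delay/3", read=1))
```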


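## 6. Authentication

<img src="身份认证.jpg">

When visiting a website we may run into an authentication prompt like this one. In that case we can use the authentication support built into requests. If the user name and password are correct, the request is authenticated automatically and returns status code 200; if authentication fails, it returns 401.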

In [20]:
pip install requests_oauthlib

Collecting requests_oauthlib
  Downloading https://files.pythonhosted.org/packages/a3/12/b92740d845ab62ea4edf04d2f4164d82532b5a0b03836d4d4e71c6f3d379/requests_oauthlib-1.3.0-py2.py3-none-any.whl
Collecting oauthlib>=3.0.0 (from requests_oauthlib)
  Downloading https://files.pythonhosted.org/packages/05/57/ce2e7a8fa7c0afb54a0581b14a65b56e62b5759dbc98e80627142b8a3704/oauthlib-3.1.0-py2.py3-none-any.whl (147kB)
Installing collected packages: oauthlib, requests-oauthlib
Successfully installed oauthlib-3.1.0 requests-oauthlib-1.3.0
Note: you may need to restart the kernel to use updated packages.


Authenticating with requests is simple: just set the auth parameter. Its value is an HTTPBasicAuth object that wraps the user name and password.

In [None]:
import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://localhost:5000', auth=HTTPBasicAuth('username', 'password'))
print(r.status_code)
print(r.text)
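A plain (user, password) tuple passed as auth is shorthand for HTTPBasicAuth. The sketch below prepares a request without sending it, to show the header that Basic authentication adds. The credentials and URL are the placeholders from the cell above.

```python
import base64
import requests

# Placeholder credentials and URL; nothing is sent.
prepared = requests.Request(
    "GET", "http://localhost:5000", auth=("username", "password")
).prepare()

print(prepared.headers["Authorization"])

# The header is just "Basic " + base64("username:password").
expected = "Basic " + base64.b64encode(b"username:password").decode("ascii")
print(prepared.headers["Authorization"] == expected)
```

Because the credentials are only base64-encoded, not encrypted, Basic authentication should be used over HTTPS.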

## 7. Exception handling

In [73]:
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException
try:
    response = requests.get("http://httpbin.org/get", timeout = 0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')

200
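The order of the except clauses matters because the requests exceptions form a hierarchy, and an error status code such as 404 does not raise anything unless raise_for_status() is called. The sketch below checks both points offline. The hand-built Response object is a stand-in for a real response.

```python
import requests
from requests.exceptions import (
    ConnectionError, ConnectTimeout, HTTPError, ReadTimeout,
    RequestException, Timeout,
)

# All of these derive from RequestException, so it must be caught last.
for exc in (ReadTimeout, ConnectTimeout, ConnectionError, HTTPError):
    print(exc.__name__, issubclass(exc, RequestException))

# ConnectTimeout is both a ConnectionError and a Timeout.
print(issubclass(ConnectTimeout, ConnectionError), issubclass(ConnectTimeout, Timeout))

# A 4xx/5xx status does not raise by itself; raise_for_status() turns it
# into HTTPError. A hand-built Response stands in for a real one here.
fake = requests.Response()
fake.status_code = 404
try:
    fake.raise_for_status()
    outcome = "no error"
except HTTPError:
    outcome = "HTTPError"
print(outcome)
```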


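## 8. Hands-on projects

### Project 1: Scraping a JD.com product page. https://item.jd.com/2967929.html

### Project 2: Scraping an Amazon product page. https://www.amazon.cn/gp/product/B01M8L5Z3Y

### Project 3: Submitting search keywords to Baidu and 360 Search. https://www.baidu.com/;  https://www.so.com/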
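As a starting point for Project 3, the keyword can be passed through the params argument so that requests builds the query string. This is a sketch under assumptions that are not stated in this section: the search paths /s and the query parameter names wd (Baidu) and q (360 Search). Check them against the sites before relying on them. The request is only prepared here, not sent.

```python
import requests

# Assumed search paths and query parameter names: "wd" for Baidu, "q" for 360.
SEARCH_ENGINES = {
    "baidu": ("https://www.baidu.com/s", "wd"),
    "360": ("https://www.so.com/s", "q"),
}

def build_search_url(engine, keyword):
    """Return the search URL for a keyword without sending the request."""
    base_url, param_name = SEARCH_ENGINES[engine]
    prepared = requests.Request("GET", base_url, params={param_name: keyword}).prepare()
    return prepared.url

print(build_search_url("baidu", "Python"))
print(build_search_url("360", "Python"))
```

Passing the same params dictionary to requests.get would send the search request itself.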