# Scraping Dynamic Web Pages: AJAX

## Requirements

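Scrape the books listed under the **Success/Inspiration** (成功/励志) column of the New Book Recommendations section at https://www.ptpress.com.cn/. For each book, collect the title, price, and author.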

### 1. Import the basic libraries

In [1]:
import requests
import json

### 2. Prepare the request URLs and headers

In [2]:
BookId_URL = "https://www.ptpress.com.cn/recommendBook/getRecommendBookListForPortal?bookTagId=e03b1ec7-466e-484c-865c-6738989e306a"
detail_URL = "https://www.ptpress.com.cn/bookinfo/getBookDetailsById"
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.44'}
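
The query string on `BookId_URL` carries the tag ID that selects the column. As a minimal sketch using only the standard library, the same URL can be assembled from its parts, which makes it easier to swap in another tag ID later. The names `BASE_URL`, `TAG_ID`, and `build_list_url` are illustrative and not part of the original notebook; the endpoint and tag ID are copied from the cell above.

```python
from urllib.parse import urlencode

# Endpoint and tag ID copied from the cell above; the variable names are hypothetical.
BASE_URL = "https://www.ptpress.com.cn/recommendBook/getRecommendBookListForPortal"
TAG_ID = "e03b1ec7-466e-484c-865c-6738989e306a"


def build_list_url(base_url, tag_id):
    """Append the bookTagId query parameter to the list endpoint."""
    return f"{base_url}?{urlencode({'bookTagId': tag_id})}"


list_url = build_list_url(BASE_URL, TAG_ID)
print(list_url)
```

With `requests`, passing `params={'bookTagId': TAG_ID}` to `requests.get` should produce an equivalent URL without manual string building.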

### 3. Send a request to the BookId URL

In [4]:
book_id_package = requests.get(url=BookId_URL, headers=headers, timeout=3)
if book_id_package.status_code == requests.codes.ok:
    print("SUCCESS")
else:
    print("FAIL")

SUCCESS


### 4. Get all bookIds

In [6]:
BookJson = json.loads(book_id_package.content)
BookData = BookJson["data"]
bookId = []
# Collect every ID by iterating over the list
for book in BookData:
    bookId.append(book['bookId'])
bookId

['c4b12d98-f6b6-4038-b0cf-24c6ca8edbf4',
 '2d9b2c02-b493-4fa1-9a12-2667adb8a4c9',
 '5aab5283-0dda-47c4-9f6c-6da1e788f2aa',
 'dd3b8d3c-e1e3-4c8c-aab1-78207e745674',
 '490a68c6-d92b-45cf-86a2-ccd3aa65a3b3',
 'd1442930-e86f-456f-8c71-281f991fe70a',
 '74fa5f56-47d1-4af4-a140-795da52a5718',
 '7eee68e2-b318-4fa6-ac80-8d1c95e5694f',
 'abde92cc-d7d1-4dbf-a9d6-9e975519ded5',
 'f1925264-de0c-490c-b158-e43ecb2176fe']
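
The extraction step above assumes every entry under `data` has a `bookId` key. The sketch below shows the same idea on a small hand-written payload, skipping entries without an ID. The sample JSON and the helper name `extract_book_ids` are assumptions made for illustration; only the `data` and `bookId` keys come from the notebook.

```python
import json

# Hypothetical payload that mimics the shape used above:
# {"data": [{"bookId": ...}, ...]}
sample_text = (
    '{"data": ['
    '{"bookId": "id-1", "bookName": "A"}, '
    '{"bookId": "id-2", "bookName": "B"}, '
    '{"bookName": "entry without an id"}'
    ']}'
)


def extract_book_ids(text):
    """Return the bookId of every entry under 'data' that has one."""
    payload = json.loads(text)
    books = payload.get("data") or []  # tolerate a missing or null 'data'
    return [book["bookId"] for book in books if "bookId" in book]


ids = extract_book_ids(sample_text)
print(ids)
```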

### 5. Build the POST request function and the JSON parsing function

In [7]:
def req_post(url, data):
    data = {
        'bookId': data
    }
    response = requests.post(url, headers=headers, data=data, timeout=3)
    if response.status_code == requests.codes.ok:
        print("SUCCESS")
        return response
    else:
        print("FAIL")
        return None
    
def parse(response):
    json_parse = json.loads(response.content)['data']
    # Extract the title, author, and price
    book_name = json_parse['bookName']
    author = json_parse['author']
    price = json_parse['discountPrice']
    return book_name, author, price
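
`parse` raises an exception if the response has no `data` object or if a field is absent. A more tolerant variant might look like the sketch below, which works on the response text and returns `None` when the payload is not in the expected shape. The function name `parse_text` and the sample strings are hypothetical; the field names `bookName`, `author`, and `discountPrice` are the ones used above.

```python
import json


def parse_text(text):
    """Return (book_name, author, price), or None if 'data' is missing or malformed."""
    data = json.loads(text).get("data")
    if not isinstance(data, dict):
        return None
    # .get() yields None for an absent field instead of raising KeyError
    return data.get("bookName"), data.get("author"), data.get("discountPrice")


# Hand-written samples, not real responses from the site
good = '{"data": {"bookName": "Example", "author": "Someone", "discountPrice": 39.9}}'
bad = '{"msg": "not found"}'

print(parse_text(good))
print(parse_text(bad))
```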

### 6. Iterate over all bookIds to scrape every book's data

In [8]:
book_name_ls = []
author_ls = []
price_ls = []
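for b_id in bookId:
    # Send the request
    response = req_post(detail_URL, b_id)
    # Skip this book if the request failed (req_post returns None)
    if response is None:
        continue
    # Parse the response
    book_name, author, price = parse(response)
    book_name_ls.append(book_name)
    author_ls.append(author)
    price_ls.append(price)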
    
book_name_ls

SUCCESS
SUCCESS
SUCCESS
SUCCESS
SUCCESS
SUCCESS
SUCCESS
SUCCESS
SUCCESS
SUCCESS


['当钻牛角尖遇到四象限：通透青年的茫然破局指南',
 '烧掉你的船：将焦虑转化为积极行动的9个策略',
 '批判性思维入门：30天学会独立思考',
 '认知觉醒：开启自我改变的原动力（百万册精装纪念版）',
 '人生歪理 歪得很有道理',
 '清晰思考：将平凡时刻转化为非凡成果',
 '前方高能',
 '哈佛高效学习法',
 '图解一切问题：培养图形思维，掌握图形工具',
 '百名院士的入党心声']
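
When looping over many IDs, it can help to record which requests failed and to pause briefly between calls. The sketch below illustrates that pattern with a stub in place of the real request and parse steps, so it runs without network access. `collect`, `fake_fetch`, and the sample IDs are all hypothetical.

```python
import time


def collect(book_ids, fetch, delay=0.0):
    """Call fetch(book_id) for each ID; record IDs for which fetch returns None."""
    rows, failed = [], []
    for book_id in book_ids:
        result = fetch(book_id)
        if result is None:
            failed.append(book_id)
        else:
            rows.append(result)
        time.sleep(delay)  # pause between calls; 0.0 means no pause
    return rows, failed


def fake_fetch(book_id):
    """Stub standing in for a real request + parse step (made-up data)."""
    if book_id == "bad":
        return None
    return (f"title-{book_id}", "author", 10.0)


rows, failed = collect(["a", "bad", "b"], fake_fetch)
print(rows)
print(failed)
```

In the notebook, `fetch` could be a small function that calls `req_post` and then `parse`, returning `None` when the request fails.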

### 7. Save as a CSV file

In [9]:
import pandas as pd

data = {
    'bookName': book_name_ls,
    "author": author_ls,
    "price": price_ls
}

df = pd.DataFrame(data)
df.to_csv("励志-励志.csv", index=True)
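
If pandas is not available, the standard-library `csv` module can write the same three columns. The sketch below writes to an in-memory buffer so the result can be inspected directly; the row data and the helper name `rows_to_csv` are made up for illustration.

```python
import csv
import io

# Hypothetical row; real rows would come from the scraping loop
rows = [("Example Title", "Example Author", 39.9)]


def rows_to_csv(rows):
    """Serialize (bookName, author, price) tuples to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["bookName", "author", "price"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

When writing to a file that will be opened in Excel, opening it with `encoding="utf-8-sig"` (or passing the same `encoding` argument to `DataFrame.to_csv`) usually helps Chinese titles display correctly.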