<a href="https://colab.research.google.com/github/LIAO-JIAN-PENG/python_lecture/blob/main/dcard_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Using the Dcard API

* API URL: ```https://www.dcard.tw/service/api/v2```
* Legacy API URL: ```https://www.dcard.tw/_api```

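Description | Method | Path
---|---|---
All posts | GET | /posts
Forum information | GET | /forums
Posts in a forum | GET | /forums/{forum name}/posts
Post content | GET | /posts/{post ID}
Links cited in a post | GET | /posts/{post ID}/links
Comments on a post | GET | /posts/{post ID}/comments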
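
Each path in the table is appended to the base API URL. The following is a minimal sketch of that mapping; the `ENDPOINTS` dictionary, its key names, and the `build_url` helper are hypothetical names introduced here for illustration and are not part of the Dcard API.

```python
BASE_URL = 'https://www.dcard.tw/service/api/v2'

# Path templates taken from the table; the keys are illustrative names only.
ENDPOINTS = {
    'all_posts': '/posts',
    'forums': '/forums',
    'forum_posts': '/forums/{forum}/posts',
    'post': '/posts/{post_id}',
    'post_links': '/posts/{post_id}/links',
    'post_comments': '/posts/{post_id}/comments',
}

def build_url(name, **params):
    """Fill in the path template for `name` and prepend the base URL."""
    return BASE_URL + ENDPOINTS[name].format(**params)

print(build_url('forum_posts', forum='puipui'))
print(build_url('post_comments', post_id=235309104))
```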

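## All posts (unfiltered, same as the home page)
```GET https://www.dcard.tw/service/api/v2/posts```
* By default the response contains the first 30 posts. Add the ```limit``` parameter to change how many posts are returned, up to a maximum of 100. For example:
  the 100 most popular posts -> ```https://www.dcard.tw/service/api/v2/posts?popular=true&limit=100```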

## Latest & popular
* Posts are sorted by "latest" by default. The ```popular``` parameter switches between "latest" and "popular":
    - Latest posts -> ```https://www.dcard.tw/service/api/v2/posts?popular=false```
    - Popular posts -> ```https://www.dcard.tw/service/api/v2/posts?popular=true```
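
The ```popular``` and ```limit``` parameters can be combined into one query string. Below is a rough sketch, assuming a hypothetical helper named `posts_url`; the lower bound of 1 on `limit` is an assumption, since only the default (30) and the maximum (100) are stated above.

```python
from urllib.parse import urlencode

POSTS_URL = 'https://www.dcard.tw/service/api/v2/posts'

def posts_url(popular=False, limit=30):
    """Return the posts URL with `popular` and `limit` as query parameters."""
    # The upper bound of 100 comes from the notes above; the lower bound is assumed.
    if not 1 <= limit <= 100:
        raise ValueError('limit must be between 1 and 100')
    # The API examples use lowercase true/false, so convert the bool explicitly.
    query = urlencode({'popular': str(popular).lower(), 'limit': limit})
    return f'{POSTS_URL}?{query}'

print(posts_url(popular=True, limit=100))
```

With `requests`, the same parameters could instead be passed as `requests.get(POSTS_URL, params={'popular': 'true', 'limit': 100}, headers=headers)`, which lets the library build the query string.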
    
## Forum information
```GET https://www.dcard.tw/service/api/v2/forums```

## Posts in a forum
```GET https://www.dcard.tw/service/api/v2/forums/{forum name}/posts```

* [Reference](https://blog.jiatool.com/posts/dcard_api_v2/)

In [None]:
import requests
import re

## Practice: the home page

In [None]:
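# The Dcard API returns JSON directly, so there is no need to scrape the HTML yourself
# Remember: send browser-like headers (a User-Agent) with the request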
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
}
url = 'https://www.dcard.tw/_api/posts'

requ = requests.get(url, headers=headers)

requ # status code

<Response [200]>

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
# Create the output folder if it does not exist yet
os.makedirs(r'/content/drive/MyDrive/PuiPui', exist_ok=True)

In [None]:
import json

In [None]:
# Save the response as a JSON file
data = requ.json()
with open(r'/content/drive/MyDrive/PuiPui/dcard_api.json', 'w', encoding='utf-8') as jsonfile:
    json.dump(data, jsonfile)
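
By default `json.dump` escapes non-ASCII characters, so Chinese titles are stored as `\uXXXX` sequences. If a human-readable file is preferred, one option is sketched below; `save_json` and `load_json` are hypothetical helper names, and the sample record reuses a title from the puipui forum output shown later in this notebook.

```python
import json
import os
import tempfile

def save_json(data, path):
    """Write `data` as UTF-8 JSON, keeping non-ASCII characters readable."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

def load_json(path):
    """Read a UTF-8 JSON file back into Python objects."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)

# Round-trip a small sample in a temporary directory.
sample = [{'id': 235309020, 'title': '哈姆太郎版天竺鼠車車'}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dcard_api.json')
    save_json(sample, path)
    print(load_json(path) == sample)
```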

## Hands-on: a forum
* Challenge: the puipui forum
```GET https://www.dcard.tw/service/api/v2/forums/puipui/posts```

In [None]:
url = 'https://www.dcard.tw/service/api/v2/forums/puipui/posts'

res = requests.get(url, headers=headers)

articles = res.json()
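
The requests in this notebook assume the call succeeds. A slightly more defensive variant is sketched below, assuming a hypothetical helper `fetch_json`; the `get` argument is expected to behave like `requests.get`, and the 10-second timeout is an arbitrary choice.

```python
def fetch_json(url, get, headers=None, timeout=10):
    """Call `get`, raise on an HTTP error status, and return the decoded JSON body."""
    res = get(url, headers=headers, timeout=timeout)
    # With requests.get this raises requests.HTTPError for 4xx/5xx responses.
    res.raise_for_status()
    return res.json()

# Intended use in this notebook (not executed here):
# articles = fetch_json(url, requests.get, headers=headers)
```

Passing `get` in as an argument keeps the helper free of a hard dependency, so it can be exercised with a stand-in object as well as with `requests.get`.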

In [None]:
# Print a few fields from the first five posts
for article in articles[:5]:
    print('ID:', article['id'])
    print('Title:', article['title'])
    print('School:', article['school'])
    print('media:', article['media'])

    print()

ID: 235309104
Title: 吃飽飽⋯
School: 綺綺
media: []

ID: 235309020
Title: 哈姆太郎版天竺鼠車車
School: 夯薯好吃
media: [{'url': 'https://i.imgur.com/s9QBDTe.jpg'}]

ID: 235307571
Title: 突然想到一個問題
School: 國立高雄大學
media: [{'url': 'https://i.imgur.com/O1pmAD2.jpg'}]

ID: 235306988
Title: PUI PUI 防疫
School: 國立中央大學
media: [{'url': 'https://i.imgur.com/lWXpHTj.jpg'}]

ID: 235305829
Title: 原來天竺鼠車車是這樣演的
School: 匿名
media: [{'url': 'https://i.imgur.com/Nu9qgGw.jpg'}, {'url': 'https://i.imgur.com/4AaR5nK.jpg'}, {'url': 'https://i.imgur.com/Le7Dhrs.jpg'}, {'url': 'https://i.imgur.com/ZmIbbpP.jpg'}, {'url': 'https://i.imgur.com/gKDZiDJ.jpg'}, {'url': 'https://i.imgur.com/aDKNVP8.jpg'}, {'url': 'https://i.imgur.com/tXM1O8g.jpg'}, {'url': 'https://i.imgur.com/0QOwFDe.jpg'}, {'url': 'https://i.imgur.com/cVKa4v7.jpg'}, {'url': 'https://vivid.dcard.tw/Public/95ec71f0-cf82-4e03-8eb5-2c70e469d3d3/thumbnail.jpg'}]
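
For a more compact view of the same fields, each post can be reduced to one line. This is only a sketch: `summarize` is a hypothetical helper, and the fallback values for missing keys are an assumption, since the output above only shows posts where every field is present.

```python
def summarize(article):
    """Return a one-line summary; missing keys fall back to placeholders."""
    media = article.get('media') or []
    return '{id} | {title} | {school} | {n} media'.format(
        id=article.get('id', '?'),
        title=article.get('title', ''),
        school=article.get('school', ''),
        n=len(media),
    )

# Sample record copied from the output above.
sample = {
    'id': 235309020,
    'title': '哈姆太郎版天竺鼠車車',
    'school': '夯薯好吃',
    'media': [{'url': 'https://i.imgur.com/s9QBDTe.jpg'}],
}
print(summarize(sample))
```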



In [None]:
# Collect the image URLs from every post
img_links = []
for article in articles:
    for img in article['media']:
        img_links.append(img['url'])
print('number of img:', len(img_links))

number of img: 38
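
The collection step can also be wrapped in a function so it can be reused for other forums. The sketch below assumes a hypothetical helper `collect_media_urls`; skipping duplicate URLs and tolerating a missing `media` key are additions that the cell above does not make.

```python
def collect_media_urls(articles):
    """Flatten the `media` lists of all articles into one list of unique URLs."""
    seen = set()
    urls = []
    for article in articles:
        for item in article.get('media') or []:
            url = item.get('url')
            # Keep the first occurrence of each URL and preserve the original order.
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls

sample_articles = [
    {'media': []},
    {'media': [{'url': 'https://i.imgur.com/s9QBDTe.jpg'}]},
    {'media': [{'url': 'https://i.imgur.com/s9QBDTe.jpg'},
               {'url': 'https://i.imgur.com/O1pmAD2.jpg'}]},
]
print(collect_media_urls(sample_articles))
```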


In [None]:
import os

# Create the target folder once, then download every image into it
img_dir = r'/content/drive/MyDrive/PuiPui/puipui_image'
os.makedirs(img_dir, exist_ok=True)

for img_count, img_link in enumerate(img_links):
    img = requests.get(img_link)

    with open(os.path.join(img_dir, '天竺鼠puipui_' + str(img_count) + '.jpg'), 'wb') as file:
        file.write(img.content)

print("Download complete")

Download complete
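
The download cell saves every file with a `.jpg` extension regardless of the URL. A possible refinement is sketched below, assuming two hypothetical helpers: `image_filename`, which keeps the extension found in the URL, and `download_images`, which takes a `fetch` callable that returns the raw bytes for a URL.

```python
import os
from urllib.parse import urlparse

def image_filename(url, index, prefix='天竺鼠puipui_'):
    """Build a file name that keeps the extension found in the URL (default .jpg)."""
    ext = os.path.splitext(urlparse(url).path)[1] or '.jpg'
    return f'{prefix}{index}{ext}'

def download_images(urls, dest_dir, fetch):
    """Save each URL under dest_dir; `fetch(url)` must return the raw bytes."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        path = os.path.join(dest_dir, image_filename(url, index))
        with open(path, 'wb') as f:
            f.write(fetch(url))
        saved.append(path)
    return saved

print(image_filename('https://i.imgur.com/s9QBDTe.jpg', 0))
```

In this notebook, `fetch` could be something like `lambda u: requests.get(u, timeout=10).content`, with `dest_dir` set to the `puipui_image` folder on Google Drive.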


In [None]:
import requests
import re
import os
import json

url = 'https://www.dcard.tw/service/api/v2/forums/puipui/posts'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
}
res = requests.get(url, headers=headers)
articles = res.json()  # parsed JSON: a list of posts

img_dir = r'/content/drive/MyDrive/PuiPui/puipui_image'
os.makedirs(img_dir, exist_ok=True)

# Download every image attached to every post
img_count = 0
for article in articles:
    for media in article['media']:
        img = requests.get(media['url'])
        with open(os.path.join(img_dir, '天竺鼠puipui_' + str(img_count) + '.jpg'), 'wb') as file:
            file.write(img.content)
        img_count += 1