[Study] PChome product info - crawler #11

Closed
chen-tf opened this issue Dec 4, 2022 · 5 comments

@chen-tf
Owner

chen-tf commented Dec 4, 2022

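As a user
I want to be able to bookmark products from more e-commerce platforms
So that I have more chances to buy products at a low price

Background

Only momo shop is supported at the moment. We need to find out whether PChome also offers a way to get a product's status with plain HTTP requests, without JavaScript rendering.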

Definition of Done

Given a PChome product page URL, we can obtain the following information:

  1. Product name
  2. Price
  3. Listed/delisted status
@zhihdd zhihdd self-assigned this Dec 4, 2022
@t1ina2003

Taking https://24h.pchome.com.tw/prod/DHAEDE-1900FFWUE as an example:

import requests

headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"}
prod_id = "DHAEDE-1900FFWUE"  # product ID, taken from the URL
url = f"https://ecapi-cdn.pchome.com.tw/ecshop/prodapi/v2/prod?id={prod_id}&fields=Seq,Id,Name,Price,SaleStatus"
# Pass headers by keyword; the second positional argument of requests.get is params
result1 = requests.get(url, headers=headers)
# Unescape "\/" first, then decode the \uXXXX escapes into Chinese characters
print(result1.text.replace("\\/","/").encode('utf-8').decode('unicode_escape'))

url2 = f"https://ecapi-cdn.pchome.com.tw/ecshop/prodapi/v2/prod/button&id={prod_id}&fields=Seq,Id,Name,Price,Qty,ButtonType,SaleStatus,isPrimeOnly,SpecialQty,Device"
result2 = requests.get(url2, headers=headers)
print(result2.text.replace("\\/","/").encode('utf-8').decode('unicode_escape'))
result1
{
  "DHAEDE-1900FFWUE-000": {
    "Seq": 33716849,
    "Id": "DHAEDE-1900FFWUE-000",
    "Name": "ACER Swift 5 SF514-55T-54WK 綠(i5-1135G7/8G/512G PCIe/W11/FHD/14)", # product name
    "Price": { 
        "M": 33900, # list price
        "P": 24900,  # sale price
        "Low": null, 
        "Prime": "" 
    }
  }
}
result2
{
    "Seq": 33716849,
    "Id": "DHAEDE-1900FFWUE-000",
    "Price": { "M": 33900, "P": 24900, "Prime": "", "Low": null },
    "Qty": 20,
    "ButtonType": "ForSale",
    "SaleStatus": 1, # listing status; 0 means delisted
    "isPrimeOnly": 0,
    "SpecialQty": 0,
    "Device": []
  }
  
  1. No single API returns all three items.
  2. DHAEDE-A900DZY0H is a delisted product that can be used for testing; its product name cannot be fetched.
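
Since no single endpoint returns all three items, one option is to call both and merge the parsed results. The following is only a minimal sketch, assuming the two payloads have the shapes shown above. `merge_product_info` and the output keys are hypothetical names, and the list branch is a guess in case the button endpoint wraps its record in a one-element list.

```python
from typing import Any


def merge_product_info(prod_payload: Any, button_payload: Any) -> dict:
    """Combine parsed `prod` and `prod/button` responses into one record."""
    # Assumption: the `prod` payload maps one full product ID to its details.
    if isinstance(prod_payload, dict):
        detail = next(iter(prod_payload.values()), {})
    else:
        detail = {}
    # Assumption: the button payload is a single object or a list holding one.
    if isinstance(button_payload, list):
        button = button_payload[0] if button_payload else {}
    else:
        button = button_payload or {}
    price = detail.get("Price") or button.get("Price") or {}
    return {
        "id": detail.get("Id") or button.get("Id"),
        "name": detail.get("Name"),  # None when the name cannot be fetched
        "price": price.get("P"),  # sale price
        "sale_status": button.get("SaleStatus"),
    }
```

The helper takes already-parsed JSON, so it can be tried on the sample outputs above without sending any request to PChome.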

@chen-tf
Owner Author

chen-tf commented Dec 17, 2022

In that case we first need a PChome URL parser to extract the product ID from the URL. The URL may come from either of the following sources:

  • PChome WEB URL
  • PChome APP URL

@zhihdd

zhihdd commented Dec 17, 2022

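I'm not very familiar with Python, so take this with a grain of salt:

  • The product info can be used directly: the requests package's built-in "json()" method converts the PChome response into JSON.
  • I also don't think we need to handle the encoding ourselves; the decoding can probably be left to the client? This point needs someone to confirm.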
import requests

url = "https://ecapi-cdn.pchome.com.tw/ecshop/prodapi/v2/prod?id=DYAT1K-A900FLZ63-000&fields=Seq,Id,Name,Price,SaleStatus"
result1 = requests.get(url)
res= result1.json()
print(res)

response

{
  "DYAT1K-A900FLZ63-000": {
    "Seq": 34094219,
    "Id": "DYAT1K-A900FLZ63-000",
    "Name": "Google Pixel 7 Pro (12G/256G) 曜石黑",
    "Price": {
      "M": 0,
      "P": 28990,
      "Low": "None",
      "Prime": ""
    }
  }
}
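
On the encoding question: a standard JSON parser already resolves both "\/" and "\uXXXX" escapes, so the manual replace/encode/decode step should not be needed once the body is parsed. A small sketch using only the standard library; the `raw` string is a hypothetical body written in the style of the responses in this thread, not a captured one.

```python
import json

# Hypothetical raw body: non-ASCII text as \uXXXX escapes and forward
# slashes escaped as \/ (both forms are valid JSON).
raw = r'{"Name": "Google Pixel 7 Pro (12G\/256G) \u66dc\u77f3\u9ed1"}'

data = json.loads(raw)  # Response.json() in requests also parses the body as JSON
print(data["Name"])
```

Note that `.encode('utf-8').decode('unicode_escape')` treats the bytes as Latin-1, so it would garble any Chinese characters that arrive unescaped. That is another reason to prefer the parser.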

A few final notes:

  • In the query string, "fields" takes the list of fields you need. Use it to reduce the response size and avoid overloading PChome.
    e.g. change
fields=Seq,Id,Name,Nick,Store,PreOrdDate,SpeOrdDate,Price,Discount,Pic,Weight,ISBN

to

fields=Seq,Id,Name
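  • When Qty == 0, the PChome product page shows "缺貨" (out of stock). Shouldn't this be shown to the user as well?
  • The "SaleStatus" field mentioned above does not have a value for every product, so it is not a reliable basis for the decision.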
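
The notes above could be combined along these lines. This is a sketch only: `build_prod_url`, `availability`, and the status labels are hypothetical, and the rule for ranking Qty against SaleStatus is an assumption that would need checking against real products.

```python
from urllib.parse import urlencode

PROD_API = "https://ecapi-cdn.pchome.com.tw/ecshop/prodapi/v2/prod"


def build_prod_url(prod_id: str, fields: list[str]) -> str:
    """Build a `prod` request URL that asks only for the listed fields."""
    # safe="," keeps the field list comma-separated, as in the thread's URLs.
    query = urlencode({"id": prod_id, "fields": ",".join(fields)}, safe=",")
    return f"{PROD_API}?{query}"


def availability(button: dict) -> str:
    """Classify a `prod/button` record (labels are hypothetical)."""
    sale_status = button.get("SaleStatus")
    qty = button.get("Qty")
    if sale_status == 0:
        return "delisted"
    if qty == 0:
        return "out_of_stock"
    if sale_status is None and qty is None:
        return "unknown"  # neither field has a value to judge by
    return "for_sale"
```

Returning "unknown" rather than guessing keeps the missing-SaleStatus case visible to the caller.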

@t1ina2003

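  • The "SaleStatus" field mentioned above does not have a value for every product, so it is not a reliable basis for the decision.

I'll find time to scan the product list and see which other field could serve as the listed/delisted status.


As for the PChome URL parser, both kinds of string work.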

from urllib.parse import urlparse

PCHome_web_url = "https://24h.pchome.com.tw/prod/DYAJIB-1900BZ121"
# Text shared from the app: a product title followed by the URL
PCHome_app_url = """Apple Watch SE GPS, 44mm Silver Aluminium Case with Abyss Blue Sport Band

https://24h.pchome.com.tw/prod/DYAJIB-1900BZ121"""

def pchomeUrlParser(url: str) -> str:
  ''' extract productID from url '''
  parts = urlparse(url)  # parse the argument, not a global variable
  directories = parts.path.strip('/').split('/')
  productID = directories[-1]  # the last path segment is the product ID
  return productID

print(pchomeUrlParser(PCHome_web_url))
print(pchomeUrlParser(PCHome_app_url))

# output:
# DYAJIB-1900BZ121
# DYAJIB-1900BZ121
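
The parser above returns the last path segment of whatever string it receives, so text without a product URL still yields a value. A slightly stricter variant is sketched below, assuming product pages always sit under /prod/<ID> on a pchome.com.tw host, as in the examples in this thread. `extract_product_id` is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Matches any http(s) URL embedded in free text, e.g. an app share message.
_URL_RE = re.compile(r"https?://\S+")


def extract_product_id(text: str) -> Optional[str]:
    """Return the product ID from a web URL or app share text, else None."""
    for candidate in _URL_RE.findall(text):
        parts = urlparse(candidate)
        segments = [s for s in parts.path.split("/") if s]
        is_pchome = parts.netloc.endswith("pchome.com.tw")
        # Assumption: the ID is the segment right after "prod".
        if is_pchome and len(segments) >= 2 and segments[-2] == "prod":
            return segments[-1]
    return None
```

Returning None lets the caller reject unsupported links instead of querying the API with a meaningless ID.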

@chen-tf
Owner Author

chen-tf commented Dec 22, 2022

Thanks to @t1ina2003 and @zhihdd for the study results. I'll close this study ticket and open a separate feature ticket for the implementation.
