# **Getting Started with Social Network Analysis: Write Your Own Scraper | Quick-Start Workshop**

```python
# pip3 install selenium pandas beautifulsoup4
```

## **01. Launch the Automated Browser**

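`profile.default_content_setting_values.notifications`: the Chrome preference that controls how notification requests from websites are handled.
- 1: allow notifications from websites.

- 2: block notifications.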

![notifications](./asset/image/notification.png)
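If you would rather not have notification prompts appear while scraping, set the value to `2` instead. A minimal sketch that keeps the 1/2 mapping in one place; `notification_prefs` is a hypothetical helper name, not part of Selenium:

```python
# Hypothetical helper (not a Selenium API): build the Chrome "prefs" dict
# for the notification policy. Values follow the list above: 1 = allow, 2 = block.
NOTIFICATION_VALUES = {"allow": 1, "block": 2}


def notification_prefs(policy: str) -> dict:
    """Return a dict for ChromeOptions.add_experimental_option("prefs", ...)."""
    try:
        value = NOTIFICATION_VALUES[policy]
    except KeyError:
        raise ValueError(
            f"unknown policy {policy!r}; expected one of {sorted(NOTIFICATION_VALUES)}"
        ) from None
    return {"profile.default_content_setting_values.notifications": value}


# Usage with Selenium (needs a local Chrome):
# chrome_options.add_experimental_option("prefs", notification_prefs("block"))
```

The returned dict has the same shape as the inline dict passed to `add_experimental_option` in the next cell, so the two are interchangeable.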

```python
# Browser settings: allow notifications from all websites
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()

# Set the notification preference (1 = allow, 2 = block)
chrome_options.add_experimental_option(
    "prefs",
    {
        "profile.default_content_setting_values.notifications": 1
    }
)

# Launch Chrome with these options
driver = webdriver.Chrome(options=chrome_options)
```

### **Set the Window Size**

```python
driver.set_window_size(800, 800)  # width, height in pixels
```

### **Go to the Target Page**

```python
login_url = 'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&next=https%3A%2F%2Fwww.facebook.com'
driver.get(login_url)
```

## **02. Log In to Facebook**

```python
# pip3 install python-dotenv

from dotenv import load_dotenv, find_dotenv
import os

# Load variables from a .env file (located by find_dotenv) into os.environ
load_dotenv(find_dotenv())

facebook_mail = os.environ.get("FACEBOOK_MAIL")
facebook_password = os.environ.get("FACEBOOK_PASSWORD")

# facebook_mail = "Your_email_address"
# facebook_password = "Your_password"
```

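`os.environ.get` returns `None` when a variable is missing, which would only surface later as a less obvious error in `send_keys`. A small guard makes the failure explicit up front. This sketch assumes a `.env` file with lines such as `FACEBOOK_MAIL=you@example.com` and `FACEBOOK_PASSWORD=your_password`; `require_env` is a hypothetical helper:

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return env var `name`, or fail fast if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file.")
    return value


# Drop-in replacements for the os.environ.get calls above:
# facebook_mail = require_env("FACEBOOK_MAIL")
# facebook_password = require_env("FACEBOOK_PASSWORD")
```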
| Method         | Purpose                                     | Usage                                                       |
|----------------|---------------------------------------------|-------------------------------------------------------------|
| `find_element` | Finds a single element on the page (raises `NoSuchElementException` if nothing matches) | `element = driver.find_element(By.ID, "id")` <br> Supported locator strategies (`By` is imported from `selenium.webdriver.common.by`):<br> - `By.ID`<br> - `By.NAME`<br> - `By.CLASS_NAME`<br> - `By.TAG_NAME`<br> - `By.LINK_TEXT`<br> - `By.PARTIAL_LINK_TEXT`<br> - `By.CSS_SELECTOR`<br> - `By.XPATH` |
| `send_keys`    | Types text into an element (e.g., an input box) | `element.send_keys("text")`                             |
| `click`        | Clicks an element (e.g., a button or link)  | `element.click()`                                           |


```python
from selenium.webdriver.common.by import By

# Locate the email and password input boxes
email_element = driver.find_element(By.CSS_SELECTOR, '#email_container input')
password_element = driver.find_element(By.CSS_SELECTOR, '._55r1._1kbt input')

# Type in the credentials
email_element.send_keys(facebook_mail)
password_element.send_keys(facebook_password)
```

```python
login_button = driver.find_element(By.CSS_SELECTOR, '#loginbutton')

# Click the login button
login_button.click()
```

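The notebook moves on right after `click()`, relying on the page having loaded by the time the next cell runs. In a script you would normally wait explicitly; Selenium's idiomatic tool for that is `WebDriverWait(driver, timeout).until(...)` from `selenium.webdriver.support.ui`. The stdlib-only sketch below shows the same polling idea. `wait_until` is a hypothetical helper, and the usage comment assumes a successful login navigates away from a URL containing `login`:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # back off before polling again


# Usage (assumption: after logging in, the URL no longer contains "login"):
# wait_until(lambda: "login" not in driver.current_url, timeout=15)
```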
## **03. Scrape Post Content**

```python
url = 'https://www.facebook.com/groups/510448169724216'

driver.get(url)
```

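### **Scrolling the Page**

`execute_script` runs JavaScript inside the browser controlled by WebDriver, which allows finer-grained, more advanced operations, including (but not limited to) scrolling the page, handling pop-up windows, and modifying page elements.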

```python
# Scroll to the bottom of the page once
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
```

```python
import time

# Scroll to the bottom 4 times so that more posts get loaded
for _ in range(4):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait 0.5 s so new page nodes can load (adjust the delay as needed)
    time.sleep(0.5)
```

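A fixed number of scrolls may load too few or too many posts. An alternative is to keep scrolling until `document.body.scrollHeight` stops growing, with a cap because a busy group feed may keep loading more posts indefinitely. A sketch; `scroll_until_stable` is a hypothetical name, and it only needs an object with an `execute_script` method, as Selenium's WebDriver provides:

```python
import time


def scroll_until_stable(driver, pause: float = 0.5, max_rounds: int = 20) -> int:
    """Scroll to the bottom until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly loaded posts time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new loaded, stop early
        last_height = new_height
    return max_rounds  # cap reached; the feed may still have more posts


# Usage in the notebook:
# scroll_until_stable(driver, pause=1.0, max_rounds=10)
```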
### **Parsing the Page Content**

```python
from bs4 import BeautifulSoup

page_content = driver.page_source

# We are parsing HTML, so specify the html.parser parser
soup = BeautifulSoup(page_content, 'html.parser')
```

#### **Selecting Each Post Element**

```python
elements = soup.select('.x1yztbdb.x1n2onr6.xh8yej3.x1ja2u2z')

# Display an arbitrary post element to inspect it
elements[3]
```

```html
<div class="x1yztbdb x1n2onr6 xh8yej3 x1ja2u2z"><div class="x1n2onr6 x1ja2u2z"><div class=""><div class=""><div aria-describedby=":rn6: :rn7: :rn8: :rna: :rn9:" aria-labelledby=":rn5:" aria-posinset="3" class="x1a2a7pz" role="article"><div class="x78zum5 xdt5ytf" style="min-height: 1035.25px;"><div class="x9f619 x1n2onr6 x1ja2u2z x1s85apg" hidden=""></div></div></div></div></div></div></div>
```

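`soup.select('.x1yztbdb.x1n2onr6.xh8yej3.x1ja2u2z')` matches elements that carry all four classes, in any order and alongside other classes. To make that rule concrete without BeautifulSoup, here is a stdlib-only sketch using `html.parser`; `ClassMatcher` and the sample HTML are made up for illustration. Class names like these look machine-generated, so expect to re-check them whenever the page layout changes.

```python
from html.parser import HTMLParser


class ClassMatcher(HTMLParser):
    """Collect tag names of elements whose class attribute contains ALL required classes."""

    def __init__(self, required):
        super().__init__()
        self.required = set(required)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None for bare attributes
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:  # subset test = "has every required class"
            self.matches.append(tag)


html = (
    '<div class="x1yztbdb x1n2onr6 xh8yej3 x1ja2u2z">post A</div>'
    '<div class="x1yztbdb x1n2onr6">not a post</div>'
    '<div class="x1ja2u2z xh8yej3 x1n2onr6 x1yztbdb extra">post B</div>'
)
matcher = ClassMatcher(["x1yztbdb", "x1n2onr6", "xh8yej3", "x1ja2u2z"])
matcher.feed(html)
print(len(matcher.matches))  # 2
```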
#### **Post Author**

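```python
for element in elements:
    try:
        name = element.select('.xt0psk2 span')[0].text
        print(name)
    except IndexError:
        # No author span in this element (for example, a placeholder whose content has not loaded)
        continue
```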

```text
李凱新
蝦吃綺
Amber Hsueh
Riona Huang
Mei-ling Lin
龍龍
劉庭瑋
```


#### **Like Count**

```python
for element in elements:
    try:
        like = element.select('.xt0b8zv.x2bj2ny.xrbpyxo.xl423tq span.x1e558r4')[0].text
        print(like)
    except IndexError:
        # No like count found for this element
        continue
```

```text
42
132
86
88
51
85
58
```


#### **Comment Count**

```python
for element in elements:
    try:
        # The comment label is taken to be the 3rd span matching this selector
        comment = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[2].text
        print(comment)
    except IndexError:
        continue
```

```text
11則留言
11則留言
18則留言
7則留言
14則留言
14則留言
10則留言
```

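The counts come back as display strings such as `11則留言` ("11 comments") and `1 次分享` ("1 share"), with inconsistent spacing, so they need converting to integers before any analysis. A sketch; `parse_count` is a hypothetical helper, and the abbreviations `萬` (10,000), `K` and `M` are assumptions about how larger numbers might be displayed, so adjust them to what you actually see:

```python
import re

# Assumed unit suffixes for abbreviated counts; "" means a plain number.
_MULTIPLIERS = {"": 1, "萬": 10_000, "K": 1_000, "M": 1_000_000}
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(萬|K|M)?", re.IGNORECASE)


def parse_count(text) -> int:
    """Extract the leading number from strings like '11則留言', '1 次分享' or '1.2萬'."""
    if isinstance(text, int):  # the storage loop uses 0 for missing values
        return text
    match = _COUNT_RE.search(str(text).replace(",", ""))
    if match is None:
        return 0
    number, unit = match.group(1), (match.group(2) or "").upper()
    return int(round(float(number) * _MULTIPLIERS[unit]))


print(parse_count("11則留言"), parse_count("1 次分享"), parse_count("3次分享"))  # 11 1 3
```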

#### **Share Count**

```python
for element in elements:
    try:
        # The share label is taken to be the 4th span matching this selector
        share = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[3].text
        print(share)
    except IndexError:
        continue
```

```text
1 次分享
3次分享
4次分享
1 次分享
```

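Indexing the matched spans with `[2]` and `[3]` assumes every post shows both a comment label and a share label in the same positions. When one is missing, the labels likely shift: in the DataFrame output of section 04, the `Iver Ever` row has `2次分享` under 評論數. Matching on the suffix instead of the position avoids this. A sketch; `split_comment_share` is a hypothetical helper, and it assumes the zh-TW labels end in `則留言` and `次分享` as in the output above:

```python
def split_comment_share(texts):
    """Pick the comment and share labels out of a post's footer texts by suffix, not position.

    Returns (comment_text, share_text); a label that is absent comes back as None.
    """
    comment = next((t for t in texts if t.strip().endswith("則留言")), None)
    share = next((t for t in texts if t.strip().endswith("次分享")), None)
    return comment, share


# Usage inside the scraping loop (selector is the long comment/share selector above):
# texts = [span.text for span in element.select(comment_share_selector)]
# comment, share = split_comment_share(texts)
print(split_comment_share(["", "Demo", "2次分享"]))  # (None, '2次分享')
```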

#### **Post Link**

```python
import re

for element in elements:
    try:
        link = element.select('.x1i10hfl.xjbqb8w.x1ejq31n.xd10rxx.x1sy0etr.x17r0tee.x972fbf.xcfux6l.x1qhh985.xm0m39n.x9f619.x1ypdohk.xt0psk2.xe8uvvx.xdj266r.x11i5rnm.xat24cr.x1mh8g0r.xexx8yu.x4uap5.x18d9i69.xkhd6sd.x16tdsg8.x1hl2dhg.xggy1nq.x1a2a7pz.x1sur9pj.xkrqix3.xi81zsa.xo1l8bm')[0].get('href')
    except IndexError:
        continue
    if link is None:  # the matched element has no href
        continue

    # Regular expression: keep only the base post URL and drop the query string
    base_url = re.match(r"(https://www\.facebook\.com/groups/510448169724216/posts/\d+)", link)

    if base_url:
        print(base_url.group(1))
    else:
        print(link)
```

```text
https://www.facebook.com/groups/510448169724216/posts/1731560984279589
https://www.facebook.com/groups/510448169724216/posts/1731627564272931
https://www.facebook.com/groups/510448169724216/posts/1731731210929233
https://www.facebook.com/groups/510448169724216/posts/1731469790955375
https://www.facebook.com/groups/510448169724216/posts/1731274060974948
https://www.facebook.com/groups/510448169724216/posts/1731271610975193
https://www.facebook.com/groups/510448169724216/posts/1731320377636983
```

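The regular expression used for the links hard-codes the group ID `510448169724216`, so it needs editing for any other group. A more general sketch captures any group segment, assuming group post links follow the `/groups/<group>/posts/<digits>` pattern seen in the output. `canonical_post_url` is a hypothetical helper, and unlike the notebook cell it returns `None` rather than the raw link when the pattern does not match:

```python
import re
from typing import Optional

# Assumption: group post links look like https://www.facebook.com/groups/<group>/posts/<digits>,
# where <group> may be a numeric id or a vanity name.
POST_URL_RE = re.compile(r"https://www\.facebook\.com/groups/[^/?#]+/posts/\d+")


def canonical_post_url(href: Optional[str]) -> Optional[str]:
    """Return the group-post URL without any trailing path or query string, or None."""
    if not href:
        return None
    match = POST_URL_RE.match(href)
    return match.group(0) if match else None
```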

#### **Post Content**

```python
for element in elements:
    article = element.select_one('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xzsf02u.x1yc453h')
    # select_one returns None when no element matches
    if article is None:
        continue
    print(article.get_text())
```

```text
#台北吃到飽#MO-MO-PARADISE台北中山牧場#平日午餐金主想吃和牛想到附近的momo有出和牛饗晏就帶來吃了點菜依然使用QRcode每次最多可以點8種前菜的和牛握壽司肉質不錯但飯吃多容易飽肉品只點了和牛前腿心x6和牛三叉肉x6頂級自然牛x3其他肉品沒點2種和牛都好吃多汁香甜建議5-7分熟其他肉可忽略一品料理點的是軟殼蟹生春捲顧名思義就是生春捲包生菜和軟殼蟹生菜有點苦解膩還可以甜點的布丁口感不錯就是黑糖有點苦今天消費$0(感謝金主、讚嘆金主)
#新北吃到飽#蘆洲#發辣X鴛鴦麻辣火鍋5/16日本純血和牛吃到飽只要798元…… 查看更多
#台北吃到飽#君悅酒店#彩日本料理好久沒來了，超想念彩肥美的烤魚和我心中排名第一的握壽司，週五晚上用餐客人不多，可以悠哉取餐聊天真好！…… 查看更多
#台北吃到飽#彩日本料理晚餐每次要來彩吃飯就充滿期待這裡雖然不大餐點也不算多…… 查看更多
#台北吃到飽 #爭厚厚切牛排士林店被麵包冰品耽誤的平價牛排館！厚切豬和脆皮雞都很不錯，最便宜排餐250元起，兒童排餐180元，就有超美味的大蒜奶油吐司及霜淇淋吃到飽！飲料湯品也是喝到飽~菜色體驗在此，歡迎愛吃鬼們訂閱交流美食：https://youtu.be/gv52Pucz70I?si=uUaX_0diY187iArL
#台中吃到飽#自由小火鍋一個人360不用服務費單人不加錢…… 查看更多
#新北吃到飽#野人shabu林口店#高級冷藏肉品無限用餐價位/＄799+10%、999+10%…… 查看更多
```

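Several posts end with `…… 查看更多` ("… See more"), which means only the collapsed preview was captured. A sketch that strips the marker and flags the post, so truncated rows can be told apart later; `strip_see_more` is a hypothetical helper. Capturing the full text would presumably require expanding the post (or opening its link) before reading `driver.page_source`, which this sketch does not attempt:

```python
from typing import Tuple

SEE_MORE = "查看更多"  # the zh-TW "See more" label seen at the end of truncated posts above


def strip_see_more(text: str) -> Tuple[str, bool]:
    """Remove a trailing '…… 查看更多' marker and report whether the post was truncated."""
    stripped = text.rstrip()
    if not stripped.endswith(SEE_MORE):
        return stripped, False
    body = stripped[: -len(SEE_MORE)].rstrip()  # drop the label and the space before it
    body = body.rstrip("…").rstrip()  # drop the ellipsis characters
    return body, True
```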

## **04. Saving the Data**

```python
import re

import pandas as pd

name_list, like_list, comment_list, share_list, link_list, article_list = [], [], [], [], [], []

for element in elements:
    # Post author
    try:
        name = element.select('.xt0psk2 span')[0].text
        name_list.append(name)
    except IndexError:
        # Skip elements without an author; nothing has been appended yet, so the lists stay aligned
        continue

    # Like count
    try:
        like = element.select('.xt0b8zv.x2bj2ny.xrbpyxo.xl423tq span.x1e558r4')[0].text
        like_list.append(like)
    except IndexError:
        like_list.append(0)

    # Comment count (3rd matching span)
    try:
        comment = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[2].text
        comment_list.append(comment)
    except IndexError:
        comment_list.append(0)

    # Share count (4th matching span)
    try:
        share = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[3].text
        share_list.append(share)
    except IndexError:
        share_list.append(0)

    # Post link
    try:
        link = element.select('.x1i10hfl.xjbqb8w.x1ejq31n.xd10rxx.x1sy0etr.x17r0tee.x972fbf.xcfux6l.x1qhh985.xm0m39n.x9f619.x1ypdohk.xt0psk2.xe8uvvx.xdj266r.x11i5rnm.xat24cr.x1mh8g0r.xexx8yu.x4uap5.x18d9i69.xkhd6sd.x16tdsg8.x1hl2dhg.xggy1nq.x1a2a7pz.x1sur9pj.xkrqix3.xi81zsa.xo1l8bm')[0].get('href')
        # Regular expression: keep only the base post URL
        base_url = re.match(r"(https://www\.facebook\.com/groups/510448169724216/posts/\d+)", link)

        if base_url:
            link_list.append(base_url.group(1))
        else:
            link_list.append(link)
    except (IndexError, TypeError):
        # IndexError: no link element; TypeError: the element has no href (re.match got None)
        link_list.append('')

    # Post content
    try:
        article = element.select_one('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xzsf02u.x1yc453h')
        article_list.append(article.get_text())
    except AttributeError:
        # select_one returned None
        article_list.append('')

df = pd.DataFrame({
    '作者': name_list,
    '按讚數': like_list,
    '評論數': comment_list,
    '分享數': share_list,
    '連結': link_list,
    '貼文內容': article_list
})
df
```

|   | 作者 | 按讚數 | 評論數 | 分享數 | 連結 | 貼文內容 |
|---|------|--------|--------|--------|------|----------|
| 0 | 李凱新 | 42 | 11 | 0 | `https://www.facebook.com/groups/51044816972421...` | #台北吃到飽#MO-MO-PARADISE台北中山牧場#平日午餐金主想吃和牛想到附近的mom... |
| 1 | Yumimika Cheng | 177 | 37則留言 | 1 次分享 | `https://www.facebook.com/groups/51044816972421...` | #台北吃到飽#大倉久和歐風館#中餐時段今天來大倉久和歐風館吃中餐，要去拿爐烤牛排的時候找不到... |
| 2 | 王作順 | 62 | 8則留言 | 1 次分享 | `https://www.facebook.com/groups/51044816972421...` | #台北吃到飽#三創高麗園好久沒來三創高麗園了本月生日優惠打8折每次來高麗園都以副食為主 ……... |
| 3 | 黃威力 | 61 | 9則留言 | 3次分享 | `https://www.facebook.com/groups/51044816972421...` | #桃園吃到飽#甩鍋雞韓式炒雞 桃園餐廳 - 甩鍋雞 韓式炒雞 吃到飽 att店 之…… 查看更多 |
| 4 | 龍龍 | 100 | 18則留言 | 2次分享 | `https://www.facebook.com/groups/51044816972421...` | #台中吃到飽#蓮荷創意蔬食百匯餐廳素食吃到飽餐廳蠻多異國料理，看得出廚師的用心…… 查看更多 |
| 5 | Iver Ever | 191 | 2次分享 | 0 | `https://www.facebook.com/groups/51044816972421...` | |
| 6 | 裴宇翔 | 253 | 46則留言 | 0 | `https://www.facebook.com/groups/51044816972421...` | #裴社長 #澳門吃到飽 #macau今天晚上要吃哪一間Buffet龍蝦 生蠔 生猛海鮮. 我... |
| 7 | 朱投王 | 152 | 25則留言 | 3次分享 | `https://www.facebook.com/groups/51044816972421...` | #台北吃到飽#好食多大肉盤終於輪到我來吃了, 今天主攻日本F1國產和牛, 因為點餐系統當機 ... |
| 8 | Hui CW | 67 | 5則留言 | 0 | `https://www.facebook.com/groups/51044816972421...` | #香港吃到飽#國泰航空貴賓室 The Deck這個貴賓室除了擔擔麵以外最特別的就是還有叻沙麵... |
| 9 | 陳菁菁 | 467 | 63則留言 | 29次分享 | `https://www.facebook.com/groups/51044816972421...` | 20240512 泰蝦樂，首訪，流水道泰國蝦吃到飽，假日晚餐#桃園吃到飽#泰蝦樂#泰國蝦自由... |

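Despite the section title, the DataFrame above only lives in memory. With pandas, `df.to_csv("posts.csv", index=False, encoding="utf-8-sig")` writes it to disk; the `utf-8-sig` encoding adds a byte-order mark, which helps Excel recognise the Chinese text as UTF-8. A stdlib-only equivalent using `csv.DictWriter`; `save_posts_csv`, the file name, and the demo row are made up for illustration:

```python
import csv
import os
import tempfile


def save_posts_csv(path, rows, fieldnames):
    """Write a list of dicts to CSV; 'utf-8-sig' prepends a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Demo with a made-up row; in the notebook you could pass
# df.to_dict("records") and list(df.columns) instead.
demo_rows = [{
    "作者": "Demo Author",
    "按讚數": 42,
    "連結": "https://www.facebook.com/groups/510448169724216/posts/1731560984279589",
}]
demo_path = os.path.join(tempfile.mkdtemp(), "posts.csv")
save_posts_csv(demo_path, demo_rows, ["作者", "按讚數", "連結"])

with open(demo_path, encoding="utf-8-sig", newline="") as f:
    print(list(csv.DictReader(f)))
```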