# **Getting Started with Social Network Analysis: Write Your Own Scraper | A Fast-Track Hands-On Class**

In [None]:
# pip3 install selenium pandas bs4

## **01. Launching the Automated Browser**

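`profile.default_content_setting_values.notifications`: the Chrome preference that controls how notification requests from websites are handled.
- 1: allow notifications from websites.
- 2: block notifications.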

![notifications](./asset/image/notification.png)
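As a small sketch of how the two values map onto the `prefs` dictionary, the helper below builds the preference for either behavior. The name `notification_prefs` is hypothetical and not part of Selenium; only the preference key and the values 1 and 2 come from the description above.

```python
# Hypothetical helper (not part of Selenium): build the Chrome "prefs"
# dictionary for the notification setting described above.
NOTIFICATION_KEY = "profile.default_content_setting_values.notifications"


def notification_prefs(allow: bool) -> dict:
    """Return prefs that allow (1) or block (2) site notifications."""
    return {NOTIFICATION_KEY: 1 if allow else 2}


print(notification_prefs(allow=False))
```

The resulting dictionary can be passed as the second argument of `add_experimental_option("prefs", ...)`. Either value should keep the permission prompt from covering the page; which one to use depends on whether you want the sites to be able to notify you.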

In [114]:
# Browser setup: allow notification requests from all sites

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()

# 1 = allow notifications, 2 = block them
chrome_options.add_experimental_option(
    "prefs",
    {
        "profile.default_content_setting_values.notifications": 1
    }
)

driver = webdriver.Chrome(options=chrome_options)


### **Set the Window Size**

In [115]:
driver.set_window_size(800, 800)

### **Navigate to the Target Page**

In [116]:
login_url = 'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&next=https%3A%2F%2Fwww.facebook.com'
driver.get(login_url)

## **02. Facebook Login**

In [118]:
# pip3 install python-dotenv

from dotenv import load_dotenv, find_dotenv
import os

load_dotenv(find_dotenv())

facebook_mail = os.environ.get("FACEBOOK_MAIL")
facebook_password = os.environ.get("FACEBOOK_PASSWORD")

# facebook_mail = "Your_email_address"
# facebook_password = "Your_password"
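`os.environ.get` returns `None` when a variable is missing, so a typo in the `.env` file would only surface later, when the credentials are typed into the form. The sketch below fails early instead. The helper name `require_env` is an assumption made for illustration and is not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value
```

After `load_dotenv(find_dotenv())` has run, it could be used as `facebook_mail = require_env("FACEBOOK_MAIL")`.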

| Method         | Purpose                                           | Usage                                                        |
|----------------|---------------------------------------------------|--------------------------------------------------------------|
| `find_element` | Locates a single page element                     | `element = driver.find_element(By.ID, "id")` <br> Supported locator strategies:<br> - `By.ID`<br> - `By.NAME`<br> - `By.CLASS_NAME`<br> - `By.TAG_NAME`<br> - `By.LINK_TEXT`<br> - `By.PARTIAL_LINK_TEXT`<br> - `By.CSS_SELECTOR`<br> - `By.XPATH` |
| `send_keys`    | Types text into an element such as an input field | `element.send_keys("text")`                                  |
| `click`        | Clicks an element such as a button or link        | `element.click()`                                            |


In [119]:
from selenium.webdriver.common.by import By

# Locate the email and password input fields
email_element = driver.find_element(By.CSS_SELECTOR, '#email_container input')
password_element = driver.find_element(By.CSS_SELECTOR, '._55r1._1kbt input')

# Type the credentials into the fields
email_element.send_keys(facebook_mail)
password_element.send_keys(facebook_password)


In [120]:
login_button = driver.find_element(By.CSS_SELECTOR, '#loginbutton')

# Click the login button
login_button.click()

## **03. Scraping Post Content**

In [122]:
url = 'https://www.facebook.com/groups/510448169724216'

driver.get(url)

### **Scrolling the Page**

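`execute_script` lets us run JavaScript through the WebDriver for finer-grained, more advanced operations, including but not limited to scrolling the page, handling pop-up windows, and modifying page elements.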

In [123]:


driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")



In [124]:
import time

counter = 0
while counter <= 3:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait 0.5 seconds so the new nodes can load; adjust the delay as needed
    time.sleep(0.5)
    counter += 1
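The loop above scrolls a fixed number of times. A possible variation, sketched below, stops as soon as `document.body.scrollHeight` no longer grows, with a cap so that an effectively endless feed cannot loop forever. The function name `scroll_to_bottom` and its `pause` and `max_rounds` parameters are assumptions made for illustration; `driver` is any object exposing Selenium's `execute_script`.

```python
import time


def scroll_to_bottom(driver, pause: float = 0.5, max_rounds: int = 10) -> int:
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly requested posts time to load
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, so stop early
        last_height = new_height
    return rounds
```

With a live session this would be called as `scroll_to_bottom(driver, pause=0.5, max_rounds=4)`. A longer `pause` is safer on slow connections, because a height that has not changed yet may only mean the content is still loading.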

### **Parsing the Page Content**

In [125]:
from bs4 import BeautifulSoup

page_content = driver.page_source

# The content to parse is HTML, so we specify the html.parser parser
soup = BeautifulSoup(page_content, 'html.parser')

#### **Selecting Each Post Element**

In [127]:
elements = soup.select('.x1yztbdb.x1n2onr6.xh8yej3.x1ja2u2z')

# Print one of the post elements to inspect it
elements[1]

<div class="x1yztbdb x1n2onr6 xh8yej3 x1ja2u2z"><div class="x1n2onr6 x1ja2u2z"><div class=""><div class=""><div aria-describedby=":Rahdltbjblalat5bb9l5qq9papd5aqH1: :Rahdltbjblalat5bb9l5qq9papd5aqH2: :Rahdltbjblalat5bb9l5qq9papd5aqH3: :Rahdltbjblalat5bb9l5qq9papd5aqH5: :Rahdltbjblalat5bb9l5qq9papd5aqH4:" aria-labelledby=":R1akratbjblalat5bb9l5qq9papd5aq:" aria-posinset="1" class="x1a2a7pz" role="article"><div class="x78zum5 xdt5ytf"><div class="x9f619 x1n2onr6 x1ja2u2z"><div class="x78zum5 x1n2onr6 xh8yej3"><div class="x9f619 x1n2onr6 x1ja2u2z x1jx94hy x1qpq9i9 xdney7k xu5ydu1 xt3gfkd xh8yej3 x6ikm8r x10wlt62 xquyuld" style="border-radius:max(0px, min(var(--card-corner-radius), calc((100vw - 4px - 100%) * 9999))) / var(--card-corner-radius)"><div><div></div><div class="x1s85apg" data-0="0" data-1="1" data-10="10" data-11="11" data-12="12" data-13="13" data-14="14" data-15="15" data-16="16" data-17="17" data-18="18" data-19="19" data-2="2" data-3="3" data-4="4" data-5="5" data-6="6" dat

#### **Post Author**

In [129]:
for element in elements:
    try:
        name = element.select('.xt0psk2 span')[0].text
        print(name)
    except IndexError:
        continue

蝦吃綺
李凱新
廖豆豆
Coca Chiu
Vincent Cheng
Kelly Lin
Pat Rick
Yeu Tsern Shieh
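The `try` blocks in this section guard against posts where a selector matches nothing. The sketch below shows what Beautiful Soup returns in that case: `select` gives an empty list, so indexing it with `[0]` raises `IndexError`, while `select_one` gives `None`. The HTML and class names are made up for illustration and are not Facebook's real markup.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the class names are placeholders,
# not Facebook's real ones.
html = """
<div class="post card"><span class="author">Alice</span></div>
<div class="post card"></div>
"""
soup = BeautifulSoup(html, "html.parser")

authors = []
match_counts = []
for post in soup.select(".post.card"):    # elements carrying BOTH classes
    matches = post.select(".author")      # always a list, possibly empty
    first = post.select_one(".author")    # a Tag, or None when nothing matches
    match_counts.append(len(matches))
    authors.append(first.get_text() if first is not None else None)

print(match_counts, authors)
```

Checking for `None` explicitly is one alternative to wrapping every lookup in `try`, and it makes the missing-field case visible in the data.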


#### **Like Count**

In [130]:
for element in elements:
    try:
        like = element.select('.xt0b8zv.x2bj2ny.xrbpyxo.xl423tq span.x1e558r4')[0].text
        print(like)
    except IndexError:
        continue

116
85
12
49
72
140
172
107


#### **Comment Count**

In [131]:
for element in elements:
    try:
        comment = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[2].text
        print(comment)
    except IndexError:
        continue

14則留言
8則留言
5則留言
13則留言
11則留言
42則留言
26則留言


#### **Share Count**

In [132]:
for element in elements:
    try:
        share = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[3].text
        print(share)
    except IndexError:
        continue

1 次分享
1 次分享
2次分享
3次分享
13次分享
3次分享


#### **Post Link**

In [133]:
import re

for element in elements:
    try:
        link = element.select('.x1i10hfl.xjbqb8w.x1ejq31n.xd10rxx.x1sy0etr.x17r0tee.x972fbf.xcfux6l.x1qhh985.xm0m39n.x9f619.x1ypdohk.xt0psk2.xe8uvvx.xdj266r.x11i5rnm.xat24cr.x1mh8g0r.xexx8yu.x4uap5.x18d9i69.xkhd6sd.x16tdsg8.x1hl2dhg.xggy1nq.x1a2a7pz.x1sur9pj.xkrqix3.xi81zsa.xo1l8bm')[0].get('href')
        # Regular expression: keep only the post URL and drop anything after the post ID
        base_url = re.match(r"(https://www\.facebook\.com/groups/510448169724216/posts/\d+)", link)

        if base_url:
            print(base_url.group(1))
        else:
            print(link)
    except (IndexError, TypeError):
        continue

https://www.facebook.com/groups/510448169724216/posts/1733678310734523
https://www.facebook.com/groups/510448169724216/posts/1732474707521550
https://www.facebook.com/groups/510448169724216/posts/1733518524083835
https://www.facebook.com/groups/510448169724216/posts/1733378160764538
https://www.facebook.com/groups/510448169724216/posts/1733553267413694
https://www.facebook.com/groups/510448169724216/posts/1733448197424201
https://www.facebook.com/groups/510448169724216/posts/1733448907424130
https://www.facebook.com/groups/510448169724216/posts/1732208847548136
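The regular expression above is tied to one group ID. If the same cleanup is needed for other groups, one option is to strip the query string with the standard library instead. This is a sketch of an alternative technique, not the approach used in the cell above; the function name `strip_query` and the tracking parameters in the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, plus any trailing slash on the path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))


# Hypothetical link with made-up query parameters, for illustration only
example = "https://www.facebook.com/groups/510448169724216/posts/1733678310734523/?ref=example&x=1"
print(strip_query(example))
# prints https://www.facebook.com/groups/510448169724216/posts/1733678310734523
```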


#### **Post Content**

In [134]:


for element in elements:
    try:
        article = element.select_one('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xzsf02u.x1yc453h')
        print(article.get_text())
    except AttributeError:
        continue



#新北吃到飽#Fun鍋子5/19星期日這次回訪fun鍋子，多了火鍋囉，而且提供的不是一般清湯鍋，…… 查看更多
#台北吃到飽#彩日本料理#假日晚餐今天跟朋友一起來吃…… 查看更多
最近看一下評論有人說探索廚房品質變差了，因為家人有訂想問一下意見如果同樣價格還有哪家不錯的在台北
#高雄吃到飽#梨花A5和牛放題#高雄帝王蟹吃到飽5月20日午餐吃神戶之宴$1799/位…… 查看更多
#泰國吃到飽#芭達雅Terminal 21 6F SHABUSHI 泰式旋轉小火鍋 餐價399+7% =427THB…… 查看更多
#台北吃到飽#欣葉日式料理520是個大日子，當然要去buffet開心吃到飽。自從1月公司部門聚餐在欣葉新光三越A11, 我又重拾對它的信心，因爲前兩年菜色縮水得令人嘆息，很久不再光顧。四月帶爸媽去也讓長輩滿意，今天是第六訪了～…… 查看更多
#台北吃到飽#好食多涮涮屋第一次朝聖好食多，因為同行有人不吃牛，所以選擇$499價位試水溫。…… 查看更多


## **04. Storing the Data**

In [136]:
import re

import pandas as pd

name_list, like_list, comment_list, share_list, link_list, article_list = [], [], [], [], [], []

for element in elements:
    # Post author
    try:
        name = element.select('.xt0psk2 span')[0].text
        name_list.append(name)
    except IndexError:
        continue  # skip elements without an author so all lists stay the same length

    # Like count
    try:
        like = element.select('.xt0b8zv.x2bj2ny.xrbpyxo.xl423tq span.x1e558r4')[0].text
        like_list.append(like)
    except IndexError:
        like_list.append(0)

    # Comment count
    try:
        comment = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[2].text
        comment_list.append(comment)
    except IndexError:
        comment_list.append(0)

    # Share count
    try:
        share = element.select('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xi81zsa')[3].text
        share_list.append(share)
    except IndexError:
        share_list.append(0)

    # Post link
    try:
        link = element.select('.x1i10hfl.xjbqb8w.x1ejq31n.xd10rxx.x1sy0etr.x17r0tee.x972fbf.xcfux6l.x1qhh985.xm0m39n.x9f619.x1ypdohk.xt0psk2.xe8uvvx.xdj266r.x11i5rnm.xat24cr.x1mh8g0r.xexx8yu.x4uap5.x18d9i69.xkhd6sd.x16tdsg8.x1hl2dhg.xggy1nq.x1a2a7pz.x1sur9pj.xkrqix3.xi81zsa.xo1l8bm')[0].get('href')
        # Regular expression: keep only the post URL and drop anything after the post ID
        base_url = re.match(r"(https://www\.facebook\.com/groups/510448169724216/posts/\d+)", link)

        if base_url:
            link_list.append(base_url.group(1))
        else:
            link_list.append(link)
    except (IndexError, TypeError):
        link_list.append('')

    # Post content
    try:
        article = element.select_one('.x193iq5w.xeuugli.x13faqbe.x1vvkbs.x1xmvt09.x1lliihq.x1s928wv.xhkezso.x1gmr53x.x1cpjm7i.x1fgarty.x1943h6x.xudqn12.x3x7a5m.x6prxxf.xvq8zen.xo1l8bm.xzsf02u.x1yc453h')
        article_list.append(article.get_text())
    except AttributeError:
        article_list.append('')

df = pd.DataFrame({
    '作者': name_list,
    '按讚數': like_list,
    '評論數': comment_list,
    '分享數': share_list,
    '連結': link_list,
    '貼文內容': article_list
})
df

Unnamed: 0,作者,按讚數,評論數,分享數,連結,貼文內容
0,蝦吃綺,116,14則留言,1 次分享,https://www.facebook.com/groups/51044816972421...,#新北吃到飽#Fun鍋子5/19星期日這次回訪fun鍋子，多了火鍋囉，而且提供的不是一般清湯...
1,李凱新,85,8則留言,0,https://www.facebook.com/groups/51044816972421...,#台北吃到飽#彩日本料理#假日晚餐今天跟朋友一起來吃…… 查看更多
2,廖豆豆,12,0,0,https://www.facebook.com/groups/51044816972421...,最近看一下評論有人說探索廚房品質變差了，因為家人有訂想問一下意見如果同樣價格還有哪家不錯的在台北
3,Coca Chiu,49,5則留言,1 次分享,https://www.facebook.com/groups/51044816972421...,#高雄吃到飽#梨花A5和牛放題#高雄帝王蟹吃到飽5月20日午餐吃神戶之宴$1799/位…… ...
4,Vincent Cheng,72,13則留言,2次分享,https://www.facebook.com/groups/51044816972421...,#泰國吃到飽#芭達雅Terminal 21 6F SHABUSHI 泰式旋轉小火鍋 餐價39...
5,Kelly Lin,140,11則留言,3次分享,https://www.facebook.com/groups/51044816972421...,#台北吃到飽#欣葉日式料理520是個大日子，當然要去buffet開心吃到飽。自從1月公司部門...
6,Pat Rick,172,42則留言,13次分享,https://www.facebook.com/groups/51044816972421...,
7,Yeu Tsern Shieh,107,26則留言,3次分享,https://www.facebook.com/groups/51044816972421...,#台北吃到飽#好食多涮涮屋第一次朝聖好食多，因為同行有人不吃牛，所以選擇$499價位試水溫。...
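To persist the table, the DataFrame can be written to a CSV file. The sketch below uses a tiny stand-in DataFrame with made-up rows; the file name `posts_demo.csv` and the temporary directory are assumptions chosen for illustration.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the DataFrame built above; the rows are made up
df_demo = pd.DataFrame({"作者": ["Alice"], "按讚數": [3]})

path = os.path.join(tempfile.gettempdir(), "posts_demo.csv")

# utf-8-sig writes a byte-order mark, which typically helps Excel display
# Chinese text correctly; index=False avoids writing the row index as an
# extra unnamed column.
df_demo.to_csv(path, index=False, encoding="utf-8-sig")

# Read the file back to confirm the round trip
restored = pd.read_csv(path, encoding="utf-8-sig")
print(restored)
```

Without `index=False`, reading the file back produces an additional `Unnamed: 0` column holding the old row index.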
