# Facebook爬蟲
這裡簡單介紹一下爬蟲的邏輯再開始爬進入正題吧！
- 爬取文章內容
 - 貼文人ID
 - 貼文人暱稱
 - 貼文時間
 - 貼文類型
 - 貼文內容
 - 心情：分為Like, Haha, Sad, Wow, Angry...etc.
 - 留言數
 - 分享數
 - 貼文連結


- 爬取留言內容
 - 留言人ID
 - 留言人暱稱
 - 留言時間
 - 回覆對象ID
 - 回覆對象暱稱
 - 貼文連結
 

# 載入使用套件

In [1]:
import pandas as pd
import re, time, requests, datetime, gc
from selenium import webdriver
from bs4 import BeautifulSoup

# 截取塗鴉墻貼文資訊並比對先前資料
如果每次都把所有貼文抓下來，無疑會浪費程式效能，在這裡我將載入先前的資料，並將各篇貼文分成以下三類：

- **新貼文**：截取貼文內容與留言
- **舊貼文但有新留言**：截取所有留言但只將新留言加入資料庫
- **舊貼文也無新留言**：不處理

可想而知，如果設定排程每天執行，大多數資料都會是「舊貼文也無新留言」的類型，加入這個判斷機制將大幅節省系統效能

In [2]:
# 點擊不要現在註冊帳號
def clickNotNow():
    try:
        driver.find_element_by_xpath('//a[@id="expanding_cta_close_button"]').click()
    except:
        time.sleep(0.5)

# 截取塗鴉墻上貼文的留言數
def GetWall_PostCommentCounts(i):
    try:
        CommentCounts = i.find('a', {'data-testid':'UFI2CommentsCount/root'}).text.split(' ',2)[0]
        if 'K' in CommentCounts:
            CommentCounts = int(float(CommentCounts.split('K')[0])*1000)
        else:
            CommentCounts = int(CommentCounts)
    except:
        CommentCounts = 0
    return CommentCounts

# 截取塗鴉墻上貼文的連結
def GetWall_PostLink(i):
    Link = 'https://www.facebook.com' + i.find('a',{'class':'_5pcq'}).attrs['href'].split('?',2)[0]
    return Link

# 截取塗鴉墻上貼文的發佈時間
def GetWall_PostTime(i):
    try:
        Time = i.find('abbr').attrs['title']
        Time = datetime.datetime.strptime(Time, '%m/%d/%y, %I:%M %p')
        Time = Time.strftime("%Y-%m-%d %H:%M")
    except:
        Time = 'Not Post'
    return Time

def CarwlList(urls, n, Posts):
    CheckList = pd.DataFrame(columns = ['Link','Time','CommentCounts']) 
    for i in urls:
        driver.get(i)        
        for i in range(n):
            try:
                time.sleep(1)
                driver.find_element_by_css_selector('a.pam.uiBoxLightblue.uiMorePagerPrimary').click() # 加載更多貼文的按鈕
            except:
                time.sleep(1)
            # 這裡會跳出要我們登入的大畫面，找到「稍後再說」的按鈕並點擊
            clickNotNow()
            soup = BeautifulSoup(driver.page_source)
            for i in soup.find_all('div', {'class':'_5pcr userContentWrapper'}):
                CheckList = pd.concat([CheckList, pd.DataFrame(data = [{'Link':GetWall_PostLink(i),
                                                                        'Time':GetWall_PostTime(i),
                                                                        'CommentCounts':GetWall_PostCommentCounts(i)}],
                                                               columns = ['Link','Time','CommentCounts'])],
                                       ignore_index = True)
    
    CarwlList = pd.merge(left = CheckList,
                         right = Posts.loc[:,['Link','CommentCounts']],
                         how='left',
                         on='Link',
                         suffixes=('_c', '_p'),
                         indicator=True)
    CarwlList = pd.concat([CarwlList.loc[CarwlList._merge == 'left_only', :], # 新貼文
                           CarwlList.loc[CarwlList.CommentCounts_c > CarwlList.CommentCounts_p, :]], # 舊貼文但有新留言
                          ignore_index=True)
    CarwlList = CarwlList.drop_duplicates(subset = 'Link',
                                          keep= 'first',
                                          inplace = False)
    return CarwlList

# 爬取特定貼文的內容與留言
## 展開貼文與留言

In [3]:
# 選擇按照New或Oldest來顯示留言(選擇最相關或所有留言都不會會真的顯示所有留言)
def clickOldest():
    driver.find_element_by_xpath('//a[@data-testid="UFI2ViewOptionsSelector/link"]').click()
    time.sleep(1)
    try:
        driver.find_element_by_partial_link_text('Comments shown in chronological order with the oldest comments at the top.').click()
    except:
        try:
            driver.find_element_by_partial_link_text('New comments and those with new replies appear at the top.').click()
        except:
            print('Plz, Check this post arragne type!')

# 打開各篇貼文並展開所有留言 與 留言的留言
def PostExpand():
    time.sleep(1)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(1)
    clickNotNow()
    time.sleep(0.5)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(1)
    clickNotNow()
    time.sleep(0.5)
    driver.find_element_by_xpath('//div[@class="_5pcr userContentWrapper"]//a[@data-testid="UFI2CommentsCount/root"]').click()    
    time.sleep(0.5)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(0.5)
    clickNotNow()
    time.sleep(0.5)
    clickOldest()
    time.sleep(1)
    clickNotNow()
    
    # 偵測是否有「更多留言」（第一層），若有則點擊
    while len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_0"]'))>0:
        for i in driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_0"]'):
            driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            # 若彈出註冊視窗點擊not now!
            clickNotNow()
            try:
                i.click()
                time.sleep(1)
            except:
                time.sleep(0.5)   
    # 偵測是否有「更多留言的留言」（第二層），若有則點擊
    while len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'))>0:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        for i in driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'):
            driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            # 若彈出註冊視窗點擊not now!
            clickNotNow()
            try:
                i.click()
                time.sleep(1)
            except:
                time.sleep(0.5)
    # 偵測是否有「更多留言的留言」（第二層），若有則點擊
    while len(driver.find_elements_by_xpath('//a[@class="_5v47 fss"]'))>0:
        for i in driver.find_elements_by_xpath('//a[@class="_5v47 fss"]'):
            driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            # 若彈出註冊視窗點擊not now!
            try:
                i.click()
                time.sleep(0.5)
            except:
                time.sleep(0.5)

## 爬取貼文內容與留言資訊

In [4]:
# 貼文留言數
def Post_Name(userContent):
    return userContent.find('img').attrs['aria-label']

# 貼文留言數
def Post_ID(userContent):
    return userContent.find('a', {'class':'_5pb8 o_c3pynyi2g _8o _8s lfloat _ohe'}).attrs['href'].split('/?ref')[0].split('https://www.facebook.com/')[1]

# 貼文時間
def Post_Time(userContent):
    try:
        Time = userContent.find('abbr').attrs['title']
        Time = datetime.datetime.strptime(Time, '%m/%d/%y, %I:%M %p')
        Time = Time.strftime("%Y-%m-%d %H:%M")
    except:
        Time = "Error, Please check this post's condiction!" 
    return Time

# 貼文內容
def Post_Content(userContent):
    try:
        Content = userContent.find('div', {'class':'_5pbx userContent _3576'}).text
    except:
        Content = "There's No Text Content!"
    return Content

# 貼文留言數
def Post_Comments(userContent):
    try:
        CommentCounts = userContent.find('a', {'data-testid':'UFI2CommentsCount/root'}).text.split(' ',2)[0]
        if 'K' in CommentCounts:
            CommentCounts = int(float(CommentCounts.split('K')[0])*1000)
        else:
            CommentCounts = int(CommentCounts)
    except:
        CommentCounts = 0 
    return CommentCounts

# 貼文分享數
def Post_Shares(userContent):
    try:
        ShareCounts = userContent.find('span', {'class':'_355t _4vn2'}).text.split(' ',2)[0]
        if 'K' in ShareCounts:
            ShareCounts = int(float(ShareCounts.split('K')[0])*1000)
        else:
            ShareCounts = int(ShareCounts)
    except:
        ShareCounts = 0
    return ShareCounts

# 貼文按讚心情數
def Post_Likes(userContent):
    try:
        LikeCounts = userContent.find('span', {'data-testid':'UFI2TopReactions/tooltip_LIKE'}).find('a').attrs['aria-label'].split(' ',2)[0]
        if 'K' in LikeCounts:
            LikeCounts = int(float(LikeCounts.split('K')[0])*1000)
        else:
            LikeCounts = int(LikeCounts)
    except:
        LikeCounts = 0
    return LikeCounts

# 貼文愛心心情數
def Post_Loves(userContent):
    try:
        LoveCounts = userContent.find('span', {'data-testid':'UFI2TopReactions/tooltip_LOVE'}).find('a').attrs['aria-label'].split(' ',2)[0]
        if 'K' in LoveCounts:
            LoveCounts = int(float(LoveCounts.split('K')[0])*1000)
        else:
            LoveCounts = int(LoveCounts)
    except:
        LoveCounts = 0
    return LoveCounts

# 貼文哈哈心情數
def Post_Hahas(userContent):
    try:
        HahaCounts = userContent.find('span', {'data-testid':'UFI2TopReactions/tooltip_HAHA'}).find('a').attrs['aria-label'].split(' ',2)[0]
        if 'K' in HahaCounts:
            HahaCounts = int(float(HahaCounts.split('K')[0])*1000)
        else:
            HahaCounts = int(HahaCounts)
    except:
        HahaCounts = 0
    return HahaCounts

# 貼文Wow心情數
def Post_Wows(userContent):
    try:
        WowCounts = userContent.find('span', {'data-testid':'UFI2TopReactions/tooltip_WOW'}).find('a').attrs['aria-label'].split(' ',2)[0]
        if 'K' in WowCounts:
            WowCounts = int(float(WowCounts.split('K')[0])*1000)
        else:
            WowCounts = int(WowCounts)
    except:
        WowCounts = 0
    return WowCounts

# 貼文Sad心情數
def Post_Sads(userContent):
    try:
        SadCounts = userContent.find('span', {'data-testid':'UFI2TopReactions/tooltip_SORRY'}).find('a').attrs['aria-label'].split(' ',2)[0]
        if 'K' in SadCounts:
            SadCounts = int(float(SadCounts.split('K')[0])*1000)
        else:
            SadCounts = int(SadCounts)
    except:
        SadCounts = 0
    return SadCounts

# 貼文Angry心情數
def Post_Angrys(userContent):
    try:
        AngryCounts = userContent.find('span', {'data-testid':'UFI2TopReactions/tooltip_ANGER'}).find('a').attrs['aria-label'].split(' ',2)[0]
        if 'K' in AngryCounts:
            AngryCounts = int(float(AngryCounts.split('K')[0])*1000)
        else:
            AngryCounts = int(AngryCounts)
    except:
        AngryCounts = 0
    return AngryCounts

# 文章內容與互動摘要
def PostInfo(soup):
    # 貼文區
    userContent = soup.find('div', {'class':'_5pcr userContentWrapper'})
    PostContent = pd.DataFrame(data = [{'Name':Post_Name(userContent),
                                        'ID':Post_ID(userContent),
                                        'Time':Post_Time(userContent),
                                        'Content':Post_Content(userContent),
                                        'Comments':Post_Comments(userContent),
                                        'Shares':Post_Shares(userContent),
                                        'Likes':Post_Likes(userContent),
                                        'Loves':Post_Loves(userContent),
                                        'Hahas':Post_Hahas(userContent),
                                        'Wows':Post_Wows(userContent),
                                        'Sads':Post_Sads(userContent),
                                        'Angrys':Post_Angrys(userContent),
                                        'Updatetime':datetime.datetime.now().strftime("%Y-%m-%d %H:%M"),
                                        'Link':driver.current_url}],
                            columns = ['Name', 'ID', 'Time', 'Content', 'Comments', 'Shares', 'Likes', 'Loves', 'Hahas', 'Wows', 'Sads', 'Angrys', 'Updatetime', 'Link'])
    return PostContent

# 留言內容
def Comment_Content(element):
    try:
        Content = element.find('span', {'dir':'ltr'}).text
    except:
        Content = 'img'
    return Content

def CommentsInfo(soup):  
    PostComments = pd.DataFrame()
    userContent = soup.find('div', {'class':'_5pcr userContentWrapper'})
    try:
        for i in userContent.select('ul._7a9a > li'):
            # 先抓留言並放在Comment
            Comment = pd.DataFrame(data=[{'ID':i.find('a', {'class':' _3mf5 _3mg0'}).attrs['data-hovercard'].split('id=',2)[1],
                                          'Name':i.find('img').attrs['alt'],
                                          'Time':datetime.datetime.strptime(i.find('abbr',{'class':'livetimestamp'}).attrs['data-tooltip-content'], '%A, %B %d, %Y at %I:%M %p').strftime("%Y-%m-%d %H:%M"),
                                          'Content':Comment_Content(i),
                                          'RepID':userContent.find('div', {'class':'_5pcp _5lel _2jyu _232_'}).attrs['id'].split(';')[0].split('feed_subtitle_')[-1],
                                          'RepName':userContent.find('img').attrs['aria-label'],
                                          'Link':driver.current_url}],
                                   columns = ['ID', 'Name', 'Time', 'Content', 'RepID', 'RepName', 'Link'])
            PostComments = pd.concat([PostComments, Comment], ignore_index=True)
            # 留言的留言
            for j in i.findAll('div', {'data-testid':'UFI2Comment/root_depth_1'}):
                Comment = pd.DataFrame(data=[{'ID':j.find('a', {'class':' _3mf5 _3mg1'}).attrs['data-hovercard'].split('id=',2)[1],
                                              'Name':j.find('img').attrs['alt'],
                                              'Time':datetime.datetime.strptime(j.find('abbr',{'class':'livetimestamp'}).attrs['data-tooltip-content'], '%A, %B %d, %Y at %I:%M %p').strftime("%Y-%m-%d %H:%M"),
                                              'Content':Comment_Content(j),
                                              'RepID':i.find('a', {'class':' _3mf5 _3mg0'}).attrs['data-hovercard'].split('id=',2)[1],
                                              'RepName':i.find('img').attrs['alt'],
                                              'Link':driver.current_url}],
                                       columns = ['ID', 'Name', 'Time', 'Content', 'RepID', 'RepName', 'Link'])
                PostComments = pd.concat([PostComments, Comment], ignore_index=True)
        PostComments['Updatetime'] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
    except:
        print('Crawl Comments Failed!')
    return PostComments

# 更新貼文與留言資訊
將最新抓到的貼文併回Post表格，並以ID與Time作為關鍵(key)值，取最新抓取的資料(Updatetime)

In [5]:
def UpdateData(DateFrame_o,DateFrame_n):
    DataFrame = pd.concat([DateFrame_o, DateFrame_n], ignore_index=True)
    DataFrame = DataFrame.sort_values(by = 'Updatetime', ascending = False)
    DataFrame = DataFrame.drop_duplicates(subset = ['ID', 'Time', 'Content'],
                                          keep= 'first',
                                          inplace = False)
    return DataFrame

# 爬資料
## 讀取先前資料
若在桌面偵測不到檔案，就創造新的dataframe

In [6]:
try:
    Posts = pd.read_pickle('C:/Users/TL_Yu/Desktop/Posts.plk')
except:
    Posts = pd.DataFrame(columns=['Name', 'ID', 'Time', 'Content', 'Comments', 'Shares', 'Likes', 'Loves', 'Hahas', 'Wows', 'Sads', 'Angrys', 'Updatetime', 'Link'])
try:
    Comments = pd.read_pickle('C:/Users/TL_Yu/Desktop/Comments.plk')
except:
    Comments = pd.DataFrame(columns = ['ID', 'Name', 'Time', 'Content', 'RepID', 'RepName', 'Link', 'Updatetime'])
    
urls = ['https://www.facebook.com/tstartel/',
        'https://www.facebook.com/chtmobile/',
        'https://www.facebook.com/taiwanmobile/',
        'https://www.facebook.com/fareastone/',
        'https://www.facebook.com/Aptg.tw/']

## 產出待爬清單

In [7]:
driver = webdriver.Chrome()
driver.get('https://www.facebook.com/')
time.sleep(1)
driver.find_element_by_partial_link_text('English').click()

In [8]:
CarwlList = CarwlList(urls=urls, n=10, Posts = Posts)
CarwlList

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)


Unnamed: 0,Link,Time,CommentCounts_c,CommentCounts_p,_merge
0,https://www.facebook.com/tstartel/posts/328473...,2019-05-19 20:52,65,,left_only
7,https://www.facebook.com/tstartel/posts/318846...,2019-04-11 04:48,59,,left_only
12,https://www.facebook.com/tstartel/videos/58966...,2019-03-31 21:03,0,,left_only
13,https://www.facebook.com/chtmobile/photos/a.26...,2019-05-21 02:00,1,,left_only
14,https://www.facebook.com/chtmobile/posts/27207...,2019-05-20 02:00,0,,left_only
15,https://www.facebook.com/chtmobile/photos/a.26...,2019-05-17 02:00,2,,left_only
19,https://www.facebook.com/chtmobile/photos/a.26...,2019-05-14 02:42,1,,left_only
20,https://www.facebook.com/chtmobile/posts/27033...,2019-05-08 02:00,2,,left_only
21,https://www.facebook.com/chtmobile/videos/6635...,2019-04-29 03:18,5,,left_only
28,https://www.facebook.com/chtmobile/photos/a.26...,2019-04-19 02:00,4,,left_only


如同上面的說明，當CheckLists「merge」欄位為「left_only」時表示該貼文是新貼文；

而當「merge」欄位為「both」，但「Comments_c」的值大於「Comments_p」時，表示該貼文是舊資料但有新留言。

以下將截取這兩類的貼文作為爬取清單

## 執行爬蟲

In [9]:
for i in CarwlList.Link:
    print('Dealing with: ' + i)
    driver.get(i)
    try:
        PostExpand()
        time.sleep(2)
        print('Expand Succed!')
        try:
            soup = BeautifulSoup(driver.page_source)
            time.sleep(1)
            nPost = PostInfo(soup)
            Posts = UpdateData(DateFrame_o = Posts, DateFrame_n = nPost)
            print('Update PostInfo complete!')
            try:
                nComments = CommentsInfo(soup)
                Comments = UpdateData(DateFrame_o = Comments,DateFrame_n = nComments)
                print('Update CommentsInfo complete!')
            except:
                print('Update CommentsInfo Failed!')
        except:
            print('Crawl Post or Comments Failed!')
    except:
        print('Expand Failed!')
    gc.collect()
    print('Time Log: ' + datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") + '\n\n------------------')

Dealing with: https://www.facebook.com/tstartel/posts/3284730548219636
Expand Failed!
Time Log: 2019-05-27 12:47:44

------------------
Dealing with: https://www.facebook.com/tstartel/posts/3188462841179741
Expand Failed!
Time Log: 2019-05-27 12:47:52

------------------
Dealing with: https://www.facebook.com/tstartel/videos/589662731510587/
Expand Failed!
Time Log: 2019-05-27 12:47:57

------------------
Dealing with: https://www.facebook.com/chtmobile/photos/a.264830403546609/2722896354406656/
Expand Failed!
Time Log: 2019-05-27 12:48:06

------------------
Dealing with: https://www.facebook.com/chtmobile/posts/2720795567950068
Expand Failed!
Time Log: 2019-05-27 12:48:11

------------------
Dealing with: https://www.facebook.com/chtmobile/photos/a.264830403546609/2722488054447486/
Expand Failed!
Time Log: 2019-05-27 12:48:19

------------------
Dealing with: https://www.facebook.com/chtmobile/photos/a.264830403546609/2715269541836004/
Expand Failed!
Time Log: 2019-05-27 12:48:26

--

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  


Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 13:17:33

------------------
Dealing with: https://www.facebook.com/chtmobile/photos/a.264830403546609/2564181830278110/
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 13:17:48

------------------
Dealing with: https://www.facebook.com/chtmobile/posts/2542217335807893
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:17:20

------------------
Dealing with: https://www.facebook.com/chtmobile/posts/2543686328994327
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:17:43

------------------
Dealing with: https://www.facebook.com/chtmobile/photos/a.264830403546609/2545583138804646/
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:18:16

------------------
Dealing with: https://www.facebook.com/chtmobile/photos/a.264830403

Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:28:53

------------------
Dealing with: https://www.facebook.com/taiwanmobile/photos/a.1448377238716046/2473289892891437/
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:33:20

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2471318653088561
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:33:34

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2471300693090357
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:33:48

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2471060423114384
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:34:07

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2471291746424585

Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:51:17

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2457899267763833
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:51:31

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2457908234429603
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:51:47

------------------
Dealing with: https://www.facebook.com/taiwanmobile/photos/a.1448377238716046/2457723184448108/
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:52:05

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2457593001127793
Expand Succed!
Update PostInfo complete!
Update CommentsInfo complete!
Time Log: 2019-05-27 14:52:19

------------------
Dealing with: https://www.facebook.com/taiwanmobile/posts/2457114721175621

WebDriverException: Message: chrome not reachable
  (Session info: chrome=74.0.3729.169)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 10.0.17763 x86_64)


In [10]:
Posts = Posts.reset_index(drop=True)
Posts = Posts.loc[:,['Name', 'ID', 'Time', 'Content', 'Comments', 'Shares', 'Likes', 'Loves', 'Hahas', 'Wows', 'Sads', 'Angrys', 'Updatetime', 'Link']]
Posts

Unnamed: 0,Name,ID,Time,Content,Comments,Shares,Likes,Loves,Hahas,Wows,Sads,Angrys,Updatetime,Link
0,台灣大哥大與你在一起,taiwanmobile,2019-04-22 09:00,#新官上任第三把火 🔥熱騰騰的第3把火，你沒看錯!🔥 林之晨 要送你 『靈 芝 橙』 喜上...,130,64,305,0,53,0,0,30,2019-05-27 14:53,https://www.facebook.com/taiwanmobile/posts/24...
1,台灣大哥大與你在一起,taiwanmobile,2019-04-22 17:16,【🕘8點晨報】 今年目前為止蘋果的發表會有別於以往，到現在還沒有正式發表硬體產品，反而使得大...,3,6,83,1,0,0,0,0,2019-05-27 14:52,https://www.facebook.com/taiwanmobile/posts/24...
2,台灣大哥大與你在一起,taiwanmobile,2019-04-22 22:39,小編花了好多個加班的夜晚完成的『#靈芝橙』加碼送專案，收到大家雪片飛來的心聲，雖然玻璃心已碎...,12,12,129,3,5,0,0,0,2019-05-27 14:52,https://www.facebook.com/taiwanmobile/photos/a...
3,台灣大哥大與你在一起,taiwanmobile,2019-04-23 17:00,【🕘8點晨報】 報稅季節即將來臨，網路報稅是蠻多人的報稅選項之一，使得釣魚網站也在伺機而動，...,1,5,53,0,0,2,0,0,2019-05-27 14:51,https://www.facebook.com/taiwanmobile/posts/24...
4,台灣大哥大與你在一起,taiwanmobile,2019-04-23 04:00,"狂賀🎊 郭婞淳 KUO, Hsing-Chun舉破世界紀錄👏👏👏 #台灣之光 #富邦_台灣大...",8,11,242,3,0,2,0,0,2019-05-27 14:51,https://www.facebook.com/taiwanmobile/posts/24...
5,台灣大哥大與你在一起,taiwanmobile,2019-04-23 21:26,[長安妖世繪　小額消費送好康] 📣RPG卡牌遊戲二次元奇幻題材手遊，收妖施法人人都是捉妖大...,5,4,48,0,0,0,0,0,2019-05-27 14:51,https://www.facebook.com/taiwanmobile/photos/a...
6,台灣大哥大與你在一起,taiwanmobile,2019-04-25 03:01,📣旗艦王者 大降價拉📣 最近小編的閨密們都在討論自己的拍照技術📸 如果不靠修修修修修的話 原...,20,10,149,1,0,0,0,1,2019-05-27 14:50,https://www.facebook.com/taiwanmobile/photos/a...
7,台灣大哥大與你在一起,taiwanmobile,2019-04-25 17:00,【🕘8點晨報】 OPPO Reno總算在台正式露面，這次採用全新的設計，著實讓人驚豔，在鏡頭...,5,6,80,1,0,0,0,0,2019-05-27 14:48,https://www.facebook.com/taiwanmobile/posts/24...
8,台灣大哥大與你在一起,taiwanmobile,2019-04-26 17:00,【🕘8點晨報】 春夏就是個賞花的季節，在櫻花、海芋和桐花等陸續登場，緊接著就是繡球花季，色彩...,5,9,116,1,0,0,0,0,2019-05-27 14:47,https://www.facebook.com/taiwanmobile/posts/24...
9,台灣大哥大與你在一起,taiwanmobile,2019-04-26 21:00,『舊手機完全是個尷尬的存在』 買新機很嗨森，但是舊手機該何去何從 (👀 你的舊手機都去哪了?...,31,10,126,1,4,0,0,0,2019-05-27 14:47,https://www.facebook.com/taiwanmobile/posts/24...


In [11]:
Comments = Comments.reset_index(drop=True)
Comments = Comments.loc[:,['ID', 'Name', 'Time', 'Content', 'RepID', 'RepName', 'Link', 'Updatetime']]
Comments

Unnamed: 0,ID,Name,Time,Content,RepID,RepName,Link,Updatetime
0,100022308163111,林惠珍,2019-04-23 13:40,img,1448357445384692,台灣大哥大與你在一起,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
1,1448357445384692,台灣大哥大與你在一起,2019-04-24 20:22,img,100022308163111,林惠珍,https://www.facebook.com/taiwanmobile/posts/24...,2019-05-27 14:52
2,1247673748,Miula Hung,2019-04-23 18:37,小編蠻有創意的，哈,1448357445384692,台灣大哥大與你在一起,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
3,100001889527106,張庭榮,2019-04-23 13:55,再次上來搞笑？送完靈芝橙 再加送LINE POINTS 100點？台灣大哥大還真有誠意～哈哈...,1448357445384692,台灣大哥大與你在一起,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
4,1448357445384692,台灣大哥大與你在一起,2019-04-23 14:08,謝謝庭榮哥，您的體諒小編收下了❤️老闆很挺小編的創意，相信還是有挺我的粉絲的💪💪,100001889527106,張庭榮,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
5,100001889527106,張庭榮,2019-04-23 14:14,台灣大哥大與你在一起 這不叫創意啦，辦這種活動，等於看不起用戶好嗎？送靈芝1瓶＋柳橙汁1瓶＋...,100001889527106,張庭榮,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
6,1448357445384692,台灣大哥大與你在一起,2019-04-23 17:13,好喔！,100001889527106,張庭榮,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
7,100014224083864,Mike Lu,2019-04-23 18:22,這圖蠻好笑的傳說台台小編兩人著急的熱鍋上的螞蟻,1448357445384692,台灣大哥大與你在一起,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
8,100026280560132,燕巢,2019-04-23 17:40,不錯阿 靈芝橙很有創意 可能大家期待的是零月付XD~,1448357445384692,台灣大哥大與你在一起,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52
9,1448357445384692,台灣大哥大與你在一起,2019-04-23 20:45,Mark Lin 公司同事只是感情很好而已啦😅，有機會希望也可以讓您感受我們的溫暖喔❤️,100009492064507,Mark Lin,https://www.facebook.com/taiwanmobile/photos/a...,2019-05-27 14:52


In [12]:
Posts.to_pickle('C:/Users/TL_Yu/Desktop/Posts.plk')
Comments.to_pickle('C:/Users/TL_Yu/Desktop/Comments.plk')

In [24]:
#Posts.to_csv('C:/Users/TL_Yu/Desktop/Posts.csv')
#Comments.to_csv('C:/Users/TL_Yu/Desktop/Comments.csv')

# 維護測試區

In [13]:
driver = webdriver.Chrome()
driver.get('https://www.facebook.com/')
time.sleep(1)
driver.find_element_by_partial_link_text('English').click()

In [62]:
# https://www.facebook.com/tstartel/posts/3284730548219636
# https://www.facebook.com/tstartel/videos/1100239263492734/?permPage=1
driver.get('https://www.facebook.com/tstartel/posts/3274403725918985')
ClosePopup()
ClickOldest()
MoreComments()
MoreReplies2()

Click More Comments times： 0
Click More Comments times： 1
Click More Comments times： 2
Click More Comments times： 3
Click More Comments times： 4
Click More Comments times： 5
Click More Comments times： 6
Click More Comments times： 7
Click More Comments times： 8
Click More Comments times： 9
Click More Comments times： 10
Click More Comments times： 11
Click More Comments times： 12
Click More Replies times： 1
Click More Replies times： 2
Click More Replies times： 3
Click More Replies times： 4
Click More Replies times： 5
Click More Replies times： 6
Click More Replies times： 7
Click More Replies times： 8
Click More Replies times： 9
Click More Replies times： 10
Click More Replies times： 11
Click More Replies times： 12
Click More Replies times： 13
Click More Replies times： 14
Click More Replies times： 15
Click More Replies times： 16
Click More Replies times： 17
Click More Replies times： 18
Click More Replies times： 19
Click More Replies times： 20
Click More Replies times： 21
Click More Replies t

In [61]:
# 關閉彈窗
def ClosePopup():
    time.sleep(1)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(1)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    clickNotNow()
    driver.find_element_by_xpath('//div[@class="_5pcr userContentWrapper"]//a[@data-testid="UFI2CommentsCount/root"]').click()
    time.sleep(1)
    clickNotNow()
    time.sleep(0.5)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(0.5)
    clickNotNow()
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(0.5)
    clickNotNow()

# 選擇按照New或Oldest來顯示留言(選擇最相關或所有留言都不會會真的顯示所有留言)
def ClickOldest():
    time.sleep(2)
    driver.find_element_by_xpath('//a[@data-testid="UFI2ViewOptionsSelector/link"]').click()
    time.sleep(1)
    try:
        driver.find_element_by_partial_link_text('Comments shown in chronological order with the oldest comments at the top.').click()
    except:
        try:
            driver.find_element_by_partial_link_text('New comments and those with new replies appear at the top.').click()
        except:
            print('Plz, Check this post arragne type!')
    time.sleep(1)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    clickNotNow()

# 偵測是否有「更多留言」（第一層），若有則點擊
def MoreComments():
    k = len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_0"]'))
    # 卡住的次數超過100次就停止
    l = 0
    while (k != 0) & (l <= 150):
        print('Click More Comments： ' + str(l) + ' times.')
        try:
            driver.find_element_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_0"]').click()
            time.sleep(1)
            # 若彈出註冊視窗點擊not now!
            clickNotNow()
            k = len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_0"]'))
        except:
            time.sleep(0.1)
        finally:
            l += 1
    time.sleep(1)

# 偵測是否有「更多回覆」（第二層），若有則點擊
def MoreReplies():
    k = len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'))
    # 卡住的次數超過100次就停止
    l = 0
    while (k != 0) & (l <= 150):
        l += 1
        print('Click More Replies： ' + str(l) + ' times.')
        for i in driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'):
            try:
                i.click()
                time.sleep(0.1)
                clickNotNow()
                k = len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'))
            except:
                time.sleep(0.1)
        time.sleep(1)
        
# 偵測是否有「更多回覆」（第二層），若有則點擊
def MoreReplies2():
    k = len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'))
    # 卡住的次數超過100次就停止
    l = 0
    while (k != 0) & (l <= 150):
        l += 1
        print('Click More Replies： ' + str(l) + ' times.')
        driver.find_element_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]').click()
        time.sleep(1)
        k = len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'))
    time.sleep(1)
    
# 打開各篇貼文並展開所有留言 與 留言的留言
def PostExpand():
    ClosePopup()
    ClickOldest()
 
    # 偵測是否有「更多留言的留言」（第二層），若有則點擊
    while len(driver.find_elements_by_xpath('//a[@data-testid="UFI2CommentsPagerRenderer/pager_depth_1"]'))>0:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

        
        
        
    # 偵測是否有「更多留言的留言」（第二層），若有則點擊
    while len(driver.find_elements_by_xpath('//a[@class="_5v47 fss"]'))>0:
        for i in driver.find_elements_by_xpath('//a[@class="_5v47 fss"]'):
            driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            # 若彈出註冊視窗點擊not now!
            try:
                i.click()
                time.sleep(0.5)
            except:
                time.sleep(0.5)

In [15]:
PostExpand()

Plz, Check this post arragne type!


In [15]:
soup = BeautifulSoup(driver.page_source)

In [16]:
PostInfo(soup)

Unnamed: 0,Name,ID,Time,Content,Comments,Shares,Likes,Loves,Hahas,Wows,Sads,Angrys,Updatetime,Link
0,台灣之星,tstartel,2019-05-25 21:01,想知道走在路上被行注目禮👀的感覺嗎? 明天 #週1福利日 美女速成班 看這👉 http:/...,42,1,225,8,0,0,0,0,2019-05-27 01:05,https://www.facebook.com/tstartel/photos/a.854...


In [26]:
soup = BeautifulSoup(driver.page_source)
CommentsInfo(soup)

Unnamed: 0,ID,Name,Time,Content,RepID,RepName,Link,Updatetime
0,100008036701674,陳國進,2019-05-26 12:44,我今天要去屏東市😆😆😆,360044337354953,台灣之星,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
1,360044337354953,台灣之星,2019-05-26 13:10,要做好防曬唷!!!,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
2,100000293657248,賴清溪,2019-05-26 13:16,貼心😏,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
3,360044337354953,台灣之星,2019-05-26 13:28,🤗,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
4,100008036701674,陳國進,2019-05-26 17:26,賴清溪 真的很貼心呢～,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
5,100008036701674,陳國進,2019-05-26 17:26,台灣之星 好哦～謝謝小編,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
6,360044337354953,台灣之星,2019-05-26 17:35,如何~今天有沒有看見什麼有趣的事呢~~~國進~~~,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
7,100008036701674,陳國進,2019-05-26 19:33,台灣之星 沒有~~但以上速度超穩~~,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
8,360044337354953,台灣之星,2019-05-26 19:41,期待你分享小祕密唷(〃ω〃),100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14
9,100008036701674,陳國進,2019-05-26 19:48,台灣之星 下載80.5 上傳45.0,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.854...,2019-05-27 01:14


In [53]:
for i in :
    print('Dealing with: ' + i)
    
    try:
        PostExpand()
        time.sleep(1)
        print('Expand Succed!')
        try:
            soup = BeautifulSoup(driver.page_source)
            time.sleep(1)
            nPost = PostInfo(soup)
            Posts = UpdateData(DateFrame_o = Posts, DateFrame_n = nPost)
            print('Update PostInfo complete!')
            try:
                nComments = CommentsInfo(soup)
                Comments = UpdateData(DateFrame_o = Comments,DateFrame_n = nComments)
                print('Update CommentsInfo complete!')
            except:
                print('Update CommentsInfo Failed!')
        except:
            print('Crawl Post or Comments Failed!')
    except:
        print('Expand Failed!')
    gc.collect()
    print('Time Log: ' + datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") + '\n\n------------------')

In [23]:
def CommentsInfo(soup):  
    PostComments = pd.DataFrame()
    userContent = soup.find('div', {'class':'_5pcr userContentWrapper'})
    try:
        for i in userContent.select('ul._7a9a > li'):
            # 先抓留言並放在Comment
            Comment = pd.DataFrame(data=[{'ID':i.find('a', {'class':' _3mf5 _3mg0'}).attrs['data-hovercard'].split('id=',2)[1],
                                          'Name':i.find('img').attrs['alt'],
                                          'Time':datetime.datetime.strptime(i.find('abbr',{'class':'livetimestamp'}).attrs['data-tooltip-content'], '%A, %B %d, %Y at %I:%M %p').strftime("%Y-%m-%d %H:%M"),
                                          'Content':Comment_Content(i),
                                          'RepID':userContent.find('div', {'class':'_5pcp _5lel _2jyu _232_'}).attrs['id'].split(';')[0].split('feed_subtitle_')[-1],
                                          'RepName':userContent.find('img').attrs['aria-label'],
                                          'Link':driver.current_url}],
                                   columns = ['ID', 'Name', 'Time', 'Content', 'RepID', 'RepName', 'Link'])
            PostComments = pd.concat([PostComments, Comment], ignore_index=True)
            # 留言的留言
            for j in i.findAll('div', {'data-testid':'UFI2Comment/root_depth_1'}):
                Comment = pd.DataFrame(data=[{'ID':j.find('a', {'class':' _3mf5 _3mg1'}).attrs['data-hovercard'].split('id=',2)[1],
                                              'Name':j.find('img').attrs['alt'],
                                              'Time':datetime.datetime.strptime(j.find('abbr',{'class':'livetimestamp'}).attrs['data-tooltip-content'], '%A, %B %d, %Y at %I:%M %p').strftime("%Y-%m-%d %H:%M"),
                                              'Content':GetPost_CommentContent(j),
                                              'RepID':i.find('a', {'class':' _3mf5 _3mg0'}).attrs['data-hovercard'].split('id=',2)[1],
                                              'RepName':i.find('img').attrs['alt'],
                                              'Link':driver.current_url}],
                                       columns = ['ID', 'Name', 'Time', 'Content', 'RepID', 'RepName', 'Link'])
                PostComments = pd.concat([PostComments, Comment], ignore_index=True)
        PostComments['Updatetime'] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
    except:
        print('Crawl Comments Failed!')
    return PostComments

In [24]:
soup = BeautifulSoup(driver.page_source)
CommentsInfo(soup)

Crawl Comments Failed!


Unnamed: 0,ID,Name,Time,Content,RepID,RepName,Link
0,100008036701674,陳國進,2019-05-26 12:44,我今天要去屏東市😆😆😆,360044337354953,台灣之星,https://www.facebook.com/tstartel/photos/a.854...


In [56]:
soup = BeautifulSoup(driver.page_source)
PostInfo(soup)

Unnamed: 0,Name,ID,Time,Content,Comments,Shares,Likes,Loves,Hahas,Wows,Sads,Angrys,Updatetime,Link
0,台灣之星,tstartel,2019-05-15 22:11,台灣之星2019全新品牌主張 你的電信就該是這個樣子⚡今日登場 全民一起辦活動⚡同步開跑‼️...,1700,88,732,12,0,0,0,20,2019-05-27 00:43,https://www.facebook.com/tstartel/posts/327440...


In [57]:
CommentsInfo(soup)

Unnamed: 0,ID,Name,Time,Content,RepID,RepName,Link,Updatetime
0,100004563705267,李維尼,2019-05-16 13:14,Line 好友不是說299只到今天上午09:59。,360044337354953,台灣之星,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
1,360044337354953,台灣之星,2019-05-16 13:25,我們聽到大家的心聲~加碼到5/31唷!!!!,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
2,100003595572374,魏坤輝,2019-05-16 13:33,網頁進不去,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
3,360044337354953,台灣之星,2019-05-16 13:57,小編測試可以唷~~http://bit.ly/2HihMLD,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
4,100004563705267,李維尼,2019-05-16 14:03,台灣之星 請問我去年有辦188了，但188是21m吃到飽，現在299是不限速吃到飽，我可以去...,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
5,360044337354953,台灣之星,2019-05-16 14:23,老客戶免費勁速體驗底家~快去申請體驗吧~https://doc.tstartel.com/e...,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
6,100004563705267,李維尼,2019-05-16 14:26,台灣之星 謝謝，已申請，OK的話，馬上來辦。,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
7,100004563705267,李維尼,2019-05-16 14:39,台灣之星 看來我家是各大電信悲劇，競速是23.9/11.6，和188吃到飽，差不多。,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
8,360044337354953,台灣之星,2019-05-16 14:50,行動網路容易受使用地點附近的房屋密集度、建築裝潢使用材質、附近使用人數多寡…等等因素影響，我...,100004563705267,李維尼,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44
9,100001860123628,Wu Hung Chuan,2019-05-16 13:15,499沒有比較划算，你們的費率本來就超過1分1元，會打到499的算起來還是499為什麼要辦4...,360044337354953,台灣之星,https://www.facebook.com/tstartel/posts/327440...,2019-05-27 00:44


In [50]:
clickOldest()

In [55]:
Posts = UpdateData(DateFrame_o = Posts, DateFrame_n = nPost)
Posts

Unnamed: 0,Name,ID,Time,Content,CommentCounts,Shares,Like,Love,Haha,Wow,Sad,Angry,Updatetime,Link
2,台灣之星,360044337354953,2019-05-18 21:00,好想去日本🇯🇵看繡球花唷 小編們聽 #U姐編 嘟嘟囔囔一週快煩死了 交換個眼色，作戰計劃開始...,55,13,626,8,2,0,0,0,2019-05-20 23:47,https://www.facebook.com/tstartel/photos/a.413...
0,台灣之星,360044337354953,2019-05-19 02:00,#週1福利日 #每週一1230 #週週登場 有沒有人跟 #00C編 一樣 手機沒電會狂症發作...,13,4,299,3,0,5,0,0,2019-05-20 23:46,https://www.facebook.com/tstartel/photos/a.413...
1,台灣之星,360044337354953,2019-05-19 20:52,反正只要一兩塊錢 買個袋子裝一裝，比較方便啦😆 環保餐具用完還要洗，好麻煩 直接拿竹筷和塑膠...,35,13,79,1,0,0,3,0,2019-05-20 23:45,https://www.facebook.com/tstartel/posts/328473...


In [56]:
Comments = UpdateData(DateFrame_o = Comments,DateFrame_n = nComments)
Comments

Unnamed: 0,ID,Name,Time,Content,RepID,RepName,Link,Updatetime
48,360044337354953,台灣之星,2019-05-19 15:52,看美美照片一樣心情會hen好,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
61,360044337354953,台灣之星,2019-05-19 19:40,img,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
70,100008036701674,陳國進,2019-05-20 17:18,台灣之星 還好我16:00回到家過一下才下雨~~,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
69,360044337354953,台灣之星,2019-05-20 16:48,有沒有記得帶雨傘呀~,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
68,100008036701674,陳國進,2019-05-20 16:44,台灣之星 小編，下大雨了,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
67,360044337354953,台灣之星,2019-05-19 21:08,加油加油(๑˃̵ᴗ˂̵)ﻭ其實上課是很幸福D~~晚安😉😉,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
66,100008036701674,陳國進,2019-05-19 21:03,台灣之星 明天又要上課，小編晚安~~,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
65,360044337354953,台灣之星,2019-05-19 20:47,早點洗洗睡明天又是新的一天~(✿◠‿◠),100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
64,100008036701674,陳國進,2019-05-19 20:42,台灣之星 是啊～,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47
63,360044337354953,台灣之星,2019-05-19 20:41,國進到家啦❓,100008036701674,陳國進,https://www.facebook.com/tstartel/photos/a.413...,2019-05-20 23:47


In [16]:
driver.quit()

# 其他

In [18]:
# 原本想要登入的方式獲取連結，但FB會偵測異常行為，暫時先不登入
# 關閉Chrome的「通知」提醒
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-notifications")

driver = webdriver.Chrome(options=chrome_options)
url = 'https://www.facebook.com/'
driver.get(url)
time.sleep(3)

# 啟動瀏覽器並登入Facebook
username = driver.find_element_by_id('email')
username.send_keys('61034b001@gms.ndhu.edu.tw')
passwd=driver.find_element_by_id('pass')
passwd.send_keys('19920309')
button=driver.find_element_by_id('loginbutton')
button.click()