# Facebook Data Crawling
In this notebook, we will crawl posts from a Facebook page using the facebook-scraper library. The library scrapes Facebook's web pages directly rather than calling the Facebook Graph API, so no API key is needed.

## Install the required library
The scraping is done with the facebook-scraper library; pandas and NumPy are used later to store the results. We install all three with pip.

In [1]:
%pip install facebook_scraper pandas numpy

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [2]:
from facebook_scraper import get_posts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Crawl the data using facebook_scraper
Now we can get the data from Facebook using the facebook_scraper library. We will use the get_posts function to fetch the posts from the fan page. This function returns a generator that yields one dictionary per post, so we collect the posts in a list as we iterate. The resulting list of dictionaries can be saved to a JSON file; the cells further down save it as CSV, Excel, and .npy files. More information about what you can do with the facebook_scraper library can be found here: https://github.com/kevinzg/facebook-scraper
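As a minimal sketch of writing a list of post dictionaries to a JSON file: the helper name `save_posts_json` and the sample data are made up for illustration and are not part of facebook_scraper. The posts contain `datetime` values, which the `json` module cannot serialize on its own, so the sketch converts them to strings with `default=str`.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def save_posts_json(posts, path):
    """Write an iterable of post dictionaries to a UTF-8 JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        # default=str converts datetime (and any other non-JSON type) to a string.
        # ensure_ascii=False keeps Vietnamese text and emoji readable in the file.
        json.dump(list(posts), f, ensure_ascii=False, indent=2, default=str)


# Stand-in data shaped like two fields of a scraped post (not real scraper output)
sample_posts = [{"post_id": "1", "time": datetime(2023, 11, 12, 12, 4, 16)}]

out_path = Path(tempfile.mkdtemp()) / "sample_posts.json"
save_posts_json(sample_posts, out_path)

# Read the file back to check the round trip
with open(out_path, encoding="utf-8") as f:
    loaded = json.load(f)
```

Note that the timestamps come back as strings, so they would need to be parsed again (for example with `pd.to_datetime`) if they are needed as dates.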

## Define variables
First we have to define some variables that we will be using throughout the notebook. 
- FANPAGE_LINK: The name of the fan page that we want to crawl, that is, the part of the page URL that follows facebook.com/. You can find it by opening the fan page and looking at the address bar. For example, the [Nintendo Switch](https://www.facebook.com/NintendoSwitch/) fan page is at https://www.facebook.com/NintendoSwitch/, so its value would be NintendoSwitch. The cell below crawls the SpaceSpeakersCM page.

- COOKIE_PATH: The path to the cookie file used to authenticate with Facebook. To obtain it, log into Facebook and export the cookies from your browser. In Chromium, for example, the extension [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid) can export them. Save the cookies to a file and use the path to that file as the value for COOKIE_PATH. <span style="color:red; font-weight:bold">USE COOKIES FROM A FAKE ACCOUNT, OTHERWISE YOUR REAL ACCOUNT MIGHT GET BANNED.</span>


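- FOLDER_PATH: The folder that the data will be saved to, given relative to the directory of this notebook. The folder must exist before the files are written, because the saving cells below do not create it.

- PAGES_NUMBER: The number of result pages to request from the fan page. Each page contains several posts; in the run shown below, 10 pages returned 100 posts.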
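The output folder can also be created from code before anything is written to it. The following is a small sketch using `pathlib`; the helper name `build_output_path` is hypothetical, and a temporary directory stands in for the notebook's directory so the example can run anywhere.

```python
import tempfile
from pathlib import Path


def build_output_path(folder, page_name, extension):
    """Create `folder` if it is missing and return folder/page_name.extension."""
    folder = Path(folder)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{page_name}.{extension}"


# A temporary directory stands in for the notebook's working directory
base = Path(tempfile.mkdtemp())
csv_path = build_output_path(base / "Data", "SpaceSpeakersCM", "csv")
```

Building paths this way also avoids depending on a trailing slash in the folder string, which plain string concatenation requires.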

In [3]:
FANPAGE_LINK = "SpaceSpeakersCM"
FOLDER_PATH = "Data/"
COOKIE_PATH = "./cookies.txt"
PAGES_NUMBER = 10 # Number of pages to crawl

In [4]:
post_list = []
for post in get_posts(FANPAGE_LINK,
                    options={"comments": True, "reactions": True, "allow_extra_requests": True},
                    extra_info=True, pages=PAGES_NUMBER, cookies=COOKIE_PATH):
    post_list.append(post)
    print(post)

{'post_id': '910863380635703', 'text': 'TRÒN 1 NĂM DIỄN RA KOSMIK 💥☄️\nMột trải nghiệm mà tất cả những người yêu quý âm nhạc của SpaceSpeakers sẽ không bao giờ quên!\n\nCảm ơn các bạn vì đã luôn đồng hành\nForever grateful for you bois 🔥\n\n#KOSMIK #SpaceSpeakers', 'post_text': 'TRÒN 1 NĂM DIỄN RA KOSMIK 💥☄️\nMột trải nghiệm mà tất cả những người yêu quý âm nhạc của SpaceSpeakers sẽ không bao giờ quên!\n\nCảm ơn các bạn vì đã luôn đồng hành\nForever grateful for you bois 🔥\n\n#KOSMIK #SpaceSpeakers', 'shared_text': '', 'original_text': None, 'time': datetime.datetime(2023, 11, 12, 12, 4, 16), 'timestamp': 1699765456, 'image': None, 'image_lowquality': 'https://scontent.fhan2-3.fna.fbcdn.net/v/t15.5256-10/370217378_865345835144534_2883399909253735568_n.jpg?stp=cp0_dst-jpg_e15_q65_s320x320&_nc_cat=102&ccb=1-7&_nc_sid=f3b36a&efg=eyJpIjoidCJ9&_nc_ohc=edNeGStcLk8AX-WQb1z&_nc_ht=scontent.fhan2-3.fna&oh=00_AfBpPWGoZLS8091wNDm4EHxC2Ocw9LIP1msFydYSCd2T-Q&oe=655AE66A', 'images': [], 'images_desc



{'post_id': '907540044301370', 'text': 'Space bois & Space girls,\nIts Maroon 5 🔥🔥🔥\nSee ya in PhuQuoc.\n\n#Vinwonders #Vinfast #Maroon5', 'post_text': 'Space bois & Space girls,\nIts Maroon 5 🔥🔥🔥\nSee ya in PhuQuoc.\n\n#Vinwonders #Vinfast #Maroon5', 'shared_text': '', 'original_text': None, 'time': datetime.datetime(2023, 11, 5, 13, 24, 12), 'timestamp': 1699165452, 'image': 'https://m.facebook.com/photo/view_full_size/?fbid=907538947634813&ref_component=mbasic_photo_permalink&ref_page=%2Fwap%2Fphoto.php&refid=13&_ft_=encrypted_tracking_data.0AY_9fg4KoajaaPMy3px9mlHDyOHCAL26vZ-lY_-F3_K0I8Rz10HRkB8PmdWsBgiAn6y6BNFys_rSUMMbWawBI8RJAm5xkAvNKoum-DKpV0hFiT7syTb7-jIGfNhgebmf0SCOqQpxajeJsJiEEnTjfBtatJ7vg6lcvD-tIS5raC2mpWWcpds1l7y1uilx44-55Dyy0yEbKYoEPV4m3dxzXNRiuERuuCj82EIsBt3K5b37qmhrrkQEMpUbEk8u2gwrB_0gJUx_RZWndZixWS4_FEU-_KrDsH9rklWY0AUT038B507jkiKglacS2FYJvXeiuy0t-aSGVVNVpW43AMWpGrei-\\-\\naJom1PhdATqupBtdyuZ_FHo9Tqeox7T_rchwCjhwy5_Z_KOfLuQvey2nghqnMpbbUo2Og4M8Vpfxy39py8f29uGRmaM7u-3MG1



{'post_id': '899471695108205', 'text': "The Sáng Loá (2023) starring Xuân Đan aka Binz Da Poet 🧔🏼\u200d♂️ Mọi người đã nghe 'Hit Me Up' chưa?\n\n#Binz #Touliver #SSLabel #HitMeUp #DanXinhInLove #WarnerMusicVietnam", 'post_text': "The Sáng Loá (2023) starring Xuân Đan aka Binz Da Poet 🧔🏼\u200d♂️ Mọi người đã nghe 'Hit Me Up' chưa?\n\n#Binz #Touliver #SSLabel #HitMeUp #DanXinhInLove #WarnerMusicVietnam", 'shared_text': '', 'original_text': None, 'time': datetime.datetime(2023, 10, 20, 20, 41, 1), 'timestamp': 1697809261, 'image': 'https://scontent.fhan2-3.fna.fbcdn.net/v/t39.30808-6/394270636_899471371774904_9172027944298825735_n.jpg?stp=cp0_dst-jpg_e15_fr_q65&_nc_cat=109&ccb=1-7&_nc_sid=5f2048&efg=eyJpIjoidCJ9&_nc_ohc=tP7OKW7QUmgAX8_4O6G&_nc_ht=scontent.fhan2-3.fna&oh=00_AfBqsJHdd6cH_Bt-1TpwY184ze3wlnzFebjqppzHRJssjw&oe=655A4500&manual_redirect=1', 'image_lowquality': 'https://scontent.fhan2-3.fna.fbcdn.net/v/t39.30808-6/394270636_899471371774904_9172027944298825735_n.jpg?stp=cp0_dst-jp



{'post_id': '898972625158112', 'text': "'Hit Me Up' MV ra mắt 📲 Binz Da Poet\n#Binz #Touliver #SpaceSpeakers #SSLabel #HitMeUp #WarnerMusicVietnam", 'post_text': "'Hit Me Up' MV ra mắt 📲 Binz Da Poet\n#Binz #Touliver #SpaceSpeakers #SSLabel #HitMeUp #WarnerMusicVietnam", 'shared_text': '', 'original_text': None, 'time': datetime.datetime(2023, 10, 19, 20, 59, 54), 'timestamp': 1697723994, 'image': None, 'image_lowquality': 'https://scontent.fhan2-3.fna.fbcdn.net/v/t15.5256-10/393976603_312957078120481_2985572976268204616_n.jpg?stp=cp0_dst-jpg_e15_p320x320_q65&_nc_cat=101&ccb=1-7&_nc_sid=f3b36a&efg=eyJpIjoidCJ9&_nc_ohc=aj8eQ6WEco0AX_4rhjp&_nc_ht=scontent.fhan2-3.fna&oh=00_AfAoUn3KlgdHPjDrJpuOGXbjoN58ahA-akOoyX5ZwXoBPg&oe=655A5367', 'images': [], 'images_description': [], 'images_lowquality': ['https://scontent.fhan2-3.fna.fbcdn.net/v/t15.5256-10/393976603_312957078120481_2985572976268204616_n.jpg?stp=cp0_dst-jpg_e15_p320x320_q65&_nc_cat=101&ccb=1-7&_nc_sid=f3b36a&efg=eyJpIjoidCJ9&_nc_oh

In [5]:
len(post_list)

100

In [7]:
print(type(post_list[0]))
x = post_list[0].keys()
x

<class 'dict'>


dict_keys(['post_id', 'text', 'post_text', 'shared_text', 'original_text', 'time', 'timestamp', 'image', 'image_lowquality', 'images', 'images_description', 'images_lowquality', 'images_lowquality_description', 'video', 'video_duration_seconds', 'video_height', 'video_id', 'video_quality', 'video_size_MB', 'video_thumbnail', 'video_watches', 'video_width', 'likes', 'comments', 'shares', 'post_url', 'link', 'links', 'user_id', 'username', 'user_url', 'is_live', 'factcheck', 'shared_post_id', 'shared_time', 'shared_user_id', 'shared_username', 'shared_post_url', 'available', 'comments_full', 'reactors', 'w3_fb_url', 'reactions', 'reaction_count', 'with', 'page_id', 'sharers', 'image_id', 'image_ids', 'was_live', 'fetched_time'])

## Convert list of dicts to df

Now we can convert the list of dictionaries to a pandas DataFrame, with one row per post and one column per key. We then save the DataFrame to both a .csv and an .xlsx file.

In [8]:
# Build a DataFrame with one row per scraped post, using the keys of the first post as columns
post_df_full = pd.DataFrame(columns=post_list[0].keys(), index=range(len(post_list)), data=post_list)

# Save as CSV (the folder in FOLDER_PATH must already exist)
path1 = FOLDER_PATH + FANPAGE_LINK + ".csv"
post_df_full.to_csv(path1, index=False)
print(path1)

# Save as Excel (requires the openpyxl package)
path2 = FOLDER_PATH + FANPAGE_LINK + ".xlsx"
post_df_full.to_excel(path2, index=False, engine='openpyxl')
print(path2)

Data/SpaceSpeakersCM.csv
Data/SpaceSpeakersCM.xlsx
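One thing to keep in mind when reading the CSV back: nested values such as the `reactions` dictionary are written as text, so they come back as strings rather than dictionaries. The sketch below shows this with made-up sample rows and an in-memory buffer in place of the real file, and uses `ast.literal_eval` as one possible way to restore the dictionaries. It assumes the column holds only plain Python literals; columns containing `datetime` objects would need different handling.

```python
import ast
import io

import pandas as pd

# Made-up rows shaped like two columns of the scraped data
posts = [
    {"post_id": "1", "reactions": {"like": 446, "love": 111}},
    {"post_id": "2", "reactions": {"like": 376, "love": 63}},
]
df = pd.DataFrame(posts)

# Round trip through CSV, using an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, dtype={"post_id": str})

# After reloading, each reactions value is a string, not a dict
kind_after_reload = type(reloaded.loc[0, "reactions"]).__name__

# Parse the strings back into dictionaries and pull out one reaction count
reloaded["reactions"] = reloaded["reactions"].apply(ast.literal_eval)
likes = reloaded["reactions"].apply(lambda r: r["like"]).tolist()
```

Reading `post_id` with `dtype=str` keeps the identifiers as text instead of letting pandas convert them to integers.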


In [9]:
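# Convert the list of dicts to a NumPy object array and save it as .npy
# (the dicts are pickled, so loading the file back requires allow_pickle=True)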
arr = np.array(post_list)
path3 = FOLDER_PATH + FANPAGE_LINK + ".npy"
np.save(path3, arr)    # .npy extension is added if not given
print(path3)

Data/SpaceSpeakersCM.npy
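Because the array holds Python dictionaries, NumPy stores it as an object array and pickles its contents. Loading such a file needs `allow_pickle=True`, which is off by default in `np.load`. A minimal sketch with made-up data and a temporary file instead of the real output path:

```python
import tempfile
from pathlib import Path

import numpy as np

# Made-up stand-in for post_list
posts = [{"post_id": "1", "likes": 446}, {"post_id": "2", "likes": 376}]
arr = np.array(posts)  # object array with one dict per element

path = Path(tempfile.mkdtemp()) / "posts.npy"
np.save(path, arr)

# allow_pickle=True is required for object arrays; only use it on files you trust
restored = np.load(path, allow_pickle=True).tolist()
```

Unlike the CSV export, this format preserves the nested dictionaries and lists exactly, at the cost of being readable only from Python.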


In [10]:
post_df_full

Unnamed: 0,post_id,text,post_text,shared_text,original_text,time,timestamp,image,image_lowquality,images,...,w3_fb_url,reactions,reaction_count,with,page_id,sharers,image_id,image_ids,was_live,fetched_time
0,910863380635703,TRÒN 1 NĂM DIỄN RA KOSMIK 💥☄️\nMột trải nghiệm...,TRÒN 1 NĂM DIỄN RA KOSMIK 💥☄️\nMột trải nghiệm...,,,2023-11-12 12:04:16,1699765456,,https://scontent.fhan2-3.fna.fbcdn.net/v/t15.5...,[],...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 446, 'love': 111, 'haha': 1, 'wow': 3...",567,,163763810381299,,,[],False,2023-11-16 14:37:46.904763
1,907540044301370,"Space bois & Space girls,\nIts Maroon 5 🔥🔥🔥\nS...","Space bois & Space girls,\nIts Maroon 5 🔥🔥🔥\nS...",,,2023-11-05 13:24:12,1699165452,https://m.facebook.com/photo/view_full_size/?f...,https://scontent.fhan2-4.fna.fbcdn.net/v/t39.3...,[https://m.facebook.com/photo/view_full_size/?...,...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 376, 'love': 63, 'wow': 4, 'care': 2}",445,,163763810381299,,907538947634813,[907538947634813],False,2023-11-16 14:37:49.067014
2,906705411051500,'Đan Xinh In Love' = Nhật kí tình Đan 💕\n\n#Bi...,'Đan Xinh In Love' = Nhật kí tình Đan 💕\n\n#Bi...,,,2023-11-03 20:08:32,1699016912,,https://scontent.fhan2-4.fna.fbcdn.net/v/t15.5...,[],...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 490, 'love': 51, 'haha': 2, 'care': 4...",548,,163763810381299,,,[],False,2023-11-16 14:37:52.855720
3,906220787766629,Make Binz Da Poet feel loved 💗\n\n#Binz #Touli...,Make Binz Da Poet feel loved 💗\n\n#Binz #Touli...,,,2023-11-02 20:40:00,1698932400,,https://scontent.fhan2-3.fna.fbcdn.net/v/t15.5...,[],...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 1681, 'love': 186, 'haha': 7, 'wow': ...",1884,,163763810381299,,,[],False,2023-11-16 14:38:01.920520
4,906176837771024,It's Xuân Đan's world and we're just living in...,It's Xuân Đan's world and we're just living in...,,,2023-11-02 20:00:09,1698930009,https://scontent.fhan2-4.fna.fbcdn.net/v/t39.3...,https://scontent.fhan2-4.fna.fbcdn.net/v/t39.3...,[https://scontent.fhan2-4.fna.fbcdn.net/v/t39....,...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 116, 'love': 23, 'care': 1, 'sad': 1}",141,,163763810381299,,906176684437706,[906176684437706],False,2023-11-16 14:38:49.235092
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,811294883925887,Space Date 3 sẵn sàng vào cuối tuần này 🌪\n\nH...,Space Date 3 sẵn sàng vào cuối tuần này 🌪\n\nH...,,,2023-05-16 21:00:07,1684245607,,https://scontent.fhan2-3.fna.fbcdn.net/v/t15.5...,[],...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 333, 'love': 117, 'haha': 1, 'wow': 2...",458,,163763810381299,,,[],False,2023-11-16 15:00:11.361953
96,811306187258090,Một sự kết hợp lần đầu tiên 🎸\n\nTouliver x $A...,Một sự kết hợp lần đầu tiên 🎸\n\nTouliver x $A...,,,2023-05-16 20:00:03,1684242003,https://m.facebook.com/photo/view_full_size/?f...,https://scontent.fhan2-3.fna.fbcdn.net/v/t39.3...,[https://m.facebook.com/photo/view_full_size/?...,...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 3698, 'love': 701, 'haha': 5, 'wow': ...",4451,,163763810381299,,811306163924759,[811306163924759],False,2023-11-16 15:00:23.416783
97,810742723981103,Space Jam Volume 01 - Album Release Event Reca...,Space Jam Volume 01 - Album Release Event Reca...,,,2023-05-15 20:00:32,1684155632,,https://scontent.fhan2-4.fna.fbcdn.net/v/t15.5...,[],...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 1792, 'love': 101, 'haha': 1, 'care': 5}",1899,,163763810381299,,,[],False,2023-11-16 15:00:33.949789
98,810747570647285,Space Date #3 is coming ☄️🌪🔥🌟\nNext stop: KTX ...,Space Date #3 is coming ☄️🌪🔥🌟\nNext stop: KTX ...,,,2023-05-15 18:24:02,1684149842,https://scontent.fhan2-4.fna.fbcdn.net/v/t39.3...,https://scontent.fhan2-4.fna.fbcdn.net/v/t39.3...,[https://scontent.fhan2-4.fna.fbcdn.net/v/t39....,...,https://www.facebook.com/SpaceSpeakersCM/posts...,"{'like': 951, 'love': 269, 'wow': 13, 'care': ...",1247,,163763810381299,,,[],False,2023-11-16 15:00:45.263020
