# Facebook Scraping

This section defines a Python function that scrapes post data from a Facebook page, using Selenium to load and scroll the page and BeautifulSoup to parse the resulting HTML. Here's a breakdown:

1. Import necessary libraries:
    ```python
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from selenium.webdriver.common.by import By
    import time
    import pandas as pd
    ```

2. Define the function `get_facebook_post_data`:
    - Parameters:
        - `page_url`: URL of the Facebook page to scrape.
        - `scroll_count` (optional): Number of times to scroll down to load more posts.
    - Returns: A DataFrame containing extracted post data.

3. Inside the function:
    - It initializes a Chrome WebDriver assuming the path to `chromedriver` is valid.
    - Navigates to the provided Facebook page URL.
    - Clicks the "See More Posts" button if present.
    - Scrolls down to load more posts based on the `scroll_count`.
    - Uses BeautifulSoup to parse the page source.
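    - Extracts post data (title, link, date, reactions, comments) using hard-coded class names. Facebook generates these class names automatically, so they change over time and may need updating.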
    - Constructs a DataFrame from the extracted data and returns it.
    - Catches `NoSuchElementException` and `TimeoutException` errors and prints them.

4. Usage example:
    ```python
    series = get_facebook_post_data('https://web.facebook.com/alwa3d4', scroll_count=80)
    ```

Note: Ensure you have the required Chrome WebDriver (`chromedriver`) installed and available at the specified path before executing this code.
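
The scrolling step stops early once the page height stops growing, which is how the function detects that no more posts are loading. Below is a minimal sketch of that stopping rule with the two browser calls replaced by plain callables, so it can be tried without Selenium. `scroll_until_stable`, `get_height`, and `scroll` are hypothetical names, not part of Selenium.

```python
import time
from typing import Callable


def scroll_until_stable(get_height: Callable[[], int],
                        scroll: Callable[[], None],
                        max_scrolls: int,
                        pause: float = 0.0) -> int:
    """Scroll up to max_scrolls times, stopping early when the height stops changing.

    Returns the number of scrolls actually performed.
    """
    last_height = get_height()
    scrolls = 0
    for _ in range(max_scrolls):
        scroll()
        scrolls += 1
        time.sleep(pause)  # give the page time to load new posts
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so further scrolling is pointless
        last_height = new_height
    return scrolls


# Simulated page: the height grows three times, then stays at 4000
heights = iter([1000, 2000, 3000, 4000, 4000, 4000])
print(scroll_until_stable(lambda: next(heights), lambda: None, max_scrolls=10))  # 4
```

With a real driver, `get_height` would wrap `driver.execute_script("return document.body.scrollHeight")` and `scroll` would wrap `driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')`, the same calls the function below uses.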


In [None]:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.chrome.service import Service  # Selenium 4 passes the driver path via Service
from selenium.webdriver.common.by import By  # Use By for cleaner selectors
import time
import pandas as pd


def get_facebook_post_data(page_url, scroll_count=2):
    """
    Extracts post data (title, link, date, reactions, comments) from a Facebook page.
    Args:
        page_url (str): The URL of the Facebook page.
        scroll_count (int, optional): The number of times to scroll down to load more posts. Defaults to 2.
    Returns:
        pandas.DataFrame: A DataFrame containing the extracted post data, or an empty DataFrame
            if an error occurs.
    """

    # Recent Selenium 4 releases no longer accept the chromedriver path as a positional argument
    driver = webdriver.Chrome(service=Service('C:/Users/sejja/chromedriver'))  # Assuming valid path

    try:
        driver.get(page_url)
        time.sleep(2)  # Adjust sleep time as needed

        # Click the "See More Posts" button if present (using By for reliability)
        try:
            button = driver.find_element(By.CLASS_NAME, "x1tk7jg1")  # Class name of the button
            button.click()
        except NoSuchElementException:
            pass  # Ignore if button not found

        # Scroll scroll_count times, stopping early once the page height stops growing
        last_height = driver.execute_script("return document.body.scrollHeight")
        for _ in range(scroll_count):
            driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            time.sleep(2)  # Adjust sleep time as needed
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height

        src = driver.page_source
        soup = BeautifulSoup(src, 'lxml')  # Requires the lxml parser to be installed

        # These class names are generated by Facebook and change over time; update them if no posts are found
        data = []
        for post in soup.find_all('div', {'class': 'x1yztbdb x1n2onr6 xh8yej3 x1ja2u2z'}):
            try:
                title_element = post.find('div', {'class': 'xdj266r x11i5rnm xat24cr x1mh8g0r x1vvkbs x126k92a'})
                link_element = post.find('a', {'class': 'x1i10hfl xjbqb8w x1ejq31n xd10rxx x1sy0etr x17r0tee x972fbf xcfux6l x1qhh985 xm0m39n x9f619 x1ypdohk xt0psk2 xe8uvvx xdj266r x11i5rnm xat24cr x1mh8g0r xexx8yu x4uap5 x18d9i69 xkhd6sd x16tdsg8 x1hl2dhg xggy1nq x1a2a7pz x1heor9g xt0b8zv xo1l8bm'})
                react_element = post.find('span', {'class': 'xrbpyxo x6ikm8r x10wlt62 xlyipyv x1exxlbk'})
                comments_element = post.find('span', {'class': 'x193iq5w xeuugli x13faqbe x1vvkbs x1xmvt09 x1lliihq x1s928wv xhkezso x1gmr53x x1cpjm7i x1fgarty x1943h6x xudqn12 x3x7a5m x6prxxf xvq8zen xo1l8bm xi81zsa'})

                if title_element and link_element and react_element and comments_element:
                    title = title_element.text.strip()
                    link = link_element.get('href')
                    date = link_element.text.strip()  # The post link's text is its timestamp
                    react = react_element.text.strip()
                    comments = comments_element.text.strip()
                    data.append({'Title': title, 'Link': link, 'React': react, 'Date': date, 'Comments': comments})
            except AttributeError:
                pass  # Skip posts with missing elements

        df = pd.DataFrame(data)
        return df

    except (NoSuchElementException, TimeoutException) as e:
        print(e)
        return pd.DataFrame()  # Empty DataFrame on error, as documented
    finally:
        driver.quit()  # Close the browser whether or not scraping succeeded
#series=get_facebook_post_data('https://web.facebook.com/alwa3d4', scroll_count=80)
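
The next section reads the scraped posts back from `series.csv`, but the cell that writes that file is not shown. The `Unnamed: 0.1` and `Unnamed: 0` columns it contains are leftover index columns: `to_csv` writes the index by default, and `read_csv` then loads it as an unnamed column. A minimal sketch of the difference, using two placeholder rows rather than scraped data:

```python
import io

import pandas as pd

# Placeholder rows standing in for the scraper's output
posts = pd.DataFrame({'Title': ['post a', 'post b'], 'React': ['4,3\xa0K', '834']})

# Default to_csv writes the index, which comes back as an extra column
with_index = io.StringIO()
posts.to_csv(with_index)
with_index.seek(0)
print(list(pd.read_csv(with_index).columns))  # ['Unnamed: 0', 'Title', 'React']

# index=False keeps the file limited to the real columns
without_index = io.StringIO()
posts.to_csv(without_index, index=False)
without_index.seek(0)
print(list(pd.read_csv(without_index).columns))  # ['Title', 'React']
```

Saving the scraper output with `series.to_csv('series.csv', index=False)` would avoid the extra columns; this is a suggestion, not the step the original notebook ran.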



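# Sorting Posts by Number of Reactions

The scraped reaction counts are strings in French-locale format, for example `'4,3\xa0K'` (a comma as the decimal separator and a non-breaking space before `K`). The cell below loads the saved posts from `series.csv`, converts the `React` column to numbers, and sorts the posts by reaction count in descending order: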

In [None]:
import pandas as pd

# Read the CSV file into a DataFrame
series = pd.read_csv('series.csv')

# Remove the decimal comma, e.g. "4,3\xa0K" -> "43\xa0K"
series['React'] = series['React'].str.replace(",", "")

# Strip leading/trailing whitespace, including any trailing non-breaking space
series['React'] = series['React'].str.strip().str.rstrip('\xa0')

# Replace "K" with "00" and drop the non-breaking space, e.g. "43\xa0K" -> "4300".
# This assumes exactly one decimal digit before "K"; other forms would be converted incorrectly.
series['React'] = pd.to_numeric(series['React'].str.replace("K", "00").str.replace('\xa0', ''))

# Sort the posts by reaction count, highest first
series = series.sort_values(by='React', ascending=False)
series

| Unnamed: 0.1 | Unnamed: 0 | Title | Link | React | Date | Comments |
|---|---|---|---|---|---|---|
| 79 | 79 | العاىلة الكاطورزية | https://web.facebook.com/alwa3d4/posts/pfbid02... | 9900 | 4 j | 834 |
| 68 | 68 | العين خيبة ميمكنش يكنو تفاقو | https://web.facebook.com/alwa3d4/posts/pfbid0x... | 9500 | 4 j | 2,3 K |
| 104 | 104 | الاحداث القادمة غادي تشلل | https://web.facebook.com/alwa3d4/posts/pfbid0r... | 9300 | 21 mars à 02:14 | 688 |
| 169 | 169 | جابها تخدم معاه بلا مايعرف أنها بنتأكثر وحدين ... | https://web.facebook.com/alwa3d4/posts/pfbid03... | 9200 | 17 mars à 20:13 | 678 |
| 25 | 25 | -هاذ الطيحات ماااشي ديال المغاربة، بشاااخ فين ... | https://web.facebook.com/alwa3d4/posts/pfbid02... | 8900 | 1 j | 546 |
| ... | ... | ... | ... | ... | ... | ... |
| 92 | 92 | تخيلو معايا يكون هدا هو الأب الحقيقي لحمزةصافي... | https://web.facebook.com/alwa3d4/posts/pfbid02... | 73 | 24 mars à 04:07 | 10 |
| 93 | 93 | ناري قتلني بضحك | https://web.facebook.com/alwa3d4/posts/pfbid0z... | 63 | 24 mars à 03:46 | 2 |
| 89 | 89 | حتى هاد الكي خطار | https://web.facebook.com/alwa3d4/posts/pfbid03... | 43 | 6 j | 6 |
| 90 | 90 | قلتلها يمكن | https://web.facebook.com/alwa3d4/posts/pfbid02... | 29 | 6 j | 3 |
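
The cell above turns `'4,3\xa0K'` into 4300 by deleting the comma and replacing `K` with `00`, which only works when the count has exactly one decimal digit: `'9\xa0K'` would become 900 rather than 9000, and the `Comments` column (for example `2,3 K`) is left as text. A more general sketch, assuming counts are either plain integers or a comma-decimal number followed by `K` (thousands) or `M` (millions). `parse_count` is a hypothetical helper, and the `M` suffix is an assumption not seen in this data.

```python
import re

# Multipliers for the suffixes assumed to appear in French-locale counts
SUFFIXES = {'': 1, 'K': 1_000, 'M': 1_000_000}
# Digits, optional ",decimals", optional whitespace (\s also matches \xa0), optional suffix
COUNT_RE = re.compile(r'^(\d+(?:,\d+)?)\s*([KM]?)$')


def parse_count(text):
    """Convert a count such as '834', '4,3\xa0K' or '9 K' to an int."""
    cleaned = str(text).strip()
    match = COUNT_RE.match(cleaned)
    if match is None:
        raise ValueError(f'Unrecognized count: {text!r}')
    number, suffix = match.groups()
    value = float(number.replace(',', '.')) * SUFFIXES[suffix]
    return int(round(value))  # round() absorbs float error such as 4299.9999...


print(parse_count('4,3\xa0K'), parse_count('9\xa0K'), parse_count('834'))  # 4300 9000 834
```

Applied to the notebook's data, this would be `series['React'] = series['React'].map(parse_count)`, and likewise for `Comments`.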


# Facebook Scraper Class

This code defines a class `FacebookScraper` that utilizes the `facebook_scraper` library to extract data from Facebook posts. Here's an overview:

1. Import required libraries:
    ```python
    import facebook_scraper as fs
    import pandas as pd
    from facebook_scraper import exceptions  # Import specific exceptions
    ```

2. Define the `FacebookScraper` class:
    - Constructor:
        - Initializes the maximum number of comments to retrieve (`MAX_COMMENTS`).
    - Method `getPostData`:
        - Parameters:
            - `post_url`: URL of the Facebook post.
        - Returns:
            - If successful, returns a DataFrame containing post comments.
            - If unsuccessful, prints an error message and returns `None`.
    - Inside the method:
        - Extracts the post ID from the provided URL.
        - Attempts to retrieve post data using `facebook_scraper`.
        - Handles potential errors such as missing comments or invalid URLs.
        - If comments are found, normalizes the JSON data into a DataFrame.

3. Exception handling:
    - Catches `ValueError`, `IndexError`, and specific exceptions from `facebook_scraper`.
    - Prints error messages and returns `None` in case of failure.

This class provides a structured approach to scraping Facebook post comments and handling potential errors.
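
`getPostData` takes the post ID to be the last path segment of the URL with any query string removed. The same idea can be written with `urllib.parse`, which also tolerates a trailing slash (the original `split("/")[-1]` would return an empty string there). A small sketch; `extract_post_id` is a hypothetical helper and `pfbid0example` is a placeholder ID:

```python
from urllib.parse import urlparse


def extract_post_id(post_url):
    """Return the last non-empty path segment of a post URL."""
    path = urlparse(post_url).path  # query string ("?...") and fragment ("#...") are excluded
    return path.rstrip('/').split('/')[-1]


print(extract_post_id('https://web.facebook.com/alwa3d4/posts/pfbid0example?__cft__=abc'))  # pfbid0example
print(extract_post_id('https://web.facebook.com/alwa3d4/posts/pfbid0example/'))  # pfbid0example
```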


In [None]:
import facebook_scraper as fs
import pandas as pd
from facebook_scraper import exceptions  # Import specific exceptions

class FacebookScraper:
    def __init__(self):
        self.MAX_COMMENTS = 800

    def getPostData(self, post_url):
        try:
            post_id = post_url.split("/")[-1].split("?")[0]  # Extract post ID
            print(post_id)

            # Attempt to get post data, handling potential errors
            gen = fs.get_posts(post_urls=[post_id], options={"comments": self.MAX_COMMENTS, "progress": True})
            post = next(gen)

            # Handle missing 'comments_full' key
            comments = post.get('comments_full', [])  # Use default empty list if missing

            if comments:
                df = pd.json_normalize(comments, sep='_')
                return df
            else:
                print(f"No comments found for post: {post_id}")
                return None  # Return None to indicate no comments

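        # `exceptions` is a module, so its exception classes must be named explicitly;
        # StopIteration covers next(gen) when get_posts yields no post
        except (ValueError, IndexError, StopIteration,
                exceptions.NotFound, exceptions.TemporarilyBanned, exceptions.LoginRequired) as e: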
            print(f"Error retrieving post data: {post_url} - {e}")
            return None  # Return None to signal failure
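
`getPostData` calls `next(gen)`, which raises `StopIteration` if `get_posts` yields no post at all. The built-in `next` also accepts a default value, which turns an empty generator into an explicit `None` that can be checked like the "no comments" case. A minimal sketch with a stand-in generator; `fake_posts` and `first_post_or_none` are hypothetical names, not part of `facebook_scraper`:

```python
def fake_posts(found):
    """Stand-in for fs.get_posts(...): yields one post dict, or nothing."""
    if found:
        yield {'post_id': 'pfbid0example', 'comments_full': [{'comment_text': 'hi'}]}


def first_post_or_none(gen):
    """Return the first item of a generator, or None if it is empty."""
    return next(gen, None)  # the default replaces StopIteration


print(first_post_or_none(fake_posts(True)) is not None)  # True
print(first_post_or_none(fake_posts(False)))  # None
```

Inside `getPostData`, the equivalent would be `post = next(gen, None)` followed by an `if post is None:` branch.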


# Scraping Facebook Post Data

This code snippet utilizes a Facebook scraper instance (`fss`) to scrape data from a list of Facebook posts. Here's an overview:

1. Import required libraries:
    ```python
    import pandas as pd
    from facebook_scraper import exceptions  # Import specific exceptions
    ```

2. Initialize a list to store all the DataFrames:
    ```python
    all_post_data = []
    ```

3. Create a FacebookScraper instance (`fss`):
    ```python
    fss = FacebookScraper()
    ```

4. Limit the number of posts to scrape (adjust as needed):
    ```python
    seriesPost = series[:25]  # For example, limit to the first 25 posts
    ```

5. Iterate through the posts and titles:
    ```python
    for p, c in zip(seriesPost['Link'], seriesPost['Title']):
        ...  # Loop body described in step 6
    ```

6. Inside the loop:
    - Attempt to scrape post data using the `getPostData` method from `FacebookScraper`.
    - If post data is successfully retrieved, add the category (`Title`) to the DataFrame and append it to the `all_post_data` list.
    - Handle potential errors and print error messages.

7. Combine all DataFrames into a single DataFrame:
    ```python
    if all_post_data:
        all_comments_df = pd.concat(all_post_data, ignore_index=True)
        # Optionally save the DataFrame to a CSV file
        # all_comments_df.to_csv('all_comments.csv', index=False)
    else:
        print("No posts were successfully scraped.")
    ```

This code effectively scrapes data from Facebook posts, handles errors gracefully, and combines the extracted data into a single DataFrame.
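
The loop follows a collect-then-combine pattern: scrape each post, skip failures and empty results, tag every comment with its post's title, and combine everything once at the end. A dependency-free sketch of that pattern, where `scrape` is a hypothetical stand-in for `fss.getPostData` that returns a list of comment dicts (rather than a DataFrame) or `None`:

```python
def collect_comments(posts, scrape):
    """posts: iterable of (link, title) pairs; scrape: callable returning a list of dicts or None."""
    all_rows = []
    for link, title in posts:
        try:
            rows = scrape(link)
        except (ValueError, IndexError) as e:
            print(f"Error scraping post: {link} - {e}")
            continue
        if not rows:
            continue  # no comments, or scraping returned None
        # Tag each comment with the title of the post it belongs to
        all_rows.extend({**row, 'Title': title} for row in rows)
    return all_rows


def fake_scrape(link):
    """Hypothetical scraper: one comment for 'a', none for 'b', an error for 'c'."""
    if link == 'c':
        raise ValueError('bad URL')
    return [{'comment_text': 'nice'}] if link == 'a' else None


rows = collect_comments([('a', 'Post A'), ('b', 'Post B'), ('c', 'Post C')], fake_scrape)
print(rows)  # [{'comment_text': 'nice', 'Title': 'Post A'}]
```

With pandas, `pd.DataFrame(rows)` would turn the combined list into a single DataFrame.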


In [None]:
import pandas as pd
from facebook_scraper import exceptions  # Import specific exceptions

# Initialize a list to store all the DataFrames
all_post_data = []

# Create a Facebook scraper instance
fss = FacebookScraper()

# Limit the number of posts to scrape (adjust as needed)
seriesPost = series[:25]

for p, c in zip(seriesPost['Link'], seriesPost['Title']):
    try:
        # Attempt to scrape post data
        post_data = fss.getPostData(p)

        if post_data is not None:
            # Add category and append to list if successful
            post_data['Title'] = c
            all_post_data.append(post_data)

    except (ValueError, IndexError) as e:
        print(f"Error scraping post: {p} - {e}")

# Combine all DataFrames into a single DataFrame (assuming compatible structures)
if all_post_data:
    all_comments_df = pd.concat(all_post_data, ignore_index=True)
    # Save the DataFrame to a CSV file (optional)
    # all_comments_df.to_csv('all_comments.csv', index=False)
else:
    print("No posts were successfully scraped.")


pfbid022jobutRenT81bktfoSPcooCsHSB6gDhQgdvXtaq5SVMcP9FzLQXnHawGCujvHsvUl
No comments found for post: pfbid022jobutRenT81bktfoSPcooCsHSB6gDhQgdvXtaq5SVMcP9FzLQXnHawGCujvHsvUl
pfbid0x97YYD17YkjBF96PPzXkZFU4Y1ef1k4p5bXx3LfHGtSmaLbF68PLCRkM2yjAHdoVl
pfbid0rdEgsMk8WkUhE46WqdMb79hXphkcf5YfGF7icwUrMi7PikBKkowfKxJ3xCqdd2ful
pfbid036eQymCbqipSS5RywFquVzqV8d5givbiLwkSC7SqWkKx8DFSVNZN9XPaMzrknLSynl
pfbid02LpSTbuPczSTmNAjAmqVFpNYs9txg8JdG8Uu1uy5c6bN5RPvxnQtqRM3YoZtCMF8Nl
pfbid0SMQh8mTQu9zmCB1R92hKVsFpR3YAirDtNQqva5GYyUH6PKWXUckME7hHHb6o3Prkl
No comments found for post: pfbid0SMQh8mTQu9zmCB1R92hKVsFpR3YAirDtNQqva5GYyUH6PKWXUckME7hHHb6o3Prkl
pfbid0hzHoNxR1vTCmuAMMSs8TZGYPU9BK8LBDmhfvAfoDtQE4Rec1U7iTwGDzMMUmJGFKl
No comments found for post: pfbid0hzHoNxR1vTCmuAMMSs8TZGYPU9BK8LBDmhfvAfoDtQE4Rec1U7iTwGDzMMUmJGFKl
pfbid02YE9KT6yBpZojfMzGXqfxKiDzeS95WyD14z5Dfp3jjkTS2kNpTbMiUAZw9KhULTAzl
pfbid02PHBEvnq4NSJcEoKFqFV7kzaKGgUzLJrGuKut3ScaK3vW5XzGVpBpiLkbzyptcvXYl
pfbid0jfpCnhkf7fVX9Pqh3A4SDQVuHEeWm7YRojCojgQjVkcjwAK2Bjijef2aja9jXBfkl
No comments found for post: pfbid0jfpCnhkf7fVX9Pqh3A4SDQVuHEeWm7YRojCojgQjVkcjwAK2Bjijef2aja9jXBfkl
pfbid02A88Jhz8qh8Qbjps9rPv1j7BHMgYd8AwRzdFkzNBe5umhRdtZde6kH2Uah1S5bRj1l
pfbid02ck2hR6pk41nhiK6nsmQ6tdFP2Xk2YSwimDYQa8VRJfchwuXN3ESR5u7yTp5TZCqql
No comments found for post: pfbid02ck2hR6pk41nhiK6nsmQ6tdFP2Xk2YSwimDYQa8VRJfchwuXN3ESR5u7yTp5TZCqql
pfbid02vptCSHfAGrZbANQRF24hb8nRPJzJPrHysgdMbAwvqQxP5T3KK3kdz2yzgoJ1G74Rl
pfbid02wcCtJPsdYgz6gnXW327wydYDrirB2EcWwTZPgPGh4GLmZBoRVSKyaPboGTv88d2fl
pfbid027VfksptMrAQQcFNz5p2DBCTjSoeEHoJANi8on2s7V6X5dFsPLgNV3AHjxtUTN2wsl
No comments found for post: pfbid027VfksptMrAQQcFNz5p2DBCTjSoeEHoJANi8on2s7V6X5dFsPLgNV3AHjxtUTN2wsl


In [None]:
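all_comments_df = pd.concat(all_post_data, ignore_index=True)
# Keep only the comment text, its timestamp, and the title of the post it came from
comments_ds = all_comments_df[['comment_text', 'comment_time', 'Title']]
comments_ds.to_csv("facebook_comments.csv", index=False)  # index=False avoids an extra "Unnamed: 0" column on reload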
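
Running this cell again after scraping another batch of posts overwrites `facebook_comments.csv`. If the goal is to accumulate comments across runs, one option is to append to the file and write the header only when the file does not exist yet. A sketch using the `mode='a'` option of `to_csv`; `append_to_csv` is a hypothetical helper, the rows are placeholders, and it assumes every batch has the same columns in the same order, since appending does not realign columns.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, path):
    """Append df to path, writing the header only if the file is new."""
    write_header = not os.path.exists(path)
    df.to_csv(path, mode='a', header=write_header, index=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'facebook_comments.csv')
    batch = pd.DataFrame({'comment_text': ['nice'], 'comment_time': ['2024-03-24 04:07:00'], 'Title': ['Post A']})
    append_to_csv(batch, path)
    append_to_csv(batch, path)  # second batch: no second header line
    print(len(pd.read_csv(path)))  # 2
```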

## Request to Continue Scraping or Change Page

Dear Team,

To continue scraping data from Facebook posts, you have two options:

### Option 1: Continue from Post 25

You can resume scraping from row 25 of `series`. The previous run only processed `series[:25]`, so set `seriesPost = series[25:]` (or a smaller batch such as `series[25:50]`) and rerun the scraping cell to cover the remaining posts in the list.
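
Because `facebook_comments.csv` stores the `Title` of each scraped post, one way to resume is to skip posts whose title already appears in that file. A minimal sketch, assuming titles are unique enough to identify posts; posts that returned no comments are not in the file and would be tried again. `remaining_posts` is a hypothetical helper and the file contents below are placeholders.

```python
import csv
import os
import tempfile


def remaining_posts(posts, done_path):
    """Return the (link, title) pairs whose title is not yet in the CSV at done_path."""
    if not os.path.exists(done_path):
        return list(posts)  # nothing saved yet, so every post remains
    with open(done_path, newline='', encoding='utf-8') as f:
        done_titles = {row['Title'] for row in csv.DictReader(f)}
    return [(link, title) for link, title in posts if title not in done_titles]


# Example with a small placeholder file standing in for facebook_comments.csv
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'facebook_comments.csv')
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['comment_text', 'comment_time', 'Title'])
        writer.writerow(['nice', '2024-03-24 04:07:00', 'Post A'])
    print(remaining_posts([('link-a', 'Post A'), ('link-b', 'Post B')], path))  # [('link-b', 'Post B')]
```

With the notebook's data, `posts` would be `list(zip(series['Link'], series['Title']))`.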

### Option 2: Change the Page

Alternatively, you can change the Facebook page being scraped. Simply replace the `series` DataFrame with another DataFrame containing the posts of a different Facebook page. Then, execute the code from the beginning to scrape data from the new page.

Please choose the option that best fits your requirements.

Best regards,
Soufiane
