# Toxic Femininity
## Web Scraping Script for incels.is Forum Posts

This script extracts the title, link, reply count, view count, and post content for every thread in a specific category of the incels.is forum. It follows the "Next" link from page to page until no pages are left, then saves the extracted data to an Excel file.

### Ethics
The script issues its requests sequentially, one listing page or thread at a time, and never opens parallel connections to the target server. This keeps the load it places on the server small, in line with best practices for ethical web scraping.
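One way to make the low-load goal explicit is to enforce a minimum gap between consecutive requests. The block below is a minimal sketch, not part of the original script: the `RequestPacer` class, its `min_interval` default of one second, and the injectable `clock` and `sleep` parameters are all hypothetical names and values chosen for illustration.

```python
import time


class RequestPacer:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the pacing can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, None before the first

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Under these assumptions, a scraper would create one `RequestPacer` and call its `wait()` method immediately before each `requests.get(...)` call, so that listing pages and threads are fetched no faster than the chosen interval.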

In [19]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

base_url = "https://incels.is"
forum_url = "/forums/inceldom-discussion.2/?prefix_id=23"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

title_slugs = []
title_links = []
reply_counts = []
view_counts = []
contents = []

while forum_url:
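    page_url = base_url + forum_url
    # A timeout keeps the loop from hanging forever on a stalled connection
    response = requests.get(page_url, headers=headers, timeout=30)
    if response.status_code != 200:
        print(f"Failed to retrieve the webpage: {page_url} (status {response.status_code})")
        break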

    soup = BeautifulSoup(response.content, 'html.parser')
    title_elements = soup.find_all('a', {'data-xf-init': 'preview-tooltip'})
    data_elements = soup.find_all('dd')

    for title in title_elements:
        preview_url = title['data-preview-url']
        title_slug = preview_url.split('/')[-2]
        title_slugs.append(title_slug)

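        # Extract link and fetch the thread's content
        post_link = base_url + title['href']
        title_links.append(post_link)
        post_response = requests.get(post_link, headers=headers, timeout=30)
        post_soup = BeautifulSoup(post_response.content, 'html.parser')
        # find() returns None when the page has no post body (e.g. an error page);
        # fall back to an empty string so all lists stay the same length
        body = post_soup.find('div', class_='bbWrapper')
        contents.append(body.get_text(strip=True, separator="\n") if body else "")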

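    # The <dd> elements are assumed to alternate: replies, views, replies, views, ...
    # Stopping at len - 1 avoids an IndexError if a trailing element is unpaired
    for i in range(0, len(data_elements) - 1, 2):
        reply_counts.append(data_elements[i].text.strip())
        view_counts.append(data_elements[i+1].text.strip())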

    # Check for the "Next" button and retrieve the next page's URL
    next_button = soup.find('a', {'class': 'pageNav-jump pageNav-jump--next'})
    forum_url = next_button['href'] if next_button else None

# Convert lists to DataFrame
df = pd.DataFrame({
    'Title': title_slugs,
    'Link': title_links,
    'Replies': reply_counts,
    'Views': view_counts,
    'Content': contents
})


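# Turn each URL slug into a readable title: hyphens become spaces,
# then everything after the first '.' (the slug's trailing suffix) is dropped
df['Title'] = df['Title'].str.replace('-', ' ', regex=False)
df['Title'] = df['Title'].str.split('.').str[0]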

# Save DataFrame to Excel
df.to_excel("titles_links_replies_views_contents.xlsx", index=False)

print("Saved titles, links, replies, views, and contents to titles_links_replies_views_contents.xlsx")


Saved titles, links, replies, views, and contents to titles_links_replies_views_contents.xlsx


In [22]:
df

| | Title | Link | Replies | Views | Content |
|---|---|---|---|---|---|
| 0 | girlfriend of murdered activist ryan carson ra... | https://incels.is/threads/girlfriend-of-murder... | 18 | 389 | You can't make this shit up. I had to re-read ... |
| 1 | smv mogs whole male population combined | https://incels.is/threads/smv-mogs-whole-male-... | 28 | 786 | Your browser is not able to display this video. |
| 2 | list of privileges women have | https://incels.is/threads/list-of-privileges-w... | 45 | 2K | I write this thread to summarize all the privi... |
| 3 | fundamental problems of real life females | https://incels.is/threads/fundamental-problems... | 3 | 236 | Over time, they all get older than 25\nOnly a ... |
| 4 | foids wear makeup to compete with other women | https://incels.is/threads/foids-wear-makeup-to... | 7 | 196 | It doesn't matter how much men say women don't... |
| ... | ... | ... | ... | ... | ... |
| 1834 | why are e girls so sociopathic | https://incels.is/threads/why-are-e-girls-so-s... | 19 | 1K | this is bianca the e-girl who got btfo by a no... |
| 1835 | foids should learn loyalty from dogs | https://incels.is/threads/foids-should-learn-l... | 8 | 855 | Foids are nothing but leeches, take, take and ... |
| 1836 | fakeup makes women look like shit | https://incels.is/threads/fakeup-makes-women-l... | 7 | 1K | Ever notice when a good who uses a ton of fake... |
| 1837 | females are lesser parents than males | https://incels.is/threads/females-are-lesser-p... | 8 | 1K | Hello there, first post.\nBut I would to state... |
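The Replies and Views columns are stored as strings, and some values are abbreviated (for example `2K` and `1K` above), so they cannot be sorted or summed directly. The block below is a minimal sketch of one way to convert them to integers. The `parse_count` function is a hypothetical helper, not part of the original script, and the `M` suffix is an assumption: only `K` appears in the output shown here.

```python
def parse_count(text):
    """Convert a count such as '389', '2K' or '1.5M' to an integer."""
    multipliers = {"K": 1_000, "M": 1_000_000}  # 'M' is assumed, only 'K' was observed
    cleaned = text.strip().replace(",", "")
    suffix = cleaned[-1:].upper()
    if suffix in multipliers:
        # Abbreviated values are already rounded by the forum, so the result is approximate
        return int(round(float(cleaned[:-1]) * multipliers[suffix]))
    return int(cleaned)
```

With the DataFrame from the cell above, this could be applied as `df['Views'] = df['Views'].map(parse_count)`. An empty or otherwise non-numeric string raises `ValueError`, which surfaces unexpected values instead of silently hiding them.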


In [23]:
print(df.loc[0, 'Title'])

girlfriend of murdered activist ryan carson raises money for herself


In [24]:
print(df.loc[0, 'Content'])

You can't make this shit up. I had to re-read it several times.
Remember this guy? He got stabbed to death while his girlfriend stands there nonchalantly.
She set up a GoFundMe. For his family, you assume? Nope. She set it up for her and her friends so they can "take time off work".
Ryan Thoresen Carson's friends raise $68k so they can 'properly mourn'
A GoFundMe set up by the Brooklyn social justice activist's friends has raked in over $68,000 in response to his stabbing death on Monday, which they say is to help them take time off work.
www.dailymail.co.uk
Ryan Thoresen Carson, organized by Tammie Marie David
Hi everyone. We are a collective of Ryan's close friends, reeling from a b… Tammie Marie David needs your support for Ryan Thoresen Carson
www.gofundme.com
"According to the fundraiser, the page was set up by Carson's girlfriend Claudia Morales, who was with Carson when he was stabbed in the heart, and a friend named Tammie Marie David."
From the GoFundMe: "Immediate needs are t