## Problem Statement

Data from the Department of Statistics show that annulments, marriage dissolutions, and divorce rates in Singapore remain at an all-time high. In 2020 alone, 6,959 married couples filed for divorce.

Among the many contributing factors, communication problems between partners are among the top four reasons for divorce. [https://blackbox.com.sg/everyone/honey-you-can-have-him-rising-divorce-in-singapore]

How can we provide a low-barrier tool for people to better understand themselves and to be more mindful of their attachment styles?


In [1]:
# Import Libraries

import praw
import pandas as pd
import gensim.utils

In [2]:
# API authentication. Actual values are redacted for privacy.

reddit_read_only = praw.Reddit(
    client_id="client_id",
    client_secret="client_secret",
    user_agent="user_id",
)
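
Since the real credentials are redacted, one way to keep them out of the notebook altogether is to read them from environment variables. The following is a minimal sketch, not part of the original workflow: the helper `load_reddit_credentials` and the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` are assumptions chosen for illustration.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_credentials(env=None):
    """Return keyword arguments for praw.Reddit, read from a mapping of variables."""
    env = os.environ if env is None else env
    # Fail early with a clear message if any credential is absent or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }
```

With the variables set in the shell, the authentication cell could then read `reddit_read_only = praw.Reddit(**load_reddit_credentials())`.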



In [3]:
# Select the two subreddits and scrape their top posts (up to 1,000 each)

subreddit_name_A = "AvoidantAttachment"
subreddit_name_B = "AnxiousAttachment"

subreddit_A = reddit_read_only.subreddit(subreddit_name_A)
subreddit_B = reddit_read_only.subreddit(subreddit_name_B)

# These listings are lazy generators: nothing is fetched until they are iterated.
posts_A_top = subreddit_A.top(time_filter="all", limit=None)
posts_B_top = subreddit_B.top(time_filter="all", limit=None)


In [4]:
# Initialize sets to store unique post IDs
post_ids_A = set()
post_ids_B = set()

# Retrieve and process posts from subreddit_A
posts_A = []

# Convert the generators to lists
top_posts_A = list(subreddit_A.top(time_filter="all", limit=None))
new_posts_A = list(subreddit_A.new(limit=None))

# Merge the lists and remove duplicates based on post IDs
merged_posts_A = top_posts_A + new_posts_A
for post in merged_posts_A:
    if post.id not in post_ids_A:
        posts_A.append(post)
        post_ids_A.add(post.id)

# Retrieve and process posts from subreddit_B
posts_B = []

# Convert the generators to lists
top_posts_B = list(subreddit_B.top(time_filter="all", limit=None))
new_posts_B = list(subreddit_B.new(limit=None))

# Merge the lists and remove duplicates based on post IDs
merged_posts_B = top_posts_B + new_posts_B
for post in merged_posts_B:
    if post.id not in post_ids_B:
        posts_B.append(post)
        post_ids_B.add(post.id)
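
The merge-and-deduplicate logic above is identical for both subreddits, so it could be factored into one helper. The sketch below is an illustration under assumptions rather than the notebook's own code: `merge_unique_posts` is a hypothetical name, and `SimpleNamespace` objects stand in for real PRAW submissions so that the example runs without network access.

```python
from itertools import chain
from types import SimpleNamespace


def merge_unique_posts(*listings):
    """Merge several iterables of posts, keeping the first occurrence of each post ID."""
    seen_ids = set()
    unique_posts = []
    for post in chain(*listings):
        if post.id not in seen_ids:
            seen_ids.add(post.id)
            unique_posts.append(post)
    return unique_posts


# Stand-in objects with only an `id` attribute (assumption: real posts expose `.id`).
top = [SimpleNamespace(id="a1"), SimpleNamespace(id="b2")]
new = [SimpleNamespace(id="b2"), SimpleNamespace(id="c3")]
merged_ids = [post.id for post in merge_unique_posts(top, new)]
```

With live data, the same helper could be called as `merge_unique_posts(subreddit_A.top(time_filter="all", limit=None), subreddit_A.new(limit=None))`, which also avoids building the intermediate lists.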


In [5]:
# Fit the scraped results into dictionaries, one per subreddit

def posts_to_dict(posts):
    """Collect the fields of interest from each post into a dict of lists."""
    posts_dict = {"Title": [], "Post Text": [],
                  "ID": [], "Score": [],
                  "Total Comments": [], "Post URL": []
                  }
    for post in posts:
        # Title of each post
        posts_dict["Title"].append(post.title)
        # Text inside a post
        posts_dict["Post Text"].append(post.selftext)
        # Unique ID of each post
        posts_dict["ID"].append(post.id)
        # The score of a post
        posts_dict["Score"].append(post.score)
        # Total number of comments inside the post
        posts_dict["Total Comments"].append(post.num_comments)
        # URL of each post
        posts_dict["Post URL"].append(post.url)
    return posts_dict

posts_dictA = posts_to_dict(posts_A)
posts_dictB = posts_to_dict(posts_B)

In [6]:
# Convert the scraped posts from dictionaries into DataFrames and assign a class based on the subreddit.

df_A = pd.DataFrame(posts_dictA)
df_B = pd.DataFrame(posts_dictB)

df_A["subreddit"] = subreddit_name_A
df_B["subreddit"] = subreddit_name_B

df_A["class"] = 0
df_B["class"] = 1

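# ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each frame's own index.
df = pd.concat([df_A, df_B], axis=0, ignore_index=True)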

In [7]:
# output the dataframe for the next section.

df.to_csv("../data/df.csv")