# 01 - Data Collection: The Sims 4
**Goal:** Pull player discussions from Reddit and EA Answers HQ for sentiment & topic analysis.  
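**Outputs:**

- `data/raw/reddit_sims4_posts.csv` (and optionally `..._comments.csv`)
- `data/raw/ea_forum_threads.csv`
- `data/raw/sims4.db` (SQLite database that the last cell of this notebook writes posts and comments to)

**Provenance:** Collected with PRAW (Reddit API) and requests/BeautifulSoup (forums).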

In [1]:
import os
from pathlib import Path
import pandas as pd

# Project paths
DATA_RAW = Path("../data/raw")
DATA_RAW.mkdir(parents=True, exist_ok=True)

# Display options (handy in notebooks)
pd.set_option("display.max_colwidth", 200)
pd.set_option("display.max_rows", 50)

In [2]:
from dotenv import load_dotenv
load_dotenv()

RID  = os.getenv("REDDIT_ID")
RSEC = os.getenv("REDDIT_SECRET")
RUA  = os.getenv("REDDIT_USER_AGENT")

assert all([RID, RSEC, RUA]), "Missing one or more Reddit creds. Check your .env!"
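
The assertion above reports that something is missing but not which variable. A minimal sketch of a check that names the missing keys follows; `missing_vars` and `require_reddit_creds` are hypothetical helpers, not part of this project, and the environment is passed in as a mapping so the check can be exercised without a real `.env` file.

```python
import os
from typing import List, Mapping, Sequence

# Variable names taken from the cell above.
REQUIRED_VARS = ("REDDIT_ID", "REDDIT_SECRET", "REDDIT_USER_AGENT")

def missing_vars(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]

def require_reddit_creds(env: Mapping[str, str] = os.environ) -> None:
    """Raise a RuntimeError that lists every missing credential (hypothetical helper)."""
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing Reddit credentials: " + ", ".join(missing))

# Example with a hand-built mapping instead of the real environment.
example_env = {"REDDIT_ID": "abc", "REDDIT_SECRET": "", "REDDIT_USER_AGENT": "demo-agent"}
example_missing = missing_vars(example_env)
```

In the notebook, calling `require_reddit_creds()` right after `load_dotenv()` would fail early with the exact variable names.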

In [3]:
import sys
from pathlib import Path

# This notebook lives in: <project_root>/notebooks/
ROOT = Path("..").resolve()   # <-- parent of notebooks = project root
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

print("CWD:", Path.cwd())
print("On sys.path?", str(ROOT) in sys.path)

CWD: /Users/baderrezek/Desktop/Projects/Personal/sims4-sentiment-analysis/notebooks
On sys.path? True


In [4]:
import sys
sys.path.append("../src")

from src.collect_data import collect_reddit_posts, collect_comments_for_posts

In [5]:
df_sims4_posts = collect_reddit_posts(RID, RSEC, RUA, subreddit_name="Sims4", limit=750, time_filter="year")
df_thesims_posts = collect_reddit_posts(RID, RSEC, RUA, subreddit_name="thesims", limit=750, time_filter="year")
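
The source of `collect_reddit_posts` is not shown in this notebook. The sketch below is one way such a collector could be written with PRAW, assuming it reads the subreddit's "top" listing and flattens each submission into the columns that appear in this notebook's post tables. `submission_to_row` and `collect_top_posts_sketch` are hypothetical names, and the `mode="top"` default is an assumption based on the `mode` column in the displayed data.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

def submission_to_row(submission, subreddit_name, mode="top"):
    """Flatten one submission-like object into a dict of post columns."""
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return {
        "id": submission.id,
        "created_utc": submission.created_utc,
        "created_date": created.strftime("%Y-%m-%d %H:%M:%S"),
        # PRAW sets author to None when the account has been deleted.
        "author": str(submission.author) if submission.author else None,
        "title": submission.title,
        "body": submission.selftext,
        "score": submission.score,
        "num_comments": submission.num_comments,
        # PRAW permalinks are site-relative, so prefix the host.
        "permalink": f"https://reddit.com{submission.permalink}",
        "subreddit": subreddit_name,
        "mode": mode,
    }

def collect_top_posts_sketch(client_id, client_secret, user_agent,
                             subreddit_name, limit=750, time_filter="year"):
    """Fetch top submissions and return a list of row dicts (needs network access)."""
    import praw  # imported lazily so submission_to_row works without PRAW installed
    reddit = praw.Reddit(client_id=client_id, client_secret=client_secret,
                         user_agent=user_agent)
    listing = reddit.subreddit(subreddit_name).top(time_filter=time_filter, limit=limit)
    return [submission_to_row(s, subreddit_name) for s in listing]

# Offline example with a stand-in object instead of a real PRAW submission.
fake_post = SimpleNamespace(
    id="abc123", created_utc=1725545093.0, author="someone", title="Hello",
    selftext="", score=1, num_comments=0, permalink="/r/Sims4/comments/abc123/hello/",
)
example_row = submission_to_row(fake_post, "Sims4")
```

The list of dicts can be passed straight to `pd.DataFrame(...)` to get a frame shaped like `df_sims4_posts`.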

In [6]:
sample_ids = df_thesims_posts["id"].head(250).tolist()
df_thesims_comments = collect_comments_for_posts(RID, RSEC, RUA, sample_ids)

In [7]:
sample_ids = df_sims4_posts["id"].head(250).tolist()
df_sims4_comments = collect_comments_for_posts(RID, RSEC, RUA, sample_ids)
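
`collect_comments_for_posts` is also defined outside this notebook. As a rough sketch of what comment collection with PRAW can look like, the code below flattens each comment into the columns seen in this notebook's comment tables. `comment_to_row` and `collect_comments_sketch` are hypothetical names, and dropping the "load more comments" stubs with `replace_more(limit=0)` is an assumption about the approach, not a statement about the project's code.

```python
from types import SimpleNamespace

def comment_to_row(comment, post_id, parent_permalink):
    """Flatten one comment-like object into a dict of comment columns."""
    return {
        "post_id": post_id,
        "comment_id": comment.id,
        "created_utc": comment.created_utc,
        # author is None for comments from deleted accounts
        "author": str(comment.author) if comment.author else None,
        "body": comment.body,
        "score": comment.score,
        "parent_permalink": parent_permalink,
    }

def collect_comments_sketch(client_id, client_secret, user_agent, post_ids):
    """Fetch all loaded comments for each post ID (needs network access)."""
    import praw  # imported lazily so comment_to_row works without PRAW installed
    reddit = praw.Reddit(client_id=client_id, client_secret=client_secret,
                         user_agent=user_agent)
    rows = []
    for post_id in post_ids:
        submission = reddit.submission(id=post_id)
        # limit=0 removes MoreComments placeholders without extra requests
        submission.comments.replace_more(limit=0)
        permalink = f"https://reddit.com{submission.permalink}"
        rows.extend(comment_to_row(c, post_id, permalink)
                    for c in submission.comments.list())
    return rows

# Offline example with a stand-in object instead of a real PRAW comment.
fake_comment = SimpleNamespace(id="c1", created_utc=1725547000.0, author=None,
                               body="nice map", score=3)
example_comment_row = comment_to_row(
    fake_comment, "abc123", "https://reddit.com/r/Sims4/comments/abc123/hello/")
```

Each post costs at least one API request, which is one reason to sample a subset of post IDs as the cells above do.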

In [8]:
print("Sims4 subreddit:")
len(df_sims4_comments), display(df_sims4_comments.head(5))
len(df_sims4_posts), display(df_sims4_posts.head(5))

print("\n\nTheSims subreddit:")
len(df_thesims_posts), display(df_thesims_posts.head(5))
len(df_thesims_comments), display(df_thesims_comments.head(5))

Sims4 subreddit:


Unnamed: 0,post_id,comment_id,created_utc,author,body,score,parent_permalink
0,1f9ncnr,llmvyhz,1725547000.0,Fr0d0TheFr0g,Oh man.....this would be so cool as the travel screen instead of the circles we currently have,3014,https://reddit.com/r/Sims4/comments/1f9ncnr/i_made_a_sims_4_world_map/
1,1f9ncnr,llmqptr,1725545000.0,strangest_sea,"This has been my baby for months and I'm so happy to have finished it! The image is huge, and there are tons of little details so I hope you enjoy exploring ✨\n\nUpdate: ""Windenburg"" has been fixe...",3629,https://reddit.com/r/Sims4/comments/1f9ncnr/i_made_a_sims_4_world_map/
2,1f9ncnr,llmyg7s,1725548000.0,solunarmeetcute,This is great! From now on I will consult this map while planning my sims' trips to other worlds (for realism purposes).,986,https://reddit.com/r/Sims4/comments/1f9ncnr/i_made_a_sims_4_world_map/
3,1f9ncnr,lln5uto,1725550000.0,SerenityBlackwood,"This makes it even more impressive that werewolves can use the Moonwood Mill tunnels to get to Forgotten Hollow. \n\nThis is a really great map, and manages to actually incorporate all of the shap...",231,https://reddit.com/r/Sims4/comments/1f9ncnr/i_made_a_sims_4_world_map/
4,1f9ncnr,lln442n,1725550000.0,PhenomenalPhoenix,Was Magnolia Promenade forgotten on purpose? lol\n\nAlso I was expecting Willow Creek and Newcrest to be right next to each other but having Copperdale between them seems really fitting and isn’t ...,313,https://reddit.com/r/Sims4/comments/1f9ncnr/i_made_a_sims_4_world_map/


Unnamed: 0,id,created_utc,created_date,author,title,body,score,num_comments,permalink,subreddit,mode
0,1f9ncnr,1725545000.0,2024-09-05 14:04:53,strangest_sea,I made a Sims 4 World Map! 🌏,,20813,730,https://reddit.com/r/Sims4/comments/1f9ncnr/i_made_a_sims_4_world_map/,Sims4,top
1,1fx4a6k,1728174000.0,2024-10-06 00:21:55,smoretank,My sim's wife recently passed away. First night as a ghost I find them cuddling. 😭,Bawling my eyes out with my mom who I showed this too. Lost my dad 2yrs ago and we joked he haunts mom. Damn I wish this was real so my mom could hold dad again.,16149,198,https://reddit.com/r/Sims4/comments/1fx4a6k/my_sims_wife_recently_passed_away_first_night_as/,Sims4,top
2,1evwu4m,1724057000.0,2024-08-19 08:43:12,BigAssClapper,10 years of this game and I still think this icon is for the office,,15991,472,https://reddit.com/r/Sims4/comments/1evwu4m/10_years_of_this_game_and_i_still_think_this_icon/,Sims4,top
3,1gfm3jn,1730294000.0,2024-10-30 13:08:13,fluffhq,I made a grafting cheatsheet 🌿,,15955,340,https://reddit.com/r/Sims4/comments/1gfm3jn/i_made_a_grafting_cheatsheet/,Sims4,top
4,1fpijoe,1727307000.0,2024-09-25 23:33:03,BrockoBell,"Pro tip, don’t let your children use the slip ‘n slide in the winter.",,15579,398,https://reddit.com/r/Sims4/comments/1fpijoe/pro_tip_dont_let_your_children_use_the_slip_n/,Sims4,top




TheSims subreddit:


Unnamed: 0,id,created_utc,created_date,author,title,body,score,num_comments,permalink,subreddit,mode
0,1hbjs4w,1733886000.0,2024-12-11 02:57:43,Sweetheart213119,My sim's infant has a goatee,Why does my Sims infant look like he has a goatee 😆 🤣 😂,28291,620,https://reddit.com/r/thesims/comments/1hbjs4w/my_sims_infant_has_a_goatee/,thesims,top
1,1hm4uom,1735148000.0,2024-12-25 17:25:54,princessfluffybutt96,The Actual Truth,I have to walk around for a bit after.,25341,201,https://reddit.com/r/thesims/comments/1hm4uom/the_actual_truth/,thesims,top
2,1h1ban9,1732732000.0,2024-11-27 18:32:40,Ju7genesis,Has anyone ever mentioned how they straight up whitewashed Travis Scott?,,23365,275,https://reddit.com/r/thesims/comments/1h1ban9/has_anyone_ever_mentioned_how_they_straight_up/,thesims,top
3,1g3n31m,1728931000.0,2024-10-14 18:34:20,i2tiny,made me laugh lol,,17749,120,https://reddit.com/r/thesims/comments/1g3n31m/made_me_laugh_lol/,thesims,top
4,1ggr5l4,1730415000.0,2024-10-31 22:43:02,snorecrux,Sul sul! My Bella Goth costume this year. It went over most people's heads but the real ones knew.,,16950,167,https://reddit.com/r/thesims/comments/1ggr5l4/sul_sul_my_bella_goth_costume_this_year_it_went/,thesims,top


Unnamed: 0,post_id,comment_id,created_utc,author,body,score,parent_permalink
0,1hbjs4w,m1gvbgz,1733887000.0,gimmeyourbadinage,The second photo with dad in the corner 😂,2639,https://reddit.com/r/thesims/comments/1hbjs4w/my_sims_infant_has_a_goatee/
1,1hbjs4w,m1gwncb,1733887000.0,KrisKat38,https://preview.redd.it/oegub3xn056e1.jpeg?width=686&format=pjpg&auto=webp&s=d9f4e6a809966aaab60566176a4474c8aba44058,1873,https://reddit.com/r/thesims/comments/1hbjs4w/my_sims_infant_has_a_goatee/
2,1hbjs4w,m1guz0i,1733886000.0,JayandMeeka,I'm sorry but that's hilarious omg,4891,https://reddit.com/r/thesims/comments/1hbjs4w/my_sims_infant_has_a_goatee/
3,1hbjs4w,m1guzzy,1733886000.0,endlesslatte,"he was born to sing in a boy band, that’s why",355,https://reddit.com/r/thesims/comments/1hbjs4w/my_sims_infant_has_a_goatee/
4,1hbjs4w,m1gx764,1733887000.0,Massive_Amphibian_91,I wonder if you have CC that is categorized as a skin feature but it’s actually facial hair….,119,https://reddit.com/r/thesims/comments/1hbjs4w/my_sims_infant_has_a_goatee/


(32427, None)

In [9]:
from pathlib import Path
import sqlite3

DB_DIR = Path("../data/raw")
DB_DIR.mkdir(parents=True, exist_ok=True)
DB_PATH = DB_DIR / "sims4.db"

def get_conn(db_path=DB_PATH):
    conn = sqlite3.connect(db_path)
    # enforce FK constraints
    conn.execute("PRAGMA foreign_keys = ON;")
    return conn
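
SQLite ignores foreign-key constraints unless `PRAGMA foreign_keys = ON` is issued on each connection, which is why `get_conn` sets it. The sketch below shows the effect on an in-memory database. The two-table schema is an illustrative assumption, not the schema this project's `insert_posts` and `insert_comments` expect.

```python
import sqlite3

# Hypothetical, simplified schema: every comment must reference a stored post.
DEMO_SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    post_id   TEXT PRIMARY KEY,
    subreddit TEXT,
    title     TEXT
);
CREATE TABLE IF NOT EXISTS comments (
    comment_id TEXT PRIMARY KEY,
    post_id    TEXT NOT NULL REFERENCES posts(post_id),
    body       TEXT
);
"""

def open_demo_db():
    """Open an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON;")
    conn.executescript(DEMO_SCHEMA)
    return conn

demo_conn = open_demo_db()
demo_conn.execute("INSERT INTO posts VALUES (?, ?, ?)", ("p1", "Sims4", "Hello"))
demo_conn.execute("INSERT INTO comments VALUES (?, ?, ?)", ("c1", "p1", "hi"))

# A comment pointing at an unknown post is rejected once enforcement is on.
try:
    demo_conn.execute("INSERT INTO comments VALUES (?, ?, ?)", ("c2", "missing", "orphan"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

stored_comments = demo_conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
demo_conn.close()
```

With enforcement on, posts have to be inserted before the comments that reference them.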

In [None]:
from src.collect_data import ensure_posts_schema, ensure_comments_schema, insert_posts, insert_comments

In [None]:
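dfs_posts = [df_sims4_posts, df_thesims_posts]
dfs_comments = [df_sims4_comments, df_thesims_comments]

# Concatenate the per-subreddit frames before normalizing them
posts_all = pd.concat(dfs_posts, ignore_index=True)
comments_all = pd.concat(dfs_comments, ignore_index=True)

df_posts_norm = ensure_posts_schema(posts_all)
df_comments_norm = ensure_comments_schema(comments_all, df_posts_norm)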

df_posts_norm = df_posts_norm.drop_duplicates(subset=["post_id"])
df_comments_norm = df_comments_norm.drop_duplicates(subset=["comment_id"])

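conn = get_conn()
try:
    # `with conn` commits on success and rolls back on error; it does not close the connection
    with conn:
        insert_posts(conn, df_posts_norm)
        insert_comments(conn, df_comments_norm)
finally:
    conn.close()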

print("Done inserting concatenated posts & comments.")
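
Re-running the insert cell may try to store IDs that are already in the database. How `insert_posts` handles that is not shown here. One common pattern is `INSERT OR IGNORE` on a primary key, sketched below with a hypothetical `insert_posts_sketch` and a simplified three-column table.

```python
import sqlite3

def insert_posts_sketch(conn, rows):
    """Insert (post_id, subreddit, title) tuples, skipping IDs already stored.

    Returns the number of rows actually added.
    """
    before = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO posts (post_id, subreddit, title) VALUES (?, ?, ?)",
            rows,
        )
    after = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    return after - before

check_conn = sqlite3.connect(":memory:")
check_conn.execute(
    "CREATE TABLE posts (post_id TEXT PRIMARY KEY, subreddit TEXT, title TEXT)")

batch = [("p1", "Sims4", "A"), ("p2", "thesims", "B")]
first_run = insert_posts_sketch(check_conn, batch)
# The second run repeats p1 and p2 and adds one new post.
second_run = insert_posts_sketch(check_conn, batch + [("p3", "Sims4", "C")])

# Quick sanity check after loading: row counts per subreddit.
per_subreddit = dict(check_conn.execute(
    "SELECT subreddit, COUNT(*) FROM posts GROUP BY subreddit"))
check_conn.close()
```

A per-subreddit count like the last query is a cheap way to confirm that both collections landed in the database.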