## Reddit API
- In this notebook we extract Reddit posts:
    - As a first approach, we collect posts that match ADHD-related keywords in related subreddits.
    - To improve the model, we could later let it identify on its own (for example, from previously collected data) which subreddits are worth scraping.


    General composition of a post:
    -  id: The post’s ID
    -  title: The post’s title
    -  text: The post’s text
    -  author: The post’s author
    -  created_utc: The post’s creation time in UTC
    -  score: The post’s score
    -  num_comments: The number of comments on the post
    -  permalink: The post’s permalink
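
The field list above can be sketched as a small mapping function. This is an illustrative sketch, not the notebook's own code: `post_to_record` and the `example_post` stand-in are hypothetical names, and it assumes that `text` in the list corresponds to PRAW's `selftext` attribute.

```python
from types import SimpleNamespace


def post_to_record(post):
    """Map a submission-like object to the fields listed above."""
    return {
        "id": post.id,
        "title": post.title,
        "text": post.selftext,  # assumption: "text" means PRAW's selftext
        # author may be missing (e.g. deleted accounts), so guard against None
        "author": str(post.author) if post.author is not None else None,
        "created_utc": post.created_utc,
        "score": post.score,
        "num_comments": post.num_comments,
        "permalink": post.permalink,
    }


# Hypothetical stand-in for a PRAW submission, so the sketch runs offline
example_post = SimpleNamespace(
    id="abc123",
    title="Example title",
    selftext="Example body",
    author=None,
    created_utc=1577836800.0,
    score=10,
    num_comments=2,
    permalink="/r/ADHD/comments/abc123/example_title/",
)
record = post_to_record(example_post)
print(sorted(record))
```

With a real connection, the same function could be applied to each submission returned by PRAW.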

## Connections


In [None]:
import os
from dotenv import load_dotenv
import praw

#### Connection to the Reddit API

In [None]:

# Load environment variables from .env file
load_dotenv()

try:
    reddit = praw.Reddit(
        client_id=os.getenv("REDDIT_CLIENT_ID"),
        client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
        user_agent=os.getenv("REDDIT_USER_AGENT"),
        username=os.getenv("REDDIT_USERNAME"),
        password=os.getenv("REDDIT_PASSWORD")
    )
    print(f"Connected! Logged in as: {reddit.user.me()}")
except Exception as e:
    print("An error occurred:", e)

#### Connect to MongoDB

In [3]:
from pymongo import MongoClient

# Load environment variables from .env file
load_dotenv()
mongo = '127.0.0.1'

try:
    # Connect to MongoDB
    myclient = MongoClient(
        "mongodb://" + mongo + ":27017/",  # Mongo URI format
        username='admin',
        password='admin',
    )
    # MongoClient connects lazily, so ping the server to verify the connection
    myclient.admin.command("ping")
    db = myclient['reddit']
    print("Connected to MongoDB successfully!")
except Exception as e:
    print("An error occurred while connecting to MongoDB:", e)

Connected to MongoDB successfully!


#### Connect to Redis
 

In [21]:
import redis

r = redis.Redis(host='127.0.0.1', port=6379, db=0)
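
Later in the notebook, the Redis client is used to skip posts that were already collected: `r.sadd('reddit_posts', post.id)` returns the number of members newly added to the set, so it is `1` the first time an id is seen and `0` afterwards. A minimal sketch of that idea follows. `InMemorySetStore` and `is_new_post` are hypothetical names introduced here so the example runs without a Redis server; with a real connection, `r` would be passed in place of the stand-in.

```python
class InMemorySetStore:
    """Hypothetical stand-in that mimics the return value of Redis SADD."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        bucket = self._sets.setdefault(key, set())
        new_members = set(members) - bucket
        bucket.update(new_members)
        # Like SADD: report how many members were not already in the set
        return len(new_members)


def is_new_post(client, post_id, key="reddit_posts"):
    """Return True only the first time post_id is added to the set."""
    return bool(client.sadd(key, post_id))


store = InMemorySetStore()
first_seen = is_new_post(store, "abc123")
second_seen = is_new_post(store, "abc123")
print(first_seen, second_seen)
```

Because the membership test and the insertion happen in a single call, the same id cannot be accepted twice even when several searches return overlapping results.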

## Scrape Data Using Reddit Search

#### Scrape using Reddit's search bar (keyword search):
- Define the search keywords:
["adhd", "diagnose", "energy", "brain", "test", "distracted", "forgetful", "doctor", "work", "task", "disord", "struggl", "focu", "dysfunct", "forgot", "lazi", "prescrib", "medic", "medicin", "pill", "self diagnosis", "self medication"]
- Try different sorting techniques: ["relevance", "hot", "top", "new", "comments"]


In [None]:
Querykeywords=["adhd", "diagnose","energy", "brain", "test", "distracted", "forgetful", "doctor"
                  ,"work","task","disord","struggl","focu","dysfunct","forgot","lazi","prescrib","medic","medicin","pill","self diagnosis","self medication"]
sortingTechniques=["relevance", "hot", "top", "new", "comments"]
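
The search loop below calls a helper named `get_utc_time`, whose definition is not shown in this section. The following is a minimal sketch of what such a helper might look like, assuming it only converts the Unix timestamp in `created_utc` to a `datetime`. It uses a timezone-aware conversion, since `datetime.utcfromtimestamp` is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone


def get_utc_time(timestamp):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# 1577836800 is 2020-01-01 00:00:00 UTC
example_time = get_utc_time(1577836800)
print(example_time.year)
```

The `.year` attribute of the result is what the loop compares against 2019.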


In [None]:
import pandas as pd

# Subreddit to target
subreddit_name = "ADHD"
subreddit = reddit.subreddit(subreddit_name)
posts = []

# Restrict this run to two keywords (overrides the full list defined above)
Querykeywords = ["self diagnosis", "self medication"]

for keyword in Querykeywords:
    for sorting in sortingTechniques:
        print("Searching for keyword:", keyword, "using sorting technique:", sorting)
        # limit=None asks PRAW for as many results as Reddit will return
        # (listings are capped at roughly 1000 items per query)
        for post in subreddit.search(query=keyword, sort=sorting, syntax='cloudsearch',
                                     time_filter='all', limit=None):
            # get_utc_time is a helper defined elsewhere in the notebook
            datecreated = get_utc_time(post.created_utc)
            year = datecreated.year
            # Keep posts from 2020 onwards; r.sadd returns 0 for ids already seen
            if year > 2019 and r.sadd('reddit_posts', post.id):
                posts.append({
                    "id": post.id,
                    "title": post.title,
                    "author": str(post.author),
                    "score": post.score,
                    "num_comments": post.num_comments,
                    "upvote_ratio": post.upvote_ratio,
                    "url": post.url,
                    "subreddit": post.subreddit.display_name,
                    "created_at": post.created_utc,
                    "self_text": post.selftext,
                    "searchQuery": keyword,
                })

# insert_many raises on an empty list, so skip it when nothing new was found
result = db.reddit_posts.insert_many(posts) if posts else None
result
Searching for keyword: self diagnosis using sorting technique: relevance


  utc_time = datetime.datetime.utcfromtimestamp(timestamp)


Searching for keyword: self diagnosis using sorting technique: hot
Searching for keyword: self diagnosis using sorting technique: top
Searching for keyword: self diagnosis using sorting technique: new
Searching for keyword: self diagnosis using sorting technique: comments
Searching for keyword: self medication using sorting technique: relevance
Searching for keyword: self medication using sorting technique: hot
Searching for keyword: self medication using sorting technique: top
Searching for keyword: self medication using sorting technique: new
Searching for keyword: self medication using sorting technique: comments


InsertManyResult([ObjectId('674df915845d7125c2d2c32f'), ObjectId('674df915845d7125c2d2c330'), ObjectId('674df915845d7125c2d2c331'), ObjectId('674df915845d7125c2d2c332'), ObjectId('674df915845d7125c2d2c333'), ObjectId('674df915845d7125c2d2c334'), ObjectId('674df915845d7125c2d2c335'), ObjectId('674df915845d7125c2d2c336'), ObjectId('674df915845d7125c2d2c337'), ObjectId('674df915845d7125c2d2c338'), ObjectId('674df915845d7125c2d2c339'), ObjectId('674df915845d7125c2d2c33a'), ObjectId('674df915845d7125c2d2c33b'), ObjectId('674df915845d7125c2d2c33c'), ObjectId('674df915845d7125c2d2c33d'), ObjectId('674df915845d7125c2d2c33e'), ObjectId('674df915845d7125c2d2c33f'), ObjectId('674df915845d7125c2d2c340'), ObjectId('674df915845d7125c2d2c341'), ObjectId('674df915845d7125c2d2c342'), ObjectId('674df915845d7125c2d2c343'), ObjectId('674df915845d7125c2d2c344'), ObjectId('674df915845d7125c2d2c345'), ObjectId('674df915845d7125c2d2c346'), ObjectId('674df915845d7125c2d2c347'), ObjectId('674df915845d7125c2d2c3