### Introduction

##### Omdena Mental Health Challenge [link](https://www.omdena.com/chapter-challenges/leveraging-llms-to-understand-global-mental-health-well-being-fomo-in-social-media?fbclid=IwAR2Bvq_8BVfGsR9MLev9MZZZtcmHutd_ARp-ag1H4jg5rZsx6X3K3tHAPEU)

- Data Collection Task
    - We are trying to train a chatbot on what casual conversation looks like and what conversation about mental health problems looks like.
    - This notebook collects part of the mental health training data, drawn from several subreddits.

### Required Data Structure

- JSON format
- One object per post, with nested comments and replies (the fields are listed in the class docstrings further down)
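
As a rough sketch of that structure, the record below uses the field names from the `RedditPost`, `RedditComment`, and `RedditReply` classes defined later in this notebook. The values are placeholders, timestamps are shown as epoch seconds because the classes store `created_utc` unchanged, and `check_post` is a hypothetical helper that is not part of the original notebook.

```python
# Placeholder record: keys follow the to_dict() methods defined later on.
example_post = {
    "post_id": "abc123",
    "title": "Example Post Title",
    "author": "user123",
    "timestamp": 1709208000.0,  # epoch seconds (created_utc)
    "body": "Example post body text.",
    "score": 0,
    "downs": 0,
    "total_comments": 1,
    "comments": [
        {
            "comment_id": "uvw345",
            "author": "user345",
            "timestamp": 1709209200.0,
            "body": "Another comment body text.",
            "replies": [
                {
                    "comment_id": "ijk789",
                    "author": "user789",
                    "timestamp": 1709208600.0,
                    "body": "Example reply body text.",
                }
            ],
        }
    ],
}

POST_KEYS = {"post_id", "title", "author", "timestamp", "body",
             "score", "downs", "total_comments", "comments"}
COMMENT_KEYS = {"comment_id", "author", "timestamp", "body", "replies"}
REPLY_KEYS = {"comment_id", "author", "timestamp", "body"}


def check_post(post):
    """Return a list of missing-key messages; an empty list means the shape matches."""
    problems = []
    for key in sorted(POST_KEYS - set(post)):
        problems.append(f"post is missing '{key}'")
    for comment in post.get("comments", []):
        for key in sorted(COMMENT_KEYS - set(comment)):
            problems.append(f"comment is missing '{key}'")
        for reply in comment.get("replies", []):
            for key in sorted(REPLY_KEYS - set(reply)):
                problems.append(f"reply is missing '{key}'")
    return problems


print(check_post(example_post))
```

A check like this can be run on each scraped post before it is added to the training data.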

---

[A Reference Tutorial for more details](https://medium.com/analytics-vidhya/scraping-reddit-using-python-reddit-api-wrapper-praw-5c275e34a8f4)

**API limits**
- https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki

- Reddit enforces rate limits for those eligible for free access to the Data API. The limits are:
    - 100 queries per minute (QPM) per OAuth client id
    - QPM limits are averaged over a time window (currently 10 minutes) to support bursting requests.
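
To make the averaging rule concrete, here is a minimal sliding-window limiter. It reads "100 QPM averaged over 10 minutes" as "at most 1000 calls in any 600-second window", which is an assumption about how the limit is applied. `SlidingWindowLimiter` is a hypothetical helper, not part of PRAW; PRAW is generally understood to throttle itself using Reddit's rate-limit response headers, so this is only a sketch for custom request loops.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any window_seconds interval."""

    def __init__(self, max_calls=1000, window_seconds=600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of recent calls, oldest first

    def _drop_expired(self, now):
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def acquire(self):
        """Block until one more call fits in the window, then record it."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.calls) >= self.max_calls:
            # Wait until the oldest recorded call falls out of the window.
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self._drop_expired(now)
        self.calls.append(now)


# Usage sketch: call limiter.acquire() before each API request.
# limiter = SlidingWindowLimiter()
# limiter.acquire()
```

Bursts are allowed as long as the total within the window stays under the cap, which matches the stated purpose of the averaging window.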

### Imports

In [58]:
import pandas as pd
import praw
from praw.models import MoreComments
import json
import pprint

In [187]:
secret = "****"
client_id = "****"
user = "****"
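
The credentials above are masked by hand. One way to keep them out of the notebook entirely is to read them from environment variables, as in this minimal sketch. The variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) and the helper `load_reddit_credentials` are assumptions made for illustration, not something PRAW or the original notebook defines.

```python
import os

# Hypothetical environment variable names, keyed by the praw.Reddit argument they feed.
ENV_NAMES = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_reddit_credentials(env=None):
    """Return a dict of credentials, failing loudly if any variable is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[name] for arg, name in ENV_NAMES.items()}


# Demo with a stand-in environment so nothing real is needed to run this cell.
demo_env = {
    "REDDIT_CLIENT_ID": "****",
    "REDDIT_CLIENT_SECRET": "****",
    "REDDIT_USER_AGENT": "****",
}
print(sorted(load_reddit_credentials(demo_env)))
```

The returned keys match the keyword arguments used in the next cell, so the dict could be unpacked as `praw.Reddit(**load_reddit_credentials())`.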

In [3]:
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=secret,
    user_agent=user,
)
print(reddit.read_only)

True


### Test on a specific post URL
- https://www.reddit.com/r/CasualConversation/comments/1b4x3pk/kind_of_bored_looking_for_a_chill_conversation/

In [185]:
url = r"https://www.reddit.com/r/CasualConversation/comments/1b4x3pk/kind_of_bored_looking_for_a_chill_conversation/"

In [174]:
#----- Reply data structure
class RedditReply:
    '''
    reply structure
      {
          "comment_id": "ijk789",
          "author": "user789",
          "timestamp": "2024-02-29T12:10:00",
          "body": "Example reply body text."
      }
    '''
    def __init__(self, reply_object):
        self.id = reply_object.id
        # author is None when the account has been deleted
        if reply_object.author is not None:
            self.author = reply_object.author.name
        else:
            self.author = None
        self.timestamp = reply_object.created_utc
        self.body = reply_object.body
        
    def to_dict(self):
        return {'comment_id':self.id, 
                'author':self.author,
                'timestamp':self.timestamp, 
                'body':self.body}

    
#----- Comments data structure
class RedditComment:
    '''
    comment structure
      {
          "comment_id": "uvw345",
          "author": "user345",
          "timestamp": "2024-02-29T12:20:00",
          "body": "Another comment body text.",
          "replies": []
      }
    '''
    def __init__(self, comment_object, MAX_REPLIES=100):
        self.MAX_REPLIES = MAX_REPLIES

        self.id = comment_object.id
        
        if comment_object.author is not None:
            self.author = comment_object.author.name
        else:
            self.author = None
            
        self.timestamp = comment_object.created_utc
        self.body = comment_object.body
        self.replies = []
        self.add_replies(comment_object.replies)
        
    def add_replies(self, comment_replies):
        for reply in comment_replies[:self.MAX_REPLIES]:
            if isinstance(reply, MoreComments):
                continue
            reply = RedditReply(reply)
            
            self.replies.append(reply.to_dict())
            
    def to_dict(self):
        return {'comment_id':self.id, 
                'author':self.author,
                'timestamp':self.timestamp, 
                'body':self.body, 
                "replies":self.replies}


#----- Post data structure
class RedditPost:
    '''
    Only top-level comments and their direct replies are collected.
    Limited to 100 comments/replies each, until we define suitable criteria.
    
    Post structure
    {  
        "post_id": "abc123",
        "title": "Example Post Title",
        "author": "user123",
        "timestamp": "2024-02-29T12:00:00",
        "body": "Example post body text.",
        "score": 0,
        "downs": 0,
        "total_comments": 0,
        "comments": [{}, {}, ...]
    }
    '''
    def __init__(self, post_object, MAX_COMMENTS = 100, MAX_REPLIES = 100):
        self.MAX_COMMENTS = MAX_COMMENTS
        self.MAX_REPLIES = MAX_REPLIES
        
        self.id = post_object.id
        self.title = post_object.title

        if post_object.author is not None:
            self.author = post_object.author.name
        else:
            self.author = None
            
        self.timestamp = post_object.created_utc
        self.body = post_object.selftext
        self.score = post_object.ups
        self.downs = post_object.downs
        self.total_comments = post_object.num_comments
        
        self.comments = []
        self.add_comments(post_object.comments)
        
    def add_comments(self, comments):
        
        for comment in comments[:self.MAX_COMMENTS]:
            # ignoring more comments
            if isinstance(comment, MoreComments):
                continue
            
            comment = RedditComment(comment, self.MAX_REPLIES)
            
            self.comments.append(comment.to_dict())

    def to_dict(self):
        
        return {'post_id':self.id, 
                'title':self.title, 
                'author':self.author,
                'timestamp':self.timestamp, 
                'body':self.body, 
                'score':self.score,
                'downs':self.downs, 
                'total_comments':self.total_comments, 
                'comments':self.comments}
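
The docstrings above show timestamps as ISO 8601 strings, while the classes store `created_utc` as epoch seconds. If the string form is wanted in the final data, a conversion along these lines could be applied. `epoch_to_iso` is a hypothetical helper, and treating the value as UTC is an assumption based on the attribute name.

```python
from datetime import datetime, timezone


def epoch_to_iso(created_utc):
    """Convert epoch seconds (UTC) to a 'YYYY-MM-DDTHH:MM:SS' string."""
    moment = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


print(epoch_to_iso(0))             # -> 1970-01-01T00:00:00
print(epoch_to_iso(1709409366.0))  # -> 2024-03-02T19:56:06
```

Keeping the raw epoch value in the classes and converting only at export time avoids losing information.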

In [175]:
post = RedditPost(reddit.submission(url=url))

In [176]:
pprint.pprint(post.to_dict(),sort_dicts=False)

{'post_id': '1b4x3pk',
 'title': 'Kind of Bored, looking for a chill conversation 😁',
 'author': 'Routine-Disk3773',
 'timestamp': 1709409366.0,
 'body': 'hello everyone!\n'
         'I am Luka, 25 Years old and currently living in Germany 😇\n'
         '\n'
         'Life is rather boring at the moment and I really find it hard to '
         "connect with people so i'm looking for a friend to pass my time "
         'with!\n'
         '\n'
         'My interests are DJing (House and Techno mostly but also Hip Hop), I '
         'loove Skiing and going to the gym as well as just being lazy and '
         "watching anime all day long! Lately i've been growing interest in "
         'cooking as well, so if you have any epic recipes, hit me up!\n'
         '\n'
         'I have a grey thicc cat and I love her with all my heart❤️ (Let me '
         "know if you wanna see her, you won't regret the cuteness)\n"
         '\n'
         'If you are interested in chatting for a bit, let me know 

In [164]:
print(json.dumps(post.to_dict(),indent=4))

{
    "post_id": "1b4x3pk",
    "title": "Kind of Bored, looking for a chill conversation \ud83d\ude01",
    "author": "Routine-Disk3773",
    "timestamp": 1709409366.0,
    "body": "hello everyone!\nI am Luka, 25 Years old and currently living in Germany \ud83d\ude07\n\nLife is rather boring at the moment and I really find it hard to connect with people so i'm looking for a friend to pass my time with!\n\nMy interests are DJing (House and Techno mostly but also Hip Hop), I loove Skiing and going to the gym as well as just being lazy and watching anime all day long! Lately i've been growing interest in cooking as well, so if you have any epic recipes, hit me up!\n\nI have a grey thicc cat and I love her with all my heart\u2764\ufe0f (Let me know if you wanna see her, you won't regret the cuteness)\n\nIf you are interested in chatting for a bit, let me know \ud83d\ude07",
    "score": 3,
    "downs": 0,
    "total_comments": 10,
    "comments": [
        {
            "comment_id": "kt1pqd2",
      

---
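
The JSON dump above shows emoji and curly quotes as `\ud83d\ude01`-style escapes because `json.dumps` escapes non-ASCII characters by default. A small sketch of the difference, using a one-field sample taken from the post title:

```python
import json

sample = {"title": "Kind of Bored, looking for a chill conversation 😁"}

escaped = json.dumps(sample, indent=4)                       # default: ASCII-only output
readable = json.dumps(sample, indent=4, ensure_ascii=False)  # keeps the emoji as-is

print(escaped)
print(readable)

# Both forms decode back to the same data.
assert json.loads(escaped) == json.loads(readable) == sample
```

Either form is valid JSON; `ensure_ascii=False` only makes the text easier to read when inspecting the data by eye.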

### Working with a subreddit

In [186]:
# working on subreddit Casual Conversation
subreddit_name = 'CasualConversation'

subreddit = reddit.subreddit(subreddit_name)

print("Display Name:", subreddit.display_name)
print("Title:", subreddit.title)
print("Description:", subreddit.description[:100], "...")

Display Name: CasualConversation
Title: The friendlier part of Reddit.
Description: >[IRC Chat](https://kiwiirc.com/nextclient/#irc://irc.snoonet.org:+6697/#casualconversation) | [Twit ...


In [181]:
posts = []

for rawpost in subreddit.top(limit=2):
    post = RedditPost(rawpost, MAX_COMMENTS=10, MAX_REPLIES=10)
    posts.append(post.to_dict())

In [184]:
print(json.dumps(posts,indent=2))

[
  {
    "post_id": "fle7gb",
    "title": "This coronavirus things has made me realize people would be a lot happier and explore their passions and interests if they didn\u2019t have to work so much.",
    "author": "ghost_sanctum",
    "timestamp": 1584639819.0,
    "body": "Just live streams of people at home getting in touch with their instruments again, walking their dogs, etc.\n\nPeople actually seemed to be hanging out with each other, even if it\u2019s just playing my heart will go on or blowing a spitfire ball from the balconies.\n\nI get some people are mad or concerned because of cut hours , or just plain laid off , but if they , and everyone , had the means to the basics of just surviving and being alive it makes me yearn for that society where everyone is happy.\n\nPre- coronavirus I realized a lot of us work out butts off for very little reward.\n\nEdit: hey thanks for all the awards guys\n\nEdit 2: I get that things are stressful on the other hand with it being a pandem

### Next steps
- Continue collecting more posts, from this and other subreddits, to build the training data for our model.
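
As more posts are collected, the list of dicts built above will need to be written to disk. A minimal sketch, assuming a single UTF-8 JSON file is an acceptable output; `save_posts`, `load_posts`, and the file name are hypothetical and not part of the original notebook.

```python
import json
from pathlib import Path


def save_posts(posts, path):
    """Write a list of post dicts to a UTF-8 JSON file and return its path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        json.dump(posts, f, ensure_ascii=False, indent=2)
    return path


def load_posts(path):
    """Read the list of post dicts back from disk."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


# Usage sketch, with `posts` being the list built in the subreddit loop:
# save_posts(posts, "casual_conversation_posts.json")
```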