# Introduction and problem statement

Welcome to Notebook 1 of four for this project. Across the four notebooks we gather data (Notebook 1), clean and analyze it (Notebook 2), develop models (Notebook 3), and build a Streamlit app (Notebook 4), all to address the problem statement below. In this notebook we write a function that pulls posts from the pushshift.io Reddit API for the subreddits r/OCD and r/Anxiety, then concatenate the results and export the combined DataFrame for use in later notebooks.

## Problem statement

Can we use anonymized, free-text posts written in an open, safe environment by individuals discussing obsessive-compulsive or anxiety disorders to build a first-step (non-medical) diagnostic tool?


# 1. Pulling from the API into DataFrames

### Imports

In [9]:
import pandas as pd
import requests
import time
import datetime as dt

### Function to pull from the API

In [10]:
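# Worked with Andy on this. Params taken from a YouTube video supplied by GA.
# Datetime conversion code came from https://psaw.readthedocs.io/en/latest/.

def get_1200(subreddit):
    """Request up to 100 posts after the first of each month of 2019 (up to 1,200 in total)."""
    url = 'https://api.pushshift.io/reddit/search/submission'
    list_open = []
    for i in range(1, 13):
        # Epoch timestamp for the first day of month i in 2019 (local time, since the datetime is naive)
        after = int(dt.datetime(2019, i, 1).timestamp())

        params = {
            'subreddit': subreddit,
            'size': 100,
            'after': after
        }

        res = requests.get(url, params=params)
        res.raise_for_status()  # stop on an error response instead of silently ignoring the status code
        data = res.json()
        list_open.extend(data['data'])
        time.sleep(2)  # pause between requests

    # Build the DataFrame once, after all twelve requests have finished
    df = pd.DataFrame(list_open)
    df_clean = df[['author_fullname', 'selftext', 'title', 'subreddit', 'score', 'id', 'over_18']]
    return df_clean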
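
The `after` values in `get_1200` come from a naive `datetime`, so `.timestamp()` interprets each date in the local timezone of the machine running the notebook. As a minimal sketch, assuming UTC month boundaries are what is wanted, the twelve values could be computed as below. `month_start_epochs` is a hypothetical helper for illustration and is not part of the original notebook.

```python
import datetime as dt

def month_start_epochs(year):
    """Return the Unix epoch (seconds) for 00:00 UTC on the first day of each month."""
    return [
        int(dt.datetime(year, month, 1, tzinfo=dt.timezone.utc).timestamp())
        for month in range(1, 13)
    ]

epochs = month_start_epochs(2019)
print(epochs[0])    # 1546300800, i.e. 2019-01-01 00:00:00 UTC
print(len(epochs))  # 12
```

Passing a timezone explicitly makes the request windows reproducible regardless of where the notebook is run.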

### Calling the function on the subreddits of interest and concatenating

In [11]:
ocd_df = get_1200('OCD')

In [12]:
ocd_df

| | author_fullname | selftext | title | subreddit | score | id | over_18 |
|---|---|---|---|---|---|---|---|
| 0 | t2_1k53nux0 | If a person triggers your ocd and causes you t... | Person triggers me | OCD | 1 | abfhc9 | False |
| 1 | t2_1k53nux0 | | The fastest way to get rid of intrusive though... | OCD | 1 | abfoxb | False |
| 2 | t2_2nvk1ovs | I need some thoughts. I have done years of CB... | HELP! Struggling with ocd and trust worsened b... | OCD | 1 | abfp45 | False |
| 3 | t2_27lwv0dp | Lately I've been dealing with POCD and since I... | Ive started to ignore my friends | OCD | 1 | abfy99 | False |
| 4 | t2_2esc7u18 | Hi! I’m Kevin and I’m 22 years Old. I want to ... | How do I improve my life in 2019 with my Harm ... | OCD | 1 | abg3uu | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1195 | t2_sz99mos | | The combo of meds I’m taking is helping my sym... | OCD | 1 | e4wmrw | False |
| 1196 | t2_4mvdta8j | We have a day long meeting in my team so we we... | Coming in Later than Usual to Work Damn Near K... | OCD | 1 | e4wtzv | False |
| 1197 | t2_54nqyy1q | This is inspired by another [recent post](http... | Does anyone else say thoughts out loud? | OCD | 1 | e4x91n | False |
| 1198 | t2_3dhluw6t | wanted to play FGO (a mobile game) cause a fri... | 30 minutes panic attack and still ongoing woohooo | OCD | 1 | e4xwqi | False |


In [13]:
anx_df = get_1200('Anxiety')

In [14]:
anx_df.head(20)

| | author_fullname | selftext | title | subreddit | score | id | over_18 |
|---|---|---|---|---|---|---|---|
| 0 | t2_23ppmo72 | If I would look at me I would not understand h... | I hate myself for allowing myself to be thick ... | Anxiety | 1 | abf5dn | False |
| 1 | t2_q7kfjlz | In celebration of 2018 and all the progress I ... | I'm unraveling. | Anxiety | 1 | abfbnn | False |
| 2 | t2_qqcsl | As the title of this post says, today I've had... | New Year's Celebration is one of my triggers | Anxiety | 1 | abfbzd | False |
| 3 | t2_2uso8sgu | Looking back on 2018, I realized this last yea... | Happy New Year! I'm so thankful for you guys. | Anxiety | 1 | abffr7 | False |
| 4 | t2_1namrmmo | Some people came over to my house today for Ne... | New Years ruined. | Anxiety | 1 | abfgdb | False |
| 5 | t2_cmlat3g | | Tonight I had sex for the first time in 9 year... | Anxiety | 1 | abfgpv | False |
| 6 | t2_2irnokny | [removed] | So anxiety hit me like a ton of fucking bricks... | Anxiety | 1 | abfhjg | False |
| 7 | t2_gv7qx | [removed] | New Years party did not go well | Anxiety | 1 | abfi2w | False |
| 8 | t2_b4ivy | I can get really bad anxiety when it comes fro... | Im grateful for the boss I have, and I'm stres... | Anxiety | 1 | abfk4p | False |
| 9 | t2_qwcmi4j | Happy new year everyone. We made it another ye... | Need some advice | Anxiety | 1 | abfk94 | False |
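
The preview above includes posts whose `selftext` is empty or `[removed]`. A minimal sketch for counting such rows before moving on is shown here. It runs on a hypothetical toy DataFrame rather than the pulled data, `count_unusable_selftext` is an illustrative helper name, and treating `[deleted]` as a placeholder alongside `[removed]` is an assumption.

```python
import pandas as pd

# Hypothetical toy rows standing in for a pulled subreddit DataFrame
sample = pd.DataFrame({
    'selftext': ['Some real post text', '', '[removed]', None],
    'title': ['a', 'b', 'c', 'd'],
})

def count_unusable_selftext(df):
    """Count rows whose selftext is missing, empty, or a removal placeholder."""
    text = df['selftext'].fillna('').str.strip()
    return int(text.isin(['', '[removed]', '[deleted]']).sum())

print(count_unusable_selftext(sample))  # 3
```

Calling `count_unusable_selftext(anx_df)` or `count_unusable_selftext(ocd_df)` would give a quick sense of how many posts carry no usable body text.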


In [15]:
# Concatenate the two subreddit DataFrames row-wise
df = pd.concat([ocd_df, anx_df], axis=0)

In [16]:
df.to_csv('./data/big_df.csv', index=False)
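
Before relying on the exported file, it can help to confirm how many rows each subreddit contributed and whether any post `id` appears more than once, which could happen if consecutive request windows overlap. The sketch below uses hypothetical toy frames (`ocd_toy`, `anx_toy`) in place of `ocd_df` and `anx_df`, and adds `ignore_index=True`, which the original concat does not use, so the combined frame gets a fresh index.

```python
import pandas as pd

# Hypothetical stand-ins for ocd_df and anx_df
ocd_toy = pd.DataFrame({'id': ['a1', 'a2'], 'subreddit': ['OCD', 'OCD']})
anx_toy = pd.DataFrame({'id': ['b1', 'b2', 'b2'], 'subreddit': ['Anxiety'] * 3})

# ignore_index=True gives the combined frame a fresh 0..n-1 index
combined = pd.concat([ocd_toy, anx_toy], axis=0, ignore_index=True)

# Rows per subreddit, and how many rows repeat an earlier post id
counts = combined['subreddit'].value_counts().to_dict()
n_duplicate_ids = int(combined['id'].duplicated().sum())

print(counts)           # {'Anxiety': 3, 'OCD': 2}
print(n_duplicate_ids)  # 1
```

The same two checks can be run on `df` directly; duplicates, if any, could then be dropped with `df.drop_duplicates(subset='id')`.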