# How to Scrape Data from Reddit (SEEDS Discussion Forum) <span style= "color:red">-Dr. Ebenezer Larnyo</span>

## 1. Install the Necessary Packages
The following code installs the packages needed for this project.

In [1]:
## Install the following packages
# Altair is a Python library designed for statistical visualization
!pip install altair
# PRAW (Python Reddit API Wrapper) is a Python package that provides simple access to Reddit's API
!pip install --upgrade praw









In [2]:
## Import the following libraries:
import numpy as np
import pandas as pd
import altair as alt
import re
# For web scraping
import praw

In [3]:
# Disable Altair's default row limit for plotting
alt.data_transformers.disable_max_rows()
# Use the 'mimetype' renderer so charts display in PDF exports
alt.renderers.enable('mimetype')


## 2. Connect to the Reddit API

The next code connects you to the Reddit API with the appropriate credentials.
### <span style= "color:red"> (These credentials contain private, sensitive information. Do not share them outside this project without my explicit approval; you can reach me at elarnyo@ucsb.edu.)</span>

In [4]:
reddit = praw.Reddit(client_id='------------',
                     client_secret='----',
                     user_agent='my_reddit_scraper:v1.0 (by /u/autismreddit)',
                     username='-----',
                     password='---------')
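Hard-coding credentials in a notebook makes them easy to leak when the notebook is emailed or pushed to Git. A minimal sketch, assuming you store the values in environment variables named REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USERNAME, and REDDIT_PASSWORD (hypothetical names chosen for this example), reads them at runtime and builds the keyword arguments that praw.Reddit accepts:

```python
import os

# Hypothetical environment-variable names, chosen for this example only.
# Each maps a praw.Reddit keyword argument to the variable that holds its value.
ENV_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def load_reddit_credentials(user_agent, environ=os.environ):
    """Return a dict of keyword arguments for praw.Reddit, read from environment variables."""
    # Fail early with a clear message instead of sending empty credentials to Reddit.
    missing = [name for name in ENV_VARS.values() if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    creds = {arg: environ[name] for arg, name in ENV_VARS.items()}
    creds["user_agent"] = user_agent
    return creds
```

With the variables set, the connection cell could then become reddit = praw.Reddit(**load_reddit_credentials('my_reddit_scraper:v1.0 (by /u/autismreddit)')), and the notebook itself no longer contains any secrets.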

## 3. Define the Web Scrape Function
This function searches the chosen subreddit for posts whose title or text contains any of the <b>given keywords.</b>

In [5]:
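def search_posts(subreddit, keywords, limit=1000):
    """Return posts from subreddit.new() whose title or text contains any keyword as a whole word."""
    if not keywords:
        return []
    # Escape each keyword so characters such as '-' in 'DSM-5' match literally,
    # then combine all keywords into one case-insensitive, whole-word pattern.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b',
                         re.IGNORECASE)
    results = []
    for submission in subreddit.new(limit=limit):
        if pattern.search(submission.title) or pattern.search(submission.selftext):
            results.append(submission)
    return results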
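Because the search function needs a live Reddit connection, it can help to check the keyword matching offline first. This sketch builds a whole-word, case-insensitive pattern for each keyword and runs it on a few made-up titles (the sample strings and the helper names build_keyword_pattern and matched_keywords are invented for illustration). It also shows a side effect worth knowing: when case is ignored, short acronyms such as 'CARS' also match ordinary words.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole word."""
    # re.escape makes characters such as '-' in 'DSM-5' match literally.
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def matched_keywords(text, keywords):
    """Return the keywords (in list order) that appear as whole words in text."""
    return [k for k in keywords if build_keyword_pattern([k]).search(text)]


keywords = ["stimming", "DSM-5", "ABA", "CARS"]
samples = [
    "Is stimming supposed to be innate?",  # plain keyword
    "Questions about the DSM-5 criteria",  # hyphenated keyword
    "I'm abandoning my old routine",       # 'aba' is not a whole word here
    "My son lines up his toy cars",        # acronym matches an ordinary word
]

for text in samples:
    print(f"{text!r} -> {matched_keywords(text, keywords)}")
```

The last sample matches 'CARS' even though it has nothing to do with the Childhood Autism Rating Scale. If such false positives matter for your analysis, one option is to match acronyms case-sensitively and only the longer phrases case-insensitively.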

The code below selects the subreddit to scrape. Here, we are scraping the subreddit <b> "r/autism" </b> (notice that you don't include the "r/" prefix in the code). You can run the code as is or, if you're curious about another subreddit, replace 'autism' with that subreddit's name.

In [6]:
# To alter code: subreddit = reddit.subreddit('[Insert subreddit title]')
subreddit = reddit.subreddit('autism')

## 4. Keywords
Now you'll add your own keywords. Below is a list of keywords that have already been used, grouped by topic. You may use the topic headings as an outline (possible topics to expand on, as I did below). Read through the outline to get an idea of possible keywords.

Feel free to be creative while still staying within the context of this project. Then scroll down to create your own list of keywords in the next step.

### Example of Keywords:
#### Symptoms:
* sensory issues
* repetitive behaviors
* social difficulties
* communication challenges
* meltdowns
* stimming

#### Care/Treatment:
* behavioral therapy
* occupational therapy
* speech therapy
* social skills training
* early intervention
* ABA (Applied Behavior Analysis)
* TEACCH
* sensory integration therapy

#### Drugs/Interventions:
* Risperidone
* Aripiprazole
* Methylphenidate
* Guanfacine
* SSRI
* antipsychotics
* ADHD medications
* melatonin

#### Epidemiology:
* prevalence
* incidence
* risk factors
* gender differences
* heritability
* environmental factors
* geographic variations

#### Stigma:
* discrimination
* stereotypes
* public awareness
* misconceptions
* social isolation
* acceptance

#### Diagnosis:
* DSM-5
* Autism Diagnostic Observation Schedule (ADOS)
* Autism Diagnostic Interview-Revised (ADI-R)
* Childhood Autism Rating Scale (CARS)
* Asperger's syndrome
* PDD-NOS

#### Assistive Technologies (smart):
* AAC (Augmentative and Alternative Communication)
* speech-generating devices
* visual schedules
* smart devices
* apps for autism
* wearable technology

#### Burden of ASD:
* economic burden
* societal cost
* quality of life
* educational challenges
* employment challenges
* mental health issues

#### Caregivers of ASD:
* parental stress
* support groups
* coping strategies
* respite care
* sibling support
* family therapy

#### Natural Cure:
* dietary interventions
* herbal supplements
* homeopathy
* acupuncture
* meditation
* yoga

#### COVID and ASD:
* impact of lockdown
* teletherapy
* remote learning
* mental health during pandemic
* access to services
* social distancing challenges

#### Now, time to insert your own keywords below in the code.
I've provided an example of how to do this below. Follow the code format of the example and fill in your own list of keywords. Also type an outline of your keywords, as I did above (this keeps our records clean and organized). You can edit this markdown cell to add that list, then enter your keywords in the code cell below.

#### Your List of New Keywords (Type Below):
* -
* -
* -
* - .. etc.

#### Code format to follow below:
The example below uses the keywords listed above. Follow this format in the code cell below, but with your own keywords.

<span style= "color:red"> <b>keywords</b> </span> <span style= "color:blue">= [ 'sensory issues', 'repetitive behaviors', 'social difficulties', 'communication challenges', 'meltdowns', 'stimming', 'behavioral therapy', 'occupational therapy', 'speech therapy', 'social skills training', 'early intervention', 'ABA', 'TEACCH', 'sensory integration therapy', 'Risperidone', 'Aripiprazole', 'Methylphenidate', 'Guanfacine', 'SSRI', 'antipsychotics', 'ADHD medications', 'melatonin', 'prevalence', 'incidence', 'risk factors', 'gender differences', 'heritability', 'environmental factors', 'geographic variations', 'discrimination', 'stereotypes', 'public awareness', 'misconceptions', 'social isolation', 'acceptance', 'DSM-5', 'Autism Diagnostic Observation Schedule', 'ADOS', 'Autism Diagnostic Interview-Revised', 'ADI-R', 'Childhood Autism Rating Scale', 'CARS', "Asperger's syndrome", 'PDD-NOS', 'AAC', 'speech-generating devices', 'visual schedules', 'smart devices', 'apps for autism', 'wearable technology', 'economic burden', 'societal cost', 'quality of life', 'educational challenges', 'employment challenges', 'mental health issues', 'parental stress', 'support groups', 'coping strategies', 'respite care', 'sibling support', 'family therapy', 'dietary interventions', 'herbal supplements', 'homeopathy', 'acupuncture', 'meditation', 'yoga', 'impact of lockdown', 'teletherapy', 'remote learning', 'mental health during pandemic', 'access to services', 'social distancing challenges' ] </span>

In [7]:
## Paste your keywords: 
keywords = ['inequality','underdiagnosis','misdiagnosis','myths','education','caregiver','housing','income','sensory issues', 'repetitive behaviors', 'social difficulties', 'communication challenges', 'meltdowns', 'stimming', 'behavioral therapy', 'occupational therapy', 'speech therapy', 'social skills training', 'early intervention', 'ABA', 'TEACCH', 'sensory integration therapy', 'Risperidone', 'Aripiprazole', 'Methylphenidate', 'Guanfacine', 'SSRI', 'antipsychotics', 'ADHD medications', 'melatonin', 'prevalence', 'incidence', 'risk factors', 'gender differences', 'heritability', 'environmental factors', 'geographic variations', 'discrimination', 'stereotypes', 'public awareness', 'misconceptions', 'social isolation', 'acceptance', 'DSM-5', 'Autism Diagnostic Observation Schedule', 'ADOS', 'Autism Diagnostic Interview-Revised', 'ADI-R', 'Childhood Autism Rating Scale', 'CARS', "Asperger's syndrome", 'PDD-NOS', 'AAC', 'speech-generating devices', 'visual schedules', 'smart devices', 'apps for autism', 'wearable technology', 'economic burden', 'societal cost', 'quality of life', 'educational challenges', 'employment challenges', 'mental health issues', 'parental stress', 'support groups', 'coping strategies', 'respite care', 'sibling support', 'family therapy', 'dietary interventions', 'herbal supplements', 'homeopathy', 'acupuncture', 'meditation', 'yoga', 'impact of lockdown', 'teletherapy', 'remote learning', 'mental health during pandemic', 'access to services', 'social distancing challenges','African American', 'Black','African-American', 'African Americans with ASD','Neurodivergent']
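Because the list is assembled by pasting new terms next to the example keywords, it is easy to include the same term twice, for example with different capitalization or a trailing space. A small sketch (dedupe_keywords is a hypothetical helper, not part of the original notebook) that removes duplicates case-insensitively while keeping the first spelling and the original order:

```python
def dedupe_keywords(keywords):
    """Drop repeated keywords (ignoring case and surrounding spaces), keeping first occurrences in order."""
    seen = set()
    unique = []
    for keyword in keywords:
        cleaned = keyword.strip()
        key = cleaned.casefold()  # casefold() is a stricter, Unicode-aware lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


print(dedupe_keywords(["Stimming", "caregiver", "stimming ", "ABA", "caregiver"]))
# prints ['Stimming', 'caregiver', 'ABA']
```

In the notebook, running keywords = dedupe_keywords(keywords) after pasting would clean the list before the search.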

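The code below searches the subreddit for posts that match your keywords and prints each one. Note that Reddit listings generally return at most about 1,000 of the newest posts, so setting limit above 1,000 may not return any additional posts.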

In [8]:
# Search for posts in the specified subreddit with the given keywords.
posts = search_posts(subreddit, keywords, limit=10000)
# Iterate over the posts and print their title, selftext, and a separator line.
for post in posts:
    print(f"{post.title}\n{post.selftext}\n{'-'*80}")
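Printing every post is useful for a quick look, but a per-keyword count shows which topics dominate the results. A sketch, assuming only that each post has title and selftext string attributes (as PRAW submissions do); count_keyword_hits is a hypothetical helper, and the stand-in posts are made up so the example runs without an API connection:

```python
import re
from collections import Counter
from types import SimpleNamespace


def count_keyword_hits(posts, keywords):
    """Count, for each keyword, how many posts mention it in the title or body."""
    patterns = {k: re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords}
    counts = Counter()
    for post in posts:
        text = f"{post.title}\n{post.selftext}"
        for keyword, pattern in patterns.items():
            if pattern.search(text):
                counts[keyword] += 1  # at most one count per post per keyword
    return counts


# Made-up stand-ins for PRAW submissions, for offline testing only.
demo_posts = [
    SimpleNamespace(title="Is stimming innate?", selftext="Rocking feels nice."),
    SimpleNamespace(title="Recess was hard", selftext="Meltdowns and stimming at school."),
    SimpleNamespace(title="Speech therapy tips", selftext=""),
]
print(count_keyword_hits(demo_posts, ["stimming", "meltdowns", "speech therapy"]).most_common())
# prints [('stimming', 2), ('meltdowns', 1), ('speech therapy', 1)]
```

With real data, count_keyword_hits(posts, keywords).most_common(10) would list the ten most frequent keywords among the scraped posts.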

Maybe a dumb question, but is stimming supposed to be innate?
Am I supposed to have started stimming and do it as a habit or is it a choice thing? It feels nice to rock back and forth but I don't feel compelled to.
--------------------------------------------------------------------------------
I just learned that recess was difficult for many of us
I(33m) had a really difficult time with my anxiety at school. It often revolved around feeling confused about how to fit in and how to be perceived as someone who my peers wanted to play with.  I always needed playground rules explained to me and needed instructions for simple games like lava, ice tag, etc. School was brutal to say the least.

But one thing that really sticks with me is the periods of serious sadness and anxiety which turns out were autistic meltdowns mixed with OCD. I felt so incredibly alone. I had friends but often felt judged or weird and didn't know what was wrong with me. This went on from kindergarten to 12th grade. 

## 5. Save the Data to CSV

In [9]:
# Import the csv module.
import csv

# Open the CSV file for writing.
with open('SEEDS_reddit_scrape.csv', mode='w', newline='', encoding='utf-8') as f:

    # Create a CSV writer object.
    writer = csv.writer(f)

    # Write the header row.
    writer.writerow(['title', 'selftext', 'url', 'author', 'timestamp', 'num_comments', 'upvotes'])

    # Iterate over the posts and write each one to the CSV file.
    for post in posts:
        writer.writerow([post.title, post.selftext, post.url, post.author, post.created_utc, post.num_comments, post.ups])
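In the CSV above, created_utc is stored as a raw Unix timestamp (seconds since 1970), and post.author is None when an account has been deleted, which leaves an empty cell. A sketch of one way to make the file easier to read: convert the timestamp to an ISO 8601 UTC string and label missing authors. The helper names, the "[deleted]" label, and the "timestamp_utc" header are illustrative choices, and a stand-in post is used so the example runs offline:

```python
import csv
import io
from datetime import datetime, timezone
from types import SimpleNamespace


def post_to_row(post):
    """Turn one submission-like object into a CSV row with readable fields."""
    # str() of a PRAW Redditor is the username; deleted accounts come back as None.
    author = str(post.author) if post.author is not None else "[deleted]"
    timestamp = datetime.fromtimestamp(post.created_utc, tz=timezone.utc).isoformat()
    return [post.title, post.selftext, post.url, author, timestamp, post.num_comments, post.ups]


def write_posts_csv(posts, f):
    """Write a header row plus one row per post to an open text file."""
    writer = csv.writer(f)
    writer.writerow(["title", "selftext", "url", "author", "timestamp_utc", "num_comments", "upvotes"])
    for post in posts:
        writer.writerow(post_to_row(post))


# Made-up stand-in post for offline testing.
demo = SimpleNamespace(title="Example", selftext="", url="https://www.reddit.com/", author=None,
                       created_utc=0.0, num_comments=3, ups=10)
buffer = io.StringIO()
write_posts_csv([demo], buffer)
print(buffer.getvalue())
```

To use it in the notebook, open the file exactly as in the cell above (mode='w', newline='', encoding='utf-8') and call write_posts_csv(posts, f) inside the with block.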

## Download the Files and Send Them
Download the CSV manually, then email both your CSV file AND your Jupyter notebook to <b><u>elarnyo@ucsb.edu</u></b>.

How do you download the CSV manually?
In the file browser on the left, you should see your CSV file (after running the code above). Right-click the file and select the 'Download' option.

How do you download your Jupyter notebook file?
In the menu bar at the very top, click 'File' and then choose the download option.
#### <span style="color:red">Add the CSV and Jupyter notebook to the Git repo you created at the last discussion forum </span>

## Great job scraping data from Reddit!
## Next time, we'll clean the data. Feel free to practice data cleaning in the meantime!