# Setup

Adapted from class notes and:

https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c

## Environment Variables
Environment variables let you set a program's configuration parameters without hardcoding them.  
Because the parameters live outside the source code, they can be changed without altering it. This makes it much easier to run other people's code with your own parameters or credentials, or to change the program's runtime environment just by editing a file.

We will use .env files to store our environment variables, since this works regardless of operating system.
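
As a rough illustration of what a .env file holds, the sketch below parses a few `KEY=value` lines into a dictionary. The parser `parse_env_text` and the placeholder values are hypothetical and only stand in for what `load_dotenv()` does more robustly; the variable names match the ones this notebook reads.

```python
# Minimal sketch of the KEY=value layout of a .env file.
# parse_env_text is a hypothetical stand-in, not python-dotenv's implementation.

SAMPLE_ENV = """
# Reddit app credentials (placeholder values)
APP_NAME=my_app
APP_ID=abc123
APP_SECRET="not-a-real-secret"
"""


def parse_env_text(text):
    """Return a dict built from KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip("\"'")
    return values


config = parse_env_text(SAMPLE_ENV)
print(sorted(config))
```

Keeping the real .env file out of version control keeps the credentials private.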

In [4]:
# Lets us load the environment variables from the .env file
#!pip install python-dotenv

In [5]:
from dotenv import load_dotenv

# Library that has a lot of operating system functions
from os import getenv

In [6]:
# Load from the .env
load_dotenv()

# Get the environment variables
APP_NAME = getenv('APP_NAME')
APP_ID = getenv("APP_ID")
APP_SECRET = getenv("APP_SECRET")
USERNAME = getenv('REDDIT_USERNAME')
PASSWORD = getenv('PASSWORD')

In [7]:
# Check that it loads correctly
USERNAME

'Santos9090'

In [8]:
# Check that the variables all loaded
if APP_NAME and APP_ID and APP_SECRET and USERNAME and PASSWORD:
    print("Credentials loaded")
else:
    print("ERROR: Credentials not loaded!!")

Credentials loaded
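
The check above reports only that something failed to load, not which variable. A small sketch of a more specific check follows; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and the list simply repeats the variables this notebook reads.

```python
import os

# The environment variable names this notebook reads
REQUIRED_VARS = ["APP_NAME", "APP_ID", "APP_SECRET", "REDDIT_USERNAME", "PASSWORD"]


def missing_env_vars(names):
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.getenv(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("ERROR: missing environment variables:", ", ".join(missing))
else:
    print("Credentials loaded")
```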


In [9]:
# Library that lets us make HTTP requests
import requests

# Sets up the authentication part of the request
auth = requests.auth.HTTPBasicAuth(f'{APP_ID}', f'{APP_SECRET}')

# Sets up the data we want to send: our login method (password), username, and password
data = {'grant_type': 'password',
        'username': f'{USERNAME}',
        'password': f'{PASSWORD}'}

# Sets up this request's header info, which gives Reddit a brief description of our app
# This is the format requested by Reddit: os:app_name:version (by /u/username)
headers = {'User-Agent': f'windows:{APP_NAME}:v1.0 (by /u/{USERNAME})'}

# Makes the request to the access_token api endpoint, and saves the response in res
res = requests.post('https://www.reddit.com/api/v1/access_token',
                    auth=auth, data=data, headers=headers)

# Converts the response to JSON and pulls the access_token value
TOKEN = res.json()['access_token']

# Adds authorisation to our headers dictionary
headers = {**headers, **{'Authorization': f"bearer {TOKEN}"}}

# While the token is valid (~2 hours) we can just add headers=headers to our requests to prove authentication
# Making a test request. <Response [200]> means that it was a success!
requests.get('https://oauth.reddit.com/api/v1/me', headers=headers)

<Response [200]>
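
The cell above indexes `res.json()['access_token']` directly, which raises a bare `KeyError` if the token request fails. A hedged sketch of a more defensive version follows; `extract_token` and `with_auth` are hypothetical helper names, and the sample payload is made up to show the shape, not captured from Reddit.

```python
def extract_token(payload):
    """Return the access token from a decoded token response, or raise a clear error."""
    if "access_token" not in payload:
        # Assumption: a failed request carries an 'error' field describing the problem
        raise RuntimeError(f"No access token in response: {payload.get('error', payload)}")
    return payload["access_token"]


def with_auth(headers, token):
    """Return a copy of headers with the bearer token added."""
    return {**headers, "Authorization": f"bearer {token}"}


# Made-up payload that mirrors the shape of a successful response
sample_ok = {"access_token": "example-token", "token_type": "bearer"}
base_headers = {"User-Agent": "windows:my_app:v1.0 (by /u/example_user)"}

auth_headers = with_auth(base_headers, extract_token(sample_ok))
print(auth_headers["Authorization"])
```

In this notebook the call would look like `TOKEN = extract_token(res.json())`.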

In [10]:
res = requests.get("https://oauth.reddit.com/r/ireland/comments/nyxlfl/is_there_really_a_shortage_of_skilled_tradesmen/",
                   headers=headers)

print(res.json())

[{'kind': 'Listing', 'data': {'after': None, 'dist': 1, 'modhash': None, 'geo_filter': '', 'children': [{'kind': 't3', 'data': {'approved_at_utc': None, 'subreddit': 'ireland', 'selftext': 'I’ve seen it mentioned a lot online.\nThat they can’t get enough people willing to go work construction.\nThat most people in the trades are getting on in years and there’s no one to replace them.\n\nI know a good number of guys who are taking up apprenticeships or just labouring.\nThat’s anecdotal, but it doesn’t seem like there’s a shortage, is there?', 'user_reports': [], 'saved': False, 'mod_reason_title': None, 'gilded': 0, 'clicked': False, 'title': 'Is there really a shortage of skilled tradesmen?', 'link_flair_richtext': [], 'subreddit_name_prefixed': 'r/ireland', 'hidden': False, 'pwls': 6, 'link_flair_css_class': None, 'downs': 0, 'thumbnail_height': None, 'top_awarded_type': None, 'parent_whitelist_status': 'all_ads', 'hide_score': False, 'name': 't3_nyxlfl', 'quarantine': False, 'link_fl

In [11]:
import json

res = requests.get("https://oauth.reddit.com/r/ireland/comments/nyxlfl/is_there_really_a_shortage_of_skilled_tradesmen/", headers=headers)

# Check if the request was successful
if res.status_code == 200:
    # Retrieve the JSON response
    response_json = res.json()

    # Save the JSON response to a file
    with open('reddit_response.json', 'w') as file:
        json.dump(response_json, file)
        print("JSON response saved to reddit_response.json")
else:
    print("Request was not successful. Status code:", res.status_code)

JSON response saved to reddit_response.json


In [14]:
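# json() takes no arguments (the earlier attempt raised "TypeError: Response.json() takes 1 positional argument but 5 were given").
# Call it first, then index: the response is a list of two Listings, and [0] holds the post.
res.json()[0]['data']['children'][0]['data']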
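
A comment-thread endpoint such as the one requested above returns a list of two Listings rather than a single dictionary: the first holds the post (kind `t3`) and the second holds the comments (kind `t1`). The sketch below uses a tiny hand-written sample with that shape; `split_thread` is a hypothetical helper.

```python
def split_thread(payload):
    """Split a comment-thread response into (post data, list of top-level comment data)."""
    post_listing, comment_listing = payload[0], payload[1]
    post = post_listing["data"]["children"][0]["data"]
    comments = [
        child["data"]
        for child in comment_listing["data"]["children"]
        if child["kind"] == "t1"  # skip entries that are not comments
    ]
    return post, comments


# Hand-written sample mirroring the response shape (not real Reddit data)
sample = [
    {"kind": "Listing", "data": {"children": [
        {"kind": "t3", "data": {"title": "Example post", "selftext": "Post text"}}]}},
    {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"body": "First comment"}},
        {"kind": "t1", "data": {"body": "Second comment"}}]}},
]

post, comments = split_thread(sample)
print(post["title"], len(comments))
```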

Note that you can see what the HTML looks like by putting it in a markdown cell.  
This first one will parse the text:

In [2]:
print(res.json()['data']['children'][2]['data']['selftext_html'])

NameError: name 'res' is not defined

## Trying the tutorial from Geek Culture

(Available at: "Simple Way To Extract Reddit Comments In Python", https://medium.com/geekculture/simple-way-to-extract-reddit-comments-in-python-c8cb2afe2fce)

In [12]:
#pip install psaw

In [13]:
# The code didn't work after many attempts, failing with these error messages:
#   JSONDecodeError: Expecting value: line 1 column 1 (char 0)
#
#   During handling of the above exception, another exception occurred:
#   >>  NameError: name 'JSONDecodeError' is not defined

# from psaw import PushshiftAPI
#
# try:
#     api = PushshiftAPI()
#     # Make your API request using psaw
# except JSONDecodeError as e:
#     print(f"JSONDecodeError: {e}")

In [15]:
from psaw import PushshiftAPI
import json  # JSONDecodeError must be referenced through the json module

try:
    api = PushshiftAPI()
    # Make your API request using psaw
except json.JSONDecodeError as e:
    print(f"JSONDecodeError: {e}")

JSONDecodeError: Expecting value: line 1 column 1 (char 0)


In [16]:
print(response.content)

NameError: name 'response' is not defined

## NOTE
**In the following example, I am limiting the results to 1 and then requesting more. This is purely a demonstration. DO NOT do this for real unless your limit is set to 100, as this would otherwise count as abuse of Reddit's API.**

In [17]:
# Tell the API to only return 1 result
params = {'limit': 1}

for i in range(3):
    res = requests.get(
        "https://oauth.reddit.com/r/python/new",
        headers=headers,
        params=params
    )
    
    post = res.json()['data']['children'][0]
    print(len(res.json()['data']['children']))
    print(post['data']['title'])
    # This is how the API identifies the post we just got
    fullname = f"{post['kind']}_{post['data']['id']}"
    
    # This tells the API that the post we want is the one after the current one in the queue (i.e. the next oldest one).
    params['after'] = fullname

1
why does my describe command on MySQL table shows datatype as b'int', b'vchar(20) etc?
1
36 years old. Started an operator certification (1 years and a half) but looking to go into coding. Am I too old ?
1
Thursday Daily Thread: Python Careers, Courses, and Furthering Education!
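
Following the note above, a version of this loop meant for real use would request 100 posts per call and stop when the listing runs out. The sketch below is one way to structure that, assuming the `after` field in each listing's `data` (visible in the printed responses) names the next page. `collect_titles` and the injected `get_page` callable are hypothetical, and the fake pages are hand-written so the example runs without network access.

```python
def collect_titles(get_page, max_pages=3, limit=100):
    """Collect post titles page by page until 'after' is None or max_pages is reached."""
    titles = []
    params = {"limit": limit}
    for _ in range(max_pages):
        listing = get_page(params)["data"]
        titles.extend(child["data"]["title"] for child in listing["children"])
        if listing["after"] is None:
            break  # no further pages
        params["after"] = listing["after"]
    return titles


# Hand-written fake pages standing in for the API (not real Reddit data)
FAKE_PAGES = {
    None: {"data": {"after": "t3_b", "children": [
        {"kind": "t3", "data": {"id": "a", "title": "Post A"}},
        {"kind": "t3", "data": {"id": "b", "title": "Post B"}}]}},
    "t3_b": {"data": {"after": None, "children": [
        {"kind": "t3", "data": {"id": "c", "title": "Post C"}}]}},
}


def fake_get_page(params):
    """Return the fake page matching the requested 'after' value."""
    return FAKE_PAGES[params.get("after")]


print(collect_titles(fake_get_page))
```

With the real API, `get_page` could wrap the same `requests.get(..., headers=headers, params=params).json()` call used in the loop above.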


## Sentiment Analysis (attempt following class tutorials)


In [18]:
import pandas as pd
import json

# Load the JSON file
with open('reddit_response.json', 'r') as file:
    data = json.load(file)

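# Extract individual comments
# The saved response is a list of two Listings: data[0] holds the post (kind 't3')
# and data[1] holds the top-level comments (kind 't1').
# The original check looked for a 'data' key on the list and for kind 't3',
# so it produced an empty DataFrame.
comments = []
if isinstance(data, list) and len(data) > 1:
    for comment in data[1]['data']['children']:
        if comment['kind'] == 't1':  # Check if the item is a comment
            comments.append(comment['data']['body'])

# Create a DataFrame with the comments
df = pd.DataFrame({'comment': comments})

# Display the first five records
df.head()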


In [19]:
import pandas as pd
import nltk
import json

import warnings
warnings.filterwarnings('ignore') # We can suppress the warnings

# Load the saved Reddit JSON response into a DataFrame
comments = pd.read_json('reddit_response.json')

# Display the first five records
comments.head()

Unnamed: 0,kind,data
0,Listing,"{'after': None, 'dist': 1, 'modhash': None, 'g..."
1,Listing,"{'after': None, 'dist': None, 'modhash': None,..."


In [30]:
comments['data'][1]

{'after': None,
 'dist': None,
 'modhash': None,
 'geo_filter': '',
 'children': [{'kind': 't1',
   'data': {'subreddit_id': 't5_2qhb9',
    'approved_at_utc': None,
    'author_is_blocked': False,
    'comment_type': None,
    'awarders': [],
    'mod_reason_by': None,
    'banned_by': None,
    'author_flair_type': 'richtext',
    'total_awards_received': 0,
    'subreddit': 'ireland',
    'author_flair_template_id': '15d448ce-c030-11e9-be57-0ea31892be34',
    'likes': None,
    'replies': {'kind': 'Listing',
     'data': {'after': None,
      'dist': None,
      'modhash': None,
      'geo_filter': '',
      'children': [{'kind': 't1',
        'data': {'total_awards_received': 0,
         'approved_at_utc': None,
         'author_is_blocked': False,
         'comment_type': None,
         'awarders': [],
         'mod_reason_by': None,
         'banned_by': None,
         'ups': 11,
         'removal_reason': None,
         'link_id': 't3_nyxlfl',
         'author_flair_template_id'

In [33]:
comments['data'][1]['body']

KeyError: 'body'
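
The `KeyError` arises because `comments['data'][1]` is the Listing that wraps the comments, not a comment itself: each `body` sits under `children[i]['data']`, and replies are nested Listings under `replies`, as the printed structure above shows. Below is a minimal recursive sketch that collects every body. `iter_bodies` is a hypothetical helper, the sample is hand-written, and the `isinstance` guard is an assumption that `replies` may hold something other than a Listing dictionary when a comment has no replies.

```python
def iter_bodies(listing):
    """Yield the body of every comment in a Listing, including nested replies."""
    for child in listing["children"]:
        if child["kind"] != "t1":
            continue  # skip entries that are not comments
        data = child["data"]
        yield data["body"]
        replies = data.get("replies")
        # Assumption: a comment without replies may hold an empty value here
        if isinstance(replies, dict):
            yield from iter_bodies(replies["data"])


# Hand-written sample mirroring the nested structure (not real Reddit data)
sample_listing = {"children": [
    {"kind": "t1", "data": {"body": "Top-level comment", "replies": {
        "kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {"body": "A reply", "replies": ""}}]}}}},
    {"kind": "t1", "data": {"body": "Another comment", "replies": ""}},
]}

bodies = list(iter_bodies(sample_listing))
print(bodies)
```

In this notebook the call would be `list(iter_bodies(comments['data'][1]))`, and the resulting list could be passed to `pd.DataFrame({'comment': ...})`.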