<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 3.2.2
# *Mining Social Media on Reddit*

## The Reddit API and the PRAW Package

The Reddit API is rich and complex, with many endpoints (https://www.reddit.com/dev/api/). It includes methods for navigating its collections, which include various kinds of media as well as comments. Fortunately, the Python library PRAW reduces much of this complexity.

Reddit requires developers to create and authenticate an app before they can use the API, but the process is much less onerous than some, and does not have waiting period for approval of new developers.

### 1. Create a Reddit App

Go to https://www.reddit.com/prefs/apps and click "create an app".

Enter the following in the form:

- a name for your app
- select "script" radio button
- a description
- a redirect URI

(Nb. For pulling data into a data science experiment, a local port can be used for the Redirect URI; try http://127.0.0.1:1410)


- click "create app"
- from the form that displays, copy the following to a local text file (or to this notebook):

  - name (the name you gave to your app)
  - redirect URI
  - personal use script (this is your OAuth 2 Client ID)
  - secret (this is your OAuth 2 Secret)

### 2. Register for API Access

- follow the link at https://www.reddit.com/wiki/api and read the terms of use for Reddit API access
- fill in the form fields at the bottom
  - make sure to enter your new OAuth Client ID where indicated
  - your use case could be something like "Training in API usage for data science projects"
  - your platform could be something like "Jupyter Notebooks / Python"
  
- click "SUBMIT"

- when asked for User-Agent, enter something that fits this pattern:
  `your_os-python:your_reddit_appname:v1.0 (by /u/your_reddit_username)`

### 3. Load Python Libraries

In [3]:
!pip install praw



In [4]:
import praw
import requests
import json
import pprint
from datetime import datetime, date, time

### 4. Authenticate from your Python script

You could assign your authentication details explicitly, as follows:

In [5]:
my_user_agent = 'windows-python:Ohene:v1.0 (by/u/Latter_Outcome_504)'   # your user Agent string goes in here
my_client_id = 'S8Cj0KOTt3jZ0Auus1ElPQ'   # your Client ID string goes in here
my_client_secret = 'ibAxS4nDZOvB59Z3F8lVlHvJ6XHWwQ'   # your Secret string goes in here

A better way would be to store these details externally, so they are not displayed in the notebook:

- create a file called "auth_reddit.json" in your "notebooks" directory, and save your credentials there in JSON format:

`{   "my_client_id": "your Client ID string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;` "my_client_secret": "your Secret string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"my_user_agent": "your user Agent string goes in here"` <br>
`}`

Use the following code to load the credentials:  

In [7]:
pwd()  # make sure your working directory is where the file is

'C:\\Users\\ohene\\OneDrive\\Documents\\new_repo\\Module 3 New\\Labs 3-20240709T100044Z-001\\Labs 3'

In [10]:
import json

auth_details = {
    
    "my_client_id": "S8Cj0KOTt3jZ0Auus1ElPQ",
    "my_client_secret": "ibAxS4nDZOvB59Z3F8lVlHvJ6XHWwQ",
    "my_user_agent": "windows-python:Ohene:v1.0 (by/u/Latter_Outcome_504)",
    "username": "Latter_Outcome_504",
    "password": "Delarain123"
}

path_auth = 'auth_reddit.json'
with open(path_auth, 'w') as file:
    json.dump(auth_details, file, indent=4)




In [11]:
path_auth = 'auth_reddit.json'
auth = json.loads(open(path_auth).read())
pp = pprint.PrettyPrinter(indent=4)
# For debugging only:

pp.pprint(auth)



{   'my_client_id': 'S8Cj0KOTt3jZ0Auus1ElPQ',
    'my_client_secret': 'ibAxS4nDZOvB59Z3F8lVlHvJ6XHWwQ',
    'my_user_agent': 'windows-python:Ohene:v1.0 (by/u/Latter_Outcome_504)',
    'password': 'Delarain123',
    'username': 'Latter_Outcome_504'}


In [14]:
my_user_agent = auth['my_user_agent']
my_client_id = auth['my_client_id']
my_client_secret = auth['my_client_secret']

Security considerations:
- this method only keeps your credentials invisible as long as nobody else gets access to this notebook file
- if you wanted another user to have access to the executable notebook without divulging your credentials you should set up an OAuth 2.0 workflow to let them obtain and apply their own API tokens when using your app
- if you just want to share your analyses, you could use a separate script (which you don't share) to fetch the data and save it locally, then use a second notebook (with no API access) to load and analyse the locally stored data

### 5. Exploring the API

Here is how to connect to Reddit with read-only access:

In [18]:
reddit = praw.Reddit(client_id = my_client_id,
                     client_secret = my_client_secret,
                     user_agent = my_user_agent)

print('Read-only = ' + str(reddit.read_only))  # Output: True

Read-only = True


In the next cell, put the cursor after the '.' and hit the [tab] key to see the available members and methods in the response object:

In [None]:
reddit.

In [21]:
import json

# Define the credentials with correct keys
credentials = {
    "my_client_id": "S8Cj0KOTt3jZ0Auus1ElPQ",
    "my_client_secret": "ibAxS4nDZOvB59Z3F8lVlHvJ6XHWwQ",
    "my_user_agent": "windows-python:Ohene:v1.0 (by/u/Latter_Outcome_504)"

    
}

    

# Path to save the JSON file
file_path = 'auth_reddit.json'

# Write the credentials to a JSON file
with open(file_path, 'w') as file:
    json.dump(credentials, file, indent=4)

print(f"JSON file created at {file_path}")


JSON file created at auth_reddit.json


In [22]:
subreddit_name = 'malaysia'
subreddit = reddit.subreddit(subreddit_name)

In [23]:
comments = []
for comment in subreddit.comments(limit=1000):
    comments.append(comment)

In [None]:
reddit.subreddit(subreddit_name)

Subreddit(display_name='malaysia')

Consult the PRAW and Reddit API documentation. Print a few of the response members below:

In [24]:
for comment in subreddit.comments(limit=10):  # Limiting to 10 comments 
        comments.append(comment)
    
# Print details of the fetched comments
for comment in comments:
        print(f"Comment ID: {comment.id}")
        print(f"Comment Body: {comment.body}")
        print(f"Comment Author: {comment.author}")
        print(f"Comment Score: {comment.score}")
        print(f"Comment Created: {comment.created_utc}")
        print("-----------")


Comment ID: lfxei8a
Comment Body: Malaysia should send their troops to Palestine to support Hamas.
Comment Author: Western-Ebb-5880
Comment Score: 1
Comment Created: 1722486036.0
-----------
Comment ID: lfxegy8
Comment Body: It doesn't matter whether it's Hamas or the Pope. Conducting an airstrike in the capital of a country you're not officially at war with should not be tolerated. It's stupid and escalatory. Giving it a pass because it was done by the "good guys" is not a good reason.
Comment Author: wctree
Comment Score: 1
Comment Created: 1722486017.0
-----------
Comment ID: lfxegui
Comment Body: ![gif](giphy|YmQLj2KxaNz58g7Ofg)
Comment Author: thomsen9669
Comment Score: 1
Comment Created: 1722486016.0
-----------
Comment ID: lfxebb1
Comment Body: memang!
Comment Author: bucgene
Comment Score: 1
Comment Created: 1722485938.0
-----------
Comment ID: lfxe387
Comment Body: Saw an Indonesia news about Indonesian gold medalist will be rewarded prize worth USD 780K
Comment Author: Streps

Content in Reddit is grouped by topics called "subreddits". Content, called "submissions", is fetched by calling the `subreddit` method of the connection object (which is our `reddit` variable) with an argument that matches an actual topic.

We also need to append a further method call to a "subinstance", such as one of the following:

- controversial
- gilded
- hot
- new
- rising
- top

One of the submission objects members is `title`. Fetch and print 10 submission titles from the 'learnpython' subreddit using one of the subinstances above:

In [25]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.title)

Ask Anything Monday - Weekly Thread
Learn python the hard way-OOP
Return an internal list from a class - in an immutable way?
How to align text centered on a character?
How do I extract specific text from a PDF using rectangles similar to openCV?
Rehashing relative imports another time
having issues sending a packet to moonraker
Webbrowser module with Brave
Multithreading with MicroPython Reading 
Looking for a remote python tutor that can guide on projects


Now retrieve 10 authors:

In [26]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.author)

AutoModerator
Silent_Orange_9174
pachura3
SSJLach
ClaimPatient97
educational_escapism
nathan22211
Remarkable-Map-2747
bigdipper125
Financial_Buy5560


Note that we obtained the titles and authors from separate API calls. Can we expect these to correspond to the same submissions? If not, how could we gurantee that they do?

In [27]:
submissions=reddit.subreddit('learnpython')

Why doesn't the next cell produce output?

In [28]:
for submission in submissions:
    print(submission.comments)

TypeError: 'Subreddit' object is not iterable

Because it is intended to iterate over the list of submissions and print the commentsassociated with each submission.

Print two comments associated with each of these submissions:

In [29]:
submissions = reddit.subreddit('learnpython').hot(limit=10)
for submission in submissions:
    top_level_comments = list(submission.comments)
    all_comments = submission.comments.list()[:2]
    for comment in all_comments:
        print(comment.body)

Dear python enthusiasts,

I am looking for an IT-"type" tool for automating python scripts. I have the following requirements

* It should be able to start/stop and restart python scripts
* Have levels where one script is started after another
* Show which scripts are alive
* Separated into frontend and backend, where Backend MUST be written in python doing the above three points
* MIT or BSD licensed (also pip installable is great!), so that I can add my own features

Frontend should do the following:

* See in a dashboard all controlled scripts (by this tool) and whether a script is running or not (I can also make one only if the backend is available)
* See also - in what all computers this tool has been installed

I could make my own, but that is not the point of this inquiry. I would like to save my work if any great tools already exists, especially the python backend, where people have thought about it even better.

Any suggestions welcome, thank you upfront.
I have been learning 

Referring to the API documentation, explore the submissions object and print some interesting data:

In [36]:
import json
import praw
from datetime import datetime

# Load credentials from JSON file
path_auth = 'auth_reddit.json'
with open(path_auth, 'r') as file:
    auth = json.load(file)

# Authenticate with Reddit using credentials from JSON file
reddit = praw.Reddit(
    client_id=auth['my_client_id'],
    client_secret=auth['my_client_secret'],
    user_agent=auth['my_user_agent'])

# Fetch submissions from a subreddit
subreddit_name = 'learnpython'
subreddit = reddit.subreddit(subreddit_name)
submissions = subreddit.hot(limit=10)  # Adjust the limit as needed

# Iterate over submissions and print interesting data
try:
    for submission in submissions:
        print(f"Title: {submission.title}")
        print(f"Author: {submission.author}")
        print(f"Score: {submission.score}")
        print(f"Upvote Ratio: {submission.upvote_ratio}")
        print(f"Number of Comments: {submission.num_comments}")
        print(f"URL: {submission.url}")
        print(f"Subreddit: {submission.subreddit}")
        print(f"Created: {datetime.utcfromtimestamp(submission.created_utc).strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"Permalink: {submission.permalink}")
        print(f"Flair: {submission.link_flair_text}")
        print(f"Is NSFW: {submission.over_18}")
        print(f"Is Spoiler: {submission.spoiler}")
        print(f"Is Stickied: {submission.stickied}")
        print(f"Selftext (first 200 chars): {submission.selftext[:200]}...")
        print("-----------")
except praw.exceptions.APIException as e:
    print(f"API Exception: {e}")
except praw.exceptions.ClientException as e:
    print(f"Client Exception: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


Title: Ask Anything Monday - Weekly Thread
Author: AutoModerator
Score: 3
Upvote Ratio: 0.68
Number of Comments: 21
URL: https://www.reddit.com/r/learnpython/comments/1eellsb/ask_anything_monday_weekly_thread/
Subreddit: learnpython
Created: 2024-07-29 00:00:29
Permalink: /r/learnpython/comments/1eellsb/ask_anything_monday_weekly_thread/
Flair: None
Is NSFW: False
Is Spoiler: False
Is Stickied: True
Selftext (first 200 chars): Welcome to another /r/learnPython weekly "Ask Anything\* Monday" thread

Here you can ask all the questions that you wanted to ask but didn't feel like making a new thread.

\* It's primarily intended...
-----------
Title: Learn python the hard way-OOP
Author: Silent_Orange_9174
Score: 28
Upvote Ratio: 0.86
Number of Comments: 17
URL: https://www.reddit.com/r/learnpython/comments/1egso93/learn_python_the_hard_wayoop/
Subreddit: learnpython
Created: 2024-07-31 17:14:32
Permalink: /r/learnpython/comments/1egso93/learn_python_the_hard_wayoop/
Flair: None
Is NSFW: Fa

#### Posting to Reddit

To be able to post to your Reddit account (i.e. contribute submissions), you need to connect to the API with read/write privilege. This requires an *authorised instance*, which is obtained by including your Reddit user name and password in the connection request:

In [33]:
reddit = praw.Reddit(client_id='my client id',
                     client_secret='my client secret',
                     user_agent='my user agent',
                     username='my username',
                     password='my password')
print(reddit.read_only)  # Output: False

False


You could hide these last two credentials by adding them to your JSON file and then reading all five values at once.

>
>


>
>




---



---



> > > > > > > > > © 2024 Institute of Data


---



---



