<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 3.2.2
# *Mining Social Media on Reddit*

## The Reddit API and the PRAW Package

The Reddit API is rich and complex, with many endpoints (https://www.reddit.com/dev/api/). It includes methods for navigating its collections, which include various kinds of media as well as comments. Fortunately, the Python library PRAW reduces much of this complexity.

Reddit requires developers to create and authenticate an app before they can use the API, but the process is much less onerous than some, and there is no waiting period for approval of new developers.

### 1. Create a Reddit App

Go to https://www.reddit.com/prefs/apps and click "create an app".

Enter the following in the form:

- a name for your app
- select the "script" radio button
- a description
- a redirect URI

(N.B. When pulling data into a data science experiment, a local port can be used for the redirect URI; try http://127.0.0.1:1410)


- click "create app"
- from the form that displays, copy the following to a local text file (or to this notebook):

  - name (the name you gave to your app)
  - redirect URI
  - personal use script (this is your OAuth 2 Client ID)
  - secret (this is your OAuth 2 Secret)

### 2. Register for API Access

- follow the link at https://www.reddit.com/wiki/api and read the terms of use for Reddit API access
- fill in the form fields at the bottom
  - make sure to enter your new OAuth Client ID where indicated
  - your use case could be something like "Training in API usage for data science projects"
  - your platform could be something like "Jupyter Notebooks / Python"
  
- click "SUBMIT"

- when asked for User-Agent, enter something that fits this pattern:
  `your_os-python:your_reddit_appname:v1.0 (by /u/your_reddit_username)`
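
The User-Agent pattern above can also be assembled in code. This is a minimal sketch: the helper name `build_user_agent` and the example app name and username are hypothetical, and the operating system part is simply taken from the standard library's `platform.system()`.

```python
import platform


def build_user_agent(app_name, reddit_username, version="1.0"):
    """Return a User-Agent string that follows the pattern shown above."""
    # platform.system() returns e.g. 'Linux', 'Darwin' or 'Windows'
    os_name = platform.system().lower() or "unknown"
    return f"{os_name}-python:{app_name}:v{version} (by /u/{reddit_username})"


# Hypothetical app name and username, for illustration only
print(build_user_agent("my_reddit_app", "my_reddit_username"))
```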

### 3. Load Python Libraries

In [1]:
!pip install praw



In [2]:
import praw
import requests
import json
import pprint
from datetime import datetime, date, time

### 4. Authenticate from your Python script

You could assign your authentication details explicitly, as follows:

In [5]:
my_user_agent = 'your user agent string goes in here'   # replace with your own User-Agent
my_client_id = 'your Client ID string goes in here'   # replace with your own Client ID
my_client_secret = 'your Secret string goes in here'   # replace with your own Secret

A better way would be to store these details externally, so they are not displayed in the notebook:

- create a file called "auth_reddit.json" in your "notebooks" directory, and save your credentials there in JSON format:

`{   "my_client_id": "your Client ID string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;` "my_client_secret": "your Secret string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"my_user_agent": "your user Agent string goes in here"` <br>
`}`

Use the following code to load the credentials:  

In [7]:
pwd()  # make sure your working directory is where the file is

'/Users/nicoleperez/Desktop/IOD/MODULE 3'

In [9]:
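path_auth = 'auth_reddit.json'
# Use a context manager so the file is closed after reading
with open(path_auth) as f:
    auth = json.load(f)
pp = pprint.PrettyPrinter(indent=4)
# For debugging only (this prints your credentials in the notebook output):
pp.pprint(auth)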




{   'my_client_id': 'your Client ID string goes in here',
    'my_client_secret': 'your Secret string goes in here',
    'my_user_agent': 'your user Agent string goes in here'}


In [11]:
my_user_agent = auth['my_user_agent']
my_client_id = auth['my_client_id']
my_client_secret = auth['my_client_secret']
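
The loading step can also be wrapped in a small function that fails early if a credential is missing or empty, which is easier to diagnose than an authentication error later on. This is a sketch only: `load_credentials` and `REQUIRED_KEYS` are hypothetical names, and the key names are the ones used in "auth_reddit.json" above.

```python
import json

# Key names as used in auth_reddit.json above
REQUIRED_KEYS = ("my_client_id", "my_client_secret", "my_user_agent")


def load_credentials(path):
    """Load the credentials file and check that every expected key is set."""
    with open(path) as f:
        auth = json.load(f)
    # Collect keys that are absent or hold an empty value
    missing = [key for key in REQUIRED_KEYS if not auth.get(key)]
    if missing:
        raise KeyError(f"missing or empty credential(s): {', '.join(missing)}")
    return auth
```

It would be called as `auth = load_credentials('auth_reddit.json')`, after which the three assignments work unchanged.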

Security considerations:
- this method only keeps your credentials invisible as long as nobody else gets access to this notebook file
- if you wanted another user to have access to the executable notebook without divulging your credentials you should set up an OAuth 2.0 workflow to let them obtain and apply their own API tokens when using your app
- if you just want to share your analyses, you could use a separate script (which you don't share) to fetch the data and save it locally, then use a second notebook (with no API access) to load and analyse the locally stored data
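
The last point, fetching in one script and analysing in another, could be sketched as follows. The function names and the idea of storing records as a list of plain dicts are assumptions for illustration; the fetching script would first convert the API objects to plain dicts (strings and numbers) so that they can be serialised.

```python
import json


def save_records(records, path):
    """Write a list of plain dicts to a local JSON file (fetching script)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_records(path):
    """Read the records back without any API access (analysis notebook)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Only the saved data file and the analysis notebook then need to be shared; the credentials stay with the fetching script.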

### 5. Exploring the API

Here is how to connect to Reddit with read-only access:

In [13]:
reddit = praw.Reddit(client_id = my_client_id,
                     client_secret = my_client_secret,
                     user_agent = my_user_agent)

print('Read-only = ' + str(reddit.read_only))  # Output: True

Read-only = True


In the next cell, put the cursor after the '.' and hit the [tab] key to see the available members and methods in the response object:

In [29]:
reddit.comment

<bound method Reddit.comment of <praw.reddit.Reddit object at 0x1156ee240>>

In [31]:
subreddit_name = 'malaysia'
subreddit = reddit.subreddit(subreddit_name)

In [33]:
comments = []
for comment in subreddit.comments(limit=1000):
    comments.append(comment)

ResponseException: received 401 HTTP response

In [35]:
reddit.subreddit(subreddit_name)

Subreddit(display_name='malaysia')

Consult the PRAW and Reddit API documentation. Print a few of the response members below:

Content in Reddit is grouped by topics called "subreddits". Content, called "submissions", is fetched by calling the `subreddit` method of the connection object (which is our `reddit` variable) with an argument that matches an actual topic.

We then need to call a further method on the resulting subreddit object to select a listing (a "subinstance"), such as one of the following:

- controversial
- gilded
- hot
- new
- rising
- top
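
Because each of these is a method on the subreddit object, the listing can be chosen by name at run time. The sketch below assumes that every listing accepts a `limit` keyword, as `hot` does later in this lab; `get_listing` and `ALLOWED_LISTINGS` are hypothetical names, and some listings (such as `gilded`) may not be available in every PRAW version.

```python
ALLOWED_LISTINGS = ("controversial", "gilded", "hot", "new", "rising", "top")


def get_listing(subreddit, listing="hot", limit=10):
    """Call the named listing method on a subreddit object."""
    if listing not in ALLOWED_LISTINGS:
        raise ValueError(f"unknown listing: {listing!r}")
    # Look the method up by name, then call it with the requested limit
    return getattr(subreddit, listing)(limit=limit)
```

For example, `get_listing(reddit.subreddit('learnpython'), 'new', limit=5)` would be equivalent to `reddit.subreddit('learnpython').new(limit=5)`.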

One of the submission object's members is `title`. Fetch and print 10 submission titles from the 'learnpython' subreddit using one of the subinstances above:

In [None]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.title)

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.



Ask Anything Monday - Weekly Thread
I deployed my first fully functional python script at work today
Disabling virus protection in VSCode to run a program
Beginner Python DrugWars/DopeWars remake - feedback please
matplotlib xaxis is squeeze together if massive data
Need help very new
Python CLI Live Demo?
Best way to parse code snippets from an Event Stream (OpenAI Completion)
Absolute beginner
Empty Variable Declaration.


Now retrieve 10 authors:

In [None]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.author)


AutoModerator
micr0nix
MattDLD
Catsuponmydog
jaklee26
spookbush
DievelKnievel
Sound4Sound
PaitentHero
NitkarshC


Note that we obtained the titles and authors from separate API calls. Can we expect these to correspond to the same submissions? If not, how could we guarantee that they do?
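
One way to guarantee the correspondence is to read both attributes from the same submission object in a single pass over one listing. A minimal sketch, assuming the submission objects expose `id`, `title` and `author` (with `author` being `None` for a deleted account); the function name is hypothetical.

```python
def titles_and_authors(submissions):
    """Return (id, title, author name) tuples from one pass over submissions."""
    rows = []
    for submission in submissions:
        # author is None when the account has been deleted
        if submission.author is not None:
            author = str(submission.author)
        else:
            author = "[deleted]"
        rows.append((submission.id, submission.title, author))
    return rows
```

It could be called as `titles_and_authors(reddit.subreddit('learnpython').hot(limit=10))`, so that each title and author comes from the same API response.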

In [21]:
submissions=reddit.subreddit('learnpython')

Why doesn't the next cell produce output?

In [23]:
for submission in submissions:
    print(submission.comments)

TypeError: 'Subreddit' object is not iterable

Print two comments associated with each of these submissions:

In [None]:
submissions = reddit.subreddit('learnpython').hot(limit=10)
for submission in submissions:
    # Drop unresolved "load more comments" placeholders, which have no .body
    submission.comments.replace_more(limit=0)
    # Take the first two comments from the flattened comment tree
    for comment in submission.comments.list()[:2]:
        print(comment.body)


what's the opposite of if name == main? like, we include this code in a script if we want its behavior to be different when run as the main executable, as opposed to when we import it from another script

but what if we want to explicitly specify a subset of behavior that ONLY exists when the script is imported? Is there such a dunder method to specify that?
Hey I wanna learn python with Automate the boring stuff, can I code in Visual Studio Code instead of MU? If so, do I need something else besides VSC?



>We have the script deployed to a linux vm and use a workflow management software to SSH into the vm and call a shell script that executes the python in the appropriate conda environment. The resulting csv is then returned back to the workflow management software where it is then inserted into our EDW for later use.

nice.

at some point when you will wanna update/maintain/revisit your code i'd also look into a docker solution
GGs o7



Use a virtual machine when playing around with that kind of stuff. On Windows, you can just enable the Windows Sandbox feature to get a secure Windows VM running in no time, without needing third-party software, although you could also use Virtualbox or VMWare if you want to run it under, say, Debian.
just use a VM.



This brought me back to 9th grade math on my TI-83+

The good ol' days.

* When the game starts, the TKinter window displays nothing, so I did not know how much cash I had. I just bought zero of something to refresh the inventory screen and then saw my $500

* If you enter nothing into the "Number of drugs" box, the Inventory window crashes and displays nothing. I accidentally turned off NumLock and then had no idea how many of whatever drug I had enough for. Didn't want that (likely totally benign) message that the dealer was angry because I did my math wrong!

If you intend to continue, I recommend you add a "if you type zero, you just get the max you can afford" type option. Unless you're trying to encourage multiplication practice, of course.

---

I'm not familiar enough with TKinter to comment on the code, sorry.
Looks good. I have notes on how the TI83 version was implemented.

The original game did not have events based on prices. The prices were always generated in it's range;


Like this:

https://www.w3schools.com/python/python_conditions.asp



There are plenty of places allowing to run a python CLI Demo and interact with it.

try https://replit.com for example.
Brython came up for me first while I was looking around, but Replit came up in my search as well. I'm assuming that'd be the better choice since you mentioned it. I was trying to avoid making another account somewhere for this, but I will if it's the simplest option.



If you're just looking to see what it's like, try watching some [Corey Schafer](https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7) videos.
The general consensus I see on the sub is that any beginner-focused book by Al Sweigart, especially "Automate the Boring Stuff with Python", is an awesome way to start learning Python.https://alsweigart.com/

I have not personally read his books. I focus mainly on projects I'm interested in and slowly but surely build my proficiency. Eventually I got into codewars, then Advent of Code, both of which drastically boosted my abilities. Still, I strongly believe that personal projects are the best way to learn, even if you never finish them.

Try to learn git along the way

For help on specific topics, courses, or crash course style articles on a topic/technique, check out https://realpython.com/
>	Will it be okay?

Does it work? I'm not actually sure Python has this syntax. Type hints aren't type declarations - values, not variab

Referring to the API documentation, explore the submissions object and print some interesting data:
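
As a starting point, the sketch below gathers a few fields into a dict. It assumes the submission object exposes `title`, `score`, `num_comments`, `created_utc` (a Unix timestamp) and `url`; check the documentation for the full list. The function name `summarise_submission` is hypothetical.

```python
from datetime import datetime, timezone


def summarise_submission(submission):
    """Collect a few fields of a submission into a plain dict."""
    # created_utc is a Unix timestamp; convert it to a readable UTC datetime
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return {
        "title": submission.title,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "created": created.isoformat(),
        "url": submission.url,
    }
```

Each submission from a listing such as `reddit.subreddit('learnpython').hot(limit=10)` could be passed to this function and the result printed with `pprint`.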

#### Posting to Reddit

To be able to post to your Reddit account (i.e. contribute submissions), you need to connect to the API with read/write privilege. This requires an *authorised instance*, which is obtained by including your Reddit user name and password in the connection request:

In [25]:
reddit = praw.Reddit(client_id='my client id',
                     client_secret='my client secret',
                     user_agent='my user agent',
                     username='my username',
                     password='my password')
print(reddit.read_only)  # Output: False

False


You could hide these last two credentials by adding them to your JSON file and then reading all five values at once.
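
A sketch of that idea: read the JSON file once and translate its keys into the keyword names used in the connection call. The file keys `my_username` and `my_password` are hypothetical, chosen to match the naming of the three existing keys, as are the names `KEY_MAP` and `connection_kwargs`.

```python
import json

# Maps keys in auth_reddit.json to the keyword names used when connecting
KEY_MAP = {
    "my_client_id": "client_id",
    "my_client_secret": "client_secret",
    "my_user_agent": "user_agent",
    "my_username": "username",  # hypothetical key name
    "my_password": "password",  # hypothetical key name
}


def connection_kwargs(path):
    """Read all five values from the JSON file as connection keyword arguments."""
    with open(path) as f:
        auth = json.load(f)
    return {kwarg: auth[file_key] for file_key, kwarg in KEY_MAP.items()}
```

The connection could then be made with `reddit = praw.Reddit(**connection_kwargs('auth_reddit.json'))`, keeping all five values out of the notebook.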

---



> > > > > > > > > © 2024 Institute of Data

