<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 3.2.3 
# *Mining Social Media on Reddit*

## The Reddit API and the PRAW Package

The Reddit API is rich and complex, with many endpoints (https://www.reddit.com/dev/api/). It includes methods for navigating its collections, which include various kinds of media as well as comments. Fortunately, the Python library PRAW reduces much of this complexity.

Reddit requires developers to create and authenticate an app before they can use the API, but the process is much less onerous than some, and there is no waiting period for approval of new developers (as of 18 August 2018).

### 1. Create a Reddit App

Go to https://www.reddit.com/prefs/apps and click "create an app".

Enter the following in the form:

- a name for your app
- select "script" radio button
- a description
- a redirect URI

(N.B. For pulling data into a data science experiment, a local port can be used for the redirect URI; try http://127.0.0.1:1410)


- click "create app"
- from the form that displays, copy the following to a local text file (or to this notebook):

  - name (the name you gave to your app)
  - redirect URI
  - personal use script (this is your OAuth 2 Client ID)
  - secret (this is your OAuth 2 Secret)

### 2. Register for API Access

- follow the link at https://www.reddit.com/wiki/api and read the terms of use for Reddit API access 
- fill in the form fields at the bottom 
  - make sure to enter your new OAuth Client ID where indicated
  - your use case could be something like "Training in API usage for data science projects"
  - your platform could be something like "Jupyter Notebooks / Python"
  
- click "SUBMIT"
 
- when asked for User-Agent, enter something that fits this pattern:
  `your_os-python:your_reddit_appname:v1.0 (by /u/your_reddit_username)`
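The User-Agent pattern above can be assembled programmatically. The following is a minimal sketch; the helper name `build_user_agent` and the sample values are hypothetical, and the OS part is simply taken from Python's `platform` module.

```python
import platform

def build_user_agent(app_name, username, version="v1.0"):
    """Return a User-Agent string that follows the pattern shown above."""
    # platform.system() gives e.g. 'Windows', 'Linux' or 'Darwin'
    os_name = platform.system().lower() or "unknown"
    return f"{os_name}-python:{app_name}:{version} (by /u/{username})"

# Hypothetical values; substitute your own app name and Reddit username
print(build_user_agent("TutorialAPI", "your_reddit_username"))
```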

In [1]:
pip install praw

Collecting praw
  Using cached praw-7.3.0-py3-none-any.whl (165 kB)
Collecting prawcore<3,>=2.1
  Using cached prawcore-2.2.0-py3-none-any.whl (15 kB)
Collecting update-checker>=0.18
  Using cached update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Installing collected packages: update-checker, prawcore, praw
Successfully installed praw-7.3.0 prawcore-2.2.0 update-checker-0.18.0
Note: you may need to restart the kernel to use updated packages.


### 3. Load Python Libraries

In [2]:
import praw
import requests
import json
import pprint
from datetime import datetime, date, time

### 4. Authenticate from your Python script

You could assign your authentication details explicitly, as follows:

In [3]:
my_user_agent = 'your_os-python:your_reddit_appname:v1.0 (by /u/your_reddit_username)'   # your User-Agent string goes in here
my_client_id = 'YOUR_CLIENT_ID'   # your Client ID string goes in here
my_client_secret = 'YOUR_CLIENT_SECRET'   # your Secret string goes in here

A better way would be to store these details externally, so they are not displayed in the notebook:

- create a file called "auth_reddit.json" in your "notebooks" directory, and save your credentials there in JSON format:

`{   "my_client_id": "your Client ID string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;` "my_client_secret": "your Secret string goes in here",` <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"my_user_agent": "your user Agent string goes in here"` <br>
`}`

Use the following code to load the credentials:  

In [4]:
pwd()  # make sure your working directory is where the file is

'C:\\Users\\65911\\Desktop\\module 1\\myLab'

In [5]:
path_auth = 'auth_reddit.json'
# Use a context manager so the file handle is closed after reading
with open(path_auth) as f:
    auth = json.load(f)
pp = pprint.PrettyPrinter(indent=4)
# For debugging only:
#pp.pprint(auth)

my_user_agent = auth['my_user_agent']
my_client_id = auth['my_client_id']
my_client_secret = auth['my_client_secret']

Security considerations: 
- this method keeps your credentials out of the notebook itself, but anyone who can read `auth_reddit.json`, or run the notebook where that file is present, can still recover them
- if you wanted another user to have access to the executable notebook without divulging your credentials you should set up an OAuth 2.0 workflow to let them obtain and apply their own API tokens when using your app
- if you just want to share your analyses, you could use a separate script (which you don't share) to fetch the data and save it locally, then use a second notebook (with no API access) to load and analyse the locally stored data
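The last option (fetch in a private script, analyse in a shareable notebook) could look roughly like this. It is only a sketch: the function names, the file name `submissions_sample.json` and the sample record are all hypothetical stand-ins, and no API call is made here.

```python
import json
from pathlib import Path

def save_records(records, path):
    """Write a list of plain dicts to a JSON file (private fetch script)."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")

def load_records(path):
    """Read the records back (shareable analysis notebook, no credentials)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

# Hypothetical sample standing in for data fetched through the API
sample = [{"author": "example_user", "title": "Example title"}]
save_records(sample, "submissions_sample.json")
print(load_records("submissions_sample.json"))
```

Because only plain dictionaries are written to disk, the second notebook never needs the client ID or secret.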

### 5. Exploring the API

Here is how to connect to Reddit with read-only access:

In [6]:
reddit = praw.Reddit(client_id = my_client_id, 
                     client_secret = my_client_secret, 
                     user_agent = my_user_agent)

print('Read-only = ' + str(reddit.read_only))  # Output: True

Read-only = True


In the next cell, put the cursor after the '.' and hit the [tab] key to see the available members and methods of the `reddit` object:

In [7]:
reddit.info

<bound method Reddit.info of <praw.reddit.Reddit object at 0x0000026820420B80>>
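If tab completion is not available, Python's built-in `dir()` gives a similar listing. A small sketch follows; the helper name `public_members` is hypothetical, and it is demonstrated on a string so that it runs without a connection.

```python
def public_members(obj):
    """Return the names of an object's public attributes and methods."""
    return [name for name in dir(obj) if not name.startswith("_")]

# Demonstrated on a built-in string; in the lab you would pass `reddit`
print(public_members("text")[:5])
```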

Consult the PRAW and Reddit API documentation. Print a few of the response members below:

In [8]:
reddit.subreddit.create

<bound method SubredditHelper.create of <praw.models.helpers.SubredditHelper object at 0x000002682043FF10>>

Content in Reddit is grouped by topics called "subreddits". Content, called "submissions", is fetched by calling the `subreddit` method of the connection object (which is our `reddit` variable) with an argument that matches an actual topic. 

We then need to call a further method on the resulting subreddit object (a "subinstance") to choose how the submissions are listed, such as one of the following:

- controversial
- gilded
- hot
- new
- rising
- top
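Since each of these is a method on the subreddit object, the listing can also be chosen by name at run time. The sketch below assumes a hypothetical helper, `fetch_titles`; `gilded` is left out on the assumption that it may return comments as well as submissions, and comments have no `title`.

```python
# Listing names taken from the list above (gilded omitted, see lead-in)
LISTINGS = ("controversial", "hot", "new", "rising", "top")

def fetch_titles(subreddit, listing="hot", limit=10):
    """Return submission titles from the named listing of a subreddit object."""
    if listing not in LISTINGS:
        raise ValueError(f"unknown listing: {listing!r}")
    listing_method = getattr(subreddit, listing)   # e.g. subreddit.hot
    return [submission.title for submission in listing_method(limit=limit)]

# Usage in the lab (needs a connected `reddit` object):
# fetch_titles(reddit.subreddit('learnpython'), 'new', limit=5)
```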

One of the submission object's members is `title`. Fetch and print 10 submission titles from the 'learnpython' subreddit using one of the subinstances above:

In [9]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.title)

Ask Anything Monday - Weekly Thread
Best way to randomly assign an attribute to 10% of objects in a list?
[Req] Review my code for a simple GUI calculator application!
PyCharm console
help with my code
Could I use Python for my project?
Pipenv vs Poetry vs PDM vs Conda
Need help understanding this boolean logic: print(1=='1' ) results in false
Any tips for understanding "self" more?
How can I write a programme that will print the factorial of the digits of a number?


Now retrieve 10 authors:

In [10]:
for submission in reddit.subreddit('learnpython').hot(limit=10):
    print(submission.author)

AutoModerator
marienbad2
Traditional-Leg-1106
cosgus
0pium666
IMuhPEA
kautica0
wewnames
Izaya_Orihara170
GameDeveloper94


Note that we obtained the titles and authors from separate API calls. Can we expect these to correspond to the same submissions? If not, how could we guarantee that they do?

Submission

Why doesn't the next cell produce output?

In [13]:
submissions = reddit.subreddit('learnpython').hot(limit=10)
for submission in submissions:
    print("Author: {} | Title: {}".format(submission.author, submission.title))

Author: AutoModerator | Title: Ask Anything Monday - Weekly Thread
Author: marienbad2 | Title: Best way to randomly assign an attribute to 10% of objects in a list?
Author: Traditional-Leg-1106 | Title: [Req] Review my code for a simple GUI calculator application!
Author: cosgus | Title: PyCharm console
Author: 0pium666 | Title: help with my code
Author: IMuhPEA | Title: Could I use Python for my project?
Author: kautica0 | Title: Pipenv vs Poetry vs PDM vs Conda
Author: wewnames | Title: Need help understanding this boolean logic: print(1=='1' ) results in false
Author: Izaya_Orihara170 | Title: Any tips for understanding "self" more?
Author: GameDeveloper94 | Title: How can I write a programme that will print the factorial of the digits of a number?
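One likely reason a loop like this prints nothing is that the listing was already iterated in an earlier cell. PRAW listings are consumed lazily and, like any Python iterator, yield nothing once exhausted. The sketch below uses a plain generator (`fake_listing`, a hypothetical stand-in) so that it runs without an API call.

```python
def fake_listing(limit):
    """Stand-in for a PRAW listing: yields items lazily, and only once."""
    for i in range(limit):
        yield f"submission {i}"

submissions = fake_listing(3)
first_pass = [s for s in submissions]    # consumes the generator
second_pass = [s for s in submissions]   # nothing left to yield

print(first_pass)
print(second_pass)

# Storing the results in a list allows any number of passes
stored = list(fake_listing(3))
print(len(stored))
```

Re-creating the listing before each loop, as the cell above does, or storing the results in a list both avoid the problem.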


Print two comments associated with each of these submissions:

In [14]:
submissions = reddit.subreddit('learnpython').hot(limit=10)
for submission in submissions:
    # Drop "load more comments" placeholders, which have no body attribute
    submission.comments.replace_more(limit=0)
    # Flatten the comment tree and keep the first two comments
    for comment in submission.comments.list()[:2]:
        print(comment.body)

Hello! Like many others I am new to Python and coding - although I have some prior experience with HTML and css. I have a specific problem I want to solve, and as far as I can tell Python is the way to go. 

I am a frequent user of a car sharing app, which is awesome - but it’s lacking in showing summarized statistics. I would like to build a web app which lets a user of the car sharing app log in, and then see their rental history summarized (total/monthly number of rentals, total/monthly km traveled, total/monthly money spent in the app). 

I have done some research, and i have found resources on how to scrape websites and how to implement BankID login.

My questions are: 
1. Does this sound doable?
2. Should I first try to learn Python first through Udemy and other resources, before trying to write code?

Hoping for some insight, any pointers are appreciated!
I want to write a program which uses multiprocessing, the program will create 2 types of subprocesses, one will be used to ge

Expanding out to C++ or just general object-oriented resources may help here.

This goes a bit behind the scenes and might be just as confusing, but, your class functions are only created once. They only exist at 1 memory location in your executable. So if you had 2 classes, i.e

    class MyClass:
        def __init__(self):
            self.x = 1
    
    obj1 = MyClass()
    obj2 = MyClass()

2 separate objects, but the whole MyClass and its init function only exists one time. So if we then did this:

    class MyClass:
        def __init__(self):
            self.x = 1
        def UpdateX(self):
            self.x = 10
    
    obj1 = MyClass()
    obj2 = MyClass()
    obj1.UpdateX()
    print(obj1.x)
    print(obj2.x)

What do you expect to be printed? Hopefully `10` and `1`, as that's what you get. But the question is, how? Once again, the function `UpdateX` only exists once. So how can it know to change obj1's value and not touch obj2, or any other object? And if we call it agai
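A flattened comment list can contain placeholder entries (for "load more comments") that carry no `body`, so reading `.body` on every item may raise an error. One defensive sketch is shown below; `first_bodies` is a hypothetical helper, and PRAW's own `submission.comments.replace_more(limit=0)` is another way to drop the placeholders.

```python
from itertools import islice

def first_bodies(comments, n=2):
    """Return the text of the first n items that actually have a body."""
    with_body = (c.body for c in comments if hasattr(c, "body"))
    return list(islice(with_body, n))

# Usage in the lab (needs a fetched submission):
# first_bodies(submission.comments.list(), n=2)
```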

Referring to the API documentation, explore the submissions object and print some interesting data:
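As a starting point, a few commonly used submission attributes can be gathered into a dictionary. This is a sketch only: `summarise` is a hypothetical helper, the chosen fields (`id`, `title`, `score`, `num_comments`, `created_utc`) are a selection rather than a complete list, and the demo object is a stand-in so that the code runs without an API call.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

def summarise(submission):
    """Collect a few submission fields into a plain dict."""
    # created_utc is a Unix timestamp; convert it to a readable UTC time
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return {
        "id": submission.id,
        "title": submission.title,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "created": created.isoformat(),
    }

# Hypothetical stand-in object; in the lab, pass a real submission instead
demo = SimpleNamespace(id="abc123", title="Example", score=5,
                       num_comments=2, created_utc=0)
print(summarise(demo))
```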

#### Posting to Reddit

To be able to post to your Reddit account (i.e. contribute submissions), you need to connect to the API with read/write privilege. This requires an *authorised instance*, which is obtained by including your Reddit user name and password in the connection request: 

In [15]:
reddit = praw.Reddit(client_id='my client id',
                     client_secret='my client secret',
                     user_agent='my user agent',
                     username='my username',
                     password='my password')
print(reddit.read_only)  # Output: False

False


You could hide these last two credentials by adding them to your JSON file and then reading all five values at once.
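Reading all five values at once might look like the sketch below. The key names `my_username` and `my_password` are assumptions that extend the naming used in "auth_reddit.json", and `load_reddit_kwargs` is a hypothetical helper.

```python
import json

# Maps keys in the JSON file to praw.Reddit keyword arguments.
# "my_username" and "my_password" are assumed key names, not from the lab.
KEY_MAP = {
    "my_client_id": "client_id",
    "my_client_secret": "client_secret",
    "my_user_agent": "user_agent",
    "my_username": "username",
    "my_password": "password",
}

def load_reddit_kwargs(path):
    """Load credentials from a JSON file as keyword arguments."""
    with open(path, encoding="utf-8") as f:
        auth = json.load(f)
    missing = [key for key in KEY_MAP if key not in auth]
    if missing:
        raise KeyError(f"missing keys in {path}: {missing}")
    return {KEY_MAP[key]: auth[key] for key in KEY_MAP}

# Usage in the lab:
# reddit = praw.Reddit(**load_reddit_kwargs('auth_reddit.json'))
```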





---






© 2021 Institute of Data





