# Using PRAW to Create Networks

This notebook contains examples for using web-based APIs (Application Programming Interfaces) to download data from social media platforms.

This notebook focuses specifically on _Reddit_.

We will use this API to create a network from the Reddit data, focusing on who replied to whom.

In [None]:
%matplotlib inline

import json

<hr>

Import the `NetworkX` package

In [None]:
# First we need to make sure we have networkx installed...
!pip install networkx

import networkx as nx

<hr>

## Reddit API

Reddit's API used to be among the easiest to use, since it did not require credentials to access data on its subreddit pages.
Unfortunately, this has changed, and developers now need to create a Reddit application on Reddit's app page: https://www.reddit.com/prefs/apps/.

In [None]:
# First we need to make sure we have praw installed...
!pip install praw

# For our first piece of code, we need to import the package
# that connects to Reddit. PRAW (the Python Reddit API Wrapper)
# is a wrapper around Reddit's web APIs

import praw

### Creating a Reddit Application
Go to https://www.reddit.com/prefs/apps/.
Scroll down to "create application", select "web app", and provide a name, description, and URL (which can be anything).

After you press "create app", you will be redirected to a new page with information about your application. Copy the unique identifiers below "web app" and beside "secret". These are your client_id and client_secret values, which you need below.

In [None]:
# Now we specify a unique, descriptive user agent for our code.
# Reddit uses it to identify the client, and generic or default
# user agents are more likely to be rate-limited or blocked
redditApi = praw.Reddit(client_id='xxx',
                        client_secret='xxx',
                        user_agent='is688_cbuntain_v01')
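
Hard-coding the client ID and secret in a notebook makes them easy to leak when the notebook is shared. One possible alternative is to read them from environment variables. The variable names `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET` and the helper `load_reddit_credentials` below are assumptions made for this sketch, not part of PRAW.

```python
import os


def load_reddit_credentials(env=None, user_agent="is688_cbuntain_v01"):
    """Build keyword arguments for praw.Reddit from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env

    # Fail early with a readable message if a variable is missing or empty
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))

    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": user_agent,
    }
```

With both variables set, the connection could then be created as `redditApi = praw.Reddit(**load_reddit_credentials())`.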

### Accessing Reddit Comments

While you're never supposed to read the comments, for certain live streams or new and rising posts, the comments may provide useful insight into events on the ground or people's sentiment.
New posts may not have comments yet though.

Comments are attached to a submission, so for a given submission, you can pull its comments directly.

Note that Reddit returns comments in pages to prevent server overload, so you will not get all comments at once and will need extra code to fetch more than the first batch.
In PRAW, the comments that have not been loaded yet are represented by `MoreComments` placeholder objects in the comment tree.
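
PRAW's comment forest has a `replace_more()` method that either fetches or drops the `MoreComments` placeholders. The sketch below wraps that call; the helper name `flatten_comments` and its `more_limit` parameter are made up for illustration. Passing `limit=0` removes the placeholders without extra requests, while `limit=None` keeps fetching until none remain, which can be slow on large threads.

```python
def flatten_comments(submission, more_limit=0):
    """Return every loaded comment of a submission as a flat list.

    more_limit is passed to replace_more(): 0 drops all MoreComments
    placeholders without fetching them, None fetches them all.
    """
    # Resolve (or discard) the MoreComments placeholders in place
    submission.comments.replace_more(limit=more_limit)

    # list() flattens the comment forest into a single list
    return submission.comments.list()
```

For a submission `post`, `flatten_comments(post)` with the default `more_limit=0` should return only regular comments. Each one still needs its own parent looked up, because the flat list no longer shows who replied to whom.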

In [None]:
def recursive_node_adder(g, comment, parent_author):
    '''Recursively process comments and add them to the graph'''
    
    # Deleted or removed comments have no author, and NetworkX does
    # not accept None as a node, so fall back to a placeholder
    author = comment.author if comment.author is not None else "[deleted]"
        
    # Create an edge between this comment author and the
    #  parent author (add_edge also adds any missing nodes)
    g.add_edge(author, parent_author)

    # Iterate through the direct replies only; .list() would flatten
    # the whole subtree and link every descendant to this author
    for reply in comment.replies:
        if isinstance(reply, praw.models.MoreComments):
            continue
            
        # Recursively process this reply
        recursive_node_adder(g, reply, author)
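
To check what a who-replied-to-whom edge list should look like without calling Reddit, the traversal can be tried on a small hand-built thread. This is only a sketch: `FakeComment` is a hypothetical stand-in for a comment (an author name plus its direct replies), and `collect_reply_edges` is an illustrative iterative variant that returns pairs instead of modifying a graph.

```python
from dataclasses import dataclass, field


@dataclass
class FakeComment:
    """Hypothetical stand-in for a comment: an author and direct replies."""
    author: str
    replies: list = field(default_factory=list)


def collect_reply_edges(comment, parent_author):
    """Return (replier, replied_to) pairs for a comment and its descendants."""
    edges = []
    # Each stack entry pairs a comment with the author it replies to
    stack = [(comment, parent_author)]
    while stack:
        current, parent = stack.pop()
        edges.append((current.author, parent))
        # Only direct replies are pushed, so each edge links a reply
        # to its immediate parent
        for reply in current.replies:
            stack.append((reply, current.author))
    return edges


# A post by "op" with one thread: bob and dave reply to alice,
# and carol replies to bob
thread = FakeComment("alice", [
    FakeComment("bob", [FakeComment("carol")]),
    FakeComment("dave"),
])
edges = collect_reply_edges(thread, "op")
```

The resulting pairs could be loaded into a graph with `g.add_edges_from(edges)`. Using an explicit stack also avoids Python's recursion limit on very deep threads.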

### Create and Populate the Graph

In [None]:
# Create an undirected graph
g = nx.Graph()


subreddit = "worldnews"

breadthCommentCount = 10

targetSub = redditApi.subreddit(subreddit)

submissions = targetSub.hot(limit=20)

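for post in submissions:
    # Deleted accounts have no author, and NetworkX does not accept
    # None as a node, so fall back to a placeholder
    post_author = post.author if post.author is not None else "[deleted]"
    print(post_author, "-", post.title)
    
    # Add the post author (a no-op if the node already exists)
    g.add_node(post_author)
    
    post.comment_limit = breadthCommentCount
    
    # Get the top-level comments only; the recursive function walks
    # each comment's replies. Calling .list() here would flatten the
    # tree and attach every nested reply directly to the post author
    for comment in post.comments:
        
        # Skip MoreComment objects, which don't have authors
        if isinstance(comment, praw.models.MoreComments):
            continue
        
        # Recursively process this comment and its replies
        recursive_node_adder(g, comment, post_author)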

### Export Graph

We export the graph using GraphML in `NetworkX`, so we can load it in other software later.

Note, we could use other formats here as well. GraphML is just convenient.

In [None]:
nx.write_graphml(g, "output.reddit.graphml", prettyprint=False)
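
To confirm that an exported file can be loaded again, a small round trip with `nx.read_graphml` is one option. The sketch below uses a toy graph and a temporary file rather than the Reddit data; the helper name `roundtrip_graphml` and the node names are made up for illustration.

```python
import os
import tempfile

import networkx as nx


def roundtrip_graphml(graph):
    """Write a graph to a temporary GraphML file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.graphml")
        nx.write_graphml(graph, path)
        # Node IDs are read back as strings by default
        return nx.read_graphml(path)


example = nx.Graph()
example.add_edge("alice", "bob")
example.add_edge("bob", "carol")

restored = roundtrip_graphml(example)
```

The same check could be run on the real export with `nx.read_graphml("output.reddit.graphml")`.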

### Draw the Graph

Now that we've made the graph, let's draw it using the layout algorithms in `NetworkX`.

_NOTE_: `NetworkX` focuses on graph analysis rather than visualization, so its drawing support is basic. We only use it here for illustrative purposes.

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Use the Spring layout algorithm
pos = nx.spring_layout(g, scale=200, iterations=100, k=0.2)

# And draw the graph with node labels
nx.draw(g, 
        pos, 
        node_color='#A0CBE2', 
        width=1, 
        with_labels=True,
        node_size=50)
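
With a fixed `node_size`, every author looks the same. One possible refinement is to scale each node by its degree so that frequent repliers stand out. The helper `degree_node_sizes` and its `base` and `per_edge` parameters are assumptions for this sketch, shown here on a toy graph.

```python
import networkx as nx


def degree_node_sizes(graph, base=50, per_edge=25):
    """Return one marker size per node, in graph.nodes order."""
    return [base + per_edge * graph.degree(node) for node in graph.nodes]


example = nx.Graph()
example.add_edges_from([("op", "alice"), ("op", "bob"), ("alice", "carol")])

# "op" and "alice" have two neighbors each, "bob" and "carol" have one
sizes = degree_node_sizes(example)
```

For the Reddit graph, the list can be passed straight to the drawing call, for example `nx.draw(g, pos, node_size=degree_node_sizes(g))`.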