
## Before you start

### Creating an App

To access the Reddit API, you need to create an app on Reddit (which requires a regular Reddit account). It might sound difficult at first, but it is not.

1. Go to [www.reddit.com](https://www.reddit.com) and create an account if you do not have one. 
2. Once you are logged in, go to the [app preferences page](https://www.reddit.com/prefs/apps). After pressing `Create app`, you should see a form like the one below.

<div style="text-align:center"><img src="../png/reddit_app.png"/></div>

For now, you only need three fields: `name`, `type of the app`, and `redirect uri`. Fill them in as follows:

* `name` -- the name of your app. It should be one word.
* `type of the app` -- select `script`.
* `redirect uri` -- the location where the authorization server sends the user once the app has been successfully authorized and granted an authorization code or access token. Type `http://localhost:8080`. A script app logs in with your own username and password, so this address is not actually used, but Reddit still asks you to fill it in.

Press `Create app`. Congratulations, you have just created your first app!

You should now see something like the following.

<div style="text-align:center"><img src="../png/reddit_app2.png"/></div>

In the picture above, we hid some information because it contains the credentials (authorization details) you will use to tell the Reddit API who you are. You should never share them with anyone, not even your spouse or a firefighter! They identify you, so if someone misuses them, it will be on you.

### Storing credentials

There are multiple ways to store your credentials and passwords safely. We do not want them to be compromised, right? However, storing them safely is one thing and having strong passwords is another. We all know that we should use strong passwords, but do we really know why? The picture below shows how quickly a password can be cracked depending on its complexity.

<div style="text-align:center"><img src="../png/password_table_2023.jpg"/></div>

Anyhow, the lesson we should take from the table above is twofold:

1. Use strong passwords.
2. Use password managers to propose strong passwords and store them.

If, for any reason, you are still reluctant to trust password managers, at least create complex passwords by mixing nonsense words (the only place where spelling errors help) and special characters, for example:

>`$eating#keyborads-1ncreases_staminA`
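A password manager remains the simplest option, but if you want to see what "strong" means in numbers, here is a minimal sketch using Python's standard-library `secrets` module (meant for security-sensitive randomness, unlike `random`). The helper names `generate_password` and `entropy_bits` are our own, and the estimate length × log2(alphabet size) only holds for passwords drawn uniformly at random, not for human-made ones like the example above.

```python
import math
import secrets
import string

## 26 + 26 letters, 10 digits and 32 punctuation characters = 94 symbols
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a password whose characters are drawn uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)


example_password = generate_password()
print(len(example_password))       # 20
print(round(entropy_bits(8), 1))   # 52.4
print(round(entropy_bits(20), 1))  # 131.1
```

Each extra character multiplies the number of possible passwords by 94, which is why length matters so much in the picture above.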

In our case, we have already generated passwords and credentials which look pretty strong. How are we going to store them?

### Environment variables

As you probably suspect, we need our credentials to connect to the API. We do not want to store them in the notebook, because we want to be able to share it. We also do not want to copy and paste them every time we use the notebook: that would be inefficient, and it would be easy to forget to remove them before sharing. What are we going to do then?

We are going to use environment variables. In other words, we define variables either on our computer or in Colab, where they are stored outside the notebook. In the notebook, we just retrieve them by name. In Colab, click the key icon (Secrets) in the left-hand sidebar, define the following five variables, and switch on notebook access for each of them:

1. `username` -- our Reddit username.
2. `password` -- our password to Reddit.
3. `client_id` -- the string shown right below the name of our app.
4. `client_secret` -- the string labelled `secret`.
5. `user_agent` -- a string that identifies the app. Reddit recommends the format `<platform>:<app ID>:<version string> (by /u/<username>)`, for example `macos:<client_id>:v1 (by /u/profesor_floretu)`.
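Reddit's API rules ask for a descriptive user agent of the form `<platform>:<app ID>:<version string> (by /u/<reddit username>)`. A tiny helper like the hypothetical `build_user_agent` below can assemble the value before you paste it into your secrets; the arguments shown are placeholders, not real credentials.

```python
def build_user_agent(platform: str, client_id: str, version: str, username: str) -> str:
    """Assemble a user agent string in the format Reddit recommends."""
    return f"{platform}:{client_id}:{version} (by /u/{username})"


print(build_user_agent("macos", "<client_id>", "v1", "profesor_floretu"))
# macos:<client_id>:v1 (by /u/profesor_floretu)
```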

```python
## Load the module for reading secrets (Google Colab only)
from google.colab import userdata

## Retrieve the stored secrets and assign them to names.
client_id = userdata.get("client_id")
client_secret = userdata.get("client_secret")
password = userdata.get("password")
user_agent = userdata.get("user_agent")
username = userdata.get("username")
```
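`google.colab.userdata` only exists inside Colab. On your own computer, a common alternative is to export the five values as ordinary environment variables (for example with `export client_id=...` in your shell before starting Jupyter) and read them with `os.environ`. The sketch below assumes the variables use exactly the names listed above; `load_credentials` is a hypothetical helper, not part of PRAW.

```python
import os

## Names of the environment variables, matching the list above
CREDENTIAL_NAMES = ["client_id", "client_secret", "password", "user_agent", "username"]


def load_credentials(names=CREDENTIAL_NAMES) -> dict:
    """Read each credential from the environment variable of the same name."""
    missing = [name for name in names if name not in os.environ]
    if missing:
        ## Fail early with a clear message instead of sending None to Reddit
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


## Usage (outside Colab):
## credentials = load_credentials()
## client_id = credentials["client_id"]
```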

## Collect data

The cell below loads the necessary Python modules and defines a helper
function for converting dates into a readable format.

```python
## Uncomment the line below if you are using Google Colab.
## !pip install praw
## Import modules
import json
from datetime import datetime, timezone

import pandas as pd
import praw


def convert_date(date_float: float) -> str:
    """
    Convert a Unix timestamp (seconds since the epoch, UTC) into a
    human-readable date string in UTC.

    Parameters:
    -----------
        date_float (float): a Unix timestamp, e.g. `created_utc`.

    Returns:
    --------
        (str) : the date formatted as "dd-mm-YYYY HH:MM:SS" (UTC).
    """
    ## tz=timezone.utc keeps the result independent of the time zone
    ## of the machine running the notebook.
    return datetime.fromtimestamp(date_float, tz=timezone.utc).strftime("%d-%m-%Y %H:%M:%S")
```
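Because `created_utc` values are Unix timestamps in UTC, the output of `datetime.fromtimestamp` depends on whether a time zone is passed: without `tz`, Python converts to the local time zone of the machine running the notebook. A quick check with a fixed timestamp chosen for illustration:

```python
from datetime import datetime, timezone

## 1656633600 seconds after the epoch is midnight UTC on 1 July 2022
timestamp = 1656633600.0

utc_string = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%d-%m-%Y %H:%M:%S")
local_string = datetime.fromtimestamp(timestamp).strftime("%d-%m-%Y %H:%M:%S")

print(utc_string)    # 01-07-2022 00:00:00
print(local_string)  # matches the line above only if the machine runs on UTC
```

Passing `tz=timezone.utc` makes the converted dates identical no matter where the notebook runs.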

### Connect to Reddit

Once all your credentials are stored in _Python_ as strings, let's connect to the Reddit API with the `praw` module. [Here](https://praw.readthedocs.io/en/stable/index.html) is its documentation.

```python
## Connect to Reddit API
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    password=password,
    user_agent=user_agent,
    username=username,
)
```
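Creating the `Reddit` object does not contact Reddit yet; PRAW works lazily, so a typo in the credentials only shows up on the first request. A minimal sketch of an explicit check uses `reddit.user.me()`, which returns the authenticated Redditor (or `None` when the instance is in read-only mode). `check_connection` is a hypothetical helper name; with wrong credentials, the call is expected to raise an authentication error from PRAW's underlying `prawcore` library.

```python
def check_connection(reddit) -> str:
    """Return the username PRAW is authenticated as, or raise if it is read-only."""
    ## user.me() sends a request to Reddit; bad credentials raise an exception here
    me = reddit.user.me()
    if me is None:
        raise RuntimeError("PRAW is in read-only mode: check username and password.")
    return me.name


## Usage, with the `reddit` object created above:
## print(check_connection(reddit))
```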

### Subreddit selection

In this study, we looked for relevant posts in the subreddits r/Portugal, r/UK, and r/Polska. Our search keywords within these subreddits were “vegan”, “vegetarian”, “plant-based diet”, “meat reduction”, and “flexitarian”. The criteria for choosing a post were (1) the date of creation (around 2021), (2) the number of comments, (3) the relevance of the content, and (4) encouragement to share motivations and barriers, for instance, questions containing a call to action to explain the reasons for choosing a specific diet (e.g., ‘What is your diet? Share why you chose it’). The relevance of the content was assessed based on the quality of the comments shared: comments were considered relevant if they contained meaningful arguments and responses, rather than surface-level remarks (e.g., “I like meat.”) that did not contribute to the subject of the post. Our data collection encompassed all available comments (i.e., responses to other users' posts) posted on the chosen submission.
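We selected the submissions manually through the web interface. If you want a programmatic starting point for the first two criteria (date and number of comments), a sketch could use PRAW's `Subreddit.search`; relevance and the call to action still have to be judged by reading the posts. `find_candidates`, the `years` range, and the `min_comments` and `limit` values are illustrative placeholders, not the settings used in the study.

```python
from datetime import datetime, timezone

KEYWORDS = ["vegan", "vegetarian", "plant-based diet", "meat reduction", "flexitarian"]


def find_candidates(reddit, subreddit_name, keywords=KEYWORDS,
                    years=(2020, 2021, 2022), min_comments=50, limit=100):
    """Search a subreddit for each keyword and keep posts from the given years
    with at least `min_comments` comments, most-commented first."""
    subreddit = reddit.subreddit(subreddit_name)
    candidates = {}
    for keyword in keywords:
        ## Quoting is intended to match multi-word keywords as a phrase
        for submission in subreddit.search(f'"{keyword}"', time_filter="all", limit=limit):
            year = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc).year
            if year in years and submission.num_comments >= min_comments:
                ## Keyed by ID so a post found by several keywords appears once
                candidates[submission.id] = submission
    return sorted(candidates.values(), key=lambda s: s.num_comments, reverse=True)


## Usage, with the `reddit` object from the "Connect to Reddit" section:
## for post in find_candidates(reddit, "Polska"):
##     print(post.id, post.num_comments, post.title)
```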

We first identified each submission through the Reddit web interface and only then used the script below to collect all the comments under it. The six-character post ID needed for data collection can be read from each URL: for Poland, the post ID was [vnapm6](https://www.reddit.com/r/Polska/comments/vnapm6/weganizm_i_wegetarianizm/), for the UK -- [telqnb](https://www.reddit.com/r/CasualUK/comments/telqnb/plantbased_eating_seems_to_be_becoming_quite/), and for Portugal -- [lrcixp](https://www.reddit.com/r/portugal/comments/lrcixp/qual_a_vossa_opinião_em_relação_ao_veganismo/).
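The post ID is the path segment right after `/comments/` in a submission URL. A minimal sketch with a hypothetical `extract_post_id` helper, using only the standard library (PRAW can also take the URL directly via `reddit.submission(url=...)`):

```python
import re
from urllib.parse import urlparse


def extract_post_id(url: str) -> str:
    """Return the submission ID that follows '/comments/' in a Reddit URL."""
    path = urlparse(url).path
    match = re.search(r"/comments/([a-z0-9]+)", path)
    if match is None:
        raise ValueError(f"No submission ID found in {url!r}")
    return match.group(1)


urls = {
    "Poland": "https://www.reddit.com/r/Polska/comments/vnapm6/weganizm_i_wegetarianizm/",
    "UK": "https://www.reddit.com/r/CasualUK/comments/telqnb/plantbased_eating_seems_to_be_becoming_quite/",
    "Portugal": "https://www.reddit.com/r/portugal/comments/lrcixp/qual_a_vossa_opinião_em_relação_ao_veganismo/",
}
for country, url in urls.items():
    print(country, extract_post_id(url))
## Poland vnapm6
## UK telqnb
## Portugal lrcixp
```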

A lot of information can be gathered about a single comment. Below you can
find a description of the most common attributes.

* author -- provides an instance of Redditor.
* body -- the body of the comment, as Markdown.
* body_html -- the body of the comment, as HTML.
* created_utc -- time the comment was created, represented in Unix Time.
* distinguished -- whether or not the comment is distinguished.
* edited -- whether or not the comment has been edited.
* id -- the ID of the comment.
* is_submitter -- whether or not the comment author is also the author of the submission.
* link_id -- the submission ID that the comment belongs to.
* parent_id -- the ID of the parent comment (prefixed with t1_). If it is a top-level comment, this returns the submission ID instead (prefixed with t3_).
* permalink -- a permalink for the comment. Comment objects from the inbox have a context attribute instead.
* replies -- provides an instance of CommentForest.
* saved -- whether or not the comment is saved.
* score -- the number of upvotes for the comment.
* stickied -- whether or not the comment is stickied.
* submission -- provides an instance of Submission. The submission that the comment belongs to.
* subreddit -- provides an instance of Subreddit. The subreddit that the comment belongs to.
* subreddit_id -- the subreddit ID that the comment belongs to.

And for the Redditor (user):

* comment_karma -- the comment karma for the Redditor.
* comments -- provides an instance of SubListing for comment access.
* submissions -- provides an instance of SubListing for submission access.
* created_utc -- time the account was created, represented in Unix Time.
* has_verified_email -- whether or not the Redditor has verified their email.
* icon_img -- the URL of the Redditor’s avatar.
* id -- the ID of the Redditor.
* is_employee -- whether or not the Redditor is a Reddit employee.
* is_friend -- whether or not the Redditor is friends with the authenticated user.
* is_mod -- whether or not the Redditor mods any subreddits.
* is_gold -- whether or not the Redditor has active Reddit Premium status.
* is_suspended -- whether or not the Redditor is currently suspended.
* link_karma -- the link karma for the Redditor.
* name -- the Redditor’s username.
* subreddit -- if the Redditor has created a user-subreddit, provides a dictionary of additional attributes. See below.
* subreddit["banner_img"] -- the URL of the user-subreddit banner.
* subreddit["name"] -- the fullname of the user-subreddit.
* subreddit["over_18"] -- whether or not the user-subreddit is NSFW.
* subreddit["public_description"] -- the public description of the user-subreddit.
* subreddit["subscribers"] -- the number of users subscribed to the user-subreddit.
* subreddit["title"] -- the title of the user-subreddit.
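Two details from the lists above matter in practice: a top-level comment has a `parent_id` starting with `t3_`, and an author can be missing (`None` for deleted accounts) or expose only a few attributes (PRAW documents that suspended accounts return little more than `name` and `is_suspended`). A minimal sketch of reading these fields defensively; `is_top_level` and `author_summary` are hypothetical helpers, and PRAW comments also offer an `is_root` property for the first check.

```python
from typing import Optional


def is_top_level(parent_id: str) -> bool:
    """A comment replying directly to the submission has a parent ID prefixed with t3_."""
    return parent_id.startswith("t3_")


def author_summary(author) -> Optional[dict]:
    """Collect a few Redditor attributes, using None for any that are unavailable."""
    if author is None:  ## deleted account
        return None
    ## getattr with a default swallows the AttributeError raised for missing fields
    return {
        "name": getattr(author, "name", None),
        "comment_karma": getattr(author, "comment_karma", None),
        "created_utc": getattr(author, "created_utc", None),
        "is_suspended": getattr(author, "is_suspended", None),
    }
```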

```python
## Select a submission by ID -- in this case, the Polish submission.
submission = reddit.submission("vnapm6")

## Replace every "load more comments" placeholder so that
## all the comments are fetched (this can take a while).
submission.comments.replace_more(limit=None)

## Iterate over all the comments. list() flattens the comment
## tree, so replies at every depth are included. Each comment
## is written as one line of a JSON Lines file.
with open("comments_example.jl", "w", encoding="utf-8") as file:
    for comment in submission.comments.list():
        temp_dict = {}
        temp_dict["body"] = comment.body
        ## Sometimes a given comment was deleted. Then
        ## we don't want to write it out to the file.
        ## I use the continue statement here. It does not
        ## break the loop; it just goes to the next iteration.
        ## In other words, whenever the comment was deleted,
        ## it skips the rest of the code below the continue
        ## statement and moves on to the next comment.
        if temp_dict["body"] == "[deleted]":
            continue
        temp_dict["score"] = comment.score
        temp_dict["link"] = comment.permalink
        ## The author can be None (deleted account) or expose only a
        ## few attributes (suspended account). In that case, keep the
        ## comment but leave out the author details.
        try:
            temp_dict["author"] = {
                "name": comment.author.name,
                "karma": comment.author.comment_karma,
                "created_utc": convert_date(comment.author.created_utc),
                "has_verified_email": comment.author.has_verified_email,
            }
        except Exception:
            pass
        temp_dict["created_utc"] = convert_date(comment.created_utc)
        temp_dict["edited"] = comment.edited
        temp_dict["is_submitter"] = comment.is_submitter

        file.write(json.dumps(temp_dict) + "\n")
```
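The `.jl` file written above follows the JSON Lines convention: one complete JSON object per line, so it can be read back line by line. By default, `json.dumps` escapes non-ASCII characters, so Polish or Portuguese text appears as `\u0105`-style escapes in the file, although it is decoded correctly when read back. If you prefer a human-readable file, a sketch with `ensure_ascii=False` (which then needs an explicit UTF-8 encoding when opening the file) looks like this; the file name and sample records are illustrative.

```python
import json

records = [
    {"body": "Jem mniej mięsa", "score": 5},
    {"body": "Opinião sobre o veganismo", "score": 2},
]

## Write: one JSON object per line, keeping accented characters as they are
with open("jsonl_example.jl", "w", encoding="utf-8") as file:
    for record in records:
        file.write(json.dumps(record, ensure_ascii=False) + "\n")

## Read back: parse each non-empty line separately
with open("jsonl_example.jl", "r", encoding="utf-8") as file:
    loaded = [json.loads(line) for line in file if line.strip()]

print(loaded == records)          # True
print(json.dumps({"body": "ą"}))  # {"body": "\u0105"}
```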

## EXCELLENT File
If you want to create an Excel file from a JSON Lines file, you can easily do it as follows. Note that writing `.xlsx` files with pandas requires an Excel engine such as `openpyxl` to be installed.

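```python
## Read a JSON Lines file into Python: one dictionary per line
with open("comments_example.jl", "r", encoding="utf-8") as file:
    records = [json.loads(line) for line in file]

## Turn the list of dictionaries into a DataFrame and save it as an Excel file
pd.DataFrame(records).to_excel("comments_example.xlsx", index=False)
```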
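Because `author` is stored as a nested dictionary, `pd.DataFrame` puts the whole dictionary into a single column. If you would rather have one column per author field, `pd.json_normalize` flattens nested keys into dotted column names. A sketch on two illustrative records (the second mimics a comment whose author details could not be read):

```python
import pandas as pd

records = [
    {"body": "First comment", "score": 3, "author": {"name": "user_a", "karma": 10}},
    {"body": "Second comment", "score": 1},
]

## Nested keys become columns such as "author.name"; missing values become NaN
flat = pd.json_normalize(records)
print(sorted(flat.columns))  # ['author.karma', 'author.name', 'body', 'score']
```

Calling `flat.to_excel("comments_example_flat.xlsx", index=False)` then writes each author field to its own column.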