<div style="text-align:center"><img src="png/reddit.png" /></div>

## What is Reddit?

It is good practice to start by learning what Reddit is. Below I copied some basic information from their [help page](https://www.reddithelp.com/hc/en-us/articles/204511479-What-is-Reddit/). They give the following answer to the question of what Reddit is:
> * Reddit is a source for what's new and popular on the Internet.
> * Users like you provide all of the content and decide, through voting, what's good and what's junk.
> * Reddit is made up of many individual communities, also known as subreddits. Each community has its own page, subject matter, users, and moderators.
> * Users post stories, links, and media to these communities, and other users vote and comment on the posts.
> * Through voting, users determine what posts rise to the top of community pages and, by extension, the public home page of the site.
> * Links that receive community approval bubble up towards #1, so the front page is constantly in motion and (hopefully) filled with fresh, interesting links.

Personally, I *do not have* an account on Reddit and I am probably not planning to create one, but if you want to better understand what kind of data you can extract from it, I would recommend setting up an account. As far as I understand, Reddit is a big old internet forum (similar to 4chan or the Polish Wykop) in which users post or comment on different information. Every user can perform four types of actions:

1. Create a subreddit. Basically, it is a subforum in which a group of users discusses a given topic.
2. Write a post (submission) in a given subreddit.
3. Write a comment on a given post.
4. Rate a given comment or post.

For these actions, people earn **karma**.

### What is karma?

Again, according to [Reddit's help page](https://www.reddithelp.com/hc/en-us/articles/204511829-What-is-karma-), karma is:

>A user's **karma** reflects how much a user has contributed to the Reddit community by an approximate indication of the total votes a user has earned on their submissions ("post karma") and comments ("comment karma"). When posts or comments get upvoted, that user gains some karma. You can see how much karma a user has on their profile page.
>
>Karma is only approximate: there is not a 1:1 relationship with votes. Your post karma will always be significantly lower than the total number of votes you receive on your links. Comment karma is closer to a 1:1 relationship but is still only approximate.

Therefore, from our perspective there are two important pieces of information here. First, we learned that users differ in their activity and in the popularity of their content, as measured by karma points. This information might be useful when (or if) we learn how to get information on users. Second, we learned that posts and comments can be either upvoted or downvoted. This is important because, as far as I understand, the comments or posts with the highest scores are exposed on the front page of Reddit and might therefore influence more users than just those in the given subreddit. Also, comments with a high score are displayed higher under the post.

![api](png/api.jpg)

## What is an API?
Now that we know something about Reddit, let's dig a bit deeper into the world of restaurants. I mean APIs.
In general, web APIs (Application Programming Interfaces) are interfaces, usually publicly available, through which third parties (this is us!) can access some data resources in a **remote**, **reliable** and **programmable** manner. There are plenty of private APIs too, but for obvious reasons we do not care about them, as we cannot use them.

What does it mean in practice?

* **Remote.** Users can access the resource from anywhere, provided they have an internet connection.
* **Reliable.** The interface exposed to users is independent of the internal details of the system that produces the data. In other words, the way a user communicates with the API is independent of the way the system works. In practice, this means that a user does not have to know anything about the system; it is enough to know the API's interface.
* **Programmable.** An API can be interacted with through a predefined set of commands/methods (an interface) in a way that can be expressed in a programming language. This is usually achieved with the HTTP protocol, which is the standard communication protocol of the Web and for which utilities are available in any major programming language.
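To make the three properties above a bit more concrete, here is a minimal sketch of what a call to a web API looks like from the user's side. The endpoint `https://api.example.com/search`, its parameters and the response body are all made up for illustration; nothing is actually sent over the network. It only shows that a request is a URL with encoded parameters and that the answer is typically JSON text.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, used only for illustration.
base_url = "https://api.example.com/search"
params = {"q": "climate", "limit": 5}

# The request: an endpoint plus URL-encoded parameters.
request_url = f"{base_url}?{urlencode(params)}"
print(request_url)

# The response: many web APIs answer with JSON text. This body is made up.
response_text = (
    '{"data": [{"id": "a1", "title": "First post"},'
    ' {"id": "b2", "title": "Second post"}]}'
)

# Decoding the JSON gives ordinary Python dictionaries and lists.
records = json.loads(response_text)["data"]
titles = [record["title"] for record in records]
print(titles)
```

Note that nothing here depends on how the server stores or produces its data, which is what **reliable** means in the list above.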

## Reddit API
By now we should know what an API is and what Reddit is. So it is the right time to [talk about practice](https://www.youtube.com/watch?v=eGDBR2L5kzI). Where do we find Reddit's API? This question is more complex than it might seem. There are two ways to access Reddit through an API:

1. **Official Reddit API.** In most cases, the best way to access data from a webpage that has an API is to use the official one. You can find Reddit's documentation [here](https://www.reddit.com/dev/api). This webpage is not particularly beautiful, but documentation rarely is. At first glance, you will probably be overwhelmed by the amount of information you find there. However, for now you only need to know that you are not going to use the official Reddit API because it is inconvenient. It requires authentication (having a developer account) and, as far as I can tell, it is not really developed. Anyhow, if you decide to perform a more detailed analysis of Reddit, you should probably read the official documentation and visit these two pages: [Reddit's Archived GitHub repository](https://github.com/reddit-archive/reddit/wiki/API) and [Documentation on Reddit's API Python Wrapper](https://praw.readthedocs.io/en/latest/). This is a lot of reading and understanding. However, there is no other way unless...
2. **Pushshift Reddit API.** There is a Reddit user, Jason Baumgartner, who for unclear reasons (at least they are unclear to me, but I was not particularly motivated to look it up) decided to dump the whole of Reddit every month. On [this](https://pushshift.io) much nicer webpage you can find the documentation for his API.

In our case we will use the Pushshift API.
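A Pushshift search is narrowed down with request parameters. The sketch below assumes the parameter names `subreddit`, `size`, `after` and `before` (with `after` and `before` given as Unix timestamps in seconds); these come from my reading of the Pushshift documentation and should be checked there before relying on them. The helper `to_epoch` and the example values are hypothetical. The code only builds and prints the URL, it does not contact the server.

```python
import datetime
from urllib.parse import urlencode

# Pushshift endpoint for searching submissions.
URL = "https://api.pushshift.io/reddit/search/submission/"


def to_epoch(year, month, day):
    """Return the Unix timestamp (seconds) for midnight UTC on the given date."""
    moment = datetime.datetime(year, month, day, tzinfo=datetime.timezone.utc)
    return int(moment.timestamp())


# Parameter names other than 'q' are assumptions (see the note above).
payload = {
    "q": "climate",                   # search term
    "subreddit": "science",           # example subreddit, chosen arbitrarily
    "after": to_epoch(2020, 1, 1),    # only posts created after this moment
    "before": to_epoch(2020, 2, 1),   # only posts created before this moment
    "size": 100,                      # number of results requested
}

print(f"{URL}?{urlencode(payload)}")
```

Working with explicit UTC dates avoids surprises when the same notebook runs on machines in different time zones, for example on Google Colab.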

## Practice

We are going to use _python_ to access the Reddit API. It is possible to use _R_, but it would be too complicated to do so. Somehow it is easier to use Google Colab and _python_, a language you do not know, than the programming language you know a bit. However, as you will see in a second, you do not have to know _python_ to get your first batch of data from Reddit.

However, I would recommend doing a tutorial on _python_ basics; for example, I find this one quite good. There is always my class you can refer to, but as I mentioned before it might lack a detailed explanation.

```python
import requests as rq
import pandas as pd
import json
import datetime
```

## Submission

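```python
# Pushshift endpoint for searching submissions (posts)
url = 'https://api.pushshift.io/reddit/search/submission/'
# Query parameters: the search term and the html_decode flag
payload = {'q': 'climate',
           'html_decode': True}
```

```python
# Send the GET request and display the response object
response = rq.get(url, params=payload)
response
```

```
<Response [200]>
```

```python
# The response body is JSON; the submissions sit under the 'data' key
data = json.loads(response.text)['data']
len(data)
```

```
25
```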
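Each element of `data` is a dictionary describing one submission. As a possible next step, the sketch below keeps a few fields and turns the creation time into a readable date. It runs on two made-up records rather than on a live response, and the field names `title`, `subreddit`, `score` and `created_utc` are assumptions about what Pushshift returns, so inspect `data[0].keys()` on a real response first. The function name `tidy` is hypothetical.

```python
import datetime

# Made-up records that mimic the assumed shape of Pushshift submissions.
sample = [
    {"title": "Climate report released", "subreddit": "science",
     "score": 42, "created_utc": 1577836800},
    {"title": "Weather vs climate", "subreddit": "askscience",
     "score": 7, "created_utc": 1577923200},
]


def tidy(records):
    """Keep a few fields per record and convert the timestamp to an ISO date."""
    rows = []
    for record in records:
        # created_utc is assumed to be a Unix timestamp in seconds (UTC).
        created = datetime.datetime.fromtimestamp(
            record["created_utc"], tz=datetime.timezone.utc
        )
        rows.append({
            "title": record.get("title"),
            "subreddit": record.get("subreddit"),
            "score": record.get("score"),
            "created": created.isoformat(),
        })
    return rows


rows = tidy(sample)
for row in rows:
    print(row["created"], row["subreddit"], row["title"])
```

With real data you would call `tidy(data)` instead, and the resulting list of dictionaries can be passed to `pd.DataFrame(...)` to get a table.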