<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Reddit Data using Reddit API</h1>

In this live coding session, we use the Python Reddit API Wrapper (`PRAW`) to retrieve data from subreddits on [Reddit](https://www.reddit.com) and perform sentiment analysis using [`pipelines`](https://huggingface.co/docs/transformers/main_classes/pipelines) from [HuggingFace ( 🤗 the GitHub of Machine Learning )](https://techcrunch.com/2022/05/09/hugging-face-reaches-2-billion-valuation-to-build-the-github-of-machine-learning/), which are powered by [transformer](https://arxiv.org/pdf/1706.03762.pdf) models.

## Objectives

At the end of the session, you will

- know how to work with APIs
- feel more comfortable navigating through documentation, even inspecting the source code
- understand what a `pipeline` object is in HuggingFace
- perform sentiment analysis using `pipeline`
- run a Python script from the command line and get the results

## How to Submit

- At the end of each task, commit* the work to the repository you created before the assignment
- After completing all the tasks, push the notebook containing all code blocks and output cells to that repository
- Submit the link to the notebook in Canvas

\***NEVER** commit a notebook displaying errors unless instructed otherwise. However, commit often; recall git ABC = **A**lways **B**e **C**ommitting.

## Tasks

### Task I: Instantiate a Reddit API Object

The first task is to instantiate a Reddit API object using [PRAW](https://praw.readthedocs.io/en/stable/), through which you will retrieve data. PRAW is a wrapper for the [Reddit API](https://www.reddit.com/dev/api) that makes interacting with it easier, unless you are already an expert in [`requests`](https://docs.python-requests.org/en/latest/).

#### 1. Install packages

Please ensure you've run all the cells in `imports.ipynb`, located [here](https://github.com/FourthBrain/MLE-8/blob/main/assignments/week-3-analyze-sentiment-subreddit/imports.ipynb), so that you have all the required packages for today's assignment.

####  2. Create a new app on Reddit 

Create a new app on Reddit and save the secret tokens; refer to [this Medium post](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c) for more details.

- Create a Reddit account if you don't have one, then log into your account.
- To access the API, we need to create an app. The website has changed slightly since the post was written: navigate to `preferences` > `apps`, or click [this link](https://www.reddit.com/prefs/apps) and scroll all the way down.
- Click to create a new app, fill in the **name**, choose `script`, and fill in the **description** and **redirect uri**. The redirect URI is where the user is sent after they've granted OAuth access to your application (more info [here](https://github.com/reddit-archive/reddit/wiki/OAuth2)). For our purpose, you can enter any URL, e.g., www.google.com, as shown below.


    <img src="https://miro.medium.com/max/700/1*lRBvxpIe8J2nZYJ6ucMgHA.png" width="500"/>
- Jot down `client_id` (upper left corner) and `client_secret`.

    NOTE: CLIENT_ID refers to "personal use script" and CLIENT_SECRET to "secret".
    
    <div>
    <img src="https://miro.medium.com/max/700/1*7cGAKth1PMrEf2sHcQWPoA.png" width="300"/>
    </div>

- Create `secrets_reddit.py` in the same directory as this notebook and fill in the `client_id` and `client_secret` obtained in the last step. The next step imports these values through the `reddit_secrets` dict.
    ```
    reddit_secrets = {
        "REDDIT_API_CLIENT_ID": "client_id",
        "REDDIT_API_CLIENT_SECRET": "secret_id",
        "REDDIT_API_USER_AGENT": "any string except bot; ex. My User Agent",
    }
    ```
- Add `secrets_reddit.py` to your `.gitignore` file if not already done. NEVER push credentials to a repo, private or public. 
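
Before moving on, it can help to confirm that no template value is left in the credentials. The block below is a minimal sketch and not part of the assignment: the helper name `find_unset_credentials` and the `PLACEHOLDERS` set are assumptions made for illustration, and the function takes a plain dict keyed by the three credential names shown above.

```python
from typing import Dict, List

# Template strings from the example file; extend this set as needed (assumption).
PLACEHOLDERS = {"", "client_id", "secret_id"}

REQUIRED_KEYS = (
    "REDDIT_API_CLIENT_ID",
    "REDDIT_API_CLIENT_SECRET",
    "REDDIT_API_USER_AGENT",
)


def find_unset_credentials(credentials: Dict[str, str]) -> List[str]:
    """Return the required keys that are missing or still hold a placeholder."""
    return [
        key
        for key in REQUIRED_KEYS
        if credentials.get(key, "").strip() in PLACEHOLDERS
    ]


# Made-up values for demonstration only.
example = {
    "REDDIT_API_CLIENT_ID": "client_id",  # still the template value
    "REDDIT_API_CLIENT_SECRET": "not-a-real-secret",
    "REDDIT_API_USER_AGENT": "My User Agent",
}
unset = find_unset_credentials(example)
print(unset)
```

An empty list means every credential has been filled in; anything else names the entries that still need attention.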

#### 3. Instantiate a `Reddit` object

Now you are ready to create a read-only `Reddit` instance. Refer to [documentation](https://praw.readthedocs.io/en/stable/code_overview/reddit_instance.html) when necessary.

In [7]:
import praw
from secrets_reddit import reddit_secrets as rs

reddit = praw.Reddit(
    client_id=rs["REDDIT_API_CLIENT_ID"],
    client_secret=rs["REDDIT_API_CLIENT_SECRET"],
    user_agent=rs["REDDIT_API_USER_AGENT"],
)


In [8]:
print(reddit.read_only) 

True


<details>
<summary>Expected output:</summary>   

```<praw.reddit.Reddit object at 0x10f8a0ac0>```
</details>

#### 4. Instantiate a `subreddit` object

Lastly, create a `subreddit` object for your favorite subreddit and inspect the object. The expected outputs shown are from `r/machinelearning` unless otherwise specified.

In [10]:
s_ml = reddit.subreddit("machinelearning")

What is the display name of the subreddit?

In [11]:
s_ml.display_name

'machinelearning'

<details>
<summary>Expected output:</summary>   

    machinelearning
</details>

How about its title, is it different from the display name?

In [12]:
s_ml.title

'Machine Learning'

<details>
<summary>Expected output:</summary>   

    Machine Learning
</details>

Print out the description of the subreddit:

In [13]:
print(s_ml.description)

**[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
--------
+[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
--------
+[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
--------
+[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
--------
+[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ANews)
--------
***[@slashML on Twitter](https://twitter.com/slashML)***
--------
***[Chat with us on Slack](https://join.slack.com/t/rml-talk/shared_invite/enQtNjkyMzI3NjA2NTY2LWY0ZmRjZjNhYjI5NzYwM2Y0YzZhZWNiODQ3ZGFjYmI2NTU3YjE1ZDU5MzM2ZTQ4ZGJmOTFmNWVkMzFiMzVhYjg)***
--------
**Beginners:**
--------
Please have a look at [our FAQ and Link-Collection](http://www.reddit.com/r/MachineLearning/wiki/index)

[Metacademy](http://www.metacademy.org) is a great resource which compiles le

<details>
<summary>Expected output:</summary>

    **[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
    --------
    +[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
    --------
    +[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
    --------
    +[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
    --------
    +[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict
</details>

### Task II: Parse comments

#### 1. Top Posts of All Time

Find the titles of the top 10 posts of **all time** from your favorite subreddit. Refer to the [Obtain Submission Instances from a Subreddit section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) if necessary. Verify that the titles match what you see on Reddit.

In [53]:
# try running this line; what do you see? Press q once you are done
?s_ml.top

In [14]:
for submission in s_ml.top(limit=10, time_filter="all"):
    print(submission.title)

[Project] From books to presentations in 10s with AR + ML
[D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
[R] First Order Motion Model applied to animate paintings
[N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
[D] This AI reveals how much time politicians stare at their phone at work
[D] Types of Machine Learning Papers
[D] The machine learning community has a toxicity problem
I made a robot that punishes me if it detects that if I am procrastinating on my assignments [P]
[Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
[P] Using oil portraits and First Order Model to bring the paintings back to life


<details> <summary>Expected output:</summary>

    [Project] From books to presentations in 10s with AR + ML
    [D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
    [R] First Order Motion Model applied to animate paintings
    [N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
    [D] This AI reveals how much time politicians stare at their phone at work
    [D] Types of Machine Learning Papers
    [D] The machine learning community has a toxicity problem
    [Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
    [P] Using oil portraits and First Order Model to bring the paintings back to life
    [D] Convolution Neural Network Visualization - Made with Unity 3D and lots of Code / source - stefsietz (IG)    
</details>

#### 2. Top 10 Posts of This Week

What are the titles of the top 10 posts of **this week** from your favorite subreddit?

In [15]:
for submission in s_ml.top(limit=10, time_filter="week"):
    print(submission.title)

[P][R] Modern Disney Diffusion, dreambooth model trained using the diffusers implementation
[P] Finetuned Diffusion: multiple fine-tuned Stable Diffusion models, trained on different styles
[P] Explain Paper - A Better Way to Read Academic Papers
[R] TOCH outperforms state of the art 3D hand-object interaction models and produces smooth interactions even before and after contact
[D] DALL·E to be made available as API, OpenAI to give users full ownership rights to generated images
[P] Made a text generation model to extend stable diffusion prompts with suitable style cues
[P] Learn diffusion models with Hugging Face course 🧨
[News] The Stack: 3 TB of permissively licensed source code - Hugging Face and ServiceNow Research Denis Kocetkov et al 2022
[N] Adversarial Policies Beat Professional-Level Go AIs
[P] Fine Tuning Stable Diffusion: Naruto Character Edition


<details><summary>Expected output:</summary>

    [N] Ian Goodfellow, Apple’s director of machine learning, is leaving the company due to its return to work policy. In a note to staff, he said “I believe strongly that more flexibility would have been the best policy for my team.” He was likely the company’s most cited ML expert.
    [R][P] Thin-Plate Spline Motion Model for Image Animation + Gradio Web Demo
    [P] I’ve been trying to understand the limits of some of the available machine learning models out there. Built an app that lets you try a mix of CLIP from Open AI + Apple’s version of MobileNet, and more directly on your phone's camera roll.
    [R] Meta is releasing a 175B parameter language model
    [N] Hugging Face raised $100M at $2B to double down on community, open-source & ethics
    [P] T-SNE to view and order your Spotify tracks
    [D] : HELP Finding a Book - A book written for Google Engineers about foundational Math to support ML
    [R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet
    [D] Do you use NLTK or Spacy for text preprocessing?
    [D] Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
</details>

💽❓ Data Question:

Check out what other attributes the `praw.models.Submission` class has in the [docs](https://praw.readthedocs.io/en/stable/code_overview/models/submission.html). 

1. After having a chance to look through the docs, is there any other information that you might want to extract? How might this additional data help you?

Write a sample piece of code below extracting three additional pieces of information from the submission below.

In [41]:
#s_ml.traffic() raises an exception on a read-only instance, so catch it and print the message
try:
    stats = s_ml.traffic()
except Exception as ex:
    print(f"Ex - {ex}")


Ex - Redirect to /r/MachineLearning/login/ (You may be trying to perform a non-read-only action via a read-only instance.)


In [97]:
#Rising submissions
for rising_submission in s_ml.rising():
    print(rising_submission.title)

How to perform economic optimization without TensorFlow or PyTorch ? [Research]
[P] Finetuned Diffusion: multiple fine-tuned Stable Diffusion models, trained on different styles
[D] Paper Explanation & Author Interview - ROME: Locating and Editing Factual Associations in GPT
[Discussion] ICLR2023 statistics of submission
[D] Physics-inspired Deep Learning Models
[P] Learn diffusion models with Hugging Face course 🧨
[D] ICLR 2023 reviews are out. How was your experience ?
[P] Sparse Transformers for Inference in a Real-Time Twitter Stream
[D] NVIDIA RTX 4090 vs RTX 3090 Deep Learning Benchmarks
[D] A model with different data types as input?
[N] AAAI2023 workshop on Dynamical Systems and Machine Learning
[P] Topic modeling with semantic graphs: a different approach
[D] Paper bidding is a terrible system


In [78]:
#See who authored the most controversial submissions (controversial() yields submissions, not comments)
for submission in s_ml.controversial(limit=10):
    print(submission.author)

emilec___
FutureWatch4
llSourcell
datadudes-ai
improssibility
No_Effective7572
depressedPOS-plzhelp
sour_losers
otoyou1234
throwmeaway-account


In [82]:
#Comment author is useful if we want to know who commented or if we want to see if a specific person has commented
for comment in s_ml.comments(limit=5):
    print(comment.author)

LUNA_underUrsaMajor
RoboticCougar
ForceBru
ShadowKnightPro
AutoModerator
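
The attributes explored above can also be gathered into one record per submission. The sketch below is illustrative only: `summarize_submission` is a hypothetical helper, and a `SimpleNamespace` with made-up values stands in for a `praw.models.Submission` so the block runs without credentials. The attribute names `score`, `num_comments`, `upvote_ratio`, and `created_utc` are taken from the PRAW submission docs linked above.

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any, Dict


def summarize_submission(submission: Any) -> Dict[str, Any]:
    """Collect a few fields from a Submission-like object into a dict."""
    return {
        "title": submission.title,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "upvote_ratio": submission.upvote_ratio,
        # created_utc is a Unix timestamp; convert it to a timezone-aware datetime.
        "created": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
    }


# Stand-in for a real submission; every value here is made up.
fake_submission = SimpleNamespace(
    title="[D] Example post",
    score=123,
    num_comments=45,
    upvote_ratio=0.97,
    created_utc=1_665_176_400,
)

summary = summarize_submission(fake_submission)
print(summary["title"], summary["score"], summary["created"].isoformat())
```

With a real instance, something like `[summarize_submission(s) for s in s_ml.top(limit=10)]` would build one record per post.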


💽❓ Data Question:

2. Is there any information available that might be a concern when it comes to Ethical Data?

In [96]:
#At first glance, it doesn't look like it.
#Author information could be a concern, but it doesn't look like it could be used for hacking purposes.
#People can always post content that is unethical.

#### 3. Comment Code

Add comments to the code block below to describe what each line of the code does (Refer to [Obtain Comment Instances Section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) when necessary). The code is adapted from [this tutorial](https://praw.readthedocs.io/en/stable/tutorials/comments.html)

The purpose is to
1. understand what the code is doing
2. get into the habit of commenting your code wherever it is not self-explanatory, if you have not already (others will thank you, YOU will thank you later 😊)

In [16]:
%%time
from praw.models import MoreComments

# A list to store the comments for the top posts
top_comments = []

# Iterate through the top 10 posts
for submission in s_ml.top(limit=10):
    # Iterate through comments for each post
    for top_level_comment in submission.comments:
        # Ignore if what is returned is an instance of MoreComments
        if isinstance(top_level_comment, MoreComments):
            continue
        # Append the comment body to list of top comments
        top_comments.append(top_level_comment.body)
print(len(top_comments))

746
CPU times: user 486 ms, sys: 30.6 ms, total: 517 ms
Wall time: 1min 39s
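
The PRAW comments tutorial also describes `replace_more(limit=0)`, which strips the `MoreComments` placeholders up front so the loop needs no `isinstance` check. The sketch below shows that variant under stated assumptions: `top_level_bodies` is a hypothetical helper, and the `Fake*` classes are stand-ins (not PRAW classes) that imitate only the behaviour needed to run the block offline.

```python
from typing import Any, List


def top_level_bodies(submission: Any) -> List[str]:
    """Return the bodies of a submission's top-level comments."""
    # With limit=0, PRAW removes every MoreComments placeholder from the
    # forest without fetching the comments hidden behind them.
    submission.comments.replace_more(limit=0)
    return [comment.body for comment in submission.comments]


# --- Stand-ins so the sketch runs offline (these are NOT PRAW classes) ---
class FakeMore:
    """Plays the role of praw.models.MoreComments."""


class FakeComment:
    def __init__(self, body: str) -> None:
        self.body = body


class FakeForest:
    def __init__(self, items: list) -> None:
        self._items = list(items)

    def replace_more(self, limit: int = 32) -> list:
        # Only the limit=0 behaviour is mimicked: drop the placeholders.
        self._items = [i for i in self._items if not isinstance(i, FakeMore)]
        return []

    def __iter__(self):
        return iter(self._items)


class FakeSubmission:
    def __init__(self, items: list) -> None:
        self.comments = FakeForest(items)


demo = FakeSubmission([FakeComment("I bought puts"), FakeMore(), FakeComment("100%")])
bodies = top_level_bodies(demo)
print(bodies)
```

Either approach skips the hidden comments; a larger `limit` would fetch more of them at the cost of extra network requests.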


#### 4. Inspect Comments

How many comments did you extract from the last step? Examine a few comments. 

In [None]:
# The answer may vary; the reference run reported 693 for r/machinelearning.
len(top_comments)  # 746 in this run

In [18]:
import random

[random.choice(top_comments) for i in range(3)]

['My cat HATED that omg.',
 "Calm down, young one, too much drama ;)\n\nA lot has changed over the past ~5 years, and the machine learning field really raised the bar on standards imho.\n\nPapers are no longer behind a paywall? And there's code to go with it and results can be reproduced? And open datasets to benchmark against? Ya kidding me? 10 years ago if you took latest state of the art paper and implemented it yourself, you'd find out your performance is somehow worse. That maybe some magic values were not mentioned. Or they hand-picked test sequences. Etc.\n\nPeople worship Google or Stanford? Few years back, the fashion was about publishing in Nature and Science and chasing impact factors. Either way, exceptional work gets recognized, that's the best you can do anyway. Get published on merit.\n\nSo, worried about publishing by all means, marginally pushing the envelope on state of the art and mostly just tinkering with hyper parameters until you get the result you wanted? That's

<details> <summary>Some of the comments from `r/machinelearning` subreddit are:</summary>

    ['Awesome visualisation',
    'Similar to a stack or connected neurons.',
    'Will this Turing pass the Turing Test?']
</details>

💽❓ Data Question:

3. After having a chance to review a few samples of 5 comments from the subreddit, what can you say about the data? 

HINT: Think about the "cleanliness" of the data, the content of the data, think about what you're trying to do - how does this data line up with your goal?

In [107]:
# Comments like the first one carry little value, and the second is mostly rambling. The third one might have value.
# The data itself is raw and will need some filtering.
# Depending on what we are looking for, we will have to do more due diligence.
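
One possible first pass at that filtering is sketched below. It is an assumption about what "clean" means here, not a prescribed step: `clean_comments` is a hypothetical helper, the `min_words` threshold is arbitrary, and the tombstone strings are the bodies Reddit shows for deleted or removed comments.

```python
import re
from typing import Iterable, List

# Bodies Reddit shows for deleted or moderator-removed comments.
TOMBSTONES = {"[deleted]", "[removed]"}


def clean_comments(comments: Iterable[str], min_words: int = 3) -> List[str]:
    """Collapse whitespace, then drop tombstones and very short comments."""
    cleaned = []
    for body in comments:
        # Replace newlines and repeated spaces with a single space.
        text = re.sub(r"\s+", " ", body).strip()
        if text in TOMBSTONES:
            continue
        if len(text.split()) < min_words:
            continue
        cleaned.append(text)
    return cleaned


sample = ["[deleted]", "100%", "Calm down,\n\nyoung one, too much drama ;)"]
kept = clean_comments(sample)
print(kept)  # ['Calm down, young one, too much drama ;)']
```

Short comments are not always noise for sentiment analysis, so the word threshold is worth revisiting once the goal is fixed.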

#### 5. Extract Top Level Comment from Subreddit `TSLA`.

Write your code to extract top level comments from the top 10 topics of a time period, e.g., year, from subreddit `TSLA` and store them in a list `top_comments_tsla`.  

In [19]:
%%time
from praw.models import MoreComments

# A list to store the comments for the top posts
top_comments_tsla = []

# Iterate through the top 10 posts
for submission in reddit.subreddit('TSLA').top(limit=10):
    # Iterate through comments for each post
    for top_level_comment in submission.comments:
        # Ignore if what is returned is an instance of MoreComments
        if isinstance(top_level_comment, MoreComments):
            continue
        # Append the comment body to list of top comments
        top_comments_tsla.append(top_level_comment.body)

CPU times: user 183 ms, sys: 16.7 ms, total: 200 ms
Wall time: 3.29 s


In [20]:
len(top_comments_tsla) # Expected: 174 for r/TSLA

170

In [22]:
[random.choice(top_comments_tsla) for i in range(3)]

['**Fell of the chair.. Awesomely put together**',
 '[deleted]',
 'I saw a Tesla go from $173 to $1500. Elon musk is a beast. He’s always fighting off the short sellers and still on top. It’s like all the Rocky movies doesn’t matter how many times you hit him he just keeps coming back until you’re too weak to fight back and then Bam a legend is made. Don’t take my word for it do you’re own  research. If I had a nickel for every time I heard someone say I wish I would’ve bought it when it was low....']

<details>
<summary>Some of the comments from `r/TSLA` subreddit:</summary>

    ['I bought puts',
    '100%',
    'Yes. And I’m bag holding 1200 calls for Friday and am close to throwing myself out the window']
</details>

💽❓ Data Question:

4. Now that you've had a chance to review another subreddit's comments, do you see any differences in the kinds of comments either subreddit has - and how might this relate to bias?

In [None]:
# The comments look more vociferous.
# Obviously, the comments are from people interested in Tesla stock, and their opinions are very biased.
# Comments on machine learning may well be just as biased, but maybe we didn't see any in our sample.

### Task III: Sentiment Analysis

Let us analyze the sentiment of comments scraped from `r/TSLA` using a pre-trained HuggingFace model to make the inference. Take a [Quick tour](https://huggingface.co/docs/transformers/quicktour). 

#### 1. Import `pipeline`

In [23]:
from transformers import pipeline

#### 2. Create a Pipeline to Perform Task "sentiment-analysis"

In [24]:
sentiment_model = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")


#### 3. Get one comment from list `top_comments_tsla` from Task II - 5.

In [25]:
comment = random.choice(top_comments_tsla)

In [26]:
comment

'ho lee fuk \n\nyou got anymore insider information? 👀👀'

The example comment is: `'Bury Burry!!!!!'`. Print out what you get. For reproducibility, use the same comment in the next step; consider setting a seed.
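
Setting a seed, as suggested above, could look like the sketch below. The list is a small stand-in for `top_comments_tsla`, and the seed value 42 is arbitrary. A dedicated `random.Random` instance is used so that other code calling the global `random` functions does not disturb the sequence.

```python
import random

# Stand-in for top_comments_tsla; the real list comes from Task II - 5.
comments = ["I bought puts", "100%", "Bury Burry!!!!!"]

rng = random.Random(42)  # dedicated generator with a fixed seed
first_pick = rng.choice(comments)

rng = random.Random(42)  # re-seeding restarts the same sequence
second_pick = rng.choice(comments)

print(first_pick == second_pick)  # True
```

The pick is only reproducible if the list itself is the same, and the comments returned by Reddit can change between runs.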

#### 4. Make Inference!

In [28]:
sentiment = sentiment_model(comment)
print(type(sentiment))

<class 'list'>


What is the type of the output `sentiment`?

```
It is a list (one dict per input comment)
```

In [29]:
print(f'The comment: {comment}')
print(f'Predicted Label is {sentiment[0]["label"]} and the score is {sentiment[0]["score"]:.3f}')

The comment: ho lee fuk 

you got anymore insider information? 👀👀
Predicted Label is NEG and the score is 0.951


For the example comment, the output is:

    The comment: Bury Burry!!!!!
    Predicted Label is NEGATIVE and the score is 0.989

🖥️❓ Model Question:

1. What does the score represent?

The score is the probability the model assigns to the predicted label, so it measures how confident the model is in that label rather than how strong the sentiment is. Here the score is 0.951, so the model is quite confident that the comment is negative.
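
To make the idea of a score concrete, the sketch below turns a set of raw model outputs (logits) into probabilities with a softmax and reports the largest one, which is how a single-label classifier's score is typically derived. The logits are made up, and the label names other than `NEG` are assumed rather than taken from the output above.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEG", "NEU", "POS"]  # label names assumed for illustration
logits = [3.1, 0.2, -1.4]       # made-up raw model outputs

probs = softmax(logits)
best = max(range(len(labels)), key=probs.__getitem__)
prediction = {"label": labels[best], "score": probs[best]}
print(prediction)
```

Because the probabilities sum to 1, a high score for one label necessarily means low scores for the others.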

### Task IV: Put All Together

Let's put all the pieces together and create a simple script that

- gets the subreddit
- gets comments from the top posts of the given subreddit
- runs sentiment analysis

#### Complete the Script

Once you complete the code, running the following block writes the code into a new Python script and saves it as `top_tlsa_comment_sentiment.py` in the same directory as the notebook.

In [51]:
%%writefile top_tlsa_comment_sentiment.py

import random

from typing import Dict, List

from praw import Reddit
from praw.models.reddit.subreddit import Subreddit
from praw.models import MoreComments

from transformers import pipeline
from secrets_reddit import reddit_secrets as rs


def get_subreddit(display_name:str) -> Subreddit:
    """Get subreddit object from display name

    Args:
        display_name (str): Name of the subreddit without the "r/" prefix, e.g. "TSLA"

    Returns:
        Subreddit: PRAW subreddit object obtained through a read-only Reddit instance
    """
    # Credentials come from the reddit_secrets dict in secrets_reddit.py
    reddit = Reddit(
        client_id=rs["REDDIT_API_CLIENT_ID"],
        client_secret=rs["REDDIT_API_CLIENT_SECRET"],
        user_agent=rs["REDDIT_API_USER_AGENT"],
        )
    
    subreddit = reddit.subreddit(display_name)
    return subreddit

def get_comments(subreddit:Subreddit, limit:int=3) -> List[str]:
    """ Get comments from subreddit

    Args:
        subreddit (Subreddit): Subreddit to read from
        limit (int, optional): Number of top posts to scan. Defaults to 3.

    Returns:
        List[str]: Bodies of the top-level comments on those posts
    """
    top_comments = []
    for submission in subreddit.top(limit=limit):
        for top_level_comment in submission.comments:
            if isinstance(top_level_comment, MoreComments):
                continue
            top_comments.append(top_level_comment.body)
    return top_comments

def run_sentiment_analysis(comment:str) -> Dict:
    """Run sentiment analysis on comment using default distilbert model
    
    Args:
        comment (str): Text of a single comment
        
    Returns:
        Dict: Prediction with keys "label" and "score"
    """
    # No model name is passed, so the pipeline falls back to its default checkpoint
    sentiment_model = pipeline("sentiment-analysis")
    sentiment = sentiment_model(comment)
    return sentiment[0]


if __name__ == '__main__':
    subreddit = get_subreddit("TSLA")
    comments = get_comments(subreddit)
    comment = random.choice(comments)
    sentiment = run_sentiment_analysis(comment)
    
    print(f'The comment: {comment}')
    print(f'Predicted Label is {sentiment["label"]} and the score is {sentiment["score"]:.3f}')

Overwriting top_tlsa_comment_sentiment.py


Run the following block to see the output.

In [52]:
!python top_tlsa_comment_sentiment.py

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The comment: FACTS SIR!!!😒😌😔😪
Predicted Label is POSITIVE and the score is 0.996


<details><summary> Expected output:</summary>

    No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
    The comment: When is DOGE flying
    Predicted Label is POSITIVE and the score is 0.689
</details>
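
The script scores one random comment. A natural next step is to score every comment and summarize the labels, which the sketch below illustrates under stated assumptions: `label_distribution` is a hypothetical helper, and `fake_classify` is a toy stand-in for a pipeline that returns dicts in the same `{"label": ..., "score": ...}` shape, so the block runs without downloading a model.

```python
from collections import Counter
from typing import Callable, Dict, List

Prediction = Dict[str, object]


def label_distribution(
    comments: List[str],
    classify: Callable[[List[str]], List[Prediction]],
) -> Dict[str, float]:
    """Return the fraction of comments assigned to each label."""
    if not comments:
        return {}
    predictions = classify(comments)
    counts = Counter(str(p["label"]) for p in predictions)
    return {label: n / len(predictions) for label, n in counts.items()}


def fake_classify(comments: List[str]) -> List[Prediction]:
    """Toy stand-in for a pipeline: 'good' anywhere means POSITIVE."""
    return [
        {"label": "POSITIVE" if "good" in c.lower() else "NEGATIVE", "score": 1.0}
        for c in comments
    ]


distribution = label_distribution(["good call", "bad day", "Good news"], fake_classify)
print(distribution)
```

A real pipeline called on a list of strings returns one dict per string, so `label_distribution(comments, sentiment_model)` should work the same way. Creating the pipeline once and reusing it also avoids reloading the model for every comment.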

In [57]:
import datetime as dt

import requests

subreddit = 'machinelearning'

# Count the submissions created after 2022-10-07 21:00 (local time)
after = int(dt.datetime(2022,10,7,21,0).timestamp())
r = requests.get(f"https://api.pushshift.io/reddit/search/submission/?subreddit={subreddit}&metadata=true&size=0&after={after}")

# Average the total over the 31-day window
print(r.json()['metadata']['total_results']/31)

57.806451612903224


💽❓ Data Question:

5. Is the subreddit active? About how many posts or threads per day? How could you find this information?

There does not appear to be a simple way to get this from PRAW. The Pushshift query above was the simplest approach I found by searching, though there may be better ones. It reports about 58 posts per day on average over the last 31 days for `machinelearning`, so the subreddit is active, though not extremely so.
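
The same average can be computed from creation timestamps, however they are obtained. The sketch below is a hedged illustration: `posts_per_day` is a hypothetical helper, the timestamps are made up, and the 31-day window simply mirrors the figure used above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def posts_per_day(created_utc: Iterable[float], now: datetime, window_days: int = 31) -> float:
    """Average number of posts per day within the last `window_days` days."""
    end = now.timestamp()
    start = (now - timedelta(days=window_days)).timestamp()
    recent = sum(1 for ts in created_utc if start <= ts <= end)
    return recent / window_days


now = datetime(2022, 11, 7, 21, 0, tzinfo=timezone.utc)
# Made-up data: one post every 12 hours within the window, plus one older post.
timestamps = [(now - timedelta(hours=12 * i)).timestamp() for i in range(62)]
timestamps.append((now - timedelta(days=40)).timestamp())

rate = posts_per_day(timestamps, now)
print(rate)  # 2.0
```

With PRAW, the `created_utc` values of `s_ml.new(limit=None)` could feed this function, with the caveat that Reddit listings return only about the most recent 1000 items, so a busy subreddit may not cover the whole window.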

In [47]:
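from collections import Counter

# Collect the author of each of the top posts (up to 1000)
# post.author is None when the account has been deleted
authors = [post.author for post in s_ml.top(limit=1000)]

# Count posts per author, most frequent first
counts = Counter(authors)
counts.most_common()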

[(Redditor(name='Illustrious_Row_9971'), 49),
 (None, 34),
 (Redditor(name='hardmaru'), 22),
 (Redditor(name='pinter69'), 15),
 (Redditor(name='cloud_weather'), 12),
 (Redditor(name='programmerChilli'), 11),
 (Redditor(name='sensetime'), 9),
 (Redditor(name='SpatialComputing'), 8),
 (Redditor(name='TheInsaneApp'), 7),
 (Redditor(name='wei_jok'), 7),
 (Redditor(name='milaworld'), 7),
 (Redditor(name='Yuqing7'), 7),
 (Redditor(name='downtownslim'), 6),
 (Redditor(name='yusuf-bengio'), 5),
 (Redditor(name='dojoteef'), 5),
 (Redditor(name='vijish_madhavan'), 4),
 (Redditor(name='_sshin_'), 4),
 (Redditor(name='davidbun'), 4),
 (Redditor(name='othotr'), 4),
 (Redditor(name='olaf_nij'), 4),
 (Redditor(name='DeepEven'), 4),
 (Redditor(name='Wiskkey'), 4),
 (Redditor(name='SkiddyX'), 4),
 (Redditor(name='RudyWurlitzer'), 4),
 (Redditor(name='finallyifoundvalidUN'), 4),
 (Redditor(name='MassivePellfish'), 4),
 (Redditor(name='RichardRNN'), 3),
 (Redditor(name='pathak22'), 3),
 (Redditor(name='h

💽❓ Data Question:

6. Does there seem to be a large distribution of posters or a smaller concentration of posters who are very active? What kind of impact might this have on the data?



Among the top 1000 posts, authorship is spread across many posters, but a small group accounts for a disproportionate share: the most frequent poster has 49 posts, and only five named accounts have more than 10. The data will therefore lean toward the opinions and interests of those few authors.
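
One way to put a number on that concentration is the share of posts written by the `k` most frequent authors. The sketch below is illustrative: `top_k_share` is a hypothetical helper, the author list is made up, and `None` entries (deleted accounts) are excluded because they do not represent a single poster.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def top_k_share(authors: Iterable[Optional[Hashable]], k: int = 5) -> float:
    """Fraction of attributed posts written by the k most frequent authors."""
    counts = Counter(a for a in authors if a is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Made-up example: 10 attributed posts and one from a deleted account.
authors_demo = ["a"] * 5 + ["b"] * 3 + ["c", "d", None]
share = top_k_share(authors_demo, k=2)
print(share)  # 0.8
```

A value close to `k` divided by the number of distinct authors suggests an even spread, while a value near 1 means a few accounts dominate.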