<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png" 
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Reddit Data using Reddit API</h1>

In this live coding session, we use the Python Reddit API Wrapper (`PRAW`) to retrieve data from subreddits on [Reddit](https://www.reddit.com), and perform sentiment analysis using [`pipelines`](https://huggingface.co/docs/transformers/main_classes/pipelines) from [HuggingFace ( 🤗 the GitHub of Machine Learning )](https://techcrunch.com/2022/05/09/hugging-face-reaches-2-billion-valuation-to-build-the-github-of-machine-learning/), powered by the [transformer](https://arxiv.org/pdf/1706.03762.pdf) architecture.

## Objectives

At the end of the session, you will 

- know how to work with APIs
- feel more comfortable navigating through documentation, even inspecting the source code
- understand what a `pipeline` object is in HuggingFace
- perform sentiment analysis using `pipeline`
- run a Python script from the command line and get the results

## How to Submit

- At the end of each task, commit* your work to the repository you created before the assignment
- After completing all the tasks, push the notebook, including all code blocks and output cells, to that repository
- Submit the link to the notebook in Canvas

\***NEVER** commit a notebook displaying errors unless instructed otherwise. However, commit often; recall git ABC = **A**lways **B**e **C**ommitting.

## Tasks

### Task I: Instantiate a Reddit API Object

The first task is to instantiate a Reddit API object using [PRAW](https://praw.readthedocs.io/en/stable/), through which you will retrieve data. PRAW is a wrapper for the [Reddit API](https://www.reddit.com/dev/api) that makes interacting with it easier than issuing raw HTTP calls yourself, unless you are already an expert with [`requests`](https://docs.python-requests.org/en/latest/).

#### 1. Install packages

Please ensure you've run all the cells in `imports.ipynb`, located [here](https://github.com/FourthBrain/MLE-8/blob/main/assignments/week-3-analyze-sentiment-subreddit/imports.ipynb), so that you have all the packages required for today's assignment.

#### 2. Create a new app on Reddit

Create a new app on Reddit and save the secret tokens; refer to [this Medium post](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c) for more details.

- Create a Reddit account if you don't have one, and log into your account.
- To access the API, we need to create an app. The website has changed slightly since the post was written: navigate to `preferences` > `apps`, or click [this link](https://www.reddit.com/prefs/apps) and scroll all the way down.
- Click to create a new app, fill in the **name**, choose `script`, and fill in the **description** and **redirect uri**. The redirect URI is where the user is sent after they've granted OAuth access to your application (more info [here](https://github.com/reddit-archive/reddit/wiki/OAuth2)). For our purposes, you can enter any URL, e.g., www.google.com, as shown below.


    <img src="https://miro.medium.com/max/700/1*lRBvxpIe8J2nZYJ6ucMgHA.png" width="500"/>
- Jot down `client_id` (upper left corner) and `client_secret`.

    NOTE: `CLIENT_ID` refers to "personal use script" and `CLIENT_SECRET` to "secret".
    
    <div>
    <img src="https://miro.medium.com/max/700/1*7cGAKth1PMrEf2sHcQWPoA.png" width="300"/>
    </div>

- Create `secrets_reddit.py` in the same directory as this notebook and fill in the `client_id` and `client_secret` obtained in the last step. We will import these constants in the next step.
    ```
    REDDIT_API_CLIENT_ID = "client_id"
    REDDIT_API_CLIENT_SECRET = "secret_id"
    REDDIT_API_USER_AGENT = "any string except bot; ex. My User Agent"
    ```
- Add `secrets_reddit.py` to your `.gitignore` file if not already done. NEVER push credentials to a repo, private or public. 
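
Before moving on, it can help to confirm that the template values were actually replaced. The sketch below is one possible sanity check; `check_credentials` and `PLACEHOLDERS` are hypothetical names introduced here for illustration and are not part of PRAW.

```python
# Hypothetical helper (not part of PRAW): sanity-check the three constants
# from secrets_reddit.py before they are passed to praw.Reddit.

# Template values copied from the example file above, plus the empty string.
PLACEHOLDERS = {"", "client_id", "secret_id"}


def check_credentials(client_id: str, client_secret: str, user_agent: str) -> None:
    """Raise ValueError if any value is empty or still a template placeholder."""
    values = {
        "REDDIT_API_CLIENT_ID": client_id,
        "REDDIT_API_CLIENT_SECRET": client_secret,
        "REDDIT_API_USER_AGENT": user_agent,
    }
    for name, value in values.items():
        if value.strip() in PLACEHOLDERS:
            raise ValueError(f"{name} is empty or still a placeholder")


# Typical use, assuming secrets_reddit.py exists next to the notebook:
#   import secrets_reddit as sr
#   check_credentials(sr.REDDIT_API_CLIENT_ID,
#                     sr.REDDIT_API_CLIENT_SECRET,
#                     sr.REDDIT_API_USER_AGENT)
```

The check only catches obvious mistakes; it cannot tell whether Reddit will accept the credentials.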

#### 3. Instantiate a `Reddit` object

Now you are ready to create a read-only `Reddit` instance. Refer to [documentation](https://praw.readthedocs.io/en/stable/code_overview/reddit_instance.html) when necessary.

In [9]:



import praw

import secrets_reddit as sr

# Create a read-only Reddit object, which lets us interact with the Reddit API.
# Credentials come from secrets_reddit.py; never hard-code them in the notebook.

reddit = praw.Reddit(
    client_id=sr.REDDIT_API_CLIENT_ID,
    client_secret=sr.REDDIT_API_CLIENT_SECRET,
    user_agent=sr.REDDIT_API_USER_AGENT,
)

In [10]:
print(reddit) 

<praw.reddit.Reddit object at 0x7f8f6f940e80>


<details>
<summary>Expected output:</summary>   

```<praw.reddit.Reddit object at 0x10f8a0ac0>```
</details>

#### 4. Instantiate a `subreddit` object

Lastly, create a `subreddit` object for your favorite subreddit and inspect the object. The expected outputs you will see are from `r/machinelearning` unless otherwise specified.

In [28]:

# See the "Obtain a Subreddit" section of the PRAW quick start:
# https://praw.readthedocs.io/en/stable/getting_started/quick_start.html

# Create a subreddit object for r/machinelearning
sub_red = reddit.subreddit('machinelearning')

    


What is the display name of the subreddit?

In [29]:


print(sub_red.display_name)

# Output: machinelearning



machinelearning


<details>
<summary>Expected output:</summary>   

    machinelearning
</details>

How about its title, is it different from the display name?

In [30]:

print(sub_red.title)

# Output: Machine Learning


Machine Learning


<details>
<summary>Expected output:</summary>   

    Machine Learning
</details>

Print out the description of the subreddit:

In [31]:

print(sub_red.description[:400])

# Output: prints the first 400 characters

**[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
--------
+[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
--------
+[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
--------
+[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q


<details>
<summary>Expected output:</summary>

    **[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
    --------
    +[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
    --------
    +[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
    --------
    +[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
    --------
    +[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict
</details>

### Task II: Parse comments

#### 1. Top Posts of All Time

Find the titles of the top 10 posts of **all time** from your favorite subreddit. Refer to the [Obtain Submission Instances from a Subreddit Section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) if necessary. Verify that the titles match what you see on Reddit.

In [77]:
# Try running this line. What do you see? Press q once you are done.
?sub_red.top 

In [39]:
# top ten posts of all time 

for submission in sub_red.top(time_filter ='all', limit = 10):
    print(submission.title)

# Output: 10 submissions

[Project] From books to presentations in 10s with AR + ML
[D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
[R] First Order Motion Model applied to animate paintings
[N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
[D] This AI reveals how much time politicians stare at their phone at work
[D] Types of Machine Learning Papers
[D] The machine learning community has a toxicity problem
I made a robot that punishes me if it detects that if I am procrastinating on my assignments [P]
[Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
[P] Using oil portraits and First Order Model to bring the paintings back to life


In [27]:


# `sub_red` is the Subreddit instance created in Task I
for submission in sub_red.hot(limit=10):
    print(submission.comments)
    print(submission.title)
    # Output: the submission's title
    print(submission.score)
    # Output: the submission's score
    print(submission.id)
    # Output: the submission's ID
    print(submission.url)
    # Output: the URL the submission points to or the submission's URL if it's a self post

<praw.models.comment_forest.CommentForest object at 0x7f8f6fe3f640>
[D] Simple Questions Thread
4
wzxike
https://www.reddit.com/r/MachineLearning/comments/wzxike/d_simple_questions_thread/
<praw.models.comment_forest.CommentForest object at 0x7f8f701a0310>
[D] Machine Learning - WAYR (What Are You Reading) - Week 140
127
vg5kjd
https://www.reddit.com/r/MachineLearning/comments/vg5kjd/d_machine_learning_wayr_what_are_you_reading_week/
<praw.models.comment_forest.CommentForest object at 0x7f8f707b0040>
[D] "A majority of respondents think that the scientific value of the majority of work in NLP is dubious"
118
x0lz2f
https://www.reddit.com/r/MachineLearning/comments/x0lz2f/d_a_majority_of_respondents_think_that_the/
<praw.models.comment_forest.CommentForest object at 0x7f8f707b0460>
[D] Easily Run Stable Diffusion Image to Image mode
73
x0jo8h
https://www.reddit.com/r/MachineLearning/comments/x0jo8h/d_easily_run_stable_diffusion_image_to_image_mode/
<praw.models.comment_forest.CommentFor

<details> <summary>Expected output:</summary>

    [Project] From books to presentations in 10s with AR + ML
    [D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
    [R] First Order Motion Model applied to animate paintings
    [N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
    [D] This AI reveals how much time politicians stare at their phone at work
    [D] Types of Machine Learning Papers
    [D] The machine learning community has a toxicity problem
    [Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
    [P] Using oil portraits and First Order Model to bring the paintings back to life
    [D] Convolution Neural Network Visualization - Made with Unity 3D and lots of Code / source - stefsietz (IG)    
</details>

#### 2. Top 10 Posts of This Week

What are the titles of the top 10 posts of **this week** from your favorite subreddit?

In [40]:
# YOUR CODE HERE

# top ten posts of THIS WEEK 

for submission in sub_red.top(time_filter ='week', limit = 10):
    print(submission.title)

# Output: 10 submissions


[P] Run Stable Diffusion locally with a web UI + artist workflow video
[D] StableDiffusion v1.4 is entirely public. What do you think about Stability.ai ?
[P] Einstein Instant NeRF
[D] How to Run Stable Diffusion (Locally and in Colab)
[D][N]"Mudge learned that Twitter had never acquired proper legal rights to training material used to build Twitter's key Machine Learning models. The Machine Learning models at issue were some of the core models running the company's most basic products, like which Tweets to show each user."
[D] ML for Good
[D] What are some dead ideas in machine learning or machine learning textbooks?
[P] Run stable diffusion in google colab including image2image and inpainting
[D] A thought I had on Yann LeCun's recent paper "A Path Towards Autonomous Machine Intelligence"
[D] "A majority of respondents think that the scientific value of the majority of work in NLP is dubious"


<details><summary>Expected output:</summary>

    [N] Ian Goodfellow, Apple’s director of machine learning, is leaving the company due to its return to work policy. In a note to staff, he said “I believe strongly that more flexibility would have been the best policy for my team.” He was likely the company’s most cited ML expert.
    [R][P] Thin-Plate Spline Motion Model for Image Animation + Gradio Web Demo
    [P] I’ve been trying to understand the limits of some of the available machine learning models out there. Built an app that lets you try a mix of CLIP from Open AI + Apple’s version of MobileNet, and more directly on your phone's camera roll.
    [R] Meta is releasing a 175B parameter language model
    [N] Hugging Face raised $100M at $2B to double down on community, open-source & ethics
    [P] T-SNE to view and order your Spotify tracks
    [D] : HELP Finding a Book - A book written for Google Engineers about foundational Math to support ML
    [R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet
    [D] Do you use NLTK or Spacy for text preprocessing?
    [D] Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
</details>

💽❓ Data Question:

Check out what other attributes the `praw.models.Submission` class has in the [docs](https://praw.readthedocs.io/en/stable/code_overview/models/submission.html). 

1. After having a chance to look through the docs, is there any other information that you might want to extract? How might this additional data help you?

Yes. The score, ID, and URL are some of the other fields that can be extracted, and there are many more. These fields describe the post itself and can be used for analysis, for example to weight or filter posts. The full list of attributes is in the docs: https://praw.readthedocs.io/en/stable/code_overview/models/submission.html



Write a sample piece of code below extracting three additional pieces of information from the submission below.

In [42]:
# YOUR CODE HERE

for submission in sub_red.top(time_filter ='week', limit = 10):
    print(submission.score)
    # Output: the submission's score
    print(submission.id)
    # Output: the submission's ID
    print(submission.url)
    # Output: the URL the submission points to or the submission's URL if it's a self post

1195
wz68mz
https://v.redd.it/djdpfsmy2ak91
416
wv50uh
https://www.reddit.com/r/MachineLearning/comments/wv50uh/d_stablediffusion_v14_is_entirely_public_what_do/
336
wzmokl
https://v.redd.it/jacxo1lgvdk91
263
wvr23n
https://www.reddit.com/r/MachineLearning/comments/wvr23n/d_how_to_run_stable_diffusion_locally_and_in_colab/
230
wx7rw5
https://www.reddit.com/r/MachineLearning/comments/wx7rw5/dnmudge_learned_that_twitter_had_never_acquired/
220
wwdqp8
https://www.reddit.com/r/MachineLearning/comments/wwdqp8/d_ml_for_good/
214
x05d1e
https://www.reddit.com/r/MachineLearning/comments/x05d1e/d_what_are_some_dead_ideas_in_machine_learning_or/
124
wxogf0
https://www.reddit.com/r/MachineLearning/comments/wxogf0/p_run_stable_diffusion_in_google_colab_including/
119
x0lz2f
https://www.reddit.com/r/MachineLearning/comments/x0lz2f/d_a_majority_of_respondents_think_that_the/
114
wyqyu8
https://www.reddit.com/r/MachineLearning/comments/wyqyu8/d_a_thought_i_had_on_yann_lecuns_recent_paper_a/
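
Printing fields one per line makes the output hard to read back. As a rough sketch, the same attributes could be gathered into one dictionary per submission. `submission_to_record` and `FIELDS` are hypothetical names; the attributes `num_comments`, `upvote_ratio`, and `created_utc` are listed in the PRAW `Submission` docs linked above, and the `SimpleNamespace` object is only a stand-in so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Attributes to copy from each submission (all documented for praw.models.Submission).
FIELDS = ("title", "score", "id", "url", "num_comments", "upvote_ratio", "created_utc")


def submission_to_record(submission: Any) -> Dict[str, Any]:
    """Copy selected attributes into a plain dict; missing ones become None."""
    return {field: getattr(submission, field, None) for field in FIELDS}


# Stand-in object with made-up values, used instead of a live PRAW submission.
fake_submission = SimpleNamespace(
    title="[D] Example post",
    score=42,
    id="abc123",
    url="https://example.com",
    num_comments=7,
    upvote_ratio=0.97,
    created_utc=1661990400.0,
)

record = submission_to_record(fake_submission)
print(record["title"], record["score"], record["num_comments"])
```

With a live connection, `[submission_to_record(s) for s in sub_red.top(time_filter="week", limit=10)]` would give a list of records that is easy to load into a table.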


💽❓ Data Question:

2. Is there any information available that might be a concern when it comes to Ethical Data?

It seems most of this information does not compromise a person's personal information. However, there are other things to consider such as over_18 and nsfw, which essentially tag content as mature or offensive. 

#### 3. Comment Code

Add comments to the code block below to describe what each line of the code does (Refer to [Obtain Comment Instances Section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) when necessary). The code is adapted from [this tutorial](https://praw.readthedocs.io/en/stable/tutorials/comments.html)

The purpose is to
1. understand what the code is doing
2. get into the habit of commenting your code whenever it is not self-explanatory (others will thank you, and YOU will thank you later 😊)

In [44]:
%%time
from praw.models import MoreComments

# Collect the body text of every top-level comment from the top 10 posts

# Start with an empty list to which comment bodies will be appended
top_comments = []

# Loop through the top 10 posts (time_filter defaults to "all")
for submission in sub_red.top(limit=10):
    # A submission's comment forest may contain MoreComments objects, which
    # represent the "load more comments" and "continue this thread" links
    for top_level_comment in submission.comments:
        # Skip those placeholders; they have no body text
        if isinstance(top_level_comment, MoreComments):
            continue
        # Append the body of the top-level comment to the list
        top_comments.append(top_level_comment.body)

CPU times: user 470 ms, sys: 45.5 ms, total: 515 ms
Wall time: 1min 2s
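
The PRAW comment tutorial linked above also describes `replace_more`, which removes or expands `MoreComments` placeholders in place. A minimal sketch of that alternative follows; `top_level_bodies` is a hypothetical helper name, and `limit=0` is chosen here so that placeholders are dropped rather than fetched.

```python
from typing import Any, List


def top_level_bodies(submission: Any) -> List[str]:
    """Return the body of every top-level comment of one submission."""
    # With limit=0, replace_more removes every MoreComments placeholder
    # from the comment forest without fetching the hidden comments.
    submission.comments.replace_more(limit=0)
    # Iterating the forest directly yields only top-level comments.
    return [comment.body for comment in submission.comments]


# Typical use, assuming `sub_red` from Task I:
#   top_comments = []
#   for submission in sub_red.top(limit=10):
#       top_comments.extend(top_level_bodies(submission))
```

The result should match the `isinstance` check in the cell above; the difference is mainly stylistic.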


#### 4. Inspect Comments

How many comments did you extract from the last step? Examine a few comments. 

In [45]:
# YOUR CODE HERE (the answer may vary; e.g., 693 for r/machinelearning)

print(f"Number of top comments: {len(top_comments)}")

Number of top comments: 741


In [47]:
import random

[random.choice(top_comments) for i in range(5)]

['Every data scientist today is truly standing on the shoulders of giants.',
 "What are the edges cases when this doesn't work? Does this require certain lighting conditions etc? How does it know to extract both people from the image?",
 'Is there a hacky way I could modify this to work with streaming audio from a place like spotify, or would it require a pretty big overhaul of the code? Does anyone know?',
 "There's no way that took 10s to develop, install, try and record an 57 sec video of. I mean, yeah, technology and stuff, but not in 10s. Sorry.",
 "Does that mean we're supposed to find and share the link? Come on, op...."]

<details> <summary>Some of the comments from `r/machinelearning` subreddit are:</summary>

    ['Awesome visualisation',
    'Similar to a stack or connected neurons.',
    'Will this Turing pass the Turing Test?']
</details>

💽❓ Data Question:

3. After having a chance to review a few samples of 5 comments from the subreddit, what can you say about the data? 

The comments vary in length and may contain misspellings, irregular punctuation, etc. The data is not "clean".

HINT: Think about the "cleanliness" of the data, the content of the data, think about what you're trying to do - how does this data line up with your goal?
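
As one illustration of what "cleaning" could mean here, the sketch below collapses whitespace and drops placeholder bodies such as `[removed]`. The function name `clean_comments`, the `min_chars` threshold, and the decision to drop very short comments are all assumptions made for this example, not requirements of the assignment.

```python
import re
from typing import Iterable, List

# Bodies Reddit shows in place of removed or deleted comments.
PLACEHOLDER_BODIES = {"[removed]", "[deleted]"}


def clean_comments(comments: Iterable[str], min_chars: int = 3) -> List[str]:
    """Normalize whitespace and drop placeholder or very short comments."""
    cleaned = []
    for body in comments:
        # Collapse newlines, tabs, and repeated spaces into single spaces.
        text = re.sub(r"\s+", " ", body).strip()
        if text in PLACEHOLDER_BODIES or len(text) < min_chars:
            continue
        cleaned.append(text)
    return cleaned


sample = ["[removed]", "  Or you are\n\nan insider lol ", "ok"]
print(clean_comments(sample))
```

Whether short comments should be dropped depends on the goal; for sentiment analysis a terse reply can still carry a clear opinion.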

#### 5. Extract Top-Level Comments from Subreddit `TSLA`

Write your code to extract top-level comments from the top 10 topics of a time period, e.g., year, from subreddit `TSLA` and store them in a list `top_comments_tsla`.

In [53]:
# YOUR CODE HERE

top_comments_tsla = []
subreddit = reddit.subreddit('TSLA')

# Loop through the top 10 posts (time_filter defaults to "all";
# pass e.g. time_filter="year" to restrict the period)
for submission in subreddit.top(limit=10):
    # A submission's comment forest may contain MoreComments objects, which
    # represent the "load more comments" and "continue this thread" links
    for top_level_comment in submission.comments:
        # Skip those placeholders; they have no body text
        if isinstance(top_level_comment, MoreComments):
            continue
        # Append the body of the top-level comment to the list
        top_comments_tsla.append(top_level_comment.body)

In [54]:
len(top_comments_tsla) # Expected: 174 for r/TSLA; your count may vary

170

In [55]:
[random.choice(top_comments_tsla) for i in range(3)]

['Here is a very interesting theory/conspiracy https://twitter.com/robgrav3s/status/1461111661310398469?s=10',
 '[removed]',
 'Or you are actually an insider and freaked out that the post got attention lol. \n\nPm me with the next hot tip lol']

<details>
<summary>Some of the comments from `r/TSLA` subreddit:</summary>

    ['I bought puts',
    '100%',
    'Yes. And I’m bag holding 1200 calls for Friday and am close to throwing myself out the window']
</details>

💽❓ Data Question:

4. Now that you've had a chance to review another subreddit's comments, do you see any differences in the kinds of comments each subreddit has, and how might this relate to bias?

- Depending on which subreddit you are examining there will be some kind of bias. For example, if you look at the Republican and Democrat subreddits, there will be a bias towards right and left views, respectively. 

### Task III: Sentiment Analysis

Let us analyze the sentiment of comments scraped from `r/TSLA` using a pre-trained HuggingFace model to make the inference. Take a [Quick tour](https://huggingface.co/docs/transformers/quicktour). 

#### 1. Import `pipeline`

In [57]:
from transformers import pipeline

#### 2. Create a Pipeline to Perform Task "sentiment-analysis"

In [58]:
sentiment_model = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
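
The warning above suggests naming the model explicitly. A minimal sketch of doing so is shown below, using the model name and revision printed in the warning itself; `build_sentiment_pipeline` is a hypothetical wrapper, and whether pinning this short revision hash is the right choice for your setup is an assumption to verify.

```python
# Values copied from the warning message printed by pipeline("sentiment-analysis").
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def build_sentiment_pipeline():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so this cell can be defined even before transformers is loaded.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)


# Typical use (downloads the model on first call):
#   sentiment_model = build_sentiment_pipeline()
```

Pinning the model keeps results comparable if the library's default ever changes.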


#### 3. Get one comment from list `top_comments_tsla` from Task II - 5.

In [59]:
comment = random.choice(top_comments_tsla)

In [60]:
comment

'Are you legally allowed to share that information though before they announced it?'

The example comment is: `'Bury Burry!!!!!'`. Print out what you get. For reproducibility, use the same comment in the next step; consider setting a seed.
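
One way to make the random pick repeatable is to draw from a seeded generator. This is only a sketch: `pick_comment` is a hypothetical helper, the seed value 42 is arbitrary, and the sample list reuses comments shown earlier in this notebook.

```python
import random
from typing import List


def pick_comment(comments: List[str], seed: int = 42) -> str:
    """Pick one comment reproducibly using a private, seeded generator."""
    rng = random.Random(seed)  # leaves the global random state untouched
    return rng.choice(comments)


sample_comments = ["I bought puts", "100%", "Bury Burry!!!!!"]
first = pick_comment(sample_comments)
second = pick_comment(sample_comments)
print(first == second)  # the same seed and list give the same pick
```

Note that the pick is only reproducible if the list itself is the same, and the comments retrieved from Reddit can change between runs.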

#### 4. Make Inference!

In [61]:
sentiment = sentiment_model([comment])  # YOUR CODE HERE

What is the type of the output `sentiment`?

In [62]:
print(sentiment)

[{'label': 'NEGATIVE', 'score': 0.9689354300498962}]


```
YOUR ANSWER HERE

A list of dictionaries, one per input comment. Each dictionary has a `label` and a `score`.

Here: label NEGATIVE, score 0.969
```

In [63]:
print(f'The comment: {comment}')
print(f'Predicted Label is {sentiment[0]["label"]} and the score is {sentiment[0]["score"]:.3f}')

The comment: Are you legally allowed to share that information though before they announced it?
Predicted Label is NEGATIVE and the score is 0.969


For the example comment, the output is:

    The comment: Bury Burry!!!!!
    Predicted Label is NEGATIVE and the score is 0.989

🖥️❓ Model Question:

1. What does the score represent?

The score is the probability the model assigns to the predicted label, i.e., how confident the model is in that label. Here it is 96.9% confident that the sentiment is negative.
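
To make the idea of a score concrete, the sketch below shows how a classifier's raw outputs (logits) are commonly turned into probabilities with a softmax. The logit values are made up for illustration and the label order is an assumption; this is not the pipeline's internal code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]  # assumed label order
logits = [1.7, -1.7]               # made-up values for illustration
probs = softmax(logits)

# The reported label is the most probable class; the score is its probability.
best = max(range(len(probs)), key=probs.__getitem__)
print(labels[best], round(probs[best], 3))
```

A high score means the model strongly prefers one label over the other; it does not guarantee the label is correct.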

### Task IV: Put All Together

Let's put all the pieces together and create a simple script that will

- get the subreddit
- get comments from the top posts for given subreddit
- run sentiment analysis 

#### Complete the Script

Once you complete the code, running the following block writes the code into a new Python script and saves it as `top_tlsa_comment_sentiment.py` in the same directory as the notebook.

In [75]:
%%writefile top_tlsa_comment_sentiment.py

import secrets_reddit as sr
import random

from typing import Dict, List

from praw import Reddit
from praw.models.reddit.subreddit import Subreddit
from praw.models import MoreComments

from transformers import pipeline


def get_subreddit(display_name:str) -> Subreddit:
    """Get subreddit object from display name

    Args:
        display_name (str): Name of the subreddit, e.g. 'TSLA'

    Returns:
        Subreddit: PRAW subreddit object for that name
    """
    # Load credentials from secrets_reddit.py instead of hard-coding them
    reddit = Reddit(
        client_id=sr.REDDIT_API_CLIENT_ID,
        client_secret=sr.REDDIT_API_CLIENT_SECRET,
        user_agent=sr.REDDIT_API_USER_AGENT,
        )
    
    subreddit = reddit.subreddit(display_name)
    return subreddit

def get_comments(subreddit:Subreddit, limit:int=3) -> List[str]:
    """ Get top-level comments from the top posts of a subreddit

    Args:
        subreddit (Subreddit): Subreddit to read from
        limit (int, optional): Number of top posts to scan. Defaults to 3.

    Returns:
        List[str]: List of comment bodies
    """
    top_comments = []
    for submission in subreddit.top(limit=limit):
        for top_level_comment in submission.comments:
            if isinstance(top_level_comment, MoreComments):
                continue
            top_comments.append(top_level_comment.body)
    return top_comments

def run_sentiment_analysis(comment:str) -> Dict:
    """Run sentiment analysis on comment using default distilbert model
    
    Args:
        comment (str): Text of a single comment
        
    Returns:
        Dict: Prediction with keys 'label' and 'score'
    """
    sentiment_model = pipeline("sentiment-analysis")
    sentiment = sentiment_model(comment)
    return sentiment[0]


if __name__ == '__main__':
    subreddit = get_subreddit('TSLA')  # YOUR CODE HERE
    comments = get_comments(subreddit)
    comment = random.choice(comments)
    sentiment = run_sentiment_analysis(comment)
    
    print(f'The comment: {comment}')
    print(f'Predicted Label is {sentiment["label"]} and the score is {sentiment["score"]:.3f}')

Overwriting top_tlsa_comment_sentiment.py


Run the following block to see the output.

In [76]:
!python top_tlsa_comment_sentiment.py

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The comment: So funny! Same thing can't buy anymore dips
Predicted Label is POSITIVE and the score is 0.983


<details><summary> Expected output:</summary>

    No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
    The comment: When is DOGE flying
    Predicted Label is POSITIVE and the score is 0.689
</details>
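
The script scores a single random comment. Since the pipeline returns one dictionary per input when given a list (as seen in Task III), a natural next step is to summarize many predictions at once. The sketch below works on such a list of dictionaries; `summarize_sentiments` is a hypothetical helper, and the two sample results reuse the labels and scores printed earlier in this notebook.

```python
from collections import Counter
from typing import Dict, List


def summarize_sentiments(results: List[Dict]) -> Dict:
    """Count predicted labels and average the scores over all results."""
    counts = Counter(result["label"] for result in results)
    if results:
        mean_score = sum(result["score"] for result in results) / len(results)
    else:
        mean_score = 0.0  # avoid dividing by zero on an empty list
    return {"counts": dict(counts), "mean_score": mean_score}


# Sample predictions in the same shape the pipeline returns.
sample_results = [
    {"label": "NEGATIVE", "score": 0.969},
    {"label": "POSITIVE", "score": 0.983},
]
summary = summarize_sentiments(sample_results)
print(summary["counts"], round(summary["mean_score"], 3))
```

With live data, `summarize_sentiments(sentiment_model(comments))` would give a quick overview of a whole subreddit sample, assuming every comment fits within the model's input length.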

💽❓ Data Question:

5. Is the subreddit active? About how many posts or threads per day? How could you find this information?

Yes, the subreddit is active. The code below prints the number of posts in the past day. 


In [86]:
# Count the posts submitted in the past day (a single-day snapshot)

count = 0
for submission in subreddit.top(time_filter ='day',limit=None):
    count = count + 1

print(f"The number of posts per day is {count}")

The number of posts per day is 3
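
A single day is a small sample. As a rough sketch, posts could instead be grouped by calendar day using each submission's `created_utc` timestamp (a documented `Submission` attribute). `posts_per_day` is a hypothetical helper, and the timestamps below are made-up values so the example runs offline.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable


def posts_per_day(created_utc_values: Iterable[float]) -> Dict[str, int]:
    """Count posts per UTC calendar day from Unix timestamps."""
    days = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        for ts in created_utc_values
    )
    return dict(sorted(days.items()))


# Made-up timestamps: two posts on one day, one on the next.
timestamps = [1661990400, 1661994000, 1662076800]
daily_counts = posts_per_day(timestamps)
print(daily_counts)

# Typical use, assuming `subreddit` from Task II:
#   posts_per_day(s.created_utc for s in subreddit.new(limit=100))
```

Averaging the daily counts over a few weeks would give a steadier estimate of how active the subreddit is.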


💽❓ Data Question:

6. Does there seem to be a large distribution of posters or a smaller concentration of posters who are very active? What kind of impact might this have on the data?

From the output of the code below, we can see that there is a large distribution of posters. This means that the data comes from many sources, making it more diverse.

In [106]:
# Count how many of the top 100 posts each author wrote (None = deleted account)

authors = []

for submission in subreddit.top(time_filter ='all',limit=100):
    #print(submission.author)
    
    authors.append(submission.author)


from collections import Counter

Counter(authors)


Counter({Redditor(name='TSLAinsider'): 1,
         Redditor(name='MrBills07'): 1,
         Redditor(name='bigjimz88'): 3,
         Redditor(name='G_Train24'): 1,
         Redditor(name='lxelite89'): 1,
         Redditor(name='ThenickThenick'): 1,
         Redditor(name='Kornbelly'): 2,
         Redditor(name='Kaiserschmorrn'): 2,
         Redditor(name='_feelsgoodman__'): 1,
         None: 4,
         Redditor(name='Coffeeandtrade'): 1,
         Redditor(name='jrow68'): 1,
         Redditor(name='jrventure1'): 2,
         Redditor(name='BelmontMan'): 1,
         Redditor(name='_Dannyio'): 1,
         Redditor(name='DopeJeprox007'): 1,
         Redditor(name='Newt_Gold'): 1,
         Redditor(name='unleashthepotential'): 1,
         Redditor(name='ExplanationGeneral31'): 1,
         Redditor(name='Savings-Bee-2896'): 2,
         Redditor(name='SantiagoCoffee'): 2,
         Redditor(name='droneauto'): 7,
         Redditor(name='Legal_Philosopher_26'): 4,
         Redditor(name='harlando-