<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png" 
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Reddit Data using Reddit API</h1>

In this live coding session, we use the Python Reddit API Wrapper (`PRAW`) to retrieve data from subreddits on [Reddit](https://www.reddit.com) and perform sentiment analysis using [`pipelines`](https://huggingface.co/docs/transformers/main_classes/pipelines) from [HuggingFace (🤗 the GitHub of Machine Learning)](https://techcrunch.com/2022/05/09/hugging-face-reaches-2-billion-valuation-to-build-the-github-of-machine-learning/), which are powered by the [transformer](https://arxiv.org/pdf/1706.03762.pdf) architecture.

## Objectives

At the end of the session, you will 

- know how to work with APIs
- feel more comfortable navigating through documentation, even inspecting the source code
- understand what a `pipeline` object is in HuggingFace
- perform sentiment analysis using `pipeline`
- run a Python script from the command line and get the results

## How to Submit

- At the end of each task, commit* the work to the repository you created before the assignment
- After completing all the tasks, make sure to push the notebook containing all code blocks and output cells to that repository
- Submit the link to the notebook in Canvas

\***NEVER** commit a notebook displaying errors unless instructed otherwise. However, commit often; recall git ABC = **A**lways **B**e **C**ommitting.

## Tasks

### Task I: Instantiate a Reddit API Object

The first task is to instantiate a Reddit API object using [PRAW](https://praw.readthedocs.io/en/stable/), through which you will retrieve data. PRAW is a wrapper for the [Reddit API](https://www.reddit.com/dev/api) that makes interacting with it easier, unless you are already an expert with [`requests`](https://docs.python-requests.org/en/latest/).

#### 1. Install packages

Please ensure you've run all the cells in `imports.ipynb`, located [here](https://github.com/FourthBrain/MLE-8/blob/main/assignments/week-3-analyze-sentiment-subreddit/imports.ipynb), to make sure you have all the required packages for today's assignment.

####  2. Create a new app on Reddit 

Create a new app on Reddit and save the secret tokens; refer to this [Medium post](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c) for more details.

- Create a Reddit account if you don't have one, then log in to your account.
- To access the API, you need to create an app. The website has changed slightly since the post was written: navigate to `preferences` > `apps`, or open [this link](https://www.reddit.com/prefs/apps) and scroll all the way down.
- Click to create a new app, fill in the **name**, choose `script`, and fill in the **description** and **redirect uri**. The redirect URI is where the user is sent after they've granted OAuth access to your application (more info [here](https://github.com/reddit-archive/reddit/wiki/OAuth2)). For our purposes, you can enter any URL, e.g., www.google.com, as shown below.


    <img src="https://miro.medium.com/max/700/1*lRBvxpIe8J2nZYJ6ucMgHA.png" width="500"/>
- Jot down `client_id` (upper left corner) and `client_secret`.

    NOTE: CLIENT_ID refers to "personal use script" and CLIENT_SECRET to "secret".
    
    <div>
    <img src="https://miro.medium.com/max/700/1*7cGAKth1PMrEf2sHcQWPoA.png" width="300"/>
    </div>

- Create `secret_reddit.py` in the same directory as this notebook and fill in the `client_id` and `client_secret` obtained in the previous step. We will import these constants in the next step.
    ```
    REDDIT_API_CLIENT_ID = "client_id"
    REDDIT_API_CLIENT_SECRET = "client_secret"
    REDDIT_API_USER_AGENT = "any string except bot; ex. My User Agent"
    ```
- Add `secret_reddit.py` to your `.gitignore` file if you have not already done so. NEVER push credentials to a repo, private or public.
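
Before moving on, it can help to confirm that the secrets module defines every constant the next step reads. The sketch below is one way to do that. `missing_settings` is a hypothetical helper, and the `SimpleNamespace` only stands in for the real module so the example runs without credentials.

```python
from types import SimpleNamespace

# Constant names taken from the secrets file described above
REQUIRED = (
    "REDDIT_API_CLIENT_ID",
    "REDDIT_API_CLIENT_SECRET",
    "REDDIT_API_USER_AGENT",
)


def missing_settings(module, required=REQUIRED):
    """Return the names in `required` that are absent or empty on `module`."""
    return [name for name in required if not getattr(module, name, "")]


# Stand-in for the imported secrets module (hypothetical values, no real credentials)
fake_secrets = SimpleNamespace(
    REDDIT_API_CLIENT_ID="client_id",
    REDDIT_API_CLIENT_SECRET="",
)
missing = missing_settings(fake_secrets)
print(missing)
```

In the notebook, the same helper could be called with the imported secrets module in place of `fake_secrets`; an empty list means all three constants are set.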

#### 3. Instantiate a `Reddit` object

Now you are ready to create a read-only `Reddit` instance. Refer to [documentation](https://praw.readthedocs.io/en/stable/code_overview/reddit_instance.html) when necessary.

In [1]:
import praw
import secret_reddit

# Create a read-only Reddit object, which allows us to interact with the Reddit API.
# Supplying only the app credentials (no username/password) keeps the instance read-only.
reddit = praw.Reddit(
    client_id=secret_reddit.REDDIT_API_CLIENT_ID,
    client_secret=secret_reddit.REDDIT_API_CLIENT_SECRET,
    user_agent=secret_reddit.REDDIT_API_USER_AGENT,
)

In [2]:
print(reddit) 

<praw.reddit.Reddit object at 0x7f9f69d6df40>


<details>
<summary>Expected output:</summary>   

```<praw.reddit.Reddit object at 0x10f8a0ac0>```
</details>

#### 4. Instantiate a `subreddit` object

Lastly, create a `subreddit` object for your favorite subreddit and inspect the object. The expected outputs shown are from `r/machinelearning` unless otherwise specified.

In [3]:
# YOUR CODE HERE
subreddit = reddit.subreddit("StockMarket")

What is the display name of the subreddit?

In [4]:
# YOUR CODE HERE
subreddit.display_name

'StockMarket'

<details>
<summary>Expected output:</summary>   

    machinelearning
</details>

How about its title, is it different from the display name?

In [5]:
# YOUR CODE HERE
print(subreddit.title)

r/StockMarket - Reddit's Front Page of the Stock Market


<details>
<summary>Expected output:</summary>   

    Machine Learning
</details>

#### My Answer:
Yes. The display name (`StockMarket`) is the short identifier used in the subreddit's URL, while the title is the longer, human-readable headline of the subreddit.

Print out the description of the subreddit:

In [6]:
# YOUR CODE HERE
print(subreddit.description)

######[Home](http://www.reddit.com#top) [hot](http://www.reddit.com/r/StockMarket/hot) [new](http://www.reddit.com/r/StockMarket/new/) [top](http://www.reddit.com/r/StockMarket/top/)

*****

**Objectives:** 

Welcome to /r/StockMarket! Our objective is to provide short and mid term trade ideas, market analysis & commentary for active traders and investors. Posts about equities, options, forex, futures, analyst upgrades & downgrades, technical and fundamental analysis, and the stock market in general are all welcome.


If you are new to the markets, you may wish to start with some of the resources in our [stock market toolkit](https://reddit.com/r/StockMarket/comments/g8t1sy/massive_open_source_collection_of_stock_market/) or see our submission guidelines and stock market resources section below which have some helpful links:

________________________________________

**Submission Guidelines**

1. Please search before posting: If you are new to the markets and about to ask how to get st

<details>
<summary>Expected output:</summary>

    **[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
    --------
    +[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
    --------
    +[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
    --------
    +[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
    --------
    +[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict
</details>

### Task II: Parse comments

#### 1. Top Posts of All Time

Find the titles of the top 10 posts of **all time** from your favorite subreddit. Refer to the [Obtain Submission Instances from a Subreddit section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) if necessary. Verify that the titles match what you see on Reddit.

In [7]:
# try running this line; what do you see? press q once you are done
?subreddit.top 

In [8]:
# YOUR CODE HERE
for sub in subreddit.top(time_filter="all", limit = 10):
    print(sub.title)

What do you think about it?
The stock market is easy
U/Deepfuckingvalue is having to testify in congress. While I hope he seeks legal advice. If he doesn't I hope his go to answer is...I like the stock.
Historic recurrence
No matter the amount, anything positive is great
Just doing my part sir
Crazy to think about
The stock market explained in 13 seconds
When in doubt, zoom out
In 1998, Google’s founders got their first investment: a $100,000 check. They didn’t have a bank account. They went to Burger King to celebrate.


<details> <summary>Expected output:</summary>

    [Project] From books to presentations in 10s with AR + ML
    [D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
    [R] First Order Motion Model applied to animate paintings
    [N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
    [D] This AI reveals how much time politicians stare at their phone at work
    [D] Types of Machine Learning Papers
    [D] The machine learning community has a toxicity problem
    [Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
    [P] Using oil portraits and First Order Model to bring the paintings back to life
    [D] Convolution Neural Network Visualization - Made with Unity 3D and lots of Code / source - stefsietz (IG)    
</details>

#### 2. Top 10 Posts of This Week

What are the titles of the top 10 posts of **this week** from your favorite subreddit?

In [9]:
# YOUR CODE HERE
for sub in subreddit.top(time_filter="week", limit = 10):
    print(sub.title)

What's next...
What are your thoughts on this?
WTF Canada
Elizebeth Warren, US Senators wrote a letter, asks the Fed to stop raising rates at an alarming pace.
Pass this along to $META employees too. Maybe they can take their fact-checking skills to the Vegas strip… 😂
Facebook parent company Meta is now the worst performer in the S&P 500 this year
When the “news” headline is plain ridiculous
Thoughts on Carvana (CVNA)? Stock is down 97% YTD - Morgan Stanley had a price target of $420/share not too long ago
China decides to indefinitely delay the publication of headline third-quarter indicators, including GDP. China becoming much less transparent about its economic performance - quietly discontinuing thousands of statistical series.
Fed: U.S. home prices overvalued, risk of sharp decline


<details><summary>Expected output:</summary>

    [N] Ian Goodfellow, Apple’s director of machine learning, is leaving the company due to its return to work policy. In a note to staff, he said “I believe strongly that more flexibility would have been the best policy for my team.” He was likely the company’s most cited ML expert.
    [R][P] Thin-Plate Spline Motion Model for Image Animation + Gradio Web Demo
    [P] I’ve been trying to understand the limits of some of the available machine learning models out there. Built an app that lets you try a mix of CLIP from Open AI + Apple’s version of MobileNet, and more directly on your phone's camera roll.
    [R] Meta is releasing a 175B parameter language model
    [N] Hugging Face raised $100M at $2B to double down on community, open-source & ethics
    [P] T-SNE to view and order your Spotify tracks
    [D] : HELP Finding a Book - A book written for Google Engineers about foundational Math to support ML
    [R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet
    [D] Do you use NLTK or Spacy for text preprocessing?
    [D] Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
</details>

💽❓ Data Question:

Check out what other attributes the `praw.models.Submission` class has in the [docs](https://praw.readthedocs.io/en/stable/code_overview/models/submission.html). 

1. After having a chance to look through the docs, is there any other information that you might want to extract? How might this additional data help you?

Write a sample piece of code below extracting three additional pieces of information from the submission below.

In [10]:
# YOUR CODE HERE

for sub in subreddit.top(time_filter="week", limit=10):
    # Print three extra attributes for each of the week's top submissions
    print(f"Number of comments {sub.num_comments}, "
          f"Number of upvotes {sub.score}, "
          f"Is this an original content? {sub.is_original_content}")


Number of comments 144, Number of upvotes 2363, Is this an original content? False
Number of comments 160, Number of upvotes 1911, Is this an original content? False
Number of comments 614, Number of upvotes 1871, Is this an original content? False
Number of comments 619, Number of upvotes 1814, Is this an original content? False
Number of comments 118, Number of upvotes 1711, Is this an original content? False
Number of comments 176, Number of upvotes 1265, Is this an original content? False
Number of comments 158, Number of upvotes 1094, Is this an original content? False
Number of comments 396, Number of upvotes 1096, Is this an original content? False
Number of comments 104, Number of upvotes 1004, Is this an original content? False
Number of comments 376, Number of upvotes 788, Is this an original content? False


#### My comments about the other attributes checked:
For this example and the attributes I chose, the gap between the number of upvotes and the number of comments is noticeable. In addition, the top post of the week received fewer comments than others, such as the third and fourth posts. It is easier to upvote a post than to write a comment, which requires knowledge of, or a strong opinion on, the matter. So what makes people upvote these titles? And what makes people write so many comments on the third and fourth posts? Investigating a little further revealed no surprise: they were responding to some of the week's top stories (according to newspaper headlines, too), which affect people's savings and housing values. At least the top 5 titles, which I checked manually, deal with opinions about the Fed's new interest rate this week and housing prices.

In addition, none of the top ten posts of this week were flagged as original content. Note that `is_original_content` only reports whether a submission has been marked as original content, so `False` does not by itself show that a topic was discussed before. Still, these are "hot topics" that interest many.
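
To make the gap between upvotes and comments concrete, the sketch below computes a comments-per-upvote ratio for the ten `(num_comments, score)` pairs printed above. The ratio is an illustrative metric chosen for this example, not something PRAW provides.

```python
# (num_comments, score) pairs copied from the output above, in ranked order
posts = [
    (144, 2363), (160, 1911), (614, 1871), (619, 1814), (118, 1711),
    (176, 1265), (158, 1094), (396, 1096), (104, 1004), (376, 788),
]

# Comments per upvote: higher values suggest a post invites discussion, not just agreement
ratios = [num_comments / score for num_comments, score in posts]

# 1-based rank of the post with the highest ratio
most_discussed_rank = max(range(len(ratios)), key=ratios.__getitem__) + 1

for rank, ratio in enumerate(ratios, start=1):
    print(f"Post {rank}: {ratio:.2f} comments per upvote")
print(f"Most discussed relative to upvotes: post {most_discussed_rank}")
```

On these numbers, the third, fourth, eighth, and tenth posts each draw more than 0.3 comments per upvote, while the rest stay below 0.15.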


💽❓ Data Question:

2. Is there any information available that might be a concern when it comes to Ethical Data?

#### My Answer:
In the subreddit I chose, and according to the answers above, I don't see any ethical data concerns.

#### 3. Comment Code

Add comments to the code block below to describe what each line of the code does (Refer to [Obtain Comment Instances Section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) when necessary). The code is adapted from [this tutorial](https://praw.readthedocs.io/en/stable/tutorials/comments.html)

The purpose is to
1. understand what the code is doing
2. start commenting your code whenever it is not self-explanatory, if you have not already (others will thank you, YOU will thank you later 😊)

In [11]:
%%time
from praw.models import MoreComments

# MoreComments is an object that represents the “load more comments” and “continue this thread” links.
# MoreComments has no body.

# YOUR COMMENT HERE
# Initialize a list to hold the content (body) of the top-level comments of the top 10 submissions
top_comments = []

# YOUR COMMENT HERE
# Iterate over the top ten submissions (posts) to add the content of their comments to our list
for submission in subreddit.top(limit=10):
    # YOUR COMMENT HERE
    # Iterate over the top-level comments only
    for top_level_comment in submission.comments:
        # YOUR COMMENT HERE
        # Ignore MoreComments  
        if isinstance(top_level_comment, MoreComments):
            continue
        # YOUR COMMENT HERE
        # Add body to our list
        top_comments.append(top_level_comment.body)

CPU times: user 655 ms, sys: 45.2 ms, total: 700 ms
Wall time: 21.9 s
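
The loop above skips `MoreComments` objects one at a time. The PRAW comments tutorial also describes `replace_more(limit=0)`, which removes those placeholders from the comment forest up front. The sketch below shows that variant; `top_level_bodies` is a hypothetical helper, and `FakeForest` is only an offline stand-in so the example runs without a Reddit connection.

```python
from types import SimpleNamespace


def top_level_bodies(submission):
    """Return the text of every top-level comment on one submission."""
    # replace_more(limit=0) drops all MoreComments placeholders in place,
    # so the list comprehension below needs no isinstance check.
    submission.comments.replace_more(limit=0)
    return [comment.body for comment in submission.comments]


class FakeForest(list):
    """Offline stand-in for a PRAW comment forest (illustration only)."""

    def replace_more(self, limit=32):
        # Mimic PRAW by removing placeholder entries, which have no body
        self[:] = [c for c in self if hasattr(c, "body")]


fake_submission = SimpleNamespace(
    comments=FakeForest([
        SimpleNamespace(body="I like the stock"),
        SimpleNamespace(count=12),  # plays the role of a MoreComments placeholder
    ])
)
bodies = top_level_bodies(fake_submission)
print(bodies)
```

With a real `submission` from `subreddit.top(...)`, the helper would be called the same way.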


#### 4. Inspect Comments

How many comments did you extract from the last step? Examine a few comments. 

In [12]:
# YOUR CODE HERE  # the answer may vary; 693 for r/machinelearning
print(len(top_comments))

1348


In [13]:
import random

[random.choice(top_comments) for i in range(5)]

['$KMPH 🔥( under 10😮)has new Catalyst info read about FDA Approval for med !!!!\n🏋️\u200d♀️🏋️\u200d♀️🏋️\u200d♀️ Needs to get the word out ?bc it’s still dipped to 🚀\nSo, I guess best time to buy in \nKeep spread the word ?\nI’m not a pro but this seems like a no brainer 💎💎💎🔥\nGood purpose mental health related 👍',
 'What happened in the 90s to allow this apparent takeoff/crash cycle?',
 ' Robinhood is trash and will fail as a company and their CEO will likely be scrubbing toilets somewhere. Perhaps at his local gamestop location?',
 'Rich asf boi',
 'By buying stocks that suffer from #NakedShorts like GME. CNBC said so yesterday']

<details> <summary>Some of the comments from `r/machinelearning` subreddit are:</summary>

    ['Awesome visualisation',
    'Similar to a stack or connected neurons.',
    'Will this Turing pass the Turing Test?']
</details>

💽❓ Data Question:

3. After having a chance to review a few samples of 5 comments from the subreddit, what can you say about the data? 

HINT: Think about the "cleanliness" of the data, the content of the data, think about what you're trying to do - how does this data line up with your goal?

#### My Answer:
These comments are written by people with strong opinions on these topics and thus cannot represent a whole population. In particular, one cannot draw conclusions from them with certainty. However, investigating them further (upvotes, replies, etc.), with attention to their content and to opposing opinions, can reveal trends and allow more coherent findings.
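
On the "cleanliness" point in the hint, a light cleaning pass before inference might look like the sketch below. It collapses runs of whitespace and newlines and drops empty comments as well as the `[deleted]` and `[removed]` placeholders Reddit shows for missing comments. `clean_comments` is a hypothetical helper, and the sample strings are shortened from the output above.

```python
import re

# Bodies Reddit shows in place of deleted or moderator-removed comments
PLACEHOLDERS = {"[deleted]", "[removed]"}


def clean_comments(comments):
    """Collapse whitespace and drop empty or placeholder comments."""
    cleaned = []
    for text in comments:
        # Turn newlines and repeated spaces into single spaces
        text = re.sub(r"\s+", " ", text).strip()
        if text and text not in PLACEHOLDERS:
            cleaned.append(text)
    return cleaned


sample = [
    " Robinhood is trash and will fail as a company",  # leading space, as scraped
    "Keep spread the word ?\nI’m not a pro",
    "[deleted]",
    "",
]
cleaned = clean_comments(sample)
print(cleaned)
```

Emoji and ticker symbols are left untouched here; whether to keep them depends on how the sentiment model handles them.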

#### 5. Extract Top-Level Comments from Subreddit `TSLA`

Write your code to extract top-level comments from the top 10 posts of a time period, e.g., year, from subreddit `TSLA` and store them in a list `top_comments_tsla`.

In [14]:
# YOUR CODE HERE
from praw.models import MoreComments

# Create a subreddit object for TSLA and initialize the list of top-level comment bodies
subreddit = reddit.subreddit("TSLA")
top_comments_tsla = []

for submission in subreddit.top(time_filter="month", limit=10):
    # YOUR COMMENT HERE
    # Iterate over the top-level comments only
    for top_level_comment in submission.comments:
        if isinstance(top_level_comment, MoreComments):
            continue
        top_comments_tsla.append(top_level_comment.body)

In [15]:
len(top_comments_tsla) # Expected: 174 for r/machinelearning

43

In [16]:
[random.choice(top_comments_tsla) for i in range(3)]

['Yeah and they were laughing at you on the way down you regard',
 'Aside from BYD there is no competition to TSLA so they will continue to grow exponentially!',
 'I sell $TSLA options cause it makes me money.  I have never owned a tesla, cause I have a 2015 honda accord that is paid for , and I will drive it till it dies.  I had previously a 2002 honda accord with 300k miles.  \n\nI have made money on $TSLA stock, because I believe in /u/elonmuskofficial and I can see for the past 3 years , the largest financial institutions, sovereign banks like Switzerland own TSLA.  \n\n>2022-08-09 - Swiss National Bank has filed a 13F-HR form disclosing ownership of 3,810,759 shares of Tesla Motors, Inc. (US:TSLA) with total holdings valued at $2,566,241,000 USD as of 2022-06-30. Swiss National Bank had filed a previous 13F-HR on 2022-05-09 disclosing 3,697,259 shares of Tesla Motors, Inc. at a value of $3,984,166,000 USD. This represents a change in shares of 3.07 percent and a change in value of

<details>
<summary>Some of the comments from `r/TSLA` subreddit:</summary>

    ['I bought puts',
    '100%',
    'Yes. And I’m bag holding 1200 calls for Friday and am close to throwing myself out the window']
</details>

💽❓ Data Question:

4. Now that you've had a chance to review another subreddit's comments, do you see any differences in the kinds of comments either subreddit has, and how might this relate to bias?

#### My Answer:
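Yes :) The language and tone are different, which might suggest that different types of people contribute to each subreddit. Sentiment measured on one subreddit may therefore reflect that community rather than a wider population.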

### Task III: Sentiment Analysis

Let us analyze the sentiment of comments scraped from `r/TSLA` using a pre-trained HuggingFace model to make the inference. Take a [Quick tour](https://huggingface.co/docs/transformers/quicktour). 

#### 1. Import `pipeline`

In [17]:
from transformers import pipeline # YOUR CODE HERE

#### 2. Create a Pipeline to Perform Task "sentiment-analysis"

In [18]:
sentiment_model = pipeline("sentiment-analysis") # YOUR CODE HERE

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
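
The warning above recommends naming the model and revision explicitly. A minimal sketch of that, using the model name and revision printed in the warning, is shown below. `get_sentiment_model` is a hypothetical helper; the `lru_cache` decorator is an added convenience so the model is loaded only once per session.

```python
from functools import lru_cache

# Values copied from the warning message above
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


@lru_cache(maxsize=1)
def get_sentiment_model():
    """Build the sentiment pipeline once and reuse it on later calls."""
    # Imported lazily so that defining this helper does not load transformers
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)


# Usage (downloads the model on first call):
# sentiment_model = get_sentiment_model()
```

Pinning the revision means a later update to the model repository does not silently change the predictions.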


#### 3. Get one comment from list `top_comments_tsla` from Task II - 5.

In [19]:
comment = random.choice(top_comments_tsla)

In [20]:
comment

'I own two. A model 3 and a model y. It’s worth it for the gas/oil/time savings. They’ve also been relatively maintenance free and just an all around great driving experience that boosts my morale. I also am a shareholder and someday a solar/battery owner.'

The example comment is: `'Bury Burry!!!!!'`. Print out what you get. For reproducibility, use the same comment in the next step; consider setting a seed.
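
One way to act on the seed suggestion is sketched below. The comment list is a stand-in for `top_comments_tsla`, and the seed value is arbitrary.

```python
import random

# Stand-in comments; in the notebook this would be top_comments_tsla
comments = ["I bought puts", "100%", "Bury Burry!!!!!"]

SEED = 42  # arbitrary value; any fixed integer works

# A dedicated Random instance keeps the seed from affecting other code
first_pick = random.Random(SEED).choice(comments)
second_pick = random.Random(SEED).choice(comments)

print(first_pick == second_pick)  # same seed and same list give the same pick
```

A seed only fixes the random draw. The pick is reproducible across runs only if the list itself is unchanged, and comments scraped from Reddit change over time.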

#### 4. Make Inference!

In [21]:
sentiment = sentiment_model(comment)
print(sentiment) # YOUR CODE HERE 

[{'label': 'POSITIVE', 'score': 0.9996745586395264}]


What is the type of the output `sentiment`?

```
YOUR ANSWER HERE
```

This is a list containing one entry, a dictionary. The dictionary has two keys, 'label' and 'score', with the values 'POSITIVE' and 0.9996745586395264, respectively.

In [22]:
print(f'The comment: {comment}')
print(f'Predicted Label is {sentiment[0]["label"]} and the score is {sentiment[0]["score"]:.3f}')

The comment: I own two. A model 3 and a model y. It’s worth it for the gas/oil/time savings. They’ve also been relatively maintenance free and just an all around great driving experience that boosts my morale. I also am a shareholder and someday a solar/battery owner.
Predicted Label is POSITIVE and the score is 1.000


For the example comment, the output is:

    The comment: Bury Burry!!!!!
    Predicted Label is NEGATIVE and the score is 0.989
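
A single comment says little about a subreddit. If the pipeline is run on several comments, the resulting dictionaries can be summarized as in the sketch below. `summarize` is a hypothetical helper, and the two results are the ones shown in this section (the second uses the rounded score from the example output).

```python
from collections import Counter


def summarize(results):
    """Tally labels and average the score per label for pipeline-style results."""
    counts = Counter(r["label"] for r in results)
    mean_score = {
        label: sum(r["score"] for r in results if r["label"] == label) / n
        for label, n in counts.items()
    }
    return counts, mean_score


# Results in the pipeline's output format: one dict per comment
results = [
    {"label": "POSITIVE", "score": 0.9996745586395264},
    {"label": "NEGATIVE", "score": 0.989},
]
counts, mean_score = summarize(results)
print(dict(counts))
print(mean_score)
```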

🖥️❓ Model Question:

1. What does the score represent?

#### My Answer:
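The score is the probability the model assigns to the predicted label, i.e., the model's confidence in that label. It reflects this model's own estimate and is not a guarantee that the label is correct.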
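
As a rough illustration of where such a score comes from, the sketch below applies a softmax to two raw outputs (logits) and reports the larger probability. This assumes a two-class classifier whose outputs are normalized with a softmax; the logits are made-up values, not taken from a real model run.

```python
import math


def softmax(logits):
    """Convert raw model outputs (logits) into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow without changing the result
    shifted = [x - max(logits) for x in logits]
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [-2.0, 3.0]  # hypothetical values for illustration

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
result = {"label": labels[best], "score": probs[best]}
print(result)
```

The further apart the two logits are, the closer the score gets to 1, which is why very confident predictions print as `1.000` after rounding.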

### Task IV: Put All Together

Let's pull all the pieces together and create a simple script that will

- get the subreddit
- get comments from the top posts for given subreddit
- run sentiment analysis 

#### Complete the Script

Once you complete the code, running the following block writes the code into a new Python script and saves it as `top_tlsa_comment_sentiment.py` in the same directory as the notebook.

In [29]:
%%writefile top_tlsa_comment_sentiment.py

import random
from typing import Dict, List

from praw import Reddit
from praw.models import MoreComments
from praw.models.reddit.subreddit import Subreddit
from transformers import pipeline

# Local module holding the Reddit API credentials (kept out of version control)
import secret_reddit as secrets


def get_subreddit(display_name: str) -> Subreddit:
    """Get a subreddit object from its display name.

    Args:
        display_name (str): Name of the subreddit without the "r/" prefix, e.g. "TSLA".

    Returns:
        Subreddit: PRAW object for the requested subreddit.
    """
    reddit = Reddit(
        client_id=secrets.REDDIT_API_CLIENT_ID,
        client_secret=secrets.REDDIT_API_CLIENT_SECRET,
        user_agent=secrets.REDDIT_API_USER_AGENT,
    )

    # Use the argument instead of a hard-coded name
    return reddit.subreddit(display_name)


def get_comments(subreddit: Subreddit, limit: int = 3) -> List[str]:
    """Get top-level comments from the top posts of a subreddit.

    Args:
        subreddit (Subreddit): Subreddit to read from.
        limit (int, optional): Number of top posts to read. Defaults to 3.

    Returns:
        List[str]: Bodies of the top-level comments.
    """
    top_comments = []
    for submission in subreddit.top(limit=limit):
        for top_level_comment in submission.comments:
            # Skip "load more comments" placeholders, which have no body
            if isinstance(top_level_comment, MoreComments):
                continue
            top_comments.append(top_level_comment.body)
    return top_comments


def run_sentiment_analysis(comment: str) -> Dict:
    """Run sentiment analysis on a comment using the default distilbert model.

    Args:
        comment (str): Text to classify.

    Returns:
        Dict: Result with the keys "label" and "score".
    """
    sentiment_model = pipeline("sentiment-analysis")
    sentiment = sentiment_model(comment)
    # The pipeline returns a list with one dict per input
    return sentiment[0]


if __name__ == '__main__':
    subreddit = get_subreddit("TSLA")
    comments = get_comments(subreddit)
    comment = random.choice(comments)
    sentiment = run_sentiment_analysis(comment)

    print(f'The comment: {comment}')
    print(f'Predicted Label is {sentiment["label"]} and the score is {sentiment["score"]:.3f}')

Overwriting top_tlsa_comment_sentiment.py


Run the following block to see the output.

In [30]:
!python top_tlsa_comment_sentiment.py

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The comment: You sure did. Great heads up. Your awesome!!!
Predicted Label is POSITIVE and the score is 1.000


<details><summary> Expected output:</summary>

    No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
    The comment: When is DOGE flying
    Predicted Label is POSITIVE and the score is 0.689
</details>

💽❓ Data Question:

5. Is the subreddit active? About how many posts or threads per day? How could you find this information?

To verify whether a subreddit is active, one can check whether there were any posts in the past day or week. To find out how many, one needs to loop over the posts and count them (there is no attribute that does that). Below is an example of such code, which shows that this subreddit is active:

In [47]:
# Count the submissions from the past day to check whether the subreddit is active
subreddit = reddit.subreddit("TSLA")
num_submissions = 0

# top(time_filter="day") yields the past day's submissions (up to PRAW's default limit of 100)
for submission in subreddit.top(time_filter="day"):
    num_submissions += 1

if num_submissions == 0:
    print("This subreddit is not active. No submissions this day")
else:
    print("This subreddit is active. Number of submissions today: " + str(num_submissions))

This subreddit is active. Number of submissions today: 1
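
For a posts-per-day figure rather than a single day's count, one option is to group submission timestamps by calendar day. The sketch below assumes the timestamps come from each submission's `created_utc` attribute (Unix time); `posts_per_day` is a hypothetical helper and the three timestamps are made up so the example runs offline.

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_day(created_utc_values):
    """Count submissions per UTC calendar day from Unix timestamps."""
    days = (
        datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        for ts in created_utc_values
    )
    return Counter(days)


# Hypothetical timestamps; with PRAW they might come from
# [s.created_utc for s in subreddit.new(limit=100)]
timestamps = [1666000000, 1666003600, 1666090000]
per_day = posts_per_day(timestamps)
average = sum(per_day.values()) / len(per_day)
print(dict(per_day), average)
```

Days with no posts at all do not appear in the counter, so the average here is over active days only.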


💽❓ Data Question:

6. Does there seem to be a large distribution of posters or a smaller concentration of posters who are very active? What kind of impact might this have on the data?

If the number of unique posters is small, the data is biased. One can compare the number of unique posters with the total number of comments, as follows:


In [48]:
unique_users = []
num_comments = 0

# Loop over each submission to find top-level comments
for submission in subreddit.top(time_filter="month", limit=100):
    for top_level_comment in submission.comments:
        if isinstance(top_level_comment, MoreComments):
            continue

        # Count the comment and record its author if not seen before
        # (author is None for deleted accounts, which then count as one poster)
        num_comments += 1
        author = top_level_comment.author
        if author not in unique_users:
            unique_users.append(author)

print("Number of unique posters: " + str(len(unique_users)),
      ", and number of total participants: " + str(num_comments))

Number of unique posters: 105 , and number of total participants: 175
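
The unique-poster count alone does not show whether a few accounts dominate. A possible follow-up is the share of comments written by the most active posters, sketched below. `top_poster_share` is a hypothetical helper and the author names are made up; in the notebook the list could hold one author name per comment.

```python
from collections import Counter


def top_poster_share(authors, top_n=3):
    """Fraction of all comments written by the `top_n` most active authors."""
    if not authors:
        return 0.0
    counts = Counter(authors)
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / len(authors)


# Hypothetical author names, one entry per comment
authors = ["alice", "bob", "alice", "carol", "alice", "dave", "bob", "erin"]
share = top_poster_share(authors, top_n=2)
print(f"{len(set(authors))} unique posters; top 2 share of comments: {share}")
```

For the output above, 105 unique posters across 175 comments works out to 0.6 unique posters per comment, so some accounts commented more than once; a share like this would show how concentrated that activity is.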
