<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Reddit Data using Reddit API</h1>

In this live coding session, we leverage the Python Reddit API Wrapper (`PRAW`) to retrieve data from subreddits on [Reddit](https://www.reddit.com) and perform sentiment analysis using [`pipelines`](https://huggingface.co/docs/transformers/main_classes/pipelines) from [HuggingFace (🤗 the GitHub of Machine Learning)](https://techcrunch.com/2022/05/09/hugging-face-reaches-2-billion-valuation-to-build-the-github-of-machine-learning/), which are powered by [transformer](https://arxiv.org/pdf/1706.03762.pdf) models.

## Objectives

By the end of the session, you will:

- know how to work with APIs
- feel more comfortable navigating documentation, and even inspecting source code
- understand what a `pipeline` object is in HuggingFace
- perform sentiment analysis using `pipeline`
- run a Python script from the command line and get the results

## How to Submit

- At the end of each task, commit* your work to the repository you created before the assignment
- After completing all the tasks, make sure to push the notebook containing all code blocks and output cells to that repository
- Submit the link to the notebook in Canvas

\***NEVER** commit a notebook displaying errors unless instructed otherwise. However, commit often; recall git ABC = **A**lways **B**e **C**ommitting.

## Tasks

### Task I: Instantiate a Reddit API Object

The first task is to instantiate a Reddit API object using [PRAW](https://praw.readthedocs.io/en/stable/), through which you will retrieve data. PRAW is a wrapper for the [Reddit API](https://www.reddit.com/dev/api) that makes interacting with it much easier, unless you are already an expert in [`requests`](https://docs.python-requests.org/en/latest/).

#### 1. Install packages

Please make sure you've run all the cells in `imports.ipynb`, located [here](https://github.com/FourthBrain/MLE-8/blob/main/assignments/week-3-analyze-sentiment-subreddit/imports.ipynb), so that you have all the required packages for today's assignment.

#### 2. Create a new app on Reddit

Create a new app on Reddit and save its secret tokens; refer to this [Medium post](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c) for more details.

- Create a Reddit account if you don't have one, and log in.
- To access the API, we need to create an app. The site has changed slightly since that post: navigate to `preferences` > `apps`, or click [this link](https://www.reddit.com/prefs/apps) and scroll all the way down.
- Click to create a new app, fill in the **name**, choose `script`, and fill in the **description** and **redirect uri**. The redirect URI is where the user is sent after granting OAuth access to your application (more info [here](https://github.com/reddit-archive/reddit/wiki/OAuth2)); for our purposes, any URL will do, e.g., www.google.com, as shown below.


    <img src="https://miro.medium.com/max/700/1*lRBvxpIe8J2nZYJ6ucMgHA.png" width="500"/>
- Jot down the `client_id` (upper-left corner) and the `client_secret`.

    NOTE: CLIENT_ID is the string under "personal use script", and CLIENT_SECRET is the string labeled "secret".
    
    <div>
    <img src="https://miro.medium.com/max/700/1*7cGAKth1PMrEf2sHcQWPoA.png" width="300"/>
    </div>

- Create `secrets_reddit.py` in the same directory as this notebook, and fill in the `client_id` and `client_secret` obtained in the last step. We will import these constants in the next step.
    ```python
    REDDIT_API_CLIENT_ID = "client_id"
    REDDIT_API_CLIENT_SECRET = "client_secret"
    REDDIT_API_USER_AGENT = "any string except bot; e.g. My User Agent"
    ```
- Add `secrets_reddit.py` to your `.gitignore` file if not already done. NEVER push credentials to a repo, private or public. 
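If you prefer to script this step, here is a small sketch. `ensure_gitignored` is a hypothetical helper (not part of the assignment or of git); it only appends the entry when the file does not already list it. Editing `.gitignore` by hand works just as well.

```python
from pathlib import Path


def ensure_gitignored(entry: str, gitignore: Path = Path(".gitignore")) -> bool:
    """Append `entry` to a .gitignore file unless it is already listed.

    Returns True if the file was changed, False if the entry was already present.
    """
    content = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in content.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline
    prefix = "" if content == "" or content.endswith("\n") else "\n"
    gitignore.write_text(content + prefix + entry + "\n")
    return True
```

Run `ensure_gitignored("secrets_reddit.py")` from the repository root, then check `git status` to confirm the secrets file no longer shows up as untracked.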

#### 3. Instantiate a `Reddit` object

Now you are ready to create a read-only `Reddit` instance. Refer to the [documentation](https://praw.readthedocs.io/en/stable/code_overview/reddit_instance.html) when necessary.

```python
import praw
import secrets_reddit

# Create a Reddit object which allows us to interact with the Reddit API
reddit = praw.Reddit(
    client_id=secrets_reddit.REDDIT_API_CLIENT_ID,
    client_secret=secrets_reddit.REDDIT_API_CLIENT_SECRET,
    user_agent=secrets_reddit.REDDIT_API_USER_AGENT,
)
```

```python
print(reddit)
```

```
<praw.reddit.Reddit object at 0x109bd3550>
```


<details>
<summary>Expected output:</summary>   

```<praw.reddit.Reddit object at 0x10f8a0ac0>```
</details>

#### 4. Instantiate a `subreddit` object

Lastly, create a `subreddit` object for your favorite subreddit and inspect the object. The expected outputs shown are from `r/machinelearning` unless otherwise specified.

```python
subreddit = reddit.subreddit('todayilearned')
```

What is the display name of the subreddit?

```python
# display the subreddit name
print(subreddit.display_name)
```

```
todayilearned
```


<details>
<summary>Expected output:</summary>   

    todayilearned
</details>

How about its title? Is it different from the display name?

```python
# display the subreddit title
print(subreddit.title)
```

```
Today I Learned (TIL)
```


<details>
<summary>Expected output:</summary>   

    Today I Learned
</details>

Print out the description of the subreddit:

```python
# display the subreddit description
print(subreddit.description)
```

```
[](http://www.reddit.com/r/aww/#newlink)
[New to reddit? Click here!](/wiki/reddit_101)



* You learn something new every day; what did *you* learn today?

* Submit interesting and **specific facts** that you just found out (not broad information you looked up, TodayILearned is not [/r/wikipedia](/r/wikipedia)).

#Posting rules#

1. **Submissions must be verifiable**. *Please link directly to a reliable source that supports every claim in your post title.* **Images alone do not count as valid references.** Videos are fine so long as they come from reputable sources (e.g. BBC, Discovery, etc).
1. **No personal opinions, anecdotes or subjective statements** (e.g "TIL xyz is a great movie").

1. **No recent sources.** Any sources (blog, article, press release, video, etc.) with a publication date more recent than two months are not allowed.

1. No politics, soapboxing, or agenda based submissions. This includes (but is not limited to) submissions related to: 
   1. Recent political i
```

<details>
<summary>Expected output:</summary>

    **[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
    --------
    +[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
    --------
    +[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
    --------
    +[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
    --------
    +[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict
</details>

### Task II: Parse comments

#### 1. Top Posts of All Time

Find the titles of the top 10 posts of **all time** from your favorite subreddit. Refer to the [Obtain Submission Instances from a Subreddit](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) section if necessary. Verify that the titles match what you see on Reddit.

```python
# try running this line; what do you see? Press q once you are done
?subreddit.top
```

```
Signature:
subreddit.top(
    *,
    time_filter: str = 'all',
    **generator_kwargs: Union[str, int, Dict[str, str]],
) -> Iterator[Any]
Docstring:
Return a :class:`.ListingGenerator` for top items.

:param time_filter: Can be one of: ``"all"``, ``"day"``, ``"hour"``,
    ``"month"``, ``"week"``, or ``"year"`` (default: ``"all"``).

:raises: :py:class:`ValueError` if ``time_filter`` is invalid.

Additional keyword arguments are passed in the initialization of
:class:`.ListingGenerator`.

This method can be used
```

```python
# The previous cell returns the documentation for the .top method.
# The output matches the top TIL posts here: https://www.reddit.com/r/todayilearned/top/?t=all
tdil_alltime_posts = subreddit.top(time_filter='all', limit=10)
for post in tdil_alltime_posts:
    print(post.title)
```

```
TIL During an interview with Stephen Hawking, the camera operator yanked a cable causing an alarm and Hawking to slump forward. Worried they had killed him, everyone rushed over to find Hawking giggling at his own joke. The alarm was from an office computer losing power.
TIL Genghis Khan would marry off a daughter to the king of an allied nation. Then he would assign his new son in law to military duty in the Mongol wars, while his daughter took over the rule. Most sons in law died in combat, giving his daughters complete control of these nations
TIL the FBI has struggled to hire hackers because of the FBI hiring rule that the applicant must not have used marijuana during the last 3 years.
TIL After Col. Shaw died in battle, Confederates buried him in a mass grave as an insult for leading black soldiers. Union troops tried to recover his body, but his father sent a letter saying "We would not have his body removed from where it lies surrounded by his brave and devoted soldiers."
TIL a 
```

<details> <summary>Expected output:</summary>

    [Project] From books to presentations in 10s with AR + ML
    [D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
    [R] First Order Motion Model applied to animate paintings
    [N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
    [D] This AI reveals how much time politicians stare at their phone at work
    [D] Types of Machine Learning Papers
    [D] The machine learning community has a toxicity problem
    [Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
    [P] Using oil portraits and First Order Model to bring the paintings back to life
    [D] Convolution Neural Network Visualization - Made with Unity 3D and lots of Code / source - stefsietz (IG)    
</details>

#### 2. Top 10 Posts of This Week

What are the titles of the top 10 posts of **this week** from your favorite subreddit?

```python
# output can be validated here: https://www.reddit.com/r/todayilearned/top/?t=week
tdil_topweek_posts = subreddit.top(time_filter='week', limit=10)
for post in tdil_topweek_posts:
    print(post.title)
```

```
TIL no child has been harmed or killed by poisoned or dangerous Halloween candy.
TIL cats were a common wedding gift among Vikings due to their association with the goddess of luck, Freyja. Men favored women who loved cats, believing that it increased the likelihood of a happy marriage.
TIL about millionaire Wellington Burt, who died in 1919 and deliberately held back his enormous fortune. His will denied any inheritance until 21 years after the death of his last surviving grandchild. The money sat in a trust for 92 years, until 12 descendants finally shared $110 million in 2011.
TIL about death flights. Captives would be loaded into a plane, stripped, and pushed out over the ocean. These account for 1500-2000 of the 20k-30k estimated to have been disappeared by the Argentine military between 1976 and 1983 in the Argentine Dirty War.
TIL that in 1996 a 7-year-old Californian girl tried to fly an airplane across the US and crashed in a thunderstorm, killing her. This resulted in a law b
```

<details><summary>Expected output:</summary>

    [N] Ian Goodfellow, Apple’s director of machine learning, is leaving the company due to its return to work policy. In a note to staff, he said “I believe strongly that more flexibility would have been the best policy for my team.” He was likely the company’s most cited ML expert.
    [R][P] Thin-Plate Spline Motion Model for Image Animation + Gradio Web Demo
    [P] I’ve been trying to understand the limits of some of the available machine learning models out there. Built an app that lets you try a mix of CLIP from Open AI + Apple’s version of MobileNet, and more directly on your phone's camera roll.
    [R] Meta is releasing a 175B parameter language model
    [N] Hugging Face raised $100M at $2B to double down on community, open-source & ethics
    [P] T-SNE to view and order your Spotify tracks
    [D] : HELP Finding a Book - A book written for Google Engineers about foundational Math to support ML
    [R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet
    [D] Do you use NLTK or Spacy for text preprocessing?
    [D] Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
</details>

💽❓ Data Question:

Check out what other attributes the `praw.models.Submission` class has in the [docs](https://praw.readthedocs.io/en/stable/code_overview/models/submission.html). 

1. After having a chance to look through the docs, is there any other information that you might want to extract? How might this additional data help you?

Answer: If my goal were to better understand how popularity and rankings are quantified, I would look at attributes such as:
1. `score`
2. `upvote_ratio`
3. `num_comments`
4. `comments`

These additional attributes would give me information about user engagement with these posts.

Write a sample piece of code below extracting three additional pieces of information from the submission below.

```python
# Additional information
import pandas as pd

# Collect the id, creation time, title, and three engagement metrics for each top post
tdil_posts = []
for post in subreddit.top(limit=10):
    tdil_posts.append(
        [
            post.id,
            post.created,
            post.title,
            post.score,
            post.num_comments,
            post.upvote_ratio,
        ]
    )

posts = pd.DataFrame(
    tdil_posts,
    columns=["id", "created", "title", "score", "num_comments", "upvote_ratio"],
)

# convert the created column (Unix epoch seconds) to a datetime
posts["created"] = pd.to_datetime(posts["created"], unit="s")

posts.head(10)
```

| | id | created | title | score | num_comments | upvote_ratio |
|---|---|---|---|---|---|---|
| 0 | hbrimd | 2020-06-19 01:35:04 | TIL During an interview with Stephen Hawking, ... | 163096 | 1829 | 0.97 |
| 1 | lgl7ag | 2021-02-10 03:47:28 | TIL Genghis Khan would marry off a daughter to... | 161994 | 3292 | 0.95 |
| 2 | gjmajm | 2020-05-14 13:25:42 | TIL the FBI has struggled to hire hackers beca... | 156709 | 7824 | 0.96 |
| 3 | 7pbzcb | 2018-01-10 01:13:53 | TIL After Col. Shaw died in battle, Confederat... | 155469 | 4237 | 0.93 |
| 4 | i8pq5h | 2020-08-13 00:24:53 | TIL a fan drove three hours to deliver rapper,... | 154782 | 3156 | 0.93 |
| 5 | 8ih4tq | 2018-05-10 18:34:18 | TIL that in 1916 there was a proposed Amendmen... | 153133 | 5247 | 0.93 |
| 6 | tlmlnw | 2022-03-23 23:54:05 | TIL that the Animal Planet reality series ‘Riv... | 151908 | 3613 | 0.95 |
| 7 | eb15tp | 2019-12-15 16:47:11 | TIL actor Robert Pattinson dealt with an obses... | 149162 | 2759 | 0.96 |
| 8 | ejwxed | 2020-01-04 13:59:08 | TIL that millennial dads are spending 3 times ... | 147997 | 7010 | 0.95 |
| 9 | i7irad | 2020-08-11 01:53:49 | TIL The cast of FRIENDS each made $1M per epis... | 147077 | 5499 | 0.94 |
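The `score` and `upvote_ratio` columns can be combined into rough vote counts. The sketch below assumes `score = upvotes - downvotes` and `upvote_ratio = upvotes / total_votes`; Reddit fuzzes vote counts, so treat the results as estimates only. Under those assumptions `score = (2 * ratio - 1) * total`, which is solved for `total` below. `estimate_votes` is a hypothetical helper, not part of PRAW.

```python
from typing import Tuple


def estimate_votes(score: int, upvote_ratio: float) -> Tuple[int, int]:
    """Estimate (upvotes, downvotes) from a post's score and upvote ratio.

    Assumes score = upvotes - downvotes and upvote_ratio = upvotes / total,
    so score = (2 * ratio - 1) * total and total = score / (2 * ratio - 1).
    """
    if upvote_ratio == 0.5:
        raise ValueError("a ratio of 0.5 implies a score of 0; totals cannot be recovered")
    total = score / (2 * upvote_ratio - 1)
    upvotes = round(upvote_ratio * total)
    downvotes = round(total) - upvotes
    return upvotes, downvotes
```

Applied row by row, e.g. `posts.apply(lambda r: estimate_votes(r["score"], r["upvote_ratio"]), axis=1)`, this gives a rough sense of how many people voted on each post.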


💽❓ Data Question:

2. Is there any information available that might be a concern when it comes to Ethical Data?

Answer: While the subreddit I reviewed is relatively benign with regard to inflammatory content, it is still managed by a small group of moderators who can approve or reject submitted content. Because submissions are effectively gated, the posts and comments should be scrutinized for various types of bias.

#### 3. Comment Code

Add comments to the code block below to describe what each line of the code does (refer to the [Obtain Comment Instances](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) section when necessary). The code is adapted from [this tutorial](https://praw.readthedocs.io/en/stable/tutorials/comments.html).

The purpose is:
1. to understand what the code is doing
2. to start commenting your code whenever it is not self-explanatory, if you don't already (others will thank you, and YOU will thank yourself later 😊)

```python
%%time
from praw.models import MoreComments

# Create a list to hold the top-level comments from each post
top_comments = []

# Outer loop iterates over the top 10 posts (time_filter defaults to "all")
for submission in subreddit.top(limit=10):
    # Inner loop iterates over the top-level comments of the post
    for top_level_comment in submission.comments:
        # Skip MoreComments placeholders ("load more comments" links), which hold no comment text
        if isinstance(top_level_comment, MoreComments):
            continue
        # Add the comment text to the list
        top_comments.append(top_level_comment.body)
```

```
CPU times: user 690 ms, sys: 56.1 ms, total: 746 ms
Wall time: 45.7 s
```
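To see why the `isinstance` check matters, here is a self-contained sketch that runs without network access. `FakeComment` and `FakeMoreComments` are hypothetical stand-ins for PRAW's `Comment` and `MoreComments`; a "load more" placeholder does not carry comment text, so it has to be skipped before reading `.body`. (PRAW also offers `submission.comments.replace_more(limit=0)`, which removes the placeholders from the comment forest up front.)

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class FakeComment:
    """Hypothetical stand-in for praw.models.Comment."""
    body: str


@dataclass
class FakeMoreComments:
    """Hypothetical stand-in for praw.models.MoreComments (a "load more" link)."""
    children: List[str] = field(default_factory=list)


def collect_bodies(forest: List[Union[FakeComment, FakeMoreComments]]) -> List[str]:
    """Return the text of every loaded top-level comment, skipping placeholders."""
    bodies = []
    for item in forest:
        if isinstance(item, FakeMoreComments):
            continue  # placeholder for comments that were not fetched
        bodies.append(item.body)
    return bodies


forest = [FakeComment("First!"), FakeComment("Great post"), FakeMoreComments(["abc123"])]
print(collect_bodies(forest))  # ['First!', 'Great post']
```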


#### 4. Inspect Comments

How many comments did you extract from the last step? Examine a few comments. 

```python
# How many comments were extracted?
len(top_comments)
```

```
462
```

```python
print(top_comments[2])
```

```
>*Following Hawking's death in March 2018, BBC science correspondent Pallab Ghosh shared an anecdote of his first interview with the physics luminary at Cambridge University in 2004.*

>*Seeking to adjust his lighting, the camera operator yanked a cable from a socket, at which point an alarm sounded and Hawking slumped forward as if unplugged from his life support. The anxious visitors rushed over to find Hawking very much alive and giddy at his joke – the alarm was simply over the office computer losing its power supply.*

I can imagine for a moment, that camera man saw his life flash before his eyes.
```


```python
import random

[random.choice(top_comments) for i in range(3)]
```

```
['TIL I cannot work at the FBI until May 14th 2023',
 'Is it true that the cast of Fraiser were paid even more than that per episode?',
 'Legend']
```

<details> <summary>Some of the comments from `r/machinelearning` subreddit are:</summary>

    ['Awesome visualisation',
    'Similar to a stack or connected neurons.',
    'Will this Turing pass the Turing Test?']
</details>
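`random.choice` draws with replacement, so a three-comment preview can show the same comment twice. A sketch of one alternative: `random.sample` picks three distinct positions, and a seeded `random.Random` instance makes the preview reproducible. The comment strings here are sample data; if your list contains duplicate texts, deduplicate it first.

```python
import random

comments = ["Legend", "Awesome visualisation", "I bought puts", "100%", "Protective put."]

rng = random.Random(42)  # seeded generator so the preview is reproducible
preview = rng.sample(comments, k=3)  # three distinct positions, no repeats
print(preview)
```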

💽❓ Data Question:

3. After having a chance to review a few samples of 5 comments from the subreddit, what can you say about the data? 

HINT: Think about the "cleanliness" of the data, the content of the data, think about what you're trying to do - how does this data line up with your goal?

Answer:
The data contains different entities, such as people's names and places, that would be interesting to extract for some type of topic modeling. For example, if I wanted to dive deeper into the world of Stephen Hawking, the comments would provide a window into some of the more poignant parts of his life. However, these comments are written by people who likely never had direct contact with Hawking, so the results of any topic modeling or named entity recognition (NER) would need further review before drawing definitive conclusions. As previously mentioned

#### 5. Extract Top-Level Comments from Subreddit `TSLA`

Write code to extract the top-level comments from the top 10 posts of a time period, e.g., a year, from subreddit `TSLA` and store them in a list `top_comments_tsla`.

```python
# Create a list to hold top-level comments, then populate it by iterating through each post
top_comments_tsla = []
for submission in reddit.subreddit('tsla').top(time_filter='year', limit=10):
    for top_level_comment in submission.comments:
        if isinstance(top_level_comment, MoreComments):
            continue
        top_comments_tsla.append(top_level_comment.body)
```


```python
len(top_comments_tsla)  # Expected: 174 for r/machinelearning
```

```
109
```

```python
[random.choice(top_comments_tsla) for i in range(3)]
```

```
['[removed]',
 "I don't see my split...",
 'Who believes that crazy rumor he is going to announce something on 12/9?']
```

<details>
<summary>Some of the comments from `r/TSLA` subreddit:</summary>

    ['I bought puts',
    '100%',
    'Yes. And I’m bag holding 1200 calls for Friday and am close to throwing myself out the window']
</details>
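The TSLA sample above includes a `'[removed]'` placeholder, which carries no sentiment but would still be scored by a classifier. A minimal cleaning sketch, assuming removed or deleted comments show up as the literal strings `[removed]` and `[deleted]`; it also drops blank and duplicate bodies. `clean_comments` is a hypothetical helper.

```python
from typing import Iterable, List

# Assumed placeholder bodies for comments removed by moderators or deleted by authors
PLACEHOLDERS = {"[removed]", "[deleted]"}


def clean_comments(comments: Iterable[str]) -> List[str]:
    """Drop placeholder, blank, and duplicate comments, preserving order."""
    seen = set()
    cleaned = []
    for text in comments:
        text = text.strip()
        if not text or text in PLACEHOLDERS or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


print(clean_comments(["[removed]", "I bought puts", "  ", "100%", "I bought puts"]))
# ['I bought puts', '100%']
```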

💽❓ Data Question:

4. Now that you've had a chance to review another subreddit's comments, do you see any differences in the kinds of comments each subreddit has, and how might this relate to bias?

Answer: Absolutely. Bias runs more strongly in the TSLA subreddit. I believe this has to do with the purpose of the subreddit itself, which is to provide a forum for Tesla enthusiasts; r/TIL, by contrast, isn't centered around a product. For example, in just a few TSLA comments I can see a temptation to draw immediate conclusions about the health of Tesla the company (confirmation bias). Selection and availability bias are also evident. If my goal were to analyze customer sentiment surrounding Tesla, I wouldn't rely on this dataset alone; I would look for other data sources to balance r/TSLA's inherent bias.

### Task III: Sentiment Analysis

Let us analyze the sentiment of comments scraped from `r/TSLA`, using a pre-trained HuggingFace model to make the inference. If you are new to the library, take the [Quick tour](https://huggingface.co/docs/transformers/quicktour) first.

#### 1. Import `pipeline`

```python
from transformers import pipeline
```

#### 2. Create a Pipeline to Perform Task "sentiment-analysis"

```python
# Create a sentiment-analysis pipeline. No model is specified, so transformers
# falls back to its default model (see the warning below)
sentiment_model = pipeline("sentiment-analysis")
```

```
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
```


#### 3. Get one comment from list `top_comments_tsla` from Task II - 5.

```python
import numpy as np

np.random.seed(42)
comment = np.random.choice(top_comments_tsla)
```

```python
print(comment)
```

```
Got 100 shares and also got a bunch of Put options.  Protective put.
```


The example comment is: `'Bury Burry!!!!!'`. Print out what you get. For reproducibility, use the same comment in the next step; consider setting a seed.
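`np.random.seed(42)` seeds NumPy's global generator, which affects every later NumPy random call, and `np.random.choice` returns a `numpy.str_` rather than a plain `str`. A sketch of an alternative using a dedicated, seeded `random.Random` instance from the standard library; `pick_comment` is a hypothetical helper, and it will not necessarily select the same comment as the NumPy version.

```python
import random
from typing import List


def pick_comment(comments: List[str], seed: int = 42) -> str:
    """Return one comment, chosen reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator; leaves global random state untouched
    return rng.choice(comments)


sample = ["Bury Burry!!!!!", "I bought puts", "100%"]
assert pick_comment(sample) == pick_comment(sample)  # same seed, same comment
print(type(pick_comment(sample)))  # <class 'str'>
```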

#### 4. Make Inference!

```python
sentiment = sentiment_model(comment)
```

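What is the type of the output `sentiment`?

```python
type(sentiment)
```

```
list
```

Answer: `sentiment` is a `list` with one dictionary per input string, each holding a `label` and a `score` (hence `sentiment[0]["label"]` below). Note that `comment` itself is a `numpy.str_` rather than a plain `str`, because it was drawn with `np.random.choice`.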

```python
print(f'The comment: {comment}')
print(f'Predicted Label is {sentiment[0]["label"]} and the score is {sentiment[0]["score"]:.3f}')
```

```
The comment: Got 100 shares and also got a bunch of Put options.  Protective put.
Predicted Label is POSITIVE and the score is 0.620
```


For the example comment, the output is:

    The comment: Bury Burry!!!!!
    Predicted Label is NEGATIVE and the score is 0.989

🖥️❓ Model Question:

1. What does the score represent?
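As far as we understand the transformers text-classification pipeline, for a model with more than one label (such as the default two-label DistilBERT SST-2 model) it applies a softmax to the model's raw outputs (logits) by default and reports the most probable label together with its probability. A sketch of that computation with hypothetical logits, not taken from the real model; `to_prediction` only mimics the shape of the pipeline's output dict.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw model outputs (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits: List[float], labels: List[str]) -> Dict[str, object]:
    """Mimic the {'label', 'score'} dict: the top label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


# Hypothetical logits for the labels [NEGATIVE, POSITIVE]
print(to_prediction([-0.25, 0.24], ["NEGATIVE", "POSITIVE"]))
```

With two labels the score can never drop below 0.5, so a score near 0.5 means the model is close to undecided. Calibration aside, that makes the 0.620 above a much weaker prediction than the 0.989 for `'Bury Burry!!!!!'`.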

### Task IV: Put All Together

Let's pull all the pieces together and create a simple script that does the following:

- get the subreddit
- get comments from the top posts of the given subreddit
- run sentiment analysis

#### Complete the Script

Once you complete the code, running the following block writes the code into a new Python script and saves it as `top_tsla_comment_sentiment.py` in the same directory as the notebook.

```python
%%writefile top_tsla_comment_sentiment.py

import random
from typing import Dict, List

import praw
from praw.models import MoreComments
from praw.models.reddit.subreddit import Subreddit
from transformers import pipeline

import secrets_reddit


def get_subreddit(display_name: str) -> Subreddit:
    """Get a subreddit object from its display name.

    Args:
        display_name (str): Name of the subreddit, e.g. "tsla".

    Returns:
        Subreddit: Lazy PRAW object for the requested subreddit.
    """
    reddit = praw.Reddit(
        client_id=secrets_reddit.REDDIT_API_CLIENT_ID,
        client_secret=secrets_reddit.REDDIT_API_CLIENT_SECRET,
        user_agent=secrets_reddit.REDDIT_API_USER_AGENT,
    )
    subreddit = reddit.subreddit(display_name)
    return subreddit


def get_comments(subreddit: Subreddit, limit: int = 3) -> List[str]:
    """Get top-level comments from the top posts of a subreddit.

    Args:
        subreddit (Subreddit): Subreddit to pull posts from.
        limit (int, optional): Number of top posts to read. Defaults to 3.

    Returns:
        List[str]: Bodies of the top-level comments.
    """
    top_comments = []
    for submission in subreddit.top(limit=limit):
        for top_level_comment in submission.comments:
            # Skip "load more comments" placeholders
            if isinstance(top_level_comment, MoreComments):
                continue
            top_comments.append(top_level_comment.body)
    return top_comments


def run_sentiment_analysis(comment: str) -> Dict:
    """Run sentiment analysis on a comment using the default DistilBERT model.

    Args:
        comment (str): Text to classify.

    Returns:
        Dict: Prediction with "label" and "score" keys.
    """
    sentiment_model = pipeline("sentiment-analysis")
    sentiment = sentiment_model(comment)
    return sentiment[0]


if __name__ == '__main__':
    subreddit = get_subreddit('tsla')
    comments = get_comments(subreddit)
    comment = random.choice(comments)
    sentiment = run_sentiment_analysis(comment)

    print(f'The comment: {comment}')
    print(f'Predicted Label is {sentiment["label"]} and the score is {sentiment["score"]:.3f}')
```

```
Writing top_tsla_comment_sentiment.py
```


Run the following block to see the output.

```python
!python top_tsla_comment_sentiment.py
```

```
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The comment: BEEEEET-CO-NNNNNNNNNNEEEEEEEEEEEEEEEECT
Predicted Label is NEGATIVE and the score is 0.995
```


<details><summary> Expected output:</summary>

    No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
    The comment: When is DOGE flying
    Predicted Label is POSITIVE and the score is 0.689
</details>
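The script hard-codes `'tsla'` and the default `limit=3`. To try other subreddits from the command line without editing the file, one option is a small `argparse` front end. The `--subreddit` and `--limit` flags below are hypothetical names, not part of the assignment.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line options for the sentiment script."""
    parser = argparse.ArgumentParser(
        description="Classify a random top-level comment from a subreddit's top posts."
    )
    parser.add_argument("--subreddit", default="tsla", help="subreddit display name")
    parser.add_argument("--limit", type=int, default=3, help="number of top posts to read")
    return parser.parse_args(argv)


args = parse_args(["--subreddit", "wallstreetbets", "--limit", "5"])
print(args.subreddit, args.limit)  # wallstreetbets 5
```

In the script's `if __name__ == '__main__':` block you would call `args = parse_args()`, pass `args.subreddit` to `get_subreddit` and `args.limit` to `get_comments`, and then run, e.g., `python top_tsla_comment_sentiment.py --subreddit wallstreetbets --limit 5`.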

💽❓ Data Question:

5. Is the subreddit active? About how many posts or threads per day? How could you find this information?

Answer: If by active we mean that it has active users, then r/TSLA is active. Calling the built-in `vars()` function on the `tsla` subreddit object shows its loaded attributes, including `active_user_count` (i.e., `tsla_subreddit.active_user_count`). Getting posts (or threads) per day is a little more challenging. I think the best way would be to bring recent posts and their creation timestamps into a DataFrame and conduct some lightweight EDA to determine the mean/median number of posts per day.
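Posts per day can be estimated from submission timestamps. PRAW submissions expose `created_utc` (Unix seconds), so in practice you might collect something like `[s.created_utc for s in subreddit.new(limit=100)]`. The sketch below takes a plain list of timestamps so it runs offline; the sample values are made up.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, median
from typing import Dict, List


def posts_per_day(timestamps: List[float]) -> Dict[str, float]:
    """Summarize how many posts fall on each UTC calendar day."""
    days = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in timestamps
    )
    counts = list(days.values())
    return {"days": len(counts), "mean": mean(counts), "median": median(counts)}


# Made-up timestamps: three posts on one day, one on the next
sample = [1_670_000_000, 1_670_003_600, 1_670_007_200, 1_670_090_000]
print(posts_per_day(sample))
```

Days with no posts at all are not counted here, which biases the mean upward for quiet subreddits; a fixed date range with explicit zeros avoids that.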

💽❓ Data Question:

6. Does there seem to be a large distribution of posters or a smaller concentration of posters who are very active? What kind of impact might this have on the data?

Answer: There is a smaller concentration of posters (12) who are very active. This presents a narrower view of the subject matter, which is organized around Tesla's stock. Without more diversity of content and opinion, it's difficult to tell whether the data has a hidden agenda. I would not rely on this data alone to assess the short- and long-term health of Tesla's stock.
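One way to quantify how concentrated the posters are is to count comments per author and measure the share written by the most active few. A sketch over a made-up list of author names; with PRAW you could build that list from each comment's `author` attribute, which is `None` for deleted accounts, hence the filter. `top_author_share` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Optional


def top_author_share(authors: List[Optional[str]], top_n: int = 3) -> float:
    """Fraction of comments written by the `top_n` most active authors."""
    known = [a for a in authors if a is not None]  # skip deleted accounts
    if not known:
        return 0.0
    counts = Counter(known)
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / len(known)


authors = ["alice", "bob", "alice", None, "carol", "alice", "dave", "bob"]
print(f"{top_author_share(authors, top_n=2):.2f}")
```

A share close to 1 with a small `top_n` suggests that a handful of voices dominate the thread.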