<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Reddit Data using Reddit API</h1>

In this live coding session, we use the Python Reddit API Wrapper (`PRAW`) to retrieve data from subreddits on [Reddit](https://www.reddit.com) and perform sentiment analysis using [`pipelines`](https://huggingface.co/docs/transformers/main_classes/pipelines) from [HuggingFace (🤗 the GitHub of Machine Learning)](https://techcrunch.com/2022/05/09/hugging-face-reaches-2-billion-valuation-to-build-the-github-of-machine-learning/), powered by the [transformer](https://arxiv.org/pdf/1706.03762.pdf) architecture.

## Objectives

At the end of the session, you will 

- know how to work with APIs
- feel more comfortable navigating through documentation, and even inspecting source code
- understand what a `pipeline` object is in HuggingFace
- perform sentiment analysis using `pipeline`
- run a Python script from the command line and get the results

## How to Submit

- At the end of each task, commit* your work to the repository you created before the assignment
- After completing all tasks, push the notebook, including all code blocks and output cells, to that repository
- Submit the link to the notebook in Canvas

\***NEVER** commit a notebook displaying errors unless instructed otherwise. However, commit often; recall the git ABC: **A**lways **B**e **C**ommitting.

## Tasks

### Task I: Instantiate a Reddit API Object

The first task is to instantiate a Reddit API object using [PRAW](https://praw.readthedocs.io/en/stable/), through which you will retrieve data. PRAW is a wrapper for the [Reddit API](https://www.reddit.com/dev/api) that makes interacting with it easier, unless you are already an expert in [`requests`](https://docs.python-requests.org/en/latest/).

#### 1. Install packages

Please ensure you've run all the cells in `imports.ipynb`, located [here](https://github.com/FourthBrain/MLE-8/blob/main/assignments/week-3-analyze-sentiment-subreddit/imports.ipynb), so that you have all the required packages for today's assignment.

####  2. Create a new app on Reddit 

Create a new app on Reddit and save the secret tokens; refer to [this Medium post](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c) for more details.

- Create a Reddit account if you don't have one, then log into your account.
- To access the API, we need to create an app. The website has changed slightly since the post was written: navigate to `preferences` > `apps`, or click [this link](https://www.reddit.com/prefs/apps) and scroll all the way down.
- Click to create a new app, fill in the **name**, choose `script`, and fill in the **description** and **redirect uri**. The redirect URI is where the user is sent after they've granted OAuth access to your application (more info [here](https://github.com/reddit-archive/reddit/wiki/OAuth2)). For our purposes, you can enter any URL, e.g., www.google.com, as shown below.


    <img src="https://miro.medium.com/max/700/1*lRBvxpIe8J2nZYJ6ucMgHA.png" width="500"/>
- Jot down the `client_id` (upper left corner) and the `client_secret`.

    NOTE: `CLIENT_ID` refers to "personal use script" and `CLIENT_SECRET` to "secret".
    
    <div>
    <img src="https://miro.medium.com/max/700/1*7cGAKth1PMrEf2sHcQWPoA.png" width="300"/>
    </div>

- Create `secrets_reddit.py` in the same directory as this notebook and fill in the `client_id` and `client_secret` obtained in the last step. We will import these constants in the next step.
    ```
    REDDIT_API_CLIENT_ID = "client_id"
    REDDIT_API_CLIENT_SECRET = "client_secret"
    REDDIT_API_USER_AGENT = "any string except bot; ex. My User Agent"
    ```
- Add `secrets_reddit.py` to your `.gitignore` file if not already done. NEVER push credentials to a repo, private or public. 
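
As a quick safeguard before committing, the `.gitignore` step above can be checked programmatically. The sketch below is a minimal check that assumes the file is listed by its exact name; it does not implement git's full pattern matching (wildcards, negation), and the helper name `is_listed_in_gitignore` is hypothetical.

```python
def is_listed_in_gitignore(gitignore_text: str, filename: str) -> bool:
    """Return True if `filename` appears as its own entry in the .gitignore text.

    Assumption: only exact-name entries are recognized; wildcard and
    negation patterns that git itself supports are ignored here.
    """
    for line in gitignore_text.splitlines():
        entry = line.strip()
        # Skip blank lines and comment lines
        if not entry or entry.startswith("#"):
            continue
        # Accept "secrets_reddit.py" as well as the root-anchored "/secrets_reddit.py"
        if entry.lstrip("/") == filename:
            return True
    return False


sample = "# local credentials\nsecrets_reddit.py\n__pycache__/\n"
print(is_listed_in_gitignore(sample, "secrets_reddit.py"))  # True
```

For a definitive answer, `git check-ignore secrets_reddit.py` asks git itself.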

#### 3. Instantiate a `Reddit` object

Now you are ready to create a read-only `Reddit` instance. Refer to [documentation](https://praw.readthedocs.io/en/stable/code_overview/reddit_instance.html) when necessary.

In [1]:
import praw
import secrets_reddit

# Create a Reddit object which allows us to interact with the Reddit API
reddit = praw.Reddit(
    client_id = secrets_reddit.REDDIT_API_CLIENT_ID,
    client_secret = secrets_reddit.REDDIT_API_CLIENT_SECRET,
    user_agent = secrets_reddit.REDDIT_API_USER_AGENT,
)

In [2]:
print(reddit) 

<praw.reddit.Reddit object at 0x7ff04877a160>


<details>
<summary>Expected output:</summary>   

```<praw.reddit.Reddit object at 0x10f8a0ac0>```
</details>

#### 4. Instantiate a `subreddit` object

Lastly, create a `subreddit` object for your favorite subreddit and inspect the object. The expected outputs you will see are from `r/machinelearning` unless otherwise specified.

In [3]:
subreddit = reddit.subreddit("machinelearning")

What is the display name of the subreddit?

In [4]:
print(subreddit.display_name)

machinelearning


<details>
<summary>Expected output:</summary>   

    machinelearning
</details>

How about its title? Is it different from the display name?

In [5]:
print(subreddit.title)

Machine Learning


<details>
<summary>Expected output:</summary>   

    Machine Learning
</details>

Print out the description of the subreddit:

In [6]:
print(subreddit.description)

**[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
--------
+[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
--------
+[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
--------
+[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
--------
+[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ANews)
--------
***[@slashML on Twitter](https://twitter.com/slashML)***
--------
***[Chat with us on Slack](https://join.slack.com/t/rml-talk/shared_invite/enQtNjkyMzI3NjA2NTY2LWY0ZmRjZjNhYjI5NzYwM2Y0YzZhZWNiODQ3ZGFjYmI2NTU3YjE1ZDU5MzM2ZTQ4ZGJmOTFmNWVkMzFiMzVhYjg)***
--------
**Beginners:**
--------
Please have a look at [our FAQ and Link-Collection](http://www.reddit.com/r/MachineLearning/wiki/index)

[Metacademy](http://www.metacademy.org) is a great resource which compiles le

<details>
<summary>Expected output:</summary>

    **[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
    --------
    +[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
    --------
    +[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
    --------
    +[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
    --------
    +[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict
</details>

### Task II: Parse comments

#### 1. Top Posts of All Time

Find the titles of the top 10 posts of **all time** from your favorite subreddit. Refer to the [Obtain Submission Instances from a Subreddit section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) if necessary. Verify that the titles match what you see on Reddit.

In [7]:
# Try running this line. What do you see? Press q once you are done.
?subreddit.top
# The help text explains the subreddit.top method, including which
# parameters it accepts and that it returns a ListingGenerator.
# This comes in handy below for the question about
# the top 10 posts of this week.
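
The help text for `subreddit.top` mentions a `ListingGenerator`, which, like any Python iterator, is consumed as you loop over it. The sketch below illustrates that behavior with a plain generator as a stand-in (the function `fake_top` and its titles are hypothetical, not PRAW code); storing the items in a list lets you reuse them without iterating the listing again.

```python
def fake_top(limit):
    """Hypothetical stand-in for subreddit.top(limit=...): yields titles lazily."""
    for rank in range(1, limit + 1):
        yield f"post #{rank}"


listing = fake_top(limit=3)
first_pass = [title for title in listing]
second_pass = [title for title in listing]  # the generator is already exhausted

print(first_pass)   # ['post #1', 'post #2', 'post #3']
print(second_pass)  # []

# Materialize once, then reuse the list as often as needed
titles = list(fake_top(limit=3))
print(titles == first_pass)  # True
```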

In [8]:
for submission in subreddit.top(limit=10):
    print(submission.title)

[Project] From books to presentations in 10s with AR + ML
[D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
[R] First Order Motion Model applied to animate paintings
[N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
[D] This AI reveals how much time politicians stare at their phone at work
[D] Types of Machine Learning Papers
[D] The machine learning community has a toxicity problem
I made a robot that punishes me if it detects that if I am procrastinating on my assignments [P]
[Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
[P] Using oil portraits and First Order Model to bring the paintings back to life


<details> <summary>Expected output:</summary>

    [Project] From books to presentations in 10s with AR + ML
    [D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition
    [R] First Order Motion Model applied to animate paintings
    [N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this
    [D] This AI reveals how much time politicians stare at their phone at work
    [D] Types of Machine Learning Papers
    [D] The machine learning community has a toxicity problem
    [Project] NEW PYTHON PACKAGE: Sync GAN Art to Music with "Lucid Sonic Dreams"! (Link in Comments)
    [P] Using oil portraits and First Order Model to bring the paintings back to life
    [D] Convolution Neural Network Visualization - Made with Unity 3D and lots of Code / source - stefsietz (IG)    
</details>

#### 2. Top 10 Posts of This Week

What are the titles of the top 10 posts of **this week** from your favorite subreddit?

In [9]:
for submission in subreddit.top(limit=10, time_filter="week"):
    print(submission.title)

[P] Simple fastai based face restoration project, GitHub link in comments.
[R] SIMPLERECON — 3D Reconstruction without 3D Convolutions — 73ms per frame !
[P] pytorch's Newest nvFuser, on Stable Diffusion to make your favorite diffusion model sample 2.5 times faster (compared to full precision) and 1.5 times faster (compared to half-precision)
[P] Cozy Auto Texture - A Blender add-on that allows you to generate free textures with Stable Diffusion
[D] How do you find your collaborators in AI research?
[P] Teach new concepts to Stable Diffusion with 3-5 images only - and browse a library of learned concepts to use with a gradio demo in colab
[P] Stable Diffusion web UI with Outpainting, Inpainting, Prompt matrix, Upscale, Textual Inversion and many more features
[N] NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut
[D] Most Popular AI Research August 2022 - Ranked By Twitter Likes
[N] Stable Diffusion Image Variations released, allows you to do variations like DALLE-2


<details><summary>Expected output:</summary>

    [N] Ian Goodfellow, Apple’s director of machine learning, is leaving the company due to its return to work policy. In a note to staff, he said “I believe strongly that more flexibility would have been the best policy for my team.” He was likely the company’s most cited ML expert.
    [R][P] Thin-Plate Spline Motion Model for Image Animation + Gradio Web Demo
    [P] I’ve been trying to understand the limits of some of the available machine learning models out there. Built an app that lets you try a mix of CLIP from Open AI + Apple’s version of MobileNet, and more directly on your phone's camera roll.
    [R] Meta is releasing a 175B parameter language model
    [N] Hugging Face raised $100M at $2B to double down on community, open-source & ethics
    [P] T-SNE to view and order your Spotify tracks
    [D] : HELP Finding a Book - A book written for Google Engineers about foundational Math to support ML
    [R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet
    [D] Do you use NLTK or Spacy for text preprocessing?
    [D] Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
</details>

💽❓ Data Question:

Check out what other attributes the `praw.models.Submission` class has in the [docs](https://praw.readthedocs.io/en/stable/code_overview/models/submission.html). 

1. After having a chance to look through the docs, is there any other information that you might want to extract? How might this additional data help you?

Write a sample piece of code below that extracts three additional pieces of information from a submission.

In [10]:
# Here is what I did. I went for the top 5 of all time posts/submissions
# on the Machine Learning (ML) subreddit and then added in this order their
# author, title, score (which mimics the number of upvotes), number of
# comments and finally the URL in case I wanted to quickly access and
# read them.
# An interesting (and unexpected) insight for me is that currently the 
# author "TheInsaneApp" has the #2, #4 and #5 ML posts of all time!

for submission in subreddit.top(limit=5):

    print("Author:", submission.author, "|", "Title:", 
          submission.title, "|", "Score:", submission.score, 
          "|", "Number of comments:", submission.num_comments,
         "|", "URL:", submission.url)
    print()

Author: cyrildiagne | Title: [Project] From books to presentations in 10s with AR + ML | Score: 7436 | Number of comments: 184 | URL: https://v.redd.it/v492uoheuxx41

Author: TheInsaneApp | Title: [D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition | Score: 5650 | Number of comments: 128 | URL: https://v.redd.it/25nxi9ojfha61

Author: programmerChilli | Title: [R] First Order Motion Model applied to animate paintings | Score: 4665 | Number of comments: 110 | URL: https://v.redd.it/rlmmjm1q5wu41

Author: TheInsaneApp | Title: [N] AI can turn old photos into moving Images / Link is given in the comments - You can also turn your old photo like this | Score: 4578 | Number of comments: 230 | URL: https://v.redd.it/ikd5gjlbi8k61

Author: TheInsaneApp | Title: [D] This AI reveals how much time politicians stare at their phone at work | Score: 4344 | Number of comments: 228 | URL: https://i.redd.it/34sgziebfia71.jpg
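
When more than a couple of attributes are of interest, collecting them into dictionaries keeps the data easy to sort, filter, or load into a table later. A minimal sketch follows, using the same attributes printed above; the `SimpleNamespace` objects are made-up stand-ins for `praw.models.Submission` so the snippet runs without credentials, and `submission_to_record` is a hypothetical helper.

```python
from types import SimpleNamespace


def submission_to_record(submission):
    """Copy selected attributes of a submission-like object into a dict."""
    # The author may be None (assumed here to cover, e.g., deleted accounts)
    author = getattr(submission, "author", None)
    return {
        "author": None if author is None else str(author),
        "title": submission.title,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "url": submission.url,
    }


# Made-up stand-ins, not real Reddit data
fake_submissions = [
    SimpleNamespace(author="alice", title="[D] Example A", score=10,
                    num_comments=3, url="https://example.com/a"),
    SimpleNamespace(author=None, title="[P] Example B", score=25,
                    num_comments=7, url="https://example.com/b"),
]

records = [submission_to_record(s) for s in fake_submissions]
by_score = sorted(records, key=lambda r: r["score"], reverse=True)
print([r["title"] for r in by_score])  # ['[P] Example B', '[D] Example A']
```

With real data, `[submission_to_record(s) for s in subreddit.top(limit=5)]` would produce the same structure.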



💽❓ Data Question:

2. Is there any information available that might be a concern when it comes to Ethical Data?

In [11]:
# Potentially yes. As we've just seen, a lot of the data on Reddit
# is public and easily accessible and analyzable via APIs.
# That in itself in my view poses two possible risks:
# 1) That users inadvertently and naively post private information
# about themselves that they clearly shouldn't and
# 2) That confidential, classified and/or intellectual property- (IP)
# protected information is relayed as if it were public information
# even when it clearly isn't.

#### 3. Comment Code

Add comments to the code block below to describe what each line of the code does (refer to the [Obtain Comment Instances section](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html) when necessary). The code is adapted from [this tutorial](https://praw.readthedocs.io/en/stable/tutorials/comments.html).

The purpose is 
1. to understand what the code is doing 
2. to start commenting your code whenever it is not self-explanatory, if you have not already (others will thank you, YOU will thank you later 😊) 

In [12]:
%%time
from praw.models import MoreComments

# Create variable "top_comments" and assign an empty list to it.
top_comments = []

# As before, assign the Machine Learning subreddit to the subreddit
# variable
subreddit = reddit.subreddit("machinelearning")

# Iterate through the top 10 submissions of all time in the subreddit
for submission in subreddit.top(limit=10):
    # Iterate through the top-level comments of the current submission
    for top_level_comment in submission.comments:
        # Skip MoreComments placeholders: they are not actual comments
        if isinstance(top_level_comment, MoreComments):
            continue
        # Append (i.e. add) the body of the top-level comment to
        # the list called "top_comments"
        top_comments.append(top_level_comment.body)

CPU times: user 575 ms, sys: 35.7 ms, total: 611 ms
Wall time: 15.1 s
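
The PRAW comments tutorial also describes `replace_more`, which removes or expands `MoreComments` placeholders before iterating, so the `isinstance` check is no longer needed. The sketch below wraps that idea in a function; `FakeForest` and the sample data are hypothetical stand-ins that mimic only the small part of PRAW's comment forest used here, so the snippet runs offline.

```python
from types import SimpleNamespace


def collect_top_level_bodies(submissions):
    """Return the body of every top-level comment across the given submissions."""
    bodies = []
    for submission in submissions:
        # Per the PRAW tutorial, limit=0 removes all MoreComments placeholders
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            bodies.append(comment.body)
    return bodies


class FakeForest(list):
    """Hypothetical stand-in for a PRAW comment forest (a list plus replace_more)."""

    def replace_more(self, limit=32):
        # Mimic limit=0 only: keep the items that carry a body
        self[:] = [item for item in self if hasattr(item, "body")]


fake_submission = SimpleNamespace(
    comments=FakeForest([
        SimpleNamespace(body="Great demo"),
        SimpleNamespace(count=12),  # plays the role of a MoreComments placeholder
        SimpleNamespace(body="Source code?"),
    ])
)
print(collect_top_level_bodies([fake_submission]))  # ['Great demo', 'Source code?']
```

With a real `Reddit` instance, the call would be `collect_top_level_bodies(subreddit.top(limit=10))`.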


#### 4. Inspect Comments

How many comments did you extract from the last step? Examine a few comments. 

In [13]:
len(top_comments)  # the answer may vary; e.g., 693 for r/machinelearning

742

In [14]:
import random

[random.choice(top_comments) for _ in range(3)]

['Hahahaha. \n\nBottom left corner is why I left ML reasearch. What was ridiculous was CVPR actually accepting the <1% improvements.',
 'Well, I found some statements here are actually incorrect or superficial. For example, you cannot simply draw a conclusion based on a single BERT paper without much context, and do not consider a lot of confounding factors (e.g. its results are much better than others). If you just want to reason by a single example, why not look at the two concurrent papers of VAE, [one](https://arxiv.org/abs/1312.6114) from Universiteit van Amsterdam which is cited \\~10K times, [the other](https://arxiv.org/abs/1401.4082) from Deepmind which is cited <3K. Can you draw an opposite conclusion from this?',
 "That's awesome dude! I love it."]

<details> <summary>Some of the comments from `r/machinelearning` subreddit are:</summary>

    ['Awesome visualisation',
    'Similar to a stack or connected neurons.',
    'Will this Turing pass the Turing Test?']
</details>

💽❓ Data Question:

3. After having a chance to review a few samples of 5 comments from the subreddit, what can you say about the data? 

HINT: Think about the "cleanliness" of the data, the content of the data, think about what you're trying to do - how does this data line up with your goal?

In [15]:
# My first reaction is to think that the "data quality" is uneven.
# In other words, and remembering this is a voluntary, most often 
# anonymous forum of people talking about machine learning,
# there is extra helpful stuff but also day-to-day pleasantries
# not to mention low-value content that might not even be directly 
# related to machine learning.
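
One way to act on the uneven quality is a light cleaning pass before running a model. The sketch below is only one possible choice and not part of the assignment: it replaces markdown links with their link text, collapses whitespace, and drops comments shorter than a minimum length. The function names and the threshold of 3 characters are assumptions.

```python
import re

# Matches markdown links of the form [text](url)
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\([^)]+\)")


def clean_comment(text: str) -> str:
    """Replace markdown links with their visible text and collapse whitespace."""
    text = MARKDOWN_LINK.sub(r"\1", text)
    return " ".join(text.split())


def clean_comments(comments, min_chars=3):
    """Clean each comment and drop those shorter than min_chars (assumed threshold)."""
    cleaned = (clean_comment(c) for c in comments)
    return [c for c in cleaned if len(c) >= min_chars]


raw = [
    "Hahahaha. \n\nBottom left corner is why I left ML research.",
    "See [one](https://arxiv.org/abs/1312.6114) example",
    "ok",
]
print(clean_comments(raw))
```

Very short comments can still carry sentiment, so the length threshold is a trade-off rather than a rule.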

#### 5. Extract Top-Level Comments from Subreddit `TSLA`

Write code to extract the top-level comments from the top 10 posts of a time period, e.g., a year, from the subreddit `TSLA` and store them in a list `top_comments_tsla`.  

In [16]:
%%time
from praw.models import MoreComments

# Create variable "top_comments_tsla" and assign an empty list to it.
top_comments_tsla = []

# Assign the TSLA subreddit to the subreddit variable
subreddit = reddit.subreddit("TSLA")

# Iterate through the top 10 submissions
# of this week in the subreddit
for submission in subreddit.top(limit=10, time_filter="week"):
    # Iterate through the top-level comments of the current submission
    for top_level_comment in submission.comments:
        # Skip MoreComments placeholders: they are not actual comments
        if isinstance(top_level_comment, MoreComments):
            continue
        # Append (i.e. add to the end) the body of the top-level comment to
        # the list called top_comments_tsla
        top_comments_tsla.append(top_level_comment.body)

CPU times: user 85.8 ms, sys: 8.7 ms, total: 94.5 ms
Wall time: 1.48 s


In [17]:
len(top_comments_tsla) # Expected: 174 (for r/TSLA; the answer may vary)

32

In [18]:
[random.choice(top_comments_tsla) for _ in range(3)]

['$169.04 X70', '$207/ 1004 shares', '165@$114 I think']

<details>
<summary>Some of the comments from `r/TSLA` subreddit:</summary>

    ['I bought puts',
    '100%',
    'Yes. And I’m bag holding 1200 calls for Friday and am close to throwing myself out the window']
</details>

💽❓ Data Question:

4. Now that you've had a chance to review another subreddit's comments, do you see any differences in the kinds of comments each subreddit has, and how might this relate to bias?

In [19]:
# This subreddit seems to be even more focused than the Machine Learning one.
# It's largely about company- and stock-related information.
# I saw less dispersion here. It's, again, much more focused.
# I've also noticed the ML one has 2.5 million members and this one
# fewer than 14K. This in turn produces fewer comments since there is 
# less activity in absolute terms.
# Regarding bias, the content is not representative of all the possible
# Tesla investors of the world, obviously. It represents the group
# of folks who chose to join this subreddit/forum.
# Thus "buyer" (in this case meaning reader) beware.

### Task III: Sentiment Analysis

Let us analyze the sentiment of comments scraped from `r/TSLA` using a pre-trained HuggingFace model to make the inference. Take a [Quick tour](https://huggingface.co/docs/transformers/quicktour). 

#### 1. Import `pipeline`

In [20]:
from transformers import pipeline

#### 2. Create a Pipeline to Perform Task "sentiment-analysis"

In [21]:
sentiment_model = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


#### 3. Get one comment from list `top_comments_tsla` from Task II - 5.

In [22]:
comment = random.choice(top_comments_tsla)

In [23]:
print(comment)

£103


The example comment is: `'Bury Burry!!!!!'`. Print out what you get. For reproducibility, use the same comment in the next step; consider setting a seed.
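
To make the random pick repeatable, one option is a dedicated `random.Random` instance with a fixed seed. The seed value 42 and the sample list below are arbitrary placeholders; in the notebook you would pass `top_comments_tsla` instead.

```python
import random

# Placeholder data; in the notebook, use top_comments_tsla instead
sample_comments = ["I bought puts", "100%", "Bury Burry!!!!!"]

rng = random.Random(42)  # fixed seed: same sequence of picks on every run
comment = rng.choice(sample_comments)

# A second generator with the same seed makes the same first pick
assert random.Random(42).choice(sample_comments) == comment
print(comment)
```

The pick also depends on the contents and order of the list, so the same seed returns the same comment only as long as the scraped list is unchanged.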

#### 4. Make Inference!

In [24]:
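# Pass the variable, not the string "comment"; the output recorded below came from the literal string "comment".
sentiment = sentiment_model(comment)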
print(sentiment)

[{'label': 'POSITIVE', 'score': 0.9425747394561768}]


What is the type of the output `sentiment`?

```
YOUR ANSWER HERE
```

In [25]:
# A list with one dict per input, here: [{'label': 'POSITIVE', 'score': 0.9425747394561768}]

In [26]:
print(f'The comment: {comment}')
print(f'Predicted Label is {sentiment[0]["label"]} and the score is {sentiment[0]["score"]:.3f}')

The comment: £103
Predicted Label is POSITIVE and the score is 0.943


For the example comment, the output is:

    The comment: Bury Burry!!!!!
    Predicted Label is NEGATIVE and the score is 0.989

🖥️❓ Model Question:

1. What does the score represent?

In [27]:
# The probability (from 0 to 1) that the model assigns to the predicted label (positive or negative).
# It expresses the model's confidence, not a guarantee that the prediction is correct.
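
As a sketch of where such a score comes from, assume the pipeline applies a softmax over the model's two output logits, one per label. The logits below are made-up values for illustration, not output from the actual model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [-1.2, 1.6]  # made-up values for illustration

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
print({"label": labels[best], "score": round(probs[best], 3)})
```

A score close to 0.5 means the model is nearly undecided between the two labels.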

### Task IV: Put All Together

Let's pull all the pieces together and create a simple script that 

- gets the subreddit
- gets comments from the top posts of the given subreddit
- runs sentiment analysis 

#### Complete the Script

Once you complete the code, running the following block writes it into a new Python script and saves it as `top_tlsa_comment_sentiment.py` in the same directory as the notebook. 

In [85]:
%%writefile top_tlsa_comment_sentiment.py

import secrets_reddit
import random

from typing import Dict, List

from praw import Reddit
from praw.models.reddit.subreddit import Subreddit
from praw.models import MoreComments

from transformers import pipeline


def get_subreddit(display_name:str) -> Subreddit:
    """Get subreddit object from display name

    Args:
        display_name (str): Display name of the subreddit, e.g. "TSLA"

    Returns:
        Subreddit: PRAW subreddit object for the given display name
    """    
    reddit = Reddit(
        client_id = secrets_reddit.REDDIT_API_CLIENT_ID,
        client_secret = secrets_reddit.REDDIT_API_CLIENT_SECRET,
        user_agent = secrets_reddit.REDDIT_API_USER_AGENT,
        )
    # Pass the variable, not the string literal "display_name"
    subreddit = reddit.subreddit(display_name)
    return subreddit

def get_comments(subreddit:Subreddit, limit:int=3) -> List[str]:
    """Get top-level comments from the top posts of a subreddit

    Args:
        subreddit (Subreddit): Subreddit to read from
        limit (int, optional): Number of top posts to scan. Defaults to 3.

    Returns:
        List[str]: Bodies of the top-level comments
    """
    top_comments = []
    for submission in subreddit.top(limit=limit):
        for top_level_comment in submission.comments:
            if isinstance(top_level_comment, MoreComments):
                continue
            top_comments.append(top_level_comment.body)
    return top_comments

def run_sentiment_analysis(comment:str) -> Dict:
    """Run sentiment analysis on comment using default distilbert model
    
    Args:
        comment (str): Text of the comment to classify
        
    Returns:
        Dict: Sentiment analysis result with keys "label" and "score"
    """
    sentiment_model = pipeline("sentiment-analysis")
    sentiment = sentiment_model(comment)
    return sentiment[0]


if __name__ == '__main__':
    subreddit = get_subreddit("TSLA")
    comments = get_comments(subreddit)
    comment = random.choice(comments)
    sentiment = run_sentiment_analysis(comment)
    
    print(f'The comment: {comment}')
    print(f'Predicted Label is {sentiment["label"]} and the score is {sentiment["score"]:.3f}')

Overwriting top_tlsa_comment_sentiment.py


Run the following block to see the output.

In [84]:
!python top_tlsa_comment_sentiment.py

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The comment: What
Predicted Label is POSITIVE and the score is 0.994


<details><summary> Expected output:</summary>

    No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
    The comment: When is DOGE flying
    Predicted Label is POSITIVE and the score is 0.689
</details>
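
Since the script is run from the command line, the subreddit name and the number of posts could be passed as arguments instead of being hard-coded. A minimal sketch with `argparse` follows; the option names `--subreddit` and `--limit` are hypothetical additions, not part of the script above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two (hypothetical) command-line options."""
    parser = argparse.ArgumentParser(
        description="Sentiment of a random top-level comment from a subreddit"
    )
    parser.add_argument("--subreddit", default="TSLA", help="subreddit display name")
    parser.add_argument("--limit", type=int, default=3, help="number of top posts to scan")
    return parser


# Parsing an explicit list here; in a script, parse_args() reads sys.argv instead
args = build_parser().parse_args(["--subreddit", "machinelearning", "--limit", "5"])
print(args.subreddit, args.limit)  # machinelearning 5
```

In the script's `__main__` block, the parsed values would feed `get_subreddit(args.subreddit)` and `get_comments(subreddit, limit=args.limit)`, and the call would look like `python top_tlsa_comment_sentiment.py --subreddit TSLA --limit 3`.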

💽❓ Data Question:

5. Is the subreddit active? About how many posts or threads per day? How could you find this information?

In [110]:
# The TSLA subreddit is somewhat active. As a quick reference,
# I pulled up the top submissions of the last month (see the code and output below)
# and found a total of 52 submissions, i.e. fewer than 2 a day on average.
# Furthermore, the score and number of comments per submission are not
# particularly high either.


In [109]:
for submission in subreddit.top(time_filter="month"):

    print("Author:", submission.author, "|", "Title:", 
          submission.title, "|", "Score:", submission.score, 
          "|", "Number of comments:", submission.num_comments,
         "|", "URL:", submission.url)
    print()

Author: IhateFARTINGatWORK | Title: Happy Split day everyone.. | Score: 38 | Number of comments: 37 | URL: https://www.reddit.com/r/TSLA/comments/wwjg5g/happy_split_day_everyone/

Author: droneauto | Title: Tesla says Autopilot is preventing ~40 crashes per day from wrong pedal errors alone | Score: 28 | Number of comments: 6 | URL: https://www.reddit.com/r/TSLA/comments/wuwum2/tesla_says_autopilot_is_preventing_40_crashes_per/

Author: localbrada | Title: Tesla battery fire and crash were faked by insuranc | Score: 24 | Number of comments: 2 | URL: https://electrek.co/2022/08/30/tesla-battery-fire-crash-faked-insurance-company-bizarre-showcase/

Author: wewewawa | Title: Tesla (TSLA) is facing ‘unprecedented demand’ | Score: 25 | Number of comments: 4 | URL: https://electrek.co/2022/08/31/tesla-tsla-facing-unprecedented-demand/

Author: localbrada | Title: Tesla Sues to Sell Cars Directly to Consumers in Louisiana | Score: 24 | Number of comments: 5 | URL: https://www.wsj.com/articles
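
The rough estimate above (52 submissions in a month) can also be computed from timestamps. The sketch below assumes each submission exposes `created_utc` (seconds since the epoch, as documented for PRAW submissions) and uses made-up timestamps so it runs offline; `posts_per_day` and `observed_span_days` are hypothetical helpers.

```python
SECONDS_PER_DAY = 86_400


def posts_per_day(created_utc_values, window_days):
    """Average number of posts per day over a window of known length."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return len(created_utc_values) / window_days


def observed_span_days(created_utc_values):
    """Days between the oldest and the newest timestamp."""
    return (max(created_utc_values) - min(created_utc_values)) / SECONDS_PER_DAY


# Made-up timestamps: 52 posts spread evenly over 30 days
start = 1_660_000_000
fake_timestamps = [start + i * (30 * SECONDS_PER_DAY) // 51 for i in range(52)]

print(round(posts_per_day(fake_timestamps, window_days=30), 2))  # 1.73
print(round(observed_span_days(fake_timestamps), 1))             # 30.0
```

With real data, the timestamps would come from `[s.created_utc for s in subreddit.top(time_filter="month")]`. A listing returns only up to its `limit`, so a count close to that cap would understate the actual activity.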

💽❓ Data Question:

6. Does there seem to be a large distribution of posters or a smaller concentration of posters who are very active? What kind of impact might this have on the data?

In [None]:
# Using the 52 submissions from the past month listed above as a reference,
# 2 posters account for about 40% of all submissions.
# Here are their names and number of submissions in the last month:
# wewewawa (13) and localbrada (8).

# This certainly increases the chance that the data will be biased since the
# conversation is largely influenced by very few people.
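
The concentration quoted above can be computed with `collections.Counter`. The sketch below uses a synthetic author list that mirrors the counts in the answer (13 and 8 posts out of 52); the filler names `user_0` ... `user_30` are made up, and `top_poster_share` is a hypothetical helper.

```python
from collections import Counter


def top_poster_share(authors, k=2):
    """Return the k most frequent authors and their combined share of all posts."""
    if not authors:
        return [], 0.0
    counts = Counter(authors)
    top = counts.most_common(k)
    share = sum(n for _, n in top) / len(authors)
    return top, share


# Synthetic list mirroring the counts quoted above: 13 + 8 posts out of 52
authors = ["wewewawa"] * 13 + ["localbrada"] * 8 + [f"user_{i}" for i in range(31)]

top, share = top_poster_share(authors)
print(top)             # [('wewewawa', 13), ('localbrada', 8)]
print(f"{share:.0%}")  # 40%
```

With real data, the list would come from `[str(s.author) for s in subreddit.top(time_filter="month")]`.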