# Ethical Moderation Lab

## Introduction
Content Moderation is a subtopic within Computer Science ethics that has gained traction since the rise of popular social media platforms. Successful platforms such as Twitter, Reddit, and Quora have produced a space where everyone is allowed to voice their opinions on any topic.

Today, we will explore content moderation from a Computer Science perspective, along with the delicate issues that arise from too much or too little moderation.


## Lab Background
__Sport-It__ is a popular social media platform which is known for its variety of communities, all of which relate to a specific sport. For example, there is a Sport-It community for the NBA, NFL, NHL, and many more. 

The moderators at Sport-It have decided that they want to create an environment where users only post about sports, and not controversial topics that may be harmful or too political. Today, you will be helping the moderation team by creating an algorithm that will __flag all posts not directly related to sports__.

Through this lab, you will be tasked to __create an accurate machine learning algorithm__, while also questioning your own biases that may appear as you go through the lab. You will also be challenged to __think about many different edge cases__. For example, do you believe that a post harshly criticizing Colin Kaepernick should be deemed an on-topic post on Sport-It? *__And is there a right or wrong answer?__*


***

# Part 1: "Naive" Moderation

## The Naive Bayes Classifiers Algorithm
Let’s take a moment to discuss the algorithm we are going to use to properly calculate our likelihood probabilities for on-topic and off-topic posts on Sport-It. We will be using the Naive Bayes Classifiers algorithm.

The Naive Bayes algorithm is a classification technique that assigns a label to an object based on prior probabilities and feature probabilities. In today’s lab, our Naive Bayes algorithm will assign on-topic or off-topic labels to posts, depending on how probable the words in the post are under each label.

There are __two types of probabilities__ to look out for: the __prior probability__ and the __feature probability__. Let’s go over what each one means, and how to calculate them.

__Prior Probability__, in this case, is the __probability that the post is an on-topic/off-topic post__. Mathematically, it would look like this: <br>
$$P(\text{Label} = \text{"y"}) = \frac{k + \text{count}(\text{on-topic posts})}{2k + \text{count}(\text{posts})}$$

__Feature Probability__, in this case, is the __probability that a specific word appears in an on-topic/off-topic label__. Mathematically, it would look like this: <br>
$$P(\text{Word} = \text{"election"} \mid \text{Label} = \text{"n"}) = \frac{k + \text{count}(\text{off-topic posts containing "election"})}{2k + \text{count}(\text{off-topic posts})}$$

You may have noticed a constant ${k}$ appearing in the formulas above. If a word never appears under a label in the training data, its unsmoothed feature probability would be 0, and a single zero factor would wipe out the entire product of probabilities for that label. Adding ${k}$ to the numerator and $2{k}$ to the denominator fixes this issue. This technique is called [Laplace Smoothing](https://towardsdatascience.com/laplace-smoothing-in-na%C3%AFve-bayes-algorithm-9c237a8bdece#:~:text=Laplace%20smoothing%20is%20a%20smoothing%20technique%20that%20helps%20tackle%20the,the%20positive%20and%20negative%20reviews.).
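
To make the two formulas concrete, here is a minimal sketch on a tiny made-up dataset. The posts and the helper names `smoothed_prior` and `smoothed_feature` are illustrative assumptions and are not part of the lab's `data_tools` helpers, which may count words differently.

```python
# Toy data: (label, title) pairs invented for illustration only.
toy_posts = [
    ('y', 'lakers win the game'),
    ('y', 'great game last night'),
    ('n', 'election results announced'),
]

def smoothed_prior(label, posts, k=1):
    """P(Label=label) with Laplace smoothing over two labels."""
    count_label = sum(1 for lbl, _ in posts if lbl == label)
    return (k + count_label) / (2 * k + len(posts))

def smoothed_feature(word, label, posts, k=1):
    """P(Word=word | Label=label): smoothed share of `label` posts containing `word`."""
    label_posts = [title.split() for lbl, title in posts if lbl == label]
    count_with_word = sum(1 for words in label_posts if word in words)
    return (k + count_with_word) / (2 * k + len(label_posts))

print(smoothed_prior('y', toy_posts))                # (1 + 2) / (2 + 3)
print(smoothed_feature('election', 'n', toy_posts))  # (1 + 1) / (2 + 1)
print(smoothed_feature('election', 'y', toy_posts))  # (1 + 0) / (2 + 2), small but not 0
```

Without the constant ${k}$, the last value would be exactly 0, and any on-topic post mentioning "election" would be scored as impossible.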

## Using the Training Data
We must use training data to “train” our model. We will use sample data from past posts on Sport-It, alongside pre-existing labels, to train our model. Simply put, each post will be given a label: “on-topic” or “off-topic”.

__Let’s start by importing any needed libraries:__

In [None]:
from lab import __version__
from math import exp, log
from lab import data_tools, display

print('Content filtering lab, version', __version__)

A training dataset is already provided. Each datapoint contains __text (the post title)__, and a label __y/n (whether the post is on-topic/off-topic)__. Here is an example:
<blockquote>y: Musgrove throws first no-hitter in Padres history.</blockquote>
Here, the post about Musgrove is considered on-topic, as it fully pertains to a sport.

Let’s start by parsing through our data and assigning the parsed data to variable *training_data*:

In [None]:
training_data = data_tools.parse_data('./data/politics.csv', './data/espn.txt', limit=10)

Now that we have parsed our data, we can display it using the provided display helper methods.

As you look through the dataset, ask yourself: __Do you agree with the labels provided? What would you change?__ Take a moment to discuss with your group. Remember, most edge cases have no right or wrong answer.

In [None]:
display.display_labelled_data(training_data)

Notice that in the code below, each datapoint in the list passed into the `extend` function has a placeholder label __“y/n”__. __Your group should assign these labels. Discuss with your group, and decide on a label for each datapoint.__ Remember, there is no right or wrong answer!

In [None]:
# TODO: Assign each title in each datapoint a label "y" or "n".
# Discuss within your group what you think the labels should be, and why.
training_data.extend([
    ('y/n', 'Deshaun Watson Admits Encounters with Masseuses, Always Consensual'),
    ('y/n', 'Bruce Jenner is actually getting the Arthur Ashe Courage Award?'),
    ('y/n', 'The real Tim Tebow: anti gay, anti choice, and a very unexceptional QB who owes a great deal to a teammate who is very good at kicking long fieldgoals when Tim cant get near the red zone.'),
    ('y/n', 'TIL All NFL players have to do their physicals completely nude and are often nude for over an hour and a half. Many players have fears they were videotaped.'),
    ('y/n', 'Broncos Brandon Marshall kneels during national anthem, follows Colin Kaepernick’s path'),
    ('y/n', 'Megan Rapinoe says ‘not many, if any’ US womens soccer player’s would attend White House'),
    ('y/n', 'With the upcoming Thursday night NFL game, remember that this presents a simplified view of an entire culture, caricatures facial features based on race, depicts an outdated/inaccurate style of headdress, paints them as warmongering aggressors and overly glamorizes the violent side of their history.')
])

display.display_labelled_data(training_data)

## Calculating Probabilities Using The Training Data
Now that we have parsed our data, we can put it to use. We will calculate the prior probability of each label (on-topic and off-topic), and the feature probability of every word that appears in each group of posts.

Recall that the prior probability is simply the probability of a certain label appearing in the training data. As a reminder, the equation looks like this:

$$P(\text{Label} = \text{"y"}) = \frac{k + \text{count}(\text{on-topic posts})}{2k + \text{count}(\text{posts})}$$

Let’s go ahead and look at the _prior_probabilities_ function. __Keep in mind that ${k} = 1$.__ We have provided a helper class with methods to deal with the actual calculations.


In [None]:
training_data_statistics = data_tools.DataStats(training_data)

In [None]:
def prior_probabilities(label):
    """
    Function input: label
    Global/implicit input: training_data_statistics (DataStats object)
    Output: P(Label=label)
    """
    k = 1
    num_invalid = len(training_data_statistics.invalid_posts)
    num_valid = len(training_data_statistics.valid_posts)
    num_total = training_data_statistics.num_posts
    if label == 'y':
        return (k + num_valid) / (2*k + num_total)
    elif label == 'n':
        return (k + num_invalid) / (2*k + num_total)
    else:
        raise KeyError('Unsupported label: {}'.format(label))
print(prior_probabilities('n'))
print(prior_probabilities('y'))

We now have the prior probabilities of both the on-topic and off-topic posts. In order to properly label posts on Sport-It as on-topic or off-topic, we also need to calculate the feature probabilities. That is, __the probability of each word appearing in each group of posts: on-topic and off-topic.__

Let’s use the _word_given_label_probability_ function, which looks up how often a word appears under a given label and calculates its feature probability.

In [None]:
def word_given_label_probability(word, label):
    """
    Function input: word, label
    Global/implicit input: training_data_statistics (DataStats object)
    Output: P(word | Label=label)
    """
    if label == 'y':
        # Divide by the on-topic word total. This assumes DataStats exposes
        # total_valid_words, mirroring total_invalid_words.
        return training_data_statistics.valid_counter[word] / training_data_statistics.total_valid_words
    elif label == 'n':
        return training_data_statistics.invalid_counter[word] / training_data_statistics.total_invalid_words
    else:
        raise KeyError('Unsupported label: {}'.format(label))

## Predicting Labels for Posts
Time to see if our algorithm and model work! We now have the prior probabilities, as well as all the feature probabilities, so we can predict what label a post should have. For example, if we were given the following post:
<blockquote>The game last night was absolutely terrible!</blockquote>

We should be able to __determine this post's label by comparing its likelihood scores for being on-topic and off-topic__:

<blockquote>${P}$(Label='<font color='green'>y</font>'|AllWords) $\propto$ ${P}$(Label='<font color='green'>y</font>') $* {P}$(Word='<font color='blue'>The</font>'|Label='<font color='green'>y</font>') $* {P}$(Word='<font color='blue'>game</font>'|Label='<font color='green'>y</font>') $* {P}$(Word='<font color='blue'>last</font>'|Label='<font color='green'>y</font>') $*$ ...</blockquote>

Once you have the score for the post being on-topic and the score for it being off-topic given the words in the post, __choose the label with the higher of the two scores to classify that post__.
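
Multiplying many small probabilities can underflow to 0 in floating point, so a common approach is to compare sums of logarithms instead. The sketch below shows the idea with made-up probabilities; the dictionaries `prior` and `word_prob` and the helpers `log_score` and `predict` are hypothetical and stand in for the values computed from the training data.

```python
from math import log

# Hypothetical probabilities, invented for illustration only.
prior = {'y': 0.6, 'n': 0.4}
word_prob = {
    'y': {'the': 0.5, 'game': 0.4, 'last': 0.2},
    'n': {'the': 0.5, 'game': 0.05, 'last': 0.1},
}

def log_score(words, label):
    """log P(Label=label) plus the sum of log P(word | Label=label)."""
    return log(prior[label]) + sum(log(word_prob[label][w]) for w in words)

def predict(words):
    """Return the label with the larger log score."""
    return max(('y', 'n'), key=lambda label: log_score(words, label))

print(predict(['the', 'game', 'last']))  # 'y', since 0.024 > 0.001 before taking logs
```

Because the logarithm is increasing, the label with the larger log score is also the label with the larger product.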

Create a function _post_validity_ that __returns the predicted label assigned to the submission post__:

In [None]:
def post_validity(submission, threshold=0):
    """
    Input: A particular submission (title of a post)
    Threshold: half-width of the band around 0 in which a post is marked for manual review
    Output: 'y', 'n', or '?' (manual review)
    """
    word_arr = data_tools.preprocess_submission(submission)
    # Sum logs instead of multiplying probabilities to avoid floating-point underflow.
    sum_log_word_given_valid = 0
    sum_log_word_given_invalid = 0
    for word in word_arr:
        word_given_y = word_given_label_probability(word, 'y')
        word_given_n = word_given_label_probability(word, 'n')
        # Skip zero probabilities, since log(0) is undefined.
        if word_given_y > 0:
            sum_log_word_given_valid += log(word_given_y)
        if word_given_n > 0:
            sum_log_word_given_invalid += log(word_given_n)
    # Positive log ratio favors 'y'; negative favors 'n'.
    log_ratio = log(prior_probabilities('y')) - log(prior_probabilities('n')) + sum_log_word_given_valid - sum_log_word_given_invalid
    if log_ratio < -threshold:
        return 'n'
    elif log_ratio > threshold:
        return 'y'
    else:
        # Too close to call: manual review
        return '?'
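
One way to read the `threshold` parameter is as a band around zero: log ratios inside the band are too close to call and go to manual review. The sketch below isolates that decision step; `decide` is a hypothetical helper, not part of the lab code, and the scores are made up.

```python
def decide(log_ratio, threshold=0):
    """Map a log-odds score to 'y', 'n', or '?' (manual review)."""
    if log_ratio < -threshold:
        return 'n'
    if log_ratio > threshold:
        return 'y'
    return '?'

# With threshold=0.5, only scores outside [-0.5, 0.5] get an automatic label.
for score in (-2.0, -0.3, 0.3, 2.0):
    print(score, decide(score, threshold=0.5))
```

A larger threshold sends more posts to human moderators, trading automation for fewer confident mistakes.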

The code below returns a list of (label, title) tuples based on the probabilities we found before. __With your group, quickly discuss the code and what it does__.

In [None]:
from lab.data_tools import parse_unlabeled_reddit_feed, parse_unlabeled_espn
espn_data = parse_unlabeled_espn('./data/test/espn.txt', limit=100)
politics_data = parse_unlabeled_reddit_feed('./data/test/politics.txt', limit=100)

testing_data = espn_data + politics_data
solution = [('y', e) for e in espn_data] + [('n', p) for p in politics_data]

def filter_posts(posts):
    """
    Input: array of posts to filter WITHOUT labels (see output of parse_unlabeled_espn/reddit_feed)
    Output: array of posts as tuples (label, post title), see output of parse_data
    """
    # your code here!
    result = []
    # Iterate over the argument, not the global testing_data.
    for submission in posts:
        validity = post_validity(submission)
        result.append((validity, submission))

    return result
filtering_result = filter_posts(testing_data)
display.display_labelled_data(filtering_result)

Great! We have now predicted labels for the posts in the testing set, based on the probabilities we calculated. Let’s go ahead and test how accurate your algorithm is! Run the code below to get the percentage of labels you correctly predicted.

In [None]:
def verify_algorithm(test_result, solution):
    """
    Input: result of the test labelling, and the solution labelling. They should both be arrays of tuples (see output of parse_data for info)
    Output: Entries in test_result that did not appear in solution -- also known as wrong entries
    """
    return list(set(test_result) - set(solution))

In [None]:
mislabelled = verify_algorithm(filtering_result, solution)
score = (1 - (len(mislabelled) / len(solution)))
print('Your accuracy is:', 100*score,'%')
display.display_labelled_data(mislabelled)

Look at the percentage you got when you ran the code block above. Were you expecting this score? Modify your past code, and see how the percentage changes. Try to get the best possible percentage. __Remember, incorrect moderation has real consequences for Sport-It.__
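
A single accuracy number hides which kind of mistake the algorithm makes: flagging a legitimate sports post is a different harm from letting an off-topic post through. The sketch below separates the two; `error_breakdown` is a hypothetical helper and the example posts are made up, but it takes the same (label, title) tuples used throughout the lab.

```python
def error_breakdown(predicted, solution):
    """Count the two kinds of mistakes in a list of (label, title) predictions."""
    truth = {title: label for label, title in solution}
    # On-topic posts that were flagged as off-topic.
    wrongly_flagged = sum(1 for label, title in predicted
                          if label == 'n' and truth.get(title) == 'y')
    # Off-topic posts that were let through as on-topic.
    wrongly_allowed = sum(1 for label, title in predicted
                          if label == 'y' and truth.get(title) == 'n')
    return wrongly_flagged, wrongly_allowed

# Made-up example data, for illustration only.
toy_solution = [('y', 'Padres win'), ('y', 'NBA finals recap'), ('n', 'Senate vote today')]
toy_predicted = [('y', 'Padres win'), ('n', 'NBA finals recap'), ('y', 'Senate vote today')]
print(error_breakdown(toy_predicted, toy_solution))  # (1, 1)
```

Discuss with your group which of the two counts matters more for Sport-It, and whether your answer changes for the edge cases you labelled earlier.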

***

# Part 2: The Consequences of Naivety

### __More about the effects of a Naive content moderation algorithm can be found on the [ethical moderation website](https://dylanirlbeck.github.io/ethical-moderation/project#part-2-the-consequences-of-naivety)__

***

# Part 3: Introducing Human Content Moderation

### __More about human content moderation can be found on the [ethical moderation website](https://dylanirlbeck.github.io/ethical-moderation/project#part-3-introducing-human-content-moderation)__

***

# Resources

 - [At the End of the Day Facebook Does What it Wants](https://s3.amazonaws.com/kvaccaro.com/documents/vaccaro_cscw2020.pdf)
 - [Introduction to Bag-Of-Words Models](https://machinelearningmastery.com/gentle-introduction-bag-words-model/)
 - [Introduction to NBC](https://towardsdatascience.com/introduction-to-naive-bayes-classification-4cffabb1ae54)
 - [Laplace Smoothing](https://towardsdatascience.com/laplace-smoothing-in-na%C3%AFve-bayes-algorithm-9c237a8bdece#:~:text=Laplace%20smoothing%20is%20a%20smoothing%20technique%20that%20helps%20tackle%20the,the%20positive%20and%20negative%20reviews.)
 - [Sample Data Examples](https://www.reddit.com/r/sports/controversial)
