In [2]:
from lab import __version__
from math import exp, log

print('Content filtering lab, version', __version__)

Lab libraries imported!
Content filtering lab, version 0.1


# Ethical Moderation Lab

## Introduction
Content moderation is a topic within computer science ethics that has gained attention since the rise of popular social media platforms. Successful platforms such as Twitter, Reddit, and Quora have created spaces where everyone can voice their opinions on any topic they please.

Today, we will explore content moderation from a computer science perspective, and the delicate issues that arise from too much or too little moderation.


## Lab Background
Sport-It is a popular social media platform known for its variety of communities, each dedicated to a specific sport. For example, there are Sport-It communities for the NBA, NFL, NHL, and many more.

The moderators at Sport-It have decided that they want to create an environment where users only post about sports, and not controversial topics that may be harmful or too political. Today, you will be helping the moderation team by creating an algorithm that will __flag all posts not directly related to sports__.

In this lab, your task is to __create an accurate machine learning algorithm__, while also questioning your own biases that may appear as you go through the lab. You will also be challenged to __think about many different edge cases__. For example, do you believe that a post harshly criticizing Colin Kaepernick should be deemed a valid post on Sport-It? *__And is there a right or wrong answer?__*


# 1: Automated Content Moderation

## The Naive Bayes Algorithm
Let’s take a moment to discuss the algorithm we are going to use to calculate the likelihood that a post on Sport-It is valid or invalid. We will be using the Naive Bayes classifier.
### What does the Algorithm do?
The Naive Bayes algorithm is a classification technique that assigns a label to an object based on prior probabilities and feature probabilities. In today’s lab, our Naive Bayes algorithm will assign valid or invalid labels to posts, depending on how likely the words in the post are to appear under the valid label versus the invalid label.

There are __two types of probabilities__ to look out for: the __prior probability__ and the __feature probability__. Let’s go over what each one means, and how to calculate them.

__Prior Probability__, in this case, is the __probability that the post is a valid/invalid post__. Mathematically, it would look like this: <br>
$$P(\text{Label} = \text{"on topic"}) = \frac{k + \text{count}(\text{on-topic posts})}{2k + \text{count}(\text{posts})}$$

__Feature Probability__, in this case, is the __probability that a specific word appears in an on-topic/off-topic label__. Mathematically, it would look like this: <br>
$$P(\text{Word} = \text{"election"} \mid \text{Label} = \text{"off topic"}) = \frac{k + \text{count}(\text{off-topic posts containing "election"})}{2k + \text{count}(\text{off-topic posts})}$$

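You may have noticed a constant k appearing in the formulas above. Without it, a word that never appears under a label in the training data would get a feature probability of 0 for that label. Adding k to the numerator and 2k to the denominator (a technique known as additive, or Laplace, smoothing) fixes this issue.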
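To make the two formulas concrete, here is a minimal worked sketch. The helper name `smoothed_probability` and all of the counts are hypothetical and chosen only for illustration; they are not part of the lab library or its dataset.

```python
def smoothed_probability(count, total, k=1):
    """Additive smoothing for a two-outcome estimate: (k + count) / (2k + total)."""
    return (k + count) / (2 * k + total)

# Hypothetical toy counts, not taken from the lab dataset.
num_posts = 10
num_on_topic = 7
num_off_topic = 3
off_topic_posts_with_word = 0  # the word was never seen in an off-topic post

prior_on_topic = smoothed_probability(num_on_topic, num_posts)    # 8 / 12
prior_off_topic = smoothed_probability(num_off_topic, num_posts)  # 4 / 12
p_word_given_off_topic = smoothed_probability(off_topic_posts_with_word, num_off_topic)  # 1 / 5

print(prior_on_topic, prior_off_topic)
print(p_word_given_off_topic)  # non-zero even though the raw count is 0
```

Note that the two smoothed priors still sum to 1, because k is added once to each label's count and 2k to the shared total.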

## Using the Training Data
We must use training data to “train” our model. We will use sample data from past posts on Sport-It, alongside pre-existing labels, to train our model. Simply put, each post will be given a label “on-topic” or “off-topic”.

__Let’s start by importing any needed libraries:__

In [None]:
from lab import data_tools, display

A training dataset is already provided. Each datapoint contains __text (the post title)__, and a label __y/n (whether the post is on-topic or off-topic)__. Here is an example:
<blockquote>y: Musgrove throws first no-hitter in Padres history.</blockquote>
Here, the post about Musgrove is considered on-topic, as it fully pertains to a sport.

Let’s start by parsing through our data and assigning the parsed data to variable *training_data*:

In [None]:
training_data = data_tools.parse_data('./data/politics.csv', './data/espn.txt', limit=10)

Now that we have parsed our data, we can display it using some display helper methods.

As you look through the dataset, ask yourself: __Do you agree with the labels provided? What would you change?__ Take a moment to discuss with your group. Remember, most edge cases have no right or wrong answer.

In [None]:
display.display_labelled_data(training_data)

Notice that in the code below, each datapoint in the list passed into the extend function has a placeholder label “y/n”. __Your group should assign these labels. Discuss with your group, and decide a label for each datapoint.__ Remember, there is no right or wrong answer!

In [None]:
# TODO: Assign each title in each datapoint a label "y" or "n".
# Discuss within your group what you think the labels should be, and why.
training_data.extend([
    ('y/n', 'Deshaun Watson Admits Encounters with Masseuses, Always Consensual'),
    ('y/n', 'Bruce Jenner is actually getting the Arthur Ashe Courage Award?'),
    ('y/n', 'The real Tim Tebow: anti gay, anti choice, and a very unexceptional QB who owes a great deal to a teammate who is very good at kicking long fieldgoals when Tim cant get near the red zone.'),
    ('y/n', 'TIL All NFL players have to do their physicals completely nude and are often nude for over an hour and a half. Many players have fears they were videotaped.'),
    ('y/n', 'Broncos Brandon Marshall kneels during national anthem, follows Colin Kaepernick’s path'),
    ('y/n', 'Megan Rapinoe says ‘not many, if any’ US womens soccer player’s would attend White House'),
    ('y/n', 'With the upcoming Thursday night NFL game, remember that this presents a simplified view of an entire culture, caricatures facial features based on race, depicts an outdated/inaccurate style of headdress, paints them as warmongering aggressors and overly glamorizes the violent side of their history.')
])

display.display_labelled_data(training_data)

## Calculating Probabilities Using The Training Data
Now that we have parsed our data, we can put it to use. We will be calculating the prior probability of each label (on-topic and off-topic), and the feature probability of every word that appears in each group of posts.

Recall that the prior probability is simply the probability of a certain label appearing in the training data. As a reminder, the equation looks like this:

$$P(\text{Label} = \text{"on topic"}) = \frac{k + \text{count}(\text{on-topic posts})}{2k + \text{count}(\text{posts})}$$

Let’s go ahead and look at the prior_probabilities function. __Keep in mind that $k = 1$.__ We have provided a helper class with methods to deal with the actual calculations.


In [None]:
training_data_statistics = data_tools.DataStats(training_data)

In [None]:
def prior_probabilities(label):
    """
    Function input: label
    Global/implicit input: training_data_statistics (DataStats object)
    Output: P(Label=label)
    """
    k = 1
    num_invalid = len(training_data_statistics.invalid_posts)
    num_valid = len(training_data_statistics.valid_posts)
    num_total = training_data_statistics.num_posts
    if label == 'y':
        return (k + num_valid) / (2*k + num_total)
    elif label == 'n':
        return (k + num_invalid) / (2*k + num_total)
    else:
        raise KeyError('Unsupported label: {}'.format(label))
print(prior_probabilities('n'))
print(prior_probabilities('y'))

Use the _prior_probabilities_ function to assign the prior probabilities of on-topic posts and off-topic posts.

In [None]:
# TODO: Replace each ... with a call to prior_probabilities using the matching label.
prior_probability_ontopic = ...
prior_probability_offtopic = ...

We now have the prior probabilities of both the on-topic and off-topic posts. In order to properly label posts on Sport-It as on-topic or off-topic, we also need to calculate the feature probabilities. That is, __the probability of each word appearing in each group of posts: on-topic and off-topic.__

Let’s use the _word_given_label_probability_ function, which looks up how often a word appears among the posts with a given label and calculates its feature probability.

In [None]:
def word_given_label_probability(word, label):
    """
    Function input: word, label
    Global/implicit input: training_data_statistics (DataStats object)
    Output: P(word | Label=label)
    """
    if label == 'y':
        # Divide on-topic counts by the on-topic word total.
        # Assumes DataStats exposes total_valid_words, mirroring total_invalid_words.
        return training_data_statistics.valid_counter[word] / training_data_statistics.total_valid_words
    elif label == 'n':
        return training_data_statistics.invalid_counter[word] / training_data_statistics.total_invalid_words
    else:
        raise KeyError('Unsupported label: {}'.format(label))
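The sketch below shows one way per-label word counts like those used above could be built with `collections.Counter`. It is an assumption about the general approach, not a description of how `DataStats` is actually implemented; `toy_posts` and `count_words` are hypothetical names, and splitting on whitespace is a simplification of whatever preprocessing the lab performs.

```python
from collections import Counter

# Hypothetical labelled posts in the (label, title) shape used by this lab.
toy_posts = [
    ('y', 'Padres win the game'),
    ('y', 'Game recap and highlights'),
    ('n', 'Election results announced'),
]

def count_words(posts, label):
    """Count lowercase, whitespace-separated words across all posts with the given label."""
    counter = Counter()
    for post_label, title in posts:
        if post_label == label:
            counter.update(title.lower().split())
    return counter

valid_counter = count_words(toy_posts, 'y')
invalid_counter = count_words(toy_posts, 'n')
total_valid_words = sum(valid_counter.values())
total_invalid_words = sum(invalid_counter.values())

# Relative frequency of "game" among on-topic words and among off-topic words.
print(valid_counter['game'] / total_valid_words)
print(invalid_counter['game'] / total_invalid_words)
```

A `Counter` returns 0 for a word it has never seen, so an unseen word yields a probability of exactly 0 for that label unless smoothing is applied.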

Next, complete the _submission_probabilities_ function. It should compute a likelihood score for the post being on-topic and for it being off-topic, store them as a tuple (p_valid, p_invalid), and return the label with the higher score.

In [None]:
def submission_probabilities(submission):
    """
    Function input: submission post
    Output: the predicted label
    """
    # TODO: Using past functions and helper methods, find the likelihood of the post being on-topic and off-topic
    likelihood_scores = (0.0, 0.0)
    
    
    if likelihood_scores[0] > likelihood_scores[1]:
        return 'y'
    return 'n'

## Predicting Labels for Posts
Time to see if our algorithm and model work! We now have the prior probabilities, as well as all the feature probabilities, so we can predict what label a post should have. For example, if we were given the following post:
<blockquote>The game last night was absolutely terrible!</blockquote>

We should be able to __determine whether this post is on-topic or off-topic by comparing the likelihood score of the on-topic label with that of the off-topic label__:


The _post_validity_ function below compares the two likelihood scores on a log scale and returns the more likely label.

In [None]:
def post_validity(submission, threshold=0):
    """
    Input: A particular submission (title of a post)
    Threshold: a non-negative margin; posts whose log ratio falls within
        [-threshold, threshold] are marked for manual review
    Output: 'y' (on-topic), 'n' (off-topic), or '?' (needs manual review)
    """
    if threshold < 0:
        raise ValueError('threshold must be non-negative')
    # TODO: Is this log calculation problematic?
    word_arr = data_tools.preprocess_submission(submission)
    sum_log_word_given_valid = 0
    sum_log_word_given_invalid = 0
    for word in word_arr:
        word_given_y = word_given_label_probability(word, 'y')
        word_given_n = word_given_label_probability(word, 'n')
        # Zero probabilities are skipped, since log(0) is undefined
        if word_given_y > 0:
            sum_log_word_given_valid += log(word_given_y)
        if word_given_n > 0:
            sum_log_word_given_invalid += log(word_given_n)
    log_ratio = (log(prior_probabilities('y')) - log(prior_probabilities('n'))
                 + sum_log_word_given_valid - sum_log_word_given_invalid)
    if log_ratio < -threshold:
        return 'n'
    elif log_ratio > threshold:
        return 'y'
    else:
        # The score is too close to call
        # TODO: Maybe this isn't the symbol you want?
        return '?'

Now let’s run the model on a testing dataset. The _filter_posts_ function takes an array of unlabeled posts and returns an array of tuples (label, title), using the probabilities that we found before.

In [None]:
from lab.data_tools import parse_unlabeled_reddit_feed, parse_unlabeled_espn
espn_data = parse_unlabeled_espn('./data/test/espn.txt', limit=100)
politics_data = parse_unlabeled_reddit_feed('./data/test/politics.txt', limit=100)

testing_data = espn_data + politics_data
solution = [('y', e) for e in espn_data] + [('n', p) for p in politics_data]

def filter_posts(posts):
    """
    Input: array of posts to filter WITHOUT labels (see output of parse_unlabeled_espn/reddit_feed)
    Output: array of posts as tuples (label, post title), see output of parse_data
    """
    result = []
    # Label each post passed in, rather than reading the global testing_data
    for submission in posts:
        validity = post_validity(submission)
        result.append((validity, submission))

    return result
filtering_result = filter_posts(testing_data)
display.display_labelled_data(filtering_result)

The code block below finds the posts your algorithm labelled incorrectly, which we then use to compute the percentage of labels predicted correctly. It is already implemented, so it should just work.

In [None]:
# TODO: Calculate percent correctness - could just do by calculating len(verify_algorithm(test_result, solution)) / len(solution)
def verify_algorithm(test_result, solution):
    """
    Input: result of the test labelling, and the solution labelling. They should both be arrays of tuples (see output of parse_data for info)
    Output: Entries in test_result that did not appear in solution -- also known as wrong entries
    """
    return list(set(test_result) - set(solution))

In [None]:
mislabelled = verify_algorithm(filtering_result, solution)
# TODO: Alter number of test cases (there are a lot of mistakes so far, I think)
score = (1 - (len(mislabelled) / len(solution)))
print('Your accuracy is:', 100*score,'%')
display.display_labelled_data(mislabelled)

Look at the percentage you got when you ran the code block above. Were you expecting this score? Modify your past code, and see how the percentage changes. Try to get the best possible percentage. __Remember, incorrect moderation has real consequences for Sport-It.__
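A single accuracy figure hides which kind of mistake the algorithm makes. The sketch below is one possible way to break the errors down. The function name `confusion_counts` and the toy data are hypothetical, and the sketch assumes both lists hold (label, title) tuples in the same order.

```python
def confusion_counts(predicted, expected):
    """Tally outcomes for two equally ordered lists of (label, title) tuples."""
    counts = {'correct': 0, 'wrongly_flagged': 0, 'wrongly_allowed': 0, 'needs_review': 0}
    for (predicted_label, _), (true_label, _) in zip(predicted, expected):
        if predicted_label == '?':
            counts['needs_review'] += 1
        elif predicted_label == true_label:
            counts['correct'] += 1
        elif predicted_label == 'n':
            counts['wrongly_flagged'] += 1   # an on-topic post was flagged
        else:
            counts['wrongly_allowed'] += 1   # an off-topic post slipped through
    return counts

# Hypothetical toy results, not produced by the lab's model.
toy_predicted = [('y', 'A'), ('n', 'B'), ('n', 'C'), ('y', 'D')]
toy_expected = [('y', 'A'), ('y', 'B'), ('n', 'C'), ('n', 'D')]

toy_counts = confusion_counts(toy_predicted, toy_expected)
print(toy_counts)
```

Wrongly flagged posts silence users who followed the rules, while wrongly allowed posts let off-topic content through, so the two error types may deserve different weight when you tune your code.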

# 2: The Effects of a Bag-Of-Words Algorithm

## Bag of Words Algorithms
The algorithm we have implemented is a __bag-of-words algorithm__. Notice that our algorithm only accounts for __*which*__ words appear and how often. It __does not account for any order__ or structure the post may have. Simply put, we treat each post as an unordered collection, or bag, of words, rather than a structured sentence. There are some major downsides to considering only the number of times a word appears in a certain label, which we will explore soon.
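A minimal sketch of what losing word order means in practice: two posts that say opposite things can collapse into the same bag. The helper `bag_of_words` and both example posts are hypothetical, and the whitespace tokenization is a simplification.

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a post to its lowercase word counts, discarding word order."""
    return Counter(text.lower().split())

post_a = 'Team A beat Team B'
post_b = 'Team B beat Team A'

same_bag = bag_of_words(post_a) == bag_of_words(post_b)
print(same_bag)
```

Any classifier that only sees these bags must give both posts the same label, even though they report different results.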

## Involuntary Censorship
As you predicted labels for posts in the testing data, you may have noticed that off-topic posts generally share a common theme in their text. Many of them contain words such as “politics”, “kneeling”, “religion”, and more. However, __there may be some cases where our algorithm misinterprets a post as off-topic when, in actuality, many would agree that it should be considered on-topic__. Let’s take a look at one common example:
<blockquote>Colin Kaepernick Signs a 6 Year Contract</blockquote>

This post has no underlying political tones. It is simply stating a fact about Colin Kaepernick. However, using our naive algorithm, __we might accidentally remove this post__. This is because the words “Colin Kaepernick” show up in the off-topic group much more often than in the on-topic group. __Within your group, discuss how a bag-of-words algorithm may create involuntary censorship, and who it could affect__.
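The effect can be sketched by looking at how much each word pushes the score toward one label. All probabilities below are hypothetical values invented for illustration, not measurements from the lab dataset, and the priors are left out for simplicity.

```python
from math import log

# Hypothetical feature probabilities for the words in the example post.
p_word_given_on_topic = {'kaepernick': 0.002, 'signs': 0.010, 'contract': 0.012}
p_word_given_off_topic = {'kaepernick': 0.020, 'signs': 0.008, 'contract': 0.006}

contributions = {}
for word in p_word_given_on_topic:
    # Positive values favor on-topic, negative values favor off-topic.
    contributions[word] = log(p_word_given_on_topic[word]) - log(p_word_given_off_topic[word])
    print(word, round(contributions[word], 2))

total_log_ratio = sum(contributions.values())
print('total:', round(total_log_ratio, 2))
```

With these made-up numbers, the single name outweighs two sports-related words, so the total is negative and the post would be flagged as off-topic.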

# 3: The Use of Human Content Moderation

## Introducing Human Content Moderation
You may have been wondering why we chose to create a simple machine learning algorithm to moderate content at all. Why wouldn’t Sport-It create a simple flag function that flags any post that contains offensive or political words, and have employees or volunteers sift through the rest of the posts?

Studies have documented traumatic effects on people whose job is to filter through hundreds or thousands of posts daily, many containing topics much heavier than political views on a sports platform. It is important that as we extend the idea of automated content moderation from Sport-It to real platforms used by millions of users daily, we understand that __the need for human content moderators must be kept at an absolute minimum__.

__As you go through this next exercise, keep in mind the balance between accurate flagging and the effects of human content moderation on a large scale__.


## Handling a Threshold
As mentioned before, we want to keep the amount of content reviewed by people as low as possible, without sacrificing the accuracy of our algorithm. Therefore, we will only request human moderation on posts that the algorithm is on the fence about. In other words, only if the difference between the on-topic and off-topic scores is very small will we request further review.
Take a look at the code below. There is a number that represents the threshold. This threshold decides whether or not the post should be sent to a pile for further review. Play around with the threshold, and look at the resulting count of posts in the pile.

In [None]:
#TODO: IMPLEMENT

As you increase or decrease the threshold, __keep in mind the implication of your action__. By changing the threshold, you are effectively choosing what the reviewer will see. __Within your group, discuss at what point the accuracy of the algorithm starts to plateau and the psychological toll on the reviewer starts to rise. How do you properly balance these two concerns?__ Remember that human moderation should be used only when absolutely necessary, and should be kept to a minimum whenever possible.
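The trade-off can be explored with a small sketch. It assumes one possible reading of the threshold: a post goes to the review pile when its log ratio lies within a symmetric band around zero. The function `route_post` and the list of log ratios are hypothetical and not produced by the lab's model.

```python
def route_post(log_ratio, threshold):
    """Return 'n', 'y', or '?' depending on where the score falls relative to the band."""
    if log_ratio < -threshold:
        return 'n'
    if log_ratio > threshold:
        return 'y'
    return '?'

# Hypothetical log ratios for six posts.
toy_log_ratios = [-4.2, -1.1, -0.3, 0.2, 0.9, 3.5]

review_pile_sizes = {}
for threshold in (0.0, 0.5, 1.0, 2.0):
    labels = [route_post(ratio, threshold) for ratio in toy_log_ratios]
    review_pile_sizes[threshold] = labels.count('?')
    print('threshold', threshold, '-> posts sent for review:', review_pile_sizes[threshold])
```

Widening the band sends more borderline posts to a person, which may catch more mistakes but also increases what the reviewer has to read.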

# Discussion Questions

It’s important to understand the real-world implications our algorithm can have on an individual's freedom of speech. What may seem like a trivial content moderation task can seriously damage a company if users feel like their content is too heavily monitored.<br>
 - __How would an inaccurate algorithm affect Sport-It?__
 - __How did you and your group handle edge cases in the data set? Did you agree with the labels assigned to the edge cases?__
 - __This was a simple algorithm that just handled the frequencies of words, and not necessarily the order of words. Do some quick research and think of another algorithm that might be more effective.__
 - __What platforms that you use today may benefit from a content moderation algorithm?__
 - __What problems may we run into if there is too much moderation on a platform? Consider the concepts of diversity of opinions and “bubbles.”__
 - __Who decides what constitutes off-topic or political conversations?__