# Training a Classifier: Assignment 5

This week we will learn how to be a requester on Amazon Mechanical Turk. The results of your HITs will be used as datasets to train image classifiers. We will be using [fastai](https://docs.fast.ai/), a high-level PyTorch library that provides straightforward methods for deep learning. Deep learning uses neural networks with many layers to extract and transform features from data. Applications of deep learning are all around you, including speech recognition, autonomous driving, and [board games](https://www.youtube.com/watch?v=WXuK6gekU1Y&ab_channel=DeepMind).

Watch [Lesson 1](https://course.fast.ai/videos/?lesson=1) for an introduction to deep learning. Note that the code you need to write for this assignment differs slightly from the code in the video. Nevertheless, the video provides a comprehensive overview of the subject.

The cells below contain instructions for the code you need to write. Follow them, and post on Piazza if anything is unclear. Save a copy of this Colab notebook to your personal Google Drive and work on that copy. You will submit a link to your Colab notebook on Gradescope.

# Install dependencies

You only need to run this cell once to set up the notebook. To get a GPU on Google Colab, select Runtime > Change runtime type and choose GPU as the hardware accelerator.

In [None]:
#hide
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

#hide
from fastbook import *

#hide
# Download the gold-standard HIT results and the two image archives, then unzip the images quietly
! curl http://nets213-hw5.s3.amazonaws.com/results.csv -o results.csv
! curl https://nets213-hw5.s3.amazonaws.com/weddings-indian-languages.zip -o weddings-indian-languages.zip
! curl https://nets213-hw5.s3.amazonaws.com/weddings-european-language.zip -o weddings-european-language.zip
! unzip -o weddings-indian-languages.zip 1>/dev/null
! unzip -o weddings-european-language.zip 1>/dev/null


# Load "gold standard" results

We've provided results.csv, which contains the results of our course's Amazon MTurk HITs. Read it in and, for each image, count how many Turkers voted that it shows a wedding. Each image received three votes, one from each of three Turkers. We will treat these results as the "gold standard" dataset: images that we assume are correctly labeled. Later in the assignment, this gold standard will be used to assess your own classifiers.

In [None]:
import pandas as pd 
from collections import Counter

df = pd.read_csv("results.csv") # reads a .csv file into a DataFrame
counts = Counter() # maps each image URL to its number of "yes" (wedding) votes

for _, row in df.iterrows():
  true_images = row['Answer.selected'].split('|') # list of the image URLs this worker marked as weddings
  for i in range(1, 12): # each HIT shows up to 11 images (Input.image1 through Input.image11)
    url = row["Input.image" + str(i)]
    if not isinstance(url, str): continue # skips empty image slots, which pandas reads as NaN

    ##### START CODE HERE: Add 1 to counts when the url has a vote (in other words, when the url is in true_images) #####

    ##### END CODE HERE #####
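To make the counting step concrete, here is a minimal sketch on two hypothetical HIT rows. It uses plain dictionaries instead of a pandas DataFrame, made-up file names instead of S3 URLs, and three image slots instead of eleven; only the column names `Answer.selected` and `Input.imageN` come from `results.csv`.

```python
from collections import Counter

# Two hypothetical HIT rows (real rows come from results.csv via pandas).
rows = [
    {"Answer.selected": "a.jpg|c.jpg",
     "Input.image1": "a.jpg", "Input.image2": "b.jpg", "Input.image3": "c.jpg"},
    {"Answer.selected": "a.jpg",
     "Input.image1": "a.jpg", "Input.image2": "b.jpg", "Input.image3": None},
]

counts = Counter()
for row in rows:
    true_images = row["Answer.selected"].split("|")  # URLs this worker marked as weddings
    for i in range(1, 4):  # the toy rows have 3 image slots instead of 11
        url = row["Input.image" + str(i)]
        if not isinstance(url, str):  # empty slot (NaN in pandas, None here)
            continue
        if url in true_images:
            counts[url] += 1  # one "yes" vote for this image

print(counts)  # Counter({'a.jpg': 2, 'c.jpg': 1})
```

Notice that `b.jpg`, which nobody selected, never becomes a key in `counts`. Because `image_urls` is later built from `counts.keys()`, images with zero votes are left out of the dataset. If you want to keep them as negative examples, one option is to also run `counts[url] += 0` for every URL, which records the URL with a count of 0.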

# Train an image classifier to classify weddings

Instead of classifying images of cats vs. dogs like in the fastai lesson, we're going to be classifying whether a photo contains a wedding or not. We'll classify an image as a wedding if its counter value is 2 or more (which means a majority of the workers said the image represented a wedding).

In [None]:
from fastai.vision.all import *

def get_path_from_url(url):
  return url.replace('https://s3.amazonaws.com/nets213-hw5/', '')

image_urls = list(counts.keys()) # remote URLs to the images
paths = [get_path_from_url(url) for url in image_urls] # local paths to the unzipped image files in the Colab runtime

##### START CODE HERE: Write get_label() to return True if the image has at least 2 votes, and False otherwise #####
def get_label(url): 
  pass
##### END CODE HERE #####

##### START CODE HERE: Run get_label() on image_urls to create a list called "labels" #####
labels = None
##### END CODE HERE #####
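The majority rule can be sketched on a hypothetical pair of vote counts. The URLs below are made up (only the S3 prefix and the two folder names come from this notebook), and the sketch assumes, as the notebook does, that each image has exactly three voters, so 2 votes is a majority.

```python
from collections import Counter

# Hypothetical vote counts keyed by image URL (in the notebook they come from results.csv).
counts = Counter({
    "https://s3.amazonaws.com/nets213-hw5/weddings-european-language/img1.jpg": 3,
    "https://s3.amazonaws.com/nets213-hw5/weddings-indian-languages/img2.jpg": 1,
})

def get_path_from_url(url):
    # Strip the S3 prefix so the path points at the unzipped local copy.
    return url.replace('https://s3.amazonaws.com/nets213-hw5/', '')

def get_label(url):
    # Majority vote: with three voters, 2 or more "yes" votes means wedding.
    return counts[url] >= 2

image_urls = list(counts.keys())
paths = [get_path_from_url(url) for url in image_urls]
labels = [get_label(url) for url in image_urls]  # same order as paths

print(paths)
print(labels)  # [True, False]
```

Building `paths` and `labels` from the same `image_urls` list keeps each path aligned with its label by index, which is what a loader built from two parallel lists relies on.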

We will now specify the structure of our dataset, then create and train our model. With the given parameter values, training should take a few seconds and the error rate should be around 20%. Experiment with different parameters to tune the model further! Possible improvements include data augmentation, more epochs, and different batch sizes.

Consult the [documentation](https://docs.fast.ai/vision.data.html#ImageDataLoaders) for ImageDataLoaders to choose the appropriate method for the first line.
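A validation percentage of 20% with a seed of 42 means a random 20% of the images is held out for measuring the error rate, and fixing the seed makes that random choice the same on every run. The plain-Python sketch below only illustrates the idea; it is not fastai's splitter, which uses its own random number generator and may pick different images. The function name `split_by_pct` is hypothetical.

```python
import random

def split_by_pct(items, valid_pct=0.2, seed=42):
    """Hold out valid_pct of items at random, reproducibly for a given seed."""
    rng = random.Random(seed)          # private RNG so the split is repeatable
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))  # number of validation items
    valid = [items[i] for i in idxs[:cut]]
    train = [items[i] for i in idxs[cut:]]
    return train, valid

items = [f"img{i}.jpg" for i in range(10)]
train, valid = split_by_pct(items)
print(len(train), len(valid))  # 8 2
```

Calling `split_by_pct(items)` again returns exactly the same split, which is why a fixed seed makes error rates comparable when you change other parameters.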

In [None]:
##### START CODE HERE: Specify what type of dataset we have and how it's structured with ImageDataLoaders – use a validation percentage of 20%, seed of 42, and item transformation of Resize(224) #####
dls = None
##### END CODE HERE #####

##### START CODE HERE: Create a convolutional neural network called "learn" with cnn_learner() – use an architecture of resnet34, metric of error_rate, and pretrained of True #####
learn = None
##### END CODE HERE #####

##### START CODE HERE: Fit the model with fine_tune() for 1 epoch #####

##### END CODE HERE #####

Test the classifier on a wedding image you find online to check whether it was trained successfully.

In [None]:
# Click on the gray "Upload" button in order to upload your sample wedding image
uploader = widgets.FileUpload()
uploader


In [None]:
# The model will predict whether or not your uploaded image is that of a wedding
img = PILImage.create(uploader.data[0])
is_wedding,_,probs = learn.predict(img)
print(f"Is this an image of a wedding?: {is_wedding}.")
print(f"Probability it's a wedding: {probs[1].item():.6f}")


# Train an image classifier to classify weddings with your HITs from US-based Turkers

Now we'll do the same thing with the results of your own Amazon MTurk HITs! Upload the MTurk output CSV from your US-based Turkers to your Google Colab files. Use the "gold standard" code as a guide for loading a CSV, then create the lists of paths and labels for your HITs from US-based Turkers.

In [None]:
##### START CODE HERE: Load in your results from your US-based CSV file, similarly to what you did with the "gold standard" results.csv file #####

##### END CODE HERE #####
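Because you need to load two more CSVs, it may help to wrap the gold-standard loop in a function. The sketch below is one hypothetical way to do that with the standard-library `csv` module instead of pandas. It assumes your batch results use the same `Answer.selected` and `Input.image1` through `Input.image11` columns as `results.csv`, which depends on how you designed your HIT; like the gold-standard loop, it counts only "yes" votes.

```python
import csv
from collections import Counter

def load_vote_counts(csv_path, num_images=11):
    """Count "yes" votes per image URL in an MTurk batch results CSV.

    Assumes the results.csv layout: 'Answer.selected' holds the '|'-separated
    URLs a worker selected; 'Input.image1'..'Input.imageN' hold the URLs shown.
    """
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            selected = set((row.get("Answer.selected") or "").split("|"))
            for i in range(1, num_images + 1):
                url = row.get(f"Input.image{i}") or ""
                if not url:  # csv reads missing cells as "" (pandas would give NaN)
                    continue
                if url in selected:
                    counts[url] += 1  # one "yes" vote for this image
    return counts
```

For example, `counts_US = load_vote_counts("us_results.csv")`, where the file name is a placeholder for whatever you uploaded. The same call works for the India-based CSV.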

Train a new classifier on only the results of your HITs from US-based Turkers. Keep your classifier variable name as "classifier_US".

In [None]:
##### START CODE HERE: Train a new classifier with the results of your HITs from US-based Turkers #####
classifier_US = None
##### END CODE HERE #####

Try using a variety of sample wedding images found online with this classifier. Are there any wedding images that aren't accurately predicted to be wedding images? Why do you think that is?



In [None]:
# Click on the gray "Upload" button in order to upload your sample wedding image
uploader = widgets.FileUpload()
uploader


In [None]:
# The model will predict whether or not your uploaded image is that of a wedding
img = PILImage.create(uploader.data[0])
is_wedding,_,probs = classifier_US.predict(img)
print(f"Is this an image of a wedding?: {is_wedding}.")
print(f"Probability it's a wedding: {probs[1].item():.6f}")



# Train an image classifier to classify weddings with your HITs from India-based Turkers

Upload the MTurk output CSV from your India-based Turkers to your Google Colab files. Use the "gold standard" code as a guide for loading a CSV, then create the lists of paths and labels for your HITs from India-based Turkers.

In [None]:
##### START CODE HERE: Load in your results from your India-based CSV file, similarly to what you did with the "gold standard" results.csv file #####

##### END CODE HERE #####

Train a new classifier on only the results of your HITs from India-based Turkers. Keep your classifier variable name as "classifier_India".

In [None]:
##### START CODE HERE: Train a new classifier with the results of your HITs from India-based Turkers #####
classifier_India = None
##### END CODE HERE #####

Try using a variety of sample wedding images found online with this classifier. Are there any wedding images that aren't accurately predicted to be wedding images? Why do you think that is?

In [None]:
# Click on the gray "Upload" button in order to upload your sample wedding image
uploader = widgets.FileUpload()
uploader


In [None]:
# The model will predict whether or not your uploaded image is that of a wedding
img = PILImage.create(uploader.data[0])
is_wedding,_,probs = classifier_India.predict(img)
print(f"Is this an image of a wedding?: {is_wedding}.")
print(f"Probability it's a wedding: {probs[1].item():.6f}")


# Calculate evaluation metrics for our two classifiers

We will now calculate three evaluation metrics to assess how good our two classifiers are: precision, recall, and F1-score. An overview can be found [here](https://towardsdatascience.com/a-look-at-precision-recall-and-f1-score-36b5fd0dd3ec).

A confusion matrix can help visualize the components used in calculating these metrics:

<figure align="center">
<img src="https://miro.medium.com/max/700/1*OhEnS-T54Cz0YSTl_c3Dwg.jpeg" />
</figure>

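*   Precision: Of the images our model predicted to be weddings, the fraction that actually are weddings: TP / (TP + FP)
*   Recall: Of the images that actually are weddings, the fraction our model predicted to be weddings: TP / (TP + FN)
*   F1-Score: The harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall)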

<figure align="center">
<img src="https://miro.medium.com/max/1068/1*EXa-_699fntpUoRjZeqAFQ.jpeg" />
</figure>
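The three definitions above translate directly into a small helper. This is a minimal sketch: the function name is hypothetical, and reporting 0.0 when a denominator is zero is one common convention, not something this assignment specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    A ratio whose denominator is zero is reported as 0.0 here; other tools
    may warn or return NaN instead.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted weddings
    recall = tp / (tp + fn) if tp + fn else 0.0     # found among actual weddings
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.667 0.727
```

True negatives do not appear in any of the three formulas, so counting TP, FP, and FN is enough.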


In order to have a baseline for comparison, we created a classifier pretrained on ImageNet and obtained the following metrics after testing it with our "gold standard" dataset:

*   Non-Western Precision: 0.6580
*   Non-Western Recall: 0.1823
*   Non-Western F1: 0.2854
*   Western Precision: 0.7463
*   Western Recall: 0.5050
*   Western F1: 0.6024

Notice that our classifier does worse on Non-Western images than on Western images: its F1-score is about 0.29 on Non-Western images versus 0.60 on Western images. This gap is not surprising. [Google researchers](https://research.google/pubs/pub46553/) found that ImageNet “appear[s] to exhibit an observable amerocentric and eurocentric representation bias,” as shown by the distribution of geographically identifiable images in the datasets, with 2/3 of those images coming from the Western world.

<figure align="center">
<img src="http://crowdsourcing-class.org/images/imagenet_pie_chart.jpg" />
</figure>

We will calculate the same metrics for your two classifiers, again using the "gold standard" dataset from the results.csv file loaded at the start of this notebook and assuming its images and labels are correct. Your classifiers will predict whether each of these images contains a wedding; comparing those predictions with the gold-standard labels gives the True Positive, True Negative, False Positive, and False Negative counts, from which we calculate precision, recall, and F1-score.

# Separate the "gold standard" dataset into Western images and Non-Western images

We will first separate the "gold standard" dataset into two datasets – one dataset that only includes the images and labels of Western images and another dataset that only includes the images and labels of Non-Western images. 

Note that the variable "paths" contains all the image paths and the variable "labels" contains all the correct labels for our "gold standard" dataset. An image is Western if its path contains the text "weddings-european-language" and Non-Western if its path contains "weddings-indian-languages".


In [None]:
# First dataset – contains only the Western photos
paths_Western = []
labels_Western = []

# Second dataset – contains only the NonWestern photos
paths_NonWestern = []
labels_NonWestern = []

##### START CODE HERE: Separate the "gold standard" dataset into the two datasets #####

##### END CODE HERE #####
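The split only needs a substring test on each path, as long as every label stays paired with its path. A minimal sketch on three hypothetical paths (the file names are made up; the folder names are the ones given above):

```python
# Hypothetical gold-standard paths and labels (same order, same length).
paths = [
    "weddings-european-language/img1.jpg",
    "weddings-indian-languages/img2.jpg",
    "weddings-european-language/img3.jpg",
]
labels = [True, True, False]

paths_Western, labels_Western = [], []
paths_NonWestern, labels_NonWestern = [], []

for path, label in zip(paths, labels):  # zip keeps each path with its own label
    if "weddings-european-language" in path:
        paths_Western.append(path)
        labels_Western.append(label)
    elif "weddings-indian-languages" in path:
        paths_NonWestern.append(path)
        labels_NonWestern.append(label)

print(labels_Western, labels_NonWestern)  # [True, False] [True]
```

Iterating over the two lists together with `zip` avoids the index bookkeeping that would otherwise be needed to keep them aligned.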

# Calculate evaluation metrics for your US-trained classifier

We will first calculate the precision, recall, and F1-score for your US-trained classifier with the Western dataset of images.


In [None]:
predictions_Western_US = []

##### START CODE HERE: Create a list called predictions_Western_US that stores all predictions for the images in "paths_Western" #####
predictions_Western_US = None
##### END CODE HERE #####

In [None]:
##### START CODE HERE: Calculate the number of True Positives, False Positives, and False Negatives for your classifier_US on the Western images #####
TP_Western_US = None
FP_Western_US = None
FN_Western_US = None
##### END CODE HERE #####

##### START CODE HERE: Calculate precision, recall, and F1-score for your classifier_US on the Western images #####
precision_Western_US = None
recall_Western_US = None
f1_score_Western_US = None
##### END CODE HERE #####

# Display your precision, recall, and F1-score for your classifier_US on the Western images
print("Western Precision from your US-trained classifier: " + str(precision_Western_US))
print("Western Recall from your US-trained classifier: " + str(recall_Western_US))
print("Western F1-Score from your US-trained classifier: " + str(f1_score_Western_US))
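Counting TP, FP, and FN amounts to comparing each prediction with its gold-standard label, treating "wedding" (True) as the positive class. The sketch below uses hypothetical lists. One assumption to check in your own notebook: the first value returned by `predict` may be a string-like category such as `'True'` rather than a Python bool, so the sketch normalizes both sides with `str(...) == "True"` before comparing.

```python
def to_bool(value):
    """Normalize a label or prediction to a bool.

    Assumption: predictions may arrive as 'True'/'False' strings or
    string-like categories instead of bools.
    """
    return str(value) == "True"

def confusion_counts(predictions, labels):
    """Return (TP, FP, FN) with True (wedding) as the positive class."""
    tp = fp = fn = 0
    for pred, actual in zip(predictions, labels):
        pred, actual = to_bool(pred), to_bool(actual)
        if pred and actual:
            tp += 1  # predicted wedding, is a wedding
        elif pred and not actual:
            fp += 1  # predicted wedding, is not a wedding
        elif not pred and actual:
            fn += 1  # missed a wedding
    return tp, fp, fn

# Hypothetical predictions (mixing strings and bools) against gold labels.
preds = ["True", "False", True, "True"]
gold = [True, True, True, False]
print(confusion_counts(preds, gold))  # (2, 1, 1)
```

The same function applies unchanged to the Non-Western images and to the India-trained classifier; only the input lists change.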



We will now calculate the precision, recall, and F1-score for your US-trained classifier with the Non-Western dataset of images.

In [None]:
predictions_NonWestern_US = []

##### START CODE HERE: Create a list called predictions_NonWestern_US that stores all predictions for the images in "paths_NonWestern" #####
predictions_NonWestern_US = None
##### END CODE HERE #####

In [None]:
##### START CODE HERE: Calculate the number of True Positives, False Positives, and False Negatives for your classifier_US on the NonWestern images #####
TP_NonWestern_US = None
FP_NonWestern_US = None
FN_NonWestern_US = None
##### END CODE HERE #####

##### START CODE HERE: Calculate precision, recall, and F1-score for your classifier_US on the NonWestern images #####
precision_NonWestern_US = None
recall_NonWestern_US = None
f1_score_NonWestern_US = None
##### END CODE HERE #####

# Display your precision, recall, and F1-score for your classifier_US on the NonWestern images
print("NonWestern Precision from your US-trained classifier: " + str(precision_NonWestern_US))
print("NonWestern Recall from your US-trained classifier: " + str(recall_NonWestern_US))
print("NonWestern F1-Score from your US-trained classifier: " + str(f1_score_NonWestern_US))



# Calculate evaluation metrics for your India-trained classifier

We will first calculate the precision, recall, and F1-score for your India-trained classifier with the Western dataset of images.

In [None]:
predictions_Western_India = []

##### START CODE HERE: Create a list called predictions_Western_India that stores all predictions for the images in "paths_Western" #####
predictions_Western_India = None
##### END CODE HERE #####

In [None]:
##### START CODE HERE: Calculate the number of True Positives, False Positives, and False Negatives for your classifier_India on the Western images #####
TP_Western_India = None
FP_Western_India = None
FN_Western_India = None
##### END CODE HERE #####

##### START CODE HERE: Calculate precision, recall, and F1-score for your classifier_India on the Western images #####
precision_Western_India = None
recall_Western_India = None
f1_score_Western_India = None
##### END CODE HERE #####

# Display your precision, recall, and F1-score for your classifier_India on the Western images
print("Western Precision from your India-trained classifier: " + str(precision_Western_India))
print("Western Recall from your India-trained classifier: " + str(recall_Western_India))
print("Western F1-Score from your India-trained classifier: " + str(f1_score_Western_India))



We will now calculate the precision, recall, and F1-score for your India-trained classifier with the Non-Western dataset of images.

In [None]:
predictions_NonWestern_India = []

##### START CODE HERE: Create a list called predictions_NonWestern_India that stores all predictions for the images in "paths_NonWestern" #####
predictions_NonWestern_India = None
##### END CODE HERE #####

In [None]:
##### START CODE HERE: Calculate the number of True Positives, False Positives, and False Negatives for your classifier_India on the NonWestern images #####
TP_NonWestern_India = None
FP_NonWestern_India = None
FN_NonWestern_India = None
##### END CODE HERE #####

##### START CODE HERE: Calculate precision, recall, and F1-score for your classifier_India on the NonWestern images #####
precision_NonWestern_India = None
recall_NonWestern_India = None
f1_score_NonWestern_India = None
##### END CODE HERE #####

# Display your precision, recall, and F1-score for your classifier_India on the NonWestern images
print("NonWestern Precision from your India-trained classifier: " + str(precision_NonWestern_India))
print("NonWestern Recall from your India-trained classifier: " + str(recall_NonWestern_India))
print("NonWestern F1-Score from your India-trained classifier: " + str(f1_score_NonWestern_India))



You have calculated the following:
*   Precision, Recall, F1-Score for your US-trained classifier on Western images
*   Precision, Recall, F1-Score for your US-trained classifier on NonWestern images
*   Precision, Recall, F1-Score for your India-trained classifier on Western images
*   Precision, Recall, F1-Score for your India-trained classifier on NonWestern images

Compare these metrics. Do you see any significant differences? What do you think caused them?


# Conclusion

Congratulations on successfully training your image classifiers! Upload this Colab file to Gradescope before the deadline.