In [1]:
from __future__ import print_function

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import itertools
import csv
from os import path

# ChiliCritic
#### By: Tyler Korte | 09/10/2020

### Synopsis
My employer recently hosted a chili cook-off, and one of the more popular chilis ran out before the last few people could rate it. I was waiting on access for my security clearance at the time, so I set out to use machine learning to predict how those people would have voted on that chili based on how they rated the other chilis. The idea is that the model will detect patterns in preference between the voters who tried the last chili and those who didn't. For the official competition I suggested giving the final chili the average of the votes it received before it ran out, and I will compare the results.

Each chili was to be given a score in the range \[1, 5\] and some people used decimal answers (one used pi).

For this project, I chose to use <a href="https://pytorch.org/">PyTorch</a>.

### Data Loading Function

In [2]:
def load_data(filename):
    inputs = list()
    with open(filename, 'r') as csv_file:
        csv_reader = csv.reader(csv_file, quoting=csv.QUOTE_NONNUMERIC)

        for r in csv_reader:
            inputs.append(r)
    return inputs

The training and test data came in the form of .csv files where the votes are positional and the final position represents the vote for the chili that ran out. The input data is also a .csv file, but it is missing the final data point.

I chose to use the first 7 numbers as input and the last as the target output.
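
As a small illustration of this row layout, the sketch below parses two sample CSV lines with the same csv.QUOTE_NONNUMERIC setting used in load_data and splits each row into seven inputs and one target. The second sample row and the split_row helper are hypothetical and are not part of the original notebook.

```python
import csv
import io

# Two rows in the positional layout: seven votes, then the vote for the
# chili that ran out. The first row is taken from the training data; the
# second is made up to show a decimal vote.
sample_csv = (
    "5.0,3.0,3.0,3.0,3.0,3.0,5.0,4.0\n"
    "4.0,3.5,3.0,2.0,2.0,2.0,4.0,3.0\n"
)


def split_row(row):
    """Hypothetical helper: the first 7 values are inputs, the last is the target."""
    return row[:-1], row[-1]


# QUOTE_NONNUMERIC makes csv.reader convert every unquoted field to float.
reader = csv.reader(io.StringIO(sample_csv), quoting=csv.QUOTE_NONNUMERIC)
pairs = [split_row(row) for row in reader]

for inputs, target in pairs:
    print(inputs, "->", target)
```

Because every unquoted field is converted to a float on read, no separate conversion step is needed before building tensors.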

### Defining the Model

In [3]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(7, 12)
        self.fc2 = nn.Linear(12, 12)
        self.fc3 = nn.Linear(12, 1)
        # self.fc4 = nn.Linear(12, 1)
        # self.fc5 = nn.Linear(4, 1)

    def forward(self, x):
        # x = self.fc5(torch.tanh(self.fc4(torch.tanh(self.fc3(torch.tanh(self.fc2(torch.tanh(self.fc1(x)))))))))
        # x = self.fc3(F.relu(self.fc2(F.relu(self.fc1(x)))))
        x = self.fc3(torch.sigmoid(self.fc2(torch.sigmoid(self.fc1(x)))))
        # x = self.fc4(torch.sigmoid(self.fc3(torch.sigmoid(self.fc2(torch.sigmoid(self.fc1(x)))))))
        return x
    
# Setting up the neural net
net = Net()
loss_fn = nn.L1Loss()
optimizer = optim.SGD(net.parameters(), lr=0.01)  # , momentum=0.5

Here I define the model as a fully connected network of three linear layers: an input layer (7 to 12), a hidden layer (12 to 12), and an output layer (12 to 1), with a sigmoid activation after the first two. I have left in some of the other things I tried; extra layers and different activation functions did not make a justifiable difference. The network is trained with the L1Loss function, which calculates the mean absolute error (MAE), and stochastic gradient descent minimizes that loss.
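
To make the loss concrete, here is a minimal pure-Python sketch of the mean absolute error. The mean_absolute_error function and the numbers fed to it are illustrative assumptions, not values from the notebook.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over all pairs."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Illustrative values only (not taken from the notebook's data).
predictions = [4.5, 2.0, 3.0]
targets = [4.0, 3.0, 3.0]

mae = mean_absolute_error(predictions, targets)
print(mae)  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```

When the loss is computed on one row at a time, as in the training loop in this notebook, the mean is taken over a single value, so the loss is simply the absolute error of that one prediction.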

### Loading Data

In [4]:
input_data = load_data('input_data.csv')
raw_validation_data = load_data('validation_data.csv')
raw_training_data = load_data('training_data.csv')

In [5]:
validation_data = list()
for row in raw_validation_data:
    validation_data.append((row[0:-1], row[-1]))

training_data = list()
for row in raw_training_data:
    training_data.append((row[0:-1], row[-1]))

### The Data

In [6]:
training_format_data = ['              Training Data               '] + training_data
format_data = ['              Test Data                   '] + validation_data + ['              Input Data                  '] + input_data
for x, y in itertools.zip_longest(training_format_data, format_data, fillvalue=' '*33):
    print(f"{x}   -   {y}]")

              Training Data                  -                 Test Data                   ]
([5.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0], 4.0)   -   ([5.0, 3.0, 5.0, 2.0, 3.0, 3.0, 5.0], 4.0)]
([5.0, 2.0, 3.0, 2.0, 3.0, 3.0, 4.0], 2.0)   -   ([1.0, 2.0, 3.0, 3.0, 4.0, 1.0, 5.0], 5.0)]
([5.0, 3.0, 4.0, 3.0, 3.0, 2.0, 3.0], 5.0)   -   ([4.0, 5.0, 4.0, 3.0, 5.0, 4.0, 4.0], 3.0)]
([4.0, 3.0, 3.0, 2.0, 2.0, 2.0, 4.0], 3.0)   -                 Input Data                  ]
([5.0, 5.0, 5.0, 3.0, 3.0, 3.0, 4.0], 4.0)   -   [4.0, 3.0, 1.0, 2.0, 2.0, 1.0, 4.0]]
([3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 3.0], 3.0)   -   [5.0, 3.0, 5.0, 3.0, 3.0, 3.0, 3.0]]
([3.0, 1.0, 5.0, 4.0, 3.0, 4.0, 5.0], 4.0)   -   [4.0, 1.0, 5.0, 3.0, 2.0, 1.0, 3.0]]
([5.0, 1.0, 4.0, 3.0, 5.0, 2.0, 3.0], 4.0)   -   [3.0, 2.0, 3.0, 2.0, 2.0, 4.0, 2.0]]
([3.0, 2.0, 4.0, 3.0, 4.0, 3.0, 2.0], 3.0)   -   [3.0, 2.0, 3.0, 2.0, 4.0, 3.0, 3.0]]
([5.0, 4.0, 3.0, 2.0, 3.0, 2.0, 4.0], 5.0)   -                                    ]
([4.0, 3.0, 2.0, 3.0,

Here I split each row into its first seven values (the input) and its last value (the target output).

### Training the Model

In [7]:
# Naming scheme
# ChiliCritic_LossFN_ActivationFN_HiddenLayers_Size_Epochs
nn_path = 'NeuralNets/ChiliCritic_L1_Sigmoid_1_12_725'

Here I check whether a trained model already exists, to save the time of retraining it. I used the naming convention listed above to keep different trained models around for comparison while I was tweaking hyperparameters. If no saved model exists with the specified name, I train the model and save it under that name.
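
The naming convention can be sketched as a small helper that builds the file name from the hyperparameters. The model_name function is hypothetical, and reading the trailing 725 as the epoch count is an assumption based on the training loop's range(725).

```python
def model_name(loss_fn, activation_fn, hidden_layers, size, epochs,
               prefix="ChiliCritic"):
    """Hypothetical helper: join the hyperparameters into one file name."""
    parts = [prefix, loss_fn, activation_fn, hidden_layers, size, epochs]
    return "_".join(str(part) for part in parts)


# Assumption: the last field of the saved name is the number of epochs.
name = model_name("L1", "Sigmoid", 1, 12, 725)
example_path = f"NeuralNets/{name}"
print(example_path)  # NeuralNets/ChiliCritic_L1_Sigmoid_1_12_725
```

Generating the name from the same variables that configure training would keep the saved file and the actual settings from drifting apart.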

In [8]:
if path.exists(nn_path):
    net = torch.load(nn_path)
    net.eval()
else:

    # Training the neural net
    for epoch in range(725):
        # i is a counter, dat represents the row in the data
        for i, dat in enumerate(training_data):
            # X represents the input data, Y represents the actual output
            X, Y = iter(dat)
            X, Y = Variable(torch.FloatTensor(X), requires_grad=True), Variable(torch.FloatTensor([Y]), requires_grad=False)

            optimizer.zero_grad()

            outputs = net(X)

            loss = loss_fn(outputs, Y)
            loss.backward()

            optimizer.step()

        # Log every 76th epoch; loss and outputs come from the last training row only
        if epoch % 76 == 0:
            print(f'    Epoch {epoch} --- Loss: {loss.data.item()}')
            print(f'prediction: {outputs[0]}  actual: {Y[0]}')
    # Save NN so it doesn't need to be recomputed
    torch.save(net, nn_path)

    Epoch 0 --- Loss: 1.2479742765426636
prediction: 1.7520257234573364  actual: 3.0
    Epoch 76 --- Loss: 1.346364974975586
prediction: 4.346364974975586  actual: 3.0
    Epoch 152 --- Loss: 1.3724098205566406
prediction: 4.372409820556641  actual: 3.0
    Epoch 228 --- Loss: 1.239067554473877
prediction: 4.239067554473877  actual: 3.0
    Epoch 304 --- Loss: 1.2394137382507324
prediction: 4.239413738250732  actual: 3.0
    Epoch 380 --- Loss: 0.9669528007507324
prediction: 3.9669528007507324  actual: 3.0
    Epoch 456 --- Loss: 1.0361099243164062
prediction: 4.036109924316406  actual: 3.0
    Epoch 532 --- Loss: 0.9819433689117432
prediction: 3.981943368911743  actual: 3.0
    Epoch 608 --- Loss: 0.9252259731292725
prediction: 3.9252259731292725  actual: 3.0
    Epoch 684 --- Loss: 0.8569152355194092
prediction: 3.856915235519409  actual: 3.0


### Testing the Model

In [9]:
print(nn_path.split('/')[-1])
# Testing the neural net
for i, dat in enumerate(validation_data):
    # X represents the input data, Y represents the actual output
    X, Y = iter(dat)
    X, Y = Variable(torch.FloatTensor(X), requires_grad=True), Variable(torch.FloatTensor([Y]), requires_grad=False)

    outputs = net(X)

    loss = loss_fn(outputs, Y)

    prediction = outputs[0]
    actual = Y[0]
    difference = round(float(Y[0] - outputs[0]), 1)
    pd = round(float((difference/actual)*100), 1)
    print(f'prediction: {prediction}  actual: {actual}  difference: {difference}  percent difference: {pd}%')

ChiliCritic_L1_Sigmoid_1_12_725
prediction: 4.065948486328125  actual: 4.0  difference: -0.1  percent difference: -2.5%
prediction: 4.59492301940918  actual: 5.0  difference: 0.4  percent difference: 8.0%
prediction: 2.9165267944335938  actual: 3.0  difference: 0.1  percent difference: 3.3%
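
The testing cell rounds the difference to one decimal place before computing the percent difference. The sketch below reproduces the printed percentages from the predictions shown above and compares them with the percentages obtained from the unrounded difference. The percent_difference helper and its round_first flag are hypothetical names introduced for this comparison.

```python
# (prediction, actual) pairs copied from the test output above.
results = [
    (4.065948486328125, 4.0),
    (4.59492301940918, 5.0),
    (2.9165267944335938, 3.0),
]


def percent_difference(prediction, actual, round_first):
    """Percent difference relative to the actual vote.

    round_first=True mirrors the testing cell, which rounds the
    difference to one decimal place before dividing.
    """
    difference = actual - prediction
    if round_first:
        difference = round(difference, 1)
    return round(difference / actual * 100, 1)


as_printed = [percent_difference(p, a, round_first=True) for p, a in results]
from_raw = [percent_difference(p, a, round_first=False) for p, a in results]

print(as_printed)  # [-2.5, 8.0, 3.3]
print(from_raw)    # [-1.6, 8.1, 2.8]
```

Rounding first makes the small errors look larger than they are, so the unrounded figures give a slightly more accurate picture of how close the predictions were.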


### Using the Model to Predict Votes

In [10]:
# Guesses for the unknowns
guesses = list()
for i, dat in enumerate(input_data):
    # X represents the input data; there is no known output for these rows
    X = dat
    X = Variable(torch.FloatTensor(X), requires_grad=False)

    outputs = net(X)

    prediction = round(float(outputs[0]))
    guesses.append(prediction)
    print(f'prediction: {prediction}')

print(sum(guesses)/len(guesses))

prediction: 5
prediction: 5
prediction: 6
prediction: 3
prediction: 3
4.4


### Results and Takeaways
I believe that I was able to get reasonable results given the sample size. One issue is that the third prediction, 6, falls outside the range of possible values \[1, 5\] (that person would have loved the chili).
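
One simple way to handle an out-of-range prediction is to clamp it to the valid range after the fact. This was not done in the notebook; the clamp_vote helper below is a hypothetical sketch applied to the five rounded predictions printed above.

```python
def clamp_vote(value, low=1, high=5):
    """Hypothetical helper: force a predicted vote into the range [low, high]."""
    return max(low, min(high, value))


# Rounded predictions copied from the output above.
rounded_predictions = [5, 5, 6, 3, 3]

clamped = [clamp_vote(p) for p in rounded_predictions]
print(clamped)                      # [5, 5, 5, 3, 3]
print(sum(clamped) / len(clamped))  # 4.2
```

Clamping only hides the symptom: the network can still produce values above 5 internally, but the reported votes stay valid.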

The average of the scores for the chili before it ran out was 3.95238095238095, and the model's five predictions average 4.4, somewhat higher. The average is certainly adequate for declaring a winner, but it does not account for personal preference at all. The model leads to the same outcome: this chili (which everyone liked so much that it ran out) still won. I did not look at the data to see whether it was possible for 5 votes to dethrone this winner, but I recall it being close.

Looking back, I definitely overfit the data with 725 epochs and only 39 pieces of training data. As for the out-of-range prediction, I probably should have divided all of the data by 5 before use, then multiplied the output by 5 and rounded at the end. Maybe one day I will revisit this project and maybe even use TensorFlow as a comparison.
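
The scaling idea can be sketched in plain Python. The scale, unscale, and sigmoid helpers are hypothetical, and the sketch assumes the network's final layer is passed through a sigmoid so its output stays between 0 and 1. The notebook's model ends in a plain linear layer, so this is a possible variation rather than what was actually run.

```python
import math

MAX_VOTE = 5


def scale(vote):
    """Map a vote from [1, 5] into [0.2, 1.0] by dividing by 5."""
    return vote / MAX_VOTE


def unscale(output):
    """Map a network output back to a whole-number vote."""
    return round(output * MAX_VOTE)


def sigmoid(z):
    """Standard logistic function, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


print([scale(v) for v in [1, 3, 5]])  # [0.2, 0.6, 1.0]

# Assumption: a sigmoid on the output layer. However large the
# pre-activation z gets, the rescaled vote cannot exceed 5.
for z in (-10.0, 1.0, 10.0):
    print(z, unscale(sigmoid(z)))
```

Dividing by 5 caps the top of the range but not the bottom: a strongly negative pre-activation rescales to 0, which is below the minimum vote of 1, so a lower bound would still have to be enforced separately.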

Author: <a href="https://tylerkorte.com">Tyler Korte</a>