In [33]:
import matplotlib.pyplot as plt
import numpy as np
import utils

%matplotlib inline

## Exploratory Data Analysis
One challenge with this dataset was that all of its attributes, except for the test scores, were categorical. One advantage was that several of the categories were binary (lunch status, preparation course completion, gender), and we wanted to examine those in more detail. Because our initial plan involved decision trees and forests, we hoped that grouping by binary attributes would prove useful.

In [30]:
def make_dot_chart(table, s_att, s_ops, s_labels, chart_title):
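    '''
        Draw a dot/strip chart of math, reading, and writing scores,
        split by the two values of a binary attribute.

        table: rows of student data; the last three columns hold the
            math, reading, and writing scores
        s_att: column index of the attribute to split on
        s_ops: the two attribute values, e.g. ['female', 'male']
        s_labels: display labels for the two values
        chart_title: attribute name used in the chart title
    '''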
    title = "Score Distribution by " + chart_title
    # NOTE: fname is built but never used; the figure is shown, not saved.
    fname = chart_title.lower().replace(' ', '_') + '_plot.pdf'

    m1 = [int(x[-3].strip('"')) for x in table if x[s_att] == s_ops[0]]
    m2 = [int(x[-3].strip('"')) for x in table if x[s_att] == s_ops[1]]

    r1 = [int(x[-2].strip('"')) for x in table if x[s_att] == s_ops[0]]
    r2 = [int(x[-2].strip('"')) for x in table if x[s_att] == s_ops[1]]

    w1 = [int(x[-1].strip('"')) for x in table if x[s_att] == s_ops[0]]
    w2 = [int(x[-1].strip('"')) for x in table if x[s_att] == s_ops[1]]

    gps = [m1, m2, r1, r2, w1, w2]
    # Every point in group number y is drawn on the horizontal line at y + 1.
    y_vals = [[y + 1] * len(group) for y, group in enumerate(gps)]

    plt.figure()
    plt.title(title)
    plt.xlabel("Raw Exam Score out of 100")
    avxs = [np.mean(x) for x in gps]
    # One row per group; indexing y_vals[x][0] would fail for an empty group.
    avys = list(range(1, len(gps) + 1))
    colors = ['olive', 'purple', 'orange', 'crimson',  'slateblue', 'mediumturquoise']
    for i, g in enumerate(gps):
        plt.scatter(g, y_vals[i], marker='.', s=500, alpha=0.05, color=colors[i])

    plt.scatter(avxs,avys, marker='x', s=250, alpha=1.0, c='black')
    ytks = avys
    ylbs = ['Math-' + s_labels[0], 'Math-' + s_labels[1],
            'Reading-' + s_labels[0], 'Reading-' + s_labels[1],
            'Writing-' + s_labels[0], 'Writing-' + s_labels[1],]
    plt.yticks(ticks=ytks, labels=ylbs)
    
    plt.grid(True)
    plt.tight_layout()
    plt.show()

In [37]:
f = 'StudentsPerformance.csv'
students = utils.read_table(f)
h = students[0]
s = utils.strip_quotation_marks_list(students[1:])
# make_dot_chart(s, 0, ['female', 'male'], ["Female", "Male"], "Gender")
# make_dot_chart(s, 3, ['standard', 'free/reduced'], ["Standard", "Free/Reduced"], "Lunch Status")
# make_dot_chart(s, 4, ['completed', 'none'], ["Prep Course", "No Prep Course"], "Preparation Course Completion")

AttributeError: module 'utils' has no attribute 'strip_quotation_marks_list'
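
The error above shows that `utils` does not define `strip_quotation_marks_list`. As a rough illustration only, a helper with that name might look like the sketch below. It is a hypothetical stand-in, not the project's actual `utils` code, and it assumes the table is a list of rows whose cells are strings that may be wrapped in double quotes.

```python
def strip_quotation_marks_list(table):
    """Return a copy of table with surrounding double quotes removed from each cell.

    Hypothetical helper: assumes table is a list of rows (lists of strings).
    """
    return [[str(cell).strip('"') for cell in row] for row in table]


# Made-up example row, not taken from the dataset.
rows = [['"female"', '"standard"', '"72"']]
cleaned = strip_quotation_marks_list(rows)
print(cleaned)
```

Note that `make_dot_chart` already strips double quotes from the score columns itself, so a helper like this would mainly matter for the categorical columns used in the comparisons.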

Across all three groupings, math was the subject whose scores seemed to vary the most, while reading and writing scores seemed very similar to one another. After experimenting with classifying results as letter grades (A, B, C, D, F), we decided the simplest approach would be to predict whether a score was passing (>= 60) or failing (< 60). We represented this numerically as 0 for failing and 1 for passing.
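
The pass/fail labeling can be sketched as follows. The function name `to_pass_fail` and the sample scores are illustrative assumptions; only the threshold of 60 and the 0/1 encoding come from the text.

```python
PASSING_SCORE = 60  # threshold stated in the text


def to_pass_fail(score):
    """Return 1 for a passing score (>= 60) and 0 for a failing one."""
    return 1 if int(score) >= PASSING_SCORE else 0


# Made-up example scores, not taken from the dataset.
scores = [72, 59, 60, 41]
labels = [to_pass_fail(s) for s in scores]
print(labels)  # [1, 0, 1, 0]
```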

### Neural Network
This was our new topic, and one of the more difficult things we attempted. What we ended up with was less a "Neural Network" and more a "Best Neuron" predictor, because we did not implement any hidden layers between input and output, only a single neuron. As a result, we did not need backpropagation to train the system; we only had to update the weights of the single output neuron.
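
To make the single-neuron idea concrete, here is a minimal sketch of training one output neuron without backpropagation. The text does not say which activation or update rule we used, so the step activation and the perceptron update rule below are assumptions, as are the names `predict` and `train_neuron` and the toy data.

```python
import numpy as np


def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if np.dot(weights, x) + bias > 0 else 0


def train_neuron(X, y, lr=0.1, epochs=20):
    """Train one neuron with the perceptron rule (an assumed update rule)."""
    X = np.asarray(X, dtype=float)
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            # The error is -1, 0, or 1; weights change only on a mistake.
            error = target - predict(weights, bias, xi)
            weights += lr * error * xi
            bias += lr * error
    return weights, bias


# Toy, linearly separable data (logical AND), not the student dataset.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
# lr=1.0 keeps the arithmetic exact for this toy example.
weights, bias = train_neuron(X, y, lr=1.0)
predictions = [predict(weights, bias, xi) for xi in X]
print(predictions)
```

Because there is only one layer of weights, each update uses the output error directly, with no need to propagate it backward through hidden layers.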

Starting this endeavor, we had to convert all of our attributes to numbers. We converted lunch status to 1 (free/reduced) or 2 (standard); preparation course to 0 (no course) or 1 (completed); and parents' education to 1 (high school degree or some high school), 2 (associate's degree or some college), or 3 (college or master's degree).
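
The encoding described above can be sketched with lookup dictionaries. The numeric codes come from the text; the dictionary names, the `encode_row` function, and the exact parental education strings (including reading "college" as a bachelor's degree) are assumptions about how the dataset spells its categories.

```python
LUNCH_CODES = {"free/reduced": 1, "standard": 2}
PREP_CODES = {"none": 0, "completed": 1}

# Assumed category spellings; adjust to match the actual CSV values.
PARENT_EDUCATION_CODES = {
    "some high school": 1,
    "high school": 1,
    "some college": 2,
    "associate's degree": 2,
    "bachelor's degree": 3,
    "master's degree": 3,
}


def encode_row(lunch, prep, parent_education):
    """Map one student's categorical values to their numeric codes."""
    return [
        LUNCH_CODES[lunch],
        PREP_CODES[prep],
        PARENT_EDUCATION_CODES[parent_education],
    ]


# Made-up example student, not a row from the dataset.
example = encode_row("standard", "none", "some college")
print(example)  # [2, 0, 2]
```

Using dictionaries means an unexpected category raises a `KeyError` instead of being silently mapped to a wrong number.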