# What This Is
In this notebook, I will be training several scikit learn models to gain a preliminary understanding of the dataset. Following that, I will be training some more customized models with PyTorch in a separate notebook.
## Purpose
The goal here is to understand the dataset and what I should expect from models I train in the future. For this reason, data visualization is a prime focus.

## About the Data Set
The data set is from [Kaggle](https://www.kaggle.com/mohansacharya/graduate-admissions#Admission_Predict_Ver1.1.csv). It has 7 features, namely: 
-1. Student index (non-predictive)
0. GRE Score
1. TOEFL Score
2. Univeristy Rating
3. SOP (Statement of Purpose)
4. LOR (Letters of Recommendation)
5. CGPA
6. Research
7. Chance of Admission

## Procedure


## Outcomes
### Linear Regression
The final squared error of a linear regression was `0.00186`. This is incredibly low and implies high linearity 


In [1]:
# Importing Dependencies
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import csv
from sklearn import svm
from sklearn import neural_network

In [2]:
# Load the diabetes dataset
dataIn = csv.reader(open('../dataset/Admission_Predict_Ver1.1.csv'), delimiter=",")

In [3]:
def scale(n_arr_in):
    for i in range(len(n_arr_in[0])):
        max_ind_in_column_i = n_arr_in[:,i].argmax()
        n_arr_in[:, i] = n_arr_in[:, i]*(1/(n_arr_in[max_ind_in_column_i,i]))
    
    return n_arr_in

In [4]:
# Format the data
listIn = list(dataIn)

titles = listIn[0]
titles = titles[1:]
n_arr_in = (np.array(listIn))[1:]
n_arr_in = n_arr_in.astype(np.float)

# FEATURE SCALING!
n_arr_in = scale(n_arr_in)
# i = 1

# max_ind_in_column_i = n_arr_in[:,i].argmax()

# print(n_arr_in[max_ind_in_column_i,i])


n_arr_in = n_arr_in[:, 1:]

X_train = n_arr_in[0:400, 0:7]
Y_train = n_arr_in[0:400, 7]

X_test = n_arr_in[401:, 0:7]
Y_test = n_arr_in[401:, 7]

# print(X_train[0:10])

In [5]:
# Create and train linear regression object
regr = linear_model.LinearRegression()

# Train model
regr.fit(X_train, Y_train)

# Test model
Y_pred = regr.predict(X_test)

mse = mean_squared_error(Y_pred, Y_test)

print("Mean squared error on Validation Set: ",mse)
print('Variance score: %.2f' % r2_score(Y_pred, Y_test))

print("Parameters of Linear Regression Model (Scaled): ",(1/(regr.coef_[regr.coef_.argmax()]))*regr.coef_)
print("Parameters of Linear Regression Model (Unscaled): ",(regr.coef_))



Mean squared error on Validation Set:  0.00197218880699025
Variance score: 0.88
Parameters of Linear Regression Model (Scaled):  [ 0.50066129  0.29693647  0.0242256  -0.01400639  0.09472632  1.
  0.02078611]
Parameters of Linear Regression Model (Unscaled):  [ 0.60898962  0.36118476  0.02946731 -0.01703695  0.11522231  1.21637049
  0.02528361]


In [6]:
# Create and train svm regression object
clf = svm.SVR(C=10,kernel='linear')

# Train model
clf.fit(X_train, Y_train)

# Test model
Y_pred = clf.predict(X_test)

# print("Y_pred: ",Y_pred[0:10])
# print("Y_test: ",Y_test[0:10])

mse = mean_squared_error(Y_pred, Y_test)

print("Mean squared error on Validation Set: ",mse)
print('Variance score: %.2f' % r2_score(Y_pred, Y_test))




Mean squared error on Validation Set:  0.0030605769808085123
Variance score: 0.78


In [27]:
# Create and train linear regression object
regr = neural_network.MLPRegressor(hidden_layer_sizes=(150,50,10))

# Train model
regr.fit(X_train, Y_train)

# Test model
Y_pred = regr.predict(X_test)

mse = mean_squared_error(Y_pred, Y_test)

print("Mean squared error on Validation Set: ",mse)
print('Variance score: %.2f' % r2_score(Y_pred, Y_test))

# print("Parameters of Linear Regression Model (Scaled): ",(1/(regr.coef_[regr.coef_.argmax()]))*regr.coef_)
# print("Parameters of Linear Regression Model (Unscaled): ",(regr.coef_))



Mean squared error on Validation Set:  0.004941543312384438
Variance score: 0.54
