## Introduction
*  This notebook applies two regression algorithms: K-nearest neighbors (KNN) and a neural network (Keras).
*  The features are scaled to unit variance (standardized without centering) before the regressors are trained.
*  The evaluation metric is `PRE` (proportion of reduction in error), defined as:

$$PRE = \frac{MSE_{baseline}-MSE_{regression}}{MSE_{baseline}}$$

where $MSE_{baseline}$ is the mean squared error of predicting every test observation with the sample mean (here, the mean of the test targets), and $MSE_{regression}$ is the mean squared error of the regressor's predictions (i.e., NN or KNN). PRE is the fraction of the baseline error that the model removes. It usually lies between 0 and 1, and it is negative when the model performs worse than the sample mean.
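To make the definition concrete, here is a minimal pure-Python sketch of the metric. The helper names `mse` and `pre` and the toy values are invented for illustration and are not part of the notebook's code.

```python
def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pre(y_true, y_pred):
    """Proportion of reduction in error relative to predicting the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    mse_baseline = mse(y_true, [mean] * len(y_true))
    mse_regression = mse(y_true, y_pred)
    return (mse_baseline - mse_regression) / mse_baseline


# Toy values, invented for illustration only.
y_true = [0.5, 0.7, 0.9]
good_pred = [0.55, 0.7, 0.85]
bad_pred = [0.9, 0.7, 0.5]

print(pre(y_true, good_pred))  # close to 1: most of the baseline error is removed
print(pre(y_true, bad_pred))   # negative: worse than predicting the mean
```

Because the baseline here is the mean of the same targets the model is scored on, this quantity has the same form as the usual coefficient of determination computed on those targets.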

## Graduate admission rate: KNN 

In [132]:
#Import libraries
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

#import data
admission_rate=pd.read_csv('C:/Users/zhenguo/Desktop/STA141C/Admission_Predict.csv')
y=admission_rate['Chance of Admit ']
x=admission_rate.iloc[:,1:8]

#KNN regression with 10 neighbors by default; returns the PRE on a random 30% test split
def Knn_estimator(x,y,n_neigh=10):
    #scale each feature to unit variance (with_mean=False skips centering)
    scaler = StandardScaler(with_mean=False)
    x_std=scaler.fit_transform(x)

    X_train, X_test, y_train, y_test = train_test_split(x_std, y, test_size=0.3)
    
    neigh=KNeighborsRegressor(n_neighbors=n_neigh)
    neigh.fit(X_train,y_train)
    
    #baseline: predict the test-set mean for every test observation
    pred_with_mean=[sum(y_test)/len(y_test)]*len(y_test)
    baseline=mean_squared_error(y_test,pred_with_mean)
    
    pred_mse=mean_squared_error(list(y_test),neigh.predict(X_test))
    #PRE as defined in the introduction
    r_square=(baseline-pred_mse)/baseline
    
    print("Baseline MSE:",round(baseline,4),"KNN MSE:",round(pred_mse,4),"PRE:",round(r_square,4))
    return r_square


overall=sum([Knn_estimator(x,y) for i in range(10)])/10
print("Overall PRE over 10 trials:",overall)

    

Baseline MSE: 0.0182 KNN MSE: 0.0048 PRE: 0.7392
Baseline MSE: 0.0203 KNN MSE: 0.0053 PRE: 0.7394
Baseline MSE: 0.0193 KNN MSE: 0.0043 PRE: 0.7772
Baseline MSE: 0.0181 KNN MSE: 0.004 PRE: 0.7783
Baseline MSE: 0.0177 KNN MSE: 0.0046 PRE: 0.7381
Baseline MSE: 0.0214 KNN MSE: 0.0044 PRE: 0.7948
Baseline MSE: 0.0194 KNN MSE: 0.0053 PRE: 0.7284
Baseline MSE: 0.0178 KNN MSE: 0.004 PRE: 0.7724
Baseline MSE: 0.0207 KNN MSE: 0.004 PRE: 0.8076
Baseline MSE: 0.0209 KNN MSE: 0.0057 PRE: 0.7253
Overall PRE over 10 trials: 0.7600823073968096


## Regression with Neural Network(Keras)

In [1]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor

Using TensorFlow backend.


In [133]:
#one small hidden layer (7 ReLU units) since it is a small dataset, 30 epochs
def keras_model():
    #scale the features the same way as in Knn_estimator
    x_std=StandardScaler(with_mean=False).fit_transform(x)
    X_train, X_test, y_train, y_test = train_test_split(x_std, y, test_size=0.3)
    model=Sequential()
    model.add(Dense(7,input_dim=7,activation='relu',kernel_initializer='normal'))
    model.add(Dense(1,kernel_initializer='normal'))
    model.compile(loss='mean_squared_error',optimizer='adam')
    #10% of the training split is held out for validation during fitting
    model.fit(X_train,y_train,epochs=30,validation_split=0.1,verbose=0)
    #flatten the (n,1) prediction array to match the 1-D targets
    pred=model.predict(X_test).ravel()

    #baseline: predict the test-set mean for every test observation
    pred_with_mean=[sum(y_test)/len(y_test)]*len(y_test)
    baseline=mean_squared_error(y_test,pred_with_mean)
    pred_mse=mean_squared_error(list(y_test),pred)    
    r_square=(baseline-pred_mse)/baseline
    print("Baseline MSE:",round(baseline,4),"Keras MSE:",round(pred_mse,4),"PRE:",round(r_square,4))
    return r_square



In [134]:
overall=sum([keras_model() for i in range(10)])/10
print("Overall PRE over 10 trials:",overall)

Baseline MSE: 0.0214 Keras MSE: 0.0105 PRE: 0.5086
Baseline MSE: 0.02 Keras MSE: 0.009 PRE: 0.5482
Baseline MSE: 0.017 Keras MSE: 0.0072 PRE: 0.5765
Baseline MSE: 0.0193 Keras MSE: 0.0083 PRE: 0.5709
Baseline MSE: 0.0186 Keras MSE: 0.0081 PRE: 0.5677
Baseline MSE: 0.0182 Keras MSE: 0.0082 PRE: 0.5523
Baseline MSE: 0.0218 Keras MSE: 0.0099 PRE: 0.5477
Baseline MSE: 0.0196 Keras MSE: 0.01 PRE: 0.4901
Baseline MSE: 0.0221 Keras MSE: 0.011 PRE: 0.4997
Baseline MSE: 0.0192 Keras MSE: 0.0079 PRE: 0.5918
Overall PRE over 10 trials: 0.5453546316152135


Summary for this dataset: KNN outperforms the neural network (overall PRE of about 0.76 versus 0.55 over 10 trials). The sample size is relatively small and the dimension is low, and KNN also takes much less computation time than the neural network.
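As a rough check on this comparison, the sketch below summarizes the per-trial PRE values printed in the two output cells above. The values are copied from that output (rounded to 4 decimals), and the variable names are illustrative. Each trial uses its own random split, and the two models were not evaluated on the same splits, so this is an informal summary rather than a paired comparison.

```python
from statistics import mean, stdev

# Per-trial PRE values copied from the printed output (rounded to 4 decimals).
knn_pre = [0.7392, 0.7394, 0.7772, 0.7783, 0.7381,
           0.7948, 0.7284, 0.7724, 0.8076, 0.7253]
keras_pre = [0.5086, 0.5482, 0.5765, 0.5709, 0.5677,
             0.5523, 0.5477, 0.4901, 0.4997, 0.5918]

for name, values in [("KNN", knn_pre), ("Keras", keras_pre)]:
    print(name,
          "mean PRE:", round(mean(values), 4),
          "std:", round(stdev(values), 4),
          "range:", (min(values), max(values)))

# True when even the worst KNN trial beats the best Keras trial.
print("KNN worst > Keras best:", min(knn_pre) > max(keras_pre))
```

Reporting the spread alongside the mean shows whether the gap between the two models is large relative to the trial-to-trial variation caused by the random splits.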
