# Deep Learning Regression with Admissions Data

For this project, you will create a deep learning regression model that predicts the likelihood that a student applying to graduate school will be accepted based on various application factors (such as test scores).

By analyzing the parameters in this graduate admissions dataset, you will use TensorFlow with Keras to create a regression model that can evaluate the chances of an applicant being admitted. You hope this will give you further insight into the graduate admissions world and improve your test prep strategy.

If you take a look at admissions_data.csv, you’ll see parameters that admissions officers commonly use to evaluate university applicants. This data is from Kaggle and provides information about 500 applications to various universities, along with each applicant’s chance of admission.

This is a regression problem because the probability of being admitted is a continuous label between 0 and 1.
Load the CSV file into a DataFrame and investigate the rows and columns to get familiar with the dataset.
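
A minimal sketch of that first inspection, using a tiny CSV held in memory so it runs without the real file. The three rows and their values are made up for illustration; only the column names are taken from the correlation output shown in this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for admissions_data.csv (all values are invented)
csv_text = """Serial No.,GRE Score,TOEFL Score,University Rating,SOP,LOR,CGPA,Research,Chance of Admit
1,320,110,3,3.5,3.0,8.5,1,0.75
2,305,100,2,2.5,3.0,7.9,0,0.55
3,330,115,5,4.5,4.0,9.3,1,0.90
"""
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.shape)           # (rows, columns)
print(df_demo.dtypes)          # confirm every column is numeric
print(df_demo.isnull().sum())  # count missing values per column
```

On the real file, the same three calls give a quick picture of its size, column types, and any missing values.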

Split the data into feature parameters and labels.
You are creating a model that predicts an applicant’s likelihood of being admitted to a master’s program, so take some time to look at the features of your model and which column you are trying to predict. Also consider whether any dataset features should not be included as predictors.
Make sure all of your variables are numerical.
If there are any categorical variables, be sure to map them to numerical values, using techniques such as one-hot encoding, so they can be used in a regression analysis.
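
The steps above can be sketched on a small made-up DataFrame. The `Program` column is a hypothetical categorical feature invented for this example, and dropping `Serial No.` is one reasonable reading of "features that should not be included", not something the notebook's own code does.

```python
import pandas as pd

# Hypothetical mini-dataset; 'Program' is an invented categorical column
df_demo = pd.DataFrame({
    "Serial No.": [1, 2, 3, 4],
    "GRE Score": [320, 305, 330, 312],
    "Program": ["CS", "EE", "CS", "Math"],
    "Chance of Admit": [0.75, 0.55, 0.90, 0.64],
})

# The label is the column being predicted
labels_demo = df_demo["Chance of Admit"]

# Drop the label and the row identifier, which carries no predictive signal
features_demo = df_demo.drop(columns=["Chance of Admit", "Serial No."])

# One-hot encode the categorical column so every feature is numeric
features_demo = pd.get_dummies(features_demo, columns=["Program"], dtype=float)

print(features_demo.columns.tolist())
```

Each category becomes its own 0/1 column, so the model never treats category names as ordered numbers.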


Since you are creating a learning model, you must have a training set and a test set. Remember that this allows you to measure the effectiveness of your model.

You have created two DataFrames: one for features and one for labels. Now, you must split each of these into a training set and a test set.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import layers

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Normalizer
from sklearn.metrics import r2_score

df=pd.read_csv('admissions_data.csv')
# print(df.head())
# print(df.describe())
print(df.columns[df.isnull().any()])

print(df.corr())

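# split into features and labels; the label is the last column (Chance of Admit)
labels = df.iloc[:, -1]
# print(labels.unique())
# NOTE: this slice keeps 'Serial No.', a row identifier, among the features
features = df.iloc[:, 0:-1]
print(features.columns)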

#split features into training set and test set
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size = 0.20, random_state=1)

Index([], dtype='object')
                   Serial No.  GRE Score  TOEFL Score  University Rating  \
Serial No.           1.000000  -0.103839    -0.141696          -0.067641   
GRE Score           -0.103839   1.000000     0.827200           0.635376   
TOEFL Score         -0.141696   0.827200     1.000000           0.649799   
University Rating   -0.067641   0.635376     0.649799           1.000000   
SOP                 -0.137352   0.613498     0.644410           0.728024   
LOR                 -0.003694   0.524679     0.541563           0.608651   
CGPA                -0.074289   0.825878     0.810574           0.705254   
Research            -0.005332   0.563398     0.467012           0.427047   
Chance of Admit      0.008505   0.810351     0.792228           0.690132   

                        SOP      LOR       CGPA  Research  Chance of Admit   
Serial No.        -0.137352 -0.003694 -0.074289 -0.005332          0.008505  
GRE Score          0.613498  0.524679  0.825878  0.563398


If you look through the admissions_data.csv, you may notice that there are many different scales being used. For example, the GRE Score is out of 340 while the University Rating is out of 5. Can you imagine why this might be a problem when using a regression learning model?

You should either scale or normalize your data so that features with larger numeric ranges, such as GRE Score, do not dominate the others during training.
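
As a rough illustration of what standardization does, here is a NumPy sketch on made-up rows. It subtracts the training mean and divides by the training standard deviation (the population standard deviation, which is also what scikit-learn's `StandardScaler` uses), then reuses those same statistics on the test rows. The arrays and their values are hypothetical.

```python
import numpy as np

# Hypothetical rows: [GRE Score, University Rating]
train = np.array([[300.0, 1.0], [320.0, 3.0], [340.0, 5.0]])
test = np.array([[330.0, 2.0]])

# Learn the mean and standard deviation from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)

# Apply the same training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
print(test_scaled)
```

Fitting the statistics on the training rows alone keeps information about the test set from leaking into training.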

In [4]:
# scale features
scaler=StandardScaler()
features_train_scaled=scaler.fit_transform(features_train)
features_test_scaled=scaler.transform(features_test)

Create a neural network model to perform a regression analysis on the admission data.

When designing your own neural network model, consider the following:

- The shape of your input
- Adding hidden layers as well as how many neurons they have
- Including activation functions
- The type of loss function and metrics you use
- The type of gradient descent optimizer you use
- Your learning rate
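
For the loss and metric choice in particular, here is a sketch of the two quantities this notebook's model uses (`'mse'` as the loss and `'mae'` as the metric), computed by hand with NumPy. The label and prediction values are made up for illustration.

```python
import numpy as np

# Hypothetical true admission chances and model predictions
y_true = np.array([0.90, 0.60, 0.75, 0.40])
y_pred = np.array([0.80, 0.65, 0.70, 0.60])

errors = y_pred - y_true

# Mean squared error: penalizes large errors more heavily
mse = np.mean(errors ** 2)

# Mean absolute error: average miss, in the same units as the label
mae = np.mean(np.abs(errors))

print(mse, mae)
```

Because the label lies between 0 and 1, the MAE reads directly as the average miss in admission chance.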



It’s time to test out the model you created!
Fit your model with your training set and test it out with your test set.
It’s okay if it is not that accurate right now. You can play around with your model and tweak it to increase its accuracy.

In [5]:
# neural network model for regression analysis
from tensorflow.keras.layers import InputLayer, Dense
from tensorflow.keras.optimizers import Adam

my_model = Sequential()
# one input per feature column (renamed from `input`, which shadows the Python built-in)
input_layer = InputLayer(input_shape = (features.shape[1], ))
my_model.add(input_layer)

# one hidden layer with 64 ReLU units, then a single linear output for regression
my_model.add(Dense(64, activation = "relu"))
my_model.add(Dense(1))
print(my_model.summary())

opt = Adam(learning_rate = 0.1)
my_model.compile(loss = 'mse', metrics = ['mae'], optimizer = opt)
# stop once validation loss has not improved for 20 consecutive epochs
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience = 20)
# first training run; its history is not kept
my_model.fit(features_train_scaled, labels_train, epochs = 40, batch_size = 1, verbose = 1,validation_split = 0.2, callbacks = [es])

# second run continues from the weights learned above; only this run is stored in `history`
history=my_model.fit(features_train_scaled, labels_train, epochs = 40, batch_size = 1, verbose = 1,validation_split = 0.2, callbacks = [es])

# evaluate on the held-out test set; returns the loss (MSE) followed by the MAE metric
res_mse , res_mae = my_model.evaluate(features_test_scaled, labels_test, verbose = 0)

print(res_mse, res_mae)


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 64)                576       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
Total params: 641
Trainable params: 641
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/40
...
Epoch 26/40
Epoch 00026: early stopping
Epoch 1/40
...
Epoch 40/40
0.07705709338188171 0.14807042479515076
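
The `Param #` column in the model summary above can be checked by hand. This sketch assumes 8 input features, which is what the 576 parameters of the first layer imply; the helper name `dense_params` is made up for this example.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 8  # columns in the features DataFrame used above, Serial No. included

hidden = dense_params(n_features, 64)  # Dense(64) layer
output = dense_params(64, 1)           # Dense(1) output layer

print(hidden, output, hidden + output)
```

The three printed values correspond to the two layer rows and the total in the summary.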


Using the Matplotlib library, see if you can plot the model loss per epoch as well as the mean absolute error per epoch for both training and validation data. This will give you insight into how the model's performance changes over time and can also help you figure out better ways to tune your hyperparameters.

Because of the way Matplotlib plots are displayed in the learning environment, please use fig.savefig('static/images/my_plots.png') at the end of your graphing code to render the plot in the browser. If you wish to display multiple plots, you can use .subplot() or .add_subplot() methods in the Matplotlib library to depict multiple plots in one figure.

In [6]:
# Plot MAE and val_mae over each epoch
fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax1.plot(history.history['mae'])
ax1.plot(history.history['val_mae'])
ax1.set_title('model mae')
ax1.set_ylabel('MAE')
ax1.set_xlabel('epoch')
ax1.legend(['train', 'validation'], loc='upper left')

# Plot loss and val_loss over each epoch
ax2 = fig.add_subplot(2, 1, 2)
ax2.plot(history.history['loss'])
ax2.plot(history.history['val_loss'])
ax2.set_title('model loss')
ax2.set_ylabel('loss')
ax2.set_xlabel('epoch')
ax2.legend(['train', 'validation'], loc='upper left')

# used to keep plots from overlapping each other
fig.tight_layout()
fig.savefig('static/images/my_plots.png')

Let’s say you wanted to evaluate how strongly the features in admissions_data.csv predict an applicant’s admission into a graduate program. We can use something called an R-squared value, also known as the coefficient of determination.

Basically, we can use this calculation to see how well the features in our regression model make predictions. An R-squared value close to 1 suggests a well-fit regression model, while a value closer to 0 suggests that the regression model does not fit the data well.
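
To make the calculation concrete, here is a hand-rolled sketch of R-squared in NumPy, using the standard definition 1 - SS_res / SS_tot. The helper name `r_squared` and the sample values are made up for illustration. Note that the score can also fall below 0 when predictions are worse than always guessing the mean label.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = [0.4, 0.6, 0.8]  # hypothetical labels

print(r_squared(y_true, [0.4, 0.6, 0.8]))  # perfect predictions
print(r_squared(y_true, [0.6, 0.6, 0.6]))  # always predicting the mean
print(r_squared(y_true, [0.9, 0.2, 0.3]))  # worse than predicting the mean
```

The three calls land at 1, roughly 0, and a negative number, which covers the range of outcomes you may see from a regression model.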

See if you can apply this to your model after it has been evaluated: call the .predict() method on your scaled test features (features_test_scaled) and pass the result, together with labels_test, to the r2_score() function.

In [7]:
predicted_values = my_model.predict(features_test_scaled)

print(r2_score(labels_test, predicted_values))

-2.990320745589194
