Boston dataset is one of the datasets available in sklearn.
You are given a Training dataset csv file with X train and Y train data. As studied in lecture, your task is to come up with Gradient Descent algorithm and thus predictions for the test dataset given.


Your task is to:


1. Code Gradient Descent for N features and come with predictions.

2. Try and test with various combinations of learning rates and number of iterations.

3. Try using Feature Scaling, and see if it helps you in getting better results. 


Read Instructions carefully -


1. Use Gradient Descent as a training algorithm and submit results predicted.

2. Files are in csv format, you can use genfromtxt function in numpy to load data from csv file. Similarly you can use savetxt function to save data into a file.

3. Submit a csv file with only predictions for X test data. File name should not have spaces. File should not have any headers and should only have one column i.e. predictions. Also predictions shouldn't be in exponential form.

4. Your score is based on coefficient of determination.

In [1]:
# Ignore warnings
import warnings
warnings.filterwarnings("ignore", category = FutureWarning)

In [2]:
# Loading the training dataset
import numpy as np
training_data = np.loadtxt("training_boston.csv", delimiter = ",")
training_data

array([[-0.40784991, -0.48772236, -1.2660231 , ...,  0.41057102,
        -1.09799011, 37.9       ],
       [-0.40737368, -0.48772236,  0.24705682, ...,  0.29116915,
        -0.52047412, 21.4       ],
       [ 0.1251786 , -0.48772236,  1.01599907, ..., -3.79579542,
         0.89107588, 12.7       ],
       ...,
       [-0.40831101, -0.48772236,  0.24705682, ...,  0.33206621,
        -0.33404299, 20.8       ],
       [-0.41061997, -0.48772236, -1.15221381, ...,  0.203235  ,
        -0.74475218, 22.6       ],
       [ 0.34290895, -0.48772236,  1.01599907, ...,  0.38787479,
        -1.35871335, 50.        ]])

In [3]:
# Input features (training)
X_train = training_data[:,:13]

# Output (training)
Y_train = training_data[:,13]

In [4]:
# Shape of input features (training)
X_train.shape

(379, 13)

In [5]:
# Shape of output (training)
Y_train.shape

(379,)

In [6]:
# Converting the input features (training) into Pandas dataframe to check for string, NaN values
import pandas as pd
df = pd.DataFrame(X_train)
df.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
count,379.0,379.0,379.0,379.0,379.0,379.0,379.0,379.0,379.0,379.0,379.0,379.0,379.0
mean,0.019628,0.002455,0.03617,0.028955,0.028775,0.032202,0.038395,-0.001288,0.043307,0.043786,0.019218,-0.015785,0.018418
std,1.06749,1.000813,1.017497,1.048995,0.999656,1.001174,0.985209,1.027803,1.016265,1.019974,1.000296,1.015797,1.015377
min,-0.417713,-0.487722,-1.516987,-0.272599,-1.465882,-3.880249,-2.335437,-1.267069,-0.982843,-1.31399,-2.707379,-3.883072,-1.531127
25%,-0.408171,-0.487722,-0.867691,-0.272599,-0.878475,-0.57148,-0.768994,-0.829872,-0.637962,-0.755697,-0.488039,0.197588,-0.828856
50%,-0.383729,-0.487722,-0.180458,-0.272599,-0.144217,-0.103479,0.338718,-0.329213,-0.523001,-0.440915,0.297977,0.374827,-0.161629
75%,0.055208,0.156071,1.015999,-0.272599,0.628913,0.529069,0.911243,0.674172,1.661245,1.530926,0.806576,0.429868,0.647173
max,9.941735,3.804234,2.422565,3.668398,2.732346,3.555044,1.117494,3.960518,1.661245,1.798194,1.638828,0.441052,3.409999


You have learnt how to code Gradient Descent for a single featured dataset. Try to code a more Generic Gradient Descent. Let us consider that the $i^{th}$ feature for the first row is $x_1^i$. Similarily for the $j^th$ row, the $i^{th}$ feature will be $x_j^i$.So, your cost function would look something like :

$$ cost = \frac{1}{M}\sum_i^M (y_i - (m_ix_i^1 + m_ix_i^2 + m_ix_i^3 + ...... + m_{n + 1}x_{n + 1} ))^2 $$

Here $m_{n + 1}x_{n + 1}$ is actually 'c', constant value. (We usually take them to be 1)

Also, to find the next m (m'), our equation becomes :
$$ m_j' = m_j - \alpha\frac{\partial cost}{\partial m_j} $$

and 

$$\frac{\partial cost}{\partial m_i} = \frac{1}{M}\sum_i^M 2(y_i - (m_ix_i^1 + m_ix_i^2 + m_ix_i^3 + ...... + m_{n + 1}x_{n + 1} ))x_i^j $$

In [7]:
# This function finds the new gradient at each step
def step_gradient(x_, y_, m, learning_rate):
    m_slope = np.zeros(len(x_[0]))
    M = len(x_)
    for i in range(M) :
        x = x_[i]
        y = y_[i]
        for j in range(len(x)):
            m_slope[j] += (-2/M) * (y - sum(m * x))*x[j]
    new_m = m - learning_rate * m_slope
    return new_m

In [8]:
# Gradient Descent Function
def gd(x_, y_, learning_rate, num_iterations):
    m = np.zeros(len(x_[0]))     # Intial random values taken as 0
    for i in range(num_iterations):
        m = step_gradient(x_, y_, m, learning_rate)
        print(i, " Cost: ", cost(x_, y_, m))
    return m

In [9]:
# This function finds the new cost after each optimisation
def cost(x_, y_, m):
    total_cost = 0
    M = len(x_)
    for i in range(M):
        total_cost += (1/M)*((y_[i] - sum(m*x_[i]))**2)
    return total_cost

In [10]:
def run(x_, y_):
    learning_rate = 0.1
    num_iterations = 500
    m = gd(x_, y_, learning_rate, num_iterations)
    print("Final m :", m[0:-1])
    print("Final c :", m[-1])
    return m

In [11]:
# Feature Scaling (training data)
from sklearn import preprocessing
standard_scaler_object = preprocessing.StandardScaler()
standard_scaler_object.fit(X_train)
X_train = standard_scaler_object.transform(X_train)

In [12]:
x_ = np.append(X_train, np.ones(len(X_train)).reshape(-1, 1), axis = 1)
y_ = Y_train
m = run(x_, y_)

0  Cost:  372.6402282491625
1  Cost:  246.08068449886875
2  Cost:  166.3146252952613
3  Cost:  115.5014638404627
4  Cost:  83.05871412434605
5  Cost:  62.30835093474387
6  Cost:  49.01041492738445
7  Cost:  40.46819132006822
8  Cost:  34.964357574517365
9  Cost:  31.40411193003394
10  Cost:  29.08883961950816
11  Cost:  27.572339144821747
12  Cost:  26.569371917610482
13  Cost:  25.897428428927448
14  Cost:  25.439610994540992
15  Cost:  25.120956642829366
16  Cost:  24.893322775748786
17  Cost:  24.72573279388676
18  Cost:  24.59820571807398
19  Cost:  24.497810335880732
20  Cost:  24.416140489513502
21  Cost:  24.34769871842124
22  Cost:  24.288860770386073
23  Cost:  24.237211732207697
24  Cost:  24.191120020119126
25  Cost:  24.149463690506032
26  Cost:  24.111454347094874
27  Cost:  24.07652362150328
28  Cost:  24.044249803514337
29  Cost:  24.014310258104167
30  Cost:  23.986450424888826
31  Cost:  23.960463498173404
32  Cost:  23.93617700077588
33  Cost:  23.91344381979905
34  C

271  Cost:  23.466549551063284
272  Cost:  23.466536782699702
273  Cost:  23.46652434119951
274  Cost:  23.46651221819387
275  Cost:  23.46650040552835
276  Cost:  23.466488895257342
277  Cost:  23.466477679638718
278  Cost:  23.466466751128735
279  Cost:  23.466456102376725
280  Cost:  23.466445726220318
281  Cost:  23.466435615680663
282  Cost:  23.466425763957496
283  Cost:  23.466416164424825
284  Cost:  23.466406810626186
285  Cost:  23.466397696270644
286  Cost:  23.466388815228218
287  Cost:  23.4663801615259
288  Cost:  23.466371729343738
289  Cost:  23.466363513010684
290  Cost:  23.466355507000934
291  Cost:  23.466347705930268
292  Cost:  23.466340104552202
293  Cost:  23.46633269775463
294  Cost:  23.466325480556364
295  Cost:  23.466318448103763
296  Cost:  23.466311595667413
297  Cost:  23.46630491863897
298  Cost:  23.466298412528168
299  Cost:  23.466292072959657
300  Cost:  23.46628589567006
301  Cost:  23.46627987650522
302  Cost:  23.466274011417322
303  Cost:  23.46

In [13]:
# Loading the testing dataset
import numpy as np
testing_data = np.loadtxt("testing_boston.csv", delimiter = ",")
testing_data

array([[ 2.91816626, -0.48772236,  1.01599907, ...,  0.80657583,
        -1.59755122,  1.04106182],
       [-0.40339151, -0.48772236,  0.40609801, ..., -1.13534664,
         0.44105193, -0.89473812],
       [-0.4131781 , -0.48772236,  0.11573841, ...,  1.17646583,
         0.44105193, -0.50084979],
       ...,
       [-0.41001449,  2.08745172, -1.37837329, ..., -0.0719129 ,
         0.39094481, -0.68167397],
       [-0.40317611, -0.48772236, -0.37597609, ...,  1.13022958,
         0.34007019,  0.20142086],
       [-0.13356344, -0.48772236,  1.2319449 , ..., -1.73641788,
        -2.93893082,  0.48877712]])

In [14]:
# Input features (testing)
X_test = testing_data[:,:13]

In [15]:
# Shape of input features (training)
X_test.shape

(127, 13)

In [16]:
# Converting the input features (testing) into Pandas dataframe to check for string, NaN values
import pandas as pd
df = pd.DataFrame(X_test)
df.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
count,127.0,127.0,127.0,127.0,127.0,127.0,127.0,127.0,127.0,127.0,127.0,127.0,127.0
mean,-0.058575,-0.007327,-0.107939,-0.08641,-0.085871,-0.096098,-0.114581,0.003845,-0.12924,-0.13067,-0.05735,0.047105,-0.054965
std,0.769837,1.005445,0.945672,0.839435,1.003998,0.998196,1.042254,0.920171,0.946051,0.933732,1.004824,0.957787,0.958559
min,-0.417173,-0.487722,-1.557842,-0.272599,-1.431329,-3.058221,-2.225199,-1.263551,-0.982843,-1.308051,-2.707379,-3.907193,-1.496084
25%,-0.410832,-0.487722,-0.891036,-0.272599,-0.947582,-0.567918,-1.240171,-0.762417,-0.637962,-0.785394,-0.765457,0.246544,-0.67887
50%,-0.398269,-0.487722,-0.375976,-0.272599,-0.299707,-0.127698,0.11113,-0.202052,-0.523001,-0.601276,0.113032,0.396098,-0.28358
75%,-0.2429,-0.219475,1.015999,-0.272599,0.434551,0.283316,0.898797,0.604198,-0.350561,0.072833,0.806576,0.441052,0.389254
max,3.966816,3.589637,2.117615,3.668398,2.732346,3.476688,1.117494,3.2873,1.661245,1.530926,1.268938,0.441052,3.548771


In [17]:
# Feature Scaling (testing data)
standard_scaler_object = preprocessing.StandardScaler()
standard_scaler_object.fit(X_test)
X_test = standard_scaler_object.transform(X_test)

In [18]:
testing_data = np.append(X_test, np.ones(len(X_test)).reshape(-1, 1), axis = 1)

**Predictions**

In [19]:
def predict(final_m, testing_data):
    y_pred = []
    for i in testing_data:
        ans = sum(i * m)
        y_pred.append(ans)
    return y_pred

In [20]:
y_pred = predict(m, testing_data)
y_pred

[11.11689676206052,
 28.729862780718193,
 22.527728851834723,
 23.98661842522167,
 20.51449806223987,
 1.9041072424365026,
 30.561375942894276,
 24.808123365414538,
 18.473831208655973,
 23.50888057538465,
 23.92593205019116,
 17.39232932531255,
 16.540449973171512,
 21.193879991549576,
 43.40783765204707,
 23.27204074528622,
 24.215604876229712,
 27.591785664018538,
 19.492685511758257,
 31.27995275548348,
 23.70459756417188,
 24.867249909132738,
 34.250092424967214,
 37.367001417000786,
 31.432735312267575,
 16.268602668724597,
 23.406962643998657,
 32.83989230967916,
 25.752154574644198,
 34.63649623030613,
 16.49696154226031,
 25.916946187500038,
 23.380497536430887,
 25.228093601125256,
 13.909407389028589,
 29.740599896982882,
 26.030240035913334,
 20.266745742213566,
 23.90014282686455,
 8.159321773071246,
 7.398852207833844,
 28.72205593137585,
 28.956681760803242,
 19.712674123478592,
 20.071597057248702,
 1.8482488427103156,
 39.87826904788368,
 25.667525238686302,
 29.762249

In [22]:
# Dumping the output obtained from the evaluation data into a "CSV" file
np.savetxt('Boston Predictions.csv', y_pred, fmt = '%.5f')