# Model training

After training the model with pure NumPy, we now take advantage of scikit-learn and fit several of its linear models to the same data.

---

Linear regression:

1. Predicting using ordinary least squares (no penalty)
2. Predicting using ridge regression (L2 penalty)
3. Predicting using lasso regression (L1 penalty)
4. Predicting using stochastic gradient descent with L2 penalty

In [1]:
import pandas as pd
import numpy as np

# Show every column when displaying a DataFrame
pd.set_option('display.max_columns', None)

In [2]:
# Counter that yields 1, 2, 3, ... and is used to number the prediction files
def generateNumbers():
    i = 0
    while True:
        i += 1
        yield i

x = generateNumbers()

In [3]:
# Load the training set
X_train = pd.read_csv('X_train.csv')
y_train = pd.read_csv('y_train.csv')

# Convert the dataframes into numpy arrays
#X = X_train.to_numpy()
#y = y_train.to_numpy()

# Load the test set
X_test = pd.read_csv('X_test.csv')

---
1. Predicting using ordinary least squares (no penalty)


In [4]:
from sklearn.linear_model import LinearRegression

predictor = LinearRegression()
predictor.fit(X_train, y_train)
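Note that scikit-learn's `LinearRegression` does not run gradient descent; it solves the least-squares problem directly. The sketch below illustrates this on small synthetic data (`X_demo` and `y_demo` are made up for illustration and are not the housing data): the fitted parameters should match those from `np.linalg.lstsq` applied to a design matrix with an explicit intercept column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, for illustration only (not the housing data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

# scikit-learn fit
model = LinearRegression().fit(X_demo, y_demo)

# Direct least-squares solve with an explicit intercept column
A = np.column_stack([np.ones(len(X_demo)), X_demo])
solution, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# Both approaches should agree on the intercept and the coefficients
print(np.allclose(model.intercept_, solution[0]))
print(np.allclose(model.coef_, solution[1:]))
```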

In [5]:
# Generate predictions. The model was trained on log(SalePrice), so apply exp to return to the original price scale.

preds = predictor.predict(X_test)
preds = np.exp(preds)
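The back-transformation only works if it is the exact inverse of the transform applied to the target. A minimal sketch with made-up prices (the values are hypothetical, not taken from the data set) shows the round trip, and the matching inverse for the case where `np.log1p` had been used instead of `np.log`:

```python
import numpy as np

# Hypothetical sale prices, for illustration only
prices = np.array([100000.0, 150000.0, 200000.0])

log_prices = np.log(prices)      # the scale the model is trained on
recovered = np.exp(log_prices)   # back to the original scale
print(np.allclose(recovered, prices))

# If the target had instead been transformed with np.log1p,
# the matching inverse would be np.expm1, not np.exp.
recovered_log1p = np.expm1(np.log1p(prices))
print(np.allclose(recovered_log1p, prices))
```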

In [6]:
# Save the predictions to a CSV file; the Ids start at 1461
ids = np.arange(1461, 1461 + preds.size)

predictions = pd.DataFrame(np.column_stack((ids, preds)), columns=['Id','SalePrice'])
predictions['Id'] = predictions['Id'].astype(int)
predictions.head()

|   | Id | SalePrice |
|---|---|---|
| 0 | 1461 | 101405.119284 |
| 1 | 1462 | 146415.460003 |
| 2 | 1463 | 167690.185848 |
| 3 | 1464 | 181777.560224 |
| 4 | 1465 | 192698.960579 |


In [7]:
# The predictions/ directory must already exist; to_csv does not create it
predictions.to_csv(f'predictions/pred{next(x)}.csv', index=False)
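The predict, back-transform, and save steps are repeated for every model in this notebook, so they could be wrapped in one function. The helper below is a sketch only: the name `save_predictions` and its parameters are hypothetical, and it assumes the predictions are on the log scale and that Ids run consecutively from 1461, as in the cells above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

def save_predictions(log_preds, path, first_id=1461):
    """Undo the log transform and write an Id/SalePrice CSV (hypothetical helper)."""
    # ravel() handles both (n,) and (n, 1) prediction arrays
    prices = np.exp(np.asarray(log_preds)).ravel()
    ids = np.arange(first_id, first_id + prices.size)
    predictions = pd.DataFrame({'Id': ids, 'SalePrice': prices})
    # Create the output directory if it does not exist yet
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    predictions.to_csv(path, index=False)
    return predictions

# Example with made-up log predictions, written to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'predictions', 'demo.csv')
    demo = save_predictions(np.log([[100000.0], [150000.0]]), demo_path)
    reloaded = pd.read_csv(demo_path)

print(demo)
```

Building the DataFrame from a dict keeps `Id` as an integer column from the start, so no later `astype(int)` cast is needed.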

---
2. Predicting using ridge regression (L2 penalty)

In [8]:
from sklearn.linear_model import Ridge

predictor = Ridge()
predictor.fit(X_train, y_train)
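`Ridge` minimizes the squared error plus `alpha` times the squared L2 norm of the coefficients, and scikit-learn picks a linear solver for it automatically rather than running plain gradient descent. As a sketch on synthetic data (the arrays are made up, and the intercept is switched off to keep the formula short), the fitted coefficients should match the closed-form solution $(X^\top X + \alpha I)^{-1} X^\top y$:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, for illustration only (not the housing data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=60)

alpha = 1.0  # Ridge's default penalty strength
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X_demo, y_demo)

# Closed-form solution: (X^T X + alpha * I)^-1 X^T y
n_features = X_demo.shape[1]
w_closed = np.linalg.solve(
    X_demo.T @ X_demo + alpha * np.eye(n_features),
    X_demo.T @ y_demo,
)

print(np.allclose(ridge.coef_, w_closed))
```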

In [9]:
preds = predictor.predict(X_test)
preds = np.exp(preds)

In [10]:
ids = np.arange(1461, 1461 + preds.size)

predictions = pd.DataFrame(np.column_stack((ids, preds)), columns=['Id','SalePrice'])
predictions['Id'] = predictions['Id'].astype(int)
predictions.head()

|   | Id | SalePrice |
|---|---|---|
| 0 | 1461 | 102039.096289 |
| 1 | 1462 | 146227.274892 |
| 2 | 1463 | 167960.351244 |
| 3 | 1464 | 182301.753868 |
| 4 | 1465 | 194477.420534 |


In [11]:
predictions.to_csv(f'predictions/pred{next(x)}.csv', index=False)

---
3. Predicting using lasso regression (L1 penalty)


In [12]:
from sklearn.linear_model import Lasso
predictor = Lasso()
predictor.fit(X_train, y_train)
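scikit-learn fits `Lasso` by coordinate descent, and the L1 penalty can drive coefficients to exactly zero. With the default `alpha=1.0` and a log-scale target, the penalty may be strong enough to zero out many or even all coefficients, so it is worth inspecting `predictor.coef_` after fitting. The sketch below uses synthetic data in which only the first two of ten features matter (the data and the choice `alpha=0.1` are assumptions made for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: only the first two features influence the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[:2] = [3.0, -2.0]
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X_demo, y_demo)
lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)

# Count how many coefficients each model keeps
print("non-zero OLS coefficients:  ", np.count_nonzero(ols.coef_))
print("non-zero Lasso coefficients:", np.count_nonzero(lasso.coef_))
```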

In [13]:
preds = predictor.predict(X_test)
preds = np.exp(preds)

In [14]:
ids = np.arange(1461, 1461 + preds.size)

predictions = pd.DataFrame(np.column_stack((ids, preds)), columns=['Id','SalePrice'])
predictions['Id'] = predictions['Id'].astype(int)

In [15]:
predictions.to_csv(f'predictions/pred{next(x)}.csv', index=False)

---
4. Predicting using stochastic gradient descent with L2 penalty

In [18]:
from sklearn.linear_model import SGDRegressor
predictor = SGDRegressor(penalty='l2')
# SGDRegressor expects a 1-D target, so flatten the single-column DataFrame
predictor.fit(X_train, y_train.to_numpy().flatten())
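Stochastic gradient descent is sensitive to feature scaling. If the columns of `X_train` were not already standardized in an earlier step (an assumption, since the preprocessing is not shown here), one option is to put a `StandardScaler` in front of the regressor. The sketch below uses synthetic features on very different scales; the data and the fixed `random_state` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3)) * np.array([1.0, 100.0, 10000.0])
y_demo = (
    X_demo @ np.array([2.0, 0.03, 0.0005])
    + 12.0
    + rng.normal(scale=0.1, size=500)
)

# Standardize the features, then fit SGD with an L2 penalty
sgd_pipeline = make_pipeline(
    StandardScaler(),
    SGDRegressor(penalty='l2', random_state=0),
)
sgd_pipeline.fit(X_demo, y_demo)

# R^2 on the training data
r2 = sgd_pipeline.score(X_demo, y_demo)
print(f'R^2: {r2:.3f}')
```

Because the scaler is part of the pipeline, the same transformation is applied automatically when calling `predict` on the test set.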

In [19]:
preds = predictor.predict(X_test)
preds = np.exp(preds)

In [20]:
ids = np.arange(1461, 1461 + preds.size)

predictions = pd.DataFrame(np.column_stack((ids, preds)), columns=['Id','SalePrice'])
predictions['Id'] = predictions['Id'].astype(int)

In [21]:
predictions.to_csv(f'predictions/pred{next(x)}.csv', index=False)
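One way to compare the four models locally is k-fold cross-validation on the training set, scoring the error on the log scale. The following is a sketch under stated assumptions: it runs on synthetic data so that it is self-contained, the dictionary keys are arbitrary labels, and the estimators use their default hyperparameters as in the cells above. In the notebook, `X_demo` and `y_demo` would be replaced by `X_train` and `y_train.to_numpy().ravel()`.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge, SGDRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training data (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (
    X_demo @ np.array([0.3, -0.2, 0.1, 0.0, 0.05])
    + 12.0
    + rng.normal(scale=0.1, size=200)
)

models = {
    'ols': LinearRegression(),
    'ridge': Ridge(),
    'lasso': Lasso(),
    'sgd_l2': SGDRegressor(penalty='l2', random_state=0),
}

# 5-fold cross-validated RMSE for each model (lower is better)
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring='neg_root_mean_squared_error'
    )
    cv_rmse[name] = -scores.mean()
    print(f'{name}: {cv_rmse[name]:.4f}')
```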