<a href="https://colab.research.google.com/github/Deadkiller43/MLBasics/blob/main/multivariate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Import the necessary packages.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

Loading and preprocessing the data

In [2]:
df = pd.read_csv('/content/50_Startups.csv')
df.sample(5)

Unnamed: 0,R&D Spend,Administration,Marketing Spend,State,Profit
2,153441.51,101145.55,407934.54,Florida,191050.39
31,61136.38,152701.92,88218.23,New York,97483.56
34,46426.07,157693.92,210797.67,California,96712.8
23,67532.53,105751.03,304768.73,Florida,108733.99
41,27892.92,84710.77,164470.71,Florida,77798.83


One-hot encoding the State column and displaying the resulting columns

In [3]:
df = pd.get_dummies(df, columns=["State"])
df.columns

Index(['R&D Spend', 'Administration', 'Marketing Spend', 'Profit',
       'State_California', 'State_Florida', 'State_New York'],
      dtype='object')
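To make the effect of `pd.get_dummies` concrete, here is a minimal sketch on a hypothetical three-row frame (the `toy` data and its `Spend` column are made up for illustration and are not taken from `50_Startups.csv`). Passing `dtype=float` is an optional choice made here so that the indicator columns are numeric rather than boolean.

```python
import pandas as pd

# Hypothetical toy data, not taken from 50_Startups.csv.
toy = pd.DataFrame({
    "Spend": [100.0, 200.0, 300.0],
    "State": ["Florida", "New York", "California"],
})

# One indicator column is created per category of "State".
# dtype=float keeps the indicators numeric (1.0 / 0.0).
encoded = pd.get_dummies(toy, columns=["State"], dtype=float)

print(list(encoded.columns))
print(encoded)
```

Each row has exactly one indicator set to 1.0, which is how the single `State` column in the notebook becomes the three `State_*` columns shown above.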

Dividing the data into the input columns X and the target column y.

In [4]:
X = df.drop("Profit",axis=1).values
y = df["Profit"].values

In [5]:
X.shape, y.shape

((50, 6), (50,))

y is a one-dimensional array, so we reshape it into a column vector.

In [6]:
y = y.reshape(-1,1)
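As a small illustration of what `reshape(-1, 1)` does, the sketch below uses a hypothetical three-element array (`y_flat` is an illustrative name, not part of the notebook).

```python
import numpy as np

y_flat = np.array([10.0, 20.0, 30.0])  # hypothetical target values
y_col = y_flat.reshape(-1, 1)          # -1 lets NumPy infer the number of rows

print(y_flat.shape)  # (3,)
print(y_col.shape)   # (3, 1)
```

The values are unchanged; only the shape goes from a flat vector to a single-column 2-D array.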

Fitting a linear regression on the full dataset.

In [7]:
lr = LinearRegression()
lr.fit(X, y)

In [8]:
y_pred=lr.predict(X)

Calculating the metrics on the data the model was fit on

In [9]:
print("MAE", mean_absolute_error(y, y_pred))
print("MSE", mean_squared_error(y, y_pred))
print("RMSE", np.sqrt(mean_squared_error(y, y_pred)))
print("R2 Score", r2_score(y, y_pred))

MAE 6475.500708609337
MSE 78406792.88803764
RMSE 8854.761029414494
R2 Score 0.9507524843355148
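For reference, the four metrics printed above can be written out directly with NumPy. The sketch below uses small hypothetical arrays (`y_true`, `y_hat`) rather than the notebook's data, and checks the hand-written formulas against the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical values chosen so the arithmetic is easy to follow.
y_true = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 4.0])

residuals = y_true - y_hat

mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error
rmse = np.sqrt(mse)                # root mean squared error

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MAE", mae, mean_absolute_error(y_true, y_hat))
print("MSE", mse, mean_squared_error(y_true, y_hat))
print("RMSE", rmse)
print("R2 Score", r2, r2_score(y_true, y_hat))
```

RMSE is in the same units as the target, which is why it is often easier to interpret than MSE.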


Splitting the dataset into training and test sets

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=33)

In [11]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((40, 6), (10, 6), (40, 1), (10, 1))
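The sketch below shows, on hypothetical placeholder arrays (`X_demo`, `y_demo`), how `test_size=0.2` yields a 40/10 split of 50 rows and how a fixed `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data with the same number of rows (50) as the notebook's dataset.
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

# Two calls with the same random_state produce the same partition.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=33)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=33)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)  # (40, 2) (10, 2)

same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Reproducible:", same_split)
```

Changing `random_state` changes which rows land in the test set, so metrics on a dataset this small can shift noticeably between seeds.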

### Without Regularization

In [12]:
lr = LinearRegression()
lr.fit(X_train, y_train)

In [13]:
y_pred_lr = lr.predict(X_test)

In [14]:
print("MAE", mean_absolute_error(y_test, y_pred_lr))
print("MSE", mean_squared_error(y_test, y_pred_lr))
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred_lr)))
print("R2 Score", r2_score(y_test, y_pred_lr))

MAE 9932.179935040354
MSE 134140835.06275347
RMSE 11581.918453466742
R2 Score 0.9115245577202423


### L1 Regularization

In [15]:
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

In [16]:
y_pred_lasso = lasso.predict(X_train)

In [17]:
print("MAE", mean_absolute_error(y_train, y_pred_lasso))
print("MSE", mean_squared_error(y_train, y_pred_lasso))
print("RMSE", np.sqrt(mean_squared_error(y_train, y_pred_lasso)))
print("R2 Score", r2_score(y_train, y_pred_lasso))

MAE 6414.226959382949
MSE 73413666.80921319
RMSE 8568.17756639142
R2 Score 0.9539424364045493


In [18]:
y_pred_lasso2 = lasso.predict(X_test)

In [19]:
print("MAE", mean_absolute_error(y_test, y_pred_lasso2))
print("MSE", mean_squared_error(y_test, y_pred_lasso2))
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred_lasso2)))
print("R2 Score", r2_score(y_test, y_pred_lasso2))

MAE 9932.021558676342
MSE 134137662.69756405
RMSE 11581.781499301567
R2 Score 0.9115266501212109
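With `alpha=0.1` the Lasso test metrics above are almost identical to the unregularized ones (MAE 9932.02 versus 9932.18). To show what the L1 penalty does when it is strong relative to the data, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the data is randomly generated with unit-scale features, only the first two features influence the target, and `alpha=0.5` is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

# Synthetic data (hypothetical): 5 unit-scale features, only the first two matter.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 5))
y_syn = 3.0 * X_syn[:, 0] - 2.0 * X_syn[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X_syn, y_syn)
lasso_syn = Lasso(alpha=0.5).fit(X_syn, y_syn)

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso_syn.coef_, 3))
print("Coefficients set exactly to zero by Lasso:", int(np.sum(lasso_syn.coef_ == 0)))
```

In this setup the L1 penalty shrinks the two informative coefficients toward zero and drives the uninformative ones to exactly zero, while ordinary least squares leaves small nonzero values on every feature. The penalty acts on coefficient magnitudes, so its effect depends on the scale of the features.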


In [20]:
# R^2 of the unregularized model (fit on the training split), scored on the full dataset
lr.score(X, y)

0.946260100906504

### L2 Regularization

In [21]:
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)

In [22]:
y_pred_ridge = ridge.predict(X_train)

In [None]:
print("MAE", mean_absolute_error(y_train, y_pred_ridge))
print("MSE", mean_squared_error(y_train, y_pred_ridge))
print("RMSE", np.sqrt(mean_squared_error(y_train, y_pred_ridge)))
print("R2 Score", r2_score(y_train, y_pred_ridge))

In [23]:
y_pred_ridge2 = ridge.predict(X_test)

In [24]:
print("MAE", mean_absolute_error(y_test, y_pred_ridge2))
print("MSE", mean_squared_error(y_test, y_pred_ridge2))
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred_ridge2)))
print("R2 Score", r2_score(y_test, y_pred_ridge2))

MAE 9927.584793071022
MSE 134046153.47904706
RMSE 11577.830257826683
R2 Score 0.911587006973598
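The same four `print` lines are repeated for every model and split. One way to reduce the repetition is a small helper. This is a sketch only: `report_metrics` and its `label` parameter are hypothetical names, not part of the notebook, and the demo values are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def report_metrics(y_true, y_pred, label=""):
    """Print MAE, MSE, RMSE and R^2 for one set of predictions and return them."""
    mse = mean_squared_error(y_true, y_pred)
    metrics = {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2 Score": r2_score(y_true, y_pred),
    }
    for name, value in metrics.items():
        print(label, name, value)
    return metrics


# Hypothetical values, used only to exercise the helper.
demo = report_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], label="demo")
```

In the notebook, a call such as `report_metrics(y_test, y_pred_ridge2, label="ridge test")` could stand in for the four print statements, and returning a dictionary makes it easy to collect the results of all three models side by side.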


In [25]:
# R^2 of the unregularized model on the test split, for comparison with Ridge
lr.score(X_test, y_test)

0.9115245577202423