### Using the OLS function from the statsmodels package to create MLR models
The Python package *statsmodels* provides `OLS` (Ordinary Least Squares), which can be used to create multiple linear regression (MLR) models. Once the model is fitted, it conveniently computes and displays all the regression metrics that we have been using so far.

This unit illustrates the steps involved in using OLS. This unit also uses a few very convenient *pandas* functions ... 

More about *pandas* and *statsmodels* is coming up in later units.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Read the data file and display the first 5 rows

df = pd.read_csv('non-linear-data-set-for-regression.csv')
df.head()

In [None]:
# As explained in "DS203-2023-08-18-MLR-of-Non-Linear-Y.pdf", in this data set y is a non-linear function of x1
# We will therefore add a few polynomial features to allow flexibility in the model
# Create three features x2, x3, x4 from x1 (its square, cube and fourth power), as explained in the above PDF

df['x2'] = df['x1']**2
df['x3'] = df['x1']**3
df['x4'] = df['x1']**4
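The three assignments above follow a pattern, so the same features can be generated in a loop. This minimal sketch uses a small hypothetical DataFrame in place of the CSV data. Only the column name `x1` matches the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's data (only the column name x1 matches)
df = pd.DataFrame({'x1': [0.5, 1.0, 2.0, 3.0]})

# Create x2 = x1**2, x3 = x1**3, x4 = x1**4 in one loop
for k in range(2, 5):
    df[f'x{k}'] = df['x1'] ** k

print(df)
```

Changing `range(2, 5)` changes the highest power used. The model remains linear in its coefficients, even though it is non-linear in x1.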

In [None]:
# Import the statsmodels package and create the matrix / vector of input and output data
import statsmodels.api as sm

X = df[['x1', 'x2', 'x3', 'x4']]
print(X.head())

# Add a constant (i.e., a column of ones) to the matrix, as required by the matrix representation of the MLR system of equations
# See DS203-2023-08-18-MLR-Gradient-Descent-Derivation.pdf for details ...

X = sm.add_constant(X)
print(X.head())

y = df['y']

In [None]:
# Pass the X and y data to the OLS function to fit the model and print the model summary
# Note the details of the model, and compare it with that presented in DS203-2023-08-18-MLR-of-Non-Linear-Y.pdf

mlr_model = sm.OLS(y, X).fit()
print(mlr_model.summary())

In [None]:
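# Using the model, make predictions, then compute the error vector and plot it against x1

y_pred = mlr_model.predict(X)
print(y_pred.head())  # only a cell's last expression is displayed automatically, so print explicitly

e = y - y_pred
plt.scatter(X['x1'], e)
plt.xlabel('x1')
plt.ylabel('error (y - y_pred)')
plt.show()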

In [None]:
# Create a scatter plot of y and y_pred against x1
plt.scatter(X['x1'], y_pred, color='red', label='y_pred')
plt.scatter(X['x1'], y, color='green', label='y')
plt.xlabel('x1')
plt.legend()
plt.show()