# About Linear Regression
Linear regression is a predictive model that assumes a linear relationship between the dependent variable (the variable we are trying to predict or estimate) and one or more independent variables (the input variables used in the prediction).

In Simple Linear Regression, only one independent (input) variable is used to predict the dependent variable. It has the following structure:

# Y = C + M*X

Y = Dependent variable (output/outcome/prediction/estimation)
C = Constant (Y-Intercept)
M = Slope of the regression line (the effect that X has on Y)
X = Independent variable (input variable used in the prediction of Y)
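As a minimal sketch of how C and M can be estimated, the snippet below applies the standard closed-form least-squares formulas for a single input variable. The function name `fit_simple_linear_regression` and the toy data are hypothetical and chosen only for illustration; they are not part of the example used later in this notebook.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (C, M) that minimize the squared error for Y = C + M*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope M: co-variation of X and Y divided by the variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept C: the fitted line passes through the point of means
    c = mean_y - m * mean_x
    return c, m


# Hypothetical toy data lying exactly on the line Y = 1 + 2*X
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

c, m = fit_simple_linear_regression(xs, ys)
print("C =", c, "M =", m)
```

Because the toy points lie exactly on a line, the fit recovers that line's intercept and slope. With real, noisy data the result is the line that best fits the points in the least-squares sense.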

In reality, a relationship may exist between the dependent variable and multiple independent variables. For these types of models (assuming linearity), we can use Multiple Linear Regression with the following structure:

# Y = C + M1*X1 + M2*X2 + …
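The structure above can be sketched as a small function that multiplies each input by its slope and adds the constant. The name `predict_linear` and the numbers used here are hypothetical placeholders, not values from the dataset below.

```python
def predict_linear(constant, slopes, inputs):
    """Evaluate Y = C + M1*X1 + M2*X2 + ... for one observation."""
    if len(slopes) != len(inputs):
        raise ValueError("each input variable needs exactly one slope")
    return constant + sum(m * x for m, x in zip(slopes, inputs))


# Hypothetical values: C = 10, M1 = 2, M2 = -3, X1 = 4, X2 = 1
y = predict_linear(10.0, [2.0, -3.0], [4.0, 1.0])
print(y)  # 10 + 2*4 + (-3)*1
```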

# An Example (with the Dataset to be used)
For illustration purposes, let’s suppose that you have a fictitious economy with the following parameters:


<img src="https://datatofish.com/wp-content/uploads/2018/04/001_statsmodels.png">


# The Python Code using Statsmodels
The following Python code includes an example of Multiple Linear Regression, where the input variables are:

- Interest_Rate
- Unemployment_Rate

These two variables are used to predict the dependent variable, Stock_Index_Price.

from pandas import DataFrame
import statsmodels.api as sm

Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
                'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
                'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
                'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
                'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]        
                }

df = DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price']) 

# Two input variables are used here for the multiple linear regression.
# For simple linear regression, use a single variable instead, e.g. X = df['Interest_Rate']
X = df[['Interest_Rate','Unemployment_Rate']]
Y = df['Stock_Index_Price']

X = sm.add_constant(X) # adding a constant

model = sm.OLS(Y, X).fit()
predictions = model.predict(X) 

print_model = model.summary()
print(print_model)


                            OLS Regression Results                            
Dep. Variable:      Stock_Index_Price   R-squared:                       0.898
Model:                            OLS   Adj. R-squared:                  0.888
Method:                 Least Squares   F-statistic:                     92.07
Date:                Tue, 05 Jul 2022   Prob (F-statistic):           4.04e-11
Time:                        22:50:02   Log-Likelihood:                -134.61
No. Observations:                  24   AIC:                             275.2
Df Residuals:                      21   BIC:                             278.8
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1798.4040    899.24

# This is the result that you’ll get once you run the Python code:
    
<img src="https://datatofish.com/wp-content/uploads/2018/04/00A_statsmodels.png">

# Interpreting the Regression Results
I highlighted several important components within the results:
1. <b>Adj. R-squared</b> reflects the fit of the model, adjusted for the number of input variables. R-squared values range from 0 to 1 (the adjusted value can fall slightly below 0 for a very poor fit), where a higher value generally indicates a better fit, assuming certain conditions are met.

2. <b>const coefficient</b> is your Y-intercept. It means that if both Interest_Rate and Unemployment_Rate are zero, then the expected output (i.e., the Y) would be equal to the const coefficient.

3. <b>Interest_Rate coefficient</b> represents the change in the output Y due to a change of one unit in the interest rate (everything else held constant).

4. <b>Unemployment_Rate coefficient</b> represents the change in the output Y due to a change of one unit in the unemployment rate (everything else held constant).

5. <b>std err</b> is the estimated standard error of each coefficient, which reflects how precisely the coefficient is estimated. The lower it is, the more precise the estimate.

6. <b>P >|t|</b> is your p-value. A p-value of less than 0.05 is conventionally considered statistically significant.

7. <b>Confidence Interval</b> (the [0.025, 0.975] columns) is the 95% confidence interval for each coefficient, i.e., the range in which the true coefficient is likely to fall.

# Making Predictions based on the Regression Results
Recall that the equation for the Multiple Linear Regression is:

<b>Y = C + M1*X1 + M2*X2 + … </b>

So for our example, it would look like this:

<b>Stock_Index_Price = (const coef) + (Interest_Rate coef)*X1 + (Unemployment_Rate coef)*X2</b>

And this is how the equation looks once we plug in the coefficients:

<b>Stock_Index_Price = (1798.4040) + (345.5401)*X1 + (-250.1466)*X2</b>



# Let’s suppose that you want to predict the stock index price using the following values, which you just collected for the first month of 2018:

<b>Interest Rate = 2.75 (i.e., X1= 2.75)<br>
Unemployment Rate = 5.3 (i.e., X2= 5.3)</b>
<br>

When you plug in those numbers, you’ll get:

<b>Stock_Index_Price = (1798.4040) + (345.5401)*X1 + (-250.1466)*X2</b> <br>
<b>Stock_Index_Price = (1798.4040) + (345.5401)*(2.75) + (-250.1466)*(5.3) = 1422.86</b>
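The arithmetic above can be checked with a few lines of Python. The coefficients and input values are the ones quoted in this section; the variable names are chosen here for illustration.

```python
# Coefficients taken from the regression results quoted above
const_coef = 1798.4040
interest_rate_coef = 345.5401
unemployment_rate_coef = -250.1466

# Input values for the first month of 2018
interest_rate = 2.75      # X1
unemployment_rate = 5.3   # X2

prediction = (const_coef
              + interest_rate_coef * interest_rate
              + unemployment_rate_coef * unemployment_rate)
print(round(prediction, 2))
```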

# The predicted/estimated value for the Stock_Index_Price in January 2018 is therefore 1422.86.

The predicted value can later be compared with the actual value to check its accuracy. If, for example, the actual stock index price for that month turned out to be 1435, then the prediction would be off by <b>1435 - 1422.86 = 12.14</b>.
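The same comparison can be sketched in code. The actual value of 1435 is the hypothetical figure from the paragraph above, and the percentage error is an extra illustration that goes slightly beyond the text.

```python
actual_price = 1435.0       # hypothetical actual value from the example above
predicted_price = 1422.86   # prediction obtained from the regression equation

error = actual_price - predicted_price
percent_error = abs(error) / actual_price * 100

print("Error:", round(error, 2))
print("Percent error:", round(percent_error, 2))
```

Expressing the error relative to the actual value makes it easier to judge whether a given miss is large or small for the scale of the index.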

Disclaimer: this example should not be used as a predictive model for the stock market. It was based on a fictitious economy for illustration purposes only.