
Let us consider the **Credit** data set from **ISLR**, an R package; a copy of the data can be accessed [here](https://github.com/DrSubbiah/LinearModels/blob/main/Credit.csv)

This data set has 400 observations (cases) and 12 variables (features).

First, let us consider three numeric predictors

- *Age, Income, Limit*

and the response, *Balance*

The underlying linear model is fitted in Python using scikit-learn (`sklearn`)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

In [None]:
url = 'https://raw.githubusercontent.com/DrSubbiah/LinearModels/main/Credit.csv'
cr_da = pd.read_csv(url)

In [None]:
cr_da.head()

In [None]:
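# Scatter plots of Balance against each numeric predictor
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True, figsize=(12, 4))
fig.suptitle('Relation between Three Numeric Predictors and the Response')
ax1.scatter(cr_da['Age'], cr_da['Balance'], c='orange')
ax1.set_xlabel('Age')
ax2.scatter(cr_da['Income'], cr_da['Balance'], c='blue')
ax2.set_xlabel('Income')
ax3.scatter(cr_da['Limit'], cr_da['Balance'], c='green')
ax3.set_xlabel('Limit')
ax1.set_ylabel('Balance')

# Keep tick labels only on the outer axes (the y-axis is shared)
for ax in fig.get_axes():
    ax.label_outer()
plt.show()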


## <font color="maroon"> Interpretation of Weights $\beta$

### <font color="blue"> Numeric predictors

If the fitted linear model has a numeric predictor $X$, the corresponding weight $\beta$ is the **change** in the average value of the response $Y$ when $X$ increases by one unit while all other predictors are held fixed at the same values (any fixed values will do, zero included)
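A minimal sketch of this interpretation on synthetic data (the data-generating coefficients are made up for illustration and are not from the Credit data): two inputs that differ by one unit in the first predictor, with the second predictor held at the same value, get predictions that differ by exactly the first fitted weight.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): two numeric predictors and a noisy response
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)

# Two rows that differ only in the first predictor (by one unit);
# the second predictor is held at the same value (here 0.7)
x_base = np.array([[1.0, 0.7]])
x_plus_one = np.array([[2.0, 0.7]])

change = model.predict(x_plus_one)[0] - model.predict(x_base)[0]
print(np.isclose(change, model.coef_[0]))  # True
```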

### <font color="blue">Categorical predictors

To interpret a factor (categorical) predictor:

1. One of its levels is taken as the reference (base) level

1. Each of the other levels gets its own weight $(\beta)$: the **change in the mean** response $Y$ when that level is compared with the base level

1. All other predictors are held constant; any other factor predictors are held at the **same level**
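As a worked sketch, suppose *Ethnicity* (levels African American, Asian and Caucasian in the Credit data) is the only predictor and African American is the base level. Writing $D_{\text{Asian}}$ and $D_{\text{Caucasian}}$ for 0/1 indicator variables and $\mu_{\text{level}}$ for the mean response within a level (notation introduced here for illustration), the model and its weights are:

$$
\begin{aligned}
E(Y) &= \beta_0 + \beta_1 D_{\text{Asian}} + \beta_2 D_{\text{Caucasian}} \\
\text{African American } (D_{\text{Asian}} = D_{\text{Caucasian}} = 0):&\quad \beta_0 = \mu_{\text{African American}} \\
\text{Asian } (D_{\text{Asian}} = 1):&\quad \beta_0 + \beta_1 = \mu_{\text{Asian}} \;\Rightarrow\; \beta_1 = \mu_{\text{Asian}} - \mu_{\text{African American}} \\
\text{Caucasian } (D_{\text{Caucasian}} = 1):&\quad \beta_0 + \beta_2 = \mu_{\text{Caucasian}} \;\Rightarrow\; \beta_2 = \mu_{\text{Caucasian}} - \mu_{\text{African American}}
\end{aligned}
$$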

### <font color="blue">Intercept or Constant

The constant (intercept) is the average response when every numeric predictor is zero and every factor predictor is at its base level
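A minimal sketch on synthetic data (hypothetical values, not the Credit data) confirming that the fitted intercept equals the prediction when every numeric predictor is set to zero. For the Credit predictors, Age = Income = Limit = 0 lies outside the observed data, so the constant there is an extrapolation rather than the balance of a typical customer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): three positive numeric predictors
rng = np.random.default_rng(1)
X = rng.uniform(1, 10, size=(100, 3))
y = 50 + X @ np.array([4.0, -2.0, 0.5]) + rng.normal(size=100)

model = LinearRegression().fit(X, y)

# Prediction with every numeric predictor set to zero
at_zero = model.predict(np.zeros((1, 3)))[0]
print(np.isclose(at_zero, model.intercept_))  # True
```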


In [None]:
lm_mod = linear_model.LinearRegression()

In [None]:
# Three numeric predictors and the response
x = cr_da[["Age", "Income", "Limit"]]
y = cr_da["Balance"]

fit1 = lm_mod.fit(x, y)

In [None]:
op=np.append(fit1.intercept_,fit1.coef_)

In [None]:
op=pd.DataFrame(op)
op.columns =['Estimate']
op.index = ['Constant', 'Age', 'Income', 'Limit']
op

## <font color="maroon"> Treating Categorical Predictors

To fit a model with a categorical predictor:

1.   One-hot encode it, i.e. convert each level into its own 0/1 (numeric) column
1.   Drop one of these columns (usually the first or the last level); the dropped level becomes the reference category
1. Fit the model on the remaining columns





In [None]:
# One-hot encode the categorical variable
encoder = OneHotEncoder(drop='first', sparse_output=False)  # Drop the first category
cr_da_encoded = encoder.fit_transform(cr_da[['Ethnicity']])

# Fit the linear regression model
model = LinearRegression()
model.fit(cr_da_encoded, cr_da['Balance'])

# Get the coefficients
coefficients = model.coef_
intercept = model.intercept_

# Retrieve feature names
feature_names = encoder.get_feature_names_out(input_features=['Ethnicity'])

# Create a DataFrame to display coefficients with feature names
coef_df = pd.DataFrame({'Feature': feature_names, 'Coefficient': coefficients})
coef_df.loc[len(coef_df)] = ['Intercept', intercept]

print(coef_df)


## <font color="maroon"> More than one Categorical Variable

In [None]:
# One-hot encode both categorical variables
encoder = OneHotEncoder(drop='first', sparse_output=False)  # Drop the first category of each variable
cr_da_encoded = encoder.fit_transform(cr_da[['Ethnicity','Married']])

# Fit the linear regression model
model = LinearRegression()
model.fit(cr_da_encoded, cr_da['Balance'])

# Get the coefficients
coefficients = model.coef_
intercept = model.intercept_

# Retrieve feature names
feature_names = encoder.get_feature_names_out(input_features=['Ethnicity',"Married"])

# Create a DataFrame to display coefficients with feature names
coef_df = pd.DataFrame({'Feature': feature_names, 'Coefficient': coefficients})
coef_df.loc[len(coef_df)] = ['Intercept', intercept]

print(coef_df)
