# Machine Learning

## Supervised Machine Learning

## Linear Regression
Linear Regression is a supervised machine learning algorithm used for predicting a continuous target variable based on one or more independent features by fitting a linear equation to the data.

## Equation of Linear Regression

### Simple Linear Regression (one dependent variable, one independent variable)
$$y = \beta_0 + \beta_1 x + \varepsilon$$

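Where:

$y$: Dependent variable (target)

$x$: Independent variable (feature)

$\beta_0$: Intercept

$\beta_1$: Slope (coefficient of $x$)

$\varepsilon$: Error term, the part of $y$ that the line does not explain. Its estimate on fitted data is the residual.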
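
As a minimal sketch of how the intercept and slope can be estimated, the snippet below applies the standard closed-form least-squares formulas in plain Python. The data is made up for illustration (it follows y = 1 + 2x exactly), and the variable names `beta_0` and `beta_1` are chosen here only to mirror the symbols above.

```python
# Closed-form least-squares estimates for simple linear regression.
# The data is made up for illustration: y = 1 + 2x exactly, so the error is zero.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Slope: co-variation of x and y divided by the variation of x.
beta_1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
# Intercept: makes the fitted line pass through (x_mean, y_mean).
beta_0 = y_mean - beta_1 * x_mean

# Residuals: observed y minus fitted y.
residuals = [y - (beta_0 + beta_1 * x) for x, y in zip(xs, ys)]

print(f"intercept={beta_0:.2f}, slope={beta_1:.2f}")
```

Because the made-up data has no noise, every residual is zero; with real data the residuals show how far each point sits from the fitted line.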

### Multiple Linear Regression (one dependent variable, multiple independent variables)
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$$
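
With several features, the coefficients can be estimated by solving a least-squares problem. The sketch below uses NumPy's `np.linalg.lstsq` on hypothetical, noise-free data built as y = 1 + 2·x1 + 3·x2, so the recovered coefficients should come out close to 1, 2 and 3. It is an illustration of the equation above, not a description of how any particular library fits the model internally.

```python
import numpy as np

# Hypothetical data: two features, target built as y = 1 + 2*x1 + 3*x2 with no noise.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient plays the role of the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ beta ~ y.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(beta, 6))
```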

## Assumptions of Linear Regression
Linearity: The relationship between features and target is linear.

Independence: Observations are independent.

Homoscedasticity: Constant variance of errors.

Normality of Errors: Errors are normally distributed.

No Multicollinearity: Independent variables are not highly correlated with one another.
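
One quick, informal way to screen for multicollinearity is to look at pairwise correlations between features. The sketch below uses synthetic features and an assumed cut-off of 0.9; both the data and the threshold are illustrative choices, and pairwise correlation will not catch every form of multicollinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: x2 is almost a copy of x1, x3 is unrelated noise.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
x3 = rng.normal(size=200)
features = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

# Pairwise Pearson correlations between the feature columns.
corr = np.corrcoef(features, rowvar=False)

THRESHOLD = 0.9  # assumed cut-off; pick one that suits the problem
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > THRESHOLD
]
print(flagged)
```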

## Metrics to Evaluate Linear Regression
Mean Absolute Error (MAE): The average of the absolute differences between predicted and actual values.

Mean Squared Error (MSE): The average of the squared differences between predicted and actual values.

Root Mean Squared Error (RMSE): The square root of the MSE, expressed in the same units as the target.

R-squared (R²): The proportion of the variance in the actual values that the model explains. A value of 1 is a perfect fit, and 0 is no better than always predicting the mean.
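
To make the four definitions concrete, here is a small sketch that computes them by hand in plain Python. The `y_true` and `y_pred` values are made up for illustration.

```python
import math

# Made-up values for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / n   # average absolute error
mse = sum(e ** 2 for e in errors) / n   # average squared error
rmse = math.sqrt(mse)                   # back in the units of the target

# R^2 = 1 - (residual sum of squares / total sum of squares around the mean).
y_mean = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - y_mean) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
# prints: MAE=1.000 MSE=1.500 RMSE=1.225 R2=0.700
```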

## Real-world Use Cases
Predicting house prices

Forecasting sales

Estimating salary based on experience

Stock price estimation (basic form)

Energy consumption prediction

## When Linear Regression Fails
Non-linear data: Use Polynomial Regression or other models.

Multicollinearity: Use Ridge or Lasso Regression.

Outliers: Can distort the model heavily. Consider using Robust Regression.
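
The non-linear case can be illustrated with a small sketch. The data below is made up (y = x² on a symmetric grid), and `np.polyfit` stands in for polynomial regression; the helper name `fit_mse` is hypothetical. A straight line cannot follow the curve, while a degree-2 polynomial fits it almost exactly.

```python
import numpy as np

# Made-up non-linear data: y = x^2 on a symmetric grid.
x = np.arange(-3.0, 4.0)
y = x ** 2


def fit_mse(degree):
    """Fit a polynomial of the given degree by least squares; return (coefficients, MSE)."""
    coeffs = np.polyfit(x, y, deg=degree)
    y_hat = np.polyval(coeffs, x)
    return coeffs, float(np.mean((y - y_hat) ** 2))


linear_coeffs, linear_mse = fit_mse(1)
quad_coeffs, quad_mse = fit_mse(2)

print(f"linear MSE={linear_mse:.3f}, quadratic MSE={quad_mse:.3f}")
```

On this data the best straight line is flat (slope near zero), which is why its error stays large even though the relationship between x and y is perfectly deterministic.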

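## Model Parameters (scikit-learn `LinearRegression`)

| Parameter       | Type | Default | Description                                                                                      |
| --------------- | ---- | ------- | ------------------------------------------------------------------------------------------------ |
| `fit_intercept` | bool | True    | Whether to calculate the intercept $\beta_0$; if False, the fit is forced through the origin     |
| `normalize`     | bool | -       | **Removed** (deprecated in scikit-learn 1.0, removed in 1.2); scale features before fitting instead |
| `copy_X`        | bool | True    | Whether to copy the input X; if False, X may be overwritten                                      |
| `n_jobs`        | int  | None    | Number of parallel jobs for the computation; only speeds up sufficiently large problems          |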
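
To see what `fit_intercept` changes in practice, the sketch below fits the same made-up data (y = 5 + 2x exactly) with and without an intercept. The variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 5 + 2x exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 5.0 + 2.0 * X[:, 0]

with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
without_intercept = LinearRegression(fit_intercept=False).fit(X, y)

# With an intercept the model recovers the true line.
print(with_intercept.intercept_, with_intercept.coef_)
# Without one, the line is forced through the origin and the slope compensates.
print(without_intercept.intercept_, without_intercept.coef_)
```

Setting `fit_intercept=False` is usually only appropriate when the data is already centered or the relationship is known to pass through the origin.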


## ML Lifecycle

### Data Preprocessing
Clean the dataset with pandas and NumPy, for example by handling missing values and outliers.
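
A minimal sketch of this cleaning step, assuming median imputation for missing values and the 1.5 × IQR rule for capping outliers. Both are common conventions rather than the only options, and the `values` column is made up for illustration.

```python
import pandas as pd

# Made-up column with one missing value and one extreme value.
values = pd.Series([10.0, 12.0, None, 11.0, 13.0, 500.0], name="feature")

# 1. Fill missing values with the median, which is less sensitive to outliers than the mean.
filled = values.fillna(values.median())

# 2. Cap outliers using the 1.5 * IQR rule (an assumed convention).
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = filled.clip(lower=lower, upper=upper)

print(cleaned.tolist())
```

Capping keeps the row in the dataset; dropping the row is the alternative when the extreme value is clearly a recording error.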
### EDA
Perform exploratory data analysis to find trends and patterns. These guide feature engineering and reveal any further data cleaning that is needed.
### Feature Engineering
Create new features from the existing dataset that help predict the dependent variable.
### Feature Selection
Select the most important and meaningful features (columns) for predicting the target, whether the task is regression or classification.
### Model Selection
### Hyperparameter Tuning
### Model Building
### Model Deployment

In [3]:
from sklearn.linear_model import LinearRegression
import pandas as pd

lr_model=LinearRegression(fit_intercept=True)

In [13]:
df=pd.read_csv("Walmart_Sales.csv")

In [14]:
df.head(2)

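|     | Store | Date       | Weekly_Sales | Holiday_Flag | Temperature | Fuel_Price | CPI        | Unemployment |
| --- | ----- | ---------- | ------------ | ------------ | ----------- | ---------- | ---------- | ------------ |
| 0   | 1     | 05-02-2010 | 1643690.9    | 0            | 42.31       | 2.572      | 211.096358 | 8.106        |
| 1   | 1     | 12-02-2010 | 1641957.44   | 1            | 38.51       | 2.548      | 211.24217  | 8.106        |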


In [15]:
df.isnull().sum()

Store           0
Date            0
Weekly_Sales    0
Holiday_Flag    0
Temperature     0
Fuel_Price      0
CPI             0
Unemployment    0
dtype: int64

In [16]:
#df["Date"]=pd.to_datetime(df["Date"],errors='coerce')
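
The commented-out cell above hints at parsing the `Date` column, which the later cells drop instead. As a hedged sketch of what that could look like, the snippet below parses the two dates shown in the `df.head(2)` output, assuming they are in day-month-year order (consistent with weekly spacing), and derives a few numeric features. The feature names `year`, `month` and `week` are hypothetical choices.

```python
import pandas as pd

# The two dates shown in the df.head(2) output.
dates = pd.Series(["05-02-2010", "12-02-2010"], name="Date")

# Assumption: strings are day-month-year, which matches one-week spacing between rows.
parsed = pd.to_datetime(dates, format="%d-%m-%Y", errors="coerce")

# Hypothetical numeric features a regression model could use instead of the raw string.
date_features = pd.DataFrame(
    {
        "year": parsed.dt.year,
        "month": parsed.dt.month,
        "week": parsed.dt.isocalendar().week.astype(int),
    }
)
print(date_features)
```

Passing an explicit `format` avoids the month/day ambiguity that automatic parsing can run into with strings like `05-02-2010`.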

In [26]:
X=df.drop(["Weekly_Sales","Date"],axis=1)  # Independent variables (features)
y=df["Weekly_Sales"]  # Dependent variable (target)

In [27]:
from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

In [28]:
print(len(x_train))
print(len(x_test))


5148
1287


In [None]:
lr_model.fit(x_train,y_train)  # fit() trains the model and returns the fitted estimator, not predictions

In [32]:
from sklearn.metrics import root_mean_squared_error  # available in scikit-learn 1.4 and later
y_pred=lr_model.predict(x_test)
model_rmse=root_mean_squared_error(y_test,y_pred)
print(model_rmse)

523884.7404541007
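
An RMSE value is easier to interpret next to a simple baseline, such as always predicting the mean of the training target. The sketch below shows the comparison on small hypothetical arrays; the `*_demo` names are stand-ins for `y_train`, `y_test` and `y_pred` from the cells above, and the numbers are not taken from the Walmart data.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical stand-ins for y_train, y_test and y_pred.
y_train_demo = np.array([100.0, 120.0, 140.0, 160.0])
y_test_demo = np.array([110.0, 150.0])
y_pred_demo = np.array([112.0, 145.0])

# Baseline: always predict the mean of the training target.
baseline_pred = np.full_like(y_test_demo, y_train_demo.mean())

model_rmse_demo = rmse(y_test_demo, y_pred_demo)
baseline_rmse_demo = rmse(y_test_demo, baseline_pred)
print(f"model RMSE={model_rmse_demo:.2f}, baseline RMSE={baseline_rmse_demo:.2f}")
```

If the model's RMSE is not clearly below the baseline's, the features are adding little predictive value in their current form.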
