# Simple Linear Regression using scikit-learn

__What is Simple Linear Regression and when should you use it?__

Simple Linear Regression is a statistical method used to model the relationship between a dependent variable (also called the response variable) and one independent variable (also called the predictor variable). It is used to predict the value of the dependent variable based on the value of the independent variable.

The method assumes that there is a linear relationship between the two variables and that the relationship can be described by a straight line. The line is represented by an equation of the form:

y = b0 + b1 * x

Where:

- y = dependent variable
- x = independent variable
- b0 = intercept
- b1 = slope of the line

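The goal of Simple Linear Regression is to find the best-fitting line and use it to predict the dependent variable for new values of the independent variable. "Best-fitting" usually means the ordinary least squares (OLS) line: the line that minimizes the sum of squared differences between the predicted and actual values of the dependent variable. This is what scikit-learn's `LinearRegression` computes.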
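
For a single independent variable, the OLS estimates have a closed form: b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and b0 = mean(y) - b1 * mean(x). The sketch below computes them with NumPy; the helper name `fit_simple_ols` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the OLS line always passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_ols(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Fitting `LinearRegression` on the same data (with `x` reshaped to one column) should give the same intercept and slope.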

Simple Linear Regression is appropriate when:

- The relationship between the dependent and independent variables is linear
- There is only one independent variable
- The dependent variable is continuous, and the residuals (the differences between actual and predicted values) are approximately normally distributed
- The errors are independent of each other and have roughly constant variance (homoscedasticity); multicollinearity is not a concern because there is only one independent variable

It's important to note that Simple Linear Regression is a good starting point for understanding the relationship between two variables and a useful tool for exploring data and identifying potential relationships, but it is limited when the relationship is not linear or when there is more than one independent variable.
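
A quick numerical check on the linearity condition is the Pearson correlation coefficient r, which measures how closely the points follow a straight line (values near +1 or -1 indicate a strong linear relationship). The sketch below uses made-up data and shows an important caveat: a perfectly deterministic but curved relationship can still have r = 0, so r should be read alongside a scatter plot.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y_linear = 2.0 * x + 1.0        # exactly linear in x
y_curved = (x - 3.5) ** 2       # symmetric curve with no linear trend

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is r(x, y)
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]
print(f"r (linear) = {r_linear:.3f}")
print(f"r (curved) = {r_curved:.3f}")
```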

__When shouldn't you use Simple Linear Regression?__

Simple Linear Regression is a powerful tool, but it has its limitations and there are certain situations where it may not be the best choice for modeling the relationship between the variables. Here are a few situations where you should not use Simple Linear Regression:

__Non-linear relationship:__ Simple Linear Regression assumes that the relationship between the variables is linear. If the relationship is not linear, using Simple Linear Regression may not provide accurate results.
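
A good-looking score does not prove the relationship is linear. In this sketch with hypothetical data where y = x², `LinearRegression` reaches an R² of about 0.95, yet the residuals are positive at both ends and negative in the middle, a U-shaped pattern that signals a non-linear relationship.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data with a curved (quadratic) relationship: y = x^2
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = x.ravel() ** 2

reg = LinearRegression().fit(x, y)
residuals = y - reg.predict(x)

print(f"R^2: {reg.score(x, y):.3f}")
# Residual signs: positive at both ends, negative in the middle (a U shape)
print(np.sign(residuals).astype(int))
```

Plotting residuals against x is the usual way to spot this kind of pattern.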

__Multiple independent variables:__ Simple Linear Regression can only handle one independent variable. If you have multiple independent variables that you want to include in your model, you should use Multiple Linear Regression or another more advanced technique.
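
In scikit-learn, the same `LinearRegression` class fits a Multiple Linear Regression when `X` has more than one column; it returns one slope per independent variable. The data below are made up and noise-free so the recovered coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two independent variables (columns) for six observations
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
# Noise-free target: y = 1 + 2*x1 + 3*x2
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

mlr = LinearRegression().fit(X, y)
print("intercept:", mlr.intercept_)
print("coefficients:", mlr.coef_)  # one slope per independent variable
```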

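__Categorical independent variable:__ Simple Linear Regression expects a numeric independent variable. If the independent variable is categorical, it must first be encoded (for example as dummy variables), or a technique such as ANOVA can be used instead. Logistic regression is the usual choice when the dependent variable (not the independent one) is categorical.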
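
A common way to include a categorical independent variable in a linear model is dummy (one-hot) encoding with pandas' `get_dummies`. Strictly speaking, a variable with more than two categories becomes several 0/1 columns, so the result is a regression on indicator variables rather than a single-predictor model. In this sketch with made-up data, the intercept is the mean of the baseline group "a" and each coefficient is the difference between that group's mean and the baseline.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df_cat = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "y": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
})

# One 0/1 column per category; drop_first=True keeps "a" as the baseline
X_dummies = pd.get_dummies(df_cat["group"], drop_first=True, dtype=float)

model = LinearRegression().fit(X_dummies, df_cat["y"])
print("baseline (group a) mean:", model.intercept_)
print(dict(zip(X_dummies.columns, model.coef_)))
```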

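__Non-normal residuals:__ Simple Linear Regression assumes that the residuals (errors) are approximately normally distributed; x and y themselves do not need to be. If the residuals are far from normal, confidence intervals and p-values for the coefficients may be unreliable, especially with small samples.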
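
In practice, the normality assumption is checked on the residuals of the fitted line rather than on the raw variables. A rough numerical check is the skewness of the residuals (about 0 for a symmetric distribution); formal tests such as `scipy.stats.shapiro` or a Q-Q plot are common alternatives. The helper `residual_skewness` and the data below are hypothetical.

```python
import numpy as np

def residual_skewness(x, y):
    """Fit a straight line by least squares and return the skewness of its residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
    residuals = y - (intercept + slope * x)
    centered = residuals - residuals.mean()
    # Moment-based skewness: m3 / m2**1.5
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return m3 / m2 ** 1.5

x = np.arange(10, dtype=float)
y = 2 * x + 1
y[5] += 20  # one large positive error makes the residuals right-skewed
print(f"residual skewness: {residual_skewness(x, y):.2f}")
```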

__Outliers:__ Simple Linear Regression is sensitive to outliers. If your data has outliers that might affect the model, it's better to use another technique that is robust to outliers such as robust linear regression or quantile regression.
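
The sketch below illustrates the effect with made-up data lying on y = 2x + 1 plus one outlier. scikit-learn's `HuberRegressor` is used as one example of robust linear regression: its Huber loss grows linearly rather than quadratically for large residuals, so the outlier pulls its slope far less than it pulls the ordinary least-squares slope.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * x.ravel() + 1
y[9] = 100.0  # a single outlier (the true line gives 19)

ols = LinearRegression().fit(x, y)
huber = HuberRegressor().fit(x, y)  # Huber loss down-weights large residuals

print(f"OLS slope:   {ols.coef_[0]:.2f}")
print(f"Huber slope: {huber.coef_[0]:.2f}")
```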

__High multicollinearity:__ Multicollinearity (strong correlation among independent variables) cannot occur with a single independent variable, but it becomes a problem as soon as the model is extended with correlated predictors. High multicollinearity makes the coefficients hard to interpret and can make the model unstable.
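
Multicollinearity is a property of multiple predictors, so it can only be checked once there are at least two independent variables. A minimal first check is the pairwise correlation matrix; variance inflation factors (for example `variance_inflation_factor` in `statsmodels.stats.outliers_influence`) are a more thorough option. The predictors below are hypothetical, with `x2` constructed as almost an exact multiple of `x1`.

```python
import numpy as np
import pandas as pd

# Hypothetical predictors: x2 is nearly 2 * x1, x3 is unrelated
x1 = np.arange(1, 9, dtype=float)
x2 = 2 * x1 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05])
x3 = np.array([3, 1, 4, 1, 5, 9, 2, 6], dtype=float)

predictors = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
# Pairwise Pearson correlations; |r| close to 1 flags potential multicollinearity
print(predictors.corr().round(3))
```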

It's worth noting that, even though Simple Linear Regression is not appropriate for certain situations, it can still be useful for exploring the data and identifying potential relationships. In those cases, it's better to use it to get a general idea of the data and then move to a more sophisticated model if necessary.

```python
# Import the required libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
```

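```python
# Create a DataFrame with 50 rows and 2 columns of random integers in [10, 100).
# Note: x and y are drawn independently and no seed is set, so there is no real
# relationship between them and the outputs below will differ on every run.
df = pd.DataFrame({'x': np.random.randint(10, 100, 50),
                   'y': np.random.randint(10, 100, 50)})
```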

```python
# Fit a LinearRegression model: 'x' is the independent variable, 'y' the dependent one
reg = LinearRegression().fit(df[['x']], df['y'])
```

```python
# Get the coefficient (slope b1)
print(reg.coef_)
```

```
[-0.10963101]
```


```python
# Get the intercept (b0)
print(reg.intercept_)
```

```
58.671980571770085
```


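```python
# Predict y for x = 65; pass a DataFrame with the same column name used for fitting
# (a plain list triggers a "feature names" warning in recent scikit-learn versions)
print(reg.predict(pd.DataFrame({'x': [65]})))
```

```
[51.54596517]
```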



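Because the data above are random and unseeded, the fitted slope is close to zero and the numbers change on every run. The sketch below is a reproducible variant under stated assumptions: a fixed seed, hypothetical data generated from y = 5 + 0.8x plus Gaussian noise, and a held-out test set scored with R² via `LinearRegression.score`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)  # fixed seed so results are reproducible

# Hypothetical data: y depends linearly on x plus Gaussian noise
x = rng.uniform(10, 100, size=50)
y = 5 + 0.8 * x + rng.normal(0, 5, size=50)
df = pd.DataFrame({"x": x, "y": y})

X_train, X_test, y_train, y_test = train_test_split(
    df[["x"]], df["y"], test_size=0.2, random_state=42
)

reg = LinearRegression().fit(X_train, y_train)
print("slope:", reg.coef_[0])
print("intercept:", reg.intercept_)
# R^2 on held-out data: 1.0 is a perfect fit, 0.0 is no better than predicting the mean
print("test R^2:", reg.score(X_test, y_test))
```

Evaluating on data the model was not trained on gives a more honest picture of predictive accuracy than scoring on the training set.
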

__Advantages of Simple Linear Regression__

Simple Linear Regression is a popular and widely used statistical method that has several advantages:

__Easy to understand and interpret:__ Simple Linear Regression is a simple model that gives a clear picture of the relationship between the two variables. The slope b1 is the expected change in the dependent variable for a one-unit increase in the independent variable, and the intercept b0 is the predicted value when the independent variable is zero.
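
The interpretation can be checked directly on a fitted model: two predictions one unit apart differ by exactly the slope, and the prediction at x = 0 equals the intercept. The data below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.5, 4.4, 6.6, 8.5])

reg = LinearRegression().fit(x, y)
# Predictions one unit apart differ by the slope b1
change = reg.predict([[3.0]])[0] - reg.predict([[2.0]])[0]
print("slope b1:", reg.coef_[0])
print("change in prediction for x: 2 -> 3:", change)
# The intercept b0 is the prediction at x = 0
print("prediction at x = 0:", reg.predict([[0.0]])[0], "intercept:", reg.intercept_)
```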

__Fast to compute:__ Because it has only one independent variable and a closed-form least-squares solution, Simple Linear Regression is fast to compute, even on large datasets.

__Easy to implement:__ Simple Linear Regression can be easily implemented in various programming languages such as R, Python, SAS and others.

__Good for Exploratory Data Analysis (EDA):__ Simple Linear Regression can be used to explore the relationship between variables and identify potential relationships. It can be used as a starting point for more advanced models if necessary.

__Low variance:__ Because it estimates only two parameters, Simple Linear Regression has low variance: the fitted line changes little when trained on different samples of the data, so it is less prone to overfitting than more complex models.

__Handling non-linearity:__ Simple Linear Regression can model some non-linear relationships after transforming the data, for example by taking the logarithm of the independent variable. The model is then linear in the transformed variable.
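
As a minimal sketch, assuming a relationship of the form y = b0 + b1 * log(x), fitting `LinearRegression` on log(x) instead of x recovers the coefficients exactly on noise-free made-up data. New inputs must go through the same transform before prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
y = 3.0 + 2.0 * np.log(x)  # non-linear in x, linear in log(x)

# Fit y against the transformed predictor log(x)
X_log = np.log(x).reshape(-1, 1)
reg_log = LinearRegression().fit(X_log, y)
print("b0:", reg_log.intercept_, "b1:", reg_log.coef_[0])

# Predictions must apply the same transform to new x values
print(reg_log.predict(np.log([[10.0]])))
```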

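__Outliers are easy to spot:__ With only one independent variable, the data and the fitted line can be plotted in two dimensions, so outliers are easy to identify visually. The least-squares fit itself, however, is sensitive to outliers, as noted in the disadvantages below.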

__Good for small datasets:__ Simple Linear Regression estimates only two parameters, so it can be fitted on relatively small datasets, although estimates from very few observations are less precise.

It's worth noting that, even though Simple Linear Regression has several advantages, it has its limitations and it's important to consider these limitations when choosing the appropriate model.

__Disadvantages of Simple Linear Regression__

Simple Linear Regression is a popular and widely used statistical method, but it also has several disadvantages:

__Assumes a linear relationship:__ Simple Linear Regression assumes that the relationship between the variables is linear. If the relationship is not linear, Simple Linear Regression may not provide accurate results.

__Only one independent variable:__ Simple Linear Regression can only handle one independent variable. If you have multiple independent variables that you want to include in your model, you should use Multiple Linear Regression or another more advanced technique.

__Sensitive to outliers:__ Simple Linear Regression is sensitive to outliers, which can affect the model's accuracy.

__Assumes normality of residuals:__ Simple Linear Regression assumes that the residuals (errors) are approximately normally distributed. If they are not, confidence intervals and p-values for the coefficients may be unreliable, especially with small samples.

__Assumes independence of errors:__ Simple Linear Regression assumes that the errors are independent, which may not be the case in certain situations.

__Assumes homoscedasticity:__ Simple Linear Regression assumes that the variance of the errors is constant, which may not be the case in certain situations.
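
A rough way to look for heteroscedasticity is to compare the spread of the residuals for small and large values of x; a ratio well above 1 suggests the error variance grows with x. The helper `residual_variance_ratio` below is a hypothetical heuristic in the spirit of the Goldfeld-Quandt test, not a formal test (statsmodels provides `het_goldfeldquandt` and `het_breuschpagan` for that). The data are constructed so that the error magnitude grows with x.

```python
import numpy as np

def residual_variance_ratio(x, y):
    """Residual variance in the upper half of x divided by that in the lower half.

    A value near 1 is consistent with constant error variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    order = np.argsort(x)
    half = len(x) // 2
    lower = residuals[order[:half]]
    upper = residuals[order[half:]]
    return upper.var() / lower.var()

x = np.arange(1, 21, dtype=float)
signs = np.where(np.arange(20) % 2 == 0, -1.0, 1.0)
y = 2 * x + signs * 0.5 * x  # alternating errors whose magnitude grows with x
print(f"variance ratio: {residual_variance_ratio(x, y):.2f}")
```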

__High bias:__ Because it can only fit a straight line, Simple Linear Regression has high bias: it underfits relationships with more complex structure and may therefore perform poorly on both the training data and unseen data.

__Limited ability to handle multicollinearity:__ With a single independent variable multicollinearity cannot occur, but as soon as the model is extended with several correlated predictors (Multiple Linear Regression), high multicollinearity makes the coefficients hard to interpret and can make the model unstable.

It's worth noting that, even though Simple Linear Regression has several disadvantages, it can still be useful for exploring the data and identifying potential relationships. In those cases, it's better to use it to get a general idea of the data and then move to a more sophisticated model if necessary.

##### Md. Ashiqur Rahman
##### Thank You