# Linear Regression: Concepts and Implementation

## Fundamentals of Linear Regression

Linear regression is a fundamental statistical method for modeling the relationship between a dependent variable and one or more independent variables, and for predicting the dependent variable from them. It assumes that this relationship is linear and finds the best-fitting line (or, with several independent variables, the best-fitting plane or hyperplane): the one that minimizes the sum of squared differences between the observed and predicted values. Here are the key concepts of linear regression:

+ <b>Dependent variable (Response variable):</b> This is the variable that you want to predict or explain. It is denoted as "Y" in the equation and is assumed to be dependent on the independent variables.
<br></br>
+ <b>Independent variables (Predictor variables):</b> These are the variables that are used to predict the dependent variable. In a simple linear regression, there is only one independent variable (denoted as "X"), while in multiple linear regression, there are multiple independent variables (X1, X2, X3, etc.).
<br></br>
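+ <b>Linear equation:</b> The linear regression model is represented by a linear equation of the form Y = β0 + β1X1 + β2X2 + ... + βnXn, where Y is the dependent variable, X1, X2, ..., Xn are the independent variables, and β0, β1, β2, ..., βn are the coefficients (also called regression coefficients or weights). β0 is the intercept and β1, ..., βn are the slopes; with one independent variable the equation describes a straight line, and with several it describes a plane or hyperplane. The full statistical model adds an error term ε (Y = β0 + β1X1 + ... + βnXn + ε) for the variation that the independent variables do not explain.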
<br></br>
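+ <b>Slope (β1, ..., βn):</b> Each slope coefficient βi represents the expected change in the dependent variable (Y) for a one-unit change in the corresponding independent variable (Xi), holding the other independent variables constant. Its sign gives the direction of the relationship and its size gives the magnitude of the effect.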
<br></br>
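+ <b>Intercept (β0):</b> The intercept is the predicted value of the dependent variable (Y) when all independent variables are zero. In simple linear regression it is the point where the line crosses the y-axis; it may have no practical meaning when zero lies outside the observed range of the independent variables.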
<br></br>
+ <b>Residuals:</b> Residuals are the differences between the observed values and the values predicted by the linear regression model. The most common fitting method, ordinary least squares (OLS), chooses the coefficients that minimize the sum of squared residuals.
<br></br>
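+ <b>Assumptions of Linear Regression:</b> Linear regression relies on several assumptions: linearity, independence of errors, homoscedasticity (constant variance of residuals), normality of errors, and no perfect multicollinearity among the independent variables. Normality matters mainly for confidence intervals and hypothesis tests, while strong (even if not perfect) multicollinearity makes individual coefficient estimates unstable.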
<br></br>
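+ <b>R-squared (Coefficient of determination):</b> R-squared measures the proportion of the variance in the dependent variable that is explained by the independent variables. For a least-squares model with an intercept, evaluated on the data it was fitted to, it ranges from 0 to 1, where 0 indicates that the model explains none of the variation and 1 indicates a perfect fit. On new data it can be negative if the model predicts worse than simply using the mean.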
<br></br>
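+ <b>Adjusted R-squared:</b> Adjusted R-squared is a modified version of R-squared that accounts for the number of independent variables in the model. Plain R-squared never decreases when a variable is added, even an irrelevant one; adjusted R-squared penalizes each additional variable and rises only if the new variable improves the fit enough to offset the lost degree of freedom, which makes it better suited for comparing models with different numbers of predictors.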
<br></br>
+ <b>Outliers and influential points:</b> Outliers are data points that deviate significantly from the overall pattern and can have a substantial impact on the regression line. Influential points are observations that have a strong influence on the regression coefficients and can dramatically affect the fit of the model.
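
For a single independent variable, the least-squares coefficients have a closed form: β1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and β0 = ȳ - β1x̄. The sketch below computes them in plain Python on five made-up points (illustrative only, not taken from the life expectancy data), then computes the residuals and their sum of squares. The function name `simple_ols` is hypothetical.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = S_xy / S_xx (sums of cross-products and squares about the means).
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The least-squares line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Five made-up points, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

b0, b1 = simple_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observed - predicted
sse = sum(r ** 2 for r in residuals)  # sum of squared residuals
print(round(b0, 4), round(b1, 4), round(sse, 4))  # 1.4 0.8 3.6
```

When the model includes an intercept, the least-squares residuals always sum to zero, which is a quick sanity check on any fit.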
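
R-squared is 1 - SSres/SStot, where SSres is the sum of squared residuals and SStot is the sum of squared deviations of Y from its mean. Adjusted R-squared is 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p independent variables (the intercept is not counted). A minimal plain-Python sketch, using five made-up points whose least-squares line is ŷ = 1.4 + 0.8x; the helper names are hypothetical:

```python
def r_squared(y, y_pred):
    """Fraction of the variance in y explained by the predictions."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # residual sum of squares
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)  # total sum of squares
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust R-squared for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up data; y_hat = 1.4 + 0.8x is its least-squares line.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
y_pred = [1.4 + 0.8 * xi for xi in x]

r2 = r_squared(y, y_pred)
adj = adjusted_r_squared(r2, n=len(y), p=1)
print(round(r2, 4), round(adj, 4))  # 0.64 0.52
```

With only five observations, the single predictor costs enough degrees of freedom that adjusted R-squared (0.52) sits noticeably below R-squared (0.64).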
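
One common way to quantify influence is Cook's distance, Di = ei² / (p·s²) · hii / (1 - hii)², where ei is the residual of point i, p is the number of fitted parameters, s² is the residual variance estimate, and hii is the leverage of point i. For one independent variable the leverage is hii = 1/n + (xi - x̄)² / Σ(xj - x̄)². The plain-Python sketch below (made-up data; the function name is hypothetical) shows that the most influential point need not be the one with the largest residual:

```python
def cooks_distance_simple(x, y):
    """Return (residuals, Cook's distances) for a one-predictor least-squares fit."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / s_xx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    p = 2  # fitted parameters: intercept and slope
    s2 = sum(r ** 2 for r in residuals) / (n - p)  # residual variance estimate
    distances = []
    for xi, ei in zip(x, residuals):
        # Leverage grows with the distance of x_i from the mean of x.
        h = 1.0 / n + (xi - x_mean) ** 2 / s_xx
        distances.append((ei ** 2 / (p * s2)) * h / (1.0 - h) ** 2)
    return residuals, distances


# Made-up data: the last point has an extreme x value.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.0, 4.0, 6.0, 8.0, 5.0]

residuals, d = cooks_distance_simple(x, y)
print([round(r, 2) for r in residuals])  # [-2.4, -0.6, 1.2, 3.0, -1.2]
print([round(v, 2) for v in d])  # the last point dominates (about 17.25)
```

Here the point at x = 10 has a smaller residual than the point at x = 4, but its leverage is 0.92, so its Cook's distance is far larger than any other point's.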

### Linear Regression with Life Expectancy Data

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Load the data from a CSV file in the current working directory
data = pd.read_csv('Life Expectancy Data.csv')
```

```python
# Display the DataFrame (Jupyter shows a truncated view of long frames)
data
```

| Unnamed: 0 | Country | Year | Status | Life expectancy | Adult Mortality | infant deaths | Alcohol | percentage expenditure | Hepatitis B | Measles | ... | Polio | Total expenditure | Diphtheria | HIV/AIDS | GDP | Population | thinness 1-19 years | thinness 5-9 years | Income composition of resources | Schooling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2015 | Developing | 65.0 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | ... | 6.0 | 8.16 | 65.0 | 0.1 | 584.259210 | 33736494.0 | 17.2 | 17.3 | 0.479 | 10.1 |
| 1 | Afghanistan | 2014 | Developing | 59.9 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | ... | 58.0 | 8.18 | 62.0 | 0.1 | 612.696514 | 327582.0 | 17.5 | 17.5 | 0.476 | 10.0 |
| 2 | Afghanistan | 2013 | Developing | 59.9 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | ... | 62.0 | 8.13 | 64.0 | 0.1 | 631.744976 | 31731688.0 | 17.7 | 17.7 | 0.470 | 9.9 |
| 3 | Afghanistan | 2012 | Developing | 59.5 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | ... | 67.0 | 8.52 | 67.0 | 0.1 | 669.959000 | 3696958.0 | 17.9 | 18.0 | 0.463 | 9.8 |
| 4 | Afghanistan | 2011 | Developing | 59.2 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | ... | 68.0 | 7.87 | 68.0 | 0.1 | 63.537231 | 2978599.0 | 18.2 | 18.2 | 0.454 | 9.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2933 | Zimbabwe | 2004 | Developing | 44.3 | 723.0 | 27 | 4.36 | 0.000000 | 68.0 | 31 | ... | 67.0 | 7.13 | 65.0 | 33.6 | 454.366654 | 12777511.0 | 9.4 | 9.4 | 0.407 | 9.2 |
| 2934 | Zimbabwe | 2003 | Developing | 44.5 | 715.0 | 26 | 4.06 | 0.000000 | 7.0 | 998 | ... | 7.0 | 6.52 | 68.0 | 36.7 | 453.351155 | 12633897.0 | 9.8 | 9.9 | 0.418 | 9.5 |
| 2935 | Zimbabwe | 2002 | Developing | 44.8 | 73.0 | 25 | 4.43 | 0.000000 | 73.0 | 304 | ... | 73.0 | 6.53 | 71.0 | 39.8 | 57.348340 | 125525.0 | 1.2 | 1.3 | 0.427 | 10.0 |
| 2936 | Zimbabwe | 2001 | Developing | 45.3 | 686.0 | 25 | 1.72 | 0.000000 | 76.0 | 529 | ... | 76.0 | 6.16 | 75.0 | 42.1 | 548.587312 | 12366165.0 | 1.6 | 1.7 | 0.427 | 9.8 |
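
The notebook imports `LinearRegression` and `StandardScaler`; one way they could be combined on this DataFrame is sketched below. The function `fit_life_expectancy` and its default target name are hypothetical, and the column names are assumed to match the header shown above; stray whitespace in header names is stripped only as a precaution. Rows with missing values in the selected columns are dropped, the features are standardized, and R-squared is computed on the same rows used for fitting (there is no train/test split).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def fit_life_expectancy(df, features, target="Life expectancy"):
    """Hypothetical helper: fit a standardized multiple linear regression.

    Returns (model, scaler, training R-squared).
    """
    df = df.copy()  # leave the caller's DataFrame untouched
    df.columns = df.columns.str.strip()  # precaution against stray spaces in headers
    # Keep only the chosen columns and drop rows with any missing value among them.
    subset = df[features + [target]].dropna()
    X = subset[features].to_numpy()
    y = subset[target].to_numpy()

    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)  # each feature rescaled to mean 0, std 1
    model = LinearRegression().fit(X_scaled, y)
    r2 = model.score(X_scaled, y)  # R-squared on the training rows
    return model, scaler, r2
```

For example, `model, scaler, r2 = fit_life_expectancy(data, ["Schooling", "Adult Mortality", "HIV/AIDS"])` would regress life expectancy on three of the columns shown above; this feature choice is illustrative only. Because the features are standardized, each entry of `model.coef_` is the change in predicted life expectancy for a one-standard-deviation change in that feature, which makes the coefficients easier to compare than slopes in raw units.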
