# **LINEAR REGRESSION MODEL** on Canada Income dataset

# Objective
Given Canada's adjusted net national income per capita (current US$) until the year 2018, answer the following questions using a Linear Regression Model.

The data has been taken from [The World Bank](https://data.worldbank.org/indicator/NY.ADJ.NNTY.PC.CD?locations=CA) and simplified.

## Import Libraries
These libraries are all that is needed for this exercise.

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## Load Data
Use the following link to load the CSV file of the data.

https://raw.githubusercontent.com/dphi-official/Datasets/master/canada_income.csv

In [2]:
full_data=pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/canada_income.csv')

In [3]:
full_data.shape

(49, 2)

In [4]:
full_data['income'].nunique()

49

In [6]:
full_data['year'].nunique()

49

In [5]:
full_data.head()

| | year | income |
|---|---|---|
| 0 | 1970 | 3409.864065 |
| 1 | 1971 | 3753.355155 |
| 2 | 1972 | 4253.883264 |
| 3 | 1973 | 4832.638103 |
| 4 | 1974 | 5709.237948 |


## Divide data as Input and Target Variables
Put the `year` column into `x` and the `income` column into `y`.

In [7]:
x = full_data.drop('income', axis=1)  # keep x two-dimensional: a DataFrame with only 'year'
y = full_data['income']

## Split data into Train and Test Sets
Split the data and put it into the variables `x_train`, `x_test`, `y_train`, `y_test`.

The split ratio should be 80:20.

In [9]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=101)
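
An 80:20 split of 49 rows cannot be exact, so it helps to see how the row counts come out. The sketch below assumes that scikit-learn rounds a fractional `test_size` up and gives the remaining rows to the training set, which agrees with the 39 training rows printed in the next section. `split_sizes` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test count up (assumed behaviour)."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(49, 0.20)
print(n_train, n_test)  # 39 10
```

With only 10 test rows, the error metrics computed later depend noticeably on which years land in the test set, so a different `random_state` can shift them.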

## Create and Train the model
Create a Linear Regression object.

Use the fit method to train the model.

In [10]:
print(x_train.shape)
print(y_train.shape)


(39, 1)
(39,)


In [11]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)
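
For a single feature, the least-squares line that `fit` finds has a closed form: the slope is the covariance of year and income divided by the variance of year, and the intercept makes the line pass through the point of means. The sketch below is a plain-Python illustration of that formula, not what scikit-learn runs internally. `fit_line` is a hypothetical helper, and the five rows are the ones shown by `full_data.head()`, so the resulting slope and intercept will differ from those of the model trained on 39 rows.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# First five rows of the dataset, as shown by full_data.head()
years = [1970, 1971, 1972, 1973, 1974]
incomes = [3409.864065, 3753.355155, 4253.883264, 4832.638103, 5709.237948]

slope, intercept = fit_line(years, incomes)
print(f"slope: {slope:.3f}, intercept: {intercept:.3f}")
```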

Now that the model has been trained, perform the appropriate steps to answer the given questions.

In [12]:
y_intercept = lr.intercept_
print(f"Intercept: {y_intercept:.3f}")

Intercept: -1625594.599


In [13]:
from sklearn import metrics
# Predict income for the held-out test years using the trained model lr
y_pred = lr.predict(x_test)

# Now calculate the metrics
mae = metrics.mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)

mse = metrics.mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
rmse = np.sqrt(mse)
print("Root Mean Squared Error:", round(rmse, 3))

Mean Absolute Error: 2276.061453438604
Mean Squared Error: 8387142.901914376
Root Mean Squared Error: 2896.056
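
The three metrics are closely related: MAE averages the absolute errors, MSE averages the squared errors, and RMSE is the square root of MSE, which brings it back to US$. The sketch below spells the definitions out in plain Python on made-up numbers (the toy `y_true` and `y_pred` lists are illustrative, not from the dataset) and then checks the reported RMSE against the reported MSE.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only: errors are 1, 0 and 2
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 5.0]
print("MAE :", mae(y_true, y_pred))              # (1 + 0 + 2) / 3 = 1.0
print("MSE :", mse(y_true, y_pred))              # (1 + 0 + 4) / 3
print("RMSE:", math.sqrt(mse(y_true, y_pred)))

# Consistency check on the notebook's own output: RMSE = sqrt(MSE)
reported_mse = 8387142.901914376
print(round(math.sqrt(reported_mse), 3))         # 2896.056
```

Because squaring weights large misses more heavily, the RMSE (about 2896) is larger than the MAE (about 2276) here, which suggests a few test years are predicted noticeably worse than the rest.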


In [14]:
coefficient = lr.coef_

# Print the coefficient rounded to three decimal places
print("Slope (Coefficient) of the best-fit line:", round(coefficient[0], 3))

Slope (Coefficient) of the best-fit line: 825.17


In [15]:
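# Pass a DataFrame with the same column name used in training to avoid a feature-name warning
year_to_predict = pd.DataFrame({'year': [2020]})
predicted_net_income = lr.predict(year_to_predict)
print('income:', round(predicted_net_income[0], 2))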

income: 41248.69




In [16]:
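# Pass a DataFrame with the same column name used in training to avoid a feature-name warning
year_to_predict = pd.DataFrame({'year': [2025]})
predicted_net_income = lr.predict(year_to_predict)
print('income:', round(predicted_net_income[0], 2))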

income: 45374.54
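
Both predictions can be reproduced by hand, since the model is just `income = intercept + slope * year`. The sketch below uses the intercept and slope as printed earlier, which are rounded to three decimals, so the results differ from the model's output by roughly 0.11. `predict_income` is a hypothetical helper for illustration.

```python
# Rounded values printed by the notebook; the fitted model holds more digits
INTERCEPT = -1625594.599
SLOPE = 825.17


def predict_income(year: int) -> float:
    """Evaluate the fitted line at the given year."""
    return INTERCEPT + SLOPE * year


for year in (2020, 2025):
    print(year, round(predict_income(year), 2))

# Five extra years add five times the slope
print(round(predict_income(2025) - predict_income(2020), 2))
```

Two things follow from the formula. The large negative intercept is the line's value at year 0, far outside the data, so it has no economic meaning on its own. Both 2020 and 2025 lie beyond the last observed year (2018), so these are extrapolations that assume the historical linear trend continues.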


