# Canada Per Capita Income Prediction Using Linear Regression

This notebook demonstrates how to use Linear Regression to predict per capita income in Canada based on the year.

We will go through the following steps:
1. Load and explore the dataset
2. Visualize the data
3. Train a Linear Regression model
4. Make predictions
5. Compute a prediction manually from the model's coefficients
6. Plot the regression line


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

### Step 1: Load and explore the dataset

We will load the data from the CSV file and rename the `per capita income (US$)` column to `income` so that it is easier to reference.

In [None]:
data = pd.read_csv('canada_per_capita_income.csv')
data.rename(columns={'per capita income (US$)': 'income'}, inplace=True)
data
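
Before modeling, it can help to confirm the column names, the shape, and whether any values are missing. The following is a minimal sketch, assuming a small in-memory CSV stands in for `canada_per_capita_income.csv`; the three rows are made-up illustrative values, not figures from the real dataset, and `sample` is a hypothetical name.

```python
import io
import pandas as pd

# Hypothetical stand-in for canada_per_capita_income.csv; the values are made up.
csv_text = """year,per capita income (US$)
2000,100.0
2001,110.0
2002,125.0
"""

sample = pd.read_csv(io.StringIO(csv_text))
# rename() returns a new DataFrame, so no inplace mutation is needed.
sample = sample.rename(columns={'per capita income (US$)': 'income'})

print(sample.columns.tolist())    # column names after the rename
print(sample.shape)               # (rows, columns)
print(sample.isna().sum().sum())  # total number of missing values
```

The same three checks can be run on `data` once the real file is loaded.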

### Step 2: Visualize the data

Next, we will visualize the data to see the relationship between the year and the income.

In [None]:
# Scatter plot of income against year
plt.scatter(data.year, data.income, color='red')
plt.xlabel('year (input)')
plt.ylabel('income (output)')
plt.title('Income vs Year')
plt.show()

### Step 3: Train a Linear Regression model

Now, we will define our input (year) and output (income) variables and train a linear regression model.

In [None]:
model = LinearRegression()

In [None]:
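# scikit-learn expects a 2-D feature matrix, hence the double brackets
input = data[['year']]
# The target can be a 1-D Series
output = data.income
model.fit(input, output)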
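
After fitting, the learned slope and intercept are available as `coef_` and `intercept_` on the model. The sketch below uses made-up points that lie exactly on the line y = 2x + 1 (not the Canada dataset), so the expected parameters are known in advance; `toy` and `toy_model` are hypothetical names.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up points on the line y = 2x + 1 (not the real income data).
toy = pd.DataFrame({'year': [0, 1, 2, 3], 'income': [1.0, 3.0, 5.0, 7.0]})

toy_model = LinearRegression()
toy_model.fit(toy[['year']], toy['income'])

slope = toy_model.coef_[0]        # one coefficient per input feature
intercept = toy_model.intercept_  # value of the fitted line at year = 0
print(slope, intercept)
```

Reading `model.coef_[0]` and `model.intercept_` in the same way gives the parameters of the model trained on `data`.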

### Step 4: Make Predictions

We will load a new test dataset and use the trained model to predict the income for each year in it. Because the model was trained on a single feature, `test.csv` must contain only one column of years.

In [None]:
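df = pd.read_csv('test.csv')
# Predict income for every year in the test set;
# astype('int') drops the fractional part of each prediction
predict = model.predict(df).astype('int')
predict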
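
Note that `astype('int')` truncates the fractional part rather than rounding to the nearest integer. The sketch below shows the difference, assuming a hypothetical array `raw` in place of the output of `model.predict(df)`.

```python
import numpy as np

# Hypothetical predicted values standing in for model.predict(df).
raw = np.array([100.9, 250.6, 399.2])

truncated = raw.astype('int')        # drops the fractional part
rounded = raw.round().astype('int')  # rounds to the nearest integer first
print(truncated, rounded)
```

Which of the two is appropriate depends on how the predictions will be used; truncation is what the cell above does.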

### Step 5: Manual Prediction Using Model Coefficients

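We can also compute a prediction by hand. The fitted model exposes its slope (`model.coef_`) and intercept (`model.intercept_`), which we plug into the equation of a line, where `m` is the slope, `b` is the intercept, and `x` is the year: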

`y = mx + b`

In [None]:
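# y = m*x + b
m = model.coef_[0]    # slope, 828.46507522 for this dataset
b = model.intercept_  # intercept, -1632210.7578554575 for this dataset
# Predicted income for the year 2000
prediction = m * 2000 + b
prediction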
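
As a sanity check, the manual formula should agree with `model.predict` for the same year. This is a minimal sketch on made-up training data (not the real income figures); `train` and `check_model` are hypothetical names.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training data (not the real income figures).
train = pd.DataFrame({'year': [2000, 2001, 2002, 2003],
                      'income': [10.0, 12.5, 14.0, 17.5]})
check_model = LinearRegression().fit(train[['year']], train['income'])

# Manual prediction: y = m*x + b
m = check_model.coef_[0]
b = check_model.intercept_
manual = m * 2005 + b

# predict() expects a 2-D input with the same column name used in fit().
from_model = check_model.predict(pd.DataFrame({'year': [2005]}))[0]
print(manual, from_model)
```

The two values should match up to floating-point rounding.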

### Step 6: Visualize the Regression Line

Finally, let's plot the regression line along with the original data to see how well the model fits.

In [None]:
plt.scatter(data[['year']], data.income, color='red')
plt.xlabel('year(input)')
plt.ylabel('income(output)')
# Fitted line: the model's predictions for the training years
plt.plot(data.year, model.predict(data[['year']]), color='blue')
plt.title('Linear Regression Fit: Income vs Year')
plt.show()
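
The plot gives a visual impression of the fit. One way to put a number on it is the coefficient of determination (R²), which `LinearRegression.score` returns. The sketch below uses made-up data (not the Canada dataset); `fit_data` and `fit_model` are hypothetical names.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data that roughly follows a line (not the real income figures).
fit_data = pd.DataFrame({'year': [2000, 2001, 2002, 2003],
                         'income': [10.0, 12.5, 14.0, 17.5]})
fit_model = LinearRegression().fit(fit_data[['year']], fit_data['income'])

# score() returns R^2: 1.0 is a perfect fit, values near 0 mean the line
# explains little of the variation in the target.
r2 = fit_model.score(fit_data[['year']], fit_data['income'])
print(r2)
```

Calling `model.score(input, output)` gives the same measure for the model trained above.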