# Regression Lab

## Introduction
Regression analysis is a statistical technique for modeling the relationship between a dependent variable and one or more independent variables. It is widely used in fields such as finance, economics, engineering, and the social sciences to make predictions and to understand patterns in data.

## Learning Objectives
- Practice training Linear Regression models
- Practice training Multiple Linear Regression models
- Practice training Polynomial Regression models
- Practice training Multiple Polynomial Regression models


**Emojis Legend**
- 👨🏻‍💻 - Instructions; tells you about something specific you need to do.
- 🦉 - Tips; gives you hints, tips, and best practices.
- 📜 - Documentation; provides links to documentation.
- 🚩 - Checkpoint; marks a good spot for you to commit your code to git
- 🕵️ - Tester; Don't modify code blocks starting with this emoji

## Setup
* Install this lab's dependencies by running the following command in your terminal: `pipenv install`
* Make sure you switch to the correct environment by choosing the correct kernel in the top right corner of the notebook.

### Package Imports
We will keep coming back to this cell to add `import` statements and configure libraries as needed.

- **Task 👨🏻‍💻**: Keep coming back to update this cell as you need to import new packages.
- **Task 👨🏻‍💻**: Check what's already been imported here

In [1]:
# Common imports
import numpy as np
import pandas as pd

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
plt.style.use("bmh")

# other imports
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

## Education-Seniority-Income Data
### EDA
The following dataset is a collection of data from a survey of 30 people. The data contains the following columns:
- `education`: Years of education
- `Seniority`: Work experience (unit not confirmed; possibly months)
- `Income`: Income in thousand dollars

<details>
  <summary>Data should look like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/education-seniority-income-dataset.png" />
</details>

**Task 👨🏻‍💻**: Import the `income2.csv` dataset into a Pandas DataFrame:
1. Name the DataFrame `income_df`
2. Print the first 5 rows
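As a rough illustration of the loading pattern, the sketch below reads a small in-memory CSV instead of the lab file. The column names and every value are invented for the example (they are not taken from `income2.csv`), and `demo_df` is a hypothetical name.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; all values are made up for illustration.
csv_text = """education,seniority,income
10,40,25.0
12,60,35.5
14,80,48.0
16,100,62.5
18,120,80.0
20,140,99.0
"""

# pd.read_csv accepts a file path or any file-like object.
demo_df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default.
print(demo_df.head())
```

With a real file, the first argument to `pd.read_csv` would be the path to the CSV rather than a `StringIO` object.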

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Print the DataFrame's information <ins>and</ins> statistical summary

_hint:_ wrap each function call in a `display()` call so you can show both results from the same cell

In [None]:
# FIXME
display(...)
display(...)

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Get the correlation matrix of your dataset
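For orientation, here is a minimal sketch of how `DataFrame.corr()` behaves, using synthetic data rather than the lab dataset. The column names are hypothetical; `y_linear` is built as an exact linear function of `x`, so the two should correlate almost perfectly, while `noise` is generated independently.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.uniform(10, 20, size=30)

# Synthetic columns (assumed names, not the lab's data).
demo_df = pd.DataFrame({
    "x": x,
    "y_linear": 3.0 * x + 2.0,      # exact linear function of x
    "noise": rng.normal(size=30),   # generated independently of x
})

# corr() computes pairwise Pearson correlation by default.
corr_matrix = demo_df.corr()
print(corr_matrix.round(2))
```

The result is a square, symmetric DataFrame with ones on the diagonal, which is the shape a heatmap function expects.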

**Task 👨🏻‍💻**: Plot the correlation matrix of the DataFrame as a heatmap
<details>
  <summary>Graph should look like this:</summary>
  <p>This was created using <code>seaborn</code>'s heatmap function.</p>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/income2-heatmap.png" />
</details>

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Plot a scatter matrix of the DataFrame, showing the relationship between each pair of variables
<details>
  <summary>Graph would look like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/income2-scatter.png" />
</details>
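One possible way to draw such a grid is `pandas.plotting.scatter_matrix`; other libraries offer similar helpers. The sketch below uses random synthetic columns with hypothetical names. The `matplotlib.use("Agg")` line is only there so the sketch also runs outside a notebook; leave it out in the lab notebook, where `%matplotlib inline` is already set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside the notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(1)

# Synthetic data with three numeric columns (assumed names).
demo_df = pd.DataFrame({
    "a": rng.normal(size=30),
    "b": rng.normal(size=30),
    "c": rng.normal(size=30),
})

# One scatter plot per pair of columns, with histograms on the diagonal.
axes = scatter_matrix(demo_df, figsize=(6, 6), diagonal="hist")
print(axes.shape)
```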

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Using `Plotly`, create a 3D scatter plot of the data
<details>
  <summary>Graph would look like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/income-3d.gif" />
</details>

> 🚩 : Make a git commit here

### Simple Linear Regression

**Task 👨🏻‍💻**: Chart a scatter plot of `education` vs `income`
<details>
  <summary>Graph would look like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/income2-edu-income-scatterplot.png" />
</details>

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Use `sklearn` to train a Linear Regression model that predicts `income` from `education`
- Instantiating the model and fitting it to the data is enough
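The instantiate-then-fit pattern can be sketched on synthetic data as follows. The data here follows `y = 2x + 1` exactly, so the fitted slope and intercept should come out close to 2 and 1; none of it comes from the lab dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data that follows y = 2x + 1 exactly (not the lab data).
x = np.arange(10, 20, dtype=float)
y = 2.0 * x + 1.0

# scikit-learn expects a 2D feature array of shape (n_samples, n_features).
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)

# predict() also takes a 2D array, even for a single sample.
prediction = model.predict(np.array([[20.0]]))
print(prediction)
```

With a DataFrame, selecting the feature with double brackets (for example `df[["some_column"]]`) keeps it two-dimensional, which avoids the reshape.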

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Plot the trained model (line) on top of the scatter plot
<details>
  <summary>Graph would look like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/edu-income-linear-model.png" />
</details>
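A common way to draw a fitted line is to predict over an evenly spaced grid of inputs and plot those predictions over the scatter. The sketch below assumes `matplotlib` and uses noisy synthetic data; the `matplotlib.use("Agg")` line is only for running outside a notebook and should be left out in the lab notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside the notebook

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Noisy synthetic data around a straight line (not the lab data).
x = rng.uniform(10, 20, size=30)
y = 5.0 * x - 30.0 + rng.normal(scale=3.0, size=30)

model = LinearRegression().fit(x.reshape(-1, 1), y)

# Evaluate the model on a grid spanning the observed range.
x_line = np.linspace(x.min(), x.max(), 100).reshape(-1, 1)
y_line = model.predict(x_line)

fig, ax = plt.subplots()
ax.scatter(x, y, label="observations")
ax.plot(x_line.ravel(), y_line, color="red", label="fitted line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```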

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Use the `predict` method to predict the income of a person with 20 years of education

_hint:_ it's going to be `85.82661213`

> 🚩 : Make a git commit here

### Multiple Linear Regression

**Task 👨🏻‍💻**: Use `sklearn` to train a Linear Regression model on the `education`, `seniority` and `income` columns
- Instantiating the model and fitting it to the data is enough
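With more than one feature, the only change is that the feature array has one column per predictor. A minimal sketch on synthetic data follows; the target is an exact plane `y = 1 + 2*x1 + 0.5*x2`, and the variable names are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Two synthetic predictors and a target that lies exactly on a plane.
x1 = rng.uniform(10, 20, size=30)
x2 = rng.uniform(20, 180, size=30)
y = 1.0 + 2.0 * x1 + 0.5 * x2

# Shape (n_samples, 2): one column per feature.
X = np.column_stack([x1, x2])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)

# A single new sample is still a 2D array: one row, two columns,
# in the same column order used for fitting.
prediction = model.predict(np.array([[15.0, 75.0]]))
print(prediction)
```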

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Plot the trained model (surface) on top of the scatter plot
* You can choose to use `Plotly` or `matplotlib` for this
<details>
  <summary>Graph would look something like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/income-multiple-linear-regression.png" />
</details>
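To draw a fitted surface, one approach is to build a grid over both features with `np.meshgrid`, predict at every grid point, and reshape the predictions back into the grid shape. The sketch below uses `matplotlib` and synthetic data; the same grid arrays could be passed to a Plotly surface trace instead. Omit the `matplotlib.use("Agg")` line inside the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside the notebook

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)

# Noisy synthetic data around a plane (not the lab data).
x1 = rng.uniform(10, 20, size=30)
x2 = rng.uniform(20, 180, size=30)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=5.0, size=30)

model = LinearRegression().fit(np.column_stack([x1, x2]), y)

# Grid covering the observed range of both features.
g1, g2 = np.meshgrid(
    np.linspace(x1.min(), x1.max(), 20),
    np.linspace(x2.min(), x2.max(), 20),
)
grid = np.column_stack([g1.ravel(), g2.ravel()])

# Predict at each grid point, then restore the 2D grid shape.
z = model.predict(grid).reshape(g1.shape)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x1, x2, y, color="black")
ax.plot_surface(g1, g2, z, alpha=0.4)
```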

> 🚩 : Make a git commit here

**✨ Extra Credit Task 👨🏻‍💻**: For 3 Points: Plot the trained model (surface) on top of the scatterplot using the other package
* If you used `Plotly`, use `matplotlib` and vice versa

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Use the `predict` method to predict the income of a person with `15` years of education and a seniority score of `75`


_Hint:_ it should be `51.31186086`

> 🚩 : Make a git commit here

### Polynomial Regression

**Task 👨🏻‍💻**: Use `sklearn` to train a Polynomial Regression model (of the second degree) on the `education` and `income` columns
- Instantiating the model and fitting it to the data is enough
- You'll need to use the `PolynomialFeatures` class to transform the data into a polynomial form
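In scikit-learn, polynomial regression is usually a two-step pattern: expand the features with `PolynomialFeatures`, then fit an ordinary `LinearRegression` on the expanded columns. The sketch below uses synthetic data that follows `y = 3 + 2x + 0.5x^2` exactly; `include_bias=False` is a choice made here because `LinearRegression` already fits an intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data that follows y = 3 + 2x + 0.5x^2 exactly (not the lab data).
x = np.arange(8, 22, dtype=float)
y = 3.0 + 2.0 * x + 0.5 * x**2

X = x.reshape(-1, 1)

# Expand the single feature into the columns [x, x^2].
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

model = LinearRegression().fit(X_poly, y)

# New inputs must go through the same expansion before predict().
x_new = np.array([11.0]).reshape(1, -1)
prediction = model.predict(poly.transform(x_new))
print(prediction)
```

Passing the raw, untransformed input to `predict` would fail here, because the model was trained on two columns rather than one.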

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Plot the trained model (curve) on top of the scatter plot
<details>
  <summary>Graph would look like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/edu-income-polynomial-model.png" />
</details>

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Use the `predict` method to predict the income of a person with `11` years of education


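_Hint:_
- It should be `25.48739205`
- You will need to create a numpy array of the input data and reshape it: `np.array([11]).reshape(1, -1)`
- You will need to transform the input with the same `PolynomialFeatures` object you used for training before passing it to `predict` (calling `transform` on the already-fitted object is enough)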

> 🚩 : Make a git commit here

### Multiple Polynomial Regression

**Task 👨🏻‍💻**: Use `sklearn` to train a Multiple Polynomial Regression model (of the third degree) on the `education`, `seniority` and `income` columns

_Hint:_
- Instantiating the model and fitting it to the data is enough
- You'll need to use the `PolynomialFeatures` class to transform the data into a polynomial form
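With two input features, a third-degree expansion also produces interaction terms such as `x1*x2` and `x1^2*x2`. The sketch below illustrates this on synthetic data in a small numeric range; the target `y = 1 + x1*x2 + 0.5*x1^3` and all names are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)

# Two synthetic features and a target with a cubic and an interaction term.
x1 = rng.uniform(0, 2, size=40)
x2 = rng.uniform(0, 2, size=40)
y = 1.0 + x1 * x2 + 0.5 * x1**3

X = np.column_stack([x1, x2])

# Degree 3 on two features yields 9 columns when the bias column is excluded:
# x1, x2, x1^2, x1*x2, x2^2, x1^3, x1^2*x2, x1*x2^2, x2^3.
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly.shape)

model = LinearRegression().fit(X_poly, y)

# Predict for one new sample, expanded with the same fitted transformer.
prediction = model.predict(poly.transform(np.array([[1.0, 2.0]])))
print(prediction)
```

To plot the curved surface, the same idea applies as for a plane: build a grid over both features, run the grid through `poly.transform`, and predict on the result.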

> 🚩 : Make a git commit here

**Task 👨🏻‍💻**: Plot the trained model (curved surface) on top of the scatter plot
* You can choose to use `Plotly` or `matplotlib` for this
<details>
  <summary>Graph would look like this:</summary>
  <img width="600" src="https://github.com/IT4063C/images/raw/main/regression-assignment/3rd-dec-income-surface.png" />
</details>

> 🚩 : Make a git commit here

**✨ Extra Credit Task 👨🏻‍💻**: For 3 Points: Plot the trained model (curved surface) on top of the scatterplot using the other package
* If you used `Plotly`, use `matplotlib` and vice versa

> 🚩 : Make a git commit here

## Wrap up
### 📝 Reflection
- What did you learn from this assignment?
- What was the most challenging part of this assignment?
- What would you do differently next time?

### Citations
Cite any resources you used to complete this assignment.

This includes: 
- Individuals other than the instructor
- Websites
- Videos
- AI assistants such as GitHub Copilot or ChatGPT

#### 🦉: MAKE SURE YOU RUN THE FOLLOWING CELL BEFORE SUBMITTING
The following command converts this Jupyter notebook to a Python script. This allows me to provide feedback on your code.

In [10]:
!jupyter nbconvert --to python regression-notebook.ipynb

[NbConvertApp] Converting notebook regression-notebook.ipynb to python
[NbConvertApp] Writing 9078 bytes to regression-notebook.py


> 🚩 : Make a git commit here