# Artificial Intelligence

In this project, you will build a regression model and a classification model from scratch. Please follow the instructions closely, and use only Python's NumPy, pandas, and Matplotlib libraries to complete this project. Using functions from `sklearn` is not allowed.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Part I: A Regression Model

In this part, please build a multilinear regression model that extracts the relationship between housing prices and other relevant variables. The training data is shown in the table below:


In [None]:
data = pd.DataFrame({
    "YearBuilt": [1974, 1996, 1968, 1962, 1960],
    "YearSold": [2015, 2017, 2020, 2010, 2016],
    "Bedrooms": [3, 10, 4, 5, 6],
    "TotalArea": [1500, 4000, 1700, 2500, 2000],
    "Quality": [7.5, 6, 4, 5.5, 5],
    "Price": [358500, 452600, 352100, 341300, 342200]
})

data

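|   | YearBuilt | YearSold | Bedrooms | TotalArea | Quality | Price |
|---|-----------|----------|----------|-----------|---------|--------|
| 0 | 1974 | 2015 | 3 | 1500 | 7.5 | 358500 |
| 1 | 1996 | 2017 | 10 | 4000 | 6.0 | 452600 |
| 2 | 1968 | 2020 | 4 | 1700 | 4.0 | 352100 |
| 3 | 1962 | 2010 | 5 | 2500 | 5.5 | 341300 |
| 4 | 1960 | 2016 | 6 | 2000 | 5.0 | 342200 |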


### Task 1: Data Transformation
Create a new column named "Age" that represents the age of each house when it was sold.

In [None]:
data['Age'] = data['YearSold'] - data['YearBuilt']
data

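|   | YearBuilt | YearSold | Bedrooms | TotalArea | Quality | Price | Age |
|---|-----------|----------|----------|-----------|---------|--------|-----|
| 0 | 1974 | 2015 | 3 | 1500 | 7.5 | 358500 | 41 |
| 1 | 1996 | 2017 | 10 | 4000 | 6.0 | 452600 | 21 |
| 2 | 1968 | 2020 | 4 | 1700 | 4.0 | 352100 | 52 |
| 3 | 1962 | 2010 | 5 | 2500 | 5.5 | 341300 | 48 |
| 4 | 1960 | 2016 | 6 | 2000 | 5.0 | 342200 | 56 |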


### Task 2: Train a Multilinear Model

Assume that the price can be expressed as a linear combination of age, bedrooms, total area, and quality:

$Price = \theta_0 + \theta_1 \cdot Age + \theta_2 \cdot Bedrooms + \theta_3 \cdot TotalArea + \theta_4 \cdot Quality.$

Apply the normal equation to find the best values for the parameters:
1. Construct the matrix $\textbf{X}$ and the vector $\textbf{y}$ (both are defined in the Week 6 notebook and Chapter 4 of the textbook).
2. Calculate the parameter vector using the normal equation:
$\theta = \big(\textbf{X}^T\cdot\textbf{X}\big)^{-1}\cdot\textbf{X}^T\cdot\textbf{y}$
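
For reference, the normal equation can be motivated by minimizing the sum of squared errors. The short derivation below is a sketch that assumes $\textbf{X}^T\cdot\textbf{X}$ is invertible; the cost symbol $J(\theta)$ is introduced here for illustration and does not appear elsewhere in this notebook.

$J(\theta) = \big(\textbf{X}\cdot\theta - \textbf{y}\big)^T\cdot\big(\textbf{X}\cdot\theta - \textbf{y}\big)$

$\nabla_\theta J(\theta) = 2\,\textbf{X}^T\cdot\big(\textbf{X}\cdot\theta - \textbf{y}\big) = 0 \;\Longrightarrow\; \textbf{X}^T\cdot\textbf{X}\cdot\theta = \textbf{X}^T\cdot\textbf{y}$

Multiplying both sides on the left by $\big(\textbf{X}^T\cdot\textbf{X}\big)^{-1}$ gives the normal equation stated above.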

In [None]:
# Construct the design matrix X with np.hstack() and np.ones():
# a column of ones (for the intercept theta_0) followed by the four feature columns.
m = data.shape[0]  # number of training examples
X = np.hstack([np.ones((m, 1)), data[['Age', 'Bedrooms', 'TotalArea', 'Quality']].values])
print(X)


[[1.0e+00 4.1e+01 3.0e+00 1.5e+03 7.5e+00]
 [1.0e+00 2.1e+01 1.0e+01 4.0e+03 6.0e+00]
 [1.0e+00 5.2e+01 4.0e+00 1.7e+03 4.0e+00]
 [1.0e+00 4.8e+01 5.0e+00 2.5e+03 5.5e+00]
 [1.0e+00 5.6e+01 6.0e+00 2.0e+03 5.0e+00]]


In [None]:
# Construct vector y
y = data[['Price']].values
print(y)

[[358500]
 [452600]
 [352100]
 [341300]
 [342200]]


In [None]:
# Apply the normal equation to find theta

theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print("Theta:", theta)

Theta: [[ 5.92376387e+05]
 [-3.83925328e+03]
 [ 1.17271948e+04]
 [-3.11089808e+01]
 [-8.66468214e+03]]
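
As a sanity check, the fitted parameters can be compared against NumPy's least-squares solver. The sketch below re-enters the matrix and vector printed above so that it runs on its own. The names `theta_normal`, `theta_lstsq`, and `max_residual` are illustrative and not part of the assignment. Because this data set has five examples and five parameters, the fitted model should reproduce the training prices almost exactly, so the residuals say little about how well the model would generalize.

```python
import numpy as np

# Design matrix and target vector copied from the printed outputs above
# (columns: intercept, Age, Bedrooms, TotalArea, Quality).
X = np.array([
    [1.0, 41.0, 3.0, 1500.0, 7.5],
    [1.0, 21.0, 10.0, 4000.0, 6.0],
    [1.0, 52.0, 4.0, 1700.0, 4.0],
    [1.0, 48.0, 5.0, 2500.0, 5.5],
    [1.0, 56.0, 6.0, 2000.0, 5.0],
])
y = np.array([[358500.0], [452600.0], [352100.0], [341300.0], [342200.0]])

# Normal equation, computed the same way as in the cell above.
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check with NumPy's least-squares solver, which avoids forming an explicit inverse.
theta_lstsq, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Largest absolute difference between fitted and actual training prices.
max_residual = np.abs(X @ theta_lstsq - y).max()

print("Rank of X:", rank)
print("Largest training residual:", max_residual)
```

If the two parameter vectors differ noticeably on other data, that usually points to an ill-conditioned $\textbf{X}^T\cdot\textbf{X}$, in which case the least-squares solver is typically the more reliable of the two.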


### Task 3: Make a Prediction
Suppose that there is another house with the following attributes:
- YearBuilt: 1985
- YearSold: 2021
- Bedrooms: 6
- TotalArea: 2500
- Quality: 5.5

Use the parameter values that you have calculated to predict its sale price.

In [None]:
# Use the vector form: the prediction is the inner product of the parameter
# vector and the feature vector [1, Age, Bedrooms, TotalArea, Quality].
parameter_vector = theta.flatten()  # reuse the fitted parameters instead of retyping them
age = 2021 - 1985  # Age = YearSold - YearBuilt = 36
feature_vector = np.array([1, age, 6, 2500, 5.5])

prediction = parameter_vector.dot(feature_vector)

print('prediction on its sale price:', round(prediction))


prediction on its sale price: 399098
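
The same calculation can be wrapped in a small helper so that the age is derived from the two year columns, as in Task 1. This is a minimal sketch: `predict_price` is a hypothetical function name, and the parameter values are copied from the rounded output printed in Task 2 rather than recomputed.

```python
import numpy as np

# Parameter values copied from the printed Task 2 output (rounded to 9 significant digits).
theta = np.array([5.92376387e+05, -3.83925328e+03, 1.17271948e+04,
                  -3.11089808e+01, -8.66468214e+03])

def predict_price(year_built, year_sold, bedrooms, total_area, quality, params=theta):
    """Hypothetical helper: predict a sale price from raw house attributes."""
    age = year_sold - year_built  # same transformation as the "Age" column
    # The leading 1.0 multiplies the intercept theta_0.
    features = np.array([1.0, age, bedrooms, total_area, quality])
    return float(features @ params)

price = predict_price(1985, 2021, 6, 2500, 5.5)
print("Predicted sale price:", round(price))  # 399098
```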
