# List Full Names of all the participants in your team below:

1. Gaurav Toravane
2. Gurleen Kaur
3. Vinci Wu
4. Joseph McCart
5. Shuoling Li
6. Gowtham Rajasekaran
7. Mayank Lara

Hello Machine Learning Engineer Ganden Team, 

You have been given a dataset obtained from **Forest Fires** in California. 

Number of Instances: 517 <br>
Number of Attributes: 11 (including the target variable `y`)

Attribute Information: 
  * **y** area - the burned area of the forest in California (in ha): 0.00 to 1090.84
  * **f1** X - x-axis spatial coordinate within the California State Park: 1 to 9
  * **f2** Y - y-axis spatial coordinate within the California State Park: 2 to 9
  * **f3** FFMC - FFMC index from the FWI system: 18.7 to 96.20
  * **f4** DMC - DMC index from the FWI system: 1.1 to 291.3
  * **f5** DC - DC index from the FWI system: 7.9 to 860.6
  * **f6** ISI - ISI index from the FWI system: 0.0 to 56.10
  * **f7** temp - temperature in Celsius degrees: 2.2 to 33.30
  * **f8** RH - relative humidity in %: 15.0 to 100
  * **f9** wind - wind speed in km/h: 0.40 to 9.40
  * **f10** rain - outside rain in mm/m2 : 0.0 to 6.4

There are no missing Attribute Values.

Your task is to implement a **Gaussian Radial Basis Function based Linear Regression Algorithm** to predict the area burned during the Forest Fires in California.


# Closed Form Solution with Basis Functions
The **genesis equation** for Linear Regression with Gaussian Basis Function is of the form:

$y(x, W) = \phi(x) W$  

* $y(x, W)$ is the predicted output,
* $\phi(x)$ is the Design Matrix,
* $W = (w_{1}, \ldots, w_{M})$ are the parameters to be learned from the training samples

### Design Matrix
Each Gaussian radial basis function $\phi_{j}$ converts the input instance to a value as shown below: <br>

$\phi_{j}(x) = \exp(-\frac{1}{2}(x - \mu_{j})^{T}\Sigma_{j}^{-1}(x - \mu_{j}))$

* $x$ is an instance from the scaled input dataset <br>
* $\mu_{j}$ is the center of the $j^{th}$ Gaussian Radial Basis Function <br>
* $\Sigma_{j}$ decides how broadly the $j^{th}$ basis function spreads (Diagonal Covariance Matrix)

Applying each of the $M$ basis functions to every input instance results in a Design Matrix as shown below:
![!picture](https://drive.google.com/uc?export=view&id=1j1kxv6nUPPECacd-_bDg_lL1yTJS5BwA)
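
As a minimal sketch of how such a Design Matrix could be assembled, the snippet below evaluates every basis function on every sample. It assumes one scalar variance per basis function (an isotropic spread), which is a simplification of the general diagonal covariance. The function name `gaussian_design_matrix` and the toy arrays are hypothetical and are not part of the starter code.

```python
import numpy as np

def gaussian_design_matrix(x, centers, variances):
    """Return the (N, M) matrix whose entry [n, j] is phi_j(x_n).

    Assumption: basis function j has an isotropic spread, variances[j] * I.
    """
    # Differences between every sample and every center: shape (N, M, D)
    diff = x[:, np.newaxis, :] - centers[np.newaxis, :, :]
    # Squared Euclidean distances: shape (N, M)
    sq_dist = np.sum(diff ** 2, axis=2)
    # Column j is divided by variances[j] through broadcasting
    return np.exp(-0.5 * sq_dist / variances)

# Toy data (made-up values): 4 samples, 2 features, 3 centers
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
variances = np.array([1.0, 0.5, 2.0])

phi = gaussian_design_matrix(x, centers, variances)
print(phi.shape)  # (4, 3): one row per sample, one column per basis function
```

A sample that coincides with a center gets the value 1 in that center's column, and values decay towards 0 as the sample moves away from the center.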

To find the parameters $W$ for the above genesis equation using the **closed form solution**, we pre-multiply both the LHS and RHS by $\phi^{-1}(x)$. We get,

$W = \phi^{-1}(x)Y$

But $\phi(x)$ is NOT A SQUARE MATRIX of FULL RANK! Hence, $\phi^{-1}(x)$ does not exist.

We therefore use the Moore-Penrose pseudo-inverse, a generalization of the matrix inverse for matrices that may not be invertible. Hence, the final closed form solution for finding the parameters $W$ with the linear regression least squares solution is as follows:

$W = (\phi^{T}\phi)^{-1}\phi^{T}Y$
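
For reference, this expression can be obtained by minimizing the sum of squared errors. The steps below are a short sketch of that standard derivation; the symbol $E(W)$ for the error is introduced here for illustration, and the last step assumes $\phi^{T}\phi$ is invertible (that is, $\phi$ has full column rank).

$E(W) = \frac{1}{2}(Y - \phi W)^{T}(Y - \phi W)$

$\nabla_{W} E(W) = -\phi^{T}(Y - \phi W) = 0$

$\phi^{T}\phi W = \phi^{T}Y$

$W = (\phi^{T}\phi)^{-1}\phi^{T}Y$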

YOU NEED TO IMPLEMENT ABOVE EQUATION for finding $W$. 

<font color="red"> YOU CANNOT USE NUMPY linalg **pinv** https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html </font>

<font color="red">DO NOT USE SKLEARN'S LINEAR REGRESSION LIBRARY DIRECTLY.</font>

<font color="green">YOU CAN USE np.linalg.inv, and np.dot FOR IMPLEMENTING PSEUDO-INVERSE</font>
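
The closed form solution can be sketched within these constraints, using only `np.linalg.inv` and `np.dot`. The snippet below is an illustration on synthetic data rather than the forest fires dataset; the function name `fit_closed_form` and the random design matrix are assumptions made for this example.

```python
import numpy as np

def fit_closed_form(phi, y):
    """Compute W = (phi^T phi)^{-1} phi^T y with np.linalg.inv and np.dot only."""
    gram = np.dot(phi.T, phi)                             # shape (M, M)
    pseudo_inverse = np.dot(np.linalg.inv(gram), phi.T)   # shape (M, N)
    return np.dot(pseudo_inverse, y)                      # shape (M, 1)

# Synthetic check: targets are generated from known weights without noise,
# so the least squares solution should recover those weights.
rng = np.random.default_rng(0)
phi = rng.random((50, 3))                 # stand-in design matrix (N=50, M=3)
true_w = np.array([[2.0], [-1.0], [0.5]])
y = np.dot(phi, true_w)

w = fit_closed_form(phi, y)
print(np.allclose(w, true_w))
```

Note that the inverse only exists when the columns of the design matrix are linearly independent; if two basis functions produce nearly identical columns, `np.linalg.inv` may raise an error or return numerically unstable weights.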

### **Question:** In the following code cell implement the following:
* Step 1: Import the dataset (forestfires.csv) using Pandas Dataframe (Step 1 Implemented already)
* Step 2: Partition your dataset into training and testing sets using sklearn's train_test_split function, and split the features and target labels into separate variables (Step 2 Implemented already)
* Step 3: Scale the features using sklearn's MinMaxScaler (Step 3 Implemented already)
* Step 4: Convert Scaled Features and Labels into numpy arrays with dimensions required by closed form solution (Step 4 Implemented already)
* Step 5: Find the Mean ($\mu_{j}$) and Spread ($\Sigma_{j}$) for **3 basis functions** (Step 5 Implemented Already)
* Step 6: Create a Design Matrix using the scaled features, Mean ($\mu_{j}$) and Spread ($\Sigma_{j}$)
* Step 7: Train using Linear Regression algorithm with a Closed Form Solution **Hint: Use Pseudo Inverse Formula**
* Step 8: Test using Testing Dataset (Make sure you create a design matrix for Testing dataset using same Mean ($\mu_{j}$) and Spread ($\Sigma_{j}$) from Step 5)
* Step 9: Calculate Root Mean Squared Error (Erms) for Test Dataset
    * $Erms = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y\_test_{i} - y\_test\_pred_{i})^{2}}$ 
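
The Erms formula in Step 9 can be sketched as a small helper. The name `root_mean_squared_error` and the toy values are hypothetical and chosen only to illustrate the computation.

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference between targets and predictions."""
    # Flatten both inputs so that shapes (n, 1) and (n,) cannot broadcast into (n, n)
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Toy example: the squared errors are 9, 0 and 16, so Erms = sqrt(25 / 3)
y_true = np.array([[3.0], [0.0], [4.0]])
y_pred = np.array([[0.0], [0.0], [0.0]])
erms = root_mean_squared_error(y_true, y_pred)
print(erms)
```

Flattening the inputs is a defensive choice: subtracting an `(n, 1)` array from an `(n,)` array would otherwise broadcast silently and yield a misleading error value.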

In [1]:
# Step 1 already implemented
import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/Mihir2/BreakoutSessionDataset/master/forestfires.csv"
s = requests.get(url).content
data = pd.read_csv(io.StringIO(s.decode('utf-8')))
data

# Step 2 already implemented
import numpy as np
from sklearn.model_selection import train_test_split
target = data['y']
features = data.to_numpy()[:,1:]  # drop the first column, keep the rest as features
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size = 0.2)

# Step 3 already implemented
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
sc_xtrain = scaler.fit_transform(x_train)
sc_xtest = scaler.transform(x_test)

# Step 4 already implemented
y_train_arr = y_train.to_numpy().reshape(y_train.shape[0],1)
x_train_arr = sc_xtrain
y_test_arr  = y_test.to_numpy().reshape(y_test.shape[0],1)
x_test_arr  = sc_xtest

# Step 5 already implemented
from sklearn.cluster import MiniBatchKMeans
number_of_basis_function = 3
model = MiniBatchKMeans(n_clusters=number_of_basis_function)
# fit_transform returns the distance of each sample to every cluster center
distances = model.fit_transform(x_train_arr)
basis_means = model.cluster_centers_
# Variance of basis j = mean squared distance of its cluster members to its center
basis_variances = np.zeros(number_of_basis_function)
for i, label in enumerate(model.labels_):
  basis_variances[label] += distances[i][label]**2
for j in range(number_of_basis_function):
  basis_variances[j] /= np.count_nonzero(model.labels_ == j)
# Place the per-basis variances on the diagonal of a square matrix
basis_variances = np.diag(basis_variances)
# print(basis_means)
# print(basis_variances)

## TA Response:

In [4]:
# Step 6
# x_mu[i] holds, for every training sample, the sum over features of (x - mu_i)
x_mu = np.zeros((number_of_basis_function,x_train_arr.shape[0]))
for i in range(0,number_of_basis_function):
  x_mu[i] = np.sum((x_train_arr - basis_means[i]),axis=1)

# Column j of the design matrix is exp(-0.5 * x_mu[j]**2 / variance_j)
train_design_mat = np.exp(-0.5*np.multiply(np.dot(x_mu.T,np.linalg.inv(basis_variances)),x_mu.T))

# Step 7 
# Closed form solution: W = (phi^T phi)^{-1} phi^T y
weights = np.dot(np.dot(np.linalg.inv(np.dot(train_design_mat.T,train_design_mat)),train_design_mat.T),y_train_arr)

# Step 8
# Build the test design matrix with the same means and variances as in training
x_mu = np.zeros((number_of_basis_function,x_test_arr.shape[0]))
for i in range(0,number_of_basis_function):
  x_mu[i] = np.sum((x_test_arr - basis_means[i]),axis=1)

test_design_mat = np.exp(-0.5*np.multiply(np.dot(x_mu.T,np.linalg.inv(basis_variances)),x_mu.T))
y_test_pred = np.dot(test_design_mat, weights)

# Step 9
# Root mean squared error over the test set
Erms = np.sqrt(np.sum((y_test_pred - y_test_arr)**2)/y_test_arr.shape[0])
print(Erms)

30.385902712331177


## Student Response:

In [2]:
# Step 6 
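inv = np.linalg.inv(basis_variances)
# Size the design matrix from the data instead of hard-coding 413 rows and 3 columns
design_matrix = np.zeros((x_train_arr.shape[0],number_of_basis_function))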
# print(inv.shape)
for i in range(number_of_basis_function):
 #  design_matrix[:,i] = np.exp((-0.5) * (x_train_arr - basis_means[i,:]) * (inv[i,i]))
  design_matrix[:,i] = np.exp(-0.5*np.sqrt(np.sum((x_train_arr - basis_means[i,:])**2, axis= 1))*inv[i,i])
# design_matrix.shape
# design_matrix[1:10]

design_matrix_test = np.zeros(( x_test_arr.shape[0],number_of_basis_function))
for i in range(number_of_basis_function):
  # design_matrix[:,i] = np.exp((-0.5) * (x_train_arr - basis_means[i,:]) * (inv[i,i]))
  design_matrix_test[:,i] = np.exp(-0.5*np.sqrt(np.sum((x_test_arr - basis_means[i,:])**2, axis= 1))*inv[i,i])

# Step 7 
multiplied = np.dot(design_matrix.T,design_matrix)
W = np.dot(np.linalg.inv(multiplied), design_matrix.T)
W = np.dot(W,y_train_arr)
# print(W.shape)
# print(design_matrix_test.shape)

# Step 8 
y_preds = np.dot(design_matrix_test, W)

# Step 9 
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_test_arr,y_preds))
print("rmse :{}".format(rmse))

rmse :30.340528822983973
