# List Full Names of all the participants in your team below:

1. Yinxia Chen
2. Arghya Dutta  
3. Yiming Zhang
4. Nitish Dhinaharan
5. Joshua Bukaty
6. Daniel Walsh
7. Jonathan Choi
8. Faizaan Arshad
9. Steve Glenn Joseph
10. Trishla Chaurasia

Hello Machine Learning Engineer Rato Team, 

You have been given data obtained from the **Facebook Comment Volume Dataset**.

Number of Instances: 40949 <br>
Number of Attributes: 39 (38 features, `f1` to `f38`, plus the target variable `y`)

Attribute Information: 

| Attribute | Name | Encoding | Type | Description |
|---|---|---|---|---|
| **y** | Target Variable | Decimal | Target | The number of comments in the next H hours (H is given in feature f38). |
| **f1** | Page Popularity/likes | Decimal | Page feature | Defines the popularity of, or support for, the source of the document. |
| **f2** | Page Checkins | Decimal | Page feature | Describes how many individuals have visited this place so far. This feature applies only to places, e.g. an institution, place, or theater. |
| **f3** | Page talking about | Decimal | Page feature | Defines the daily interest of individuals in the source of the document/post: the people who actually come back to the page after liking it. This includes activities such as comments, likes on a post, and shares by visitors to the page. |
| **f4** | Page Category | Value | Page feature | Defines the category of the source of the document, e.g. place, institution, brand. |
| **f5 - f29** | Derived | Decimal | Derived feature | These features are aggregated by page by calculating the min, max, average, median, and standard deviation of the essential features. |
| **f30** | CC1 | Decimal | Essential feature | The total number of comments before the selected base date/time. |
| **f31** | CC2 | Decimal | Essential feature | The number of comments in the last 24 hours, relative to the base date/time. |
| **f32** | CC3 | Decimal | Essential feature | The number of comments from the last 48 hours to the last 24 hours, relative to the base date/time. |
| **f33** | CC4 | Decimal | Essential feature | The number of comments in the first 24 hours after publication of the post but before the base date/time. |
| **f34** | CC5 | Decimal | Essential feature | The difference between CC2 and CC3. |
| **f35** | Base time | Decimal (0-71) | Other feature | Selected time used to simulate the scenario. |
| **f36** | Post length | Decimal | Other feature | Character count of the post. |
| **f37** | Post Share Count | Decimal | Other feature | Counts the number of shares of the post, i.e. how many people have shared this post to their timeline. |
| **f38** | H Local | Decimal (0-23) | Other feature | The number of hours H for which the target variable (comments received) is recorded. |

There are no missing Attribute Values.

Based on the features, your task is to build a **Gaussian Radial Basis Function based Linear Regression Model** to predict the number of comments a post receives in the next H hours.


# Closed Form Solution with Basis Functions
The **genesis equation** for Linear Regression with Gaussian Radial Basis Function is of the form:

$y(x,W) = \phi(x)\,W$  

* $y(x,W)$ is the predicted output,
* $\phi(x)$ is the Design Matrix,
* $W = (w_{1}, \ldots, w_{M})$ are the parameters to be learned from the training samples

### Design Matrix
Each Gaussian radial basis function $\phi_{j}$ maps an input instance to a scalar value as shown below: <br>

$\phi_{j}(x) = \exp\left(-\frac{1}{2}(x - \mu_{j})^{T}\Sigma_{j}^{-1}(x - \mu_{j})\right)$

* $x$ is an instance of the scaled input dataset <br>
* $\mu_{j}$ is the center of the $j^{th}$ Gaussian Radial Basis Function <br>
* $\Sigma_{j}$ decides how broadly the $j^{th}$ basis function spreads (Diagonal Covariance Matrix)
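As a minimal sketch of the formula above, the snippet below evaluates a single Gaussian radial basis function for one input instance. The function name `gaussian_rbf`, the variable `sigma_diag`, and the toy values are illustrative assumptions and are not part of the assignment code; `sigma_diag` holds the diagonal entries of $\Sigma_{j}$.

```python
import numpy as np

def gaussian_rbf(x, mu, sigma_diag):
    """Evaluate exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu)) for a diagonal Sigma.

    sigma_diag (hypothetical name) is a 1-D array with the diagonal of Sigma.
    """
    diff = x - mu
    # For a diagonal covariance, applying Sigma^{-1} divides each squared
    # difference by the matching diagonal entry.
    return np.exp(-0.5 * np.sum(diff * diff / sigma_diag))

# Toy values, chosen only for illustration
x = np.array([0.2, 0.4])
mu = np.array([0.2, 0.4])
sigma_diag = np.array([0.05, 0.05])

print(gaussian_rbf(x, mu, sigma_diag))        # 1.0, since x sits exactly on the center
print(gaussian_rbf(x + 0.1, mu, sigma_diag))  # a smaller value away from the center
```

The activation is 1 at the center and decays toward 0 as the instance moves away, with larger diagonal entries giving a slower decay.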

Evaluating every basis function $\phi_{j}$ on every input instance results in a Design Matrix as shown below:
![!picture](https://drive.google.com/uc?export=view&id=1j1kxv6nUPPECacd-_bDg_lL1yTJS5BwA)
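One possible way to build such a design matrix in a vectorized form is sketched below. It assumes one scalar variance per basis function (an isotropic spread), and the names `design_matrix`, `centers`, and `variances` are hypothetical, chosen for this example only.

```python
import numpy as np

def design_matrix(X, centers, variances):
    """Return an (n_samples, n_basis) matrix of Gaussian RBF activations.

    Assumption: each basis function j has a single scalar variance, so
    Sigma_j = variances[j] * I.
    """
    # Difference between every row of X and every center: shape (n, M, d)
    diff = X[:, None, :] - centers[None, :, :]
    # Squared Euclidean distance per (sample, center) pair: shape (n, M)
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-0.5 * sq_dist / variances[None, :])

# Toy data: 3 samples, 2 features, 2 basis functions
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.array([0.5, 0.5])

Phi = design_matrix(X, centers, variances)
print(Phi.shape)  # (3, 2): one row per sample, one column per basis function
```

Each row of `Phi` corresponds to one input instance and each column to one basis function, so the same function can be reused for training and test data as long as the centers and variances stay fixed.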

To find the parameters $W$ for the above genesis equation using the **closed form solution**, we would pre-multiply both sides by $\phi^{-1}(x)$, which gives

$W = \phi^{-1}(x)Y$

But $\phi(x)$ is generally NOT A SQUARE MATRIX: it has one row per training sample and one column per basis function. Hence, $\phi^{-1}(x)$ does not exist.

We therefore use the Moore-Penrose pseudo inverse as a generalization of the matrix inverse when the matrix may not be invertible. Hence, the final closed form solution for finding parameters $W$ with linear regression least squares solution is as follows:

$W = (\phi^{T}\phi)^{-1}\phi^{T}Y$

YOU NEED TO IMPLEMENT ABOVE EQUATION for finding $W$. 

<font color="red"> YOU CANNOT USE NUMPY linalg **pinv** https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html </font>

<font color="red">DO NOT USE SKLEARNS LINEAR REGRESSION LIBRARY DIRECTLY.</font>

<font color="green">YOU CAN USE np.linalg.inv, and np.dot FOR IMPLEMENTING PSEUDO-INVERSE</font>
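The pseudo-inverse formula can be sketched with only `np.linalg.inv` and `np.dot`, as permitted above. The helper name `fit_closed_form` and the toy data are assumptions made for this illustration; the sketch also assumes $\phi^{T}\phi$ is invertible, which holds when the columns of the design matrix are linearly independent.

```python
import numpy as np

def fit_closed_form(Phi, Y):
    """Compute W = (Phi^T Phi)^{-1} Phi^T Y with np.dot and np.linalg.inv only."""
    gram = np.dot(Phi.T, Phi)  # (M, M): square, unlike Phi itself
    return np.dot(np.dot(np.linalg.inv(gram), Phi.T), Y)

# Toy data: Y is an exact linear combination of the columns of Phi
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0],
                [2.0, 1.0]])
W_true = np.array([[3.0], [-2.0]])
Y = np.dot(Phi, W_true)

W = fit_closed_form(Phi, Y)
print(W.ravel())  # recovers values close to 3 and -2
```

Because `Phi` here is 4 x 2, it has no ordinary inverse, but the 2 x 2 matrix `np.dot(Phi.T, Phi)` does, which is what makes the least squares solution computable.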


### **Question:** In the following code cell implement the following:
* Step 1: Import the dataset (facebook_comment.csv) using Pandas Dataframe (Step 1 Implemented already)
* Step 2: Partition your dataset into training and testing sets using sklearn's train_test_split function and split the features and target labels into separate variables (Step 2 Implemented already)
* Step 3: Scale the features using sklearn's min max scaling function (Step 3 Implemented already)
* Step 4: Convert Scaled Features and Labels into numpy arrays with dimensions required by closed form solution (Step 4 Implemented already)
* Step 5: Find the Mean ($\mu_{j}$) and Spread ($\Sigma_{j}$) for **3 basis functions** (Step 5 Implemented Already)
* Step 6: Create a Design Matrix using the scaled features, Mean ($\mu_{j}$) and Spread ($\Sigma_{j}$)
* Step 7: Train using Linear Regression algorithm with a Closed Form Solution **Hint: Use Pseudo Inverse Formula**
* Step 8: Test using Testing Dataset (Make sure you create a design matrix for Testing dataset using same Mean ($\mu_{j}$) and Spread ($\Sigma_{j}$) from Step 5)
* Step 9: Calculate Root Mean Squared Error (Erms) for Test Dataset
    * $Erms = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y\_test_{i} - y\_test\_pred_{i})^{2}}$ 
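The Erms formula from Step 9 can be sketched as follows. The function name `rmse` and the toy arrays are illustrative assumptions; the order of operations follows the formula: square each residual, average, then take the square root.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean of the squared residuals."""
    residuals = y_true - y_pred
    return np.sqrt(np.mean(residuals ** 2))

# Toy column vectors, shaped (n, 1) like the arrays produced in Step 4
y_true = np.array([[3.0], [0.0], [5.0]])
y_pred = np.array([[1.0], [2.0], [5.0]])

print(rmse(y_true, y_pred))  # sqrt(8/3), roughly 1.633
```

The squaring has to happen before the mean. In this toy example the residuals are +2, -2 and 0, so taking the mean first and squaring afterwards would give 0, because the signed errors cancel.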

In [2]:
# Step 1 already implemented
import pandas as pd
import math as mt
import io
import requests
url="https://raw.githubusercontent.com/Mihir2/BreakoutSessionDataset/master/facebook_comment.csv"
s = requests.get(url).content
data = pd.read_csv(io.StringIO(s.decode('utf-8')))

# Step 2 already implemented
import numpy as np
from sklearn.model_selection import train_test_split
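output = data['y']
# Drop the first column (the target y); the remaining columns are the features.
# Named `features` to avoid shadowing the built-in input().
features = data.to_numpy()[:,1:]
x_train, x_test, y_train, y_test = train_test_split(features, output, test_size = 0.2)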

# Step 3 already implemented
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
sc_xtrain = scaler.fit_transform(x_train)
sc_xtest = scaler.transform(x_test)

# Step 4 already implemented
y_train_arr = y_train.to_numpy().reshape(y_train.shape[0],1)
x_train_arr = sc_xtrain
y_test_arr  = y_test.to_numpy().reshape(y_test.shape[0],1)
x_test_arr  = sc_xtest

# Step 5 already implemented
from  sklearn.cluster import MiniBatchKMeans
number_of_basis_function = 3
model = MiniBatchKMeans(n_clusters=number_of_basis_function)
distances = model.fit_transform(x_train_arr)
basis_means = model.cluster_centers_
# Per-cluster variance: mean squared distance of each point to its assigned center
basis_variances = np.zeros(number_of_basis_function)
for i, label in enumerate(model.labels_):
  basis_variances[label] += distances[i][label]**2
for j in range(number_of_basis_function):
  basis_variances[j] = basis_variances[j]/np.count_nonzero(model.labels_ == j)
# Diagonal matrix holding one variance per basis function
basis_variances = np.diag(basis_variances)
print(basis_means)
print(basis_variances)

[[3.93858488e-03 1.30550752e-02 9.40420815e-03 2.43537415e-01
  2.13165966e-04 1.21627485e-01 1.39483451e-02 8.85892676e-03
  5.66622633e-02 1.11941855e-05 8.23179140e-02 6.76115628e-03
  2.17268965e-03 5.18895892e-02 0.00000000e+00 1.01161366e-01
  2.63706490e-02 6.79960048e-03 4.39585020e-02 2.47289735e-04
  1.19830756e-01 1.60209278e-02 1.02152031e-02 5.45026208e-02
  3.76002199e-01 1.61383424e-01 9.93483416e-02 1.29627171e-01
  4.47305381e-02 1.14181723e-02 9.73321113e-03 4.67716468e-03
  1.19665161e-02 4.12366430e-01 2.21732985e-01 7.89775771e-03
  5.23991553e-03 9.92852889e-01]
 [1.45106480e-03 1.47546979e-02 2.56413894e-03 2.47825741e-01
  5.84770716e-05 7.65956931e-02 7.55003502e-03 4.32052852e-03
  3.42525172e-02 0.00000000e+00 4.65775491e-02 3.44856932e-03
  9.72326038e-04 2.82960742e-02 0.00000000e+00 6.37625534e-02
  1.45313276e-02 2.92249978e-03 2.61711447e-02 6.25066739e-05
  7.47582968e-02 8.63799885e-03 5.03482577e-03 3.27189373e-02
  3.90751587e-01 1.29390713e-01 9.879

## TA Response:

In [3]:
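# Step 6
# x_mu[i] holds, for every training row, the sum over features of (x - mu_i)
x_mu = np.zeros((number_of_basis_function,x_train_arr.shape[0]))
for i in range(0,number_of_basis_function):
  x_mu[i] = np.sum((x_train_arr - basis_means[i]),axis=1)

# Element-wise exp(-0.5 * x_mu^2 / variance_i) for each row and basis function i
train_design_mat = np.exp(-0.5*np.multiply(np.dot(x_mu.T,np.linalg.inv(basis_variances)),x_mu.T))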

# Step 7 
weights = np.dot(np.dot(np.linalg.inv(np.dot(train_design_mat.T,train_design_mat)),train_design_mat.T),y_train_arr)

# Step 8
x_mu = np.zeros((number_of_basis_function,x_test_arr.shape[0]))
for i in range(0,number_of_basis_function):
  x_mu[i] = np.sum((x_test_arr - basis_means[i]),axis=1)

test_design_mat = np.exp(-0.5*np.multiply(np.dot(x_mu.T,np.linalg.inv(basis_variances)),x_mu.T))
y_test_pred = np.dot(test_design_mat, weights)

#Step 9
Erms = np.sqrt(np.sum((y_test_pred - y_test_arr)**2)/y_test_arr.shape[0])
print(Erms)

36.10536999637453


## Student Answer:

In [5]:
phi = np.zeros(x_train_arr.shape[0], basis_variances.shape[0])
basis_variances_inv = np.linalg.inv(basis_variances)

for i in range (x_train_arr.shape[0]):
    arrTemp = np.zeros(3)
    for j in range(3):
        x_mu_j = x_train_arr[i] - basis_means[j]
        arrTemp[j] = np.exp(-0.5*np.dot(x_mu_j.T * basis_variances_inv[j][j]))



x_mu_j_0 = x_train_arr - basis_means[0, :]
x_mu_j_1 = x_train_arr - basis_means[1, :]
x_mu_j_2 = x_train_arr - basis_means[2, :]

for i in range (x_mu_j_0.shape[0]):
  

SyntaxError: ignored

In [6]:
# Step 6: design matrix for the training set
phiXtrain = np.zeros((x_train_arr.shape[0], basis_variances.shape[0]))
basis_variances_inv = np.linalg.inv(basis_variances)

for i in range (x_train_arr.shape[0]):
    arrTemp = np.zeros(3)
    for j in range(3):
        x_mu_j = x_train_arr[i] - basis_means[j]
        # Squared distance to center j, scaled by the inverse of the j-th variance
        arrTemp[j] = np.exp(-0.5*np.dot((x_mu_j.T * basis_variances_inv[j][j]),x_mu_j))
    phiXtrain[i] = arrTemp
print (phiXtrain.shape)
print (phiXtrain)


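# Step 8 (first part): design matrix for the test set, using the same means and variances from Step 5
phiXtest = np.zeros((x_test_arr.shape[0], basis_variances.shape[0]))
basis_variances_inv = np.linalg.inv(basis_variances)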

for i in range (x_test_arr.shape[0]):
    arrTemp = np.zeros(3)
    for j in range(3):
        x_mu_j = x_test_arr[i] - basis_means[j]
        arrTemp[j] = np.exp(-0.5*np.dot((x_mu_j.T * basis_variances_inv[j][j]),x_mu_j))
    phiXtest[i] = arrTemp
print (phiXtest.shape)
print (phiXtest)


(32759, 3)
[[0.18923858 0.81560572 0.16413753]
 [0.66177892 0.05256367 0.01526459]
 [0.15123457 0.64750141 0.15310371]
 ...
 [0.18444142 0.78619537 0.4266581 ]
 [0.8273941  0.00210888 0.00624089]
 [0.1087683  0.04750381 0.12298087]]
(8190, 3)
[[0.1391015  0.56516269 0.08438949]
 [0.135703   0.09665404 0.62430003]
 [0.76260588 0.07680335 0.06988793]
 ...
 [0.20168607 0.52087704 0.80719255]
 [0.1706162  0.64614521 0.33932341]
 [0.3507905  0.91288941 0.33114387]]


In [7]:
# Step 7

# W = (phi^T phi)^-1 phi^T Y
W = np.dot(np.linalg.inv(np.dot(phiXtrain.T, phiXtrain)),np.dot(phiXtrain.T, y_train_arr))
print (W.shape)
print (W)

(3, 1)
[[23.6719944 ]
 [ 1.33523356]
 [-9.6975875 ]]


In [8]:
# Step 8 (second part): predict on the test set
Y = np.dot(phiXtest,W)
print (Y.shape)
print (Y)

(8190, 1)
[[ 3.22905965]
 [-2.71278779]
 [17.47720828]
 ...
 [-2.35801629]
 [ 1.61096203]
 [ 6.31153478]]


In [9]:
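# Step 9
# Note: this expression squares the mean of the residuals. The Erms formula instead
# averages the squared residuals, i.e. np.sqrt(np.mean((y_test_arr - Y)**2)).
rms = np.sqrt(np.mean(y_test_arr - Y)**2)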
print (rms)

1.9798937247585437
