### Day 9 - Multiple linear regression
________________________________________________
  
  <br/>

- [Background](#Background)
- [Task](#Task)

  <br/>
  
#### Background 

The multiple linear regression equation for $m$ features can be written as:

$$
y = a + b_1X_1 + b_2X_2 + ... + b_mX_m
$$

Considering the regression equation with two features

$$
y = a + b_1X_1 + b_2X_2
$$

we can rewrite it as:

$$
\begin{bmatrix} 1 & X_1 & X_2 \end{bmatrix} \times \begin{bmatrix} a \\ b_1 \\ b_2 \end{bmatrix} = a + b_1X_1 + b_2X_2
$$
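
As a quick numeric check of the matrix form above, the following minimal sketch multiplies the row vector $[1, X_1, X_2]$ by the coefficient column and compares the result with the scalar expression. The values of `a`, `b1`, `b2`, `X1`, and `X2` are made up for illustration and do not come from the task.

```python
import numpy as np

# Hypothetical coefficients and feature values, chosen only for illustration.
a, b1, b2 = 2.0, 3.0, 5.0
X1, X2 = 0.5, 0.25

row = np.array([1.0, X1, X2])    # [1, X_1, X_2]
coeffs = np.array([a, b1, b2])   # [a, b_1, b_2]^T

matrix_form = row @ coeffs             # row vector times coefficient column
scalar_form = a + b1 * X1 + b2 * X2    # the regression equation written out

print(matrix_form, scalar_form)
```

The leading `1.0` in `row` is what lets the intercept `a` be treated as just another coefficient.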

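Stacking one row $[1, X_1, X_2]$ per observation gives a matrix $X$ and a column vector $Y$ of observed values. Knowing $X$ and $Y$, we can solve for the coefficient matrix $B$. Because $X$ is generally not square, we first multiply both sides by $X^T$ so that the square matrix $X^T \times X$ can be inverted (assuming it is invertible):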

$$
\begin{aligned}
& Y = X \times B \\
\Rightarrow\ & X^T \times Y = X^T \times X \times B \\
\Rightarrow\ & (X^T \times X)^{-1} \times X^T \times Y = I \times B \\
\Rightarrow\ & B = (X^T \times X)^{-1} \times X^T \times Y
\end{aligned}
$$
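
The derivation above can be sketched in NumPy as follows. This is a minimal illustration, not a definitive implementation: the helper name `fit_normal_equation` and the noise-free data (generated from $y = 1 + 2X_1 + 3X_2$) are assumptions made for the example. It also uses `np.linalg.solve` on $(X^T X) B = X^T Y$ rather than forming the inverse, which is mathematically equivalent when $X^T X$ is invertible.

```python
import numpy as np

def fit_normal_equation(X, Y):
    """Solve (X^T X) B = X^T Y for B, given a design matrix X whose first column is ones."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Equivalent to B = (X^T X)^{-1} X^T Y, without computing the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Made-up, noise-free data generated from y = 1 + 2*X_1 + 3*X_2.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.25]])
X = np.column_stack([np.ones(len(features)), features])  # prepend the column of ones
Y = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

B = fit_normal_equation(X, Y)
print(np.round(B, 6))  # approximately [1, 2, 3], i.e. a, b_1, b_2
```

Because the data here contain no noise, the recovered coefficients match the generating ones up to floating-point error. With noisy observations, the same formula gives the least-squares estimate.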

[Full tutorial link on HackerRank](https://www.hackerrank.com/challenges/s10-multiple-linear-regression/tutorial)


#### Task

Andrea has a simple equation:

$$
Y = a + b_1f_1 + b_2f_2 + ... + b_mf_m
$$

for $ (m+1) $ real constants $ (a, b_1, b_2, ..., b_m) $. We can say that the value of $Y$ depends on $m$ features. Andrea studies this equation for $ n $ different feature sets $ (f_1, f_2, ..., f_m) $ and records each respective value of $Y$. If she has $ q $ new feature sets, can you help Andrea find the value of $ Y $ for each of the sets?

Note: You are not expected to account for bias and variance trade-offs.

##### Input Format

- The first line contains $ 2 $ space-separated integers, $ m $ (the number of observed features) and $ n $ (the number of feature sets Andrea studied), respectively.
- Each of the $ n $ subsequent lines contains $ m + 1 $ space-separated decimals; the first $ m $ elements are features $ (f_1, f_2, ..., f_m) $, and the last element is the value of $ Y $ for the line's feature set.
- The next line contains a single integer, $ q $, denoting the number of feature sets Andrea wants to query for.
- Each of the $ q $ subsequent lines contains $ m $ space-separated decimals describing the feature sets.
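
The input layout described above can be parsed as sketched below. The function name `parse_input` and the decision to read from a string (rather than from stdin) are assumptions made so the example is easy to run on its own. The sample text reuses the first five training rows and the first query from the sample run in this notebook.

```python
import numpy as np

def parse_input(text):
    """Parse the described input format into (X_train, Y_train, X_query)."""
    lines = text.strip().splitlines()
    m, n = map(int, lines[0].split())
    # n training rows: m features followed by the observed Y.
    train = np.array([list(map(float, line.split())) for line in lines[1:1 + n]])
    X_train, Y_train = train[:, :m], train[:, m]
    # One line with q, then q query rows of m features each.
    q = int(lines[1 + n])
    X_query = np.array([list(map(float, line.split()))
                        for line in lines[2 + n:2 + n + q]])
    return X_train, Y_train, X_query

sample = """2 5
0.18 0.89 109.85
1.0 0.26 155.72
0.92 0.11 137.66
0.07 0.37 76.17
0.85 0.16 139.75
1
0.49 0.18
"""
X_train, Y_train, X_query = parse_input(sample)
print(X_train.shape, Y_train.shape, X_query.shape)  # (5, 2) (5,) (1, 2)
```

Note that this sketch returns the raw features; the column of ones for the intercept still has to be prepended before fitting.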

##### Constraints
- $ 1 \le m \le 10 $
- $ 5 \le n \le 100 $
- $ 0 \le f_i \le 1 $
- $ 0 \le Y \le 10^6 $
- $ 1 \le q \le 100 $

##### Output Format

For each of the $ q $  feature sets, print the value of $ Y $ on a new line (i.e., you must print a total of $ q $ lines).

```python
import numpy as np

features = []       # training rows, each prefixed with 1.0 for the intercept a
features_test = []  # query rows, prefixed the same way
Y = []              # observed Y for each training row

# Hard-coded sample data, handy for running without stdin:
# m,n = (2,7)
# features = [[1.0, 0.18, 0.89], [1.0, 1.0, 0.26], [1.0, 0.92, 0.11], [1.0, 0.07, 0.37], [1.0, 0.85, 0.16], [1.0, 0.99, 0.41], [1.0, 0.87, 0.47]]
# Y = [109.85, 155.72, 137.66, 76.17, 139.75, 162.6, 151.77]
# q = 4
# features_test = [[1.0, 0.49, 0.18], [1.0, 0.57, 0.83], [1.0, 0.56, 0.64], [1.0, 0.76, 0.18]]
m, n = map(int, input().split())
for _ in range(n):
    # The first m values are the features, the last one is Y.
    *f, y = map(float, input().split())
    features.append([1.0] + f)
    Y.append(y)

q = int(input())
for _ in range(q):
    features_test.append([1.0] + list(map(float, input().split())))

features = np.array(features)
features_test = np.array(features_test)
Y = np.array(Y)
features_T = features.transpose()

# Normal equation: B = (X^T X)^{-1} X^T Y
B = np.dot(np.dot(np.linalg.inv(np.dot(features_T, features)), features_T), Y)

# Predict Y for each query row as the dot product [1, f_1, ..., f_m] . B
for f in features_test:
    print(np.dot(f, B))
```

Sample input (stdin):

```
2 7
0.18 0.89 109.85
1.0 0.26 155.72
0.92 0.11 137.66
0.07 0.37 76.17
0.85 0.16 139.75
0.99 0.41 162.6
0.87 0.47 151.77
4
0.49 0.18
0.57 0.83
0.56 0.64
0.76 0.18
```

Output:

```
105.21455835106937
142.6709513072996
132.93605469124716
129.70175404502453
```
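
As an alternative to inverting $X^T X$ directly, the same fit can be obtained with NumPy's least-squares solver. The sketch below is not the method used in the cell above; it swaps in `np.linalg.lstsq` and hard-codes the sample data instead of reading stdin. The helper `add_intercept` is a name introduced here for illustration.

```python
import numpy as np

# Sample data from the run above: columns are f_1, f_2, Y.
train = np.array([
    [0.18, 0.89, 109.85],
    [1.00, 0.26, 155.72],
    [0.92, 0.11, 137.66],
    [0.07, 0.37, 76.17],
    [0.85, 0.16, 139.75],
    [0.99, 0.41, 162.60],
    [0.87, 0.47, 151.77],
])
queries = np.array([[0.49, 0.18], [0.57, 0.83], [0.56, 0.64], [0.76, 0.18]])

def add_intercept(F):
    """Prepend a column of ones so the first coefficient plays the role of a."""
    return np.column_stack([np.ones(len(F)), F])

X, Y = add_intercept(train[:, :2]), train[:, 2]

# lstsq minimizes ||X B - Y||^2 without forming (X^T X)^{-1} explicitly.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)

predictions = add_intercept(queries) @ B
for y_hat in predictions:
    print(f"{y_hat:.2f}")
```

For a well-conditioned problem like this one, the predictions should agree with the normal-equation output to within floating-point error. The least-squares solver tends to be the safer choice when features are nearly collinear.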
