# Predicting with `np.dot` (`@`)

1. use case for the dot product
2. ones column
3. matrix dot vector

$\begin{bmatrix}
1 & 2 \\ 3 & 4\\
\end{bmatrix}
\cdot
\begin{bmatrix}
10 \\ 1 \\
\end{bmatrix}$
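Working the product above by hand, each entry of the result is the dot product of one matrix row with the vector: $1\cdot10 + 2\cdot1 = 12$ and $3\cdot10 + 4\cdot1 = 34$. A quick NumPy check (a sketch; the names `A` and `v` are illustrative, not from the notebook):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
v = np.array([10, 1]).reshape(-1, 1)  # column vector, shape (2, 1)

result = A @ v  # each row of A dotted with v -> shape (2, 1)
print(result)   # [[12]
                #  [34]]
```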

In [2]:
import pandas as pd
import numpy as np

houses = pd.DataFrame([[2,1,1985],
                       [3,1,1998],
                       [4,3,2005],
                       [4,2,2020]],
                      columns=["beds", "baths", "year"])
houses

|   | beds | baths | year |
|---|------|-------|------|
| 0 | 2 | 1 | 1985 |
| 1 | 3 | 1 | 1998 |
| 2 | 4 | 3 | 2005 |
| 3 | 4 | 2 | 2020 |


In [3]:
houses.iloc[0]

beds        2
baths       1
year     1985
Name: 0, dtype: int64

In [4]:
# takes one row (as a Series)
# returns the estimated price (in thousands)
def predict_price(house):
    return ((house["beds"]*42.3) + (house["baths"]*10) +
            (house["year"]*1.67) - 3213)

predict_price(houses.iloc[0])

196.54999999999973

In [13]:
# slicing (0:1) rather than indexing (0) keeps each house 2D, shape (1, 3)
h0 = houses.values[0:1]
h1 = houses.values[1:2]
h2 = houses.values[2:3]
h3 = houses.values[3:4]
h0

array([[   2,    1, 1985]])

In [14]:
c = np.array([42.3, 10, 1.67]).reshape(-1, 1)
c

array([[42.3 ],
       [10.  ],
       [ 1.67]])

In [15]:
print(h0 @ c - 3213)
print(h1 @ c - 3213)
print(h2 @ c - 3213)
print(h3 @ c - 3213)

[[196.55]]
[[260.56]]
[[334.55]]
[[349.6]]
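Each `h_i` is a 1×3 matrix and `c` is 3×1, so `h0 @ c` has shape (1, 1), which is why every result prints inside double brackets. A short sketch of the shapes; `.item()` is one way to unwrap the single entry if a plain Python float is wanted (an illustrative choice, not something the notebook does):

```python
import numpy as np

h0 = np.array([[2, 1, 1985]])                  # shape (1, 3): one house as a row
c = np.array([42.3, 10, 1.67]).reshape(-1, 1)  # shape (3, 1): coefficients as a column

pred = h0 @ c - 3213        # (1, 3) @ (3, 1) -> (1, 1)
print(pred.shape)           # (1, 1)
price = pred.item()         # unwrap the single entry as a Python float
print(round(price, 2))      # 196.55
```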


In [20]:
# Matrix @ column vector: takes the dot product of each row of the matrix with the coefficient vector
houses.values @ c - 3213

array([[196.55],
       [260.56],
       [334.55],
       [349.6 ]])
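The single matrix product gives the same numbers as calling `predict_price` once per row. A sketch of that check, using pandas' `DataFrame.apply` with `axis=1` (which passes each row as a Series) for the loop version:

```python
import numpy as np
import pandas as pd

houses = pd.DataFrame([[2, 1, 1985],
                       [3, 1, 1998],
                       [4, 3, 2005],
                       [4, 2, 2020]],
                      columns=["beds", "baths", "year"])

def predict_price(house):
    # house is one row (a Series); returns the estimated price in thousands
    return (house["beds"] * 42.3 + house["baths"] * 10 +
            house["year"] * 1.67 - 3213)

c = np.array([42.3, 10, 1.67]).reshape(-1, 1)

loop_preds = houses.apply(predict_price, axis=1).values  # one Python call per row
matrix_preds = (houses.values @ c - 3213).reshape(-1)    # one matrix product

print(np.allclose(loop_preds, matrix_preds))  # True
```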

In [22]:
c = np.array([42.3, 10, 1.67, -3213]).reshape(-1, 1)
c

array([[ 4.230e+01],
       [ 1.000e+01],
       [ 1.670e+00],
       [-3.213e+03]])

In [25]:
houses.values

array([[   2,    1, 1985],
       [   3,    1, 1998],
       [   4,    3, 2005],
       [   4,    2, 2020]])

In [26]:
# shape mismatch: houses.values is 4x3, but c is now 4x1
# houses.values @ c
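To see the mismatch concretely: `houses.values` has shape (4, 3) while the new `c` has shape (4, 1), and `@` needs the inner dimensions (3 and 4) to agree. A sketch that catches the `ValueError` NumPy raises:

```python
import numpy as np

X = np.array([[2, 1, 1985],
              [3, 1, 1998],
              [4, 3, 2005],
              [4, 2, 2020]])
c = np.array([42.3, 10, 1.67, -3213]).reshape(-1, 1)

print(X.shape, c.shape)  # (4, 3) (4, 1)

raised = False
try:
    X @ c                # inner dimensions 3 and 4 do not match
except ValueError as err:
    raised = True
    print("shape mismatch:", err)
```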

In [30]:
np.ones(len(houses)).reshape(-1,1)

array([[1.],
       [1.],
       [1.],
       [1.]])

In [27]:
X = houses.values
X

array([[   2,    1, 1985],
       [   3,    1, 1998],
       [   4,    3, 2005],
       [   4,    2, 2020]])

In [33]:
X = np.concatenate([X, np.ones(len(X)).reshape(-1,1)], axis=1)
X

array([[2.000e+00, 1.000e+00, 1.985e+03, 1.000e+00],
       [3.000e+00, 1.000e+00, 1.998e+03, 1.000e+00],
       [4.000e+00, 3.000e+00, 2.005e+03, 1.000e+00],
       [4.000e+00, 2.000e+00, 2.020e+03, 1.000e+00]])

In [34]:
X @ c

array([[196.55],
       [260.56],
       [334.55],
       [349.6 ]])
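Appending the column of ones is what lets the intercept (-3213) live inside `c`: in every row, the last column of `X` multiplies `c[3]` by 1. As an equivalent sketch, `np.column_stack` builds the same matrix in one call (an alternative to the `np.concatenate` cell, not the approach the notebook uses):

```python
import numpy as np

features = np.array([[2, 1, 1985],
                     [3, 1, 1998],
                     [4, 3, 2005],
                     [4, 2, 2020]])
c = np.array([42.3, 10, 1.67, -3213]).reshape(-1, 1)

# a 1D array passed to column_stack becomes a column; here, the ones column
X = np.column_stack([features, np.ones(len(features))])
print(X.shape)             # (4, 4)
print(np.round(X @ c, 2))  # same predictions as the X @ c cell
```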

In [37]:
houses["predictions"] = X @ c
houses

|   | beds | baths | year | predictions |
|---|------|-------|------|-------------|
| 0 | 2 | 1 | 1985 | 196.55 |
| 1 | 3 | 1 | 1998 | 260.56 |
| 2 | 4 | 3 | 2005 | 334.55 |
| 3 | 4 | 2 | 2020 | 349.6 |


# Fitting with `np.linalg.solve`

**Above:** we estimated house prices using a linear model based on the dot product as follows:

$Xc = y$

* $X$ (known) is a matrix of house features (from the DataFrame), plus a column of ones
* $c$ (known) is a vector of coefficients (our model parameters)
* $y$ (computed) is the vector of predicted prices

**Below:** what if $X$ and $y$ are known, and we want to find $c$?

In [45]:
houses = pd.DataFrame([[2,1,1985,196.55],
                       [3,1,1998,260.56],
                       [4,3,2005,334.55],
                       [4,2,2020,349.60]],
                      columns=["beds", "baths", "year", "price"])
houses

|   | beds | baths | year | price |
|---|------|-------|------|-------|
| 0 | 2 | 1 | 1985 | 196.55 |
| 1 | 3 | 1 | 1998 | 260.56 |
| 2 | 4 | 3 | 2005 | 334.55 |
| 3 | 4 | 2 | 2020 | 349.6 |


If we assume price is a linear function of the features, given by this equation:

* $beds*c_0 + baths*c_1 + year*c_2 + 1*c_3 = price$

Then we get four equations:

* $2*c_0 + 1*c_1 + 1985*c_2 + 1*c_3 = 196.55$
* $3*c_0 + 1*c_1 + 1998*c_2 + 1*c_3 = 260.56$
* $4*c_0 + 3*c_1 + 2005*c_2 + 1*c_3 = 334.55$
* $4*c_0 + 2*c_1 + 2020*c_2 + 1*c_3 = 349.60$
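Stacked together, these four equations are exactly $Xc = y$, where the last column of $X$ is the column of ones that multiplies the intercept $c_3$:

$\begin{bmatrix}
2 & 1 & 1985 & 1\\
3 & 1 & 1998 & 1\\
4 & 3 & 2005 & 1\\
4 & 2 & 2020 & 1\\
\end{bmatrix}
\cdot
\begin{bmatrix}
c_0\\ c_1\\ c_2\\ c_3\\
\end{bmatrix}
=
\begin{bmatrix}
196.55\\ 260.56\\ 334.55\\ 349.60\\
\end{bmatrix}
$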

In [41]:
X = np.concatenate([houses.values[:, :-1], np.ones(len(houses)).reshape(-1,1)], axis=1)
X

array([[2.000e+00, 1.000e+00, 1.985e+03, 1.000e+00],
       [3.000e+00, 1.000e+00, 1.998e+03, 1.000e+00],
       [4.000e+00, 3.000e+00, 2.005e+03, 1.000e+00],
       [4.000e+00, 2.000e+00, 2.020e+03, 1.000e+00]])

In [49]:
y = houses["price"].values.reshape(-1,1)
y

array([[196.55],
       [260.56],
       [334.55],
       [349.6 ]])

In [50]:
c = np.linalg.solve(X, y)
c

array([[ 4.230e+01],
       [ 1.000e+01],
       [ 1.670e+00],
       [-3.213e+03]])

In [51]:
X @ c

array([[196.55],
       [260.56],
       [334.55],
       [349.6 ]])
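`np.linalg.solve` works here because `X` is square (four equations, four unknowns) and invertible. A sketch of what happens otherwise, assuming we drop the last house so only three equations remain: NumPy refuses with `np.linalg.LinAlgError`, since `solve` only accepts square coefficient matrices.

```python
import numpy as np

X = np.array([[2, 1, 1985, 1],
              [3, 1, 1998, 1],
              [4, 3, 2005, 1],
              [4, 2, 2020, 1]], dtype=float)
y = np.array([196.55, 260.56, 334.55, 349.60]).reshape(-1, 1)

c = np.linalg.solve(X, y)     # square 4x4 system: exactly one solution
print(np.allclose(X @ c, y))  # True

# drop the last house: 3 equations, 4 unknowns -> X[:3] is 3x4, not square
failed = False
try:
    np.linalg.solve(X[:3], y[:3])
except np.linalg.LinAlgError:
    failed = True
print(failed)                 # True
```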

# Two Perspectives on `Matrix @ vector`

$\begin{bmatrix}
4&5\\6&7\\8&9\\
\end{bmatrix}
\cdot
\begin{bmatrix}
2\\3\\
\end{bmatrix}
= ????
$

In [55]:
X = np.array([[4,5], [6,7], [8,9]])
c = np.array([2, 3]).reshape(-1,1)
X @ c

array([[23],
       [33],
       [43]])

## Row Picture

Take the dot product of each row with the vector, one row at a time.

$\begin{bmatrix}
4&5\\6&7\\8&9\\
\end{bmatrix}
\cdot
\begin{bmatrix}
2\\3\\
\end{bmatrix}
=
\begin{bmatrix}
(4*2)+(5*3)\\
(6*2)+(7*3)\\
(8*2)+(9*3)\\
\end{bmatrix}
=
\begin{bmatrix}
23\\
33\\
43\\
\end{bmatrix}
$

In [62]:
def row_dot(X, c):
    c = c.reshape(-1)              # flatten the column vector to 1D
    rv = []
    for row in X:                  # row picture: one dot product per row
        row_total = 0
        for i in range(len(row)):
            row_total += row[i] * c[i]
        rv.append(row_total)
    return np.array(rv).reshape(-1,1)  # back to a column vector

row_dot(X, c)

array([[23],
       [33],
       [43]])

## Column Picture

$\begin{bmatrix}
c_0&c_1&c_2\\
\end{bmatrix}
\cdot
\begin{bmatrix}
x\\y\\z\\
\end{bmatrix}
=(c_0*x) + (c_1*y) + (c_2*z)
$
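A quick numeric instance of the row-times-column formula above, with values chosen only for illustration: $[1, 2, 3] \cdot [4, 5, 6]^T = 1\cdot4 + 2\cdot5 + 3\cdot6 = 32$.

```python
import numpy as np

row = np.array([[1, 2, 3]])               # shape (1, 3), plays the role of [c_0, c_1, c_2]
col = np.array([4, 5, 6]).reshape(-1, 1)  # shape (3, 1), plays the role of [x, y, z]

print(row @ col)  # [[32]]
```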

Matrix @ vector takes a **linear combination** of the matrix's columns, using the vector's entries as the weights.

$\begin{bmatrix}
4&5\\6&7\\8&9\\
\end{bmatrix}
\cdot
\begin{bmatrix}
2\\3\\
\end{bmatrix}
=
\begin{bmatrix}
4\\6\\8\\
\end{bmatrix}*2
+
\begin{bmatrix}
5\\7\\9\\
\end{bmatrix}*3
=
\begin{bmatrix}
23\\
33\\
43\\
\end{bmatrix}
$

In [68]:
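def col_dot(X, c):
    c = c.reshape(-1)                    # flatten the column vector to 1D
    rv = np.zeros(len(X)).reshape(-1,1)  # running sum, as a column vector
    for i in range(X.shape[1]):
        col = X[:, i:i+1]                # i-th column, kept 2D (n x 1)
        rv += (col * c[i])               # add the column scaled by its weight
    return rv
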
col_dot(X, c)

array([[23.],
       [33.],
       [43.]])
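As a sanity check that the two pictures describe the same operation, here is a sketch comparing compact versions of both functions against `@` on a random integer matrix (the 5×4 shape, value range, and seed are arbitrary choices):

```python
import numpy as np

def row_dot(X, c):
    # row picture: dot each row of X with c
    c = c.reshape(-1)
    totals = [sum(row[i] * c[i] for i in range(len(row))) for row in X]
    return np.array(totals).reshape(-1, 1)

def col_dot(X, c):
    # column picture: add up the columns of X, each scaled by its entry of c
    c = c.reshape(-1)
    rv = np.zeros((len(X), 1))
    for i in range(X.shape[1]):
        rv += X[:, i:i+1] * c[i]
    return rv

rng = np.random.default_rng(0)
X = rng.integers(-10, 10, size=(5, 4))
c = rng.integers(-10, 10, size=(4, 1))

print(np.allclose(row_dot(X, c), X @ c))  # True
print(np.allclose(col_dot(X, c), X @ c))  # True
```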