# Lecture 10 – Feature Engineering, Clustering

## DSC 40A, Fall 2021

In [None]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

## Feature Engineering and Transformations

### Example: Amdahl's Law

In [None]:
def solve_normal_equations(X, y):
    """Return the w that solves the normal equations (X^T X) w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

In [None]:
X_amdahl = np.array([[1, 1],
                     [1, 1/2],
                     [1, 1/4]])

y_amdahl = np.array([8, 4, 3])

In [None]:
solve_normal_equations(X_amdahl, y_amdahl)
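
As a sanity check on the cell above, here is a minimal sketch that solves the same system two ways: via the normal equations and via `np.linalg.lstsq`. The short names `X`, `y`, `w_normal`, and `w_lstsq` are introduced here only for illustration. Working the arithmetic by hand gives $w_0 = 1$ and $w_1 = 48/7 \approx 6.857$, so both approaches should agree on that.

```python
import numpy as np

# Same data as the cell above.
X = np.array([[1, 1],
              [1, 1/2],
              [1, 1/4]])
y = np.array([8, 4, 3])

# Normal equations: solve (X^T X) w = X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's built-in least squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal)                        # w0 = 1, w1 = 48/7 (about 6.857)
print(np.allclose(w_normal, w_lstsq))  # the two solutions should match
```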

### Example: Fitting models that are not linear in terms of the parameters

In [None]:
# This cell generates our dataset.
np.random.seed(28)
x_fake = np.linspace(0, 20, 50) + np.random.normal(loc=0, scale=0.5, size=50)
y_fake = 0.5*np.random.normal(loc=2, scale=0.5, size=50) * np.e**(0.2 * x_fake)

In [None]:
px.scatter(x=x_fake, y=y_fake)

As per the lecture slides, we're trying to find a prediction rule of the form

$$H(x) = w_0 e^{w_1 x}$$

We re-wrote this as

$$\log H(x) = \log w_0 + w_1 x$$
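
To spell out the step between the two equations: take the natural log of both sides and use the rule $\log(ab) = \log a + \log b$. This assumes $w_0 > 0$, so that $\log w_0$ is defined.

$$\log H(x) = \log\left(w_0 e^{w_1 x}\right) = \log w_0 + \log e^{w_1 x} = \log w_0 + w_1 x$$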

As a result, our design matrix $X$ is still

$$X = \begin{bmatrix}1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$

but our observation vector is now

$$\vec{z} = \begin{bmatrix} \log y_1 \\ \log y_2 \\ \vdots \\ \log y_n \end{bmatrix}$$

and our parameter vector is

$$\vec{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \log w_0 \\ w_1 \end{bmatrix}$$
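
With this setup, the optimal $\vec{b}^*$ satisfies the usual normal equations, only with $\vec{z}$ playing the role of the observation vector. This is the system that `solve_normal_equations` solves when it is given the transformed data:

$$X^T X \vec{b}^* = X^T \vec{z}$$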

In [None]:
X_trans = np.vstack([
    np.ones_like(x_fake),
    x_fake
]).T

z_trans = np.log(y_fake)

In [None]:
b_trans = solve_normal_equations(X_trans, z_trans)
b_trans

Now that we have $\vec{b}^*$, we need to solve for $\vec{w}^*$:

In [None]:
b0, b1 = b_trans

In [None]:
w0_star = np.exp(b0)  # undo the log: b0 = log(w0), so w0 = e^(b0)
w1_star = b1          # w1 is unchanged by the transformation

In [None]:
w0_star, w1_star
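
One way to convince yourself that the back-transformation is right is to run the same procedure on noise-free data generated from known parameters, where it should recover them (up to floating-point error). The sketch below does that. The helper `fit_exponential` and the `_demo` variables are hypothetical names used only for this illustration, and the values 2.0 and 0.3 are arbitrary choices.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit H(x) = w0 * e^(w1 * x) by least squares on log(y). Requires y > 0."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    z = np.log(y)                               # transformed observations
    b0, b1 = np.linalg.solve(X.T @ X, X.T @ z)  # normal equations for b
    return np.exp(b0), b1                       # w0 = e^(b0), w1 = b1

# Noise-free data from known (arbitrary) parameters w0 = 2.0, w1 = 0.3.
x_demo = np.linspace(0, 5, 20)
y_demo = 2.0 * np.exp(0.3 * x_demo)

w0_demo, w1_demo = fit_exponential(x_demo, y_demo)
print(w0_demo, w1_demo)  # should be very close to 2.0 and 0.3
```

Keep in mind that this minimizes squared error between $\log y_i$ and $\log H(x_i)$, which on noisy data is generally not the same as minimizing squared error between $y_i$ and $H(x_i)$.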

Cool. Let's look at a plot of the resulting prediction rule, $H(x) = 0.965 e^{0.196 x}$:

In [None]:
x_range = np.arange(0, 25)

fig = go.Figure()
fig.add_trace(go.Scatter(x = x_fake, y = y_fake, mode = 'markers', name = 'actual'))
fig.add_trace(go.Scatter(x = x_range, 
                         y = w0_star * np.e**(w1_star * x_range), 
                         name = 'exponential prediction rule', 
                         line=dict(color='red')))