### 02.10: Univariate quadratic is easy!

Let's go back to our squared error. It's an ideal case for Newton's method. A quadratic function always has exactly one stationary point. That's because the derivative of a quadratic function is a linear function with nonzero slope, and such a line crosses the $x$-axis exactly once!

Since the squared error is never negative, its parabola must open upward, so this stationary point must be a global minimum. So we don't have to worry about converging to a local maximum, or a local minimum that isn't a global minimum.

In fact, if we're optimizing a univariate quadratic function, this is ridiculously simple. Note:

\\[
\begin{align}
f(x) &= a x^2 + bx + c\\
\frac{\partial f}{\partial x} &= 2a x + b\\
\frac{\partial^2 f}{\partial x^2} &= 2a\\
\end{align}
\\]

Now look at that. Newton's method in optimization says you should find the zero of $f'$. The way it does this is by approximating $f'$ with a tangent line at $x_i$ and seeing where that approximation hits zero.
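
Written out, one step of this procedure is the standard Newton update, where $x_i$ is the current guess and $x_{i+1}$ is the point where the tangent-line approximation of $f'$ hits zero:

\\[
x_{i+1} = x_i - \frac{f'(x_i)}{f''(x_i)}
\\]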

But in the case of a quadratic function, the first derivative truly *is* a line. So that means that the calculated stationary point isn't an approximation: it's perfectly accurate!

Therefore, when optimizing a quadratic error function of a single variable, we will jump immediately to the global minimum on the first step!
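
Here is a quick numeric check of that claim. It is only a minimal sketch: the coefficients and starting points are arbitrary illustrative values, and `newton_step` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
def newton_step(a, b, x):
    """One Newton step for f(x) = a*x**2 + b*x + c (c does not affect the step)."""
    first_deriv = 2 * a * x + b   # f'(x)
    second_deriv = 2 * a          # f''(x), constant for a quadratic
    return x - first_deriv / second_deriv


# Arbitrary illustrative coefficients; a > 0 so the stationary point is a minimum.
a, b = 3.0, -12.0
exact_minimum = -b / (2 * a)  # the zero of f'(x) = 2*a*x + b

# Whatever the starting point, a single step lands on the minimum.
results = [newton_step(a, b, x_start) for x_start in (-50.0, 0.0, 7.5)]
print(results, exact_minimum)
```

Every entry of `results` should match `exact_minimum`, no matter how far away the starting point was.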

### 02.11: Finding $\hat\theta_1$ given $\hat\theta_0$

So we've learned how to minimize a univariate quadratic function. Let's use that to calculate the best $\hat\theta_1$, assuming we already know that $\hat\theta_0 = 100.0$. Knowing $\hat\theta_0$ in advance would be cheating in real life, but I want to show a practical example where Newton's method works.

To do this we need not only the first but also the second partial derivative with respect to $\hat\theta_1$. Note:

\\[
\begin{alignat*}{3}
\frac{\partial E}{\partial \hat\theta_1} &= \sum_{i=1}^N 2 \left(
                                              \left( \hat\theta_0 + \hat\theta_1 x_i \right) - y_i
                                            \right) \big( x_i \big)\\
\frac{\partial^2 E}{\partial \hat\theta_1^2} &= \sum_{i=1}^N 2 \big( x_i \big) \big( x_i \big)
\end{alignat*}
\\]
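
These two derivatives are all a Newton step needs. The following is a minimal standalone sketch, assuming a tiny synthetic, noise-free dataset (invented here purely for illustration) generated with $\hat\theta_0 = 100$ and a slope of $2$. It takes one step from $\hat\theta_1 = 0$ and compares the result with the value obtained by setting the first derivative to zero and solving directly.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration): y = 100 + 2 * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 100.0 + 2.0 * x

theta0 = 100.0  # held fixed, as in the text
theta1 = 0.0    # arbitrary starting guess

# First and second partial derivatives of E with respect to theta1.
first_deriv = np.sum(2 * ((theta0 + theta1 * x) - y) * x)
second_deriv = np.sum(2 * x * x)

# One Newton step.
theta1_after_step = theta1 - first_deriv / second_deriv

# For comparison: set the first derivative to zero and solve for theta1.
theta1_closed_form = np.sum(x * (y - theta0)) / np.sum(x * x)

print(theta1_after_step, theta1_closed_form)
```

Because $E$ is quadratic in $\hat\theta_1$, the single step and the direct solution should agree, and both should recover the slope used to generate the data.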

In [1]:
import IPython.display
import matplotlib.animation
import matplotlib.pyplot as plt
import numpy as np

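# LinearModel is assumed to be defined in an earlier cell of this notebook.
class DifferentiableLinearModel(LinearModel):
    def __init__(self, theta0, theta1):
        super().__init__(theta0, theta1)

    def improve_theta1(self, dataset):
        """Take one Newton step on theta1, holding theta0 fixed."""
        # Residuals: prediction minus target.
        residuals = self(dataset.x) - dataset.y
        # dE/dtheta1 = sum(2 * (prediction - y) * x)
        first_deriv = np.sum(2 * residuals * dataset.x)
        # d^2E/dtheta1^2 = sum(2 * x * x)
        second_deriv = np.sum(2 * dataset.x * dataset.x)

        # Newton update: theta1 <- theta1 - E'(theta1) / E''(theta1)
        self.theta1 = self.theta1 - first_deriv / second_deriv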

class FindTheta1Animation:
    NUM_STEPS = 4
    SLEEP = 2000  # delay between frames, in milliseconds

    @staticmethod
    def frame(ax, dataset, generator_model, step_idx):
        ax.clear()

        x_values = np.arange(0, 100, 1.0)
        average_sse = generator_model.average_sse(dataset)

        # Draw the data and the current fit before taking the next step.
        dataset.plot(ax)
        generator_model.plot(ax, x_values, "g-")
        ax.set_title(f"Step #{step_idx} | theta1: {generator_model.theta1:0.2f} | Avg SSE: {average_sse:0.2f}")

        # Take one Newton step so the next frame shows the improved theta1.
        generator_model.improve_theta1(dataset)

    @classmethod
    def run(cls, dataset, fixed_theta0):
        figure = plt.figure()
        ax = figure.add_subplot(1, 1, 1)

        generator_model = DifferentiableLinearModel(theta0=fixed_theta0, theta1=0.0)
        frame_fn = lambda step_idx: cls.frame(ax, dataset, generator_model, step_idx)
        animation = matplotlib.animation.FuncAnimation(
            figure,
            frame_fn,
            frames=cls.NUM_STEPS,
            interval=cls.SLEEP,
            # No-op init so FuncAnimation does not call frame_fn an extra time during setup.
            init_func=lambda: None,
        )

        # Close the static figure so only the rendered video is shown.
        plt.close(figure)

        IPython.display.display(IPython.display.HTML(animation.to_html5_video()))

# DATASET is assumed to be defined in an earlier cell of this notebook.
FindTheta1Animation.run(DATASET, fixed_theta0=100.0)
