> **License Notice**
> 
> This notebook is part of the *Computational Economics and Data Science* course.
> 
> - Code cells are released under the MIT License. You are free to use and adapt the code with attribution.
> - Text, figures, and other non-code content are released under the Creative Commons Attribution 4.0 International License.
> 
> Please retain this notice and credit Amirreza "Farnam" Taheri when sharing adaptations.

In [None]:
# === Environment Setup ===
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, linalg
from scipy.optimize import fsolve
import seaborn as sns
from IPython.display import Markdown, display

# --- Custom Display Functions ---
def _callout(label, body, bg, border):
    """Render a colored call-out box.

    The HTML is built without leading indentation: Markdown treats lines indented
    by four or more spaces as a code block, which would show the raw <div> tags.
    """
    html = "\n".join([
        f"<div style='background-color:{bg}; padding:15px; border-left:5px solid {border}'>",
        f"<strong>{label}</strong><br/>",
        f"{body}",
        "</div>",
    ])
    display(Markdown(html))

def theorem(title, statement):
    _callout(f"Theorem ({title}):", statement, "#e7f3e7", "#4CAF50")

def example(title, content):
    _callout(f"Example ({title}):", content, "#e3f2fd", "#2196F3")

def econ_app(title, content):
    _callout(f"💼 Economic Application ({title}):", content, "#fff3e0", "#FF9800")

def course_connection(content):
    display(Markdown(f"*📚 **Course Connection:** {content}*"))

# --- Plotting Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 14, 'figure.figsize': (10, 6), 'figure.dpi': 150})
np.set_printoptions(suppress=True, linewidth=120, precision=4)

# Appendix A2: Multivariate Calculus

---
## Table of Contents

- 2.1 Partial Derivatives and Differentiability
- 2.2 The Jacobian Matrix
- 2.3 The Hessian Matrix
- 2.4 Implicit Function Theorem
- 2.5 Taylor Expansion in Multiple Dimensions
- 2.6 Optimization Review


**Purpose**: This section reviews the essential tools from multivariate calculus. These concepts are the language of optimization and sensitivity analysis in multiple dimensions. We will cover how to measure rates of change for functions of several variables (gradients, Jacobians), how to characterize their curvature (Hessians), and how to analyze the response of a model's equilibrium to changes in its parameters (the Implicit Function Theorem and the Envelope Theorem).

## 2.1 Partial Derivatives and Differentiability

### 2.1.1 Partial Derivatives

The **partial derivative** of a function of several variables is its derivative with respect to one of those variables, with the others held constant. It measures the rate of change of the function along a single axis.
$$ \frac{\partial f}{\partial x_i}(x_1, ..., x_n) = \lim_{h \to 0} \frac{f(x_1, ..., x_i+h, ..., x_n) - f(x_1, ..., x_n)}{h} $$
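The limit definition can be made concrete by replacing $h \to 0$ with a small fixed step. The sketch below assumes an illustrative Cobb-Douglas utility $U(x_1, x_2) = x_1^{0.3} x_2^{0.7}$ (the exponent and the evaluation point are arbitrary choices, not part of the course material) and compares a forward-difference quotient with the analytic partial derivative.

```python
import numpy as np

A = 0.3  # illustrative Cobb-Douglas exponent (assumed)

def utility(x1, x2):
    """Cobb-Douglas utility U(x1, x2) = x1**A * x2**(1 - A)."""
    return x1**A * x2**(1 - A)

def partial_forward(f, x, i, h=1e-6):
    """Forward-difference approximation of df/dx_i at the point x.

    Only coordinate i is perturbed; all other coordinates are held fixed,
    which is the ceteris paribus idea behind a partial derivative.
    """
    x = np.asarray(x, dtype=float)
    x_step = x.copy()
    x_step[i] += h
    return (f(*x_step) - f(*x)) / h

x0 = np.array([4.0, 9.0])
mu1_numeric = partial_forward(utility, x0, i=0)
mu1_exact = A * x0[0]**(A - 1) * x0[1]**(1 - A)   # analytic dU/dx1
print(f"numeric: {mu1_numeric:.6f}   exact: {mu1_exact:.6f}")
```

The forward difference has an error of order $h$; a central difference, as used in later sketches, reduces it to order $h^2$.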

In [None]:
econ_app("Marginal Utility", r"The partial derivative is the mathematical formalization of the ceteris paribus ('all else equal') condition that is fundamental to economic analysis. For a utility function $U(x_1, x_2)$, the partial derivative $\frac{\partial U}{\partial x_1}$ represents the marginal utility of good 1: the rate at which utility rises as consumption of good 1 increases, holding the consumption of good 2 constant.")

### 2.1.2 Total Differentiability

While partial derivatives consider changes along the axes, **total differentiability** considers changes in all directions. A function is differentiable at a point if it can be well-approximated by a linear function (a tangent plane) at that point: the error of this linear approximation must go to zero faster than the distance from the point. Formally, $f$ is differentiable at $a$ if
$$ \lim_{h \to 0} \frac{|f(a+h) - f(a) - \nabla f(a)^T h|}{\|h\|} = 0. $$

A sufficient condition for differentiability, and the one almost always used in practice, is that all partial derivatives of $f$ exist and are continuous in a neighborhood of the point.
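The continuity requirement matters: partial derivatives can exist at a point where the function is not differentiable. A standard counterexample is $f(x, y) = \frac{xy}{x^2 + y^2}$ with $f(0, 0) = 0$. The sketch below checks numerically that both partial derivatives at the origin are zero, while $f$ equals $1/2$ everywhere on the diagonal, so $f$ is not even continuous at the origin.

```python
import numpy as np

def f(x, y):
    """f(x, y) = xy / (x^2 + y^2), with f(0, 0) defined as 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x**2 + y**2)

h = 1e-6
# Both partial derivatives exist at the origin: f is identically 0 on both axes.
fx0 = (f(h, 0.0) - f(0.0, 0.0)) / h
fy0 = (f(0.0, h) - f(0.0, 0.0)) / h
print(fx0, fy0)  # 0.0 0.0

# Along the diagonal, f(t, t) = 1/2 for every t != 0, so f does not tend to
# f(0, 0) = 0: no tangent plane can approximate f at the origin.
diagonal_values = [f(t, t) for t in (1e-1, 1e-3, 1e-6)]
print(diagonal_values)  # [0.5, 0.5, 0.5]
```

Here the partial derivatives exist at the origin but are not continuous there, so the sufficient condition above does not apply.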

### 2.1.3 The Gradient Vector

The **gradient** of a scalar-valued function $f: \mathbb{R}^n \to \mathbb{R}$ is the vector of its partial derivatives:
$$ \nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} $$
The gradient has a crucial geometric interpretation: whenever it is nonzero, it points in the direction of **steepest ascent** of the function at a given point. If you imagine standing on a hillside, the gradient vector is the arrow on the ground that points directly uphill. Its magnitude, $\|\nabla f(x)\|$, is the rate at which $f$ increases in that direction, i.e., the largest directional derivative at $x$. Conversely, the negative of the gradient, $-\nabla f(x)$, points in the direction of steepest *descent*, an observation that forms the basis for gradient descent optimization algorithms.
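The steepest-ascent property can be checked by brute force. The sketch below uses $f(x, y) = x^2 + 2y^2$ (the function in the gradient-field figure), estimates the directional derivative along many unit vectors with central differences, and compares the best direction found by search with the normalized gradient. The evaluation point $(1, 1)$ and the grid of 3601 angles are arbitrary choices.

```python
import numpy as np

def f(v):
    """f(x, y) = x^2 + 2y^2."""
    x, y = v
    return x**2 + 2 * y**2

def grad_f(v):
    """Analytic gradient (2x, 4y)."""
    x, y = v
    return np.array([2 * x, 4 * y])

point = np.array([1.0, 1.0])
g = grad_f(point)

# Numerical directional derivative along many unit vectors u (central differences).
h = 1e-6
angles = np.linspace(0.0, 2 * np.pi, 3601)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
slopes = np.array([(f(point + h * u) - f(point - h * u)) / (2 * h) for u in directions])

best = directions[np.argmax(slopes)]
print("steepest direction found by search:", best)
print("normalized gradient:               ", g / np.linalg.norm(g))
print("largest slope:", slopes.max(), "  gradient norm:", np.linalg.norm(g))
```

The largest slope matches $\|\nabla f\|$ up to the resolution of the angle grid, as the Cauchy-Schwarz inequality $\nabla f \cdot u \le \|\nabla f\|\,\|u\|$ predicts.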

The image below visualizes the gradient of the function $f(x,y) = x^2 + 2y^2$. The left panel shows the 3D surface, illustrating the function's bowl shape. The right panel shows the function's level curves (contours) and its gradient field. Each arrow in the gradient field points in the direction of steepest ascent and is orthogonal to the level curve at that point. The color and length of the arrows indicate the magnitude of the gradient, which is larger for steeper parts of the surface.

![Gradient Field Visualization](../images/png/gradient_field.png)
*<center>This image was programmatically generated for the course.</center>*

In [None]:
course_connection("The gradient is the core component of gradient-based optimization algorithms (Chapter 2.5). Much of microeconomic optimization relies on first-order conditions that set a gradient (of the objective or of the Lagrangian) to zero (Chapter 5).")

## 2.2 The Jacobian Matrix

### 2.2.1 Definition and Interpretation

**Historical Note:** The Jacobian matrix is named after the German mathematician **Carl Gustav Jacob Jacobi** (1804-1851), who made fundamental contributions to elliptic functions, differential equations, and number theory. The Hessian matrix is named after another German mathematician, **Ludwig Otto Hesse** (1811-1874), who worked on algebraic geometry and invariant theory.

The **Jacobian matrix** is the multivariate generalization of the derivative. While the gradient is a vector of partials for a scalar-valued function, the Jacobian is a matrix of partials for a vector-valued function. For a function $F: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian $J_F(x)$ is the $m \times n$ matrix of all first-order partial derivatives. Each row of the Jacobian is the (transposed) gradient of one of the output components of $F$.
$$ J_F(x) = \begin{bmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla F_1(x)^T \\ \vdots \\ \nabla F_m(x)^T \end{bmatrix} $$

**Geometric Intuition**: If the derivative of a single-variable function tells you the slope of the tangent line, the Jacobian matrix gives you the best **linear transformation** that approximates the vector function at a given point: $F(x + \Delta x) \approx F(x) + J_F(x)\,\Delta x$ for small $\Delta x$. It describes how a small change in the input vector $x$ leads to a change in the output vector $F(x)$.
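When an analytic Jacobian is tedious to derive, it can be approximated column by column with finite differences. The sketch below uses a hypothetical map $F: \mathbb{R}^2 \to \mathbb{R}^3$, chosen only for illustration, and compares a central-difference Jacobian with the hand-derived one. Note the shape: $m = 3$ rows (outputs) by $n = 2$ columns (inputs).

```python
import numpy as np

def F(v):
    """Hypothetical map F: R^2 -> R^3, chosen only for the demonstration."""
    x, y = v
    return np.array([x * y, np.sin(x) + y**2, np.exp(x - y)])

def jacobian_exact(v):
    """Hand-derived Jacobian of F: row i is the gradient of F_i."""
    x, y = v
    return np.array([[y, x],
                     [np.cos(x), 2 * y],
                     [np.exp(x - y), -np.exp(x - y)]])

def jacobian_fd(F, x, h=1e-6):
    """Central-difference Jacobian: column j approximates dF/dx_j."""
    x = np.asarray(x, dtype=float)
    m = F(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (F(x + e) - F(x - e)) / (2 * h)
    return J

x0 = np.array([0.5, -1.0])
J = jacobian_fd(F, x0)
print(J.shape)  # (3, 2)
print("max abs error vs analytic:", np.max(np.abs(J - jacobian_exact(x0))))

# The Jacobian as a linear approximation: F(x0 + dx) ~ F(x0) + J dx.
dx = np.array([1e-3, 2e-3])
print("linearization residual:", F(x0 + dx) - (F(x0) + J @ dx))
```

The linearization residual is of second order in $\|\Delta x\|$, consistent with differentiability.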

### 2.2.2 The Chain Rule for Jacobians

In [None]:
theorem("The Chain Rule", r"If $g: \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x$ and $f: \mathbb{R}^m \to \mathbb{R}^p$ is differentiable at $g(x)$, then the composite function $h = f \circ g$ is differentiable at $x$, and its Jacobian is the product of the Jacobians: $J_h(x) = J_f(g(x)) \cdot J_g(x)$.")

In [None]:
econ_app("Comparative Statics", r"The Jacobian is the workhorse of comparative statics. If we have a system of equilibrium equations $F(x, \alpha) = 0$, where $x$ are endogenous variables and $\alpha$ are exogenous parameters, the Jacobian is used with the Implicit Function Theorem to find how the equilibrium variables change when the parameters change: $\frac{\partial x}{\partial \alpha} = -[J_x(F)]^{-1} J_\alpha(F)$.")
course_connection("Used in Newton's method for systems of non-linear equations (Chapter 2.4) and for analyzing general equilibrium models (Chapter 5.3).")
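The chain rule for Jacobians can be checked numerically by comparing the product $J_f(g(x))\,J_g(x)$ with a finite-difference Jacobian of the composite $h = f \circ g$. The maps $g$ and $f$ below are hypothetical, chosen only so that both Jacobians are easy to write by hand.

```python
import numpy as np

def jacobian_fd(F, x, h=1e-6):
    """Central-difference Jacobian of F at x (column j ~ dF/dx_j)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((np.atleast_1d(F(x + e)) - np.atleast_1d(F(x - e))) / (2 * h))
    return np.column_stack(cols)

def g(x):
    """Inner map g: R^2 -> R^2 (illustrative)."""
    return np.array([x[0]**2 + x[1], x[0] * x[1]])

def J_g(x):
    return np.array([[2 * x[0], 1.0],
                     [x[1], x[0]]])

def f(u):
    """Outer map f: R^2 -> R^2 (illustrative)."""
    return np.array([np.log(u[0]), u[0] * u[1]**2])

def J_f(u):
    return np.array([[1 / u[0], 0.0],
                     [u[1]**2, 2 * u[0] * u[1]]])

x0 = np.array([1.0, 2.0])
J_chain = J_f(g(x0)) @ J_g(x0)                  # chain rule: J_h(x) = J_f(g(x)) J_g(x)
J_direct = jacobian_fd(lambda x: f(g(x)), x0)   # differentiate h = f o g directly
print(J_chain)
print("max discrepancy:", np.max(np.abs(J_chain - J_direct)))
```

Note that the outer Jacobian is evaluated at $g(x_0)$, not at $x_0$; evaluating it at the wrong point is a common source of errors.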

## 2.3 The Hessian Matrix

### 2.3.1 Definition and Symmetry

The **Hessian matrix** of a scalar-valued function $f: \mathbb{R}^n \to \mathbb{R}$ is the $n \times n$ matrix of its second-order partial derivatives. It is the multivariate analogue of the second derivative.
$$ H_f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} $$

**Geometric Intuition**: The Hessian matrix describes the **local curvature** of the function. Just as the second derivative tells you if a single-variable function is concave or convex, the Hessian's properties (its definiteness) tell you whether a multivariate function looks like a 'bowl' (local minimum), a 'hill' (local maximum), or a 'saddle' at a critical point. The diagonal elements measure the curvature along each coordinate axis, while the off-diagonal elements measure how the slope in one direction changes as you move in another (the interaction between variables).
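The symmetry stated in Clairaut's theorem can be observed numerically. The sketch below builds the Hessian of an illustrative function $f(x, y) = x^3 y + e^{xy}$ by differentiating its analytic gradient with central differences. Entry $H_{ij}$ is computed as $\partial/\partial x_j$ of $\partial f/\partial x_i$, so the two mixed partials are computed in different orders and nothing forces them to agree except the theorem itself.

```python
import numpy as np

def grad(v):
    """Analytic gradient of f(x, y) = x**3 * y + exp(x * y)."""
    x, y = v
    return np.array([3 * x**2 * y + y * np.exp(x * y),
                     x**3 + x * np.exp(x * y)])

def hessian_from_gradient(grad, v, h=1e-6):
    """Differentiate the gradient numerically: H[i, j] ~ d/dx_j (df/dx_i)."""
    v = np.asarray(v, dtype=float)
    n = v.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad(v + e) - grad(v - e)) / (2 * h)
    return H

v0 = np.array([0.7, -0.4])
H = hessian_from_gradient(grad, v0)
print(H)
print("max asymmetry:", np.max(np.abs(H - H.T)))

# Analytic mixed partial: 3x^2 + e^{xy} + xy e^{xy}, the same in either order.
x, y = v0
mixed_exact = 3 * x**2 + np.exp(x * y) + x * y * np.exp(x * y)
print(H[0, 1], H[1, 0], mixed_exact)
```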

In [None]:
theorem("Clairaut's Theorem on Symmetry", r"If the second partial derivatives of $f$ are continuous in a neighborhood of a point, then the order of differentiation does not matter, i.e., $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$. This implies that the Hessian matrix is symmetric.")

### 2.3.2 Definiteness and Optimization

The definiteness of the Hessian at a critical point (where $\nabla f = 0$) determines the nature of that point. For a symmetric matrix like the Hessian:
- **Positive definite** ($x^T H x > 0$ for all $x \ne 0$; all eigenvalues $> 0$): The point is a strict local minimum.
- **Negative definite** ($x^T H x < 0$ for all $x \ne 0$; all eigenvalues $< 0$): The point is a strict local maximum.
- **Indefinite** (eigenvalues have mixed signs): The point is a saddle point.
- **Semidefinite but not definite** (all eigenvalues $\ge 0$, or all $\le 0$, with at least one equal to zero): The test is inconclusive; higher-order terms decide.
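The eigenvalue test in the list above translates directly into code. The sketch below classifies a critical point from a given symmetric Hessian; the four test functions are illustrative, each with a critical point at the origin. The last case, $x^2 + y^4$, actually has a minimum at the origin, but its Hessian there is only semidefinite, so the test cannot detect it.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of its symmetric Hessian H."""
    eig = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix, in ascending order
    if np.all(eig > tol):
        return "strict local minimum"
    if np.all(eig < -tol):
        return "strict local maximum"
    if eig.min() < -tol and eig.max() > tol:
        return "saddle point"
    return "inconclusive (semidefinite)"

# Hessians at (0, 0) of some illustrative functions.
examples = {
    "x^2 + 2y^2": np.array([[2.0, 0.0], [0.0, 4.0]]),
    "-x^2 - y^2": np.array([[-2.0, 0.0], [0.0, -2.0]]),
    "x^2 - y^2":  np.array([[2.0, 0.0], [0.0, -2.0]]),
    "x^2 + y^4":  np.array([[2.0, 0.0], [0.0, 0.0]]),
}
for name, H in examples.items():
    print(f"{name:12s} at (0, 0): {classify_critical_point(H)}")
```

The tolerance `tol` is a practical safeguard: with numerically computed Hessians, an eigenvalue that is zero in theory rarely comes out exactly zero.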

In [None]:
econ_app("Second-Order Conditions in Optimization", "In consumer or producer theory, after finding a point where the first-order conditions hold, we check the second-order conditions using the Hessian to ensure it is a maximum (for utility or profit) or a minimum (for cost). For utility maximization subject to a budget constraint, the relevant test uses the Bordered Hessian (Section 2.6.2) and corresponds to a diminishing marginal rate of substitution (convex indifference curves). For unconstrained profit maximization, it corresponds to diminishing marginal returns (a locally strictly concave production function).")
course_connection("The Hessian is the key to the second-order conditions for optimization (Chapter 2.5) and is used in Newton's method.")

## 2.4 Implicit Function Theorem

### 2.4.1 Statement and Intuition

In [None]:
theorem("Implicit Function Theorem (IFT)", r"Let $F: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ be a continuously differentiable function. If at a point $(x_0, y_0)$ we have $F(x_0, y_0) = 0$ and the $m \times m$ Jacobian matrix of $F$ with respect to $y$, $J_y F(x_0, y_0)$, is invertible, then there exist neighborhoods of $x_0$ and $y_0$ and a unique continuously differentiable function $y = \phi(x)$, defined near $x_0$, such that $\phi(x_0) = y_0$ and, within those neighborhoods, $F(x, y) = 0$ exactly when $y = \phi(x)$. Furthermore, the Jacobian of this implicit function is given by: $J_x \phi(x) = -[J_y F(x, \phi(x))]^{-1} J_x F(x, \phi(x))$.")

**Motivation**: The IFT is the theoretical engine behind **comparative statics**, a cornerstone of economic analysis. Economic models are often complex systems of equations where we cannot explicitly solve for the endogenous variables (e.g., prices, quantities) as functions of the model's exogenous parameters (e.g., tax rates, technology shocks). The IFT provides the conditions under which such a function *implicitly* exists in the neighborhood of an equilibrium. More importantly, it gives us a direct formula for the derivative of this implicit function. This allows us to answer questions like, 'By how much will the equilibrium price change if we increase a tax by a small amount?' without ever needing to solve for the equilibrium price function itself. It is a powerful tool for analyzing the sensitivity of a model's predictions to its assumptions.
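The IFT formula can be compared against the brute-force alternative of re-solving the model. The sketch below assumes a hypothetical constant-elasticity market with demand $Q = \alpha P^{-0.5}$ and supply $Q = P^{1.5}$ (elasticities and the value $\alpha = 4$ are illustrative). The endogenous variables are $y = (P, Q)$ and $\alpha$ shifts demand. In this special case the closed form $P^* = \alpha^{1/2}$, $Q^* = \alpha^{3/4}$ is available as a check, but the IFT step never uses it.

```python
import numpy as np
from scipy.optimize import fsolve

EPS_D, EPS_S = 0.5, 1.5  # illustrative demand and supply elasticities (assumed)

def F(y, alpha):
    """Equilibrium conditions F(y, alpha) = 0 for y = (P, Q)."""
    P, Q = y
    return np.array([Q - alpha * P**(-EPS_D),   # demand: Q = alpha * P^(-eps_d)
                     Q - P**EPS_S])              # supply: Q = P^(eps_s)

def J_y(y, alpha):
    """Jacobian of F with respect to the endogenous variables (P, Q)."""
    P = y[0]
    return np.array([[alpha * EPS_D * P**(-EPS_D - 1), 1.0],
                     [-EPS_S * P**(EPS_S - 1), 1.0]])

def J_alpha(y, alpha):
    """Derivative of F with respect to the parameter alpha."""
    P = y[0]
    return np.array([-P**(-EPS_D), 0.0])

def solve_equilibrium(alpha, guess=(1.5, 2.0)):
    return fsolve(F, guess, args=(alpha,), xtol=1e-12)

alpha0 = 4.0
y_star = solve_equilibrium(alpha0)

# IFT: dy/dalpha = -[J_y F]^{-1} J_alpha F (solve the linear system instead of inverting).
dy_ift = np.linalg.solve(J_y(y_star, alpha0), -J_alpha(y_star, alpha0))

# Brute-force check: re-solve the model at perturbed alpha and difference.
h = 1e-5
dy_fd = (solve_equilibrium(alpha0 + h, y_star) - solve_equilibrium(alpha0 - h, y_star)) / (2 * h)

print("equilibrium (P, Q):", y_star)
print("IFT derivative:    ", dy_ift)
print("finite difference: ", dy_fd)
```

The IFT route needs one linear solve at the equilibrium already found, whereas the brute-force route requires re-solving the nonlinear model for every parameter perturbation.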

In [None]:
econ_app("Comparative Statics in General Equilibrium", r"In a general equilibrium model, we have a system of excess demand equations, $Z(p, \alpha) = 0$, where $p$ is the vector of prices and $\alpha$ is a vector of parameters (e.g., tastes, technology). We usually can't solve for the equilibrium price vector $p^*(\alpha)$ explicitly. After normalizing one price (the numeraire) and dropping one market-clearing equation, which is redundant by Walras' law, the IFT allows us to calculate $\frac{\partial p^*}{\partial \alpha}$, showing how equilibrium prices respond to a change in the underlying economic environment.")
course_connection("Fundamental to all comparative statics exercises in Chapter 5. It is the theoretical justification for why we can analyze the response of economic models to parameter changes.")

The image below shows a classic comparative statics exercise. The Implicit Function Theorem provides the theoretical justification for this type of analysis. It allows us to understand how an endogenous variable (like the equilibrium price, $P^*$) changes when an exogenous parameter (like a demand shifter, which moves the demand curve from D1 to D2) changes, even without an explicit formula for $P^*$.

![Comparative Statics](../images/png/comparative_statics.png)
*<center>This image was programmatically generated for the course.</center>*

## 2.5 Taylor Expansion in Multiple Dimensions

### 2.5.1 First-Order (Linear) and Second-Order (Quadratic) Approximations

The Taylor expansion provides a polynomial approximation of a function around a point. For a function $f: \mathbb{R}^n \to \mathbb{R}$ around a point $a$:

**First-Order (Linear) Approximation:**
$$ f(x) \approx f(a) + \nabla f(a)^T(x - a) $$

**Second-Order (Quadratic) Approximation:**
$$ f(x) \approx f(a) + \nabla f(a)^T(x - a) + \frac{1}{2}(x-a)^T H_f(a) (x-a) $$
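The practical difference between the two approximations is how fast their errors shrink. The sketch below uses the illustrative function $f(x, y) = e^x \cos y$ around $a = (0, 0)$ and moves along a fixed unit direction by a step $t$. Halving $t$ should cut the first-order error by roughly $4$ (error of order $t^2$) and the second-order error by roughly $8$ (error of order $t^3$).

```python
import numpy as np

def f(v):
    x, y = v
    return np.exp(x) * np.cos(y)

def grad(v):
    x, y = v
    return np.array([np.exp(x) * np.cos(y), -np.exp(x) * np.sin(y)])

def hess(v):
    x, y = v
    return np.array([[ np.exp(x) * np.cos(y), -np.exp(x) * np.sin(y)],
                     [-np.exp(x) * np.sin(y), -np.exp(x) * np.cos(y)]])

a = np.array([0.0, 0.0])
direction = np.array([0.6, 0.8])  # a unit vector (arbitrary choice)

errors_first, errors_second = [], []
for t in (0.2, 0.1, 0.05):
    d = t * direction                      # d = x - a
    x = a + d
    first = f(a) + grad(a) @ d             # linear approximation
    second = first + 0.5 * d @ hess(a) @ d # add the quadratic (Hessian) term
    errors_first.append(abs(f(x) - first))
    errors_second.append(abs(f(x) - second))
    print(f"t={t:5.2f}  first-order error: {errors_first[-1]:.2e}   "
          f"second-order error: {errors_second[-1]:.2e}")
```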

In [None]:
econ_app("Log-Linearization and Welfare Analysis", "The first-order Taylor expansion, taken in the logarithms of the variables, is the basis for log-linearization, a technique used to approximate non-linear DSGE models as linear systems around their steady state. The second-order expansion is used in welfare analysis because first-order terms often vanish (for example, when expanding around an efficient steady state), so a purely linear approximation can give inaccurate welfare comparisons between policies.")
course_connection("Crucial for log-linearization of RBC models (Chapter 4.3) and the Delta method for variance estimation in econometrics (Chapter 6.1).")

## 2.6 Optimization Review

### 2.6.1 Unconstrained and Constrained Optimization (KKT Conditions)

Optimization is the process of finding the best solution from a set of feasible solutions. The conditions for optimality are a cornerstone of economic theory.

**Unconstrained Optimization:**
- **First-Order Necessary Condition (FOC):** $\nabla f(x^*) = 0$.
- **Second-Order Sufficient Condition (SOC):** $H_f(x^*)$ is positive definite for a minimum, or negative definite for a maximum.
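The FOC and SOC together suggest Newton's method: repeatedly solve the linearized first-order condition $\nabla f(x_k) + H_f(x_k)\,\Delta x = 0$ for the step $\Delta x$. The sketch below is a pure Newton iteration without line search, so it is not guaranteed to converge in general; the objective $f(x, y) = e^{x+y} + x^2 + 2y^2$ is an illustrative, strictly convex choice. After convergence it verifies the FOC and the SOC for a minimum.

```python
import numpy as np

def grad(v):
    """Gradient of f(x, y) = exp(x + y) + x^2 + 2y^2."""
    x, y = v
    e = np.exp(x + y)
    return np.array([e + 2 * x, e + 4 * y])

def hess(v):
    """Hessian of the same function (positive definite everywhere)."""
    x, y = v
    e = np.exp(x + y)
    return np.array([[e + 2, e], [e, e + 4]])

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration x <- x - H(x)^{-1} grad(x), stopped once the FOC holds."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        x = x - np.linalg.solve(hess(x), g)
    raise RuntimeError("Newton's method did not converge")

x_star, iters = newton_minimize(grad, hess, [0.0, 0.0])
print("x* =", x_star, " iterations:", iters)
print("FOC, grad f(x*):", grad(x_star))
print("SOC, Hessian eigenvalues:", np.linalg.eigvalsh(hess(x_star)))  # all > 0
```

Near the optimum the number of correct digits roughly doubles with each iteration, which is why Newton's method needs only a handful of steps here.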

**Constrained Optimization (Karush-Kuhn-Tucker Conditions):**
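For a problem $\max_x f(x)$ subject to $g(x) \le 0$, with $g: \mathbb{R}^n \to \mathbb{R}^m$, form the **Lagrangian** $\mathcal{L}(x, \lambda) = f(x) - \lambda^T g(x)$. Under a constraint qualification, a local maximum $x^*$ admits multipliers $\lambda^*$ satisfying the KKT conditions:
- **Stationarity:** $\nabla f(x^*) = \sum_{i=1}^m \lambda_i^* \nabla g_i(x^*)$, i.e., $\nabla_x \mathcal{L}(x^*, \lambda^*) = 0$.
- **Primal feasibility:** $g(x^*) \le 0$.
- **Dual feasibility:** $\lambda^* \ge 0$.
- **Complementary slackness:** $\lambda_i^* g_i(x^*) = 0$ for every $i$, so only binding constraints can carry a positive multiplier.

In words, at an optimum the gradient of the objective function must be a nonnegative linear combination of the gradients of the binding constraints.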
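The four KKT conditions can be verified directly for a candidate solution. The sketch below assumes an illustrative consumer problem with log Cobb-Douglas utility $u(x) = 0.4 \ln x_1 + 0.6 \ln x_2$, prices $p = (2, 5)$, and income $w = 20$, with the budget written as $g(x) = p \cdot x - w \le 0$. It takes the standard Cobb-Douglas demands as the candidate, recovers $\lambda$ from stationarity, and checks every condition.

```python
import numpy as np

a, p, w = 0.4, np.array([2.0, 5.0]), 20.0  # illustrative preference weight, prices, income

def grad_u(x):
    """Gradient of u(x) = a*ln(x1) + (1 - a)*ln(x2)."""
    return np.array([a / x[0], (1 - a) / x[1]])

def g(x):
    """Budget constraint written as g(x) = p.x - w <= 0."""
    return p @ x - w

grad_g = p  # the constraint is linear, so its gradient is constant

# Candidate optimum: the standard Cobb-Douglas demands x_i = share_i * w / p_i.
x_star = np.array([a * w / p[0], (1 - a) * w / p[1]])
# Recover the multiplier from the first stationarity equation.
lam = grad_u(x_star)[0] / grad_g[0]

checks = {
    "stationarity": np.allclose(grad_u(x_star) - lam * grad_g, 0.0),
    "primal feasibility": g(x_star) <= 1e-12,
    "dual feasibility": lam >= 0,
    "complementary slackness": abs(lam * g(x_star)) < 1e-12,
}
print("x* =", x_star, "  lambda =", lam)
for name, ok in checks.items():
    print(f"{name:25s} {ok}")
```

For log Cobb-Douglas utility the multiplier equals $1/w$, the marginal utility of income; a positive $\lambda$ is consistent with the budget constraint binding.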

### 2.6.2 Second-Order Conditions with the Bordered Hessian

**Motivation**: For constrained problems, checking the second-order conditions is more subtle than in the unconstrained case. We don't care whether the objective function curves up or down in *every* direction, only in those directions that are **feasible**, that is, directions along which we can move while still satisfying the constraints. For example, a utility function may fail to be concave along a direction that leaves the budget line, but this is irrelevant for the consumer's optimal choice, because the consumer cannot move in that direction.

The standard Hessian analyzes curvature in all directions, which is not what we want. We need a tool that restricts the analysis to the curvature of the objective function *along the constraint surface*. This tool is the **Bordered Hessian**. It is the Hessian of the Lagrangian function, $\mathcal{L}(x, \lambda)$, which is then augmented ('bordered') with the gradients of the constraints. 

For a problem with $n$ variables and $m$ equality constraints $g(x) = c$, write the Lagrangian as $\mathcal{L}(x, \lambda) = f(x) - \lambda^T (g(x) - c)$. The Bordered Hessian is the $(n+m) \times (n+m)$ matrix
$$ H_B = \begin{bmatrix} \mathbf{0}_{m \times m} & J_g(x) \\ J_g(x)^T & H_{\mathcal{L}}(x, \lambda) \end{bmatrix}, \qquad H_{\mathcal{L}}(x, \lambda) = H_f(x) - \sum_{i=1}^m \lambda_i H_{g_i}(x), $$
where the $i$-th row of the $m \times n$ block $J_g(x)$ is $\nabla g_i(x)^T$.

The conditions for a strict local maximum or minimum are determined by the signs of the last $n - m$ leading principal minors of $H_B$, i.e., the determinants of its upper-left submatrices of orders $2m+1, \ldots, n+m$:
- **Local maximum:** these minors alternate in sign, with $\det H_B$ having the sign of $(-1)^n$.
- **Local minimum:** all of these minors have the sign of $(-1)^m$.

The key intuition is that the 'border' of constraint gradients effectively forces the test to consider only directions that are tangent to the constraint set, providing the correct second-order condition for constrained problems.
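A minimal check of the bordered-Hessian condition, assuming an illustrative problem: maximize $u(x_1, x_2) = x_1^{0.5} x_2^{0.5}$ subject to $x_1 + 2x_2 = 10$. With $n = 2$ and $m = 1$, only one minor is checked, namely $\det H_B$ itself, and a local maximum requires its sign to be $(-1)^2 = +1$. Because the budget constraint is linear, $H_g = 0$ and the Hessian of the Lagrangian equals the Hessian of $u$.

```python
import numpy as np

p, w = np.array([1.0, 2.0]), 10.0  # illustrative prices and income

def hess_u(x):
    """Hessian of u(x1, x2) = x1**0.5 * x2**0.5."""
    x1, x2 = x
    cross = 0.25 * (x1 * x2)**-0.5
    return np.array([[-0.25 * x1**-1.5 * x2**0.5, cross],
                     [cross, -0.25 * x1**0.5 * x2**-1.5]])

# Candidate optimum from the first-order conditions: Cobb-Douglas demands (5, 2.5).
x_star = np.array([0.5 * w / p[0], 0.5 * w / p[1]])

n, m = 2, 1
H_L = hess_u(x_star)        # linear constraint, so H_L = H_u
J_g = p.reshape(1, -1)      # 1 x 2 Jacobian of the single constraint p.x = w

H_B = np.block([[np.zeros((m, m)), J_g],
                [J_g.T, H_L]])
det_HB = np.linalg.det(H_B)
print(H_B)
print("det(H_B) =", det_HB)
print("local max condition satisfied:", np.sign(det_HB) == (-1) ** n)
```

The unconstrained Hessian of this $u$ is only negative semidefinite, because $u$ is linear along rays from the origin, yet the bordered test still confirms a strict constrained maximum. This is exactly the situation the border is designed to handle.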

In [None]:
econ_app("Quasiconcavity and Utility Maximization", r"In consumer theory, we maximize a utility function $u(x)$ subject to a budget constraint $p \cdot x = w$. The first-order conditions give the tangency point between the indifference curve and the budget line. The second-order condition, checked with the Bordered Hessian, ensures that this point is a maximum. It is closely tied to the utility function being quasiconcave, a weaker requirement than concavity that corresponds to the economic assumption of a diminishing marginal rate of substitution (i.e., convex indifference curves).")
course_connection("Necessary for verifying solutions in consumer and producer theory in Chapter 5.1.")

### 2.6.3 Envelope Theorem

In [None]:
theorem("Envelope Theorem", r"Let $V(\alpha) = \max_x f(x, \alpha)$ be the value function of an optimization problem with parameter $\alpha$, and suppose $f$ is differentiable and the maximizer $x^*(\alpha)$ is interior and differentiable in $\alpha$. Then the derivative of the value function with respect to the parameter equals the partial derivative of the objective function with respect to the parameter, evaluated at the optimal choice: $\frac{dV}{d\alpha} = \left.\frac{\partial f}{\partial \alpha}\right|_{x=x^*(\alpha)}$. The indirect effect through $x^*(\alpha)$ vanishes because $\nabla_x f(x^*(\alpha), \alpha) = 0$.")

In [None]:
econ_app("Roy's Identity and Shephard's Lemma", r"The Envelope Theorem is a powerful tool for comparative statics.<br>- <strong>Roy's Identity</strong> uses it to derive the Marshallian demand function from the indirect utility function: $x_i(p, w) = -\frac{\partial v(p, w)/\partial p_i}{\partial v(p, w)/\partial w}$.<br>- <strong>Shephard's Lemma</strong> uses it to derive the Hicksian demand function from the expenditure function: $h_i(p, u) = \frac{\partial e(p, u)}{\partial p_i}$.")
course_connection("The Envelope Theorem is the theoretical basis for the results in Chapter 5.1 (Consumer and Producer Theory).")
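The Envelope Theorem can be checked numerically on a problem simple enough to solve by hand. The sketch below assumes the illustrative objective $f(x, \alpha) = \alpha x - x^2$ (for instance, revenue $\alpha x$ minus a quadratic cost), for which $x^*(\alpha) = \alpha/2$. It computes $V(\alpha)$ with a numerical optimizer, differentiates it by finite differences, and compares the result with $\partial f/\partial \alpha = x$ evaluated at $x^*(\alpha)$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(x, alpha):
    """Illustrative objective f(x, alpha) = alpha*x - x**2."""
    return alpha * x - x**2

def V(alpha):
    """Value function V(alpha) = max_x f(x, alpha), computed numerically."""
    res = minimize_scalar(lambda x: -f(x, alpha))  # maximize f by minimizing -f
    return -res.fun, res.x

alpha0 = 3.0
V0, x_star = V(alpha0)

h = 1e-4
dV_numeric = (V(alpha0 + h)[0] - V(alpha0 - h)[0]) / (2 * h)  # total derivative dV/dalpha
df_dalpha_at_opt = x_star  # partial df/dalpha = x, evaluated at x = x*(alpha)

print("x*(alpha) =", x_star, "  analytic alpha/2 =", alpha0 / 2)
print("dV/dalpha (finite difference):", dV_numeric)
print("df/dalpha at x*:              ", df_dalpha_at_opt)
```

The finite-difference derivative re-optimizes $x$ at each perturbed $\alpha$, yet it matches the partial derivative that holds $x$ fixed at $x^*(\alpha)$, because the first-order condition makes the indirect effect vanish.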