### Differential geometric formulation - one constraint
Let $M$ be a smooth manifold of dimension $m$ and let $f,g \in \mathcal{C}^1(M,\mathbb{R})$. We consider the constrained optimization problem

$$\begin{aligned}
& \underset{x\in M}{\text{maximize}}
& & f(x) \\
& \text{subject to}
& & g(x) = 0
\end{aligned}$$
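As a concrete instance (a hypothetical example, not taken from the text), take $M=\mathbb{R}^2$, $f(x,y)=x+y$ and $g(x,y)=x^2+y^2-1$, so the constraint set is the unit circle. The brute-force sketch below samples the circle and locates the maximiser near $(1/\sqrt{2},1/\sqrt{2})$; it only illustrates the problem statement and is not a general method for constrained optimization.

```python
import numpy as np

# Hypothetical example: M = R^2, f(x, y) = x + y, g(x, y) = x^2 + y^2 - 1.
def f(p):
    return p[0] + p[1]

def g(p):
    return p[0] ** 2 + p[1] ** 2 - 1.0

# Sample the constraint set g = 0 (the unit circle) by its angle and
# keep the sample at which f is largest.
theta = np.linspace(0.0, 2.0 * np.pi, 100_001)
samples = np.stack([np.cos(theta), np.sin(theta)], axis=1)
best = samples[np.argmax(f(samples.T))]

print(best, f(best))  # best is close to (0.7071, 0.7071), f(best) close to sqrt(2)
```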

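Assume that $0$ is a regular value of $g$, i.e., $\left.dg\right|_{x}\neq 0$ at every $x$ with $g(x)=0$. Then the equation $g=0$ defines a submanifold $N$ of $M$ of dimension $m-1$. A point $x_0\in N$ is a critical point of the restriction $\left.f\right|_N$ (as every local maximum of $f$ on $N$ is) precisely when the differential of that restriction vanishes there, i.e., $\left.d\left(\left.f\right|_N\right)\right|_{x_0} = 0$. Since $d\left(\left.f\right|_N\right)$ is the restriction of $df$ to vectors tangent to $N$, this is equivalent to saying that, for every tangent vector $\nu\in T_{x_0}N$, we have $\left.df\right|_{x_0}(\nu) = 0$. Thus,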

$$T_{x_0}N \leqslant \ker\left(\left.df\right|_{x_0}\right)$$

Geometrically: every direction in which one can move from $x_0$ while remaining on the constraint set $N$ is a direction along which $f$ is stationary to first order.

Since $N$ is defined by $g=0$ and $\left.dg\right|_{x_0}\neq 0$, we have

$$T_{x_0}N = \ker(\left.dg\right|_{x_0})$$

And so,

$$\ker\left(\left.dg\right|_{x_0}\right) \leqslant \ker\left(\left.df\right|_{x_0}\right) $$
that is, for all $\eta \in T_{x_0}M$,
$$\left.dg\right|_{x_0}(\eta)=0 \quad\Longrightarrow\quad \left.df\right|_{x_0}(\eta)=0$$
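This inclusion of kernels can be checked numerically: compute a basis of $T_{x_0}N=\ker\left(\left.dg\right|_{x_0}\right)$ from a singular value decomposition and test whether $\left.df\right|_{x_0}$ vanishes on it. The sketch reuses the hypothetical circle example ($f=x+y$, $g=x^2+y^2-1$ on $\mathbb{R}^2$, with the differentials written out by hand); `null_space` and `is_constrained_critical` are illustrative helper names defined here, not library functions.

```python
import numpy as np

def null_space(A, tol=1e-12):
    """Orthonormal basis (as columns) of ker(A), read off from the SVD of A."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def is_constrained_critical(df, dg, tol=1e-9):
    """True if df vanishes on ker(dg), i.e. ker(dg) <= ker(df)."""
    K = null_space(dg)  # columns span T_{x0}N = ker(dg)
    return bool(np.all(np.abs(np.asarray(df, dtype=float) @ K) < tol))

# Hypothetical example: f = x + y, g = x^2 + y^2 - 1 on M = R^2.
def differentials(p):
    """Components of (df, dg) at the point p."""
    return np.array([1.0, 1.0]), np.array([2.0 * p[0], 2.0 * p[1]])

x_max = np.array([1.0, 1.0]) / np.sqrt(2.0)
x_other = np.array([1.0, 0.0])

print(null_space(differentials(x_max)[1]))               # spans the direction (1, -1)
print(is_constrained_critical(*differentials(x_max)))    # True
print(is_constrained_critical(*differentials(x_other)))  # False
```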

To make this condition concrete, we compute in local coordinates $(x^1,\ldots,x^m)$ around $x_0$, summing over repeated indices. We can express $dg$ and $df$ as

$$dg = \frac{\partial g}{\partial x^i} dx^i \hskip 48pt df = \frac{\partial f}{\partial x^i} dx^i$$
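When only function values are available, the components $\partial g/\partial x^i$ and $\partial f/\partial x^i$ at $x_0$ can be approximated by central differences. The helper `differential` and its step size are illustrative assumptions; the functions are the hypothetical circle example $f=x+y$, $g=x^2+y^2-1$ in the standard coordinates of $\mathbb{R}^2$.

```python
import numpy as np

def differential(h, x0, step=1e-6):
    """Central-difference approximation of the components dh/dx^i of dh at x0."""
    x0 = np.asarray(x0, dtype=float)
    comps = np.empty_like(x0)
    for i in range(x0.size):
        e = np.zeros_like(x0)
        e[i] = step
        comps[i] = (h(x0 + e) - h(x0 - e)) / (2.0 * step)
    return comps

# Hypothetical example functions on R^2.
def g(p):
    return p[0] ** 2 + p[1] ** 2 - 1.0

def f(p):
    return p[0] + p[1]

x0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(differential(g, x0))  # approximately [1.4142 1.4142]
print(differential(f, x0))  # approximately [1. 1.]
```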

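Equip $M$ with a Riemannian metric $\langle\cdot,\cdot\rangle_M$ and let $\omega\mapsto\omega^\#$ denote the associated musical isomorphism $T^*_{x_0}M\to T_{x_0}M$, defined by $\langle\omega^\#,\eta\rangle_M=\omega(\eta)$. For a tangent vector $\eta = \eta ^i \partial _{x^i} \in T_{x_0}M$, we have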

$$ dg(\eta) = \frac{\partial g}{\partial x^i} \eta^i = \begin{bmatrix}\frac{\partial g}{\partial x^1} & \cdots & \frac{\partial g}{\partial x^m}\end{bmatrix} \begin{bmatrix} \eta^1 \\ \vdots \\ \eta ^m \end{bmatrix} = \left\langle \left(\left.dg\right|_{x_0}\right)^\#,\eta \right\rangle_M$$
$$df(\eta) = \frac{\partial f}{\partial x^i} \eta^i = \begin{bmatrix}\frac{\partial f}{\partial x^1} & \cdots & \frac{\partial f}{\partial x^m}\end{bmatrix} \begin{bmatrix} \eta^1 \\ \vdots \\ \eta ^m \end{bmatrix} = \left\langle \left(\left.df\right|_{x_0}\right)^\#,\eta \right\rangle_M$$
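The matrix product in the middle does not involve the metric, while the last equality holds by the definition of $\#$. A quick numerical check with an arbitrary symmetric positive-definite metric matrix $G_{ij}$ (a hypothetical choice) confirms that $\langle (dg)^\#,\eta\rangle_M = dg(\eta)$ even though the components of $(dg)^\#$ differ from those of $dg$; only for the Euclidean metric do they coincide.

```python
import numpy as np

# Hypothetical data at x0 in local coordinates on a 2-manifold.
dg = np.array([np.sqrt(2.0), np.sqrt(2.0)])  # components dg/dx^i
eta = np.array([0.3, -1.7])                  # components eta^i of a tangent vector
metric = np.array([[2.0, 0.5],               # arbitrary symmetric positive-definite
                   [0.5, 1.0]])              # metric matrix G_ij (assumption)

pairing = dg @ eta                      # dg(eta) = (dg/dx^i) eta^i, no metric involved
dg_sharp = np.linalg.solve(metric, dg)  # (dg)^# has components G^{ij} (dg/dx^j)
inner = dg_sharp @ metric @ eta         # <(dg)^#, eta>_M = G_ij (dg^#)^i eta^j

print(np.isclose(pairing, inner))       # True
```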


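Since $\left.dg\right|_{x_0}\neq 0$, its kernel is a hyperplane in $T_{x_0}M$, and any covector vanishing on that hyperplane is a scalar multiple of $\left.dg\right|_{x_0}$. Hence the vectors $\left(dg|_{x_0}\right)^\#$ and $\left(df|_{x_0}\right)^\#$ must be parallel, i.e.,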

$$ \boxed{ dg_{x_0} \wedge df_{x_0} =0 \in \bigwedge ^2 T^*_{x_0}M }$$

Note that the classical Lagrange multiplier $\lambda$ is implicit in this condition: writing $\nabla h = (dh)^\#$ for the gradient and using $\left.dg\right|_{x_0}\neq 0$, the parallel condition at $x_0$ means exactly that $\exists \lambda \mbox{ s.t. }\lambda \cdot \nabla g_{x_0} = \nabla f_{x_0}$.
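In coordinates, the 2-form $dg\wedge df$ has components $(dg\wedge df)_{ij} = \partial_i g\,\partial_j f - \partial_j g\,\partial_i f$, so the boxed condition can be tested directly, and $\lambda$ can then be recovered by projecting $\nabla f$ onto $\nabla g$. The sketch assumes the Euclidean metric on $\mathbb{R}^2$ (so gradients and differentials have the same components) and the hypothetical circle example; `wedge2` is an illustrative helper.

```python
import numpy as np

def wedge2(alpha, beta):
    """Components (alpha ^ beta)_{ij} = alpha_i beta_j - alpha_j beta_i of two 1-forms."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return np.outer(alpha, beta) - np.outer(beta, alpha)

# Hypothetical example: f = x + y, g = x^2 + y^2 - 1, Euclidean metric on R^2.
x0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
dg = np.array([2.0 * x0[0], 2.0 * x0[1]])
df = np.array([1.0, 1.0])

print(np.allclose(wedge2(dg, df), 0.0))  # True: dg ^ df = 0 at the maximiser

# Since dg != 0, lambda is obtained by projecting grad f onto grad g.
lam = (df @ dg) / (dg @ dg)
print(lam, np.allclose(lam * dg, df))    # approximately 0.7071 (= 1/sqrt(2)), True

# At the non-critical point (1, 0) of the circle, dg = (2, 0) and the 2-form does not vanish.
print(np.allclose(wedge2([2.0, 0.0], df), 0.0))  # False
```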

### Differential geometric formulation - multiple constraints
Now let $f,g_\ell \in \mathcal{C}^1(M,\mathbb{R} )$ for $\ell = 1, \ldots , p$, and define $G=(g_1,\ldots,g_p):M \rightarrow \mathbb{R}^p$. Consider the multiple-constraint optimization problem

$$\begin{aligned}
& \underset{x\in M}{\text{maximize}}
& & f(x) \\
& \text{subject to}
& & G(x) = 0
\end{aligned}$$

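As before, assume that $0$ is a regular value of $G$, i.e., wherever $G(x)=0$ the differentials $\left.dg_1\right|_{x},\ldots,\left.dg_p\right|_{x}$ are linearly independent (equivalently, $\left.DG\right|_{x}$ has rank $p$). Then $G = 0$ defines a submanifold $N$ of $M$ of dimension $m-p$ with $T_{x_0}N=\ker\left(\left.DG\right|_{x_0}\right)$, and at a critical point $x_0\in N$ of $\left.f\right|_N$ we have $\left.d(\left.f\right|_N)\right|_{x_0}=0$. Thus,

$$ \ker\left(\left.DG\right|_{x_0}\right) \leqslant \ker\left(\left.df\right|_{x_0}\right)$$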

Write $y^1,\ldots,y^p$ for the standard coordinates on $\mathbb{R}^p$. For a tangent vector $\eta = \eta ^k \partial _{x^k} \in T_{x_0}M$, we have

$$ DG(\eta) = \left(\frac{\partial g_\ell}{\partial x^j}\, \partial_{y^\ell} \otimes dx^j\right) \left(\eta^k \partial_{x^k}\right) = \frac{\partial g_\ell}{\partial x^j}\,\eta^j\,\partial_{y^\ell} = \begin{bmatrix}
\frac{\partial g_1}{\partial x^1} & \cdots & \frac{\partial g_1}{\partial x^m} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_p}{\partial x^1} & \cdots & \frac{\partial g_p}{\partial x^m}
\end{bmatrix} 
\begin{bmatrix} \eta^1 \\ \vdots \\ \eta ^m \end{bmatrix}\in T_{G(x_0)}\mathbb{R}^p$$
$$df(\eta) = \frac{\partial f}{\partial x^i} \eta^i = \begin{bmatrix}\frac{\partial f}{\partial x^1} & \cdots & \frac{\partial f}{\partial x^m}\end{bmatrix} \begin{bmatrix} \eta^1 \\ \vdots \\ \eta ^m \end{bmatrix} = \left\langle \left(\left.df\right|_{x_0}\right)^\#,\eta \right\rangle_M \in T_{f(x_0)}\mathbb{R} \cong \mathbb{R}$$

Note that $DG(\eta)$ has two equivalent representations, pairing $\eta$ with the rows of the Jacobian matrix or combining its columns:

$$\begin{aligned}
DG(\eta)
&= \left\langle \left(\left.dg_1\right|_{x_0}\right)^\#,\eta \right\rangle_M \partial_{y^1}
+ \cdots + 
\left\langle \left(\left.dg_p\right|_{x_0}\right)^\#,\eta \right\rangle_M \partial_{y^p} \\
&=
\eta^1 \left.DG\right|_{x_0} \left(\partial_{x^1}\right)
+ \cdots + 
\eta^m \left.DG\right|_{x_0} \left(\partial_{x^m}\right)
\end{aligned}$$
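The two representations are the row and column pictures of a matrix-vector product. The sketch checks both against `J @ eta` on a hypothetical example: $M=\mathbb{R}^3$ with the Euclidean metric, $g_1=x^2+y^2+z^2-1$ and $g_2=z$, evaluated at $x_0=(1/\sqrt{2},1/\sqrt{2},0)$, with an arbitrary tangent vector $\eta$.

```python
import numpy as np

s = np.sqrt(2.0)
J = np.array([[s, s, 0.0],        # row 1: components of dg_1 at x0
              [0.0, 0.0, 1.0]])   # row 2: components of dg_2 at x0
eta = np.array([0.4, -1.1, 2.5])  # arbitrary tangent vector of M = R^3

# Row picture: the l-th output is <(dg_l)^#, eta> (Euclidean metric assumed).
row_picture = np.array([J[l] @ eta for l in range(J.shape[0])])
# Column picture: sum over j of eta^j DG(d/dx^j), where DG(d/dx^j) is column j of J.
col_picture = sum(eta[j] * J[:, j] for j in range(J.shape[1]))

print(np.allclose(row_picture, J @ eta), np.allclose(col_picture, J @ eta))  # True True
```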

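Recall that, for a linear map $A:\mathbb{R}^m\to\mathbb{R}^k$ represented by a $k\times m$ matrix, $\ker(A) = \left(\mbox{row space}(A)\right)^\perp$. Consequently, for two such maps $A$ and $B$, $\ker(A) \leqslant \ker(B)$ holds if and only if $\mbox{row space}(B) \subseteq \mbox{row space}(A)$. Here $\left.df\right|_{x_0}$ is a $1\times m$ matrix whose kernel has dimension $m-1$ (when $\left.df\right|_{x_0}\neq 0$), whereas $\left.DG\right|_{x_0}$ is a $p\times m$ matrix of rank $p$ whose kernel has dimension exactly $m-p$. Thus, $\ker\left(\left.DG\right|_{x_0}\right) \leqslant \ker\left(\left.df\right|_{x_0}\right)$ holds if and only if

$$\mbox{row space}\left(\left.df\right|_{x_0}\right) \subseteq \mbox{row space}\left(\left.DG\right|_{x_0}\right)$$ 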

This means that there must be some $\mu = (\mu^1,\ldots,\mu^p) \in \mathbb{R}^p$ such that

$$\begin{bmatrix}
\frac{\partial f}{\partial x^1}\\
\vdots \\
\frac{\partial f}{\partial x^m}\\
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial g_1}{\partial x^1} & \cdots & \frac{\partial g_p}{\partial x^1} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_1}{\partial x^m} & \cdots & \frac{\partial g_p}{\partial x^m}
\end{bmatrix}
\begin{bmatrix}
\mu ^1\\
\vdots \\
\mu ^p
\end{bmatrix}
$$
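Numerically, the row-space condition is a rank test (appending $\left.df\right|_{x_0}$ to the Jacobian must not raise its rank), after which $\mu$ solves the linear system above. The sketch below, with the hypothetical helper `lagrange_multipliers`, uses the example $f=x+y+z$, $g_1=x^2+y^2+z^2-1$, $g_2=z$ on $\mathbb{R}^3$ at $x_0=(1/\sqrt{2},1/\sqrt{2},0)$, the maximiser of $f$ on the circle $N$; it also rejects linearly dependent constraint differentials, for which $\mu$ would not be unique.

```python
import numpy as np

def lagrange_multipliers(J, grad_f, tol=1e-10):
    """Solve J^T mu = grad f, where row l of J holds the components of dg_l at x0.

    Returns mu, or None if grad f is not in the row space of J (x0 is not critical).
    Raises ValueError if the rows of J are linearly dependent (mu not unique).
    """
    J = np.atleast_2d(np.asarray(J, dtype=float))
    grad_f = np.asarray(grad_f, dtype=float)
    p = J.shape[0]
    if np.linalg.matrix_rank(J, tol=tol) < p:
        raise ValueError("dg_1, ..., dg_p are linearly dependent at x0")
    # Row-space test: appending df must not increase the rank beyond p.
    if np.linalg.matrix_rank(np.vstack([J, grad_f]), tol=tol) > p:
        return None
    mu, *_ = np.linalg.lstsq(J.T, grad_f, rcond=None)
    return mu

s = np.sqrt(2.0)
J = np.array([[s, s, 0.0],       # dg_1 for g_1 = x^2 + y^2 + z^2 - 1 at x0
              [0.0, 0.0, 1.0]])  # dg_2 for g_2 = z
print(lagrange_multipliers(J, [1.0, 1.0, 1.0]))  # approximately [0.7071 1.0]
print(lagrange_multipliers(J, [1.0, 0.0, 0.0]))  # None
```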

Thus, we must have

$$\boxed{ \left.df\right|_{x_0} \wedge \left(\left.dg_1\right|_{x_0} \wedge \cdots \wedge \left.dg_p\right|_{x_0}\right) = 0 \in \bigwedge^{p+1} T^*_{x_0} M}$$

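Since $\left.dg_1\right|_{x_0},\ldots,\left.dg_p\right|_{x_0}$ are linearly independent, this is equivalent to the linear dependence of the $p+1$ vectors $\{\left(\left.df\right|_{x_0}\right)^\# , \left(\left.dg_1\right|_{x_0}\right)^\#, \ldots ,\left(\left.dg_p\right|_{x_0}\right)^\# \}$, i.e., to the existence of $\lambda^1,\ldots,\lambda^p$ (namely $\lambda^\ell = \mu^\ell$) such that $\left.df\right|_{x_0} = \lambda^\ell \left.dg_\ell\right|_{x_0}$.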
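The components of a wedge product of 1-forms are the maximal minors of the matrix whose rows are their coefficient vectors, which is why the boxed condition amounts to a rank condition on $\left.df\right|_{x_0},\left.dg_1\right|_{x_0},\ldots,\left.dg_p\right|_{x_0}$. The sketch below computes these minors with the hypothetical helpers `wedge_components` and `wedge_vanishes`, on the same hypothetical example as before ($f=x+y+z$, $g_1=x^2+y^2+z^2-1$, $g_2=z$ at $x_0=(1/\sqrt{2},1/\sqrt{2},0)$).

```python
import itertools
import numpy as np

def wedge_components(*forms):
    """Components of the wedge a_1 ^ ... ^ a_k of 1-forms given by coefficient vectors.

    The coefficient of dx^{i_1} ^ ... ^ dx^{i_k} (i_1 < ... < i_k) is the k x k minor
    of the stacked k x m matrix on the columns i_1, ..., i_k.
    """
    A = np.vstack([np.asarray(a, dtype=float) for a in forms])
    k, m = A.shape
    return {cols: np.linalg.det(A[:, list(cols)])
            for cols in itertools.combinations(range(m), k)}

def wedge_vanishes(*forms, tol=1e-10):
    """True if every component of the wedge is (numerically) zero."""
    return all(abs(c) < tol for c in wedge_components(*forms).values())

# Hypothetical example on M = R^3 at x0 = (1/sqrt2, 1/sqrt2, 0).
s = np.sqrt(2.0)
dg1, dg2 = [s, s, 0.0], [0.0, 0.0, 1.0]
df = [1.0, 1.0, 1.0]

print(wedge_vanishes(df, dg1, dg2))               # True: df lies in span(dg1, dg2)
print(wedge_vanishes([1.0, 0.0, 0.0], dg1, dg2))  # False
# Cross-check: the (p+1)-fold wedge vanishes exactly when the stacked rows have rank <= p = 2.
print(np.linalg.matrix_rank(np.vstack([df, dg1, dg2]), tol=1e-10))  # 2
```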
