I'm going to show how to derive estimates of image derivatives from various assumptions and constraints.

Ideas:
* Optimize with different algorithms, e.g. linear programming
* Describe with different models, e.g. a neural network (NN)

# First Derivative

## 1D

A one-dimensional image can be viewed as a function of $x$, $$\begin{equation} Image = f(x)\label{eq:image} \end{equation}$$
and, after sampling, it becomes a digital image, $$\begin{equation} Digital Image = g(n) \label{eq:digit_image} \end{equation}$$ where $f(n) = g(n)$ and $n \in \{0, 1, \cdots, N-1\}$.

Now the question is how to calculate the derivative $f'(n)$. In most cases the exact value cannot be obtained, because

1. There is no analytical expression for $f(x)$;
2. We only know the values $g(0), g(1), \cdots, g(N-1)$.

So to solve the problem, what we need is a good estimate of $f'(n)$, written $\hat{f'}(n)$.

For convenience, I write $g(x)$ instead of $g(n)$ in the following discussion.

### From Statistics

### Using Stochastic Gradient Descent 

## 2D

### Prewitt

### Sobel

### Roberts Cross

# One Dimension

## Using Undetermined Coefficients with a Taylor Polynomial

Assume that within a local region around $x_0$, say $\Psi_{x_0}$, $$\begin{equation} f(x) = h(x) + \epsilon \end{equation}$$ where $$\begin{equation} h(x) = f(x_0) + (x - x_0)f'(x_0) \end{equation}$$ is the first-order Taylor expansion of $f$ about $x_0$.

According to Taylor's theorem, $\epsilon = \frac{(x - x_0)^2}{2!}f''(\xi)$ for some $\xi$ between $x_0$ and $x$.

Let's take $x_0 = 0$ and suppose $x$ takes values in $\Psi = \{-1, 0, 1\}$. Then I get
$$\begin{align} 
\epsilon(1) &= f(1) - f(0) - f'(0)\\
\epsilon(0) &= 0\\
\epsilon(-1) &= f(-1) - f(0) + f'(0)
\end{align}$$
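The plain sum $\sum_{x \in \Psi}\epsilon(x) = f(1) - 2f(0) + f(-1)$ does not contain $f'(0)$, so it cannot determine the derivative. If I instead assume that the first moment of the error vanishes, $$\begin{equation}\sum_{x \in \Psi}x\,\epsilon(x) = \epsilon(1) - \epsilon(-1) = 0\end{equation}$$ (the two remainders are equal whenever $f''$ is constant over $\Psi$), then I get $$f'(0) = \frac{f(1) - f(-1)}{2}.$$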
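The central-difference estimate above can be sketched in a few lines of plain Python. This is only an illustration: the function name `central_difference` and the test signal $f(x) = x^2$ are assumptions made here, not part of the derivation.

```python
def central_difference(g):
    """Estimate f'(n) at the interior samples n = 1 .. N-2
    using (g[n+1] - g[n-1]) / 2. Border samples are skipped."""
    return [(g[n + 1] - g[n - 1]) / 2 for n in range(1, len(g) - 1)]

# Hypothetical test signal: f(x) = x**2 sampled at n = 0..5, so f'(n) = 2n.
g = [float(n * n) for n in range(6)]
d = central_difference(g)  # estimates for n = 1, 2, 3, 4
```

For a quadratic signal the second derivative is constant, so the estimate coincides with the true derivative $2n$ at every interior sample.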

## Using Least Mean Square

### First Derivative

Assume that within a local region $\Psi_{x_0}$ around $x_0$, following Taylor's theorem, $$\begin{equation} f(x) \approx h(x) = wx + b,\end{equation}$$ where $w$ is the estimate $\hat{f'}(x_0)$.

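Let's suppose $x$ takes values in $\Psi = \{-1, 0, 1\}$. The optimal $w$ is $$w^* = argmin_w E(w), \quad E(w) = \frac{1}{2}\sum_{x \in \Psi}(wx + b - g(x))^2.$$ Setting the derivative to zero, and using $\sum_{x \in \Psi}x = 0$ and $\sum_{x \in \Psi}x^2 = 2$, I get $$\frac{\partial{E(w)}}{\partial{w}} = \sum_{x \in \Psi}x(wx + b - g(x)) = 2w - (g(1) - g(-1)) = 0 \Rightarrow w = \frac{g(1) - g(-1)}{2} = \frac{f(1) - f(-1)}{2}.$$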
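As a quick numerical check of this result, here is a minimal sketch that fits $wx + b$ by ordinary least squares using the closed-form slope and intercept. The helper `fit_line` and the three sample values are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w*x + b; returns (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

xs = [-1, 0, 1]
g = [3.0, 5.0, 4.0]          # hypothetical samples g(-1), g(0), g(1)
w, b = fit_line(xs, g)
central = (g[2] - g[0]) / 2  # (g(1) - g(-1)) / 2
```

On the symmetric window $\{-1, 0, 1\}$ the fitted slope `w` equals `central`, regardless of the value of $g(0)$, which only affects the intercept `b`.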

### Second Derivative

Use a quadratic polynomial to model the local curve, that is $$\begin{equation} f(x) \approx h(x) = ax^2 + bx + c,\end{equation}$$where, without loss of generality, $x \in \{-1, 0, 1\}$. Then $h''(x) = 2a$ can be taken as an approximation of $f''(x)$.

Setting $\mathbf{w} = [a, b, c]^T$ and $\mathbf{x} = [x^2, x, 1]^T$, we want to find
$$\mathbf{w}^* = argmin_{\mathbf{w}} E(\mathbf{w}),$$where
$$E(\mathbf{w}) = \sum_x(\mathbf{x}^T\mathbf{w} - f(x))^2.$$
So we take the derivative and set it to zero,
$$\begin{align}
\frac{\partial{E}}{\partial{\mathbf{w}}} &= 2 \sum_x \mathbf{x}(\mathbf{x}^T\mathbf{w} - f(x)) \\
&= 2 ((\sum_x \mathbf{x}\mathbf{x}^T)\mathbf{w} - \sum_x \mathbf{x}f(x)) \\
&= 0
\end{align}.$$
Because, 
$$\begin{align} \sum_x \mathbf{x}\mathbf{x}^T &= \left[\matrix{\sum_xx^4 & \sum_xx^3 & \sum_xx^2\\ \sum_xx^3 & \sum_xx^2 & \sum_xx\\ \sum_xx^2 & \sum_xx & \sum_x1}\right] \\
&= \left[\matrix{2&0&2\\ 0&2&0\\ 2&0&3}\right]\end{align},$$
so we get a set of equations,
$$\left[\matrix{2&0&2\\ 0&2&0\\ 2&0&3}\right] \cdot \left[\matrix{a\\ b\\ c}\right] = \left[\matrix{\sum_xx^2f(x)\\ \sum_xxf(x)\\ \sum_xf(x)}\right].$$
Solving these equations, we get $$\begin{align}f'(0) &\approx h'(0) = b = \frac{f(1)-f(-1)}{2}\\
f''(0) &\approx h''(0) = 2a = f(-1) - 2f(0) + f(1) = \left[\matrix{1&-2&1}\right] \cdot \left[\matrix{f(-1)\\ f(0)\\ f(1)}\right]\end{align}$$
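The normal equations above are small enough to check exactly. The following is a sketch under stated assumptions: `solve_linear` and `fit_quadratic` are hypothetical helpers written for this note, the sample values are arbitrary, and exact fractions are used only to avoid rounding in the comparison.

```python
from fractions import Fraction

def solve_linear(A, y):
    """Solve A w = y by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def fit_quadratic(f_m1, f_0, f_p1):
    """Build and solve the normal equations for h(x) = a x^2 + b x + c
    on x in {-1, 0, 1}. Returns the matrix sum(x x^T) and [a, b, c]."""
    xs = [-1, 0, 1]
    fs = [f_m1, f_0, f_p1]
    rows = [[x * x, x, 1] for x in xs]  # each row is the vector [x^2, x, 1]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    y = [sum(r[i] * f for r, f in zip(rows, fs)) for i in range(3)]
    return A, solve_linear(A, y)

# Hypothetical samples f(-1) = 2, f(0) = 7, f(1) = 6.
A, (a, b, c) = fit_quadratic(2, 7, 6)
```

With three points and three coefficients the fit is an exact interpolation, so `b` equals $(f(1) - f(-1))/2$ and `2 * a` equals $f(-1) - 2f(0) + f(1)$.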

## Using Lagrange Interpolating Polynomial

Given $n+1$ data points $(x_0,y_0),\cdots,(x_n,y_n)$, where all $x_i$ are distinct, the Lagrange interpolating polynomial of degree $\le n$ is defined as
$$P_n(x)=y_0L_0(x)+y_1L_1(x)+\cdots+y_nL_n(x),$$
where the Lagrange basis functions $L_i(x)$ are defined as
$$L_i(x)=\frac{(x-x_0) \cdots (x-x_{i-1})(x-x_{i+1}) \cdots (x-x_n)}{(x_i-x_0) \cdots (x_i-x_{i-1})(x_i-x_{i+1}) \cdots (x_i-x_n)}.$$

Now given $(x_{-1}=-1, f(-1)), (x_0=0, f(0)), (x_1=1, f(1))$, we have
$$
\begin{align}
P_2(x) &= \frac{(x - x_0)(x - x_1)}{(x_{-1}-x_0)(x_{-1}-x_1)}f(x_{-1}) + \frac{(x - x_{-1})(x - x_1)}{(x_0-x_{-1})(x_0-x_1)}f(x_0) + \frac{(x - x_{-1})(x - x_0)}{(x_1-x_{-1})(x_1-x_0)}f(x_1) \\
&= \frac{x(x - 1)}{2}f(-1) - (x + 1)(x - 1)f(0) + \frac{x(x + 1)}{2}f(1)
\end{align}
$$
Then
$$
\begin{align}
P_2'(x) &= \frac{2x-1}{2}f(-1) - 2xf(0) + \frac{2x+1}{2}f(1)\\
P_2'(0) &= \frac{f(1)-f(-1)}{2} \\
P_2''(x) &= P_2''(0) = f(-1) - 2f(0) + f(1)
\end{align}
$$
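To make the Lagrange result concrete, here is a small sketch that evaluates $P_2(x)$ directly from the basis functions and recovers its derivatives by finite differences. The helper names and sample values are assumptions for illustration. Because $P_2$ is a quadratic, central differences with any step $h$ reproduce $P_2'(0)$ and $P_2''(0)$ exactly, so the half step used below is an arbitrary choice.

```python
from fractions import Fraction

def lagrange_basis(xs, i, x):
    """Evaluate the basis function L_i(x) for the nodes xs."""
    value = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            value *= Fraction(x - xj, xs[i] - xj)
    return value

def lagrange_eval(xs, ys, x):
    """Evaluate P_n(x) = sum_i y_i * L_i(x)."""
    return sum(y * lagrange_basis(xs, i, x) for i, y in enumerate(ys))

xs = [-1, 0, 1]
ys = [2, 7, 6]       # hypothetical samples f(-1), f(0), f(1)
h = Fraction(1, 2)   # arbitrary step; exact for a quadratic

p_plus = lagrange_eval(xs, ys, h)
p_zero = lagrange_eval(xs, ys, 0)
p_minus = lagrange_eval(xs, ys, -h)

dP = (p_plus - p_minus) / (2 * h)              # P_2'(0)
d2P = (p_plus - 2 * p_zero + p_minus) / h**2   # P_2''(0)
```

Here `dP` matches $(f(1) - f(-1))/2$ and `d2P` matches $f(-1) - 2f(0) + f(1)$, the same stencils obtained from the least-squares fit.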

# Two Dimensions

# References

Irwin Sobel, History and Definition of Sobel Operator