### One-dimensional Gradient Descent

To find the minimum of a differentiable function $f : [a, b] \rightarrow \mathbb{R}$, we can use the following **One-dimensional $Gradient$ $Descent$ Algorithm**:

1. Choose a starting point $r_1 \in [a, b]$.
2. Let $i$ denote the step number of Gradient Descent. Initially $i = 1$.
3. Calculate $f'(r_i)$.
4. If $f'(r_i)=0$: the algorithm stops.  
   If $f'(r_i)>0$: the function is increasing at $r_i$, so we move to the left: choose $\delta > 0$ and assign $r_{i+1}=r_i-\delta$.  
   If $f'(r_i)<0$: the function is decreasing at $r_i$, so we move to the right: choose $\delta > 0$ and assign $r_{i+1}=r_i+\delta$.  
5. Replace $i$ with $i+1$ and repeat steps 3, 4 and 5.

If in the Gradient Descent Algorithm we take the step $-\lambda f'(r_i)$ for some $\lambda > 0$, then $\lambda$ is called the $learning\ rate$. In this case, step 4 of the algorithm becomes $r_{i+1}=r_i-\lambda f'(r_i)$.
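
The update rule with a learning rate can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `gradient_descent_1d`, the tolerance `tol`, the step cap `max_steps` and the example function are assumptions added for illustration. The algorithm above stops when $f'(r_i)=0$ exactly, which rarely happens in floating-point arithmetic, so the sketch stops when $|f'(r_i)|$ falls below `tol`. It also does not check that the iterates stay inside $[a, b]$.

```python
def gradient_descent_1d(df, r1, learning_rate=0.1, tol=1e-8, max_steps=10_000):
    """Minimise a differentiable function, given its derivative df.

    tol and max_steps are assumptions that give a practical stopping rule;
    the algorithm in the text stops only when f'(r_i) is exactly 0.
    """
    r = r1
    for _ in range(max_steps):
        slope = df(r)                   # step 3: compute f'(r_i)
        if abs(slope) < tol:            # step 4: derivative is (approximately) zero
            break
        r = r - learning_rate * slope   # r_{i+1} = r_i - lambda * f'(r_i)
    return r


# Example: f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3) and its minimum at x = 3.
r_min = gradient_descent_1d(lambda x: 2 * (x - 3), r1=0.0)
print(r_min)  # a value very close to 3
```

The learning rate matters: for this example function, starting from $r_1 = 0$ with $\lambda \geq 1$, the iterates no longer approach 3.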



### $\mathbb{R}^n$: distances and vectors
$\mathbb{R}^n$ is the set of ordered tuples $(x_1, x_2,...,x_n)$ such that $x_i \in \mathbb{R}$ for all $i$. Each such tuple is called a $point \ in \ \mathbb{R}^n$.  
We call $f$ $a \ function \ of \ many \ variables$ if $f$ maps $D$ to $\mathbb{R}$, where $D \subset \mathbb{R}^n$ for some $n$. In other words, the domain of the function $f$ must be a subset of $\mathbb{R}^n$ and the codomain of the function $f$ is a subset of $\mathbb{R}$.  
The $Euclidean\ Distance$ between points $a=(a_1,...,a_n) \in \mathbb{R}^n$ and $b=(b_1,...,b_n) \in \mathbb{R}^n$ is defined as  
$d(a,b):=\sqrt{(a_1-b_1)^2+(a_2-b_2)^2+\cdots +(a_n-b_n)^2}$.  
A point $a \in \mathbb{R}^n$ is called $the\ limit\ of\ the\ sequence\ \{x_i\}$, where $x_i \in \mathbb{R}^n$, if for any $\epsilon > 0$ there exists a natural number $N$ such that $d(x_i, a) < \epsilon$ for all $i \geqslant N$ (i.e. all $x_i$ with $i \geqslant N$ lie in the $\epsilon$-neighbourhood of the point $a$).  
The informal definition of $a\ vector\ space$:  
• All the elements of $\mathbb{R}^n$ are called vectors, and the set $\mathbb{R}^n$ is called a vector space.  
• Vectors can be added to each other. The result of this addition is a vector from the same vector space.  
In the usual case the addition of the vectors $(a_1, a_2,...,a_n),(b_1,b_2,...,b_n)\in \mathbb{R}^n$ is defined as:  
$(a_1, a_2,...,a_n)+(b_1, b_2,...,b_n)=(a_1+b_1, a_2+b_2,...,a_n+b_n)\in \mathbb{R}^n$.  
• Vectors can also be multiplied by numbers.  
In the usual case the multiplication of a vector $(a_1, a_2,...,a_n)\in \mathbb{R}^n$ by a number $c \in \mathbb{R}$ is defined as:  
$c(a_1,a_2,...,a_n)=(ca_1,ca_2,...,ca_n)\in \mathbb{R}^n$.  
Elements of $\mathbb{R}^n$ are sometimes called points and sometimes vectors.  
$The\ length\ of\ the\ vector$ $x=(x_1, x_2, ..., x_n)$ is defined as:  
$||x||=\sqrt{x_1^2+x_2^2+\cdots +x_n^2}$
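
The definitions of this section translate directly into code. The following is a small illustrative sketch in plain Python; the function names `add`, `scale`, `norm` and `distance` and the sample vectors are assumptions chosen for this example. It uses the fact that $d(a,b)=||a-b||$, which follows from comparing the two formulas above.

```python
import math


def add(a, b):
    """Componentwise sum of two vectors of equal length."""
    return tuple(ai + bi for ai, bi in zip(a, b))


def scale(c, a):
    """Product of the number c and the vector a."""
    return tuple(c * ai for ai in a)


def norm(x):
    """Length of the vector x."""
    return math.sqrt(sum(xi ** 2 for xi in x))


def distance(a, b):
    """Euclidean distance, computed as the length of a - b."""
    return norm(add(a, scale(-1, b)))


a = (1.0, 2.0, 3.0)
b = (4.0, 6.0, 3.0)
print(add(a, b))       # (5.0, 8.0, 6.0)
print(scale(2, a))     # (2.0, 4.0, 6.0)
print(distance(a, b))  # 5.0
```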

### Differential  
• Functions of the form $a_1\Delta x_1+\cdots +a_n\Delta x_n$ are called $linear\ functions$ of $(\Delta x_1,...,\Delta x_n)$.  
• The expression $a_1\Delta x_1+\cdots + a_n\Delta x_n$ is called the linear increment of the function $f$.  
• The function $g(x_1+\Delta x_1,...,x_n+\Delta x_n)=f(x_1,...,x_n)+a_1\Delta x_1 +\cdots +a_n \Delta x_n$ is called the $linear\ approximation$ of the function $f$ at the point $x$.  

**The informal definition of Differential**  
$f(x_1+\Delta x_1,...,x_n+\Delta x_n)-f(x_1,...,x_n)\approx d_xf(\Delta x_1,...,\Delta x_n):=a_1\Delta x_1+\cdots +a_n\Delta x_n$

In general, the coefficients $a_1,...,a_n$ depend on the chosen point $x=(x_1,...,x_n)$.
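
To see the approximation at work, here is a small numerical sketch. The function $f(x, y) = x^2 + 3y$, the point $(1, 2)$ and the coefficients $a_1 = 2$, $a_2 = 3$ are assumptions chosen for this example; the coefficients were worked out by hand by expanding $f(1+\Delta x, 2+\Delta y)$. The gap between the true increment and the linear increment shrinks much faster than the step itself.

```python
def f(x, y):
    return x ** 2 + 3 * y


x, y = 1.0, 2.0
a1, a2 = 2.0, 3.0  # coefficients of the differential at (1, 2), found by hand


def differential(dx, dy):
    """Linear increment a1 * dx + a2 * dy at the fixed point (1, 2)."""
    return a1 * dx + a2 * dy


for h in (0.1, 0.01, 0.001):
    exact = f(x + h, y + h) - f(x, y)   # true increment of f
    approx = differential(h, h)         # linear increment
    print(h, exact, approx, exact - approx)
```

For this particular function the gap equals $(\Delta x)^2$ up to rounding error, so dividing it by the length of the step still gives a quantity that tends to zero.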

**The formal definition of Differential**  
Let $f$ be a function of many variables. The function $d_{x} f\left(\Delta x_{1}, \ldots, \Delta x_{n}\right):=a_{1} \Delta x_{1}+\cdots+a_{n} \Delta x_{n}$ is called the Differential of the function $f$ at the point $x=(x_1,...,x_n)$ if the following limit exists and is equal to zero:  

$
\begin{aligned}
\lim _{\left(\Delta x_{1}, \ldots, \Delta x_{n}\right) \rightarrow(0, \ldots, 0)} & \frac{f\left(x_{1}+\Delta x_{1}, \ldots, x_{n}+\Delta x_{n}\right)-\left(f(x)+a_{1} \Delta x_{1}+\cdots+a_{n} \Delta x_{n}\right)}{\left\|\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)\right\|}= \\
& =\lim _{\left(\Delta x_{1}, \ldots, \Delta x_{n}\right) \rightarrow(0, \ldots, 0)} \frac{f\left(x_{1}+\Delta x_{1}, \ldots, x_{n}+\Delta x_{n}\right)-\left(f(x)+d_{x} f\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)\right)}{\left\|\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)\right\|}=0
\end{aligned}
$

Here $x$, $\Delta x$ and $x+\Delta x$ are vectors with $n$ coordinates. The $0$ in $\lim _{\Delta x \rightarrow 0}$ is shorthand for the vector $(0,...,0)$. The $0$ on the right-hand side of the equation is the number $0 \in \mathbb{R}$ (not a vector).  
If the function $f$ has a Differential at the point $x$, then $f$ is called $differentiable\ at\ the\ point\ x$.  
The function $f$ is called $differentiable$ if it is differentiable at all points of its domain.

**Properties of Differentials**  
1. **Uniqueness of the Differential.**  
Let $f$ be a function of many variables. If the Differential of the function $f$ exists at the point $x$, then it is unique.  
2. **Differential of a constant multiple.**  
Let $f$ be differentiable at the point $x$. Then for any number $c \in \mathbb{R}$ the function $cf$ is differentiable at the point $x$, and  
$d_{x}(c f)=c \cdot d_{x} f$  
3. **Differential of a sum.**  
Let $f$ and $g$ be differentiable at the point $x$. Then the function $f+g$ is differentiable at the point $x$, and  
$d_{x}(f+g)=d_{x} f+d_{x} g$  
4. **Differential of a product.**  
Let $f$ and $g$ be differentiable at the point $x$. Then the function $f \cdot g$ is differentiable at the point $x$, and  
$d_{x}(f \cdot g)=f(x) \cdot d_{x} g+g(x) \cdot d_{x} f$.  
Note that in this equation $f(x)$ and $g(x)$ are numbers, because the point $x$ is fixed.  
5. **Differential of a quotient.**  
Let $f$ and $g$ be differentiable at the point $x$. Let $g$ be defined and nonzero in a neighbourhood of the point $x$. Then the function $\frac{f}{g}$ is differentiable at the point $x$, and  
$d_{x}\left(\frac{f}{g}\right)=\frac{g(x) \cdot d_{x} f-f(x) \cdot d_{x} g}{g(x)^{2}}$.  
Note that in this equation $f(x)$ and $g(x)$ are numbers, because the point $x$ is fixed.  
6. **Differential of a composite function.**  
Let $f$ be a function of one variable and $g$ a function of $n$ variables. Then $f(g(x))$ is a function of $n$ variables. Let $g$ be differentiable at the point $x$ and let $f$ have a derivative at the point $g(x)$. Then the function $f(g(x))$ is differentiable at the point $x$, and its differential is equal to  
$f^{\prime}(g(x)) \cdot d_{x} g$.  
Note that in this equation $f^{\prime}(g(x))$ is a number.
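
The product rule and the rule for a composite function can be checked numerically against the formal definition of the Differential. The sketch below is only an illustration under assumed choices: $g(x, y) = x \cdot y$ (a product of the two coordinate functions), the outer function $\sin$, the point $(1, 2)$ and the helper names are all picked for this example. It prints the ratio from the formal definition for shrinking steps; the ratio should tend to zero.

```python
import math


def g(x, y):
    return x * y


def h(x, y):
    """Composite function sin(g(x, y))."""
    return math.sin(g(x, y))


def dg(x, y, dx, dy):
    """Differential of g at (x, y), obtained from the product rule."""
    return y * dx + x * dy


def dh(x, y, dx, dy):
    """Differential of sin(g) at (x, y): cos(g(x, y)) times the differential of g."""
    return math.cos(g(x, y)) * dg(x, y, dx, dy)


def remainder_ratio(x, y, dx, dy):
    """|h(x+dx, y+dy) - (h(x, y) + dh)| divided by the length of (dx, dy)."""
    error = h(x + dx, y + dy) - (h(x, y) + dh(x, y, dx, dy))
    return abs(error) / math.hypot(dx, dy)


for step in (1e-1, 1e-2, 1e-3):
    print(step, remainder_ratio(1.0, 2.0, step, -step))
```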




### Partial Derivative
Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ be a function and $x=(x_1,\ldots,x_n) \in \mathbb{R}^{n}$ a point. Then $the\ Partial\ Derivative$ of $f$ with respect to the $k$-th coordinate is the limit  
$\frac{\partial f}{\partial x_{k}}:=\lim _{\Delta x_{k} \rightarrow 0} \frac{f\left(x_{1}, \ldots, x_{k}+\Delta x_{k}, \ldots, x_{n}\right)-f\left(x_{1}, \ldots, x_{k}, \ldots, x_{n}\right)}{\Delta x_{k}}$  
When calculating the Partial Derivative with respect to $x_k$, all the other variables in the formula can be treated as constants. In other words, we can follow this algorithm:  
1. In the formula for $f$, substitute specific values for all the coordinates except the $k$-th. That is, we substitute the following $(n-1)$ values: the first coordinate of the point $x$, the second coordinate of the point $x$ and so on, all except the $k$-th coordinate of the point $x$. The result is a function of one variable (the variable $x_k$).  
2. Calculate the derivative of the resulting function of one variable.
3. Evaluate the derivative at the specific point, that is, substitute the $k$-th coordinate of the point $x$.  

The function obtained in step 1 describes the behaviour of $f$ on the straight line that passes through the point $x$ parallel to the $k$-th coordinate axis. That is, we fix all the coordinates except the $k$-th and let only the $k$-th coordinate change. The function obtained in step 1 is called the $restriction$ of the function $f$ to this straight line. The Partial Derivative found in this way describes the growth rate of the function $f$ along this straight line at the point $x$.
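
The three steps above can be sketched in Python. This is an illustrative sketch under stated assumptions: the helper names `derivative_1d` and `partial_derivative`, the step size `h` and the sample function are all hypothetical. Note also that the one-variable derivative is estimated with a central difference, which is a swap for numerical accuracy; the limit in the definition above uses a one-sided increment.

```python
def derivative_1d(phi, t, h=1e-6):
    """Central-difference estimate of phi'(t); h is an assumed step size."""
    return (phi(t + h) - phi(t - h)) / (2 * h)


def partial_derivative(f, x, k, h=1e-6):
    """Estimate the partial derivative of f at the point x along coordinate k (0-based)."""
    def restriction(t):
        # Step 1: fix every coordinate of x except the k-th, which becomes t.
        point = list(x)
        point[k] = t
        return f(*point)

    # Steps 2 and 3: differentiate the one-variable function at t = x[k].
    return derivative_1d(restriction, x[k], h)


def f(x1, x2):
    return x1 ** 2 * x2 + 3 * x2


print(partial_derivative(f, (2.0, 5.0), k=0))  # analytic value: 2 * x1 * x2 = 20
print(partial_derivative(f, (2.0, 5.0), k=1))  # analytic value: x1 ** 2 + 3 = 7
```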

**Theorem**.  
Let $f$ be a function of $n$ variables. Suppose that at the point $x$ the function $f$ has the Differential $d_{x} f\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)=a_{1} \Delta x_{1}+\cdots+a_{n} \Delta x_{n}$ and the partial derivatives $\frac{\partial f}{\partial x_{1}}, \ldots, \frac{\partial f}{\partial x_{n}}$. Then  
$a_{1}=\frac{\partial f}{\partial x_{1}}, \ldots, a_{n}=\frac{\partial f}{\partial x_{n}} .$  
That is, for any $j=1,...,n$ the value $a_j$ is equal to the partial derivative of the function $f$ with respect to the $j$-th coordinate, calculated at the point $x$. In other words:  
$d_{x} f\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)=\frac{\partial f}{\partial x_{1}} \Delta x_{1}+\frac{\partial f}{\partial x_{2}} \Delta x_{2}+\cdots+\frac{\partial f}{\partial x_{n}} \Delta x_{n}$,  
where all the partial derivatives are calculated at the point $x$.  
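
As a worked example of this theorem (the function and the point are chosen here only for illustration), take $f(x_1, x_2)=x_1^2 x_2$ and the point $x=(1,2)$. The partial derivatives at this point are  
$\frac{\partial f}{\partial x_{1}}=2 x_1 x_2=4, \quad \frac{\partial f}{\partial x_{2}}=x_1^2=1$,  
so the theorem gives $d_{x} f\left(\Delta x_{1}, \Delta x_{2}\right)=4 \Delta x_{1}+\Delta x_{2}$.  
Expanding the increment directly confirms this:  
$f(1+\Delta x_1, 2+\Delta x_2)-f(1,2)=4 \Delta x_{1}+\Delta x_{2}+2 \Delta x_1^2+2 \Delta x_1 \Delta x_2+\Delta x_1^2 \Delta x_2$.  
All the terms after $4 \Delta x_{1}+\Delta x_{2}$ are of at least second order in $(\Delta x_1, \Delta x_2)$, so after division by $\left\|\left(\Delta x_{1}, \Delta x_{2}\right)\right\|$ they tend to zero.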

**Theorem**.  
Let $f$ be a function of $n$ variables. Let $f$ be defined in a neighborhood of the point $x$ and have partial derivatives with respect to all the coordinates at the point $x$. Then $x$ can be a point of local minimum or local maximum only if all the partial derivatives are equal to zero.  
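
The words "only if" matter: zero partial derivatives are necessary for a local extremum, not sufficient. A standard counterexample, chosen here for illustration, is $f(x, y)=x^2-y^2$ at the origin. The short sketch below (with the partial derivatives written out by hand) shows that both partial derivatives vanish there, yet $f$ grows along one axis and decreases along the other.

```python
def f(x, y):
    return x ** 2 - y ** 2


def partials(x, y):
    """Analytic partial derivatives of f: (2x, -2y)."""
    return (2 * x, -2 * y)


h = 0.1
print(partials(0.0, 0.0))        # both components are zero
print(f(h, 0.0) > f(0.0, 0.0))   # True: f increases along the x axis
print(f(0.0, h) < f(0.0, 0.0))   # True: f decreases along the y axis
```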

**Consequence of the Theorem**.  
Suppose the Differential $d_x f$ exists at the point $x$. The point $x$ can be a point of local minimum or local maximum only if $d_x f=0$ (that is, $d_{x} f\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)=0$ for any $\Delta x_{1}, \ldots, \Delta x_{n}$).  

We can interpret $\frac{\partial f}{\partial x_{k}}$ as a function that maps any point $x \in \mathbb{R}^n$ to the partial derivative $\frac{\partial f}{\partial x_{k}}$ calculated at that point (for those $x \in \mathbb{R}^{n}$ at which $\frac{\partial f}{\partial x_{k}}$ is defined).  
  
**Properties of a Partial Derivative as a function**.  
Let the partial derivatives be defined for the functions $f$ and $g$. Then the following properties hold for partial derivatives, just as for the usual derivative:  
1. The function $f+g$ has a partial derivative with respect to $x_k$, and $\frac{\partial(f+g)}{\partial x_{k}}=\frac{\partial f}{\partial x_{k}}+\frac{\partial g}{\partial x_{k}}$.  
2. The function $cf$ has a partial derivative with respect to $x_k$, and $\frac{\partial(c f)}{\partial x_{k}}=c \frac{\partial f}{\partial x_{k}}$, where $c \in \mathbb{R}$.  
3. The function $fg$ has a partial derivative with respect to $x_k$, and $\frac{\partial(f g)}{\partial x_{k}}=\frac{\partial f}{\partial x_{k}} g+f \frac{\partial g}{\partial x_{k}}$.  
4. For a constant function $c$, the partial derivative with respect to $x_k$ is equal to zero.
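
As a quick check of property 3 (the functions are chosen here only for illustration), let $f(x_1,x_2)=x_1 x_2$ and $g(x_1,x_2)=x_1+x_2$. Differentiating the product directly:  
$f g=x_1^2 x_2+x_1 x_2^2, \quad \frac{\partial(f g)}{\partial x_{1}}=2 x_1 x_2+x_2^2$.  
Using the product rule instead:  
$\frac{\partial f}{\partial x_{1}} g+f \frac{\partial g}{\partial x_{1}}=x_2\left(x_1+x_2\right)+x_1 x_2 \cdot 1=2 x_1 x_2+x_2^2$.  
Both computations give the same result.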

