### One-dimensional Gradient Descent

To find a minimum of a differentiable function $f : [a, b] \rightarrow \mathbb{R}$, we can use the following **One-dimensional $Gradient$ $Descent$ Algorithm**:

1. Choose a point $r_1 \in [a, b]$.
2. Let $i$ denote the step number of Gradient Descent. Initially $i = 1$.
3. Calculate $f'(r_i)$.
4. If $f'(r_i)=0$: the algorithm stops.  
   If $f'(r_i)>0$: we should move to the left, so we choose $\delta > 0$ and assign $r_{i+1}=r_i-\delta$.  
   If $f'(r_i)<0$: we should move to the right, so we choose $\delta > 0$ and assign $r_{i+1}=r_i+\delta$.  
5. Replace $i$ with $i+1$ and repeat steps 3, 4 and 5.

If, in the Gradient Descent Algorithm, we take a step of $-\lambda f'(r_i)$ for some value $\lambda > 0$, then this $\lambda$ is called the $learning\ rate$. In this case, step 4 of the algorithm becomes $r_{i+1}=r_i-\lambda f'(r_i)$, and the sign of $f'(r_i)$ determines the direction of the move.
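
The learning-rate variant can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the name `gradient_descent_1d`, the tolerance `tol`, the step cap `max_steps` and the example function $f(x)=(x-3)^2$ are all assumptions. The exact test $f'(r_i)=0$ is replaced by a small tolerance, because a floating-point derivative rarely vanishes exactly, and the interval $[a, b]$ is not enforced.

```python
def gradient_descent_1d(df, r1, learning_rate=0.1, max_steps=1000, tol=1e-8):
    """Follow r_{i+1} = r_i - learning_rate * f'(r_i), starting from r1.

    df is the derivative f' of the function being minimised.
    """
    r = r1
    for _ in range(max_steps):
        slope = df(r)                  # step 3: calculate f'(r_i)
        if abs(slope) < tol:           # step 4: treat a tiny slope as f'(r_i) = 0
            break
        r = r - learning_rate * slope  # step 4: move against the sign of the slope
    return r


# Illustrative function: f(x) = (x - 3)^2, so f'(x) = 2 * (x - 3).
r_min = gradient_descent_1d(lambda x: 2 * (x - 3), r1=0.0)
print(r_min)  # close to 3, the minimiser of f
```

With this function and $\lambda = 0.1$, each step shrinks the distance to the minimiser by a constant factor, so the loop stops well before `max_steps`. A learning rate that is too large can overshoot the minimum and diverge instead.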



### $\mathbb{R^n}$: distances and vectors
$\mathbb{R^n}$ is the set of ordered arrays $(x_1, x_2,...,x_n)$ such that $x_i \in \mathbb{R}$ for all $i$. Each such array is called a $point \ in \ \mathbb{R^n}$.  
We call $f$ a $function \ of \ many \ variables$ if $f$ maps $D$ to $\mathbb{R}$, where $D \subset \mathbb{R^n}$ for some $n$. In other words, the domain of $f$ must be a subset of $\mathbb{R^n}$ and the codomain of $f$ is a subset of $\mathbb{R}$.  
The $Euclidean\ Distance$ between points $a=(a_1,...,a_n) \in \mathbb{R^n}$ and $b=(b_1,...,b_n) \in \mathbb{R^n}$ is defined as  
$d(a,b):=\sqrt{(a_1-b_1)^2+(a_2-b_2)^2+\cdots +(a_n-b_n)^2}$.  
A point $a \in \mathbb{R^n}$ is called the $limit\ of\ the\ sequence\ \{x_i\}$, where $x_i \in \mathbb{R^n}$, if for any $\epsilon > 0$ there exists a natural number $N$ such that $d(x_i, a) < \epsilon$ for all $i \geqslant N$ (i.e. all $x_i$ with $i \geqslant N$ lie in the $\epsilon$-neighbourhood of the point $a$).  
The informal definition of a $vector\ space$:  
• All the elements of $\mathbb{R^n}$ are called vectors and the set $\mathbb{R^n}$ is called a vector space.  
• Vectors can be added to each other. The result of this addition is a vector from the same vector space.  
In the usual case, the addition of vectors $(a_1, a_2,...,a_n),(b_1,b_2,...,b_n)\in \mathbb{R^n}$ is defined as:  
$(a_1, a_2,...,a_n)+(b_1, b_2,...,b_n)=(a_1+b_1, a_2+b_2,...,a_n+b_n)\in \mathbb{R^n}$.  
• Vectors can also be multiplied by numbers.  
In the usual case, the multiplication of a vector $(a_1, a_2,...,a_n)\in \mathbb{R^n}$ by a number $c \in \mathbb{R}$ is defined as:  
$c(a_1,a_2,...,a_n)=(ca_1,ca_2,...,ca_n)\in \mathbb{R^n}$.  
We sometimes refer to the elements of $\mathbb{R^n}$ as points and sometimes as vectors.  
The $length\ of\ the\ vector$ $x=(x_1, x_2, ..., x_n)$ is defined as:  
$||x||=\sqrt{x_1^2+x_2^2+\cdots +x_n^2}$
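
The definitions in this section translate almost directly into code. The sketch below uses plain Python tuples as vectors; the helper names `add`, `scale`, `norm` and `distance` and the sample vectors are assumptions chosen for illustration.

```python
import math


def add(a, b):
    """Componentwise sum of two vectors of the same dimension."""
    return tuple(ai + bi for ai, bi in zip(a, b))


def scale(c, a):
    """Product of the number c and the vector a."""
    return tuple(c * ai for ai in a)


def norm(x):
    """Length of the vector x: the square root of the sum of squared components."""
    return math.sqrt(sum(xi ** 2 for xi in x))


def distance(a, b):
    """Euclidean distance d(a, b), computed as the length of a - b."""
    return norm(add(a, scale(-1, b)))


a = (1.0, 2.0, 2.0)
b = (4.0, 6.0, 2.0)
print(add(a, b))       # (5.0, 8.0, 4.0)
print(scale(2, a))     # (2.0, 4.0, 4.0)
print(norm(a))         # 3.0
print(distance(a, b))  # 5.0
```

Writing `distance` in terms of `norm` reflects that $d(a,b)=||a-b||$, which follows from the two formulas above.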

### Differential  
• Functions of the form $a_1\Delta x_1+\cdots +a_n\Delta x_n$ are called $linear\ functions$ of $(\Delta x_1,...,\Delta x_n)$.  
• The expression $a_1\Delta x_1+\cdots + a_n\Delta x_n$ is called the linear increment of the function $f$.  
• The function $g(x_1+\Delta x_1,...,x_n+\Delta x_n)=f(x_1,...,x_n)+a_1\Delta x_1 +\cdots +a_n \Delta x_n$ is called a $linear\ approximation$ of the function $f$ at the point $x$.  

**The informal definition of Differential**  
$f(x_1+\Delta x_1,...,x_n+\Delta x_n)-f(x_1,...,x_n)\approx d_xf(\Delta x_1,...,\Delta x_n):=a_1\Delta x_1+\cdots +a_n\Delta x_n$

In general, the coefficients $a_1,...,a_n$ depend on the chosen point $x=(x_1,...,x_n)$.
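
As a small worked example of this dependence (the function is chosen purely for illustration), take $f(x_1,x_2)=x_1^2+3x_2$. Expanding the increment gives  
$f(x_1+\Delta x_1, x_2+\Delta x_2)-f(x_1,x_2)=2x_1\Delta x_1+3\Delta x_2+(\Delta x_1)^2$.  
For small increments the term $(\Delta x_1)^2$ is much smaller than the increments themselves, so here $a_1=2x_1$ and $a_2=3$. The coefficient $a_1$ changes with the point $x$, while $a_2$ happens to be constant.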

**The formal definition of Differential**  
Let $f$ be a function of many variables. The function $d_{x} f\left(\Delta x_{1}, \ldots, \Delta x_{n}\right):=$ $a_{1} \Delta x_{1}+\cdots+a_{n} \Delta x_{n}$ is called the Differential of the function $f$ at the point $x=(x_1,...,x_n)$ if the following limit exists and is equal to zero:  

$$
\begin{aligned}
&\lim _{\left(\Delta x_{1}, \ldots, \Delta x_{n}\right) \rightarrow(0, \ldots, 0)} \frac{f\left(x_{1}+\Delta x_{1}, \ldots, x_{n}+\Delta x_{n}\right)-\left(f(x)+a_{1} \Delta x_{1}+\cdots+a_{n} \Delta x_{n}\right)}{\left\|\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)\right\|} \\
&\quad =\lim _{\left(\Delta x_{1}, \ldots, \Delta x_{n}\right) \rightarrow(0, \ldots, 0)} \frac{f\left(x_{1}+\Delta x_{1}, \ldots, x_{n}+\Delta x_{n}\right)-\left(f(x)+d_{x} f\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)\right)}{\left\|\left(\Delta x_{1}, \ldots, \Delta x_{n}\right)\right\|}=0
\end{aligned}
$$

Here $x$, $\Delta x$ and $x+\Delta x$ are vectors with $n$ components. When the limit is written as $\lim _{\Delta x \rightarrow 0}$, the $0$ is shorthand for the vector $(0,...,0)$. The $0$ on the right-hand side of the equation is the real number $0 \in \mathbb{R}$ (not a vector).  
If the function $f$ has a Differential at the point $x$, then $f$ is called $differentiable\ at\ the\ point\ x$.  
The function $f$ is called $differentiable$ if it is differentiable at every point of its domain.
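
The limit in the formal definition can be probed numerically. The sketch below is an illustration under stated assumptions, not a proof: the function $f(x_1,x_2)=x_1^2+3x_2$, the point $x=(1,2)$, the candidate coefficients $a=(2,3)$ and the helper name `ratio` are all chosen for this example, and the increment is shrunk along one direction only, whereas the definition requires the limit to hold along every path to $(0,0)$.

```python
import math


def f(x1, x2):
    """Illustrative function of two variables."""
    return x1 ** 2 + 3 * x2


def ratio(x, a, delta):
    """Quotient from the formal definition for the increment delta.

    Numerator: f(x + delta) - (f(x) + a_1*delta_1 + a_2*delta_2).
    Denominator: the length of delta.
    """
    x1, x2 = x
    d1, d2 = delta
    linear = a[0] * d1 + a[1] * d2
    remainder = f(x1 + d1, x2 + d2) - (f(x1, x2) + linear)
    return remainder / math.hypot(d1, d2)


x = (1.0, 2.0)
a = (2.0, 3.0)  # candidate coefficients (2 * x1, 3) at the point x

# Shrink the increment along the direction (1, -1).
ratios = [ratio(x, a, (h, -h)) for h in (1e-1, 1e-2, 1e-3)]
print(ratios)  # each value is roughly ten times smaller than the previous one
```

For this function the numerator equals $(\Delta x_1)^2$ exactly, so the quotient shrinks in proportion to the size of the increment. This is the behaviour the definition asks for.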