# Gradient Descent

### **Overview:**
  - Gradient descent is a foundational algorithm in machine learning. It is not limited to linear regression; it is also used to train neural networks and many other models. It provides a systematic way to minimize a cost function $J(w, b)$ by adjusting the parameters $w$ and $b$ until the cost is as low as possible.

### **Key Concepts:**
  - **Gradient Descent Purpose:** Minimize the cost function $J(w, b)$ over model parameters $w$ and $b$.<br><br>
  - **General Application:** Although illustrated with linear regression, gradient descent is a versatile method applicable to functions with multiple parameters ($w_1, w_2, \ldots, w_n, b$) across various machine learning models.<br><br>
  - **Initial Guess:** Starts with initial guesses for parameters, commonly set to 0 ($w = 0, b = 0$), and iteratively adjusts $w$ and $b$ to decrease $J(w, b)$.<br><br>
  - **Process:** By iteratively taking steps in the direction that most steeply decreases $J$, gradient descent seeks to find the function's minimum, ideally the global minimum but possibly a local minimum.

### **Process:**
![ProImage](./image/GradientDescent.png)
  - **Direction of Descent:** At each step, the algorithm evaluates the surrounding landscape to determine the direction that most steeply descends toward a minimum.<br><br>
  - **Local Minima:** Depending on the starting point, gradient descent may lead to different local minima, illustrating the algorithm's sensitivity to initial conditions.
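
The sensitivity to the starting point can be illustrated with a small sketch. The one-parameter cost $J(w) = (w^2 - 1)^2$ used here is a hypothetical toy function (it is not from these notes), chosen only because it has two minima, at $w = -1$ and $w = 1$. The names `grad_J` and `descend`, and the values of `alpha` and `num_iters`, are likewise illustrative assumptions.

```python
def grad_J(w):
    # Derivative of the toy cost J(w) = (w**2 - 1)**2 (an assumed example function).
    return 4.0 * w * (w**2 - 1.0)


def descend(w_start, alpha=0.01, num_iters=500):
    # Plain gradient descent on a single parameter.
    w = w_start
    for _ in range(num_iters):
        w = w - alpha * grad_J(w)
    return w


# Same algorithm, same learning rate, different starting points.
w_from_left = descend(-2.0)
w_from_right = descend(2.0)

print(f"start at -2.0 -> w = {w_from_left:.4f}")
print(f"start at  2.0 -> w = {w_from_right:.4f}")
```

Starting on the left side of the hill between the two valleys ends in the left minimum, and starting on the right ends in the right minimum, even though nothing else about the run changes.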

### **Implementing Gradient Descent**

#### The Gradient Descent Formula:

- **Update Rule:** On each iteration, parameters are updated as follows:
  - $w := w - \alpha \frac{\partial}{\partial w}J(w,b)$
  - $b := b - \alpha \frac{\partial}{\partial b}J(w,b)$
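  where $\alpha$ is the learning rate, a small positive number that controls the step size. Both parameters are updated simultaneously: the two partial derivatives are computed from the current $w$ and $b$ before either value is overwritten.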
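
The update rule above could be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes a single-feature linear model $f(x) = wx + b$ and a mean squared error cost $J(w, b) = \frac{1}{2m}\sum_i (f(x_i) - y_i)^2$, neither of which is fixed by this section. The function names, the tiny dataset, and the values of `alpha` and `num_iters` are all hypothetical.

```python
def compute_gradients(x, y, w, b):
    # Partial derivatives of the assumed cost J(w, b) = (1 / 2m) * sum((w*x + b - y)**2).
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for xi, yi in zip(x, y):
        error = (w * xi + b) - yi
        dj_dw += error * xi
        dj_db += error
    return dj_dw / m, dj_db / m


def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=1000):
    for _ in range(num_iters):
        # Both gradients are computed from the current w and b ...
        dj_dw, dj_db = compute_gradients(x, y, w, b)
        # ... and only then are both parameters updated together.
        w, b = w - alpha * dj_dw, b - alpha * dj_db
    return w, b


# Illustrative data generated from y = 2x + 1.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]

w_fit, b_fit = gradient_descent(x_train, y_train)
print(f"w = {w_fit:.4f}, b = {b_fit:.4f}")
```

Starting from $w = 0, b = 0$, the fitted values should approach the slope and intercept used to generate the data. Computing both gradients before assigning either parameter keeps the update simultaneous.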

#### Mathematics behind the algorithm:
- **Gradient:** 
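    - In calculus, the gradient of a function points in the direction of the function's fastest increase, and its magnitude gives the rate of that increase.
    - Therefore, the fastest way to decrease the function is to move in the opposite direction, which is why the update rule subtracts the gradient.<br><br>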

- **Learning Rate:** 
    - Reaching the optimum can take many iterations, so a larger $\alpha$ speeds up the process by taking bigger steps.
    - But if $\alpha$ is too large, each step can overshoot the minimum, and the algorithm may fail to converge or may even diverge.
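
The effect of the learning rate can be seen in a small experiment. This sketch assumes the toy cost $J(w) = w^2$, whose gradient is $2w$ and whose minimum is at $w = 0$; the function name `run` and the three values of `alpha` are arbitrary illustrative choices, not recommendations.

```python
def run(alpha, w=1.0, num_iters=20):
    # Gradient descent on the assumed toy cost J(w) = w**2, with gradient 2*w.
    for _ in range(num_iters):
        w = w - alpha * 2.0 * w
    return w


w_small = run(0.01)     # small steps: moves toward 0, but slowly
w_moderate = run(0.4)   # larger steps: gets very close to 0 in the same number of iterations
w_large = run(1.1)      # too large: every step overshoots 0 and lands farther away

for name, value in [("small", w_small), ("moderate", w_moderate), ("large", w_large)]:
    print(f"{name:>8} alpha -> |w| after 20 steps = {abs(value):.3e}")
```

For this particular cost, each update multiplies $w$ by $(1 - 2\alpha)$, so the distance to the minimum shrinks only when $|1 - 2\alpha| < 1$. With $\alpha = 1.1$ the factor is $-1.2$, and the iterates grow in magnitude instead of settling at the minimum.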