# Gradient Descent

```python
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# Define a simple cost function (quadratic): J(theta) = theta^2 + 5, minimized at theta = 0
def cost_function(theta):
    return theta**2 + 5

# Define the gradient of the cost function: dJ/dtheta = 2 * theta
def gradient(theta):
    return 2 * theta

# Gradient descent function
def gradient_descent(learning_rate, num_iterations):
    theta = np.random.rand()  # Initialize theta with a random value in [0, 1)
    theta_history = [theta]  # Store the history of theta values
    cost_history = [cost_function(theta)]  # Store the history of cost values

    for _ in range(num_iterations):
        gradient_value = gradient(theta)  # Compute the gradient at the current theta
        theta = theta - learning_rate * gradient_value  # Step against the gradient
        theta_history.append(theta)
        cost_history.append(cost_function(theta))

    return theta_history, cost_history

# Hyperparameters
learning_rate = 0.1
num_iterations = 50

# Perform gradient descent
theta_history, cost_history = gradient_descent(learning_rate, num_iterations)

# Plot the cost curve with the visited points, and theta over the iterations
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
bound = max(1.0, np.max(np.abs(theta_history)))  # Make the curve span every visited theta
theta_range = np.linspace(-bound, bound, 200)
plt.plot(theta_range, cost_function(theta_range), label='J(theta)')
plt.plot(theta_history, cost_history, marker='o', label='Gradient descent steps')
plt.xlabel('Theta')
plt.ylabel('Cost')
plt.title('Cost Function and Descent Path')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(theta_history, marker='o')
plt.xlabel('Iteration')
plt.ylabel('Theta')
plt.title('Theta over Iterations')

plt.tight_layout()
plt.show()
```

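Because this cost is a simple quadratic, the iterates can be checked by hand. With ∇J(θ) = 2θ, each update gives θ_{k+1} = θ_k − 2αθ_k = (1 − 2α)θ_k, so θ_k = (1 − 2α)^k θ_0; with α = 0.1 the distance to the minimum shrinks by a factor of 0.8 per iteration. The sketch below compares the loop against this closed form. It uses a fixed starting value θ_0 = 0.7 (an arbitrary choice that replaces the random initialization so the result is reproducible) and a hypothetical helper `run_gd`.

```python
import numpy as np

def gradient(theta):
    # Derivative of J(theta) = theta**2 + 5
    return 2 * theta

def run_gd(theta0, learning_rate, num_iterations):
    # Plain gradient descent from a fixed start, returning every iterate
    thetas = [theta0]
    theta = theta0
    for _ in range(num_iterations):
        theta = theta - learning_rate * gradient(theta)
        thetas.append(theta)
    return np.array(thetas)

theta0, learning_rate, num_iterations = 0.7, 0.1, 50
iterates = run_gd(theta0, learning_rate, num_iterations)

# Closed form: theta_k = (1 - 2*alpha)**k * theta0
k = np.arange(num_iterations + 1)
closed_form = (1 - 2 * learning_rate) ** k * theta0

print(np.allclose(iterates, closed_form))  # True
print(iterates[-1])  # about 0.7 * 0.8**50, i.e. very close to the minimizer theta = 0
```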

# Anatomy of Gradient Descent

Gradient descent is an optimization algorithm used in machine learning and deep learning to minimize a model's cost (loss) function by iteratively adjusting the model's parameters. It is the core procedure behind training neural networks and is used in many other optimization problems. Its main components are:

1. **Objective Function (Cost/Loss Function)**:
   - Gradient descent begins with an objective function, often denoted as J(θ), where θ represents the parameters of the model.
   - The objective is to minimize this function by finding the optimal values of θ.

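2. **Initialization**:
   - Gradient descent starts by assigning initial values to the model parameters θ.
   - Common initialization methods set θ to zeros, small random values, or predefined values. For neural networks, random initialization is preferred, because starting all weights at zero makes the hidden units in a layer receive identical updates and remain identical.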

3. **Learning Rate (α)**:
   - The learning rate (α) is a hyperparameter that determines the step size at each iteration.
   - It is critical for convergence and stability: too small a value makes progress slow, while too large a value can overshoot the minimum and cause the cost to oscillate or diverge.

4. **Gradient Calculation**:
   - The key step in gradient descent is to compute the gradient of the cost function with respect to the parameters (∇J(θ)).
   - The gradient points in the direction of steepest increase of the cost function, so its negative, −∇J(θ), points in the direction of steepest decrease.

5. **Update Parameters**:
   - The parameters θ are updated using the gradient and learning rate:
     θ ← θ − α ∇J(θ)
   - Subtracting the scaled gradient moves θ in the direction of steepest descent; for a sufficiently small α, each step decreases the cost.

6. **Iteration**:
   - Steps 4 and 5 are repeated until a stopping criterion is met, such as a fixed number of iterations, a target cost value, or a convergence test (for example, the gradient norm or the change in cost falling below a tolerance).

7. **Convergence**:
   - Gradient descent is considered converged when the gradient is close to zero or when the predefined stopping criterion is met.
   - A near-zero gradient means θ is at or near a stationary point. For a convex cost function this is a global minimum; in general it may be only a local minimum or a saddle point.

8. **Cost Monitoring**:
   - The cost is typically monitored during training; a steadily decreasing cost indicates that the algorithm is converging, while a rising or oscillating cost often signals a learning rate that is too large.

9. **Types of Gradient Descent**:
   - There are different variants of gradient descent, including:
     - **Batch Gradient Descent**: It computes the gradient using the entire training dataset in each iteration.
     - **Stochastic Gradient Descent (SGD)**: It computes the gradient from a single training example at a time, so each update is much cheaper but the gradient estimate is noisier.
     - **Mini-Batch Gradient Descent**: A compromise between batch and stochastic gradient descent, where a small random subset of data is used in each iteration.

10. **Regularization and Optimization Techniques**:
    - In practice, gradient descent is often used with additional techniques like L1 and L2 regularization, momentum, and adaptive learning rates (e.g., Adam, RMSprop) to improve convergence and generalization.

11. **Termination**:
    - Gradient descent terminates when the stopping criterion is met, and the trained model parameters are used for predictions.

12. **Challenges**:
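    - Gradient descent can get stuck in local minima, or slow down near saddle points and plateaus, when the cost function is non-convex; its behavior is also sensitive to the choice of learning rate.
    - On poorly conditioned problems, where the cost surface is much steeper in some directions than in others, gradient descent tends to zigzag and converge slowly. Ill-posed problems may lack a unique minimizer altogether.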
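
The learning-rate trade-off in item 3 can be seen directly on the notebook's own cost J(θ) = θ² + 5. Because each update multiplies θ by (1 − 2α), the iterates approach the minimum only when |1 − 2α| < 1, that is, 0 < α < 1. The sketch below runs 50 iterations from θ = 1 for a few learning rates; the specific α values and the helper `final_cost` are illustrative choices, not part of the original notebook.

```python
def cost_function(theta):
    # Same quadratic cost as the notebook: minimum value 5 at theta = 0
    return theta**2 + 5

def gradient(theta):
    return 2 * theta

def final_cost(learning_rate, theta0=1.0, num_iterations=50):
    # Run plain gradient descent and return the cost at the last iterate
    theta = theta0
    for _ in range(num_iterations):
        theta = theta - learning_rate * gradient(theta)
    return cost_function(theta)

for lr in [0.01, 0.1, 0.9, 1.1]:
    print(f"alpha={lr}: final cost = {final_cost(lr):.4g}")
```

With α = 0.01 the cost is still noticeably above its minimum of 5 after 50 steps, α = 0.1 converges quickly, α = 0.9 makes θ flip sign every step (factor −0.8) yet still converge, and α = 1.1 gives a factor of −1.2, so the cost grows without bound.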
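
Step 4 needs the gradient ∇J(θ), and a hand-derived gradient is easy to get wrong. A common sanity check compares it with a central finite-difference approximation, ∂J/∂θ_i ≈ (J(θ + h e_i) − J(θ − h e_i)) / (2h), where e_i is the i-th unit vector. The two-parameter cost below is hypothetical, chosen only so the analytic gradient is easy to write down; the step size h = 1e-5 is a typical but arbitrary choice.

```python
import numpy as np

def cost(theta):
    # Hypothetical cost for illustration: J = theta_0**2 + theta_0*theta_1 + 2*theta_1**2
    return theta[0]**2 + theta[0] * theta[1] + 2 * theta[1]**2

def analytic_gradient(theta):
    # Partial derivatives of the cost above
    return np.array([2 * theta[0] + theta[1],
                     theta[0] + 4 * theta[1]])

def numerical_gradient(f, theta, h=1e-5):
    # Central differences: perturb one coordinate at a time
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros_like(theta, dtype=float)
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad

theta = np.array([1.5, -0.5])
print(analytic_gradient(theta))   # [ 2.5 -0.5]
print(numerical_gradient(cost, theta))
print(np.allclose(analytic_gradient(theta), numerical_gradient(cost, theta), atol=1e-6))
```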
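
Items 6, 7 and 11 describe when to stop. The sketch below stops as soon as the gradient norm drops below a tolerance, or after a maximum number of iterations if that never happens. The function name `gradient_descent_tol`, its parameters `tol` and `max_iter`, and their defaults are assumptions for illustration. It is applied to a hypothetical convex quadratic J(θ) = ½ θᵀAθ − bᵀθ with gradient Aθ − b, whose unique minimizer solves Aθ = b, so the result can be checked with a direct linear solve.

```python
import numpy as np

def gradient_descent_tol(grad, theta0, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Iterate theta <- theta - learning_rate * grad(theta) until ||grad|| < tol or max_iter is hit."""
    theta = np.asarray(theta0, dtype=float)
    for it in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:   # convergence test: gradient close to zero
            return theta, it
        theta = theta - learning_rate * g
    return theta, max_iter            # stopped by the iteration cap instead

# Hypothetical convex quadratic J(theta) = 0.5 * theta^T A theta - b^T theta
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad_J(theta):
    # Gradient of the quadratic above
    return A @ theta - b

theta_star, n_iter = gradient_descent_tol(grad_J, theta0=[0.0, 0.0])
print(theta_star, n_iter)
print(np.linalg.solve(A, b))  # exact minimizer for comparison
```

A small gradient only certifies a stationary point; here A is positive definite, so J is convex and that point is the global minimum.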
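
The three variants in item 9 differ only in how many training examples feed each gradient estimate. The sketch below fits a hypothetical linear-regression model with the mean-squared-error cost J(θ) = (1/n) Σ (x_iᵀθ − y_i)² on synthetic data: `batch_size = n` gives batch gradient descent, `batch_size = 1` gives stochastic gradient descent, and values in between give mini-batch gradient descent. The data, learning rate, epoch count, and the helper `minibatch_gd` are illustrative assumptions.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, learning_rate=0.1, num_epochs=200, seed=0):
    """Linear regression by gradient descent on the MSE cost, using batches of `batch_size` examples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(num_epochs):
        order = rng.permutation(n)                 # reshuffle the examples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            residual = Xb @ theta - yb
            grad = 2 * Xb.T @ residual / len(idx)  # gradient of the MSE on this batch
            theta -= learning_rate * grad
    return theta

# Synthetic data: y = 2 + 3x + noise (a column of ones provides the intercept)
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 2 + 3 * x + 0.1 * rng.standard_normal(200)

for batch_size, name in [(len(y), "batch"), (1, "stochastic"), (16, "mini-batch")]:
    print(name, minibatch_gd(X, y, batch_size))
```

All three should land near the true coefficients (2, 3). Per epoch, batch gradient descent makes one update while SGD makes n, and with a constant learning rate the SGD estimate keeps fluctuating slightly because each step sees only one example.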
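
Item 10 mentions momentum and item 12 mentions poor conditioning; the two are related. One common formulation, classical (heavy-ball) momentum, keeps a velocity v ← βv − α∇J(θ) and updates θ ← θ + v, so consistent gradient directions accumulate while oscillating ones partly cancel. The sketch below compares it with plain gradient descent on a hypothetical quadratic whose curvature differs by a factor of 100 between the two coordinates; β = 0.9, the learning rate, and the iteration count are illustrative choices.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic J(theta) = 0.5 * (theta_0**2 + 100 * theta_1**2)
curvatures = np.array([1.0, 100.0])

def grad(theta):
    return curvatures * theta

def plain_gd(theta0, learning_rate, num_iterations):
    theta = np.array(theta0, dtype=float)
    for _ in range(num_iterations):
        theta = theta - learning_rate * grad(theta)
    return theta

def momentum_gd(theta0, learning_rate, beta, num_iterations):
    theta = np.array(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(num_iterations):
        velocity = beta * velocity - learning_rate * grad(theta)  # decaying sum of past steps
        theta = theta + velocity
    return theta

theta0 = [10.0, 1.0]
lr = 0.015  # plain GD needs lr < 2/100 to stay stable along theta_1
print("plain   :", np.linalg.norm(plain_gd(theta0, lr, 200)))
print("momentum:", np.linalg.norm(momentum_gd(theta0, lr, 0.9, 200)))
```

Plain gradient descent must keep α below 2/100 to stay stable along the steep coordinate, which leaves it crawling along the shallow one; momentum ends much closer to the minimum in the same 200 iterations.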

Overall, gradient descent is a fundamental algorithm in machine learning and deep learning, and its successful application requires careful tuning of hyperparameters and an understanding of the data and problem at hand.
