# Gradient Descent

## Explanation
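Gradient descent is an iterative optimization algorithm for minimizing a model's cost function. At each step it moves the parameters a small distance in the direction of the negative gradient, which is the direction in which the cost decreases fastest locally, and it repeats until the cost stops improving. For non-convex cost functions the result may be a local minimum rather than the global one.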
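
As a minimal sketch of this iterative update, the snippet below minimizes the one-dimensional function f(x) = (x - 3)^2, whose gradient is 2(x - 3). The objective, the starting point, the step size, and the name `minimize_quadratic` are illustrative assumptions and do not come from the notebook.

```python
def minimize_quadratic(x0=0.0, alpha=0.1, num_iters=100):
    """Gradient descent on the toy objective f(x) = (x - 3)**2 (hypothetical example)."""
    x = x0
    for _ in range(num_iters):
        grad = 2.0 * (x - 3.0)  # derivative of f at the current x
        x = x - alpha * grad    # step against the gradient
    return x

x_min = minimize_quadratic()
print(x_min)  # close to the true minimizer x = 3
```

With these assumed settings, each step shrinks the distance to the minimizer by a factor of 1 - 2α = 0.8, which is why a fixed number of iterations is enough here.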

### Importance in Machine Learning
Gradient descent and its variants are the standard way to train machine learning models, particularly deep neural networks, whose cost functions have no closed-form minimizer. Training uses it to search for the parameters that minimize the error between predicted and actual values.

### Gradient Descent Algorithms
1. **Batch Gradient Descent**: Uses the entire dataset to compute the gradient for each parameter update. The gradient is exact, but each update is expensive on large datasets.
2. **Stochastic Gradient Descent (SGD)**: Uses a single data point to compute the gradient for each parameter update. Updates are cheap but noisy.
3. **Mini-batch Gradient Descent**: Uses a small batch of data points to compute the gradient for each parameter update, trading off the cost of batch updates against the noise of SGD.

### Examples
For a cost function J(θ) and learning rate α, the three variants apply the same update rule with different gradient estimates:

- **Batch Gradient Descent**: θ = θ - α * ∇J(θ), where the gradient is computed over the entire dataset.
- **Stochastic Gradient Descent**: θ = θ - α * ∇J(θ; x^(i)), where x^(i) is a single data point (together with its label).
- **Mini-batch Gradient Descent**: θ = θ - α * ∇J(θ; X^(i)), where X^(i) is a small batch of data points (together with their labels).
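
The three update rules above differ only in how many examples feed each gradient estimate, so one routine can cover all of them. The sketch below assumes a least-squares cost and shuffled sampling without replacement within each epoch. The function name `minibatch_gradient_descent`, its `batch_size` and `seed` parameters, and the toy data (where y = x exactly) are illustrative choices, not a definitive implementation.

```python
import numpy as np

def minibatch_gradient_descent(X, y, theta, alpha, num_epochs, batch_size, seed=0):
    """Least-squares gradient descent over shuffled mini-batches (illustrative sketch).

    batch_size=len(y) reproduces batch gradient descent; batch_size=1 gives SGD.
    """
    rng = np.random.default_rng(seed)
    m = len(y)
    theta = np.asarray(theta, dtype=float)
    for _ in range(num_epochs):
        order = rng.permutation(m)  # reshuffle the examples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]
            # Gradient of the least-squares cost on this batch only
            grad = X_b.T.dot(X_b.dot(theta) - y_b) / len(idx)
            theta = theta - alpha * grad
    return theta

# Toy data: first column is the intercept term, and y = x exactly
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([1, 2, 3, 4, 5], dtype=float)

theta_batch = minibatch_gradient_descent(X, y, [0, 0], 0.01, 2000, batch_size=len(y))
theta_mini = minibatch_gradient_descent(X, y, [0, 0], 0.01, 2000, batch_size=2)
theta_sgd = minibatch_gradient_descent(X, y, [0, 0], 0.01, 2000, batch_size=1)
print(theta_batch, theta_mini, theta_sgd)
```

Because the toy data lie exactly on a line, all three variants should approach the same parameters (intercept near 0, slope near 1). On noisy data, SGD and mini-batch updates would typically fluctuate around the minimum unless the learning rate is decreased over time.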

## Practice
For practice, refer to the [Gradient Descent Notebook](03_gradient_descent.ipynb).


```python
# Gradient Descent Example: fit a line with batch gradient descent
import numpy as np
import matplotlib.pyplot as plt

# Least-squares cost: J(theta) = (1 / 2m) * sum((X @ theta - y) ** 2)
def cost_function(X, y, theta):
    m = len(y)
    J = np.sum((X.dot(theta) - y) ** 2) / (2 * m)
    return J

# Batch gradient descent: every update uses all m examples
def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = []  # cost after each update, used for plotting
    for i in range(num_iters):
        # Gradient of J is (1 / m) * X^T (X @ theta - y)
        theta = theta - (alpha / m) * X.T.dot(X.dot(theta) - y)
        J_history.append(cost_function(X, y, theta))
    return theta, J_history

# Example data: the first column of X is the intercept term
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]])
y = np.array([1, 2, 3, 4, 5])
theta = np.zeros(2)  # initial parameters [intercept, slope], as floats
alpha = 0.01         # learning rate
num_iters = 1000

# Perform gradient descent
theta, J_history = gradient_descent(X, y, theta, alpha, num_iters)

# Plot the cost function history
plt.plot(J_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost Function History')
plt.show()

print('Theta:', theta)
print('Final cost:', J_history[-1])
```
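
The update inside `gradient_descent` follows from differentiating the least-squares cost computed by `cost_function`. As a short derivation, with m the number of examples:

$$
\begin{aligned}
J(\theta) &= \frac{1}{2m} \lVert X\theta - y \rVert^2 \\
\nabla J(\theta) &= \frac{1}{m} X^\top (X\theta - y) \\
\theta &\leftarrow \theta - \alpha \nabla J(\theta) = \theta - \frac{\alpha}{m} X^\top (X\theta - y)
\end{aligned}
$$

The last line is the expression the loop evaluates on every iteration.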
