# Understanding Integrals and Their Applications in ML
- What are integrals?
  - An integral computes the (signed) area under a curve, representing accumulation
  - The definite integral of f(x) from a to b:
    $$\int_a^b f(x)\,dx$$
- Applications in ML
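  - Probability distributions: probabilities and expected values are integrals of a probability density function
  - Cost functions: the expected loss is an integral of the loss weighted by the data distribution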
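
To make the two applications above concrete, here is a minimal sketch using SymPy. The standard normal density and the squared-error loss are illustrative choices, not something the notebook prescribes, and the variable names are hypothetical.

```python
import sympy as sp

x = sp.Symbol('x', real=True)

# Standard normal probability density function (an illustrative choice)
pdf = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

# A valid density integrates to 1 over its whole domain
total_probability = sp.integrate(pdf, (x, -sp.oo, sp.oo))

# P(-1 <= X <= 1) is the area under the density between -1 and 1
prob_within_one_sd = sp.integrate(pdf, (x, -1, 1))

# Expected squared error of predicting 0: the loss x**2 weighted by the density
expected_squared_error = sp.integrate(x**2 * pdf, (x, -sp.oo, sp.oo))

print(f"Total probability: {total_probability}")
print(f"P(-1 <= X <= 1): {float(prob_within_one_sd):.4f}")
print(f"Expected squared error: {expected_squared_error}")
```

The same pattern applies to other densities and losses, although many of them have no closed-form integral and need numerical integration instead.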
  


In [4]:
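import sympy as sp

# symbolic variable and the function to integrate
x = sp.Symbol('x')
f = x**2

# definite integral over [0, 2] and the antiderivative (constant of integration omitted)
definite_integral = sp.integrate(f, (x, 0, 2))
indefinite_integral = sp.integrate(f, x)
print(f"Definite integral of x^2 from 0 to 2: {definite_integral}")
print(f"Indefinite integral of x^2: {indefinite_integral}")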



Definite integral of x^2 from 0 to 2: 8/3
Indefinite integral of x^2: x**3/3
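
The "area under a curve" reading of the result 8/3 can be cross-checked numerically. The following is a rough sketch using a midpoint Riemann sum. The helper name `midpoint_riemann_sum` and the choice of 1000 rectangles are assumptions made for illustration.

```python
import numpy as np

def midpoint_riemann_sum(f, a, b, n_rectangles=1000):
    """Approximate the integral of f over [a, b] by summing rectangle areas."""
    width = (b - a) / n_rectangles
    # evaluate f at the midpoint of each rectangle's base
    midpoints = a + width * (np.arange(n_rectangles) + 0.5)
    return float(np.sum(f(midpoints)) * width)

approx = midpoint_riemann_sum(lambda x: x**2, 0.0, 2.0)
exact = 8 / 3
print(f"Riemann sum: {approx:.6f}, exact: {exact:.6f}")
```

Increasing `n_rectangles` shrinks the gap between the sum and the exact value, which is the limiting process that defines the definite integral.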


# Optimization Concepts
- Local vs Global Minima
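  - Local minimum: a point where the function is no larger than at all nearby points
  - Global minimum: a point where the function attains its lowest value over the entire domain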
- Convex Functions
  - $$f(\lambda x_1 + (1 - \lambda)x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2), \quad \text{for all } x_1, x_2 \text{ in the domain and } \lambda \in [0, 1]$$
  - Convexity ensures that any local minimum is also a global minimum
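
The convexity inequality and the local versus global distinction can be explored numerically. The sketch below is illustrative only: the helper `satisfies_convexity` and the two test functions are hypothetical, and passing the check for one pair of points does not prove convexity, whereas failing it does disprove it.

```python
import numpy as np

def satisfies_convexity(f, x1, x2, n_lambdas=101):
    """Check the convexity inequality for one pair (x1, x2) on a grid of lambdas."""
    lambdas = np.linspace(0.0, 1.0, n_lambdas)
    lhs = f(lambdas * x1 + (1 - lambdas) * x2)
    rhs = lambdas * f(x1) + (1 - lambdas) * f(x2)
    # small tolerance guards against floating-point rounding
    return bool(np.all(lhs <= rhs + 1e-12))

def convex_f(x):
    return x**2

def nonconvex_f(x):
    return x**4 - 3 * x**2 + x

holds_for_convex = satisfies_convexity(convex_f, -1.0, 1.0)
holds_for_nonconvex = satisfies_convexity(nonconvex_f, -1.0, 1.0)
print(f"x^2 satisfies the inequality on [-1, 1]: {holds_for_convex}")
print(f"x^4 - 3x^2 + x satisfies it on [-1, 1]: {holds_for_nonconvex}")

# Locate minima of the non-convex function on a fine grid over [-2, 2]
xs = np.linspace(-2.0, 2.0, 4001)
ys = nonconvex_f(xs)
is_local_min = (ys[1:-1] < ys[:-2]) & (ys[1:-1] < ys[2:])
local_minima = xs[1:-1][is_local_min]
global_minimum = xs[np.argmin(ys)]
print(f"Local minima near x = {local_minima}")
print(f"Global minimum near x = {global_minimum:.3f}")
```

The non-convex function has two local minima, only one of which is global. This is the situation that convexity rules out.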

# Stochastic Gradient Descent (SGD) and its Variants
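- What is Stochastic Gradient Descent?
  - Optimization algorithm that estimates the gradient from a single random example or a small random subset (mini-batch) of the data at each step, then updates the parameters
- Why use SGD?
  - Each update is much cheaper than a full-batch gradient step, so it often makes faster progress on large datasets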

- Variants of SGD
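  - Mini-batch SGD: averages the gradient over a small random batch instead of a single example
  - Momentum: accumulates an exponentially decaying sum of past gradients to damp oscillations
  - Adam optimizer: adapts per-parameter step sizes using running estimates of the first and second moments of the gradient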
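
The three variants listed above differ only in how a gradient estimate is turned into a parameter update. The following sketch compares them on a synthetic linear regression problem. The function names, the data, and all hyperparameters (learning rate 0.05, batch size 20, 3000 steps, the decay rates) are assumptions chosen for illustration, not tuned recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 4 + 3x + small noise (illustrative values)
X = 2 * rng.random((200, 1))
y = 4 + 3 * X + 0.1 * rng.standard_normal((200, 1))
X_b = np.c_[np.ones((200, 1)), X]  # prepend a column of ones for the bias

def minibatch_gradient(theta, batch_size=20):
    """Gradient of the mean squared error on a random mini-batch."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    xb, yb = X_b[idx], y[idx]
    return 2 / batch_size * xb.T @ (xb @ theta - yb)

def run_minibatch_sgd(n_steps=3000, lr=0.05):
    theta = np.zeros((2, 1))
    for _ in range(n_steps):
        theta -= lr * minibatch_gradient(theta)
    return theta

def run_momentum(n_steps=3000, lr=0.05, beta=0.9):
    theta = np.zeros((2, 1))
    velocity = np.zeros_like(theta)
    for _ in range(n_steps):
        # velocity is a decaying accumulation of past gradient steps
        velocity = beta * velocity - lr * minibatch_gradient(theta)
        theta += velocity
    return theta

def run_adam(n_steps=3000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    theta = np.zeros((2, 1))
    m = np.zeros_like(theta)  # running mean of gradients (first moment)
    v = np.zeros_like(theta)  # running mean of squared gradients (second moment)
    for t in range(1, n_steps + 1):
        g = minibatch_gradient(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2**t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

theta_sgd = run_minibatch_sgd()
theta_momentum = run_momentum()
theta_adam = run_adam()
print("Mini-batch SGD:", theta_sgd.ravel())
print("Momentum:      ", theta_momentum.ravel())
print("Adam:          ", theta_adam.ravel())
```

On this small, well-conditioned problem all three should land close to the generating parameters (4, 3). The differences between them matter more on noisy or poorly conditioned objectives.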


In [5]:
# Exercise 1: Calculate Integrals of Simple Functions
import sympy as sp
x = sp.Symbol('x')
f = sp.exp(-x)
indefinite_integral = sp.integrate(f, x)
print(f"Integral: {indefinite_integral}")
definite_integral = sp.integrate(f, (x, 0, sp.oo))

print(f"Definite integral: {definite_integral}")

# Exercise 2: Implement Stochastic Gradient Descent for a Linear Model
import numpy as np

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# add the bias term to X
X_b = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to each instance

# SGD implementation: one randomly chosen sample per parameter update
def stochastic_gradient_descent(X, y, theta, learning_rate=0.01, n_epochs=1000):
    m = len(y)
    for epoch in range(n_epochs):
        # each epoch performs m updates, sampling with replacement
        for i in range(m):
            random_index = np.random.randint(m)
            xi = X[random_index:random_index + 1]
            yi = y[random_index:random_index + 1]
            # gradient of the squared error (xi @ theta - yi)^2 with respect to theta
            gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
            theta = theta - learning_rate * gradients
    return theta

# initialize parameters
theta = np.random.randn(2, 1)
learning_rate = 0.01
n_epochs = 50
theta_best = stochastic_gradient_descent(X_b, y, theta, learning_rate, n_epochs)
print(f"Best theta: {theta_best}")



Integral: -exp(-x)
Definite integral: 1
Best theta: [[3.87186075]
 [3.06654427]]
