# Applied Calculus in Machine Learning & Optimization (Phase 7)

This phase turns analytical calculus skills into computational practice.  
It covers how gradients, derivatives, and integrals drive learning algorithms, organized into **6 modules** and **100 core problems and experiments**.

---

## **Module 1 — Gradient Mechanics & Cost Landscapes (1–15)**

1. Compute the gradient with respect to $$w$$ of $$L(w)=\frac{1}{2}(y-\hat{y})^2,$$ assuming a linear prediction $$\hat{y}=w^\top x.$$
2. Compute gradient for $$L(w)=\|Xw-y\|^2.$$
3. Derive update rule $$w_{t+1}=w_t-\eta\nabla L(w_t).$$
4. Compute the derivative of $$f(x)=x^4-3x^3+2x$$ at $$x=1,2,3.$$
5. Visualize cost surface $$f(x,y)=x^2+y^2.$$
6. Compute contour lines and direction of steepest descent.
7. Show gradient ⟂ contour for $$f(x,y)=x^2+y^2.$$
8. Approximate gradient numerically via finite differences.
9. Implement a gradient check for the least-squares loss $$L(w)=\|Xw-y\|^2.$$
10. Show gradient sign controls learning direction.
11. Compute Hessian $$H=\nabla^2f$$ for $$f=x^2+xy+y^2.$$
12. Identify convex vs non-convex shapes via eigenvalues of $$H.$$
13. Plot cost function $$f(x)=x^2$$ and $$f(x)=x^4-3x^3+2$$ to contrast convexity.
14. Demonstrate gradient descent convergence on $$f(x)=x^2.$$
15. Show divergence if $$\eta$$ (learning rate) is too large.
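
Problems 8, 14, and 15 can be explored with a small experiment. The following is a minimal sketch, not a reference solution: the function names, the step sizes `0.1` and `1.1`, and the starting point are illustrative assumptions. For $$f(x)=x^2$$ each update multiplies $$x$$ by $$(1-2\eta)$$, so the iteration contracts only when $$0<\eta<1.$$

```python
# Minimal sketch: gradient descent on f(x) = x^2 plus a finite-difference check.
# Function names and hyperparameters are illustrative assumptions.

def f(x):
    return x * x

def grad_f(x):
    # Analytic derivative of x^2.
    return 2.0 * x

def central_difference(func, x, h=1e-5):
    # Symmetric finite-difference approximation of func'(x).
    return (func(x + h) - func(x - h)) / (2.0 * h)

def gradient_descent(grad, x0, lr, steps):
    # Iterates x_{t+1} = x_t - lr * grad(x_t) and records every iterate.
    x = x0
    path = [x]
    for _ in range(steps):
        x = x - lr * grad(x)
        path.append(x)
    return path

# lr = 0.1 gives the per-step factor 1 - 2*lr = 0.8 (shrinks toward 0).
converging = gradient_descent(grad_f, x0=1.0, lr=0.1, steps=50)
# lr = 1.1 gives the per-step factor 1 - 2*lr = -1.2 (grows and alternates sign).
diverging = gradient_descent(grad_f, x0=1.0, lr=1.1, steps=50)

print("final |x| with lr=0.1:", abs(converging[-1]))
print("final |x| with lr=1.1:", abs(diverging[-1]))
print("numeric vs analytic derivative at x=3:",
      central_difference(f, 3.0), grad_f(3.0))
```

Plotting `converging` and `diverging` against the iteration index makes the contrast between the two learning rates visible.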

---

## **Module 2 — Optimization Algorithms (16–35)**

16. Derive SGD update for linear regression.  
17. Derive batch gradient descent and compare.  
18. Implement momentum update: $$v_t=\beta v_{t-1}+(1-\beta)\nabla L.$$  
19. Simulate overshooting and damping in momentum.  
20. RMSProp: $$s_t=\rho s_{t-1}+(1-\rho)(\nabla L)^2$$ (elementwise square of the gradient, not the Hessian).  
21. Adam optimizer: $(m_t, v_t)$ updates and bias correction.  
22. Compare convergence of SGD vs Adam.  
23. Use contour plots to visualize optimizer paths.  
24. Show adaptive learning rates reduce oscillations.  
25. Apply gradient clipping (rescale the gradient when its norm exceeds a threshold) to control exploding gradients.  
26. Derive learning-rate schedule $$\eta_t=\frac{\eta_0}{1+kt}.$$  
27. Optimize $$f(x,y)=x^2+3y^2$$ with different $$\eta.$$  
28. Analyze local minima in $$f(x)=x^4-3x^3+2.$$  
29. Demonstrate saddle point in $$f(x,y)=x^2-y^2.$$  
30. Show escaping saddle with noise (SGD stochasticity).  
31. Derive condition for convergence in convex quadratic.  
32. Implement a stopping rule based on a gradient-norm threshold, $$\|\nabla L(w_t)\|<\epsilon.$$  
33. Plot loss vs iteration for different optimizers.  
34. Measure computational cost per iteration.  
35. Summarize optimizer trade-offs.
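
Problems 21, 22, 27, and 31 can be tried together on $$f(x,y)=x^2+3y^2.$$ The sketch below is one possible setup under stated assumptions: the helper names, the starting point `(2, 2)`, the step counts, and the Adam learning rate `0.1` are illustrative choices, while `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used Adam defaults. Because the Hessian eigenvalues of this quadratic are 2 and 6, plain gradient descent converges only for $$0<\eta<2/6.$$

```python
import math

# Minimal sketch: plain gradient descent vs Adam on f(x, y) = x^2 + 3y^2.
# Names and hyperparameters are illustrative assumptions.

def loss(w):
    x, y = w
    return x * x + 3.0 * y * y

def grad(w):
    x, y = w
    return [2.0 * x, 6.0 * y]

def gradient_descent(w0, lr, steps):
    # w_{t+1} = w_t - lr * grad(w_t)
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def adam(w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    w = list(w0)
    m = [0.0] * len(w)  # first-moment (mean) estimate
    v = [0.0] * len(w)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(w)
        for i in range(len(w)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias correction
            w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

start = [2.0, 2.0]
gd_ok = gradient_descent(start, lr=0.1, steps=100)   # 0.1 < 1/3: contracts
gd_bad = gradient_descent(start, lr=0.4, steps=100)  # 0.4 > 1/3: y diverges
adam_w = adam(start)

print("GD   lr=0.1 loss:", loss(gd_ok))
print("GD   lr=0.4 loss:", loss(gd_bad))
print("Adam lr=0.1 loss:", loss(adam_w))
```

Recording `loss(w)` at every iteration instead of only at the end gives the loss-versus-iteration curves asked for in problem 33.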

---

## **Module 3 — Jacobians, Chain Rule & Backpropagation (36–55)**

36. Compute Jacobian of $$f(x,y)=(x^2+y,\,xy).$$  
37. Find Jacobian for softmax: $$s_i=\frac{e^{z_i}}{\sum_j e^{z_j}}.$$  
38. Derive chain rule for composite $$f(g(h(x))).$$  
39. Compute $$\frac{dy}{dx}$$ for $$y=\sigma(wx+b).$$  
40. Differentiate ReLU, sigmoid, tanh functions.  
41. Derive backprop rule for one-layer perceptron.  
42. Derive gradient of cross-entropy loss $$L=-y\ln\hat{y}-(1-y)\ln(1-\hat{y}).$$  
43. Backprop through two-layer network $$a_2=W_2\sigma(W_1x+b_1)+b_2.$$  
44. Compute gradient of MSE loss wrt $$(W_1,W_2).$$  
45. Implement numerical gradient check.  
46. Compute Jacobian of convolution output wrt kernel weights.  
47. Backprop through pooling (max, avg).  
48. Chain rule through normalization layer.  
49. Compute gradient of batch-norm output wrt input.  
50. Backprop through residual block $$y=x+f(x).$$  
51. Show how the vector-calculus operator $\nabla$ underlies backprop.  
52. Compute elementwise gradient for matrix multiplication $$C=AB.$$  
53. Derive gradient for softmax cross-entropy in matrix form.  
54. Differentiate scalar loss wrt tensor using Einstein summation.  
55. Visualize computational graph and gradient flow.
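
Problems 39, 42, and 45 fit into one small check. The sketch below is an illustration under assumptions: a single scalar neuron $$\hat{y}=\sigma(wx+b)$$ with the cross-entropy loss from problem 42, and made-up values for `w`, `b`, `x`, and `y`. It applies the chain rule step by step and compares the result with a central finite difference.

```python
import math

# Minimal sketch: chain rule through y_hat = sigmoid(w*x + b) with
# binary cross-entropy, verified numerically. All values are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, y):
    z = w * x + b
    y_hat = sigmoid(z)
    loss = -y * math.log(y_hat) - (1.0 - y) * math.log(1.0 - y_hat)
    return y_hat, loss

def backward(w, b, x, y):
    y_hat, _ = forward(w, b, x, y)
    dL_dyhat = -y / y_hat + (1.0 - y) / (1.0 - y_hat)  # derivative of the loss
    dyhat_dz = y_hat * (1.0 - y_hat)                   # derivative of sigmoid
    dL_dz = dL_dyhat * dyhat_dz                        # simplifies to y_hat - y
    return dL_dz * x, dL_dz                            # dL/dw, dL/db

def numeric_grad(w, b, x, y, h=1e-6):
    # Central differences in w and b.
    dw = (forward(w + h, b, x, y)[1] - forward(w - h, b, x, y)[1]) / (2.0 * h)
    db = (forward(w, b + h, x, y)[1] - forward(w, b - h, x, y)[1]) / (2.0 * h)
    return dw, db

w, b, x, y = 0.5, -0.2, 1.5, 1.0
analytic = backward(w, b, x, y)
numeric = numeric_grad(w, b, x, y)
print("analytic:", analytic)
print("numeric: ", numeric)
```

The product `dL_dyhat * dyhat_dz` collapsing to `y_hat - y` is the same cancellation that makes the softmax cross-entropy gradient in problem 53 so compact.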

---

## **Module 4 — Optimization Landscapes & Loss Geometry (56–75)**

56. Analyze gradient field for $$f(x,y)=x^3-3xy^2.$$  
57. Find critical points and classify via Hessian eigenvalues.  
58. Plot gradient flow trajectories.  
59. Examine vanishing/exploding gradients in a deep chain given by the $$n$$-fold composition $$y=(\sigma\circ\cdots\circ\sigma)(x).$$  
60. Derive gradient scaling through sigmoid composition.  
61. Analyze non-convex loss with multiple minima.  
62. Show how momentum smooths noisy landscapes.  
63. Compute curvature matrix $$H$$ numerically from gradients.  
64. Visualize loss contours for $$f(w_1,w_2)=w_1^2+100w_2^2.$$  
65. Simulate ill-conditioned optimization.  
66. Use preconditioning (normalize gradient directions).  
67. Demonstrate Newton’s method for minimization: $$x_{n+1}=x_n-\frac{f'(x_n)}{f''(x_n)}.$$  
68. Apply Newton’s root-finding iteration $$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$$ to $$f(x)=x^3-x-1.$$  
69. Compare Newton vs gradient descent convergence.  
70. Quasi-Newton (BFGS) update concept.  
71. Derive Hessian-free optimization idea.  
72. Compute eigen-decomposition of $$H$$ to analyze curvature.  
73. Implement trust-region update rule.  
74. Plot loss curvature before and after normalization.  
75. Explore plateau and saddle escape techniques.
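
Problems 67 and 69 can be compared on the non-convex example $$f(x)=x^4-3x^3+2$$ used earlier in this phase, whose local minimum sits at $$x=9/4.$$ The sketch below is a hedged illustration: the starting point `3.0`, the ten-step budget, and the gradient-descent step size `0.01` are assumptions chosen so that both methods start in the region where $$f''(x)>0.$$

```python
# Minimal sketch: Newton's method (minimization form) vs gradient descent
# on f(x) = x^4 - 3x^3 + 2. Starting point and step size are illustrative.

def f(x):
    return x ** 4 - 3.0 * x ** 3 + 2.0

def df(x):
    return 4.0 * x ** 3 - 9.0 * x ** 2

def d2f(x):
    return 12.0 * x ** 2 - 18.0 * x

def newton_minimize(x0, steps):
    # x_{n+1} = x_n - f'(x_n) / f''(x_n); assumes f''(x_n) != 0 along the path.
    x = x0
    for _ in range(steps):
        x = x - df(x) / d2f(x)
    return x

def gradient_descent(x0, lr, steps):
    # x_{n+1} = x_n - lr * f'(x_n)
    x = x0
    for _ in range(steps):
        x = x - lr * df(x)
    return x

x_star = 2.25  # root of f'(x) = x^2 (4x - 9) with f''(x) > 0
x_newton = newton_minimize(3.0, steps=10)
x_gd = gradient_descent(3.0, lr=0.01, steps=10)

print("Newton error after 10 steps:", abs(x_newton - x_star))
print("GD error after 10 steps:    ", abs(x_gd - x_star))
```

Newton's step uses curvature to rescale the gradient, which is why it needs far fewer iterations here; the price is evaluating $$f''$$ and the risk of a bad step wherever $$f''$$ is near zero or negative.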

---

## **Module 5 — Probabilistic & Integral Links (76–90)**

76. Interpret integration as expectation: $$E[f(X)]=\int f(x)p(x)\,dx.$$  
77. Approximate expectation using Monte Carlo.  
78. Compute gradient of expected loss via sampling.  
79. Apply reparameterization trick for $$z=\mu+\sigma\epsilon.$$  
80. Derive stochastic gradient of ELBO in variational autoencoder.  
81. Estimate area under PDF numerically.  
82. Integrate Gaussian kernel $$e^{-x^2}$$ via Simpson’s rule.  
83. Compute normalization constant of exponential family.  
84. Derive gradient of log-likelihood $$\nabla_\theta \log p_\theta(x).$$  
85. Compute Fisher information as the negative expected second derivative of the log-likelihood.  
86. Show that minimizing quadratic loss is equivalent to maximum likelihood under Gaussian noise.  
87. Approximate integral of sigmoid with Gaussian input.  
88. Use Monte Carlo integration for reinforcement-learning return.  
89. Compute policy-gradient estimator $$\nabla_\theta E[R].$$  
90. Apply variance reduction via baselines in gradient estimates.
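
Problems 77 to 79 can be combined in one Monte Carlo experiment. The sketch below is an assumption-laden illustration: it picks the test function $$f(z)=z^2,$$ the values $$\mu=1,\ \sigma=0.5,$$ a fixed seed, and a hypothetical helper name. With $$z=\mu+\sigma\epsilon$$ and $$\epsilon\sim\mathcal{N}(0,1),$$ the exact values are $$E[z^2]=\mu^2+\sigma^2,$$ $$\partial_\mu E[z^2]=2\mu,$$ and $$\partial_\sigma E[z^2]=2\sigma,$$ which the sampled estimates should approach.

```python
import random

# Minimal sketch: Monte Carlo expectation and reparameterization-trick
# gradients for E[z^2] with z = mu + sigma * eps, eps ~ N(0, 1).
# The helper name, test function, and parameter values are illustrative.

def reparam_estimates(mu, sigma, n_samples, seed=0):
    rng = random.Random(seed)
    total_f = 0.0
    total_dmu = 0.0
    total_dsigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        total_f += z * z
        total_dmu += 2.0 * z           # df/dz * dz/dmu, with dz/dmu = 1
        total_dsigma += 2.0 * z * eps  # df/dz * dz/dsigma, with dz/dsigma = eps
    n = float(n_samples)
    return total_f / n, total_dmu / n, total_dsigma / n

mu, sigma = 1.0, 0.5
expectation, grad_mu, grad_sigma = reparam_estimates(mu, sigma, n_samples=100000)

print("E[z^2] estimate:", expectation, " exact:", mu ** 2 + sigma ** 2)
print("d/dmu estimate: ", grad_mu, " exact:", 2.0 * mu)
print("d/dsigma estimate:", grad_sigma, " exact:", 2.0 * sigma)
```

Moving the randomness into `eps` is what lets the derivative pass through the sample `z`; the same idea underlies the ELBO gradient in problem 80.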

---

## **Module 6 — Advanced Applications & Experimentation (91–100)**

91. Derive gradient for loss $$L(W)=\frac{1}{2N}\sum_i\|W x_i - y_i\|^2.$$  
92. Compute numerical gradient for logistic regression cost.  
93. Visualize gradient descent path in 3D loss surface.  
94. Implement finite-difference gradient verification for CNN weight.  
95. Estimate gradient noise variance during SGD.  
96. Compute adaptive learning rate from curvature estimate.  
97. Apply backprop to LSTM cell gate equations.  
98. Analyze the effect of gradient clipping on the update step analytically.  
99. Plot convergence curve for Adam vs RMSProp.  
100. Perform numerical experiment comparing gradient methods on Rosenbrock function $$f(x,y)=100(y-x^2)^2+(1-x)^2.$$
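
A starting point for problem 100 is sketched below. It is a baseline under assumptions, not a full comparison: the starting point `(-1.2, 1.0)`, the learning rate `1e-3`, the step count, and the helper names are illustrative choices. The sketch first verifies the analytic gradient of the Rosenbrock function by central differences, then runs plain gradient descent toward the minimum at $$(1,1).$$

```python
# Minimal sketch: analytic gradient of the Rosenbrock function, a
# finite-difference check, and a plain gradient-descent baseline.
# Names, starting point, and hyperparameters are illustrative assumptions.

def rosenbrock(x, y):
    return 100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2

def rosenbrock_grad(x, y):
    dx = -400.0 * x * (y - x * x) - 2.0 * (1.0 - x)
    dy = 200.0 * (y - x * x)
    return dx, dy

def numeric_grad(x, y, h=1e-6):
    # Central differences in each coordinate.
    dx = (rosenbrock(x + h, y) - rosenbrock(x - h, y)) / (2.0 * h)
    dy = (rosenbrock(x, y + h) - rosenbrock(x, y - h)) / (2.0 * h)
    return dx, dy

def gradient_descent(x, y, lr=1e-3, steps=20000):
    # Returns the final point and the loss recorded at every iteration.
    history = [rosenbrock(x, y)]
    for _ in range(steps):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - lr * dx, y - lr * dy
        history.append(rosenbrock(x, y))
    return x, y, history

x0, y0 = -1.2, 1.0
print("analytic grad:", rosenbrock_grad(x0, y0))
print("numeric grad: ", numeric_grad(x0, y0))

x_final, y_final, history = gradient_descent(x0, y0)
print("start loss:", history[0], " final loss:", history[-1])
print("final point:", (x_final, y_final))
```

Swapping the update line for a momentum, RMSProp, or Adam step and overlaying the resulting `history` lists gives the comparison the problem asks for; the narrow curved valley is what makes the step-size choice delicate here.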

---

## **Outcome of This Phase**

By completing these 100 applied-AI calculus problems, you will:

* Be fluent in **gradient descent mathematics**, from update rules to convergence conditions.  
* Understand **how derivatives control learning**.  
* Be able to **debug and design optimization algorithms**.  
* Move seamlessly from pure calculus to **deep-learning optimization theory**.
