# Phase 8: Tensor Calculus, Manifolds, and Backprop in Deep Networks

100 problems/experiments grouped by module.

---

## Module 1 — Matrix & Tensor Calculus Core (1–15)

1. Compute $$\frac{\partial}{\partial x}\,(a^\top x).$$
2. Compute $$\frac{\partial}{\partial x}\,(x^\top A x)$$ for symmetric $$A.$$
3. Compute $$\frac{\partial}{\partial A}\,\|Ax-b\|_2^2.$$
4. Compute $$\frac{\partial}{\partial W}\,\|XW-Y\|_F^2.$$
5. Derive gradient of $$\mathrm{tr}(A^\top X)$$ w.r.t. $$X.$$
6. Compute $$\frac{\partial}{\partial X}\,\mathrm{tr}(X^\top B X C).$$
7. Derive Jacobian of $$\mathrm{vec}(AXB)$$ w.r.t. $$\mathrm{vec}(X).$$
8. Compute gradient of $$\log\det(X)$$ w.r.t. $$X\succ 0.$$
9. Compute Hessian of $$f(x)=\tfrac12 x^\top Q x + c^\top x.$$
10. Show $$\nabla_X \|X\|_F = \frac{X}{\|X\|_F}$$ for $$X\neq 0.$$
11. Derive gradient/subgradient of the nuclear norm $$\|X\|_*$$ at full-rank $$X$$ via SVD.
12. Compute (sub)derivative of $$\|WX-b\|_1$$ w.r.t. $$W.$$
13. Derive $$\nabla_x \|Ax\|_2 = \frac{A^\top A x}{\|Ax\|_2}$$ for $$Ax\neq 0.$$
14. Show $$\nabla_X \,\mathrm{tr}(X^{-1}A)=-X^{-\top} A^\top X^{-\top}$$ for invertible $$X.$$
15. Compute $$\frac{\partial (u\otimes v)}{\partial u}$$ and $$\frac{\partial (u\otimes v)}{\partial v}.$$
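
Closed-form answers to problems in this module can be sanity-checked numerically. The sketch below is one way to do that, assuming NumPy is available; `numerical_grad` is a hypothetical helper (not part of the problem set) that compares central differences against the standard results for problems 2, 4, and 8.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f at array X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)

# Problem 2: the gradient of x^T A x is 2 A x when A is symmetric.
M = rng.standard_normal((4, 4))
A = M + M.T
x = rng.standard_normal(4)
err_quadratic = np.max(np.abs(numerical_grad(lambda v: v @ A @ v, x) - 2 * A @ x))

# Problem 4: the gradient of ||XW - Y||_F^2 w.r.t. W is 2 X^T (XW - Y).
X = rng.standard_normal((5, 3))
W = rng.standard_normal((3, 2))
Y = rng.standard_normal((5, 2))
frob_loss = lambda W_: np.sum((X @ W_ - Y) ** 2)
err_frobenius = np.max(np.abs(numerical_grad(frob_loss, W) - 2 * X.T @ (X @ W - Y)))

# Problem 8: the gradient of log det(S) is S^{-T}, treating all entries as independent.
B = rng.standard_normal((3, 3))
S = B @ B.T + 3 * np.eye(3)  # symmetric positive definite test point
logdet = lambda S_: np.linalg.slogdet(S_)[1]
err_logdet = np.max(np.abs(numerical_grad(logdet, S) - np.linalg.inv(S).T))

print(err_quadratic, err_frobenius, err_logdet)
```

The same helper can be pointed at any other scalar-valued problem in the module by swapping in the function and the candidate gradient.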

---

## Module 2 — Automatic Differentiation & Backprop (16–30)

16. Build computational graph for $$f(x)=\exp(\sin(x^2))$$; list forward/backward passes.
17. Backprop through $$y=\mathrm{ReLU}(Wx+b).$$
18. Backprop through softmax + cross-entropy (vector form).
19. Backprop through batch normalization (inference mode).
20. Backprop through batch normalization (training mode with mean/var).
21. Backprop through residual block $$y=x+f(x).$$
22. Backprop through depthwise separable convolution.
23. Backprop through max-pooling (argmax mask).
24. Backprop through average pooling.
25. Backprop through layer norm.
26. Backprop through GELU activation.
27. Backprop through dropout (inverted scaling).
28. Analyze the effect of gradient checkpointing on the computational graph (symbolic memory vs. recomputation trade-off).
29. Compare forward-mode vs reverse-mode AD for $$f:\mathbb{R}^{d}\to\mathbb{R}$$ and $$f:\mathbb{R}^{d}\to\mathbb{R}^d.$$
30. Derive vector–Jacobian product (VJP) and Jacobian–vector product (JVP) identities.
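
For problems 17, 18, and 30, a small numerical check may help confirm a hand derivation. The sketch below assumes NumPy and a one-hot (or otherwise normalized) target; all function names are hypothetical. It checks the softmax + cross-entropy gradient against central differences and verifies the duality $$\langle v, Ju\rangle = \langle J^\top v, u\rangle$$ between JVP and VJP on a ReLU layer.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift logits for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    """Loss -sum(y * log softmax(z)) for a target distribution y."""
    return -np.sum(y * np.log(softmax(z)))

def cross_entropy_backward(z, y):
    """Gradient w.r.t. logits; the softmax(z) - y form assumes sum(y) == 1."""
    return softmax(z) - y

def relu_layer_jvp(W, b, x, u):
    """Forward mode: J u, with J = diag(mask) W for y = ReLU(Wx + b)."""
    mask = (W @ x + b > 0).astype(float)
    return mask * (W @ u)

def relu_layer_vjp(W, b, x, v):
    """Reverse mode: J^T v = W^T (mask * v), the signal backpropagated to x."""
    mask = (W @ x + b > 0).astype(float)
    return W.T @ (mask * v)

rng = np.random.default_rng(0)

# Check the softmax + cross-entropy gradient by central differences.
z = rng.standard_normal(5)
y = np.eye(5)[2]  # one-hot target
eps = 1e-6
num = np.array([
    (cross_entropy(z + eps * e, y) - cross_entropy(z - eps * e, y)) / (2 * eps)
    for e in np.eye(5)
])
err_ce = np.max(np.abs(num - cross_entropy_backward(z, y)))

# Check the duality <v, J u> = <J^T v, u> for the ReLU layer.
W = rng.standard_normal((3, 4))
b = rng.standard_normal(3)
x, u, v = rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(3)
gap_duality = abs(v @ relu_layer_jvp(W, b, x, u) - relu_layer_vjp(W, b, x, v) @ u)

print(err_ce, gap_duality)
```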

---

## Module 3 — Manifolds & Riemannian Geometry Basics (31–45)

31. Define a smooth chart for the 2-sphere $$S^2$$ minus a pole; compute transition map Jacobian.
32. Compute metric $$g$$ on $$S^2$$ in spherical coordinates.
33. Derive Christoffel symbols $$\Gamma^k_{ij}$$ for $$S^2.$$
34. Write geodesic equation on $$S^2.$$
35. Compute geodesic distance (great-circle) between two points on $$S^2.$$
36. Compute gradient of a scalar field $$f(\theta,\phi)$$ on $$S^2$$ under metric $$g.$$
37. Show that the intrinsic (Riemannian) gradient on $$S^2$$ equals the Euclidean gradient projected onto the tangent plane.
38. Compute volume element $$\sqrt{\det g}\,d\theta\,d\phi$$ on $$S^2.$$
39. Derive Gaussian curvature for $$S^2.$$
40. Define atlas for $$SO(2)$$; compute group metric from embedding.
41. Parameterize $$SO(3)$$ via axis–angle; compute left-invariant vector fields.
42. Compute exponential map $$\exp:\mathfrak{so}(3)\to SO(3)$$ for a skew-symmetric matrix.
43. Compute logarithm map on $$SO(3)$$ for a given rotation $$R.$$
44. Prove geodesics on $$\mathbb{R}^n$$ are straight lines under Euclidean metric.
45. Show product manifold metric for $$S^2\times \mathbb{R}.$$
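
Problems 35, 42, and 43 have closed forms that are easy to test. The following is a minimal sketch, assuming NumPy, unit-norm inputs for the sphere distance, and a rotation angle in $$[0,\pi)$$ for the logarithm; the function names are hypothetical. It uses Rodrigues' formula for the exponential map and recovers the axis-angle vector from the skew-symmetric part of $$R.$$

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exp(hat(w)) for an axis-angle vector w."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-12:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)

def log_so3(R):
    """Axis-angle vector of R; assumes the rotation angle lies in [0, pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    K = (R - R.T) * theta / (2.0 * np.sin(theta))
    return np.array([K[2, 1], K[0, 2], K[1, 0]])

def great_circle_distance(p, q):
    """Geodesic distance on S^2 between unit vectors p and q."""
    return np.arccos(np.clip(p @ q, -1.0, 1.0))

w = np.array([0.3, -0.2, 0.5])
R = exp_so3(w)
w_back = log_so3(R)
dist = great_circle_distance(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```

Here `R` should be orthogonal with determinant one, `w_back` should reproduce `w`, and `dist` should equal a quarter of a great circle.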

---

## Module 4 — Optimization on Manifolds (46–60)

46. Riemannian gradient of $$f(w)$$ on unit sphere constraint $$\|w\|=1.$$
47. Retraction on sphere using normalization; derive update rule.
48. Optimize PCA direction as maximizer of $$w^\top \Sigma w$$ with $$\|w\|=1.$$
49. Grassmann manifold $$\mathrm{Gr}(k,n):$$ compute tangent projection at $$U$$ with $$U^\top U=I.$$
50. Riemannian gradient for subspace fitting $$f(U)=\|X-UU^\top X\|_F^2.$$
51. Stiefel manifold $$\mathrm{St}(k,n):$$ derive skew-symmetric correction for gradient.
52. Implement Cayley retraction on $$\mathrm{St}(k,n).$$
53. Natural gradient on probability simplex with Fisher metric.
54. Riemannian gradient descent for covariance estimation on SPD manifold.
55. Geodesic on SPD: $$\gamma(t)=X^{1/2}\exp\!\big(t\,\log(X^{-1/2}YX^{-1/2})\big)X^{1/2}.$$ Compute midpoint.
56. Compute Riemannian distance $$d_{\mathrm{SPD}}(X,Y)=\|\log(X^{-1/2}YX^{-1/2})\|_F.$$
57. Projected vs Riemannian gradient comparison for orthonormal $$W.$$
58. Trust-region step on sphere: derive subproblem in tangent space.
59. Constrained optimization on $$SO(3):$$ align two point clouds (orthogonal Procrustes restricted to $$\det R=+1,$$ i.e. the Kabsch problem).
60. EM on manifold: update mean on $$S^2$$ (intrinsic average).
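
Problems 46 to 48 fit together as one small experiment. The sketch below, assuming NumPy, a fixed step size of 0.1, and a synthetic covariance with known spectrum (all choices here are illustrative, and the function names are hypothetical), projects the Euclidean gradient of $$w^\top\Sigma w$$ onto the tangent space of the sphere and retracts by normalization.

```python
import numpy as np

def riemannian_grad_sphere(w, egrad):
    """Project a Euclidean gradient onto the tangent space at unit vector w."""
    return egrad - (w @ egrad) * w

def retract_sphere(w, step):
    """Normalization retraction: move along the tangent step, then renormalize."""
    w_new = w + step
    return w_new / np.linalg.norm(w_new)

def top_direction(Sigma, lr=0.1, steps=500, seed=0):
    """Riemannian gradient ascent on w^T Sigma w subject to ||w|| = 1."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Sigma.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        egrad = 2.0 * Sigma @ w
        w = retract_sphere(w, lr * riemannian_grad_sphere(w, egrad))
    return w

# Synthetic covariance with eigenvalues 5, 2, 1, 0.5 and known eigenvectors Q.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Sigma = Q @ np.diag([5.0, 2.0, 1.0, 0.5]) @ Q.T

w_hat = top_direction(Sigma)
alignment = abs(w_hat @ Q[:, 0])    # |cos| of the angle to the top eigenvector
rayleigh = w_hat @ Sigma @ w_hat    # should approach the top eigenvalue
```

The step size is tied to the chosen spectrum; a different $$\Sigma$$ may need a smaller value or a line search.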

---

## Module 5 — Information Geometry & Natural Gradient (61–75)

61. Derive Fisher information for univariate Gaussian $$\mathcal{N}(\mu,\sigma^2).$$
62. Compute Fisher matrix for logistic regression parameters.
63. Show natural gradient $$\tilde{\nabla} = F^{-1}\nabla$$ for an exponential family.
64. Derive KL divergence local quadratic form via Fisher.
65. Compute natural gradient step for softmax regression.
66. Show equivalence of NGD with preconditioning by Fisher.
67. Derive Fisher for diagonal Gaussian policy (RL).
68. Compute Fisher–Rao metric on probability simplex.
69. Natural gradient for variational Gaussian posterior.
70. Compare Euclidean GD vs NGD on ill-conditioned logistic objective.
71. Relate Gauss–Newton to Fisher for least-squares models.
72. Derive block-diagonal Fisher approximation (K-FAC idea).
73. Compute eigenvalues of Fisher in a 2-parameter Bernoulli model.
74. Show invariance of natural gradient under reparameterization.
75. Implement a one-step NGD on a toy softmax classifier (symbolic).
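
Two of the identities in this module can be checked with a few lines of NumPy. The sketch below is illustrative only: the test point, the perturbation, the synthetic data, and the damping value are all assumptions. It compares the KL divergence between nearby softmax distributions with the Fisher quadratic form (problem 64) and forms a damped natural-gradient direction for logistic regression (problems 62 and 66).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_fisher(z):
    """Fisher matrix of a categorical distribution w.r.t. its logits z."""
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

def kl(p, q):
    return np.sum(p * np.log(p / q))

# Problem 64: KL(p_z || p_{z + delta}) is approximately 0.5 * delta^T F delta.
z = np.array([0.5, -1.0, 2.0])
delta = 1e-3 * np.array([1.0, -2.0, 0.5])
kl_exact = kl(softmax(z), softmax(z + delta))
kl_quadratic = 0.5 * delta @ softmax_fisher(z) @ delta

# Problems 62 and 66: logistic-regression Fisher and a preconditioned gradient.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 3)) * np.array([10.0, 1.0, 0.1])  # ill-conditioned features
y = (rng.random(n) < 0.5).astype(float)                       # synthetic labels
w = np.zeros(3)
p = 1.0 / (1.0 + np.exp(-X @ w))
grad = X.T @ (p - y) / n                       # gradient of the mean negative log-likelihood
F = (X * (p * (1.0 - p))[:, None]).T @ X / n   # X^T diag(p(1-p)) X / n
damping = 1e-6                                 # hypothetical value, keeps the solve well-posed
nat_grad = np.linalg.solve(F + damping * np.eye(3), grad)

print(kl_exact, kl_quadratic)
```

Because the damped Fisher is positive definite, `nat_grad` is a descent direction whenever `grad` is nonzero; for this model the Fisher coincides with the Hessian of the negative log-likelihood, so the step is a damped Newton step.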

---

## Module 6 — Deep Networks: Jacobians, Hessians, Curvature (76–85)

76. Jacobian of a 2-layer MLP $$h_1=\phi(W_1x+b_1),\ \ y=W_2h_1+b_2.$$
77. Hessian–vector product (HVP) for MLP using Pearlmutter’s trick.
78. Compute spectral norm of Jacobian $$J=\partial y/\partial x.$$
79. Show gradient explosion via the product of per-layer Jacobian norms.
80. Derive condition for vanishing gradients with sigmoids.
81. Curvature of loss at optimum for linear regression (closed-form Hessian).
82. HVP for cross-entropy softmax model.
83. Empirical Fisher vs true Fisher in classification model.
84. Schur complement use in block Hessian of two-layer net.
85. Analyze effect of residual connections on Jacobian conditioning.
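
For problems 76, 78, and 81, the sketch below (assuming NumPy and $$\phi=\tanh$$; function names are hypothetical) builds the MLP Jacobian $$W_2\,\mathrm{diag}(\phi'(W_1x+b_1))\,W_1,$$ checks it against central differences, and compares its spectral norm with the product of the weight norms. The Hessian–vector product uses a finite difference of gradients as a stand-in; this is not Pearlmutter's trick, which would require an automatic-differentiation framework.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def mlp_jacobian(x, W1, b1, W2):
    """dy/dx = W2 diag(phi'(W1 x + b1)) W1 with phi = tanh."""
    dphi = 1.0 - np.tanh(W1 @ x + b1) ** 2
    return W2 @ (dphi[:, None] * W1)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((6, 4)), rng.standard_normal(6)
W2, b2 = rng.standard_normal((3, 6)), rng.standard_normal(3)
x = rng.standard_normal(4)
eps = 1e-6

# Problem 76: analytic Jacobian vs. central differences (one column per input).
J = mlp_jacobian(x, W1, b1, W2)
J_num = np.column_stack([
    (mlp(x + eps * e, W1, b1, W2, b2) - mlp(x - eps * e, W1, b1, W2, b2)) / (2 * eps)
    for e in np.eye(4)
])
err_jacobian = np.max(np.abs(J - J_num))

# Problem 78: spectral norm, bounded by ||W2|| ||W1|| because |tanh'| <= 1.
spec = np.linalg.norm(J, 2)
bound = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2)

# Problem 81: for the loss (1/2n)||Xw - y||^2 the Hessian is X^T X / n.
n = 50
X = rng.standard_normal((n, 4))
y = rng.standard_normal(n)
grad = lambda w: X.T @ (X @ w - y) / n
w, v = rng.standard_normal(4), rng.standard_normal(4)
hvp_fd = (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)  # finite-difference HVP
hvp_exact = X.T @ (X @ v) / n
```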

---

## Module 7 — Transformers, Attention & Geometry (86–93)

86. Backprop through scaled dot-product attention $$\mathrm{Attn}(Q,K,V)=\mathrm{softmax}\!\big(QK^\top/\sqrt{d}\big)V.$$
87. Derive gradient w.r.t. $$Q, K, V$$ in multi-head attention.
88. Jacobian of layer norm in Transformer block.
89. Backprop through positional encoding (sin/cos) parameters.
90. Show softmax temperature effect on gradient scaling.
91. Analyze attention map sensitivity (Jacobian of softmax w.r.t. its logits).
92. Compute curvature (HVP) for attention output w.r.t. query vector.
93. Natural gradient step for attention weight matrix under Fisher of softmax.
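
A hand-derived backward pass for problem 86 can be validated against finite differences. The sketch below assumes NumPy, single-head attention without masking, and a linear test loss $$L=\sum_{ij} G_{ij}O_{ij}$$ so that the upstream gradient is simply $$G$$; the function names and shapes are hypothetical.

```python
import numpy as np

def softmax_rows(S):
    E = np.exp(S - S.max(axis=-1, keepdims=True))
    return E / E.sum(axis=-1, keepdims=True)

def attention_forward(Q, K, V):
    d = Q.shape[-1]
    P = softmax_rows(Q @ K.T / np.sqrt(d))
    return P @ V, P

def attention_backward(Q, K, V, P, dO):
    """Gradients of a scalar loss w.r.t. Q, K, V given dO = dL/dO."""
    d = Q.shape[-1]
    dV = P.T @ dO
    dP = dO @ V.T
    # Row-wise softmax backward: dS = P * (dP - sum_j dP_j P_j).
    dS = P * (dP - np.sum(dP * P, axis=-1, keepdims=True))
    dQ = dS @ K / np.sqrt(d)
    dK = dS.T @ Q / np.sqrt(d)
    return dQ, dK, dV

def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f at array X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # 3 queries, head dimension 4
K = rng.standard_normal((5, 4))   # 5 keys
V = rng.standard_normal((5, 2))   # 5 values of dimension 2
G = rng.standard_normal((3, 2))   # upstream gradient dL/dO

O, P = attention_forward(Q, K, V)
dQ, dK, dV = attention_backward(Q, K, V, P, G)

loss = lambda Q_, K_, V_: np.sum(attention_forward(Q_, K_, V_)[0] * G)
err_Q = np.max(np.abs(dQ - numerical_grad(lambda A: loss(A, K, V), Q)))
err_K = np.max(np.abs(dK - numerical_grad(lambda A: loss(Q, A, V), K)))
err_V = np.max(np.abs(dV - numerical_grad(lambda A: loss(Q, K, A), V)))
```

The multi-head case in problem 87 applies the same backward pass per head, followed by the chain rule through the projection matrices.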

---

## Module 8 — Diffusion, Normalization, and Continuous Flows (94–100)

94. Backprop through diffusion step $$x_{t+1}=\alpha_t x_t + \sigma_t \epsilon.$$
95. Derive score-matching gradient for denoising objective.
96. Compute Jacobian trace estimator for continuous normalizing flow (CNF).
97. Backprop through ODE solver step (adjoint method outline).
98. Fisher information for Gaussian score model.
99. Natural gradient update for variance parameter in score-based model.
100. Analyze stability of training via Hessian spectrum in diffusion U-Net block.
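
For problem 96, one common choice is the Hutchinson estimator $$\mathrm{tr}(J)\approx\frac1n\sum_i \epsilon_i^\top J\epsilon_i$$ with Rademacher probes. The sketch below is an assumption-laden illustration: it uses NumPy, a toy vector field $$f(x)=\tanh(Ax),$$ and a central-difference Jacobian–vector product in place of the automatic differentiation a real CNF would use. `hutchinson_trace` is a hypothetical name.

```python
import numpy as np

def hutchinson_trace(f, x, n_probes=5000, h=1e-5, seed=0):
    """Estimate tr(df/dx) at x with Rademacher probes and finite-difference JVPs."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        eps = rng.choice([-1.0, 1.0], size=x.shape)          # Rademacher probe
        jvp = (f(x + h * eps) - f(x - h * eps)) / (2 * h)    # approximates J @ eps
        total += eps @ jvp
    return total / n_probes

# Toy vector field with a known Jacobian: J = diag(1 - tanh^2(Ax)) A.
rng = np.random.default_rng(1)
A = 0.5 * rng.standard_normal((4, 4))
f = lambda x: np.tanh(A @ x)
x0 = rng.standard_normal(4)

J = (1.0 - np.tanh(A @ x0) ** 2)[:, None] * A
trace_exact = np.trace(J)
trace_est = hutchinson_trace(f, x0)

print(trace_exact, trace_est)
```

The estimate is unbiased but noisy; its variance depends on the off-diagonal entries of $$J,$$ so the number of probes trades accuracy against cost.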
