# A. Introduction to Calculus

## A1. Introduction

*  What are functions? 
   *  A function is a mathematical relationship between inputs and an output. It can be thought of as a machine that takes in one or more variables and produces a single, corresponding result. For example, a function for the temperature of a room might take in the coordinates ($x$,$y$,$z$) and time ($t$) as inputs and return the temperature at that specific point and time.
   *  The notation $f(x)$ represents "f as a function of x", not "f multiplied by x." This can be a point of confusion due to its seemingly arbitrary nature, but it's a standard convention in mathematical language.
* The creative essence of science
  * Selecting a function to model real-world data is a core, creative step in science and machine learning. This process involves formulating a **hypothesis**—a candidate function that could represent the relationship you're observing. Without this initial creative step, there would be nothing to test or investigate.
* Introduction to Calculus
  * **Calculus** is the study of how functions change with respect to their input variables. It provides a set of tools to investigate and manipulate these functions. By understanding calculus, you can analyze the behavior of functions and use them to model complex phenomena in the real world.
* Gradients and Derivatives
  * A great way to visualize this concept is with a **speed-time graph**.
  * ![Speed time graph](images/speed_time_graph.png)

    * The **gradient** (or slope) of the graph at any point represents the **rate of change** of speed with respect to time, which is the **acceleration**.
    * A positive gradient indicates acceleration, a negative gradient indicates deceleration, and a zero gradient (a flat horizontal line) means constant speed with zero acceleration.
    * The gradient at a single point is called the **local gradient** and can be visualized as the slope of a tangent line that touches the curve at that point.
  * By finding the local gradient at every point on a continuous function, we can create an entirely new function called its **derivative**. The derivative describes the original function's slope at every point.
* Higher-Order Derivatives and Anti-Derivatives
  * This process can be repeated. The derivative of the acceleration function is called the **jerk**, which represents the rate of change of acceleration. This concept is useful for describing the "jerky" motion of a car as it starts and stops. The jerk is the second derivative of the speed-time function.
  * The inverse procedure, finding a function for which our original function is the derivative, is called the **anti-derivative**. For our speed-time example, the anti-derivative would be the **distance-time function**, as the rate of change of distance is speed. The anti-derivative is closely related to an integral.
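
The relationships between speed, acceleration, jerk, and distance can be sketched numerically. The example below is a minimal illustration, assuming a made-up speed profile `speed(t) = 2t + 0.5t^2` (it is not read off the graph above). The helper names `local_gradient` and `area_under` are hypothetical, and the area under the curve is approximated with the trapezoid rule.

```python
def speed(t):
    """Hypothetical speed profile in m/s (an assumption for illustration)."""
    return 2.0 * t + 0.5 * t ** 2


def local_gradient(f, t, dt=1e-3):
    """Approximate the slope of f at t using a small symmetric step."""
    return (f(t + dt) - f(t - dt)) / (2.0 * dt)


def acceleration(t):
    """First derivative of speed: the gradient of the speed-time graph."""
    return local_gradient(speed, t)


def jerk(t):
    """Second derivative of speed: the gradient of the acceleration."""
    return local_gradient(acceleration, t)


def area_under(f, t_start, t_end, steps=10_000):
    """Approximate the change in the anti-derivative with the trapezoid rule."""
    width = (t_end - t_start) / steps
    total = 0.5 * (f(t_start) + f(t_end))
    for i in range(1, steps):
        total += f(t_start + i * width)
    return total * width


acceleration_at_3 = acceleration(3.0)          # exact value: 2 + 3 = 5
jerk_at_3 = jerk(3.0)                          # exact value: 1
distance_0_to_3 = area_under(speed, 0.0, 3.0)  # exact value: 3^2 + 3^3 / 6 = 13.5

print(acceleration_at_3, jerk_at_3, distance_0_to_3)
```

The exact values in the comments come from differentiating and anti-differentiating the assumed profile by hand; the numerical results should agree with them to several decimal places.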

## A2. Derivatives (Sum Rule and Power Rule)

* Defining the Derivative
  * The derivative is the formal mathematical notation for the gradient of a function. For a linear function with a constant gradient, the slope is defined as "**rise over run**."
  * For a non-linear function where the gradient changes at every point, we define the derivative at a specific point $x$ by taking the limit of the "rise over run" formula. We consider a second point that is an infinitesimally small distance $Δx$ away from the first point. As this distance approaches zero, the line connecting the two points becomes a perfect approximation of the tangent line at point $x$. $$ \frac{df}{dx} = f'(x) = \lim_{Δx\to0} (\frac{f(x+Δx)-f(x)}{Δx}) $$
    * The notation for the derivative can be either $f'(x)$ (read as "f prime of x") or $\frac{df}{dx}$ (read as "df by dx").
    * The key idea is that we are not dividing by zero, but rather observing the behavior of the expression as $Δx$ gets extremely close to zero.
* Fundamental Rules of Differentiation
  * This definition, while powerful, can be tedious to apply directly to every function. Fortunately, we can derive and use general rules to simplify the process.
  * **The Sum Rule**
    * The derivative of a sum of functions is the sum of their individual derivatives. This means you can differentiate each term in a function separately and then add the results together.
    * Example: The derivative of $f(x)=3x+2$ is the derivative of $3x$ plus the derivative of $2$, so $f'(x) = 3 + 0 = 3$.
  * **The Power Rule**
    * For a function in the form of $f(x) = ax^b$, its derivative is:
    $$ f(x) = ax^b $$
    $$ f'(x) = abx^{(b-1)} $$
    * The rule is: multiply the coefficient by the original power, and then subtract 1 from the power.
    * Example: For $f(x) = 5x^2$, the derivative is $f'(x)=(5)(2)x^{2-1}=10x^1=10x$
* Special cases
  * The Derivative of $\frac{1}{x}$
    * ![Derivative - Discontinuity](images/derivative_discontinuity.png)
    * The function $f(x) = \frac{1}{x}$ has a **discontinuity** at $x=0$, as division by zero is undefined. However, we can still find its derivative everywhere else using the limit definition of differentiation. Working through the algebra gives:
    $$ f(x) = \frac{1}{x} $$
    $$ f'(x) = \lim_{Δx\to0} (\frac{ \frac{1}{x+Δx} - \frac{1}{x} }{Δx}) $$
    $$ = \lim_{Δx\to0} (\frac{ \frac{x}{x(x+Δx)} - \frac{x+Δx}{x(x+Δx)} }{Δx}) $$
    $$ = \lim_{Δx\to0}  (\frac{\frac{-Δx}{x(x+Δx)}}{Δx}) $$
    $$ = \lim_{Δx\to0} (\frac{-1}{x^2+xΔx}) $$
    $$ = - \frac{1}{x^2}$$
    * This derivative is negative for every $x \neq 0$, matching our visual observation that the original function slopes downward everywhere it is defined. Like the original function, the derivative is undefined at $x = 0$.
  * The Exponential Function ($e^x$)
    * ![Derivative - Euler's Number](images/derivative_eulers_number.png)
    * The exponential function, $f(x) = e^x$, has a unique and powerful property: **its derivative is itself**. $$ \frac{d}{dx} e^x = e^x$$
    * This means the value of the function at any point is equal to its slope at that same point. This self-similarity is incredibly useful in calculus and other areas of mathematics. The constant $e$ (Euler's number), approximately 2.718, is fundamental to this function.
      * The "Designed" Nature of the Exponential Function $e^x$
        * The constant $e$ (Euler's number) is not a random value; it's specifically defined to satisfy a unique and powerful property in calculus. Its entire purpose is to make the derivative of the exponential function $f(x)=e^x$ equal to itself.
      * Derivation of the Derivative
        * We can see this by using the formal limit definition of the derivative for a general exponential function $f(x)=a^x$.$$ f'(x)=\lim_{Δx\to0} \frac{f(x+Δx)-f(x)}{Δx} = \lim_{Δx\to0} \frac{a^{x+Δx}-a^x}{Δx}$$
        * Using the rule of exponents ($a^{x+y} =a^xa^y$), we can factor out $a^x$: $$ f'(x)=\lim_{Δx\to0} \frac{a^x \cdot a^{Δx}-a^x}{Δx} = a^x \lim_{Δx\to0} \frac{a^{Δx}-1}{Δx} $$
        * For most values of the base $a$, the limit part of this expression will evaluate to some constant value other than 1.
        * The number $e$ is precisely the value of the base $a$ that makes this limit exactly 1: $$ \lim_{Δx\to0} \frac{e^{Δx}-1}{Δx} = 1 $$
        * Therefore, when we substitute $a=e$ back into our derivative expression, we get: $$ f'(x) = e^x \cdot 1 = e^x $$
      * This result shows that the rate of change of the function $e^x$ at any point is simply the value of the function itself at that point. This isn't a coincidence; it's the very reason why $e$ is so fundamental to calculus and the modeling of natural growth and decay processes.
  * Trigonometric Functions (Sine and Cosine)
    * ![Derivative - Sin and Cos ](images/derivative_sin_cos.png)
    * The trigonometric functions sine and cosine have an interesting relationship when differentiated. They follow a cyclical pattern:
      * The derivative of $\sin(x)$ is $\cos(x)$.
      * The derivative of $\cos(x)$ is $-\sin(x)$.
      * The derivative of $-\sin(x)$ is $-\cos(x)$.
      * The derivative of $-\cos(x)$ is $\sin(x)$.
    * After four differentiations, the function returns to its original form. This self-similarity is a hint that these functions are deeply related to the exponential function, although the connection is not immediately obvious.
  * Ultimately, these examples demonstrate that even with complex functions, the core concept of differentiation remains the same: finding the "rise over run" at every point on the curve.
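
The limit definition and the rules in this section can be compared numerically. The sketch below is only an illustration: the helper names `rise_over_run` and `exponential_factor`, the test point `x = 1.5`, and the step sizes are assumptions chosen for the example. It evaluates "rise over run" for shrinking $Δx$ and prints it next to the value predicted by each rule, then evaluates the base-dependent factor $\frac{a^{Δx}-1}{Δx}$ from the exponential derivation for a few bases.

```python
import math


def rise_over_run(f, x, dx):
    """Slope of the line through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


# Each entry pairs a function with the derivative predicted in this section.
cases = {
    "5x^2": (lambda x: 5 * x ** 2, lambda x: 10 * x),
    "1/x": (lambda x: 1 / x, lambda x: -1 / x ** 2),
    "e^x": (math.exp, math.exp),
    "sin(x)": (math.sin, math.cos),
    "cos(x)": (math.cos, lambda x: -math.sin(x)),
}

x = 1.5  # arbitrary test point, chosen away from the discontinuity at x = 0
for name, (f, f_prime) in cases.items():
    for dx in (1e-1, 1e-3, 1e-5):
        estimate = rise_over_run(f, x, dx)
        print(f"{name:7s} dx={dx:.0e} estimate={estimate:+.6f} rule={f_prime(x):+.6f}")


def exponential_factor(a, dx=1e-8):
    """Approximate lim (a^dx - 1) / dx by using one small dx."""
    return (a ** dx - 1) / dx


# The factor is close to 1 only when the base is e.
for a in (2.0, math.e, 3.0):
    print(f"a={a:.4f} factor={exponential_factor(a):.6f}")
```

As $Δx$ shrinks, each estimate should move toward the value given by the corresponding rule, which is the behavior the limit definition describes.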




## A3. The Product Rule of Differentiation

* The product rule is a shortcut for finding the derivative of a function that is the product of two separate functions, $A(x)=f(x)g(x)$. Instead of using the tedious limit definition, we can visualize the rule by thinking about the change in area of a rectangle with sides $f(x)$ and $g(x)$.
* ![Product Rule](images/product_rule.png)
* When we increase $x$ by a small amount $Δx$, the area of the rectangle changes. The increase in area, $ΔA$, consists of three parts:
  * A vertical strip with area $f(x)(g(x+Δx)-g(x))$.
  * A horizontal strip with area $g(x)(f(x+Δx)-f(x))$
  * A small corner rectangle with area $(f(x+Δx)-f(x))(g(x+Δx)-g(x))$ 
* So, the whole $ΔA$ will be: $$ΔA = f(x)(g(x+Δx)-g(x)) +$$ $$g(x)(f(x+Δx)-f(x)) +$$ $$(f(x+Δx)-f(x))(g(x+Δx)-g(x))$$
* As $Δx$ approaches zero, the area of the small corner rectangle $(f(x+Δx)-f(x))(g(x+Δx)-g(x))$ shrinks faster than the two strips, because it is the product of two small changes. It can therefore be ignored in the limit. Dividing the remaining terms by $Δx$ and taking the limit gives the derivative:
  $$ ΔA(x) \approx f(x)(g(x+Δx)-g(x)) + g(x)(f(x+Δx)-f(x)) $$
  $$ \lim_{Δx\to0} (\frac{ΔA(x)}{Δx}) = \lim_{Δx\to0} ( \frac{f(x)(g(x+Δx)-g(x)) + g(x)(f(x+Δx)-f(x))}{Δx} ) $$
  $$ = \lim_{Δx\to0} ( \frac{f(x)(g(x+Δx)-g(x))}{Δx}+ \frac{g(x)(f(x+Δx)-f(x))}{Δx} ) $$
  $$ = \lim_{Δx\to0} ( f(x)\frac{(g(x+Δx)-g(x))}{Δx}+ g(x)\frac{(f(x+Δx)-f(x))}{Δx} ) $$
  $$ = f(x)g'(x)+ g(x)f'(x) $$
  $$ A'(x) = f(x)g'(x)+ g(x)f'(x) $$
* Based on this intuition, we can derive the formal product rule. It states that the derivative of a product of two functions, $f(x)g(x)$, is the sum of two terms: the first function times the derivative of the second, plus the second function times the derivative of the first.
  $$ \text{Product Rule:}$$
  $$ \text{if } A(x) = f(x)g(x)$$
  $$ \text{then } A'(x) = f(x)g'(x)+g(x)f'(x)$$
* This rule is a powerful tool in calculus and can be added to our toolbox alongside the Sum Rule and the Power Rule. It simplifies the process of differentiating complex functions that are the product of simpler ones.
* Example: 
  * To differentiate $A(x)=xe^x\cos(x)$, we extend the product rule to three functions: differentiate one factor at a time while leaving the other two unchanged, then add the results.
  * Let $f(x) = x$, $g(x)=e^x$, and $h(x)=\cos(x)$.
    * $f'(x) = 1$
    * $g'(x) = e^x$
    * $h'(x) = -\sin(x)$
  * Applying the three-function product rule $A'(x) = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x)$:
    $$ f'(x)g(x)h(x) = 1 \cdot e^x\cos(x) = e^x\cos(x)$$
    $$ f(x)g'(x)h(x) = xe^x\cos(x) $$
    $$ f(x)g(x)h'(x) = xe^x(-\sin(x)) = -xe^x\sin(x)$$
    $$ A'(x) = e^x\cos(x) + xe^x\cos(x) - xe^x\sin(x)$$
    $$ A'(x) = e^x[\cos(x) + x\cos(x) - x\sin(x)]$$
    $$ A'(x) = e^x[(1+x)\cos(x) - x\sin(x)]$$
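
The rectangle argument can be checked with numbers. The sketch below is a minimal illustration, assuming the sides $f(x) = x$ and $g(x) = e^x$ and the point `x0 = 1.0`, none of which come from the figure. The helper `area_change_parts` is a hypothetical name. It splits $ΔA$ into the two strips and the corner, and shows the corner's contribution to $ΔA/Δx$ fading as $Δx$ shrinks. The last lines compare the three-function worked example against a finite-difference estimate.

```python
import math


def area_change_parts(f, g, x, dx):
    """Split the change in f(x) * g(x) into the three rectangle pieces."""
    df = f(x + dx) - f(x)
    dg = g(x + dx) - g(x)
    # Order: vertical strip, horizontal strip, corner rectangle.
    return f(x) * dg, g(x) * df, df * dg


# Hypothetical sides chosen for illustration: f(x) = x and g(x) = e^x.
def f(x):
    return x


g = math.exp
x0 = 1.0
rule_value = f(x0) * math.exp(x0) + g(x0) * 1.0  # f g' + g f'

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    vertical, horizontal, corner = area_change_parts(f, g, x0, dx)
    slope = (vertical + horizontal + corner) / dx
    print(f"dx={dx:.0e}  dA/dx={slope:.6f}  corner/dx={corner / dx:.6f}  rule={rule_value:.6f}")


# Numerical check of the three-function worked example.
def A(x):
    return x * math.exp(x) * math.cos(x)


def A_prime(x):
    return math.exp(x) * ((1 + x) * math.cos(x) - x * math.sin(x))


step = 1e-6
central_difference = (A(x0 + step) - A(x0 - step)) / (2 * step)
print(f"worked example: formula={A_prime(x0):.6f}  finite difference={central_difference:.6f}")
```

The `corner/dx` column should shrink roughly in proportion to `dx`, while `dA/dx` settles toward the product-rule value.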

## A4. The Chain Rule of Differentiation

* The Chain Rule is the essential tool for differentiating composite functions, where one function is nested inside another (e.g., $h(p(m))$). This rule is the fourth and final tool needed to tackle complex differentiation problems.
  1. Conceptualizing the chain
    * A composite function relates an ultimate output to an underlying input through a chain of intermediate variables. This structure is common in science and engineering.
    * Example: Relating Happiness ($h$) to Money ($m$) via Pizza ($p$).
      * $h$ is a function of $p$.
      * $p$ is a function of $m$.
      * Goal: Find the rate of change of happiness with respect to money, $\frac{dh}{dm}$.
    * Example functions:
      * $h(p)=-\frac{1}{3}p^2+p+\frac{1}{5}$
      * $p(m)=e^m-1$
    * ![Chain rule](images/chain_rule.png)
  2. The Chain Rule formula
   * The Chain Rule provides an elegant approach by multiplying the derivatives of the successive functions. 
   * The derivative of the composite function is the product of the derivative of the outer function with respect to the intermediate variable, and the derivative of the intermediate variable with respect to the innermost variable. 
   * The formula is intuitively represented as a chain of derivative relationships:
   $$ \frac{dh}{dm}=\frac{dh}{dp}\frac{dp}{dm} $$
  3. Application example
   * Step 1: Differentiate the individual functions.
     * $h(p)=-\frac{1}{3}p^2+p+\frac{1}{5} \text{ → } \frac{dh}{dp}=1-\frac{2}{3}p$
     * $p(m)=e^m-1 \text{ → } \frac{dp}{dm} =e^m$
   * Step 2: Apply the Chain Rule and eliminate the intermediate variable.
     * $\frac{dh}{dm} = \frac{dh}{dp}\frac{dp}{dm} = (1-\frac{2}{3}p) \cdot e^m  $
     * By substituting $p = e^m - 1$ back into the expression, we ensure the final derivative is only a function of $m$.
     $$\frac{dh}{dm} = \left(1 - \frac{2}{3}(e^m - 1)\right) e^m$$
     $$\frac{dh}{dm} = \frac{1}{3}e^m (5 - 2e^m)$$
* The magic of the Chain Rule is that it works even when direct substitution is impossible, provided we know the derivatives of the individual functions within the chain.
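
The happiness, pizza, and money example can be written as code to see that multiplying the link derivatives matches the substituted formula. This is a sketch only: the function names and the sample value `m = 0.5` are assumptions made for illustration, and a finite-difference estimate is included purely as an independent check.

```python
import math


def h(p):
    """Happiness as a function of pizza."""
    return -p ** 2 / 3 + p + 1 / 5


def p(m):
    """Pizza as a function of money."""
    return math.exp(m) - 1


def dh_dp(p_value):
    return 1 - 2 * p_value / 3


def dp_dm(m):
    return math.exp(m)


def dh_dm_chain(m):
    """Multiply the link derivatives, evaluating dh/dp at p = p(m)."""
    return dh_dp(p(m)) * dp_dm(m)


def dh_dm_closed_form(m):
    """The final expression after substituting p = e^m - 1."""
    return math.exp(m) * (5 - 2 * math.exp(m)) / 3


m = 0.5  # arbitrary sample amount of money
step = 1e-6
finite_difference = (h(p(m + step)) - h(p(m - step))) / (2 * step)

print(dh_dm_chain(m), dh_dm_closed_form(m), finite_difference)
```

Note that `dh_dm_chain` never needs the substituted formula. It only needs each link's derivative and the intermediate value `p(m)`, which is why the rule still applies when substitution is impractical.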

## A5. Combination of the 4 rules (Sum, Power, Product, and Chain)

* Applying the Calculus toolbox
  * The function to be differentiated is:
  $$f(x) = \frac{\sin(2x^5 + 3x)}{e^{7x}}$$
  * The core strategy is to decompose the scary function into manageable pieces and use the Product Rule as the final step.
    1. Preparation: Rewriting as a product
      * To avoid the Quotient Rule, we rewrite the fraction as a product using a negative exponent:
        $$f(x) = \underbrace{\sin(2x^5 + 3x)}_{g(x)} \cdot \underbrace{e^{-7x}}_{h(x)}$$
      * The derivative will be found by the Product Rule: $f'(x) = g'(x)h(x) + g(x)h'(x)$.
    2. Part 1: Differentiating $g(x) = \sin(2x^5 + 3x)$
      * This is a classic Chain Rule scenario, $g(x) = \sin(u(x))$, where the inner function is $u(x) = 2x^5 + 3x$.
        ![Chain rule 2](images/chain_rule_2.png)
      * Applying the Chain Rule: $\frac{dg}{dx} = \frac{dg}{du} \cdot \frac{du}{dx}$:
      $$\frac{dg}{dx} = \cos(u) \cdot (10x^4 + 3)$$
      * Substituting $u = 2x^5 + 3x$ back:
      $$g'(x) = \cos(2x^5 + 3x) (10x^4 + 3)$$
    3. Part 2: Differentiating $h(x) = e^{-7x}$
      * This is also a Chain Rule scenario, $h(x) = e^{v(x)}$, where the inner function is $v(x) = -7x$.
        ![Chain rule 3](images/chain_rule_3.png)
      * Applying the Chain Rule: $\frac{dh}{dx} = \frac{dh}{dv} \cdot \frac{dv}{dx}$:
      $$\frac{dh}{dx} = e^v \cdot (-7)$$
      * Substituting $v = -7x$ back:
      $$h'(x) = -7e^{-7x}$$
    4. Final step: Applying the Product Rule
      * Finally, apply the Product Rule: $f'(x) = g'(x)h(x) + g(x)h'(x)$.
      $$f'(x) = \left[\cos(2x^5 + 3x)(10x^4 + 3)\right] \cdot \left[e^{-7x}\right] + \left[\sin(2x^5 + 3x)\right] \cdot \left[-7e^{-7x}\right]$$
      * The expression can be slightly rearranged by factoring out $e^{-7x}$:
      $$f'(x) = e^{-7x} \left[ (10x^4 + 3)\cos(2x^5 + 3x) - 7\sin(2x^5 + 3x) \right]$$
  * This single example utilized all four differentiation rules: Power Rule, Sum Rule, Chain Rule, and Product Rule. As is common in coding, further algebraic simplification (optimization) is often deferred until necessary.
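
The same decomposition can be mirrored in code, with one small function per piece. The sketch below is an assumption-laden illustration rather than a definitive implementation: the function names follow the symbols used above, the sample point `x0 = 0.3` is arbitrary, and the finite-difference estimate serves only as a sanity check on the assembled result.

```python
import math


def u(x):
    """Inner function of the sine."""
    return 2 * x ** 5 + 3 * x


def g(x):
    return math.sin(u(x))


def h(x):
    return math.exp(-7 * x)


def g_prime(x):
    """Chain rule: cos(u) times du/dx, with du/dx from the sum and power rules."""
    return math.cos(u(x)) * (10 * x ** 4 + 3)


def h_prime(x):
    """Chain rule: e^v times dv/dx, with v = -7x."""
    return -7 * math.exp(-7 * x)


def f(x):
    """The original quotient form."""
    return math.sin(2 * x ** 5 + 3 * x) / math.exp(7 * x)


def f_prime(x):
    """Product rule applied to g(x) * h(x)."""
    return g_prime(x) * h(x) + g(x) * h_prime(x)


def f_prime_factored(x):
    """The rearranged final expression with e^(-7x) factored out."""
    return math.exp(-7 * x) * ((10 * x ** 4 + 3) * math.cos(u(x)) - 7 * math.sin(u(x)))


x0 = 0.3  # arbitrary sample point
step = 1e-6
finite_difference = (f(x0 + step) - f(x0 - step)) / (2 * step)

print(f_prime(x0), f_prime_factored(x0), finite_difference)
```

Keeping `f_prime` in its unsimplified, piece-by-piece form makes each rule visible in the code, in the spirit of deferring simplification until it is needed.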

# B. Multivariate Calculus

## B1. Partial Differentiation

* Variables, Constants, and Parameters
  
  In multivariate calculus, understanding the role of each term in a function is crucial for differentiation.
  
  * Dependent Variable ($y$): A variable whose value depends on the values of others (e.g., speed depends on time).
  * Independent Variable ($x$): A variable that is controlled or chosen freely, and on which the dependent variable relies (e.g., time).
  * Constants: Values that are fixed in the context of the problem (e.g., $\pi$).
  * Parameters: Variables that are considered constants during a standard differentiation but are often varied by an engineer or designer to explore a family of similar functions (e.g., car mass or drag coefficient in a specific context).
  
  Key Takeaway: What is a constant or a variable is context-dependent. In calculus, you can differentiate any term with respect to any other, provided the context makes sense.

* Introduction to Partial Differentiation
  
  Partial differentiation is the method of applying the familiar rules of single-variable calculus to functions of multiple variables.

  * The core rule is:

    When differentiating a function with respect to a specific variable, treat all other variables as constants.

  * The Partial Derivative Symbol

    We use the curly symbol, $\partial$ (read as "partial"), instead of the standard $d$, to signify that the function being differentiated has more than one variable. For a function $f(x, y, z)$, the partial derivative with respect to $x$ is written as $\frac{\partial f}{\partial x}$.

* Example: Mass of a Can ($m$)
  
  The mass of a metal can is a function of its design parameters: radius ($r$), height ($h$), wall thickness ($t$), and density ($\rho$).

  ![Mass Function](images/mass_function.png)

  The mass function is:
  $$m(r, h, t, \rho) = (2\pi r^2 t \rho) + (2\pi r h t \rho)$$

  * Partial Derivative with respect to Height ($h$)
    
    When calculating $\frac{\partial m}{\partial h}$, we treat $r, t,$ and $\rho$ as constants.
    $$\frac{\partial m}{\partial h} = \frac{\partial}{\partial h} [2\pi r^2 t \rho] + \frac{\partial}{\partial h} [2\pi r h t \rho]$$
    * The first term ($2\pi r^2 t \rho$) does not contain $h$, so its derivative is 0.
    * In the second term, $2\pi r t \rho$ is the constant multiplier of $h$, so its derivative is $2\pi r t \rho$.
    $$\frac{\partial m}{\partial h} = 0 + 2\pi r t \rho$$
    $$\frac{\partial m}{\partial h} = 2\pi r t \rho$$

    The result no longer contains $h$, as mass varies linearly with height (all else being equal).

  * Partial Derivative with respect to Radius ($r$)
  
    When calculating $\frac{\partial m}{\partial r}$, we treat $h, t,$ and $\rho$ as constants.
    $$\frac{\partial m}{\partial r} = \frac{\partial}{\partial r} [2\pi r^2 t \rho] + \frac{\partial}{\partial r} [2\pi r h t \rho]$$
    * For the first term, $\frac{\partial}{\partial r} [2\pi t \rho \cdot r^2] = 2\pi t \rho \cdot (2r) = 4\pi r t \rho$.
    * For the second term, $\frac{\partial}{\partial r} [2\pi h t \rho \cdot r] = 2\pi h t \rho$.
    $$\frac{\partial m}{\partial r} = 4\pi r t \rho + 2\pi h t \rho$$

  * Partial Derivative with respect to Thickness ($t$)
    $$\frac{\partial m}{\partial t} = \frac{\partial}{\partial t} [2\pi r^2 \rho \cdot t] + \frac{\partial}{\partial t} [2\pi r h \rho \cdot t]$$
    $$\frac{\partial m}{\partial t} = 2\pi r^2 \rho + 2\pi r h \rho$$

  * Partial Derivative with respect to Density ($\rho$)
    $$\frac{\partial m}{\partial \rho} = \frac{\partial}{\partial \rho} [2\pi r^2 t \cdot \rho] + \frac{\partial}{\partial \rho} [2\pi r h t \cdot \rho]$$
    $$\frac{\partial m}{\partial \rho} = 2\pi r^2 t + 2\pi r h t$$
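
The "treat all other variables as constants" rule translates directly into code: nudge one input and hold the rest fixed. The sketch below assumes some made-up can dimensions in SI units (they are not from the text), and the helper `numeric_partial` is a hypothetical name. It compares the four partial derivatives derived above with numerical estimates.

```python
import math


def mass(r, h, t, rho):
    """Mass of the can: two end caps plus the side wall."""
    return 2 * math.pi * r ** 2 * t * rho + 2 * math.pi * r * h * t * rho


def analytic_partials(r, h, t, rho):
    """The four partial derivatives derived in this section."""
    return {
        "r": 4 * math.pi * r * t * rho + 2 * math.pi * h * t * rho,
        "h": 2 * math.pi * r * t * rho,
        "t": 2 * math.pi * r ** 2 * rho + 2 * math.pi * r * h * rho,
        "rho": 2 * math.pi * r ** 2 * t + 2 * math.pi * r * h * t,
    }


def numeric_partial(func, point, name, rel_step=1e-6):
    """Nudge one input up and down while holding the others constant."""
    step = rel_step * abs(point[name])
    up = dict(point, **{name: point[name] + step})
    down = dict(point, **{name: point[name] - step})
    return (func(**up) - func(**down)) / (2 * step)


# Hypothetical can dimensions in SI units (assumed values for illustration).
can = {"r": 0.03, "h": 0.12, "t": 0.0002, "rho": 2700.0}

expected = analytic_partials(**can)
for name in can:
    estimate = numeric_partial(mass, can, name)
    print(f"dm/d{name}: analytic={expected[name]:.6g}  numeric={estimate:.6g}")
```

Because `numeric_partial` scales its step by the size of the input, it assumes every input is non-zero, which holds for the dimensions chosen here.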