# PARTIAL DERIVATIVES

To illustrate how a partial derivative of a cost function is computed on actual data in a linear regression setting, we'll use the Mean Squared Error (MSE) cost function and compute its partial derivatives with respect to the model's parameters (coefficients). This computation is a key part of gradient descent optimization.

### Mean Squared Error (MSE) Cost Function:

For a linear regression model with two features $x_1$ and $x_2$, the MSE cost function (written here with the conventional extra factor of $\frac{1}{2}$, which cancels on differentiation) is given by:

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$

Where:
- $n$ is the number of observations.
- $\hat{y}_i$ is the predicted value for the $i$-th observation, calculated as $\hat{y}_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2}$.
- $y_i$ is the actual value for the $i$-th observation.

### Partial Derivative of $J(\theta)$ with Respect to $\theta_1$:

The partial derivative of $J(\theta)$ with respect to $\theta_1$ (the coefficient of $x_1$) is:

$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot x_{i1}$

In general, for any parameter $\theta_j$:

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot x_{ij}$
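
The factor $\frac{1}{n}$ (rather than $\frac{1}{2n}$) comes from the chain rule. As a brief sketch of the derivation, using only the definitions above and taking $x_{i0} = 1$ for the intercept term:

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{2n} \sum_{i=1}^{n} 2 (\hat{y}_i - y_i) \cdot \frac{\partial \hat{y}_i}{\partial \theta_j}$

Because $\hat{y}_i$ is linear in the parameters, $\frac{\partial \hat{y}_i}{\partial \theta_j} = x_{ij}$, so the factor of 2 cancels the $\frac{1}{2}$:

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot x_{ij}$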

### Example Dataset:

Let's assume we have the following dataset:

| $x_1$     | $x_2$     | $y$     |
|-----------|-----------|---------|
| 1         | 2         | 3       |
| 4         | 5         | 6       |

And the model parameters are:
- $\theta_0 = 0.5$
- $\theta_1 = 1$
- $\theta_2 = 0.5$

When the general formula is applied to $\theta_0$, the feature $x_{i0}$ is taken to be 1 for all $i$ (to account for the intercept term).

### Given Dataset and Parameters:

- Feature values: $x_1 = [1, 4]$, $x_2 = [2, 5]$
- Actual target values: $y = [3, 6]$
- Model parameters: $\theta_0 = 0.5, \theta_1 = 1, \theta_2 = 0.5$
- Number of observations $n = 2$

### Computing Predicted Values $\hat{y}_i$:

$\hat{y}_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2}$
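
As a minimal sketch, the prediction formula can be evaluated on the two observations in plain Python. The function name `predict` and the variable names are illustrative choices, not part of the original text.

```python
# Dataset and parameters from the example in the text.
x1 = [1.0, 4.0]
x2 = [2.0, 5.0]
theta0, theta1, theta2 = 0.5, 1.0, 0.5


def predict(x1_i, x2_i):
    """Linear model: y_hat = theta0 + theta1 * x1 + theta2 * x2."""
    return theta0 + theta1 * x1_i + theta2 * x2_i


# One prediction per observation.
y_hat = [predict(a, b) for a, b in zip(x1, x2)]
print(y_hat)  # [2.5, 7.0]
```

Compared with the targets $y = [3, 6]$, the first prediction is too low by 0.5 and the second is too high by 1.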

### Computing Partial Derivatives:

1. **For $\theta_0$ (Intercept Term):**

$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot 1$

2. **For $\theta_1$ (Coefficient of $x_1$):**

$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot x_{i1}$

3. **For $\theta_2$ (Coefficient of $x_2$):**

$\frac{\partial J(\theta)}{\partial \theta_2} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot x_{i2}$

Let's perform these computations using the provided data. The predictions are $\hat{y}_1 = 0.5 + 1(1) + 0.5(2) = 2.5$ and $\hat{y}_2 = 0.5 + 1(4) + 0.5(5) = 7$, so the errors $\hat{y}_i - y_i$ are $-0.5$ and $1$.

With $n = 2$, the partial derivatives of the cost function with respect to each parameter are:

- Partial derivative with respect to $\theta_0$ (intercept term): $\frac{1}{2}\left[(-0.5)(1) + (1)(1)\right] = 0.25$
- Partial derivative with respect to $\theta_1$ (coefficient of $x_1$): $\frac{1}{2}\left[(-0.5)(1) + (1)(4)\right] = 1.75$
- Partial derivative with respect to $\theta_2$ (coefficient of $x_2$): $\frac{1}{2}\left[(-0.5)(2) + (1)(5)\right] = 2.0$
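
The three formulas can also be evaluated with a short script. This is a sketch in plain Python, assuming the dataset and parameters given in the text. The helper name `mse_gradient` is hypothetical.

```python
# Dataset and parameters from the example in the text.
x1 = [1.0, 4.0]
x2 = [2.0, 5.0]
y = [3.0, 6.0]
theta = [0.5, 1.0, 0.5]  # [theta0, theta1, theta2]


def mse_gradient(theta, x1, x2, y):
    """Return [dJ/dtheta0, dJ/dtheta1, dJ/dtheta2] for J = (1/2n) * sum of squared errors."""
    n = len(y)
    # Each row is [x_i0, x_i1, x_i2], with x_i0 = 1 for the intercept term.
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Error for each observation: y_hat_i - y_i.
    errors = [
        sum(t * x for t, x in zip(theta, row)) - y_i
        for row, y_i in zip(rows, y)
    ]
    # Average of error * feature, one value per parameter.
    return [
        sum(e * row[j] for e, row in zip(errors, rows)) / n
        for j in range(len(theta))
    ]


grad = mse_gradient(theta, x1, x2, y)
print(grad)  # [0.25, 1.75, 2.0]
```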

These values are the components of the gradient of the cost function. Each one indicates how $J(\theta)$ changes as the corresponding parameter is varied while the others are held fixed. In gradient descent optimization, each parameter is updated iteratively by stepping in the direction opposite to its partial derivative, which reduces the cost function when the step size is sufficiently small.
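
To make the update concrete, here is a sketch of a single gradient descent step on the example data. The learning rate `alpha = 0.01` is an assumed value chosen for illustration (the text does not specify one), and the helper name `cost` is hypothetical.

```python
# Dataset, parameters, and gradient from the example in the text.
x1 = [1.0, 4.0]
x2 = [2.0, 5.0]
y = [3.0, 6.0]
theta = [0.5, 1.0, 0.5]
grad = [0.25, 1.75, 2.0]  # partial derivatives listed in the text

alpha = 0.01  # assumed learning rate, not given in the text


def cost(theta):
    """J(theta) = (1 / 2n) * sum of squared prediction errors."""
    n = len(y)
    total = 0.0
    for a, b, y_i in zip(x1, x2, y):
        y_hat = theta[0] + theta[1] * a + theta[2] * b
        total += (y_hat - y_i) ** 2
    return total / (2 * n)


# Step each parameter against its partial derivative.
new_theta = [t - alpha * g for t, g in zip(theta, grad)]

print(new_theta)          # roughly [0.4975, 0.9825, 0.48]
print(cost(theta))        # 0.3125
print(cost(new_theta))    # smaller than 0.3125
```

The step size matters: on this dataset, repeating the same step with `alpha = 0.1` overshoots and raises the cost above its starting value of 0.3125.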

The symbol $\frac{\partial J(\theta)}{\partial \theta_0}$ represents the partial derivative of the cost function $J(\theta)$ with respect to the parameter $\theta_0$, often referred to as the intercept term in the context of linear regression models. This mathematical expression is used to quantify how the cost function changes as the value of $\theta_0$ changes, while keeping all other model parameters constant.

### Meaning of the Symbols:

- $\frac{\partial}{\partial \theta_0}$: This denotes the partial derivative with respect to $\theta_0$. The partial derivative measures the rate of change of the function with respect to one of its variables, with all other variables held constant.
- $J(\theta)$: This is the cost function, which quantifies the error between the model's predicted values ($\hat{y}$) and the actual target values ($y$) in the dataset. The cost function is what we aim to minimize when training a machine learning model.
- $\theta$: This represents the vector of all parameters in the model, including $\theta_0$ (the intercept) and other coefficients associated with the features ($\theta_1, \theta_2, \ldots$).
- $\theta_0$: This specific parameter is the intercept term of the linear regression model, which is the value of the predicted output ($\hat{y}$) when all the feature inputs ($x$) are equal to zero.

### Practical Example (House Prices):

Consider a simple linear regression model where we are trying to predict house prices ($y$) based on the size of the house ($x$). The model can be represented as:

$\hat{y} = \theta_0 + \theta_1 x$

Let's say our cost function $J(\theta)$ is the Mean Squared Error (MSE):

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$

Where:
- $n$ is the number of houses in the dataset.
- $\hat{y}_i$ is the predicted price for the $i$-th house.
- $y_i$ is the actual price for the $i$-th house.

The partial derivative of $J(\theta)$ with respect to $\theta_0$ is:

$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$

This expression tells us how the cost function $J(\theta)$ changes as the intercept term $\theta_0$ changes, which is crucial for adjusting $\theta_0$ during the training process (e.g., using Gradient Descent) to minimize the cost function and improve the model's accuracy.
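
The formula says that $\frac{\partial J(\theta)}{\partial \theta_0}$ is simply the average prediction error. The sketch below illustrates this with made-up numbers: the sizes, prices, and parameter values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical data for illustration only (not from the text).
sizes = [50.0, 80.0, 120.0]     # house sizes
prices = [150.0, 240.0, 330.0]  # actual prices
theta0, theta1 = 10.0, 3.0      # assumed current parameter values


def dJ_dtheta0(theta0, theta1, x, y):
    """Partial derivative of the MSE cost w.r.t. the intercept: the mean of (y_hat - y)."""
    n = len(y)
    return sum((theta0 + theta1 * x_i) - y_i for x_i, y_i in zip(x, y)) / n


g0 = dJ_dtheta0(theta0, theta1, sizes, prices)
print(g0)  # 20.0
```

A positive value means the predictions are too high on average, so a gradient descent step would lower $\theta_0$.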

The symbol $\partial$ is used to denote a partial derivative. In calculus, a partial derivative represents the rate at which a function changes with respect to one of its variables, while keeping all other variables constant. This concept is particularly important in the context of functions with multiple variables, where you might be interested in understanding how the function changes in relation to each variable independently.

### Key Points about Partial Derivatives:

- **Multivariable Functions**: Partial derivatives are used in functions of more than one variable. For example, in a function $f(x, y)$, $\partial f/\partial x$ represents the rate of change of $f$ with respect to $x$, holding $y$ constant.

- **Notation**: The $\partial$ symbol is used to distinguish partial derivatives from ordinary derivatives (denoted by $d$). While $d$ is used for functions of a single variable, $\partial$ is used for functions of multiple variables.

- **Interpretation**: In the context of machine learning models, such as linear regression, the partial derivative with respect to a parameter (e.g., $\partial J/\partial \theta_0$) tells us how the cost function $J$ changes as the parameter $\theta_0$ is varied slightly, with all other parameters held fixed. This information is crucial for optimization algorithms like gradient descent, which adjust model parameters iteratively to minimize the cost function.

- **Vector Calculus**: In vector calculus, the gradient of a scalar-valued multivariable function is a vector of its first partial derivatives. In machine learning, the gradient vector is used to perform gradient descent optimization, where each component of the gradient vector is a partial derivative with respect to one of the function's parameters.
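
The "vary one parameter slightly, hold the others fixed" interpretation can be checked numerically. The following sketch estimates each partial derivative of the MSE cost with a central finite difference, using the dataset and parameters from the earlier example. The helper names `cost` and `numerical_partial` and the step `h = 1e-6` are illustrative assumptions.

```python
# Dataset and parameters from the earlier example in the text.
x1 = [1.0, 4.0]
x2 = [2.0, 5.0]
y = [3.0, 6.0]
theta = [0.5, 1.0, 0.5]


def cost(theta):
    """J(theta) = (1 / 2n) * sum of squared prediction errors."""
    n = len(y)
    total = 0.0
    for a, b, y_i in zip(x1, x2, y):
        y_hat = theta[0] + theta[1] * a + theta[2] * b
        total += (y_hat - y_i) ** 2
    return total / (2 * n)


def numerical_partial(f, theta, j, h=1e-6):
    """Central-difference estimate of df/dtheta_j, all other parameters held fixed."""
    up = list(theta)
    down = list(theta)
    up[j] += h
    down[j] -= h
    return (f(up) - f(down)) / (2 * h)


# Collecting the partial derivatives gives an estimate of the gradient vector.
approx_grad = [numerical_partial(cost, theta, j) for j in range(len(theta))]
print(approx_grad)  # close to the analytic values [0.25, 1.75, 2.0]
```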

### Practical Example (A Two-Variable Function):

Consider a function $f(x, y) = x^2 + y^2$. The partial derivative of $f$ with respect to $x$ is $\frac{\partial f}{\partial x} = 2x$, and the partial derivative with respect to $y$ is $\frac{\partial f}{\partial y} = 2y$. These derivatives tell us how $f$ changes as $x$ or $y$ changes individually, providing insight into the function's behavior along each axis in its domain.
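
As a short worked evaluation, take the arbitrarily chosen point $(x, y) = (3, 4)$:

$\frac{\partial f}{\partial x}\Big|_{(3,4)} = 2 \cdot 3 = 6, \qquad \frac{\partial f}{\partial y}\Big|_{(3,4)} = 2 \cdot 4 = 8$

Collecting the two partial derivatives gives the gradient vector at that point:

$\nabla f(3, 4) = (6, 8)$

Note that $\frac{\partial f}{\partial x} = 2x$ does not depend on $y$: changing $y$ while holding $x = 3$ fixed leaves this partial derivative at 6.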