# Root Mean Squared Error (RMSE)

The Root Mean Squared Error (RMSE) is a standard way to measure the error of a model in predicting quantitative data. It is particularly useful in contexts where we want to penalize larger errors more than smaller ones, as the squaring process disproportionately increases the impact of larger errors.

The formula for RMSE is given by:

$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

where:
- $n$ is the number of observations in the dataset.
- $y_i$ is the actual (observed) value of the $i$-th observation.
- $\hat{y}_i$ is the predicted value for the $i$-th observation.
- $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the sum of the squared differences between the actual and predicted values.
- $\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ calculates the mean of these squared differences.
- $\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ takes the square root of this mean to return the error in the same units as the original measurements.

**Steps to Calculate RMSE:**
1. **Difference:** For each observation, calculate the difference between the actual value and the predicted value ($y_i - \hat{y}_i$).
2. **Square:** Square each of these differences to eliminate negative values and give more weight to larger differences.
3. **Mean:** Calculate the mean (average) of these squared differences to get a sense of the overall error magnitude.
4. **Square Root:** Take the square root of this mean to bring the units back to the original scale of the data.
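The four steps above can be sketched as a small function. This is a minimal illustration using only Python's standard library; the function name `rmse` and its parameter names are chosen here for illustration and do not come from any particular library.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences of numbers."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")

    # Step 1: difference between each actual and predicted value
    differences = [y - y_hat for y, y_hat in zip(actual, predicted)]
    # Step 2: square each difference
    squared = [d ** 2 for d in differences]
    # Step 3: mean of the squared differences
    mean_squared = sum(squared) / len(squared)
    # Step 4: square root, returning to the original units
    return math.sqrt(mean_squared)


print(rmse([1, 2, 3], [2, 3, 4]))  # 1.0
```

Every prediction in the usage line is off by exactly 1, so the RMSE is 1.0.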

**Interpretation:**
- RMSE can be read as an estimate of the standard deviation of the residuals (the differences between observed and predicted values), provided the residuals average close to zero. A lower RMSE value indicates a better fit of the model to the data.
- Because of the squaring, larger errors have a disproportionately large effect on RMSE, making it particularly useful when large errors are especially undesirable.
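To make the effect of squaring concrete, the sketch below compares two made-up sets of errors with the same total absolute error (8 units each). In one set the error is spread evenly; in the other it is concentrated in a single prediction. The helper name `rmse_from_errors` and the numbers are illustrative assumptions, not data from this article.

```python
import math


def rmse_from_errors(errors):
    """RMSE computed directly from a list of errors (actual minus predicted)."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


evenly_spread = [2, 2, 2, 2]  # four errors of 2 units each
one_large = [0, 0, 0, 8]      # a single error of 8 units

print(rmse_from_errors(evenly_spread))  # 2.0
print(rmse_from_errors(one_large))      # 4.0
```

Although both sets miss by 8 units in total, the RMSE doubles when the error is concentrated in one observation.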

## Example: Calculating RMSE

| Store | Actual Sales $y_i$ | Predicted Sales $\hat{y}_i$ |
|-------|--------------------|-----------------------------|
| A     | 4                  | 5                           |
| B     | 5                  | 6                           |
| C     | 6                  | 8                           |
| D     | 8                  | 7                           |
| E     | 3                  | 4                           |

Let's compute the RMSE for the five stores in the table above, following the steps outlined previously.

### Step 1: Compute the Differences
Calculate the difference between the actual and predicted sales for each store.

For Store A: $4 - 5 = -1$  
For Store B: $5 - 6 = -1$  
For Store C: $6 - 8 = -2$  
For Store D: $8 - 7 = 1$  
For Store E: $3 - 4 = -1$  

### Step 2: Square the Differences
Square each of these differences.

For Store A: $(-1)^2 = 1$  
For Store B: $(-1)^2 = 1$  
For Store C: $(-2)^2 = 4$  
For Store D: $1^2 = 1$  
For Store E: $(-1)^2 = 1$  

### Step 3: Compute the Mean of Squared Differences
Calculate the mean of these squared differences.

Mean = $\frac{1 + 1 + 4 + 1 + 1}{5} = \frac{8}{5} = 1.6$

### Step 4: Take the Square Root
Take the square root of the mean to get the RMSE.

RMSE = $\sqrt{1.6} \approx 1.265$

The Root Mean Squared Error (RMSE) for this set of actual and predicted sales is approximately 1.265. Because RMSE is expressed in the same units as the data, the model's predictions typically miss the actual sales figures by about 1.27 units. Note that the single largest error (Store C, off by 2) accounts for half of the total squared error (4 out of 8), which shows how RMSE weights large errors more heavily than small ones.
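The arithmetic in this example can be checked with a few lines of Python. The sketch below uses only the standard library and the values from the table; the variable names are illustrative.

```python
import math

# Actual and predicted sales for stores A through E, taken from the table
actual = [4, 5, 6, 8, 3]
predicted = [5, 6, 8, 7, 4]

# Steps 1 and 2: squared difference for each store
squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
# Step 3: mean of the squared differences
mean_squared_error = sum(squared_errors) / len(squared_errors)
# Step 4: square root of the mean
rmse_value = math.sqrt(mean_squared_error)

print(squared_errors)        # [1, 1, 4, 1, 1]
print(mean_squared_error)    # 1.6
print(round(rmse_value, 3))  # 1.265
```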