# Outlier Detection Using Linear Regression in Tableau

**Objective:** Identify outliers in your product data by using linear regression to model the relationship between volume and weight.

## Steps:

### 1. Calculate Slope and Intercept:

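The slope (m) and intercept (b) come from the least squares method, which minimizes the sum of squared differences between observed and predicted values. In the formulas below, `[Weight]` plays the role of x (the independent variable) and `[Volume]` plays the role of y (the dependent variable).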

#### Slope:

```tableau
{fixed [category]: 
(COUNT([Volume]) * SUM([Weight] * [Volume]) - SUM([Volume]) * SUM([Weight])) /
(COUNT([Volume]) * SUM([Weight] * [Weight]) - SUM([Weight]) * SUM([Weight]))
}
```

**Mathematical Principle:** This formula is derived from the ordinary least squares (OLS) method where `𝑚 = (n∑xy - ∑x∑y) / (n∑x² - (∑x)²)`.
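To sanity-check the slope formula outside Tableau, the same sums can be reproduced in a few lines of Python. This is a minimal sketch, not part of the Tableau workbook: the function name `ols_slope` and the sample weights and volumes are hypothetical, chosen so the points lie exactly on `y = 2x + 1`.

```python
def ols_slope(xs, ys):
    """Slope m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), built from plain sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x * sum_x
    if denominator == 0:
        # Happens when every x is identical: the fitted line would be vertical.
        raise ValueError("all x values are identical; slope is undefined")
    return (n * sum_xy - sum_x * sum_y) / denominator


# Hypothetical sample for one category: x = weight, y = volume.
weights = [1.0, 2.0, 3.0, 4.0]
volumes = [3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 2x + 1
slope = ols_slope(weights, volumes)
print(slope)
```

The zero-denominator guard matters in Tableau as well: a category in which every row has the same weight makes the denominator zero, so the slope is undefined for that category.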

#### Intercept:

```tableau
{fixed [category]: 
(SUM([Volume]) * SUM([Weight] * [Weight]) - SUM([Weight]) * SUM([Weight] * [Volume])) /
(COUNT([Volume]) * SUM([Weight] * [Weight]) - SUM([Weight]) * SUM([Weight]))
}
```

**Mathematical Principle:** This formula calculates `𝑏 = (∑y - m∑x) / n` where `𝑏` is the intercept in the OLS method.
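A calculated field built only from aggregates needs the intercept written without a reference to `m`. The short derivation below is a sketch of one way to get there, assuming `x` is `[Weight]` and `y` is `[Volume]` as in the distance formula. It substitutes the OLS slope into `b = (∑y - m∑x) / n` and simplifies.

```latex
\begin{aligned}
b &= \frac{\sum y - m \sum x}{n},
\qquad
m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} \\[4pt]
b &= \frac{\sum y \left( n \sum x^2 - \left(\sum x\right)^2 \right)
        - \sum x \left( n \sum xy - \sum x \sum y \right)}
       {n \left( n \sum x^2 - \left(\sum x\right)^2 \right)} \\[4pt]
  &= \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - \left(\sum x\right)^2}
\end{aligned}
```

The two `(∑x)²∑y` terms cancel in the numerator and the remaining factor of `n` cancels with the denominator, so the intercept shares the slope's denominator.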

### 2. Calculate Distance from Linear Regression:

Distance is the perpendicular distance from a data point to the regression line. Save this calculation as `Distance from linear regression`, the field name that step 3 references.

#### Distance:

```tableau
abs([Slope] * [Weight] - [Volume] + [Intercept]) /
SQRT([Slope] * [Slope] + 1)
```

**Mathematical Principle:** The perpendicular distance from a point `(𝑥, 𝑦)` to the line `𝑦 = 𝑚𝑥 + 𝑏` is `|𝑚𝑥 - 𝑦 + 𝑏| / √(𝑚² + 1)`, with `𝑥` taken from `[Weight]` and `𝑦` from `[Volume]`.
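A quick numeric check can make the distance formula concrete. The sketch below is illustrative only: the function name `perpendicular_distance` and the sample line and point are hypothetical, picked so the arithmetic is easy to follow by hand.

```python
import math


def perpendicular_distance(x, y, slope, intercept):
    """Distance from the point (x, y) to the line y = slope * x + intercept."""
    return abs(slope * x - y + intercept) / math.sqrt(slope * slope + 1)


# Hypothetical line y = 0.75x and point (0, 5).
distance = perpendicular_distance(0.0, 5.0, slope=0.75, intercept=0.0)

# Vertical residual for comparison: observed y minus predicted y.
vertical_residual = abs(5.0 - (0.75 * 0.0 + 0.0))

print(distance, vertical_residual)
```

For this point the perpendicular distance is 4 while the vertical residual is 5. The two differ by the factor `√(𝑚² + 1)`, so the perpendicular distance is never larger than the residual that least squares minimizes.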

### 3. Determine Outliers:

Outliers are determined by comparing the distance to a sensitivity threshold.

#### Criteria:

```tableau
[Distance from linear regression] * [Distance from linear regression] >
[Sensitivity] * {fixed [category]: AVG([Distance from linear regression] * [Distance from linear regression])}
```

**Mathematical Principle:** A point is flagged as an outlier when its squared distance exceeds `[Sensitivity]` times the average squared distance within its category. Scaling by the category average adapts the threshold to the variability of each category.
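The criterion can be tried on a small sample before choosing a `[Sensitivity]` value. The following is a minimal sketch under stated assumptions: `flag_outliers`, the category labels, and the distances are all hypothetical, and the per-category mean stands in for the `{fixed [category]: AVG(...)}` expression.

```python
from collections import defaultdict


def flag_outliers(rows, sensitivity):
    """rows: list of (category, distance) pairs.

    Returns a parallel list of booleans, True where the squared distance
    exceeds sensitivity times the category's mean squared distance.
    """
    squared_by_category = defaultdict(list)
    for category, distance in rows:
        squared_by_category[category].append(distance * distance)

    mean_squared = {
        category: sum(values) / len(values)
        for category, values in squared_by_category.items()
    }

    return [
        distance * distance > sensitivity * mean_squared[category]
        for category, distance in rows
    ]


# Hypothetical distances: category A has one point far from the line.
rows = [("A", 1.0), ("A", 1.0), ("A", 1.0), ("A", 5.0), ("B", 2.0), ("B", 2.0)]
flags = flag_outliers(rows, sensitivity=3.0)
print(flags)
```

In category A the mean squared distance is 7, so the threshold is 21 and only the point with squared distance 25 is flagged. One consequence worth keeping in mind: a single squared distance can never exceed the category's sum of squared distances, so a sensitivity at or above the number of rows in a category flags nothing in that category.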

#### Explanation:

Slope and Intercept are estimated using the least squares method, which minimizes the sum of squared residuals. Distance measures how far a data point lies from the fitted line along the perpendicular, which is proportional to (and never larger than) the vertical residual from the predicted value. The Outlier Criterion flags significant deviations from the regression model by comparing each point's squared distance to a scaled average of the squared distances in its category. This approach helps flag data points that do not conform to the expected relationship between volume and weight.