## MAD and Winsorized Variance

**MAD** - Median Absolute Deviation.

Suppose we have the values:

```
12, 45, 23, 79, 19, 92, 30, 58, 132
```

The median is 45. Subtracting the median from each value and taking absolute values gives:

```
33, 0, 22, 34, 26, 47, 15, 13, 87
```

Ordering these and taking the median gives 26, which is the MAD.

MAD is generally divided by 0.6745, and the rescaled value (MADN) is what is used in practice.
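The worked example above can be reproduced in base R. This is a minimal sketch: the variable names are illustrative, and it relies on `mad()` from the `stats` package, whose default scaling constant 1.4826 is approximately 1/0.6745.

```r
# Values from the example above
x <- c(12, 45, 23, 79, 19, 92, 30, 58, 132)

med <- median(x)             # median of the data
abs_dev <- abs(x - med)      # absolute deviations from the median
mad_raw <- median(abs_dev)   # MAD: median of the absolute deviations

# Rescaled MAD (MADN)
madn <- mad_raw / 0.6745

# stats::mad() multiplies by 1.4826 (about 1 / 0.6745) by default;
# constant = 1 returns the unscaled MAD instead
mad_unscaled <- mad(x, constant = 1)
mad_scaled <- mad(x)
```

Here `mad_raw` and `mad_unscaled` both give 26, while `madn` and `mad_scaled` agree to about three decimal places (roughly 38.55).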

**Winsorized value** - This is needed for certain types of analyses. Winsorizing by 20% is related to 20% trimming. Instead of trimming the extreme values, they are *set equal to the nearest value not trimmed*: the smallest retained value in the lower tail and the largest retained value in the upper tail. This is another way to reduce the variance in the data.

Question here: how does Winsorizing influence the variance compared to trimming? It should give a lower variance, right?
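One way to probe this on the example data is a small base R sketch. The helper `winsorize()` is hypothetical (written for illustration, not taken from any package), and the Winsorized variance is assumed here to be the ordinary sample variance of the Winsorized values.

```r
# Values from the MAD example
x <- c(12, 45, 23, 79, 19, 92, 30, 58, 132)

# Hypothetical helper: Winsorize a numeric vector by proportion tr per tail
winsorize <- function(x, tr = 0.2) {
  xs <- sort(x)
  n <- length(xs)
  g <- floor(tr * n)            # number of values affected in each tail
  if (g == 0) return(xs)
  lo <- xs[g + 1]               # smallest value not trimmed
  hi <- xs[n - g]               # largest value not trimmed
  pmin(pmax(xs, lo), hi)        # pull both tails in to lo and hi
}

xw <- winsorize(x)              # 19 19 23 30 45 58 79 92 92

g <- floor(0.2 * length(x))
x_trimmed <- sort(x)[(g + 1):(length(x) - g)]  # values kept by 20% trimming

var_raw <- var(x)               # ordinary sample variance
var_wins <- var(xw)             # variance of the Winsorized values
var_trim <- var(x_trimmed)      # variance of the values left after trimming
```

For this sample the Winsorized variance (about 938) lies between the variance of the trimmed values (about 797) and the ordinary variance (about 1597). A single small sample does not settle the general question.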

Calculating winsorized variance:

```
winvar(x, tr=0.2)
```

Winsorized standard deviation:

```
winsd(x, tr=0.2)
```

Compute interquartile range based on ideal fourths:

```
idealfIQR(x)
```
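For reference, the ideal fourths can be sketched in base R. The helper `ideal_fourths()` is hypothetical, and the interpolation formula is the commonly stated definition (with `j = floor(n/4 + 5/12)`); it is an assumption that `idealfIQR` follows exactly this form.

```r
# Hypothetical helper: lower and upper quartiles via ideal fourths.
# Assumes at least three observations, so that j >= 1.
ideal_fourths <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  j <- floor(n / 4 + 5 / 12)
  h <- n / 4 + 5 / 12 - j                 # interpolation weight
  k <- n - j + 1
  q1 <- (1 - h) * xs[j] + h * xs[j + 1]   # lower quartile
  q2 <- (1 - h) * xs[k] + h * xs[k - 1]   # upper quartile
  c(lower = q1, upper = q2)
}

# Values from the MAD example
x <- c(12, 45, 23, 79, 19, 92, 30, 58, 132)
q <- ideal_fourths(x)
iqr_ideal <- unname(q["upper"] - q["lower"])
```

For the nine example values this gives quartiles of roughly 21.67 and 83.33, so an interquartile range of roughly 61.67.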

MADN (R's `mad()` rescales by default, so it returns MADN rather than the raw MAD):

```
mad(x)
```

## Detecting outliers

Even a few outliers can have a major impact.

A mundane reason for outlier detection: it can help identify erroneously recorded results. "Such errors seem to be rampant in applied work, and the subsequent cost of such errors can be enormous". So make sure to check for outliers, and make sure that they are valid.

An outlier detection technique is said to suffer from *masking* if the very presence of outliers causes them to be missed.

The classic rule declares a value an outlier if it lies more than two (or three) standard deviations from the mean.

The classic rule suffers from masking: if we have eight values where one is very extreme, it can drag up the variance in a way that hides the fact that it is an outlier!
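Masking under the mean and standard deviation rule can be seen in a short base R sketch. The data are made up for illustration (eight values, one very extreme) and are not from the source.

```r
# Illustrative data (not from the source): eight values, one very extreme
x <- c(2, 2, 3, 3, 3, 4, 4, 1000)

# Distance of each value from the mean, in standard-deviation units
z <- abs(x - mean(x)) / sd(x)

flag_3sd <- x[z > 3]   # three-standard-deviation rule
flag_2sd <- x[z > 2]   # two-standard-deviation rule

# For any sample of size n, z cannot exceed (n - 1) / sqrt(n)
n <- length(x)
z_bound <- (n - 1) / sqrt(n)
```

With eight observations the bound is about 2.47, so the three-standard-deviation rule flags nothing here, however extreme the last value is made. The two-standard-deviation rule does flag 1000 in this sample.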


## Boxplot rule

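We can avoid masking by replacing the mean and standard deviation with measures of location and dispersion that are relatively insensitive to outliers.

X is declared an outlier if either of the following holds, where q_1 and q_2 are the lower and upper quartiles (so q_2 - q_1 is the interquartile range):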

```
X < q_1 - 1.5*(q_2 - q_1)
X > q_2 + 1.5*(q_2 - q_1)
```

So we are declaring outliers based on a more robust criterion.
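A minimal base R sketch of the boxplot rule follows. It assumes R's default quartile definition in `quantile()` (type 7) rather than the ideal fourths, so the fences may differ slightly from those of `outbox`; the data are made up for illustration.

```r
# Illustrative data (not from the source)
x <- c(2, 2, 3, 3, 3, 4, 4, 1000)

# Lower and upper quartiles, using quantile()'s default definition (type 7)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
q1 <- q[1]
q2 <- q[2]
iqr <- q2 - q1

# Fences at 1.5 interquartile ranges beyond the quartiles
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q2 + 1.5 * iqr

outliers <- x[x < lower_fence | x > upper_fence]
```

The extreme value 1000 barely moves the quartiles, so it lands far above the upper fence and is flagged.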


## MAD-Median rule

The boxplot rule often suffices, as more than 25% of the values must be outliers before masking becomes an issue. But it can break down when there is a large number of outliers.

In that case, we can use the MAD-median rule.

```
abs(X - M) / MADN > 2.24
```

Explanation: the absolute difference between a value X and the median M, divided by MADN (MAD/0.6745). X is declared an outlier if this ratio exceeds 2.24.
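Applied to the nine values from the MAD example, the rule can be sketched in base R as follows. The variable names are illustrative, and `mad()` from the `stats` package is used for MADN.

```r
# Values from the MAD example
x <- c(12, 45, 23, 79, 19, 92, 30, 58, 132)

M <- median(x)                 # 45
MADN <- mad(x)                 # 1.4826 * MAD, approximately MAD / 0.6745

score <- abs(x - M) / MADN     # distance from the median in MADN units
outliers <- x[score > 2.24]    # values flagged by the MAD-median rule
```

Only 132 is flagged: its score is about 2.26, just above the 2.24 cutoff.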

### R functions outms, outbox and out

Outliers using mean and standard deviation (not recommended):

```
outms(x, crit=2)
```

Check using box method:

```
outbox(x)
```

Check using MAD-median:

```
out(x)
```

# Skipped measures of location

These are based on the strategy of removing outliers and computing the mean of the remaining data. This was illustrated here using the boxplot rule.

Removing outliers based on the MAD-median rule and averaging the remaining values is called the modified one-step M-estimator (MOM). It apparently has excellent statistical properties.
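The MOM idea as described above can be sketched in a few lines of base R. The helper `mom_sketch()` is hypothetical and only mirrors the description (drop values flagged by the MAD-median rule, then average); it is not a packaged implementation.

```r
# Hypothetical helper: average the values not flagged by the MAD-median rule.
# Assumes mad(x) > 0, otherwise the ratio below is undefined.
mom_sketch <- function(x, crit = 2.24) {
  keep <- abs(x - median(x)) / mad(x) <= crit
  mean(x[keep])
}

# Values from the MAD example
x <- c(12, 45, 23, 79, 19, 92, 30, 58, 132)

mom_value <- mom_sketch(x)   # mean after dropping flagged values
plain_mean <- mean(x)        # ordinary sample mean, for comparison
```

For the example data only 132 is dropped, giving 44.75, compared with an ordinary mean of about 54.4.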

# Summary

* Several measures are covered here. Sensitivity to outliers is one factor to consider when picking a method.
* The sample mean is highly sensitive to outliers.
* The median is highly insensitive to outliers, but it has some negative characteristics yet to be described.
* The trimmed mean lies between the two extremes.
* The sample variance is highly sensitive to outliers, which in turn can mask actual outliers when checking for them.
* The interquartile range measures variability without being sensitive to the more extreme values. It is suitable for outlier detection.
* The 20% Winsorized variance also measures variation without being sensitive to extreme values, but it is too soon to explain its practical importance.