# Outliers:

## Definition:
An outlier is a data point or record in a dataset that is distinct or distant from the other observations by a significant amount. As a result, outliers lie outside the densely populated regions of the data.

## How to identify an outlier?
- IQR (interquartile range) method -> a data point is an outlier if it lies more than 1.5 x IQR above the third quartile or more than 1.5 x IQR below the first quartile.

- Z score -> a data point is an outlier if it lies more than 3 standard deviations away from the mean.

## Outliers are present due to:
- Variability in the data.
- Errors in the data (for example, measurement or data-entry errors).

## Effects of Outliers
- Can shift the mean and standard deviation by a significant amount.
- Can cause issues during statistical analysis.
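
To make the first point concrete, here is a minimal sketch that compares the mean and standard deviation of a small sample with and without one extreme value. The sample values and the helper name `summarize` are made up for illustration and are not part of the notebook's dataset.

```python
import numpy as np

def summarize(values):
    """Return the mean and population standard deviation of `values`."""
    return float(np.mean(values)), float(np.std(values))

typical = [10, 12, 11, 13, 12]   # hypothetical readings
with_outlier = typical + [100]   # same readings plus one extreme value

for label, values in [("without outlier", typical), ("with outlier", with_outlier)]:
    mean, std_dev = summarize(values)
    print(f"{label}: mean={mean:.2f}, std_dev={std_dev:.2f}")
```

A single extreme value more than doubles the mean of this sample and inflates the standard deviation far more, which is why outliers are worth checking for before summarizing data.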

## Tools to use for finding outliers
- Box plot
- Scatter plot
- IQR
- Z Score


In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [22]:
dataset = [
    11, 10, 12, 14, 12, 2, 14, 12, 15, 14,
    13, 15, 102, 12, 14, 17, 19, 2, 14, 12,
    107, 10, 13, 12, 14, 12, 108, 12, 11, 14,
    13, 15, 10, 15, 12, 10, 14, 13, 15, 10,
]

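### Using Z score
z = (x - mean) / std_dev, where mean and std_dev are the mean and standard deviation of the dataset. The z score tells how many standard deviations a data point x lies from the mean.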

In [23]:
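def detect_outliers(data):
    # Build the result list inside the function: a module-level list would
    # keep accumulating duplicates every time the function is called again.
    outliers = []
    threshold = 3  # flag points more than 3 standard deviations from the mean
    mean = np.mean(data)
    std_dev = np.std(data)  # population standard deviation (ddof=0)

    for observation in data:
        z_score = (observation - mean) / std_dev
        if np.abs(z_score) > threshold:
            outliers.append(observation)

    return outliers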

In [24]:
outlier_data=detect_outliers(dataset)
outlier_data

[102, 107, 108]
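
The same z-score check can also be written without an explicit loop. The following is a minimal sketch using NumPy array operations; the function name `detect_outliers_vectorized` and its `threshold` parameter are illustrative choices, not something defined elsewhere in this notebook.

```python
import numpy as np

def detect_outliers_vectorized(data, threshold=3):
    """Return the values whose absolute z score exceeds `threshold`."""
    values = np.asarray(data, dtype=float)
    # Population standard deviation (ddof=0), matching np.std's default.
    z_scores = (values - values.mean()) / values.std()
    return values[np.abs(z_scores) > threshold]

dataset = [
    11, 10, 12, 14, 12, 2, 14, 12, 15, 14,
    13, 15, 102, 12, 14, 17, 19, 2, 14, 12,
    107, 10, 13, 12, 14, 12, 108, 12, 11, 14,
    13, 15, 10, 15, 12, 10, 14, 13, 15, 10,
]
print(detect_outliers_vectorized(dataset))
```

The result is a NumPy array of floats rather than a list of the original integers. The sketch assumes the data are not all identical; a standard deviation of zero would make the division undefined.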

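### IQR -> Interquartile Range
The IQR is the spread of the middle 50% of the dataset, from the 25th percentile (q1) to the 75th percentile (q3).

- Steps
    1. Arrange the data in increasing order
    2. Calculate the first (q1) and third (q3) quartiles
    3. Find the interquartile range: IQR = q3 - q1
    4. Find the lower bound: q1 - 1.5 * IQR
    5. Find the upper bound: q3 + 1.5 * IQR
    6. Treat any value below the lower bound or above the upper bound as an outlier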

In [25]:
## Perform all the steps of IQR
sorted(dataset)


[2,
 2,
 10,
 10,
 10,
 10,
 10,
 11,
 11,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 13,
 13,
 13,
 13,
 14,
 14,
 14,
 14,
 14,
 14,
 14,
 14,
 15,
 15,
 15,
 15,
 15,
 17,
 19,
 102,
 107,
 108]

In [26]:
quantile1, quantile3= np.percentile(dataset,[25,75])

In [27]:
print(quantile1,quantile3)

12.0 14.25


In [28]:

## Find the IQR

iqr_value=quantile3-quantile1
print(iqr_value)

2.25


In [30]:
# Finding upper and lower bound values

lower_val=quantile1 - (1.5 * iqr_value)
upper_val=quantile3 + (1.5 * iqr_value)
print(lower_val,upper_val)

8.625 17.625
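
With the bounds computed, the remaining step is to pick out the values that fall outside them. The sketch below wraps the whole IQR procedure in one function; the name `iqr_outliers` and the multiplier parameter `k` are assumptions made for this example.

```python
import numpy as np

def iqr_outliers(data, k=1.5):
    """Return (lower_bound, upper_bound, outliers) using the k * IQR rule."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep every value that lies strictly outside [lower, upper].
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

dataset = [
    11, 10, 12, 14, 12, 2, 14, 12, 15, 14,
    13, 15, 102, 12, 14, 17, 19, 2, 14, 12,
    107, 10, 13, 12, 14, 12, 108, 12, 11, 14,
    13, 15, 10, 15, 12, 10, 14, 13, 15, 10,
]
lower_val, upper_val, outlier_data = iqr_outliers(dataset)
print(lower_val, upper_val)
print(sorted(outlier_data))
```

On this dataset the IQR rule flags 2, 2 and 19 in addition to 102, 107 and 108. The z-score method missed the smaller values because the three extreme points inflate the mean and standard deviation that it relies on.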
