# Handling Unexpected Values
**Outliers are values that fall outside the expected range of a feature.
Common causes of outliers include:
· Human errors during data entry;
· Measurement (instrument) errors;
· Experimental errors during data extraction or manipulation; and
· Intentional errors introduced to test the accuracy of outlier
detection methods.**

*During data preparation and analysis, we have to detect
unexpected values within a data structure.*

In [2]:
# Example data set
import pandas as pd

student_frame = pd.DataFrame({
    'Student Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Sex': ['M', 'F', 'M', 'F', 'F', 'M', 'M'],
    'Age': [10, 14, 60, 15, 16, 15, 11],
    'School': ['Primary', 'High', 'High', 'High', 'High', 'High', 'Primary'],
})
student_frame

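| | Student Name | Sex | Age | School |
|---|---|---|---|---|
| 0 | A | M | 10 | Primary |
| 1 | B | F | 14 | High |
| 2 | C | M | 60 | High |
| 3 | D | F | 15 | High |
| 4 | E | F | 16 | High |
| 5 | F | M | 15 | High |
| 6 | G | M | 11 | Primary |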


The age of student C, 60 years, is an unexpected value for a school student.
It is most likely an error and is treated as an outlier.
We use the `describe()` function to get summary statistics for the numeric columns of the table.

In [3]:
student_frame.describe()

| | Age |
|---|---|
| count | 7.0 |
| mean | 20.142857 |
| std | 17.71467 |
| min | 10.0 |
| 25% | 12.5 |
| 50% | 15.0 |
| 75% | 15.5 |
| max | 60.0 |


The presence of outliers shifts the statistics. We expect the
mean age of the students to be close to the median, which is 15 years.
However, the mean age is 20.142857 because of the single
outlier, 60 years. Without that value, the mean of the remaining six ages would be 13.5.
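
To make the shift concrete, the following minimal sketch recomputes the mean with and without the value 60 and compares it with the median. It rebuilds the Age column as a standalone Series; the variable names (`ages`, `mean_all`, `median_all`, `mean_without_60`) are chosen here for illustration only.

```python
import pandas as pd

# Ages copied from the example table above
ages = pd.Series([10, 14, 60, 15, 16, 15, 11], name="Age")

mean_all = ages.mean()                      # pulled upward by the value 60
median_all = ages.median()                  # middle value, barely affected
mean_without_60 = ages[ages != 60].mean()   # mean of the remaining six ages

print(f"mean (all rows):    {mean_all:.2f}")
print(f"median (all rows):  {median_all:.2f}")
print(f"mean (60 excluded): {mean_without_60:.2f}")
```

The median stays at 15 whether or not the value 60 is included, which is why it is often preferred as a summary when outliers may be present.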

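To detect outliers with the interquartile range (IQR):\
Sort your data from low to high.\
Identify the first quartile (Q1), the median, and the third quartile (Q3).\
Calculate the IQR = Q3 - Q1.\
Calculate the upper fence = Q3 + (1.5 * IQR).\
Calculate the lower fence = Q1 - (1.5 * IQR).

For the ages above:\
IQR = 15.5 - 12.5 = 3\
Upper fence = 15.5 + (1.5 * 3) = 20\
Lower fence = 12.5 - (1.5 * 3) = 8\
60 > 20, so 60 is an outlier.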
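
The steps above can be traced by hand in plain Python. This is only a sketch: the helper `quantile_linear` is a hypothetical name introduced here, and it assumes linear interpolation between neighbouring sorted values, which is the default quantile method in pandas and NumPy and reproduces the 12.5 and 15.5 reported by `describe()`.

```python
import math

def quantile_linear(sorted_values, q):
    """Quantile with linear interpolation between neighbouring sorted values."""
    pos = q * (len(sorted_values) - 1)   # fractional position in the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

# Step 1: sort the data from low to high
ages = sorted([10, 14, 60, 15, 16, 15, 11])

# Step 2: identify Q1 and Q3
q1 = quantile_linear(ages, 0.25)
q3 = quantile_linear(ages, 0.75)

# Steps 3 to 5: IQR and the two fences
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers
outliers = [a for a in ages if a < lower_fence or a > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```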

In [7]:
# numeric_only=True skips the text columns (Student Name, Sex, School)
Q1 = student_frame.quantile(0.25, numeric_only=True)   # 25% (quartile 1)
Q3 = student_frame.quantile(0.75, numeric_only=True)   # 75% (quartile 3)

IQR = Q3 - Q1
IQR_mult = IQR * 1.5

lower_bound = Q1 - IQR_mult
higher_bound = Q3 + IQR_mult
print(higher_bound)

print(f" The lower limit is  {lower_bound}\n The higher limit is  {higher_bound}")

Age    20.0
dtype: float64
 The lower limit is  Age    8.0
dtype: float64
 The higher limit is  Age    20.0
dtype: float64


**Any value less than 8 or greater than 20 can now be treated
as an outlier.**
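
Before removing anything, it can help to look at which rows fall outside the fences. The sketch below hard-codes the fences 8 and 20 computed above and rebuilds a reduced copy of the table; the names `is_outlier` and `outliers` are introduced here for illustration.

```python
import pandas as pd

# Reduced copy of the example table (only the columns needed here)
student_frame = pd.DataFrame({
    "Student Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Age": [10, 14, 60, 15, 16, 15, 11],
})

lower_fence, upper_fence = 8.0, 20.0   # fences computed above

# True for every row whose age lies below the lower or above the upper fence
is_outlier = (student_frame["Age"] < lower_fence) | (student_frame["Age"] > upper_fence)

outliers = student_frame[is_outlier]
print(outliers)
```

For this data set the mask selects a single row, student C with an age of 60.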

We can now filter our DataFrame, student_frame, to
remove outliers. We access the column Age using
`student_frame['Age']` and keep only the rows whose age lies between the lower and upper bounds.

In [22]:
# lower_bound and higher_bound are Series, so select the Age entry from each
age = student_frame["Age"]
within_bounds = (age >= lower_bound["Age"]) & (age <= higher_bound["Age"])

# Keep only the rows that lie inside both bounds
student = student_frame[within_bounds]
student
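
A filter of this kind can also be wrapped in a small reusable function. The sketch below is one possible way to do it, not the only one: `drop_iqr_outliers` and its parameters `frame`, `column` and `k` are hypothetical names introduced here, and it relies on `Series.between`, which includes both bounds by default, so a value lying exactly on a fence is kept.

```python
import pandas as pd

def drop_iqr_outliers(frame, column, k=1.5):
    """Return only the rows whose `column` value lies within the IQR fences."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() is inclusive on both ends by default
    return frame[frame[column].between(lower, upper)]

student_frame = pd.DataFrame({
    "Student Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Sex": ["M", "F", "M", "F", "F", "M", "M"],
    "Age": [10, 14, 60, 15, 16, 15, 11],
    "School": ["Primary", "High", "High", "High", "High", "High", "Primary"],
})

cleaned = drop_iqr_outliers(student_frame, "Age")
print(cleaned)
```

Working on a single column returns plain numbers for the fences, which avoids comparing a column against a whole Series of bounds.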