<a href="https://www.kaggle.com/code/kyeongsupchoi/local-outlier-factor?scriptVersionId=132582404" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<a id="introduction"></a>
* [Introduction](#introduction)
* [Step 1: Exploratory Data Analysis](#step-one)
* [Step 2: Model Training](#step-two)
* [Step 3: Model Validation](#step-three)

The Local Outlier Factor (LOF) algorithm is an unsupervised anomaly detection method that measures how much the local density of a data point deviates from that of its neighbors. Samples whose density is substantially lower than that of their neighbors are treated as outliers.
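
To make "local density deviation" concrete, here is a minimal from-scratch sketch of the LOF score in plain Python. It is not scikit-learn's implementation: the function name `lof_scores`, the toy points, and the choice `k=2` are assumptions made for illustration, and ties at the k-th neighbor distance and duplicate points are ignored for brevity.

```python
import math


def lof_scores(points, k):
    """Return a simplified LOF score per point (higher = more anomalous)."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # For each point: its k nearest neighbors (excluding itself)
    # and the distance to the k-th of them.
    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability
    # distance from the point to its neighbors.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Toy 1-D data (hypothetical): three nearby points and one far away.
points = [(-1.1,), (0.2,), (101.1,), (0.3,)]
scores = lof_scores(points, k=2)
print([round(s, 2) for s in scores])  # [0.98, 1.04, 73.37, 0.98]
```

A score near 1 means a point is about as dense as its neighbors, while a score well above 1 marks a point in a much sparser region. scikit-learn exposes the negated score as `negative_outlier_factor_`.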

In [1]:
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

<a id="step-one"></a>
# **Step 1: Exploratory Data Analysis** 

Read the data from the CSV file and load it into a DataFrame.

In [2]:
# Load dataframe from csv
df = pd.read_csv("/kaggle/input/insurance/insurance.csv")

# Select the numeric columns (age and children are integers; bmi and charges are floats)
X = df[['age', 'bmi', 'children', 'charges']]

# Show X
X

| | age | bmi | children | charges |
|---|---|---|---|---|
| 0 | 19 | 27.900 | 0 | 16884.92400 |
| 1 | 18 | 33.770 | 1 | 1725.55230 |
| 2 | 28 | 33.000 | 3 | 4449.46200 |
| 3 | 33 | 22.705 | 0 | 21984.47061 |
| 4 | 32 | 28.880 | 0 | 3866.85520 |
| ... | ... | ... | ... | ... |
| 1333 | 50 | 30.970 | 3 | 10600.54830 |
| 1334 | 18 | 31.920 | 0 | 2205.98080 |
| 1335 | 18 | 36.850 | 0 | 1629.83350 |
| 1336 | 21 | 25.800 | 0 | 2007.94500 |


<a id="step-two"></a>
# **Step 2: Model Training**  

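Fit the initial model. `fit_predict` labels each row as 1 (inlier) or -1 (outlier), and `contamination=0.1` sets the expected share of outliers to 10%.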

In [3]:
# Initialize Model
clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)

# Fit the model and label each row: 1 = inlier, -1 = outlier
y_pred = clf.fit_predict(X)

# Load the labels into a dataframe that shares X's index
y = pd.DataFrame(y_pred, columns=['outlier'], index=X.index)

# Show y
y

| | outlier |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| ... | ... |
| 1333 | 1 |
| 1334 | 1 |
| 1335 | 1 |
| 1336 | 1 |
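
The training cell turns LOF scores into 1 / -1 labels through the `contamination` parameter. The sketch below illustrates that link on synthetic stand-in data (an assumption, not the insurance dataset): it reads the fitted `negative_outlier_factor_` and `offset_` attributes and re-derives the labels by hand. The names `X_demo`, `lof`, and `manual_labels` are hypothetical.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in data (an assumption, not the insurance dataset):
# 190 points around the origin plus 10 points scattered far away.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(190, 2)),
    rng.uniform(8.0, 12.0, size=(10, 2)),
])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels = lof.fit_predict(X_demo)

# Negated LOF scores: near -1 for inliers, far below -1 for outliers.
scores = lof.negative_outlier_factor_

# Score threshold that scikit-learn derives from `contamination`.
threshold = lof.offset_

# Rows scoring below the threshold are the ones labeled -1.
manual_labels = np.where(scores < threshold, -1, 1)
n_flagged = int((labels == -1).sum())
print(n_flagged, "of", len(labels), "rows flagged; threshold =", threshold)
```

Because the threshold is a percentile of the scores, roughly 10% of rows are flagged whatever the data looks like. The flagged share therefore reflects the chosen `contamination`, not evidence of how many anomalies exist.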


In [4]:
# Concatenate X and y column-wise to make z
z = pd.concat([X, y], axis=1)

<a id="step-three"></a>
# **Step 3: Model Validation**

In [5]:
# Show only the outliers
z.loc[z['outlier'] == -1] 

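| | age | bmi | children | charges | outlier |
|---|---|---|---|---|---|
| 12 | 23 | 34.400 | 0 | 1826.84300 | -1 |
| 15 | 19 | 24.600 | 1 | 1837.23700 | -1 |
| 22 | 18 | 34.100 | 0 | 1137.01100 | -1 |
| 34 | 28 | 36.400 | 1 | 51194.55914 | -1 |
| 37 | 26 | 20.800 | 0 | 2302.30000 | -1 |
| ... | ... | ... | ... | ... | ... |
| 1292 | 21 | 23.210 | 0 | 1515.34490 | -1 |
| 1300 | 45 | 30.360 | 0 | 62592.87309 | -1 |
| 1317 | 18 | 53.130 | 0 | 1163.46270 | -1 |
| 1324 | 31 | 25.935 | 1 | 4239.89265 | -1 |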
