# Demo of the Median Algorithm
Written by: Lukas Gust

This algorithm aims to detect outliers (anomalies) in spatial data. The algorithm is not my own design, but this implementation is; the source is cited below.

This is only a demo: both the implementation and the methodology for running it still need tweaking. The demo considers only what we call a <u>single attribute</u>; [Chen](#chen) defines a method for multiple attributes.

The implementation currently works only for total accumulated precipitation. Let's begin.

## The data
As before, we look at the Dugway station cluster because it has some outliers that are easy to spot by eye. We use the API to get the accumulated precipitation from March 1st 2018 at 00:00 to October 14th 2018 at 00:00 for all stations within a 50 mile radius of DPG03 (the `radius='dpg03,50'` argument below); the desert is vast and rural, after all. Here is the code that does this, using some helper tools that I've written.

In [1]:
import sys
sys.path.append('..')
from MesoPy import Meso
from MesoTools.MesoDataframes import precip_dataframe, precip_meso_knn
from median_algorithm.PrecipMSAD import PrecipMSAD

In [2]:
m = Meso(token='demotoken')
df = precip_dataframe(m, '201803010000', '201810140000', pmode='totals', radius='dpg03,50',
                      timeformat='%s')
df

| STID | ACCUM_226_DAYS[mm] |
|---|---|
| DPG01 | 107.95 |
| DPG02 | 88.39 |
| DPG04 | 0.0 |
| DPG05 | 151.38 |
| DPG06 | 114.05 |
| DPG07 | 113.28 |
| DPG08 | 113.03 |
| DPG09 | 114.05 |
| DPG10 | 99.57 |
| DPG11 | 63.25 |


As you can see, there are a few obvious outliers: DPG04, DPG23, DPG30, and DPG31. Their totals are either far lower or far higher than those of the surrounding stations. Now we will run the detection algorithm to find out which of these are spatial outliers.

## The detection
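
Before running the detector, it may help to see roughly what a median-based spatial test computes. The sketch below is my paraphrase of the single-attribute idea, not the `PrecipMSAD` code: each value is compared with the median of its `k` nearest neighbors, the differences are standardized, and points whose standardized difference exceeds a threshold are flagged. The function name `median_spatial_outliers`, the threshold `theta`, the planar distance, and the synthetic grid are all assumptions made for illustration.

```python
import numpy as np


def median_spatial_outliers(coords, values, k=8, theta=2.0):
    """Flag points whose value differs strongly from the median of their k nearest neighbors.

    Illustrative sketch only: planar distances and the theta cutoff are assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Pairwise planar distances; a point must not count as its own neighbor.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors of every point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Neighborhood median and each point's difference from it.
    neighborhood_median = np.median(values[neighbors], axis=1)
    h = values - neighborhood_median

    # Standardize the differences; with zero spread nothing can stand out.
    sigma = h.std(ddof=1)
    if sigma == 0:
        return np.array([], dtype=int), np.zeros_like(h)
    scores = np.abs((h - h.mean()) / sigma)
    return np.flatnonzero(scores > theta), scores


# Synthetic 5x5 grid of "stations" that all report 100 mm, except the center one.
xs, ys = np.meshgrid(np.arange(5), np.arange(5))
coords = np.column_stack([xs.ravel(), ys.ravel()])
values = np.full(25, 100.0)
values[12] = 0.0  # the center station reports nothing

flagged, scores = median_spatial_outliers(coords, values, k=8, theta=2.0)
print(flagged)  # indices of the flagged points
```

The median is used for the neighborhood summary because a single bad neighbor barely moves it, so one faulty gauge does not drag its neighbors into the flagged set. Real station data would need geographic distances rather than the planar ones used here.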

In [3]:
d = PrecipMSAD()
d.fit(m, k=8, start='201803010000', end='201810140000', pmode='totals', radius='dpg03,50',
      timeformat='%s')

d.detect()

| STID | ACCUM_226_DAYS[mm] |
|---|---|
| DPG04 | 0.0 |
| DPG13 | 0.0 |
| DPG20 | 0.25 |
| MFKU1 | 589.28 |
| DPG23 | 28.7 |
| DPG30 | 49.53 |
| DPG31 | 463.55 |
| TAKU1 | 330.2 |
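
To compare the detector's output with the stations picked out by eye earlier, a small set comparison is enough. The station IDs are copied from the text and the output above; the variable names are only illustrative.

```python
# Stations picked out by eye in the data section.
eyeballed = {"DPG04", "DPG23", "DPG30", "DPG31"}

# Stations returned by d.detect() above.
detected = {"DPG04", "DPG13", "DPG20", "MFKU1", "DPG23", "DPG30", "DPG31", "TAKU1"}

confirmed = sorted(eyeballed & detected)   # spotted by eye and flagged
missed = sorted(eyeballed - detected)      # spotted by eye but not flagged
additional = sorted(detected - eyeballed)  # flagged but not spotted by eye

print("confirmed: ", confirmed)
print("missed:    ", missed)
print("additional:", additional)
```

All four stations picked out by eye are flagged, and the detector adds DPG13, DPG20, MFKU1, and TAKU1.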


### Sources <a id='chen'></a>
Chen, D., Lu, CT., Kou, Y. et al. Geoinformatica (2008) 12: 455.
[https://doi.org/10.1007/s10707-007-0038-8](https://doi.org/10.1007/s10707-007-0038-8)