# Unsupervised Anomaly Detection: Schools Performance 2016-17
## Objective
The focus of this notebook is to investigate **unsupervised anomaly detection** methods on the **Schools Performance 2016-17** data. We are particularly interested in spotting anomalous pupil destination data.

## Definitions
The terms used in this project are defined below:
+ **Unsupervised data:** Data that is not labelled. *e.g. you have income data but do not know whether it is correct*
+ **Anomalous data:** Unusual, and typically wrong, observations/data points in your data.
+ **Unsupervised anomaly detection:** Spotting anomalous observations in your data without any labelled data to tell you whether your predictions are correct.
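
To make the last definition concrete, here is a minimal sketch of one simple unsupervised technique, the modified z-score based on the median absolute deviation (MAD). It is an illustration only and not necessarily the method used later in this notebook; the function name `flag_anomalies`, the threshold of 3.5 and the sample values are all assumptions made up for this example.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    No labels are needed: each point is judged only against the
    median and spread of the data itself.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    # 0.6745 rescales the MAD so that scores are comparable to ordinary
    # z-scores when the data is roughly normally distributed.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up percentages, for illustration only (not taken from the real data).
sample = [92.0, 94.5, 93.0, 95.0, 91.5, 12.0, 94.0]
print(flag_anomalies(sample))
```

Because there are no labels, a flagged index is only a candidate for manual review; nothing here confirms that the value is actually wrong.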

## Data
The data for this project comes from the [Schools Performance 2016-17](https://www.compare-school-performance.service.gov.uk/) service. The data here is very rich and detailed. For the sake of demonstration and simplicity, we will only consider a subset of the fields, defined below:
+ **URN** | Unique reference number for schools
+ **LEA** | Local authority, as a code
+ **ESTAB** | Establishment number
    - *Note that combining the LEA and ESTAB fields to give LAESTAB gives us another unique identifier for schools, akin to our URN field*
+ **SCHNAME** | School name
+ **NFTYPE** | School type
    - AC = Sponsored academy
    - ACC = Academy converter – mainstream
    - AC1619 = Academy 16-19 sponsor led
    - ACC1619 = Academy 16-19 converter
    - ACCS = Academy converter - special school
    - ACS = Sponsored special academy
    - CTC = City technology college
    - CY = Community school
    - CYS = Community special school
    - F = Free school – mainstream
    - FESI = Further Education Sector Institution
    - FD = Foundation school
    - FDS = Foundation special school
    - FS = Free school – special
    - FSS = Studio school
    - FUTC = UTC (university technical college)
    - F1619 = Free school - 16-19
    - IND = Independent school
    - INDSPEC = Independent special school
    - MODFC = College funded by Ministry of Defence
    - NMSS = Non-maintained special school
    - VA = Voluntary aided school
    - VC = Voluntary controlled school
+ **OVERALL_DESTPER** | Percentage of pupils who have been in a sustained education or employment destination for the first two terms, October 2015 to March 2016. 
+ **PTEALGRP2** | Percentage of eligible pupils with English as an additional language (EAL)
+ **PTMOBN** | Percentage of pupils classified as non-mobile
+ **PTRWM_EXP** | Percentage of pupils reaching the expected standard in reading, writing and maths
+ **PSENELST** | Percentage of eligible pupils with special educational needs (SEN)
+ **PTFSM6CLA1A_16** | Percentage of KS2 disadvantaged pupils
+ **PNUMFSM** | Percentage of pupils eligible for free school meals

The anomalous pupil destination data that we are interested in spotting is therefore found within the **OVERALL_DESTPER** field. Note that this information is *provisional* so it is **unlabelled**.
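
As a rough sketch of how this subset could be pulled out of a raw extract, the snippet below keeps only the fields listed above and builds the combined LAESTAB identifier. It assumes the extract is a CSV with those column names and that LAESTAB is simply the LEA code followed by the ESTAB number; the helper `select_fields` and the sample row are made up for illustration and are not real school records.

```python
import csv
import io

# The subset of fields described in the list above.
FIELDS = [
    "URN", "LEA", "ESTAB", "SCHNAME", "NFTYPE", "OVERALL_DESTPER",
    "PTEALGRP2", "PTMOBN", "PTRWM_EXP", "PSENELST", "PTFSM6CLA1A_16", "PNUMFSM",
]


def select_fields(csv_file):
    """Keep only FIELDS from each row and add a combined LAESTAB identifier."""
    rows = []
    for record in csv.DictReader(csv_file):
        # Columns missing from the file are kept as empty strings.
        row = {name: record.get(name, "") for name in FIELDS}
        # Assumption: LAESTAB is the LEA code followed by the ESTAB number.
        row["LAESTAB"] = row["LEA"] + row["ESTAB"]
        rows.append(row)
    return rows


# Made-up sample with one extra column that should be dropped.
sample_csv = io.StringIO(
    "URN,LEA,ESTAB,SCHNAME,NFTYPE,OVERALL_DESTPER,EXTRA\n"
    "100000,201,3614,Example School,CY,95,ignored\n"
)
subset = select_fields(sample_csv)
print(subset[0]["LAESTAB"])
```

Values are left as strings here; converting the percentage fields to numbers is a separate step, since the real extract may contain non-numeric markers.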

## Set-up
We need to set up our Jupyter notebook so that it has the required libraries, and record the environment so that others can use the same one when this notebook is shared.

By default, `!conda install` installs the package into the environment that the kernel is running in. This is normally the environment from which we started Jupyter notebook, but we can check by looking at some of the system variables from the `sys` module.

In [2]:
import sys
sys.executable

'C:\\Users\\a_vis\\Anaconda3\\python.exe'
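
Alongside `sys.executable`, `sys.prefix` gives the root directory of the environment the kernel runs in. The sketch below prints it and builds a pip command that is tied to the kernel's own interpreter, which is one way to avoid installing into a different environment by accident. The helper name `kernel_pip_install_command` is hypothetical, and this is an alternative to the `!conda install` approach used in this notebook rather than a description of it.

```python
import sys


def kernel_pip_install_command(package):
    """Build a pip command that targets the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


# Root directory of the environment the kernel is running in.
print(sys.prefix)

# The command is only built here, not executed.
print(kernel_pip_install_command("seaborn"))
```

To actually run the install, the list can be passed to `subprocess.run(..., check=True)`.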

In [3]:
# Use pip freeze to examine installed packages and versions within our Jupyter session
!pip freeze

alabaster==0.7.9
anaconda-clean==1.0
anaconda-client==1.6.3
anaconda-navigator==1.6.4
anaconda-project==0.6.0
argcomplete==1.0.0
astroid==1.4.7
astropy==1.2.1
Babel==2.3.4
backports.shutil-get-terminal-size==1.0.0
beautifulsoup4==4.5.1
bitarray==0.8.1
blaze==0.10.1
bokeh==0.12.2
boto==2.42.0
Bottleneck==1.1.0
cffi==1.7.0
chardet==3.0.4
chest==0.2.3
click==6.6
cloudpickle==0.2.1
clyent==1.2.2
colorama==0.3.7
comtypes==1.1.2
conda==4.5.0
conda-build==2.0.2
configobj==5.0.6
contextlib2==0.5.3
cryptography==1.5
cycler==0.10.0
Cython==0.24.1
cytoolz==0.8.0
dask==0.11.0
datashape==0.5.2
decorator==4.0.10
dill==0.2.5
docutils==0.12
dynd===c328ab7
et-xmlfile==1.0.1
fastcache==1.0.2
filelock==2.0.6
Flask==0.11.1
Flask-Cors==2.1.2
gevent==1.1.2
greenlet==0.4.10
h5py==2.7.1
HeapDict==1.0.0
idna==2.1
imagesize==0.7.1
ipykernel==4.5.0
ipython==5.1.0
ipython-genutils==0.1.0
ipywidgets==5.2.2
itsdangerous==0.24
jdcal==1.2
jedi==0.9.0
Jinja2==2.8
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.4.0


You are using pip version 8.1.2, however version 9.0.3 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
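
Rather than scanning the `pip freeze` listing by eye, a small check like the following sketch can report which modules the kernel cannot import. The helper `missing_packages` is hypothetical, and it works on import names, which can differ from the distribution names shown by `pip freeze`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found by the kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" is part of the standard library; "seaborn" may or may not be installed.
print(missing_packages(["seaborn", "json"]))
```

Anything the check returns is a candidate for installation in the next cell.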


In [5]:
# Install libraries not already listed above
!conda install seaborn -y

Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\a_vis\Anaconda3

  added / updated specs: 
    - seaborn


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    seaborn-0.8.1              |   py35hc73483e_0         338 KB
    openssl-1.0.2o             |       h8ea7d77_0         5.4 MB
    ------------------------------------------------------------
                                           Total:         5.7 MB

The following NEW packages will be INSTALLED:

    seaborn:         0.8.1-py35hc73483e_0            

The following packages will be UPDATED:

    ca-certificates: 2018.1.18-0          conda-forge --> 2018.03.07-0     
    openssl:         1.0.2n-vc14_0        conda-forge [vc14] --> 1.0.2o-h8ea7d77_0


Downloading and Extracting Packages
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transact


