### New Analysis method ###

*Create a classifier for different kinds of plankton using supervised machine learning* 

Executing this Notebook requires a personal STOQS database. Follow the [steps to build your own development system](https://github.com/stoqs/stoqs/blob/master/README.md); this takes about an hour and depends on a good Internet connection. Once your server is up, log into it (after a `cd ~/Vagrants/stoqsvm`) and activate your virtual environment with the usual commands:

    vagrant ssh -- -X
    cd /vagrant/dev/stoqsgit
    source venv-stoqs/bin/activate
    
Then load the `stoqs_september2013` database with the commands:

    cd stoqs
    ln -s mbari_campaigns.py campaigns.py
    export DATABASE_URL=postgis://stoqsadm:CHANGEME@127.0.0.1:5432/stoqs
    loaders/load.py --db stoqs_september2013
    loaders/load.py --db stoqs_september2013 --updateprovenance
   
Loading this database can take over a day, as there are over 40 million measurements from 22 different platforms. You may want to edit the `stoqs/loaders/CANON/loadCANON_september2013.py` file and comment out all but the `loadDorado()` method calls at the end of the file. You can also set a stride value or use the `--test` option to create a `stoqs_september2013_t` database, in which case you'll need to set the `STOQS_CAMPAIGNS` environment variable:

    export STOQS_CAMPAIGNS=stoqs_september2013_t

Use the `stoqs/contrib/analysis/classify.py` script to create some labeled data that we will learn from:

    contrib/analysis/classify.py --createLabels --groupName Plankton \
        --database stoqs_september2013 --platform dorado \
        --start 20130916T124035 --end 20130919T233905 \
        --inputs bbp700 fl700_uncorr --discriminator salinity \
        --labels diatom dino1 dino2 sediment \
        --mins 33.33 33.65 33.70 33.75 --maxes 33.65 33.70 33.75 33.93 --clobber -v

A little explanation is probably warranted here. The Dorado missions on 16-19 September 2013 sampled distinct water types in Monterey Bay that are easily identified by ranges of salinity. These water types contain different kinds of particles as identified by bbp700 (backscatter) and fl700_uncorr (chlorophyll). The previous command "labeled" MeasuredParameters in the database according to our understanding of the optical properties of diatoms, dinoflagellates, and sediment. This works for this data set because of the particular oceanographic conditions at the time.
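
As a rough illustration of the labeling rule, the sketch below assigns one of the four labels to a salinity value using the `--mins` and `--maxes` given to `classify.py` above. The function name `label_for_salinity` is hypothetical, and treating each range as half-open (`min <= salinity < max`) is an assumption; `classify.py` may handle the boundaries differently.

```python
# Labels and salinity ranges copied from the classify.py command above
LABELS = ['diatom', 'dino1', 'dino2', 'sediment']
MINS = [33.33, 33.65, 33.70, 33.75]
MAXES = [33.65, 33.70, 33.75, 33.93]


def label_for_salinity(salinity):
    """Return the label whose salinity range contains the value, or None.

    Hypothetical helper: the half-open interval [min, max) is an assumption,
    not necessarily how classify.py treats the range boundaries.
    """
    for label, lo, hi in zip(LABELS, MINS, MAXES):
        if lo <= salinity < hi:
            return label
    return None


for value in (33.40, 33.67, 33.72, 33.80, 34.00):
    print(value, label_for_salinity(value))
```

Values outside every range, such as 34.00 here, get no label and would not contribute to the training data.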

This Notebook demonstrates creating a classification algorithm from these labeled data and addresses [Issue 227 on GitHub](https://github.com/stoqs/stoqs/issues/227). To be able to execute the cells and experiment with different algorithms and parameters, launch Jupyter Notebook with:

    cd contrib/notebooks
    ../../manage.py shell_plus --notebook
    
Then navigate to this file and open it.

---

In [1]:
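# Select the Dorado MeasuredParameters that were labeled 'diatom'. The alias passed
# to using() must match the database you loaded (for example 'stoqs_september2013'
# or 'stoqs_september2013_t').
mps = (MeasuredParameter.objects.using('stoqs_september2013_o')
           .filter(measurement__instantpoint__activity__platform__name='dorado')
           .filter(measuredparameterresource__resource__value='diatom')
      )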

In [2]:
import pandas as pd
df = pd.DataFrame.from_records(mps.values(
     'measurement__instantpoint__timevalue', 'measurement__depth',
     'measurement__geom', 'parameter__name', 'datavalue', 'id'
     ))

In [3]:
df.head()

|   | `datavalue` | `id` | `measurement__depth` | `measurement__geom` | `measurement__instantpoint__timevalue` | `parameter__name` |
|---|---|---|---|---|---|---|
| 0 | 0.000324 | 5874086 | 18.334985 | [-121.88168500810833, 36.87137007677367] | 2013-09-19 20:32:46 | fl700_uncorr |
| 1 | 0.00468 | 5868369 | 18.334985 | [-121.88168500810833, 36.87137007677367] | 2013-09-19 20:32:46 | bbp700 |
| 2 | 0.000453 | 5874085 | 17.373692 | [-121.8817067640539, 36.871354011630885] | 2013-09-19 20:32:44 | fl700_uncorr |
| 3 | 0.004756 | 5868368 | 17.373692 | [-121.8817067640539, 36.871354011630885] | 2013-09-19 20:32:44 | bbp700 |
| 4 | 0.000413 | 5874084 | 15.384219 | [-121.88174902541665, 36.871324504245074] | 2013-09-19 20:32:40 | fl700_uncorr |
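
The records come back in long form, with one row per parameter per measurement. A classifier usually expects one row per sample with one column per input, so one possible next step (a sketch, not necessarily what the rest of this Notebook does) is to pivot on `parameter__name`. The small frame below is rebuilt by hand from the rows shown above, and pivoting on the time value alone assumes that each timestamp identifies a single Dorado measurement.

```python
import pandas as pd

# Rows copied by hand from the df.head() output above (long form)
sample = pd.DataFrame({
    'measurement__instantpoint__timevalue': [
        '2013-09-19 20:32:46', '2013-09-19 20:32:46',
        '2013-09-19 20:32:44', '2013-09-19 20:32:44',
        '2013-09-19 20:32:40',
    ],
    'parameter__name': [
        'fl700_uncorr', 'bbp700', 'fl700_uncorr', 'bbp700', 'fl700_uncorr',
    ],
    'datavalue': [0.000324, 0.00468, 0.000453, 0.004756, 0.000413],
})

# Wide form: one row per time value, one column per parameter
wide = sample.pivot(
    index='measurement__instantpoint__timevalue',
    columns='parameter__name',
    values='datavalue',
)

# Keep only the samples that have both inputs
features = wide.dropna()
print(features)
```

The last timestamp has only an `fl700_uncorr` value in this five-row excerpt, so `dropna()` removes it and leaves two complete samples.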
