### The following notebook is a tutorial on machine learning to detect change using geospatial_learn
-------------------------------------------------------------------------------------------------------

Documentation on the lib can be found here:

http://geospatial-learn.readthedocs.io/en/latest/

Please use QGIS to visualise results as this is quicker than plotting them in the notebook.

Two Sentinel 2 subsets have been provided along with a pre-made model for detecting change. You can also create your own model, and the code to do so is supplied, but this involves some processing time!

The change detection method used here **classifies the change directly** rather than differencing two maps, in order to avoid the inevitable error propagation that occurs with the latter. The training data was collected from 1.5 years' worth of S2 data over some areas in Kenya.
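
To see why differencing two maps compounds error, here is a rough back-of-envelope sketch. It assumes the two classifications make their errors independently and uses made-up accuracies of 90% per map; neither figure comes from this dataset.

```python
# Illustrative only: the accuracies are hypothetical, and errors in the
# two maps are assumed to be independent.
acc_before = 0.90  # assumed accuracy of the 2015 map
acc_after = 0.90   # assumed accuracy of the 2016 map

# Roughly speaking, a pixel in the differenced change map is only right
# if it was classified correctly in both input maps.
acc_change = acc_before * acc_after

print(f"Expected accuracy of the differenced map: {acc_change:.2f}")
```

Under these assumptions the change map is noticeably less accurate than either input, which is the motivation for classifying the change directly.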

The data is available here:

https://drive.google.com/file/d/1LyHZsWkELtVD8F4Y3-1tVvRvYPmGQ-Ev/view?usp=sharing

The data consists of an image from 2015 and one from 2016, which will be used to detect change, as well as some QGIS style files and a Random Forest model (Ch_MYE_cv5_rf.gz).

The 2015 "before" image

<img src="figures/S2Bfr.png" style="height:400px">

...and the 2016 "after" image

<img src="figures/S2Aft.png" style="height:400px">

The class of principal interest is DF (De-Forest), which is really just forest clearance, but other change classes are included.

The classes are:

- DF (De-Forest - really just clearance)
- SF (Stable Forest)
- SNFV (Stable Non-Forest Vegetation)
- SNF (Stable Non-Forest, e.g. inorganic/impervious)
- Water
- NFV (Non-Forest-Veg) loss 
- NFV (as above) regrowth

There is a QML style file for both raster and vector.

In [None]:
%matplotlib inline

**Before we begin!**

In Jupyter, to see the docstring that explains a function (provided someone has written one!), enter the function as you normally would, but put a question mark at the end and press shift and enter:
```python
raster.stack_ras?
```
A scrollable text box will appear with an explanation.
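
Outside Jupyter the `?` syntax is not available, but the same docstring can be read with plain Python. The sketch below uses a hypothetical function, `stack_example`, defined here purely for illustration; it is not part of geospatial_learn.

```python
import inspect


def stack_example(in_list, out_file):
    """Hypothetical function, used only to show how docstrings are read."""
    return out_file


# inspect.getdoc returns the cleaned-up docstring as a string;
# help(stack_example) would print the same text along with the signature.
doc = inspect.getdoc(stack_example)
print(doc)
```

The same calls work on any imported function, so `help(raster.stack_ras)` should show the text that `raster.stack_ras?` displays in the notebook.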

In [None]:
import matplotlib.pyplot as plt
from geospatial_learn import raster, learning, shape

In [None]:
cd S2_change

Change directory to wherever you have saved the files.

### Paths to the two images and the model - please alter as appropriate for your own directory

In [None]:
im1 = 'S2_mau_clip_dec2015.tif'

im2 = 'S2_mau_clip_dec2016.tif'

rfModel = 'Ch_MYE_cv5_rf.gz'

### First thing to do is stack our 'before' and 'after' images

In [None]:
stkRas = 'S2_ch_stk.tif'
       
raster.stack_ras([im1,im2], stkRas)
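
The stacked raster holds the bands of the "before" image followed by those of the "after" image. Assuming each subset has four bands (blue, green, red, near-infrared), as the feature names used at the end of this notebook suggest, the band order of the stack can be sketched as follows.

```python
# Assumed band order of each single-date image (inferred from the feature
# names used later in the notebook, not read from the files themselves).
bands_single = ['b', 'g', 'r', 'nir']

# "Before" bands keep their names; "after" bands get a '1' suffix.
bands_stack = bands_single + [name + '1' for name in bands_single]

print(len(bands_stack), bands_stack)
```

This is why the later steps refer to 8 bands: the classifier sees both dates for every pixel at once.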

### Next classify the temporal S2 stack

In [None]:
outMap = 'S2_ch_map'

### A note on model creation with k-fold cross validated grid search


**If you wish to create your own model, you will need training samples to train it, following the steps below.**

**Please note this will take time with a large training set.**

We first define the parameters we wish to grid search over. The parameters below are just an example; a larger grid is of course possible at the cost of processing time, which grows with the number of values per parameter (the number of candidate models is the product of these). There are defaults in geospatial-learn, but it is recommended you define your own.

```python
params = {'n_estimators': [500], 'max_features': ['sqrt', 'log2'],
          'min_samples_split': [5, 10, 20, 50], 'min_samples_leaf': [5, 10, 20, 50]}
```
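When we execute the create_model function we get a summary of the number of model fits, for example:

'Fitting 5 folds for each of 18 candidates, totalling 90 fits'

The exact numbers depend on the grid and the number of folds: the example grid above has 1 × 2 × 4 × 4 = 32 candidates.

I have fixed n_estimators (trees) at 500 above, but this could also be varied.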
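
The number of fits can be worked out before running the search, which helps when judging how long it will take. The sketch below counts the combinations in the example grid using only the standard library; it assumes, as is usual for a cross-validated grid search, that each combination is fitted once per fold.

```python
from itertools import product

params = {'n_estimators': [500], 'max_features': ['sqrt', 'log2'],
          'min_samples_split': [5, 10, 20, 50],
          'min_samples_leaf': [5, 10, 20, 50]}

# Every combination of one value per parameter is one candidate model.
candidates = list(product(*params.values()))
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation fold.
for folds in (3, 5):
    print(f"{folds} folds x {n_candidates} candidates = "
          f"{folds * n_candidates} fits")
```

Adding one more value to any parameter multiplies the candidate count, so the grid grows quickly.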

For a full list of params see:

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

We also need a model path to save to:

```python
outModel = 'path/to/mymodel.gz'
```

Then finally run the model calibration:

```python
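# trainPix is your training data (not defined in this notebook);
# outModel is the model path defined above
learning.create_model(trainPix, outModel, clf='rf', cv=3, params=params)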
```

### Lastly, polygonise the thematic raster for visualisation in QGIS

There is a style file available for this in the zip called 'Change_style.qml'.

For those not familiar with Python, the line below uses some string concatenation, out of laziness, to avoid retyping the file names.

In [None]:
raster.polygonize(outMap+'.tif', outMap+'.shp')
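
As a small illustration of the concatenation used in the cell above, the sketch below builds the same two file names and shows an equivalent using `pathlib` from the standard library. The variable names other than `outMap` are introduced here for illustration only.

```python
from pathlib import Path

outMap = 'S2_ch_map'

# Plain concatenation, as used in the cell above.
ras_name = outMap + '.tif'
shp_name = outMap + '.shp'

# An equivalent using pathlib, which swaps the extension explicitly.
shp_from_ras = str(Path(ras_name).with_suffix('.shp'))

print(ras_name, shp_name, shp_from_ras)
```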

### Check the results in QGIS

Using the QML file, the resulting polygons will look like this:

<img src="figures/S2Chg.png" style="height:400px">

The key being....

<img src="figures/S2Key.png" style="height:100px">

**As well as a thematic map, we can produce a multi-band map of class probabilities with the following function:**

```python
# probMap is the output path for the probability raster (define it first)
learning.prob_pixel_bloc(rfModel, stkRas, 8, probMap, 7, blocksize=256)
```
The input variables are the same as for the classify function, except that we also input the number of classes (7 in this case).

This will output a multi-band raster where each band is a probability of a certain class. This will take a while to process.
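
To relate the probability bands to the thematic map, here is a sketch for a single pixel. The probabilities are made up, and the band order is assumed to follow the class list given earlier in this notebook; check the model's actual class order before relying on it.

```python
# Class order is assumed to follow the list given earlier in this notebook.
classes = ['DF', 'SF', 'SNFV', 'SNF', 'Water', 'NFV loss', 'NFV regrowth']

# Made-up probabilities for one pixel, one value per band/class.
pixel_probs = [0.62, 0.10, 0.08, 0.05, 0.01, 0.11, 0.03]

# The most likely class is the band with the highest probability.
best_band = max(range(len(pixel_probs)), key=lambda i: pixel_probs[i])
print(classes[best_band], pixel_probs[best_band])
```

The probability bands also show how confident the model is, which a single class label cannot convey.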

If you wish to plot feature importances, run the cell below.

In [None]:
learning.plot_feature_importances(rfModel, ['b','g', 'r', 'nir','b1','g1', 'r1', 'nir1'])