# Create an annual cropmask 

## Description

Using the model output from `2_Train_fit_evaluate_classifier`, this notebook makes predictions on new data to generate a cropland mask for the area defined by a GeoJSON file. Results are saved to disk as Cloud-Optimised GeoTIFFs.

1. Open and inspect the GeoJSON file that delineates the extent being classified
2. Import the model
3. Make predictions on new data loaded through the Open Data Cube (ODC). The pixel classification then undergoes a post-processing step in which steep slopes and water are masked using an SRTM derivative and WOfS, respectively. Pixels labelled as crop above 3600 metres ASL are also masked.
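The masking rules in step 3 can be sketched with NumPy. This is an illustration only, not the notebook's `post_processing` function: the slope cutoff, the boolean water layer and the choice to reassign masked pixels to non-crop (`0`) are assumptions; only the 3600 m elevation limit comes from the text above.

```python
import numpy as np

MAX_ELEVATION_M = 3600  # from the description above
MAX_SLOPE_DEG = 35      # hypothetical cutoff, not taken from post_processing.py


def mask_predictions(crop, slope, is_water, elevation):
    """Reassign crop pixels (1) to non-crop (0) on steep slopes, water or high ground.

    All four inputs are arrays of the same shape; `is_water` is boolean.
    """
    exclude = (slope > MAX_SLOPE_DEG) | is_water | (elevation > MAX_ELEVATION_M)
    return np.where(exclude, 0, crop)


crop = np.array([1, 1, 1, 1, 0])
slope = np.array([5, 40, 5, 5, 5])
is_water = np.array([False, False, True, False, False])
elevation = np.array([1000, 1000, 1000, 4000, 1000])
print(mask_predictions(crop, slope, is_water, elevation))
```

Only the first pixel survives as crop: the second is too steep, the third is water and the fourth lies above 3600 m.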

***
## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

### Load Packages

```python
import warnings

import os
import datacube
import numpy as np
import xarray as xr
import geopandas as gpd
from joblib import load
from tqdm.auto import tqdm
from datacube.utils import geometry
from datacube.utils.cog import write_cog
from datacube.utils.geometry import assign_crs
from datacube.testutils.io import rio_slurp_xarray

from deafrica_tools.classification import HiddenPrints, predict_xr
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.dask import create_local_dask_cluster

# Import the local feature-layer and post-processing functions used for prediction
from feature_layer_functions import gm_mads_two_seasons_prediction
from post_processing import post_processing
```

```python
warnings.filterwarnings("ignore")
```

### Set up Dask cluster

```python
create_local_dask_cluster()
```

| Client | Cluster |
|---|---|
| Scheduler: `tcp://127.0.0.1:38889` | Workers: 1 |
| Dashboard: `/user/chad/proxy/8787/status` | Cores: 15 |
| | Memory: 104.37 GB |


## Analysis parameters

* `aoi`: A path to a GeoJSON file defining the extent of the analysis area. Make sure each polygon in the GeoJSON has a column containing a unique ID.
* `output_suffix`: A suffix used in the folder and file names of the output classifications
* `column`: The column header name in the `aoi` GeoJSON/shapefile that contains the unique ID for the polygons in the file
* `year`: The year to run the crop mask for.
* `resolution`: The pixel resolution, as `(y, x)` in metres, to run the classification at. `(-10,10)` is the finest resolution; use `(-20,20)` to have the code run faster and get quick results.
* `results`: A folder location to store the classified geotiffs
* `model_path`: The path to the location where the model exported from the previous notebook is stored
* `training_data`: Name and location of the training data `.txt` file (this has been copied into the data folder)
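Why `(-20,20)` is faster comes down to pixel counts: doubling the pixel size quarters the number of pixels that must be loaded, turned into features and classified. A quick sketch of the arithmetic for a hypothetical 100 km by 100 km polygon (runtime will only scale roughly, since I/O and overheads do not shrink exactly in proportion):

```python
import math


def pixel_count(width_m, height_m, resolution):
    """Number of pixels needed to cover a width_m x height_m extent.

    `resolution` follows the ODC (y, x) convention used above, e.g. (-20, 20);
    the negative y only indicates north-up row ordering, so the sign is ignored.
    """
    res_y, res_x = resolution
    return math.ceil(width_m / abs(res_x)) * math.ceil(height_m / abs(res_y))


# Hypothetical 100 km x 100 km polygon
fine = pixel_count(100_000, 100_000, (-10, 10))
coarse = pixel_count(100_000, 100_000, (-20, 20))
print(fine, coarse, fine // coarse)
```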

```python
aoi = 'data/Area_of_interest.geojson'
output_suffix = 'aoi'
column = 'Id'
year = '2021'
resolution = (-20, 20)
results = 'results/classifications/'
model_path = 'results/southern_ml_model_20220225_2021.joblib'
training_data = "results/southern_training_data_20220225_2021.txt"
```

### Open and inspect AOI

```python
gdf = gpd.read_file(aoi)
```

```python
gdf.explore(column=column)
```

## Open the model


```python
# Load the trained model and restrict it to a single job (n_jobs=1)
model = load(model_path).set_params(n_jobs=1)
```

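```python
# Load the training data (np.loadtxt skips the '#'-prefixed header line)
model_input = np.loadtxt(training_data)

# Read the column names from the header line
with open(training_data, 'r') as file:
    header = file.readline()

# Drop the leading '#' token, then the class-label column name
column_names = header.split()[1:][1:]
```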
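The double slice `[1:][1:]` is easiest to see on a concrete header. Because `np.loadtxt` skips the first line, that line must start with `#`; the sketch below assumes the file was written with `np.savetxt(..., header=...)`, which prefixes the header with `# `, and that the first column holds the class label. The feature names are made up for illustration.

```python
# Hypothetical header line; real feature names depend on the training notebook
header = "# Class red_S1 green_S1 ndvi_S1\n"

tokens = header.split()        # ['#', 'Class', 'red_S1', 'green_S1', 'ndvi_S1']
column_names = tokens[1:][1:]  # drop '#', then drop the label column name
assert column_names == tokens[2:]  # the two slices equal a single [2:]
print(column_names)            # ['red_S1', 'green_S1', 'ndvi_S1']
```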

## Making a prediction


### Loop through tiles and predict, with post-processing

For every polygon in the `aoi`, we calculate the feature layers and then use the DE Africa function `predict_xr` to classify the data.

The `feature_layer_functions.gm_mads_two_seasons_prediction` function does most of the heavy lifting here.

The results are exported to file as Cloud-Optimised GeoTIFFs.
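Stripped of the ODC and DE Africa specifics, the per-polygon workflow is a simple pipeline. The sketch below passes each step in as a callable so the control flow can run without a datacube; the parameter names are hypothetical stand-ins for `gm_mads_two_seasons_prediction`, `predict_xr`, `post_processing`, the polygon clip and `write_cog`.

```python
def classify_polygons(polygons, make_features, predict, post_process, clip, write):
    """Apply the per-polygon workflow to a list of (polygon_id, geometry) pairs.

    Every callable is a hypothetical stand-in for the corresponding notebook step.
    Returns whatever `write` returns for each polygon, in order.
    """
    outputs = []
    for n, (poly_id, geom) in enumerate(polygons, start=1):
        print(f"working on polygon: {n}/{len(polygons)}", end="\r")
        features = make_features(geom)                # feature layers for this polygon
        classified = post_process(predict(features))  # classify, then mask slopes/water/altitude
        clipped = clip(classified, geom)              # blank out pixels outside the polygon
        outputs.append(write(clipped, poly_id))       # one output file per polygon
    return outputs


# Toy run with arithmetic stand-ins for each step
result = classify_polygons(
    [("a", 1), ("b", 2)],
    make_features=lambda g: g * 10,
    predict=lambda f: f + 1,
    post_process=lambda p: p * 2,
    clip=lambda c, g: c - g,
    write=lambda c, pid: f"{pid}:{c}",
)
```

Keeping each step independent means one polygon's arrays can be discarded before the next is loaded, which keeps memory bounded for large AOIs.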

```python
# Make a folder to store results in, if one doesn't already exist
# (makedirs also creates any missing parent folders)
os.makedirs(os.path.join(results, output_suffix), exist_ok=True)
```
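Each polygon's classification is written to `<results>/<output_suffix>/Southern_cropmask_<output_suffix>_<year>_<id>.tif`. Building that path with plain string concatenation only works if `results` ends in `/`; a sketch of the same naming scheme using `os.path.join` (the helper name `cropmask_path` is hypothetical):

```python
import os


def cropmask_path(results, output_suffix, year, col_id):
    """Return the GeoTIFF path for one polygon, creating its folder if needed."""
    out_dir = os.path.join(results, output_suffix)
    os.makedirs(out_dir, exist_ok=True)
    filename = f"Southern_cropmask_{output_suffix}_{year}_{col_id}.tif"
    return os.path.join(out_dir, filename)
```

For example, `cropmask_path('results/classifications', 'aoi', '2021', 3)` gives the same file name whether or not the trailing slash is present.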

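Raise Python's garbage-collection thresholds so that collections run less often, which reduces Dask's warnings about time spent in garbage collection.

```python
import gc

# Double each generational threshold so the garbage collector runs less often
g0, g1, g2 = gc.get_threshold()
gc.set_threshold(g0*2, g1*2, g2*2)
```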
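`threshold0` is the number of allocations (minus deallocations) allowed before a generation-0 collection runs, so raising it makes collections rarer at the cost of holding garbage slightly longer. A small sketch that wraps the same adjustment and returns the previous values so they can be restored afterwards (the helper name is hypothetical):

```python
import gc


def scale_gc_thresholds(factor=2):
    """Multiply the generational GC thresholds by `factor`; return the old ones."""
    old = gc.get_threshold()
    gc.set_threshold(*(t * factor for t in old))
    return old


old = scale_gc_thresholds()
try:
    ...  # memory-heavy work would go here
finally:
    gc.set_threshold(*old)  # restore the interpreter defaults
```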

### Loop through polygons and predict the crop mask

```python
%%time
for i, (index, row) in enumerate(gdf.iterrows(), start=1):

    print(f'working on polygon: {i}/{len(gdf)}', end='\r')

    # Convert the polygon into an ODC Geometry in the AOI's CRS
    geom = geometry.Geometry(row.geometry.__geo_interface__,
                             geometry.CRS(f'EPSG:{gdf.crs.to_epsg()}'))

    # Create the ODC query
    query = {
        'geopolygon': geom,
        'time': year,
        'resolution': resolution,
        'output_crs': 'epsg:6933',
        'dask_chunks': {'x': 5000, 'y': 5000},
    }

    # Calculate the feature layers for this polygon
    data = gm_mads_two_seasons_prediction(query)

    # Generate the prediction, hiding predict_xr's progress output
    with HiddenPrints():
        predicted = predict_xr(model, data, clean=True).compute()

    #------- Post-processing ------------------------------
    predict = post_processing(predicted)

    # Mask out pixels outside the polygon
    # (.loc, not .iloc: iterrows() yields index labels, not positions)
    mask = xr_rasterize(gdf.loc[[index]], data)
    predict = predict.where(mask)

    #------- Export classifications to disk ---------------
    col_id = str(row[column])
    out_path = os.path.join(results, output_suffix,
                            f'Southern_cropmask_{output_suffix}_{year}_{col_id}.tif')
    write_cog(predict, out_path, overwrite=True)
```

CPU times: user 6min 25s, sys: 56.8 s, total: 7min 22s
Wall time: 22min 46s
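In the loop, `predict.where(mask)` keeps pixels inside the polygon and fills everything else with NaN (the default fill value of xarray's `where`). Since integer arrays cannot hold NaN, an integer classification is promoted to a floating-point dtype at this step. A NumPy sketch of the same behaviour on a made-up 3x3 classification and polygon mask:

```python
import numpy as np

# Toy classification (1 = crop, 0 = non-crop) and a polygon mask (True = inside)
predicted = np.array([[1, 0, 1],
                      [1, 1, 0],
                      [0, 1, 1]], dtype=np.uint8)
inside = np.array([[True, True, False],
                   [True, True, False],
                   [False, False, False]])

# Keep values inside the polygon, NaN outside (analogous to DataArray.where)
clipped = np.where(inside, predicted, np.nan)
print(clipped)
```

When reading the exported GeoTIFFs, treat NaN as "outside the AOI" rather than as a third class.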


***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

