# &#x1F4F7; Train an AI model with images associated with a GPS point and use it to make maps

In this tutorial, we will learn how to automatically produce maps from images that are each associated with a GPS location.

## &#x1F3AC; Scenario

Imagine you are a research scientist and you have collected images, each associated with a GPS point. You have annotated some of the images with your classes of interest (plants, clouds, marine species) using an annotation tool (CVAT, Labelbox...). At the end of the annotation process, you have a COCO file storing your annotations. As part of your project, you know that artificial intelligence is the key to annotating more images and automatically producing georeferenced annotations. Seeing the colossal work you have yet to accomplish, you break into a sweat (&#x1F613;). Fortunately, just in time, you discover artus (&#x1F631;)! But unfortunately you have no knowledge of artificial intelligence. That's fine, because none is necessary: this tutorial will guide you to your holy grail! &#x1F60E;

## Input data needed

As explained previously, you must have images associated with a GPS point. To train a supervised deep learning model, you also need annotations. What is expected here is a COCO file storing your annotations.
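
To make the expected input concrete, here is a minimal sketch of what a COCO annotation file contains once loaded into Python, together with a small consistency check. The sample values and the `check_coco` helper are illustrative assumptions and are not part of artus; only the three top-level keys (`images`, `annotations`, `categories`) and the `image_id` / `category_id` references come from the COCO format itself.

```python
# A hypothetical, minimal COCO document: one image, one category, one annotation.
coco = {
    "images": [{"id": 1, "file_name": "image1.jpeg", "width": 1500, "height": 1500}],
    "categories": [{"id": 1, "name": "plant"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,      # must match the "id" of an entry in "images"
            "category_id": 1,   # must match the "id" of an entry in "categories"
            "bbox": [100, 200, 50, 80],  # [x, y, width, height] in pixels
            "segmentation": [[100, 200, 150, 200, 150, 280, 100, 280]],
            "area": 4000,
            "iscrowd": 0,
        }
    ],
}


def check_coco(coco_dict):
    """Return a list of problems found in a COCO dictionary (empty if none)."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if key not in coco_dict:
            problems.append(f"missing top-level key: {key}")
    if problems:
        return problems
    image_ids = {img["id"] for img in coco_dict["images"]}
    category_ids = {cat["id"] for cat in coco_dict["categories"]}
    for ann in coco_dict["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']} references unknown image {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']} references unknown category {ann['category_id']}")
    return problems


print(check_coco(coco))  # an empty list means the basic structure is consistent
```

To run the same check on your own export, load the file with `json.load` first and pass the resulting dictionary to `check_coco`.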

To spatialize the predictions of the deep learning model, you also need a csv storing the image locations (3 columns: "filename", "latitude", "longitude").
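
Before going further, it can help to verify that the locations csv has the three expected columns and that every coordinate is a valid number. The sketch below uses only the standard library; the `read_locations` helper is hypothetical (it is not an artus function), and it assumes coordinates are expressed in decimal degrees.

```python
import csv
import io

REQUIRED_COLUMNS = ("filename", "latitude", "longitude")


def read_locations(csv_file):
    """Read image locations from an open csv file and validate each row."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    locations = {}
    for row in reader:
        lat = float(row["latitude"])   # raises ValueError on non-numeric values
        lon = float(row["longitude"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinates out of range for {row['filename']}")
        locations[row["filename"]] = (lat, lon)
    return locations


# Illustrative content; with a real file, use open("/path/to/locations.csv", newline="")
sample = io.StringIO(
    "filename,latitude,longitude\n"
    "image1.jpeg,45.2145313,-12.18641251\n"
    "image2.jpeg,45.2145584,-12.18646598\n"
)
print(read_locations(sample))
```

A value that is not a plain number (for example a coordinate with a stray file extension appended) makes `float` raise a `ValueError`, so such rows are caught early.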

## &#x0031; Data splitting

Data splitting is an important step when training a deep learning model. We will split the COCO file into 3 COCO files: a train dataset, a validation dataset, and a test dataset used to evaluate the model afterwards.

In [None]:
import artus.prepare.coco_splitting as tusplit
import artus.evaluate_model.coco_stats as tustats

In [None]:
coco_path = '/path/to/coco/export/directory/coco_annotations.json'

If some classes are underrepresented in your annotations, you can set a minimum number of occurrences (min_nb_occurences) to remove the classes that do not reach the threshold.

In [None]:
min_nb_occurences=50
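
To illustrate what such a threshold does, the sketch below counts annotations per class in a COCO dictionary and lists the classes that would survive. It is not the artus implementation: the helper names and the toy data are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from collections import Counter


def count_classes(coco_dict):
    """Count annotations per category name in a COCO dictionary."""
    names = {cat["id"]: cat["name"] for cat in coco_dict["categories"]}
    return Counter(names[ann["category_id"]] for ann in coco_dict["annotations"])


def classes_to_keep(coco_dict, min_occurrences):
    """Return the sorted class names that have at least min_occurrences annotations."""
    counts = count_classes(coco_dict)
    return sorted(name for name, n in counts.items() if n >= min_occurrences)


# Hypothetical toy data: 3 annotations of "coral", 1 of "algae".
toy_coco = {
    "categories": [{"id": 1, "name": "coral"}, {"id": 2, "name": "algae"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 1},
        {"id": 3, "image_id": 2, "category_id": 1},
        {"id": 4, "image_id": 2, "category_id": 2},
    ],
}

print(count_classes(toy_coco))
print(classes_to_keep(toy_coco, min_occurrences=2))
```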

You can also export some statistics on the class distribution before the training process. This is optional but useful to get an idea of the composition of the annotations.

In [None]:
stats = tustats.COCOStats(coco_path, min_nb_occurences)
stats.get_class_stats()
stats.export_stats(export_path='/path/to/export/stats.csv') #optional: export the class distribution in csv format

In [None]:
splitter = tusplit.COCOSplitter(
    coco_path=coco_path,
    export_dir='/path/to/coco/export/directory/',
    coco_train_name='coco_train.json',
    coco_test_name='coco_test.json',
    coco_val_name='coco_val.json',
    min_nb_occurrences=min_nb_occurences,
    train_pct=.8,
    val_pct=.1,
    test_pct=.1,
    batch_size=8
)

splitter.split_coco()
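
For intuition about the `train_pct`, `val_pct` and `test_pct` proportions, here is a minimal sketch of an 80/10/10 split performed on image IDs. It is only an illustration under simplifying assumptions: the `split_image_ids` function is hypothetical, it shuffles at the image level without looking at classes, and it does not model the `batch_size` argument, so artus may well split differently.

```python
import random


def split_image_ids(image_ids, train_pct=0.8, val_pct=0.1, test_pct=0.1, seed=0):
    """Shuffle image IDs and cut them into train/val/test lists."""
    if abs(train_pct + val_pct + test_pct - 1.0) > 1e-9:
        raise ValueError("the three proportions must sum to 1")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # a fixed seed makes the split reproducible
    n_train = int(len(ids) * train_pct)
    n_val = int(len(ids) * val_pct)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder, so that no image is dropped
    return train, val, test


train_ids, val_ids, test_ids = split_image_ids(range(100))
print(len(train_ids), len(val_ids), len(test_ids))
```

Splitting by image (rather than by annotation) keeps all the annotations of one image in the same subset, which avoids leaking test images into training.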

## &#x0032; Train a deep learning model
### &#x0032;. &#x0031;. Configure config file and train

To configure the deep learning model that you will train, you must write a model configuration file. Examples are available in the ../models_config/ folder.

In [None]:
import artus.train.train as tustrain


In [None]:
config_path = '../../configs/x101_allsites_species_overlapping25_tiles1500_ITER3000.yml' 

In the cell below, you will train a deep learning model. Depending on your data and on your machine's resources, this step can take several hours.

In [None]:
tustrain.train_model(config_path)

### &#x0032;. &#x0032;. Evaluate model
By running the code below, you open TensorBoard, which helps you analyze training metrics and detect whether the model has, for instance, overfitted the training data.

In [None]:
import os
%load_ext tensorboard
%tensorboard --logdir='/path/to/logs/directory/'
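
As a rough guide to what overfitting looks like in the loss curves, validation loss typically stops improving and starts rising while training loss keeps falling. The sketch below encodes that rule of thumb on two plain lists of loss values recorded at the same evaluation steps. The function name, the `patience` parameter and the sample numbers are all hypothetical; which curves are actually logged depends on your configuration.

```python
def first_overfitting_step(train_loss, val_loss, patience=3):
    """Return the index of the best validation loss if validation then failed to
    improve for `patience` consecutive steps while training loss kept falling.
    Return None if no such pattern is found."""
    best_idx = 0
    for i in range(1, len(val_loss)):
        if val_loss[i] < val_loss[best_idx]:
            best_idx = i  # validation is still improving
        elif i - best_idx >= patience and train_loss[i] < train_loss[best_idx]:
            return best_idx
    return None


# Illustrative values only: validation loss bottoms out at index 2, then rises.
train_curve = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
val_curve = [1.0, 0.9, 0.8, 0.85, 0.9, 0.95, 1.0]
print(first_overfitting_step(train_curve, val_curve))
```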

Evaluating the trained model produces a csv that reports its performance on your test dataset.

In [4]:
import artus.evaluate_model.evaluate as tuseval

In [None]:
tuseval.evaluate_model(
    config_path, 
    csv_metrics_name='/path/to/export/models_metrics.csv')

You can also plot the evaluation results with artus. You will get several interactive bar plots, which are useful for comparing models when you have tried different model configurations.

In [12]:
import plotly.express as px
import pandas as pd
import artus.evaluate_model.write_eval_results as tusevalplot

In [13]:
plots = tusevalplot.ModelsMetricsPlots(
    csv_metrics_path = '/path/to/export/models_metrics.csv', #the csv exported by evaluate_model
    export_dir = '/path/to/export/plots/',
    plot_name = 'metrics.html',
    title = 'Average precision'
)

In [14]:
fig = plots.plot_metrics()
fig.show()

In [None]:
plots.export_plots()

## &#x0033; Use the model you trained to predict new data!

Now that you have a trained deep learning model, you can use it to predict new annotations on unlabeled images and export the results into a spatial format!


In [None]:
import yaml
import torch
import os
import artus.inference as tusinf
import artus.spatialize as tuspal

In [None]:
images_dir = '/path/to/unlabeled/images/directory/' 

### &#x0033;. &#x0032;. Deploy an unlabeled fiftyone dataset
In the artus package, we chose to work with fiftyone datasets to handle image and annotation formats. Fiftyone is an open source Python package that is well suited to deep learning workflows. You can find all of its features on its website.


In [None]:
dataset = tusinf.deploy_unlabeled_dataset.create_or_load_dataset(
    dataset_name=os.path.basename(config_path), #add a name for your fiftyone dataset 
    dataset_type='unlabeled', 
    images_path=images_dir,
    label_type='segmentation')

dataset.persistent = True #optional : save dataset on your machine for further exploration

dataset.save()

print(dataset)

In [None]:
dataset.compute_metadata()

### &#x0033;. &#x0033;. Add image locations

To spatialize the predictions made by the model, you need to provide a csv containing the location of every image in the dataset. The csv must be formatted like this:
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>filename</th>
      <th>latitude</th>
      <th>longitude</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>image1.jpeg</td>
      <td>45.2145313</td>
      <td>-12.18641251</td>
    </tr>
    <tr>
      <td>image2.jpeg</td>
      <td>45.2145584</td>
      <td>-12.18646598</td>
    </tr>
    <tr>
      <td>image3.jpeg</td>
      <td>45.214565456</td>
      <td>-12.18646123</td>
    </tr>
    <tr>
      <td>image4.jpeg</td>
      <td>45.65452316</td>
      <td>-12.123456789</td>
    </tr>
    <tr>
      <td>image5.jpeg</td>
      <td>45.16546168</td>
      <td>-12.18646598</td>
    </tr>
  </tbody>
</table>


Predictions made by the AI model on an image will be assigned to that image's GPS point, and the results will be spatialized.

In [None]:
from artus.spatialize.LocationImporter import import_csv_locations

In [None]:
dataset = import_csv_locations(
    location_path="/path/to/locations.csv",
    fiftyone_dataset=dataset
    )
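
Conceptually, assigning predictions to GPS points amounts to a lookup by filename. The sketch below shows that idea on plain dictionaries; it is not how artus or fiftyone store the data, and the `attach_locations` helper, the key names and the sample labels are hypothetical.

```python
def attach_locations(predictions, locations):
    """Attach (latitude, longitude) to each prediction using the image filename.

    predictions: list of dicts with at least "filename" and "label" keys.
    locations: dict mapping filename -> (latitude, longitude).
    Predictions whose image has no known location are returned separately.
    """
    located, unlocated = [], []
    for pred in predictions:
        coords = locations.get(pred["filename"])
        if coords is None:
            unlocated.append(pred)
        else:
            lat, lon = coords
            located.append({**pred, "latitude": lat, "longitude": lon})
    return located, unlocated


# Illustrative data only.
sample_locations = {"image1.jpeg": (45.2145313, -12.18641251)}
sample_predictions = [
    {"filename": "image1.jpeg", "label": "coral"},
    {"filename": "image9.jpeg", "label": "algae"},  # no location available
]
located, unlocated = attach_locations(sample_predictions, sample_locations)
print(located)
print(unlocated)
```

Keeping the unmatched predictions in a separate list makes it easy to spot images that are missing from the csv.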

### &#x0033;. &#x0034;. Predict new annotations thanks to AI!

We first load the trained model (this is the predictor), then we use the predictor to predict new labels on the unlabeled images. The classes that the model can predict are stored in a yaml file. You can write such a yaml file or pass a Python list directly.

In [None]:
device = ("cuda" if torch.cuda.is_available() else "cpu")

#Load model's classes
model_classes = '../../configs/model_classes_species.yml'

with open(model_classes) as f:
    model_classes = yaml.load(f, Loader=yaml.FullLoader)

In [None]:
predictor = tusinf.predict.build_predictor(config_path, device)

In [None]:
dataset = tusinf.predict.add_predictions_to_dataset(
    dataset=dataset, 
    predictor=predictor, 
    device=device, 
    classes=model_classes['species_classes'], #the list of class names, here read from the yaml config file
    predictions_field='predictions', 
    nms_threshold=0.5)
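
The `nms_threshold` argument suggests that non-maximum suppression (NMS) is applied to the predictions. As a hedged illustration of the general technique, the sketch below implements greedy NMS on bounding boxes in plain Python: boxes are visited from highest to lowest score, and a box is dropped when its intersection over union (IoU) with an already kept box exceeds the threshold. This assumes the threshold is an IoU threshold, as in standard NMS; how artus applies it internally is not shown here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # keep the box only if it does not overlap too much with a kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Illustrative boxes: the first two overlap heavily, the third is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

A lower threshold removes more overlapping predictions; a higher one keeps more of them.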

### &#x0033;. &#x0035;. Export the results into a spatial format! &#x1F389;

In [None]:
geojson_exporter = tuspal.GeoFiftyoneExporter(
    export_dir='/path/to/export/dir', 
    label_type='polylines', #can be 'detections' for bbox or 'polylines' for segmentation masks
    epsg_code='4326', #set the destination CRS adapted to your data
    dest_name='geospatial_predictions.geojson'
)

In [None]:
dataset.export(
    dataset_exporter=geojson_exporter,
    label_field='predictions',
    export_media=False
    )
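
To give an idea of what a spatialized result can look like, the sketch below builds a GeoJSON `FeatureCollection` with one `Point` feature per located prediction, using only the standard library. It is an illustration and not the output format of `GeoFiftyoneExporter`: the `to_geojson` helper, the property names and the sample record are hypothetical. Note that GeoJSON stores coordinates in longitude, latitude order and uses WGS 84 (EPSG:4326) coordinates.

```python
import json


def to_geojson(located_predictions):
    """Build a GeoJSON FeatureCollection with one Point feature per prediction."""
    features = []
    for pred in located_predictions:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON stores coordinates as [longitude, latitude]
                "coordinates": [pred["longitude"], pred["latitude"]],
            },
            "properties": {"filename": pred["filename"], "label": pred["label"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative record only.
example = [
    {"filename": "image1.jpeg", "label": "coral",
     "latitude": 45.2145313, "longitude": -12.18641251},
]
collection = to_geojson(example)
print(json.dumps(collection, indent=2))
```

The resulting text can be saved to a `.geojson` file and opened in a GIS application such as QGIS.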