Global Fishing Watch Vessel Classification Pipeline

Vessel classification: feature generation and model training/inference.

Global Fishing Watch is a partnership between SkyTruth, Google and Oceana to map all of the trackable commercial fishing activity in the world, in near-real time, and make it accessible to researchers, regulators, decision-makers, and the public.

This repository contains code to build TensorFlow models that classify vessels and identify fishing behaviour based on AIS data.

(This is not an official Google product.)

Overview

We use AIS data (and, in the future, possibly VMS data) to extract various types of information, including:

  • Vessel types

  • Vessel fishing activity

  • Vessel attributes (length, tonnage, etc)

The project consists of convolutional neural networks (CNNs) that infer vessel features.

Neural Networks

We have two CNNs in production, as well as several experimental nets. One net predicts vessel class (longliner, cargo, sailing, etc.), as well as vessel length and other vessel parameters, while the second predicts whether a vessel is fishing or not at a given time point.

We initially used a single CNN to predict everything at once, but we've since moved to having two CNNs. The original hope was that we would be able to take advantage of transfer learning between the various features. However, we did not see any gains from that, and using multiple nets adds useful flexibility.

The nets share a similar structure, consisting of a large number (currently 9) of 1-D convolutional layers, followed by a single dense layer. The net for fishing prediction is somewhat more complicated since it must predict fishing at each point. To do this, all of the layers of the net are combined, with upscaling of the upper layers, to produce a prediction at each time point. The design of these nets incorporates ideas borrowed from ResNets and Inception nets, among other places, adapted to the 1-D environment.
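
For illustration only (this is NOT the production code; the layer widths, kernel sizes, and class count below are made-up placeholders), a minimal sketch of the basic shape of the vessel-classification net, a stack of 1-D convolutions over a variable-length track followed by a single dense layer, might look like:

```
# Illustrative sketch only -- not the production model. Layer widths,
# kernel sizes, and the number of classes are placeholders.
import tensorflow as tf

def build_vessel_classifier(num_features=14, num_classes=16):
    # Variable-length AIS track: (time steps, features per point).
    inputs = tf.keras.Input(shape=(None, num_features))
    x = inputs
    for _ in range(9):  # the production nets currently use 9 conv layers
        x = tf.keras.layers.Conv1D(
            64, kernel_size=3, strides=2, padding="same", activation="relu")(x)
    # Pool over time so tracks of any length yield one vessel-level prediction.
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```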

The code associated with the neural networks is located in classification. The models themselves are located in classification/models.

Data layout

The data layout is currently in flux as we move data generation to Python Dataflow managed by Airflow.

Common parameters

In order to support the above layout, all our programs need the following common parameters (a sketch of how they combine into an output path follows this list):

  • env: specifies the environment, either development or production.
  • job-name: the name (or date) of the current job.

Additionally, if the job is a dev job, the programs will read the $USER environment variable in order to choose the appropriate subdirectory for the output data.
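
As a rough sketch (not the actual implementation, which lives in the in-flux data-layout code; the gs://BUCKET paths here are placeholders), the common parameters might combine into an output path like this:

```
# Hypothetical sketch: how --env and --job-name might select an output
# directory. The bucket name and path scheme are made up.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--env", choices=["dev", "prod"], required=True,
                    help="development or production environment")
parser.add_argument("--job-name", required=True,
                    help="name (or date) of the current job")
args = parser.parse_args()

if args.env == "dev":
    # Dev jobs are namespaced per developer via the $USER environment variable.
    base = "gs://BUCKET/dev/{}".format(os.environ["USER"])
else:
    base = "gs://BUCKET/prod"

output_path = "{}/{}".format(base, args.job_name)
print(output_path)
```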

Neural Net Classification

Running Stuff

  • python -m train.deploy_cloudml -- launch a training run on CloudML. Use --help to see options.

    If not running in the SkyTruth/GFW environment, you will need to edit deploy_cloudml.yaml to set the GCS paths correctly.

    For example, to launch vessel characterization training in the dev environment with the job name test_deploy:

    ```
    python -m train.deploy_cloudml \
        --env dev \
        --model prod.vessel_characterization \
        --job_name test_deploy \
        --config_file train/deploy_characterization.yaml
    ```

    IMPORTANT: Even though there is a maximum number of training steps specified, the CloudML process does not shut down reliably. You need to periodically check on the process and kill it manually if it has completed and is hanging. In addition, there are occasionally other problems where either the master or chief will hang or die so that new checkpoints aren't written, or new validation data isn't written out. Again, killing and restarting the training is the solution. (This will pick up at the last checkpoint saved.)

  • running training locally -- this is primarily for testing, as it will be quite slow unless you have a heavy-duty machine:

      python -m classification.run_training \
          prod.fishing_range_classification \
          --feature_dimensions 14 \
          --root_feature_path FEATURE_PATH \
          --training_output_path OUTPUT_PATH \
          --fishing_range_training_upweight 1 \
          --metadata_file training_classes.csv \
          --fishing_ranges_file combined_fishing_ranges.csv \
          --metrics minimal
    
  • python -m train.compute_metrics -- evaluate results and dump vessel lists. Use --help to see options.

  • running inference -- unless you have local access to a heavy-duty machine, you should probably run this on the Dataflow pipeline in pipe-features. (A sketch for inspecting the resulting output follows the commands below.)

    • Copy a model checkpoint locally:

      gsutil cp GCS_PATH_TO_CHECKPOINT ./model.ckpt

    • Run an inference job:

    • Vessel classification. This command infers results for the test data only (for evaluation purposes), and infers a separate classification for every 6-month interval:

      python -m classification.run_inference prod.vessel_classification \
             --root_feature_path GCS_PATH_TO_FEATURES \
             --inference_parallelism 32 \
             --feature_dimensions 14 \
             --dataset_split Test \
             --inference_results_path=./RESULT_NAME.json.gz \
             --model_checkpoint_path ./model.ckpt \
             --metadata_file training_classes.csv \
             --fishing_ranges_file combined_fishing_ranges.csv \
             --interval_months 6

    • Fishing localisation. This infers fishing at all time points (no --dataset_split specification):

      python -m classification.run_inference prod.fishing_range_classification \
             --root_feature_path GCS_PATH_TO_FEATURES \
             --inference_parallelism 32 \
             --feature_dimensions 14 \
             --inference_results_path=./RESULT_NAME.json.gz \
             --model_checkpoint_path ./model.ckpt \
             --metadata_file training_classes.csv \
             --fishing_ranges_file combined_fishing_ranges.csv
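
    Either way, the results land in a gzipped JSON file. A minimal sketch for peeking at it locally, assuming the file holds one JSON record per line (if it is instead a single JSON document, adjust the parsing):

    ```
    # Inspect the first few inference records. Assumes newline-delimited JSON.
    import gzip
    import json

    with gzip.open("RESULT_NAME.json.gz", "rt") as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            print(sorted(record.keys()))  # print each record's field names
            if i >= 4:
                break
    ```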
      

Local Environment Setup

Adding new models

  • For development: create a directory in classification/models/dev with the model name (usually the developer name). An __init__.py is required for the model to be picked up, and the model package directory must be added to setup.py (see the sketch after this list).

  • For production: add the model to classification/classification/models/prod.
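
As a hedged illustration (the developer name alice and the exact setup.py contents are hypothetical; check the real setup.py for the actual package list), registering a new dev model package might look like:

```
# Hypothetical layout: the new model lives at
#   classification/models/dev/alice/__init__.py   (may be empty)
#   classification/models/dev/alice/my_model.py
# setup.py excerpt -- illustrative only; the real file may differ.
from setuptools import setup

setup(
    name="vessel-classification",
    packages=[
        "classification",
        "classification.models.prod",
        "classification.models.dev.alice",  # add the new dev model package
    ],
)
```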

Formatting

YAPF is a code formatter for Python. All our Python code should be autoformatted with YAPF before committing. To install it, run:

  • sudo pip install yapf

Run yapf -r -i . in the top-level directory to reformat the entire project.