Global Fishing Watch Vessel Classification Pipeline
Global Fishing Watch is a partnership between SkyTruth, Google, and Oceana to map all of the trackable commercial fishing activity in the world, in near-real time, and make it accessible to researchers, regulators, decision-makers, and the public.
This repository contains code to build TensorFlow models that classify vessels and identify fishing behaviour based on AIS data.
(This is not an official Google product.)
The goal is to use AIS data (and possibly VMS data in the future) to extract various types of information, including:
- Vessel fishing activity
- Vessel attributes (length, tonnage, etc.)
The project consists of convolutional neural networks (CNNs) that infer vessel features.
We have two CNNs in production, as well as several experimental nets. One net predicts vessel class (e.g. sailing) as well as vessel length and other vessel parameters, while the second predicts whether or not a vessel is fishing at a given time point.
We initially used a single CNN to predict everything at once, but we have since moved to using two CNNs. The original hope was that we would be able to take advantage of transfer learning between the various features. However, we did not see any gains from that, and using multiple nets adds useful flexibility.
The nets share a similar structure, consisting of a large number (currently 9) of 1-D convolutional layers followed by a single dense layer. The net for fishing prediction is somewhat more complicated, since it must predict fishing at each point. To do this, the outputs of all of the layers are combined, with the upper layers upscaled, to produce a prediction at each point. The design of these nets borrows ideas from ResNets and Inception nets, among other places, adapted for the 1-D setting.
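To make the per-point combination concrete, here is a pure-Python sketch of the idea, not the repository's TensorFlow implementation. Stride-2 max pooling stands in for each convolutional layer and nearest-neighbour repetition stands in for the upscaling; the function names and the `num_layers` parameter are hypothetical.

```python
# Illustrative sketch only: pooling replaces the real conv layers so the
# multi-resolution bookkeeping is easy to follow.


def downsample(seq, factor=2):
    """Stride-`factor` max pooling over a 1-D sequence (stand-in for a strided layer)."""
    return [max(seq[i:i + factor]) for i in range(0, len(seq), factor)]


def upsample(seq, factor, length):
    """Nearest-neighbour upscaling: repeat each value `factor` times, trim to `length`."""
    out = []
    for value in seq:
        out.extend([value] * factor)
    return out[:length]


def combine_layers(seq, num_layers=3):
    """Return, for each input point, the features from every layer at that point."""
    length = len(seq)
    layers = [seq]
    current = seq
    for _ in range(num_layers):
        current = downsample(current)
        layers.append(current)
    # Layer k is coarser by a factor of 2**k, so upscale it by the same factor
    # to line each value up with the input points it summarises.
    upscaled = [upsample(layer, 2 ** k, length) for k, layer in enumerate(layers)]
    # Transpose so there is one feature vector per time point.
    return [list(point) for point in zip(*upscaled)]
```

For example, `combine_layers([0, 1, 0, 0, 5, 0, 0, 0], num_layers=2)` gives the first point the features `[0, 1, 1]`: its own value plus the coarser summaries of its neighbourhood. A per-point output layer can then look at fine and coarse context together.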
The code associated with the neural networks is located in
classification. The models themselves are located
The data layout is currently in flux as we move data generation to Python Dataflow managed by Airflow.
In order to support the above layout, all our programs need the following common parameters:
- `env`: the environment, either development or production.
- `job-name`: the name (or date) of the current job.

Additionally, if the job is a dev job, the programs read the `$USER` environment variable in order to choose the appropriate subdirectory for the output data.
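The sketch below shows one way a program could accept these common parameters and derive its output directory. The bucket name, the directory layout, and the `dev`/`prod` spellings (only `dev` appears in this README's examples) are assumptions for illustration; both `--job-name` and `--job_name` are accepted because both spellings appear here.

```python
import argparse
import os

# Hypothetical base location; the real GCS paths are configured elsewhere.
BASE_OUTPUT = "gs://example-bucket/classification"


def make_parser():
    parser = argparse.ArgumentParser(description="Common job parameters (sketch).")
    # "dev" matches the `--env dev` example below; "prod" is an assumed spelling.
    parser.add_argument("--env", choices=["dev", "prod"], required=True)
    parser.add_argument("--job-name", "--job_name", dest="job_name", required=True)
    return parser


def output_dir(env, job_name, environ=None):
    """Build an output directory; dev jobs get a per-user subdirectory."""
    environ = os.environ if environ is None else environ
    if env == "dev":
        user = environ.get("USER")
        if not user:
            raise ValueError("dev jobs need the $USER environment variable")
        return "/".join([BASE_OUTPUT, env, user, job_name])
    return "/".join([BASE_OUTPUT, env, job_name])
```

Keeping dev output under a per-user directory means two developers can use the same job name without overwriting each other's results.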
Neural Net Classification
`python -m train.deploy_cloudml`: launch a training run on CloudML. Use `--help` to see the options.
If not running in the SkyTruth/GFW environment, you will need to edit `deploy_cloudml.yaml` to set the GCS paths correctly.
For example, to run vessel classification in the dev environment with the job name `test_deploy`:
```
python -m train.deploy_cloudml \
    --env dev \
    --model prod.vessel_characterization \
    --job_name test_deploy \
    --config_file train/deploy_characterization.yaml
```
IMPORTANT: Even though there is a maximum number of training steps specified, the CloudML process does not shut down reliably. You need to periodically check on the process and kill it manually if it has completed and is hanging. In addition, there are occasionally other problems where either the master or chief will hang or die so that new checkpoints aren't written, or new validation data isn't written out. Again, killing and restarting the training is the solution. (This will pick up at the last checkpoint saved.)
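Since these jobs have to be watched by hand, a small helper can make the "finished or stuck?" decision explicit. This is a hypothetical sketch: it assumes you have already listed the checkpoint directory (for example with `gsutil ls -l`) and noted the newest file's timestamp, that checkpoints use the `model.ckpt-<step>` naming that TensorFlow's `Saver` produces when given a global step, and that an hour without a new checkpoint means the job has stalled. It does not detect the missing-validation-data case.

```python
import re
import time

# Matches e.g. "model.ckpt-12345.index", "model.ckpt-12345.meta" or "model.ckpt-12345".
CKPT_RE = re.compile(r"model\.ckpt-(\d+)(?:\.|$)")


def latest_step(filenames):
    """Return the highest checkpoint step found in a list of file names, or None."""
    matches = (CKPT_RE.search(name) for name in filenames)
    steps = [int(m.group(1)) for m in matches if m]
    return max(steps) if steps else None


def check_job(filenames, last_write_time, max_steps, stall_seconds=3600, now=None):
    """Classify a training job as 'finished', 'stalled' or 'running'.

    filenames: names in the checkpoint directory.
    last_write_time: Unix time of the newest checkpoint file.
    stall_seconds: assumed threshold for "no progress"; tune to your job.
    """
    now = time.time() if now is None else now
    step = latest_step(filenames)
    if step is not None and step >= max_steps:
        return "finished"  # reached max steps but may be hanging: safe to kill
    if now - last_write_time > stall_seconds:
        return "stalled"  # no new checkpoint recently: kill and restart
    return "running"
```

A "stalled" job can be killed and relaunched; as noted above, it resumes from the last saved checkpoint.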
Running training locally: this is primarily for testing, as it will be quite slow unless you have a heavy-duty machine:
```
python -m classification.run_training \
    prod.fishing_range_classification \
    --feature_dimensions 14 \
    --root_feature_path FEATURE_PATH \
    --training_output_path OUTPUT_PATH \
    --fishing_range_training_upweight 1 \
    --metadata_file training_classes.csv \
    --fishing_ranges_file combined_fishing_ranges.csv \
    --metrics minimal
```
`python -m train.compute_metrics`: evaluate results and dump vessel lists. Use `--help` to see the options.
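To illustrate the kind of evaluation involved, here is a generic sketch that computes overall accuracy, per-class recall, and lists of misclassified vessels. The `(vessel_id, true_label, predicted_label)` row format and the `evaluate` function are hypothetical; they are not the interface of `compute_metrics`, whose actual options are listed by `--help`.

```python
from collections import Counter, defaultdict


def evaluate(rows):
    """rows: iterable of (vessel_id, true_label, predicted_label) tuples.

    Returns (accuracy, per-class recall, mistakes), where mistakes maps
    (true_label, predicted_label) to the list of misclassified vessel ids.
    """
    total = correct = 0
    per_class_total = Counter()
    per_class_correct = Counter()
    mistakes = defaultdict(list)
    for vessel_id, true_label, predicted in rows:
        total += 1
        per_class_total[true_label] += 1
        if predicted == true_label:
            correct += 1
            per_class_correct[true_label] += 1
        else:
            # These lists are the "vessel lists" worth inspecting by hand.
            mistakes[(true_label, predicted)].append(vessel_id)
    accuracy = float(correct) / total if total else 0.0
    recall = dict((label, float(per_class_correct[label]) / n)
                  for label, n in per_class_total.items())
    return accuracy, recall, dict(mistakes)
```

Grouping mistakes by (true, predicted) pair makes systematic confusions between two classes stand out more than a single accuracy number does.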
Running inference: unless you have local access to a heavy-duty machine, you should probably run this on the Dataflow pipeline in
Copy a model checkpoint locally:
```
gsutil cp GCS_PATH_TO_CHECKPOINT ./model.ckpt
```
Run the inference job:
Vessel classification: This command infers results only for the test data (for evaluation purposes), and infers a separate classification every 6 months:
```
python -m classification.run_inference prod.vessel_classification
```
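The README does not say how the 6-month periods are aligned. As an illustration only, the sketch below assumes windows that start on 1 January and 1 July and clips the first and last window to the requested range; `six_month_windows` is a hypothetical helper, not part of `run_inference`.

```python
import datetime


def six_month_windows(start, end):
    """Split [start, end) into half-year windows aligned to 1 Jan and 1 Jul.

    start, end: datetime.date objects. The alignment is an assumption.
    """
    if start >= end:
        return []
    windows = []
    # First half-year boundary at or before `start`.
    cursor = datetime.date(start.year, 1 if start.month <= 6 else 7, 1)
    while cursor < end:
        if cursor.month == 1:
            nxt = datetime.date(cursor.year, 7, 1)
        else:
            nxt = datetime.date(cursor.year + 1, 1, 1)
        # Clip the window to the requested range.
        windows.append((max(cursor, start), min(nxt, end)))
        cursor = nxt
    return windows
```

For example, `six_month_windows(datetime.date(2015, 3, 15), datetime.date(2016, 2, 1))` returns three windows: 2015-03-15 to 2015-07-01, 2015-07-01 to 2016-01-01, and 2016-01-01 to 2016-02-01. Each window would then receive its own classification.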
Fishing localisation: This infers all fishing at all time points (no
```
python -m classification.run_inference prod.fishing_range_classification \
    --root_feature_path GCS_PATH_TO_FEATURES \
    --inference_parallelism 32 \
    --feature_dimensions 14 \
    --inference_results_path=./RESULT_NAME.json.gz \
    --model_checkpoint_path ./model.ckpt \
    --metadata_file training_classes.csv \
    --fishing_ranges_file combined_fishing_ranges.csv
```
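The README does not describe the layout of `RESULT_NAME.json.gz`. Since the dependencies include `newlinejson`, one plausible guess is gzipped newline-delimited JSON (one JSON object per line). The reader below assumes that format and makes no assumption about the field names; `read_results` is a hypothetical helper.

```python
import gzip
import json


def read_results(path):
    """Yield one dict per non-empty line of a gzipped newline-delimited JSON file.

    path: a file name (or, on Python 3, an open binary file object).
    """
    with gzip.open(path, "rb") as f:
        for raw in f:
            line = raw.decode("utf-8").strip()
            if line:
                yield json.loads(line)
```

For instance, `for record in read_results("RESULT_NAME.json.gz"): print(sorted(record))` shows which fields each result actually carries before you write any further processing.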
Local Environment Setup
- Python 2.7+
- TensorFlow 0.12.1 (https://www.tensorflow.org/get_started/os_setup)

Then install the remaining Python dependencies:
```
pip install google-api-python-client pyyaml pytz newlinejson python-dateutil yattag
```
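A quick way to confirm the setup is to try importing each dependency. The mapping from pip package names to import names below is standard for these packages; the script itself is a sketch, not part of the repository, and avoids Python-3-only syntax so it also runs under Python 2.7.

```python
import importlib
import sys

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "tensorflow": "tensorflow",
    "google-api-python-client": "googleapiclient",
    "pyyaml": "yaml",
    "pytz": "pytz",
    "newlinejson": "newlinejson",
    "python-dateutil": "dateutil",
    "yattag": "yattag",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be imported."""
    missing = []
    for pip_name, module_name in sorted(required.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    print("Python " + sys.version.split()[0])
    for name in missing_packages():
        print("missing: " + name)
```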
Adding new models
For development: create a directory in `classification/models/dev` with the model name (usually the developer's name). An `__init__.py` is required for the model to be picked up, and the model package directory must be added to
For production: add the model to
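Model names on the command line, such as `prod.vessel_characterization`, look like dotted paths relative to `classification/models`. Assuming that is how they are resolved (this README does not say so explicitly), loading a model comes down to a package import, which is why the directory needs an `__init__.py`: under Python 2.7, a directory without one cannot be imported as a package. `load_model_module` is a hypothetical helper.

```python
import importlib


def load_model_module(model_name, base_package="classification.models"):
    """Import a model package such as "dev.<developer>" or "prod.vessel_characterization".

    The base package and the dotted-name convention are assumptions for illustration.
    """
    return importlib.import_module(base_package + "." + model_name)
```

With this convention, a new dev model becomes available as soon as its directory (with `__init__.py`) is importable, without any central registry to edit.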
YAPF is a code formatter for Python. All our Python code should be autoformatted with YAPF before committing. To install it, run:
```
sudo pip install yapf
```
Then run `yapf -r -i .` in the top-level directory to fix the formatting of the whole project.