This project is a proof-of-concept tool to automatically count colonially nesting waterbirds from drone imagery, developed initially on Straw-necked Ibis at Narran Lakes.
The purpose of developing a tool to automatically count colonially nesting waterbirds is to allow accurate, repeatable and timely monitoring of large colonies. It is envisioned that a user would be able to input drone imagery and, with minimal technical skill, derive an accurate count of waterbirds.
This project contains 3 requirements files:
File | Type | Contents |
---|---|---|
requirements.txt | pip | minimal requirements to run the model and label datasets |
Env.txt | conda | the Python 3.9 environment we are running this project on |
Envrequirements.txt | pip | the packages we have installed in our environment |
We have been running this project on an NC4AS_T4_V3 instance in AzureML. To download this project and set up the environment, run the following in a terminal.
NOTE: Git LFS is used for the larger files: https://git-lfs.github.com/
```
sudo apt-get install git-lfs
git lfs install
git clone https://github.com/MDBAuth/Drone-based-waterbird-counting.git
cd Drone-based-waterbird-counting
```
Create the environment:
```
conda create -n waterbirds python=3.9
conda activate waterbirds
conda install --file Env.txt
pip3 install -r Envrequirements.txt
pip3 install -r requirements.txt
export PYTHONPATH=$PWD
```
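With the environment built, a quick sanity check that PyTorch can see the T4 GPU may be worthwhile (this assumes torch is among the installed requirements, which the `.pth` model weights used later suggest):

```
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```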
With the environment set up, we can run the quickstart notebook in the notebook folder to see the tool in action. Further details about the tool and its development can be found in the report folder.
This guide covers:
- Definitions
- Preparing and Labelling a New Training Set
- Training a Model on a Training Set
- Counting Species using a Trained Model
The definitions below will be referenced throughout this document.
- Project: Also called a mission, whereby a drone is sent to collect images over a specified terrestrial area, using a push broom or mosaicking pattern. Images are saved in a folder (e.g. `project_name`).
- Dataset: A set of PNG or JPG images typically located in a single folder. These images must all have the same dimensions.
- Overlap: When using a push broom or mosaic pattern of overhead image collection, the overlap is the fractional overlap between vertically adjacent images.
- Sidelap: When using a push broom or mosaic pattern of overhead image collection, the sidelap is the fractional overlap between horizontally adjacent images.
- Slices: Due to a combination of the limited memory capacity of processors and the relatively small objects of interest (e.g. birds), high-resolution images must be processed in smaller pieces. In this guide we refer to these pieces as slices.
- Training Set: A set of slices that have been labelled with bounding boxes. This will typically consist of a `data` folder containing the sliced images themselves, and a `labels.json` file containing the label metadata.
- Stride: Object detection algorithms are less performant near the edges of images, so allowing a fractional overlap (a.k.a. stride) between slices lets an ML model improve detection performance. Note: trimming the prediction area by the same stride value avoids double-counting.
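To make the slice and stride arithmetic concrete, here is a minimal sketch of how slice offsets could be computed along one image axis. The 512-pixel slice size and the function itself are illustrative assumptions, not the repository's actual implementation:

```python
# Illustrative sketch only; the repo's real slicing code may differ.
def slice_origins(image_size: int, slice_size: int = 512, stride: float = 0.5):
    """Return top-left offsets of slices along one image axis.

    stride is the fractional overlap between adjacent slices, so each
    step advances by slice_size * (1 - stride) pixels.
    """
    step = int(slice_size * (1 - stride))
    origins = list(range(0, max(image_size - slice_size, 0) + 1, step))
    # Ensure the final slice reaches the image edge.
    if origins[-1] + slice_size < image_size:
        origins.append(image_size - slice_size)
    return origins

# A 5472-pixel-wide image with 512-pixel slices and stride 0.5:
print(slice_origins(5472))  # offsets every 256 px, with a final slice at 4960
```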
The first step in preparing a Dataset is to copy a Project into the root directory, which is set by default to `./data/raw`. With this pattern, raw images should be located in the `./data/raw/project_name/` directory.
Use DotDotGoose software to pre-annotate labels. The output of a DotDotGoose labelling job will be a PNT file with a `.pnt` extension, which must be placed inside the `./data/raw/project_name/` directory along with the raw images. NOTE: There must only be one PNT file per project.
To process the correct Project, the `image_height`, `image_width` and `project_name` fields in `params.yaml` must be updated to reflect the Project attributes. The `project_name` is simply the name of the folder inside which the raw images are stored, while the `image_height` and `image_width` fields are the dimensions of the original high-resolution (raw) images.
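As an illustration, for a project whose 5472 × 3648 raw images sit in `./data/raw/narran_2021/`, the entries would look something like the following (the project name and values here are made up, and the actual `params.yaml` may nest these keys differently):

```
# Illustrative values only; check params.yaml for the real key layout.
project_name: narran_2021
image_height: 3648
image_width: 5472
```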
Once a Dataset is prepared, including a single PNT file, and `params.yaml` is updated as described above, the following command should be executed to prepare the dataset for labelling.
Once the dataset has been prepared for labelling, follow the instructions below to begin labelling data using Label Studio (a commercially available third-party product).
NOTES:
- this repository is only compatible with Label Studio annotation formatting; use of other annotation software will require work to ensure compatibility
- the following steps should be run on a local computer; they are not guaranteed to work on cloud servers
1. Run the following command:
```
python3 ./scripts/start_label-studio.py
```
2. Log in to Label Studio
3. Load `files.txt` (located in the root directory) on the Label Studio GUI, as detailed in Steps 4-7
4. Click Create (Project) and name your project
5. Click the Data Import tab, then the Upload Files button
6. Find and select `files.txt` (in the root folder)
7. When prompted to "Treat CSV/TSV as:", select "List of tasks"
8. Click on Labeling Setup, remove the Airplane and Car labels, and add Bird (and/or other labels as required)
9. Click Save and begin labelling
10. Once finished labelling, click Export and select JSON format
11. Click Export and save the output file into the `./data/labels/` directory. Update the relevant filename or filepath in `params.yaml` (located under `slices:labels`)
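For instance, if the export was saved as `./data/labels/narran_2021.json` (an illustrative filename), the corresponding entry would look roughly like this:

```
# Illustrative excerpt; other keys in params.yaml are omitted.
slices:
  labels: ./data/labels/narran_2021.json
```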
Once the steps in Section 2.5 have been followed to completion, first update the following fields in the training section of `params.yaml`:
- `train_data_dir`: `./data/training-sets/project_name/data`
- `train_coco`: `./data/training-sets/project_name/labels.json`
Finally, run the following command to prepare the Training Set:
```
python3 scripts/prepare_training_set.py
```
If all of the previous steps were followed for two projects (let's call them `project_1` and `project_2`), then there should be two different `labels.json` files, each in its respective folder, as well as two corresponding `training-sets/project_name/data/` directories containing the image (slices) data inputs. The following methodology can be used to combine these two Training Sets (this process can be used to combine any number of Training Sets):
1. Create a new folder called `./data/training-sets/combined_project_name/data/`
2. Copy the contents of `./data/training-sets/project_1/data/` and `./data/training-sets/project_2/data/` into the folder created in Step 1
3. Create an empty JSON file `./data/training-sets/combined_project_name/labels.json`
4. Concatenate the two JSON files from `project_1` and `project_2` and copy the output into the JSON file created in Step 3 (see the merge sketch after this list)
5. Update the relevant fields in `params.yaml` (see Section 2.6)
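Because COCO-style annotation files carry numeric image and annotation IDs, "concatenating" them in Step 4 generally means merging the JSON structures and re-offsetting the IDs rather than joining the raw text. Below is a minimal merge sketch; it assumes the `labels.json` files follow the standard COCO layout implied by the `train_coco` field (keys `images`, `annotations`, `categories`), which has not been verified against this repository:

```python
import json

# Illustrative COCO-style merge; assumes both files are non-empty and
# share an identical "categories" list.
def merge_coco(path_1, path_2, out_path):
    with open(path_1) as f:
        a = json.load(f)
    with open(path_2) as f:
        b = json.load(f)

    # Offset IDs in the second file so they stay unique after merging.
    img_offset = max(img["id"] for img in a["images"]) + 1
    ann_offset = max(ann["id"] for ann in a["annotations"]) + 1
    for img in b["images"]:
        img["id"] += img_offset
    for ann in b["annotations"]:
        ann["id"] += ann_offset
        ann["image_id"] += img_offset

    merged = {
        "images": a["images"] + b["images"],
        "annotations": a["annotations"] + b["annotations"],
        "categories": a["categories"],  # assumed identical in both files
    }
    with open(out_path, "w") as f:
        json.dump(merged, f)

merge_coco(
    "./data/training-sets/project_1/labels.json",
    "./data/training-sets/project_2/labels.json",
    "./data/training-sets/combined_project_name/labels.json",
)
```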
Once a Training Set has been prepared according to the steps described in Section 2, first update the project_name of the following field in the models section of `params.yaml`:
- `best_model`: `./models/detection/weights/project_name.pth`

as well as the training section of `params.yaml`:
- `train_data_dir`: `./data/training-sets/project_name/data`
- `train_coco`: `./data/training-sets/project_name/labels.json`
Finally, run the following command to start a training job:
```
python3 scripts/train_mdba.py
```
While this should "just work", hyperparameters may be tuned, and data augmentation techniques can be added (see lines 96-107 in `./utils/train_utils.py` regarding the use of `torchvision.transforms`).
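As an illustration of the kind of augmentation that could be added there, the sketch below applies photometric transforms; this is an assumption about how the hook is wired, and the code in `./utils/train_utils.py` may expect a different call signature:

```python
import torchvision.transforms as T

# Photometric augmentations do not move pixels, so existing bounding
# boxes remain valid. Geometric augmentations (flips, rotations) would
# also require transforming the boxes accordingly.
train_transforms = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),
])
```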
Simply follow the instructions in Section 2.1. Keep the project name in mind for the next step.
Update the relevant parameters in `params.yaml`, then run the following command (arguments in square brackets are optional and can be set here or in `params.yaml`):
```
python3 scripts/bird_count.py \
    --project-name project_name \
    [--data-root data_root] \
    [--output-folder path_to_output] \
    [--conf-thresh confidence_threshold] \
    [--overlap overlap_ratio] \
    [--sidelap sidelap_ratio] \
    [--stride stride_ratio] \
    [--model-path model_path] \
    [--save-slices save_slices]  # 0 for False, 1 for True
```
where the parameters are defined as follows:
- --project-name: The parent folder where the image dataset is stored.
- --data-root: The root directory inside which the project folder lies (default: ./data/raw/).
- --output-folder: The path where the counts CSV file is stored (default: ./models/results/).
- --conf-thresh: The confidence threshold (default: 0.95); allows calibration of over- or under-counting.
- --overlap: This ratio determines the amount of overlap (default: 0.1) between vertically adjacent whole images, to avoid double counting.
- --sidelap: This ratio determines the amount of sidelap (default: 0.1) between horizontally adjacent whole images, to avoid double counting.
- --stride: This factor determines the stride (default: 0.5) between adjacent slices during inferencing.
- --model-path: The path where the inferencing model is located (default: ./models/detection/weights/best.pth).
- --save-slices: This optional parameter enables saving visual predictions of non-empty slices (default: 0). Set to 1 to enable.
- WARNING: If --save-slices is set to 1 (True), it can easily consume a large amount of storage, so it is advised to create a small test folder containing one or only a few whole images for testing and calibration purposes.
See an example of a detection output (when save-slices is set to 1 or True).
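For example, a calibration run on a small test project (the project name here is illustrative) might look like:

```
python3 scripts/bird_count.py --project-name test_project --conf-thresh 0.90 --save-slices 1
```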