SALSA

Selection of Adjusted Litter Scene Annotations (SALSA) is a new image dataset for object detection that focuses on litter. This dataset is currently under active development.

Overview

SALSA is designed to further computer vision research on litter detection. The dataset combines selected and adjusted annotations of images from TACO with new annotations of images from OpenLitterMap. SALSA contains 2569 images with 8356 annotations for objects across 10 categories. The images show a diverse set of litter objects in the wild, i.e., outdoor photos of litter in various natural environments.

The annotations are provided in COCO format in annotations.json. Images are distributed via Azure Blob Storage (in the West Europe region). The SALSA dataset, i.e., annotations.json, is made available under the Open Database License. The salsa-utils package is distributed under the Apache License.
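As a rough sketch of how COCO-format annotations can be inspected, the snippet below summarises a COCO-style dictionary using only the standard library. It assumes annotations.json follows the usual COCO layout with top-level `images`, `annotations` and `categories` lists. The function name `summarise_coco`, the category names and the miniature in-memory example are illustrative and not part of salsa-utils.

```python
import json
from collections import Counter


def summarise_coco(coco: dict) -> dict:
    """Count images, annotations and annotations per category name."""
    # Map category ids to readable names (standard COCO `categories` list).
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    per_category = Counter(names[ann["category_id"]] for ann in coco["annotations"])
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "categories": len(names),
        "per_category": dict(per_category),
    }


# Hypothetical miniature example standing in for annotations.json;
# the category names here are made up for illustration.
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [200, 100, 40, 40]},
        {"id": 3, "image_id": 1, "category_id": 1, "bbox": [300, 300, 20, 10]},
    ],
    "categories": [{"id": 1, "name": "bottle"}, {"id": 2, "name": "can"}],
}
summary = summarise_coco(example)
print(json.dumps(summary, indent=2))
```

To run the same summary on the real file, load it first with `json.load` and pass the resulting dictionary to the function.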

Important

I do not own the copyright of the images distributed via the Blob Storage. Use of the images must abide by the respective licenses specified for each image in annotations.json. Users of the images accept full responsibility for their use of the dataset, including but not limited to the use of any copies of copyrighted images that they may create from the dataset.
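Because licenses are specified per image, it can help to tally them before reusing any images. The sketch below is one possible way to do so. It assumes, without confirmation from the dataset itself, that each image entry carries a COCO-style `license` field and that an optional top-level `licenses` list maps ids to names. The function name `images_per_license` and the sample data are hypothetical.

```python
from collections import Counter


def images_per_license(coco: dict) -> Counter:
    """Count images by license name (assumes COCO-style license fields)."""
    # Optional top-level `licenses` list maps license ids to names.
    license_names = {
        lic["id"]: lic.get("name", str(lic["id"]))
        for lic in coco.get("licenses", [])
    }
    counts = Counter()
    for image in coco["images"]:
        value = image.get("license")
        if value is None:
            label = "unknown"
        else:
            # Resolve ids through the lookup table; keep raw values otherwise.
            label = license_names.get(value, str(value))
        counts[label] += 1
    return counts


# Hypothetical sample; the field layout is an assumption, not SALSA's schema.
sample = {
    "licenses": [{"id": 1, "name": "CC BY 4.0"}],
    "images": [
        {"id": 1, "license": 1},
        {"id": 2, "license": 1},
        {"id": 3, "license": None},
    ],
}
license_counts = images_per_license(sample)
print(license_counts)
```

If the real file stores licenses differently, only the lookup inside the loop should need adjusting.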

Getting Started

salsa-utils is a Python package designed to facilitate working with the SALSA dataset. You can install the package from GitHub using

pip install git+https://github.com/alinacherkas/salsa.git

You can use either the Python or the command-line interface to download the complete dataset in YOLO format with just a couple of lines of code.

import salsa_utils as salsa

# transform annotations, download images, write labels and dataset config file all in one go
salsa.prepare_salsa()

Alternatively, you can run the command below from the command line to achieve the same result:

python -m salsa_utils

By default, data is saved to ./datasets while salsa.yaml is written to the current working directory.
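For readers curious about what transforming COCO annotations into YOLO labels involves, here is a minimal sketch of the standard box conversion. It is not the salsa-utils implementation. It only assumes the usual conventions: COCO boxes are `[x_min, y_min, width, height]` in pixels, and YOLO label lines hold a class index followed by a centre-based box normalised by image size. The helper names `coco_to_yolo` and `yolo_label_line` are hypothetical.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    YOLO (x_center, y_center, width, height) normalised to [0, 1]."""
    x_min, y_min, width, height = bbox
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return x_center, y_center, width / image_width, height / image_height


def yolo_label_line(class_index, bbox, image_width, image_height):
    """Format one line of a YOLO label file: class index, then the box."""
    values = coco_to_yolo(bbox, image_width, image_height)
    return f"{class_index} " + " ".join(f"{v:.6f}" for v in values)


# A box covering the central quarter of a 640x480 image.
line = yolo_label_line(0, [160, 120, 320, 240], 640, 480)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```

In practice salsa.prepare_salsa() handles this step, so the sketch is mainly useful for checking or debugging individual labels.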

Descriptive Statistics

Of the 2569 images in the dataset, about 52% (1325) come directly from TACO, while the remaining 48% (1244) are new images obtained from OpenLitterMap. For all TACO images, bounding boxes were checked and adjusted where needed, and categories were manually recoded and validated according to the new label scheme with 10 categories. About 59% of annotations appear in the newly annotated images from OpenLitterMap. The distribution of objects across the 10 classes is shown below.

Litter objects appear in different positions and at various sizes. The distribution of bounding box areas by object category is displayed in the figure below.
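A distribution of this kind can be approximated from the annotations alone. The following sketch groups bounding box areas by category, expressed as a fraction of the image area so that images of different resolutions are comparable. It assumes standard COCO fields (`bbox`, `image_id`, `category_id`, and image `width` and `height`). The function name, category names and toy data are illustrative, and this is not necessarily how the figure was produced.

```python
from collections import defaultdict
from statistics import median


def relative_areas_by_category(coco: dict) -> dict:
    """Group bounding box areas, as a fraction of image area, by category."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    image_area = {img["id"]: img["width"] * img["height"] for img in coco["images"]}
    areas = defaultdict(list)
    for ann in coco["annotations"]:
        # COCO boxes are [x_min, y_min, width, height] in pixels.
        _, _, box_width, box_height = ann["bbox"]
        relative = box_width * box_height / image_area[ann["image_id"]]
        areas[names[ann["category_id"]]].append(relative)
    return dict(areas)


# Toy data with made-up category names, standing in for annotations.json.
toy = {
    "images": [{"id": 1, "width": 100, "height": 100}],
    "categories": [{"id": 1, "name": "bottle"}, {"id": 2, "name": "can"}],
    "annotations": [
        {"image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10]},
        {"image_id": 1, "category_id": 1, "bbox": [0, 0, 30, 10]},
        {"image_id": 1, "category_id": 2, "bbox": [5, 5, 50, 50]},
    ],
}
areas = relative_areas_by_category(toy)
for name, values in sorted(areas.items()):
    print(f"{name}: n={len(values)}, median relative area={median(values):.3f}")
```

The per-category lists can be passed directly to a plotting library to draw box plots or histograms.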

Comparison with TACO

Criterion	TACO	SALSA
Year	2020	2023
Number of Images	1500 (official version)	2569 (1325 taken from TACO and 1244 collected from OpenLitterMap)
Number of Annotations	4784	8356 (3396 adjusted from/added to TACO and 4960 annotated from OpenLitterMap)
Number of Categories	60	10
Number of Supercategories	28	1
Bounding Boxes	Yes	Yes
Instance Segmentation	Yes	Not yet
Source of Annotations	Crowdsourced	Manually Curated
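The counts in the table also allow a quick comparison of annotation density. The short snippet below uses only the numbers from the table above. It checks that the two SALSA sources add up to the stated totals and derives annotations per image for each row. The variable and function names are arbitrary.

```python
# Counts copied from the comparison table above.
counts = {
    "TACO (official version)": {"images": 1500, "annotations": 4784},
    "SALSA (total)": {"images": 2569, "annotations": 8356},
    "SALSA from TACO": {"images": 1325, "annotations": 3396},
    "SALSA from OpenLitterMap": {"images": 1244, "annotations": 4960},
}

# The two SALSA sources should add up to the SALSA totals.
for key in ("images", "annotations"):
    parts = counts["SALSA from TACO"][key] + counts["SALSA from OpenLitterMap"][key]
    assert parts == counts["SALSA (total)"][key]


def annotations_per_image(row: dict) -> float:
    """Average number of annotated objects per image."""
    return row["annotations"] / row["images"]


density = {name: annotations_per_image(row) for name, row in counts.items()}
for name, value in density.items():
    print(f"{name}: {value:.2f} annotations per image")
```

These are simple averages and say nothing about how objects are spread across individual images.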
