## Index:
- **Basic Tags Usage** describes the available scene tags and shows how to build your own tags filter.
- **Canonical Partitioning** shows how to use the filters provided with the API to obtain the dataset split used in the Shifts competition.

## Basic Tags Usage

For details on using MotionPredictionDataset, refer to [example.ipynb](https://github.com/yandex-research/shifts/blob/main/sdc/examples/example.ipynb) and the [class definition](https://github.com/yandex-research/shifts/blob/main/sdc/ysdc_dataset_api/dataset/dataset.py#L23).

In [12]:
from ysdc_dataset_api.dataset import MotionPredictionDataset

In [13]:
# Path to the directory with protobuf data
dataset_path = '/path/to/dataset/dir'
# Path to the file containing scene tags.
# All the tag files are stored inside tar.gz data archives.
scene_tags_fpath = '/path/to/dataset/tags/file'

To filter scenes by tags, define a filter function that accepts a tags dict as input and returns True if the scene meets the desired criteria and False otherwise.

The scene tags dict has the following structure:
```
{
    'day_time': one of {'kNight', 'kMorning', 'kAfternoon', 'kEvening'}
    'season': one of {'kWinter', 'kSpring', 'kSummer', 'kAutumn'}
    'track': one of {'Moscow' , 'Skolkovo', 'Innopolis', 'AnnArbor', 'Modiin', 'TelAviv'}
    'sun_phase': one of {'kAstronomicalNight', 'kTwilight', 'kDaylight'}
    'precipitation': one of {'kNoPrecipitation', 'kRain', 'kSleet', 'kSnow'}
}
```
The full description of the protobuf message is available in the [tags.proto](https://github.com/yandex-research/shifts/blob/main/sdc/ysdc_dataset_api/proto/tags.proto) file.
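
Because a filter compares tag values as plain strings, a typo such as `'Rain'` instead of `'kRain'` silently matches no scenes. A minimal sketch of a sanity check against the values listed above; `ALLOWED_TAG_VALUES` and `check_tags` are hypothetical helpers that are not part of the `ysdc_dataset_api` package, and the value sets are copied from the structure shown here rather than read from `tags.proto`.

```python
# Hypothetical helper, not part of ysdc_dataset_api.
# Allowed values are copied from the tags dict structure described above.
ALLOWED_TAG_VALUES = {
    'day_time': {'kNight', 'kMorning', 'kAfternoon', 'kEvening'},
    'season': {'kWinter', 'kSpring', 'kSummer', 'kAutumn'},
    'track': {'Moscow', 'Skolkovo', 'Innopolis', 'AnnArbor', 'Modiin', 'TelAviv'},
    'sun_phase': {'kAstronomicalNight', 'kTwilight', 'kDaylight'},
    'precipitation': {'kNoPrecipitation', 'kRain', 'kSleet', 'kSnow'},
}


def check_tags(scene_tags_dict):
    """Return a list of problems found in a tags dict (empty if none)."""
    problems = []
    for key, allowed in ALLOWED_TAG_VALUES.items():
        if key not in scene_tags_dict:
            problems.append(f'missing key: {key}')
        elif scene_tags_dict[key] not in allowed:
            problems.append(
                f'unexpected value for {key}: {scene_tags_dict[key]!r}'
            )
    return problems
```

The same check can be applied to the literal values used inside a filter function before running it over a full dataset.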

In [14]:
# Define a tags filter: keep only rainy scenes recorded in Innopolis
def filter_rainy_innopolis(scene_tags_dict):
    return scene_tags_dict['track'] == 'Innopolis' and scene_tags_dict['precipitation'] == 'kRain'
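
Filters that combine several criteria tend to repeat the same comparison pattern. One way to avoid that, sketched here under the assumption that every criterion is a membership test on a single tag, is a small factory. `make_tags_filter` is a hypothetical helper and is not provided by the package.

```python
def make_tags_filter(**allowed_values):
    """Build a filter that accepts a scene only if every listed tag
    takes one of the allowed values.

    Hypothetical helper: each keyword is a tag name, and each value is
    a collection (list, tuple or set) of accepted tag values.
    """
    # Freeze the accepted values so that later changes to the caller's
    # collections do not alter the filter.
    allowed = {tag: frozenset(values) for tag, values in allowed_values.items()}

    def tags_filter(scene_tags_dict):
        # A missing tag yields None, which is never an accepted value.
        return all(
            scene_tags_dict.get(tag) in values
            for tag, values in allowed.items()
        )

    return tags_filter


# Scenes recorded in Innopolis under rain or sleet.
filter_wet_innopolis = make_tags_filter(
    track=['Innopolis'],
    precipitation=['kRain', 'kSleet'],
)
```

The resulting function takes a tags dict and returns a bool, so it can be passed as `scene_tags_filter` in the same way as `filter_rainy_innopolis`.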

MotionPredictionDataset inherits from [torch.utils.data.IterableDataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset).

In [15]:
dataset = MotionPredictionDataset(
    dataset_path=dataset_path,
    scene_tags_fpath=scene_tags_fpath,
    scene_tags_filter=filter_rainy_innopolis,
)

176/50000 scenes fit the filter criteria.


In [16]:
# The number of scenes after filtering:
dataset.num_scenes

176

## Canonical Partitioning

To get the canonical dataset partitioning, use the set of filters provided with the package.

In [17]:
from sdc.filters import DATASETS_TO_FILTERS

In [39]:
# DATASETS_TO_FILTERS dict contains a map from dataset type (train/development/evaluation)
# to a map from filter name to respective filter function.

for dataset_name, dataset_filters in DATASETS_TO_FILTERS.items():
    print(f'Dataset name: {dataset_name}')
    print('Dataset filters:')
    for filter_name in dataset_filters:
        print(f'  * {filter_name}')
    print('---')

Dataset name: train
Dataset filters:
  * moscow__train
---
Dataset name: development
Dataset filters:
  * moscow__development
  * ood__development
---
Dataset name: evaluation
Dataset filters:
  * moscow__evaluation
  * ood__evaluation
---


In [26]:
# Specify paths to datasets and respective tags files.

train_dataset_path = '/path/to/train/dataset/dir'
train_tags_fpath = '/path/to/train/tags/file'

development_dataset_path = '/path/to/development/dataset/dir'
development_tags_fpath = '/path/to/development/tags/file'

evaluation_dataset_path = '/path/to/evaluation/dataset/dir'
evaluation_tags_fpath = '/path/to/evaluation/tags/file'

In [36]:
train_filters = DATASETS_TO_FILTERS['train']
development_filters = DATASETS_TO_FILTERS['development']
evaluation_filters = DATASETS_TO_FILTERS['evaluation']

In [38]:
train_dataset = MotionPredictionDataset(
    dataset_path=train_dataset_path,
    scene_tags_fpath=train_tags_fpath,
    scene_tags_filter=train_filters['moscow__train'],
)

dev_in_dataset = MotionPredictionDataset(
    dataset_path=development_dataset_path,
    scene_tags_fpath=development_tags_fpath,
    scene_tags_filter=development_filters['moscow__development'],
)
dev_out_dataset = MotionPredictionDataset(
    dataset_path=development_dataset_path,
    scene_tags_fpath=development_tags_fpath,
    scene_tags_filter=development_filters['ood__development'],
)

eval_in_dataset = MotionPredictionDataset(
    dataset_path=evaluation_dataset_path,
    scene_tags_fpath=evaluation_tags_fpath,
    scene_tags_filter=evaluation_filters['moscow__evaluation'],
)
eval_out_dataset = MotionPredictionDataset(
    dataset_path=evaluation_dataset_path,
    scene_tags_fpath=evaluation_tags_fpath,
    scene_tags_filter=evaluation_filters['ood__evaluation'],
)

27036/50000 scenes fit the filter criteria.
9569/50000 scenes fit the filter criteria.
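
The five constructor calls above differ only in the paths and the filter name, so they can also be generated in a loop. A minimal sketch: `build_partitions` is a hypothetical helper, and it takes the dataset class and the filters mapping as arguments so that it assumes nothing about the package beyond the three keyword arguments used above.

```python
def build_partitions(dataset_cls, datasets_to_filters, paths):
    """Create one dataset per (dataset type, filter name) pair.

    Hypothetical helper. `paths` maps a dataset type such as 'train'
    to a (dataset_path, scene_tags_fpath) tuple. The result is a dict
    keyed by filter name, e.g. 'moscow__train'.
    """
    partitions = {}
    for dataset_name, (dataset_path, tags_fpath) in paths.items():
        filters = datasets_to_filters[dataset_name]
        for filter_name, filter_fn in filters.items():
            partitions[filter_name] = dataset_cls(
                dataset_path=dataset_path,
                scene_tags_fpath=tags_fpath,
                scene_tags_filter=filter_fn,
            )
    return partitions
```

With the names defined in this notebook, the call would be `build_partitions(MotionPredictionDataset, DATASETS_TO_FILTERS, paths)`, where `paths` maps `'train'`, `'development'` and `'evaluation'` to the corresponding dataset and tags paths specified earlier.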
