This topic describes how to manage the Dogs vs Cats Dataset, which is a dataset with Classification <reference/label_format:Classification> labels.
An accesskey <reference/glossary:accesskey> is needed to authenticate identity when using TensorBay.
../../../docs/code/DogsVsCats.py
../../../docs/code/DogsVsCats.py
Organizing the "Dogs vs Cats" dataset into a tensorbay.dataset.dataset.Dataset instance takes the following steps.
A catalog <reference/dataset_structure:catalog> contains all the label information of a dataset and is typically stored in a JSON file.
../../../tensorbay/opendataset/DogsVsCats/catalog.json
The only annotation type for "Dogs vs Cats" is Classification <reference/label_format:Classification>, and there are 2 category <reference/label_format:category> types.
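As a rough illustration of the catalog described above, the sketch below parses a catalog-shaped JSON string and lists its categories. The JSON content is an assumption made for illustration and is not a copy of catalog.json. That includes the top-level "CLASSIFICATION" key and the names "cat" and "dog". Only the "categories" list and the count of two categories come from this topic.

```python
import json

# Assumed catalog content for illustration; the real catalog.json may differ.
CATALOG_TEXT = """
{
    "CLASSIFICATION": {
        "categories": [{"name": "cat"}, {"name": "dog"}]
    }
}
"""


def category_names(catalog_text: str) -> list:
    """Return the category names declared under the assumed CLASSIFICATION key."""
    catalog = json.loads(catalog_text)
    return [category["name"] for category in catalog["CLASSIFICATION"]["categories"]]


names = category_names(CATALOG_TEXT)
print(names)
```

Keeping the categories in a separate JSON file means the label definitions can be read without touching any image data.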
Important
See the catalog table <reference/dataset_structure:catalog> for more catalogs with different label types.
A dataloader <reference/glossary:dataloader> is needed to organize the dataset into a tensorbay.dataset.dataset.Dataset instance.
../../../tensorbay/opendataset/DogsVsCats/loader.py
See Classification annotation <reference/label_format:Classification> for more details.
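The organizing step can be sketched with the standard library alone. The code below is not the TensorBay dataloader: plain dicts and tuples stand in for the Dataset, segment, data, and Classification objects. The directory layout, the file names, the rule that the category is the file name prefix, and the choice of which segment is labeled are all assumptions made for illustration.

```python
import os
import tempfile
from glob import glob

# Assumed layout: <root>/train/cat.0.jpg, <root>/train/dog.0.jpg, <root>/test/1.jpg.
# Which segments carry labels is also an assumption.
SEGMENTS = {"train": True, "test": False}


def organize(root: str) -> dict:
    """Map each segment name to a sorted list of (image path, category or None)."""
    organized = {}
    for segment_name, is_labeled in SEGMENTS.items():
        items = []
        for image_path in sorted(glob(os.path.join(root, segment_name, "*.jpg"))):
            # Assumed convention: the category is the file name part before the first dot.
            category = os.path.basename(image_path).split(".")[0] if is_labeled else None
            items.append((image_path, category))
        organized[segment_name] = items
    return organized


# Build a tiny throwaway directory tree to exercise the sketch.
with tempfile.TemporaryDirectory() as root:
    for relative in ("train/cat.0.jpg", "train/dog.0.jpg", "test/1.jpg"):
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()
    organized = organize(root)
    # Keep only file names so the summary does not depend on the temporary path.
    summary = {
        name: [(os.path.basename(p), c) for p, c in items]
        for name, items in organized.items()
    }

print(summary)
```

A real dataloader would create SDK objects instead of tuples, but the traversal order (segment, then file, then label) is the part this sketch is meant to show.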
Note
Because the Dogs vs Cats dataloader <dogsvscats-dataloader> above is already included in TensorBay, it uses relative imports. A new dataloader should use regular imports instead.
../../../docs/code/DogsVsCats.py
The TensorBay SDK already includes a number of dataloaders provided by the community, so importing an available dataloader is a feasible alternative to writing a new one.
../../../docs/code/DogsVsCats.py
Note
Catalogs are loaded automatically by the available dataloaders, so users do not have to write them again.
Important
See the dataloader table <reference/glossary:dataloader> for more examples of dataloaders with different label types.
Optionally, the organized dataset can be visualized with Pharos, a TensorBay SDK plug-in. This step helps users check whether the dataset is correctly organized. See Visualization <features/visualization:Visualization> for more details.
The organized "Dogs vs Cats" dataset can be uploaded to TensorBay for sharing, reuse, etc.
../../../docs/code/DogsVsCats.py
Similar to Git, the commit step after uploading records the changes to the dataset as a version. If needed, make further modifications and commit again. See Version Control <features/version_control:Version Control> for more details.
Now the "Dogs vs Cats" dataset can be read from TensorBay.
../../../docs/code/DogsVsCats.py
The "Dogs vs Cats" dataset <reference/dataset_structure:dataset> has two segments <reference/dataset_structure:segment>: train and test. Get the segment names by listing them all.
../../../docs/code/DogsVsCats.py
Get a segment by passing the required segment name.
../../../docs/code/DogsVsCats.py
In the train segment <reference/dataset_structure:segment>, there is a sequence of data <reference/dataset_structure:data>, which can be obtained by index.
../../../docs/code/DogsVsCats.py
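The access pattern described so far (list the segment names, pick a segment by name, pick a data item by index) can be pictured with a small stand-in model. ToyDataset and ToyData are hypothetical classes written for this sketch and are not TensorBay SDK classes. The dataset name string and the file paths are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyData:
    """Hypothetical stand-in for one data item: just a file path."""

    path: str


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset: segment name -> sequence of data."""

    name: str
    segments: Dict[str, List[ToyData]] = field(default_factory=dict)

    def keys(self) -> List[str]:
        """List all segment names in insertion order."""
        return list(self.segments)

    def __getitem__(self, segment_name: str) -> List[ToyData]:
        """Return the segment stored under the given name."""
        return self.segments[segment_name]


dataset = ToyDataset(
    "DogsVsCats",  # assumed name, for illustration only
    {
        "train": [ToyData("train/cat.0.jpg"), ToyData("train/dog.0.jpg")],
        "test": [ToyData("test/1.jpg")],
    },
)

segment_names = dataset.keys()  # list all segment names
train = dataset["train"]        # get one segment by name
first = train[0]                # get one data item by index
print(segment_names, first.path)
```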
In each data <reference/dataset_structure:data>, there is a sequence of Classification <reference/label_format:Classification> annotations, which can be obtained by index.
../../../docs/code/DogsVsCats.py
There is only one label type in the "Dogs vs Cats" dataset, which is classification. The information stored in category <reference/label_format:category> is one of the names in the "categories" list of catalog.json <dogsvscats-catalog>. See the Classification <reference/label_format:Classification> label format for more details.
../../../docs/code/DogsVsCats.py
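Since a category must be one of the names in the "categories" list, a label that has been read can be checked against the catalog. The sketch below shows one way to do that with plain Python. The catalog dict, its "CLASSIFICATION" key, and the names "cat" and "dog" are assumptions for illustration, and check_category is a hypothetical helper, not part of the TensorBay SDK.

```python
# Assumed catalog content; the topic only states that it lists two categories.
CATALOG = {"CLASSIFICATION": {"categories": [{"name": "cat"}, {"name": "dog"}]}}


def check_category(category: str, catalog: dict) -> str:
    """Return the category if the catalog declares it, otherwise raise ValueError."""
    allowed = {item["name"] for item in catalog["CLASSIFICATION"]["categories"]}
    if category not in allowed:
        raise ValueError(f"{category!r} is not one of {sorted(allowed)}")
    return category


checked = check_category("dog", CATALOG)

# A name missing from the catalog is rejected.
try:
    check_category("bird", CATALOG)
    rejected = False
except ValueError:
    rejected = True

print(checked, rejected)
```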