This topic describes dataset management, including:
- features/dataset_management:Organize Dataset
- features/dataset_management:Upload Dataset
- features/dataset_management:Read Dataset
- features/dataset_management:Update Dataset
- features/dataset_management:Move and Copy
- features/dataset_management:Merge Datasets
- features/dataset_management:Get Label Statistics
The TensorBay SDK provides methods to organize local datasets into the uniform TensorBay dataset structure <reference/dataset_structure:Dataset Structure>. The typical steps to organize a local dataset are:
- First, write a catalog (ref <reference/dataset_structure:Catalog>) to store all the label schema information inside a dataset.
- Second, write a dataloader (ref <reference/glossary:dataloader>) to load the whole local dataset into a ~tensorbay.dataset.dataset.Dataset instance.
Note
A catalog is needed only if there is label information inside the dataset.
Take the Organization of BSTLD <quick_start/examples/bstld:Organize Dataset>
as an example.
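The two organizing steps can be sketched in plain Python. This is a minimal model of the idea, not the TensorBay SDK: `ToyData`, `ToyDataset`, `load_toy_dataset`, the catalog contents and the file paths are all hypothetical names chosen for illustration. The catalog holds the label schema, and the dataloader reads raw annotations into one dataset object, split into segments.

```python
from dataclasses import dataclass, field

# Step 1: a catalog describing the label schema.
# The category names are illustrative assumptions.
CATALOG = {
    "BOX2D": {
        "categories": [{"name": "Red"}, {"name": "Green"}],
    }
}


@dataclass
class ToyData:
    """One data item: a file path plus its box labels."""

    path: str
    boxes: list = field(default_factory=list)


@dataclass
class ToyDataset:
    """A dataset: a name, a catalog, and named segments of data items."""

    name: str
    catalog: dict
    segments: dict = field(default_factory=dict)


def load_toy_dataset(name, raw_annotations, catalog):
    """Step 2: a dataloader that builds the whole dataset in one pass."""
    allowed = {category["name"] for category in catalog["BOX2D"]["categories"]}
    dataset = ToyDataset(name, catalog)
    for segment_name, items in raw_annotations.items():
        segment = dataset.segments.setdefault(segment_name, [])
        for item in items:
            # Reject labels that the catalog does not declare.
            for box in item["boxes"]:
                if box["category"] not in allowed:
                    raise ValueError(f"unknown category: {box['category']}")
            segment.append(ToyData(item["path"], list(item["boxes"])))
    return dataset


RAW_ANNOTATIONS = {
    "train": [
        {"path": "train/0001.png", "boxes": [{"category": "Red", "xyxy": [1, 2, 3, 4]}]},
    ],
    "test": [
        {"path": "test/0001.png", "boxes": []},
    ],
}

toy_dataset = load_toy_dataset("ToyBSTLD", RAW_ANNOTATIONS, CATALOG)
print(sorted(toy_dataset.segments))
```

Validating labels against the catalog inside the dataloader is a choice made for this sketch; it shows why the catalog is only needed when the dataset carries label information.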
For an organized local dataset (i.e. the initialized ~tensorbay.dataset.dataset.Dataset
instance), users can:
- Upload it to TensorBay.
- Read it directly.
This section mainly discusses the uploading operation. Uploading local datasets to TensorBay has several benefits:
- REUSE: uploaded datasets can be reused without preprocessing again.
- SHARING: uploaded datasets can be shared with your team or the community.
- VISUALIZATION: uploaded datasets can be visualized without coding.
- VERSION CONTROL: different versions of one dataset can be uploaded and controlled conveniently.
Note
When uploading a dataset or data, if the remote path of a data item is the same as that of an existing item in the same segment, the old item will be replaced.
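The replacement rule in the note above can be illustrated with a small model. This sketch does not call the TensorBay SDK; it assumes a segment behaves like a mapping keyed by remote path, and the `upload` helper is hypothetical.

```python
def upload(segment, remote_path, content):
    """Store content under remote_path; return True if an old item was replaced."""
    replaced = remote_path in segment
    segment[remote_path] = content
    return replaced


# A toy remote segment: remote path -> file content.
remote_segment = {}

first = upload(remote_segment, "0001.png", b"old bytes")
second = upload(remote_segment, "0001.png", b"new bytes")  # same remote path

print(first, second, remote_segment)
```

Under this model, the second upload overwrites the first because both use the same remote path within one segment. Using distinct remote paths keeps both items.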
Take the Upload Dataset of BSTLD <quick_start/examples/bstld:Upload Dataset>
as an example.
Two types of datasets can be read from TensorBay:
- Datasets uploaded by yourself as mentioned in features/dataset_management:Upload Dataset.
- Datasets uploaded by the shared Open Datasets platform.
Note
Before reading a dataset uploaded by the community, fork it first.
Note
Visit the "my datasets" (or "team datasets") panel of the TensorBay platform to check all datasets that can be read.
Take the Read Dataset of BSTLD <quick_start/examples/bstld:Read Dataset>
as an example.
Since TensorBay supports version control, users can update a dataset's meta, notes, data and labels in a new commit. Thus, different versions of data and labels can coexist in one dataset, which simplifies dataset maintenance.
Please see the Update dataset <quick_start/examples/update_dataset:Update Dataset> example for more details.
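How different versions can coexist in one dataset can be pictured with a toy commit history. This is an illustrative model only, assuming each commit stores a snapshot of the draft; `ToyVersionedDataset` and its methods are hypothetical and do not reflect how TensorBay stores commits internally.

```python
import copy


class ToyVersionedDataset:
    """A dataset whose commits are immutable snapshots of the draft."""

    def __init__(self):
        self.draft = {}    # remote path -> label, editable
        self.commits = []  # list of (message, snapshot)

    def commit(self, message):
        """Freeze the current draft and return the new commit's index."""
        self.commits.append((message, copy.deepcopy(self.draft)))
        return len(self.commits) - 1

    def checkout(self, commit_index):
        """Return the snapshot stored in the given commit."""
        return self.commits[commit_index][1]


versioned = ToyVersionedDataset()
versioned.draft["0001.png"] = "Red"
v0 = versioned.commit("initial labels")

versioned.draft["0001.png"] = "Green"  # update the label in the draft
v1 = versioned.commit("fix label of 0001.png")

print(versioned.checkout(v0), versioned.checkout(v1))
```

Both label versions remain readable after the update, because the earlier commit is never modified.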
TensorBay supports four methods to copy or move data in datasets:
- copy segments
- copy data
- move segments
- move data
Copying is supported within a dataset or between datasets.
Moving is only supported within one dataset.
Note
The target dataset of a copy or move operation must be in draft <reference/glossary:draft> status.
Please see the Move and copy <quick_start/examples/move_and_copy:Move And Copy> example for more details.
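The copy and move rules in this section can be summarized in a short model. The sketch below is plain Python, not the TensorBay SDK: `ToyDataset`, `copy_segment` and `move_segment` are hypothetical, and raising an error on a name conflict is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    name: str
    is_draft: bool = True
    # segment name -> {remote path: content}
    segments: dict = field(default_factory=dict)


def _require_draft(dataset):
    if not dataset.is_draft:
        raise RuntimeError(f"dataset {dataset.name!r} must be in draft status")


def copy_segment(source, source_name, target, target_name):
    """Copy a segment within one dataset or between two datasets."""
    _require_draft(target)
    if target_name in target.segments:
        raise ValueError(f"segment {target_name!r} already exists")
    target.segments[target_name] = dict(source.segments[source_name])


def move_segment(dataset, source_name, target_name):
    """Move (rename) a segment; only possible inside a single dataset."""
    _require_draft(dataset)
    if target_name in dataset.segments:
        raise ValueError(f"segment {target_name!r} already exists")
    dataset.segments[target_name] = dataset.segments.pop(source_name)


dataset_a = ToyDataset("A", segments={"train": {"0001.png": b"a"}})
dataset_b = ToyDataset("B")

copy_segment(dataset_a, "train", dataset_b, "train_from_a")  # between datasets
move_segment(dataset_a, "train", "val")                      # within one dataset

print(sorted(dataset_a.segments), sorted(dataset_b.segments))
```

Note that `move_segment` takes a single dataset argument, which mirrors the rule that moving is only supported within one dataset.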
Since TensorBay supports copying between different datasets, users can use it to merge datasets.
Please see the quick_start/examples/merge_datasets:Merge Datasets example for more details.
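Merging by copying can be pictured as follows. This is a hedged sketch in plain Python rather than SDK code: datasets are modeled as nested dictionaries, and `merge_datasets` together with the `<dataset>_<segment>` naming scheme are assumptions chosen to avoid name clashes.

```python
def merge_datasets(sources):
    """Copy every segment of every source dataset into one merged dataset.

    sources: {dataset name: {segment name: {remote path: label}}}
    """
    merged = {}
    for dataset_name, segments in sources.items():
        for segment_name, data in segments.items():
            # Prefix with the dataset name so equal segment names do not collide.
            merged[f"{dataset_name}_{segment_name}"] = dict(data)
    return merged


sources = {
    "A": {"train": {"0001.png": "Red"}},
    "B": {"train": {"0001.png": "Green"}},
}

merged = merge_datasets(sources)
print(sorted(merged))
```

Each merged segment is a copy, so later edits to the merged dataset leave the source datasets unchanged.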
TensorBay supports getting the label statistics of a dataset.
Please see the quick_start/examples/get_label_statistics:Get Label Statistics example for more details.
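As a rough picture of what label statistics contain, the sketch below counts labels per category across all segments. It is a simplified stand-in written with the standard library; `category_statistics` and the data layout are hypothetical, and the statistics returned by TensorBay may include more detail than this.

```python
from collections import Counter


def category_statistics(segments):
    """Count how many labels of each category appear in the dataset.

    segments: {segment name: [labels of one data item, ...]}
    where each label is a dict with a "category" key.
    """
    counts = Counter()
    for data_items in segments.values():
        for labels in data_items:
            counts.update(label["category"] for label in labels)
    return dict(counts)


segments = {
    "train": [
        [{"category": "Red"}, {"category": "Green"}],
        [{"category": "Red"}],
    ],
    "test": [
        [{"category": "Green"}],
    ],
}

stats = category_statistics(segments)
print(stats)
```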