docs: add documentation about merge datasets
zhen.chen committed Aug 20, 2021
1 parent bfce46f commit b1bb900
Showing 4 changed files with 131 additions and 1 deletion.
51 changes: 51 additions & 0 deletions docs/code/merge_dataset.py
@@ -0,0 +1,51 @@
#!/usr/bin/env python3
#
# Copyright 2021 Graviti. Licensed under MIT License.
#

# pylint: disable=wrong-import-position
# pylint: disable=wrong-import-order
# pylint: disable=ungrouped-imports
# pylint: disable=pointless-statement
# pylint: disable=pointless-string-statement
# pylint: disable=invalid-name
# pylint: disable=invalid-sequence-index


"""This file includes the python code of merged_dataset.rst."""

"""Create Target Dataset"""
from tensorbay import GAS

ACCESS_KEY = "Accesskey-*****"
gas = GAS(ACCESS_KEY)
dataset_client = gas.create_dataset("mergedDataset")
dataset_client.create_draft("merge dataset")
""""""

"""Copy Segment from Pet"""
pet_dataset_client = gas.get_dataset("OxfordIIITPet")
# The source segment in OxfordIIITPet is "trainval"; it is copied into the target as "train".
dataset_client.copy_segment("trainval", target_name="train", source_client=pet_dataset_client)
dataset_client.copy_segment("test", source_client=pet_dataset_client)
""""""

"""Unify category"""
from tensorbay.dataset import Data

segment_client = dataset_client.get_segment("train")
for remote_data in segment_client.list_data():
    data = Data(remote_data.path)
    data.label = remote_data.label
    # Keep only the top-level category, e.g. "cat.Abyssinian" -> "cat".
    data.label.classification.category = data.label.classification.category.split(".")[0]
    segment_client.upload_label(data)
""""""

"""Copy Data from Dog vs Cat"""
pet_dataset_client = gas.get_dataset("DogsVsCats")
for name in ["test", "train"]:
    source_segment_client = pet_dataset_client.get_segment(name)
    segment_client = dataset_client.get_segment(name)
    segment_client.copy_data(
        source_segment_client.list_data_paths(), source_client=source_segment_client
    )
""""""
69 changes: 69 additions & 0 deletions docs/source/examples/merge_dataset.rst
@@ -0,0 +1,69 @@
################
Merge Datasets
################

This topic describes how to merge datasets.

Take the `Oxford-IIIT Pet <https://gas.graviti.cn/dataset/data-decorators/OxfordIIITPet>`_
and `Dogs vs Cats <https://gas.graviti.cn/dataset/data-decorators/DogsVsCats>`_
as examples. Their structures look like::

    Oxford-IIIT Pet/
        test/
            Abyssinian_002.jpg
            ...
        trainval/
            Abyssinian_001.jpg
            ...

    Dogs vs Cats/
        test/
            1.jpg
            10.jpg
            ...
        train/
            cat.0.jpg
            cat.1.jpg
            ...

Both datasets contain many pictures of cats and dogs,
so merging them yields a larger, more diverse dataset.
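
As a rough illustration of what merging means for the two layouts above, the sketch below models each dataset as a plain dictionary that maps segment names to file names. It does not call TensorBay. The ``merge_segments`` helper, its ``rename`` mapping, and the sample file lists are hypothetical and only mirror the directory trees shown earlier; treating ``trainval`` as ``train`` is an assumption made so that both datasets share segment names.

```python
from typing import Dict, List, Optional


def merge_segments(
    first: Dict[str, List[str]],
    second: Dict[str, List[str]],
    rename: Optional[Dict[str, str]] = None,
) -> Dict[str, List[str]]:
    """Merge two {segment name: [file names]} mappings (hypothetical helper)."""
    rename = rename or {}
    merged: Dict[str, List[str]] = {}
    for dataset in (first, second):
        for segment, files in dataset.items():
            # Map the source segment name onto the name used in the merged dataset.
            target = rename.get(segment, segment)
            merged.setdefault(target, []).extend(files)
    return merged


# Sample contents taken from the directory trees above.
pet = {"test": ["Abyssinian_002.jpg"], "trainval": ["Abyssinian_001.jpg"]}
dogs_vs_cats = {"test": ["1.jpg", "10.jpg"], "train": ["cat.0.jpg", "cat.1.jpg"]}

merged = merge_segments(pet, dogs_vs_cats, rename={"trainval": "train"})
print(sorted(merged))
```

The merged mapping ends up with two segments, each holding the files from both sources, which is the outcome the TensorBay copy operations in the rest of this example aim for.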

.. note::

   Before merging the datasets, fork them first.

Create the target dataset, named ``mergedDataset``, and open a draft on it:

.. literalinclude:: ../../../docs/code/merge_dataset.py
   :language: python
   :start-after: """Create Target Dataset"""
   :end-before: """"""

Copy all segments in ``OxfordIIITPet`` to ``mergedDataset``:

.. literalinclude:: ../../../docs/code/merge_dataset.py
   :language: python
   :start-after: """Copy Segment from Pet"""
   :end-before: """"""


Unify the categories of the ``train`` segment:

.. literalinclude:: ../../../docs/code/merge_dataset.py
   :language: python
   :start-after: """Unify category"""
   :end-before: """"""

.. note::

   Categories in ``OxfordIIITPet`` have a two-level format, such as ``cat.Abyssinian``,
   whereas categories in ``Dogs vs Cats`` have a single level, such as ``cat``.
   The categories therefore need to be unified, for example by renaming ``cat.Abyssinian`` to ``cat``.
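
The renaming rule can be tried in isolation. The sketch below is a minimal, TensorBay-independent version of the ``split(".")[0]`` expression used in the example code; the ``unify_category`` helper name and the ``dog.beagle`` sample are hypothetical.

```python
def unify_category(category: str) -> str:
    """Return the top-level part of a dotted category name."""
    # "cat.Abyssinian" -> "cat"; a name without a dot is returned unchanged.
    return category.split(".")[0]


for name in ("cat.Abyssinian", "dog.beagle", "cat"):
    print(name, "->", unify_category(name))
```

Because a one-level name contains no dot, applying the rule to categories that are already unified leaves them as they are, so the step is safe to rerun.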

Copy data from ``Dogs vs Cats`` to ``mergedDataset``:

.. literalinclude:: ../../../docs/code/merge_dataset.py
   :language: python
   :start-after: """Copy Data from Dog vs Cat"""
   :end-before: """"""
10 changes: 9 additions & 1 deletion docs/source/features/dataset_management.rst
@@ -105,4 +105,12 @@ Moving is only supported within one dataset.

The target dataset of copying and moving must be in :ref:`reference/glossary:draft` status.

Please see :ref:`move and copy<examples/move_and_copy:Move And Copy>` example for more details.
Please see :ref:`Move and copy<examples/move_and_copy:Move And Copy>` example for more details.

***************
Merge Dataset
***************

Because TensorBay supports copy operations between different datasets, users can merge datasets by copying segments and data into a target dataset.

Please see the :ref:`Merge datasets<examples/merge_dataset:Merge Datasets>` example for more details.
2 changes: 2 additions & 0 deletions docs/source/quick_start/examples.rst
@@ -33,6 +33,7 @@ The following table lists a series of examples to help developers to use TensorB
| Label Type: :ref:`reference/label_format:Sentence`
:ref:`examples/update_dataset:Update Dataset` | Topic: Update Dataset
:ref:`examples/move_and_copy:Move And Copy` | Topic: Move And Copy
:ref:`examples/merge_dataset:Merge Datasets` | Topic: Merge Datasets
======================================================= ===========================================================

.. toctree::
@@ -47,3 +48,4 @@ The following table lists a series of examples to help developers to use TensorB
../examples/Newsgroups20
../examples/update_dataset
../examples/move_and_copy
../examples/merge_dataset
