docs: add documentation about merge datasets
zhen.chen committed Aug 20, 2021
1 parent bfce46f, commit b1bb900
Showing 4 changed files with 131 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
#!/usr/bin/env python3 | ||
# | ||
# Copyright 2021 Graviti. Licensed under MIT License. | ||
# | ||
|
||
# pylint: disable=wrong-import-position | ||
# pylint: disable=wrong-import-order | ||
# pylint: disable=ungrouped-imports | ||
# pylint: disable=pointless-statement | ||
# pylint: disable=pointless-string-statement | ||
# pylint: disable=invalid-name | ||
# pylint: disable=invalid-sequence-index | ||
|
||
|
||
"""This file includes the python code of merged_dataset.rst.""" | ||
|
||
"""Create Target Dataset""" | ||
from tensorbay import GAS | ||
|
||
ACCESS_KEY = "Accesskey-*****" | ||
gas = GAS(ACCESS_KEY) | ||
dataset_client = gas.create_dataset("mergedDataset") | ||
dataset_client.create_draft("merge dataset") | ||
"""""" | ||
|
||
"""Copy Segment from Pet""" | ||
pet_dataset_client = gas.get_dataset("OxfordIIITPet") | ||
dataset_client.copy_segment("trainval", target_name="train", source_client=pet_dataset_client) | ||
dataset_client.copy_segment("test", source_client=pet_dataset_client) | ||
"""""" | ||
|
||
"""Unify category""" | ||
from tensorbay.dataset import Data | ||
|
||
segment_client = dataset_client.get_segment("train") | ||
for remote_data in segment_client.list_data(): | ||
    data = Data(remote_data.path) | ||
    data.label = remote_data.label | ||
    data.label.classification.category = data.label.classification.category.split(".")[0] | ||
    segment_client.upload_label(data) | ||
"""""" | ||
|
||
"""Copy Data from Dog vs Cat""" | ||
pet_dataset_client = gas.get_dataset("DogsVsCats") | ||
for name in ["test", "train"]: | ||
    source_segment_client = pet_dataset_client.get_segment(name) | ||
    segment_client = dataset_client.get_segment(name) | ||
    segment_client.copy_data( | ||
        source_segment_client.list_data_paths(), source_client=source_segment_client | ||
    ) | ||
"""""" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
################ | ||
Merge Datasets | ||
################ | ||
|
||
This topic describes how to merge datasets. | ||
|
||
Take the `Oxford-IIIT Pet <https://gas.graviti.cn/dataset/data-decorators/OxfordIIITPet>`_ | ||
and `Dogs vs Cats <https://gas.graviti.cn/dataset/data-decorators/DogsVsCats>`_ | ||
as examples. Their structures look like:: | ||
|
||
    Oxford-IIIT Pet/ | ||
        test/ | ||
            Abyssinian_002.jpg | ||
            ... | ||
        trainval/ | ||
            Abyssinian_001.jpg | ||
            ... | ||
|
||
    Dogs vs Cats/ | ||
        test/ | ||
            1.jpg | ||
            10.jpg | ||
            ... | ||
        train/ | ||
            cat.0.jpg | ||
            cat.1.jpg | ||
            ... | ||
|
||
Both datasets contain many pictures of cats and dogs, | ||
so merging them yields a more diverse dataset. | ||
|
||
.. note:: | ||
|
||
   Before merging the datasets, fork them first. | ||
|
||
Create a dataset named ``mergedDataset``: | ||
|
||
.. literalinclude:: ../../../docs/code/merge_dataset.py | ||
   :language: python | ||
   :start-after: """Create Target Dataset""" | ||
   :end-before: """""" | ||
|
||
Copy all segments in ``OxfordIIITPet`` to ``mergedDataset``: | ||
|
||
.. literalinclude:: ../../../docs/code/merge_dataset.py | ||
   :language: python | ||
   :start-after: """Copy Segment from Pet""" | ||
   :end-before: """""" | ||
|
||
|
||
Unify the categories of the ``train`` segment: | ||
|
||
.. literalinclude:: ../../../docs/code/merge_dataset.py | ||
   :language: python | ||
   :start-after: """Unify category""" | ||
   :end-before: """""" | ||
|
||
.. note:: | ||
|
||
   The categories in ``OxfordIIITPet`` have two levels, like ``cat.Abyssinian``, | ||
   but those in ``Dogs vs Cats`` have only one level, like ``cat``. | ||
   The categories therefore need to be unified, for example by renaming ``cat.Abyssinian`` to ``cat``. | ||
|
||
Copy data from ``Dogs vs Cats`` to ``mergedDataset``: | ||
|
||
.. literalinclude:: ../../../docs/code/merge_dataset.py | ||
   :language: python | ||
   :start-after: """Copy Data from Dog vs Cat""" | ||
   :end-before: """""" |
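The three steps documented in this commit (copy segments, unify categories, copy data) can be sketched offline with plain Python dictionaries. This is only an illustration and not part of the commit: it makes no TensorBay calls, the `Segments` type and the `unify_category` and `merge` helpers are hypothetical names, and the sample categories are invented. It also assumes that the Pet ``trainval`` segment ends up as ``train`` in the merged dataset, which the later `get_segment("train")` call implies.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a dataset:
# segment name -> list of (file name, category).
Segments = Dict[str, List[Tuple[str, str]]]


def unify_category(category: str) -> str:
    """Keep only the top-level part of a dotted category name."""
    return category.split(".")[0]


def merge(pet: Segments, dogs_vs_cats: Segments) -> Segments:
    """Merge two toy datasets in the same order as the documented steps."""
    # Step 1: copy the Pet segments, storing "trainval" under the name "train".
    merged: Segments = {
        "train": list(pet["trainval"]),
        "test": list(pet["test"]),
    }
    # Step 2: unify the categories of the "train" segment only.
    merged["train"] = [(name, unify_category(cat)) for name, cat in merged["train"]]
    # Step 3: append the Dogs vs Cats data to the segment of the same name.
    for segment_name in ("test", "train"):
        merged[segment_name].extend(dogs_vs_cats[segment_name])
    return merged


# Toy inputs; the file names follow the directory listings in the docs,
# the categories are made up for illustration.
pet = {
    "trainval": [("Abyssinian_001.jpg", "cat.Abyssinian")],
    "test": [("Abyssinian_002.jpg", "cat.Abyssinian")],
}
dogs_vs_cats = {
    "train": [("cat.0.jpg", "cat")],
    "test": [("1.jpg", "cat")],
}

merged = merge(pet, dogs_vs_cats)
print(merged["train"])
```

Because only the ``train`` segment is rewritten in step 2, the ``test`` entries copied from the Pet dataset keep their two-level categories in this sketch, mirroring the scope of the documented snippet.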