Conversation

Collaborator

@AChenQ AChenQ commented Jun 25, 2021

No description provided.

@AChenQ AChenQ requested a review from Hoteryoung as a code owner June 25, 2021 10:12

coveralls commented Jun 25, 2021

Pull Request Test Coverage Report for Build 983312820

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 83.585%

Totals (Coverage Status):
  • Change from base Build 983220363: 0.0%
  • Covered Lines: 4985
  • Relevant Lines: 5964

💛 - Coveralls

@AChenQ AChenQ force-pushed the T16354_cloud_doc branch from e782aaa to 3294b79 Compare June 25, 2021 10:52
Collaborator

@Hoteryoung Hoteryoung left a comment


Please change L26 into the following line:

In authorized cloud storage mode, data are stored on another provider's cloud.
This mode is suitable for users who already host their data on a third-party cloud storage space.


The directory ``path/to/dataset`` should be empty when create an authorized storage Fusion Dataset.

Inport data into tensorby

Collaborator

Import Cloud Files into Authorized Storage Dataset

Inport data into tensorby
=========================

If your use cloud platform to storage data, you will first need to import your data into TensorBay.

Collaborator

Line 64 is a little bit redundant ...


If your use cloud platform to storage data, you will first need to import your data into TensorBay.

There are two options to import your data from raw cloud platform path to authorized storage dataset.

Collaborator

There are two methods to import cloud files into an authorized storage dataset.


There are two options to import your data from raw cloud platform path to authorized storage dataset.

1. Import all files with a directory path into datset -- out-of-the-box

Collaborator

  • OUT-OF-THE-BOX: Import all files under a directory into a dataset.
  • CUSTOMIZED: Use AuthData to organize cloud files into a dataset.

1. Import all files with a directory path into datset -- out-of-the-box
2. Use AuthData to organized a dataset -- customization

Suppose the cloud storage platform structure like ::

Collaborator

Take the following cloud directory as an example::

└── ...


Import all files from one directory into a target segment.

Collaborator

Out-of-the-box Method
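
For context, the out-of-the-box flow boils down to a single call on the dataset client. A minimal sketch, assuming the dataset has already been created as an authorized storage dataset; the access key and dataset name are placeholders, and ``GAS``/``get_dataset`` are the usual SDK entry points:

.. code:: python

    from tensorbay import GAS

    # Placeholders: replace with a real access key and dataset name.
    gas = GAS("<YOUR_ACCESS_KEY>")
    dataset_client = gas.get_dataset("<DATASET_NAME>")

    # Import every file under the cloud directory "datas/images"
    # into the "train" segment of the authorized storage dataset.
    dataset_client.import_all_files("datas/images", "train")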

dataset_client.import_all_files("datas/images", "train")
Import files and upload label to several segments

Collaborator

Customized Method


.. code:: python

    datas = cloud_client.open("datas/labels/0001.json").read()

Collaborator

Why datas?

datas = cloud_client.open("datas/labels/0001.json").read()
Use the file path in cloud platform as the AuthData path.

Collaborator

Use the cloud file path as ...

AuthData("train/data/0001.png")
load label file from cloud platform to tensorby label format.

Collaborator

load -> Load

tensorbay -> TensorBay

:start-after: """Load label file from cloud platform into the AuthData"""
:end-before: """"""

Import dataset from cloud platform to authorized storage dataset.

Collaborator

Import cloud files to an authorized storage dataset.

.. important::

The file will copy from raw directory to the authorized cloud storage dataset path,
set `delete_source=True` to delete raw files when the import is finished.

Collaborator

  1. Please use double backquotes, like ``delete_source=True``, to mark inline code.
    There are several cases in this PR; please correct them all.
  2. Please describe explicitly where to set ``delete_source=True``.
    Add a line of code, for example, as sketched below.
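
A minimal sketch of such a line, assuming ``delete_source`` is a keyword argument of ``import_all_files`` as the text above implies:

.. code:: python

    # Import the files and delete the raw source files once the import finishes.
    dataset_client.import_all_files("datas/images", "train", delete_source=True)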

@AChenQ AChenQ force-pushed the T16354_cloud_doc branch 2 times, most recently from 03d7cb8 to 8be2803 Compare June 29, 2021 03:52
Out-of-the-box Method
*********************

Import all files in `datas/images` directory into `train` segment.

Collaborator

Please use double backquotes when writing inline code.

cloud_client = dataset_client.get_cloud_client()
List all files in `datas/train`.

Collaborator

Ditto.
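
For reference, listing the files under ``datas/train`` with the cloud client would look roughly like this, reusing the ``get_cloud_client`` and ``list_auth_data`` calls that appear elsewhere in this diff:

.. code:: python

    # Get a cloud client bound to the dataset's authorized cloud storage.
    cloud_client = dataset_client.get_cloud_client()

    # List all files under "datas/train" as AuthData instances.
    auth_datas = cloud_client.list_auth_data("datas/train")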

@AChenQ AChenQ force-pushed the T16354_cloud_doc branch from 8be2803 to c3d9a6e Compare June 29, 2021 07:57

.. code:: python

    contents = cloud_client.open("datas/labels/0001.json").read()

Contributor

``open`` should be used with ``with``.
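
A sketch of the suggested form, assuming the object returned by ``cloud_client.open`` can be used as a context manager:

.. code:: python

    # Read the label file through a context manager so the handle is closed.
    with cloud_client.open("datas/labels/0001.json") as fp:
        contents = fp.read()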

@AChenQ AChenQ force-pushed the T16354_cloud_doc branch 4 times, most recently from f2e107d to 784643b Compare June 29, 2021 14:53

.. important::

The file will copy from raw directory to the authorized cloud storage dataset path,

Collaborator

Files will be copied from ...
, -> .

.. important::

The file will copy from raw directory to the authorized cloud storage dataset path,
So the storage space of file will double

Collaborator

Thus the storage space will be doubled on the cloud platform.

@AChenQ AChenQ force-pushed the T16354_cloud_doc branch 2 times, most recently from 6e9cc32 to dbff152 Compare June 29, 2021 15:17

from tensorbay.label import LabeledBox2D

images = cloud_client.list_auth_data("datas/images")

Contributor

datas?

labels = cloud_client.list_auth_data("datas/labels")

auth_data = images[0]
auth_data.label.box2d = [LabeledBox2D.loads(json.load(labels[0].open()))]

Contributor

classification is better for this example
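
A sketch of the example rewritten with a classification label, as the comment suggests; the ``"category"`` key in the JSON label file is an assumption made for illustration:

.. code:: python

    import json

    from tensorbay.label import Classification

    images = cloud_client.list_auth_data("datas/images")
    labels = cloud_client.list_auth_data("datas/labels")

    auth_data = images[0]
    # Read the cloud label file and attach a classification label.
    with labels[0].open() as fp:
        auth_data.label.classification = Classification(json.load(fp)["category"])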

segment = dataset.create_segment()

segment.append(AuthData("train/data/0001.png"))
segment.append(AuthData("train/data/0002.png"))

Contributor

Using AuthData.__init__ to get an AuthData instance is not recommended now.
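
One alternative, sketched from the calls already in this diff, is to obtain ``AuthData`` instances from the cloud client rather than constructing them directly (the ``train/data`` path is taken from this example):

.. code:: python

    segment = dataset.create_segment()

    # Append AuthData instances fetched from the cloud client
    # rather than building them with AuthData(...) by hand.
    for auth_data in cloud_client.list_auth_data("train/data"):
        segment.append(auth_data)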

@AChenQ AChenQ force-pushed the T16354_cloud_doc branch from dbff152 to 9015d2a Compare June 29, 2021 15:32
images = cloud_client.list_auth_data("data/images")
labels = cloud_client.list_auth_data("data/labels")

for index, auth_data in enumerate(images):

Contributor

use zip
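
A sketch of the loop rewritten with ``zip``, pairing each image with its label file; ``segment``, ``Classification`` and the ``"category"`` key are assumptions carried over from the surrounding example:

.. code:: python

    import json

    from tensorbay.label import Classification

    # Pair each image with its corresponding label file.
    for image, label in zip(images, labels):
        with label.open() as fp:
            image.label.classification = Classification(json.load(fp)["category"])
        segment.append(image)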

@AChenQ AChenQ force-pushed the T16354_cloud_doc branch from 9015d2a to acc5a9c Compare June 29, 2021 15:37
@AChenQ AChenQ force-pushed the T16354_cloud_doc branch from acc5a9c to e618bc4 Compare June 29, 2021 15:40
@AChenQ AChenQ force-pushed the T16354_cloud_doc branch from e618bc4 to a230451 Compare June 29, 2021 15:53
@AChenQ AChenQ merged commit 138e0b1 into Graviti-AI:main Jun 29, 2021
@AChenQ AChenQ deleted the T16354_cloud_doc branch June 29, 2021 15:53
linjiX pushed a commit that referenced this pull request Jun 29, 2021