docs: add document about import dataset from cloud storage platform #756
Conversation
Pull Request Test Coverage Report for Build 983312820
💛 - Coveralls
Please change L26 into the following line:
In authorized cloud storage mode, data are stored on other providers' cloud.
This mode is suitable for users who already host their data on a third-party cloud storage space.
| The directory ``path/to/dataset`` should be empty when creating an authorized storage Fusion Dataset.
| Inport data into tensorby
Import Cloud Files into Authorized Storage Dataset
| If your use cloud platform to storage data, you will first need to import your data into TensorBay.
Line 64 is a little bit redundant ...
| There are two options to import your data from raw cloud platform path to authorized storage dataset.
There are two methods to import cloud files into an authorized storage dataset.
| 1. Import all files with a directory path into datset -- out-of-the-box
| 2. Use AuthData to organized a dataset -- customization
- OUT-OF-THE-BOX: Import all files under a directory into a dataset.
- CUSTOMIZED: Use AuthData to organize cloud files into a dataset.
| Suppose the cloud storage platform structure like ::
Take the following cloud directory as an example::
| └── ...
| Import all files from one directory into a target segment.
Out-of-the-box Method
| dataset_client.import_all_files("datas/images", "train")
| Import files and upload label to several segments
Customized Method
| .. code:: python
| datas = cloud_client.open("datas/labels/0001.json").read()
Why datas?
| Use the file path in cloud platform as the AuthData path.
Use the cloud file path as ...
| AuthData("train/data/0001.png")
| load label file from cloud platform to tensorby label format.
load -> Load
tensorbay -> TensorBay
| :start-after: """Load label file from cloud platform into the AuthData"""
| :end-before: """"""
| Import dataset from cloud platform to authorized storage dataset.
Import cloud files to an authorized storage dataset.
| .. important::
| The file will copy from raw directory to the authorized cloud storage dataset path,
| set `delete_source=True` to delete raw files when the import is finished.
- Please use double apostrophes, like ``delete_source=True``, to describe inline code. There are several cases in this PR, please correct them all.
- Please describe explicitly the place to set ``delete_source=True``. Add a line of code, for example.
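To illustrate the second point, a hedged sketch of the kind of line being asked for (assuming ``import_all_files`` accepts a ``delete_source`` keyword, as the surrounding diff suggests; check the SDK reference before copying):

```rst
.. code:: python

   dataset_client.import_all_files("datas/images", "train", delete_source=True)
```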
Force-pushed from 03d7cb8 to 8be2803.
| Out-of-the-box Method
| *********************
| Import all files in `datas/images` directory into `train` segment.
Please use double apostrophes when writing inline code.
| cloud_client = dataset_client.get_cloud_client()
| List all files in `datas/train`.
Ditto.
| .. code:: python
| contents = cloud_client.open("datas/labels/0001.json").read()
``open`` should be used with ``with``.
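As a minimal sketch of this suggestion, using a local stand-in file instead of ``cloud_client.open`` (which is TensorBay-specific and needs credentials):

```python
import json
import tempfile
from pathlib import Path

# Create a stand-in label file; in the docs this would live on the cloud platform.
label_path = Path(tempfile.mkdtemp()) / "0001.json"
label_path.write_text(json.dumps({"category": "cat"}))

# A context manager closes the file handle even if reading raises.
with open(label_path) as fp:
    contents = json.load(fp)

print(contents)  # {'category': 'cat'}
```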
Force-pushed from f2e107d to 784643b.
| .. important::
| The file will copy from raw directory to the authorized cloud storage dataset path,
Files will be copied from ...
, -> .
| .. important::
| The file will copy from raw directory to the authorized cloud storage dataset path,
| So the storage space of file will double
Thus the storage space will be doubled on the cloud platform.
Force-pushed from 6e9cc32 to dbff152.
docs/code/cloud_storage.py (Outdated)
| from tensorbay.label import LabeledBox2D
| images = cloud_client.list_auth_data("datas/images")
datas?
docs/code/cloud_storage.py (Outdated)
| labels = cloud_client.list_auth_data("datas/labels")
| auth_data = images[0]
| auth_data.label.box2d = [LabeledBox2D.loads(json.load(labels[0].open()))]
Classification is better for this example.
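A sketch of why classification reads more simply here, using plain ``json`` on a hypothetical label string (the real docs example would use TensorBay's ``Classification`` label in place of ``LabeledBox2D``):

```python
import json

# Hypothetical classification label file contents: just a category name,
# with no box coordinates to explain alongside the import workflow.
raw_label = '{"category": "dog"}'

category = json.loads(raw_label)["category"]
print(category)  # dog
```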
docs/code/cloud_storage.py (Outdated)
| segment = dataset.create_segment()
| segment.append(AuthData("train/data/0001.png"))
| segment.append(AuthData("train/data/0002.png"))
Using ``AuthData.__init__`` to get an AuthData instance is not recommended now.
docs/code/cloud_storage.py (Outdated)
| images = cloud_client.list_auth_data("data/images")
| labels = cloud_client.list_auth_data("data/labels")
| for index, auth_data in enumerate(images):
Use ``zip``.
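The suggestion, sketched with stand-in lists (real values would come from ``cloud_client.list_auth_data``):

```python
# Original pattern indexed labels by position:
#     for index, auth_data in enumerate(images):
#         label = labels[index]
# zip pairs the two sequences directly and needs no index bookkeeping.
images = ["0001.png", "0002.png"]
labels = ["0001.json", "0002.json"]

pairs = [(image, label) for image, label in zip(images, labels)]
print(pairs)  # [('0001.png', '0001.json'), ('0002.png', '0002.json')]
```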