Dev flow.utils.data part3 #5644

Merged
merged 98 commits into from
Aug 13, 2021
Commits
ef446dc
add more datasets
Flowingsun007 Jul 28, 2021
3499641
add more transform funcs
Flowingsun007 Jul 28, 2021
08b262d
export interface
Flowingsun007 Jul 28, 2021
b2041fb
Merge branch 'master' of https://github.com.cnpmjs.org/Oneflow-Inc/on…
Flowingsun007 Jul 28, 2021
8048814
Merge branch 'dev_flow.utils.data-part3'
Flowingsun007 Jul 28, 2021
d88bbad
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Jul 30, 2021
fc08cb5
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 2, 2021
2749006
Merge branch 'dev_flow.utils.data-part3' of https://github.com/Oneflo…
Flowingsun007 Aug 2, 2021
099c9d8
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 2, 2021
6048943
export datasets interface
Flowingsun007 Aug 2, 2021
e4c3864
merge master
Flowingsun007 Aug 2, 2021
2f5eabd
auto format by CI
oneflow-ci-bot Aug 2, 2021
b67c286
fix docs
Flowingsun007 Aug 2, 2021
2cc9b94
Merge branch 'fix_datasets_export' of https://github.com/Oneflow-Inc/…
Flowingsun007 Aug 2, 2021
7d50e8f
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 2, 2021
5c9f2cf
Merge branch 'fix_datasets_export' into dev_flow.utils.data-part3
Flowingsun007 Aug 2, 2021
6084d8b
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 3, 2021
e2d0843
skip test
Flowingsun007 Aug 3, 2021
26fd2cc
support DistributedSampler
Flowingsun007 Aug 3, 2021
272b97e
refine
Flowingsun007 Aug 3, 2021
561687d
add more transform function
Flowingsun007 Aug 3, 2021
136396e
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 3, 2021
68bffa8
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 3, 2021
a2e6951
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 4, 2021
c79f928
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 4, 2021
c705f48
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 4, 2021
0f48f6b
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 4, 2021
123bbdb
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 5, 2021
77fba7f
fix err import
Flowingsun007 Aug 5, 2021
27faabd
fix comment
Flowingsun007 Aug 5, 2021
9ac8d98
refine
Flowingsun007 Aug 5, 2021
06048c4
add more transform test
Flowingsun007 Aug 5, 2021
f4bf241
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 5, 2021
0cbefa3
refactor dataloader test
Flowingsun007 Aug 5, 2021
272df42
Merge branch 'dev_flow.utils.data-part3' of https://github.com/Oneflo…
Flowingsun007 Aug 5, 2021
305e399
refine
Flowingsun007 Aug 5, 2021
e2a935b
add ddp test
Flowingsun007 Aug 5, 2021
175b77d
refine
Flowingsun007 Aug 6, 2021
50eee85
refine
Flowingsun007 Aug 6, 2021
b43b5ce
add ddp test case
Flowingsun007 Aug 6, 2021
7f65a68
skil test
Flowingsun007 Aug 6, 2021
89ea598
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 8, 2021
bf04b24
add ddp test case
Flowingsun007 Aug 9, 2021
8dc0740
Merge branch 'dev_flow.utils.data-part3' of https://github.com/Oneflo…
Flowingsun007 Aug 9, 2021
4699460
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 10, 2021
94a9688
add test case
Flowingsun007 Aug 10, 2021
ecf6a22
refine
Flowingsun007 Aug 10, 2021
4085749
rm ddp test
Flowingsun007 Aug 11, 2021
df926ef
remove ddp test
Flowingsun007 Aug 11, 2021
1a6fc30
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 11, 2021
03e3052
auto format by CI
oneflow-ci-bot Aug 11, 2021
8bcf3aa
format
Flowingsun007 Aug 11, 2021
afc2d56
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 11, 2021
ec8a35b
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 11, 2021
8e847e0
update api docs
Flowingsun007 Aug 11, 2021
27861a5
Merge branch 'dev_flow.utils.data-part3' of https://github.com/Oneflo…
Flowingsun007 Aug 11, 2021
86edc24
add utils.rst
Flowingsun007 Aug 11, 2021
3a469e8
auto format by CI
oneflow-ci-bot Aug 11, 2021
21f05a1
fix ddp grad size
daquexian Aug 11, 2021
55183e9
Merge remote-tracking branch 'origin/master' into fix_ddp_grad_size
daquexian Aug 11, 2021
3b5cf1a
remove print
daquexian Aug 11, 2021
9218800
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 11, 2021
38fe352
refine as comments
Flowingsun007 Aug 11, 2021
df07a10
Merge branch 'dev_flow.utils.data-part3' of https://github.com/Oneflo…
Flowingsun007 Aug 11, 2021
c25a097
refine
Flowingsun007 Aug 11, 2021
e19ea1b
auto format by CI
oneflow-ci-bot Aug 11, 2021
84ec6c7
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 11, 2021
11f49aa
Merge branch 'master' into fix_ddp_grad_size
oneflow-ci-bot Aug 11, 2021
cbcc038
auto format by CI
oneflow-ci-bot Aug 11, 2021
aa4fcda
Merge remote-tracking branch 'origin/fix_ddp_grad_size' into dev_flow…
Flowingsun007 Aug 11, 2021
1d95021
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 11, 2021
6294526
refine
Flowingsun007 Aug 11, 2021
e8724b5
add ddp test
Flowingsun007 Aug 11, 2021
02a7c7e
Merge branch 'dev_flow.utils.data-part3' of https://github.com/Oneflo…
Flowingsun007 Aug 11, 2021
50ab414
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 11, 2021
57506ef
auto format by CI
oneflow-ci-bot Aug 11, 2021
1c6aa4d
rm test case
Flowingsun007 Aug 12, 2021
546868a
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
7cbd439
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
423cf08
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
9492712
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
a435c97
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
7d4fc04
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
91a31a9
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
a0cf4ae
Merge branch 'master' into dev_flow.utils.data-part3
oneflow-ci-bot Aug 12, 2021
089f0cf
Merge branch 'master' into dev_flow.utils.data-part3
oneflow-ci-bot Aug 12, 2021
bd566d1
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 12, 2021
efa9526
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 13, 2021
65fbd06
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 13, 2021
2dceb82
Merge branch 'master' into dev_flow.utils.data-part3
oneflow-ci-bot Aug 13, 2021
0449fb2
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 13, 2021
52900ed
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 13, 2021
343d5e4
Merge branch 'master' into dev_flow.utils.data-part3
Flowingsun007 Aug 13, 2021
477c1e8
fix reshape
Flowingsun007 Aug 13, 2021
54f3497
Merge branch 'master' into dev_flow.utils.data-part3
oneflow-ci-bot Aug 13, 2021
c02c419
Merge branch 'master' into dev_flow.utils.data-part3
oneflow-ci-bot Aug 13, 2021
3fcf059
Merge branch 'master' into dev_flow.utils.data-part3
oneflow-ci-bot Aug 13, 2021
32d2d3d
Merge branch 'master' into dev_flow.utils.data-part3
oneflow-ci-bot Aug 13, 2021
2 changes: 2 additions & 0 deletions docs/source/index.rst
@@ -20,6 +20,8 @@ OneFlow API Reference
linalg
image
optim
utils



Indices and tables
62 changes: 62 additions & 0 deletions docs/source/utils.rst
@@ -0,0 +1,62 @@
oneflow.utils
===================================
Utils
----------------------------------
.. currentmodule:: oneflow.utils
.. automodule:: oneflow.utils.data
    :members: DataLoader,
            Dataset,
            IterableDataset,
            TensorDataset,
            ConcatDataset,
            Subset,
            random_split,
            Sampler,
            SequentialSampler,
            RandomSampler,
            SubsetRandomSampler,
            BatchSampler

.. currentmodule:: oneflow.utils
.. automodule:: oneflow.utils.data.distributed
    :members: DistributedSampler

.. currentmodule:: oneflow.utils
.. automodule:: oneflow.utils.vision.datasets
    :members: MNIST,
            FashionMNIST,
            CIFAR10,
            CIFAR100,
            ImageNet,
            CocoCaptions,
            CocoDetection,
            VOCDetection,
            VOCSegmentation,
            DatasetFolder,
            ImageFolder

.. currentmodule:: oneflow.utils
.. automodule:: oneflow.utils.vision.transforms
    :members: Compose,
            ToTensor,
            PILToTensor,
            ConvertImageDtype,
            ToPILImage,
            Normalize,
            Resize,
            Scale,
            CenterCrop,
            Pad,
            Lambda,
            RandomTransforms,
            RandomApply,
            RandomOrder,
            RandomChoice,
            RandomCrop,
            RandomHorizontalFlip,
            RandomVerticalFlip,
            RandomResizedCrop,
            RandomSizedCrop,
            FiveCrop,
            TenCrop,
            InterpolationMode
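
The docs entry above lists `DistributedSampler` without saying what such a sampler does. As a rough illustration only, the sketch below shows the index-sharding scheme used by PyTorch's sampler of the same name (pad the index list to a multiple of the world size, then give each rank every `num_replicas`-th index). It assumes OneFlow follows the same semantics, which this page does not confirm; `shard_indices` is a hypothetical name, not an OneFlow API, and shuffling is left out.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Return the dataset indices one rank would iterate over (no shuffling).

    Hypothetical helper for illustration; not part of oneflow.utils.data.
    """
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    # Every rank gets the same number of samples, rounding up.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    # Pad by wrapping around to the start of the dataset.
    indices = [i % dataset_len for i in range(total_size)]
    # Strided slice: rank r takes positions r, r + num_replicas, ...
    return indices[rank:total_size:num_replicas]


if __name__ == "__main__":
    for rank in range(4):
        print(rank, shard_indices(10, 4, rank))
```

With 10 samples and 4 ranks, each rank receives 3 indices, so two samples are visited twice per epoch; that padding is the price of keeping all ranks in step.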
148 changes: 148 additions & 0 deletions python/oneflow/test/dataloader/data_utils.py
@@ -0,0 +1,148 @@
"""
Copyright 2020 The OneFlow Authors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
import os
import oneflow as flow
import oneflow.utils.vision.transforms as transforms


def load_data_cifar10(
    batch_size,
    data_dir="./data-test/cifar10",
    download=True,
    transform=None,
    source_url=None,
    num_workers=0,
):
    cifar10_train = flow.utils.vision.datasets.CIFAR10(
        root=data_dir,
        train=True,
        download=download,
        transform=transform,
        source_url=source_url,
    )
    cifar10_test = flow.utils.vision.datasets.CIFAR10(
        root=data_dir,
        train=False,
        download=download,
        transform=transform,
        source_url=source_url,
    )

    train_iter = flow.utils.data.DataLoader(
        cifar10_train, batch_size=batch_size, shuffle=True, num_workers=num_workers
    )
    test_iter = flow.utils.data.DataLoader(
        cifar10_test, batch_size=batch_size, shuffle=False, num_workers=num_workers
    )
    return train_iter, test_iter


def load_data_mnist(
Contributor

Wrapper functions like this, which add an extra layer, seem unnecessary. PyTorch doesn't have them. If we add such wrappers, the cost of educating users is high.

Contributor Author

Right. This is only used for testing inside the test cases, so that each test case doesn't have to repeat the same data-loading code.

Contributor

This file lives under the test directory; as data_utils, that's fine.

    batch_size, resize=None, root="./data/mnist", download=True, source_url=None
):
    """Download the MNIST dataset and then load into memory."""
    root = os.path.expanduser(root)
    transformer = []
    if resize:
        transformer += [transforms.Resize(resize)]
    transformer += [transforms.ToTensor()]
    transformer = transforms.Compose(transformer)

    mnist_train = flow.utils.vision.datasets.MNIST(
        root=root,
        train=True,
        transform=transformer,
        download=download,
        source_url=source_url,
    )
    mnist_test = flow.utils.vision.datasets.MNIST(
        root=root,
        train=False,
        transform=transformer,
        download=download,
        source_url=source_url,
    )
    train_iter = flow.utils.data.DataLoader(mnist_train, batch_size, shuffle=True)
    test_iter = flow.utils.data.DataLoader(mnist_test, batch_size, shuffle=False)
    return train_iter, test_iter
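
The helper above builds a list of transforms (an optional `Resize`, then `ToTensor`) and hands it to `Compose`. The plain-Python sketch below illustrates that pattern, assuming `Compose` applies its steps left to right as torchvision's does. `compose` and `build_steps` are hypothetical stand-ins that work on strings so the ordering is visible; they are not the OneFlow classes.

```python
def compose(steps):
    """Chain callables so the output of each step feeds the next one."""
    steps = list(steps)

    def apply(x):
        for step in steps:
            x = step(x)
        return x

    return apply


def build_steps(resize=None):
    """Mirror the helper's logic: optional resize first, tensor conversion last."""
    steps = []
    if resize:
        # Stand-in for transforms.Resize(resize)
        steps.append(lambda img: f"resize({img}, {resize})")
    # Stand-in for transforms.ToTensor()
    steps.append(lambda img: f"to_tensor({img})")
    return steps


if __name__ == "__main__":
    print(compose(build_steps(resize=32))("img"))
    print(compose(build_steps())("img"))
```

Because the resize step is appended before the tensor conversion, it always runs on the raw image; passing `resize=None` simply leaves the pipeline one step shorter.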


def get_fashion_mnist_dataset(
Contributor

Same as above.

    resize=None, root="./data-test/fashion-mnist", download=True, source_url=None,
):
    root = os.path.expanduser(root)
    trans = []
    if resize:
        trans.append(transforms.Resize(resize))
    trans.append(transforms.ToTensor())
    transform = transforms.Compose(trans)

    mnist_train = flow.utils.vision.datasets.FashionMNIST(
        root=root,
        train=True,
        transform=transform,
        download=download,
        source_url=source_url,
    )
    mnist_test = flow.utils.vision.datasets.FashionMNIST(
        root=root,
        train=False,
        transform=transform,
        download=download,
        source_url=source_url,
    )
    return mnist_train, mnist_test


# reference: http://tangshusen.me/Dive-into-DL-PyTorch/#/chapter03_DL-basics/3.10_mlp-pytorch
def load_data_fashion_mnist(
    batch_size,
    resize=None,
    root="./data-test/fashion-mnist",
    download=True,
    source_url=None,
    num_workers=0,
):
    """Download the Fashion-MNIST dataset and then load into memory."""
    root = os.path.expanduser(root)
    trans = []
    if resize:
        trans.append(transforms.Resize(resize))
    trans.append(transforms.ToTensor())
    transform = transforms.Compose(trans)

    mnist_train = flow.utils.vision.datasets.FashionMNIST(
        root=root,
        train=True,
        transform=transform,
        download=download,
        source_url=source_url,
    )
    mnist_test = flow.utils.vision.datasets.FashionMNIST(
        root=root,
        train=False,
        transform=transform,
        download=download,
        source_url=source_url,
    )

    train_iter = flow.utils.data.DataLoader(
        mnist_train, batch_size, shuffle=True, num_workers=num_workers
    )
    test_iter = flow.utils.data.DataLoader(
        mnist_test, batch_size, shuffle=False, num_workers=num_workers
    )
    return train_iter, test_iter
20 changes: 6 additions & 14 deletions python/oneflow/test/dataloader/test_cifar_dataset.py
@@ -20,6 +20,7 @@
import oneflow as flow
import oneflow.nn as nn
import oneflow.optim as optim
+from data_utils import load_data_cifar10


classes = (
@@ -81,21 +82,19 @@ def test(test_case):
os.getenv("ONEFLOW_TEST_CACHE_DIR", "./data-test"), "cifar10"
)

-    trainset = flow.utils.vision.datasets.CIFAR10(
-        root=data_dir,
-        train=True,
+    train_iter, test_iter = load_data_cifar10(
+        batch_size=batch_size,
+        data_dir=data_dir,
         download=True,
         transform=transform,
         source_url="https://oneflow-public.oss-cn-beijing.aliyuncs.com/datasets/cifar/cifar-10-python.tar.gz",
-    )
-    trainloader = flow.utils.data.DataLoader(
-        trainset, batch_size=batch_size, shuffle=False, num_workers=0
+        num_workers=0,
     )

final_loss = 0
for epoch in range(1, train_epoch + 1): # loop over the dataset multiple times
running_loss = 0.0
-        for i, data in enumerate(trainloader, 1):
+        for i, data in enumerate(train_iter, 1):
# get the inputs; data is a list of [inputs, labels]
inputs, labels = data
inputs = inputs.to(dtype=flow.float32, device=device)
@@ -130,10 +129,3 @@ def test_cifar_dataset(test_case):

if __name__ == "__main__":
unittest.main()
-# 1 epoch training log
-# epoch: 1 step: 2000 loss: 2.107
-# epoch: 1 step: 4000 loss: 1.838
-# epoch: 1 step: 6000 loss: 1.644
-# epoch: 1 step: 8000 loss: 1.535
-# epoch: 1 step: 10000 loss: 1.528
-# epoch: 1 step: 12000 loss: 1.476
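
The CIFAR-10 test derives its dataset directory from the `ONEFLOW_TEST_CACHE_DIR` environment variable, falling back to `./data-test`. A minimal sketch of that lookup as a reusable function follows. `resolve_data_dir` is a hypothetical name that does not appear in the PR, and the `env` parameter is an assumption added here so the behaviour can be checked without touching the real environment.

```python
import os


def resolve_data_dir(dataset_name, env=None, default_root="./data-test"):
    """Join the test cache root (from the environment, if set) with a dataset name.

    Hypothetical helper for illustration; the PR inlines this logic in each test.
    """
    env = os.environ if env is None else env
    root = env.get("ONEFLOW_TEST_CACHE_DIR", default_root)
    return os.path.join(root, dataset_name)


if __name__ == "__main__":
    # Uses the real environment; prints the directory the tests would download into.
    print(resolve_data_dir("cifar10"))
```

Pointing the variable at a shared directory on a CI machine lets repeated runs reuse already-downloaded archives instead of fetching them again.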
42 changes: 2 additions & 40 deletions python/oneflow/test/dataloader/test_fashion_mnist_dataset.py
@@ -20,42 +20,7 @@
import oneflow.unittest
import oneflow as flow
import oneflow.nn as nn


-# reference: http://tangshusen.me/Dive-into-DL-PyTorch/#/chapter03_DL-basics/3.10_mlp-pytorch
-def load_data_fashion_mnist(
-    batch_size, resize=None, root="./data/fashion-mnist", download=True, source_url=None
-):
-    """Download the Fashion-MNIST dataset and then load into memory."""
-    root = os.path.expanduser(root)
-    transformer = []
-    if resize:
-        transformer += [flow.utils.vision.transforms.Resize(resize)]
-    transformer += [flow.utils.vision.transforms.ToTensor()]
-    transformer = flow.utils.vision.transforms.Compose(transformer)
-
-    mnist_train = flow.utils.vision.datasets.FashionMNIST(
-        root=root,
-        train=True,
-        transform=transformer,
-        download=download,
-        source_url=source_url,
-    )
-    mnist_test = flow.utils.vision.datasets.FashionMNIST(
-        root=root,
-        train=False,
-        transform=transformer,
-        download=download,
-        source_url=source_url,
-    )
-    num_workers = 0
-    train_iter = flow.utils.data.DataLoader(
-        mnist_train, batch_size, shuffle=True, num_workers=num_workers
-    )
-    test_iter = flow.utils.data.DataLoader(
-        mnist_test, batch_size, shuffle=False, num_workers=num_workers
-    )
-    return train_iter, test_iter
+from data_utils import load_data_fashion_mnist


def get_fashion_mnist_labels(labels):
@@ -124,7 +89,7 @@ def test(test_case):
)
source_url = "https://oneflow-public.oss-cn-beijing.aliyuncs.com/datasets/mnist/Fashion-MNIST/"
train_iter, test_iter = load_data_fashion_mnist(
-        batch_size, root=data_dir, download=True, source_url=source_url
+        batch_size, resize=None, root=data_dir, download=True, source_url=source_url
)
loss = nn.CrossEntropyLoss()
loss.to(device)
@@ -174,6 +139,3 @@ def test_fashion_mnist_dataset(test_case):

if __name__ == "__main__":
unittest.main()
-# 1 epoch training log
-# epoch 1, loss 0.0034, train acc 0.718, test acc 0.771, cost >>>>>>> 158.32699990272522(s)
-# epoch 2, loss 0.0022, train acc 0.807, test acc 0.726, cost >>>>>>> 159.64465260505676(s)
46 changes: 1 addition & 45 deletions python/oneflow/test/dataloader/test_lenet.py
@@ -20,6 +20,7 @@
import oneflow as flow
import oneflow.nn as nn
import oneflow.unittest
+from data_utils import load_data_fashion_mnist


# reference: http://tangshusen.me/Dive-into-DL-PyTorch/#/chapter05_CNN/5.5_lenet
@@ -49,46 +50,6 @@ def forward(self, img):
return output


-def load_data_fashion_mnist(
-    batch_size,
-    resize=None,
-    root="./data-test/fashion-mnist",
-    download=True,
-    source_url=None,
-    num_workers=0,
-):
-    """Download the Fashion-MNIST dataset and then load into memory."""
-    root = os.path.expanduser(root)
-    trans = []
-    if resize:
-        trans.append(flow.utils.vision.transforms.Resize(resize))
-    trans.append(flow.utils.vision.transforms.ToTensor())
-    transform = flow.utils.vision.transforms.Compose(trans)
-
-    mnist_train = flow.utils.vision.datasets.FashionMNIST(
-        root=root,
-        train=True,
-        transform=transform,
-        download=download,
-        source_url=source_url,
-    )
-    mnist_test = flow.utils.vision.datasets.FashionMNIST(
-        root=root,
-        train=False,
-        transform=transform,
-        download=download,
-        source_url=source_url,
-    )
-
-    train_iter = flow.utils.data.DataLoader(
-        mnist_train, batch_size, shuffle=True, num_workers=num_workers
-    )
-    test_iter = flow.utils.data.DataLoader(
-        mnist_test, batch_size, shuffle=False, num_workers=num_workers
-    )
-    return train_iter, test_iter


def evaluate_accuracy(data_iter, net, device=None):
if device is None and isinstance(net, nn.Module):
device = list(net.parameters())[0].device
@@ -176,8 +137,3 @@ def test_lenet(test_case):

if __name__ == "__main__":
unittest.main()
-# 1 epoch training log
-# epoch 1, loss 1.1473, train acc 0.569, test acc 0.742, time 162.4 sec
-# epoch 2, loss 0.5736, train acc 0.784, test acc 0.796, time 158.1 sec
-# epoch 3, loss 0.4761, train acc 0.826, test acc 0.821, time 154.0 sec
-# epoch 4, loss 0.4215, train acc 0.848, test acc 0.855, time 160.3 sec