Add MNIST example for DALI and PyTorch Lightning #2360

JanuszL · 2020-10-14T15:26:49Z

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Why we need this PR?

Pick one, remove the rest

It adds MNIST example for DALI and PyTorch Lightning

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

What solution was applied:
new example of DALI + PyTorch Lightning integration
Affected modules and functionalities:
examples
Key points relevant for the review:
NA
Validation and testing:
CI
Documentation (including examples):
new example is added

JIRA TASK: [DALI-1660]

review-notebook-app · 2020-10-14T15:26:53Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

JanuszL · 2020-10-14T15:27:38Z

!build

dali-automaton · 2020-10-14T15:30:35Z

CI MESSAGE: [1701254]: BUILD STARTED

klecki · 2020-10-14T17:07:10Z

I didn't read the whole thing, but how about bringing the fn api to the PyTorch community?

dali-automaton · 2020-10-14T18:51:53Z

CI MESSAGE: [1701254]: BUILD FAILED

jantonguirao · 2020-10-15T07:08:11Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+    "\n",
+    "This example shows how to use DALI in PyTorch Lightning.\n",
+    "\n",
+    "Let us grab [a toy example](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html) of the clasification network and let us see how DALI can accelerate it.\n",


Suggested change

"Let us grab [a toy example](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html) of the clasification network and let us see how DALI can accelerate it.\n",

"Let us grab [a toy example](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html) showcasing a classification network and see how DALI can accelerate it.\n",

jantonguirao · 2020-10-15T07:10:33Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+    "\n",
+    "Let us grab [a toy example](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html) of the clasification network and let us see how DALI can accelerate it.\n",
+    "\n",
+    "DALI_EXTRA_PATH environment variable should point to the place where data from DALI extra repository is downloaded. Please make sure that the proper release tag is checked out."


Suggested change

"DALI_EXTRA_PATH environment variable should point to the place where data from DALI extra repository is downloaded. Please make sure that the proper release tag is checked out."

"The DALI_EXTRA_PATH environment variable should point to a [DALI extra](https://github.com/NVIDIA/DALI_extra) copy. Please make sure that the proper release tag, the one associated with your DALI version, is checked out."

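As an illustration of the note above, here is a minimal sketch of resolving a data directory from DALI_EXTRA_PATH and failing early when the variable is missing. The helper name `resolve_dali_extra` and the `db/MNIST/training` subdirectory are assumptions made for this example, not something this PR confirms.

```python
import os


def resolve_dali_extra(subdir, env=None):
    """Join `subdir` onto DALI_EXTRA_PATH, raising a clear error if it is unset.

    `resolve_dali_extra` is a hypothetical helper, not part of DALI.
    """
    env = os.environ if env is None else env
    root = env.get("DALI_EXTRA_PATH")
    if not root:
        raise RuntimeError(
            "DALI_EXTRA_PATH is not set; point it at a DALI_extra checkout "
            "whose release tag matches the installed DALI version."
        )
    return os.path.join(root, subdir)


# Example with an explicit mapping instead of the real environment.
# "db/MNIST/training" is an assumed location inside the DALI_extra checkout.
data_path = resolve_dali_extra(
    "db/MNIST/training", env={"DALI_EXTRA_PATH": "/data/DALI_extra"}
)
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.
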
jantonguirao · 2020-10-15T07:11:54Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let us implement the bare training class with the native data loader"


Suggested change

"Now let us implement the bare training class with the native data loader"

"We will start by implementing a training class that uses the native data loader"

I feel there's too much of "let us ..."

Let me fix it...

jantonguirao · 2020-10-15T07:16:50Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let us define a DALI pipeline which would load the data."


The next step is to define a DALI pipeline that will be used for loading and pre-processing data.

jantonguirao · 2020-10-15T07:29:44Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Add DALI to the data preparation step in the training class and adjust the `process_batch` to accept the data returned by DALIClassificationIterator, which returns a list of dictionaries, where each list element corresponds to one pipeline wrapped by the DALIIterator, and entries in the dictionary corresponds to the relevant outputs. Check for details in the DALIGenericIterator documenation."


Now we are ready to modify the training class to use the DALI pipeline we have just defined. Because we want to integrate with PyTorch, we wrap our pipeline with a PyTorch DALI iterator that can replace the native data loader with some minor changes in the code. The DALI iterator returns a list of dictionaries, where each element in the list corresponds to a pipeline instance, and the entries in each dictionary map to the outputs of that pipeline. For more information, check the documentation of DALIGenericIterator.
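
To make the batch layout described above concrete, here is a small sketch that unpacks such a batch. The batch is mocked with plain Python lists so the snippet runs without DALI. The `data` and `label` keys follow the output names used by DALIClassificationIterator, while `unpack_dali_batch` is a hypothetical helper introduced only for illustration.

```python
def unpack_dali_batch(batch, pipeline_index=0):
    """Return (data, label) for one pipeline from a list-of-dicts batch."""
    outputs = batch[pipeline_index]  # one dictionary per wrapped pipeline
    return outputs["data"], outputs["label"]


# Mocked batch: a single pipeline yielding two samples.
fake_batch = [{"data": [[0.0, 0.5], [1.0, 0.25]], "label": [[3], [7]]}]
images, labels = unpack_dali_batch(fake_batch)
```

In the notebook, the equivalent indexing would live inside `process_batch`, operating on tensors rather than lists.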

jantonguirao · 2020-10-15T07:37:34Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+    "        num_shards = self.trainer.world_size\n",
+    "        mnist_pipeline = MnistPipeline(BATCH_SIZE, device='cpu', device_id=device_id, shard_id=shard_id, num_shards=num_shards, num_threads=8)\n",
+    "\n",
+    "        class LightingWrapper(DALIClassificationIterator):\n",


There are several typos: Lighting -> Lightning. Please search and replace.

jantonguirao · 2020-10-15T07:42:38Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let us provide the custom DALI iterator wrapper so we don't have to do any extra processing inside `LitMNIST.process_batch`, also PyTorch can learn how big is the dataset"


Suggested change

"Now let us provide the custom DALI iterator wrapper so we don't have to do any extra processing inside `LitMNIST.process_batch`, also PyTorch can learn how big is the dataset"

"For even better integration, we can provide a custom DALI iterator wrapper so that no extra processing is required inside `LitMNIST.process_batch`. Also, PyTorch can learn the size of the dataset this way."

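A rough sketch of the two benefits mentioned in this suggestion: the wrapper yields a flat `[data, label]` list, and it exposes `__len__` so the training loop can tell how many batches an epoch has. `FlatBatchLoader` is a hypothetical stand-in that wraps any iterable of list-of-dicts batches. It is not the DALI class from the notebook, and the ceiling division in `__len__` (counting a final partial batch) is an assumption.

```python
import math


class FlatBatchLoader:
    """Hypothetical stand-in for a DALI iterator subclass."""

    def __init__(self, batches, dataset_size, batch_size,
                 output_map=("data", "label")):
        self._batches = batches
        self._dataset_size = dataset_size
        self._batch_size = batch_size
        self._output_map = output_map

    def __len__(self):
        # Batches per epoch, counting a final partial batch (an assumption).
        return math.ceil(self._dataset_size / self._batch_size)

    def __iter__(self):
        for batch in self._batches:
            outputs = batch[0]  # a single wrapped pipeline
            # Flatten the dictionary into a list ordered by output name.
            yield [outputs[name] for name in self._output_map]


batches = [
    [{"data": [0.1, 0.2], "label": [1, 0]}],
    [{"data": [0.3], "label": [1]}],
]
loader = FlatBatchLoader(batches, dataset_size=3, batch_size=2)
flat = [(x, y) for x, y in loader]
```

With this shape, the training step can unpack `x, y = batch` directly, exactly as it would with the native data loader.
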
jantonguirao · 2020-10-15T07:43:16Z

qa/TL1_jupyter_plugins/test_pytorch.sh

@@ -2,7 +2,7 @@

 # used pip packages
 # TODO(janton): remove explicit pillow version installation when torch fixes the issue with PILLOW_VERSION not being defined


Just wondering, should we try to remove the version pin on pillow?

jantonguirao · 2020-10-15T07:43:49Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, let us rerun the training:"


Suggested change

"Now, let us rerun the training:"

"Let us run the training one more time:"

jantonguirao · 2020-10-15T07:44:04Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+    "            def __next__(self):\n",
+    "                out = super().__next__()\n",
+    "                # DALIClassificationIterator calls next during the construction\n",
+    "                # so first brach would be already converted to a list not a dict, no need to post process it\n",


Suggested change

" # so first brach would be already converted to a list not a dict, no need to post process it\n",

" # so first batch would be already converted to a list not a dict, no need to post process it\n",

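The code comment discussed here describes a subtle ordering issue: the base iterator fetches one batch during construction, and that call dispatches to the overridden `__next__`, so the cached first batch has already been post-processed. The toy classes below reproduce the situation without DALI. `PrefetchingIterator` and `ListOutputIterator` are hypothetical names, and the real DALI internals may differ in detail.

```python
class PrefetchingIterator:
    """Toy base class: fetches one batch in __init__ and caches it."""

    def __init__(self, samples):
        self._source = iter(samples)
        self._first_batch = None
        # Dispatches to the subclass override, so the cached batch
        # has already gone through the subclass post-processing.
        self._first_batch = self.__next__()

    def __iter__(self):
        return self

    def __next__(self):
        if self._first_batch is not None:
            batch, self._first_batch = self._first_batch, None
            return batch
        return [next(self._source)]  # list with one dict per "pipeline"


class ListOutputIterator(PrefetchingIterator):
    def __next__(self):
        out = super().__next__()
        # Only convert batches that still have the list-of-dicts layout;
        # the cached first batch was converted during construction.
        if isinstance(out[0], dict):
            out = out[0]
            return [out["data"], out["label"]]
        return out


samples = [
    {"data": [1, 2], "label": [0]},
    {"data": [3, 4], "label": [1]},
]
result = list(ListOutputIterator(samples))
```

Without the `isinstance` guard, the first batch would be indexed a second time as if it were still a list of dictionaries.
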
JanuszL · 2020-10-15T09:32:35Z

!build

dali-automaton · 2020-10-15T09:35:36Z

CI MESSAGE: [1703974]: BUILD STARTED

jantonguirao · 2020-10-15T09:42:02Z

docs/examples/frameworks/pytorch/pytorch-lightning.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let us run the training one more timemodel = BetterDALILitMNIST()"


Suggested change

"Let us run the training one more timemodel = BetterDALILitMNIST()"

"Let us run the training one more time"

Missed that. Done

dali-automaton · 2020-10-15T20:30:45Z

CI MESSAGE: [1703974]: BUILD PASSED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

JanuszL · 2020-10-16T13:42:05Z

!build

dali-automaton · 2020-10-16T14:25:31Z

CI MESSAGE: [1708056]: BUILD STARTED

dali-automaton · 2020-10-16T18:00:25Z

CI MESSAGE: [1708056]: BUILD PASSED

jantonguirao reviewed Oct 15, 2020

jantonguirao approved these changes Oct 15, 2020

JanuszL added 4 commits October 16, 2020 14:51

Add MNIST example for DALI and PyTorch Lightning

27b707a

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Code review fixes

2d969a8

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

More code review fixes

be2c8ad

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Add __len__ method to the base iterator class

b31e9bc

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

JanuszL force-pushed the pytorch_lighting branch from 62bf0c0 to b31e9bc Compare October 16, 2020 13:38

awolant approved these changes Oct 16, 2020

JanuszL merged commit 9098679 into NVIDIA:master Oct 16, 2020

JanuszL deleted the pytorch_lighting branch October 16, 2020 19:16
