Refactor custom ops classes to use python_op_factory as base class #5338
klecki merged 8 commits into NVIDIA:main from
Conversation
!build

CI MESSAGE: [13125785]: BUILD STARTED
bb65224 to 3b93984
!build

CI MESSAGE: [13127551]: BUILD STARTED

CI MESSAGE: [13125785]: BUILD FAILED

CI MESSAGE: [13127551]: BUILD FAILED
!build

CI MESSAGE: [13145398]: BUILD STARTED

!build

CI MESSAGE: [13151269]: BUILD STARTED

CI MESSAGE: [13151269]: BUILD PASSED
def __init__(self, function, num_outputs=1, **kwargs):

Review comment on:

    # The layout need to be handled manually due to implementation detail

Suggested change:

    # The layouts need to be handled manually due to implementation details

or

    # The layouts need to be handled manually due to an implementation detail
If it is not provided, the class is assumed to be `_generated=True`; otherwise, we mark it as False - the user will provide a custom wrapper.

Making a (formerly) well-defined property a side effect of some other condition makes the code hard to read and understand. I'd rather have an explicit argument, or override this attribute manually for those few operators that need it to be False.
    Operator._internal_schema_name = internal_schema_name
    # The class was generated using python_op_factory, and we don't expect a custom wrapper.
    # If needed, allow this tag to be overridden by an argument to this function.
    Operator._generated = internal_schema_name is None
- What I've written at the declaration.
- Now that this attribute is always there (?), we should probably get rid of those getattr(op, "_generated", None) calls.
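The explicit-argument alternative discussed here could be sketched as below. This is a hypothetical illustration of the reviewer's suggestion, not the actual DALI python_op_factory signature:

```python
def python_op_factory_sketch(schema_name, internal_schema_name=None,
                             generated=None):
    """Hypothetical sketch: make _generated an explicit, overridable
    argument instead of a pure side effect of internal_schema_name."""
    class Operator:
        _schema_name = schema_name
        _internal_schema_name = internal_schema_name
        # An explicit override wins; otherwise fall back to the current
        # behavior of deriving the flag from internal_schema_name.
        _generated = ((internal_schema_name is None)
                      if generated is None else generated)

    Operator.__name__ = schema_name
    return Operator
```

With this shape, the few operators that need `_generated = False` despite having no internal schema can simply pass `generated=False` instead of patching the attribute after class creation.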
This still leaves the External Source without the use of python_op_factory. Maybe I can look at it as a follow-up, but the ExternalSourceGroup captures the OperatorInstances directly, so this whole abstraction is not usable there at this point in time.
!build

CI MESSAGE: [13264883]: BUILD STARTED
As _TFRecordReaderImpl is used twice with different schema parametrization, it is now generated in a function, inheriting from the same base class as all other ops. This reduces the custom code to a minimum, allowing the base class to handle all properties and some common arguments. The argument handling is moved to the base class by updating the kwargs before invoking the base class.

When a tfrec.Feature is encountered in the Python argument serialization layer, it is again processed by the tfrec.Feature constructor (as we do for other types, normalizing numbers with int() or float()). For this purpose a copy constructor is added and exposed in Pybind. An alternative would be to change the conversion to an identity function - that would keep the error in spec.AddArg rather than moving it to constructor parameter matching.

Validation code is added to the operator to check for a matching type with a better error message.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Refactor PythonFunctionBase class into a base class generator. Adjust TFRecord to use internal schema correctly. Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
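The "generate the class in a function" pattern described in the commits above can be sketched roughly like this. All names are illustrative stand-ins, not the actual DALI internals:

```python
class _OperatorBase:
    """Hypothetical stand-in for the common python_op_factory base;
    it handles argument storage and shared properties for every
    generated operator class."""

    def __init__(self, **kwargs):
        self._init_args = kwargs

    @property
    def schema_name(self):
        return self._schema_name


def _tfrecord_impl_factory(schema_name, internal_schema_name):
    """Sketch of generating a _TFRecordReaderImpl-like class in a
    function, so the same implementation can be reused with different
    schema parametrizations."""
    class _TFRecordReaderImpl(_OperatorBase):
        _schema_name = schema_name
        _internal_schema_name = internal_schema_name

        def __init__(self, path=None, index_path=None, features=None,
                     **kwargs):
            # Fold the custom arguments into kwargs so the base class
            # performs all of the argument handling.
            kwargs.update(path=path, index_path=index_path,
                          features=features)
            super().__init__(**kwargs)

    return _TFRecordReaderImpl


# The same implementation used twice with different schema parametrization
# (hypothetical schema names):
TFRecord = _tfrecord_impl_factory("TFRecordReader", "_TFRecordReaderImpl")
LegacyTFRecord = _tfrecord_impl_factory("readers__TFRecord",
                                        "_TFRecordReaderImpl")
```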
cf67060 to bad8044
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build

CI MESSAGE: [13268279]: BUILD STARTED

CI MESSAGE: [13264883]: BUILD PASSED

CI MESSAGE: [13268279]: BUILD PASSED
Category: Refactoring (Breaking change)
Description:
python_op_factory was extended and documented. There is a new, optional internal_schema_name parameter that is used to retrieve the schema and spec for argument handling on the Python side, while allowing the original schema to be used for exposing the documentation.

As both TFRecordReader and PythonFunction have several variants with different base implementation schemas, the base classes are converted into a class-generator function. The new classes are marked as not generated, as the type hints are done manually for them. This reduces the custom code to a minimum, allowing the base class to handle all properties and some common arguments.

The argument handling is moved to the base class by updating the kwargs before invoking the base class.

When a tfrec.Feature is encountered in the Python argument serialization layer, it is again processed by the tfrec.Feature constructor (as we do for other types, normalizing numbers with int() or float()). For this purpose a copy constructor is added and exposed in Pybind. An alternative would be to change the conversion to an identity function - that would keep the error in spec.AddArg rather than moving it to constructor parameter matching.

Validation code is added to the operator to check for a matching type with a better error message.
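The "normalize by re-invoking the constructor" idea (the same trick as int() or float() for numbers) can be illustrated with a plain-Python stand-in. The Feature class below is a hypothetical sketch, not the Pybind-exported tfrec.Feature:

```python
class Feature:
    """Hypothetical stand-in for tfrec.Feature with a copy constructor."""

    def __init__(self, value):
        if isinstance(value, Feature):
            # Copy constructor: re-processing an existing Feature yields
            # an equivalent object, mirroring int(x) or float(x).
            self._value = value._value
        elif isinstance(value, (int, float, str)):
            self._value = value
        else:
            # Validation happens at construction, which gives a clearer
            # error than a later spec.AddArg type mismatch would.
            raise TypeError(
                f"Unsupported feature value: {type(value).__name__}")


def normalize_arg(arg):
    """Sketch of the serialization layer normalizing every argument
    through its own constructor, whatever the input already is."""
    if isinstance(arg, Feature):
        return Feature(arg)
    if isinstance(arg, bool):
        return arg
    if isinstance(arg, int):
        return int(arg)
    if isinstance(arg, float):
        return float(arg)
    return arg
```

The identity-function alternative mentioned above would replace `Feature(arg)` with just `arg`, keeping any type error at the point where the argument is added to the spec instead of at construction.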
NumbaFunction was also reworked, inheriting directly from the python_op_factory.
Calls to _raise_no_current_pipeline were eliminated - this function doesn't exist!

Breaking change: there is no longer a PythonFunctionBase class in nvidia.dali.ops.

Additional information:
Affected modules and functionalities:
TFRecord, Python Function and Numba Function operators in Python
Key points relevant for the review:
Does it have the potential to break something?
Does the documentation render correctly?
What would be a cleaner way to have a schema and an internal schema? Otherwise, we would need to rewrite the operators to not use two schemas - one for presentation and one for implementation.

Should we keep the implementation detail of PythonFunctionBase alive?

As a follow-up, code that prohibits MIS would be a nice generalization for those custom implementations.
Tests:
All TFRecord, Python function and Numba Function tests must pass
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A