Remove CUDA headers and generate stubs in runtime #2420

Merged

JanuszL merged 11 commits into NVIDIA:master from move_to_stubs on Nov 10, 2020

Conversation

JanuszL
Contributor

@JanuszL JanuszL commented Oct 30, 2020

  • removes files that should not be in the repo for licensing reasons
  • adds a step that generates dynlink_cuda during the build, based on the available headers

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why do we need this PR?

  • It removes CUDA headers and generates stubs at runtime

What happened in this PR?


  • What solution was applied:
    removes files that should not be in the repo for licensing reasons
    adds a step that generates dynlink_cuda during the build, based on the available headers
  • Affected modules and functionalities:
    build system
    dynlink cuda
  • Key points relevant for the review:
    NA
  • Validation and testing:
    CI
  • Documentation (including examples):
    NA

JIRA TASK: [DALI-1610]

@dali-automaton
Collaborator

CI MESSAGE: [1749684]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1749684]: BUILD FAILED

@lgtm-com
Contributor

lgtm-com bot commented Nov 2, 2020

This pull request introduces 1 alert when merging 8ed0dabe24242d23b05c8d103e0b28f9eb4c7161 into 359a6a5 - view on LGTM.com

new alerts:

  • 1 for Unused import

@JanuszL JanuszL force-pushed the move_to_stubs branch 2 times, most recently from dc49beb to 77aa257 on November 2, 2020 at 19:38
@lgtm-com
Contributor

lgtm-com bot commented Nov 2, 2020

This pull request introduces 1 alert when merging 77aa257045fa8ad2eeb8d5c5eeab6da887cf011d into 359a6a5 - view on LGTM.com

new alerts:

  • 1 for Unused import

@dali-automaton
Collaborator

CI MESSAGE: [1755428]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1755428]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [1757622]: BUILD STARTED

raise Exception(str(diag))

args.output.write('#include <string>\n')
args.output.write('#include <cuda.h>\n')
Contributor

Do we need that for other headers (nvcuvid, optical flow, etc.)?

Contributor Author

All of them use types defined in cuda.h.
<string> is used by the symbol loader function.

Contributor

My point is: we must include the relevant header anyway (e.g. to get the types) and it should include cuda.h if it's necessary, so this inclusion is redundant. Meanwhile, if we use this tool for something else entirely, then keeping cuda.h may be a problem.

Contributor Author

Moving it to a JSON file.
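
A minimal sketch of what such a JSON-driven include list could look like, assuming a hypothetical "headers" field in the config and an illustrative write_includes helper (not the PR's actual schema):

import json

def write_includes(config_path, output):
    # Hypothetical schema: {"headers": ["cuda.h"], "functions": {...}}.
    # Emits one #include per listed header instead of hardcoding cuda.h.
    with open(config_path, 'r') as f:
        config = json.load(f)
    output.write('#include <string>\n')  # used by the symbol loader signature
    for header in config.get('headers', []):
        output.write('#include <%s>\n' % header)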

args.output.write(prolog.format(args.unique_prefix))

for cursor in translation_unit.cursor.get_children():
if cursor.kind != clang.cindex.CursorKind.FUNCTION_DECL:
Contributor

Can we check if there's a definition of that function in this translation unit?

Contributor Author

Done

Contributor Author

Done, I hope.

if cursor.spelling not in config['functions']:
continue

with open(cursor.location.file.name, 'r', encoding='latin-1') as file:
Contributor

latin-1? Why not utf-8?

Contributor Author

Fixed.

Contributor Author

OK. Apparently cuda.h is not UTF-8; the Xavier build yields strange results. Switching back.
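
For context: latin-1 maps every byte value to a character, so reading never raises regardless of the file's actual encoding. A hedged sketch of the UTF-8-tolerant alternative (not what the PR ended up with):

with open(cursor.location.file.name, 'r', encoding='utf-8', errors='replace') as file:
    source = file.read()  # undecodable bytes become U+FFFD instead of raising

Whether replacement characters are acceptable depends on what the generator does with the text, which is presumably why latin-1 was the safer choice here.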

if cursor.kind != clang.cindex.CursorKind.FUNCTION_DECL:
continue

if cursor.spelling not in config['functions'] or cursor.is_definition():
Contributor

6/10. This would work as long as there's no declaration followed by a definition. Also, I think that C allows multiple declarations of the same function (not that I expect that in the headers we're interested in).

Contributor Author

Done, I hope.
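
A minimal sketch of a robust variant (hypothetical stub_candidates helper, using the same clang.cindex API as stub_codegen.py): collect the definitions in a first pass, then emit each requested declaration at most once.

import clang.cindex

def stub_candidates(translation_unit, config):
    # First pass: remember every function that is defined in this TU.
    defined = set()
    for cursor in translation_unit.cursor.get_children():
        if (cursor.kind == clang.cindex.CursorKind.FUNCTION_DECL
                and cursor.is_definition()):
            defined.add(cursor.spelling)

    # Second pass: yield each requested function once, skipping any name
    # that has a definition (covers declaration-then-definition order)
    # and deduplicating repeated declarations, which C allows.
    seen = set()
    for cursor in translation_unit.cursor.get_children():
        if cursor.kind != clang.cindex.CursorKind.FUNCTION_DECL:
            continue
        name = cursor.spelling
        if name not in config['functions'] or name in defined or name in seen:
            continue
        seen.add(name)
        yield cursor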

@dali-automaton
Collaborator

CI MESSAGE: [1757622]: BUILD FAILED

@JanuszL JanuszL force-pushed the move_to_stubs branch 3 times, most recently from ad1250e to 0966726 on November 3, 2020 at 16:25
@dali-automaton
Collaborator

CI MESSAGE: [1758673]: BUILD STARTED

@@ -2978,3 +2978,209 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

========================
Contributor

maybe say what it is that we are using?

Contributor Author

See L2984 - Copyright 2020 The TensorFlow Runtime Authors. All rights reserved.

Contributor

Still, I'd add a line like "Dynamic loader generator for shared libraries." Otherwise it looks like we're using the whole runtime.

Contributor Author

TBH, I don't know if anyone uses such a convention in the Acknowledgements. Usually it is "this project uses parts of...", and the appropriate license is copied. Also, we may use something else later, and I don't think we will remember to update this file.

//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
Contributor

I am wondering. What's the purpose of this file now?

Contributor Author

@JanuszL JanuszL Nov 3, 2020

We have cuInitChecked, which opens/loads the lib itself.

#endif /* __cplusplus */

#endif
/*
Contributor

What happened here? I understand this is not our file.

Contributor Author

I have updated the file and the license, and also changed the line endings to Unix ones.


#endif
/*
* Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Contributor

and here

Contributor Author

As above.

// it is defined in the generated file
typedef void *tLoadSymbol(const std::string &name);
void NvcuvidSetSymbolLoader(tLoadSymbol loader_func);
void Nvcuvid_2SetSymbolLoader(tLoadSymbol loader_func);
Contributor

Why is there a _2 in the name?

Contributor Author

There are two separate dynlink implementations; each has its own *SetSymbolLoader function.

static std::unordered_map<std::string, void*> symbol_map;
std::lock_guard<std::mutex> lock(symbol_mutex);
auto it = symbol_map.find(name);
if (it == symbol_map.end()) {
Contributor

I don't like that a function called "is symbol available" is modifying the symbol map. Can you explain why that is?

Contributor Author

It lazily loads the symbols into the map, only for the purpose of checking whether they are available. It is not frequently used functionality, so this won't happen very often.
I don't have a good idea of how to move it to the generated file without affecting performance.

Contributor Author

Current implementation of a generated function is:

CUresult CUDAAPI cuGetErrorString(CUresult error, const char **pStr) {
  using FuncPtr = CUresult (CUDAAPI *)(CUresult, const char **);
  static auto func_ptr = reinterpret_cast<FuncPtr>(load_symbol_func("cuGetErrorString"));
  if (!func_ptr) return CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND;
  return func_ptr(error, pStr);
}

If we added a map there, we would need a synchronization point for it.
We also cannot easily expose the func_ptr itself from the stub to check its value.
No idea how to approach this differently.
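
For illustration, a rough sketch of how such a stub can be emitted from a libclang cursor (STUB_TEMPLATE and emit_stub are illustrative names; the actual generator in this PR may differ). The static local in the generated code means the symbol is resolved once, on first call, with no locking needed in the stub itself:

STUB_TEMPLATE = (
    '{ret} CUDAAPI {name}({params}) {{\n'
    '  using FuncPtr = {ret} (CUDAAPI *)({param_types});\n'
    '  static auto func_ptr = reinterpret_cast<FuncPtr>(load_symbol_func("{name}"));\n'
    '  if (!func_ptr) return {error};\n'
    '  return func_ptr({args});\n'
    '}}\n')

def emit_stub(output, cursor,
              error='CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND'):
    # Pull the signature out of the clang cursor for this declaration.
    params = [(a.type.spelling, a.spelling) for a in cursor.get_arguments()]
    output.write(STUB_TEMPLATE.format(
        ret=cursor.result_type.spelling,
        name=cursor.spelling,
        params=', '.join('%s %s' % p for p in params),
        param_types=', '.join(t for t, _ in params),
        args=', '.join(n for _, n in params),
        error=error))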

Contributor

I was just commenting on the name. This function is actually loading symbols. Perhaps it should be called something like LoadSymbolIfNeeded to better reflect what it does.

Contributor Author

But it doesn't really load the symbol either, in the sense that the first invocation of the relevant function will call LoadSymbol again.
I would keep the name as it is, as this is the purpose of the function: to check if a function with the given name is available, not to load it. One day we may rework the stub implementation, but the external API should stay the same.


# there is one function, cuvidGetVideoSourceState, that doesn't follow the return value convention
# so we need a dedicated stub for it
set(NVCUVID_2_GENERATED_STUB "${CMAKE_CURRENT_BINARY_DIR}/dynlink_nvcuvid_2_gen.cc")
Contributor

Why do we have a _2 stub?

Contributor Author

We have one method that doesn't follow the common convention, and we need a separate JSON for it.

Contributor Author

Our wrappers return CUresult, and CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND if the symbol is not available.
cuvidGetVideoSourceState returns cudaVideoState, so we need to return cudaVideoState_Error when the symbol is missing.

Contributor

Can't we add one more level of JSON that would aggregate functions, return type and error code? This doesn't seem too complicated - we'd have a dictionary that maps a function to a proper return type and error code and we'd generate stubs with these. I think we might run into this issue in the future.

Contributor Author

Done
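
A sketch of the resulting config shape, with per-function return type and error code overriding a global default (field names are illustrative, not necessarily the ones used in the PR):

import json

DEFAULTS = {'return_type': 'CUresult',
            'not_found_error': 'CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND'}

CONFIG = json.loads('''
{
  "functions": {
    "cuvidCreateVideoSource": {},
    "cuvidGetVideoSourceState": {
      "return_type": "cudaVideoState",
      "not_found_error": "cudaVideoState_Error"
    }
  }
}
''')

def stub_settings(name):
    # Most functions inherit the defaults; odd ones override per field.
    settings = dict(DEFAULTS)
    settings.update(CONFIG['functions'].get(name, {}))
    return settings

print(stub_settings('cuvidGetVideoSourceState')['not_found_error'])
# -> cudaVideoState_Error

In principle, such a mapping removes the need for a dedicated _2 stub per odd function.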

@dali-automaton
Collaborator

CI MESSAGE: [1758777]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1758777]: BUILD FAILED

@JanuszL JanuszL force-pushed the move_to_stubs branch 4 times, most recently from 127fbe9 to 5d847f7 on November 3, 2020 at 22:36
@dali-automaton
Collaborator

CI MESSAGE: [1760163]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1760163]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [1777411]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [1777423]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [1782650]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1782661]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1782650]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [1782661]: BUILD PASSED

"cuGetErrorName": {},
"cuGetErrorString": {},
"cuDriverGetVersion": {},
"cuDeviceGetCount": {},
Contributor

Just a question: can this be somehow generated (offline is fine) by listing the symbols from the library? I am wondering whether we need to manually add every new symbol that we'd like to use.

Contributor

That's something I've already suggested - to have a regex pattern. But maybe let's merge it as-is and improve later - at least more people will have an opportunity to do that when they have the code in their working copies.

Contributor Author

We can try to use a regexp inside stub_codegen.py. But we can think about this in the next step; for now it provides the same functionality as we used to have: a hardcoded list of supported functions.
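
A possible shape for that later improvement, sketched here (hypothetical matching_functions helper; not part of this PR): derive the function list from the header by matching declaration names against a pattern.

import re
import clang.cindex

def matching_functions(translation_unit, pattern=r'^cuvid\w+$'):
    # Collect every function declaration whose name matches the pattern,
    # instead of maintaining a hardcoded allow-list in the JSON config.
    name_re = re.compile(pattern)
    names = set()
    for cursor in translation_unit.cursor.get_children():
        if (cursor.kind == clang.cindex.CursorKind.FUNCTION_DECL
                and name_re.match(cursor.spelling)):
            names.add(cursor.spelling)
    return sorted(names)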

extra_args = args.extra_args

translation_unit = index.parse(header, args=extra_args)

Contributor

I see a mix of 2-space and 4-space indentations here. Can you unify?

Contributor Author

Done

@dali-automaton
Collaborator

CI MESSAGE: [1783091]: BUILD STARTED

- removes files that should not be in the repo due to license
- adds a step that generates dynlink_cuda during the build based
  on the available headers

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [1783091]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [1783362]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1783647]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1783647]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [1783362]: BUILD PASSED

@dali-automaton
Collaborator

CI MESSAGE: [1783647]: BUILD PASSED

@JanuszL JanuszL merged commit a2ad452 into NVIDIA:master Nov 10, 2020
@JanuszL JanuszL deleted the move_to_stubs branch November 10, 2020 16:51