
Fix TensorFlow plugin operation without GPU #3719

Merged

merged 8 commits into NVIDIA:main from fix_cpu_only_tf on Mar 10, 2022

Conversation

@JanuszL (Contributor) commented on Mar 7, 2022

  • The TensorFlow DALI plugin doesn't work without a GPU because it forces
    synchronization after copying the output data to the TF tensor, which
    invokes a CUDA call. This PR removes that synchronization when we copy
    to a non-pinned CPU buffer and when the DALI TF operator is placed on
    the CPU.

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

  • The TensorFlow DALI plugin doesn't work without a GPU because it forces
    synchronization after copying the output data to the TF tensor, which
    invokes a CUDA call. This PR removes that synchronization when we copy
    to a non-pinned CPU buffer and when the DALI TF operator is placed on
    the CPU (a minimal usage sketch follows below).
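
To make the affected scenario concrete, here is a minimal sketch of a CPU-only
DALI TF setup in the spirit of the tests added in this PR. It is illustrative
only: the get_dataset helper, the constant-producing operator, the output
shapes/dtypes, and the CPU device placement are assumptions, not a verbatim
copy of the new tests.

import tensorflow as tf
import nvidia.dali.types as types
import nvidia.dali.plugin.tf as dali_tf
from nvidia.dali import pipeline_def

@pipeline_def()
def get_dali_pipe(value):
    # Trivial CPU-only pipeline: one constant scalar per sample (assumed body;
    # the operator actually used by the new tests is not shown in this diff).
    data = types.Constant(value)
    return data

def get_dataset(batch_size, value):
    pipe = get_dali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID,
                         num_threads=1, value=value)
    # Placing the dataset on the CPU exercises the path fixed by this PR: outputs
    # are copied into non-pinned host memory and no CUDA synchronization is issued.
    with tf.device('/cpu:0'):
        return dali_tf.DALIDataset(pipeline=pipe, batch_size=batch_size,
                                   output_shapes=(batch_size,),
                                   output_dtypes=tf.int32)

for batch in get_dataset(batch_size=3, value=42).take(1):
    assert (batch.numpy() == 42).all()

With this fix, iterating such a dataset on a machine without a GPU should
succeed instead of failing on the CUDA call that the forced synchronization
used to make.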

Additional information:

Affected modules and functionalities:

  • DALI TF operator and dataset
  • CPU only tests

Key points relevant for the review:

  • tests

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

- The TensorFlow DALI plugin doesn't work without a GPU because it forces
  synchronization after copying the output data to the TF tensor, which
  invokes a CUDA call. This PR removes that synchronization when we copy
  to a non-pinned CPU buffer and when the DALI TF operator is placed on
  the CPU

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL added the important-fix label on Mar 7, 2022
return data

def get_data(batch_size, value):
pipe = get_datali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)
Contributor:

Suggested change
pipe = get_datali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)
pipe = get_dali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)

Contributor Author:

Done

def test_dali_tf_op_cpu_only():
try:
tf.compat.v1.disable_eager_execution()
except:
Contributor:

Suggested change
except:
except Exception:

LGTM was complaining about using "except:" as a bad practice

Contributor Author:

Done



@pipeline_def()
def get_datali_pipe(value):
Contributor:

Suggested change
def get_datali_pipe(value):
def get_dali_pipe(value):

Contributor Author:

Done

return data

def get_data(batch_size, value):
pipe = get_datali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)
Contributor:

Suggested change
pipe = get_datali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)
pipe = get_dali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)

Contributor Author:

Done

skip_for_incompatible_tf()
try:
tf.compat.v1.enable_eager_execution()
except:
Contributor:

Suggested change
except:
except Exception:

Contributor Author:

Done


batch_size = 3
value = random.randint(0, 1000)
pipe = get_datali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)
Contributor:

Suggested change
pipe = get_datali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)
pipe = get_dali_pipe(batch_size=batch_size, device_id=types.CPU_ONLY_DEVICE_ID, num_threads=1, value=value)

Contributor Author:

Done

unsigned int wait_flag = (out_id == num_outputs - 1) ? DALI_ext_force_sync : DALI_ext_default;
// if the OP runs on the CPU the output memory is not pinned and we don't need to sync
unsigned int wait_flag = (i == dali_num_out - 1) ?
(this->device_type_ == device_type_t::CPU ? 0 : DALI_ext_force_sync) :
Contributor:

Suggested change
(this->device_type_ == device_type_t::CPU ? 0 : DALI_ext_force_sync) :
(this->device_type_ == device_type_t::CPU ? DALI_ext_default : DALI_ext_force_sync) :

Contributor:

Or

unsigned int wait_flag =  this->device_type_ != device_type_t::CPU && (i == dali_num_out - 1) ? DALI_ext_force_sync : DALI_ext_default;

Contributor Author:

Done

// Synchronize with the dataset()->stream_ when doing the last copy, so the outputs
// are fully finished before we release the output buffers for reuse.
unsigned int wait_flag = (i == dali_num_out - 1) ? DALI_ext_force_sync : DALI_ext_default;
// if the OP runs on the CPU the output memory is not pinned and we don't need to sync
unsigned int wait_flag = (i == dali_num_out - 1) ?
Contributor:

same here

Contributor Author:

Done
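
To see the resolved logic in isolation, here is a small self-contained sketch
of the flag selection the reviewer proposed above. The enum values and the
helper function are stand-ins for illustration, not DALI's actual definitions;
only the condition itself is taken from the review.

#include <cstdio>

// Hypothetical stand-ins for the DALI C API flags and the plugin's device enum,
// just so the flag-selection logic from the review compiles in isolation.
enum : unsigned int { DALI_ext_default = 0, DALI_ext_force_sync = 1 };
enum class device_type_t { CPU, GPU };

unsigned int CopyWaitFlag(device_type_t device_type, int i, int dali_num_out) {
  // Force a sync only for the last output and only when the op runs on the GPU;
  // a CPU-placed op copies into non-pinned host memory, so no synchronization
  // (and hence no CUDA call) is needed.
  return (device_type != device_type_t::CPU && i == dali_num_out - 1)
             ? DALI_ext_force_sync
             : DALI_ext_default;
}

int main() {
  // Last output on GPU -> force sync; last output on CPU -> default (no sync).
  std::printf("%u %u\n", CopyWaitFlag(device_type_t::GPU, 2, 3),
              CopyWaitFlag(device_type_t::CPU, 2, 3));
}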

# CPU only test, remove CUDA from the search path just in case
export LD_LIBRARY_PATH=""
export PATH=${PATH/cuda/}
nosetests --verbose test_dali_cpu_only.py
Contributor:

Suggested change
nosetests --verbose test_dali_cpu_only.py
nosetests --verbose -m '(?:^|[\b_\./-])[Tt]est.*pytorch*' test_dali_cpu_only.py

?

Contributor:

Or should we use --attr 'pytorch' vs. --attr '!pytorch' ? Right now we are running all tests under test_pytorch.sh

Contributor Author:

Done

@jantonguirao self-assigned this on Mar 8, 2022
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton (Collaborator): CI MESSAGE: [4093192]: BUILD FAILED

@dali-automaton (Collaborator): CI MESSAGE: [4106587]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4106908]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4107712]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4106908]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

@dali-automaton (Collaborator): CI MESSAGE: [4108281]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4108281]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

@dali-automaton (Collaborator): CI MESSAGE: [4111696]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4111696]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

@dali-automaton (Collaborator): CI MESSAGE: [4116022]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4116022]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

@dali-automaton (Collaborator): CI MESSAGE: [4117029]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4117029]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

@dali-automaton (Collaborator): CI MESSAGE: [4117747]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4117747]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

@dali-automaton (Collaborator): CI MESSAGE: [4118800]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4118800]: BUILD PASSED

@JanuszL merged commit ff497cc into NVIDIA:main on Mar 10, 2022
@JanuszL deleted the fix_cpu_only_tf branch on Mar 10, 2022 at 18:19
cyyever pushed a commit to cyyever/DALI that referenced this pull request on May 13, 2022

cyyever pushed a commit to cyyever/DALI that referenced this pull request on Jun 7, 2022
Labels
important-fix: Fixes an important issue in the software or development environment.