-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust tf code for experimental Dataset with inputs #2993
Adjust tf code for experimental Dataset with inputs #2993
Conversation
* Add dynamic .experimental module to nvidia.dali.plugin.tf * Add placeholder Dataset * Expose it in documentation * Adjust the TF Dataset tests for custom pipeline/dataset definition. Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build |
CI MESSAGE: [2410986]: BUILD STARTED |
This pull request introduces 1 alert when merging 7bd651d into fdf10f6 - view on LGTM.com new alerts:
|
The alert reported here is for explicit NotImplementedError for the placeholder Dataset which I will be supporting in the follow-up PRs. |
CI MESSAGE: [2410986]: BUILD FAILED |
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build |
CI MESSAGE: [2411439]: BUILD STARTED |
This pull request introduces 1 alert when merging a79586d into fdf10f6 - view on LGTM.com new alerts:
|
CI MESSAGE: [2411439]: BUILD PASSED |
if device == 'gpu': | ||
shard_outputs.append( | ||
tuple(result.as_cpu().as_array() for result in pipe_outputs)) | ||
else: | ||
shard_outputs.append( | ||
tuple(result.as_array() for result in pipe_outputs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if device == 'gpu': | |
shard_outputs.append( | |
tuple(result.as_cpu().as_array() for result in pipe_outputs)) | |
else: | |
shard_outputs.append( | |
tuple(result.as_array() for result in pipe_outputs)) | |
shard_outputs.append(tuple(test_utils.to_array(result) for result in pipe_outputs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done :)
dali/python/nvidia/dali/plugin/tf.py
Outdated
|
||
Each of the input datasets must be mapped to a :meth:`~nvidia.dali.fn.external_source` operator | ||
that will represent the input to the DALI pipeline. The input is represented by the ``name`` | ||
parameter of :meth:`~nvidia.dali.fn.external_source` and needs to be provided for corresponding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameter of :meth:`~nvidia.dali.fn.external_source` and needs to be provided for corresponding | |
parameter of :meth:`~nvidia.dali.fn.external_source` and needs to be provided to the corresponding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/python/nvidia/dali/plugin/tf.py
Outdated
|
||
|
||
.. warning:: | ||
This version of the class is just API placeholder and the functionality is not yet implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version of the class is just API placeholder and the functionality is not yet implemented. | |
This version of the class is just an API placeholder and the functionality is not yet implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This version of the class is just API placeholder and the functionality is not yet implemented. | ||
|
||
The operator adds additional parameters to the ones supported by the | ||
:class:`~nvidia.dali.plugin.tf.DALIDataset`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use double back quotes (``). It is used earlier in the string but starting from here I see only single back quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a different use - it is to insert a link to the DALIDataset, so it has to be single AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the Parameters below, I kept it uniform with the current Dataset doc, we could skip them completely there, but I think it should be just one PR cleaning our docs (we have a mix of those).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can wrap in double backticks the names used in the text of the doc. 👍
dali/python/nvidia/dali/plugin/tf.py
Outdated
as well to provide the mapping of input datasets to `External Source` nodes. | ||
`input_names` : `str` or `tuple` of `str`, optional, default = None | ||
names of inputs to the DALI Pipeline. Must match arity of the `input_datasets`. | ||
`input_datasets[i]` will be provided to `External Source` of the name `input_names[i]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`input_datasets[i]` will be provided to `External Source` of the name `input_names[i]`. | |
`input_datasets[i]` will be provided to the external source instance with the name `input_names[i]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/python/nvidia/dali/plugin/tf.py
Outdated
Sample mode means that every input dataset is treated as if providing individual samples. | ||
DALIDataset will query the inputs dataset `batch_size`-times to build a batch that would | ||
be fed into the DALI Pipeline. | ||
In sample mode, each sample produced by the input dataset can have different shape, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In sample mode, each sample produced by the input dataset can have different shape, | |
In sample mode, each sample produced by the input dataset can have a different shape, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/python/nvidia/dali/plugin/tf.py
Outdated
DALIDataset will query the inputs dataset `batch_size`-times to build a batch that would | ||
be fed into the DALI Pipeline. | ||
In sample mode, each sample produced by the input dataset can have different shape, | ||
but not number of dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but not number of dimensions. | |
but not a different number of dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
class DALIDatasetWithInputs(dataset_ops._OptionsDataset): | ||
# @functools.wraps(_DALIDatasetV2.__init__) | ||
def __init__(self, pipeline, **kwargs): | ||
raise NotImplementedError("DALIDatasetWithInputs is only a placeholder operator.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise NotImplementedError("DALIDatasetWithInputs is only a placeholder operator.") | |
raise NotImplementedError("DALIDatasetWithInputs is not implemented.") |
just a suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error type is already NotImplemented, I would keep some explanation, it will be removed soon. (I hope).
# Respective signatures: | ||
# get_pipeline_desc(batch_size, num_threads, device, device_id, shard_id, num_shards, | ||
# def_for_dataset) -> nvidia.dali.pipeline, shapes, dtypes | ||
# `def_for_dataset` - indicates if this function is invoked to create a basaline standalone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# `def_for_dataset` - indicates if this function is invoked to create a basaline standalone | |
# `def_for_dataset` - indicates if this function is invoked to create a baseline standalone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
# ################################################################################################ # | ||
|
||
def get_image_pipeline(batch_size, num_threads, device, device_id=0, shard_id=0, num_shards=1, | ||
def_for_dataset=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard for me to grasp the meaning of def_for_dataset
without an implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a need to have a slightly different definition for a pipeline that is run standalone vs the one used in TF (namely the need to do:
fn.external_source(name="...") # tf-wrapped one
fn.external_source(source=...) # DALI-only one
I could probably introduce explicit feed-inputs step as well, but this is easier and probably more generic.
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
This pull request introduces 1 alert when merging d0fa545 into 4f4bf48 - view on LGTM.com new alerts:
|
!build |
CI MESSAGE: [2419541]: BUILD STARTED |
CI MESSAGE: [2419541]: BUILD FAILED |
!build |
CI MESSAGE: [2420154]: BUILD STARTED |
This pull request introduces 1 alert when merging e413353 into 30e9705 - view on LGTM.com new alerts:
|
CI MESSAGE: [2420154]: BUILD FAILED |
CI MESSAGE: [2420154]: BUILD PASSED |
definition.
Signed-off-by: Krzysztof Lecki klecki@nvidia.com
Why we need this PR?
What happened in this PR?
Dynamic nvidia.dali.plugin.tf.experimental module added for the placeholder Dataset.
Tests extended to allow for custom pipeline/dataset pair to be specified.
tf api & docs, tf-dataset tests
Mostly the tf api "changes"?
CI, local testing, 👀 for docs
Yes
JIRA TASK: [Use DALI-XXXX or NA]