Don't serialize dcp objects #43
It would be really valuable to have CI tests which ensure that job deploy succeeds, that the work function executed inside a worker gets the correct slice data and arguments, and that results are processed correctly. However, that's out of the scope of this PR, so I'm just leaving it as a general comment #44
LGTM @JosephAcernese -- please get a review+approval from one of the Distributive folks before merging
LGTM!
  serialized_input_data = self.js_ref.jobInputData
  if hasattr(self.jobArguments, 'js_ref') and dry.class_manager.reg.find_from_js_instance(self.jobArguments.js_ref):
-     serialized_arguments = self.jobArguments.js_ref
+     serialized_arguments = [self.jobArguments.js_ref]
This looks incorrect to me. Doesn't this make serialized_arguments an array of array-likes when it should just be an array-like?
This is necessary because line 143 in this file expects serialized_arguments to be concatenated with other Python lists:
self.js_ref.jobArguments = [offset_to_argument_vector] + ["gzImage", job_fs] + env_args + serialized_arguments + [meta_arguments]
This solution did seem a little fishy to me, but I was getting the correct behaviour, with single objects (such as RemoteDataSet) properly converting to single and/or multiple job args.
There might be other cleaner, more intuitive solutions than just slapping it in an array. My justification, though, was that the other branch of logic for job args always leaves serialized_arguments as an array.
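To make the concatenation concern concrete, here is a minimal sketch using plain Python stand-ins. FakeProxy, build_job_arguments, and the placeholder values are hypothetical and are not bifrost2 or pythonmonkey objects; real JS proxies may behave differently. It only shows that `+` needs every operand to be a list, and that wrapping a single object in a list contributes exactly one argument.

```python
class FakeProxy:
    """Hypothetical stand-in for a JS proxy object (not a real pythonmonkey type)."""

    def __repr__(self):
        return "FakeProxy()"


def build_job_arguments(prefix, serialized_arguments, meta_arguments):
    # Same shape as the concatenation quoted above: every operand must be a list.
    return prefix + serialized_arguments + [meta_arguments]


proxy = FakeProxy()
prefix = [3, "gzImage", "job_fs"]  # placeholder values, assumed for illustration

# Wrapping the single proxy in a list makes `+` work and adds one argument.
wrapped = build_job_arguments(prefix, [proxy], {"meta": True})

# Passing the bare stand-in fails, because list + non-list raises TypeError.
try:
    build_job_arguments(prefix, proxy, {"meta": True})
    bare_failed = False
except TypeError:
    bare_failed = True
```

In this sketch the wrapped proxy ends up as a single element of the final argument list, which is the behaviour the reviewer is questioning for array-like arguments.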
I don't understand why this would work. For example, [range(1,10)] is a list containing a range; it doesn't flatten the iterable.
Also, we shouldn't be using + to concatenate iterables, since it only works for concatenating lists, but that's maybe out of scope of this PR and we have to deal with the different iterator interfaces between JS and Python 🥴
Maybe we should instead use itertools.chain, and write a small iterator wrapper which converts a JS iterator into a Python iterable if it's a js_ref and has a Symbol.iterator attribute.
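A rough sketch of that suggestion, under stated assumptions: FakeJsIterator and iter_js are hypothetical names, and the fake iterator only imitates the JS iterator protocol (a next() method returning an object with done and value). How to obtain such an iterator from a real js_ref through Symbol.iterator is not shown here and would need checking against pythonmonkey.

```python
from itertools import chain


class FakeJsIterator:
    """Hypothetical stand-in for a JS iterator: next() returns {'done', 'value'}."""

    def __init__(self, values):
        self._values = list(values)
        self._index = 0

    def next(self):
        if self._index >= len(self._values):
            return {"done": True, "value": None}
        value = self._values[self._index]
        self._index += 1
        return {"done": False, "value": value}


def iter_js(js_iterator):
    """Adapt a JS-style iterator object to a Python generator."""
    while True:
        step = js_iterator.next()
        if step["done"]:
            return
        yield step["value"]


# A list containing a range has one element; the range is not flattened.
nested = [range(1, 10)]

# chain() accepts any iterables, unlike `+`, which only joins lists.
combined = list(
    chain(
        [0, "gzImage"],
        range(1, 4),
        iter_js(FakeJsIterator(["a", "b"])),
        [{"meta": True}],
    )
)
```

With chain, lists, ranges, and adapted JS-style iterators can be joined without first converting each one to a list.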
There are some minor errors in the compute_for and job wrappers. A particular one is that self.js_ref.jobInputData.js_ref should always return None, but it is being used in the conditional statements for serialization.
Nested objects should never need to be unwrapped multiple times in the call chain, as the underlying JS objects should not be wrapped:
- job is the bifrost2 wrapper for job
- job.jobInputData is the bifrost2 wrapper for jobInputData
- job.js_ref is the JS proxy for job
- job.js_ref.jobInputData AND job.jobInputData.js_ref are the same JS proxy for jobInputData
- job.js_ref.jobInputData.js_ref exists technically, but direct access of an undefined attribute in JS returns undefined, which pythonmonkey then converts to None
This PR also fixes the compute_for function not unwrapping potential DCP objects, and adds an example of remote data jobs, since this PR fixes them.
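The wrapper and proxy relationships described above can be modelled with a small mock. Everything here (FakeJsProxy, FakeWrapper, FakeJob, unwrap) is a hypothetical stand-in written for illustration, not the bifrost2 or pythonmonkey implementation; the mock only assumes that reading a missing attribute on a proxy yields None.

```python
class FakeJsProxy:
    """Hypothetical JS proxy: missing attributes yield None, like undefined -> None."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, like reading an undefined JS property.
        return None


class FakeWrapper:
    """Hypothetical wrapper that holds the underlying proxy in js_ref."""

    def __init__(self, js_ref):
        self.js_ref = js_ref


class FakeJob(FakeWrapper):
    @property
    def jobInputData(self):
        # The wrapper exposes a wrapped view of the nested JS object.
        return FakeWrapper(self.js_ref.jobInputData)


def unwrap(value):
    """Return the underlying proxy for a wrapper; leave anything else unchanged."""
    return value.js_ref if isinstance(value, FakeWrapper) else value


input_data_proxy = FakeJsProxy()
job = FakeJob(FakeJsProxy(jobInputData=input_data_proxy))

same_proxy = job.js_ref.jobInputData is job.jobInputData.js_ref
double_unwrap = job.js_ref.jobInputData.js_ref  # one level too deep
```

In this mock a single unwrap is enough, and unwrapping an already unwrapped proxy is a no-op, which matches the point that nested objects should not need repeated unwrapping.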