Return specified intermediate outputs #104
Comments
I like this way of dumping the intermediate outputs. One question about adding a general output field to the JSON: what is the difference between the "general output field" and the "intermediate output"? They seem to serve the same purpose. If so, why not just rename "intermediate_output" to "output"?
I like the approach of specifying the outputs in the JSON file, but I want to suggest a slightly different JSON structure. The concepts would be:
Then, in the JSON I would do the following:
Some examples of possible specifications:
Notice how the first three examples are completely equivalent, and only the last one introduces an alternative output. Now, the behavior will be: when executing the pipeline, the And the internal behavior will be:
Finally, when returning, if the output specification ends up having a single element, that element will be returned alone. Following this specification, if a pipeline is created using the last output example above, all these calls would be valid:
# return the default output, which is the y in the last primitive
anomalies = pipeline.predict(X)
anomalies = pipeline.predict(X, output_="default")
# return ONLY the debug outputs
X, y, target_index, y_hat = pipeline.predict(X, output_="debug")
# return BOTH the default and the debug outputs
anomalies, X, y, target_index, y_hat = pipeline.predict(X, output_=["default", "debug"])
# return ONLY one variable, y_hat
y_hat = pipeline.predict(X, output_="keras.Sequential.LSTMTimeSeriesRegressor#1.y_hat")
# return the default output and also one variable
anomalies, y_hat = pipeline.predict(X, output_=["default", "keras.Sequential.LSTMTimeSeriesRegressor#1.y_hat"])
On a side note, the "get the whole context" behavior from the current implementation should be kept. This means that, even though the JSON specification will always require the
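The return rule described in this comment (expand each requested output name into its variables, then return a lone element unwrapped) could be sketched roughly as follows. This is only an illustration: `select_outputs`, `named_outputs` and `captured` are hypothetical names and not part of the MLBlocks API, and it assumes the pipeline has already collected the requested variables into a dictionary.

```python
def select_outputs(named_outputs, captured, output_="default"):
    """Pick the values to return from a pipeline run (hypothetical helper).

    named_outputs: dict mapping an output name (e.g. "default", "debug")
        to the list of variable names it stands for.
    captured: dict mapping variable names to the values seen during the run.
    output_: a single name, or a list of output names and/or variable names.
    """
    requested = [output_] if isinstance(output_, str) else list(output_)

    variables = []
    for item in requested:
        # A known output name expands to its variables;
        # anything else is treated as a single variable name.
        variables.extend(named_outputs.get(item, [item]))

    values = [captured[variable] for variable in variables]

    # A specification with a single element is returned alone.
    return values[0] if len(values) == 1 else tuple(values)
```

With this sketch, asking for `"default"` alone yields a bare value, while asking for `["default", "debug"]` yields a tuple in the order the names were requested.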
Here is an additional proposal on top of the previous one. Apart from specifying the outputs in the JSON file as a single string, allow them to be specified as a dictionary with two entries:
On top of that, add these two methods to the MLPipeline object:
For example, if the pipeline JSON specifies:
One can do:
And, potentially:
Description
We want to introduce a way of specifying exactly which variable(s) from which primitive(s) should be returned. This way we would have the ability to get multiple intermediate outputs from the pipeline without needing to return the whole context.
Possible approach
We let the user define in the Pipeline JSON which intermediate outputs they want to see. Using that information, MLBlocks keeps track of the outputs while iterating over the primitives and returns a dictionary containing all the outputs.
The JSON could look like:
Also, we might want to add a general output field to the JSON, where the user can specify what the last output of the pipeline will be and that will be returned as an array.
Then we would have the general output of the pipeline and the intermediate outputs.
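As a rough illustration of the tracking idea, the loop below runs a list of primitives over a shared context and keeps only the requested variables. It is a minimal sketch under stated assumptions, not the MLBlocks implementation: `run_pipeline`, the `(name, function)` representation of primitives, and the `"primitive_name.variable"` key format are all hypothetical.

```python
def run_pipeline(primitives, context, intermediate_outputs):
    """Run primitives in order and collect the requested intermediate outputs.

    primitives: list of (name, function) pairs; each function receives the
        context dict and returns a dict of the variables it produced.
    context: dict with the initial variables (it is updated in place).
    intermediate_outputs: iterable of "primitive_name.variable" strings.
    """
    wanted = set(intermediate_outputs)
    captured = {}

    for name, function in primitives:
        produced = function(context)
        context.update(produced)

        # Keep only the variables the user asked for, keyed by primitive.
        for variable, value in produced.items():
            key = "{}.{}".format(name, variable)
            if key in wanted:
                captured[key] = value

    return captured


# Toy primitives standing in for real ones (names are made up).
primitives = [
    ("scale#1", lambda ctx: {"X": [x * 2 for x in ctx["X"]]}),
    ("sum#1", lambda ctx: {"y": sum(ctx["X"])}),
]
outputs = run_pipeline(primitives, {"X": [1, 2, 3]}, ["scale#1.X", "sum#1.y"])
```

Because the values are captured as each primitive finishes, a variable that a later primitive overwrites in the context is still available under the key of the primitive that produced it.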
@csala you already had some specifics about the implementation in mind, so please let me know what you think about it and how you would do it.