
introduce node_input param #1854

Merged
merged 4 commits into master from eval_multi_node_output on Dec 14, 2021

Conversation

@tstadel (Member) commented Dec 7, 2021

Proposed changes:

  • introduce a node_input parameter to EvaluationResult.calculate_metrics() and EvaluationResult.wrong_examples() (see the usage sketch below)
  • introduce node_input as an eval dataframe column with default value 'prediction'
  • (the node_input value for a 'perfect retriever' would be 'label')
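
A minimal usage sketch of the proposed parameter. It assumes an eval_result already produced by pipeline.eval() on a pipeline containing a node named "Reader"; the node name and variable names are illustrative only:

```python
# Sketch only: `eval_result` is assumed to be an EvaluationResult
# returned by pipeline.eval().

# Default: compute metrics on rows where node_input == 'prediction',
# i.e. each node was fed the real output of its predecessor.
metrics = eval_result.calculate_metrics(node_input="prediction")

# 'Perfect retriever' view: compute metrics on rows where the node
# received the gold labels as input (node_input == 'label').
upper_bound_metrics = eval_result.calculate_metrics(node_input="label")

# The same filter applies when listing the worst examples for a node.
wrong = eval_result.wrong_examples(node="Reader", node_input="prediction")
```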

Status (please check what you already did):

  • First draft (up for discussion & feedback)
  • Final code
  • Added tests
  • Updated documentation

Closes #1850 (Support more than one answer/document output per node in pipeline.eval())

@tstadel marked this pull request as ready for review on December 8, 2021 09:23
@tstadel (Member, Author) commented Dec 8, 2021

@julian-risch @tholor
What do you think about the 'node_input' parameter? To me it sounds more intuitive than anything containing 'open/closed domain' for choosing between the different inputs we evaluated on. However, it might be harder to refer to if we talk about 'open/closed domain' in the docs.

@julian-risch (Member) commented

I like the idea of that parameter. Maybe we can find names for the parameter values that explain both what happens (as of now: with pipeline predictions or with labels) and when to use it (open/closed-domain eval).
However, let's still consider alternatives. For example, we could store separate EvalResults for open- and closed-domain eval. In that case no change would be necessary in the calculate_metrics() and wrong_examples() methods, right? Only in the pipeline.eval() method and in the generation of the eval report. (A sketch of this alternative follows below.)
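
A rough sketch of this alternative, purely for illustration; the two-result return shape is hypothetical and was not what the PR adopted:

```python
# Hypothetical alternative (not implemented): pipeline.eval() returns one
# EvaluationResult per input mode instead of a single result carrying a
# node_input column. `eval_labels` is an assumed list of gold labels.
result_from_predictions, result_from_labels = pipeline.eval(labels=eval_labels)

# calculate_metrics() and wrong_examples() keep their current signatures;
# the caller simply picks which result to inspect.
metrics = result_from_labels.calculate_metrics()
```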

@tholor (Member) commented Dec 13, 2021

What would the signature of pipeline.eval() look like with this parameter design?
Would we, by default, always create both predictions in the dataframe (from prediction and from label)?
I am just thinking where else this node_input param might appear and if it sounds intuitive in these places 🤔


> For example, we could store separate EvalResults for open- and closed-domain eval. In that case no change would be necessary in the calculate_metrics() and wrong_examples() methods, right? Only in the pipeline.eval() method and in the generation of the eval report.

I think there should be at least the option to run pipeline.eval() only once and have all results you need available. Returning two dataframes in this case sounds a bit unintuitive to me.

@tstadel (Member, Author) commented Dec 13, 2021

> What would the signature of pipeline.eval() look like with this parameter design?
> Would we, by default, always create both predictions in the dataframe (from prediction and from label)?
> I am just thinking where else this node_input param might appear and if it sounds intuitive in these places

Yes, node_input might not be what we want to pass to pipeline.eval(). There we might rather pass a flag that enables 'label' node_input evaluation, with 'prediction' only as the default. So node_input is designed to be an advanced param that you don't need in the most common workflow: call pipeline.eval() and pass the result to pipeline.print_eval_report(); if there are 'label' node_input rows, they'll be used, otherwise not. (See the sketch below.)
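
A sketch of that common workflow under this design. The flag name add_label_input is invented here for illustration and is not part of this PR:

```python
# Hypothetical flag name (invented for this sketch): add_label_input.
eval_result = pipeline.eval(
    labels=eval_labels,     # gold labels to evaluate against (assumed)
    add_label_input=True,   # additionally evaluate nodes on gold-label inputs
)

# The report picks up 'label' node_input rows automatically if present;
# no node_input argument is needed in the common case.
pipeline.print_eval_report(eval_result)
```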

@julian-risch (Member) left a comment

Looks good to me (as already discussed via call). 👍

@tstadel merged commit 57a0463 into master on Dec 14, 2021
@tstadel deleted the eval_multi_node_output branch on December 14, 2021 09:34