Predictions creates erroneous predictions if the input is created with Formula (or Edit Domain, or random data) #6832

Closed
jcmhk opened this issue Jun 13, 2024 · 3 comments
Labels
bug report Bug is reported by user, not yet confirmed by the core team


@jcmhk

jcmhk commented Jun 13, 2024

If you use Predictions with input data created by Formula, Randomize, or Edit Domain, the result is incorrect.
bug_prediction.zip

I added debug prints to the Predictions source code on my computer,
in the function call_predictors of owpredictions.py.
It looks like classless_data.domain.attributes differ from predictor.domain, in particular in the value of '_compute_value'.
I was unable to adjust this value because I don't know the Orange library well enough. I adjusted '_number_of_decimals', but that was not the
problem.
I am working with Orange 3.35, 3.36, and 3.37.

@jcmhk jcmhk added the bug report Bug is reported by user, not yet confirmed by the core team label Jun 13, 2024
@lanzagar
Contributor

This is actually not a bug, although it can be very confusing. Perhaps an additional warning could guide users to a solution.

What happens is that Formula (and some other transformations) create a new variable that can have the same name, but is not the same object. As you noted, they have a different compute_value.
When Orange uses a dataset for predictions, it checks that the variables are the same as those in the data the model was trained on. It can't simply match variables by name because, for example, a normalized column is not the same as the original one.
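A minimal sketch of why matching by name is not enough, assuming a simplified model in which a variable is identified by its name *and* its compute_value. The `Var` class, `zscore_transform`, `names_match`, and `domain_matches` are hypothetical stand-ins for illustration, not Orange's actual classes or checks:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Var:
    """Hypothetical, simplified variable: equality compares name AND compute_value."""
    name: str
    compute_value: Optional[Callable[[float], float]] = None


def zscore_transform(mean: float, std: float) -> Callable[[float], float]:
    """Hypothetical transformation standing in for Normalize or Formula."""
    return lambda v: (v - mean) / std


def names_match(train_vars, input_vars) -> bool:
    # Naive check: compares names only
    return [v.name for v in train_vars] == [v.name for v in input_vars]


def domain_matches(train_vars, input_vars) -> bool:
    # Strict check: each variable must be the same variable, not just share a name
    return list(train_vars) == list(input_vars)


original = Var("x")
normalized = Var("x", compute_value=zscore_transform(5.0, 2.0))

print(names_match([original], [normalized]))     # True: same name
print(domain_matches([original], [normalized]))  # False: different compute_value
```

Under this model the normalized "x" and the original "x" share a name but hold different values, so a model trained on one would silently receive the wrong data if only names were compared.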

To tell Orange to forget about transformations (the compute_value) and "reset" a variable (the same effect as saving the data to a file and reading it again), you can use Edit Domain: select the variable and check "Unlink variable from its source variable".
The tooltip provides this explanation:

Make Orange forget that the variable is derived from another.
Use this for instance when you want to consider variables with the same name but from different sources as the same variable.
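A hedged sketch of what unlinking means, assuming a simplified model in which variables are equal when both their name and compute_value are equal. The `Var` class and the `unlink` function are hypothetical illustrations of the idea, not Orange's implementation of the Edit Domain option:

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Var:
    """Hypothetical, simplified variable: equality compares name AND compute_value."""
    name: str
    compute_value: Optional[Callable[[float], float]] = None


def unlink(var: Var) -> Var:
    """Hypothetical analogue of 'Unlink variable from its source variable':
    keep the name, forget how the variable was derived."""
    return replace(var, compute_value=None)


# Variable the model was trained on (e.g. read from a file, so no compute_value)
trained_x = Var("x")

# Variable with the same name, created by a transformation such as Formula
derived_x = Var("x", compute_value=lambda v: v * 2)

print(derived_x == trained_x)          # False: still linked to its source
print(unlink(derived_x) == trained_x)  # True: only the name remains
```

In this sketch, unlinking only changes how the variable is identified; it does not check that the values actually mean the same thing as the training column, so that remains the user's responsibility.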

@jcmhk
Author

jcmhk commented Jun 14, 2024

Thank you. Can I submit a pull request to display the warning?

@lanzagar
Contributor

Sure. But before you spend too much time preparing the PR, maybe propose a suggestion where and when you would show the warning so we can discuss the solution.
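One possible starting point for such a proposal, as a hedged sketch: warn when an input variable has the same name as a model variable but is not the same variable. It assumes a simplified model in which variables are equal only when both name and compute_value match. The `Var` class, the function names, and the message text are all hypothetical, not Orange code:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Var:
    """Hypothetical, simplified variable: equality compares name AND compute_value."""
    name: str
    compute_value: Optional[Callable[[float], float]] = None


def same_name_different_variable(train_vars: Sequence[Var],
                                 input_vars: Sequence[Var]) -> List[str]:
    """Names that occur in both domains but refer to different variables."""
    train_by_name = {v.name: v for v in train_vars}
    return [
        v.name
        for v in input_vars
        if v.name in train_by_name and v != train_by_name[v.name]
    ]


def warning_text(names: List[str]) -> Optional[str]:
    # Hypothetical message; returns None when there is nothing to warn about
    if not names:
        return None
    return (
        f"Input variables {', '.join(names)} have the same names as the model's "
        "variables but are derived differently. If they are meant to be the same, "
        "use Edit Domain and check 'Unlink variable from its source variable'."
    )


train = [Var("x"), Var("y")]
data = [Var("x", compute_value=lambda v: v + 1), Var("y")]
print(same_name_different_variable(train, data))  # ['x']
```

Comparing by name first and by full identity second keeps the warning specific: it fires only for the confusing case described in this issue, not for inputs that simply lack the model's variables.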
