-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow performance due copying of instances #987
Comments
To me operators should modify dictionaries in place, the reason is that that's how most people work with dictionaries. In cases where this creates confusion we should avoid it. Also I think we should have a separate issue for creating performance benchmark to unitxt we run on different pipelines, so we can keep track over performance across PRs. |
The problem is when operators modify nested dictionary instance: Then we have an operator Copy(field="predictions", to_field="orig_predictions"} instance: Now we do Copy(field = "references/0/a" to, "predictions/a") instance: How do you suggest we handle it? |
So, until that metric that modifies |
Addressed at #1087 |
There is no bug in dict_utils. Dict_utils performs exactly what described in its docstring, and dict_utils has nothing to do with the case described here above. |
Current dict_utils performs overwrite of dictionaries:
This caused problems because operators made changes to data of their source and it cause conflict with other operators. To workaround this, we now copy instances :
However this cause performance issue with large datasets (6x runtime increase) We should change the code such that new instances are returned with shallow copy - and only relevant atomic are copied.
The text was updated successfully, but these errors were encountered: