Only the attributes written by the last tagger in the tagger list get written in version 1.0.0 #113

Closed
peterbjorgensen opened this issue Feb 6, 2024 · 0 comments · Fixed by #114

@peterbjorgensen
Contributor

Since upgrading dolma to version 1.0.0, I only get the attributes from the last tagger in the list.
I think the problem is here:

# if not set; it will potentially not write to the output stream
# in case a tagger emits no spans
attributes_by_stream[tagger_output.path] = {}

tagger_output.path is the same for all the taggers in the list, so attributes_by_stream[tagger_output.path] is reset to an empty dictionary on every iteration of the loop over the taggers. Each reset discards what the previous taggers wrote, leaving only the attributes from the last tagger in the list.
This bug is not present in version 0.9.4.
I would submit a pull request, but I am not sure what these three lines are supposed to fix.
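
The overwrite described above can be reproduced in isolation. The following is a minimal sketch, not dolma's actual code: `TaggerOutput`, `collect_buggy`, `collect_fixed`, and the path `attrs/doc.jsonl` are hypothetical names made up for illustration. It assumes the three lines exist to guarantee an entry for taggers that emit no spans, and shows one possible way to keep that guarantee without discarding earlier attributes.

```python
from collections import namedtuple

# Hypothetical stand-in for a tagger's output descriptor; not dolma's real class.
TaggerOutput = namedtuple("TaggerOutput", ["path", "attributes"])


def collect_buggy(tagger_outputs):
    """Mimics the reported behavior: the entry is reset for every tagger."""
    attributes_by_stream = {}
    for tagger_output in tagger_outputs:
        # Unconditional reset: wipes whatever earlier taggers stored under this path.
        attributes_by_stream[tagger_output.path] = {}
        attributes_by_stream[tagger_output.path].update(tagger_output.attributes)
    return attributes_by_stream


def collect_fixed(tagger_outputs):
    """One possible fix (an assumption): create the entry only if it is missing."""
    attributes_by_stream = {}
    for tagger_output in tagger_outputs:
        # A tagger that emits no spans still gets an (empty) entry,
        # but attributes already stored under the same path are preserved.
        attributes_by_stream.setdefault(tagger_output.path, {})
        attributes_by_stream[tagger_output.path].update(tagger_output.attributes)
    return attributes_by_stream


# Two taggers that share one output path, as in the report.
outputs = [
    TaggerOutput("attrs/doc.jsonl", {"tagger_a": [[0, 10, 1.0]]}),
    TaggerOutput("attrs/doc.jsonl", {"tagger_b": [[0, 10, 0.5]]}),
]
buggy = collect_buggy(outputs)
fixed = collect_fixed(outputs)
print(sorted(buggy["attrs/doc.jsonl"]))  # ['tagger_b']
print(sorted(fixed["attrs/doc.jsonl"]))  # ['tagger_a', 'tagger_b']
```

With `setdefault`, a path shared by several taggers accumulates all of their attributes, while a tagger with no spans still produces an empty entry for its path.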

soldni added a commit that referenced this issue Feb 7, 2024
…#114)

* do not overwrite tagger outputs with the same output path

* added test for failure

* removed unused import

* caught error

---------

Co-authored-by: Luca Soldaini <luca@soldaini.net>