feat(#1225): create iob tags from record spans #1226

Merged
merged 8 commits into from Mar 7, 2022

Conversation

frascuchon
Member

This PR adds client functionality to generate IOB tags from record span definitions. This will help generate training datasets for Hugging Face models.

See #1225
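
For readers unfamiliar with the conversion, here is a minimal sketch of how character-level spans can be turned into IOB tags. It is not the code added in this PR: the function names `char_to_token_map` and `spans_to_iob`, the `(label, start, end)` span layout with an exclusive end offset, and the assumption that tokens appear in the text in order are all illustrative choices.

```python
from typing import Dict, List, Tuple


def char_to_token_map(text: str, tokens: List[str]) -> Dict[int, int]:
    """Map every character index covered by a token to that token's index."""
    mapping: Dict[int, int] = {}
    cursor = 0
    for token_idx, token in enumerate(tokens):
        # Search from the cursor so repeated substrings resolve in order.
        # Raises ValueError if the tokens do not match the text.
        start = text.index(token, cursor)
        for char_idx in range(start, start + len(token)):
            mapping[char_idx] = token_idx
        cursor = start + len(token)
    return mapping


def spans_to_iob(
    text: str, tokens: List[str], spans: List[Tuple[str, int, int]]
) -> List[str]:
    """Build one IOB tag per token from (label, char_start, char_end) spans.

    Assumption: char_end is exclusive and spans do not overlap.
    """
    chars2tokens = char_to_token_map(text, tokens)
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        # Collect the tokens touched by the span, skipping whitespace chars.
        token_ids = sorted(
            {chars2tokens[i] for i in range(start, end) if i in chars2tokens}
        )
        if not token_ids:
            continue
        tags[token_ids[0]] = f"B-{label}"
        for token_idx in token_ids[1:]:
            tags[token_idx] = f"I-{label}"
    return tags


text = "New York is big"
tokens = ["New", "York", "is", "big"]
print(spans_to_iob(text, tokens, [("LOC", 0, 8)]))
# → ['B-LOC', 'I-LOC', 'O', 'O']
```

The resulting tag list lines up one-to-one with the tokens, which is the shape token classification training pipelines generally expect.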

@frascuchon frascuchon self-assigned this Mar 4, 2022
@frascuchon frascuchon added this to In progress in Release via automation Mar 4, 2022
@frascuchon
Member Author

@dcfidalgo Any idea how to define the text and tokens properties as read-only?

@frascuchon frascuchon force-pushed the feature/build-iob-tags-in-client branch from 8d5fdec to 54a5588 Compare March 4, 2022 17:57
@frascuchon
Member Author

In the end I didn't find a way to keep text and tokens immutable, so the tokens/chars map will be computed dynamically every time it is needed. I've included a cache to avoid extra computations.

Take a look @dcfidalgo
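
One way to picture the caching described above is sketched below. This is a hypothetical illustration, not the code from this PR: the class name `TokenRecord`, the method `chars2tokens`, and the idea of keying the cache on the current `text` and `tokens` values are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


class TokenRecord:
    """Hypothetical record whose text and tokens may be reassigned."""

    def __init__(self, text: str, tokens: List[str]):
        self.text = text
        self.tokens = tokens
        self._cache_key: Optional[Tuple[str, Tuple[str, ...]]] = None
        self._cache_value: Dict[int, int] = {}
        self.computations = 0  # counts rebuilds, for demonstration only

    def chars2tokens(self) -> Dict[int, int]:
        """Return the char-to-token map, rebuilding it only when inputs change."""
        key = (self.text, tuple(self.tokens))
        if key != self._cache_key:
            self._cache_value = self._build_map()
            self._cache_key = key
            self.computations += 1
        return self._cache_value

    def _build_map(self) -> Dict[int, int]:
        mapping: Dict[int, int] = {}
        cursor = 0
        for token_idx, token in enumerate(self.tokens):
            # Assumes tokens appear in the text in order.
            start = self.text.index(token, cursor)
            for char_idx in range(start, start + len(token)):
                mapping[char_idx] = token_idx
            cursor = start + len(token)
        return mapping


record = TokenRecord("a bc", ["a", "bc"])
record.chars2tokens()
record.chars2tokens()          # served from the cache
record.tokens = ["a", "b", "c"]
record.chars2tokens()          # inputs changed, so the map is rebuilt
print(record.computations)     # → 2
```

Comparing the key on every call costs a tuple build over the tokens, which is the price of supporting mutable fields; making the fields immutable removes the need for that check.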

@codecov

codecov bot commented Mar 4, 2022

Codecov Report

Merging #1226 (f5c8430) into master (fd2186d) will decrease coverage by 0.03%.
The diff coverage is 91.93%.


@@            Coverage Diff             @@
##           master    #1226      +/-   ##
==========================================
- Coverage   94.88%   94.84%   -0.04%     
==========================================
  Files         127      127              
  Lines        5391     5449      +58     
==========================================
+ Hits         5115     5168      +53     
- Misses        276      281       +5     
| Flag | Coverage Δ |
|---|---|
| pytest | 94.84% <91.93%> (-0.04%) ⬇️ |


| Impacted Files | Coverage Δ |
|---|---|
| src/rubrix/client/models.py | 96.89% <91.80%> (-3.11%) ⬇️ |
| src/rubrix/client/datasets.py | 98.03% <100.00%> (ø) |


@dcfidalgo dcfidalgo force-pushed the feature/build-iob-tags-in-client branch from 54a5588 to c0b5900 Compare March 7, 2022 09:58
@dcfidalgo
Contributor

@frascuchon Have a look at c0b5900, which makes text and tokens immutable. Maybe we could move the "immutability" of text to the base _Validators class so that all record models get it.
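
As a rough illustration of what read-only fields can look like in plain Python, the sketch below guards assignments in `__setattr__`. It is only one possible approach and is not taken from commit c0b5900: the class name `ReadOnlyFieldsRecord`, the `_READ_ONLY` tuple, and the `prediction` attribute are hypothetical.

```python
from typing import Any, Iterable, Optional


class ReadOnlyFieldsRecord:
    """Hypothetical record whose `text` and `tokens` cannot be reassigned."""

    _READ_ONLY = ("text", "tokens")

    def __init__(self, text: str, tokens: Iterable[str]):
        # Bypass the guard for the initial assignment only.
        object.__setattr__(self, "text", text)
        # Store a tuple so the token sequence cannot be mutated in place.
        object.__setattr__(self, "tokens", tuple(tokens))
        self.prediction: Optional[Any] = None  # regular, mutable attribute

    def __setattr__(self, name: str, value: Any) -> None:
        if name in self._READ_ONLY:
            raise AttributeError(f"You cannot assign a new value to `{name}`")
        super().__setattr__(name, value)


record = ReadOnlyFieldsRecord("New York", ["New", "York"])
record.prediction = [("LOC", 0, 8)]  # allowed
try:
    record.text = "Boston"           # rejected
except AttributeError as error:
    print(error)
```

With the fields fixed after construction, a derived structure such as the tokens/chars map can be computed once and reused without any cache invalidation logic.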

Release automation moved this from In progress to Review Mar 7, 2022
@frascuchon frascuchon merged commit 07b895d into master Mar 7, 2022
Release automation moved this from Review to Ready to DEV QA Mar 7, 2022
@frascuchon frascuchon deleted the feature/build-iob-tags-in-client branch March 7, 2022 21:21
@frascuchon frascuchon moved this from Ready to DEV QA to Approved DEV QA in Release Mar 8, 2022
@frascuchon frascuchon moved this from Approved DEV QA to Ready to DEV QA in Release Mar 8, 2022
@frascuchon frascuchon moved this from Ready to DEV QA to Ready to Release QA in Release Mar 25, 2022
frascuchon added a commit that referenced this pull request Mar 25, 2022
* feat(#1225): create iob tags from record spans

* test: add tests

* refactor: dynamic tokens map with text/tokens mutability

* chore: naming

* feat: make text and tokens immutable

* chore: adapt to inmutable text and tokens

* test: fix tests

* test: fixing tests

Co-authored-by: dcfidalgo <david@recogn.ai>

(cherry picked from commit 07b895d)
@frascuchon frascuchon moved this from Ready to Release QA to Approved Release QA in Release Mar 28, 2022
frascuchon added a commit that referenced this pull request Mar 28, 2022
frascuchon added a commit that referenced this pull request Mar 28, 2022
frascuchon added a commit that referenced this pull request Mar 30, 2022