
Annotations/enforce unicity #413

Merged
merged 15 commits into from
Feb 20, 2023


LoannPeurey
Contributor

The purpose of the PR is to enforce that:

  • annotations.csv cannot contain multiple rows with the same set and annotation_filename
  • a set cannot contain overlapping annotation files. In other words, within a given set, the same stretch of audio cannot be annotated more than once (otherwise that stretch would be counted multiple times in metrics and other analyses)
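Both rules reduce to simple checks. The following is a minimal sketch, not code from this PR: it assumes the index is loaded as a pandas DataFrame with `set` and `annotation_filename` columns, and it treats each annotation as a half-open interval `[range_onset, range_offset)`, so annotations that merely touch are not counted as overlapping (an assumption). The helper names are hypothetical.

```python
import pandas as pd


def duplicated_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: every row whose (set, annotation_filename) pair appears more than once."""
    return df[df.duplicated(subset=["set", "annotation_filename"], keep=False)]


def intervals_overlap(onset_a: int, offset_a: int, onset_b: int, offset_b: int) -> bool:
    """Hypothetical helper: half-open intervals overlap iff each starts before the other ends."""
    return onset_a < offset_b and onset_b < offset_a


if __name__ == "__main__":
    index = pd.DataFrame({
        "set": ["vtc", "vtc", "its"],
        "annotation_filename": ["a.csv", "a.csv", "a.csv"],
    })
    print(duplicated_keys(index).index.tolist())  # [0, 1]
    print(intervals_overlap(0, 1000, 500, 1500))   # True
    print(intervals_overlap(0, 1000, 1000, 2000))  # False: the intervals only touch
```

The same file name in two different sets is allowed; only the pair has to be unique.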

To achieve this, every function that writes to annotations.csv must first check that neither case occurs.
This is especially relevant for imports, which will now fail for every row that does not satisfy these conditions.
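A row-by-row import check could look like the minimal sketch below. It is not the PR's implementation. A candidate row is accepted only if its (set, annotation_filename) key is not already indexed and it does not overlap an indexed or already accepted annotation of the same set on the same recording. The helper `split_valid_rows`, the `recording_filename` column and the `error` column are assumptions for illustration.

```python
import pandas as pd


def split_valid_rows(existing: pd.DataFrame, candidates: pd.DataFrame):
    """Hypothetical helper: split candidate rows into (accepted, rejected) DataFrames.

    Assumes both frames have the columns set, annotation_filename,
    recording_filename, range_onset and range_offset.
    """
    kept = existing.copy()  # rows already indexed plus rows accepted so far
    accepted, rejected = [], []
    for _, row in candidates.iterrows():
        same_key = (kept["set"] == row["set"]) & (
            kept["annotation_filename"] == row["annotation_filename"]
        )
        # Same set, same recording, and the half-open intervals intersect.
        overlapping = (
            (kept["set"] == row["set"])
            & (kept["recording_filename"] == row["recording_filename"])
            & (kept["range_onset"] < row["range_offset"])
            & (kept["range_offset"] > row["range_onset"])
        )
        record = row.to_dict()
        if same_key.any():
            rejected.append({**record, "error": "duplicate (set, annotation_filename)"})
        elif overlapping.any():
            rejected.append({**record, "error": "overlaps an indexed annotation of the same set"})
        else:
            accepted.append(record)
            kept = pd.concat([kept, pd.DataFrame([record])], ignore_index=True)
    return pd.DataFrame(accepted), pd.DataFrame(rejected)
```

Adding each accepted row to `kept` means two rows of the same import batch are also checked against each other, not only against what was indexed before.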

Previously, errors encountered during import were recorded in annotations.csv. I felt this could be troublesome, as the file is intended as a record of indexed annotations and should not serve as a log file. My approach was to output all the errors to a separate CSV in extra, but I am not too sure about this solution.
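As a sketch of that separate error file, a DataFrame of rejected rows (each carrying its error message) can be written under the dataset's `extra` directory instead of into annotations.csv. The helper name and the timestamped file-name pattern are hypothetical; the PR does not specify them.

```python
import datetime
from pathlib import Path

import pandas as pd


def write_import_errors(rejected: pd.DataFrame, project_root: str) -> Path:
    """Hypothetical helper: log rejected import rows to extra/ rather than annotations.csv."""
    extra = Path(project_root) / "extra"
    extra.mkdir(parents=True, exist_ok=True)
    # Timestamped name (assumed pattern) so successive imports do not overwrite each other.
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    path = extra / f"import_errors_{stamp}.csv"
    rejected.to_csv(path, index=False)
    return path
```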

@LoannPeurey
Contributor Author

LoannPeurey commented Feb 8, 2023

I am also unsatisfied with the

`def find_lines_involved_in_overlap(df: pd.DataFrame, onset_label: str = 'range_onset', offset_label: str = 'range_offset', recording_label: str = None, set_label: str = None):`

function: I could not immediately think of a clean way to make it handle selection by a number of other columns without eval().

EDIT: changed it to use eval(); it looks alright to me this way :)
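For comparison, one way to avoid eval() when the selection columns vary is to collect the optional labels into a list and let `groupby` do the selection. This is a swapped-in technique, not the merged implementation. It keeps the quoted signature but assumes the function should return a boolean mask of rows that overlap at least one other row in the same group, and that every interval is non-empty (offset > onset).

```python
from typing import Optional

import pandas as pd


def find_lines_involved_in_overlap(
    df: pd.DataFrame,
    onset_label: str = "range_onset",
    offset_label: str = "range_offset",
    recording_label: Optional[str] = None,
    set_label: Optional[str] = None,
) -> pd.Series:
    """Sketch: mark rows overlapping another row within the same (recording, set) group."""
    group_cols = [c for c in (recording_label, set_label) if c is not None]
    mask = pd.Series(False, index=df.index)
    groups = df.groupby(group_cols) if group_cols else [(None, df)]
    for _, group in groups:
        g = group.sort_values(onset_label, kind="mergesort")
        # A row overlaps an earlier row iff it starts before the latest earlier offset.
        prev_max_offset = g[offset_label].cummax().shift(1)
        # A row overlaps a later row iff it ends after the next onset (onsets are sorted).
        next_onset = g[onset_label].shift(-1)
        involved = (g[onset_label] < prev_max_offset) | (g[offset_label] > next_onset)
        mask.loc[involved.index] = involved
    return mask
```

Comparisons against the NaN produced by `shift` evaluate to False, so the first and last row of each group are only flagged by their real neighbours.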

@LoannPeurey
Contributor Author

LoannPeurey commented Feb 10, 2023

This change also means that annotations always end up sorted by their 'imported_at' values. However, many rows share the same value, so every import changes the order of rows in annotations.csv inconsistently. The write step should therefore always sort before writing changes; I suggest sorting by 'imported_at', then 'set', then 'annotation_filename'. Since (set, annotation_filename) should always be unique, this order is deterministic.

And maybe add a test for that sorting?
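The suggested ordering and a test for it might look like the minimal sketch below. `sort_index_for_write` is a hypothetical helper, not part of the PR, and it assumes 'imported_at' is stored as an ISO-style string, which sorts chronologically. The test checks that shuffling the rows does not change the written order, which only holds while (set, annotation_filename) stays unique.

```python
import pandas as pd

SORT_KEYS = ["imported_at", "set", "annotation_filename"]


def sort_index_for_write(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: deterministic row order before writing annotations.csv."""
    return df.sort_values(SORT_KEYS).reset_index(drop=True)


def test_write_order_is_deterministic():
    df = pd.DataFrame({
        "imported_at": ["2023-02-10 10:00:00"] * 3 + ["2023-02-09 09:00:00"],
        "set": ["vtc", "its", "vtc", "its"],
        "annotation_filename": ["b.csv", "a.csv", "a.csv", "z.csv"],
    })
    expected = sort_index_for_write(df)
    # Any input order must produce exactly the same output.
    for seed in range(5):
        shuffled = df.sample(frac=1, random_state=seed)
        pd.testing.assert_frame_equal(sort_index_for_write(shuffled), expected)
```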


@shuvayanti shuvayanti left a comment


Everything looks good to me, but I see a merge conflict that probably needs to be resolved.

@LoannPeurey LoannPeurey merged commit b2f7aa4 into master Feb 20, 2023
@LoannPeurey LoannPeurey linked an issue Feb 24, 2023 that may be closed by this pull request
Successfully merging this pull request may close these issues:

  • Re-imported annotations
3 participants