Annotations/enforce unicity #413

LoannPeurey · 2023-02-08T16:02:18Z

The purpose of this PR is to enforce that:

- annotations.csv cannot contain multiple lines with the same set and annotation_filename;
- a set cannot contain overlapping annotation files. That is, within a given set, the same stretch of audio cannot be annotated more than once (otherwise that stretch is counted multiple times in metrics and other analyses).
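
As an illustration of the first condition, here is a minimal sketch of how duplicated index lines could be detected with pandas. The helper name `find_duplicate_index_rows` is hypothetical and is not the code in this PR; it only assumes that the index has `set` and `annotation_filename` columns.

```python
import pandas as pd


def find_duplicate_index_rows(annotations: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose (set, annotation_filename) pair appears more than once.

    Hypothetical helper for illustration only.
    """
    # keep=False marks all members of a duplicated group, not only the later ones
    mask = annotations.duplicated(subset=["set", "annotation_filename"], keep=False)
    return annotations[mask]
```

An empty result means the index satisfies the uniqueness condition; a non-empty result lists the lines that would have to be rejected before writing.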

To achieve this, every function that writes to annotations.csv must first check that neither case occurs. This is especially relevant for importations, which now fail for every row that does not satisfy those conditions.

Previously, errors encountered during importation were recorded in annotations.csv itself. I felt this could be troublesome, as the file is intended as a record of indexed annotations and should not serve as a log file. My approach is to output all errors to a separate CSV in extra, but I am not too sure about this solution.
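
To make the idea concrete, here is a rough sketch of separating failed rows from indexed ones and writing the failures to their own file. It assumes the importation result carries an `error` column that is empty for successful rows; the function names and the `import_errors.csv` file name are hypothetical, not what this PR implements.

```python
from pathlib import Path
from typing import Tuple

import pandas as pd


def split_import_results(imported: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split an importation result into (rows to index, rows that failed).

    Assumes an 'error' column that is null for successful rows.
    """
    failed_mask = imported["error"].notna()
    indexed = imported[~failed_mask].drop(columns=["error"])
    errors = imported[failed_mask]
    return indexed, errors


def write_import_errors(
    errors: pd.DataFrame, extra_dir: Path, filename: str = "import_errors.csv"
) -> Path:
    """Write failed rows to a CSV under extra_dir (file name is hypothetical)."""
    extra_dir.mkdir(parents=True, exist_ok=True)
    destination = extra_dir / filename
    errors.to_csv(destination, index=False)
    return destination
```

With this split, annotations.csv only ever receives rows that were actually indexed, and the error report can be overwritten or archived independently.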

LoannPeurey · 2023-02-08T16:06:13Z

I am also unsatisfied with how the following function (ChildProject/ChildProject/utils.py, line 180 in 6f6474f) handles selecting by a number of other columns:

    def find_lines_involved_in_overlap(df: pd.DataFrame, onset_label: str = 'range_onset', offset_label: str = 'range_offset', recording_label: str = None, set_label: str = None):

I could not immediately think of a clean way to do it without eval().

EDIT: changed to use eval, looks alright to me this way :)
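
For comparison, here is one possible way to restrict the overlap search to arbitrary grouping columns without eval(), using `groupby`. This is only a sketch under assumptions: the name `rows_involved_in_overlap` and the `group_labels` parameter are hypothetical, segments are treated as half-open intervals with offset > onset, and the DataFrame index is assumed to be unique. It is not the function that was merged.

```python
from typing import List, Optional

import pandas as pd


def rows_involved_in_overlap(
    df: pd.DataFrame,
    onset_label: str = "range_onset",
    offset_label: str = "range_offset",
    group_labels: Optional[List[str]] = None,
) -> pd.Series:
    """Return a boolean Series (aligned on df.index) flagging overlapping rows.

    Rows are only compared with rows sharing the same values in group_labels.
    """

    def flag(group: pd.DataFrame) -> pd.Series:
        ordered = group.sort_values([onset_label, offset_label])
        # Largest offset seen among all earlier rows (NaN for the first row)
        prev_max_offset = ordered[offset_label].cummax().shift()
        # Onset of the next row, which is the smallest onset among later rows
        next_onset = ordered[onset_label].shift(-1)
        # Comparisons against NaN evaluate to False, so edge rows are handled
        return (ordered[onset_label] < prev_max_offset) | (
            ordered[offset_label] > next_onset
        )

    if group_labels:
        parts = [flag(group) for _, group in df.groupby(group_labels)]
        flags = pd.concat(parts) if parts else pd.Series(dtype=bool)
    else:
        flags = flag(df)
    # Rows dropped by groupby (null keys) are reported as not overlapping
    return flags.reindex(df.index, fill_value=False).astype(bool)
```

Passing `group_labels=["set", "recording_filename"]` would, for example, only report overlaps between segments of the same set and recording, while omitting it compares every row with every other.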

LoannPeurey · 2023-02-10T15:03:45Z

This change also leads to annotations always being sorted by their 'imported_at' values. However, many lines share the same value, so the order of annotations.csv changes inconsistently with every importation. The write step should therefore always sort before writing changes; I suggest sorting by 'imported_at', then 'set', then 'annotation_filename'. The pair (set, annotation_filename) should always be unique, so this ordering should be consistent.

And maybe add a test for that sorting?
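
A minimal sketch of that ordering, together with the kind of check a test could make. The helper name `sort_annotations` is hypothetical, and the sketch assumes 'imported_at' is stored in a format that sorts chronologically (for example 'YYYY-MM-DD HH:MM:SS' strings).

```python
import pandas as pd

SORT_COLUMNS = ["imported_at", "set", "annotation_filename"]


def sort_annotations(annotations: pd.DataFrame) -> pd.DataFrame:
    """Return the index in a deterministic order before it is written.

    Ties on 'imported_at' are broken by 'set' then 'annotation_filename';
    since (set, annotation_filename) is unique, the order is fully determined.
    """
    return annotations.sort_values(SORT_COLUMNS, kind="mergesort").reset_index(drop=True)
```

A test could then shuffle or reverse the same rows and assert that `sort_annotations` returns an identical DataFrame in both cases.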

shuvayanti

Everything looks good to me, but I see a merge conflict that probably needs to be resolved.

LoannPeurey and others added 11 commits February 2, 2023 17:08

first shot at changes, clearly has errors as of now

fd1ceba

assuring unicity of annotation lines

0428d2d

adding tests with data to the importation

2478bc7

make tests work for cli

57d1305

make error outputs consistent in paths in windows for importation

a22a650

replace deprecated sklearn by scikit-learn

2d219c2

change error output for overlaps

8dbf985

Update annotations.py

4f64686

update err_overlap

f5dace2

importation -> assert offset > onset >= 0

fb81d3a

am validation checks for overlaps and onset offset

6f6474f

LoannPeurey requested review from alecristia and William-N-Havard February 8, 2023 16:11

always sort annotations.csv before writing it

d44885d

William-N-Havard approved these changes Feb 10, 2023

rework the find overlaps function, change tests accordingly

189103e

LoannPeurey requested a review from shuvayanti February 20, 2023 11:11

fix type not subscriptable

cf47617

shuvayanti approved these changes Feb 20, 2023

Merge branch 'master' into annotations/enforce-unicity

d47698a

LoannPeurey merged commit b2f7aa4 into master Feb 20, 2023

LoannPeurey linked an issue Feb 24, 2023 that may be closed by this pull request

Re-imported annotations #412

Closed
