Attributes CSV loading by BryonLewis · Pull Request #617 · Kitware/dive

BryonLewis · 2021-03-05T19:17:36Z

When reading a CSV file, this computes the necessary attribute metadata inline so it can be added to the folder.
This differs slightly from the Node.js version, which makes an additional pass over all tracks after the initial read. This version builds up the values inline during the read, then makes a single loop through the attributes to confirm each type.
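
For readers following along, here is a minimal sketch of the "count inline, then confirm the type in one loop" idea described above. It is not the PR's code: the row shape, the function names `collect_value_counts` and `infer_datatype`, and the simplified type rules are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (belongs, name, value-as-string),
# e.g. ('track', 'species', 'cod'). The real parser's rows differ.
Row = Tuple[str, str, str]


def collect_value_counts(rows: Iterable[Row]) -> Dict[str, Dict[str, int]]:
    """Single pass: count how often each value string appears per attribute key."""
    counts: Dict[str, Dict[str, int]] = {}
    for belongs, name, value in rows:
        key = f'{belongs}_{name}'
        per_value = counts.setdefault(key, {})
        per_value[value] = per_value.get(value, 0) + 1
    return counts


def infer_datatype(values: Iterable[str]) -> str:
    """Confirm a type from the collected value strings (simplified rules)."""
    values = list(values)
    if not values:
        return 'text'
    if all(v.lower() in ('true', 'false') for v in values):
        return 'boolean'
    try:
        for v in values:
            float(v)  # raises ValueError for non-numeric strings
    except ValueError:
        return 'text'
    return 'number'


rows = [
    ('track', 'species', 'cod'),
    ('track', 'species', 'cod'),
    ('detection', 'score', '0.5'),
    ('detection', 'visible', 'true'),
]
counts = collect_value_counts(rows)
# Iterating a dict yields its keys, i.e. the distinct value strings.
datatypes = {key: infer_datatype(vals) for key, vals in counts.items()}
```

The counting happens while rows are read, so the only extra work afterwards is one loop over the distinct attribute keys rather than a second pass over every track.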

CSVAttributesImportsmall.mp4

Fixed an issue in the Node.js version with predefined attribute values that weren't first in a track.
The Python getTracks functions have been updated to getTrackAndAttributes. This was required to do the attribute calculations in a single loop through the CSV file. Let me know if you have other ideas or better ways to structure the typing for the return value of the functions that used to load only the tracks.
Added tests for the attribute processor in Python, specifically for the predefined values.
Added Node.js attributeProcessor tests. Let me know if I should add an integrated cli.ts parseFile test at the same time. I don't know if the parseFile function would ever be used without the attributeProcessor, so it seems natural to test them together.

subdavis

Couple comments.

I tested it out and things seem to work really well!

subdavis · 2021-03-22T15:30:05Z

      if (testVals[attributeKey]) {
        let attributeType: ('number' | 'boolean' | 'text') = 'number';
-        let lowCount = 1;
+        let lowCount = 3;


Maybe refactor magic number to const with description.

subdavis · 2021-03-22T15:30:57Z

+    for attributeKey in metadata_attributes.keys():
+        if attributeKey in test_vals:
+            attribute_type = 'number'
+            low_count = 3


Maybe refactor this magic number to a const with a description.

subdavis · 2021-03-22T15:35:51Z

+                        attribute_type = 'boolean'
+                if attribute_type == 'boolean' and key != 'True' and key != 'False':
+                    attribute_type = 'text'
+            # If all text values are used 3 or more times they are defined values


Won't this lead to cases where some values are "defined" and others are freeform? I don't know of any use cases where this might be desired. For example, you might have 100 values, 95 of them are the same, and the rest are unique, and I think in that case, you'd want all 5 as preset values.

This behavior may confuse users, who may think that, during import, some of their attributes had been lost or discarded completely.

A possible alternative approach is to just store all unique values in a set, and at the end:

if len(unique_set) / len(total_values) < SOME_UNIQUENESS_THRESHOLD, then keep all values as presets; otherwise assume free-form and have no presets.
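
A rough sketch of that alternative, for illustration only. The threshold value of 0.1 and the names `UNIQUENESS_THRESHOLD` and `preset_values` are assumptions, not anything proposed concretely in this thread.

```python
from typing import List, Sequence

# Assumed cutoff: below this ratio of distinct values to total values,
# the attribute is treated as having preset values.
UNIQUENESS_THRESHOLD = 0.1


def preset_values(values: Sequence[str],
                  threshold: float = UNIQUENESS_THRESHOLD) -> List[str]:
    """Return every distinct value as a preset, or [] for free-form text."""
    if not values:
        return []
    unique = set(values)
    if len(unique) / len(values) < threshold:
        return sorted(unique)
    return []


# 95 identical values plus 5 unique ones: 6 distinct / 100 total = 0.06.
mostly_same = ['a'] * 95 + ['b', 'c', 'd', 'e', 'f']
all_different = [str(i) for i in range(10)]
```

With this rule the outcome is all-or-nothing per attribute: either every distinct value becomes a preset or none does, which avoids the mixed case raised above.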

If you have 100 values, 95 of them the same and the other 5 unique, the attribute becomes a free-form text value. The code ensures that every value is used at least 3 times before it switches to predefined, so each unique value must appear at least 3 times.
TestAttribute of type text:
value 1: 10 times
value 2: 3 times
value 3: 24 times
value 4: 2 times

Because 'value 4' is only used 2 times, TestAttribute will be free form text instead of predefined values.
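
The rule in this example can be sketched as follows, with the threshold pulled out into a named constant as suggested earlier in the review. The names `MIN_USES_FOR_PREDEFINED` and `predefined_values` are hypothetical; the PR itself uses a bare `low_count = 3`.

```python
from typing import Dict, List

# Hypothetical constant name: minimum number of times every value must
# appear before a text attribute is treated as having predefined values.
MIN_USES_FOR_PREDEFINED = 3


def predefined_values(value_counts: Dict[str, int],
                      min_uses: int = MIN_USES_FOR_PREDEFINED) -> List[str]:
    """Return all values as predefined only if each one meets min_uses."""
    if value_counts and all(c >= min_uses for c in value_counts.values()):
        return list(value_counts)
    return []  # free-form text


test_attribute = {'value 1': 10, 'value 2': 3, 'value 3': 24, 'value 4': 2}
```

Here `predefined_values(test_attribute)` returns an empty list because 'value 4' appears only twice; raising that count to 3 would make all four values predefined.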

This still isn't the best way to handle this because I'm trying to guess at intention from the values provided.

I never questioned why I imported this from the previous attribute section. The viame-csv format has no support for predefined values; it's a 'feature' that existed only in the UI. So maybe we should remove predefined values, or require the user to set them manually. This may be a question for Matt.

subdavis

Two more small things. Otherwise I think this is ready to merge.

subdavis · 2021-03-23T12:16:12Z

+    if f'{atr_type}_{key}' not in metadata_attributes:
+        metadata_attributes[f'{atr_type}_{key}'] = {
+            'belongs': atr_type,
+            'datatype': 'text',
+            'name': key,
+            'key': f'{atr_type}_{key}',
+        }
+        test_vals[f'{atr_type}_{key}'] = {}
+        test_vals[f'{atr_type}_{key}'][valstring] = 1
+    elif (
+        f'{atr_type}_{key}' in metadata_attributes and f'{atr_type}_{key}' in test_vals
+    ):
+        if valstring in test_vals[f'{atr_type}_{key}']:
+            test_vals[f'{atr_type}_{key}'][valstring] += 1
+        else:
+            test_vals[f'{atr_type}_{key}'][valstring] = 1


Suggested change

-    if f'{atr_type}_{key}' not in metadata_attributes:
-        metadata_attributes[f'{atr_type}_{key}'] = {
-            'belongs': atr_type,
-            'datatype': 'text',
-            'name': key,
-            'key': f'{atr_type}_{key}',
-        }
-        test_vals[f'{atr_type}_{key}'] = {}
-        test_vals[f'{atr_type}_{key}'][valstring] = 1
-    elif (
-        f'{atr_type}_{key}' in metadata_attributes and f'{atr_type}_{key}' in test_vals
-    ):
-        if valstring in test_vals[f'{atr_type}_{key}']:
-            test_vals[f'{atr_type}_{key}'][valstring] += 1
-        else:
-            test_vals[f'{atr_type}_{key}'][valstring] = 1
+    # 'type_name' matches the client convention in AttributeEditor.vue
+    atr_key_str = f'{atr_type}_{key}'
+    if atr_key_str not in metadata_attributes:
+        metadata_attributes[atr_key_str] = {
+            'belongs': atr_type,
+            'datatype': 'text',
+            'name': key,
+            'key': atr_key_str,
+        }
+        test_vals[atr_key_str] = {}
+        test_vals[atr_key_str][valstring] = 1
+    elif (
+        atr_key_str in metadata_attributes and atr_key_str in test_vals
+    ):
+        if valstring in test_vals[atr_key_str]:
+            test_vals[atr_key_str][valstring] += 1
+        else:
+            test_vals[atr_key_str][valstring] = 1


Replace f'{atr_type}_{key}' with a variable, since it's critical that all usages be identical.

subdavis · 2021-03-23T12:29:58Z

+                        float(key)
+                    except ValueError:
+                        attribute_type = 'boolean'
+                if attribute_type == 'boolean' and key != 'True' and key != 'False':


According to https://viame.readthedocs.io/en/latest/section_links/detection_file_conversions.html, true and false are valid boolean values.

These are string representations of the values used within the tracks/detections. I build up a dictionary keyed by value to count how many times each specific value is used. It's part of trying to detect predefined values for text strings (not the best way to do this, but hopefully we can talk about it in the meeting today). Here I start with the number type, then check whether any of the values-converted-to-string are not boolean, which kicks it over to the text type. So using true and false in the CSV will properly convert to 'True' and 'False' here. I think a change would only be needed if we wanted to support True and False directly in the CSV instead of true and false.

So parsing converts the strings to values (bool, float, str), create_attributes converts them back to strings, and calculate_attribute_types converts them back to primitives again to test their type.

Out of curiosity, why convert to string and back at all? int, float, and bool are all valid dictionary keys.

Then you could just check type(key) is bool, for example.

Another possible alternative is to just add all the values to an array from create_attributes, then generate your Dict[value, count] using collections.Counter https://docs.python.org/3/library/collections.html#collections.Counter
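
A small sketch of what that could look like, keeping the parsed values as native types and counting them with `collections.Counter`. The function name `summarize_values` and the type rules are assumptions for illustration and are not the code in this PR.

```python
from collections import Counter
from typing import Iterable, Tuple, Union

Value = Union[bool, int, float, str]


def summarize_values(values: Iterable[Value]) -> Tuple[str, Counter]:
    """Infer a datatype from native values and count each distinct value."""
    values = list(values)
    counts = Counter(values)
    # Collect types from the raw list, not from the Counter keys:
    # True == 1 and they hash alike, so a Counter merges them into one key.
    types = {type(v) for v in values}
    if not types:
        datatype = 'text'
    elif types == {bool}:
        datatype = 'boolean'
    elif types <= {int, float}:
        datatype = 'number'
    else:
        datatype = 'text'
    return datatype, counts


bool_type, bool_counts = summarize_values([True, False, True])
num_type, _ = summarize_values([1.5, 2.0, 1.5])
mixed_type, _ = summarize_values(['cod', 1.0])
```

One caveat with native keys: because `bool` is a subclass of `int` in Python, a column mixing `True` and `1` would share a single Counter entry, which is why the type check above looks at the original list.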

I'm guessing tunnel vision from doing this in electron first and copying it over.

Hmm. Makes sense. There's arguably value in having identical logic in python and js (I made that argument at some point I think). If you want to keep this as-is, I think test coverage mostly takes care of my concerns.

subdavis · 2021-03-23T13:43:30Z

That's what I get for suggested edits.... It's a lint error, black will fix it.

BryonLewis · 2021-03-23T13:44:32Z

entirely my fault

BryonLewis added 5 commits March 4, 2021 17:15

beginning attributes from CSV metadata

ba0d75a

initial version

b1ddd2b

Merge branch 'main' into attributes-csv-load

0db2042

Merge branch 'main' into attributes-csv-load

91a7636

typing and fixing of the predefined values as well as testing

cfee412

BryonLewis force-pushed the attributes-csv-load branch from 118d88c to cfee412 Compare March 17, 2021 18:09

BryonLewis added 2 commits March 19, 2021 15:23

adding in attributeProcessor tests

42dd4a0

Merge branch 'main' into attributes-csv-load

bdbf478

BryonLewis marked this pull request as ready for review March 20, 2021 20:35

subdavis self-requested a review March 22, 2021 15:01

subdavis suggested changes Mar 22, 2021


BryonLewis and others added 4 commits March 22, 2021 15:12

changing some documentation

98d8b5e

Merge branch 'main' into attributes-csv-load

7cf472c

name change for saving CSV import attributes

035f533

Merge branch 'main' into attributes-csv-load

9c27cb2

subdavis suggested changes Mar 23, 2021


using attribute_key instead of computing each time

b20da39

subdavis self-requested a review March 23, 2021 13:40

subdavis previously approved these changes Mar 23, 2021


ran black and isort

dbcbf42

BryonLewis dismissed subdavis’s stale review via dbcbf42 March 23, 2021 13:45

subdavis self-requested a review March 23, 2021 13:46

subdavis approved these changes Mar 23, 2021


BryonLewis merged commit 5867a6c into main Mar 23, 2021

BryonLewis deleted the attributes-csv-load branch March 29, 2021 12:07
