Attributes CSV loading#617
Conversation
118d88c to
cfee412
Compare
subdavis
left a comment
There was a problem hiding this comment.
Couple comments.
I tested it out and things seem to work really well!
| if (testVals[attributeKey]) { | ||
| let attributeType: ('number' | 'boolean' | 'text') = 'number'; | ||
| let lowCount = 1; | ||
| let lowCount = 3; |
There was a problem hiding this comment.
Maybe refactor magic number to const with description.
| for attributeKey in metadata_attributes.keys(): | ||
| if attributeKey in test_vals: | ||
| attribute_type = 'number' | ||
| low_count = 3 |
There was a problem hiding this comment.
maybe refactor const with description
| attribute_type = 'boolean' | ||
| if attribute_type == 'boolean' and key != 'True' and key != 'False': | ||
| attribute_type = 'text' | ||
| # If all text values are used 3 or more times they are defined values |
There was a problem hiding this comment.
Won't this lead to cases where some values are "defined" and others are freeform? I don't know of any use cases where this might be desired. For example, you might have 100 values, 95 of them are the same, and the rest are unique, and I think in that case, you'd want all 5 as preset values.
This behavior may confuse users, who may think that, during import, some of their attributes had been lost or discarded completely.
A possible alternative approach is to just store all unique values in a set, and at the end:
if len(unique_set) / len(total_values) < SOME_UNIQIENESS_THRESHOLD then keep all values as presets, else just assume free-form and have no presets.
There was a problem hiding this comment.
If you have 100 values and 95 of them are the same and the 5 others are unique it then becomes a free-form text value. The code there ensures that all values are used at least 3 times before it swaps itself to predefined. So all unique values must be used at least 3 times.
TestAttribute of type text
- value 1: 10 times
- value 2: 3 times
- value 3: 24 times
- value 4: 2 times
Because 'value 4' is only used 2 times, TestAttribute will be free form text instead of predefined values.
This still isn't the best way to handle this because I'm trying to guess at intention from the values provided.
I never questioned why I even imported this over from the previous attribute section. The viame-csv format has no support for predefined values and its a 'feature' that was only in the UI. So maybe we should just remove predefined values or they have to be manually set by the user. May be a question for Matt.
subdavis
left a comment
There was a problem hiding this comment.
Two more small things. Otherwise I think this is ready to merge.
| if f'{atr_type}_{key}' not in metadata_attributes: | ||
| metadata_attributes[f'{atr_type}_{key}'] = { | ||
| 'belongs': atr_type, | ||
| 'datatype': 'text', | ||
| 'name': key, | ||
| 'key': f'{atr_type}_{key}', | ||
| } | ||
| test_vals[f'{atr_type}_{key}'] = {} | ||
| test_vals[f'{atr_type}_{key}'][valstring] = 1 | ||
| elif ( | ||
| f'{atr_type}_{key}' in metadata_attributes and f'{atr_type}_{key}' in test_vals | ||
| ): | ||
| if valstring in test_vals[f'{atr_type}_{key}']: | ||
| test_vals[f'{atr_type}_{key}'][valstring] += 1 | ||
| else: | ||
| test_vals[f'{atr_type}_{key}'][valstring] = 1 |
There was a problem hiding this comment.
| if f'{atr_type}_{key}' not in metadata_attributes: | |
| metadata_attributes[f'{atr_type}_{key}'] = { | |
| 'belongs': atr_type, | |
| 'datatype': 'text', | |
| 'name': key, | |
| 'key': f'{atr_type}_{key}', | |
| } | |
| test_vals[f'{atr_type}_{key}'] = {} | |
| test_vals[f'{atr_type}_{key}'][valstring] = 1 | |
| elif ( | |
| f'{atr_type}_{key}' in metadata_attributes and f'{atr_type}_{key}' in test_vals | |
| ): | |
| if valstring in test_vals[f'{atr_type}_{key}']: | |
| test_vals[f'{atr_type}_{key}'][valstring] += 1 | |
| else: | |
| test_vals[f'{atr_type}_{key}'][valstring] = 1 | |
| # 'type_name' matches the client convention in AttributeEditor.vue | |
| atr_key_str = f'{atr_type}_{key}' | |
| if atr_key_str not in metadata_attributes: | |
| metadata_attributes[atr_key_str] = { | |
| 'belongs': atr_type, | |
| 'datatype': 'text', | |
| 'name': key, | |
| 'key': atr_key_str, | |
| } | |
| test_vals[atr_key_str] = {} | |
| test_vals[atr_key_str][valstring] = 1 | |
| elif ( | |
| atr_key_str in metadata_attributes and atr_key_str in test_vals | |
| ): | |
| if valstring in test_vals[atr_key_str]: | |
| test_vals[atr_key_str][valstring] += 1 | |
| else: | |
| test_vals[atr_key_str][valstring] = 1 |
replace f'{atr_type}_{key}' with a variable since it's critical that all usages be identical.
| float(key) | ||
| except ValueError: | ||
| attribute_type = 'boolean' | ||
| if attribute_type == 'boolean' and key != 'True' and key != 'False': |
There was a problem hiding this comment.
According to https://viame.readthedocs.io/en/latest/section_links/detection_file_conversions.html, true and false are valid boolean values.
There was a problem hiding this comment.
These are a string representation of the values used within the tracks/detections. I build up a dictionary with the values as a key to determine how many times a specific value is used. It's part of trying to see if there are predefined values for text strings (Not the best way to do this but hopefully can talk about it in the meeting today). Here I start with number type, then see if any of the values-converted-to-string are not boolean to kick it over to text type. So using true and false in the CSV will properly convert to 'True' and 'False' here. I think the change would be if we wanted to support True and False directly in the CSV instead of true and false.
There was a problem hiding this comment.
So parsing converts the strings to values (bool, float, str), create_attributes converts them back to strings, and calculate_attribute_types converts them back to primitives again to test their type.
Out of curiosity, why convert to string and back at all? int, float, and bool are all valid dictionary keys?
Then you could just type(key) is bool, for example.
There was a problem hiding this comment.
Another possible alternative is to just add all the values to an array from create_attributes, then generate your Dict[value, count] using collections.Counter https://docs.python.org/3/library/collections.html#collections.Counter
There was a problem hiding this comment.
I'm guessing tunnel vision from doing this in electron first and copying it over.
There was a problem hiding this comment.
Hmm. Makes sense. There's arguably value in having identical logic in python and js (I made that argument at some point I think). If you want to keep this as-is, I think test coverage mostly takes care of my concerns.
|
That's what I get for suggested edits.... It's a lint error, black will fix it. |
|
entirely my fault |
On reading of a CSV file it will inline calculate the necessary attributes metadata to be added to the folder.
Slightly different than the Node.js version. That one does an additional pass on all tracks after initial read. This one currently does it inline to build up the values and then a single loop through the attributes to confirm the type.
CSVAttributesImportsmall.mp4
getTrackAndAttributes. To make attribute calculations in a single loop through the CSV file this was required. Let me know if you have any other idea or better ways to structure the typing for the return value of the functions that used to just load the tracks.attributeProcessortesting. Let me know if I should do an integratedcli.tsparseFiletest at the same time. I don't know if the parseFile function would ever be used without theattributeProcessor. It seems natural that they should be tested at the same time.