Skip to content

Attributes CSV loading#617

Merged
BryonLewis merged 13 commits into
mainfrom
attributes-csv-load
Mar 23, 2021
Merged

Attributes CSV loading#617
BryonLewis merged 13 commits into
mainfrom
attributes-csv-load

Conversation

@BryonLewis
Copy link
Copy Markdown
Collaborator

@BryonLewis BryonLewis commented Mar 5, 2021

On reading of a CSV file it will inline calculate the necessary attributes metadata to be added to the folder.
Slightly different than the Node.js version. That one does an additional pass on all tracks after initial read. This one currently does it inline to build up the values and then a single loop through the attributes to confirm the type.

CSVAttributesImportsmall.mp4
  • Fixed an issue in the Node.js version regarding predefined attribute values that weren't first in a track
  • python getTracks have now been updated to getTrackAndAttributes. To make attribute calculations in a single loop through the CSV file this was required. Let me know if you have any other idea or better ways to structure the typing for the return value of the functions that used to just load the tracks.
  • Added testing of the attribute processor on python, specifically the predefined values
  • Added NodeJS attributeProcessor testing. Let me know if I should do an integrated cli.ts parseFile test at the same time. I don't know if the parseFile function would ever be used without the attributeProcessor. It seems natural that they should be tested at the same time.

@BryonLewis BryonLewis force-pushed the attributes-csv-load branch from 118d88c to cfee412 Compare March 17, 2021 18:09
@BryonLewis BryonLewis marked this pull request as ready for review March 20, 2021 20:35
@subdavis subdavis self-requested a review March 22, 2021 15:01
Copy link
Copy Markdown
Contributor

@subdavis subdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments.

I tested it out and things seem to work really well!

if (testVals[attributeKey]) {
let attributeType: ('number' | 'boolean' | 'text') = 'number';
let lowCount = 1;
let lowCount = 3;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe refactor magic number to const with description.

Comment thread server/dive_server/serializers/viame.py Outdated
for attributeKey in metadata_attributes.keys():
if attributeKey in test_vals:
attribute_type = 'number'
low_count = 3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe refactor const with description

attribute_type = 'boolean'
if attribute_type == 'boolean' and key != 'True' and key != 'False':
attribute_type = 'text'
# If all text values are used 3 or more times they are defined values
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this lead to cases where some values are "defined" and others are freeform? I don't know of any use cases where this might be desired. For example, you might have 100 values, 95 of them are the same, and the rest are unique, and I think in that case, you'd want all 5 as preset values.

This behavior may confuse users, who may think that, during import, some of their attributes had been lost or discarded completely.

A possible alternative approach is to just store all unique values in a set, and at the end:

if len(unique_set) / len(total_values) < SOME_UNIQIENESS_THRESHOLD then keep all values as presets, else just assume free-form and have no presets.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have 100 values and 95 of them are the same and the 5 others are unique it then becomes a free-form text value. The code there ensures that all values are used at least 3 times before it swaps itself to predefined. So all unique values must be used at least 3 times.
TestAttribute of type text

  • value 1: 10 times
  • value 2: 3 times
  • value 3: 24 times
  • value 4: 2 times

Because 'value 4' is only used 2 times, TestAttribute will be free form text instead of predefined values.

This still isn't the best way to handle this because I'm trying to guess at intention from the values provided.

I never questioned why I even imported this over from the previous attribute section. The viame-csv format has no support for predefined values and its a 'feature' that was only in the UI. So maybe we should just remove predefined values or they have to be manually set by the user. May be a question for Matt.

Comment thread server/dive_server/utils.py Outdated
Copy link
Copy Markdown
Contributor

@subdavis subdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more small things. Otherwise I think this is ready to merge.

Comment thread server/dive_server/serializers/viame.py Outdated
Comment on lines +170 to +185
if f'{atr_type}_{key}' not in metadata_attributes:
metadata_attributes[f'{atr_type}_{key}'] = {
'belongs': atr_type,
'datatype': 'text',
'name': key,
'key': f'{atr_type}_{key}',
}
test_vals[f'{atr_type}_{key}'] = {}
test_vals[f'{atr_type}_{key}'][valstring] = 1
elif (
f'{atr_type}_{key}' in metadata_attributes and f'{atr_type}_{key}' in test_vals
):
if valstring in test_vals[f'{atr_type}_{key}']:
test_vals[f'{atr_type}_{key}'][valstring] += 1
else:
test_vals[f'{atr_type}_{key}'][valstring] = 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if f'{atr_type}_{key}' not in metadata_attributes:
metadata_attributes[f'{atr_type}_{key}'] = {
'belongs': atr_type,
'datatype': 'text',
'name': key,
'key': f'{atr_type}_{key}',
}
test_vals[f'{atr_type}_{key}'] = {}
test_vals[f'{atr_type}_{key}'][valstring] = 1
elif (
f'{atr_type}_{key}' in metadata_attributes and f'{atr_type}_{key}' in test_vals
):
if valstring in test_vals[f'{atr_type}_{key}']:
test_vals[f'{atr_type}_{key}'][valstring] += 1
else:
test_vals[f'{atr_type}_{key}'][valstring] = 1
# 'type_name' matches the client convention in AttributeEditor.vue
atr_key_str = f'{atr_type}_{key}'
if atr_key_str not in metadata_attributes:
metadata_attributes[atr_key_str] = {
'belongs': atr_type,
'datatype': 'text',
'name': key,
'key': atr_key_str,
}
test_vals[atr_key_str] = {}
test_vals[atr_key_str][valstring] = 1
elif (
atr_key_str in metadata_attributes and atr_key_str in test_vals
):
if valstring in test_vals[atr_key_str]:
test_vals[atr_key_str][valstring] += 1
else:
test_vals[atr_key_str][valstring] = 1

replace f'{atr_type}_{key}' with a variable since it's critical that all usages be identical.

float(key)
except ValueError:
attribute_type = 'boolean'
if attribute_type == 'boolean' and key != 'True' and key != 'False':
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://viame.readthedocs.io/en/latest/section_links/detection_file_conversions.html, true and false are valid boolean values.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are a string representation of the values used within the tracks/detections. I build up a dictionary with the values as a key to determine how many times a specific value is used. It's part of trying to see if there are predefined values for text strings (Not the best way to do this but hopefully can talk about it in the meeting today). Here I start with number type, then see if any of the values-converted-to-string are not boolean to kick it over to text type. So using true and false in the CSV will properly convert to 'True' and 'False' here. I think the change would be if we wanted to support True and False directly in the CSV instead of true and false.

Copy link
Copy Markdown
Contributor

@subdavis subdavis Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So parsing converts the strings to values (bool, float, str), create_attributes converts them back to strings, and calculate_attribute_types converts them back to primitives again to test their type.

Out of curiosity, why convert to string and back at all? int, float, and bool are all valid dictionary keys?

Then you could just type(key) is bool, for example.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible alternative is to just add all the values to an array from create_attributes, then generate your Dict[value, count] using collections.Counter https://docs.python.org/3/library/collections.html#collections.Counter

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing tunnel vision from doing this in electron first and copying it over.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Makes sense. There's arguably value in having identical logic in python and js (I made that argument at some point I think). If you want to keep this as-is, I think test coverage mostly takes care of my concerns.

@subdavis subdavis self-requested a review March 23, 2021 13:40
subdavis
subdavis previously approved these changes Mar 23, 2021
@subdavis
Copy link
Copy Markdown
Contributor

That's what I get for suggested edits.... It's a lint error, black will fix it.

@BryonLewis
Copy link
Copy Markdown
Collaborator Author

entirely my fault

@BryonLewis BryonLewis merged commit 5867a6c into main Mar 23, 2021
@BryonLewis BryonLewis deleted the attributes-csv-load branch March 29, 2021 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants