Hi all — I recently decided to add the nearest_neighbors quality metric to my pipeline, but when I tried to compute just that metric, I was annoyed to find that it overwrote all the other previously calculated metrics. This seems like suboptimal behavior: imagine a user computes slow quality metric X, then the next day wants to add slow quality metric Y. They will have to re-compute X as well, or otherwise manually futz with the saved CSVs.
This behavior seems to be implemented here. Instead of creating a new df each time, why not check for an existing one and merge it with any newly computed metrics? I understand that overwriting is perhaps a better default, in case the user has curated the units or otherwise changed pre-processing and is trying to compute metrics de novo, but there could be a "keep_existing" kwarg or something that allows the user to specify not to overwrite what's already there.
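To make the suggestion concrete, here is a rough sketch of what I have in mind. The function name, the `keep_existing` kwarg, and the CSV layout (units as rows, metric names as columns) are all assumptions for illustration, not the library's actual API:

```python
import os

import pandas as pd


def save_metrics(new_metrics: pd.DataFrame, csv_path: str,
                 keep_existing: bool = False) -> pd.DataFrame:
    """Save quality metrics, optionally merging with previously saved ones.

    Rows are indexed by unit id; columns are metric names. This is a
    hypothetical sketch, not the existing implementation.
    """
    if keep_existing and os.path.exists(csv_path):
        old = pd.read_csv(csv_path, index_col=0)
        # Drop any columns that are being recomputed, then join on unit id
        # so previously computed metrics survive alongside the new ones.
        old = old.drop(columns=new_metrics.columns, errors="ignore")
        merged = old.join(new_metrics, how="outer")
    else:
        # Default behavior: start fresh, as the current code does.
        merged = new_metrics
    merged.to_csv(csv_path)
    return merged
```

With `keep_existing=True`, computing only the new metric would leave the earlier columns in place; with the default `False`, the current overwrite behavior is preserved for users who have re-curated units or changed pre-processing.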
Thanks!