Disallow duplicates in infrastructure.cell_info #6626

Closed
jc-harrison opened this issue May 24, 2024 · 0 comments · Fixed by #6627
Labels
FlowDB Issues related to FlowDB refactoring

Comments

@jc-harrison
Member

#6433 added a new table infrastructure.cell_info to FlowDB for recording the full history of cell information, including cells that have been excluded due to quality concerns.

The original intention behind the design of this table was for it to include all cell information, including records with duplicate cell IDs. As such, the constraint that ensures no two simultaneously-valid cells have the same ID is only applied over rows where to_include = True, so that duplicates can be included in the table provided they are excluded from use in analysis.
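To illustrate what a uniqueness rule restricted to to_include = True permits, here is a minimal sketch using Python's built-in sqlite3 module. It is not the FlowDB schema: FlowDB runs on PostgreSQL and its real constraint also takes validity periods into account, whereas this simplified table has only a hypothetical `cell_id` column and a `to_include` flag, and the index name `unique_included_cell` is made up for the example.

```python
import sqlite3

# In-memory stand-in for the table; columns are simplified assumptions,
# not the real infrastructure.cell_info definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell_info (cell_id TEXT NOT NULL, to_include INTEGER NOT NULL)"
)
# Partial unique index: uniqueness is enforced only over included rows.
conn.execute(
    "CREATE UNIQUE INDEX unique_included_cell "
    "ON cell_info (cell_id) WHERE to_include = 1"
)


def try_insert(cell_id, to_include):
    """Insert one row; return True if accepted, False if the index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO cell_info VALUES (?, ?)", (cell_id, int(to_include))
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("A", True),   # first included record for this ID
    try_insert("A", False),  # excluded duplicate: accepted
    try_insert("A", False),  # another excluded duplicate: accepted
    try_insert("A", True),   # second included record with the same ID: rejected
]
row_count = conn.execute("SELECT COUNT(*) FROM cell_info").fetchone()[0]
```

In this sketch the table ends up holding several rows with the same cell ID, and any join on `cell_id` has to cope with that.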

On reflection, I think this was a bad design decision. When ingesting new cell information, we want to join to the previous cell information (including "excluded" cells, so that we can carry over these exclusions and avoid re-including cells), and this join is made more complicated by the presence of duplicates in the cell info table. I think we would do better to distinguish between "valid-but-excluded cells" (such as those with suspicious longitude/latitude coordinates) and "invalid cell records" (such as duplicates, or cell records with a null cell ID, which cannot be included in infrastructure.cell_info anyway because of its non-null constraint). Valid-but-excluded cells should be included in infrastructure.cell_info, to avoid re-including them in future updates. Invalid cell records do not need to be in infrastructure.cell_info, but should perhaps be kept in a separate table so that we do not lose the information.
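The proposed split could be sketched roughly as follows. This is an illustration only, not the change made in the linked fix: the function name `split_cell_records` and the record fields are hypothetical, and it assumes a single snapshot in which each cell ID should appear once. The issue does not say whether every copy of a duplicated ID counts as invalid or whether one copy could be kept; the sketch assumes every copy is invalid.

```python
from collections import Counter


def split_cell_records(records):
    """Split raw cell records into (valid, invalid) lists.

    A record is treated as invalid if its cell_id is None or if the same
    cell_id appears on more than one record (assumption: all copies of a
    duplicated ID are invalid). Everything else is valid, whether or not
    it is flagged for inclusion in analysis.
    """
    id_counts = Counter(r["cell_id"] for r in records if r["cell_id"] is not None)
    valid, invalid = [], []
    for record in records:
        if record["cell_id"] is None or id_counts[record["cell_id"]] > 1:
            invalid.append(record)  # candidate for a separate table
        else:
            valid.append(record)  # candidate for infrastructure.cell_info
    return valid, invalid


# Hypothetical example data.
records = [
    {"cell_id": "A", "lon": 10.0, "lat": 5.0, "to_include": True},
    # Suspicious coordinates: valid but excluded from analysis.
    {"cell_id": "B", "lon": 999.0, "lat": 5.0, "to_include": False},
    # Duplicated ID: both copies are treated as invalid here.
    {"cell_id": "C", "lon": 11.0, "lat": 6.0, "to_include": False},
    {"cell_id": "C", "lon": 12.0, "lat": 7.0, "to_include": False},
    # Null ID: cannot satisfy a non-null constraint.
    {"cell_id": None, "lon": 13.0, "lat": 8.0, "to_include": False},
]
valid, invalid = split_cell_records(records)
```

With this split, the valid list has one record per cell ID, so a later join on cell ID against previous cell information would not need to handle duplicates.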
