Data file 'exists' check needs to be updated as duplicates are still being saved to json_files #50

Open
stuchalk opened this issue Apr 23, 2021 · 0 comments
@stuchalk
Collaborator

Currently, if the same data file is ingested a second time, the 'exists' check fails in some situations because the file being ingested is compared only to the most recent stored file (df_functions.py, def:updatedatafile, lines 81-89 of v0.2.1).

Therefore, the code needs to be updated to check the new data file against all versions that have been ingested, not just the latest one. This should be done using the new 'jhash' field already added to the 'json_files' table, which stores an md5 hash of the 'file' field. Although 'jhash' alone might well be unique across the table, searching on both 'file_lookup_id' and 'jhash' would verify whether the file has already been uploaded.
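A rough sketch of what that lookup could look like, for discussion only. It uses an in-memory sqlite3 table as a stand-in for 'json_files'; the function name `file_already_ingested`, the column types, and the use of raw SQL are assumptions for illustration and are not taken from df_functions.py.

```python
import hashlib
import sqlite3


def file_already_ingested(conn, file_lookup_id, file_text):
    """Return True if any stored version for this lookup id has the same md5.

    Hypothetical helper: matches on both file_lookup_id and jhash, so every
    previously ingested version is checked, not only the most recent one.
    """
    jhash = hashlib.md5(file_text.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT 1 FROM json_files WHERE file_lookup_id = ? AND jhash = ? LIMIT 1",
        (file_lookup_id, jhash),
    ).fetchone()
    return row is not None


# Stand-in table (assumed schema) holding two versions of the same data file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE json_files ("
    "id INTEGER PRIMARY KEY, file_lookup_id INTEGER, file TEXT, jhash TEXT)"
)
for text in ('{"v": 1}', '{"v": 2}'):
    conn.execute(
        "INSERT INTO json_files (file_lookup_id, file, jhash) VALUES (?, ?, ?)",
        (7, text, hashlib.md5(text.encode("utf-8")).hexdigest()),
    )
```

With this shape, re-ingesting the older version (`'{"v": 1}'`) is still detected even though it is not the most recent row for that 'file_lookup_id'.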

Note: in the code, the current 'generatedAt' field must be emptied (set to '') before the md5 hash is generated, so that the generation timestamp does not change the hash.
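A minimal sketch of the hashing step, under some assumptions: that 'generatedAt' is a top-level key of the JSON document, and that the text being hashed is produced by `json.dumps` with sorted keys. The name `compute_jhash` is hypothetical, and the serialization actually stored in the 'file' field may differ.

```python
import copy
import hashlib
import json


def compute_jhash(doc):
    """Return the md5 hex digest of a JSON document with 'generatedAt' blanked.

    Hypothetical helper: works on a copy so the caller's document keeps its
    original 'generatedAt' value.
    """
    normalized = copy.deepcopy(doc)
    if "generatedAt" in normalized:
        # Assumption: 'generatedAt' sits at the top level of the document.
        normalized["generatedAt"] = ""
    # Assumption: a deterministic serialization (sorted keys, fixed separators).
    text = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(text.encode("utf-8")).hexdigest()


first = {"generatedAt": "2021-04-23 10:00:00", "data": [1, 2, 3]}
second = {"generatedAt": "2021-04-24 09:30:00", "data": [1, 2, 3]}
changed = {"generatedAt": "2021-04-24 09:30:00", "data": [1, 2, 4]}
```

Here `first` and `second` differ only in 'generatedAt' and so produce the same hash, while `changed` produces a different one.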

@stuchalk stuchalk added the bug Something isn't working label Apr 23, 2021
@stuchalk stuchalk added this to the Beta 2 milestone Apr 23, 2021
@stuchalk stuchalk self-assigned this Apr 23, 2021