Genomes from Elan 20220105 have been unpublished #179
Yesterday I should have noticed that there was no new work to do when republishing the data set. That should have been grounds to stop what we were doing: we were writing OVER the data set, so there should have been plenty of work to do. The key mistake was I ran the first step of The name coded into the
I think we have two options:
I'm tending towards the latter, if only because it would be more technically correct for the lost genomes to have their published date changed to the date they were inserted into the data set properly, and it's an easier solution to reason about.
The latter also strikes me as less likely to lead to some unforeseen consequence.
So:
We're going to go with the safer (and more technically correct) Option 2. It should be straightforward (famous last words) as we can query Majora for published artifact groups with a I'm just chasing up a loose end at PHE, as they are reporting that the Asklepian genome table was 2 genomes smaller, a discrepancy that doesn't fit expectations given what's happened.
The number of times I have typed 2021 in here is embarrassing
Wow, I actually didn't notice, which is possibly worse
I have consulted with the Majora oracle:
The number of affected sequences is officially 13,986.
I've also cleared up the -2 situation at PHE. I think we're ready to go and update the published dates for the affected sequences.
OK that's done.
Note that we've left the rejected and suppressed genomes with the original published date because they are unaffected by this problem.
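For anyone reading later, the rule described above (reset the published date on the affected genomes, leave rejected and suppressed ones alone) could be sketched roughly as below. This is not the Majora API or the command we actually ran: the record layout, the `status` values, the function name and the dates are all hypothetical and only illustrate the selection logic.

```python
from datetime import date

# Hypothetical status labels for genomes that keep their original date.
UNAFFECTED_STATUSES = {"rejected", "suppressed"}


def update_published_dates(records, affected_ids, new_date):
    """Return copies of the records with published_date reset where applicable.

    A record is changed only if its id is in affected_ids AND its status
    is not one of the unaffected statuses.
    """
    updated = []
    for rec in records:
        rec = dict(rec)  # shallow copy so the caller's records are not mutated
        if rec["id"] in affected_ids and rec["status"] not in UNAFFECTED_STATUSES:
            rec["published_date"] = new_date
        updated.append(rec)
    return updated


# Illustrative records only; ids, statuses and dates are made up.
records = [
    {"id": "A", "status": "published", "published_date": date(2022, 1, 5)},
    {"id": "B", "status": "suppressed", "published_date": date(2022, 1, 5)},
    {"id": "C", "status": "published", "published_date": date(2022, 1, 4)},
]
result = update_published_dates(records, {"A", "B"}, date(2022, 1, 6))
```

Here only record `A` would get the new date: `B` is in the affected set but suppressed, and `C` is not in the affected set at all.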
Looking good
I have checked the metadata TSV and big MSA and the missing samples appear to have been included in the latest dataset.
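A check along these lines can be scripted. The sketch below is only an assumption about how one might do it, not the procedure used here: it collects IDs from a TSV column and from FASTA headers, then reports any expected ID missing from either. The column name `sequence_name` and the sample IDs are hypothetical.

```python
import csv
import io


def tsv_ids(handle, column):
    """Collect the values of one column from a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column] for row in reader}


def fasta_ids(handle):
    """Collect the first whitespace-delimited token of each FASTA header."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    }


def missing_from(expected, *id_sets):
    """Return the expected IDs that are absent from at least one ID set."""
    missing = set()
    for ids in id_sets:
        missing |= expected - ids
    return missing


# Tiny in-memory stand-ins for the metadata TSV and the MSA.
metadata = io.StringIO("sequence_name\tcollection_date\ns1\t2022-01-01\ns2\t2022-01-02\n")
msa = io.StringIO(">s1\nACGT\n>s2 extra description\nACGA\n")

in_tsv = tsv_ids(metadata, "sequence_name")
in_msa = fasta_ids(msa)
all_present = missing_from({"s1", "s2"}, in_tsv, in_msa)
one_missing = missing_from({"s1", "s3"}, in_tsv, in_msa)
```

An empty result from `missing_from` means every expected sample appears in both files. For a very large MSA, streaming the headers like this avoids loading the sequences into memory.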
During the handling of yesterday's data integrity incident (#178), the data set for 2022-01-05 was republished. It appears that during this process the new data for 2022-01-05 was not added back to the data set, so the 2022-01-04 data set was essentially republished. These genomes are now missing from the 2022-01-06 data set and will need to be reinserted.