Elan 20210615 processed samples without minimal metadata #91

Closed
SamStudio8 opened this issue Jun 15, 2021 · 6 comments

@SamStudio8
Member

SamStudio8 commented Jun 15, 2021

  • Datapipe failed at 09:03, emitting an announcement into #tael-stream
  • RC notified SN at 09:13 today that datapipe had encountered an error due to a sample missing adm1.
  • At 09:19 SN confirmed this was not intended behaviour; RC began patching datapipe to silently ignore these cases for today
  • RC raised the patched datapipe at 09:30
  • SN accessed Majora's shell at 09:38 and confirmed 78 PAGs had been processed by Elan this morning
  • After investigating, SN confirmed the error was caused by yesterday's change (Use Elan find+resolve step to send AM pipeline message #89), which led to a cached version of Ocarina being used for the pre-Elan steps; that cached version had less robust checking for blank biosamples

[Screenshot from 2021-06-15 09:29:29]

  • SN removed the 78 PAGs from Majora at 09:45
  • SN killed Asklepian at 09:55 and raised it manually at 10:00
@SamStudio8
Member Author

10:23: SN unlinks 6 BAM/FASTA entries in the downstream artifact directories. RC had confirmed at 09:27 that there were 6 entries in the datapipe dataset with this condition.
10:25: SN confirms the other 72 records had failed basic QC and were not published.

This unlinking process will cause these 6 sequences to drop out of the dataset tomorrow.

@SamStudio8
Member Author

SamStudio8 commented Jun 15, 2021

[nicholsz@bham 20210615]$ wc -l best_refs.paired.ls 
534663 best_refs.paired.ls
[nicholsz@bham 20210615]$ wc -l ../20210615-bk/best_refs.paired.ls 
534669 ../20210615-bk/best_refs.paired.ls

Today's listing has 6 fewer entries than the backup copy, so it looks like we've caught these in Asklepian now and things should run as normal (#92).
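The `wc -l` comparison only shows how many entries dropped out, not which ones. A minimal sketch for listing them, assuming each line of `best_refs.paired.ls` is one entry; the function name `removed_entries` is hypothetical and not part of Asklepian:

```shell
# Hypothetical helper: print entries present in the old (backup) listing
# but missing from the new listing. comm needs sorted input, so both files
# are sorted via process substitution (requires bash).
removed_entries() {
  local new_ls=$1 old_ls=$2
  # -1 hides lines only in new_ls, -3 hides lines in both,
  # leaving lines that appear only in old_ls.
  comm -13 <(sort "$new_ls") <(sort "$old_ls")
}

# Example usage (paths taken from the log above):
# removed_entries best_refs.paired.ls ../20210615-bk/best_refs.paired.ls
```

Piping the result through `wc -l` should match the difference in the two line counts, provided the new listing contains no entries missing from the backup.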

@SamStudio8
Member Author

The incident is over. This issue will remain open until tomorrow, when we can confirm the 6 sequences have been removed from the downstream dataset.

@SamStudio8
Member Author

Given this has happened twice, the filtering should probably be done server-side 🙄

@SamStudio8
Member Author

Shout out to @rmcolq for detecting this so early, allowing us to recover much more efficiently than the last time this happened.

@SamStudio8
Member Author

The n=6 bad eggs were pruned from yesterday's dataset. Closing.
