Add One Health Enteric BioSample Package + misc bugfixes #38

Open
wants to merge 6 commits into
base: master

Conversation

erikwolfsohn

Hi! I just found out about this fantastic submission pipeline you all built and I'm really excited to start using it. I had to make some updates since the majority of my NCBI submissions are enteric pathogens, so I figured I'd submit a pull request in case any of these changes are useful to you all.

📋 Updates

  • Added all mandatory and optional metadata fields for the One Health Enteric BioSample Package to main_config.yaml
  • Added a template and config file for BioSample & SRA submission using that package
  • Changed some handling to remove optional columns if left blank before submission
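The optional-column cleanup in the last bullet could look roughly like the sketch below. It is not the code from this pull request: the function name and the `bs-` column names are hypothetical, and it uses plain dictionaries rather than whatever table structure the pipeline actually uses.

```python
def _is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or not str(value).strip()


def drop_blank_optional_columns(rows, optional_columns):
    """Return copies of `rows` without optional columns that are blank in every row.

    `rows` is a list of dicts (one per sample), e.g. from csv.DictReader.
    Columns not listed in `optional_columns` are never removed.
    """
    blank_everywhere = {
        column
        for column in optional_columns
        if all(_is_blank(row.get(column)) for row in rows)
    }
    return [
        {key: value for key, value in row.items() if key not in blank_everywhere}
        for row in rows
    ]


# Hypothetical metadata: "bs-host_age" is blank for every sample,
# "bs-source_type" is filled in for at least one sample.
rows = [
    {"sample_name": "S1", "bs-host_age": "", "bs-source_type": "Food"},
    {"sample_name": "S2", "bs-host_age": "  ", "bs-source_type": ""},
]
cleaned = drop_blank_optional_columns(rows, ["bs-host_age", "bs-source_type"])
```

A column is dropped only when it is blank for every sample, so partially filled optional columns are still submitted.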

🛠️ Fixes

  • Slightly modified the behavior of the check_submission_status workflow to prevent FTP navigation errors/query errors.
  • Slightly modified handling for input arguments so users shouldn't be prompted for FASTA input unless they're submitting to Genbank or GISAID
  • Changed handling for mandatory vs optional columns; the pipeline should no longer require optional columns in metadata.csv prior to submission
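As an illustration of the FASTA-argument fix, here is a minimal sketch of making an input required only for certain target databases. The flag names (`--databases`, `--fasta_file`) are assumptions made for this example and are not taken from SeqSender's command-line interface.

```python
import argparse

# Databases that need sequence data; the others only need metadata and reads.
FASTA_DATABASES = {"GENBANK", "GISAID"}


def parse_args(argv):
    """Parse arguments, demanding a FASTA file only for GenBank/GISAID."""
    parser = argparse.ArgumentParser(description="Hypothetical submission CLI")
    parser.add_argument(
        "--databases",
        nargs="+",
        required=True,
        type=str.upper,  # accept "genbank", "GenBank", ...
        choices=["BIOSAMPLE", "SRA", "GENBANK", "GISAID"],
    )
    parser.add_argument("--fasta_file", default=None)
    args = parser.parse_args(argv)

    # Conditional requirement: checked after parsing, because argparse has no
    # built-in way to make one option required depending on another.
    if FASTA_DATABASES & set(args.databases) and args.fasta_file is None:
        parser.error("--fasta_file is required when submitting to GenBank or GISAID")
    return args


# BioSample + SRA only: no FASTA needed.
args = parse_args(["--databases", "biosample", "sra"])
```

Calling `parse_args(["--databases", "genbank"])` without `--fasta_file` exits with argparse's usual usage error.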

…rough database names when querying for submission updates

Changed navigation in get_ncbi_process_report to step through parent directories individually before entering test or production - ncbi ftp is configured to hide child directories until you access the parents
Added some very preliminary handling for the OneHealth Enteric metadata package in SRA and BioSample
…s submitting to Genbank or GISAID

Added all mandatory and optional fields for onehealth enteric biosample package to main_config.yaml & added metadata/config templates
Added handling to remove empty optional metadata columns if not filled out at submission time
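The navigation change described for get_ncbi_process_report (entering each parent directory in turn rather than jumping straight to a nested path) can be sketched with the standard-library `ftplib`. This is an illustration under assumptions, not the code from the commit: the helper name, host, and path are placeholders.

```python
import ftplib


def cwd_stepwise(ftp, path):
    """Enter `path` one directory at a time.

    Some servers hide child directories until the parent has been entered,
    so a single ftp.cwd("a/b/c") can fail where three separate calls succeed.
    `ftp` is an ftplib.FTP instance (or anything with a compatible cwd()).
    """
    for part in path.strip("/").split("/"):
        if part:  # skip empty segments from doubled slashes
            ftp.cwd(part)


# Usage against a real server (placeholder host, credentials, and path):
#
#     with ftplib.FTP("ftp.example.org") as ftp:
#         ftp.login(user="username", passwd="password")
#         cwd_stepwise(ftp, "parent_folder/Test")
```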
@erikwolfsohn erikwolfsohn changed the title Onehealth enteric biosample Add One Health Enteric BioSample Package + misc bugfixes Mar 6, 2024
erikwolfsohn and others added 4 commits March 6, 2024 12:39
… - fields with * also must be unique. Probably have to revisit this for GISAID/GENBANK
Correcting this, not all FTP accounts have the "submit" folder, adjusting it to automatically detect the folder and correctly step into it if it exists.
If bs-description is empty don't build descriptor with empty string. Remove for NCBI to automatically generate
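The "submit" folder detection mentioned in the commit notes above might be approached as follows. This is a hedged sketch using `ftplib.FTP.nlst()`; the helper name is hypothetical and the actual commit may detect the folder differently.

```python
def enter_submit_folder_if_present(ftp, folder="submit"):
    """Change into `folder` only if the current directory lists it.

    Returns True when the folder existed and was entered, False otherwise.
    `ftp` is an ftplib.FTP instance (or anything with nlst() and cwd()).
    """
    # Servers differ in whether nlst() returns bare names or paths, so
    # compare on the last path component with any trailing slash removed.
    names = {name.rstrip("/").split("/")[-1] for name in ftp.nlst()}
    if folder in names:
        ftp.cwd(folder)
        return True
    return False
```

Accounts without the folder simply stay in the login directory, so the same code path works for both account layouts.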
@dthoward96
Collaborator

Hey @erikwolfsohn, thanks for contributing to SeqSender and incorporating enteric pathogens. I made a couple of changes based on my review to fix some issues I identified, but everything looks good. I'm still running a few more tests to make sure there aren't any other issues, but I should have all your changes merged in within the next few days. If you have any other contributions you'd like to make, please open a pull request anytime, or suggest changes in our issues for features you'd like to see.

@erikwolfsohn
Author

Awesome, thank you! I saw in another issue you were talking about implementing Pandera metadata validation as a way to support new pathogens and biosample packages in a future release. I think that's a great idea, and I'd love to contribute if possible. I started working on a pandera schema for the OneHealth enteric package and I really like it as an alternative to validating against that main yaml config file.

Feel free to shoot me an email at erik.wolfsohn@cchealth.org if you have some time to chat about your plans for that and possible ways I can contribute - I think this submission pipeline is going to be incredibly useful for our lab, so I definitely want to help in any way I can.

@dthoward96
Collaborator

Yes, I'm working quickly to get it added. The different requirements for One Health Enteric BioSample attributes have caused some issues in testing, so instead of reinventing the wheel with a temporary fix, I'm going to move the pandera validation up to the next version update to resolve this properly. I don't think you'll need to manually create a One Health Enteric specific schema, as I'm currently testing a way to automatically generate it from NCBI's website. I should have this available on the version update branch later this week. I've already added the Enteric XML to the test set I'm working on. I also have a couple of other questions that I'm pooling together, so once the update is live I'll send you an email to let you know, with those questions included. Once you get my email, it would be a major help if you could test it with the automatically generated schema.
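To make the idea of deriving field requirements from a package definition concrete, here is a rough sketch. It is not the implementation on the update branch: the XML shape, the package name, and the attribute names below are assumptions written inline for illustration, and the result is a plain dictionary rather than a pandera schema.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified package definition; the real NCBI document may differ.
SAMPLE_PACKAGE_XML = """
<BioSamplePackages>
  <Package>
    <Name>OneHealthEnteric.1.0</Name>
    <Attribute use="mandatory"><HarmonizedName>collected_by</HarmonizedName></Attribute>
    <Attribute use="mandatory"><HarmonizedName>collection_date</HarmonizedName></Attribute>
    <Attribute use="optional"><HarmonizedName>host_age</HarmonizedName></Attribute>
  </Package>
</BioSamplePackages>
"""


def split_attributes(xml_text, package_name):
    """Group a package's attribute names by their `use` value."""
    root = ET.fromstring(xml_text.strip())
    for package in root.iter("Package"):
        if package.findtext("Name") == package_name:
            fields = {"mandatory": [], "optional": []}
            for attribute in package.iter("Attribute"):
                use = attribute.get("use", "optional")
                # setdefault keeps any other `use` values in their own group
                fields.setdefault(use, []).append(attribute.findtext("HarmonizedName"))
            return fields
    raise KeyError(f"package not found: {package_name}")


fields = split_attributes(SAMPLE_PACKAGE_XML, "OneHealthEnteric.1.0")
```

The mandatory and optional lists could then drive whichever validation layer is used, for example marking optional columns as not required.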

@erikwolfsohn
Author

erikwolfsohn commented May 1, 2024 via email

Hi Dakota, I just wanted to check in and make sure I didn't miss any emails from you. I was focusing on some other projects and this dropped off my radar a little bit. Let me know if there's any way I can contribute currently or if anything is ready for testing. Pulling the metadata templates directly from NCBI sounds fantastic, I'm definitely excited for that feature. I'll be at a conference next week so I won't be available to do much testing, but I'll be back on May 13th.

@dthoward96
Collaborator

Hey @erikwolfsohn,

Sorry for not reaching out sooner; it took me a bit longer than anticipated to finish working through the update. The update is mostly complete, and I'm in the process of finalizing the updated instructions for the documentation. I'll go ahead and send you an email now to connect, as I'm definitely in need of users to test this new version. The updated documentation will be available on the branch https://github.com/CDCgov/seqsender/tree/v1.2.0 by the time you're back on the 13th. I'll also send you an email when I upload the documentation.
