Fixing critical typo in HaplotypeCaller disk spec. #450

Merged
merged 17 commits into from
May 8, 2024

jonn-smith
Collaborator

No description provided.

@shadizaheri
Collaborator

Thank you for the great work on these updates. I tested this branch on 3,000 malaria samples with the SRFlowcell workflow on Terra and it worked perfectly.
Based on my observations, I’ve written a summary of the changes; please correct anything I got wrong. It is meant to describe the updates in this PR clearly for both users and developers.

Review Summary of Key Improvements

  1. Refactoring the Output Calculation Logic:

    • This PR simplifies complex calculation expressions by breaking them into smaller, clearly defined variables. It also prevents errors by incorporating conditional checks to handle cases like division by zero.
    • Metrics such as estimated fold coverage and aligned fraction of bases are now calculated upfront and stored in descriptive variables, which are then used in the output block. This approach improves the readability of the code and centralizes the logic for easier future adjustments.
  2. Enhanced Error Handling in QC Tasks:

    • The PR also improves error handling in the FastQC tasks. It now checks that base quality data is present before running calculations, which makes the workflow robust to incomplete data that could otherwise cause runtime errors.
  3. Optimization of Java Options in Utility Tasks:

  4. Change in Source for gVCF Files in the HaplotypeCaller WDL:

    • The output_gvcf and output_gvcf_index are now sourced from ReblockHcGVCF.output_gvcf and ReblockHcGVCF.output_gvcf_index, respectively.
    • Previously, these files were sourced from MergeGVCFs.output_vcf and MergeGVCFs.output_vcf_index, meaning that the gVCFs came directly from merging multiple gVCF files without additional processing.
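
To make points 1 and 2 concrete, here is a minimal Python sketch of the pattern described: compute each metric up front in a named variable, guard the division, and skip the quality calculation when no base quality data is present. This is not the code from the PR (which is WDL); the function names, the fallback value of 0.0, and the definition of fold coverage as aligned bases over genome length are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def safe_ratio(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator / denominator, or `default` when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator


def mean_base_quality(quals: Sequence[float]) -> Optional[float]:
    """Return the mean quality, or None when no quality data is present."""
    if not quals:
        return None
    return sum(quals) / len(quals)


def summarize(aligned_bases: int, total_bases: int,
              genome_length: int, quals: Sequence[float]) -> dict:
    # Hypothetical metric definitions: each one is computed once, in a
    # descriptively named variable, and only then placed in the output.
    aligned_fraction = safe_ratio(aligned_bases, total_bases)
    estimated_fold_coverage = safe_ratio(aligned_bases, genome_length)
    return {
        "aligned_fraction": aligned_fraction,
        "estimated_fold_coverage": estimated_fold_coverage,
        "mean_base_quality": mean_base_quality(quals),
    }
```

With this shape, an empty sample (zero total bases, no quality values) yields zeros and a missing quality value instead of a runtime error.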

Collaborator

@shadizaheri shadizaheri left a comment

Please check the conversation for the details.

- Added option to use gnarly genotyper.
- Added het inputs to joint genotyping.
- Fixed Java memory allocation in joint genotyping to be based on the memory of the VM rather than hard-coded.
- Added stack trace logging for errors in `ExtractVariantAnnotations`,
  `TrainVariantAnnotationsModel`, and `ScoreVariantAnnotations`.
- Removed `HAPCOMP`, `HAPDOM`, and `HEC` from default annotations for SNP and INDEL VETS filtration.  Need to do more testing / debugging to include these in joint calling.
- Fixed name of outdir in `ConvertToZarrStore` to be correct for this workflow.
- Updated the zarr conversion to use parallel Dask processes and to log to stdout.
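
As an illustration of the Java memory item above, the following Python sketch derives a JVM heap setting from the VM's memory instead of hard-coding it. It is not taken from the PR: the function names, the 1 GB overhead reserved for non-heap use, and the 512 MB floor are hypothetical choices. Only the `-Xmx` flag itself is a standard JVM option.

```python
def java_heap_mb(vm_memory_gb: float, overhead_gb: float = 1.0,
                 min_heap_mb: int = 512) -> int:
    """Heap size in MB: VM memory minus an assumed overhead, with a floor."""
    heap_mb = int((vm_memory_gb - overhead_gb) * 1024)
    # Never return less than the floor, even on very small VMs.
    return max(heap_mb, min_heap_mb)


def java_options(vm_memory_gb: float) -> str:
    """Build the -Xmx option string for a VM with the given memory."""
    return f"-Xmx{java_heap_mb(vm_memory_gb)}m"
```

For example, `java_options(8)` gives `-Xmx7168m`, so the heap scales when the task is given a larger VM.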
@jonn-smith jonn-smith merged commit fe32d91 into main May 8, 2024
5 checks passed
@jonn-smith jonn-smith deleted the jts_quick_hc_bugfix branch May 8, 2024 18:56