new workflow to deal with multiple basecalls from same flowcell #325

Merged: SHuang-Broad merged 2 commits into main from sh_ont_multbasecall on Apr 11, 2023

Conversation

SHuang-Broad
Collaborator

No description provided.

@SHuang-Broad SHuang-Broad requested a review from kvg April 8, 2022 20:13
wdl/ONTFlowcellFromMultipleBasecalls.wdl (outdated)

Map[String, String] ref_map = read_map(ref_map_file)

String outdir = sub(gcs_out_root_dir, "/$", "") + "/ONTFlowcell/~{flowcell}"
Collaborator

Are you sure output is meant to go into the usual ONTFlowcell directory? I'm fine with that, but just wanted to check that this is what you intended.

Collaborator Author

Yes, that's intended. The individual basecalls themselves are output to a different directory:

String outdir = sub(gcs_out_root_dir, "/$", "") + "/ONTBasecall/~{prefix}"

It's done this way so that all flowcell-level data are stored in ONTFlowcell.

One disparity between single-basecall flowcells and multi-basecall flowcells is that the latter have many metrics stored in the basecall directories. How those metrics should be aggregated to the flowcell level will be addressed in the planned overhaul.

Thoughts?
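
For readers less familiar with WDL, the layout described in this reply can be sketched in Python. This is only an illustration of the two path expressions quoted in this thread, not code from the PR. Note that Python's `re.sub` takes the pattern first, whereas WDL's `sub` takes the input string first. The helper names, bucket name, flowcell ID, and prefix below are made-up values.

```python
import re


def strip_trailing_slash(root: str) -> str:
    """Mirror WDL's sub(root, "/$", ""): drop one trailing slash, if present."""
    return re.sub(r"/$", "", root)


def flowcell_outdir(gcs_out_root_dir: str, flowcell: str) -> str:
    """Flowcell-level results: <root>/ONTFlowcell/<flowcell>."""
    return f"{strip_trailing_slash(gcs_out_root_dir)}/ONTFlowcell/{flowcell}"


def basecall_outdir(gcs_out_root_dir: str, prefix: str) -> str:
    """Per-basecall results: <root>/ONTBasecall/<prefix>."""
    return f"{strip_trailing_slash(gcs_out_root_dir)}/ONTBasecall/{prefix}"


# Hypothetical inputs, used only to show that both trees share one root.
root = "gs://example-bucket/results/"
print(flowcell_outdir(root, "FAKE123"))
print(basecall_outdir(root, "FAKE123_basecall_1"))
```

With or without a trailing slash on the root, the per-basecall outputs and the flowcell-level outputs end up as sibling trees under the same root directory.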

@SHuang-Broad SHuang-Broad force-pushed the sh_ont_multbasecall branch 3 times, most recently from 54ae623 to 9b3c700 Compare October 15, 2022 03:45
@SHuang-Broad SHuang-Broad self-assigned this Mar 29, 2023
@SHuang-Broad SHuang-Broad force-pushed the sh_ont_multbasecall branch 2 times, most recently from 5acec5d to ae2a722 Compare April 11, 2023 19:23
version 1.0

import "../../../tasks/Utility/Utils.wdl" as Utils
import "../../../tasks/Utility/GeneralUtils.wdl" as GU
Collaborator

What are Utils vs GeneralUtils?


Map[String, Float] raw_reads_stats = nanoplot_map

Map[String, Float] aligned_reads_stats = NanoPlotFromBam.stats_map
Collaborator

This is fine for now, but at some point I really would like us to rethink a few things. This output is not particularly conducive to easy analysis in Jupyter notebooks. The answer could be to have some official boilerplate code that presents the Terra table as a pandas dataframe or an R tibble, with the maps here exploded out as columns.

Collaborator Author

Yep. The little library that I'm going to present on tomorrow can support this easily.
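
To make the "maps exploded out as columns" idea concrete, here is a minimal sketch in plain Python. It is not the library mentioned above, whose API is not shown in this thread. It assumes each table row has already been loaded as a dict and that map-valued outputs such as `raw_reads_stats` arrive as nested dicts. The function name, the `.` separator, and the sample keys and values are all hypothetical.

```python
from typing import Any, Dict, List


def explode_map_columns(rows: List[Dict[str, Any]], sep: str = ".") -> List[Dict[str, Any]]:
    """Flatten one level of dict-valued columns into '<column><sep><key>' columns."""
    flat_rows = []
    for row in rows:
        flat: Dict[str, Any] = {}
        for column, value in row.items():
            if isinstance(value, dict):
                # A map-valued column becomes one scalar column per key.
                for key, inner in value.items():
                    flat[f"{column}{sep}{key}"] = inner
            else:
                # Scalar columns are copied through unchanged.
                flat[column] = value
        flat_rows.append(flat)
    return flat_rows


# Made-up row; the stat names are placeholders, not real workflow outputs.
table = [
    {
        "flowcell_id": "FAKE123",
        "raw_reads_stats": {"number_of_reads": 1000.0, "n50": 15000.0},
        "aligned_reads_stats": {"number_of_reads": 950.0},
    }
]
flat_table = explode_map_columns(table)
print(sorted(flat_table[0]))
```

If pandas is available, the resulting list of flat dicts can be passed directly to the `pandas.DataFrame` constructor, which yields one column per exploded key.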

…directories

  * one workflow that essentially copies current ONTFlowcell, but aims at a basecall directory
  * one workflow that plays the role of ONTFlowcell, but actually aggregates results from the WF above
@SHuang-Broad SHuang-Broad merged commit d805eed into main Apr 11, 2023
@SHuang-Broad SHuang-Broad deleted the sh_ont_multbasecall branch April 11, 2023 20:41
2 participants