fade stats-clip output documentation? #30

Open
jon-nowacki opened this issue Apr 29, 2022 · 1 comment

@jon-nowacki

Is there a description of these values? They are the output columns of fade stats-clip:

  • qname
  • sc_q_scores
  • sc_seq
  • sc_avg_bq
  • avg_bq
  • art_status

Also, do you have a way to produce a histogram of the soft-clip lengths? It would be a great way to assess read quality coming off the sequencing machine.

@charlesgregory
Collaborator

No, I don't have it explicitly documented anywhere.

  • qname: query name of the original read
  • sc_q_scores: phred-scaled base quality scores of the soft-clipped sequence
  • sc_seq: nucleotide sequence of the soft-clipped sequence
  • sc_avg_bq: average base quality of the soft-clipped sequence
  • avg_bq: average base quality of the entire read
  • art_status: boolean value specifying whether this soft-clip was identified as an artifact or not

As for a histogram, I think it could be created with bash tools.

The line below should extract the lengths of all the soft-clipped sequences (column 3, sc_seq) from fade's stats-clip output:

cut -f3 fade_stats_clip_output.tsv | awk '{ print length }' > soft_clipped.lengths.txt

Then you can follow this Stack Overflow question: https://stackoverflow.com/questions/39614454/creating-histograms-in-bash

Save their script to the file hist.sh (I would also change the bin size to something like 3 bp), then run:

chmod +x hist.sh
./hist.sh  soft_clipped.lengths.txt

That could give you what you need. It should print a column of bins and a column of bin sizes (the number of lengths that fall in each bin).

I haven't tested this yet, though. I could add a histogram-clips subcommand to fade, but it would yield similar results. Plotting a histogram would be outside of fade's scope, but the stats-clip output has the data you need. Let me know if that helps at all.
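
If you would rather not copy the Stack Overflow script, the binning step can also be sketched directly in awk. This is not that script and not part of fade; the function name length_hist and the default bin width of 3 are just illustrative choices, and it assumes the input has one integer length per line, as in soft_clipped.lengths.txt above.

```shell
#!/usr/bin/env bash
# Sketch only: bin integer lengths (one per line on stdin) into fixed-width bins.
# Prints "bin_start<TAB>bin_end<TAB>count", sorted by bin start.
# length_hist is a hypothetical helper name, not a fade subcommand.
length_hist() {
    local bin_width="${1:-3}"    # bin width in bp; 3 is an arbitrary default
    awk -v w="$bin_width" '
        /^[0-9]+$/ {                 # ignore blank or non-numeric lines
            b = int($1 / w) * w      # start of the bin this length falls in
            count[b]++
        }
        END {
            for (b in count) printf "%d\t%d\t%d\n", b, b + w - 1, count[b]
        }
    ' | sort -n
}

# Usage: length_hist 3 < soft_clipped.lengths.txt
```

Empty bins are not printed, so gaps in the first column mean no soft-clips of those lengths were seen.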
