how to train a model #7
Hi @ShangjinTan, thanks for your interest.
Some scripts in /scripts may be useful. Feel free to email nipeng at csu.edu.cn for more details and scripts. Best,
I am interested in training my own model. Thanks in advance!
Hi @ardakdemir,
Best,
Thanks a lot for the suggestions! Best, Arda
Dear @PengNi, "First you can check out nanopolish. The data (PRJEB13021) contains R9 reads of E.coli and Human NA12878. The reads are either totally methylated or totally unmethylated for 5mC." The dataset you mentioned above contains many files. Which ones did you use for training? And how can I tell whether the reads are methylated or unmethylated? Is that information stored inside the fast5 files?
Hi @ardakdemir, we use the E. coli R9 reads for training and testing. You can tell the read type from the filename: files whose names contain "pcr_MSssI" hold methylated reads, and files whose names contain "pcr" without "MSssI" hold unmethylated reads. You can read their paper to double-check. Best,
Thanks a lot!
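A minimal sketch of applying that filename rule in Python. The function name and the example filenames are hypothetical (they are not the actual PRJEB13021 file names); the one subtlety is that "pcr_MSssI" also contains "pcr", so the methylated marker has to be checked first.

```python
from pathlib import Path
from typing import Optional


def methylation_label(path: str) -> Optional[int]:
    """Return 1 for methylated ("pcr_MSssI"), 0 for unmethylated ("pcr"), None otherwise.

    Hypothetical helper: it only encodes the naming rule described in the
    thread; confirm against the dataset's metadata or paper.
    """
    name = Path(path).name.lower()
    # "pcr_msssi" also contains "pcr", so test the more specific marker first.
    if "pcr_msssi" in name:
        return 1
    if "pcr" in name:
        return 0
    return None  # neither marker found: do not use for labeled training


# Hypothetical filenames, for illustration only.
for f in ["sample_A.pcr_MSssI.tar.gz", "sample_B.pcr.tar.gz", "sample_C.tar.gz"]:
    print(f, methylation_label(f))
```

Returning `None` instead of guessing keeps files that match neither pattern out of the training set.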
How can I obtain the same reference you used to map the fast5 files for E. coli K12 ER2925? I could not find any reference for ER2925.
I used this reference: ftp://ftp.ensemblgenomes.org/pub/release-29/bacteria//fasta/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/dna/Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.2.29.dna.genome.fa.gz
Thanks! I also downloaded that, but tombo gives a "Poor raw to expected signal matching" error and suggests reverting with `tombo filter clear_filters`. Did you experience anything similar?
tombo only supports R9.4+ reads. If you want to process the E. coli R9 2D reads, you can use nanoraw (https://github.com/marcus1487/nanoraw). I also suggest using R9.4 reads (for example, human NA12878 (PRJEB23027)) for your experiments, because Nanopore may no longer use R9 2D flow cells.
Thanks a lot for the information.
I wonder how using the raw basecalls would affect the final read-level performance. Do you think we can skip the resquiggle step and call methylation directly from the nanopore basecalls? We may not always have a reference for the resquiggle step.
(Sent by email in reply to Peng Ni's comment of Sat, 12 Oct 2019, 21:08.)
Hmm, in my opinion, it makes no sense to call methylation without a reference: we always need to align reads to a genome to do any downstream analysis.
Hi Deepsignal,
I am impressed by the high sensitivity and accuracy of deepsignal in calling methylation sites. I would very much like to try it in my study. Here I have a few questions.
Deepsignal only provides a human CpG model. What I want is to detect all methylation motifs (not only CpG) of all methylation types (6mA, 5mC, 4mC) in microorganisms. So it seems I have to train a custom model. Am I right?
`deepsignal extract` can extract features for training. Could you please explain a bit about what exactly is extracted?
I have tried `deepsignal extract` on the example yeast data. The methy_label of every position is '1'. Does '1' mean that the position will be used for training? What does '1' mean?
If the output of `deepsignal extract` is used to train a model, how does deepsignal know which base is methylated?
`deepsignal extract` extracts selected motifs that share the same mod_loc. If I want to extract all types of motifs (probably with different mod_loc values), including novel motifs, does this mean that `deepsignal extract` is not applicable to my case?
When training a model on a pool of all methylation types, is there a minimum number of samples required per type, or per specific motif within a type?
Could you please give some advice on how to prepare the files for training a model?
Thank you so much.
Shangjin
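On the data-preparation questions above, one common approach (an assumption here, not deepsignal's documented procedure) is to extract features separately from a fully methylated sample and a fully unmethylated sample, set the label to 1 or 0 accordingly, and downsample so both classes have equal counts. The sketch below assumes each extracted feature line is tab-separated with the label in the last column; verify that against the actual `deepsignal extract` output before relying on it. The helper names and the toy feature lines are hypothetical.

```python
import random
from typing import List


def relabel(lines: List[str], label: int) -> List[str]:
    """Overwrite the last tab-separated field (ASSUMED to be the label)."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[-1] = str(label)
        out.append("\t".join(fields))
    return out


def balanced_mix(pos: List[str], neg: List[str], seed: int = 0) -> List[str]:
    """Downsample the larger class to the size of the smaller one, then shuffle."""
    rng = random.Random(seed)
    n = min(len(pos), len(neg))
    mixed = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(mixed)
    return mixed


# Toy lines standing in for features from a methylated and an unmethylated sample.
pos = relabel(["chr1\t100\t+\tACGTA\t1", "chr1\t200\t+\tTCGAT\t1", "chr1\t300\t-\tGCGCA\t1"], 1)
neg = relabel(["chr1\t100\t+\tACGTA\t1", "chr1\t250\t+\tTCGAA\t1"], 0)
train = balanced_mix(pos, neg)
print(len(train))
```

Balancing by downsampling keeps the classifier from simply favoring the majority class; it does not answer how many samples each motif needs, which likely depends on the data.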