methyl_summary generation problem #65
Comments
Can you submit your bed file so I can try to reproduce the error? |
I have solved the problem of methyl_frame generation, but now I am facing this other problem:
Can you tell me how I should resolve this and what the issue might be? I have added most of the information needed to resolve this issue, and I hope you reply soon. Also, I used Dorado with the 5mC model. |
Can anyone help me solve this issue? |
Can you share either your R environment or attach your |
I think it's going to be difficult for me to import this representation of your dmr_obj into my code to try to reproduce the error. Can you save your R environment and attach it here as a .Rdata file? |
Sorry, that is confidential. Can you suggest a solution from the data above? If you need any other data to get to a solution, I can provide it. What I can tell you is that I followed the example data for the demo run, for which I got results without any problems, and I edited my input file to match the example files, but I am now facing this problem.
For experimental data
For example data in package
Could this be causing the problem? |
I appreciate the additional information! So I'm curious what your fix was to the very first issue you raised? Our pipeline was set up for an older version of deepsignal and I'm curious if the output file is drastically different from what the code expects. Our latest improvements have been geared toward the use of Dorado since we have found that works better for us. It does appear like in your input data you have multiple rows for each position? That is the most likely culprit that I can think of with what's been provided since the older version of deepsignal / dorado output produces one row per position in the genome |
I am attaching the R environment below as a workspace. I have removed
multiple rows for each position from the input bed file but I am getting
the same error. Please look through it and suggest how to solve the problem.
new_workspace.zip
<https://drive.google.com/file/d/1ZcB5i16qlHO9xCCQTYnh1Vz3TJdHsvXN/view?usp=drive_web>
|
As for the earlier issue I raised about methylframe generation: it was caused by the format of the output I got from deepsignal3, which I later adjusted to match the example files you provided. After doing that, the issue was resolved. |
The issue is indeed because there are multiple rows for the same position. This leads to "LongMeth" and "LongPercent" not being the appropriate lengths since they get created by using a grouping function. If you remove duplicate positions from your methylframe it should work. Best, |
I have removed multiple rows for the same position, but it is still not working.
I am doing this trial run with two samples to see if it works before running
the rest of the files, so I made a subset of the file for chromosomes 2 & 3,
tried to run it, and hit the error. I am attaching two bed files for you to run,
each containing entries from one chromosome of my trial file. Please try to
find the error if you can and help me sort it out. Thank you.
Bed_files.zip
<https://drive.google.com/file/d/1wV8WAnQntVWKV_kfV6Ltpgl7SyzkGyla/view?usp=drive_web>
|
Just confirmed, there are still rows with the same chromosome and positions:
If you run this distinct command first to remove duplicates (on the megaframe) your commands should work. |
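The exact command referred to above did not survive in this thread, so the following is only a minimal base-R sketch of the same idea: count and drop rows that share a chromosome and position. The toy `megaframe` and its column names (`Chromosome`, `Position`, `Meth`) are assumptions for illustration and may not match the real object; if the frame holds one row per sample, the sample column would likely need to be added to `key_cols`.

```r
# Toy stand-in for the megaframe. The column names "Chromosome" and
# "Position" are assumptions and may differ in the real object.
megaframe <- data.frame(
  Chromosome = c("chr1", "chr1", "chr1", "chr2"),
  Position   = c(100, 100, 250, 100),
  Meth       = c(3, 4, 1, 7),
  stringsAsFactors = FALSE
)

# Columns that together identify one genomic position.
key_cols <- c("Chromosome", "Position")

# TRUE for every row whose (Chromosome, Position) pair appeared earlier.
is_dup <- duplicated(megaframe[, key_cols])

# Number of duplicate rows found.
n_dup <- sum(is_dup)

# Keep only the first row for each position.
# Roughly equivalent with dplyr:
#   dplyr::distinct(megaframe, Chromosome, Position, .keep_all = TRUE)
megaframe_dedup <- megaframe[!is_dup, ]
```

Checking `n_dup` before and after is a quick way to confirm that no duplicate positions remain before re-running the later steps.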
Can you write out the whole script you used to do this? I am still getting errors. And if you were able to do the complete analysis, please send the complete script to my email ID, praddyumnapdm@bicpu.edu.in |
Hey Jack, can you share the R commands you used to run the bed files I gave you, and how you removed duplicates from them? You can write them here or mail them to the email ID given above. Thank you. |
After removing multiple entries for chromosome and position, this is what I am getting. Can you help me solve this?
|
Nice work! That means it is working, but the model cannot converge,
which is expected if there is only one individual per treatment group.
You made it as far as is possible with just a single sample in each group.
If you have two or more individuals per group, this next step would work.
Jack
On Fri, Aug 2, 2024 at 3:26 AM PraddyumnaR ***@***.***> wrote:
After removing multiple entries for chromosome and positions this is what
I am getting can you help me solve this
methyl_summary <- create_methyl_summary(dmr_obj,
                                        control = 'C',
                                        treated = 'T',
                                        additional_summary_cols = list(
                                          c('sd', 'Group')
                                        ))
[1] "Number of columns in methyl_summary is correct"
methyl_summary <- find_DMR(methyl_summary, dmr_obj, fixed = c('Group'),
                           random = c('Individual'), reads_threshold = 5,
                           control = 'C', model = 'beta-binomial',
                           analysis_type = 'individual')
cbind(Meth, UnMeth) ~ (1 | Individual) + Group
<environment: 0x55d7b284a510>
|======================================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)
warnings()
Warning messages:
1: In finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old) :
  Model convergence problem; . See vignette('troubleshooting'), help('diagnose')
2: In finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old) :
  Model convergence problem; . See vignette('troubleshooting'), help('diagnose')
3: In finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old) :
  Model convergence problem; . See vignette('troubleshooting'), help('diagnose')
4: In finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old) :
  Model convergence problem; . See vignette('troubleshooting'), help('diagnose')
5: In finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old) :
  Model convergence problem; . See vignette('troubleshooting'), help('diagnose')
changepoint_cols = find_changepoint_col_options(methyl_summary)
character(0)
methyl_summary <- find_DMR(methyl_summary, dmr_obj, fixed = c('Group'),
                           random = c('Individual'), reads_threshold = 3,
                           control = 'C', model = 'binomial',
                           analysis_type = 'group')
cbind(Meth, UnMeth) ~ (1 | Individual) + Group
<environment: 0x55d745b5b8d8>
boundary (singular) fit: see help('isSingular')
[1] "37 No bobyqa Converge, trying Nelder"
[1] "37 No Converge"
[1] "78 No bobyqa Converge, trying Nelder"
[1] "78 No Converge"
[1] "80 No bobyqa Converge, trying Nelder"
[1] "80 No Converge"
[1] "95 No bobyqa Converge, trying Nelder"
[1] "95 No Converge"
[1] "172 No bobyqa Converge, trying Nelder"
[1] "172 No Converge"
[1] "174 No bobyqa Converge, trying Nelder"
[1] "174 No Converge"
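The point about needing more than one individual per treatment group can be checked before fitting. Below is a small base-R sketch, assuming a hypothetical experimental design table whose `Individual` and `Group` columns mirror the names in the model formula `(1 | Individual) + Group`. The table and the sample names in it are made up for illustration and are not taken from the package.

```r
# Hypothetical design table: one control and one treated individual.
# "Individual" and "Group" mirror the names in the model formula above;
# the table itself and the sample names are made up.
design <- data.frame(
  Individual = c("S1", "S2"),
  Group      = c("C", "T"),
  stringsAsFactors = FALSE
)

# Number of distinct individuals in each treatment group.
n_per_group <- tapply(design$Individual, design$Group,
                      function(x) length(unique(x)))

# A random intercept per individual needs replication within each group.
enough_replicates <- all(n_per_group >= 2)

if (!enough_replicates) {
  message("At least one group has fewer than two individuals; ",
          "convergence problems in the mixed model are to be expected.")
}
```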
|
Thank you for the insight, Jack. I am in the process of generating the rest of my files. I hope it works without any problem, since methyl summary generation was successful. |
Now I have 3 samples in each of the control and treated groups, but I am still getting this error. What could be the reason? The methyl summary was generated successfully, but now this problem appears.
[1] "Number of columns in methyl_summary is correct"
cbind(Meth, UnMeth) ~ (1 | Individual) + Group |
I am having problems with both the group and individual DMR analyses, so I am
attaching links to my bed files below. Please check them and help me
resolve the issue.
https://drive.google.com/file/d/1G6iGbVv-GngvTxoMOFEHi43mWOL2Hqqw/view?usp=drive_link
https://drive.google.com/file/d/19uAc-zTp2oBot7c0Zibc8uaXY5wsFGg1/view?usp=drive_link
https://drive.google.com/file/d/1LNwmVZwFnsa6aSHWs0c4Ii-mRBBILZhh/view?usp=drive_link
https://drive.google.com/file/d/1RM2xPELraePGzvhSLTAhs7cgokxoh961/view?usp=drive_link
|
What is the error you're facing here? The output looks correct to me |
boundary (singular) fit: see help('isSingular') |
It has been running for more than a day now. Is there any way to speed up this process? I have provided you with the bed files; it would be great if you could try running them on your end and figure something out. |
I have been running two analyses since yesterday: one on the bed file with entries on chromosomes 1 & 2, and another on chromosome 1 only. The megaframes generated for those two sets have 3649274 lines (Chr 1 & 2) and 1936024 lines (Chr 1). How much time will it take? I have to do the analysis for 12 chromosomes, so at this speed it will take 24 days to complete. Is it possible to speed up this group DMR analysis? I hope you reply soon. |
For larger analysis we definitely recommend splitting the data into smaller chunks and running on that. For example with whole genome analysis we recommend splitting the data by chromosome or more across multiple machines to run in parallel. Unfortunately we don't have immediate plans to speed up the code on the back end as that would require a major overhaul of the code, but it is something we hope to get to in the next year |
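As a rough illustration of the splitting advice, here is a base-R sketch that breaks one table into per-chromosome chunks and writes each chunk to its own file, so that each file can be processed in a separate R session or on a separate machine. The toy `megaframe`, the `Chromosome` column name, and the output directory are all assumptions for illustration. None of this is part of the package's API.

```r
# Toy stand-in for a genome-wide table. The "Chromosome" column name is an
# assumption and may differ in the real data.
megaframe <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2", "chr3"),
  Position   = c(10, 20, 10, 30),
  stringsAsFactors = FALSE
)

# One data frame per chromosome.
chunks <- split(megaframe, megaframe$Chromosome)

# Hypothetical output location; any writable directory works.
out_dir <- file.path(tempdir(), "per_chromosome")
dir.create(out_dir, showWarnings = FALSE)

# Write each chunk to its own tab-separated file and keep the paths.
chunk_files <- vapply(names(chunks), function(chr) {
  path <- file.path(out_dir, paste0(chr, ".tsv"))
  write.table(chunks[[chr]], path, sep = "\t",
              quote = FALSE, row.names = FALSE)
  path
}, character(1))
```

On a single multi-core Unix machine, applying the analysis to each element of `chunks` with `parallel::mclapply` is another option, at the cost of holding all chunks in memory at once.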
Okay, if that's the case then I think I will have to wait for it to be done chromosome by chromosome. Also, I completed the analysis on one chromosome file, but while doing that I faced yet another issue with the following command:
What happened is that the changepoint analysis ran successfully, but the volcano and line plots it was supposed to save were not saved. I also tried replacing TRUE with T; the volcano plots were generated on the R graphics device but were not saved to the directory. When I ran the analysis on the demo data a month ago, it worked perfectly fine. What can be the issue here? |
Do you have any target genes that you're working with? The issue with the plots is that we found the line plots took too much RAM to produce for the entire genome, crashing a lot of our machines. So for any data with > 100,000 rows, the plots will not be displayed. Are you running this on the data split by chromosome? If that's the case I would recommend setting |
I was not using the whole-genome file; I ran the analysis on just the Chr1 data. I also tried it with Whole_genome = FALSE, but it still didn't save the plots. The line plots were not even visible on the R graphics device; only the volcano plot was generated. Apart from this I and DMR_region_summary with headers: Can I know the interpretation of these headers? It would be very helpful to understand what these column entries signify. |
Also, can you tell me whether we should consider the sample-wise methylation change (in our case "S4_MethChange", "S5_MethChange" and "S6_MethChange") or Treat_V_Control, given that we are performing a group-wise analysis? Which one is significant? |
Sample-wise shifts in methylation are the "MethChange" columns; group-wise shifts in methylation are Treat_V_Control. As for the other columns, they refer to DMR regions, or groups of regions with similar shifts in methylation. That file refers to these different regions and includes summary stats for each one. The "dmr_score" column refers to a scoring metric described in our publication, while the other columns reflect the control methylation, shifts in methylation, T statistic, size of the region, and its start and stop. I will look to add better documentation for this as soon as I can. |
Thank you for the reply. I hope you publish a detailed description of the output in the future, which will help users interpret it. It would also help if you could update the code so that the experimental design starter is generated as intended, by taking input from the user within the R environment itself. |
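To make the distinction concrete, here is a small base-R sketch that separates the sample-wise columns from the group-wise column. The data frame and its values are made up. Only the column names (`S4_MethChange`, `S5_MethChange`, `S6_MethChange`, `Treat_V_Control`) come from this thread, and the real output table may be laid out differently.

```r
# Made-up output table using column names mentioned in this thread.
# The values are invented purely for illustration.
dmr_output <- data.frame(
  S4_MethChange   = c(0.10, -0.05),
  S5_MethChange   = c(0.12, -0.02),
  S6_MethChange   = c(0.08, -0.07),
  Treat_V_Control = c(0.10, -0.05)
)

# Sample-wise shifts: every column whose name ends in "_MethChange".
sample_cols <- grep("_MethChange$", names(dmr_output), value = TRUE)
sample_wise <- dmr_output[, sample_cols]

# Group-wise shift: the single treated-versus-control column.
group_wise <- dmr_output$Treat_V_Control
```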
Methylframe <- generate_methylframe(
Creating the Megaframe
|--------------------------------------------------|
|==================================================|
Error in `[.data.frame`(Methyl_bed, , c(1, 2, 3, 4, 5, 6)) :
  undefined columns selected
This is the error I get after submitting the DeepSignal3 output bed file as input. What should be done to resolve it?