File check passed: FALSE #4
Your approach is correct and there is a bug in the |
Hi Simon!
Hope Natalia's suggestion solves your issue. ("Blank lines" not "Black
lines").
I will ask Natalia to be more verbose regarding chunks!
Best
Stuart
On Tue, 16 Apr 2024 at 13:43, simonharnqvist wrote:
Hi @StuartJEBaird and @nmartinkova
I'm trying to run *DiemR* following the vignettes (
https://cran.r-project.org/web/packages/diemr/vignettes/Importing-data-for-genome-polarisation.html
and
https://cran.r-project.org/web/packages/diemr/vignettes/diemr-diagnostic-index-expecation-maximisation-in-r.html),
but I'm running into issues with converting my VCF to Diem format.
I've converted my (subset of a) VCF file to the Diem format:
    library(diemr)
    vcf2diem(SNP = "brenthis_chr1.vcf",
             filename = "brenthis",
             chunk = 10)
(Side question: what exactly does chunk set? It's clearly not the size of
chunks in markers, nor the number of chunks - would be great if
documentation could be a bit more explicit here)
I then went on to check if the first chunk is correctly formatted:
    CheckDiemFormat(files = "brenthis-01.txt",
                    ChosenInds = 1:13,
                    ploidy = list(rep(2, 13)))
Which gives me:
File check passed: FALSE
Ploidy check passed: TRUE
I'm not getting any traceback, so it's hard to know what's wrong with my
file. Here's what the first few lines of *brenthis-01.txt* look like:
S22___________
S
S______2_0____
S______2_0____
S00_0____0____
S
S
S21_0_01_0____
S21_0_01_0____
S______2_0____
S
S______0_00___
S______0_02___
S
S
S_2____121____
S_1____001____
Ignoring this check and running diem anyway, it tells me that I have
characters that are not allowed:
Error in emPolarise(origM = x[2:length(x)], changePolarity = x[1]) :
origM must contain only characters _, 0, 1, 2
I'm guessing it's not objecting to the 'S' beginning each line in the txt
file?
What am I doing wrong here?
Thanks in advance!
|
Just ask for tips on spreading load over cores (for big data), and
filtering output on DI (for any data).
S
|
Thanks for that lightning-speed reply - sourcing the new version of …
Ignoring once again:
…
Gives me:
(14,26) is invariant to the …
|
@StuartJEBaird - yes please, will bother you with questions about how to do this on a genome scale once I have the subset test working. |
Simon, at this stage, I cannot see what the problem is right away, and would need your input file to debug. Are you open to emailing it? |
@nmartinkova That's probably the best way to go about it - I'll share a link with you later this week. Thanks! |
The link?
(I'm pissed you have a bug... want to smush it).
S
|
Emailed it to Natalia earlier - have also sent to your email now @StuartJEBaird |
Hi Natalia!
Simon's vcf-extracted *diem* input includes lines like this:
3|30_03|30|323|30|3__3|3_
(Actually, I detect 1 line like this... looks unparsed)
... I am guessing this is what is causing the continuing bug.
Have a great weekend!
Bestest
Stuart
|
The unparsed vcf line is for site 27750 in the first file of sites
(brenthis-01.txt).
S
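Without access to the input file, one quick way to find lines like the one above is to scan the diem file for anything other than a leading S followed by exactly one of _, 0, 1, 2 per individual. The sketch below assumes that format (inferred from the emPolarise error message and the sample lines in this thread); `find_bad_diem_lines` is a hypothetical helper, not a diemr function, and it does not reimplement CheckDiemFormat.

```r
# Hypothetical helper (not part of diemr). Assumed format per line:
# "S" followed by one character per individual, each one of _, 0, 1, 2.
find_bad_diem_lines <- function(path, n_ind) {
  lines <- readLines(path, warn = FALSE)
  starts_s <- startsWith(lines, "S")
  # Drop the leading "S" (lines without it are left unchanged)
  body <- sub("^S", "", lines)
  # Any character outside the allowed set _, 0, 1, 2
  bad_chars <- grepl("[^_012]", body)
  # One genotype character per individual is expected
  bad_length <- nchar(body) != n_ind
  bad <- which(!starts_s | bad_chars | bad_length)
  data.frame(
    line = bad,
    missing_S = !starts_s[bad],
    bad_chars = bad_chars[bad],
    length = nchar(body)[bad],
    text = lines[bad],
    stringsAsFactors = FALSE
  )
}
```

For the file in this thread the call would be `find_bad_diem_lines("brenthis-01.txt", n_ind = 13)`; assuming one marker per line, the `line` column gives site indices within that chunk.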
|
Just to let you know: my immediate problem is fixed by using an alternative conversion script, so I'll leave it to you to decide when to close this issue |
Hi Simon,
Could you explain a little more? This may help us to avoid future issues.
My understanding was vcf2diem produced an error for site 27750 in the first
file of sites (brenthis-01.txt).
May we have that vcf input file? (for test purposes)
You mention an alternate conversion script... can you tell us what role
this script plays in your pipeline, and what the difference between the old
and alternate scripts is?
Many thanks!
S
|
Hi Stuart - if the Dropbox link I sent is still working you will have brenthis_ino_daphne.vcf.gz in there. I used a script by Sam Ebdon (I've asked him if I can share it with you) that replaces vcf2diem.R. The output of that script passes both checks and diem seems to be running happily. |
Perfect.
Cheers Simon!
S
|
Hi Simon,
The function manual describes the reasoning for the |
Hi Simon, The issue with parsing the vcf files when the reference allele is not one of the two most frequent ones has now been solved and the update will be available from the version Best, |
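To illustrate why sites where the reference allele is not among the two most frequent alleles need special handling, here is a hypothetical R sketch (not the diemr implementation of vcf2diem) that encodes the diploid GT strings of one site by counting copies of the second most frequent allele, and writes `_` for missing genotypes or genotypes carrying any other allele. Which allele counts towards 2 is arbitrary here; the sketch assumes marker polarity is resolved later by diem.

```r
# Hypothetical sketch (not diemr's vcf2diem): encode one site's diploid
# GT strings ("a|b" or "a/b") into diem characters _, 0, 1, 2 using the
# two most frequent alleles, regardless of their VCF allele indices.
encode_site <- function(gt) {
  # Split each genotype into its allele indices; "." is a missing allele
  alleles <- strsplit(gt, "[|/]")
  all_alleles <- unlist(alleles)
  counts <- table(all_alleles[all_alleles != "."])
  # Fewer than two observed alleles: nothing to encode at this site
  if (length(counts) < 2) return(rep("_", length(gt)))
  top2 <- names(sort(counts, decreasing = TRUE))[1:2]
  vapply(alleles, function(a) {
    # Missing, non-diploid, or carrying a third allele -> "_"
    if (length(a) != 2 || any(!a %in% top2)) return("_")
    # Number of copies of the second most frequent allele: 0, 1 or 2
    as.character(sum(a == top2[2]))
  }, character(1))
}
```

For example, `encode_site(c("1|1", "1|2", "2|2", "0|1"))` returns `c("0", "1", "2", "_")`: the reference allele 0 is the rarest allele at this site, so the genotype carrying it is treated as missing rather than producing an out-of-range character.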
Thanks - I'll give this a go when I get a chance, and then we can hopefully close this ticket |