Nondeterminism in PULSAR_1.0.pl across different runs / machines #1

Open
lilybhattacharjee5 opened this issue Oct 29, 2021 · 0 comments

Comments

@lilybhattacharjee5

Hi,

First of all, thank you for writing these scripts! I'm a grad student at UC Davis who is currently using them for a research project. I noticed that for PULSAR_1.0.pl, running with the same parameters on the same input files (e.g. the ones provided in Example/) leads to different Phased_chromosome.vcf outputs. Between runs, I deleted the old Phased_chromosome.vcf / Potential_private_variants.rv outputs.

Run 1:
command used: perl PULSAR_1.0.pl Example/Seven_sib_pedigree.csv Example/Genotypes.vcf 5 Example/Allele_frequency.csv 0.05

Run 2:
command used: same as Run 1

The diff for the generated VCF files is not empty:
[Screenshot: diff of the two Phased_chromosome.vcf outputs, taken 2021-10-28]
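
For anyone who wants to reproduce the comparison, the two runs and the diff can be sketched as a small shell script. This is only a sketch: the `run_1`/`run_2` directory names and the two helper functions are hypothetical, and it assumes PULSAR writes Phased_chromosome.vcf and Potential_private_variants.rv into the current directory.

```shell
#!/bin/sh
# Sketch: run the same PULSAR command twice and compare the phased VCFs.
# Assumption: PULSAR writes both output files into the current directory.
# The helper names and the run_1/run_2 directories are hypothetical.

# Run PULSAR once (command copied from this issue) and move its outputs
# into the directory given as $1, so the next run starts from a clean state.
run_pulsar() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1
    perl PULSAR_1.0.pl Example/Seven_sib_pedigree.csv Example/Genotypes.vcf 5 \
        Example/Allele_frequency.csv 0.05 || return 1
    mv Phased_chromosome.vcf Potential_private_variants.rv "$out_dir"/
}

# Print "identical" and return 0 if the two files are byte-for-byte equal;
# otherwise (different or unreadable) print "different" and return 1.
compare_outputs() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Usage (commented out so that sourcing this file only defines the functions):
# run_pulsar run_1 && run_pulsar run_2
# compare_outputs run_1/Phased_chromosome.vcf run_2/Phased_chromosome.vcf
# diff run_1/Phased_chromosome.vcf run_2/Phased_chromosome.vcf
```

Keeping each run's outputs in its own directory avoids having to delete the old files by hand between runs.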

Comparing the two files in R gives different numbers of NA values (and, as shown in the diff, some column values differ as well):

all.equal(phased1@gt, phased2@gt)
"'is.NA' value mismatch: 45 in current 147 in target"

print(sum(is.na(phased1@gt)))
147
print(sum(is.na(phased2@gt)))
45

all.equal(phased1@fix, phased2@fix)
TRUE
all.equal(phased1@meta, phased2@meta)
TRUE
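
As a rough cross-check of the NA counts outside R, the missing genotype calls can also be counted directly in each VCF. This is a sketch under assumptions: the function name `count_missing_gt` is hypothetical, the file is assumed to be tab-separated with sample columns starting at column 10, and a call is treated as missing when its first `:`-separated field is `.`, `./.` or `.|.`. The number it prints need not match `sum(is.na(...))` exactly, since the R object may represent fields differently.

```shell
# Count sample-column entries whose genotype (the first ':'-separated field)
# is missing, i.e. ".", "./." or ".|.". Lines starting with '#' are skipped.
count_missing_gt() {
    awk -F '\t' '
        /^#/ { next }
        {
            for (i = 10; i <= NF; i++) {
                split($i, parts, ":")
                if (parts[1] == "." || parts[1] == "./." || parts[1] == ".|.") n++
            }
        }
        END { print n + 0 }
    ' "$1"
}

# Usage (file locations are hypothetical):
# count_missing_gt run_1/Phased_chromosome.vcf
# count_missing_gt run_2/Phased_chromosome.vcf
```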

Similarly, I noticed that scp-ing the Example folder and the given PULSAR_1.0.pl script to a remote machine and running the same command there gave results that differ even more:

Remote run:
command used: same as runs 1/2

all.equal(localPhased@gt, remotePhased@gt)
"'is.NA' value mismatch: 21 in current 147 in target"

print(sum(is.na(localPhased@gt)))
147
print(sum(is.na(remotePhased@gt)))
21

all.equal(localPhased@fix, remotePhased@fix)
TRUE
all.equal(localPhased@meta, remotePhased@meta)
"1 string mismatch"

As I'm not very familiar with Perl, I was wondering whether this is expected behavior. Should there be some degree of nondeterminism in outputs across runs, or do you have an idea of what may be causing this?
