I am running into something strange and would sincerely appreciate any feedback. I am working with the 1000 Genomes chromosome 22 (chr22) data, which is a .vcf file of about 11 GB. To make it smaller and speed things up, I am trying to convert it to .adam. I ran adam-submit vcf2adam and it successfully created the .adam files. The problem is that when I now run adam-submit flagstat or popstrat, I get the following error message:
java.io.FileNotFoundException: Couldn't find any files matching maprfs:///thelocation
But the files are there and you can see them if running hadoop fs -ls maprfs:///thelocation.
I initially thought that this was probably due to the large size of the chromosome 22 (11GB) so I tried the following:
adam-submit transform maprfs:///thelocation/small.sam maprfs:///thelocation/small.adam
This creates the small.adam files successfully
adam-submit flagstat maprfs:///thelocation/small.adam
This works beautifully no problem
adam-submit vcf2adam maprfs:///thelocation/small.vcf maprfs:///thelocation/small.adam
This creates the small.adam files successfully
adam-submit flagstat maprfs:///thelocation/small.adam
This does NOT work, and I get the same error as I was getting with chromosome 22:
java.io.FileNotFoundException: Couldn't find any files matching maprfs:///thelocation
I looked at the structure of the two small.adam files and they are different!!
The question is, am I missing something here with the vcf2adam? Is this the right way to convert the vcf to adam? Why am I getting the error even though the files are there?
Thank you
.adam files are Avro formatted records stored in Parquet format.
For BAM/CRAM/SAM files converted to "ADAM format" that means AlignmentRecords stored in Parquet. For VCF files converted to "ADAM format" it will be Genotypes or Variants stored in Parquet.
The flagstat command works on AlignmentRecords stored in Parquet, not Genotypes or Variants stored in Parquet. Thus an error is expected; a FileNotFoundException is probably not the correct one, however. We should see if a more user-friendly error could be thrown.
parquet-tools allows you to inspect the structure of Parquet files on disk.
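If parquet-tools is not at hand, the same check can be roughed out by reading the Parquet footer directly. The following is only a minimal sketch, not part of ADAM: it assumes you have copied one part file out of the .adam directory to local disk, that the writer embedded the Avro schema as compact JSON in the footer (as parquet-avro typically does), and that the record names are the ones mentioned above. The function names and the candidate list are hypothetical.

```python
import struct

PARQUET_MAGIC = b"PAR1"

# Assumed record names, taken from the discussion above; adjust for your ADAM version.
CANDIDATE_RECORDS = ("AlignmentRecord", "Genotype", "Variant")


def read_parquet_footer(path):
    """Return the raw footer (file metadata) bytes of a local Parquet file."""
    with open(path, "rb") as handle:
        # A Parquet file ends with a 4-byte little-endian footer length
        # followed by the 4-byte magic "PAR1".
        handle.seek(-8, 2)
        tail = handle.read(8)
        if tail[4:] != PARQUET_MAGIC:
            raise ValueError(f"{path} does not end with the Parquet magic bytes")
        (footer_length,) = struct.unpack("<I", tail[:4])
        handle.seek(-(8 + footer_length), 2)
        return handle.read(footer_length)


def guess_record_type(footer):
    """Guess the top-level Avro record name found in the footer, or None.

    Assumption: the Avro schema is stored as compact JSON, so the outermost
    record's "name" entry appears before any nested record's.
    """
    best_name, best_pos = None, len(footer)
    for name in CANDIDATE_RECORDS:
        pos = footer.find(b'"name":"' + name.encode("ascii") + b'"')
        if 0 <= pos < best_pos:
            best_name, best_pos = name, pos
    return best_name
```

Calling guess_record_type(read_parquet_footer(path)) on a part file from each of the two small.adam directories should then hint at whether it holds AlignmentRecords (what flagstat expects) or Genotypes/Variants. Treat the result as a hint only; parquet-tools shows the actual schema.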
Thank you heuermh for clarifying that :)
I also found the reason why popstrat was giving the error: it had been written for an older version of Spark, and I was able to fix it by modifying the Scala code. Having two similar-looking but unrelated errors made things even more confusing. Thanks