Failure to transform a VCF file containing multiple genomes (multiple samples) #2029
I tried with both
```
$ hdfs dfs -ls -C data
data/multiple_sample.chr1.vcf
$ bin/adam-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 2 \
    --driver-memory 4g \
    --executor-memory 40g \
    --executor-cores 3 \
    --verbose \
    --conf spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console \
    -- transformGenotypes /user/jmercier/data/multiple_sample.chr1.vcf /user/jmercier/multiple_sample.chr1.adam
....
18/08/27 16:59:10 INFO Client:
	 client token: N/A
	 diagnostics: User class threw exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /user/jmercier/multiple_sample.chr1.adam already exists
```
I do not understand what the problem is, as the directory was created by the tool itself!
Do you have any tips?
Thanks for your help
Note: I am using release 0.24.
I have tried to set the output directory into
I will split my single VCF file into multiple VCF files and try running it again.
```
$ hdfs dfs -ls -d
drwxr-xr-x - jmercier jmercier 4096 2018-08-27 17:28 .
$ hdfs dfs -ls -d data
drwxrwxrwx - jmercier jmercier 16384 2018-08-27 14:18 data
```
I have been unable to replicate such a problem. Does the error happen early or late in processing?
Thanks @heuermh,
I finally transformed the genotypes successfully with this command:
```
bin/adam-submit --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 4g \
    --executor-cores 3 \
    --verbose \
    --num-executors 10 \
    --conf spark.yarn.submit.waitAppCompletion=false \
    --conf spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console \
    -- transformGenotypes /user/jmercier/1000G/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf /user/jmercier/ALL.chr1.genotypes.adam
```
I do not know why the command worked this time; I need to do some more tests.
Thanks a lot for your help, it is a real pleasure.
Have a nice day