transformAlignments cannot repartition files #1808
Comments
What happens instead? What happens if you use |
I used repartition. The job still generated ~1200 partitions instead of 200. It's hard to tell the initial partition count from a BAM file, so I guess one would estimate the partition count from the available resources, then call repartition or coalesce accordingly?
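The idea in the comment above (derive a target partition count from the cluster resources, then pick repartition or coalesce) could be sketched roughly as follows. This is a plain Python helper and not part of ADAM or Spark: the name `plan_partitioning`, the cores-per-executor value, and the tasks-per-core factor are all assumptions for illustration. The only number taken from the thread is the 8 executors and the ~1200 starting partitions.

```python
def plan_partitioning(current_partitions, num_executors,
                      cores_per_executor, tasks_per_core=3):
    """Suggest a target partition count and how to reach it.

    Hypothetical helper: tasks_per_core=3 follows the common rule of
    thumb of a few tasks per CPU core; tune it for the workload.
    """
    target = max(1, num_executors * cores_per_executor * tasks_per_core)
    if target < current_partitions:
        # Shrinking the partition count can be done without a full shuffle.
        method = "coalesce"
    elif target > current_partitions:
        # Growing the partition count requires a shuffle.
        method = "repartition"
    else:
        method = "none"
    return target, method


# 8 executors as in the thread; 4 cores per executor is an assumed value.
target, method = plan_partitioning(current_partitions=1200,
                                   num_executors=8,
                                   cores_per_executor=4)
print(target, method)
```

With these assumed inputs the helper suggests shrinking via coalesce, which in Spark avoids the full shuffle that repartition performs.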
Can you send me the application ID offline? I'd like to take a look.
You should get roughly |
Sent, @fnothaft! That would explain the large number of partitions then, given the file was ~120GB.
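As a rough illustration of why a ~120GB file ends up with on the order of a thousand partitions: if the input is read as one partition per input split, the count is approximately the file size divided by the split size. The sketch below assumes a 128 MiB split size, which is a common HDFS block size and not a value confirmed in this thread; the function name is hypothetical.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_input_partitions(file_size_bytes, split_size_bytes=128 * MIB):
    """Estimate the number of input partitions as ceil(size / split size).

    Assumption: one partition per input split, with splits of roughly
    equal size. The real split size depends on the cluster configuration.
    """
    return math.ceil(file_size_bytes / split_size_bytes)


# ~120GB file from the thread, treated here as 120 GiB.
print(estimate_input_partitions(120 * GIB))  # 960
```

Under these assumptions the estimate is in the same range as the ~1200 partitions reported above; a somewhat smaller split size would bring it closer.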
I think we debugged this locally. Closing. Please reopen @akmorrow13 if I was wrong.
Running either

~/ADAM/adam/bin/adam-submit --packages org.apache.parquet:parquet-avro:1.8.2 --master yarn-client --num-executors 8 --executor-memory 20g -- transformAlignments /data/platinum/NA12877_S1.bam /data/platinum/transformedAlignments/NA12877_S1.bam.adam -repartition 200

or

~/ADAM/adam/bin/adam-submit --master yarn-client --num-executors 8 --executor-memory 20g -- transformAlignments -repartition 200 /data/platinum/NA12877_S1.bam /data/platinum/transformedAlignments/NA12877_S1.bam.adam

does not repartition the output.