SAM file generated by lra CONTIG mode is truncated and cannot be compressed by samtools #41

@Milia1368

I'm encountering an issue with the CONTIG mode of lra: the SAM file produced by lra align cannot be processed (compressed/sorted) by samtools, as samtools reports the file as truncated. Below are the detailed reproduction steps and error logs.

Data Preparation

  1. Source of haplotype1.fasta:
    The original assembly data is from the public S3 bucket:
    aws s3 ls --no-sign-request s3://ont-open-data/londoncalling2024/assembly/assm/assembly.fasta
  2. Sequence Extraction:
    I extracted the sequences whose names contain "haplotype1" and "haplotype2" from the above FASTA file and saved them as $HAP1 (haplotype1.fasta) and $HAP2, respectively.
  3. Reference Genome:
    I used chm13v2.0.fa as the reference genome ($REF), located at /io/chm13/chm13v2.0.fa.
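
For reproducibility, the extraction in step 2 can be sketched roughly as follows. This is only an illustration, not necessarily the exact command used: the helper name extract_by_header is hypothetical, and it assumes the haplotype label appears in the FASTA header line.

```shell
# Hypothetical helper: print every FASTA record whose header line
# contains the given substring (plain substring match, not a regex).
extract_by_header() {
    pattern="$1"
    fasta="$2"
    awk -v pat="$pattern" '
        /^>/ { keep = (index($0, pat) > 0) }   # decide once per record, at its header
        keep                                   # print header and sequence lines while keep is set
    ' "$fasta"
}
```

Example usage, assuming the assembly was downloaded as assembly.fasta: `extract_by_header haplotype1 assembly.fasta > haplotype1.fasta`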

Commands Executed & Corresponding Outputs

1. Run lra align (CONTIG mode)

Command:

conda run -n lra_env lra align -CONTIG /io/chm13/chm13v2.0.fa /io/haplotype1.fasta -t 8 -p s > ./lra_hap1.sam

lra Output Log:

lra aligned 3 from 10, 820M bases (291.8s).
lra aligned 5 from 12, 1075M bases (204.1s).
lra aligned 6 from 13, 1267M bases (395s).
lra aligned 7 from 14, 1401M bases (198.3s).
lra aligned 8 from 15, 1505M bases (255.5s).
lra aligned 9 from 16, 1605M bases (53.39s).
lra aligned 10 from 17, 1752M bases (436.3s).
lra aligned 11 from 18, 1953M bases (309.9s).
lra aligned 13 from 20, 2156M bases (740.9s).
lra aligned 14 from 21, 2400M bases (261.7s).
lra aligned 16 from 24, 2728M bases (674.6s).
lra aligned 17 from 25, 2859M bases (353.1s).

2. Attempt to compress SAM with samtools

Command:

samtools sort -m2G -@4 -o ./lra_hap1.bam ./lra_hap1.sam

samtools Error Output:

samtools sort: truncated file. Aborting

Additional Notes

  • The lra command completed without explicit error messages, only the progress logs shown above.
  • I confirmed that the output path is writable and that there is enough disk space (no write failures).
  • The same samtools command works normally with SAM files generated by other alignment tools (e.g., minimap2).
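
To narrow down where the file breaks, a check along these lines could be run on the SAM output. It is a minimal sketch using only standard shell tools: the function name check_sam is hypothetical, and it assumes that the "truncated file" error comes either from a final record that was cut off or from alignment lines with fewer than the 11 mandatory SAM fields.

```shell
# Hypothetical helper: report basic integrity problems in a SAM file.
# Returns 0 if no problem was found, 1 otherwise.
check_sam() {
    sam="$1"
    status=0

    # A complete SAM file ends with a newline; a different last byte
    # suggests the writer stopped in the middle of a record.
    last=$(tail -c 1 "$sam" | od -An -c | tr -d ' ')
    if [ -s "$sam" ] && [ "$last" != '\n' ]; then
        echo "no trailing newline: last record may be cut off"
        status=1
    fi

    # Every non-header line must have at least 11 tab-separated fields.
    awk -F '\t' '
        !/^@/ && NF < 11 { printf "line %d: %d fields\n", NR, NF; bad++ }
        END { exit (bad > 0) }
    ' "$sam" || status=1

    return "$status"
}
```

Example usage: `check_sam ./lra_hap1.sam`. Any reported line numbers would show whether the damage is limited to the end of the file or also affects records in the middle.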

Questions

  1. Could the truncated SAM file be caused by incomplete output from lra CONTIG mode?
  2. Are there any additional parameters or adjustments needed for lra CONTIG mode to generate valid SAM files compatible with samtools?

Thank you for your help!
