Could not retrieve index file for primary.bam #130

@eboileau

Description

I want to use the latest NanoSim, v3.0.1 (which supports reading .gz sequence files and BAM files), but read_analysis.py does not complete. The primary.bam file is not indexed, and that might be an issue for pysam, but there seem to be further problems; this might be related to #129.
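To tell the two index messages in the log apart (an index older than its data file versus no index at all), a small check like the one below may help. It is only a sketch using the standard library: the function name `index_status` is made up for illustration, and the two candidate index locations are the conventional `.bam.bai` and `.bai` names, assumed rather than taken from NanoSim.

```python
import os


def index_status(bam_path):
    """Return 'missing', 'stale', or 'ok' for the .bai index of bam_path.

    Hypothetical helper for diagnosis only; it is not part of NanoSim or pysam.
    """
    # Conventional index names: <name>.bam.bai and <name>.bai (assumption).
    candidates = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    existing = [p for p in candidates if os.path.exists(p)]
    if not existing:
        return "missing"
    # An index older than the BAM triggers the "older than the data file" warning.
    newest_index = max(os.path.getmtime(p) for p in existing)
    if newest_index < os.path.getmtime(bam_path):
        return "stale"
    return "ok"
```

Under these assumptions, `index_status("analysis/Nanopore.bam")` would presumably report `stale` and `index_status("analysis/nanosim_model/sim_primary.bam")` would report `missing`. Re-running `samtools index` on a coordinate-sorted BAM refreshes a stale index; whether that alone resolves the failure here is not clear.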

Error

2021-07-14 12:59:14: Processing alignment file: bam
[W::hts_idx_load3] The index file is older than the data file: analysis/Nanopore.bam.bai
2021-07-14 12:59:16: Aligned reads analysis
[E::idx_find_and_load] Could not retrieve index file for 'analysis/nanosim_model/sim_primary.bam'

and further down

2021-07-14 12:59:17: match and error models
[E::idx_find_and_load] Could not retrieve index file for 'analysis/nanosim_model/sim_primary.bam'
Traceback (most recent call last):
  File "/beegfs/homes/eboileau/.miniconda3/envs/scNapBar-dev/bin/besthit_to_histogram.py", line 318, in hist
    cs_string = alnm.get_tag('cs')
  File "pysam/libcalignedsegment.pyx", line 2399, in pysam.libcalignedsegment.AlignedSegment.get_tag
  File "pysam/libcalignedsegment.pyx", line 2438, in pysam.libcalignedsegment.AlignedSegment.get_tag
KeyError: "tag 'cs' not present"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eboileau/.miniconda3/envs/scNapBar-dev/bin/read_analysis.py", line 720, in <module>
    main()
  File "/home/eboileau/.miniconda3/envs/scNapBar-dev/bin/read_analysis.py", line 710, in main
    error_model.hist(prefix, alnm_ext)
  File "/beegfs/homes/eboileau/.miniconda3/envs/scNapBar-dev/bin/besthit_to_histogram.py", line 320, in hist
    cs_string = get_cs(alnm.original_sam_line.split()[5], alnm.get_tag('MD'))
AttributeError: 'pysam.libcalignedsegment.AlignedSegment' object has no attribute 'original_sam_line'
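The second traceback suggests that the fallback path still expects an object carrying a raw SAM line, while pysam's AlignedSegment exposes the CIGAR through its `cigarstring` attribute instead. The sketch below shows how the two inputs of `get_cs` could be read from a pysam-style record. The helper name `cigar_and_md` is hypothetical, and this is not the maintainers' fix, only an illustration of the mismatch.

```python
def cigar_and_md(alnm):
    """Return (CIGAR string, MD tag value) from a pysam-style aligned segment.

    Hypothetical helper: it relies only on has_tag, get_tag and cigarstring,
    which pysam.AlignedSegment provides, and avoids original_sam_line.
    """
    if not alnm.has_tag("MD"):
        # Without 'cs' or 'MD' the mismatch positions cannot be recovered.
        raise ValueError("alignment has no 'MD' tag")
    return alnm.cigarstring, alnm.get_tag("MD")
```

With such a helper, line 320 of besthit_to_histogram.py could in principle call `get_cs(*cigar_and_md(alnm))`, assuming `get_cs` accepts a CIGAR string and an MD string as the traceback implies.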

Expected behavior

read_analysis.py completes successfully, and generates the model to be used as input for simulator.py.

To reproduce

read_analysis.py genome -i Nanopore.fq.gz -ga Nanopore.bam -o nanosim_model/sim

This also occurs with uncompressed input files (I guess this is expected, since NanoSim now outputs compressed files anyway):

read_analysis.py genome -i Nanopore.fq -ga Nanopore.sam -o nanosim_model/sim

However, with NanoSim 2.5.0, the latter command succeeds.

Environment

Python 3.7.6
conda 4.9.2
NanoSim 3.0.1 (note that the version has not been updated in some scripts; e.g. read_analysis.py --version returns NanoSim 3.0.0, although I am using 3.0.1)
pysam 0.16.0.1 (samtools 1.10, using htslib 1.10.2)
