We start by fetching the chromosome sizes, a set of peaks, and some aligned reads:

In [None]:
!pip install bionumpy
!wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes
!wget -O - https://www.encodeproject.org/files/ENCFF843VHC/@@download/ENCFF843VHC.bed.gz | zcat | sort -k 1,1 -k2,2n > peaks.bed
!wget -O aligned_reads.bam https://www.encodeproject.org/files/ENCFF494VZW/@@download/ENCFF494VZW.bam

Reading a BED file is simple with BioNumPy. All data is stored in NumPy arrays, so computing, for example, the average peak length is efficient and as simple as:

In [None]:
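import bionumpy as bnp
import numpy as np
# The genome object is built from the chromosome sizes file
genome = bnp.Genome.from_file("hg38.chrom.sizes")
# Read all peaks into memory as genomic intervals
peaks = genome.read_intervals("peaks.bed")
# Peak length is stop - start; average over all peaks
print(np.mean(peaks.stop - peaks.start))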
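
To make the one-liner above concrete, here is a minimal sketch of the same calculation in plain Python, without BioNumPy or NumPy. The three BED records and the helper name `interval_lengths` are made up for illustration. BED coordinates are 0-based and half-open, so the length of an interval is simply stop minus start.

```python
import io
from statistics import mean

# Hypothetical three-record BED snippet: chrom, start, stop (tab-separated)
bed_text = "chr1\t100\t180\nchr1\t250\t400\nchr2\t900\t1000\n"


def interval_lengths(handle):
    """Yield stop - start for every BED record in the handle."""
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, stop = line.split("\t")[:3]
        yield int(stop) - int(start)


lengths = list(interval_lengths(io.StringIO(bed_text)))
mean_length = mean(lengths)
print(lengths, mean_length)
```

The BioNumPy version does the subtraction on whole arrays at once instead of looping record by record, which is where the efficiency comes from.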

In the newest version of BioNumPy we have added a very simple way of dealing with large files that don't fit into memory. These can be "streamed" by adding stream=True. BioNumPy will then read only one chromosome at a time and handle the bookkeeping for you:

In [None]:
# Read peaks and alignments in "stream" mode
# This means BioNumPy will not read anything into memory
# until we call compute at the end
peaks = genome.read_intervals("peaks.bed", stream=True)
alignments = genome.read_intervals("aligned_reads.bam", stream=True)

# Create a pileup of the alignments
alignment_pileup = alignments.get_pileup()

# Fetch pileup values inside peaks by indexing the pileup with the peak intervals
peaks_pileup = alignment_pileup[peaks]
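
For readers who want to see what a pileup and a per-chromosome stream amount to, the following is a small plain-Python sketch of the idea. It is not BioNumPy's implementation; the toy reads, peaks, chromosome sizes and the helper name `pileup` are all hypothetical. Coverage is built with a difference array and a cumulative sum, and `itertools.groupby` stands in for processing one chromosome at a time.

```python
from itertools import accumulate, groupby


def pileup(reads, chrom_size):
    """Per-base read coverage for one chromosome.

    reads: iterable of (start, stop) pairs, 0-based and half-open.
    """
    diffs = [0] * (chrom_size + 1)
    for start, stop in reads:
        diffs[start] += 1  # coverage goes up where a read begins
        diffs[stop] -= 1   # and down just past where it ends
    # The cumulative sum of the differences is the coverage at each position
    return list(accumulate(diffs))[:chrom_size]


# Hypothetical toy data; reads must be sorted by chromosome for groupby
chrom_sizes = {"chr1": 10, "chr2": 8}
reads = [("chr1", 0, 4), ("chr1", 2, 6), ("chr1", 5, 9), ("chr2", 1, 5)]
peaks = {"chr1": [(1, 5), (4, 8)], "chr2": [(0, 3)]}

peak_values = {}
# Only one chromosome's coverage vector is held in memory at a time
for chrom, group in groupby(reads, key=lambda r: r[0]):
    coverage = pileup(((s, e) for _, s, e in group), chrom_sizes[chrom])
    # Slice the coverage vector at each peak on this chromosome
    peak_values[chrom] = [coverage[s:e] for s, e in peaks.get(chrom, [])]

print(peak_values)
```

Because each chromosome is handled independently, memory use is bounded by the largest chromosome rather than by the size of the whole alignment file.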