<div style="color:rgb(254, 184, 60);font-size:40px">Pyranges
<span style="color:black;font-size:20px"> v0.0.52</span></div>
<hr>


<div style = "content: ''; display: table; clear: both">
    <div style = "float:left; width: 50%">
            <div style="color:rgb(240, 184, 60);font-size:30px; font-weight: bold">What is Pyranges?</div>
             <p style = "font-size:20px;" > Efficient comparison of genomic <br>
                                            intervals in Python</p>
             <div style = "font-size:20px; margin-right:100px">
                <p>Examples of operations:</p>
                <ul>
                    <li>Overlap</li>
                    <li>Intersect</li>
                    <li>Sort</li>
                    <li>Subtract</li>
                    <li>Others...</li>
                </ul>
            </div>
            <div style = "color:rgb(240, 184, 60);font-size:30px; font-weight: bold">Advantages</div>
            <div style = "float:left;font-size:20px;">
              <ul>
                  <li> faster than comparable tools (also in single-core mode) </li>
                  <li> supports multiple cores </li>
                  <li> memory-efficient </li>
                  <li> reads multiple formats: BED, GTF, BAM </li>
                  <li> no dependence on external non-Python tools </li>
                  <li> uses Pandas DataFrames </li>
                </ul>
            </div>
    </div>
    <div style = "float:left; width: 50%">
        <div style = "float:left; padding-bottom:200px; border: 1px solid white;">
           <div style = "padding-bottom:40px">
            <div style="color:rgb(240, 184, 60);font-size:30px; font-weight: bold; margin-left:35px">Other tools</div>
            <div style = "font-size:20px;">
                <ul>
                    <li><b>GenomicRanges</b> (R)</li>
                    <li><b>Bedtools</b> &rarr; <b>pybedtools</b> (wrapper for Python)</li>
                    <li><b>Bedops</b></li>
                </ul>
            </div>
            </div>
            <div style="border:1px solid black;padding-bottom:10px">
                <img style="display:block" src="statistics.png" alt="intersection" width="100%" height="100%">
                <p style="padding-left:15px; font-size:10px;">Stovner EB, Sætrom P. (2019)</p>
            </div>
        </div>
    </div>
 </div>

In [15]:
import os
import pandas as pd
import pyranges as pr
import pybedtools as pbt

In [2]:
# Create your own PyRanges object
pr.PyRanges(chromosomes="chr1", starts=(1, 5), ends=[3, 149],
            strands=("+", "-"), int64=True)

+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         3 | +            |
| chr1         |         5 |       149 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

In [16]:
# Create a PyRanges object from a pandas DataFrame
A_df = pd.read_csv('A.bed',
                   sep='\t',
                   header=None,
                   names=["Chromosome", "Start", "End",
                          "Name", "Score", "Strand"])
A_pr = pr.PyRanges(A_df)
print(A_pr)

+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         1 |       100 | feature1   |         0 | +            |
| chr1         |       100 |       200 | feature2   |         0 | +            |
| chr1         |       200 |       500 | feature3   |         0 | -            |
| chr2         |       100 |       140 | feature1   |         0 | +            |
| chr2         |       100 |       200 | feature2   |         0 | +            |
| chr2         |       200 |       300 | feature3   |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 6 rows and 6 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
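The `names` list passed to `pd.read_csv` above follows the six-column BED layout. As a rough illustration of what that file holds, here is a standard-library-only sketch that parses the same columns. The text in `A_BED_TEXT` is an assumption reconstructed from the table printed above (the real `A.bed` may order its rows differently), and `parse_bed6` is a hypothetical helper, not part of pyranges or pandas.

```python
import csv
import io

# Assumed contents of A.bed, reconstructed from the printed table;
# the actual file may list the rows in a different order.
A_BED_TEXT = (
    "chr1\t1\t100\tfeature1\t0\t+\n"
    "chr1\t100\t200\tfeature2\t0\t+\n"
    "chr1\t200\t500\tfeature3\t0\t-\n"
    "chr2\t100\t140\tfeature1\t0\t+\n"
    "chr2\t100\t200\tfeature2\t0\t+\n"
    "chr2\t200\t300\tfeature3\t0\t-\n"
)

# Same column names as in the read_csv call above
COLUMNS = ["Chromosome", "Start", "End", "Name", "Score", "Strand"]


def parse_bed6(text):
    """Parse tab-separated six-column BED text into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        row = dict(zip(COLUMNS, fields))
        # Coordinates and score are integers; everything else stays a string
        for key in ("Start", "End", "Score"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


a_rows = parse_bed6(A_BED_TEXT)
print(len(a_rows), a_rows[0])
```

The file has no header row, which is why `header=None` and explicit `names` are needed in the pandas call.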

In [17]:
# Create a PyRanges object directly from a file
B_pr = pr.read_bed("B.bed")
print(B_pr)

+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |       210 |       300 | feature3   |         0 | +            |
| chr1         |         1 |        84 | feature1   |         0 | -            |
| chr1         |       150 |       183 | feature2   |         0 | -            |
| chr2         |        63 |       140 | feature1   |         0 | +            |
| chr2         |       155 |       190 | feature2   |         0 | -            |
| chr2         |       280 |       310 | feature3   |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 6 rows and 6 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

In [20]:
# Load the same two BED files as pybedtools BedTool objects
A_pbt = pbt.BedTool('A.bed')
B_pbt = pbt.BedTool('B.bed')

In [21]:
%timeit -n 100 A_pbt.intersect(B_pbt)
df_pbt = A_pbt.intersect(B_pbt)
print(df_pbt)

100 loops, best of 5: 39.8 ms per loop
chr1	1	84	feature1	0	+
chr1	150	183	feature2	0	+
chr1	210	300	feature3	0	-
chr2	100	140	feature1	0	+
chr2	100	140	feature2	0	+
chr2	155	190	feature2	0	+
chr2	280	300	feature3	0	-



In [22]:
%timeit -n 100 A_pr.intersect(B_pr)
df_pr = A_pr.intersect(B_pr)
print(df_pr)

100 loops, best of 5: 62.3 ms per loop
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         1 |        84 | feature1   |         0 | +            |
| chr1         |       150 |       183 | feature2   |         0 | +            |
| chr1         |       210 |       300 | feature3   |         0 | -            |
| chr2         |       100 |       140 | feature1   |         0 | +            |
| chr2         |       100 |       140 | feature2   |         0 | +            |
| chr2         |       155 |       190 | feature2   |         0 | +            |
| chr2         |       280 |       300 | feature3   |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
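To make the result above easier to read, here is a minimal pure-Python sketch of what an unstranded intersect computes: every pair of overlapping A and B intervals on the same chromosome yields the A row clipped to the shared region. It is a naive all-pairs scan written for illustration only, not how pyranges or bedtools implement the operation. It assumes half-open coordinates, and `Interval` and `intersect` are hypothetical names. The input values are copied from the two tables printed earlier.

```python
from collections import namedtuple

Interval = namedtuple("Interval", "chrom start end name score strand")

# Rows of A and B as printed above
A = [
    Interval("chr1", 1, 100, "feature1", 0, "+"),
    Interval("chr1", 100, 200, "feature2", 0, "+"),
    Interval("chr1", 200, 500, "feature3", 0, "-"),
    Interval("chr2", 100, 140, "feature1", 0, "+"),
    Interval("chr2", 100, 200, "feature2", 0, "+"),
    Interval("chr2", 200, 300, "feature3", 0, "-"),
]
B = [
    Interval("chr1", 210, 300, "feature3", 0, "+"),
    Interval("chr1", 1, 84, "feature1", 0, "-"),
    Interval("chr1", 150, 183, "feature2", 0, "-"),
    Interval("chr2", 63, 140, "feature1", 0, "+"),
    Interval("chr2", 155, 190, "feature2", 0, "-"),
    Interval("chr2", 280, 310, "feature3", 0, "-"),
]


def intersect(a_intervals, b_intervals):
    """Naive O(len(a) * len(b)) intersect that ignores strand."""
    out = []
    for a in a_intervals:
        for b in b_intervals:
            same_chrom = a.chrom == b.chrom
            # Half-open intervals overlap when each starts before the other ends
            overlaps = a.start < b.end and b.start < a.end
            if same_chrom and overlaps:
                # Keep A's metadata, clip the coordinates to the shared region
                out.append(a._replace(start=max(a.start, b.start),
                                      end=min(a.end, b.end)))
    return out


result = intersect(A, B)
for iv in result:
    print(*iv, sep="\t")
```

Note that the name, score and strand come from A, which is why `feature1` stays on the `+` strand even though the overlapping B interval is on `-`.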

In [2]:
gtf_pr = pr.read_gtf('gencode.v36.annotation.gtf')
bed_pr = pr.read_bed('random.bed')
print(gtf_pr)

+--------------+------------+------------+-----------+-----------+------------+--------------+------------+-------+
| Chromosome   | Source     | Feature    | Start     | End       | Score      | Strand       | Frame      | +17   |
| (category)   | (object)   | (object)   | (int32)   | (int32)   | (object)   | (category)   | (object)   | ...   |
|--------------+------------+------------+-----------+-----------+------------+--------------+------------+-------|
| chr1         | HAVANA     | gene       | 11868     | 14409     | .          | +            | .          | ...   |
| chr1         | HAVANA     | transcript | 11868     | 14409     | .          | +            | .          | ...   |
| chr1         | HAVANA     | exon       | 11868     | 12227     | .          | +            | .          | ...   |
| chr1         | HAVANA     | exon       | 12612     | 12721     | .          | +            | .          | ...   |
| ...          | ...        | ...        | ...       | ...       | ...        | ...          | ...        | ...   |

In [12]:
gtf_pr["chr1", 0:12000]

+--------------+------------+------------+-----------+-----------+------------+--------------+------------+-------+
| Chromosome   | Source     | Feature    |     Start |       End | Score      | Strand       | Frame      | +17   |
| (category)   | (object)   | (object)   |   (int32) |   (int32) | (object)   | (category)   | (object)   | ...   |
|--------------+------------+------------+-----------+-----------+------------+--------------+------------+-------|
| chr1         | HAVANA     | gene       |     11868 |     14409 | .          | +            | .          | ...   |
| chr1         | HAVANA     | transcript |     11868 |     14409 | .          | +            | .          | ...   |
| chr1         | HAVANA     | exon       |     11868 |     12227 | .          | +            | .          | ...   |
+--------------+------------+------------+-----------+-----------+------------+--------------+------------+-------+
Stranded PyRanges object has 3 rows and 25 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
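The slice above keeps the rows on `chr1` that overlap the region 0 to 12000. A small pure-Python sketch of that selection follows, assuming half-open coordinates and using the four rows visible in the printed GTF table. `select_region` is a hypothetical helper for illustration, not a pyranges function.

```python
# (chromosome, feature, start, end) for the first rows of the table above
rows = [
    ("chr1", "gene", 11868, 14409),
    ("chr1", "transcript", 11868, 14409),
    ("chr1", "exon", 11868, 12227),
    ("chr1", "exon", 12612, 12721),
]


def select_region(intervals, chrom, start, end):
    """Keep intervals on `chrom` that overlap the half-open region [start, end)."""
    return [
        iv for iv in intervals
        if iv[0] == chrom and iv[2] < end and iv[3] > start
    ]


selected = select_region(rows, "chr1", 0, 12000)
for iv in selected:
    print(*iv, sep="\t")
```

The second exon starts at 12612, past the end of the region, so only the gene, the transcript and the first exon are kept, matching the three rows shown.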

In [9]:
gtf_pbt = pbt.BedTool('gencode.v36.annotation.gtf')
bed_pbt = pbt.BedTool('random.bed')

In [11]:
# Keep chr1 features that start at or before position 12000
filtered = gtf_pbt.filter(lambda feature: feature.chrom == 'chr1' and
                                          feature.start <= 12000)
print(filtered)

chr1	HAVANA	gene	11869	14409	.	+	.	gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2";
chr1	HAVANA	transcript	11869	14409	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	HAVANA	exon	11869	12227	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_tra

In [20]:
%timeit -n 20 gtf_pr.intersect(bed_pr)

20 loops, best of 5: 5.91 s per loop


In [13]:
%timeit -n 20 gtf_pbt.intersect(bed_pbt)

20 loops, best of 5: 10.8 s per loop
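The two pairs of timings can be put side by side with a little arithmetic. The sketch below uses only the numbers reported in the cells above. It also shows one way to approximate a "best of 5" measurement with the standard-library `timeit` module, on the assumption that the `%timeit` output here reports the fastest of 5 repeats. `best_per_loop` is a hypothetical helper name.

```python
import timeit


def best_per_loop(func, number=20, repeat=5):
    """Fastest average time per call over `repeat` runs of `number` calls."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


# Demonstrate the helper on a trivial workload (timing varies by machine)
demo_seconds = best_per_loop(lambda: sum(range(1000)))

# Timings reported above, in seconds per loop
small_pyranges, small_pybedtools = 62.3e-3, 39.8e-3   # A.bed vs B.bed
large_pyranges, large_pybedtools = 5.91, 10.8         # GENCODE GTF vs random.bed

small_ratio = small_pyranges / small_pybedtools
large_ratio = large_pybedtools / large_pyranges

print(f"small files: pyranges took {small_ratio:.2f}x as long as pybedtools")
print(f"large files: pybedtools took {large_ratio:.2f}x as long as pyranges")
```

On the six-row BED files pybedtools is ahead, while on the full annotation pyranges is faster, so the comparison depends on input size.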
