# pyJASPAR Notebook

Once you have installed pyJASPAR, you can load the module and connect to the latest release of JASPAR.

In [1]:
from pyjaspar import jaspardb

Connect to the JASPAR release you're interested in. This returns a `jaspardb` class object.
For example, here we connect to JASPAR2024.

In [2]:
jdb_obj = jaspardb(release='JASPAR2024')

You can also check which JASPAR release you are connected to:

In [82]:
print(jdb_obj.release)

JASPAR2024


By default, `jaspardb()` connects to the latest release of the JASPAR database. For example:

In [4]:
jdb_obj = jaspardb()
print(jdb_obj.release)

JASPAR2024


### Get available releases
You can list the available JASPAR releases with `get_releases()`:

In [5]:
print(jdb_obj.get_releases())

['JASPAR2024', 'JASPAR2022', 'JASPAR2020', 'JASPAR2018', 'JASPAR2016', 'JASPAR2014']
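The release names above follow a `JASPAR` + year pattern. As a small sketch, assuming that naming pattern always holds, the hypothetical helper `latest_release` below picks the newest release from such a list (pyJASPAR already does this for you when you call `jaspardb()` without arguments):

```python
def latest_release(releases):
    """Return the release with the highest year suffix (assumes names like 'JASPAR2024')."""
    prefix = "JASPAR"
    # Compare the numeric year after the prefix, not the raw string
    return max(releases, key=lambda name: int(name[len(prefix):]))


releases = ['JASPAR2024', 'JASPAR2022', 'JASPAR2020', 'JASPAR2018', 'JASPAR2016', 'JASPAR2014']
print(latest_release(releases))  # JASPAR2024
```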


### Get motif by using JASPAR ID
To get the details of a specific TF motif, pass its JASPAR matrix ID to `fetch_motif_by_id`. If you omit the version (e.g. `MA0006` instead of `MA0006.1`), the latest version of the motif is returned.
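A JASPAR matrix ID such as `MA0006.1` consists of a base ID (`MA0006`) and a version number. The hypothetical helper below splits an ID into those two parts and returns `None` for the version when it is omitted; it only illustrates the ID convention and is not part of pyJASPAR.

```python
def split_matrix_id(matrix_id):
    """Split 'MA0006.1' into ('MA0006', 1); return (base_id, None) if no version is given."""
    base_id, sep, version = matrix_id.partition(".")
    return base_id, (int(version) if sep else None)


print(split_matrix_id("MA0006.1"))  # ('MA0006', 1)
print(split_matrix_id("MA0006"))    # ('MA0006', None)
```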

In [6]:
motif = jdb_obj.fetch_motif_by_id('MA0006.1')

Printing the motif (`print(motif)`) shows all the associated meta-information stored in the JASPAR database, including the matrix counts. You can also collect individual attributes yourself:

In [51]:
{
    "name": motif.name,
    "matrix_id": f"{motif.base_id}.{motif.version}",
    "collection": motif.collection,
    "acc": motif.acc,
    "medline": motif.medline,
    "pazar_id": motif.pazar_id,
    "data_type": motif.data_type,
    "species": motif.species,
    "tf_family": motif.tf_family,
    "tax_group": motif.tax_group,
    "length": motif.length,
    "tf_class": motif.tf_class,
    "pwm": motif.pwm,
    "comment": motif.comment,
}

{'name': 'Ahr::Arnt',
 'matrix_id': 'MA0006.1',
 'collection': 'CORE',
 'acc': ['P30561', 'P53762'],
 'medline': '7592839',
 'pazar_id': None,
 'data_type': 'SELEX',
 'species': ['10090'],
 'tf_family': ['PAS domain factors', 'PAS domain factors'],
 'tax_group': 'vertebrates',
 'length': 6,
 'tf_class': ['Basic helix-loop-helix factors (bHLH)',
  'Basic helix-loop-helix factors (bHLH)'],
 'pwm': {'A': (0.125, 0.0, 0.0, 0.0, 0.0, 0.0),
  'C': (0.3333333333333333, 0.0, 0.9583333333333334, 0.0, 0.0, 0.0),
  'G': (0.08333333333333333,
   0.9583333333333334,
   0.0,
   0.9583333333333334,
   0.0,
   1.0),
  'T': (0.4583333333333333,
   0.041666666666666664,
   0.041666666666666664,
   0.041666666666666664,
   1.0,
   0.0)},
 'comment': 'dimer'}
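The per-position probabilities from `motif.pwm` shown above can be read as a consensus sequence by taking the most probable base at each position. A minimal sketch, assuming the PWM is a mapping from each base to a sequence of probabilities as printed above (the motif objects come from Biopython, which also offers a `consensus` property; the helper here is a hypothetical stand-alone version):

```python
# PWM of MA0006.1 (Ahr::Arnt), copied from the output above
pwm = {
    'A': (0.125, 0.0, 0.0, 0.0, 0.0, 0.0),
    'C': (0.3333333333333333, 0.0, 0.9583333333333334, 0.0, 0.0, 0.0),
    'G': (0.08333333333333333, 0.9583333333333334, 0.0, 0.9583333333333334, 0.0, 1.0),
    'T': (0.4583333333333333, 0.041666666666666664, 0.041666666666666664,
          0.041666666666666664, 1.0, 0.0),
}


def consensus(pwm):
    """Return the highest-probability base at each position (ties go to the first base in sorted order)."""
    bases = sorted(pwm)
    length = len(pwm[bases[0]])
    return "".join(max(bases, key=lambda b: pwm[b][i]) for i in range(length))


print(consensus(pwm))  # TGCGTG
```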

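To see all of the motif's attributes at once, convert it to a dictionary with `vars()`. Note that this cell and the next two were executed after the `for motif in motifs:` loop further down had reassigned `motif`, so their outputs show MA1722.2 (ZSCAN31) rather than MA0006.1.

In [83]: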
motif_dict = vars(motif)
motif_dict

{'name': 'ZSCAN31',
 'alignment': None,
 'counts': {'A': [1041.0,
   51.0,
   6693.0,
   3.0,
   6722.0,
   7143.0,
   38.0,
   42.0,
   11.0,
   9.0,
   51.0,
   28.0,
   37.0,
   22.0,
   28.0,
   1811.0,
   1651.0,
   640.0],
  'C': [108.0,
   6741.0,
   1563.0,
   1.0,
   2.0,
   6.0,
   6451.0,
   11.0,
   80.0,
   7141.0,
   7066.0,
   7073.0,
   438.0,
   0.0,
   7114.0,
   173.0,
   1112.0,
   6344.0],
  'G': [6160.0,
   152.0,
   12.0,
   0.0,
   1.0,
   70.0,
   3.0,
   960.0,
   7077.0,
   9.0,
   14.0,
   122.0,
   26.0,
   7143.0,
   3.0,
   4197.0,
   3084.0,
   457.0],
  'T': [1529.0,
   1353.0,
   4.0,
   7148.0,
   1660.0,
   0.0,
   3376.0,
   6181.0,
   90.0,
   22.0,
   291.0,
   69.0,
   6989.0,
   10.0,
   120.0,
   2569.0,
   3109.0,
   990.0]},
 'length': 18,
 'alphabet': 'ACGT',
 '_pseudocounts': {'A': 0.0, 'C': 0.0, 'G': 0.0, 'T': 0.0},
 '_background': {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25},
 '_Motif__mask': (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
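`vars()` also returns private attributes, i.e. names starting with an underscore such as `_pseudocounts` and `_Motif__mask`. A small sketch that keeps only the public ones, shown on a `types.SimpleNamespace` stand-in rather than a real motif object (the helper `public_attributes` is hypothetical):

```python
from types import SimpleNamespace


def public_attributes(obj):
    """Return vars(obj) without attributes whose names start with an underscore."""
    return {key: value for key, value in vars(obj).items() if not key.startswith("_")}


# Stand-in carrying a few of the attribute names seen in the output above
stand_in = SimpleNamespace(name="ZSCAN31", length=18, alphabet="ACGT",
                           _background={'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25})
print(public_attributes(stand_in))  # {'name': 'ZSCAN31', 'length': 18, 'alphabet': 'ACGT'}
```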

Get the count matrix with `.counts` and the position weight matrix (per-position base frequencies) with `.pwm`:

In [90]:
print(motif.counts)
print(motif.pwm)

        0      1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16     17
A: 1041.00  51.00 6693.00   3.00 6722.00 7143.00  38.00  42.00  11.00   9.00  51.00  28.00  37.00  22.00  28.00 1811.00 1651.00 640.00
C: 108.00 6741.00 1563.00   1.00   2.00   6.00 6451.00  11.00  80.00 7141.00 7066.00 7073.00 438.00   0.00 7114.00 173.00 1112.00 6344.00
G: 6160.00 152.00  12.00   0.00   1.00  70.00   3.00 960.00 7077.00   9.00  14.00 122.00  26.00 7143.00   3.00 4197.00 3084.00 457.00
T: 1529.00 1353.00   4.00 7148.00 1660.00   0.00 3376.00 6181.00  90.00  22.00 291.00  69.00 6989.00  10.00 120.00 2569.00 3109.00 990.00

        0      1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16     17
A:   0.12   0.01   0.81   0.00   0.80   0.99   0.00   0.01   0.00   0.00   0.01   0.00   0.00   0.00   0.00   0.21   0.18   0.08
C:   0.01   0.81   0.19   0.00   0.00   0.00   0.65   0.00   0.01  
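Each `.pwm` column is the corresponding `.counts` column divided by its total. The sketch below reproduces that normalization on the first three columns of the count matrix above, with an optional pseudocount added to every cell before dividing. It is an illustration of the arithmetic, not pyJASPAR's or Biopython's own code, and `normalize` is a hypothetical name.

```python
# First three columns of the ZSCAN31 (MA1722.2) count matrix shown above
counts = {
    'A': [1041.0, 51.0, 6693.0],
    'C': [108.0, 6741.0, 1563.0],
    'G': [6160.0, 152.0, 12.0],
    'T': [1529.0, 1353.0, 4.0],
}


def normalize(counts, pseudocount=0.0):
    """Turn a count matrix into per-position probabilities (each column sums to 1)."""
    length = len(next(iter(counts.values())))
    pwm = {base: [] for base in counts}
    for i in range(length):
        # Column total, including the pseudocount added to every base
        total = sum(col[i] + pseudocount for col in counts.values())
        for base, col in counts.items():
            pwm[base].append((col[i] + pseudocount) / total)
    return pwm


pwm = normalize(counts)
print([round(p, 2) for p in pwm['A']])  # [0.12, 0.01, 0.81]
print([round(p, 2) for p in pwm['C']])  # [0.01, 0.81, 0.19]
```

The rounded values match the first three columns of the `A` and `C` rows printed by `motif.pwm`. A pseudocount (e.g. `normalize(counts, 0.8)`) avoids zero probabilities, which matters once you take logarithms.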

In [85]:
print(motif.format("jaspar"))

>MA1722.2 ZSCAN31
A [1041.00  51.00 6693.00   3.00 6722.00 7143.00  38.00  42.00  11.00   9.00  51.00  28.00  37.00  22.00  28.00 1811.00 1651.00 640.00]
C [108.00 6741.00 1563.00   1.00   2.00   6.00 6451.00  11.00  80.00 7141.00 7066.00 7073.00 438.00   0.00 7114.00 173.00 1112.00 6344.00]
G [6160.00 152.00  12.00   0.00   1.00  70.00   3.00 960.00 7077.00   9.00  14.00 122.00  26.00 7143.00   3.00 4197.00 3084.00 457.00]
T [1529.00 1353.00   4.00 7148.00 1660.00   0.00 3376.00 6181.00  90.00  22.00 291.00  69.00 6989.00  10.00 120.00 2569.00 3109.00 990.00]
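The JASPAR text format above is a `>` header line (matrix ID and name) followed by one bracketed row of counts per base. A minimal parser sketch for a single record, assuming exactly that layout; for real files, Biopython's `Bio.motifs.read(handle, "jaspar")` is the library route. The helper `parse_jaspar_record` is hypothetical, and the record below is the one above truncated to its first three columns.

```python
import re


def parse_jaspar_record(text):
    """Parse one JASPAR-format record into (matrix_id, name, counts)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = lines[0]
    if not header.startswith(">"):
        raise ValueError("expected a '>' header line")
    matrix_id, _, name = header[1:].partition(" ")
    counts = {}
    for line in lines[1:]:
        # A row looks like: "A [1041.00  51.00 6693.00]"
        match = re.fullmatch(r"([ACGT])\s*\[(.*)\]", line)
        if match is None:
            raise ValueError(f"unexpected row: {line!r}")
        counts[match.group(1)] = [float(x) for x in match.group(2).split()]
    return matrix_id, name.strip(), counts


record = """
>MA1722.2 ZSCAN31
A [1041.00  51.00 6693.00]
C [108.00 6741.00 1563.00]
G [6160.00 152.00  12.00]
T [1529.00 1353.00   4.00]
"""
matrix_id, name, counts = parse_jaspar_record(record)
print(matrix_id, name, counts['A'])  # MA1722.2 ZSCAN31 [1041.0, 51.0, 6693.0]
```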



### Search motifs by TF name
You can use the `fetch_motifs_by_name` function to find motifs by TF name. This method returns a list of motifs with the same TF name across taxonomic groups. For example, the search below returns four CTCF motifs, including one from insects and one from vertebrates.

In [75]:
motifs = jdb_obj.fetch_motifs_by_name("CTCF")

In [76]:
print(len(motifs))

4


In [77]:
print(motifs)

TF name	CTCF
Matrix ID	MA0531.2
Collection	CORE
TF class	['C2H2 zinc finger factors']
TF family	['More than 3 adjacent zinc fingers']
Species	7227
Taxonomic group	insects
Accession	['Q9VS55']
Data type used	ChIP-chip
Medline	17616980
Matrix:
        0      1      2      3      4      5      6      7      8      9
A: 257.00 1534.00 202.00 987.00   2.00   0.00   2.00 124.00   1.00  79.00
C: 714.00   1.00   0.00   0.00   4.00   0.00   0.00 1645.00   0.00 1514.00
G:  87.00 192.00 1700.00 912.00 311.00 1902.00 1652.00   3.00 1807.00   8.00
T: 844.00 175.00   0.00   3.00 1585.00   0.00 248.00 130.00  94.00 301.00



TF name	CTCF
Matrix ID	MA0139.2
Collection	CORE
TF class	['C2H2 zinc finger factors']
TF family	['More than 3 adjacent zinc fingers']
Species	9606
Taxonomic group	vertebrates
Accession	['P49711']
Data type used	ChIP-seq
Medline	17512414
Comments	TF has several motif variants.
Matrix:
        0      1      2      3      4      5      6      7      8      9     10     11     12    
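Each returned motif carries its taxonomic group (`tax_group`) and matrix ID (`matrix_id`), so you can organize the results of a name search by group. A sketch using `types.SimpleNamespace` stand-ins built from two of the CTCF motifs printed above, so it runs without a database; `group_by_tax_group` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_tax_group(motifs):
    """Map each taxonomic group to the matrix IDs of the motifs in it."""
    groups = defaultdict(list)
    for motif in motifs:
        groups[motif.tax_group].append(motif.matrix_id)
    return dict(groups)


# Stand-ins for two of the CTCF motifs printed above
ctcf_stand_ins = [
    SimpleNamespace(matrix_id="MA0531.2", tax_group="insects"),
    SimpleNamespace(matrix_id="MA0139.2", tax_group="vertebrates"),
]
print(group_by_tax_group(ctcf_stand_ins))
# {'insects': ['MA0531.2'], 'vertebrates': ['MA0139.2']}
```

With real results you would pass the list returned by `jdb_obj.fetch_motifs_by_name("CTCF")` instead of the stand-ins.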

### Search motifs with `fetch_motifs`
A more commonly used function, `fetch_motifs`, returns the motifs that match a specified set of criteria.
You can query the database on any of the meta-information it stores.

For example, here we get the widely used CORE collection for vertebrates. With `all_versions=False`, it returns a list of non-redundant motifs, i.e. only the latest version of each matrix.

In [78]:
motifs = jdb_obj.fetch_motifs(
    collection=['CORE'],
    tax_group=['Vertebrates'],
    all_versions=False,
)

In [79]:
print(len(motifs))

879
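With `all_versions=False`, only the latest version of each matrix is kept. The same idea can be sketched on plain matrix IDs: for each base ID, keep the highest version number. The IDs below are illustrative and `latest_versions` is a hypothetical helper, not how pyJASPAR implements the filter internally.

```python
def latest_versions(matrix_ids):
    """Keep only the highest version of each base ID, e.g. 'MA0139.2' over 'MA0139.1'."""
    latest = {}
    for matrix_id in matrix_ids:
        base_id, version = matrix_id.split(".")
        # Compare versions numerically so that .10 beats .9
        if base_id not in latest or int(version) > latest[base_id]:
            latest[base_id] = int(version)
    return sorted(f"{base_id}.{version}" for base_id, version in latest.items())


print(latest_versions(["MA0139.1", "MA0139.2", "MA0531.1", "MA0531.2", "MA0006.1"]))
# ['MA0006.1', 'MA0139.2', 'MA0531.2']
```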


In [80]:
for motif in motifs:
    # print(motif.matrix_id)
    pass  # do something with the motif
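As one example of something to do inside that loop, you could score each motif by its information content. A minimal sketch, assuming a uniform background over A, C, G and T and a PWM given as a mapping from base to per-position probabilities; `information_content` is a hypothetical helper and the two-position matrix is toy data.

```python
import math


def information_content(pwm):
    """Total information content in bits, assuming a uniform background (max log2(4) = 2 bits per position)."""
    bases = list(pwm)
    length = len(pwm[bases[0]])
    total = 0.0
    for i in range(length):
        # Shannon entropy of the column; terms with p = 0 are skipped (0 * log2(0) taken as 0)
        entropy = -sum(pwm[b][i] * math.log2(pwm[b][i]) for b in bases if pwm[b][i] > 0)
        total += math.log2(len(bases)) - entropy
    return total


# Toy matrix: one fully determined position (2 bits) and one uniform position (0 bits)
toy_pwm = {'A': [1.0, 0.25], 'C': [0.0, 0.25], 'G': [0.0, 0.25], 'T': [0.0, 0.25]}
print(information_content(toy_pwm))  # 2.0
```

Inside the loop you would then call `information_content(motif.pwm)`, assuming `motif.pwm` indexes like the dictionaries printed earlier.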

Count the non-redundant motifs in the CORE collection for each release:

In [81]:
for release in jdb_obj.get_releases():
    print(release)
    # Use a separate variable so jdb_obj keeps its original connection
    release_db = jaspardb(release=release)
    motifs = release_db.fetch_motifs(
        collection=["CORE"],
        all_versions=False,
        # species='10090',  # this is the mouse tax ID
    )
    print(len(motifs))

JASPAR2024
2346
JASPAR2022
1956
JASPAR2020
1646
JASPAR2018
1404
JASPAR2016
1082
JASPAR2014
593
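The per-release counts above can be turned into release-to-release growth with a little arithmetic. This sketch hard-codes the numbers printed by the loop, so it needs no database connection:

```python
# Non-redundant CORE motif counts per release, copied from the output above
core_counts = {
    "JASPAR2014": 593,
    "JASPAR2016": 1082,
    "JASPAR2018": 1404,
    "JASPAR2020": 1646,
    "JASPAR2022": 1956,
    "JASPAR2024": 2346,
}

releases = sorted(core_counts)  # 'JASPARYYYY' names sort chronologically
# Difference between each release and the one before it
growth = {
    new: core_counts[new] - core_counts[old]
    for old, new in zip(releases, releases[1:])
}
print(growth)
# {'JASPAR2016': 489, 'JASPAR2018': 322, 'JASPAR2020': 242, 'JASPAR2022': 310, 'JASPAR2024': 390}
```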
