# Example: Pathway scoring

## Part I: Gene scoring

In case you have not downloaded and imported a reference panel yet, open a terminal and run the following in the ```PascalX/misc``` folder:

```bash
bash get1KGGRCh38.sh /yourfolder/ EUR
```

This command downloads the 1KG project data for the European subpopulation and converts it with plink. The data will be stored in ```/yourfolder/```.

#### Load the gene scorer:

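```python
from PascalX import genescorer

# window: size of the window around each gene (in bp)
# varcutoff: fraction of variance to retain in the LD eigenvalue decomposition
Gscorer = genescorer.chi2sum(window=50000,varcutoff=0.95)
```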

#### Load the reference panel into the genescorer:

```python
Gscorer.load_refpanel('/yourfolder/EUR.1KGphase3.GRCh38',parallel=1)
```

The first time this command is executed for a reference panel, an internal SNP database is generated on disk. This may take several hours; use the ```parallel=``` option to distribute the work over several CPU cores. Subsequent calls of this method are very fast.

#### Load a gene annotation:

If you do not have a gene annotation yet, you can download one automatically from BioMart:

```python
from PascalX.genome import genome

G = genome()
G.get_ensembl_annotation('biomart_GRCh38.tsv')
```

```
Downloading gene annotation from ensembl.org BioMart [ protein_coding ] ( GRCh38 )
```


The annotation will be saved in the file ```biomart_GRCh38.tsv```. 

You still need to load the annotation into the genescorer as follows:

```python
Gscorer.load_genome('biomart_GRCh38.tsv')
```

```
19024 active genes
```


#### Load a GWAS:

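```python
# rscol: column with the SNP IDs, pcol: column with the p-values (0-based);
# header=False: the file has no header line
Gscorer.load_GWAS("path/gwasfilename",rscol=0,pcol=1,header=False)
```

```
32706 SNPs loaded
```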


You can load either a plain text file or a gzip-compressed file ending in ```.gz```.
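The column options can be illustrated with a minimal reader for a whitespace-separated GWAS file. This is a hypothetical helper (```read_gwas```), not the PascalX loader: it assumes 0-based column indices and whitespace as the separator, and it opens files ending in ```.gz``` with Python's ```gzip``` module.

```python
import gzip
from typing import Dict


def read_gwas(path: str, rscol: int = 0, pcol: int = 1, header: bool = False) -> Dict[str, float]:
    """Map SNP IDs to p-values (hypothetical sketch, not PascalX's loader).

    Columns are 0-based; files ending in .gz are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    pvals: Dict[str, float] = {}
    with opener(path, "rt") as fh:
        if header:
            next(fh, None)  # skip the header line
        for line in fh:
            fields = line.split()
            if len(fields) <= max(rscol, pcol):
                continue  # skip blank or truncated lines
            pvals[fields[rscol]] = float(fields[pcol])
    return pvals
```

For example, ```read_gwas("path/gwasfilename", rscol=0, pcol=1)``` would return a dictionary from SNP ID to p-value for a headerless file.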

#### Start the scoring:

```python
RS = Gscorer.score_all(parallel=1,nobar=False)
```


Use the ```parallel=``` option to increase the number of CPU cores used (make sure that you have sufficient memory). For ```parallel!=1```, it is recommended to switch off the progress bar by setting ```nobar=True```.
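For intuition about what a sum-of-chi-squares gene score computes, here is a simplified, stdlib-only sketch. It assumes the gene statistic is the sum of the SNP-level chi-squared(1) statistics inside the gene window. If the SNP z-scores are jointly normal with the LD correlation matrix R, this sum is distributed as a weighted sum of independent chi-squared(1) variables whose weights are the eigenvalues of R. The sketch keeps only the leading eigenvalues that explain a ```varcutoff``` fraction of the total (our reading of that parameter) and estimates the tail probability by Monte Carlo simulation. The Monte Carlo step is swapped in for illustration only; PascalX uses its own numerical methods for the null distribution. All function names and the toy eigenvalues are hypothetical.

```python
import random
from statistics import NormalDist
from typing import List, Sequence


def chi2_from_p(p: float) -> float:
    """Convert a two-sided SNP p-value (0 < p <= 1) into a chi-squared(1) statistic."""
    # Using the lower tail p/2 avoids precision loss for very small p-values.
    return NormalDist().inv_cdf(p / 2) ** 2


def truncate_eigenvalues(eigvals: Sequence[float], varcutoff: float = 0.95) -> List[float]:
    """Keep the largest eigenvalues until they explain `varcutoff` of the total."""
    ordered = sorted(eigvals, reverse=True)
    total = sum(ordered)
    kept: List[float] = []
    running = 0.0
    for lam in ordered:
        kept.append(lam)
        running += lam
        if running >= varcutoff * total:
            break
    return kept


def gene_pvalue_mc(
    snp_pvalues: Sequence[float],
    eigvals: Sequence[float],
    varcutoff: float = 0.95,
    n_sim: int = 100_000,
    seed: int = 1,
) -> float:
    """Monte Carlo p-value of the sum-of-chi2 gene statistic (illustrative only)."""
    stat = sum(chi2_from_p(p) for p in snp_pvalues)
    weights = truncate_eigenvalues(eigvals, varcutoff)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        # One null draw: sum_j lambda_j * Z_j^2 with independent standard normals Z_j.
        draw = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if draw >= stat:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


# Toy gene with three SNPs; the eigenvalues of a 3x3 LD matrix sum to 3.
print(gene_pvalue_mc([1e-4, 0.02, 0.3], [2.1, 0.6, 0.3], n_sim=20_000))
```

Dropping the smallest eigenvalues shortens each simulated draw; this mirrors the speed/accuracy trade-off that a variance cutoff suggests, but the exact handling in PascalX may differ.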

## Part II: Pathway scoring

```python
from PascalX import pathway
```

#### Load a pathway scorer:

```python
Pscorer = pathway.chi2rank(Gscorer)
```

Note that ```Gscorer``` has to be a fully initialized gene scorer (see Part I above).
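As a rough picture of a rank-based chi-squared pathway score, the sketch below assumes the following scheme: gene p-values are ranked across all scored genes, each rank is turned into a uniform score rank/(N+1), that score is converted into a chi-squared(1) quantile, and the quantiles of a module's genes are summed and compared with a chi-squared distribution with as many degrees of freedom as scored module genes. This is our simplified reading, not PascalX's code; in particular it ignores the fusion of neighbouring genes into meta-genes. The names ```chi2_sf```, ```rank_to_chi2``` and ```score_module``` are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, Iterable


def chi2_sf(x: float, k: int) -> float:
    """Upper tail P(X >= x) for X ~ chi2(k), i.e. the regularized gamma Q(k/2, x/2)."""
    if x <= 0.0:
        return 1.0
    a, y = k / 2.0, x / 2.0
    log_prefactor = a * math.log(y) - y - math.lgamma(a)
    if y < a + 1.0:
        # Power series for the lower regularized gamma P(a, y); Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1.0
            term *= y / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Lentz's continued fraction for Q(a, y), accurate far in the tail.
    tiny = 1e-300
    b = y + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def rank_to_chi2(gene_pvalues: Dict[str, float]) -> Dict[str, float]:
    """Replace each gene p-value by the chi2(1) quantile of its rank-based uniform score."""
    ordered = sorted(gene_pvalues, key=gene_pvalues.get)  # smallest p-value first
    n = len(ordered)
    nd = NormalDist()
    return {
        gene: nd.inv_cdf((rank / (n + 1)) / 2) ** 2
        for rank, gene in enumerate(ordered, start=1)
    }


def score_module(module_genes: Iterable[str], gene_chi2: Dict[str, float]) -> float:
    """Sum the chi2(1) scores of the scorable module genes; NaN if none are scorable."""
    scored = [gene_chi2[g] for g in module_genes if g in gene_chi2]
    if not scored:
        return float("nan")
    return chi2_sf(sum(scored), len(scored))


gene_p = {"GENE_A": 1e-6, "GENE_B": 0.01, "GENE_C": 0.2, "GENE_D": 0.5, "GENE_E": 0.9}
chi2 = rank_to_chi2(gene_p)
print(score_module(["GENE_A", "GENE_B"], chi2))
```

Because only ranks enter the score, a single extremely small gene p-value cannot dominate a module on its own; a one-gene module simply gets back that gene's uniform rank score.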

#### Load modules / pathways to score:

```python
M = Pscorer.load_modules("filename.tsv",ncol=0,fcol=2)
```

```
2400 modules loaded
```


```ncol=``` must be set to the (0-based) column of the tab-separated file that contains the module name, and ```fcol=``` to the first column that contains a gene symbol.
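The roles of ```ncol=``` and ```fcol=``` can be sketched with a small parser. It assumes a GMT-like layout in which every line describes one module, the name sits in column ```ncol```, and every column from ```fcol``` onward holds one gene symbol; with ```ncol=0, fcol=2``` the column in between (for example a description) is skipped. The helper ```parse_modules``` and the example module names are hypothetical, not PascalX's loader.

```python
from typing import Dict, List


def parse_modules(text: str, ncol: int = 0, fcol: int = 2) -> Dict[str, List[str]]:
    """Parse tab-separated module definitions (hypothetical sketch, 0-based columns)."""
    modules: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) <= ncol or not fields[ncol].strip():
            continue  # skip empty or truncated lines
        # Every non-empty field from column fcol onward is a gene symbol.
        modules[fields[ncol].strip()] = [g.strip() for g in fields[fcol:] if g.strip()]
    return modules


example = "MOD_A\tdesc\tGENE1\tGENE2\nMOD_B\tdesc\tGENE3\n"
print(parse_modules(example))  # {'MOD_A': ['GENE1', 'GENE2'], 'MOD_B': ['GENE3']}
```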

```python
R = Pscorer.score(M,parallel=1)
```

```
Scoring 7423 missing (meta)-genes
1400 genes scored
6023 genes can not be scored (check annotation)
```
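One way to investigate the "can not be scored (check annotation)" message is to compare the module gene symbols with the symbols present in the loaded annotation. The sketch assumes both are available as plain Python collections (for example, read from the module file and from ```biomart_GRCh38.tsv```); the helpers ```missing_genes``` and ```module_coverage``` are hypothetical and do not access PascalX's internal data structures. A symbol mismatch is only one possible cause.

```python
from typing import Dict, Iterable, List


def missing_genes(modules: Dict[str, List[str]], annotated: Iterable[str]) -> List[str]:
    """Return the module gene symbols that do not occur in the annotation."""
    known = set(annotated)
    return sorted({g for genes in modules.values() for g in genes if g not in known})


def module_coverage(modules: Dict[str, List[str]], annotated: Iterable[str]) -> Dict[str, float]:
    """Fraction of each module's genes that are present in the annotation."""
    known = set(annotated)
    return {
        name: (sum(g in known for g in genes) / len(genes) if genes else 0.0)
        for name, genes in modules.items()
    }


mods = {"MOD_A": ["GENE1", "GENE2", "GENE3"], "MOD_B": ["GENE4"]}
print(missing_genes(mods, ["GENE1", "GENE3"]))  # ['GENE2', 'GENE4']
```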


#### List significant pathways:

```python
Pscorer.get_sigpathways(R,cutoff=1e-5)
```

```
1403 Neuronal System | 1.4933631349374102e-09
1920 Signal Transduction | 4.794817236310414e-06
2252 Transmission across Chemical Synapses | 2.5467482889240234e-07
```
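The effect of ```cutoff=``` can be mimicked on a plain list of (name, p-value) pairs. The sketch below reuses the three modules reported above plus one hypothetical non-significant module, assumes a strict ```p < cutoff``` comparison (whether the boundary is inclusive in PascalX is not stated here), and sorts by significance rather than by module index. The helper ```sig_pathways``` is hypothetical.

```python
from typing import List, Tuple


def sig_pathways(results: List[Tuple[str, float]], cutoff: float = 1e-5) -> List[Tuple[str, float]]:
    """Keep (name, p-value) pairs with p-value below `cutoff`, most significant first."""
    return sorted((r for r in results if r[1] < cutoff), key=lambda r: r[1])


results = [
    ("Neuronal System", 1.4933631349374102e-09),
    ("Signal Transduction", 4.794817236310414e-06),
    ("Transmission across Chemical Synapses", 2.5467482889240234e-07),
    ("Hypothetical module", 0.3),  # made-up entry, not from the output above
]
for name, p in sig_pathways(results):
    print(f"{name} | {p:.3g}")
```

For comparison, a Bonferroni correction at α = 0.05 over the 2400 loaded modules would give 0.05 / 2400 ≈ 2.1e-5, so ```cutoff=1e-5``` is somewhat stricter than that.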
