# Example: Pathway scoring

## Part I: Gene scoring

In case you have not downloaded and imported a reference panel yet, open a terminal and execute in the PascalX/misc folder:

```bash get1KGGRCh38.sh /yourfolder/ EUR```

This command will download the 1KG project data for the European subpopulation and convert it with plink. The data will be stored in ```/yourfolder/```. 

NOTE: The reference panel in the ```demo/``` folder is simulated and covers chr 1 only!

#### Load the gene scorer:

In [1]:
from PascalX import genescorer

Gscorer = genescorer.chi2sum(window=50000,varcutoff=0.99)

#### Load the reference panel into the genescorer:

In [2]:
Gscorer.load_refpanel('../demo/EUR.simulated',parallel=1)
    

Reference panel data not imported. Trying to import...
ERROR:  ../demo/EUR.simulated.chr2.(tped|vcf).gz not found


The first time this command is executed for a reference panel, an internal SNP database will be generated on disk. This process may take several hours. You can use the ```parallel=``` option to speed it up via parallelization. Subsequent calls of this method will be very fast.

#### Load a gene annotation:

If you do not have a gene annotation yet, you can download one automatically from BioMart via

In [3]:
from PascalX.genome import genome

G = genome()
G.get_ensembl_annotation('biomart_GRCh38.tsv')

Downloading gene annotation from ensembl.org BioMart [ protein_coding ] ( GRCh38 )


The annotation will be saved in the file ```biomart_GRCh38.tsv```. 

You still need to load the annotation into the genescorer as follows:

In [3]:
Gscorer.load_genome('biomart_GRCh38.tsv')

18498 active genes


#### Load a GWAS:

In [4]:
Gscorer.load_GWAS("../demo/gwasA.tsv.gz",rscol=0,pcol=4,header=False)


331769 SNPs loaded


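You can load either a plain text file or a gzip-compressed file with the file ending .gz. Here, ```rscol=``` and ```pcol=``` give the columns holding the SNP id and the p-value, and ```header=False``` states that the file has no header line.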
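To make the column arguments concrete, here is a minimal sketch of how such a file could be read in plain Python. It is not PascalX's implementation: the function name `read_gwas`, the `delimiter` parameter, zero-based column indices, and the choice to silently skip malformed rows are all assumptions made for illustration.

```python
import gzip


def read_gwas(path, rscol=0, pcol=4, header=False, delimiter="\t"):
    """Hypothetical helper: map SNP id -> p-value from a text or .gz file."""
    # Choose the opener from the file ending (assumption: .gz means gzip).
    opener = gzip.open if path.endswith(".gz") else open
    pvalues = {}
    with opener(path, "rt") as handle:
        if header:
            next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split(delimiter)
            if len(fields) <= max(rscol, pcol):
                continue  # skip empty or too-short lines
            try:
                pvalues[fields[rscol]] = float(fields[pcol])
            except ValueError:
                continue  # skip rows whose p-value is not numeric
    return pvalues
```

Inspecting a file this way before calling ```load_GWAS``` can help confirm that the column indices point at the SNP id and p-value columns.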

#### Start the scoring:

In [5]:
RS = Gscorer.score_all(parallel=1,nobar=False)

 

  0%|           [ estimated time left: ? ]

2010 genes scored
16488 genes can not be scored (check annotation)


Use the ```parallel=``` option to increase the number of cpu cores to use (make sure that you have sufficient memory). Note that you can switch off the progress bar via setting ```nobar=True```.

## Part II: Pathway scoring

In [6]:
from PascalX import pathway

#### Load a pathway scorer:

In [7]:
Pscorer = pathway.chi2rank(Gscorer)

Note that ```Gscorer``` has to be a fully initialized genescorer, see Part I above.

#### Load modules / pathways to score:

In [9]:
M = Pscorer.load_modules("../demo/pw_test.tsv",ncol=0,fcol=2)


1 modules loaded


```ncol=``` has to be set to the column of the tab-separated file containing the name of the module, and ```fcol=``` to the first column containing a gene symbol. 
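As an illustration of this layout, the sketch below parses a tab-separated module file in plain Python. It is not how PascalX reads the file: the function name `read_modules`, zero-based column indices, and the assumption that every column from `fcol` onward holds a gene symbol are hypothetical.

```python
def read_modules(path, ncol=0, fcol=2):
    """Hypothetical helper: return a list of (module name, gene symbols)."""
    modules = []
    with open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= max(ncol, fcol):
                continue  # skip lines without a name or without any gene
            # Assumption: all columns from fcol onward are gene symbols.
            genes = [g for g in fields[fcol:] if g]
            modules.append((fields[ncol], genes))
    return modules
```

With ```ncol=0``` and ```fcol=2```, a line would carry the module name first, one column that is ignored here, and then the gene symbols.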

In [10]:
R = Pscorer.score(M,parallel=1)

Scoring 2 missing (meta)-genes
 

  0%|           [ estimated time left: ? ]

2 genes scored


#### List significant pathways:

In [13]:
Pscorer.get_sigpathways(R,cutoff=1e-4)

0 PATHWAY_X | 6.219056261820743e-05
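The cutoff filter can be pictured with the following minimal sketch. The structure of `R` is not described here, so the sketch works on a plain list of `(name, pvalue)` pairs instead; the function name `significant` and the strict `<` comparison are assumptions, not PascalX behavior.

```python
def significant(results, cutoff=1e-4):
    """Hypothetical helper: keep (index, name, pvalue) with pvalue < cutoff."""
    # Assumption: results is a list of (name, pvalue) pairs.
    return [(i, name, p) for i, (name, p) in enumerate(results) if p < cutoff]
```

Applied to `[("PATHWAY_X", 6.219056261820743e-05)]` with the cutoff above, the helper keeps the single entry, since its p-value lies below 1e-4.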
