# Run PrismEXP

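PrismEXP uses ARCHS4 gene expression data to predict novel gene annotations. Creating a PrismEXP prediction takes four steps:

1. Compute correlation matrices from the ARCHS4 data (`px.create_correlation_matrices()`)
2. Build feature matrices for a gene set library (`px.features()`)
3. Train the model (`px.train()`)
4. Predict novel annotations (`px.predict()`)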

# Installing PrismEXP

Install PrismEXP directly from GitHub:
```
pip install git+https://github.com/MaayanLab/prismexp.git
```

# Run PrismEXP using the Python package

PrismEXP can be run on your own computer with the `prismx` Python package. The hardware requirements are significant:

- Disk space: each correlation matrix can take up to 3 GB, so when choosing the number of clusters, make sure there is enough space for all of the matrices.
- Memory: PrismEXP is memory-hungry because it works with large correlation matrices. Ideally the system has 64 GB of RAM; in some cases 32 GB may be sufficient.
- Compute time: this is significant and depends on the number of clusters chosen and the number of available threads. PrismEXP relies heavily on multithreading and benefits from multiple cores.
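Because each correlation matrix can take up to 3 GB, the worst-case disk footprint grows linearly with the number of clusters. The following is a minimal pre-flight check, assuming roughly one correlation matrix per cluster (the package may also write other files). `required_disk_gb` and `has_enough_disk` are hypothetical helpers, not part of `prismx`:

```python
import shutil

# Upper bound per correlation matrix, taken from the requirements above (up to 3 GB).
GB_PER_MATRIX = 3


def required_disk_gb(cluster_count: int, gb_per_matrix: float = GB_PER_MATRIX) -> float:
    """Worst-case disk space in GB, assuming one correlation matrix per cluster."""
    return cluster_count * gb_per_matrix


def has_enough_disk(work_dir: str, cluster_count: int) -> bool:
    """Return True if the filesystem holding work_dir has room for the worst case."""
    free_gb = shutil.disk_usage(work_dir).free / 1e9  # bytes -> GB
    return free_gb >= required_disk_gb(cluster_count)
```

For example, `has_enough_disk(work_dir, 100)` checks for up to 300 GB of free space before computing 100 clusters. `work_dir` must already exist for `shutil.disk_usage` to work.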

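```python
import urllib.request
import prismx as px

# Download the ARCHS4 human gene expression file (a large download)
urllib.request.urlretrieve("https://s3.dev.maayanlab.cloud/archs4/archs4_gene_human_v2.1.2.h5", "human_matrix.h5")

work_dir = "/home/maayanlab/code/prismexp"  # replace with your own working directory
h5_file = "human_matrix.h5"
gmt_file = px.load_library("GO_Biological_Process_2021")  # gene set library used for training

# More clusters need more disk space and compute time
cluster_number = 100

# Step 1: compute the correlation matrices (only needed once)
px.create_correlation_matrices(work_dir, h5_file, cluster_count=cluster_number, verbose=True)
# Step 2: build the feature matrices for the gene set library
px.features(work_dir, gmt_file, threads=4, verbose=True)
# Step 3: train the model
px.train(work_dir, gmt_file, verbose=True)
# Step 4: predict novel gene annotations
px.predict(work_dir, gmt_file, verbose=True)
```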
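The ARCHS4 file is large, and rerunning the download step fetches it again. Below is a minimal guard that skips the download when the file already exists. `download_if_missing` is a hypothetical helper, not part of `prismx`, and it does not verify that an existing file is complete or current:

```python
import os
import urllib.request


def download_if_missing(url: str, path: str) -> str:
    """Download url to path unless a file already exists there; return path."""
    if os.path.exists(path):
        return path  # reuse the previously downloaded file
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)  # expose the file only once the download has finished
    return path
```

It can replace the direct `urlretrieve` call, e.g. `h5_file = download_if_missing("https://s3.dev.maayanlab.cloud/archs4/archs4_gene_human_v2.1.2.h5", "human_matrix.h5")`.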

# Reusing a trained model

The correlation matrices only need to be computed once. The trained model can also be reused for gene set libraries other than the one it was trained on, although performance may be higher when the model is trained on and applied to the same library. Overfitting is not a real concern because of the nature of PrismEXP's machine learning approach.

To run the PrismEXP model on existing correlation matrices:

1. Run `px.features()`
   - builds the feature matrices (the same number as clusters)
2. Run `px.predict()`
   - unless otherwise specified, uses the existing model in the work directory

To reuse the existing model, only the features need to be rebuilt and the prediction rerun for each new gene set library.

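```python
import prismx as px

work_dir = "/home/maayanlab/code/prismexp"  # directory with the correlation matrices and trained model
gmt_file = px.load_library("KEGG_2021_Human")  # new gene set library to predict on

# Rebuild the features for the new library, then apply the existing model
px.features(work_dir, gmt_file, threads=4, verbose=True)
px.predict(work_dir, gmt_file, verbose=True)
```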
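Only the features and the prediction need to be redone for each library, so these two steps can be wrapped in a loop to apply one trained model to several libraries. The sketch below uses a hypothetical helper, `predict_libraries`, which is not part of `prismx`. It assumes the correlation matrices and trained model already exist in `work_dir` and that each `px.predict()` call keeps its output separate per library:

```python
def predict_libraries(work_dir, library_names, threads=4, verbose=True):
    """Apply an already-trained PrismEXP model to several gene set libraries."""
    # Imported inside the function so prismx is only required when the helper runs
    import prismx as px

    for name in library_names:
        gmt_file = px.load_library(name)
        # Rebuild the features for this library, then apply the existing model
        px.features(work_dir, gmt_file, threads=threads, verbose=verbose)
        px.predict(work_dir, gmt_file, verbose=verbose)
```

For example: `predict_libraries(work_dir, ["KEGG_2021_Human", "GO_Biological_Process_2021"])`.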