# VITAL Framework

The VITAL framework creates 3 text corpora, where each text corpus comprises the relevant research papers and documents retrieved for specific Medical Subject Headings (MeSH) terms.
The following text corpora are used in our experiment:
1. 6 heart diseases plus the heart failure (HF) subtypes HFpEF (HF with preserved ejection fraction) and HFrEF (HF with reduced ejection fraction): CM, IHD, ARR, VD, CVA, CHD, HFpEF, HFrEF
2. Comorbidities: Aging, Hypertension, Hyperlipidemia, Diabetes/Obesity
3. Mechanisms: Mitochondria, Inflammation, Fibrosis

Each text corpus has its own categories.txt file and textcube_config file.
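To keep the three runs straight, it can help to write the corpus design down as data. The sketch below is a hypothetical summary, not part of the VITAL code: the category abbreviations come from the list above, the file names follow the `corpusN` naming used in Step 6, and the output directories match those used when saving results. `CORPORA` and `corpus_paths` are illustrative names.

```python
# Hypothetical summary of the three VITAL corpora (not used by the pipeline scripts).
# Categories come from the corpus list; paths follow the corpusN naming of Step 6.
CORPORA = {
    "corpus1": {
        "categories": ["CM", "IHD", "ARR", "VD", "CVA", "CHD", "HFpEF", "HFrEF"],
        "output_dir": "./CVDs",
    },
    "corpus2": {
        "categories": ["Aging", "Hypertension", "Hyperlipidemia", "Diabetes/Obesity"],
        "output_dir": "./Comorbidities",
    },
    "corpus3": {
        "categories": ["Mitochondria", "Inflammation", "Fibrosis"],
        "output_dir": "./Mechanisms",
    },
}


def corpus_paths(name):
    """Return the (categories file, text-cube config file) paths for one corpus."""
    return (f"./input/{name}categories.txt", f"./config/{name}textcube_config.json")


for name, spec in CORPORA.items():
    cat_file, config_file = corpus_paths(name)
    print(f"{name}: {len(spec['categories'])} categories -> {spec['output_dir']}")
    print(f"  categories file: {cat_file}")
    print(f"  config file:     {config_file}")
```

Adjust the paths if your files are named differently; the point is that each run changes exactly one entry's worth of settings.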

In [None]:
import shutil
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
import sys
# This may not be an exhaustive list; you may need to import additional packages.
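Because the import list above may be incomplete, a quick check for missing packages can save a failed run later. This is a minimal sketch using `importlib.util.find_spec`; the `REQUIRED` list is an assumption based only on the imports above, so extend it with whatever the `run_*.py` scripts need.

```python
import importlib.util

# Packages imported in the cell above; extend as needed (assumed, not exhaustive).
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install before continuing:", " ".join(missing))
else:
    print("All listed packages are available.")
```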

## <u>Part A: Running each Text Corpus</u>

The example below walks through the first text corpus: the 6 cardiovascular diseases (CVDs) plus HFpEF and HFrEF. After completing the steps below, repeat them twice more to generate all 3 text corpora. For each repeat:
1. Change the input categories file and textcube_config file (corpus1, corpus2, corpus3).
2. Change the name of the output directory where the files are saved.

### Step 6: Creating our Text Cube
The following input files must contain the categories for the corpus you are running, so point these paths at the corpus-specific files. The values below are for the first corpus of the HFpEF use-case experiment of the VITAL framework.
- input_file_root_cat = './input/corpus1categories.txt' #corpus2, corpus3 for those respective runs
- input_file_textcube_config = './config/corpus1textcube_config.json' #corpus2, corpus3 for those respective runs
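Before running the text-cube step, a small pre-flight check can confirm that both input files exist and that the config is valid JSON. This is a hypothetical helper (`check_inputs` is not part of the pipeline); it does not validate the contents of the categories file, whose format is defined by the VITAL scripts.

```python
import json
import os


def check_inputs(categories_path, config_path):
    """Return a list of problems found with the two Step 6 input files."""
    problems = []
    for path in (categories_path, config_path):
        if not os.path.isfile(path):
            problems.append(f"missing file: {path}")
    # Only try to parse the config if it exists.
    if os.path.isfile(config_path):
        try:
            with open(config_path) as fh:
                json.load(fh)
        except json.JSONDecodeError as err:
            problems.append(f"invalid JSON in {config_path}: {err}")
    return problems


issues = check_inputs("./input/corpus1categories.txt",
                      "./config/corpus1textcube_config.json")
print("\n".join(issues) if issues else "Step 6 inputs look OK.")
```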

In [None]:
!python run_textcube.py
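The text-cube outputs saved later include both `textcube_pmid2cell.json` and `textcube_cell2pmid.json`. Their names suggest one mapping is the inverse of the other; the sketch below illustrates that relationship, assuming (unverified) that `pmid2cell` maps each PMID to a list of category (cell) names. The real JSON may be structured differently, and `invert_pmid2cell` is a hypothetical helper.

```python
import json
from collections import defaultdict


def invert_pmid2cell(pmid2cell):
    """Turn {pmid: [cell, ...]} into {cell: sorted [pmid, ...]}.

    Assumes each PMID maps to a list of category names; the pipeline's
    actual JSON layout may differ.
    """
    cell2pmid = defaultdict(set)
    for pmid, cells in pmid2cell.items():
        for cell in cells:
            cell2pmid[cell].add(pmid)
    return {cell: sorted(pmids) for cell, pmids in cell2pmid.items()}


example = {"111": ["CM", "HFpEF"], "222": ["HFpEF"]}
print(json.dumps(invert_pmid2cell(example), indent=2))
```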

### Step 7: Running Entity Count
We use the list of approximately 20,000 reviewed human proteins (UniProtKB/Swiss-Prot, organism 9606) from the UniProt website as the entity list for each text corpus.

https://www.uniprot.org/uniprotkb?query=*&facets=reviewed%3Atrue%2Cmodel_organism%3A9606

Replace the entity list file at the following location with this UniProt list:
- input_file_user_entity_list = "./input/entities.txt"
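One way to prepare the entity list is to download the reviewed human proteins from the UniProt page above as TSV (with the Entry and Gene Names columns) and convert each row to a line. The sketch below does that conversion; the output layout (accession followed by gene-name synonyms, joined with `|`) is a hypothetical format, so check what `run_entitycount.py` actually expects before writing `./input/entities.txt`.

```python
import csv
import io


def uniprot_tsv_to_entities(tsv_text):
    """Convert a UniProt TSV export into entity lines.

    Assumes 'Entry' and 'Gene Names' columns; UniProt lists gene names
    separated by spaces. The '|'-joined output format is hypothetical.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        synonyms = (row.get("Gene Names") or "").split()
        lines.append("|".join([row["Entry"], *synonyms]))
    return lines


example = "Entry\tGene Names\nP04637\tTP53\nP68871\tHBB\n"
print(uniprot_tsv_to_entities(example))  # ['P04637|TP53', 'P68871|HBB']
```

To write the file, join the returned lines with `"\n"` and save them to the path above.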

In [None]:
!python run_entitycount.py
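Conceptually, this step tallies how often each protein (via its synonyms) is mentioned in each document. The toy sketch below shows that idea with case-insensitive whole-word regex matching over an in-memory dictionary of documents; it is an illustration only, and `run_entitycount.py` may count mentions quite differently (for example, through a search index). `count_entities` and the sample data are hypothetical.

```python
import re
from collections import Counter


def count_entities(docs, entities):
    """Count case-insensitive whole-word mentions of each entity per document.

    docs: {pmid: text}; entities: {entity_id: [synonym, ...]}.
    Returns {pmid: Counter({entity_id: count})}.
    """
    patterns = {
        eid: re.compile(r"\b(?:" + "|".join(map(re.escape, syns)) + r")\b", re.IGNORECASE)
        for eid, syns in entities.items()
        if syns  # skip entities without synonyms to avoid empty patterns
    }
    counts = {}
    for pmid, text in docs.items():
        c = Counter()
        for eid, pat in patterns.items():
            n = len(pat.findall(text))
            if n:
                c[eid] = n
        counts[pmid] = c
    return counts


docs = {"111": "TP53 and p53 regulate apoptosis.", "222": "HBB variants."}
entities = {"P04637": ["TP53", "p53"], "P68871": ["HBB"]}
print(count_entities(docs, entities))
```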

### Step 8: Running Metadata Update
This step updates the data structures we will be using to store our data.

In [None]:
!python run_metadata_update.py

### Step 9: Calculating CaseOLAP Score

In [None]:
!python run_caseolap_score.py
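In the CaseOLAP literature, a protein's score for a category combines a popularity term (how frequently the protein is mentioned in that category) with a distinctiveness term (how much more relevant it is to that category than to the others); integrity is usually taken as 1 for a curated protein list. The sketch below is a simplified illustration of that popularity × distinctiveness idea. The normalizations of term and document frequency are assumptions chosen for readability, so consult `run_caseolap_score.py` for the formula the pipeline actually uses.

```python
import math


def caseolap_scores(tf, df):
    """Simplified CaseOLAP-style scores (illustrative normalizations).

    tf[c][p]: mentions of protein p in category c's documents.
    df[c][p]: number of category-c documents that mention p.
    Returns {c: {p: popularity * distinctiveness}}, with integrity taken as 1.
    """
    categories = list(tf)
    proteins = sorted({p for c in categories for p in tf[c]})

    def rel(p, c):
        # Relevance = normalized tf * normalized df (assumed max-normalization).
        max_tf = max(tf[c].values(), default=0)
        max_df = max(df[c].values(), default=0)
        ntf = tf[c].get(p, 0) / max_tf if max_tf else 0.0
        ndf = df[c].get(p, 0) / max_df if max_df else 0.0
        return ntf * ndf

    scores = {}
    for c in categories:
        total = sum(tf[c].values())
        scores[c] = {}
        for p in proteins:
            # Popularity: log-scaled share of the category's mentions.
            pop = math.log(tf[c].get(p, 0) + 1) / math.log(total + 1) if total else 0.0
            # Distinctiveness: softmax-like comparison against all categories.
            denom = 1 + sum(math.exp(rel(p, c2)) for c2 in categories)
            scores[c][p] = pop * math.exp(rel(p, c)) / denom
    return scores


tf = {"CM": {"TP53": 4, "HBB": 1}, "IHD": {"TP53": 1}}
df = {"CM": {"TP53": 2, "HBB": 1}, "IHD": {"TP53": 1}}
for c, row in caseolap_scores(tf, df).items():
    best = max(row, key=row.get)
    print(f"{c}: top protein {best} ({row[best]:.3f})")
```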

### Save out relevant output files per text corpus

In [None]:
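# Files produced by Steps 6-9 that we keep for each corpus.
f_names = ['./config/textcube_config.json', './data/textcube_pmid2cell.json', './data/textcube_cell2pmid.json', './result/cellpmids.json', './result/unique_proteins.json', './result/caseolap.csv', './input/entities.txt', './input/categories.txt', './result/result_stat.json', './log/textcube_log.txt']

# Uncomment the destination for the corpus you just ran.
destination_directory = "./CVDs"
#destination_directory = "./Comorbidities"
#destination_directory = "./Mechanisms"

# Create the directory first; if it did not exist, shutil.copy would write every
# file to a single file named after the directory, overwriting it each time.
os.makedirs(destination_directory, exist_ok=True)

for f in f_names:
    shutil.copy(f, destination_directory)
    print(f"File '{f}' copied to '{destination_directory}'")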

Run the save-out cell above once for each of the 3 corpora in Part A, selecting the matching destination_directory each time.
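After all three runs, you can confirm that each output directory received every file before moving on to the diagrams. This is a minimal sketch: `EXPECTED` repeats the basenames from the save-out cell's `f_names`, and `missing_outputs` is a hypothetical helper.

```python
import os

# Basenames of the files copied by the save-out cell.
EXPECTED = ["textcube_config.json", "textcube_pmid2cell.json", "textcube_cell2pmid.json",
            "cellpmids.json", "unique_proteins.json", "caseolap.csv", "entities.txt",
            "categories.txt", "result_stat.json", "textcube_log.txt"]


def missing_outputs(directory, expected=EXPECTED):
    """Return the expected files that are not present in directory."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


for d in ["./CVDs", "./Comorbidities", "./Mechanisms"]:
    missing = missing_outputs(d)
    print(f"{d}: {'complete' if not missing else 'missing ' + ', '.join(missing)}")
```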

## <u>Part B: Creating the Diagrams</u>