Discovery Science 2023: Gene Interactions in Survival Data Analysis: A Data-driven Approach Using Restricted Mean Survival Time and Literature Mining
This repository contains all the scripts and supporting data we used to run the analysis and generate the figures. The TCGA datasets are too large to store here. To reproduce the results, please follow this guide to download the datasets and store them in the data folder.
Scripts for calculating interactions and permutations are included (note that running them takes a considerable amount of time and compute resources).
Our computed results are available in computed_interactions. There is a separate .csv file for each dataset, containing the results for all interaction types (permutation tests are not included because of the size limit).
The notebooks folder contains the notebooks used to generate the figures and the different analyses presented in the paper. If you have trouble running the notebooks, feel free to contact us (via email or the repository issue tracker).
Finally, the implementation of the method for calculating interactions is located in the method folder.
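For readers unfamiliar with the restricted mean survival time (RMST) named in the title, here is a minimal sketch of the textbook definition: the area under the Kaplan-Meier survival curve from 0 to a cutoff `tau`. This is not the code in the method folder, and it does not compute interaction scores; the function name `km_rmst` and its arguments are assumptions made for illustration.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve on [0, tau].

    times  -- follow-up time of each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    tau    -- restriction time (upper limit of the integral)
    Illustrative only; the name and signature are hypothetical.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0      # current value of the Kaplan-Meier estimate
    rmst = 0.0      # accumulated area under the step function
    prev_t = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        # The curve is flat between prev_t and t.
        rmst += surv * (t - prev_t)
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= removed
        prev_t = t
    # Last flat segment up to the restriction time.
    rmst += surv * (tau - prev_t)
    return rmst


# Toy data, not taken from any dataset in this repository.
example = km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4)
```

With no censoring, the result equals the mean of the follow-up times truncated at `tau`, which is a quick sanity check for the sketch.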
Dataset | Additive (+) | Competing (-) | XOR (*) |
---|---|---|---|
METABRIC | gpt3-summary | gpt3-summary | gpt3-summary |
BLCA | gpt3-summary | gpt3-summary | gpt3-summary |
BRCA | gpt3-summary | gpt3-summary | gpt3-summary |
CESC | gpt3-summary | gpt3-summary | gpt3-summary |
COAD | gpt3-summary | gpt3-summary | gpt3-summary |
GBM | gpt3-summary | gpt3-summary | gpt3-summary |
HNSC | gpt3-summary | gpt3-summary, gpt4-summary | gpt3-summary |
KIRC | gpt3-summary | gpt3-summary, gpt4-summary | gpt3-summary |
KIRP | gpt3-summary | gpt3-summary | gpt3-summary |
LAML | gpt3-summary | gpt3-summary | gpt3-summary |
LGG | gpt3-summary | gpt3-summary | gpt3-summary |
LIHC | gpt3-summary | gpt3-summary | gpt3-summary |
LUAD | gpt3-summary | gpt3-summary | gpt3-summary |
LUSC | gpt3-summary | gpt3-summary | gpt3-summary |
OV | gpt3-summary | gpt3-summary | gpt3-summary |
PRAD | gpt3-summary | gpt3-summary | gpt3-summary |
READ | gpt3-summary | gpt3-summary | gpt3-summary |
SKCM | gpt3-summary | gpt3-summary | gpt3-summary |
STAD | gpt3-summary | gpt3-summary | gpt3-summary |
THCA | gpt3-summary | gpt3-summary | gpt3-summary |
UCEC | gpt3-summary | gpt3-summary | gpt3-summary |
We used the following prompt:
You are a helpful domain expert with a background in biology. You know the biology of each genes known in the literature.
Cancer type: TCGA-{cancer_type}
Genes: {gene1} and {gene2}.
BioGRID protein interaction network; shortest path between {gene1} and {gene2}: {paths} .
Context: {context}
Based on what you know about these two genes and provided context. Describe briefly what specifically these genes do.
Can you reason about any possible functional associations between these two genes in specific biological terms?
Use context and your knowledge about biology to answer the question. Be specific in the processes where these genes are involved.
Be concise. Answer in 2-3 short sentences. Start with possible functional associations.
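As a rough illustration of how the placeholders in the prompt could be filled, the sketch below finds a shortest path in a small toy graph with breadth-first search and substitutes it into the variable-bearing lines of the template. The graph, the gene names `GENE_A`, `GENE_B`, `GENE_C`, the `" -> "` path format, and the helper `shortest_path` are all assumptions for illustration; they are not BioGRID data and not the repository's code.

```python
from collections import deque

# Only the lines of the prompt that contain placeholders are reproduced here.
PROMPT_TEMPLATE = (
    "Cancer type: TCGA-{cancer_type}\n"
    "Genes: {gene1} and {gene2}.\n"
    "BioGRID protein interaction network; shortest path between "
    "{gene1} and {gene2}: {paths} .\n"
    "Context: {context}\n"
)


def shortest_path(graph, source, target):
    """Breadth-first search on an undirected adjacency dict (hypothetical helper)."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == target:
                # Walk back through the parents to rebuild the path.
                path = [neighbour]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # no path between the two genes


# Toy interaction graph with placeholder names (not real interactions).
toy_graph = {
    "GENE_A": ["GENE_B"],
    "GENE_B": ["GENE_A", "GENE_C"],
    "GENE_C": ["GENE_B"],
}

path = shortest_path(toy_graph, "GENE_A", "GENE_C")
prompt = PROMPT_TEMPLATE.format(
    cancer_type="BRCA",
    gene1="GENE_A",
    gene2="GENE_C",
    paths=" -> ".join(path) if path else "none found",
    context="(retrieved literature snippets would go here)",
)
print(prompt)
```

Breadth-first search is used because every edge counts equally in an unweighted interaction network, so the first path that reaches the target is a shortest one.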