This repository contains the code for our research paper "" (TODO: Add paper name and link)
Create and activate a new virtual environment for the project and install all the necessary packages using the provided requirements file.
conda create -n tda_ood python=3.9
conda activate tda_ood
pip install -r requirements.txt
Clone this repository and set up two new directories:
- Model directory: this should contain a subdirectory named `fine_tuned_models`
- Output directory: this is where results and intermediate files will be stored
Note: Both directories can be located anywhere, but their locations should be specified in `config.py`. If they are not specified, they will be created automatically in the parent directory of the repository root.
If you wish to use the fine-tuned model from the paper, download it from Figshare (https://figshare.com/s/89d2329654fd7bfedbb2) and extract its contents into the `fine_tuned_models` subdirectory.
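The directory layout described above can be sketched as follows. This is a minimal illustration, not the repository's actual `config.py`: the helper `resolve_dirs` and the fallback folder names `models` and `results` are hypothetical, and the sketch only assumes the documented behaviour that unspecified directories are created in the parent of the repository root.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_dirs(repo_root: Path,
                 model_dir: Optional[str] = None,
                 results_dir: Optional[str] = None) -> Tuple[Path, Path]:
    """Hypothetical helper returning (model_dir, results_dir), creating both if needed.

    Unspecified directories fall back to the parent of the repository root;
    the folder names 'models' and 'results' are illustrative assumptions.
    """
    parent = Path(repo_root).resolve().parent
    model_path = Path(model_dir) if model_dir else parent / "models"
    results_path = Path(results_dir) if results_dir else parent / "results"

    # The Model directory must hold a 'fine_tuned_models' subdirectory,
    # which is where the Figshare archive should be extracted.
    (model_path / "fine_tuned_models").mkdir(parents=True, exist_ok=True)
    # The Output directory receives results and intermediate files.
    results_path.mkdir(parents=True, exist_ok=True)
    return model_path, results_path
```

For example, calling `resolve_dirs(Path("."))` from the repository root would create `../models/fine_tuned_models` and `../results` under these assumed names.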
Adjust your experimental parameters in `config.py`, or use a specific JSON config file located in the `config_files` directory.
To run experiments with default configurations from 'config.py', execute:
python run_tda.py
To use a particular configuration file from the `config_files` directory, execute:
python run_tda.py --config=<config_filename>.json
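One plausible way to handle the `--config` option is sketched below. It assumes that sections in the JSON file override same-named sections of the defaults key by key; `DEFAULT_CONFIG`, its values, and `load_config` are hypothetical placeholders, and the actual `run_tda.py` may parse and merge configurations differently.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the defaults in config.py; keys and values are illustrative.
DEFAULT_CONFIG: Dict[str, Dict[str, Any]] = {
    "base": {"MODEL_DIR": "../models", "RESULTS_DIR": "../results"},
    "model": {"load_from_dir": False},
    "training": {"do_train": False, "num_train_epochs": 3, "batch_size": 16},
    "tda": {"do_id_tda": True, "graph_type": "undirected", "symmetry": "mean"},
}


def load_config(argv: Optional[List[str]] = None,
                config_dir: Path = Path("config_files")) -> Dict[str, Any]:
    """Hypothetical loader: defaults updated section by section from an optional JSON file."""
    parser = argparse.ArgumentParser(description="Run TDA experiments (sketch)")
    parser.add_argument("--config", default=None,
                        help="JSON file name inside the config_files directory")
    args = parser.parse_args(argv)

    # Copy every section so that DEFAULT_CONFIG itself is never modified.
    config: Dict[str, Any] = {name: dict(section) for name, section in DEFAULT_CONFIG.items()}
    if args.config:
        with open(Path(config_dir) / args.config, encoding="utf-8") as f:
            custom = json.load(f)
        for name, section in custom.items():
            if isinstance(section, dict) and isinstance(config.get(name), dict):
                # Keys in the JSON file override the defaults; other defaults are kept.
                config[name].update(section)
            else:
                # New sections, or non-dictionary values, are taken from the JSON as-is.
                config[name] = section
    return config
```

For example, `load_config(["--config=my_experiment.json"])` would read `config_files/my_experiment.json` (a placeholder file name). Note that argparse would reject `--config= my_experiment.json` with a space after `=`, because the file name would then be parsed as an unrecognized extra argument.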
The JSON file should contain the custom experiment configurations with the following keys:
- "base": directory paths and basic configurations, for example:
  - `MODEL_DIR`: Path to the Model directory
  - `RESULTS_DIR`: Path to the Output directory
  - `load_model_outputs_from_dir`: If true, persistence diagrams will be generated from saved model outputs (attentions and sentence embeddings) from the `experiment_loaddir_name` subdirectory
  - `load_diagrams_from_dir`: If true, topological features will be calculated from the saved diagrams from the `experiment_loaddir_name` subdirectory
- "model": model specs, for example:
  - `load_from_dir`: If true, fine-tuned model parameters will be loaded. The directory can be specified in `model_path`
- "training": fine-tuning model configurations, for example:
  - `do_train`: If true, the BERT model will be fine-tuned with in-distribution data with the specified hyper-parameters (e.g. `num_train_epochs`, `batch_size`, etc.)
- "dataset": a dictionary (for `id_dataset`) or a list of dictionaries (for `ood_datasets`) with configurations to load datasets from HuggingFace. Specifications include:
  - `name`: name of the HuggingFace dataset
  - number of samples to load, as `train_size`, `val_size` and `test_size`
  - `labels_subset` (optional): list of labels to load
- "tda": configurations to run Topological Data Analysis, for example:
  - `do_id_tda`: If true, topological features will be calculated for the in-distribution dataset
  - `graph_type`: 'undirected' or 'directed'
  - `symmetry`: 'mean' or 'max' attention for undirected graphs
  - `homology_dims`: list of homology dimensions (e.g. [0,1,2])
  - `amplitude_metrics`: list of amplitude metrics supported by the giotto-tda library
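For undirected graphs, the `symmetry` option determines how the two directed attention weights between a pair of tokens are merged into a single edge weight. The sketch below assumes an element-wise rule on a square attention matrix `A`, where 'mean' gives (A[i][j] + A[j][i]) / 2 and 'max' gives max(A[i][j], A[j][i]). The function `symmetrize_attention` is hypothetical, and the repository may further transform these weights before building graphs and persistence diagrams.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def _mean(a: float, b: float) -> float:
    return (a + b) / 2


# How the two directed weights between tokens i and j are merged into one edge weight.
COMBINERS: Dict[str, Callable[[float, float], float]] = {"mean": _mean, "max": max}


def symmetrize_attention(attention: Matrix, symmetry: str = "mean") -> Matrix:
    """Hypothetical helper: build symmetric edge weights from a square attention matrix."""
    if symmetry not in COMBINERS:
        raise ValueError("symmetry must be 'mean' or 'max'")
    n = len(attention)
    if any(len(row) != n for row in attention):
        raise ValueError("attention matrix must be square")
    combine = COMBINERS[symmetry]
    # Entry (i, j) combines attention in both directions, so result[i][j] == result[j][i].
    return [[combine(attention[i][j], attention[j][i]) for j in range(n)]
            for i in range(n)]
```

For example, with `A = [[0.5, 0.5], [0.25, 0.75]]`, 'mean' gives `[[0.5, 0.375], [0.375, 0.75]]` and 'max' gives `[[0.5, 0.5], [0.5, 0.75]]`.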
TODO: Add citations once paper is uploaded