
Topological Data Analysis for OOD Detection using BERT

This repository contains the code for our research paper "" (TODO: Add paper name and link)


Getting Started

Create and activate a new virtual environment for the project and install all the necessary packages using the provided requirements file.

  1. conda create -n tda_ood python==3.9
  2. conda activate tda_ood
  3. pip install -r requirements.txt

Clone this repository and set up two new directories:

  1. Model directory - this should contain a subdirectory: fine_tuned_models
  2. Output directory - this is where results and intermediate files will be stored

Note: Both directories can be located anywhere, but their locations should be specified in config.py; if they are not specified, they will be created one directory above the repository root.

If you wish to use the fine-tuned model from the paper, download and extract its contents from Figshare (https://figshare.com/s/89d2329654fd7bfedbb2) into the fine_tuned_models subdirectory.
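The locations can be set directly in config.py. Below is a minimal, hypothetical sketch of what that part of the file might look like; the actual variable names (assumed here to mirror the MODEL_DIR and RESULTS_DIR keys described under Running experiments) and the default-path logic should be checked against config.py itself.

    # Hypothetical sketch of the path settings in config.py; the real variable
    # names and defaults should be checked against the file itself.
    from pathlib import Path

    REPO_ROOT = Path(__file__).resolve().parent

    # Point these at your Model and Output directories. The placeholder names
    # "models" and "output" are hypothetical; by default the directories sit
    # one level above the repository root, as described above.
    MODEL_DIR = REPO_ROOT.parent / "models"      # must contain a fine_tuned_models/ subdirectory
    RESULTS_DIR = REPO_ROOT.parent / "output"    # results and intermediate files are written here

    # Create the expected layout if it does not exist yet.
    for d in (MODEL_DIR / "fine_tuned_models", RESULTS_DIR):
        d.mkdir(parents=True, exist_ok=True)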

Running experiments

Configuration:

Adjust your experimental parameters in config.py or use a specific config JSON file located in the config_files directory.

Default Run

To run experiments with the default configuration from config.py, execute: python run_tda.py

Custom run configuration

To specify a particular configuration from the config_files directory: python run_tda.py --config=<config_filename>.json

The JSON file should contain the custom experiment configuration with the following keys (an illustrative example follows the list):

  • "base": directory paths and basic configurations, for example:
    • MODEL_DIR : Path to the Model directory
    • RESULTS_DIR : Path to the Output directory
    • load_model_outputs_from_dir : If true, persistence diagrams will be generated from saved model outputs (attentions and sentence embeddings) from the experiment_loaddir_name subdirectory
    • load_diagrams_from_dir : If true, topological features will be calculated from the saved diagrams from the experiment_loaddir_name subdirectory
  • "model": model specs
    • load_from_dir: If true, fine-tuned model parameters will be loaded. Directory can be specified in model_path
  • "training": fine-tuning model configurations
    • do_train: If true, the BERT model will be fine-tuned on the in-distribution data with the specified hyper-parameters (e.g. num_train_epochs, batch_size)
  • "dataset": dictionary (for id_dataset) or list of dictionaries (for ood_datasets) with configurations to load datasets from HuggingFace. Specifications include:
    • name: name of the HuggingFace dataset
    • train_size, val_size, test_size: number of samples to load for each split
    • labels_subset (Optional): list of labels to load
  • "tda": configurations to run Topological Data Analysis, for example:
    • do_id_tda : If true, topological features will be calculated for the in-distribution dataset
    • graph_type : 'undirected' or 'directed'
    • symmetry: 'mean' or 'max' attention for undirected graphs
    • homology_dims: list of dimensions (e.g. [0,1,2])
    • amplitude_metrics: list of amplitude metrics supported by the giotto-tda library
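For concreteness, the sketch below assembles an example configuration from the keys listed above. It is written as a Python dict (rather than raw JSON) so the assumed values can be annotated: every path, dataset name, sample size and metric name is a hypothetical placeholder, and the exact nesting of id_dataset / ood_datasets should be verified against the files already in config_files and against config.py.

    # Illustrative experiment configuration; all concrete values are hypothetical
    # placeholders and should be replaced with settings that match your setup.
    import json
    from pathlib import Path

    example_config = {
        "base": {
            "MODEL_DIR": "/path/to/model_dir",           # Model directory (contains fine_tuned_models/)
            "RESULTS_DIR": "/path/to/output_dir",        # Output directory for results and intermediate files
            "load_model_outputs_from_dir": False,        # reuse saved attentions / sentence embeddings
            "load_diagrams_from_dir": False,             # reuse saved persistence diagrams
            "experiment_loaddir_name": "example_run",    # hypothetical subdirectory to load from
        },
        "model": {
            "load_from_dir": True,                       # load fine-tuned model parameters
            "model_path": "fine_tuned_models",           # hypothetical path to the fine-tuned model
        },
        "training": {
            "do_train": False,                           # set True to fine-tune BERT on in-distribution data
            "num_train_epochs": 3,                       # hypothetical hyper-parameters
            "batch_size": 16,
        },
        "dataset": {
            # The exact nesting of id_dataset / ood_datasets is an assumption.
            "id_dataset": {"name": "imdb", "train_size": 2000, "val_size": 500, "test_size": 500},
            "ood_datasets": [
                {"name": "ag_news", "test_size": 500, "labels_subset": [0, 1]},
            ],
        },
        "tda": {
            "do_id_tda": True,                           # compute topological features for the ID dataset
            "graph_type": "undirected",                  # 'undirected' or 'directed'
            "symmetry": "mean",                          # 'mean' or 'max' attention for undirected graphs
            "homology_dims": [0, 1, 2],
            "amplitude_metrics": ["wasserstein", "bottleneck"],  # hypothetical subset of supported metrics
        },
    }

    # Save into the config_files directory so run_tda.py can pick it up.
    Path("config_files").mkdir(exist_ok=True)
    with open("config_files/example_config.json", "w") as f:
        json.dump(example_config, f, indent=2)

The resulting file would then be passed as python run_tda.py --config=example_config.json (whether the filename is resolved relative to config_files is an assumption worth checking).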

Citation

TODO: Add citations once paper is uploaded
