SelfPAD:

Author: Talip Ucar (ucabtuc@gmail.com)

The official implementation of "Improving Antibody Humanness Prediction using Patent Data".

Model

[Figure: model diagrams for pre-training and fine-tuning]

Environment

We used Python 3.7 for our experiments. The environment can be set up in three steps:

pip install pipenv             # To install pipenv if you don't have it already
pipenv install --skip-lock     # To install required packages
pipenv shell                   # To activate virtual env

If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".

Configuration

There are two types of configuration files:

1. pad.yaml         # Defines parameters and options for pre-training
2. humanness.yaml   # Defines parameters and options for fine-tuning

Training and Evaluation

You can train and evaluate the model by using:

python selfpad_pretrain.py        # For pre-training
python selfpad_finetune.py        # For fine-tuning it for humanness
python selfpad_eval.py -ev test   # To compute humanness scores for a custom dataset, in this case test.csv. The CSV file should have "VH", "VL" and/or "Label" columns
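
Before running the evaluation on a custom CSV, it can help to check which of the columns named above are actually present. The following is a minimal sketch using only the Python standard library; the helper name present_columns is hypothetical and not part of this repo, and how selfpad_eval.py itself reads the file is not shown here.

```python
import csv
from pathlib import Path

# Column names taken from this README; the helper itself is illustrative only.
EXPECTED_COLUMNS = ("VH", "VL", "Label")


def present_columns(csv_path):
    """Return the expected column names that appear in the CSV header row."""
    with Path(csv_path).open(newline="") as handle:
        # An empty file has no header row, so fall back to an empty list.
        header = next(csv.reader(handle), [])
    return [name for name in EXPECTED_COLUMNS if name in header]
```

For example, present_columns("data/test.csv") lists which of "VH", "VL" and "Label" were found, so a missing or misspelled header can be spotted before the model is loaded.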

Structure of the repo

- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py

- src
    |-selfpad.py
    |-selfpad_humanness.py

- config
    |-pad.yaml
    |-humanness.yaml
    
- utils_common
    |-arguments.py
    |-utils.py
    |-tokenizer.py
    ...
    
- utils_pretrain
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- utils_finetune
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- data
    |-test.csv
    ...
    
- results
    |-pretraining
    |-humanness
    ...

Results

Results at the end of training are saved under the ./results directory. The results directory is structured as follows:

- results
    |-task e.g. humanness, or pretraining
            |-evaluation
                |-clusters (for plotting t-SNE and PCA plots of embeddings)
            |-training
                |-model
                |-plots
                |-loss

You can save evaluation results under the "evaluation" folder.
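
The layout above can be reproduced with a few lines of Python, for instance when preparing an output location for your own evaluation scripts. This is only a sketch of the folder tree listed in this README; the function name make_results_dirs is hypothetical, and the training scripts may create these folders in their own way.

```python
from pathlib import Path

# Sub-folders copied from the results tree shown in this README.
RESULT_SUBDIRS = (
    "evaluation/clusters",
    "training/model",
    "training/plots",
    "training/loss",
)


def make_results_dirs(task, root="results"):
    """Create <root>/<task>/... for every sub-folder and return the paths."""
    created = []
    for sub in RESULT_SUBDIRS:
        path = Path(root) / task / sub
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling make_results_dirs("humanness") or make_results_dirs("pretraining") builds the tree for the corresponding task under ./results.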

Experiment tracking

You can turn on Weights & Biases (W&B) logging in the config file.

Citing the paper

@article{ucar2024SelfPAD,
  title={Improving Antibody Humanness Prediction using Patent Data},
  author={Ucar, Talip and 
          Ramon, Aubin and 
          Oglic, Dino and 
          Croasdale-Wood, Rebecca and 
          Diethe, Tom and 
          Sormanni, Pietro},
  journal={arXiv preprint arXiv:2110.04361},
  year={2024}
}

Citing this repo

If you use the SelfPAD framework in your own work, please cite it using the following:

@Misc{talip_ucar_2024_SelfPAD,
  author =   {Talip Ucar},
  title =    {{Improving Antibody Humanness Prediction using Patent Data}},
  howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
  month        = January,
  year = {since 2024}
}
