BILN Language Model for describing modified and non-modified peptides
You will need to install the following packages:
pip install transformers[torch] datasets tokenizers mapchiral molfeat rdkit scipy scikit-learn tqdm optuna typer tensorboard lightgbm xgboost IPython hestia-ood
pip install git+https://github.com/Boehringer-Ingelheim/pyPept.git
pip install git+https://github.com/novonordisk-research/pepfunn.git
pip install SmilesPE omegaconf mlflow
conda install dgl -c conda-forge
Execute the download_data.py script to download both the pretraining and benchmarking datasets. data_dir_path refers to the directory where you want to save the files.
Both collections:
python code/download_data.py data_dir_path
Only the pretraining data:
python code/download_data.py data_dir_path --collection pretraining
Only the downstream data:
python code/download_data.py data_dir_path --collection pretraining
Execute the train.py script. log_dir refers to the directory where the training logs will be saved. The --overwrite flag can be used if you want to overwrite the log_dir.
python code/run_hpo.py log_dir data_dir_path
To evaluate a pretrained model, execute the fingerprint_evaluation.py script.
python code/fingerprint_evaluation.py data_dir_path BILN-LM:log_dir
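The three steps above share the same two directories, so it can help to see them chained together. The following is a minimal sketch, not part of the repository: build_commands and run_pipeline are hypothetical helper names, the "data" and "logs" directories are placeholders, and the commands are copied from this README. It assumes the model argument of fingerprint_evaluation.py has the form BILN-LM: followed by the log directory, as in the command above. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from typing import List, Optional


def build_commands(
    data_dir: str, log_dir: str, collection: Optional[str] = None
) -> List[List[str]]:
    """Return the download, training and evaluation commands from this README.

    Hypothetical helper: only the script names and arguments shown in the
    README are used; nothing else about the scripts is assumed.
    """
    download = ["python", "code/download_data.py", data_dir]
    if collection is not None:
        # Optional filter, e.g. "pretraining", as shown in the README.
        download += ["--collection", collection]
    return [
        download,
        ["python", "code/run_hpo.py", log_dir, data_dir],
        # Assumption: the model is referenced as "BILN-LM:<log_dir>".
        ["python", "code/fingerprint_evaluation.py", data_dir, f"BILN-LM:{log_dir}"],
    ]


def run_pipeline(
    data_dir: str,
    log_dir: str,
    collection: Optional[str] = None,
    dry_run: bool = True,
) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_commands(data_dir, log_dir, collection):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder directories; replace with your own data_dir_path and log_dir.
    run_pipeline("data", "logs", dry_run=True)
```

Run it from the repository root so that the relative code/ paths resolve, and pass dry_run=False once the printed commands look right.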