This project provides an overview and implementation of the work described in "HELMify: A Hybrid Rule- and LLM-Based Generator of Peptide Monomer HELM Names".
This repository lets a user recreate the Large Language Model (LLM)-based naming method, provided they have all the prerequisites. A user with an established implementation of the zone-based naming method should be able to connect it to this repository, which enables the zone-based and hybrid naming methodologies in addition to LLM-based naming.
| Method | Description | (Pre)Requirements |
|---|---|---|
| LLM-Based | LLM-assisted full monomer name generation | Full monomer database, Azure OpenAI Configuration variables, OpenEye license server |
| Zone-Based | Monomer name suggestion provided by zone-based structural decomposition | Zone-based naming URL |
| Hybrid | Monomer name generated by combining the zone-based and LLM methods: the zone-based namer runs first, then the LLM helps name unknown substituents | Substituent dictionary, Azure OpenAI Configuration variables, OpenEye license server |
# Create conda environment
conda env create -f environment.yaml
conda activate helmify
Create a .env file within the /helmify directory and fill in the variables with your own values. Refer to the table above for the minimum requirements of each naming method. These values are referenced in config.py.
# Required for Azure OpenAI implementation
OPENAI_PROVIDER="azure"
OPENAI_API_KEY="your_azure_openai_api_key"
OPENAI_API_ROOT="https://your_resource.openai.azure.com"
OPENAI_API_VERSION="api_version"
OPENAI_MODEL="model_version"
# Database configuration required for nearest neighbor search
# A CSV file with columns: 'complete_smiles', 'name', and 'symbol'. An example can be found in /sample_database
MONOMER_DATABASE="path_to_monomer_database.csv"
# A CSV file with columns: 'smiles', 'name', 'symbol'. An example can be found in /sample_database
SUBSTITUENT_DICTIONARY="path_to_substituent_dictionary.csv"
# OpenEye License Server - Required for SMILES to IUPAC name conversion and additional cheminformatics functionalities.
OE_LICENSE_SERVER="path_to_openeye_license_server"
# Zone-based naming implementation
# This is a service endpoint that provides zone-based naming functionality. The implementation of this method is described in "HELMify: A Hybrid Rule- and LLM-Based Generator of Peptide Monomer HELM Names". The output of this method is a string representation with various parseable outputs, as described in zone_module.py
ZONE_URL="https://zone_based_method_endpoint"
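Before starting the API, it can help to confirm that the variables for the naming method you intend to use are actually set. The following is a minimal sketch, not part of the repository: the variable names come from the .env template above, while the grouping per method follows the requirements table and is an assumption about how config.py consumes them. The helper names `missing_variables` and `missing_columns` are hypothetical. Note that the Hybrid row of the table does not list the zone-based URL even though that method runs the zone-based namer first, so adding `ZONE_URL` to the Hybrid list may be appropriate for your setup.

```python
import csv
import os

# Azure OpenAI variables, as named in the .env template.
OPENAI_VARS = [
    "OPENAI_PROVIDER",
    "OPENAI_API_KEY",
    "OPENAI_API_ROOT",
    "OPENAI_API_VERSION",
    "OPENAI_MODEL",
]

# Assumed mapping from the requirements table to .env variable names.
REQUIRED_VARS = {
    "LLM-Based": ["MONOMER_DATABASE", "OE_LICENSE_SERVER"] + OPENAI_VARS,
    "Zone-Based": ["ZONE_URL"],
    "Hybrid": ["SUBSTITUENT_DICTIONARY", "OE_LICENSE_SERVER"] + OPENAI_VARS,
}


def missing_variables(env):
    """Map each naming method to its required variables that are unset or empty."""
    return {
        method: [name for name in names if not env.get(name)]
        for method, names in REQUIRED_VARS.items()
    }


def missing_columns(csv_file, expected):
    """Return the expected column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {column.strip() for column in header}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Report readiness per method based on the current process environment.
    for method, missing in missing_variables(os.environ).items():
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{method}: {status}")
```

To check a database file, open it and pass the file object together with the column names listed in the template comments, for example `missing_columns(open(path, newline=""), ["complete_smiles", "name", "symbol"])` for the monomer database.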
Run the API using the uvicorn HTTP server. Execute the following command in a terminal from the /helmify directory:
uvicorn main:api --env-file .env
Access the API through a web browser by copying the address shown on the last line of the terminal output once the uvicorn command has started (you can try http://127.0.0.1:8000/helm-api-root). Alternatively, use any API testing tool (e.g. Python requests, curl, or Postman).
See the HELMify Demo Notebook for a comprehensive tutorial on how to call the API programmatically.
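For a quick check that the server is reachable, a request to the root address mentioned above can be sketched with the Python standard library alone (the `requests` package would work equally well). This is an illustration under assumptions: the host and port are uvicorn's defaults, the helper names `api_url` and `fetch` are hypothetical, and the format of the response body is not documented here, so it is returned as plain text.

```python
import urllib.request


def api_url(path="helm-api-root", host="127.0.0.1", port=8000):
    """Build a URL for the locally running API (uvicorn defaults to 127.0.0.1:8000)."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch(url, timeout=10):
    """Send a GET request and return the status code and the decoded body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Example, with the server already running in another terminal:
#   status, body = fetch(api_url())
#   print(status, body)
```

If uvicorn reports a different address on startup, pass the matching `host` and `port` to `api_url`.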