# BioLMTox: Toxin Similarity
Enhance biosecurity with the BioLMTox [embedding endpoint](https://api.biolm.ai/#723bb851-3fa0-40fa-b4eb-f56b16d954f5).

## Set your API Token

To use the BioLM API, you need an API token. You can get one from the [User API Tokens](https://biolm.ai/ui/accounts/user-api-tokens/) page.

Paste the API token in the cell below, as the value of the variable `BIOLMAI_TOKEN`.

In [1]:
BIOLMAI_TOKEN = " "  # !!! YOUR API TOKEN HERE !!!
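If you would rather not paste the token into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `BIOLMAI_TOKEN` has been set in the environment that launched the notebook. Both that variable name and the helper `load_token` are illustrative choices for this example, not something the BioLM documentation prescribes.

```python
import os


def load_token(env_var="BIOLMAI_TOKEN", default=""):
    """Return the API token stored in `env_var`, or `default` if it is unset."""
    # `env_var` is a hypothetical variable name chosen for this sketch.
    return os.environ.get(env_var, default).strip()


token_from_env = load_token()
if not token_from_env:
    print("No API token found in the environment; paste it into the cell above instead.")
```

If a token is found, it can be assigned to `BIOLMAI_TOKEN` in place of the pasted string.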

## Make API Request

BioLM already hosts a server with the BioLMTox model loaded into memory, so responses should be fast. First, import the `requests` library and the other packages used below.

In [9]:
from IPython.display import JSON  # Helpful UI for JSON display

try:
    # Install packages to make API requests in JLite
    import micropip
    await micropip.install('requests')
    await micropip.install('pyodide-http')
    await micropip.install('numpy')
    # Patch requests for in-browser support
    import pyodide_http
    pyodide_http.patch_all()
except ModuleNotFoundError:
    pass  # Won't be using micropip outside of JLite

import requests  # Will use to make calls to BioLM.ai
import csv  # To read example data
import numpy as np
from numpy.linalg import norm

In [3]:
lines = []
with open('data/protein/data/PLA2.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        lines.append(row)
print(lines)

[['label', 'sequence'], ['toxin', 'MHPAHLLVLLAVCVSLLGASDIPPLPLNLAQFGFMIRCANGGSRSPLDYTDYGCYCGKGGRGTPVDDLDRCCQVHDECYGEAEKRLGCSPFVTLYSWKCYGKAPSCNTKTDCQRFVCNCDAKAAECFARSPYQKKNWNINTKARCK'], ['toxin', 'MRTLWIMAVLLVGVEGSLVELGKMILQETGKNPVTSYGAYGCNCGVLGRGKPKDATDRCCYVHKCCYKKLTDCNPKKDRYSYSWKDKTIVCGENNSCLKELCECDKAVAICLRENLDTYNKKYKNNYLKPFCKKADPC']]


Take the two example toxin sequences from the CSV rows loaded above.

In [4]:
SEQ1 = lines[1][1]
SEQ2 = lines[2][1]
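Indexing rows by position works for this two-row file, but it depends on the header sitting at index 0 and on the column order. As a possible alternative, the sketch below reads the `sequence` column by name with `csv.DictReader`. The helper `read_sequences` is a hypothetical name, and the in-memory CSV text uses shortened placeholder sequences in place of the real file.

```python
import csv
import io

# Stand-in for the file contents; the sequences are truncated placeholders.
csv_text = "label,sequence\ntoxin,MHPAHLLVLL\ntoxin,MRTLWIMAVL\n"


def read_sequences(handle):
    """Return the values of the 'sequence' column from a CSV with a header row."""
    reader = csv.DictReader(handle)
    return [row["sequence"] for row in reader]


sequences = read_sequences(io.StringIO(csv_text))
print(sequences)
```

With the real file, the same function could be called on the handle returned by `open('data/protein/data/PLA2.csv', newline='')`.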

In [5]:
print("Sequence length 1: {}".format(len(SEQ1)))
print("Sequence length 2: {}".format(len(SEQ2)))

Sequence length 1: 146
Sequence length 2: 138


SEQ1 is [Q45Z47](https://www.uniprot.org/uniprotkb/Q45Z47/entry) and SEQ2 is [P82114](https://www.uniprot.org/uniprotkb/P82114/entry). Both are snake venom proteins.

## Define Endpoint Params

In [6]:
SLUG = "biolmtox_v1"  # Model endpoint to hit on BioLM.ai
ACTION = "transform"
# Follow the link to the docs for the endpoint above to see this definition
data = {
  "instances": [{
    "data": {"text": SEQ1}
  },
  {
    "data": {"text": SEQ2}
  }]
}
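When more than two sequences are involved, writing the request body by hand becomes repetitive. The sketch below builds the same `instances` structure from a list of sequences. The helper name `build_payload` is an assumption for illustration, and the two inputs are shortened placeholder sequences.

```python
def build_payload(sequences):
    """Wrap each sequence in the {"data": {"text": ...}} shape shown in the cell above."""
    return {"instances": [{"data": {"text": seq}} for seq in sequences]}


# Placeholder sequences; in the notebook this would be [SEQ1, SEQ2].
payload = build_payload(["MHPAHLLVLL", "MRTLWIMAVL"])
print(payload)
```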

In [7]:
url = f"https://biolm.ai/api/v1/models/{SLUG}/{ACTION}/"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Token {BIOLMAI_TOKEN.strip()}",
}

Now make a REST API request over HTTPS to the BioLM API to compute the embeddings on GPU.

In [8]:
# Make the request - let's time it!
import time

s = time.time()  # Start time
response = requests.post(
    url=url,
    headers=headers,
    json=data,
)

e = time.time()  # End time
d = e - s  # Duration

print(f'Response time: {d:.4}s')

result = response.json()

# If you wish to view the full result, you can expand the tree in the cell below
JSON(result)

Response time: 0.298s


<IPython.core.display.JSON object>
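The request above parses the body without checking whether the call succeeded, so an invalid token or a server error would only surface later as a missing key. A minimal sketch of a more defensive call follows. The helper `post_json`, its injectable `post` argument, and the 30-second timeout are assumptions made for this example; `timeout` and `raise_for_status()` are standard parts of the `requests` library.

```python
def post_json(url, headers, payload, timeout=30.0, post=None):
    """POST `payload` as JSON and return the decoded body, raising on HTTP errors."""
    if post is None:
        import requests  # Imported lazily so the helper can be exercised with a stub.
        post = requests.post
    response = post(url=url, headers=headers, json=payload, timeout=timeout)
    response.raise_for_status()  # Raises an exception on 4xx/5xx responses.
    return response.json()
```

In this notebook it could be used as `result = post_json(url, headers, data)`.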

The response contains a list of embedding vectors under the `predictions` key, one vector of fixed length 640 for each input instance.
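Before computing similarities, it can help to confirm that the response has the shape described here. The sketch below checks each vector's length against the expected 640 dimensions. The helper `extract_embeddings` is a hypothetical name, and the toy response stands in for a real API result.

```python
def extract_embeddings(result, expected_dim=640):
    """Return result["predictions"] after checking the length of every vector."""
    embeddings = result["predictions"]
    for i, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {i} has length {len(vector)}, expected {expected_dim}"
            )
    return embeddings


# Toy response with two vectors of the expected length.
toy_result = {"predictions": [[0.0] * 640, [1.0] * 640]}
embeddings = extract_embeddings(toy_result)
print(len(embeddings), len(embeddings[0]))
```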

In [10]:
# Define similarity measure
def cos_similarity(a, b):
    return np.dot(a,b)/(norm(a)*norm(b))
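Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms, so it ranges from -1 (opposite directions) to 1 (same direction). As a dependency-free cross-check of the same formula, the sketch below reimplements it with the standard library and tries it on small toy vectors. The name `cos_similarity_py` is chosen for this example only.

```python
import math


def cos_similarity_py(a, b):
    """Cosine similarity of two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


orthogonal = cos_similarity_py([1.0, 0.0], [0.0, 1.0])  # Perpendicular vectors
parallel = cos_similarity_py([1.0, 2.0], [2.0, 4.0])    # Same direction, different length
opposite = cos_similarity_py([1.0, 0.0], [-1.0, 0.0])   # Opposite directions
print(orthogonal, parallel, opposite)
```

Because the measure ignores vector length, it compares only the direction of the two embeddings.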

In [11]:
# convert sequence embeddings to numpy arrays
em_1 = np.asarray(result["predictions"][0])
em_2 = np.asarray(result["predictions"][1])

In [12]:
# compute similarity measures
em_similarity = cos_similarity(em_1, em_2)
print(f'sequence embedding cosine similarity:\n{em_similarity}')

sequence embedding cosine similarity:
0.952454195675824


The cosine similarity between the two toxin embeddings is high (about 0.95), as expected: one sequence is Phospholipase A2 OS2 and the other is Basic phospholipase A2 homolog MjTX-I. Both are phospholipase-related proteins from snake venom.