# BioLMTox: Toxin Classification and Similarity
Enhance biosecurity with the BioLMTox [classification endpoint](https://api.biolm.ai/#8616fff6-33c4-416b-9557-429da180ef92) and [embedding endpoint](https://api.biolm.ai/#723bb851-3fa0-40fa-b4eb-f56b16d954f5).

In [None]:
from helpers import api_caller
import pandas as pd

TOKEN = 'f4352171d6c93b1b1cf8c9ead1d20c60210a8e3e67c383e1f458824381a1d19b'
df = pd.read_csv('data/protein/data/PLA2.csv')
seq_1 = df.sequence.iloc[0]
seq_2 = df.sequence.iloc[1]

In [None]:
print("Sequence length 1: {}".format(len(seq_1)))
print("Sequence length 2: {}".format(len(seq_2)))

## Define Endpoint Params

In [None]:
SLUG = 'biolmtox_v1'  # Model endpoint to hit on BioLM.ai

# Follow the link to the docs for the endpoint above to see this definition
data = {
  "instances": [{
    "data": {"text": seq_1}
  }]
}
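
The same request body extends to any number of sequences. Below is a minimal sketch, assuming only the `instances` / `data` / `text` layout used in the cell above; `build_payload` is a hypothetical helper name and is not part of the `helpers` module, and the sequence is a toy string rather than a row from `PLA2.csv`.

```python
def build_payload(sequences):
    """Wrap protein sequences in the request layout used above.

    Hypothetical helper for illustration; not part of the BioLM helpers.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    # One {"data": {"text": ...}} entry per input sequence
    return {"instances": [{"data": {"text": seq}} for seq in sequences]}


# Toy sequence for illustration only (not taken from PLA2.csv)
payload = build_payload(["MKTLLLAVAVV"])
print(payload)
```

Passing `[seq_1, seq_2]` to such a helper would produce the same structure as a hand-written two-instance request.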

Let's send a secure REST API request to the BioLM API to run the prediction on a GPU.

In [None]:
import time

s = time.time()  # Start time

result = await api_caller(
    SLUG,
    'predict',
    data,
    TOKEN
)

e = time.time()  # End time
d = e - s  # Duration

print(f'Response time: {d:.4}s')

In [None]:
from IPython.display import JSON

JSON(result)

The response contains an entry for each input instance with the following keys:

 * `label`, the predicted class label, either 'toxin' or 'not toxin'
 * `score`, the model score for the output label; the closer it is to one, the more confident the model is in its prediction
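
As a rough illustration of how these two keys might be consumed, the sketch below walks over a response and flags low-confidence calls. The response shape (a `predictions` list of `label`/`score` dictionaries), the numbers, and the `threshold` cutoff are all assumptions made for the example; inspect the real `result` object before relying on any of them.

```python
# ASSUMED response shape with made-up values, for illustration only.
example_result = {
    "predictions": [
        {"label": "toxin", "score": 0.98},
        {"label": "not toxin", "score": 0.71},
    ]
}


def summarize(result, threshold=0.9):
    """Return (label, score, confident) tuples.

    `threshold` is an arbitrary cutoff chosen for this sketch.
    """
    return [
        (p["label"], p["score"], p["score"] >= threshold)
        for p in result["predictions"]
    ]


for label, score, confident in summarize(example_result):
    print(f"{label}: {score:.2f} (confident: {confident})")
```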

In [None]:
# FOR IN-BROWSER JUPYTER-LITE ONLY #
import micropip  # Install with `pip install` if running notebook elsewhere
await micropip.install('levenshtein')
await micropip.install('numpy')

In [None]:
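# Define similarity measures
import numpy as np
from numpy.linalg import norm
from Levenshtein import ratio as levenshtein_ratio

def cos_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return np.dot(a, b) / (norm(a) * norm(b))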

In [None]:
data2 = {
  "instances": [{
    "data": {"text": seq_1}
  },
  {
    "data": {"text": seq_2}
  }]
}

In [None]:
result = await api_caller(
    SLUG,
    'transform',
    data2,
    TOKEN
)

In [None]:
# Convert each sequence embedding to a NumPy array
em_1 = np.asarray(result["predictions"][0])
em_2 = np.asarray(result["predictions"][1])  # second instance (seq_2)

In [None]:
# compute similarity measures
em_similarity = cos_similarity(em_1, em_2)
seq_similarity = levenshtein_ratio(seq_1, seq_2)
print(f'sequence embedding cosine similarity:\n{em_similarity}')
print(f'sequence Levenshtein ratio:\n{seq_similarity}')
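
To build intuition for the cosine score printed above, here is a small pure-Python sketch of the same formula, dot(a, b) / (|a| * |b|), applied to toy vectors. The vectors are invented for illustration and are not real BioLMTox embeddings; the function is a standard-library stand-in for the NumPy version, not the notebook's own implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors: the second is a scaled copy of the first, so the angle is zero
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Toy vectors at a right angle to each other
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(f"same direction: {same_direction:.3f}")
print(f"orthogonal:     {orthogonal:.3f}")
```

Because the measure ignores vector length, two embeddings can score near one even when the underlying sequences differ at the character level, which is why the notebook reports the Levenshtein ratio alongside it.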