# Protein Solubility Prediction: A Walkthrough

This tutorial walks through wrapping a protein solubility prediction model as a service and testing it with the OpenAD Toolkit.

## 1. Setup

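First, let's install the necessary libraries: `openad-service-utils` for wrapping the model, `openad` (the OpenAD Toolkit) for testing, and the `biomed-multi-alignment` package, which provides the MAMMAL model code used for protein solubility prediction. `torch` is not named in the command; it is expected to be installed as a dependency of the model package.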

In [None]:
%pip install openad-service-utils openad git+https://github.com/BiomedSciAI/biomed-multi-alignment.git
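
After the install finishes, a quick import check can confirm the environment is usable before going further. The sketch below is not part of the original notebook; the module names are the top-level packages imported by the wrapper script in the next section, plus `torch`, and the helper `missing_modules` is a hypothetical name introduced here for illustration.

```python
from importlib.util import find_spec

# Top-level modules the wrapper script relies on (taken from its import lines),
# plus torch, which the model is expected to need at runtime.
REQUIRED_MODULES = ["openad_service_utils", "fuse", "mammal", "torch"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# An empty list means every module was found.
print(missing_modules(REQUIRED_MODULES))
```

If the printed list is not empty, re-run the install cell or restart the kernel so newly installed packages are picked up.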

## 2. Model Wrapping

Next, we create a Python script that wraps the protein solubility model as an OpenAD service. The `%%writefile` cell magic writes the script to `protein_solubility_implementation.py` in the notebook's working directory.
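
The wrapper separates one-time loading (`setup`) from per-sample inference (`predict`). The dependency-free sketch below illustrates that split with a hypothetical `ToyPredictor` class; it is not part of `openad-service-utils`, and whether the real framework calls `setup` lazily or at startup is an assumption made here for illustration.

```python
class ToyPredictor:
    """Hypothetical stand-in that mimics the setup()/predict() split."""

    def __init__(self):
        self.model = None
        self.setup_calls = 0

    def setup(self):
        # Expensive one-time work (loading weights, tokenizer) would go here.
        self.setup_calls += 1
        self.model = lambda seq: len(seq)  # placeholder "model": sequence length

    def predict(self, sample):
        # Assumption: load lazily on first use, then reuse the loaded model.
        if self.model is None:
            self.setup()
        return self.model(sample)


toy = ToyPredictor()
results = [toy.predict(seq) for seq in ["MSK", "GEELF"]]
print(results, toy.setup_calls)  # prints: [3, 5] 1
```

Two predictions trigger only one `setup` call, which is the reason the heavy model load lives outside `predict`. The cell that follows writes the actual wrapper to disk.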

In [None]:
%%writefile protein_solubility_implementation.py


from openad_service_utils import SimplePredictor, PredictorTypes, DomainSubmodule
from typing import Any
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp
from mammal.examples.protein_solubility.task import ProteinSolubilityTask
from mammal.keys import CLS_PRED, SCORES
from mammal.model import Mammal

class ProteinSolubility(SimplePredictor):
    domain: DomainSubmodule = DomainSubmodule("properties")
    algorithm_name: str = "mammal"
    algorithm_application: str = "protein_solubility"
    algorithm_version: str = "v0"
    property_type: PredictorTypes = PredictorTypes.PROTEIN

    def setup(self):
        # Load the pretrained solubility checkpoint and switch to inference mode.
        self.model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.protein_solubility")
        self.model.eval()
        # Load the tokenizer published with the same checkpoint.
        self.tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.protein_solubility")

    def predict(self, sample: Any):
        # `sample` is the protein sequence string sent by the client.
        sample_dict = {"protein_seq": sample}
        # Tokenize the sequence into the task's input format.
        sample_dict = ProteinSolubilityTask.data_preprocessing(
            sample_dict=sample_dict,
            protein_sequence_key="protein_seq",
            tokenizer_op=self.tokenizer_op,
            device=self.model.device,
        )
        # Generate up to 5 new tokens and keep the scores alongside them.
        batch_dict = self.model.generate(
            [sample_dict],
            output_scores=True,
            return_dict_in_generate=True,
            max_new_tokens=5,
        )
        # Convert the raw decoder output of the single sample into the task's result.
        result = ProteinSolubilityTask.process_model_output(
            tokenizer_op=self.tokenizer_op,
            decoder_output=batch_dict[CLS_PRED][0],
            decoder_output_scores=batch_dict[SCORES][0],
        )
        return result

# Register the model so the server can expose it.
# no_model=True skips the AWS model registry; setup() loads the weights itself via from_pretrained.
ProteinSolubility.register(no_model=True)

if __name__ == "__main__":
    from openad_service_utils import start_server
    start_server()


## 3. Running the Service

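Open a new terminal and start the service. The path below assumes the command is run from the repository root with the notebook located in `tutorials/`; if the file was written elsewhere, adjust the path accordingly: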

```bash
python tutorials/protein_solubility_implementation.py
```
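
Before cataloging the service, it can help to confirm that something is actually listening. The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and the host and port are assumptions taken from the catalog command in the next section (`http://localhost:8081`).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumed address; change it if your server reports a different one.
    print(is_port_open("localhost", 8081))
```

An open port only shows that the server process has started. The model itself may still be loading, so the first prediction request can take noticeably longer than later ones.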

## 4. Testing the Service with OpenAD Toolkit

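Once the service is running, we can test it with the OpenAD Toolkit's magic commands. The cells below initialize the magics, catalog the local service under the name `protsol`, show its available commands, request a solubility prediction for a sample sequence, and finally remove the service from the catalog. The catalog command assumes the service listens on `http://localhost:8081`; change the URL if your server reports a different address at startup.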

In [None]:
!init_magic

In [None]:
%openad catalog model service from remote 'http://localhost:8081' as 'protsol'

In [None]:
%openad protsol ?

In [None]:
%openad protsol get protein property protein_solubility FOR 'MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK'
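
A long sequence pasted into a command is easy to corrupt, so a client-side sanity check before submitting can save a round trip. The sketch below flags characters outside the 20 standard amino-acid letters; `invalid_residues` is a hypothetical helper, and restricting input to those 20 letters is an assumption, since the model's tokenizer may accept other symbols.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence: str) -> list:
    """Return the sorted set of characters that are not standard amino-acid letters."""
    return sorted(set(sequence.upper()) - STANDARD_AMINO_ACIDS)


print(invalid_residues("MSKGEELFTG"))  # prints: []
print(invalid_residues("MSKXB1"))      # prints: ['1', 'B', 'X']
```

An empty list means the sequence contains only standard residues and can be passed to the `get protein property` command as shown above.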

In [None]:
%openad uncatalog model service 'protsol'