# Protein Language Model (PLM) Batch Queries

This notebook demonstrates how to use KBPLMUtils to run batch queries against protein language models (PLMs).

## Overview

KBPLMUtils provides:
- Batch querying of protein sequences against PLMs
- Efficient processing of multiple sequences
- Integration with KBase protein data

## 1. Setup

In [None]:
import sys
from pathlib import Path

project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"Project root: {project_root}")

## 2. Initialize KBPLMUtils

In [None]:
from kbutillib import KBPLMUtils, SharedEnvUtils

# Load the KBase token
env_util = SharedEnvUtils()
token = env_util.get_token('kbase')

# Keep util defined even without a token so later cells can check for None
util = None
if token:
    util = KBPLMUtils(token=token)
    print("KBPLMUtils initialized successfully!")
else:
    print("Warning: No KBase token found!")
    print("Set token: env_util.set_token('your_token', 'kbase')")

## 3. Prepare Protein Sequences

Create a list of protein sequences for the batch query. Each entry is a dictionary with an `id` key and a `sequence` key:

In [None]:
# Example protein sequences
sequences = [
    {"id": "protein1", "sequence": "MKTAYIAKQRQISFVKSHFSRQ"},
    {"id": "protein2", "sequence": "MNKFVLVGALCTLAAEVFLTK"},
    {"id": "protein3", "sequence": "MKKLVLSLSLVLAFSSATAAF"}
]

print(f"Prepared {len(sequences)} protein sequences for analysis")
for seq in sequences:
    print(f"  {seq['id']}: {len(seq['sequence'])} amino acids")
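
Before sending sequences to a model, it can help to confirm that every record has an ID and contains only standard amino-acid letters. The following is a minimal sketch of such a check. `validate_sequences` is a hypothetical helper written for this notebook, not part of kbutillib, and restricting input to the 20 standard residues is an assumption; some models may accept additional symbols.

```python
# Hypothetical helper (not part of kbutillib): basic sanity checks on input records.
# Assumption: only the 20 standard amino-acid letters are considered valid.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequences(records):
    """Return a list of (id, problem) tuples; an empty list means every record passed."""
    problems = []
    seen_ids = set()
    for record in records:
        seq_id = record.get("id")
        sequence = record.get("sequence", "")
        if not seq_id:
            problems.append((seq_id, "missing id"))
        elif seq_id in seen_ids:
            problems.append((seq_id, "duplicate id"))
        seen_ids.add(seq_id)
        if not sequence:
            problems.append((seq_id, "empty sequence"))
            continue
        # Collect any letters outside the standard alphabet
        unexpected = sorted(set(sequence.upper()) - STANDARD_AMINO_ACIDS)
        if unexpected:
            problems.append((seq_id, f"non-standard residues: {''.join(unexpected)}"))
    return problems


example_records = [
    {"id": "protein1", "sequence": "MKTAYIAKQRQISFVKSHFSRQ"},
    {"id": "protein2", "sequence": "MNKFVLVGALCTLAAEVFLTK"},
    {"id": "protein3", "sequence": "MKKLVLSLSLVLAFSSATAAF"},
]
print(validate_sequences(example_records))  # prints [] because all three records pass
```

Returning a list of problems rather than raising on the first failure makes it easier to fix a large input file in one pass.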

## 4. Batch Query PLM

Query the protein language model with a batch of sequences:

In [None]:
# Example: batch query (requires a valid KBase token and a reachable PLM service).
# The call is wrapped in a string literal so the notebook runs without credentials;
# remove the surrounding triple quotes to execute it.
'''
results = util.batch_query_plm(
    sequences=sequences,
    model="esm2",  # or another available model
    batch_size=10
)

print(f"Received {len(results)} results")
for result in results:
    print(f"  {result['id']}: embeddings shape {result['embeddings'].shape}")
'''

print("Example: Batch PLM query")
print("Remove the triple quotes above and supply a valid token to run the query")
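
The `batch_size` argument suggests that sequences are sent to the service in groups rather than all at once. How KBPLMUtils does this internally is not shown in this notebook; the sketch below only illustrates the general idea of splitting a list into fixed-size batches. `chunk` is a hypothetical helper, not a kbutillib function.

```python
# Hypothetical helper (not part of kbutillib): split a list into fixed-size batches.
def chunk(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Seven placeholder IDs split into batches of three: the last batch is shorter.
ids = [f"protein{i}" for i in range(1, 8)]
for batch in chunk(ids, 3):
    print(len(batch), batch)
```

A smaller batch size generally means more requests with less data in each; the right value depends on the service and on sequence length.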

## Summary

KBPLMUtils enables:
- **Batch processing** - Efficient handling of multiple proteins
- **PLM integration** - Access to protein language models
- **KBase integration** - Works with KBase protein data

### Next Steps
- Query your protein sequences
- Use embeddings for downstream analysis
- Integrate with protein function prediction
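
As one illustration of downstream analysis, embeddings can be compared with cosine similarity. The sketch below uses only the standard library and small made-up vectors; they are placeholders, not model output, and the assumption that each protein is represented by a single fixed-length vector may not match what the service returns.

```python
import math


# Hypothetical helper: cosine similarity between two equal-length vectors.
def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors `a` and `b`."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors for illustration only (not real embeddings)
toy_a = [1.0, 0.0, 0.0]
toy_b = [0.0, 1.0, 0.0]
print(cosine_similarity(toy_a, toy_a))  # identical direction: 1.0
print(cosine_similarity(toy_a, toy_b))  # orthogonal vectors: 0.0
```

Values near 1 indicate vectors pointing in similar directions; how well that tracks functional similarity depends on the model and the task.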