# BentoML
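- Released in 2019; developed by BentoML.ai (Atalaya Inc.)
- Purpose
  - Efficiently build, manage pipelines for, and serve a wide range of ML model architectures
- Features
  - Broad ML framework support, simple pipeline definition, Transformer-friendly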


### vs TorchServe
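- Supports many frameworks (PyTorch, TensorFlow, ...)
- Container deployment is straightforward
- Basic optimizations are supported (TorchServe offers more advanced optimizations)
- Gentler learning curve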

### Initial

!pip install datasets fastapi==0.103.0 kaleido bentoml==1.1.11

In [None]:
import bentoml
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
import seaborn as sns

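# AdamW comes from torch.optim; the transformers AdamW is deprecated and missing from newer releases.
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import BatchEncoding, BertTokenizer, BertForSequenceClassification, pipeline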
from sklearn.metrics import confusion_matrix
from datasets import load_dataset
from tqdm import tqdm
from typing import TypedDict

### 06 Model training and evaluation

In [None]:
dataset = load_dataset("ag_news")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)
optimizer = AdamW(model.parameters(), lr=5e-5)
criterion = torch.nn.CrossEntropyLoss()

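class DatasetBatch(TypedDict):
    # With map(..., batched=True) each field holds a list of values.
    text: list[str]
    label: list[int]  # ag_news labels are integers 0-3


def preprocess_data(batch: DatasetBatch) -> BatchEncoding:
    # Pad and truncate every text to the model's maximum length (512 tokens for bert-base-uncased).
    return tokenizer(batch["text"], truncation=True, padding="max_length", return_tensors="pt")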


train_dataset = dataset["train"].select(range(1200)).map(preprocess_data, batched=True)
test_dataset = dataset["test"].select(range(800)).map(preprocess_data, batched=True)

train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=False)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

num_epochs = 3
losses: list[float] = []

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for batch in tqdm(train_loader, desc=f"Epoch {epoch + 1}"):
        inputs = {key: batch[key].to(device) for key in batch}
        labels = inputs.pop("label")
        outputs = model(**inputs, labels=labels)
        loss = outputs.loss
        total_loss += loss.item()
        losses.append(loss.item())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    average_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch + 1}, Average Loss: {average_loss}")

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(losses, color="#fc1c49", linewidth=2)
plt.xlabel("Step")
plt.ylabel("Loss")
plt.title("Training Loss per Step Across Epochs")
plt.show()

model.eval()
correct = 0
total = 0

with torch.no_grad():
    for batch in tqdm(test_loader, desc="Evaluating"):
        inputs = {key: batch[key].to(device) for key in batch}
        labels = inputs.pop("label")
        outputs = model(**inputs, labels=labels)
        logits = outputs.logits
        predicted_labels = torch.argmax(logits, dim=1)
        correct += (predicted_labels == labels).sum().item()
        total += labels.size(0)

accuracy = correct / total

print("")
print(f"Test Accuracy: {accuracy * 100:.2f}%")

In [None]:
all_predictions: list[int] = []
all_labels: list[int] = []

with torch.no_grad():
    for batch in tqdm(test_loader, desc="Evaluating"):
        inputs = {key: batch[key].to(device) for key in batch}
        labels = inputs.pop("label")
        outputs = model(**inputs)
        logits = outputs.logits
        predicted_labels = torch.argmax(logits, dim=1)

        # tolist() yields plain Python ints, matching the list[int] annotations.
        all_predictions.extend(predicted_labels.cpu().tolist())
        all_labels.extend(labels.cpu().tolist())

In [None]:
conf_matrix = confusion_matrix(all_labels, all_predictions)
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt="g", cmap=sns.light_palette("#fc1c49", as_cmap=True))
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
plt.title("Confusion Matrix Heatmap")
plt.show()
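The heatmap can also be summarized numerically. The following is a minimal sketch, assuming the matrix has true labels on the rows and predicted labels on the columns (the layout `confusion_matrix(all_labels, all_predictions)` returns). The helper name `per_class_metrics` and the example counts are hypothetical and are not results from this notebook.

```python
def per_class_metrics(matrix: list[list[int]]) -> list[dict[str, float]]:
    """Compute precision and recall per class from a square confusion matrix.

    Rows are true labels, columns are predicted labels.
    """
    n = len(matrix)
    results = []
    for k in range(n):
        true_positive = matrix[k][k]
        actual = sum(matrix[k])                           # row sum: samples whose true label is k
        predicted = sum(matrix[r][k] for r in range(n))   # column sum: samples predicted as k
        recall = true_positive / actual if actual else 0.0
        precision = true_positive / predicted if predicted else 0.0
        results.append({"precision": precision, "recall": recall})
    return results


# Hypothetical counts for illustration only.
example = [
    [8, 1, 1, 0],
    [0, 10, 0, 0],
    [2, 0, 6, 2],
    [0, 0, 1, 9],
]

for class_id, metrics in enumerate(per_class_metrics(example)):
    print(class_id, metrics)
```

In the notebook, the same helper could be called as `per_class_metrics(conf_matrix.tolist())`.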

### Model serving

Transformer Artifacts -(artifact packaging)-> BentoML Serving

#### Saving and packaging the model

In [None]:
# Save the model.
name = "bert_news_classification"
bentoml.transformers.save_model(
    name,
    pipeline("text-classification", model=model, tokenizer=tokenizer)
)

#### Writing the service file

In [None]:
%%shell
# service.py

cat > service.py <<EOF
import bentoml

runner = bentoml.models.get("bert_news_classification:latest").to_runner()
svc = bentoml.Service(
    name="bert_news_classification", runners=[runner]
)

@svc.api(input=bentoml.io.Text(), output=bentoml.io.JSON())
async def classify(text: str) -> dict[str, str | float]:
    output = await runner.async_run(text, max_length=512)
    return output[0]
EOF
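Because the model was created with `num_labels=4` and no explicit label names, the pipeline typically reports generic labels such as `LABEL_1`. The sketch below maps such a response to a readable class name. It assumes the default `LABEL_<index>` naming and the AG News class order (World, Sports, Business, Sci/Tech); `to_readable` is a hypothetical helper, not part of BentoML or transformers.

```python
# AG News class names indexed by label id (assumed order: 0-3).
AG_NEWS_CLASSES = ["World", "Sports", "Business", "Sci/Tech"]


def to_readable(prediction: dict) -> dict:
    """Convert {"label": "LABEL_1", "score": 0.97} into a named class."""
    raw_label = str(prediction["label"])
    index = int(raw_label.rsplit("_", 1)[-1])  # "LABEL_1" -> 1
    return {"label": AG_NEWS_CLASSES[index], "score": float(prediction["score"])}


print(to_readable({"label": "LABEL_1", "score": 0.97}))
```

The mapping could be applied inside `classify` before returning, or on the client after the response arrives.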

#### Serving

In [None]:
%%script bash --bg
bentoml serve service:svc

In [None]:
# Output like the following means the server is running normally. (Health check)
# HTTP/1.1 200 OK
# date: Sat, DD MM YYYY HH:mm:ss GMT
# server: uvicorn
# content-length: 1
# content-type: text/plain; charset=utf-8

!curl -I -X GET localhost:3000/healthz
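Since the server is started in the background, the health check can fail if it runs before the model has finished loading. The following is a rough sketch of a polling loop that waits for `/healthz` to answer with status 200. The function name `wait_until_healthy` and the timeout values are assumptions made for illustration.

```python
import http.client
import time
import urllib.request


def wait_until_healthy(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or HTTP error: server not ready yet.
            pass
        time.sleep(interval)
    return False
```

For example, `wait_until_healthy("http://localhost:3000/healthz")` could be called in a cell right after `bentoml serve` is launched.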

### Monitoring

In [None]:
!curl http://localhost:3000/metrics
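The `/metrics` endpoint returns plain text in the Prometheus exposition format. As a rough sketch, the response can be turned into a dictionary for quick inspection in the notebook. The parser assumes sample lines carry no trailing timestamp, and the metric names in the sample are hypothetical placeholders, not actual BentoML metric names.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {"name{labels}": value}."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        try:
            samples[name_and_labels] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


# Hypothetical sample output for illustration only.
sample = """\
# HELP example_request_total Hypothetical request counter
# TYPE example_request_total counter
example_request_total{endpoint="/classify",http_response_code="200"} 3.0
example_request_in_progress{endpoint="/classify"} 0.0
"""

print(parse_metrics(sample))
```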

### Testing the service

In [None]:
# Classify a sports article (label: 1)
!curl -X "POST" \
   "http://0.0.0.0:3000/classify" \
   -H "accept: application/json" \
   -H "Content-Type: text/plain" \
   -d "Bleary-eyed from 16 hours on a Greyhound bus, he strolled into the stadium running on fumes. He’d barely slept in two days. The ride he was supposed to hitch from Charlotte to Indianapolis canceled at the last minute, and for a few nervy hours, Antonio Barnes started to have his doubts. The trip he’d waited 40 years for looked like it wasn’t going to happen.ADVERTISEMENTBut as he moved through the concourse at Lucas Oil Stadium an hour before the Colts faced the Raiders, it started to sink in. His pace quickened. His eyes widened. His voice picked up.“I got chills right now,” he said. “Chills.”Barnes, 57, is a lifer, a Colts fan since the Baltimore days. He wore No. 25 on his pee wee football team because that’s the number Nesby Glasgow wore on Sundays. He was a talent in his own right, too: one of his old coaches nicknamed him “Bird” because of his speed with the ball.Back then, he’d catch the city bus to Memorial Stadium, buy a bleacher ticket for $5 and watch Glasgow and Bert Jones, Curtis Dickey and Glenn Doughty. When he didn’t have any money, he’d find a hole in the fence and sneak in. After the game was over, he’d weasel his way onto the field and try to meet the players. “They were tall as trees,” he remembers.He remembers the last game he went to: Sept. 25, 1983, an overtime win over the Bears. Six months later the Colts would ditch Baltimore in the middle of the night, a sucker-punch some in the city never got over. But Barnes couldn’t quit them. When his entire family became Ravens fans, he refused. “The Colts are all I know,” he says.For years, when he couldn’t watch the games, he’d try the radio. And when that didn’t work, he’d follow the scroll at the bottom of a screen.“There were so many nights I’d just sit there in my cell, picturing what it’d be like to go to another game,” he says. 
“But you’re left with that thought that keeps running through your mind: I’m never getting out.”It’s hard to dream when you’re serving a life sentence for conspiracy to commit murder.It started with a handoff, a low-level dealer named Mickey Poole telling him to tuck a Ziploc full of heroin into his pocket and hide behind the Murphy towers. This was how young drug runners were groomed in Baltimore in the late 1970s. This was Barnes’ way in.ADVERTISEMENTHe was 12.Back then he idolized the Mickey Pooles of the world, the older kids who drove the shiny cars, wore the flashy jewelry, had the girls on their arms and made any working stiff punching a clock from 9 to 5 look like a fool. They owned the streets. Barnes wanted to own them, too.“In our world,” says his nephew Demon Brown, “the only successful people we saw were selling drugs and carrying guns.”So whenever Mickey would signal for a vial or two, Barnes would hurry over from his hiding spot with that Ziploc bag, out of breath because he’d been running so hard."

In [None]:
# Classify a business article (label: 2)
!curl -X "POST" \
   "http://0.0.0.0:3000/classify" \
   -H "accept: application/json" \
   -H "Content-Type: text/plain" \
   -d "DETROIT – America maintained its love affair with pickup trucks in 2023 — but a top-selling vehicle from Toyota Motor nearly ruined their tailgate party.Sales of the Toyota RAV4 compact crossover came within 10,000 units of Stellantis’ Ram pickup truck last year, a near-No. 3 ranking that would have marked the first time since 2014 that a non-pickup claimed one of the top three U.S. sales podium positions.The RAV4 has rapidly closed the gap: In 2020, the vehicle undersold the Ram truck by more than 133,000 units. Last year, it lagged by just 9,983. Stellantis sold 444,926 Ram pickups last year, a 5% decline from 2022.“Trucks are always at the top because they’re bought by not only individuals, but also fleet buyers and we saw heavy fleet buying last year,” said Michelle Krebs, an executive analyst at Cox Automotive. “The RAV4 shows that people want affordable, smaller SUVs, and the fact that there’s also a hybrid version of that makes it popular with people.”"

In [None]:
# Classify a tech article (label: 3)
!curl -X "POST" \
   "http://0.0.0.0:3000/classify" \
   -H "accept: application/json" \
   -H "Content-Type: text/plain" \
   -d "OpenVoice comprises two AI models working together for text-to-speech conversion and voice tone cloning.The first model handles language style, accents, emotion, and other speech patterns. It was trained on 30,000 audio samples with varying emotions from English, Chinese, and Japanese speakers. The second “tone converter” model learned from over 300,000 samples encompassing 20,000 voices.By combining the universal speech model with a user-provided voice sample, OpenVoice can clone voices with very little data. This helps it generate cloned speech significantly faster than alternatives like Meta’s Voicebox.Californian startup OpenVoice comes from California-based startup MyShell, founded in 2023. With $5.6 million in early funding and over 400,000 users already, MyShell bills itself as a decentralised platform for creating and discovering AI apps.  In addition to pioneering instant voice cloning, MyShell offers original text-based chatbot personalities, meme generators, user-created text RPGs, and more. Some content is locked behind a subscription fee. The company also charges bot creators to promote their bots on its platform.By open-sourcing its voice cloning capabilities through HuggingFace while monetising its broader app ecosystem, MyShell stands to increase users across both while advancing an open model of AI development."
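The same requests can be sent from Python instead of `curl`. One reason to consider this: inside a double-quoted shell string, sequences such as `$5` are expanded by the shell, so the article text the server receives can differ slightly from what was pasted. The sketch below uses only the standard library; `build_classify_request` and `classify` are hypothetical helper names, and the base URL assumes the server from the cells above.

```python
import json
import urllib.request


def build_classify_request(
    text: str, base_url: str = "http://localhost:3000"
) -> urllib.request.Request:
    """Build a POST request for the /classify endpoint with a plain-text body."""
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=text.encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "text/plain"},
        method="POST",
    )


def classify(text: str, base_url: str = "http://localhost:3000") -> dict:
    """Send the text to the running service and decode the JSON response."""
    with urllib.request.urlopen(build_classify_request(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify("Stellantis sold 444,926 Ram pickups last year.")` should return a dictionary with `label` and `score` keys.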

### External access through ngrok

In [None]:
!pip install pyngrok

In [None]:
from pyngrok import ngrok

ngrok.set_auth_token("<YOUR_NGROK_AUTHTOKEN>")  # placeholder: use your own token and keep it out of shared notebooks
inference_tunnel = ngrok.connect("3000")
inference_tunnel
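One way to keep the auth token out of the notebook is to read it from an environment variable. This is a minimal sketch; the variable name `NGROK_AUTHTOKEN` and the helper `read_ngrok_token` are assumptions chosen for illustration.

```python
import os


def read_ngrok_token(env_var: str = "NGROK_AUTHTOKEN") -> str:
    """Return the ngrok token from the environment, failing loudly if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before opening a tunnel.")
    return token
```

The result could then be passed on as `ngrok.set_auth_token(read_ngrok_token())`.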

In [None]:
%%script bash --bg
bentoml serve service:svc

In [None]:
!curl -X "POST" \
   "https://375f-34-83-7-74.ngrok-free.app/classify" \
   -H "accept: application/json" \
   -H "Content-Type: text/plain" \
   -d "OpenVoice comprises two AI models working together for text-to-speech conversion and voice tone cloning.The first model handles language style, accents, emotion, and other speech patterns. It was trained on 30,000 audio samples with varying emotions from English, Chinese, and Japanese speakers. The second “tone converter” model learned from over 300,000 samples encompassing 20,000 voices.By combining the universal speech model with a user-provided voice sample, OpenVoice can clone voices with very little data. This helps it generate cloned speech significantly faster than alternatives like Meta’s Voicebox.Californian startup OpenVoice comes from California-based startup MyShell, founded in 2023. With $5.6 million in early funding and over 400,000 users already, MyShell bills itself as a decentralised platform for creating and discovering AI apps.  In addition to pioneering instant voice cloning, MyShell offers original text-based chatbot personalities, meme generators, user-created text RPGs, and more. Some content is locked behind a subscription fee. The company also charges bot creators to promote their bots on its platform.By open-sourcing its voice cloning capabilities through HuggingFace while monetising its broader app ecosystem, MyShell stands to increase users across both while advancing an open model of AI development."

### Stopping the server

In [None]:
!lsof -i tcp:3000

In [None]:
!kill -9 ~~~~ # Replace ~~~~ with the process ID reported by lsof
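The two steps above can also be combined in Python. This is a rough sketch, assuming `lsof` is available on the machine; `-t` prints only process IDs and `-sTCP:LISTEN` restricts the match to listening sockets, so clients connected to the port are left alone. It sends `SIGTERM` rather than `kill -9` so the server gets a chance to shut down cleanly. The helper names are hypothetical.

```python
import os
import signal
import subprocess


def parse_pids(lsof_output: str) -> list[int]:
    """Extract process IDs from the output of `lsof -t`."""
    return [int(token) for token in lsof_output.split() if token.isdigit()]


def stop_listeners(port: int = 3000) -> list[int]:
    """Send SIGTERM to every process listening on the given TCP port."""
    result = subprocess.run(
        ["lsof", "-t", "-i", f"tcp:{port}", "-sTCP:LISTEN"],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when nothing matches
    )
    pids = parse_pids(result.stdout)
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Calling `stop_listeners(3000)` returns the list of process IDs that were signalled, or an empty list if nothing was listening.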