### Tabular binary classification task with FLaVor inference service

* This guide walks you through tailoring the FLaVor inference service to a tabular binary classification task, using the Titanic dataset bundled with `seaborn` and a model trained with `scikit-learn`.

### Prerequisites

Please ensure the following dependencies are installed in your working environment:

```
python >= 3.9
scikit-learn >= 1.5.1
seaborn >= 0.13.2
```

or simply run:

In [None]:
!poetry install --with tabular_example
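
To confirm that the environment meets the minimum versions listed above, a quick check along these lines may help. This is a minimal sketch: `parse_version` and `check_environment` are hypothetical helper names, and the version comparison is deliberately simple (numeric parts only), so pre-release suffixes are ignored.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions copied from the prerequisite list above.
REQUIRED = {"scikit-learn": "1.5.1", "seaborn": "0.13.2"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.5.1' into (1, 5, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_environment(required: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a {name: status} report rather than raising on a mismatch."""
    required = REQUIRED if required is None else required
    report = {"python": "ok" if sys.version_info >= (3, 9) else "too old"}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


print(check_environment())
```

Any entry that is not `ok` points at a package to install or upgrade before continuing.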

#### Step 1: Train the model

In [1]:
# Import necessary libraries
import pandas as pd
import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier

# Load the Titanic dataset
titanic = sns.load_dataset("titanic")

## Data Preprocessing

# Drop columns that won't be used for prediction
titanic = titanic.drop(["deck", "embark_town", "alive"], axis=1)

# Fill missing values
titanic["age"] = titanic["age"].fillna(titanic["age"].median())
titanic["embarked"] = titanic["embarked"].fillna(titanic["embarked"].mode()[0])

# Encode categorical variables
titanic = pd.get_dummies(titanic, columns=["sex", "embarked", "class", "who", "adult_male", "alone"], drop_first=True)

# Split data into features and target variable
X = titanic.drop("survived", axis=1)
y = titanic["survived"]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the preprocessing steps for numerical and categorical features
numerical_features = ["age", "fare"]
numerical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])

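# Note: with pandas >= 2.0, get_dummies returns bool columns by default, so this
# uint8 selection may be empty; passing dtype="uint8" to get_dummies keeps them selectable.
categorical_features = X.select_dtypes(include=["uint8"]).columns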
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])

# Combine preprocessing steps into a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numerical_transformer, numerical_features),
        ("cat", categorical_transformer, categorical_features)
    ])

# Define the model
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create a pipeline that combines preprocessing and model
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", model)
])

# Train the model
pipeline.fit(X_train, y_train)
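
The split above produces `X_test` and `y_test`, but the cell never scores the model on them. A hold-out check could look like the sketch below. `report_holdout_metrics` is a hypothetical helper name, and the demo data is synthetic (not the Titanic dataset) so that the block runs offline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def report_holdout_metrics(fitted_model, X_test, y_test):
    """Return accuracy and the 2x2 confusion matrix for a fitted binary classifier."""
    y_pred = fitted_model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
    }


# Synthetic stand-in data, used only to exercise the helper.
rng = np.random.default_rng(42)
demo = pd.DataFrame({"age": rng.uniform(1, 80, 200), "fare": rng.uniform(5, 500, 200)})
demo_target = (demo["fare"] > 250).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(demo, demo_target, test_size=0.2, random_state=42)
demo_model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_tr, y_tr)

metrics = report_holdout_metrics(demo_model, X_te, y_te)
print(f"accuracy: {metrics['accuracy']:.3f}")
print(metrics["confusion_matrix"])
```

In the notebook itself, the equivalent call would be `report_holdout_metrics(pipeline, X_test, y_test)`.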

#### Step 2: Save the model

In [2]:
import joblib

# Save the pipeline (model and preprocessor)
joblib.dump(pipeline, "titanic_model.pkl")

['titanic_model.pkl']
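
Before wiring the saved file into the service, it can be worth confirming that the artifact reloads and predicts identically. The sketch below illustrates the round trip with a small stand-in model and a temporary directory; the model, data, and file name are assumptions made for the demo, not part of the tutorial's pipeline. Note that joblib files are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small stand-in model trained on synthetic data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)
original = LogisticRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    artifact = str(Path(tmp_dir) / "demo_model.pkl")  # hypothetical file name
    joblib.dump(original, artifact)
    restored = joblib.load(artifact)

# The restored estimator should reproduce the original predictions exactly.
same_predictions = np.array_equal(original.predict(X_demo), restored.predict(X_demo))
print("round trip ok:", same_predictions)
```

For the tutorial's artifact, the same comparison would use `joblib.load("titanic_model.pkl")` and a few rows of `X_test`.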

#### Step 3: Implement the inference model

In [3]:
from typing import Any, Callable, Dict, List, Optional, Sequence

from flavor.serve.apps import InferAPP
from flavor.serve.inference.data_models.api import (
    BaseAiCOCOTabularInputDataModel,
    BaseAiCOCOTabularOutputDataModel,
)
from flavor.serve.inference.data_models.functional import AiTable
from flavor.serve.inference.inference_models import BaseAiCOCOTabularInferenceModel
from flavor.serve.inference.strategies import AiCOCOTabularClassificationOutputStrategy

class ClassificationInferenceModel(BaseAiCOCOTabularInferenceModel):
    def __init__(self, model_path):
        self.model_path = model_path
        self.formatter = AiCOCOTabularClassificationOutputStrategy()
        super().__init__()

    def define_inference_network(self) -> Callable:
        pipeline = joblib.load(self.model_path)
        return pipeline

    def set_categories(self) -> List[Dict[str, Any]]:
        categories = [{"name": "survived"}] # binary classification
        return categories

    def set_regressions(self) -> None:
        return None

    def data_reader(self, tables: Sequence[Dict[str, Any]], files: Sequence[str], **kwargs) -> List[pd.DataFrame]:
        table_names = [table["file_name"].replace("/", "_") for table in tables]

        file_names = sorted(files, key=lambda s: s[::-1])
        table_names = sorted(table_names, key=lambda s: s[::-1])
        
        dataframes = []
        for file, table in zip(file_names, table_names):
            if not file.endswith(table):
                raise ValueError(f"File names do not match table names: {file} vs {table}")
            
            df = pd.read_csv(file)
            dataframes.append(df)
        
        return dataframes

    def preprocess(self, data: List[pd.DataFrame]) -> pd.DataFrame:
        return pd.concat(data)

    def inference(self, x: pd.DataFrame):
        out = self.network.predict(x).reshape(-1, 1)
        return out

    def postprocess(self, model_out: np.ndarray, **kwargs) -> np.ndarray:
        return model_out

    def output_formatter(
        self,
        model_out: Any,
        tables: Sequence[AiTable],
        dataframes: Sequence[pd.DataFrame],
        meta: Dict[str, Any],
        categories: Optional[Sequence[Dict[str, Any]]] = None,
        **kwargs
    ) -> BaseAiCOCOTabularOutputDataModel:

        output = self.formatter(
                    model_out=model_out,
                    tables=tables,
                    dataframes=dataframes,
                    categories=categories,
                    meta=meta,
                )
        return output
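
The matching logic inside `data_reader` is easier to follow in isolation. The sketch below reproduces it without pandas or FLaVor: both lists are sorted on their reversed strings so that names with the same ending line up, and each pair is then verified with `endswith`. The function name `pair_files_with_tables` and the sample paths are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def pair_files_with_tables(
    files: Sequence[str], tables: Sequence[Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Pair each uploaded file path with its table entry by filename suffix."""
    # As in data_reader, slashes in the declared name are flattened to underscores.
    table_names = [t["file_name"].replace("/", "_") for t in tables]

    # Sorting on the reversed string orders entries by their endings, so a stored
    # path such as "/tmp/x2_test_cls.csv" lines up with "test_cls.csv".
    sorted_files = sorted(files, key=lambda s: s[::-1])
    sorted_tables = sorted(table_names, key=lambda s: s[::-1])

    pairs = []
    for file, table in zip(sorted_files, sorted_tables):
        if not file.endswith(table):
            raise ValueError(f"File names do not match table names: {file} vs {table}")
        pairs.append((file, table))
    return pairs


# Made-up upload paths; the prefixes only imitate what a server might prepend.
files = ["/tmp/x1_train.csv", "/tmp/x2_test_cls.csv"]
tables = [{"file_name": "test_cls.csv"}, {"file_name": "train.csv"}]
pairs = pair_files_with_tables(files, tables)
print(pairs)
```

Two consequences follow from this design: the dataframes come back in suffix-sorted order rather than in the order of `tables`, and because `zip` stops at the shorter list, a length mismatch between `files` and `tables` is not reported by this check.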


#### Step 4: Initiate the service

In [None]:
# This block is only needed in a Jupyter notebook. You don't need it in a standalone script.
import nest_asyncio
nest_asyncio.apply()

In [None]:
app = InferAPP(
    infer_function=ClassificationInferenceModel("titanic_model.pkl"),
    input_data_model=BaseAiCOCOTabularInputDataModel,
    output_data_model=BaseAiCOCOTabularOutputDataModel,
)

In [None]:
import os
app.run(port=int(os.getenv("PORT", 9111)))

### Send request
We can send a request to the running server with `send_request.py`, which opens the input files together with the corresponding JSON file and sends them as form data. The response is expected to be in AiCOCO tabular format.

```bash
# pwd: examples/inference
python send_request.py -f test_data/tabular/cls/test_cls.csv -d test_data/tabular/input.json
```
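
For readers who want to see roughly what such a client involves, here is a hedged sketch of building and posting the form data. It is not the contents of `send_request.py`. The repeated `files` field name, the one-form-field-per-JSON-key encoding, and the `/invocations` endpoint path in the usage comment are all assumptions; check them against the actual script and service before relying on them.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

FilePart = Tuple[str, Tuple[str, bytes, str]]


def build_form_payload(
    csv_paths: Sequence[str], json_path: str
) -> Tuple[List[FilePart], Dict[str, str]]:
    """Build the multipart pieces: CSV uploads plus JSON-encoded form fields."""
    # Assumption: uploads travel under a repeated "files" field.
    files = [
        ("files", (Path(p).name, Path(p).read_bytes(), "text/csv")) for p in csv_paths
    ]
    # Assumption: the JSON file holds an object, and each top-level key becomes
    # one form field whose value is the JSON-encoded sub-document.
    document = json.loads(Path(json_path).read_text())
    data = {key: json.dumps(value) for key, value in document.items()}
    return files, data


def post_inference(url: str, files: List[FilePart], data: Dict[str, str]) -> dict:
    """POST the payload and return the decoded JSON response body."""
    import requests  # third-party; imported here so the builder works without it

    response = requests.post(url, files=files, data=data, timeout=60)
    response.raise_for_status()
    return response.json()


# Usage against a locally running service (endpoint path is an assumption):
# files, data = build_form_payload(
#     ["test_data/tabular/cls/test_cls.csv"], "test_data/tabular/input.json"
# )
# result = post_inference("http://localhost:9111/invocations", files, data)
```

Reading each file into bytes up front avoids leaving file handles open if the request fails.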

### Set up the Dockerfile
To interact with other services, we have to wrap the inference model in a Docker container.
Here's an example Dockerfile. Please list your Python dependencies in `requirements.txt` first.

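```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Copy the dependency list into the image before installing from it
COPY requirements.txt /app/
RUN pip install -r requirements.txt

RUN pip install https://github.com/ailabstw/FLaVor/archive/refs/heads/release/stable.zip

# Also copy any model artifact your script loads (e.g. titanic_model.pkl)
COPY your_script.py /app/

CMD ["python", "your_script.py"]
```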