Docker version of production serving (#551)
* Optimize dockerfile

* add docker-compose file

* change the directory to which feedback.csv is added

* update docker-compose file to run on port 8080

* Updated docs for docker run and docker-compose

* add github actions workflow to build docker image

* implement evaluate function

* Change to POST, add json input

* Add serialization for Dataset

* make eval() and organize_metrics() staticmethod

* Add option to save train_set together with model

* evaluate API is working now

* Update README.md

* Fix docstring typo

* add security check, enhance error message

* fix typo

* _safe_eval() only allows correct metric names

* Changed reader format to UIR for evaluation

---------

Co-authored-by: tqtg <tuantq.vnu@gmail.com>
Co-authored-by: Quoc-Tuan Truong <tqtg@users.noreply.github.com>
3 people committed Dec 6, 2023
1 parent fb83104 commit 190c5a1
Showing 10 changed files with 449 additions and 63 deletions.
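The commit message notes that `_safe_eval()` only allows correct metric names. Cornac's actual implementation is not shown in this diff; the following is a minimal illustrative sketch of such a whitelist check (the function name, allowed set, and return value here are assumptions, not Cornac's API):

```python
import re

# Hypothetical sketch of a whitelist-based metric parser, in the spirit of
# the _safe_eval() mentioned above (not Cornac's actual implementation).
ALLOWED_METRICS = {"RMSE", "MAE", "Recall", "Precision", "NDCG", "AUC"}

def safe_eval_metric(expr):
    """Parse a metric constructor string only if its name is whitelisted."""
    match = re.fullmatch(r"(\w+)\((.*)\)", expr.strip())
    if match is None or match.group(1) not in ALLOWED_METRICS:
        raise ValueError(f"Invalid metric expression: {expr!r}")
    # A real implementation would instantiate the metric here;
    # this sketch just returns the parsed (name, args) pair.
    return match.group(1), match.group(2)
```

Restricting evaluation to a fixed name set avoids handing arbitrary strings to `eval()`, which is the security concern the commit bullets allude to.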
51 changes: 51 additions & 0 deletions .github/workflows/image-publish.yml
@@ -0,0 +1,51 @@
name: Publish Container Image

on:
push:
tags:
- '*'
workflow_dispatch:

jobs:
docker:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3

- name: Prepare
id: prep
run: |
TIME=$(date +%s)
VERSION=dev-$TIME
if [[ $GITHUB_REF == refs/tags/* ]]; then
VERSION=${GITHUB_REF#refs/tags/}
VERSION="${VERSION:1}"
fi
IMAGE="registry.preferred.ai/cornac/cornac-server"
echo ::set-output name=tagged_image::${IMAGE}:${VERSION},${IMAGE}:latest
shell: bash

- name: Set up QEMU
uses: docker/setup-qemu-action@v3

- name: Set up Buildx
id: buildx
uses: docker/setup-buildx-action@v3

- name: Login to registry
uses: docker/login-action@v3
with:
registry: registry.preferred.ai
username: ${{ secrets.PREFERRED_REGISTRY_USERNAME }}
password: ${{ secrets.PREFERRED_REGISTRY_PASSWORD }}

- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: true
platforms: linux/amd64,linux/arm64
tags: ${{ steps.prep.outputs.tagged_image }}
cache-from: type=gha
cache-to: type=gha,mode=max
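Note that the `Prepare` step above emits its output via `::set-output`, a workflow command GitHub has since deprecated in favor of appending to the `$GITHUB_OUTPUT` file. A sketch of the equivalent step body under bash (the two runner-provided variables are stubbed here so the snippet is self-contained):

```shell
# Equivalent "Prepare" step body using the $GITHUB_OUTPUT file instead of the
# deprecated ::set-output workflow command (same tag-stripping logic as above).
GITHUB_REF="refs/tags/v1.2.3"          # provided by the runner in a real workflow
GITHUB_OUTPUT="$(mktemp)"              # provided by the runner in a real workflow

TIME=$(date +%s)
VERSION=dev-$TIME
if [[ $GITHUB_REF == refs/tags/* ]]; then
  VERSION=${GITHUB_REF#refs/tags/}
  VERSION="${VERSION:1}"               # drop the leading "v" from the tag
fi
IMAGE="registry.preferred.ai/cornac/cornac-server"
echo "tagged_image=${IMAGE}:${VERSION},${IMAGE}:latest" >> "$GITHUB_OUTPUT"
```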
22 changes: 10 additions & 12 deletions Dockerfile
@@ -2,7 +2,7 @@
# BUILDER #
###########

FROM python:3.11.6-slim-bullseye AS builder
FROM python:3.11-slim AS builder

# Set working directory
WORKDIR /app
@@ -12,19 +12,18 @@ COPY ./setup.py setup.py
COPY ./cornac cornac
COPY ./README.md README.md

RUN pip install --upgrade pip
RUN pip install Cython numpy scipy

RUN apt-get update && \
apt-get -y --no-install-recommends install gcc g++
apt-get -y --no-install-recommends install gcc g++ && \
pip install --no-cache-dir Cython numpy scipy && \
pip install --no-cache-dir .

RUN pip install --no-cache-dir . # install cornac
# RUN pip install --no-cache-dir cornac # install cornac

##########
# RUNNER #
##########

FROM python:3.11.6-slim-bullseye AS runner
FROM python:3.11-slim AS runner

WORKDIR /app

@@ -33,14 +32,13 @@ ENV MODEL_PATH=""
ENV MODEL_CLASS=""
ENV PORT=5000

COPY --from=builder /app/cornac cornac
COPY --from=builder /app/cornac/serving cornac/serving
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages

RUN apt-get update && \
apt-get -y --no-install-recommends install gcc g++ && \
rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir Flask gunicorn
apt-get -y --no-install-recommends install libgomp1 && \
rm -rf /var/lib/apt/lists/* && \
pip install --no-cache-dir Flask gunicorn

WORKDIR /app/cornac/serving

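The runner stage above is configured through the `MODEL_PATH`, `MODEL_CLASS`, and `PORT` environment variables, and the commit also adds a docker-compose file serving on port 8080. That file is not shown in this excerpt; the following is an illustrative sketch of what such a compose file could look like (service name, image tag, model paths, and volume mapping are assumptions, not the file from this commit):

```yaml
# Illustrative docker-compose sketch, not the exact file added by this commit.
services:
  cornac-server:
    image: registry.preferred.ai/cornac/cornac-server:latest
    ports:
      - "8080:5000"          # host 8080 -> container PORT (default 5000)
    environment:
      MODEL_PATH: /app/save_dir/BPR      # hypothetical path to the saved model
      MODEL_CLASS: cornac.models.BPR     # hypothetical model class
    volumes:
      - ./save_dir:/app/save_dir
```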
4 changes: 2 additions & 2 deletions README.md
@@ -109,7 +109,7 @@ $ pip3 install Flask
```
Suppose we want to serve the trained BPR model from the previous example; we first need to save it:
```python
bpr.save("save_dir")
bpr.save("save_dir", save_trainset=True)
```
After that, the model can be deployed easily by running Cornac serving app as follows:
```bash
@@ -126,7 +126,7 @@ $ curl -X GET "http://localhost:8080/recommend?uid=63&k=5&remove_seen=false"

# Response: {"recommendations": ["50", "181", "100", "258", "286"], "query": {"uid": "63", "k": 5, "remove_seen": false}}
```
If we want to remove seen items during training, we need to provide `TRAIN_SET` when starting the serving app. We can also leverage [WSGI](https://flask.palletsprojects.com/en/3.0.x/deploying/) server for model deployment in production. Please refer to [this](https://cornac.readthedocs.io/en/latest/user/iamadeveloper.html#running-an-api-service) guide for more details.
If we want to remove items seen during training, we need to provide `TRAIN_SET`, saved together with the model earlier, when starting the serving app. We can also leverage a [WSGI](https://flask.palletsprojects.com/en/3.0.x/deploying/) server for model deployment in production. Please refer to [this](https://cornac.readthedocs.io/en/latest/user/iamadeveloper.html#running-an-api-service) guide for more details.

## Efficient retrieval with ANN search

55 changes: 54 additions & 1 deletion cornac/data/dataset.py
@@ -13,6 +13,9 @@
# limitations under the License.
# ============================================================================

import os
import copy
import pickle
import warnings
from collections import Counter, OrderedDict, defaultdict

@@ -97,7 +100,6 @@ def __init__(

self.__user_ids = None
self.__item_ids = None

self.__user_data = None
self.__item_data = None
self.__chrono_user_data = None
@@ -106,6 +108,18 @@
self.__csc_matrix = None
self.__dok_matrix = None

self.ignored_attrs = [
"__user_ids",
"__item_ids",
"__user_data",
"__item_data",
"__chrono_user_data",
"__chrono_item_data",
"__csr_matrix",
"__csc_matrix",
"__dok_matrix",
]

@property
def user_ids(self):
"""Return the list of raw user ids"""
@@ -563,6 +577,45 @@ def add_modalities(self, **kwargs):
self.sentiment = kwargs.get("sentiment", None)
self.review_text = kwargs.get("review_text", None)

def __deepcopy__(self, memo):
cls = self.__class__
result = cls.__new__(cls)
for k, v in self.__dict__.items():
if k in self.ignored_attrs:
continue
setattr(result, k, copy.deepcopy(v))
return result

def save(self, fpath):
"""Save a dataset to the filesystem.
Parameters
----------
fpath: str, required
Path to a file for the dataset to be stored.
"""
os.makedirs(os.path.dirname(fpath), exist_ok=True)
dataset = copy.deepcopy(self)
pickle.dump(dataset, open(fpath, "wb"), protocol=pickle.HIGHEST_PROTOCOL)

@staticmethod
def load(fpath):
"""Load a dataset from the filesystem.
Parameters
----------
fpath: str, required
Path to a file where the dataset is stored.
Returns
-------
self : object
"""
dataset = pickle.load(open(fpath, "rb"))
dataset.load_from = fpath # for further loading
return dataset


class BasketDataset(Dataset):
"""Training set contains history baskets
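The new `Dataset.save()`/`Dataset.load()` pair above is a thin wrapper around `pickle`. A self-contained sketch of the same round-trip pattern, using a stand-in class rather than importing Cornac (the class and attribute names are illustrative):

```python
import os
import pickle

class TinyDataset:
    """Stand-in for cornac.data.Dataset, illustrating the save/load pattern."""

    def __init__(self, uir_tuples):
        self.uir_tuples = uir_tuples

    def save(self, fpath):
        # `or "."` guards bare filenames, a small addition to the code above.
        os.makedirs(os.path.dirname(fpath) or ".", exist_ok=True)
        with open(fpath, "wb") as f:
            pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def load(fpath):
        with open(fpath, "rb") as f:
            dataset = pickle.load(f)
        dataset.load_from = fpath  # remember origin, as Dataset.load() does
        return dataset

# Round trip: save, then restore from disk.
ds = TinyDataset([("u1", "i1", 4.0), ("u2", "i2", 5.0)])
ds.save("tmp_ds/train_set.pkl")
restored = TinyDataset.load("tmp_ds/train_set.pkl")
```

This is what lets the serving app reload `TRAIN_SET` saved alongside the model, as the README change above describes.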
81 changes: 56 additions & 25 deletions cornac/eval_methods/base_method.py
@@ -450,7 +450,8 @@ def _reset(self):
self.rng = get_rng(self.seed)
self.test_set = self.test_set.reset()

def _organize_metrics(self, metrics):
@staticmethod
def organize_metrics(metrics):
"""Organize metrics according to their types (rating or raking)
Parameters
@@ -460,26 +461,27 @@ def _organize_metrics(self, metrics):
"""
if isinstance(metrics, dict):
self.rating_metrics = metrics.get("rating", [])
self.ranking_metrics = metrics.get("ranking", [])
rating_metrics = metrics.get("rating", [])
ranking_metrics = metrics.get("ranking", [])
elif isinstance(metrics, list):
self.rating_metrics = []
self.ranking_metrics = []
rating_metrics = []
ranking_metrics = []
for mt in metrics:
if isinstance(mt, RatingMetric):
self.rating_metrics.append(mt)
rating_metrics.append(mt)
elif isinstance(mt, RankingMetric) and hasattr(mt.k, "__len__"):
self.ranking_metrics.extend(
ranking_metrics.extend(
[mt.__class__(k=_k) for _k in sorted(set(mt.k))]
)
else:
self.ranking_metrics.append(mt)
ranking_metrics.append(mt)
else:
raise ValueError("Type of metrics has to be either dict or list!")

# sort metrics by name
self.rating_metrics = sorted(self.rating_metrics, key=lambda mt: mt.name)
self.ranking_metrics = sorted(self.ranking_metrics, key=lambda mt: mt.name)
rating_metrics = sorted(rating_metrics, key=lambda mt: mt.name)
ranking_metrics = sorted(ranking_metrics, key=lambda mt: mt.name)
return rating_metrics, ranking_metrics
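The refactored `organize_metrics()` above accepts either a dict keyed by metric type or a flat list, and expands a ranking metric with a list-valued `k` into one instance per cutoff. A simplified, self-contained sketch of the same dispatch, with dummy classes standing in for Cornac's `RatingMetric` and `RankingMetric`:

```python
# Dummy metric classes standing in for cornac.metrics.RatingMetric / RankingMetric.
class RatingMetric:
    def __init__(self, name):
        self.name = name

class RankingMetric:
    def __init__(self, name, k):
        self.name, self.k = name, k

def organize_metrics(metrics):
    """Split metrics into (rating, ranking) lists, sorted by name."""
    if isinstance(metrics, dict):
        rating_metrics = metrics.get("rating", [])
        ranking_metrics = metrics.get("ranking", [])
    elif isinstance(metrics, list):
        rating_metrics, ranking_metrics = [], []
        for mt in metrics:
            if isinstance(mt, RatingMetric):
                rating_metrics.append(mt)
            elif hasattr(mt.k, "__len__"):  # e.g. k=[10, 20] expands to two metrics
                ranking_metrics.extend(
                    mt.__class__(mt.name, k=_k) for _k in sorted(set(mt.k))
                )
            else:
                ranking_metrics.append(mt)
    else:
        raise ValueError("Type of metrics has to be either dict or list!")
    key = lambda mt: mt.name
    return sorted(rating_metrics, key=key), sorted(ranking_metrics, key=key)
```

Making this a `@staticmethod` that returns its results, rather than mutating `self`, is what allows `eval()` below to also become a staticmethod with all state passed in explicitly.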

def _build_datasets(self, train_data, test_data, val_data=None):
self.train_set = Dataset.build(
@@ -645,39 +647,52 @@ def build(self, train_data, test_data, val_data=None):

return self

def _eval(self, model, test_set, val_set, user_based):
@staticmethod
def eval(
model,
train_set,
test_set,
val_set,
rating_threshold,
exclude_unknowns,
user_based,
rating_metrics,
ranking_metrics,
verbose,
):
"""Running evaluation for rating and ranking metrics respectively."""
metric_avg_results = OrderedDict()
metric_user_results = OrderedDict()

avg_results, user_results = rating_eval(
model=model,
metrics=self.rating_metrics,
metrics=rating_metrics,
test_set=test_set,
user_based=user_based,
verbose=self.verbose,
verbose=verbose,
)
for i, mt in enumerate(self.rating_metrics):
for i, mt in enumerate(rating_metrics):
metric_avg_results[mt.name] = avg_results[i]
metric_user_results[mt.name] = user_results[i]

avg_results, user_results = ranking_eval(
model=model,
metrics=self.ranking_metrics,
train_set=self.train_set,
metrics=ranking_metrics,
train_set=train_set,
test_set=test_set,
val_set=val_set,
rating_threshold=self.rating_threshold,
exclude_unknowns=self.exclude_unknowns,
verbose=self.verbose,
rating_threshold=rating_threshold,
exclude_unknowns=exclude_unknowns,
verbose=verbose,
)
for i, mt in enumerate(self.ranking_metrics):
for i, mt in enumerate(ranking_metrics):
metric_avg_results[mt.name] = avg_results[i]
metric_user_results[mt.name] = user_results[i]

return Result(model.name, metric_avg_results, metric_user_results)

def evaluate(self, model, metrics, user_based, show_validation=True):
"""Evaluate given models according to given metrics
"""Evaluate given models according to given metrics. Supposed to be called by Experiment.
Parameters
----------
@@ -704,7 +719,6 @@ def evaluate(self, model, metrics, user_based, show_validation=True):
raise ValueError("test_set is required but None!")

self._reset()
self._organize_metrics(metrics)

###########
# FITTING #
@@ -722,13 +736,21 @@
if self.verbose:
print("\n[{}] Evaluation started!".format(model.name))

rating_metrics, ranking_metrics = self.organize_metrics(metrics)

start = time.time()
model.transform(self.test_set)
test_result = self._eval(
test_result = self.eval(
model=model,
train_set=self.train_set,
test_set=self.test_set,
val_set=self.val_set,
rating_threshold=self.rating_threshold,
exclude_unknowns=self.exclude_unknowns,
rating_metrics=rating_metrics,
ranking_metrics=ranking_metrics,
user_based=user_based,
verbose=self.verbose,
)
test_time = time.time() - start
test_result.metric_avg_results["Train (s)"] = train_time
@@ -738,8 +760,17 @@
if show_validation and self.val_set is not None:
start = time.time()
model.transform(self.val_set)
val_result = self._eval(
model=model, test_set=self.val_set, val_set=None, user_based=user_based
val_result = self.eval(
model=model,
train_set=self.train_set,
test_set=self.val_set,
val_set=None,
rating_threshold=self.rating_threshold,
exclude_unknowns=self.exclude_unknowns,
rating_metrics=rating_metrics,
ranking_metrics=ranking_metrics,
user_based=user_based,
verbose=self.verbose,
)
val_time = time.time() - start
val_result.metric_avg_results["Time (s)"] = val_time
