
bug: memory leak when I am using bentoml>=1.2 #4760

Closed
gusghrlrl101 opened this issue May 28, 2024 · 28 comments · Fixed by #4775
Labels
bug Something isn't working
@gusghrlrl101
Contributor

Describe the bug

Hello!

It seems that I am experiencing a memory leak issue when using BentoML version 1.2.
I even tested it with an API that contains no logic.
I deployed it in a Kubernetes environment using containerization and did performance testing with Locust, but the memory keeps increasing.
[screenshot: memory usage graph]

Has no one else experienced this issue?
I can't figure out what's going wrong.
It was fine with version 1.1.

To reproduce

  • service.py
import bentoml

@bentoml.service
class TestService:
    @bentoml.api
    def predict(self, input: list) -> list:
        return []
  • bentofile.yaml
service: "service:TestService"
  • containerize
bentoml build -f bentofile.yaml --containerize
  • deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bentoml-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: bentoml-test
  template:
    metadata:
      labels:
        app.kubernetes.io/name: bentoml-test
    spec:
      containers:
        - name: bentoml-test
          image: "bentoml-test"
          imagePullPolicy: Always
          command:
            - bentoml
            - serve
            - --port
            - "8080"
            - "--timeout-keep-alive"
            - "65"
          lifecycle:
            preStop:
              exec:
                command:
                  - sleep
                  - "30"
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /livez
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 2Gi

Expected behavior

No response

Environment

  • $ bentoml env

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.2.16
python: 3.9.6
platform: Linux-5.10.162-141.675.amzn2.x86_64-x86_64-with-glibc2.26
uid_gid: 1000:1000

pip_packages
absl-py==2.1.0
aiodogstatsd==0.16.0.post0
aiohttp==3.9.5
aiosignal==1.3.1
alembic==1.13.1
annotated-types==0.6.0
anyio==3.7.1
appdirs==1.4.4
asgiref==3.8.1
astroid==2.15.8
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.2.16
bidict==0.22.1
black==22.12.0
blinker==1.8.2
boto3==1.26.115
botocore==1.29.165
Brotli==1.1.0
build==1.2.1
CacheControl==0.14.0
cachetools==5.3.3
catboost==1.2.3
cattrs==23.1.2
cbor2==5.4.6
certifi==2024.2.2
cffi==1.16.0
cfgv==3.4.0
charset-normalizer==3.3.2
circus==0.18.0
cleo==2.1.0
click==8.1.3
click-option-group==0.5.6
cloudpickle==3.0.0
cmaes==0.10.0
cmake==3.28.1
colorlog==6.7.0
ConfigArgParse==1.7
contextlib2==21.6.0
contourpy==1.2.1
coverage==7.5.1
crashtest==0.4.1
cryptography==42.0.7
cssselect==1.2.0
cycler==0.12.1
Cython==3.0.6
databricks-cli==0.18.0
datasets==2.14.6
deepl==1.16.1
deepmerge==1.1.1
deepspeed==0.12.6
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
distro==1.9.0
docker==6.1.3
dulwich==0.21.7
easyocr==1.7.1
emoji==2.2.0
entrypoints==0.4
et-xmlfile==1.1.0
evaluate==0.4.2
exceptiongroup==1.2.1
faiss-cpu==1.7.3
fakeredis==1.9.2
fastapi==0.103.2
fastjsonschema==2.19.1
feature-engine==1.6.0
filelock==3.14.0
Flask==3.0.3
Flask-Cors==4.0.0
Flask-Login==0.6.3
flatbuffers==24.3.25
fonttools==4.51.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.10.0
fuzzyset2==0.2.2
gast==0.4.0
gevent==24.2.1
geventhttpclient==2.2.1
gitdb==4.0.11
GitPython==3.1.43
google-auth==2.29.0
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
graphviz==0.20.1
greenlet==3.0.3
grpcio==1.64.0
gspread==5.12.0
gunicorn==21.2.0
h11==0.14.0
h3==3.7.6
h5py==3.11.0
hdrpy==0.3.3
hjson==3.1.0
httpcore==1.0.5
httplib2==0.22.0
httpx==0.27.0
huggingface-hub==0.23.0
humanize==4.9.0
identify==2.5.36
idna==3.7
imageio==2.34.1
importlib-metadata==6.11.0
importlib_resources==6.4.0
inflection==0.5.1
influxdb==5.3.1
iniconfig==2.0.0
installer==0.7.0
isort==5.13.2
itsdangerous==2.2.0
jaraco.classes==3.4.0
jeepney==0.8.0
Jinja2==3.1.4
jmespath==1.0.1
joblib==1.4.2
kafka-python==2.0.2
keras==2.13.1
keyring==24.3.1
kiwipiepy==0.16.1
kiwipiepy_model==0.17.0
kiwisolver==1.4.5
lazy-object-proxy==1.10.0
lazy_loader==0.4
libclang==18.1.1
lightgbm==3.3.3
lightning-utilities==0.11.2
lit==17.0.6
llvmlite==0.42.0
locust==2.25.0
lxml==4.9.3
Mako==1.3.5
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.7.1
mccabe==0.7.0
mdurl==0.1.2
memory-profiler==0.61.0
mlflow==2.9.2
molotov==2.6
more-itertools==10.2.0
moto==4.2.14
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
nodeenv==1.8.0
numba==0.59.1
numpy==1.22.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
oauth2client==4.1.3
oauthlib==3.2.2
objgraph==3.6.1
openai==1.6.0
opencv-python-headless==4.9.0.80
openpyxl==3.1.2
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
opt-einsum==3.3.0
optuna==3.2.0
packaging==23.2
pandas==2.1.3
pathspec==0.12.1
patsy==0.5.3
pexpect==4.9.0
pillow==10.3.0
pip-requirements-parser==32.0.1
pip-tools==7.4.1
pkginfo==1.10.0
platformdirs==4.2.2
plotly==5.18.0
pluggy==1.5.0
poetry==1.8.3
poetry-core==1.9.0
poetry-plugin-export==1.8.0
polars==0.19.19
pre-commit==3.6.0
progressbar33==2.4
prometheus_client==0.20.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.6
ptyprocess==0.7.0
py-cpuinfo==9.0.0
pyarrow==11.0.0
pyarrow-hotfix==0.6
pyasn1==0.6.0
pyasn1_modules==0.4.0
pyclipper==1.3.0.post5
pycparser==2.22
pydantic==2.4.2
pydantic_core==2.10.1
Pygments==2.18.0
PyJWT==2.8.0
pylint==2.17.7
pynvml==11.5.0
pyparsing==3.1.2
pyproject_hooks==1.1.0
pytesseract==0.3.10
pytest==7.3.2
pytest-cov==4.0.0
python-bidi==0.4.2
python-dateutil==2.9.0.post0
python-engineio==4.8.0
python-json-logger==2.0.7
python-multipart==0.0.9
python-socketio==5.10.0
pytorch-lightning==2.2.4
pytz==2023.4
PyVirtualDisplay==3.0
PyYAML==6.0.1
pyzmq==26.0.3
querystring-parser==1.2.4
rapidfuzz==3.5.2
redis==4.3.4
regex==2024.5.15
requests==2.31.0
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
responses==0.25.0
rich==13.7.1
roundrobin==0.0.4
rsa==4.9
s3transfer==0.6.2
safetensors==0.4.3
schema==0.7.7
scikit-image==0.22.0
scikit-learn==1.3.0
scipy==1.13.0
SecretStorage==3.3.3
sentence-transformers==2.2.2
sentencepiece==0.2.0
sentry-sdk==1.40.6
seqeval==1.2.2
shapely==2.0.4
shellingham==1.5.4
simple-di==0.1.5
simple-websocket==1.0.0
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
sortedcontainers==2.4.0
soynlp==0.0.493
SQLAlchemy==2.0.24
sqlparse==0.5.0
starlette==0.27.0
statsmodels==0.14.0
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tensorboard==2.13.0
tensorboard-data-server==0.7.2
tensorflow-estimator==2.13.0
tensorflow-io-gcs-filesystem==0.37.0
termcolor==2.4.0
terminaltables==3.1.10
threadpoolctl==3.5.0
tifffile==2024.5.10
tokenizers==0.13.3
tomli==2.0.1
tomli_w==1.0.0
tomlkit==0.12.5
torch==2.0.1
torchmetrics==1.4.0.post0
torchvision==0.15.2
tornado==6.4
tqdm==4.66.4
transformers==4.31.0
trino==0.324.0
triton==2.0.0
trove-classifiers==2024.5.17
typing_extensions==4.12.0
tzdata==2024.1
tzlocal==5.2
urllib3==1.26.18
urwid==2.1.2
uvicorn==0.29.0
virtualenv==20.26.2
watchfiles==0.21.0
wcwidth==0.2.12
websocket-client==1.8.0
Werkzeug==3.0.3
wrapt==1.16.0
wsproto==1.2.0
xgboost==1.6.2
xmltodict==0.13.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.18.2
zope.event==5.0
zope.interface==6.3
@gusghrlrl101 gusghrlrl101 added the bug Something isn't working label May 28, 2024
@frostming
Contributor

Can you observe the same when running locally?

@gusghrlrl101
Contributor Author

> Can you observe the same when running locally?

I am not using BentoML locally, because I am using it at the production level now.

Even if I only change the version from 1.1 to 1.2 with the same environment, it occurs.

Do you have any idea why the memory is going up?

@gusghrlrl101
Contributor Author

The same issue occurred when I served locally (on my MacBook): 70 minutes, 1.6M requests.

[screenshot: memory usage graph]

my environment

m1 max (sonoma 14.4.1)

Bentoml

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.2.16
python: 3.9.6
platform: macOS-14.4.1-arm64-arm-64bit
uid_gid: 502:20

pip_packages
aiofiles==23.2.1
aiohttp==3.9.5
aiosignal==1.3.1
alembic==1.13.1
altair==5.3.0
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
asgiref==3.8.1
astroid==2.15.8
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.2.16
black==22.12.0
blinker==1.8.2
boto3==1.26.115
botocore==1.29.165
build==1.2.1
catboost==1.2.1
cattrs==23.1.2
cbor2==5.4.6
certifi==2024.2.2
cffi==1.16.0
cfgv==3.4.0
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.3
click-option-group==0.5.6
cloudpickle==3.0.0
colorlog==6.7.0
contextlib2==21.6.0
contourpy==1.2.1
coverage==7.5.1
cryptography==42.0.7
cycler==0.12.1
databricks-cli==0.18.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
docker==6.1.3
easyocr==1.7.1
entrypoints==0.4
exceptiongroup==1.2.1
faiss-cpu==1.8.0
fakeredis==1.9.2
fastapi==0.110.2
feature-engine==1.6.0
ffmpy==0.3.2
filelock==3.14.0
Flask==3.0.3
fonttools==4.51.0
frozenlist==1.4.1
fs==2.4.16
gitdb==4.0.11
GitPython==3.1.43
gradio==3.41.0
gradio_client==0.5.0
graphviz==0.20.1
gunicorn==21.2.0
h11==0.14.0
h3==3.7.6
httpcore==1.0.5
httpx==0.27.0
identify==2.5.36
idna==3.7
imageio==2.34.1
importlib-metadata==6.11.0
importlib_resources==6.4.0
inflection==0.5.1
iniconfig==2.0.0
isort==5.13.2
itsdangerous==2.2.0
Jinja2==3.1.4
jmespath==1.0.1
joblib==1.4.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kafka-python==2.0.2
kiwisolver==1.4.5
lazy-object-proxy==1.10.0
lazy_loader==0.4
llvmlite==0.42.0
Mako==1.3.5
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mccabe==0.7.0
mdurl==0.1.2
mlflow==2.9.2
moto==4.2.14
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
nodeenv==1.8.0
numba==0.59.1
numpy==1.26.4
nvidia-ml-py==11.525.150
oauthlib==3.2.2
opencv-python-headless==4.5.5.64
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
orjson==3.10.1
packaging==23.2
pandas==2.1.3
pathspec==0.12.1
patsy==0.5.4
pillow==10.3.0
pip-requirements-parser==32.0.1
pip-tools==7.4.1
platformdirs==4.2.2
plotly==5.18.0
pluggy==1.5.0
polars==0.19.19
pre-commit==3.6.0
prometheus_client==0.20.0
protobuf==4.25.3
psutil==5.9.8
pyarrow==14.0.2
pyclipper==1.3.0.post5
pycparser==2.22
pydantic==2.4.2
pydantic_core==2.10.1
pydub==0.25.1
Pygments==2.18.0
PyJWT==2.8.0
pylint==2.17.7
pyparsing==3.1.2
pyproject_hooks==1.1.0
pytest==7.3.2
pytest-cov==4.0.0
python-bidi==0.4.2
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2023.4
PyYAML==6.0.1
pyzmq==26.0.3
querystring-parser==1.2.4
redis==4.3.4
referencing==0.35.0
requests==2.31.0
responses==0.25.0
rich==13.7.1
rpds-py==0.18.0
s3transfer==0.6.2
schema==0.7.7
scikit-image==0.22.0
scikit-learn==1.4.2
scipy==1.13.0
semantic-version==2.10.0
sentry-sdk==1.40.6
shapely==2.0.4
simple-di==0.1.5
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
sortedcontainers==2.4.0
SQLAlchemy==2.0.30
sqlparse==0.5.0
starlette==0.37.2
statsmodels==0.14.1
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
threadpoolctl==3.5.0
tifffile==2024.5.10
tomli==2.0.1
tomli_w==1.0.0
tomlkit==0.12.5
toolz==0.12.1
torch==2.0.1
torchvision==0.15.2
tornado==6.4
trino==0.324.0
typing_extensions==4.11.0
tzdata==2024.1
tzlocal==5.2
urllib3==1.26.18
uvicorn==0.29.0
virtualenv==20.26.2
watchfiles==0.21.0
websocket-client==1.8.0
websockets==11.0.3
Werkzeug==3.0.3
woowa_ml_sdk==0.9.6
wrapt==1.16.0
xgboost==1.6.2
xmltodict==0.13.0
yarl==1.9.4
zipp==3.18.2

bentoml containerize

bentoml build -f bentofile.yaml --containerize

docker run

docker run -it --rm -p 3000:3000 --cpus 1 --memory 2g test_service:mghyrjq5q2dlztwo

@frostming
Contributor

Can you reproduce it without containerizing?

Run bentoml serve to start the service.

@gusghrlrl101
Contributor Author

I think it might be difficult to monitor memory without containerizing, because of other processes on the system. I will try on an empty EC2 instance.

But just for now, it is a bug for me because I'm using containerized BentoML. Could you check the containerized image first?

@frostming
Contributor

I tried with a Docker container and tested with Locust using 100 peak concurrency and a ramp of 10. The memory usage is stable on my side; no obvious leak is seen.

[screenshot: stable memory usage]

@gusghrlrl101
Contributor Author

Same with bentoml serve locally (EC2 instance).

[screenshots: SCR-20240529-cqjz, SCR-20240529-cyij]

How many requests did you test with? In my case, it was around 1 million.

@frostming
Contributor

200k requests, and the memory usage doesn't change much.

To rule out other issues, can you first upgrade Python to 3.9.18 (which I am using)?

@gusghrlrl101
Contributor Author

Same result.

  • result
    • 200k requests
    • 500MB memory increase
  • locust setting
  • host
    • c6i.large ec2 instance
    • cpu: 4, memory: 8GB

[screenshots: SCR-20240529-dzsk, SCR-20240529-ectn]

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.2.16
python: 3.9.18
platform: Linux-6.1.91-99.172.amzn2023.x86_64-x86_64-with-glibc2.34
uid_gid: 1000:1000

pip_packages
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
appdirs==1.4.4
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.2.16
blinker==1.8.2
Brotli==1.1.0
build==1.2.1
cattrs==23.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
ConfigArgParse==1.7
deepmerge==1.1.1
Deprecated==1.2.14
exceptiongroup==1.2.1
Flask==3.0.3
Flask-Cors==4.0.1
Flask-Login==0.6.3
frozenlist==1.4.1
fs==2.4.16
gevent==24.2.1
geventhttpclient==2.3.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
importlib-metadata==6.11.0
inflection==0.5.1
itsdangerous==2.2.0
Jinja2==3.1.4
locust==2.28.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
msgpack==1.0.8
multidict==6.0.5
numpy==1.26.4
nvidia-ml-py==11.525.150
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
packaging==24.0
pathspec==0.12.1
pip-requirements-parser==32.0.1
pip-tools==7.4.1
prometheus_client==0.20.0
psutil==5.9.8
pydantic==2.7.2
pydantic_core==2.18.3
Pygments==2.18.0
pyparsing==3.1.2
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
PyYAML==6.0.1
pyzmq==26.0.3
requests==2.32.2
rich==13.7.1
schema==0.7.7
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
starlette==0.37.2
tomli==2.0.1
tomli_w==1.0.0
tornado==6.4
typing_extensions==4.12.0
urllib3==2.2.1
uvicorn==0.30.0
watchfiles==0.22.0
Werkzeug==3.0.3
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.0
zope.event==5.0
zope.interface==6.4.post2

bentoml

  • service.py
import bentoml

@bentoml.service
class TestService:
    @bentoml.api
    def predict(self, input: list) -> list:
        return []
  • bentofile.yaml
service: "service:TestService"
  • run bentoml
bentoml serve

locust

  • locust.py
from locust import HttpUser, task, constant

sample_data = {"input": []}

class Predict(HttpUser):
    wait_time = constant(0.05)

    @task
    def predict(self):
        self.client.post("/predict", json=sample_data)
  • run locust
locust -f locust.py

@frostming
Contributor

Can't reproduce either. Can you use a memory profiler to figure it out? I recommend memray.

@gusghrlrl101
Contributor Author

In the memray result, the Python process's memory is not increasing (it is only 34MB in total).

And in the screenshot that I shared before, memory increased (593MB -> 1.10GB) but the process memory stayed the same (1.0% -> 1.1%).

It might be increasing outside of the Python process. Do you have any idea?

[screenshot: memray result]

@frostming
Contributor

encode/httpx#978 (comment)
aio-libs/aiohttp#4833

Would that be related? (It is surprising that it exists in both client libraries)

@Zheaoli

Zheaoli commented May 30, 2024

Would you mind following these steps?

  1. run bentoml serve on your ec2
  2. run pmap $(pid) and record the result
  3. run the requests test
  4. run pmap on the pid again and record it
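The steps above can be sketched as a small script (the pgrep pattern and output paths are illustrative; it falls back to the current shell's PID so the sketch runs standalone, and to /proc when pmap is unavailable):

```shell
# Find the PID of the serving process (fall back to this shell for illustration)
pid=$(pgrep -f "bentoml serve" | head -n 1)
pid=${pid:-$$}

# 1. Record the memory map before the load test
pmap -x "$pid" > /tmp/pmap_before.txt 2>/dev/null || cat "/proc/$pid/maps" > /tmp/pmap_before.txt

# 2./3. ... run the request test against the service here ...

# 4. Record the memory map again; the diff shows which mappings grew
pmap -x "$pid" > /tmp/pmap_after.txt 2>/dev/null || cat "/proc/$pid/maps" > /tmp/pmap_after.txt
diff /tmp/pmap_before.txt /tmp/pmap_after.txt || true
```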

@Zheaoli

Zheaoli commented May 30, 2024

[screenshot: pmap output]

I can reproduce it locally. You can see that the mapping at 000055e1b280f000 is still growing.

I think it's a glibc malloc issue: the memory requested from Python comes in chunks too small and fragmented to be merged into a big chunk, so after we call free, the memory is still held by the allocator and can't be returned to the system.

Here are two ways to solve this:

  1. Run malloc_trim in the background periodically
  2. Use tcmalloc instead of glibc malloc
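For option 1, glibc's malloc_trim can be called from Python through ctypes. A minimal sketch of a periodic trim thread (the interval and the function name start_malloc_trim_loop are illustrative, and malloc_trim only exists on glibc-based Linux):

```python
import ctypes
import ctypes.util
import threading
import time

def start_malloc_trim_loop(interval_seconds: float = 30.0) -> threading.Thread:
    """Periodically ask glibc to return freed heap pages to the OS."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

    def loop() -> None:
        while True:
            time.sleep(interval_seconds)
            # malloc_trim(0) releases free memory from the top of the heap
            libc.malloc_trim(0)

    thread = threading.Thread(target=loop, daemon=True, name="malloc-trim")
    thread.start()
    return thread
```

The thread is a daemon, so it does not block interpreter shutdown.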

IceCodeNew added a commit to IceCodeNew/docker-collections that referenced this issue May 30, 2024
@gusghrlrl101
Contributor Author

I tried using tcmalloc instead of glibc malloc.

But in my case, memory still increased locally with containerization (139MB -> 609MB).

[screenshots: SCR-20240530-tkio, SCR-20240530-tpwa]
  • Dockerfile.template
{% extends bento_base_template %}
{% block SETUP_BENTO_COMPONENTS %}
{{ super() }}

RUN apt-get update && apt-get install -y \
    google-perftools libgoogle-perftools-dev \
    && rm -rf /var/lib/apt/lists/*
ENV LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libtcmalloc.so"

{% endblock %}
  • bentofile.yaml
service: "service:TestService"
include:
  - service.py
docker:
  dockerfile_template: Dockerfile.template
  • containerize bento
bentoml build -f bentofile.yaml --containerize
  • run docker
docker run -it --rm -p 3000:3000 test_service:4f25sna7bw6njtwo

@Zheaoli

Zheaoli commented May 31, 2024

Interesting.

Are you using an ARM server?

@gusghrlrl101
Contributor Author

The test was on my local machine (M1 Max), but it was the same on the EC2 instance (with "/usr/lib/x86_64-linux-gnu/libtcmalloc.so").

@Zheaoli

Zheaoli commented May 31, 2024

Reproduced it locally, a new issue lol

Let me figure it out.

Please assign this to me, cc @frostming

@frostming
Contributor

@gusghrlrl101 Try upgrading the dependencies by pip install bentoml -U --upgrade-strategy eager and run again.

@gusghrlrl101
Contributor Author

After that, it was the same (a 400MB increase after 300k requests).

[screenshot: memory usage]
$ pip list
Package                                      Version
-------------------------------------------- -----------
aiohttp                                      3.9.5
aiosignal                                    1.3.1
annotated-types                              0.7.0
anyio                                        4.4.0
appdirs                                      1.4.4
asgiref                                      3.8.1
async-timeout                                4.0.3
attrs                                        23.2.0
bentoml                                      1.2.16
build                                        1.2.1
cattrs                                       23.1.2
certifi                                      2024.6.2
circus                                       0.18.0
click                                        8.1.7
click-option-group                           0.5.6
cloudpickle                                  3.0.0
deepmerge                                    1.1.1
Deprecated                                   1.2.14
exceptiongroup                               1.2.1
frozenlist                                   1.4.1
fs                                           2.4.16
h11                                          0.14.0
httpcore                                     1.0.5
httpx                                        0.27.0
idna                                         3.7
importlib-metadata                           6.11.0
inflection                                   0.5.1
Jinja2                                       3.1.4
markdown-it-py                               3.0.0
MarkupSafe                                   2.1.5
mdurl                                        0.1.2
multidict                                    6.0.5
numpy                                        1.26.4
nvidia-ml-py                                 11.525.150
opentelemetry-api                            1.20.0
opentelemetry-instrumentation                0.41b0
opentelemetry-instrumentation-aiohttp-client 0.41b0
opentelemetry-instrumentation-asgi           0.41b0
opentelemetry-sdk                            1.20.0
opentelemetry-semantic-conventions           0.41b0
opentelemetry-util-http                      0.41b0
packaging                                    24.0
pathspec                                     0.12.1
pip                                          24.0
pip-requirements-parser                      32.0.1
pip-tools                                    7.4.1
prometheus_client                            0.20.0
psutil                                       5.9.8
pydantic                                     2.7.2
pydantic_core                                2.18.3
Pygments                                     2.18.0
pyparsing                                    3.1.2
pyproject_hooks                              1.1.0
python-dateutil                              2.9.0.post0
python-json-logger                           2.0.7
python-multipart                             0.0.9
PyYAML                                       6.0.1
pyzmq                                        26.0.3
rich                                         13.7.1
schema                                       0.7.7
setuptools                                   70.0.0
simple-di                                    0.1.5
six                                          1.16.0
sniffio                                      1.3.1
starlette                                    0.37.2
tomli                                        2.0.1
tomli_w                                      1.0.0
tornado                                      6.4
typing_extensions                            4.12.1
uvicorn                                      0.30.1
watchfiles                                   0.22.0
wheel                                        0.43.0
wrapt                                        1.16.0
yarl                                         1.9.4
zipp                                         3.19.1

@Zheaoli

Zheaoli commented Jun 3, 2024

After debugging, @frostming and I confirmed that this bug was introduced into the codebase in #4337.

TL;DR:

In #4337, @frostming added a new feature: create a temporary directory per request and use it to cache all necessary files during the request.

        with tempfile.TemporaryDirectory(prefix="bentoml-request-") as temp_dir:
            dir_token = request_directory.set(temp_dir)
            try:
                yield self
            finally:
                self._request_var.reset(request_token)
                self._response_var.reset(response_token)
                request_directory.reset(dir_token)

But there is a problem: when we make a new directory, the process may trigger page cache activity in the kernel, and that cache may not be released in time. This means we accumulate a lot of cache. docker stats reports the container's page cache and the process memory together as the memory usage field displayed in the console.

So you will see the memory keep growing until the OS reclaims the cached pages.

We can use bpftrace to verify this:

tracepoint:kmem:mm_page_free_batched {
    if (pid == 4015 || pid == 4016) {
        @free_batched_count[pid] = @free_batched_count[pid] +1;
        if (@free_batched_count[pid] % 1000 == 0) {
            printf("mm_page_free_batched, Pid=%d, Count=%d\n", pid, @free_batched_count[pid])
        }
    }
}

tracepoint:kmem:mm_page_free {
    if (pid == 4015 || pid == 4016) {
        @free_count[pid] = @free_count[pid] +1;
        if (@free_count[pid] % 1000 == 0) {
            printf("mm_page_free, Pid=%d, Count=%d\n", pid, @free_count[pid])
        }
    }
}

tracepoint:kmem:mm_page_alloc {
    if (pid == 4015 || pid == 4016) {
        @alloc_count[pid] = @alloc_count[pid] +1;
        if (@alloc_count[pid] % 1000 == 0) {
            printf("mm_page_alloc, Pid=%d, Count=%d\n", pid, @alloc_count[pid])
        }
    }
}

The results here

mm_page_alloc, Pid=4015, Count=1000
mm_page_alloc, Pid=4016, Count=1000
mm_page_alloc, Pid=4016, Count=2000
mm_page_alloc, Pid=4016, Count=3000
mm_page_alloc, Pid=4015, Count=2000
mm_page_alloc, Pid=4015, Count=3000
mm_page_alloc, Pid=4016, Count=4000
mm_page_alloc, Pid=4015, Count=4000
mm_page_alloc, Pid=4016, Count=5000
mm_page_alloc, Pid=4015, Count=5000
mm_page_alloc, Pid=4015, Count=6000
mm_page_alloc, Pid=4016, Count=6000
mm_page_alloc, Pid=4016, Count=7000
mm_page_alloc, Pid=4015, Count=7000
mm_page_free, Pid=4016, Count=1000
mm_page_alloc, Pid=4015, Count=8000
mm_page_alloc, Pid=4016, Count=8000
mm_page_free, Pid=4015, Count=1000
mm_page_alloc, Pid=4015, Count=9000
mm_page_alloc, Pid=4016, Count=9000
mm_page_free, Pid=4016, Count=2000
mm_page_alloc, Pid=4015, Count=10000
mm_page_alloc, Pid=4016, Count=10000
mm_page_free, Pid=4015, Count=2000
mm_page_alloc, Pid=4015, Count=11000
mm_page_alloc, Pid=4016, Count=11000
mm_page_free, Pid=4016, Count=3000
mm_page_free, Pid=4015, Count=3000
mm_page_alloc, Pid=4015, Count=12000
mm_page_alloc, Pid=4016, Count=12000
mm_page_alloc, Pid=4015, Count=13000
mm_page_alloc, Pid=4016, Count=13000
mm_page_free, Pid=4016, Count=4000
mm_page_free, Pid=4015, Count=4000
mm_page_alloc, Pid=4015, Count=14000
mm_page_alloc, Pid=4016, Count=14000
mm_page_alloc, Pid=4015, Count=15000
mm_page_alloc, Pid=4016, Count=15000
mm_page_alloc, Pid=4015, Count=16000
mm_page_alloc, Pid=4016, Count=16000
mm_page_free, Pid=4016, Count=5000
mm_page_free, Pid=4015, Count=5000
mm_page_alloc, Pid=4015, Count=17000
mm_page_alloc, Pid=4016, Count=17000
mm_page_alloc, Pid=4015, Count=18000
mm_page_alloc, Pid=4016, Count=18000
mm_page_free, Pid=4016, Count=6000
mm_page_alloc, Pid=4015, Count=19000
mm_page_alloc, Pid=4016, Count=19000
mm_page_free, Pid=4015, Count=6000
mm_page_alloc, Pid=4015, Count=20000
mm_page_alloc, Pid=4016, Count=20000
mm_page_alloc, Pid=4015, Count=21000
mm_page_free, Pid=4016, Count=7000
mm_page_alloc, Pid=4016, Count=21000
mm_page_alloc, Pid=4015, Count=22000
mm_page_alloc, Pid=4016, Count=22000
mm_page_free, Pid=4015, Count=7000
mm_page_alloc, Pid=4015, Count=23000
mm_page_free, Pid=4016, Count=8000
mm_page_alloc, Pid=4016, Count=23000
mm_page_alloc, Pid=4015, Count=24000
mm_page_alloc, Pid=4016, Count=24000
mm_page_free, Pid=4015, Count=8000
mm_page_alloc, Pid=4015, Count=25000
mm_page_alloc, Pid=4016, Count=25000
mm_page_alloc, Pid=4015, Count=26000
mm_page_free, Pid=4015, Count=9000
mm_page_alloc, Pid=4016, Count=26000
mm_page_alloc, Pid=4015, Count=27000
mm_page_free, Pid=4016, Count=9000
mm_page_alloc, Pid=4016, Count=27000
mm_page_alloc, Pid=4015, Count=28000
mm_page_alloc, Pid=4015, Count=29000
mm_page_free, Pid=4015, Count=10000
mm_page_alloc, Pid=4016, Count=28000
mm_page_free, Pid=4016, Count=10000
mm_page_alloc, Pid=4015, Count=30000
mm_page_alloc, Pid=4016, Count=29000
mm_page_free, Pid=4016, Count=11000
mm_page_free, Pid=4015, Count=11000
mm_page_alloc, Pid=4015, Count=31000
mm_page_alloc, Pid=4016, Count=30000
mm_page_free, Pid=4015, Count=12000

We can see that the process allocates a lot of page cache and releases only a little of it.
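Since docker stats folds the page cache into the reported usage, one way to check this from inside the container is to read the cgroup accounting directly and separate anonymous (process) memory from file cache. A rough sketch (the cgroup paths and key names are assumptions that differ between cgroup v1 and v2):

```python
from pathlib import Path

def cgroup_memory_breakdown() -> dict:
    """Split cgroup memory accounting into anonymous memory vs. file (page) cache."""
    stat_path = Path("/sys/fs/cgroup/memory.stat")  # cgroup v2 layout
    if not stat_path.exists():
        stat_path = Path("/sys/fs/cgroup/memory/memory.stat")  # cgroup v1 layout
    breakdown = {}
    for line in stat_path.read_text().splitlines():
        key, _, value = line.partition(" ")
        # v2 uses "anon"/"file"; v1 uses "rss"/"cache"
        if key in ("anon", "file", "rss", "cache"):
            breakdown[key] = int(value)
    return breakdown
```

If "file" (or "cache") dominates the growth while "anon" stays flat, the increase is page cache rather than a process-level leak.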

@Zheaoli

Zheaoli commented Jun 3, 2024

For now, I think this is not a bug; it can be treated as normal behavior. You can set a memory limit for your container, and the page cache will be released automatically when the container's memory usage approaches your limit.

@gusghrlrl101
Contributor Author

Thank you.
So is there no way to keep the memory from increasing?
I don't think it is safe to let a deployment's memory fill up at the production level.
How about adding an option to turn this on/off? Or is there a better way for me?

@Zheaoli

Zheaoli commented Jun 3, 2024

> So is there no way to keep the memory from increasing?

In most circumstances, it's not necessary to flush the page cache manually. If you want to do it anyway, run echo 3 > /proc/sys/vm/drop_caches.

> How about adding an option to turn this on/off? Or is there a better way for me?

Emmmm, I'm not sure about this, cc @frostming

@gusghrlrl101
Contributor Author

What do you think about this one?

> I don't think it is safe to let a deployment's memory fill up at the production level.

It will look like the screenshot below.

[screenshot: memory rising to the limit]

@frostming frostming self-assigned this Jun 4, 2024
frostming added a commit to frostming/BentoML that referenced this issue Jun 4, 2024
Fixes bentoml#4760

Signed-off-by: Frost Ming <me@frostming.com>
frostming added a commit to frostming/BentoML that referenced this issue Jun 7, 2024
Fixes bentoml#4760

Signed-off-by: Frost Ming <me@frostming.com>
@gusghrlrl101
Contributor Author

@frostming Hello. How is it going?

frostming added a commit that referenced this issue Jun 12, 2024
* fix: bug: memory leak when using bentoml>=1.2
Fixes #4760

Signed-off-by: Frost Ming <me@frostming.com>

* add docstring

Signed-off-by: Frost Ming <me@frostming.com>

* fix: access attribute

Signed-off-by: Frost Ming <me@frostming.com>

* fix: typo

Signed-off-by: Frost Ming <me@frostming.com>

* fix: use a directory pool

Signed-off-by: Frost Ming <me@frostming.com>

* typo

Signed-off-by: Frost Ming <me@frostming.com>

* fix: clean dir out of mutex

Signed-off-by: Frost Ming <me@frostming.com>

---------

Signed-off-by: Frost Ming <me@frostming.com>
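The commit messages above mention the approach of the merged fix: instead of creating and deleting a fresh temporary directory on every request, request directories are drawn from a reusable pool. A minimal sketch of that idea (class and method names are illustrative, not BentoML's actual implementation):

```python
import os
import queue
import shutil
import tempfile
from contextlib import contextmanager

class RequestDirectoryPool:
    """Reuse a fixed set of scratch directories instead of one tempdir per request."""

    def __init__(self, size: int = 8):
        self._pool: "queue.Queue[str]" = queue.Queue()
        for _ in range(size):
            self._pool.put(tempfile.mkdtemp(prefix="bentoml-request-"))

    @contextmanager
    def request_directory(self):
        path = self._pool.get()
        try:
            yield path
        finally:
            # Empty the directory for the next request, but keep the directory
            # itself alive, avoiding per-request mkdir/rmdir page-cache churn
            for entry in os.listdir(path):
                full = os.path.join(path, entry)
                shutil.rmtree(full) if os.path.isdir(full) else os.remove(full)
            self._pool.put(path)
```

The pool bounds the number of directories ever created, so the kernel is no longer asked to allocate and reclaim dentry/page-cache entries on every request.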
@parano
Member

parano commented Jun 13, 2024

@gusghrlrl101 The PR has just been merged and will be available in the next release. Thank you for reporting this issue!

It would be great if you could help verify the fix using the main branch or with the next release 🙏

@gusghrlrl101
Contributor Author

Thank you!

The memory no longer increases.

I look forward to the new version being released soon so that I can use it.

[screenshot: stable memory usage]
