`dagster-webserver` memory leak #18997

aaaaahaaaaa · 2024-01-03T16:38:26Z

Dagster version

1.5.13

What's the issue?

dagster-webserver 1.5.13 seems to have some kind of memory leak. Since we updated to that version, we have observed a steady increase in memory usage over the last couple of weeks.

The increase in memory usage correlates to the change of version, without any other change being introduced.
We observe the same behaviour on 2 different GKE clusters.
Reverting to 1.5.12 resolves the issue.

What did you expect to happen?

No response

How to reproduce?

No response

Deployment type

Dagster Helm chart

Deployment details

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

alangenfeld · 2024-01-03T17:36:44Z

I don't see any notable commits in 1.5.13 on initial inspection

> Reverting to 1.5.12 resolves the issue.

How exactly did you do this? Can you report the Python environments in the two containers (`pip list` / `pip freeze`)? I'm trying to discern whether it's possible that the leak comes from a dependency that also changed between the two container images.

aaaaahaaaaa · 2024-01-03T17:57:21Z

> How exactly did you do this?

We changed the helm chart version. We literally just reverted the Renovate bot commit.

`1.5.12`

`pip list`

Package                     Version
--------------------------- ------------
alembic                     1.13.0
amqp                        5.2.0
aniso8601                   9.0.1
annotated-types             0.6.0
anyio                       4.1.0
async-timeout               4.0.3
azure-core                  1.29.5
azure-identity              1.15.0
azure-storage-blob          12.19.0
azure-storage-file-datalake 12.14.0
backoff                     2.2.1
billiard                    4.2.0
boto3                       1.33.12
botocore                    1.33.12
cachetools                  5.3.2
celery                      5.3.6
certifi                     2023.11.17
cffi                        1.16.0
charset-normalizer          3.3.2
click                       8.1.7
click-didyoumean            0.3.0
click-plugins               1.1.1
click-repl                  0.3.0
coloredlogs                 14.0
croniter                    2.0.1
cryptography                41.0.7
dagster                     1.5.12
dagster-aws                 0.21.12
dagster-azure               0.21.12
dagster-celery              0.21.12
dagster-celery-k8s          0.21.12
dagster-gcp                 0.21.12
dagster-graphql             1.5.12
dagster-k8s                 0.21.12
dagster-pandas              0.21.12
dagster-pipes               1.5.12
dagster-postgres            0.21.12
dagster-webserver           1.5.12
db-dtypes                   1.1.1
docstring-parser            0.15
exceptiongroup              1.2.0
flower                      2.0.1
fsspec                      2023.12.2
google-api-core             2.15.0
google-api-python-client    2.110.0
google-auth                 2.25.2
google-auth-httplib2        0.1.1
google-cloud-bigquery       3.13.0
google-cloud-core           2.4.1
google-cloud-storage        2.13.0
google-crc32c               1.5.0
google-resumable-media      2.6.0
googleapis-common-protos    1.62.0
gql                         3.4.1
graphene                    3.3
graphql-core                3.2.3
graphql-relay               3.2.0
greenlet                    3.0.2
grpcio                      1.60.0
grpcio-health-checking      1.60.0
grpcio-status               1.60.0
h11                         0.14.0
httplib2                    0.22.0
httptools                   0.6.1
humanfriendly               10.0
humanize                    4.9.0
idna                        3.6
isodate                     0.6.1
Jinja2                      3.1.2
jmespath                    1.0.1
kombu                       5.3.4
kubernetes                  28.1.0
Mako                        1.3.0
MarkupSafe                  2.1.3
msal                        1.26.0
msal-extensions             1.1.0
multidict                   6.0.4
numpy                       1.26.2
oauth2client                4.1.3
oauthlib                    3.2.2
packaging                   23.2
pandas                      2.1.4
pendulum                    2.1.2
pip                         23.0.1
portalocker                 2.8.2
prometheus-client           0.19.0
prompt-toolkit              3.0.41
proto-plus                  1.23.0
protobuf                    4.25.1
psycopg2-binary             2.9.9
pyarrow                     14.0.1
pyasn1                      0.5.1
pyasn1-modules              0.3.0
pycparser                   2.21
pydantic                    2.5.2
pydantic_core               2.14.5
PyJWT                       2.8.0
pyparsing                   3.1.1
python-dateutil             2.8.2
python-dotenv               1.0.0
pytz                        2023.3.post1
pytzdata                    2020.1
PyYAML                      6.0.1
redis                       5.0.1
requests                    2.31.0
requests-oauthlib           1.3.1
requests-toolbelt           0.10.1
rsa                         4.9
s3transfer                  0.8.2
setuptools                  65.5.1
six                         1.16.0
sniffio                     1.3.0
SQLAlchemy                  2.0.23
starlette                   0.33.0
tabulate                    0.9.0
tomli                       2.0.1
toposort                    1.10
tornado                     6.4
tqdm                        4.66.1
typing_extensions           4.9.0
tzdata                      2023.3
universal-pathlib           0.1.4
uritemplate                 4.1.1
urllib3                     1.26.18
uvicorn                     0.24.0.post1
uvloop                      0.19.0
vine                        5.1.0
watchdog                    3.0.0
watchfiles                  0.21.0
wcwidth                     0.2.12
websocket-client            1.7.0
websockets                  12.0
wheel                       0.42.0
yarl                        1.9.4

`pip freeze`

alembic==1.13.0
amqp==5.2.0
aniso8601==9.0.1
annotated-types==0.6.0
anyio==4.1.0
async-timeout==4.0.3
azure-core==1.29.5
azure-identity==1.15.0
azure-storage-blob==12.19.0
azure-storage-file-datalake==12.14.0
backoff==2.2.1
billiard==4.2.0
boto3==1.33.12
botocore==1.33.12
cachetools==5.3.2
celery==5.3.6
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
coloredlogs==14.0
croniter==2.0.1
cryptography==41.0.7
dagster==1.5.12
dagster-aws==0.21.12
dagster-azure==0.21.12
dagster-celery==0.21.12
dagster-celery-k8s==0.21.12
dagster-gcp==0.21.12
dagster-graphql==1.5.12
dagster-k8s==0.21.12
dagster-pandas==0.21.12
dagster-pipes==1.5.12
dagster-postgres==0.21.12
dagster-webserver==1.5.12
db-dtypes==1.1.1
docstring-parser==0.15
exceptiongroup==1.2.0
flower==2.0.1
fsspec==2023.12.2
google-api-core==2.15.0
google-api-python-client==2.110.0
google-auth==2.25.2
google-auth-httplib2==0.1.1
google-cloud-bigquery==3.13.0
google-cloud-core==2.4.1
google-cloud-storage==2.13.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.62.0
gql==3.4.1
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
greenlet==3.0.2
grpcio==1.60.0
grpcio-health-checking==1.60.0
grpcio-status==1.60.0
h11==0.14.0
httplib2==0.22.0
httptools==0.6.1
humanfriendly==10.0
humanize==4.9.0
idna==3.6
isodate==0.6.1
Jinja2==3.1.2
jmespath==1.0.1
kombu==5.3.4
kubernetes==28.1.0
Mako==1.3.0
MarkupSafe==2.1.3
msal==1.26.0
msal-extensions==1.1.0
multidict==6.0.4
numpy==1.26.2
oauth2client==4.1.3
oauthlib==3.2.2
packaging==23.2
pandas==2.1.4
pendulum==2.1.2
portalocker==2.8.2
prometheus-client==0.19.0
prompt-toolkit==3.0.41
proto-plus==1.23.0
protobuf==4.25.1
psycopg2-binary==2.9.9
pyarrow==14.0.1
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==2.5.2
pydantic_core==2.14.5
PyJWT==2.8.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
pytzdata==2020.1
PyYAML==6.0.1
redis==5.0.1
requests==2.31.0
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
rsa==4.9
s3transfer==0.8.2
six==1.16.0
sniffio==1.3.0
SQLAlchemy==2.0.23
starlette==0.33.0
tabulate==0.9.0
tomli==2.0.1
toposort==1.10
tornado==6.4
tqdm==4.66.1
typing_extensions==4.9.0
tzdata==2023.3
universal-pathlib==0.1.4
uritemplate==4.1.1
urllib3==1.26.18
uvicorn==0.24.0.post1
uvloop==0.19.0
vine==5.1.0
watchdog==3.0.0
watchfiles==0.21.0
wcwidth==0.2.12
websocket-client==1.7.0
websockets==12.0
yarl==1.9.4

`1.5.13`

`pip list`

Package                     Version
--------------------------- ------------
alembic                     1.13.0
amqp                        5.2.0
aniso8601                   9.0.1
annotated-types             0.6.0
anyio                       4.1.0
async-timeout               4.0.3
azure-core                  1.29.5
azure-identity              1.15.0
azure-storage-blob          12.19.0
azure-storage-file-datalake 12.14.0
backoff                     2.2.1
billiard                    4.2.0
boto3                       1.34.0
botocore                    1.34.0
cachetools                  5.3.2
celery                      5.3.6
certifi                     2023.11.17
cffi                        1.16.0
charset-normalizer          3.3.2
click                       8.1.7
click-didyoumean            0.3.0
click-plugins               1.1.1
click-repl                  0.3.0
coloredlogs                 14.0
croniter                    2.0.1
cryptography                41.0.7
dagster                     1.5.13
dagster-aws                 0.21.13
dagster-azure               0.21.13
dagster-celery              0.21.13
dagster-celery-k8s          0.21.13
dagster-gcp                 0.21.13
dagster-graphql             1.5.13
dagster-k8s                 0.21.13
dagster-pandas              0.21.13
dagster-pipes               1.5.13
dagster-postgres            0.21.13
dagster-webserver           1.5.13
db-dtypes                   1.2.0
docstring-parser            0.15
exceptiongroup              1.2.0
flower                      2.0.1
fsspec                      2023.12.2
google-api-core             2.15.0
google-api-python-client    2.111.0
google-auth                 2.25.2
google-auth-httplib2        0.2.0
google-cloud-bigquery       3.14.1
google-cloud-core           2.4.1
google-cloud-storage        2.14.0
google-crc32c               1.5.0
google-resumable-media      2.7.0
googleapis-common-protos    1.62.0
gql                         3.4.1
graphene                    3.3
graphql-core                3.2.3
graphql-relay               3.2.0
greenlet                    3.0.2
grpcio                      1.60.0
grpcio-health-checking      1.60.0
h11                         0.14.0
httplib2                    0.22.0
httptools                   0.6.1
humanfriendly               10.0
humanize                    4.9.0
idna                        3.6
isodate                     0.6.1
Jinja2                      3.1.2
jmespath                    1.0.1
kombu                       5.3.4
kubernetes                  28.1.0
Mako                        1.3.0
MarkupSafe                  2.1.3
msal                        1.26.0
msal-extensions             1.1.0
multidict                   6.0.4
numpy                       1.26.2
oauth2client                4.1.3
oauthlib                    3.2.2
packaging                   23.2
pandas                      2.1.4
pendulum                    2.1.2
pip                         23.0.1
portalocker                 2.8.2
prometheus-client           0.19.0
prompt-toolkit              3.0.43
protobuf                    4.25.1
psycopg2-binary             2.9.9
pyarrow                     14.0.1
pyasn1                      0.5.1
pyasn1-modules              0.3.0
pycparser                   2.21
pydantic                    2.5.2
pydantic_core               2.14.5
PyJWT                       2.8.0
pyparsing                   3.1.1
python-dateutil             2.8.2
python-dotenv               1.0.0
pytz                        2023.3.post1
pytzdata                    2020.1
PyYAML                      6.0.1
redis                       5.0.1
requests                    2.31.0
requests-oauthlib           1.3.1
requests-toolbelt           0.10.1
rsa                         4.9
s3transfer                  0.9.0
setuptools                  65.5.1
six                         1.16.0
sniffio                     1.3.0
SQLAlchemy                  2.0.23
starlette                   0.33.0
tabulate                    0.9.0
tomli                       2.0.1
toposort                    1.10
tornado                     6.4
tqdm                        4.66.1
typing_extensions           4.9.0
tzdata                      2023.3
universal-pathlib           0.1.4
uritemplate                 4.1.1
urllib3                     1.26.18
uvicorn                     0.24.0.post1
uvloop                      0.19.0
vine                        5.1.0
watchdog                    3.0.0
watchfiles                  0.21.0
wcwidth                     0.2.12
websocket-client            1.7.0
websockets                  12.0
wheel                       0.42.0
yarl                        1.9.4

`pip freeze`

alembic==1.13.0
amqp==5.2.0
aniso8601==9.0.1
annotated-types==0.6.0
anyio==4.1.0
async-timeout==4.0.3
azure-core==1.29.5
azure-identity==1.15.0
azure-storage-blob==12.19.0
azure-storage-file-datalake==12.14.0
backoff==2.2.1
billiard==4.2.0
boto3==1.34.0
botocore==1.34.0
cachetools==5.3.2
celery==5.3.6
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
coloredlogs==14.0
croniter==2.0.1
cryptography==41.0.7
dagster==1.5.13
dagster-aws==0.21.13
dagster-azure==0.21.13
dagster-celery==0.21.13
dagster-celery-k8s==0.21.13
dagster-gcp==0.21.13
dagster-graphql==1.5.13
dagster-k8s==0.21.13
dagster-pandas==0.21.13
dagster-pipes==1.5.13
dagster-postgres==0.21.13
dagster-webserver==1.5.13
db-dtypes==1.2.0
docstring-parser==0.15
exceptiongroup==1.2.0
flower==2.0.1
fsspec==2023.12.2
google-api-core==2.15.0
google-api-python-client==2.111.0
google-auth==2.25.2
google-auth-httplib2==0.2.0
google-cloud-bigquery==3.14.1
google-cloud-core==2.4.1
google-cloud-storage==2.14.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.62.0
gql==3.4.1
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
greenlet==3.0.2
grpcio==1.60.0
grpcio-health-checking==1.60.0
h11==0.14.0
httplib2==0.22.0
httptools==0.6.1
humanfriendly==10.0
humanize==4.9.0
idna==3.6
isodate==0.6.1
Jinja2==3.1.2
jmespath==1.0.1
kombu==5.3.4
kubernetes==28.1.0
Mako==1.3.0
MarkupSafe==2.1.3
msal==1.26.0
msal-extensions==1.1.0
multidict==6.0.4
numpy==1.26.2
oauth2client==4.1.3
oauthlib==3.2.2
packaging==23.2
pandas==2.1.4
pendulum==2.1.2
portalocker==2.8.2
prometheus-client==0.19.0
prompt-toolkit==3.0.43
protobuf==4.25.1
psycopg2-binary==2.9.9
pyarrow==14.0.1
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==2.5.2
pydantic_core==2.14.5
PyJWT==2.8.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
pytzdata==2020.1
PyYAML==6.0.1
redis==5.0.1
requests==2.31.0
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
rsa==4.9
s3transfer==0.9.0
six==1.16.0
sniffio==1.3.0
SQLAlchemy==2.0.23
starlette==0.33.0
tabulate==0.9.0
tomli==2.0.1
toposort==1.10
tornado==6.4
tqdm==4.66.1
typing_extensions==4.9.0
tzdata==2023.3
universal-pathlib==0.1.4
uritemplate==4.1.1
urllib3==1.26.18
uvicorn==0.24.0.post1
uvloop==0.19.0
vine==5.1.0
watchdog==3.0.0
watchfiles==0.21.0
wcwidth==0.2.12
websocket-client==1.7.0
websockets==12.0
yarl==1.9.4
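
For readers who want to compare two environments like the ones above without reading both lists by eye, here is a minimal sketch of diffing two `pip freeze` outputs. The function names `parse_freeze` and `diff_freeze` are hypothetical helpers written for this illustration, not part of pip or Dagster, and the sample input is a small excerpt of the lists above.

```python
from __future__ import annotations


def parse_freeze(text: str) -> dict[str, str]:
    """Parse `pip freeze` output (name==version lines) into {name: version}."""
    packages: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a plain pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_freeze(old: str, new: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {package: (old_version, new_version)} for every package that differs.

    A version of None means the package is absent from that environment.
    """
    a, b = parse_freeze(old), parse_freeze(new)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Small excerpt of the two environments listed above.
    old_env = "boto3==1.33.12\ngrpcio==1.60.0\ngrpcio-status==1.60.0\ns3transfer==0.8.2\n"
    new_env = "boto3==1.34.0\ngrpcio==1.60.0\ns3transfer==0.9.0\n"
    for name, (before, after) in diff_freeze(old_env, new_env).items():
        print(f"{name}: {before} -> {after}")
```

Packages that appear in only one environment show up with `None` on the other side, which makes removed or newly added transitive dependencies easy to spot.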

alangenfeld · 2024-01-03T20:19:04Z

Thanks for following up, not much interesting in the dependency changes.

I spent some time with memray looking for leaks and have so far not been able to turn anything up.

Do you have anything like automated recurring queries against the webserver?

aaaaahaaaaa · 2024-01-04T09:01:53Z

> Do you have anything like automated recurring queries against the webserver?

Well only the readinessProbe from your chart.

Turns out we actually still observe the same behaviour after rolling back to 1.5.12. So it's not related to the new version. I'm puzzled now. I'll try to investigate further and close the issue.

alangenfeld · 2024-01-04T15:37:14Z

I've had luck using https://github.com/facebookarchive/memory-analyzer to get a memory profile of a running process, and https://github.com/kmaork/madbg for interactive poking around in the active process. I believe both need the SYS_PTRACE capability granted in the k8s pod spec.

Given it's a webserver, it's also susceptible to the "type 3" leaks described at https://blog.nelhage.com/post/three-kinds-of-leaks/ (Python allocator arena fragmentation), but the very smooth gradient of your graphs makes me skeptical that's the cause without some sort of recurring large query causing the fragmentation.
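
The tools named above attach to a running process from outside. As a different, standard-library-only technique (not what was used in this thread), `tracemalloc` can compare two snapshots taken inside the process and list the source lines whose allocations grew. This is a minimal sketch; the helper name `top_growth` is hypothetical, and `tracemalloc` only sees memory allocated through Python's allocators, so growth inside native libraries would not appear.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return up to `limit` StatisticDiff entries whose allocated size grew."""
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Stand-in for a leak: objects that are kept alive between snapshots.
    held = [bytes(1024) for _ in range(1000)]

    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after):
        print(stat)
    tracemalloc.stop()
```

In a long-running service the two snapshots would be taken some time apart rather than back to back, so that steady growth stands out from startup allocations.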

jvyoralek · 2024-01-19T08:29:50Z

@aaaaahaaaaa did you find any reason why memory started growing? We have a similar issue and switching between versions didn't help yet - tried from 1.5.14 to 1.5.12.

The memory increase is quite noticeable, showing up even in daily granularity.

This issue seems to be isolated to the webserver component. Both the daemon and code servers are exhibiting stable memory usage. We are operating these as three separate containers within AWS ECS.

We have only one scheduled job active, and no sensors or auto-materialization so far. Assets are loaded from dbt.

aaaaahaaaaa · 2024-01-19T08:35:28Z

@jvyoralek No, I didn't find the source of the problem, and the issue is still occurring for us as well. Unfortunately, I didn't have time to investigate further. I think there's clearly something up with the workload; we're not doing anything special either, aside from deploying the helm chart.

salazarm · 2024-01-19T15:16:54Z

@alangenfeld found a memory leak that could be the cause of this. I'll let him comment, but here is the PR that attempts to fix it: #19298

alangenfeld · 2024-01-19T15:30:18Z

#19298 is a fix for a problem that manifests as very rapid unbounded memory growth resulting in process termination. I don't believe it's related to this slower memory growth.

noam-jacobson · 2024-01-25T15:27:08Z

I appear to have a similar problem after upgrading to 1.6. I run Dagster on AWS ECS using Fargate, so I don't believe my jobs are causing it, since the code runs on a separate task. Memory usage of both the daemon and the Dagit/webserver services is slowly creeping up. The drops in the following chart are due to restarts. Before the upgrade to 1.6 on the 11th, this problem didn't exist.

alangenfeld · 2024-01-25T16:41:41Z

@noam-jacobson what version were you upgrading from?

noam-jacobson · 2024-01-25T17:13:14Z

> @noam-jacobson what version were you upgrading from?

I was on version 1.5.10

jackwillisupside · 2024-01-30T15:48:54Z

@noam-jacobson We're having the same issue on ECS/Fargate on 1.5.7

will-regal-voice · 2024-01-30T16:34:19Z

We are also having the same issue on 1.6.0, also ECS/Fargate

gasgallo · 2024-02-02T07:41:11Z

Same here in our k8s deployment cluster. Any clue?

jackwillisupside · 2024-02-08T19:37:29Z

We think we might? have solved it on our end -- we didn't have a strict retention policy on logs set in our dagster.yml, and once we set it to the values below, our memory stopped growing:

retention:
  schedule:
    purge_after_days: 90 # sets retention policy for schedule ticks of all types
  sensor:
    purge_after_days:
      skipped: 7
      failure: 90
      success: 365

gasgallo · 2024-02-16T08:58:49Z

> We think we might? have solved it on our end -- we didn't have a strict retention policy on logs set in our dagster.yml and once we set it to below our memory stopped growing:
>
> retention:
>   schedule:
>     purge_after_days: 90 # sets retention policy for schedule ticks of all types
>   sensor:
>     purge_after_days:
>       skipped: 7
>       failure: 90
>       success: 365

How did that impact your memory usage? Technically you'll still retain ticks for up to 365 days, thus you should not see a change in behavior in just a few days. Or did I miss something?

I've applied a similar setting on my deployment as well (way stricter than yours, for testing) and my memory is still going up, same as before.

alexknorr · 2024-02-23T19:21:45Z

Same problem here on OpenShift with nearly the same packages (dagster 1.6.5), also PostgreSQL and slim-buster images for both the daemon and dagster-webserver (separate pods).
Tried Python 3.10 and 3.11, and SQLAlchemy both <2.0 and >2.0; no luck so far, it crashes every 3-4 days.
Currently trying Python 3.12, dagster 1.6.6 and slim-bookworm; will know more in the next few days...

stasharrofi · 2024-02-28T19:53:34Z

EDIT: We found out that the following is actually not working. The initial indication might have just been a fluke.

We were having this issue and I believe that we have found the root cause to be a bug in anyio which leaked processes. The bug was introduced in 4.1.0 and fixed in 4.3.0 (last week): agronholm/anyio#669

Dagster has a dependency on anyio through the following chain: dagit --> dagster-webserver --> starlette --> anyio and I believe that this issue started to appear for people whenever they rebuilt their Dagster image during the time that bug was present because a newer but buggy version of anyio would have been included in their docker image.

~~So, the solution could be to either explicitly require anyio >= 4.3.0 or to wait until people rebuild their docker images and automatically get the bug-fixed version.~~

jvyoralek · 2024-03-01T13:17:58Z

Has anyone had success with the solution recommended by @stasharrofi ?

We have made changes, but it appears that the memory usage is still increasing.

I see anyio 4.3.0 in the log:

#12 1.757 Collecting dagster==1.6.6
#12 1.810   Downloading dagster-1.6.6-py3-none-any.whl (1.4 MB)
#12 1.852      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 36.1 MB/s eta 0:00:00
#12 2.037 Collecting dagster-aws==0.22.6
#12 2.042   Downloading dagster_aws-0.22.6-py3-none-any.whl (109 kB)
#12 2.048      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.8/109.8 kB 32.6 MB/s eta 0:00:00
#12 2.214 Collecting dagster-postgres==0.22.6
#12 2.219   Downloading dagster_postgres-0.22.6-py3-none-any.whl (20 kB)
#12 2.259 Collecting anyio==4.3.0
#12 2.263   Downloading anyio-4.3.0-py3-none-any.whl (85 kB)

noam-jacobson · 2024-03-01T14:14:37Z

@jvyoralek It hasn't worked for me. Deployed the newest Dagster version 1.6.6 with anyio-4.3.0.

stasharrofi · 2024-03-01T14:21:01Z

@jvyoralek : No, we found out that it's not working for us either. The initial indication that it was working was probably just a fluke.

shivonchain · 2024-04-02T15:25:27Z

Same issue here with an ECS deployment; packages and versions are included below:

dagster==1.6.10
dagster-graphql==1.6.10
dagster-webserver==1.6.10
dagster-postgres==0.22.10
dagster-docker==0.22.10

jobicarter · 2024-04-08T19:15:40Z

My team experienced this issue in an OSS ECS deployment after an upgrade from 1.5.9 -> 1.6.8. It impacted the dagit/webserver and daemon services, but not independent grpc/code location services. It presented as a slow leak that would increase memory utilization over a week or so until hitting critical thresholds / crashing the service, with 1 GB of memory allocated to each service.

We "resolved" the issue in our environments by downgrading and pinning the grpcio python package to 1.57.0.

In incremental tests we downgraded our docker image base to the image version/sha we used for our 1.5.9 deployment, reverted dagster packages from 1.6.8 back to 1.5.9, and updated python from 3.10 -> 3.11. None of these changes resolved the memory leak.

Sharing this context as it supports the root cause being related to an unpinned package dependency, and not necessarily an issue with the core dagster packages. It also rules out an interaction with OS libs/OS version as the cause of the leak.

We selected grpcio 1.57.0 because it was the version of the dep that was solved for at the time when we originally deployed 1.5.9. It's possible a more recent version would work as well.
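
Since an unpinned dependency can drift whenever an image is rebuilt, one option is to verify at startup that the installed version matches the intended pin. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper names `installed_version` and `matches_pin` are hypothetical, and `1.57.0` is simply the version reported above, not a recommendation from the Dagster maintainers.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed: str | None, pinned: str) -> bool:
    """True only when the package is installed at exactly the pinned version."""
    return installed == pinned


if __name__ == "__main__":
    found = installed_version("grpcio")
    if matches_pin(found, "1.57.0"):
        print("grpcio matches the expected pin")
    else:
        print(f"grpcio is {found!r}, expected '1.57.0'")
```

Running a check like this in the container entrypoint or in CI makes it visible when a rebuild silently resolves a different grpcio version.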

jvyoralek · 2024-04-10T09:31:17Z

Thank you, @jobicarter, for the effective workaround. We deployed it yesterday, and although it's only been a short time, we're already seeing promising changes.

Tested with these versions:

dagster==1.7.0
dagster-webserver==1.7.0
dagster-graphql==1.7.0
dagster-aws==0.23.0
dagster-postgres==0.23.0
grpcio==1.57.0

csomh · 2024-04-18T20:16:57Z

I can confirm that downgrading grpcio to 1.57.0 stops the leak.

dagster==1.5.14
dagster-aws==0.21.14
dagster-azure==0.21.14
dagster-celery==0.21.14
dagster-celery-k8s==0.21.14
dagster-gcp==0.21.14
dagster-graphql==1.5.14
dagster-k8s==0.21.14
dagster-pandas==0.21.14
dagster-pipes==1.5.14
dagster-postgres==0.21.14
dagster-webserver==1.5.14
grpcio==1.57.0
grpcio-health-checking==1.57.0

We also did try to upgrade it to 1.62.1, but that didn't seem to work.

G14rb · 2024-04-19T11:42:20Z

Thanks for the solution, I think this could be related to the dagster issue, grpc/grpc#36117

p-y-t-h-e-c · 2024-05-14T12:51:34Z

Hi all, having a similar issue with a Dagster Docker deployment on an Oracle VM. Unfortunately, downgrading grpcio to version 1.57.0 hasn't resolved the issue. Currently using the following setup for the Dagster image.

The VM seems to reach an OOM state roughly every 8 hours now.

rensoostenbachBL · 2024-08-02T09:47:44Z

We are running into the same issue on our Kubernetes cluster, having installed Dagster via the Helm chart.

Is the solution to downgrade grpcio for the dagster-webserver pod? In that case, we should build a custom Dockerfile that changes the dependencies and point to that Dockerfile in the Helm chart, right?

I don't understand why Dagster hasn't pinned the grpcio version themselves to prevent this issue from happening. It seems a little strange that they expect users to either live with the memory leak or manually fix the dependencies themselves.

JanEgner · 2024-09-03T18:36:04Z

Just to add my 2 cents: running dagster 1.7.16/dbt/dagster-webserver all in one k8s pod.

I admit that it is somewhat inconclusive since some memory increase (but also a kind of garbage collection releasing much of the extra memory at a point) was visible before the last restart while using grpcio 1.57.0. Still, overall it looks way better than with grpcio 1.60.

It seems to be a workaround for now, but with at least two drawbacks (other than using an outdated component at all):

- grpcio 1.57.0 does not support Python 3.12
- grpcio 1.57.0 has at least one known vulnerability (CVE-2024-7246) that might or might not affect you, depending on your setup.

bolinzzz · 2024-10-14T10:51:12Z

We started noticing memory leaks in certain code locations after upgrading to Dagster 1.8. Could grpcio potentially be contributing to these leaks?

We're still investigating, but I’d like to rule out this possibility.

aaaaahaaaaa added the `type: bug` label Jan 3, 2024

aaaaahaaaaa changed the title ~~dagster-webserver 1.5.13 memory leak~~ dagster-webserver memory leak Feb 9, 2024

garethbrickman mentioned this issue Apr 19, 2024

Possible memory leak on very long running dagster-webserver/Daemon containers #21307

Closed

louis-jaris mentioned this issue May 8, 2024

Memory leaks occur when the grpc client connection fails grpc/grpc#36117

Closed
