Improve memory profiling for users of Portable Beam Python #20298

Open
damccorm opened this issue Jun 4, 2022 · 4 comments

damccorm commented Jun 4, 2022

There are several tools one can use to investigate memory leaks in Beam Python, but they are not straightforward to use, especially for people who don't work on Beam.

  • If you set the --experiments=enable_heap_dump option, heap dumps are appended to the SDK status responses, which the SDK can provide to the runner. Dataflow workers serve the SDK status page on localhost:8081/sdk_status, which can be queried via: gcloud compute ssh --zone "xx-somezone-z" "some-dataflow-gce-worker-01300848-wqox-harness-bvf7" --project "some-project-id" --command "curl localhost:8081/sdk_status".

  • The per-workitem heap profiling options

    parser.add_argument(
        '--profile_memory',
        action='store_true',
        help='Enable work item heap profiling.')
    parser.add_argument(
        '--profile_location',
        default=None,
        help='path for saving profiler data.')
    parser.add_argument(
        '--profile_sample_rate',
        type=float,
        default=1.0,
        help='A number between 0 and 1 indicating the ratio '
        'of bundles that should be profiled.')
    could be used to inspect the objects that are left in the heap after a bundle execution.
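
To make the three options above concrete, here is a minimal standalone sketch. It copies the argparse definitions quoted above into a plain parser (it does not import Beam) and parses a sample command line. The bucket path and the `should_profile_bundle` helper are hypothetical: the helper only illustrates one plausible reading of "ratio of bundles that should be profiled" and is not taken from Beam's implementation.

```python
import argparse
import random


def build_profiling_parser() -> argparse.ArgumentParser:
    """Standalone parser mirroring the three options quoted in the issue."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--profile_memory',
        action='store_true',
        help='Enable work item heap profiling.')
    parser.add_argument(
        '--profile_location',
        default=None,
        help='path for saving profiler data.')
    parser.add_argument(
        '--profile_sample_rate',
        type=float,
        default=1.0,
        help='A number between 0 and 1 indicating the ratio '
        'of bundles that should be profiled.')
    return parser


def should_profile_bundle(sample_rate: float, rng: random.Random) -> bool:
    """Hypothetical helper: pick a bundle with probability `sample_rate`.

    This is an assumption about how a sample rate could be applied,
    not a description of what Beam does internally.
    """
    return rng.random() < sample_rate


# Hypothetical command line; the bucket path is a placeholder.
args = build_profiling_parser().parse_args([
    '--profile_memory',
    '--profile_location=gs://some-bucket/profiles',
    '--profile_sample_rate=0.25',
])

rng = random.Random(0)  # seeded only so the sketch is repeatable
profiled = sum(
    should_profile_bundle(args.profile_sample_rate, rng) for _ in range(1000))
print(args.profile_memory, args.profile_location, profiled)
```

With a sample rate below 1.0 only a fraction of bundles would pay the profiling cost, which matters because heap inspection after every bundle can be expensive.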

Attaching an off-the-shelf profiler is possible but requires instrumentation, and fetching and analyzing the profiles is not convenient. Example: #28246 (comment)

We should see whether we can instrument Beam to make profile collection easier, both for leaks in pure Python and for leaks in native code that can be caught. We should also make it easy to integrate Beam with an external profiler such as memray.

Ideally, it should also be possible to export memory profiles to a cloud profiler with as little effort from the user as possible.

