Skip to content

Conversation

@TJX2014
Copy link

@TJX2014 TJX2014 commented Dec 3, 2024

What is the purpose of the change

Fix memory lease in flink-s3-fs-hadoop

From https://issues.apache.org/jira/browse/HADOOP-13028

introduce org.apache.hadoop.fs.s3a.S3AInstrumentation, it will be created every time org.apache.hadoop.fs.s3a.S3AFileSystem#initialize

and be registerd to static field meticsystem by org.apache.hadoop.fs.s3a.S3AInstrumentation#registerAsMetricsSource,but no one call
metricsSystem.unregisterSource(metricsSourceName); during runtime, it will cause memory lease when use filesystem plugin to write s3 storage

Brief change log

It is hard to downgrade hadoop version to 2.8 prior
So the simple way is to
overwrite S3AInstrumentation of hadoop-aws not to register anymore

Verifying this change

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): on
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: yes

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 3, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@davidradl
Copy link
Contributor

Reviewed by Chi on 05/12/24. Asked submitter questions

* limitations under the License.
*/

package org.apache.hadoop.fs.s3a;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be a new class that is not called. Also this does not have any unit tests.
This seems to be a fixup in Hadoop - is this the appropriate repo to make this fix?
Is there an upstream repo to fix this and then for us to pick up the change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it just overwrite the S3AInstrumentation class of hadoop-aws, and the memory use I think is bad usage of flink, not to close FileSystem or unregister S3A metrics, the arch of flink not to close filesystem but only the output stream;
So the upper hadoop common seems not need to correct, there are two simple chooses for once, one is degress hadoop-aws to prior 2.8, or overwrite S3A.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flink should close() the fs and provided you have a hadoop version with HADOOP-18526 in then it'll call close() on the instrumentation which will do the unregistering.

If flink isn't closing filesystems then it is leaking a lot more than just the metrics...I'm thinking of thread pools in particular.

Copy link
Contributor

@davidradl davidradl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be a new class that is not called. Also this does not have any unit tests.
This seems to be a fixup in Hadoop - is this the appropriate repo to make this fix?
Is there an upstream repo to fix this and then for us to pick up the change.

@afedulov
Copy link
Contributor

afedulov commented Jan 7, 2025

This looks like something that needs to be fixed in upstream Hadoop. Closing unless there are further details provided why Flink has to contain this class as a fix.

@afedulov afedulov closed this Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants