
[FLINK-7521] Add config option to set the content length limit of REST server and client #4654

Closed
wants to merge 1 commit

Conversation

FangYongs
Contributor

Brief change log

Add config option to set the content length limit of REST server and client
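As a rough illustration of the proposed change, a content-length limit of this kind could be read from configuration with a default fallback. This is a stdlib-only sketch; the key name `rest.max-content-length` and the 100 MiB default are assumptions for illustration, not the actual option added by the PR:

```java
import java.util.Properties;

/** Sketch of a configurable REST content-length limit (key and default are hypothetical). */
public class RestContentLengthConfig {
    // Hypothetical option key and default; the real name/default would be decided in the PR.
    static final String KEY = "rest.max-content-length";
    static final int DEFAULT_MAX_CONTENT_LENGTH = 100 * 1024 * 1024; // 100 MiB

    /** Returns the configured limit, or the default if the key is unset. */
    static int maxContentLength(Properties conf) {
        String value = conf.getProperty(KEY);
        return value == null ? DEFAULT_MAX_CONTENT_LENGTH : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(maxContentLength(conf)); // falls back to the default
        conf.setProperty(KEY, String.valueOf(10 * 1024 * 1024));
        System.out.println(maxContentLength(conf)); // uses the configured value
    }
}
```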

Verifying this change


This change adds a config option to set the content length limit; no test case is included.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@FangYongs
Contributor Author

@kl0u I have tried to fix https://issues.apache.org/jira/browse/FLINK-7521 in this PR. Could you please take a look when you're free? Thanks!

@kl0u
Contributor

kl0u commented Sep 7, 2017

Hi @zjureel ! Thanks for the work!

This can work as a temporary fix, but I was thinking more of a long-term one where there is no limit.
The problems I see with such a temporary solution are:

  1. if we add this as a configuration parameter and then remove it once we have a proper fix, this may confuse users.

  2. with this fix, the aggregator is going to allocate the specified limit every time there is an element to parse, even if the element is small. This can be a waste of resources.

I am also including @zentol in the discussion as he is currently working on the REST code. What do you think @zentol ?
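kl0u's second point above can be made concrete with a back-of-the-envelope model. This sketch only illustrates the concern (the numbers and the per-request-allocation assumption are illustrative, not measurements of Netty's actual behavior):

```java
/** Toy model of the memory concern: sizing every buffer to the limit wastes memory. */
public class AggregatorMemoryModel {
    /** Memory reserved if each in-flight request allocates a buffer of the full limit. */
    static long upfrontAllocation(long limitBytes, int concurrentRequests) {
        return limitBytes * concurrentRequests;
    }

    /** Memory needed if buffers instead grow with the actual message sizes. */
    static long sizeBasedAllocation(long[] actualMessageSizes) {
        long total = 0;
        for (long s : actualMessageSizes) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024;    // a 100 MiB limit
        long[] actual = {4096, 8192, 1024}; // three small concurrent requests
        System.out.println(upfrontAllocation(limit, actual.length)); // 314572800 (300 MiB)
        System.out.println(sizeBasedAllocation(actual));             // 13312 (13 KiB)
    }
}
```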

@zentol
Contributor

zentol commented Sep 7, 2017

This doesn't solve the problem described in the JIRA. For job submission we have to send jars to the cluster, which may be several hundred MB in size; we obviously can't allocate that much memory in general.

As @kl0u said, setting this value any higher (it is already very high) will cause excessive memory consumption in the object aggregator whenever concurrent requests are in progress.

As such this isn't even a temporary fix, since it is not viable for the use-cases that are actually affected by the limit, so I'm afraid we'll have to reject this PR.

@FangYongs
Contributor Author

@kl0u @zentol Thank you for your suggestions. So would removing the `.addLast(new HttpObjectAggregator(100 * 1024 * 1024))` call be a temporary fix?

In fact, it surprises me that the client sends the job's jar file to the REST server instead of to a block file system such as HDFS. As @zentol said, the jars may be several hundred MB in size and we obviously can't allocate that much memory in general. I think it may be better for the client to upload the jars to a block file system, or we should find a way to create a streaming service in the REST server that saves the jar's input stream to disk as soon as possible instead of aggregating it in memory. What do you think? Thanks!
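The streaming idea suggested above can be sketched with the standard library alone: copy the incoming stream to disk chunk by chunk, so memory use is bounded by the buffer size rather than the jar size. This is an illustrative sketch, not the REST server's actual upload path:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: persist an upload stream to disk without aggregating it in memory. */
public class StreamingUploadSketch {
    static Path saveToDisk(InputStream in, Path target) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // bounded memory regardless of upload size
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeJar = new byte[256 * 1024]; // stand-in for a large jar upload
        Path tmp = Files.createTempFile("upload", ".jar");
        saveToDisk(new ByteArrayInputStream(fakeJar), tmp);
        System.out.println(Files.size(tmp)); // 262144
        Files.delete(tmp);
    }
}
```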

@zentol
Contributor

zentol commented Sep 11, 2017

Removing the object aggregator would break the remaining pipeline, so that option is out too.

As for the client upload, that's exactly what's being done right now. The jar is uploaded to the blob server, and only the blob key is transmitted via REST. The problem is that this requires every client to be written in Java and depend on Flink, whereas we would prefer that any client, written in any language, can communicate with the REST APIs.

@zentol
Contributor

zentol commented Sep 11, 2017

The actual solution that I'm thinking of currently is to have a modified HttpObjectAggregator implementation that spills larger messages to disk.
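The spill-to-disk idea can be sketched as follows: buffer chunks in memory until a threshold is crossed, then move the message to a temp file. This is a simplified stdlib-only model of the concept, not Netty's `HttpObjectAggregator` API or an actual Flink implementation:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Simplified model of an aggregator that spills large messages to disk. */
public class SpillingAggregator {
    private final int memoryThreshold;
    private final ByteArrayOutputStream memoryBuffer = new ByteArrayOutputStream();
    private Path spillFile;        // non-null once the message has been spilled
    private OutputStream spillOut;

    SpillingAggregator(int memoryThreshold) {
        this.memoryThreshold = memoryThreshold;
    }

    /** Appends a chunk, spilling everything buffered so far once the threshold is crossed. */
    void add(byte[] chunk) throws IOException {
        if (spillFile == null && memoryBuffer.size() + chunk.length > memoryThreshold) {
            spillFile = Files.createTempFile("aggregated", ".msg");
            spillOut = Files.newOutputStream(spillFile);
            memoryBuffer.writeTo(spillOut); // move the in-memory prefix to disk
        }
        if (spillFile != null) {
            spillOut.write(chunk);
        } else {
            memoryBuffer.write(chunk);
        }
    }

    boolean spilled() {
        return spillFile != null;
    }

    /** Returns the complete message, reading it back from disk if it was spilled. */
    byte[] finish() throws IOException {
        if (spillFile == null) {
            return memoryBuffer.toByteArray();
        }
        spillOut.close();
        byte[] all = Files.readAllBytes(spillFile);
        Files.delete(spillFile);
        return all;
    }
}
```

Small messages never touch the disk, so the common case stays as cheap as pure in-memory aggregation, while large uploads no longer require limit-sized heap buffers.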
