[ZEPPELIN-3545] save all tables to ResourcePool #3024

Closed
wants to merge 1 commit into from

@Savalek Savalek commented Jun 15, 2018

What is this PR for?

Currently, if a paragraph's output contains more than one table, only the last table is saved to ResourcePool.
It would be preferable for ResourcePool to store all of the tables.
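
To make the difference concrete, here is a small sketch in Python (not Zeppelin code). It models ResourcePool as a plain dict. The function names, the `(type, payload)` result tuples, and the indexed key scheme (`table.0`, `table.1`) are assumptions made for illustration and are not taken from this PR.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one paragraph result: (result_type, payload).
Result = Tuple[str, str]


def save_last_table(pool: Dict[str, str], results: List[Result],
                    key: str = "table") -> None:
    """Model of the current behaviour: every table is written under the
    same key, so each table overwrites the one before it."""
    for result_type, payload in results:
        if result_type == "TABLE":
            pool[key] = payload


def save_all_tables(pool: Dict[str, str], results: List[Result],
                    key: str = "table") -> None:
    """Model of the proposed behaviour: each table gets its own indexed
    key (an assumed naming scheme), so nothing is overwritten."""
    index = 0
    for result_type, payload in results:
        if result_type == "TABLE":
            pool[f"{key}.{index}"] = payload
            index += 1
```

With two tables and one text result in the output, the first function leaves one entry in the dict and the second leaves two.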

What type of PR is it?

Improvement

What is the Jira issue?

ZEPPELIN-3545

Screenshots

[Screenshot: target p]
[Screenshot: result ps]

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

zjffdu commented Jun 21, 2018

Thanks @Savalek for this contribution, but I don't think putting all tables into ResourcePool makes sense, as it would occupy a lot of memory. I plan to introduce paragraph-level properties (ZEPPELIN-3348) so that users can control whether the interpreter result is put into ResourcePool.

mebelousov commented Jun 21, 2018

@zjffdu thank you for working on improving ResourcePool.

Could you please share your vision of how it would work? For example, if a paragraph has 5 table results, how would the user define which of them are added to ResourcePool?

zjffdu commented Jun 21, 2018

I plan to introduce a paragraph property that indicates whether the result should be put into ResourcePool. I think that most of the time people don't want to save the result, so saving it into ResourcePool by default doesn't make sense. The following is what I imagine.

%spark(saveToResourcePool=true)

...
spark code
...

Regarding your scenario of multiple tables, I am not sure what the exact scenario is, but at least we could introduce more fine-grained properties to control that. It would be helpful if you shared your real scenario, so that we can see which approach is better.

@mebelousov

@zjffdu
I support adding only selected table results to ResourcePool.
Since a paragraph can have multiple results, I propose adding result-level properties.

zjffdu commented Jun 26, 2018

@mebelousov There are many options for specifying which results should be stored in ResourcePool.
For example:

%spark(saveToResourcePool=1,2,4)

Or

%spark(1.saveToResourcePool=true, 2.saveToResourcePool=true, 4.saveToResourcePool=true)

We can discuss further which approach is best. The key point here is to allow users to customize it via paragraph-level properties.
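
As a rough illustration of how the two proposed forms could be interpreted, here is a minimal Python sketch. It is not Zeppelin's parser: the function name `selected_results` is hypothetical, it only handles a property string that contains `saveToResourcePool` entries, and it assumes the result indices are 1-based, which the examples above do not state.

```python
import re
from typing import Set

PROP = "saveToResourcePool"


def selected_results(properties: str, result_count: int) -> Set[int]:
    """Return the (assumed 1-based) indices of results to store.

    Accepts the text between the parentheses of the paragraph header,
    for example "saveToResourcePool=1,2,4".
    """
    text = properties.strip()
    selected: Set[int] = set()

    # Per-result form: "1.saveToResourcePool=true, 2.saveToResourcePool=true"
    indexed = re.findall(r"(\d+)\." + PROP + r"\s*=\s*(true|false)", text)
    if indexed:
        selected = {int(i) for i, flag in indexed if flag == "true"}
    else:
        # List or boolean form: "saveToResourcePool=1,2,4" or "...=true"
        match = re.fullmatch(PROP + r"\s*=\s*(.+)", text)
        if match:
            value = match.group(1).strip()
            if value == "true":
                selected = set(range(1, result_count + 1))
            elif value != "false":
                # Raises ValueError if an entry is not an integer.
                selected = {int(part) for part in value.split(",")}

    # Ignore indices that do not correspond to an actual result.
    return {i for i in selected if 1 <= i <= result_count}
```

Under these assumptions both `saveToResourcePool=1,2,4` and the per-result form select the same three results, while a missing property selects none, which matches the suggestion that results should not be saved by default.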

@Savalek Savalek closed this Jun 27, 2018
@Savalek Savalek deleted the ZEPPELIN-3545 branch January 11, 2019 09:19