Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced resource pool serialization #782

Closed
wants to merge 8 commits into from

Conversation

fireboy1919
Copy link
Contributor

What is this PR for?

This is a step towards creating a REST endpoint from resource pools https://issues.apache.org/jira/browse/ZEPPELIN-699

It allows resources to be fully serializable so that the object that they contain can be passed around using thrift.

What type of PR is it?

Improvement

Todos

  • - Add serialization for rest endpoint

What is the Jira issue?

Partially implements this: https://issues.apache.org/jira/browse/ZEPPELIN-699

Note, that this pull request is dependent upon this one:
#781

Before that change is made, this will not compile.

How should this be tested?

There is a test class "org.apache.zeppelin.ResourceSerializerTest", but with no front-end, there isn't an easy manual way to test this.

However, these steps will work:

  1. Start Zeppelin
  2. Run a paragraph.
  3. Inspect "org.apache.zeppelin.resource.ResourcePoolUtils.getAllResources()". You will see that this paragraph's results are included in this resource pool.

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No.

@fireboy1919 fireboy1919 changed the title Persist pool Enhanced resource pool serialization Mar 16, 2016
@fireboy1919
Copy link
Contributor Author

@Leemoonsoo: This is ready to go, and is based upon your previous work.

@felixcheung
Copy link
Member

@fireboy1919 are the ToDos list of items completed?

@fireboy1919
Copy link
Contributor Author

I was going to make a different request for each of those. I have one ready for the second item, too, but i figured that smaller was better.

Should I change the PR or the description of it?

@felixcheung
Copy link
Member

It might help to break this up into smaller PR to help review and getting it merged.

@fireboy1919
Copy link
Contributor Author

That's why I only have the first TODO done: it's the smallest case I could think of.

It's only the serializer and the place where it gets used and nothing else.

@fireboy1919
Copy link
Contributor Author

I have finished work on the other three tasks on the list that I mentioned above - which are not included to make the request easier to review.

I have also removed them so that we can see that this pull request is completely done.

@felixcheung
Copy link
Member

looks good, @Leemoonsoo what do you think?

@Leemoonsoo
Copy link
Member

ResourcePool supposed to allow any resource. From not serializable object such as SparkContext, to very large object. Object r field in class Resource holds user object.

This PR makes Object r serialized when class Resource is serialized. I think it will brings following problems,

  • DistributedResourcePool.getAll() method returns list of Resource of all remote interpreters. And this change will serialize all user object pointed by Object r, regardless of user want to access or not.
  • It'll make some error on DistributedResourcePool.getAll() when user put some unserializable object.

Current implementation supposed to read user object from remote process only when Resource.get() method is called explicitly. please see https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/resource/DistributedResourcePool.java#L72

I think field Object r in the class Resource remains as transitive and shouldn't be serialized with class Resource for efficiency.

@fireboy1919
Copy link
Contributor Author

That makes sense. The issue is that "ResourcePoolUtils" currently doesn't use that enhanced method, and if you're trying to get a specific resource, that looks like the proper path to do it.

How about I create a pull request that modified "ResourcePoolUtils.getAllResourcesExcept" so that the returned resources from the remote interpreter process are actually remote resources?

@Leemoonsoo
Copy link
Member

Thanks for the understanding.

I think term 'resource', 'remote resource' are bit confusing in this conversation. if i define,

Resource - Represents a single resource where user object exists in current interpreter process Resource.java
RemoteResource - Represents a single resource where user object exists in the other interpreter process RemoteResource.java
User object - user object that Resource holds. Internally field Object r

Then, can i understand your last sentence as

Modify "ResourcePoolUtils.getAllResourcesExcept" to return set of User objects instead of set of Resources?

or

Modify "ResourcePoolUtils.getAllResourceExcept" to return set of RemoteResource instead of set of Resources?

which one is correct?

@fireboy1919
Copy link
Contributor Author

The second one.

And I have a solution that does that in this pull request now.

I didn't remove the interpreter serializer from this pull request because we may still want something like that to save resources to persistent storage. Since User objects are to be transient, we'd need something like that to explicitly make them not transient for when we want to store them.

I therefore modified the serializer to that purpose.

@fireboy1919
Copy link
Contributor Author

@Leemoonsoo: does this address everything?

This was referenced Apr 5, 2016
This was referenced Apr 18, 2016
@asfgit asfgit closed this in b11d355 May 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants