-
Notifications
You must be signed in to change notification settings - Fork 999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-2098 - Udf that given a number bulk reserves sequences. #98
Conversation
@JamesRTaylor Pull request for PHOENIX-2098 |
* field in the tuple zkquorum is the third field in the tuple | ||
*/ | ||
@Override | ||
public Long exec(Tuple input) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UDF that taken in tuple with number,sequenceName,ZkEndpoint and bulk reserves the sequence
Looks very good, @siddhimehta. Thanks for the quick turnaround. Couple of minor nits and then I'll commit it on your behalf. |
/** | ||
* UDF to Reserve a chunk of numbers for a given sequence | ||
* | ||
* @note The way this UDF is invoked we open a new connection for every tuple row. The UDF will not perform well on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not an expert at Pig or its UDF's.
Isnt there anyway to pass Connection object in UDF rather than creating it for every row? If you cant pass Connection, can you make Connection an instance variable of UDF class?
Is creating new connection only reason for its non-scalability?
@JamesRTaylor Do you think, this non-scalability might lead to bad user experience in production clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prashant would know if that's possible, I suspect, but I'm not sure (as you're definitely more of a Pig expert than me). @mravi may know too. It's not too bad from a Phoenix perspective, as we cache the underlying HConnection, so making a new connection (on the same JVM) is cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i c.. In that case, this UDF is scalable? Or It is non-scalable due to some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be alright, but might be able to be improved. As always, we'll want to perf test this and the rest of the Phoenix/Pig loader integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JamesRTaylor I corrected the nits.
@anilgupta84 I can investigate more to see if its possible to pass in the connection object to the UDF. If we make can make Connection an instance variable of UDF class how do we ensure that the connection is correctly released/closed after all the rows.
… Changing it to use the connection created on initialization code
This has already been merged. |
No description provided.