Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-2098 - Udf that given a number bulk reserves sequences. #98

Closed
wants to merge 5 commits into from
Closed

PHOENIX-2098 - Udf that given a number bulk reserves sequences. #98

wants to merge 5 commits into from

Conversation

siddhimehta
Copy link
Contributor

No description provided.

@siddhimehta
Copy link
Contributor Author

@JamesRTaylor Pull request for PHOENIX-2098

* field in the tuple zkquorum is the third field in the tuple
*/
@Override
public Long exec(Tuple input) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UDF that taken in tuple with number,sequenceName,ZkEndpoint and bulk reserves the sequence

@JamesRTaylor
Copy link
Contributor

Looks very good, @siddhimehta. Thanks for the quick turnaround. Couple of minor nits and then I'll commit it on your behalf.

/**
* UDF to Reserve a chunk of numbers for a given sequence
*
* @note The way this UDF is invoked we open a new connection for every tuple row. The UDF will not perform well on

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not an expert at Pig or its UDF's.
Isnt there anyway to pass Connection object in UDF rather than creating it for every row? If you cant pass Connection, can you make Connection an instance variable of UDF class?
Is creating new connection only reason for its non-scalability?
@JamesRTaylor Do you think, this non-scalability might lead to bad user experience in production clusters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prashant would know if that's possible, I suspect, but I'm not sure (as you're definitely more of a Pig expert than me). @mravi may know too. It's not too bad from a Phoenix perspective, as we cache the underlying HConnection, so making a new connection (on the same JVM) is cheap.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i c.. In that case, this UDF is scalable? Or It is non-scalable due to some other reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be alright, but might be able to be improved. As always, we'll want to perf test this and the rest of the Phoenix/Pig loader integration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JamesRTaylor I corrected the nits.
@anilgupta84 I can investigate more to see if its possible to pass in the connection object to the UDF. If we make can make Connection an instance variable of UDF class how do we ensure that the connection is correctly released/closed after all the rows.

@stoty
Copy link
Contributor

stoty commented Aug 1, 2023

This has already been merged.

@stoty stoty closed this Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants