Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-2520] add UDF/UDAF in BeamSql.query/simpleQuery #3491

Closed
wants to merge 2 commits into from

Conversation

mingmxu
Copy link

@mingmxu mingmxu commented Jul 3, 2017

No description provided.

@mingmxu
Copy link
Author

mingmxu commented Jul 4, 2017

R: @xumingming @lukecwik
CC: @takidau @jbonofre

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 76cce07 on XuMingmin:BEAM-2520 into ** on apache:DSL_SQL**.

return new QueryTransform(sqlQuery);

public static QueryTransform query(String sqlQuery) {
return new AutoValue_BeamSql_QueryTransform.Builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new AutoValue_BeamSql_QueryTransform.Builder() => new QueryTransform.builder()?

simpleQuery(String sqlQuery) throws Exception {
return new SimpleQueryTransform(sqlQuery);
public static SimpleQueryTransform simpleQuery(String sqlQuery) throws Exception {
return new AutoValue_BeamSql_SimpleQueryTransform.Builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new AutoValue_BeamSql_SimpleQueryTransform.Builder() => SimpleQueryTransform.builder()?

Copy link
Contributor

@xumingming xumingming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (except two small issues.)

@mingmxu
Copy link
Author

mingmxu commented Jul 5, 2017

Good point, however seems the method of builder()/toBuiler() was not accurate.

@xumingming
Copy link
Contributor

Seems different from auto-value's usage convention: https://github.com/google/auto/blob/master/value/userguide/builders.md

@xumingming
Copy link
Contributor

I quote:

public void testAnimal() {
  Animal dog = Animal.builder().setName("dog").setNumberOfLegs(4).build();
  assertEquals("dog", dog.name());
  assertEquals(4, dog.numberOfLegs());

  // You probably don't need to write assertions like these; just illustrating.
  assertTrue(
      Animal.builder().setName("dog").setNumberOfLegs(4).build().equals(dog));
  assertFalse(
      Animal.builder().setName("cat").setNumberOfLegs(4).build().equals(dog));
  assertFalse(
      Animal.builder().setName("dog").setNumberOfLegs(2).build().equals(dog));

  assertEquals("Animal{name=dog, numberOfLegs=4}", dog.toString());
}

@mingmxu
Copy link
Author

mingmxu commented Jul 5, 2017

--update withUdf/withUdaf methods;

interesting, the code of TextIO/KafkaIO is different from the example. I suppose abstract Builder toBuilder() is better here, to avoid return this in function withUdf/withUdaf.

@lukecwik any comments?

@xumingming
Copy link
Contributor

Indeed interesting :)

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 2455dc4 on XuMingmin:BEAM-2520 into ** on apache:DSL_SQL**.

@mingmxu
Copy link
Author

mingmxu commented Jul 10, 2017

@takidau any suggestion which one is prefered in AutoValue?

static Builder builder() {
  return new AutoValue_BeamSql_QueryTransform.Builder();
}

OR

abstract Builder toBuilder();

@takidau
Copy link
Contributor

takidau commented Jul 12, 2017

Re builder vs toBuilder, perhaps I'm misunderstanding the question, but they serve different purposes, don't they? I do prefer defining a static builder() method to directly invoking the generated AutoValue_BeamSql_SimpleQueryTransform.Builder() constructor (thought we seem to have both patterns in the Beam codebase currently).

return PCollectionTuple.of(new TupleTag<BeamSqlRow>(PCOLLECTION_TABLE_NAME), input)
.apply(new QueryTransform(sqlQuery, sqlEnv));
.apply(new AutoValue_BeamSql_QueryTransform.Builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be reduced to .apply(toBuilder().build()) as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this comment, since it wasn't addressed or answered. Can't this be simplified?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this line, just update.

@mingmxu
Copy link
Author

mingmxu commented Jul 12, 2017

Thanks @takidau for your advise, updated builder and rebase to latest repository.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 2284cde on XuMingmin:BEAM-2520 into ** on apache:DSL_SQL**.

/**
* register a UDF function used in this query.
*/
public QueryTransform withUdf(String functionName, Class<?> clazz, String methodName){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to support having arbitrarily named (and potentially multiple) UDF methods on a single class? Why not just have a simply BeamSqlUdf interface you must implement? And why not accept a lambda?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Is it necessary to support having arbitrarily named (and potentially multiple) UDF methods on a single class?
    [Mingmin]: I would like to keep the capacity to put similar or related methods together.

  2. Why not just have a simply BeamSqlUdf interface you must implement?
    [Mingmin]: That's how we do with UDAF, UDF is a little different as it may have variable, even optional parameters. Annotation (like DoFn) sounds a solution but the existing definition is more simple IMO.

  3. And why not accept a lambda?
    [Mingmin]: I remember Beam is still on JDK 7, --sorry that I may miss something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Okay
  2. We may want to revisit this when we do a holistic API review before merge to master, but fine for now then.
  3. That's true, but for Java 8 users it'd be nice to have an overload supporting a lambda. Not necessary for this PR though.

@takidau
Copy link
Contributor

takidau commented Jul 12, 2017

Looks like there are build issues, can you please fix?

Also, regarding the change in SimpleQueryTransform.expand, my question was whether toBuilder().build() could be used in place of manually constructing a copy of this. It's both shorter, and more maintainable since the AutoValue generator will keep track of making sure new fields are also copied as the AutoValue object evolves over time.

mingmxu added 2 commits July 12, 2017 13:38
update AutoValue builder()
update with methods

update Builder/build in AutoValue

rebase

simplify AutoValue
@mingmxu
Copy link
Author

mingmxu commented Jul 12, 2017

rebased to fix the compile issue.

toBuilder().build() doesn't work here, as SimpleQueryTransform.expand returns QueryTransform, they've different interfaces.

Regarding to the interface of UDF, your concern is reasonable, created BEAM-2609 for further discussion.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 2e0abdd on XuMingmin:BEAM-2520 into ** on apache:DSL_SQL**.

@takidau
Copy link
Contributor

takidau commented Jul 12, 2017

Thanks for the explanation on the toBuilder question, and for opening the Jira. Merging.

asfgit pushed a commit that referenced this pull request Jul 12, 2017
@takidau
Copy link
Contributor

takidau commented Jul 12, 2017

Merged, feel free to close.

@mingmxu mingmxu closed this Jul 12, 2017
@mingmxu mingmxu deleted the BEAM-2520 branch July 12, 2017 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants