Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-2676] move BeamSqlRow and BeamSqlRowType to sdk/java/core #3633

Closed
wants to merge 4 commits into from

Conversation

mingmxu
Copy link

@mingmxu mingmxu commented Jul 24, 2017

It contains two parts:

  1. remove SQL word in the name,
    BeamSqlRow --> BeamRow
    BeamSqlRowType --> BeamRowType

  2. move from package org.apache.beam.dsls.sql.schema to org.apache.beam.sdk.sd (sd stands for structure data), in module beam-sdks-java-core

Hint:
The 4 files are changed to remove dependencies on Calcite/BeamSql, others are updated automate by IDE due to class name change:

@mingmxu
Copy link
Author

mingmxu commented Jul 24, 2017

R: @robertwb

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Just a couple of questions.

* <p>Similar as the <em>row</em> concept in database, {@link org.apache.beam.sdk.sd.BeamRow}
* represents one row element in a {@link org.apache.beam.sdk.values.PCollection<BeamRow>}.
* Limited SQL types are supported now, visit
* <a href="https://beam.apache.org/blog/2017/07/21/sql-dsl.html#data-type">data types</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this blog post is pending? Will the date embedded in the URL change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL will not change as it's determined as below:

+---
+layout: post
+title:  "Use Beam SQL DSL to build a pipeline"
+date:   2017-07-21 00:00:00 -0800
+excerpt_separator: <!--more-->
+categories: blog
+authors:
+  - mingmxu
+---

* <a href="https://beam.apache.org/blog/2017/07/21/sql-dsl.html#data-type">data types</a>
* for more details.
*/
package org.apache.beam.sdk.sd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume "sd" stands for "structured data" or similar?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sd stands for structure data

public abstract List<String> getFieldsName();
public abstract List<Integer> getFieldsType();

public static BeamSqlRowType create(List<String> fieldNames, List<Integer> fieldTypes) {
return new AutoValue_BeamSqlRowType(fieldNames, fieldTypes);
public static BeamRowType create(List<String> fieldNames, List<Integer> fieldTypes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface (taking a list of names, then a parallel list of types) seems fragile and prone to off-by-one errors. If columns are always referred to by name (not index) in the API, could this just be a Map? Otherwise, some kind of an OrderedMap (or list of pairs)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, it doesn't verify that size of names and types matches. I'll open another task to address it. --A line like fieldNames.size() == fieldTypes.size() is enough I think.

--Columns in BeamRow are accessed both by name and index.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create BEAM-2678

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would help, but it's still messy to have the relationship fall to simply having the same index in a parallel list. I'll comment on the bug, as this is somewhat orthogonal to this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, a List<KV<FieldName, Types>> may be better.


import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this as safe dependency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, java.sql.Types is part of JDK.

@mingmxu
Copy link
Author

mingmxu commented Jul 25, 2017

retest this please

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending tests passing.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling c33150d on XuMingmin:BEAM-2676 into ** on apache:DSL_SQL**.

@mingmxu
Copy link
Author

mingmxu commented Jul 26, 2017

@robertwb could you merge this task?

Also #3628 as I want to use CombineFnTester to test an aggregation function in Beam SQL.

@mingmxu
Copy link
Author

mingmxu commented Jul 26, 2017

R: + @lukecwik

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 5e65d6f on XuMingmin:BEAM-2676 into ** on apache:DSL_SQL**.

@mingmxu
Copy link
Author

mingmxu commented Jul 27, 2017

move back to DSL/SQL as there're some concerns that this is SQL-specific, not a generic Tuple solution.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling b56234d on XuMingmin:BEAM-2676 into ** on apache:DSL_SQL**.

@mingmxu
Copy link
Author

mingmxu commented Jul 27, 2017

retest this please

1 similar comment
@mingmxu
Copy link
Author

mingmxu commented Jul 28, 2017

retest this please

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling b56234d on XuMingmin:BEAM-2676 into ** on apache:DSL_SQL**.

@robertwb
Copy link
Contributor

Could you point me to where these concerns were discussed?

@mingmxu
Copy link
Author

mingmxu commented Jul 31, 2017 via email

@mingmxu
Copy link
Author

mingmxu commented Aug 2, 2017

close this task, and prepare a new PR to avoid the huge rebase work.

@mingmxu mingmxu closed this Aug 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants