-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-2676] move BeamSqlRow and BeamSqlRowType to sdk/java/core #3633
Conversation
R: @robertwb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Just a couple of questions.
* <p>Similar as the <em>row</em> concept in database, {@link org.apache.beam.sdk.sd.BeamRow} | ||
* represents one row element in a {@link org.apache.beam.sdk.values.PCollection<BeamRow>}. | ||
* Limited SQL types are supported now, visit | ||
* <a href="https://beam.apache.org/blog/2017/07/21/sql-dsl.html#data-type">data types</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this blog post is pending? Will the date embedded in the URL change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL will not change as it's determined as below:
+---
+layout: post
+title: "Use Beam SQL DSL to build a pipeline"
+date: 2017-07-21 00:00:00 -0800
+excerpt_separator: <!--more-->
+categories: blog
+authors:
+ - mingmxu
+---
* <a href="https://beam.apache.org/blog/2017/07/21/sql-dsl.html#data-type">data types</a> | ||
* for more details. | ||
*/ | ||
package org.apache.beam.sdk.sd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume "sd" stands for "structured data" or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sd
stands for structure data
public abstract List<String> getFieldsName(); | ||
public abstract List<Integer> getFieldsType(); | ||
|
||
public static BeamSqlRowType create(List<String> fieldNames, List<Integer> fieldTypes) { | ||
return new AutoValue_BeamSqlRowType(fieldNames, fieldTypes); | ||
public static BeamRowType create(List<String> fieldNames, List<Integer> fieldTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface (taking a list of names, then a parallel list of types) seems fragile and prone to off-by-one errors. If columns are always referred to by name (not index) in the API, could this just be a Map? Otherwise, some kind of an OrderedMap (or list of pairs)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, it doesn't verify that size of names and types matches. I'll open another task to address it. --A line like fieldNames.size() == fieldTypes.size()
is enough I think.
--Columns in BeamRow are accessed both by name and index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create BEAM-2678
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would help, but it's still messy to have the relationship fall to simply having the same index in a parallel list. I'll comment on the bug, as this is somewhat orthogonal to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, a List<KV<FieldName, Types>>
may be better.
|
||
import java.io.IOException; | ||
import java.io.InputStream; | ||
import java.io.OutputStream; | ||
import java.sql.Types; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this as safe dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, java.sql.Types
is part of JDK.
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pending tests passing.
R: + @lukecwik |
move back to DSL/SQL as there're some concerns that this is SQL-specific, not a generic Tuple solution. |
retest this please |
1 similar comment
retest this please |
Could you point me to where these concerns were discussed? |
You can find the comments in JIRA ticket.
…Sent from my iPhone
On Jul 31, 2017, at 11:13 AM, Robert Bradshaw ***@***.***> wrote:
Could you point me to where these concerns were discussed?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
close this task, and prepare a new PR to avoid the huge rebase work. |
It contains two parts:
remove SQL word in the name,
BeamSqlRow
-->BeamRow
BeamSqlRowType
-->BeamRowType
move from package
org.apache.beam.dsls.sql.schema
toorg.apache.beam.sdk.sd
(sd stands for structure data), in modulebeam-sdks-java-core
Hint:
The 4 files are changed to remove dependencies on Calcite/BeamSql, others are updated automate by IDE due to class name change: