[BEAM-8203] Add support for AvroTable in SQL#9597
Conversation
|
R: @amaliujia R: @reuvenlax |
|
Run JavaPortabilityApi PreCommit |
amaliujia
left a comment
There was a problem hiding this comment.
Overall looks good! Left some comments.
|
|
||
| /** A {@link PTransform} to convert {@link GenericRecord} to {@link Row}. */ | ||
| @AutoValue | ||
| public abstract class GenericRecordReadConverter |
There was a problem hiding this comment.
Can this class be moved to AvroTable.java to make this change more compact?
There was a problem hiding this comment.
As per @reuvenlax comment this class is removed now.
|
|
||
| /** A {@link PTransform} to convert {@link Row} to {@link GenericRecord}. */ | ||
| @AutoValue | ||
| public abstract class GenericRecordWriteConverter |
There was a problem hiding this comment.
Can this write converter be moved as well?
There was a problem hiding this comment.
In the package org.apache.beam.sdk.extensions.sql.meta.provider.parquet there is also a ReadConverter class but these are not specific to Parquet.
I was thinking of a separate PR to move these converter classes to some other package. Can you let me know, what is the best place to move these classes ?
| .apply("GenericRecordToRow", writeConverter) | ||
| .apply( | ||
| "AvroIOWrite", | ||
| AvroIO.writeGenericRecords(AvroUtils.toAvroSchema(schema, tableName, null)) |
There was a problem hiding this comment.
I am less familiar with Avro code: why we need use tableName as parameter?
There was a problem hiding this comment.
AvroUtils.toAvroSchema() method provides an avroSchema from the BeamSchema. But the avroSchema geenrated does not have the name of the avroSchema.
Now when I use this avroSchema (without the name) to write/read records it fails with error
SchemaParseException: No name in schema: {"type":"record","fields": ... }
So I had to provide the name for the schema, I have provided the same using the tableName.
|
|
||
| /** {@link AvroTable} is a {@link org.apache.beam.sdk.extensions.sql.BeamSqlTable}. */ | ||
| public class AvroTable extends BaseBeamTable implements Serializable { | ||
| private final String filePattern; |
There was a problem hiding this comment.
There is a concern to keep only one file pattern and read/write uses different patterns (a folder for read and a file for write). It limits this table to a read-only table or write-only table.
The better way might be two parameters for read and write separately. A not-set parameter will fail a read or write operation.
However I can see where this pattern comes from: text table has already used it. So it's up to you if you want to change it in this PR.
There was a problem hiding this comment.
I had followed the TextTableProvider approach over here. May be if needed a seperate PR can be used to handle this read/write using different patterns.
...sions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/avro/AvroTable.java
Show resolved
Hide resolved
e0402fc to
94a8b26
Compare
|
Run JavaPortabilityApi PreCommit |
Adding support Avro Table Provider.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.