PECO-1054 Expose Arrow batches to users, part three #166

Merged: 6 commits merged into databricks:main on Sep 29, 2023

rcypher-databricks

Added the DBSQLRows and DBSQLArrowBatchIterator public interfaces.
Added arrowRecordIterator, which implements DBSQLArrowBatchIterator.
Moved closing of the database operation from the rows type into resultPageIterator, along with the properties that are used only by resultPageIterator.
Added a GetArrowBatches function to the rows and arrowRowScanner types.
Added a HasNext function to the BatchIterator and SparkArrowBatch interfaces.
Added an example for accessing Arrow batches and updated doc.go.
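
The HasNext/Next iteration pattern listed above can be sketched with the standard library alone. Everything in this sketch is an assumption made for illustration: `recordBatch`, `batchIterator`, `sliceIterator`, and `countRows` are hypothetical names, and the method set (including returning `io.EOF` once the iterator is exhausted) is a guess at the general shape, not the signatures this PR introduces.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// recordBatch is a hypothetical stand-in for an Arrow record batch.
type recordBatch struct {
	numRows int
}

// batchIterator is an assumed shape for a batch iterator with HasNext.
type batchIterator interface {
	HasNext() bool
	Next() (recordBatch, error)
	Close()
}

// sliceIterator is an in-memory implementation used only for illustration.
type sliceIterator struct {
	batches []recordBatch
	pos     int
	closed  bool
}

// HasNext reports whether another batch can be read.
func (it *sliceIterator) HasNext() bool {
	return !it.closed && it.pos < len(it.batches)
}

// Next returns the next batch, or io.EOF when none are left.
func (it *sliceIterator) Next() (recordBatch, error) {
	if !it.HasNext() {
		return recordBatch{}, io.EOF
	}
	b := it.batches[it.pos]
	it.pos++
	return b, nil
}

// Close marks the iterator as finished; later calls to Next return io.EOF.
func (it *sliceIterator) Close() { it.closed = true }

// countRows drains the iterator, sums the row counts, and closes it.
func countRows(it batchIterator) (int, error) {
	defer it.Close()
	total := 0
	for it.HasNext() {
		b, err := it.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return total, err
		}
		total += b.numRows
	}
	return total, nil
}

func main() {
	it := &sliceIterator{batches: []recordBatch{{numRows: 2}, {numRows: 3}}}
	total, err := countRows(it)
	fmt.Println(total, err) // prints: 5 <nil>
}
```

Checking HasNext before each Next keeps the loop free of sentinel-error handling in the common case, while the `io.EOF` check still guards against an iterator that runs dry between the two calls.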

doc.go Outdated
conn, _ := db.Conn(context.Background())
defer conn.Close()

query := `select * from hive_metastore.main.taxi_trip_data`

Change this to Unity Catalog (UC): main.default.<>

ctx2, cancel2 := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel2()

batches, err := rows.(dbsqlrows.DBSQLRows).GetArrowBatches(ctx2)

Is DBSQL something that we actually expose externally as an acronym? Are there any alternatives to this?

andrefurlan-db left a comment

Just a couple of annoying comments to make sure we correctly name the things that we expose as interfaces. The rest looks awesome!

Added the DBSQLRows and DBSQLArrowBatchIterator public interfaces.
Added arrowRecordIterator, which implements DBSQLArrowBatchIterator.
Moved closing of the database operation from the rows type into resultPageIterator, along with the properties that are used only by resultPageIterator.
Added a GetArrowBatches function to the rows and arrowRowScanner types.
Added a HasNext function to the BatchIterator and SparkArrowBatch interfaces.
Added an example for accessing Arrow batches and updated doc.go.

Signed-off-by: Raymond Cypher <raymond.cypher@databricks.com>
Renamed DBSQLRows and DBSQLArrowBatchIterator to Rows and ArrowBatchIterator by dropping the DBSQL prefix.
Updated example to use UC

Signed-off-by: Raymond Cypher <raymond.cypher@databricks.com>
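
The doc.go snippet under review reaches the Arrow batches through an unchecked type assertion on the rows value, which panics if the value does not implement the interface. A minimal sketch of the checked (comma-ok) form is shown here. `arrowRows`, `fakeRows`, `arrowBatches`, `demo`, and `errNoArrow` are hypothetical stand-ins invented for this sketch, and the method returns a plain string slice rather than a batch iterator to keep it short; none of this is the driver's actual API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// arrowRows is a hypothetical stand-in for a public rows interface
// that exposes Arrow batches.
type arrowRows interface {
	GetArrowBatches(ctx context.Context) ([]string, error)
}

// fakeRows pretends to be a result set that supports Arrow batches.
type fakeRows struct{}

// GetArrowBatches returns two placeholder batches unless ctx is done.
func (fakeRows) GetArrowBatches(ctx context.Context) ([]string, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return []string{"batch-0", "batch-1"}, nil
}

var errNoArrow = errors.New("rows value does not expose Arrow batches")

// arrowBatches uses the comma-ok form of the type assertion so that an
// unexpected rows implementation yields an error instead of a panic.
func arrowBatches(ctx context.Context, rows interface{}) ([]string, error) {
	ar, ok := rows.(arrowRows)
	if !ok {
		return nil, errNoArrow
	}
	return ar.GetArrowBatches(ctx)
}

// demo runs one supported and one unsupported case.
func demo() (n int, okErr, badErr error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	batches, okErr := arrowBatches(ctx, fakeRows{})
	_, badErr = arrowBatches(ctx, "not a rows value")
	return len(batches), okErr, badErr
}

func main() {
	n, okErr, badErr := demo()
	fmt.Println(n, okErr)
	fmt.Println(badErr)
}
```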

andrefurlan-db left a comment

Good job.

rcypher-databricks merged commit 6bb1879 into databricks:main on Sep 29, 2023
3 checks passed