
[Python] Start of DuckDB Spark API #8083


Merged on Jul 6, 2023 (25 commits)


@Tishj (Contributor) commented Jun 26, 2023

This PR introduces the start of the Spark DuckDB API. The API aims to be 1-to-1 compatible with PySpark, but internally it uses DuckDB.

Namely this PR introduces the following core classes:

SparkSession

This class wraps a DuckDBPyConnection.
Methods that are already implemented:

  • sql
  • builder (and its relevant methods)
  • table

DataFrame

This class wraps a DuckDBPyRelation.
Methods that are already implemented:

  • show
  • write.saveAsTable

Catalog

Part of the SparkSession. It also wraps a DuckDBPyConnection, but exposes methods for querying the catalog of the database.
Methods that are already implemented:

  • listDatabases
  • listTables
  • listColumns

Different levels of not implemented

Even the code for our Python client is written in C++. For this API we chose to write it in Python instead, to make it easier for users of the API to contribute to it.

Some methods are planned but not yet implemented, while many others are not explicitly planned yet.

To differentiate between the two kinds of "not implemented", we introduce a ContributionsAcceptedError. It inherits from NotImplementedError, so both cases can be handled generically, but semantically it communicates that the core team does not plan to implement the method that raised it.
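
A minimal sketch of how the two kinds of "not implemented" could be told apart in Python. Only the ContributionsAcceptedError name and its NotImplementedError base come from this PR; `ExampleAPI`, its method names, and `classify` are hypothetical and only illustrate the idea:

```python
class ContributionsAcceptedError(NotImplementedError):
    """Raised by methods the core team does not plan to implement.

    Subclassing NotImplementedError keeps generic handling working.
    """


class ExampleAPI:
    # Hypothetical class; the method names are illustrative, not from the PR.
    def planned_method(self):
        # Planned by the core team, but not implemented yet.
        raise NotImplementedError("planned_method is not implemented yet")

    def unplanned_method(self):
        # Not planned by the core team; open for contributions.
        raise ContributionsAcceptedError("unplanned_method: contributions accepted")


def classify(call):
    """Return which kind of 'not implemented' a callable raises, if any."""
    try:
        call()
    except ContributionsAcceptedError:
        # The subclass must be caught first, otherwise the generic
        # NotImplementedError handler below would swallow it.
        return "contributions accepted"
    except NotImplementedError:
        return "planned"
    return "implemented"


api = ExampleAPI()
print(classify(api.planned_method))
print(classify(api.unplanned_method))
```

Code that does not care about the distinction can still catch NotImplementedError alone, since it covers both cases.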

@github-actions github-actions bot marked this pull request as draft June 26, 2023 15:27
@Tishj Tishj marked this pull request as ready for review June 27, 2023 14:58
@github-actions github-actions bot marked this pull request as draft June 29, 2023 08:48
@Tishj Tishj marked this pull request as ready for review June 29, 2023 09:58
@Tishj Tishj requested a review from Mause June 30, 2023 10:25
@Tishj Tishj marked this pull request as draft July 4, 2023 07:32
@Tishj Tishj marked this pull request as ready for review July 4, 2023 07:32
@Mytherin Mytherin changed the base branch from feature to master July 4, 2023 13:17
@Mytherin Mytherin merged commit 33938a5 into duckdb:master Jul 6, 2023
@Mytherin (Collaborator) commented Jul 6, 2023

Thanks! Looks great
