Add local dataframe to delta support #1450

Closed
9 tasks
dimberman opened this issue Dec 16, 2022 · 0 comments · Fixed by #1397
Labels
feature New feature or request

Comments

@dimberman
Collaborator

Please describe the feature you'd like to see

I'd like to see a feature that allows users to take a local dataframe and send it to a Databricks Delta table. This would enable us to more easily and efficiently load data into Databricks for processing and analysis.

Describe the solution you'd like

A function or method that takes a local dataframe and a connection string for a Databricks Delta table as input, and loads the data from the dataframe into the table. It would be helpful if the function also had options for specifying the load behavior (e.g. append vs. overwrite).

Here's an example of what the function signature might look like:

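import pandas as pd
from astro import sql as aql
from astro.table import Table

@aql.dataframe()
def df_func() -> pd.DataFrame:
    # Placeholder data standing in for the user's local dataframe.
    return pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# `dag` is assumed to be an Airflow DAG object defined elsewhere.
with dag:
    # output_table points at a Delta table through the Databricks connection.
    df_func(output_table=Table(conn_id="my_delta_conn"))
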
Are there any alternatives to this feature?
One alternative would be to use the Databricks API to load data into a Delta table. This would require users to manually construct the API request and handle any errors that might occur, whereas the proposed function would handle these details internally.

Additional context
This feature will be released as part of the 0.1 release so users can start testing basic functionality.

Acceptance Criteria

  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an Example DAG showing how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How-to guide for the feature (example)
@dimberman dimberman added the feature New feature or request label Dec 16, 2022
@dimberman dimberman added this to the Databricks Support V 0.1 milestone Dec 16, 2022