
Integrate append flow API #50

Open
ravi-databricks opened this issue Apr 25, 2024 · 1 comment

@ravi-databricks
Contributor

Integrate the append_flow API for the following use cases:

  1. One-time backfill
  2. Multiple Kafka topics writing to the same target

API DOCS Ref
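
For reference, dlt.append_flow is a decorator that registers an additional flow writing into an existing streaming table. A minimal sketch of the multi-topic use case (table, topic, and server names are illustrative; spark is the ambient session inside a DLT pipeline):

    import dlt

    dlt.create_streaming_table("customer")

    # Two Kafka topics appending into the same bronze target
    @dlt.append_flow(target="customer", name="customer_topic_a_flow")
    def customer_topic_a():
        return (
            spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "host1:9092")  # illustrative
            .option("subscribe", "customer_topic_a")
            .load()
        )

    @dlt.append_flow(target="customer", name="customer_topic_b_flow")
    def customer_topic_b():
        return (
            spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "host1:9092")
            .option("subscribe", "customer_topic_b")
            .load()
        )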

@ravi-databricks ravi-databricks added the enhancement New feature or request label Apr 25, 2024
@ravi-databricks ravi-databricks self-assigned this Apr 25, 2024
@ravi-databricks
Contributor Author

  • Introduced bronze_append_flows and silver_append_flows inside the onboarding file, with the structure below:

  • e.g., if the main bronze table customer needs to ingest from several different datasets, DLT-META can launch multiple flows under bronze_append_flows:

 "bronze_append_flows": [
      {
            "name": "customer_bronze_flow",
            "create_streaming_table": false,
            "source_format": "cloudFiles",
            "source_details": {
               "source_path_it": "{dbfs_path}/integration_tests/resources/data/customers_af",
               "source_schema_path": "{dbfs_path}/integration_tests/resources/customers.ddl"
            },
            "reader_options": {
               "cloudFiles.format": "json",
               "cloudFiles.inferColumnTypes": "true",
               "cloudFiles.rescuedDataColumn": "_rescued_data"
            },
            "once": false
      }
   ]
  • With the above structure, when kafka is the source_format, an append flow's source_details and reader_options can cover multiple topics, as sketched below.
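
Under that reading, a Kafka append-flow entry might look roughly like the following. The subscribe, kafka.bootstrap.servers, and startingOffsets keys are standard Spark Kafka reader options, but their exact placement within the DLT-META onboarding schema here is an assumption, and all names are illustrative:

 "bronze_append_flows": [
    {
       "name": "customer_kafka_flow",
       "create_streaming_table": false,
       "source_format": "kafka",
       "source_details": {
          "source_schema_path": "{dbfs_path}/integration_tests/resources/customers.ddl",
          "subscribe": "customer_topic_a,customer_topic_b"
       },
       "reader_options": {
          "kafka.bootstrap.servers": "host1:9092",
          "startingOffsets": "earliest"
       },
       "once": false
    }
 ]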

  • As a result of the above change, the pipeline readers need to be restructured to carry state such as source_details, source_format, reader_options, and schema_json. This ensures dlt.append_flow can be handed the corresponding callables from PipelineReaders, such as read_dlt_cloud_files, read_dlt_delta, and read_kafka (a sketch follows below).
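
A minimal sketch of how such a stateful reader could pair with dlt.append_flow; the class and method names come from the bullet above, while the constructor shape and the source_details keys are assumptions:

    import dlt

    class PipelineReaders:
        """Holds per-source state so flow callables can run without arguments."""

        def __init__(self, spark, source_format, source_details, reader_options, schema_json=None):
            self.spark = spark
            self.source_format = source_format
            self.source_details = source_details
            self.reader_options = reader_options
            self.schema_json = schema_json

        def read_dlt_cloud_files(self):
            # Auto Loader read built entirely from the captured state
            return (
                self.spark.readStream.format("cloudFiles")
                .options(**self.reader_options)
                .load(self.source_details["source_path"])  # key name is illustrative
            )

    reader = PipelineReaders(
        spark,  # ambient Spark session inside a DLT pipeline
        source_format="cloudFiles",
        source_details={"source_path": "/landing/customers_af"},  # illustrative
        reader_options={"cloudFiles.format": "json"},
    )

    # dlt.append_flow receives the bound method as its callable
    dlt.append_flow(target="customer", name="customer_bronze_flow")(reader.read_dlt_cloud_files)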

  • Incorporated additional parameters for dlt.apply_changes:

            flow_name,
            once,
            ignore_null_updates_column_list,
            ignore_null_updates_except_column_list
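
A hedged sketch of how these four parameters could be passed alongside the usual dlt.apply_changes arguments; the table, key, and column names are illustrative:

    import dlt
    from pyspark.sql.functions import col

    dlt.create_streaming_table("customer_silver")

    dlt.apply_changes(
        target="customer_silver",
        source="customer_bronze",
        keys=["customer_id"],                       # illustrative key column
        sequence_by=col("event_ts"),                # illustrative ordering column
        flow_name="customer_silver_backfill_flow",  # names the CDC flow explicitly
        once=True,                                  # run this flow a single time (backfill)
        ignore_null_updates_column_list=["email"],  # nulls in these columns do not overwrite
    )

ignore_null_updates_except_column_list is presumably the complementary form, applying the null-ignoring behavior to every column except those listed.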

@ganeshchand @neil90 @howardwu-db

@ravi-databricks ravi-databricks added this to the v0.0.8 milestone Jul 6, 2024