Feat: Add destinations support #257

Merged: 142 commits, Jul 30, 2024

@aaronsteers (Contributor) commented May 22, 2024

[Early draft for discussion.]

Resolves: #197

This adds proper destination support to PyAirbyte, including Docker-based destinations when Docker is available. There's still a lot to do here, but it's a good first step.

Open questions:

  1. How to get, write, and store state. Maybe another state backend or re-use our existing state backend tables?
  2. How to hand off the stream from the source's STDOUT to the destination's STDIN.
  3. Windows compatibility is hard, because pipe and socket handling differ between Windows and Mac/Linux.
  4. Whether the MVP should support source-to-destination or cache-to-destination, or both.
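
For discussion of question 2, here is a minimal sketch of one way to hand a child process's STDOUT to another child's STDIN with the standard library. It is not how this PR implements the handoff; the two stand-in scripts and the function name `pipe_source_to_destination` are hypothetical, and the sketch has only been reasoned about for Mac/Linux-style pipes.

```python
import subprocess
import sys

# Hypothetical stand-ins for real connectors: the "source" prints three
# lines, the "destination" echoes every line it reads from STDIN.
SOURCE_SCRIPT = "for i in range(3): print('record', i)"
DEST_SCRIPT = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('wrote: ' + line.strip())"
)


def pipe_source_to_destination(source_cmd, dest_cmd):
    """Run two commands with the first one's STDOUT wired to the second's STDIN."""
    source = subprocess.Popen(source_cmd, stdout=subprocess.PIPE)
    dest = subprocess.Popen(
        dest_cmd,
        stdin=source.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close the parent's copy of the pipe so the source sees a broken pipe
    # if the destination exits early.
    source.stdout.close()
    dest_output, _ = dest.communicate()
    source.wait()
    return source.returncode, dest.returncode, dest_output


if __name__ == "__main__":
    result = pipe_source_to_destination(
        [sys.executable, "-c", SOURCE_SCRIPT],
        [sys.executable, "-c", DEST_SCRIPT],
    )
    print(result)
```

With this approach the parent process never sees the records, so anything that must inspect messages in flight (progress tracking, state capture) would need a different design.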

Remaining TODO:

  1. Usage examples in file docstrings.
  2. Incremental processing in Destinations.
  3. Consider releasing get_destination() in experimental module.
  4. Add tests for destination state writer.
  5. Incremental from cache to destination. (Currently, entire cache contents are synced and we don't have a timestamp to use for getting "since" a specific point from the cache.)
  6. Consider whether StateProvider and StateWriter need to be adapted to support GLOBAL stream state. (Currently, this would have to be passed as stream_name='GLOBAL'.)
  7. Consider whether caches should let users give them a custom name. (Users would then have to supply the same name each time if the name is used for anything important.)
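
To make TODO item 6 concrete, the sketch below shows what keying global state under a reserved stream name could look like. Only the `'GLOBAL'` sentinel comes from this description; `InMemoryStateStore` and its methods are hypothetical and are not the StateProvider or StateWriter classes in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Sentinel stream name taken from the PR description.
GLOBAL_STATE_KEY = "GLOBAL"


@dataclass
class InMemoryStateStore:
    """Hypothetical store that keeps one state payload per stream name."""

    _states: dict[str, dict] = field(default_factory=dict)

    def write_state(self, stream_name: str, state: dict) -> None:
        # The most recent state for a stream replaces any earlier one.
        self._states[stream_name] = state

    def get_state(self, stream_name: str) -> dict | None:
        return self._states.get(stream_name)

    def get_global_state(self) -> dict | None:
        # Global state is stored like a stream whose name is 'GLOBAL'.
        return self.get_state(GLOBAL_STATE_KEY)


if __name__ == "__main__":
    store = InMemoryStateStore()
    store.write_state("users", {"cursor": "2024-05-01"})
    store.write_state(GLOBAL_STATE_KEY, {"shared_cursor": 42})
    print(store.get_global_state())
```

One drawback of the sentinel approach is that a real stream named "GLOBAL" would collide with it, which may be a reason to adapt the interfaces instead.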

Summary by CodeRabbit

  • New Features

    • Introduced a new SQLTools configuration for improved database connections.
    • Enhanced progress tracking capabilities for message processing and batch operations.
    • Improved state management with new classes and methods for better handling of connector states.
    • Added support for Docker-based connectors through the new Executor class.
    • Enhanced functionality for writing data to various destinations with the new Destination class.
    • Introduced a NoOpStateWriter class for state management in non-persistent scenarios.
    • Enhanced integration tests for better validation of state handling and connectivity.
  • Bug Fixes

    • Improved handling of state management and error logging to ensure better reliability.
  • Chores

    • Cleaned up import statements and organized project structure for better maintainability.
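
As an illustration of the no-op writer idea mentioned in the summary above, here is a minimal sketch of the pattern. The class name `NoOpStateWriter` appears in the summary; the `write_state` signature, the `StateWriter` protocol, `CapturingStateWriter`, and `process_messages` are assumptions made for this example and may not match the PR's code.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class StateWriter(Protocol):
    """Hypothetical interface: anything that can accept a state message."""

    def write_state(self, state_message: dict) -> None: ...


class NoOpStateWriter:
    """Discards state messages; useful when state should not be persisted."""

    def write_state(self, state_message: dict) -> None:
        pass


class CapturingStateWriter:
    """Keeps state messages in memory so a caller can inspect them."""

    def __init__(self) -> None:
        self.states: list[dict] = []

    def write_state(self, state_message: dict) -> None:
        self.states.append(state_message)


def process_messages(messages: Iterable[dict], state_writer: StateWriter) -> int:
    """Count RECORD messages and forward STATE messages to the writer."""
    record_count = 0
    for message in messages:
        if message["type"] == "STATE":
            state_writer.write_state(message)
        elif message["type"] == "RECORD":
            record_count += 1
    return record_count


if __name__ == "__main__":
    demo = [{"type": "RECORD"}, {"type": "STATE", "cursor": 1}]
    print(process_messages(demo, NoOpStateWriter()))
```

Because both writers share the same method, the calling code does not need a special case for the non-persistent scenario.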

@aaronsteers aaronsteers marked this pull request as draft June 4, 2024 02:11

coderabbitai bot commented Jun 18, 2024

Walkthrough

The recent updates enhance the Airbyte framework by implementing structured management for destination connectors, particularly for Python-based implementations. Key changes include the introduction of new classes for connector management, improvements in error handling, enhanced progress tracking, and additional methods for state and configuration management. These updates aim to boost usability and performance across the framework, facilitating seamless integration with various destination connectors.

Changes

Files | Change Summary
.gitignore | Added patterns to ignore temporary files and log directories.
.vscode/settings.json | Introduced SQLTools connection settings for DuckDB, defining three distinct database connections.
airbyte/__init__.py, airbyte/_connector_base.py | Added ConnectorBase class and destinations module to improve connector management and usability.
airbyte/_batch_handles.py | Modified type annotations in file handling methods to support text file operations.
airbyte/_future_cdk/*.py | Enhanced classes with new methods for processing messages, tracking progress, and managing state.
airbyte/_util/meta.py | Improved logic in is_interactive function for better robustness.
examples/run_perf_test_reads.py, examples/run_sync_to_destination_w_cache.py | Introduced support for testing destination load performance and synchronization with caching mechanisms.
tests/integration_tests/destinations/test_source_to_destination.py | Introduced integration tests for DuckDB destination functionality.

Assessment against linked issues

Objective | Addressed | Explanation
Add support for loading to Airbyte destination connectors (#197) | |
Install Python-based destination connectors (#197) | |
Send data to them from the source (#197) | |
Support for SQL caches in PyAirbyte (#197) | | No specific SQL cache management updates were made.
Improve error handling in destination connectors (#197) | |

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 33ec03d and 5be2bd8.

Files selected for processing (4)
  • tests/conftest.py (6 hunks)
  • tests/integration_tests/cloud/conftest.py (3 hunks)
  • tests/integration_tests/cloud/test_cloud_api_util.py (9 hunks)
  • tests/integration_tests/test_state_handling.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • tests/integration_tests/cloud/test_cloud_api_util.py
Files skipped from review as they are similar to previous changes (2)
  • tests/integration_tests/cloud/conftest.py
  • tests/integration_tests/test_state_handling.py
Additional comments not posted (5)
tests/conftest.py (5)

23-23: Import change acknowledged.

The import of _get_bin_dir from airbyte._executor has been replaced with get_bin_dir from airbyte._util.venv_util, indicating a restructuring aimed at better modularity for utility functions.


100-103: New condition for new_duckdb_destination_executor fixture acknowledged.

The new condition in pytest_collection_modifyitems function checks for the presence of the fixture name "new_duckdb_destination_executor" in item.fixturenames, expanding the test logic to accommodate additional fixtures.
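
For readers unfamiliar with this hook, a minimal sketch of a fixture-name check in `pytest_collection_modifyitems` follows. It is not the code from tests/conftest.py: what the real condition does with matching tests is not shown here, and skipping them when Docker is missing is an assumption, as is the helper name `uses_any_fixture`.

```python
import shutil
from types import SimpleNamespace

# Fixture name from the review comment; treating it as Docker-dependent
# is an assumption made for this sketch.
DOCKER_FIXTURES = frozenset({"new_duckdb_destination_executor"})


def uses_any_fixture(item, fixture_names) -> bool:
    """Return True if the collected test item requests any of the fixtures."""
    return any(name in item.fixturenames for name in fixture_names)


def pytest_collection_modifyitems(items) -> None:
    """Skip Docker-dependent tests when no docker executable is on PATH."""
    import pytest  # imported lazily so the helper above works without pytest

    if shutil.which("docker"):
        return
    skip_marker = pytest.mark.skip(reason="Docker is not available.")
    for item in items:
        if uses_any_fixture(item, DOCKER_FIXTURES):
            item.add_marker(skip_marker)


if __name__ == "__main__":
    # Stand-in objects that mimic the one attribute the helper reads.
    fake_item = SimpleNamespace(fixturenames=["new_duckdb_destination_executor"])
    print(uses_any_fixture(fake_item, DOCKER_FIXTURES))
```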


Line range hint 170-234: New new_postgres_db fixture acknowledged.

The new_postgres_db fixture initializes a PostgreSQL container for testing and ensures proper resource management after tests are executed. The implementation appears correct and complete.


Line range hint 236-248: Redefined new_postgres_cache fixture acknowledged.

The new_postgres_cache fixture now depends on the new new_postgres_db fixture, enhancing the clarity of how database connections are established for tests. The implementation appears correct and complete.


305-305: Updated method for obtaining pip path acknowledged.

The method for obtaining the path to pip in the source_test_installation function has been updated to utilize get_bin_dir, aligning with the new import structure and improving consistency.
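
As background on why a shared helper is useful here, the sketch below shows how a virtual environment's executable directory and pip path can be derived on each platform. `venv_bin_dir` and `pip_path` are hypothetical names for illustration; they are not the `get_bin_dir` function referenced above, whose signature is not shown in this review.

```python
from __future__ import annotations

import os
from pathlib import Path


def venv_bin_dir(venv_path: Path, *, is_windows: bool | None = None) -> Path:
    """Return the directory holding a virtual environment's executables.

    Virtual environments use 'Scripts' on Windows and 'bin' elsewhere.
    """
    if is_windows is None:
        is_windows = os.name == "nt"
    return venv_path / ("Scripts" if is_windows else "bin")


def pip_path(venv_path: Path, *, is_windows: bool | None = None) -> Path:
    """Return the expected location of pip inside the virtual environment."""
    if is_windows is None:
        is_windows = os.name == "nt"
    executable = "pip.exe" if is_windows else "pip"
    return venv_bin_dir(venv_path, is_windows=is_windows) / executable


if __name__ == "__main__":
    print(pip_path(Path(".venv")))
```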


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

airbyte/_executors/base.py (outdated; resolved)
@aaronsteers aaronsteers merged commit 4591a6d into main Jul 30, 2024
22 checks passed
@aaronsteers aaronsteers deleted the aj/feat/add-destinations-support branch July 30, 2024 14:07
Successfully merging this pull request may close these issues.

💡 Feature Request: Add support for loading to Airbyte destination connectors