airbyte-lib: Refactor connectors #34552

Merged
merged 6 commits into from
Jan 30, 2024

@flash1293 flash1293 commented Jan 26, 2024

This PR refactors most connectors so they are compatible with airbyte-lib:

  • Move the entrypoint to run.py into the module that's going to be published
  • Export the entrypoint functionality as a run function
  • In the main.py file, import the run function and execute it
  • Add a console script definition to the setup.py that calls the run function as well.
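For a single connector, the naming and wiring described in the steps above can be sketched as follows. This is a minimal illustration rather than the code used in this PR: the helper names (`snake_case_name`, `console_script_entry`, `render_main_py`) are hypothetical and `source-example` is a placeholder connector name. The entry point format and the main.py template follow what the convert.sh script in this description generates.

```python
def snake_case_name(folder_name: str) -> str:
    """Map a connector folder name to its Python package name (hypothetical helper)."""
    return folder_name.replace("-", "_")


def console_script_entry(folder_name: str) -> str:
    """Build the console_scripts entry that exposes run() as a CLI command."""
    return f"{folder_name}={snake_case_name(folder_name)}.run:run"


def render_main_py(folder_name: str) -> str:
    """Render the thin main.py that only imports and calls the packaged run()."""
    package = snake_case_name(folder_name)
    return (
        "#\n"
        "# Copyright (c) 2023 Airbyte, Inc., all rights reserved.\n"
        "#\n"
        "\n"
        f"from {package}.run import run\n"
        "\n"
        'if __name__ == "__main__":\n'
        "    run()\n"
    )


if __name__ == "__main__":
    # "source-example" is a placeholder, not a real connector.
    print(console_script_entry("source-example"))
    print(render_main_py("source-example"))
```

Keeping main.py as a thin wrapper means the same `run` function serves both the existing entrypoint and the new console script.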

Publish plan

This PR is intended to be merged without CI checks, because many of the touched connectors currently have broken tests. Because the Docker image tag won't be bumped, no new version will be published, which reduces the risk of this change.

The next developer working on the connector will implicitly publish these changes, which shouldn't have any effect on functionality.

How to test

airbyte-lib comes with a helper to validate that a connector works in general:

  • Install ci_credentials as explained here: https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/ci_credentials/README.md
  • Go to airbyte-lib and run poetry shell
  • Load credentials for a connector: VERSION=dev ci_credentials source-<connector under test> write-to-storage
  • Run the validator: airbyte-lib-validate-source --connector-dir ./airbyte-integrations/connectors/source-<connector under test>/ --sample-config ./airbyte-integrations/connectors/source-<connector under test>/secrets/config.json
    • This will install the connector into a new venv and make sure spec, check, discover and read can be called successfully (read is assumed to be valid if one record can be extracted from one of the streams returned by discover)
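The pass criterion described above (spec, check and discover succeed, and read counts as valid once any stream yields one record) can be sketched as below. This is an assumption-laden illustration, not the airbyte-lib validator: `FakeConnector`, its method signatures, and `validate` are all hypothetical stand-ins.

```python
from typing import Any, Dict, Iterator, List


class FakeConnector:
    """Hypothetical stand-in for an installed connector; not the airbyte-lib API."""

    def spec(self) -> Dict[str, Any]:
        return {"connectionSpecification": {"type": "object"}}

    def check(self, config: Dict[str, Any]) -> bool:
        return True

    def discover(self, config: Dict[str, Any]) -> List[str]:
        return ["users", "orders"]

    def read(self, config: Dict[str, Any], stream: str) -> Iterator[Dict[str, Any]]:
        # Only the "orders" stream has data in this fake.
        if stream == "orders":
            yield {"id": 1}


def validate(connector: Any, config: Dict[str, Any]) -> bool:
    """Return True if spec/check/discover work and any stream yields a record."""
    connector.spec()  # an exception here means the connector is broken
    if not connector.check(config):
        return False
    for stream in connector.discover(config):
        for _record in connector.read(config, stream):
            return True  # one record from one stream is enough
    return False
```

Stopping at the first record keeps validation fast, at the cost of not exercising every stream.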

Converted connectors:

source-activecampaign/
source-adjust/
source-aha/
source-aircall/
source-airtable/
source-alpha-vantage/
source-amazon-ads/
source-amazon-seller-partner/
source-amazon-sqs/
source-amplitude/
source-appfollow/
source-apple-search-ads/
source-appsflyer/
source-appstore-singer/
source-asana/
source-ashby/
source-auth0/
source-aws-cloudtrail/
source-azure-blob-storage/
source-azure-table/
source-babelforce/
source-bamboo-hr/
source-bigcommerce/
source-bing-ads/
source-braintree/
source-braze/
source-breezometer/
source-callrail/
source-captain-data/
source-cart/
source-chargebee/
source-chargify/
source-chartmogul/
source-clickup-api/
source-clockify/
source-close-com/
source-coda/
source-coin-api/
source-coingecko-coins/
source-coinmarketcap/
source-commcare/
source-commercetools/
source-configcat/
source-confluence/
source-convertkit/
source-convex/
source-copper/
source-courier/
source-customer-io/
source-datadog/
source-datascope/
source-delighted/
source-dixa/
source-dockerhub/
source-dremio/
source-drift/
source-dv-360/
source-emailoctopus/
source-everhour/
source-exchange-rates/
source-facebook-pages/
source-fastbill/
source-fauna/
source-file/
source-firebase-realtime-database/
source-firebolt/
source-flexport/
source-freshcaller/
source-freshsales/
source-freshservice/
source-fullstory/
source-gainsight-px/
source-gcs/
source-genesys/
source-getlago/
source-github/
source-glassfrog/
source-gnews/
source-gocardless/
source-gong/
source-google-analytics-v4/
source-google-directory/
source-google-pagespeed-insights/
source-google-search-console/
source-google-webfonts/
source-google-workspace-admin-reports/
source-greenhouse/
source-gridly/
source-gutendex/
source-harness/
source-harvest/
source-hellobaton/
source-hubplanner/
source-hubspot/
source-insightly/
source-instatus/
source-intercom/
source-intruder/
source-ip2whois/
source-jira/
source-k6-cloud/
source-klarna/
source-klaus-api/
source-klaviyo/
source-kustomer-singer/
source-kyriba/
source-kyve/
source-launchdarkly/
source-lemlist/
source-lever-hiring/
source-linkedin-pages/
source-linnworks/
source-lokalise/
source-looker/
source-mailerlite/
source-mailersend/
source-mailgun/
source-mailjet-mail/
source-mailjet-sms/
source-merge/
source-metabase/
source-microsoft-dataverse/
source-microsoft-onedrive/
source-microsoft-teams/
source-monday/
source-my-hours/
source-n8n/
source-nasa/
source-netsuite/
source-news-api/
source-newsdata/
source-notion/
source-nytimes/
source-okta/
source-omnisend/
source-onesignal/
source-open-exchange-rates/
source-openweather/
source-opsgenie/
source-orb/
source-orbit/
source-oura/
source-outbrain-amplify/
source-outreach/
source-pagerduty/
source-pardot/
source-partnerstack/
source-paystack/
source-pendo/
source-persistiq/
source-pexels-api/
source-pinterest/
source-pivotal-tracker/
source-plaid/
source-plausible/
source-pocket/
source-pokeapi/
source-polygon-stock-api/
source-posthog/
source-postmarkapp/
source-prestashop/
source-primetric/
source-public-apis/
source-punk-api/
source-pypi/
source-python-http-tutorial/
source-qonto/
source-qualaroo/
source-quickbooks/
source-railz/
source-rd-station-marketing/
source-recharge/
source-recreation/
source-recruitee/
source-recurly/
source-reply-io/
source-retently/
source-ringcentral/
source-rki-covid/
source-rocket-chat/
source-rss/
source-salesloft/
source-sap-fieldglass/
source-scaffold-source-http/
source-scaffold-source-python/
source-search-metrics/
source-secoda/
source-sendgrid/
source-sendinblue/
source-senseforce/
source-sentry/
source-serpstat/
source-sftp-bulk/
source-shortio/
source-smaily/
source-smartengage/
source-snapchat-marketing/
source-sonar-cloud/
source-spacex-api/
source-square/
source-statuspage/
source-strava/
source-survey-sparrow/
source-surveycto/
source-surveymonkey/
source-talkdesk-explore/
source-tempo/
source-the-guardian-api/
source-tiktok-marketing/
source-timely/
source-tmdb/
source-todoist/
source-toggl/
source-tplcentral/
source-trello/
source-trustpilot/
source-tvmaze-schedule/
source-twilio-taskrouter/
source-twilio/
source-twitter/
source-tyntec-sms/
source-unleash/
source-us-census/
source-vantage/
source-visma-economic/
source-vitally/
source-waiteraid/
source-weatherstack/
source-webflow/
source-whisky-hunter/
source-wikipedia-pageviews/
source-woocommerce/
source-workable/
source-workramp/
source-wrike/
source-xkcd/
source-yahoo-finance-price/
source-yandex-metrica/
source-yotpo/
source-younium/
source-youtube-analytics/
source-zapier-supported-storage/
source-zendesk-sell/
source-zendesk-sunshine/
source-zenefits/
source-zenloop/
source-zoho-crm/
source-zoom/
source-zuora/

Skipped connectors (already converted):

source-apify-dataset/
source-facebook-marketing/
source-faker/
source-freshdesk/
source-gitlab/
source-google-ads/
source-google-analytics-data-api/
source-google-drive/
source-google-sheets/
source-instagram/
source-iterable/
source-linkedin-ads/
source-mailchimp/
source-marketo/
source-mixpanel/
source-paypal-transaction/
source-pipedrive/
source-s3/
source-salesforce/
source-shopify/
source-slack/
source-smartsheets/
source-stripe/
source-typeform/
source-xero/
source-zendesk-support/
source-zendesk-talk/

Skipped connectors (non-python):

source-bigquery/
source-breaker/
source-clickhouse-strict-encrypt/
source-clickhouse/
source-cockroachdb-strict-encrypt/
source-cockroachdb/
source-db2-strict-encrypt/
source-db2/
source-debug/
source-dynamodb/
source-e2e-test-cloud/
source-e2e-test/
source-elasticsearch/
source-jdbc/
source-kafka/
source-mongodb-strict-encrypt/
source-mongodb-v2/
source-mongodb/
source-mssql-strict-encrypt/
source-mssql/
source-mysql-strict-encrypt/
source-mysql/
source-oracle-strict-encrypt/
source-oracle/
source-postgres-strict-encrypt/
source-postgres/
source-redshift/
source-relational-db/
source-s3-unstructured/
source-scaffold-java-jdbc/
source-sftp/
source-snowflake/
source-stock-ticker-api-tutorial/
source-teradata/
source-tidb/

Skipped directories (main.py not found, language:python/low-code):

source-zendesk-chat/

Script used to do the conversion

convert.sh
#!/bin/bash

# Initialize arrays to keep track of processed and skipped directories
processed_dirs=()
skipped_no_main=()
skipped_no_setup=()
skipped_run_exists=()


# Loop through all directories starting with source-
for dir in source-*/ ; do
    echo "Processing directory: $dir"
    FOLDER_NAME="$dir"
    FOLDER_NAME="${FOLDER_NAME%/}"  # Remove trailing slash
    SNAKE_CASED_FOLDER_NAME=$(echo "$FOLDER_NAME" | sed 's/-/_/g')

    # Check for setup.py
    if [ ! -f "$dir/setup.py" ]; then
        echo "setup.py not found in $dir"
        skipped_no_setup+=("$dir")
        continue
    fi

    # Check for main.py
    if [ ! -f "$dir/main.py" ]; then
        echo "main.py not found in $dir"
        skipped_no_main+=("$dir")
        continue
    fi

    # Check if run.py already exists
    if [ -f "$dir/$SNAKE_CASED_FOLDER_NAME/run.py" ]; then
        echo "run.py already exists in $FOLDER_NAME/$SNAKE_CASED_FOLDER_NAME"
        skipped_run_exists+=("$dir")
        continue
    fi

    # Step 1: Add entry_points to setup.py
    # Note: `sed -i ''` is the BSD/macOS form; GNU sed expects `-i` without the empty argument.
    sed -i '' "/setup(/a \\
    entry_points={\\
        \"console_scripts\": [\\
            \"${FOLDER_NAME}=${SNAKE_CASED_FOLDER_NAME}.run:run\",\\
        ],\\
    }," "$FOLDER_NAME/setup.py"

    # Step 2: Create run.py and copy contents from main.py
    mkdir -p "$FOLDER_NAME/$SNAKE_CASED_FOLDER_NAME"
    cp "$FOLDER_NAME/main.py" "$FOLDER_NAME/$SNAKE_CASED_FOLDER_NAME/run.py"
    sed -i '' 's/if __name__ == "__main__":/def run():/' "$FOLDER_NAME/$SNAKE_CASED_FOLDER_NAME/run.py"

    # Step 3: Modify main.py
    echo -e "#\n# Copyright (c) 2023 Airbyte, Inc., all rights reserved.\n#\n\nfrom ${SNAKE_CASED_FOLDER_NAME}.run import run\n\nif __name__ == \"__main__\":\n    run()" > "$FOLDER_NAME/main.py"

    processed_dirs+=("$dir")
done

skipped_no_setup_python=()
skipped_no_setup_other=()
skipped_no_main_python=()
skipped_no_main_other=()

# Function to categorize directories based on metadata.yaml
categorize_based_on_metadata() {
    local dir=$1
    local metadata_file="$dir/metadata.yaml"
    if grep -q "language:python\|language:low-code" "$metadata_file" && ! grep -q "language:java" "$metadata_file"; then
        eval "$2+=('$dir')"  # Add to python/low-code array
    else
        eval "$3+=('$dir')"  # Add to other array
    fi
}

# Categorize skipped_no_setup
for dir in "${skipped_no_setup[@]}"; do
    categorize_based_on_metadata "$dir" "skipped_no_setup_python" "skipped_no_setup_other"
done

# Categorize skipped_no_main
for dir in "${skipped_no_main[@]}"; do
    categorize_based_on_metadata "$dir" "skipped_no_main_python" "skipped_no_main_other"
done


# Print processed and skipped directories
echo "Processed directories:"
printf '%s\n' "${processed_dirs[@]}"

echo "Skipped directories (setup.py not found):"
printf '%s\n' "${skipped_no_setup[@]}"

echo "Skipped directories (main.py not found):"
printf '%s\n' "${skipped_no_main[@]}"

echo "Skipped directories (run.py already exists):"
printf '%s\n' "${skipped_run_exists[@]}"

echo "Skipped directories (setup.py not found, language:python/low-code):"
printf '%s\n' "${skipped_no_setup_python[@]}"

echo "Skipped directories (setup.py not found, other languages):"
printf '%s\n' "${skipped_no_setup_other[@]}"

echo "Skipped directories (main.py not found, language:python/low-code):"
printf '%s\n' "${skipped_no_main_python[@]}"

echo "Skipped directories (main.py not found, other languages):"
printf '%s\n' "${skipped_no_main_other[@]}"
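The two text-level operations in convert.sh (turning the `__main__` guard into `def run():`, and classifying a skipped connector by its metadata.yaml tags) can also be sketched in Python. The function names here are hypothetical, the tag strings are taken from the grep pattern in the script, and the sample main.py content in the usage lines is illustrative only.

```python
MAIN_GUARD = 'if __name__ == "__main__":'


def main_to_run(main_source: str) -> str:
    """Turn the contents of main.py into run.py by renaming the guard to run()."""
    # The body under the guard is already indented, so it becomes the function body.
    return main_source.replace(MAIN_GUARD, "def run():")


def is_python_connector(metadata_text: str) -> bool:
    """Mirror the grep check: python or low-code tag present, and no java tag."""
    has_python = (
        "language:python" in metadata_text
        or "language:low-code" in metadata_text
    )
    return has_python and "language:java" not in metadata_text


if __name__ == "__main__":
    # Illustrative main.py content; real connectors differ.
    sample = 'import sys\n\nif __name__ == "__main__":\n    launch(sys.argv[1:])\n'
    print(main_to_run(sample))
    print(is_python_connector("tags:\n  - language:python\n"))
```

Returning a boolean from the classifier avoids the `eval`-based array appends that the shell version needs.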

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Jan 26, 2024

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

vercel bot commented Jan 26, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| airbyte-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jan 29, 2024 6:51pm |


@aaronsteers aaronsteers left a comment

For my part, this looks great. Thanks, @flash1293 !

I have reviewed the migration scripts and scanned the updated connector code changes. This looks 'safe' to me, and I appreciate the broader approach to including package_data. We might find something is still missed, but this should catch everything that is foreseeable. And if something is missed, it'll fail the validation tests later when we next attempt to publish.

I noted this on other threads, but I'll note here too:

  • One nice advantage of having all the connectors' CLI entrypoints registered is that we gain new test capabilities via git-based or local path-based installs. This will help us find bugs and test functionality in a variety of use cases before needing to formally publish. (Without a CLI entrypoint, this is just a lot harder to do.)
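To illustrate what a registered CLI entrypoint resolves to, the sketch below loads a `module:attr` target with the standard library's `importlib.metadata.EntryPoint`. It is only an illustration: `resolve_console_script` is a hypothetical helper, and `json:dumps` stands in for a connector target such as `source_<name>.run:run`, because no connector package is assumed to be installed here.

```python
import json
from importlib.metadata import EntryPoint


def resolve_console_script(name: str, target: str):
    """Resolve a 'module:attr' console-script target to the callable it names."""
    entry_point = EntryPoint(name=name, value=target, group="console_scripts")
    # load() imports the module and returns the named attribute.
    return entry_point.load()


if __name__ == "__main__":
    # "json:dumps" is a stand-in target from the standard library.
    dumps = resolve_console_script("demo", "json:dumps")
    print(dumps is json.dumps)
```

Once a connector is installed from a git URL or a local path, its console script resolves to `run` in the same way, so the connector can be invoked without a published package.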

@alafanechere

/approve-and-merge reason="We collectively decided that CI can be bypassed for this global change. No publish should be triggered"

@octavia-approvington

This looks fine!
Merged!
Imagine it being fine

@octavia-approvington octavia-approvington merged commit f29234a into master Jan 30, 2024
23 of 284 checks passed
@octavia-approvington octavia-approvington deleted the flash1293/refactor-all-the-connectors branch January 30, 2024 09:22
clnoll pushed a commit that referenced this pull request Jan 30, 2024

sentry-io bot commented Jan 30, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ Exception: bad flows response /airbyte/integration_code/source_klaviyo/stream... View Issue
  • ‼️ ValueError: ('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted wit... /airbyte/integration_code/source_google_analyti... View Issue
  • ‼️ airbyte_cdk.utils.traced_exception.AirbyteTracedException: None /usr/local/lib/python3.9/site-packages/airbyte_... View Issue
  • ‼️ airbyte_cdk.sources.file_based.exceptions.SchemaInferenceError: Error inferring schema from files. Are the files valid? Contact Support if you need assistance. /usr/local/lib/python3.9/site-packages/airbyte_... View Issue
  • ‼️ airbyte_cdk.sources.declarative.exceptions.ReadException: Request to https://api.typeform.com/forms/b1Pn8vtj/webhooks failed with status code 504 and error... /usr/local/lib/python3.9/site-packages/airbyte_... View Issue


jbfbell pushed a commit that referenced this pull request Feb 1, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 21, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Labels
area/connectors Connector related issues

6 participants