Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connectors-ci: add finalize_build logic to handle custom Dockerfiles #26489

Merged
merged 12 commits into from
May 25, 2023

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented May 24, 2023

What

Closes #26426
Java connectors are not built with Dockerfiles anymore now that we switched to Dagger.
But some connectors have a custom Dockerfile that install system dependencies are add custom env variable.
This PR is a suggested approach to handle these exceptional cases.

How

  • A finalize_build.sh or a finalize_build.py file can be created in the connector folder. Developers are free to pick the format they prefer. The finalize_build python function allows direct interaction with the Dagger container being built, which can be convenient for docker specific instructions and improved caching.
  • If this file is present the with_airbyte_python_connector or with_airbyte_java_connector function will run the bash script or the finalize_build Python function on the container that was built with the common procedure.

I added a deprecation warning in order to make our developers aware that Dockerfile changes currently have no effect on images we currently publish with the automated pipelines.
We can't delete the Dockerfiles at the moment because they're still use for /test - but once the tests powered by dagger are live we can proceed to their deletion.

For a "real-life" example I've added:

  • a finalize_build.sh script to destination-s3 to install LZO compression.
  • a finalize_build.py script to destination-iceberg to set the custom JAVA_OPTS env var this connector requires

⚠️ Please note this PR does not fix destination-s3 or destination-iceberg:

  • The finalize_build.sh script for destination-s3 exits with 1 status code on ARM64 architecutre
  • The destination-iceberg connector build has been broken for a long time (probably since monorepo split) and relies on Java library that are not on the airbyte repo anymore...

Recommended reading order:

🚨 User Impact 🚨

No impact as the build logic for connector with "normal" Dockerfiles was not altered. I'll publish a java and a python connector pre-release for sanity.
Update: confirmed that the "normal" connector publish is working. The destination-s3 failure is related to an error in the finalize_build.sh script I pinged @edgao about here.

@alafanechere alafanechere requested a review from a team as a code owner May 24, 2023 13:59
@github-actions
Copy link
Contributor

github-actions bot commented May 24, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (36)

Connector Version Changelog Publish
source-alloydb 2.0.28
source-alloydb-strict-encrypt 2.0.28 🔵
(ignored)
🔵
(ignored)
source-azure-blob-storage 0.1.0
source-bigquery 0.2.3
source-clickhouse 0.1.17
source-clickhouse-strict-encrypt 0.1.17 🔵
(ignored)
🔵
(ignored)
source-cockroachdb 0.1.22
source-cockroachdb-strict-encrypt 0.1.22 🔵
(ignored)
🔵
(ignored)
source-db2 0.1.19
source-db2-strict-encrypt 0.1.19 🔵
(ignored)
🔵
(ignored)
source-dynamodb 0.1.2
source-e2e-test 2.1.4
source-e2e-test-cloud 2.1.4 🔵
(ignored)
🔵
(ignored)
source-elasticsearch 0.1.1
source-jdbc 0.3.5 🔵
(ignored)
🔵
(ignored)
source-kafka 0.2.3
source-mongodb 0.3.3 🔵
(ignored)
🔵
(ignored)
source-mongodb-strict-encrypt 0.1.19 🔵
(ignored)
🔵
(ignored)
source-mongodb-v2 0.1.19
source-mssql 1.0.16
source-mssql-strict-encrypt 1.0.16 🔵
(ignored)
🔵
(ignored)
source-mysql 2.0.23
source-mysql-strict-encrypt 2.0.23 🔵
(ignored)
🔵
(ignored)
source-oracle 0.3.24
source-oracle-strict-encrypt 0.3.24 🔵
(ignored)
🔵
(ignored)
source-postgres 2.0.29
source-postgres-strict-encrypt 2.0.29 🔵
(ignored)
🔵
(ignored)
source-python-http-tutorial 0.1.4 🔵
(ignored)
🔵
(ignored)
source-redshift 0.3.16
source-relational-db 0.3.1 🔵
(ignored)
🔵
(ignored)
source-scaffold-java-jdbc 0.1.0 🔵
(ignored)
🔵
(ignored)
source-sftp 0.1.2
source-snowflake 0.1.34
source-stock-ticker-api-tutorial 0.2.1 🔵
(ignored)
🔵
(ignored)
source-teradata 0.1.0
source-tidb 0.2.4
  • See "Actionable Items" below for how to resolve warnings and errors.

❌ Destinations (49)

Connector Version Changelog Publish
destination-azure-blob-storage 0.2.0
destination-bigquery 1.4.3
destination-bigquery-denormalized 1.4.1
destination-cassandra 0.1.4
destination-clickhouse 0.2.3
destination-clickhouse-strict-encrypt 0.2.3 🔵
(ignored)
🔵
(ignored)
destination-csv 1.0.0
destination-databricks 1.0.2
destination-dev-null 0.3.0 🔵
(ignored)
🔵
(ignored)
destination-doris 0.1.0
destination-dynamodb 0.1.7
destination-e2e-test 0.3.0
destination-elasticsearch 0.1.6
destination-elasticsearch-strict-encrypt 0.1.6 🔵
(ignored)
🔵
(ignored)
destination-exasol 0.1.1
destination-gcs 0.3.0
destination-iceberg 0.1.0
destination-kafka 0.1.10
destination-keen 0.2.4
destination-kinesis 0.1.5
destination-local-json 0.2.11
destination-mariadb-columnstore 0.1.7
destination-mongodb 0.1.9
destination-mongodb-strict-encrypt 0.1.9 🔵
(ignored)
🔵
(ignored)
destination-mqtt 0.1.3
destination-mssql 0.1.23
destination-mssql-strict-encrypt 0.1.23 🔵
(ignored)
🔵
(ignored)
destination-mysql 0.1.20
destination-mysql-strict-encrypt 0.1.21
(mismatch: 0.1.20)
🔵
(ignored)
🔵
(ignored)
destination-oracle 0.1.19
destination-oracle-strict-encrypt 0.1.19 🔵
(ignored)
🔵
(ignored)
destination-postgres 0.3.27
destination-postgres-strict-encrypt 0.3.27 🔵
(ignored)
🔵
(ignored)
destination-pubsub 0.2.0
destination-pulsar 0.1.3
destination-r2 0.1.0
destination-redis 0.1.4
destination-redpanda 0.1.0
destination-redshift 0.4.7
destination-rockset 0.1.4
destination-s3 0.4.1
destination-s3-glue 0.1.7
destination-scylla 0.1.3
destination-selectdb 0.1.0
destination-snowflake 1.0.4
destination-starburst-galaxy 0.0.1
destination-teradata 0.1.1
destination-tidb 0.1.1
(diff seed version)
destination-yugabytedb 0.1.1
  • See "Actionable Items" below for how to resolve warnings and errors.

👀 Other Modules (1)

  • base-normalization

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the cloud or oss registry, so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that you have added a metadata.yaml file and the expected registries are enabled.

echo "$ARCH"
yum install lzop lzo lzo-dev -y

# alanechere: I'm not sure we need this custom install of lzo anymore. Using the yum install above works in the build context.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgao I was not able to run this script successfully. The sdk install java 8.0.342-librca command exits with a 1 status code. Considering the current code freeze on this connector I think it's fine to merge this PR with a "broken" script. I'll let your team figure the changes required to support LZO on all architectures. I'm even not sure the custom install for ARM64 is required. I tried a simple yum install lzop lzo lzo-dev -y on the ARM64 variant and it worked.

You can reproduce my failure on this branch by running:
airbyte-ci connectors --name=destination-s3 build

I'll also let you decide if you prefer to use a bash script or the finalize_build.py module with a finalize_build (please check the destination-iceberg/finalize_build.py for an example). The 🐍 way is better for improved caching and access to the Dagger API. But I also understand that running this script as it is is convenient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing. yeah I'll just submit an issue to figure this out, not a blocker for this pr.

Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, and the comments in the Java dockerfiles are great too

Copy link
Contributor

@bnchrch bnchrch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done!

Just a couple very small suggestions.

Loving how extensible this makes our pipelines

@@ -1,3 +1,11 @@
### WARNING ###
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

has_finalize_bash_script = "finalize_build.sh" in finalize_scripts
has_finalize_python_script = "finalize_build.py" in finalize_scripts
if has_finalize_python_script and has_finalize_bash_script:
raise Exception("Connector has both finalize_build.sh and finalize_build.py, please remove one of them")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch 👏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super small nit:

I'm accustomed to seeing post_, pre_, or custom or override in cases like these

  1. post_install or pre_install with npm
  2. post_deploy or pre_deploy on heroku

Would some thing like post_build, custom_build, build_script fit better?

Comment on lines +718 to +719
.with_entrypoint("sh")
.with_exec(["/tmp/finalize_build.sh"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.with_entrypoint("sh")
.with_exec(["/tmp/finalize_build.sh"])
.with_exec(["sh", "/tmp/finalize_build.sh"])

return await finalize_build(context, connector_container)


async def finalize_build(context: ConnectorContext, connector_container: Container) -> Container:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Lets put this in the docs!

@alafanechere alafanechere enabled auto-merge (squash) May 24, 2023 21:55
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, submitted #26521 to get dest-s3 into good state.

echo "$ARCH"
yum install lzop lzo lzo-dev -y

# alanechere: I'm not sure we need this custom install of lzo anymore. Using the yum install above works in the build context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing. yeah I'll just submit an issue to figure this out, not a blocker for this pr.

if not finalize_scripts:
return connector_container

# We don't want finalize scripts to override the entrypoint so we keep it in memory to reset it after finalization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, why don't we want the scripts to override the entrypoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might reconsider it for specific upcoming cases, but I've not seen connector specifically overriding the classic airbyte entrypoint (except your "normalization in container" usecase, which is handled directly in the main code path you contributed to)

@alafanechere alafanechere merged commit fd36557 into master May 25, 2023
@alafanechere alafanechere deleted the augustin/connectors-ci/fix-destination-s3-build branch May 25, 2023 07:32
marcosmarxm pushed a commit to natalia-miinto/airbyte that referenced this pull request Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues connectors/destination/azure-blob-storage connectors/destination/bigquery connectors/destination/bigquery-denormalized connectors/destination/cassandra connectors/destination/clickhouse connectors/destination/clickhouse-strict-encrypt connectors/destination/csv connectors/destination/databricks connectors/destination/dev-null connectors/destination/doris connectors/destination/dynamodb connectors/destination/e2e-test connectors/destination/elasticsearch connectors/destination/elasticsearch-strict-encryp connectors/destination/exasol connectors/destination/gcs connectors/destination/iceberg connectors/destination/kafka connectors/destination/keen connectors/destination/kinesis connectors/destination/local-json connectors/destination/mariadb-columnstore connectors/destination/mongodb connectors/destination/mongodb-strict-encrypt connectors/destination/mqtt connectors/destination/mssql connectors/destination/mssql-strict-encrypt connectors/destination/mysql connectors/destination/mysql-strict-encrypt connectors/destination/oracle connectors/destination/oracle-strict-encrypt connectors/destination/postgres connectors/destination/postgres-strict-encrypt connectors/destination/pubsub connectors/destination/pulsar connectors/destination/r2 connectors/destination/redis connectors/destination/redpanda connectors/destination/redshift connectors/destination/rockset connectors/destination/s3-glue connectors/destination/s3 connectors/destination/scylla connectors/destination/selectdb connectors/destination/snowflake connectors/destination/starburst-galaxy connectors/destination/teradata connectors/destination/tidb connectors/destination/yugabytedb connectors/source/alloydb connectors/source/alloydb-strict-encrypt connectors/source/azure-blob-storage connectors/source/bigquery connectors/source/clickhouse connectors/source/clickhouse-strict-encrypt connectors/source/cockroachdb connectors/source/cockroachdb-strict-encrypt connectors/source/db2-strict-encrypt connectors/source/db2 connectors/source/dynamodb connectors/source/e2e-test connectors/source/e2e-test-cloud connectors/source/elasticsearch connectors/source/kafka connectors/source/mongodb-strict-encrypt connectors/source/mongodb-v2 connectors/source/mssql connectors/source/mssql-strict-encrypt connectors/source/mysql connectors/source/mysql-strict-encrypt connectors/source/oracle connectors/source/oracle-strict-encrypt connectors/source/postgres connectors/source/postgres-strict-encrypt connectors/source/redshift connectors/source/sftp connectors/source/snowflake connectors/source/teradata connectors/source/tidb
Projects
None yet
Development

Successfully merging this pull request may close these issues.

connector-ci: implement custom build logic for some java connectors
5 participants