Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airbyte-workers: add /tmp emptyDir volume to connector pods #10761

Merged
merged 1 commit into from
Mar 10, 2022

Conversation

tbcdns
Copy link
Contributor

@tbcdns tbcdns commented Mar 1, 2022

What

Some connectors (such as destination-s3) require to write some temporary data (generally to /tmp).
It is a good security practice to enforce read only root filesystem on Kubernetes pod, and, some productive Kubernetes clusters enforce that all pods run with read only root filesystem.
Therefore, in order to still allow connectors to write temporary data to /tmp with read only root fs, we must mount an emptyDir volume to /tmp.

The original PR was here: #9874 we decided to split it into 3 different PRs.

How

Simply creating a new emptyDir volume and mount it under /tmp.

🚨 User Impact 🚨

No user impact

@github-actions github-actions bot added area/platform issues related to the platform area/worker Related to worker labels Mar 1, 2022
Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tbcdns

@davinchia
Copy link
Contributor

We are going to hold off on merging this first as we have some internal discussions to do. FYI @sherifnada

@davinchia
Copy link
Contributor

@tbcdns, after discussing with @ChristopheDuong, we agree this is useful and we want to merge this in.

let us run some test in the mean time before doing so. should merge this in the next couple of days.

were you able to successfully run a destination-s3 sync using this deployment?

@davinchia
Copy link
Contributor

davinchia commented Mar 9, 2022

Where are you seeing that destination-s3 uses a local file? Do you mean destination-snowflake?

@tbcdns
Copy link
Contributor Author

tbcdns commented Mar 9, 2022

@davinchia , I faced it with destination-s3, using Parquet as data format.
I don't know all the details, but Parquet (and / or the S3 client) is using some hadoop libraries where the default hadoop tmp dir is /tmp/hadoop-${user.name}
My understanding is that some temporary data needs to be written to the hadoop tmp directory. If /tmp is not writable, I was facing some weird and not so obvious hadoop stack trace.

But, I can confirm that I was able to run a destination-s3 sync using this deployment.

Copy link
Contributor

@jrhizor jrhizor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChristopheDuong @davinchia Outside of this PR, should we be setting a maximum size limit for the tmp volume as part of the resource requirements? Maybe this is only worth doing in the future with ephemeral storage, but I am a bit worried about adding/encouraging support for disk space usage on nodes without limiting the size.

@davinchia
Copy link
Contributor

@tbcdns thanks!

@jrhizor that's a good call. I'll create an issue so we don't forget. We can decide on this limit once @ChristopheDuong and the connector teams have decided on disk bugger limits.

@davinchia davinchia merged commit 921f4a1 into airbytehq:master Mar 10, 2022
@davinchia
Copy link
Contributor

pmossman added a commit that referenced this pull request Mar 10, 2022
pmossman added a commit that referenced this pull request Mar 11, 2022
terencecho pushed a commit that referenced this pull request Mar 14, 2022
pmossman added a commit that referenced this pull request Mar 29, 2022
pmossman added a commit that referenced this pull request Mar 29, 2022
pmossman added a commit that referenced this pull request Apr 4, 2022
pmossman added a commit that referenced this pull request Apr 5, 2022
pmossman added a commit that referenced this pull request Apr 6, 2022
pmossman added a commit that referenced this pull request Apr 11, 2022
* Revert "Revert "add /tmp emptyDir volume to connector pods (#10761)" (#11053)"

This reverts commit eea5156.

* prettier

* bump version of base-normalization to pick up /tmp -> /dbt-tmp change

* change /dbt-tmp/dbt_modules to /dbt

* Regenerate test output files

* add to changelog

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/worker Related to worker community on-hold
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants