Fix Operations API on Kube: Refactor normalization transform_config into Java #5091

Closed
davinchia opened this issue Jul 30, 2021 · 38 comments

@davinchia
Contributor

  • Severity: High

Current Behavior

  • See this Slack thread for info.
  • The CustomDbtRunner no longer works on Kube because it assumes processes can share a file system, which is not true for Kube processes.
  • The first process is started to prepare certain files that DBT requires. The second process is then started to run DBT, but the files are missing and it errors out.
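
The failure mode above can be made explicit with a pre-flight check in the second process. This is a minimal sketch, not Airbyte's code: `missing_dbt_inputs` is a hypothetical helper, and it only relies on the fact that dbt reads `profiles.yml` from its profiles directory and `dbt_project.yml` from its project directory.

```python
from pathlib import Path
from typing import List


def missing_dbt_inputs(profiles_dir: str, project_dir: str) -> List[str]:
    """Return the dbt input files that are absent from this container's file system.

    Hypothetical helper: on Kube the preparation step runs in another pod,
    so these files may never have been written where this process can see them.
    """
    required = [
        Path(profiles_dir) / "profiles.yml",    # connection profile read by dbt
        Path(project_dir) / "dbt_project.yml",  # project definition read by dbt
    ]
    return [str(path) for path in required if not path.is_file()]
```

Calling this before launching dbt and failing with the returned list would turn the opaque "files are missing" error into a message that names the files the first process was expected to produce.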

There are two approaches here:

  1. Make the user install git and the base normalization folder in their submitted Docker image.
  2. Migrate the transform_config directory to Java. This way the scheduler can run it and transfer the yml file over to the container.
    • The submitted image will still need git.
    • We need to modify normalization as well (all we need to do is remove this from the entrypoint.sh in base-normalization and make sure we also copy the yaml over).

We are choosing option 2 since it's simpler to implement.
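
The idea behind option 2 is to generate the files on the scheduler side and ship them into the container instead of relying on a shared directory. The sketch below illustrates that idea in Python rather than the Java the issue proposes; `pack_files` and `unpack_files` are hypothetical names, and how the bytes actually reach the pod is left out.

```python
import io
import tarfile
from typing import Dict


def pack_files(files: Dict[str, str]) -> bytes:
    """Pack {relative_path: text} into an in-memory tar archive (scheduler side)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack_files(payload: bytes) -> Dict[str, str]:
    """Read the archive back into {relative_path: text} (container side)."""
    files: Dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read().decode("utf-8")
    return files
```

Because the payload is plain bytes, it does not depend on the two processes sharing a volume, which is the assumption that breaks on Kube.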

Expected Behavior

This should work on Kube.

@Phlair
Contributor

Phlair commented Sep 7, 2021

This will also need to take the SSH tunnelling process into account.

@ChristopheDuong ChristopheDuong changed the title Fix Operations API on Kube. Fix Operations API on Kube: Refactor normalization transform_config into Java Oct 7, 2021
@delucca

delucca commented Oct 21, 2021

Any updates on this?

@zackliston

This is blocking all usage of custom DBT transformations in Kubernetes deployments. What's the expected timeline? Are there workarounds that could be shared?

@ro8inmorgan

Wish I had read this before going through the hassle of setting up a Kube install. Now I've wasted my morning on this. This is a serious, feature-breaking bug.

@RajuGujjalapatiChowdary

My dbt is not working on Airbyte (deployed in Kube).
Are there any solutions for this?

@haithem-souala
Contributor

Any ETA?

@nathanaevitas

This issue is about to celebrate its first birthday.

@GabrielGLevine

Is anyone aware of a workaround for this?

@evantahler
Contributor

evantahler commented Aug 17, 2022

Alternative suggestion - Approach 3: shared temporary directory.
We could use something like Fuse+S3 mounts so that the directory in which the first process prepares the dbt files is backed by a "network drive" and is then available to the second process. We would need a way to pass in configuration for the S3 mount (e.g. a destination config option), or set it via ENV.
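
As a rough illustration of the "set via ENV" variant, the sketch below reads the mount settings from environment variables and derives a per-job workspace on the shared mount. Everything named here is an assumption made for illustration: the variable names `DBT_SHARED_S3_BUCKET` and `DBT_SHARED_MOUNT_POINT`, the default mount point, and the `<job>/<attempt>/transform` layout are hypothetical, and the Fuse mount itself is not performed.

```python
import os
import posixpath
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SharedDirConfig:
    bucket: str       # S3 bucket backing the mount (hypothetical setting)
    mount_point: str  # where the bucket is mounted inside each pod

    def workspace(self, job_id: str, attempt: int) -> str:
        """Directory both processes would use for one job attempt."""
        return posixpath.join(self.mount_point, job_id, str(attempt), "transform")


def load_shared_dir_config(env: Mapping[str, str] = os.environ) -> Optional[SharedDirConfig]:
    """Return the shared-directory settings, or None when the feature is not configured."""
    bucket = env.get("DBT_SHARED_S3_BUCKET")
    if not bucket:
        return None
    mount_point = env.get("DBT_SHARED_MOUNT_POINT", "/mnt/dbt-shared")
    return SharedDirConfig(bucket=bucket, mount_point=mount_point)
```

Returning `None` when the bucket is unset keeps the current local-directory behaviour as the default, so only Kube deployments would need to opt in.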

@stepcha

stepcha commented Aug 26, 2022

Any ETA? or some workaround?

@alepietrobon

+1

1 similar comment
@ThoSap

ThoSap commented Aug 29, 2022

+1

@fabianofpena

Really need this feature! :/

@pmossman pmossman removed their assignment Feb 16, 2023
@redaq

redaq commented Apr 20, 2023

We bumped into this wall too. Hope this will be resolved soon!

@blueyo

blueyo commented Apr 24, 2023

So the current open source Airbyte does not support custom dbt transformations? I am getting an error related to the dbt version. Is that related to the issue discussed above?

2023-04-18 13:02:11 dbt > Running: dbt deps --profiles-dir=/data/60/0/transform --project-dir=/data/60/0/transform/git_repo
2023-04-18 13:02:14 dbt > 13:02:14 Encountered an error while reading the project:
2023-04-18 13:02:14 dbt > 13:02:14 ERROR: Runtime Error
2023-04-18 13:02:14 dbt > This version of dbt is not supported with the 'mixpanel' package.
2023-04-18 13:02:14 dbt > Installed version of dbt: =1.0.0
2023-04-18 13:02:14 dbt > Required version of dbt for 'mixpanel': ['>=1.3.0', '<2.0.0']
2023-04-18 13:02:14 dbt > Check the requirements for the 'mixpanel' package, or run dbt again with --no-version-check
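
The check that fails in the log above can be reproduced with a few lines of Python. This is a simplified sketch, not dbt's own implementation: `parse_version` and `satisfies` are hypothetical helpers that only handle plain numeric versions (no pre-release tags).

```python
import operator
from typing import List, Tuple

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.3.0' (or '=1.0.0' as printed in the log) into a comparable tuple."""
    parts = [int(part) for part in text.strip().lstrip("=").split(".")]
    parts += [0] * (3 - len(parts))  # pad '1.3' to (1, 3, 0)
    return tuple(parts)


def satisfies(installed: str, requirements: List[str]) -> bool:
    """Return True when the installed version meets every requirement."""
    version = parse_version(installed)
    for requirement in requirements:
        for symbol, compare in _OPERATORS:
            if requirement.startswith(symbol):
                if not compare(version, parse_version(requirement[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported requirement: {requirement!r}")
    return True


print(satisfies("=1.0.0", [">=1.3.0", "<2.0.0"]))  # False
```

With the values from the log, 1.0.0 fails the `>=1.3.0` bound, which is why the 'mixpanel' package is rejected regardless of where the job runs.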

@franviera92
Contributor

@blueyo It does not run on k8s.

@Grayfados
Contributor

Hi guys... I'm glad to say that I was able to customize the worker to run on k8s, following the workaround @davinchia described (I don't remember exactly which Slack post it was). I finished the implementation last week and just entered vacation mode kkkkk. As soon as I return I will post here how I implemented it. Cya!

@anaselmhamdi

Also running into this issue as I was trying to migrate to a Kube setup. I'll try to implement the workaround @davinchia mentioned as well, by creating a custom DBT image to supply the transformation.

@Grayfados
Contributor

Grayfados commented May 8, 2023

Hi guys, sorry about the delay. Here's the solution that I used to run custom normalization on k8s.

As my main problem was working with the BigQuery destination, I implemented only that.
The main concern is the container itself, which has to run the custom dbt as well as the transform-config script.
I created a Python script that uses only standard Python libraries to handle that.

The generate_profile.py script only deals with the BigQuery destination! So you will need to customize it for your own use case!
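
The script itself is not reproduced in this thread, so the following is only a sketch of what a standard-library-only profile generator for BigQuery could look like, not the author's generate_profile.py. The input field names (`project_id`, `dataset_id`, `credentials_json`) are assumed to match the destination config, the profile name `normalize` and target `prod` are placeholders, and the output keys follow dbt's BigQuery `service-account-json` profile format.

```python
import json
from typing import Any, Dict


def bigquery_profile(destination_config: Dict[str, Any], threads: int = 8) -> Dict[str, Any]:
    """Build a dbt profile mapping from a BigQuery destination config (assumed field names)."""
    output = {
        "type": "bigquery",
        "method": "service-account-json",
        "project": destination_config["project_id"],
        "dataset": destination_config["dataset_id"],
        # The credentials are assumed to arrive as a JSON string.
        "keyfile_json": json.loads(destination_config["credentials_json"]),
        "threads": threads,
    }
    # "normalize" and "prod" are placeholder names; they must match dbt_project.yml.
    return {"normalize": {"target": "prod", "outputs": {"prod": output}}}


def render_profiles(profile: Dict[str, Any]) -> str:
    """Serialize without PyYAML: YAML 1.2 is designed as a superset of JSON."""
    return json.dumps(profile, indent=2)
```

Writing the result of `render_profiles` to `profiles.yml` avoids a YAML dependency in the image; it is worth confirming that the dbt version in use parses the JSON-style document as expected.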

@glevineLeap

I think this is a good feedback opportunity for this group to influence the next version of Normalization: #25194

@evantahler
Contributor

I think this is a good feedback opportunity for this group to influence the next version of Normalization: #25194

This issue you linked is a bit different. Airbyte is going to stop using dbt internally as the tool to create typed columns in database destinations, and along the way, handle errors better and make saner tables.

That's unrelated to this feature which lets you run custom dbt transformations after Airbyte has moved the data.

@5ylar

5ylar commented Aug 9, 2023

Any workaround solution for this?

@skenmy

skenmy commented Oct 3, 2023

Adding my +1 for this - with this undocumented behaviour I have spent significant time trying to debug this and we are ultimately having to move our deployment into a pet instance to unblock a large migration we have underway.

This is key functionality that breaks when deploying using an apparently supported method and this either needs calling out on the documentation page or fixing as a relative priority!

@akozichev

Just hit this issue with Kubernetes.
As I understand it, Destination V2 won't fix it.
I haven't seen it in action yet, but it looks like the "normalisation" with DBT is being removed, whilst custom DBT steps will remain, hence the issue will remain.

This wouldn't be a difficult fix if it weren't hidden so deep inside the worker code. I couldn't find a way to change the behaviour. Updating CONTAINER_ORCHESTRATOR_IMAGE doesn't help, and neither does providing your own DBT image.
The failure happens in the airbyte/custom-transformation-prep image.
I can probably write a mutating webhook in Kubernetes to attach an extra volume to all of Airbyte's Job pods, but this seems to be a bit too much.
Is there an easy way to use my own container for some of the processing?
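
For reference, the core of such a mutating webhook is small: it answers each AdmissionReview with a base64-encoded JSON Patch that adds the volume and a mount for every container. The sketch below covers only that logic, under stated assumptions: `volume_patch` and `admission_response` are hypothetical helpers, and the HTTPS server, TLS setup, and MutatingWebhookConfiguration are omitted.

```python
import base64
import json
from typing import Any, Dict, List


def volume_patch(pod: Dict[str, Any], volume: Dict[str, Any], mount_path: str) -> List[Dict[str, Any]]:
    """Build JSON Patch operations that attach `volume` to every container of `pod`."""
    spec = pod.get("spec", {})
    operations: List[Dict[str, Any]] = []
    # Append to the array if it has entries, otherwise create it.
    if spec.get("volumes"):
        operations.append({"op": "add", "path": "/spec/volumes/-", "value": volume})
    else:
        operations.append({"op": "add", "path": "/spec/volumes", "value": [volume]})
    mount = {"name": volume["name"], "mountPath": mount_path}
    for index, container in enumerate(spec.get("containers", [])):
        base = f"/spec/containers/{index}/volumeMounts"
        if container.get("volumeMounts"):
            operations.append({"op": "add", "path": base + "/-", "value": mount})
        else:
            operations.append({"op": "add", "path": base, "value": [mount]})
    return operations


def admission_response(review: Dict[str, Any], operations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap the patch in the AdmissionReview response the API server expects."""
    patch = base64.b64encode(json.dumps(operations).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": patch,
        },
    }
```

A webhook built this way would still need a selector so that it only touches Airbyte's job pods, and a shared volume type that supports access from several pods at once.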

@dramirez-bluon

+1 | This feature is really important

@vatsal-kavida

Seems like a big issue. +1 for the addition of this feature, it's a critical one.

@dhl777

dhl777 commented Mar 13, 2024

+1 | This feature is really important. Seems like a big issue. +1 for the addition of this feature, it's a critical one.

@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024
@cgardens
Contributor

cgardens commented Apr 3, 2024

Closed as won't do because we have deprecated the custom dbt transformation feature.

@cgardens cgardens closed this as not planned Apr 3, 2024