📝 💥 prepare for merge into Dagster project #48
@danielgafni, is this going into Dagster directly 😄? I'm wondering how it will interact with dagster-deltalake.
Hey @ion-elgreco! Yes, it is. We don't have any plans for working with … There can definitely be a lot of code reuse. We will get to this at some point.
@danielgafni, I work on the delta-rs project and I'm now starting to use Dagster at work, so I need to figure out which implementation to build on top of. dagster-deltalake was built by Robert, another maintainer of delta-rs, so I was thinking of building on top of that. But it would be nice if we could bring the two together, since dagster-deltalake also returns Polars DataFrames.
Sounds really cool! I have been using … I've started making heavy use of … Currently there are some issues with the base … I would really like to make support for native DeltaLake partitioning better and cleaner, but this depends on … Anyway, let's see if @roeap wants to do anything about this.
@danielgafni, is there somewhere I can reach you (Slack, Discord)? I'd like to ask some questions about the current implementation :) I already see a couple of things we could add across the API, plus some improvements to the Delta integration, so we can discuss those as well.
Hey @ion-elgreco, you can reach me on the Dagster Slack under the same handle as here. I've sent you a PM there.
@danielgafni, I recently noticed that there is some weird behaviour when loading partitions, and yes, I'm happy to try and do something about this :). Could you elaborate a bit on which issue exactly you are referring to? We are not using the UPathIOManager as a base but the DB one; Delta is kind of in between, and since it has its own internal storage handling, it does not benefit from UPath.
In the end I decided to keep building the Delta Lake integration on top of dagster-deltalake, since the DbIOManager is easier to work with and makes more sense. I do, however, now have a working version of LazyFrame support in dagster-polars, since I also plan to use the Parquet IO manager. I still need to expand the test coverage, but hopefully I can push a PR later this week.
Changes:
- BigQueryPolarsIOManager -> PolarsBigQueryIOManager for the sake of consistency
- UPath import as done in Dagster itself