Skip to content
This repository has been archived by the owner on Mar 11, 2024. It is now read-only.

📝 💥 prepare for merge into Dagster project #48

Merged
merged 7 commits into from
Jan 11, 2024
Merged

Conversation

danielgafni
Copy link
Owner

@danielgafni danielgafni commented Jan 11, 2024

Changes:

  • 📝 added lots of docstrings to the IOManagers
  • 💥 renamed BigQueryPolarsIOManager -> PolarsBigQueryIOManager for the sake of consistency
  • defer UPath import as done in Dagster itself

@danielgafni danielgafni changed the title 📝 add API docs, prepare for merge into Dagster project 📝 💥 prepare for merge into Dagster project Jan 11, 2024
@danielgafni danielgafni enabled auto-merge (squash) January 11, 2024 21:37
@danielgafni danielgafni merged commit 2645632 into master Jan 11, 2024
72 of 73 checks passed
@danielgafni danielgafni deleted the api-docs branch January 11, 2024 22:31
@ion-elgreco
Copy link
Contributor

@danielgafni this is going into dagster directly 😄 ?

I am wondering how the interaction of the dagster-deltalake will be with this one?

@danielgafni
Copy link
Owner Author

Hey @ion-elgreco !

Yes, it is.

We don't have any plans for working with dagster-deltalake yet, however, I think we should.

There can definitely be lots of code reusal. We will get to this at some point.

@ion-elgreco
Copy link
Contributor

@danielgafni I work on the delta-RS project and I'm going to now start using dagster at work, so I'll need to start looking in which implementation to build on top of.

The dagster-deltalake has been built by Robert another maintainer of delta-rs, so I was thinking building on top of that. But if we can bring this to together that would be nice. Because the dagster-deltalake also returns polars dataframes

@danielgafni
Copy link
Owner Author

danielgafni commented Jan 17, 2024

Sounds really cool!

I have been using dagster-polars at my current and previous jobs. As I know, it works well in production.

I've started making heavy use of DeltaLake just recently tho.

Currently there are some issues with the base UPathIOManager, which I would really like to solve before proceeding with dagster-polars improvements.

I would really like to make support for native DeltaLake partitioning better and cleaner, but this depends on UPathIOManager refactor.

Anyway, let's see if @roeap wants to do anything about this.

@ion-elgreco
Copy link
Contributor

ion-elgreco commented Jan 21, 2024

@danielgafni do you have something I can reach you on (slack, discord)?

Would like to ask some questions about current implementation? :) I also already see a couple things we can add across the api, and some improvements to the Delta integration so we can discuss that as well.

@danielgafni
Copy link
Owner Author

Hey @ion-elgreco , you can reach out to me on Dagster slack with the same tag as here. I've sent you a PM there.

@roeap
Copy link

roeap commented Jan 23, 2024

@danielgafni - i recently noticed that ther is some weird behaviour when loading partitions, and yes, happy to try and do something about this :).

Could you elaborate a bit what issue exactly youa re referring to, since we are not using the UPathIOManager as a base, but he DB one - delta is kind of inbetween and since it has internal storage handling it does not profit from UPath.

@ion-elgreco
Copy link
Contributor

ion-elgreco commented Jan 23, 2024

I eventually decided for the deltalake integration to continue on top of dagster-deltalake since the DBIOmanager is easier to work with and makes more sense.

I have a working version though for lazy frame in dagster-polars now since I also plan to use the parquet io manager, still need to expand test coverage but hopefully later this week I can push a PR

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants