feat: support larger seed files #161

Merged on Mar 30, 2023 (13 commits)

@henriblancke henriblancke commented Feb 24, 2023

Description

This change uploads seed files to S3 before creating the seed table. This makes larger seeds possible and removes the limitation imposed by Athena's query character limit.

It uploads the seeds as JSON for better type-casting support, since OpenCSVSerde is not good at casting timestamps and inferring correct data types. Because seeds are mostly small files, this should be fine. Writing them as Parquet would add too much complexity to this adapter.
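
For illustration only, here is a minimal sketch of the idea of turning CSV seed rows into typed, newline-delimited JSON. It is not the adapter's code: the PR relies on agate for the conversion, whereas this sketch uses only the Python standard library, and the helper names `infer_value` and `csv_to_json_lines` are hypothetical. The upload to S3 is omitted.

```python
import csv
import io
import json
from datetime import datetime


def infer_value(text):
    """Best-effort cast of one CSV cell to a JSON-friendly value (hypothetical helper)."""
    if text == "":
        return None  # treat empty cells as null
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    try:
        # Normalize ISO-like timestamps to a single string format.
        return datetime.fromisoformat(text).isoformat(sep=" ")
    except ValueError:
        return text  # fall back to the raw string


def csv_to_json_lines(csv_text):
    """Convert CSV text (with a header row) to one JSON object per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(
        json.dumps({key: infer_value(value) for key, value in row.items()})
        for row in reader
    )


if __name__ == "__main__":
    sample = "id,name,created_at\n1,alice,2023-02-24 10:00:00\n"
    print(csv_to_json_lines(sample))
```

Because each value is emitted as a JSON number, boolean, string, or null, the table definition can declare real column types instead of relying on a CSV SerDe to cast text.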

Checklist

  • You followed contributing section
  • You added unit testing when necessary
  • You added functional testing when necessary

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
@henriblancke changed the title from "feat: upload seeds to s3" to "feat: support larger seed files" Feb 24, 2023
@nicor88
Member

nicor88 commented Feb 25, 2023

Do seeds still support passing the seed file as CSV?

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
@henriblancke
Author

henriblancke commented Feb 27, 2023

Do seeds still support passing the seed file as CSV?

Yep, files are still passed as CSV; agate converts them to JSON before the file is uploaded to S3. So from the end user's perspective, you still add your seed files to dbt as CSV files.

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
Member

@nicor88 nicor88 left a comment

Looks good, I left some small nits.

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
nicor88 previously approved these changes Feb 28, 2023
Member

@nicor88 nicor88 left a comment

💯

@nicor88 nicor88 self-requested a review March 6, 2023 09:54
@henriblancke
Author

@nicor88 @Jrmyy anything I can do here as a follow up?

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
@nicor88
Member

nicor88 commented Mar 21, 2023

@henriblancke please rebase onto main, then I will test and review this feature.

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
@henriblancke
Author

@nicor88 rebased ✅

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
@henriblancke
Author

@nicor88 fixed merge conflicts and rebased again. Let me know if there is anything else I can do to help.

@nicor88
Member

nicor88 commented Mar 29, 2023

@henriblancke nice job here. Please address this #161 (comment), and then we can merge if @Jrmyy also gives the OK.
I tried the feature with a 1 MB CSV and it works fine, data types included!

@Jrmyy
Member

Jrmyy commented Mar 30, 2023

I will run a test this morning and let you know!

@Jrmyy
Member

Jrmyy commented Mar 30, 2023

The test I ran works, so once my comment is resolved, this is a go for merging 👍🏻
Thanks a lot for the contribution 🔥

@nicor88
Member

nicor88 commented Mar 30, 2023

@henriblancke please resolve this one, #161 (comment), and we will merge this and include it in the next release.

nicor88 and others added 2 commits March 30, 2023 13:31
Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
@henriblancke
Author

@henriblancke please resolve this one, #161 (comment), and we will merge this and include it in the next release.

@Jrmyy @nicor88 thanks again for the review, I've addressed #161 (comment)

@nicor88 nicor88 requested a review from Jrmyy March 30, 2023 15:03