Feature/dbt deps tarball #4689

Merged
merged 53 commits into from Dec 7, 2022

timle2
Contributor

@timle2 timle2 commented Feb 5, 2022

resolves #4205

add new dbt.deps type: url to internally hosted tarball #420

Continued from
#4220

Revision 3 added Nov 6 2022

Proposed solution for feature request 4205

Description

Enable direct linking to tarball urls in packages.yml, for example:

# Contrived example: you would normally install dbt-utils from the hub.
# A public tarball is used here for illustration only!
# In practice this would be a tarball hosted on an internal network.
packages:
  - tarball: https://codeload.github.com/dbt-labs/dbt-utils/tar.gz/0.6.5
    name: 'dbt_utils_065'

Rationale:

  • Self-hosted dbt projects in larger enterprise environments often have no connection to the internet, so the dbt Hub won't work.
  • dbt users in larger enterprise environments like to build internal private packages for non-public use (for example, to help out other dbt users in the company with specific functionality).
  • Installing packages from git is not a good option at scale in larger enterprise environments.
  • An internal file hosting service (such as an internal Artifactory service or internal cloud storage buckets) can easily be configured to host packages for install during deployment, so let's give dbt users a way to install from a direct tar file link.

Sketching out doc changes here:
https://github.com/timle2/docs.getdbt.com/blob/dbt-docs-tarball-package-updates/website/docs/docs/building-a-dbt-project/package-management.md#tar-files

Checklist

@timle2 timle2 requested a review from a team as a code owner February 5, 2022 21:51
@timle2 timle2 requested a review from a team February 5, 2022 21:51
@timle2 timle2 requested a review from a team as a code owner February 5, 2022 21:51
@timle2 timle2 requested review from nathaniel-may and stu-k and removed request for a team February 5, 2022 21:51
@cla-bot

cla-bot bot commented Feb 5, 2022

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@timle2 timle2 mentioned this pull request Feb 5, 2022
1 task
@emmyoop
Member

emmyoop commented Feb 7, 2022

@cla-bot check

@cla-bot

cla-bot bot commented Feb 7, 2022

The cla-bot has been summoned, and re-checked this pull request!

@emmyoop
Member

emmyoop commented Feb 7, 2022

@timle2 thanks for opening up the PR with your new username! I've double checked and you did indeed sign the CLA. We have your username listed in the correct format. Looks like you may need to configure with your email to get our bot to process it.

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@timle2
Contributor Author

timle2 commented Feb 14, 2022

Issues referenced no longer apply to revision 3 - leaving for history

TODO: #4220 (comment)

We've updated how we log to use the Events module. You'll need to create a new log event for this, and everywhere else you log, and use fire_event to kick off writing the log. We have a README with some good info on our logging system.

✅ Switched all the logging over to the Events / fire_event.

TODO #4220 (comment)

As a standard we don't use assert in our code other than for tests. They shadow the actual exceptions and throw an AssertionError instead. Please rewrite the assert statements to catch relevant exceptions (create new ones if needed!).
In this specific case, I do like the use of is_tarfile to check the file! Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile?

✅ Moved over to properly defined exceptions. For the most part leveraging dbt.exceptions.DependencyException.

Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile?

No, I think straight failure without retry makes the most sense. In almost all cases the user would want to know that the tarfile is invalid, with dbt deps failing, as opposed to hitting the server multiple times to confirm. A transmission error could result in a corrupted tar file; however, this is such an edge case that I'm not sure it's worth guarding against.

@emmyoop emmyoop requested review from emmyoop and removed request for stu-k February 14, 2022 17:24
@emmyoop
Member

emmyoop commented Feb 15, 2022

Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile? No I think straight failure without retry makes the most sense. In almost all cases the user would want to know that the tarfile is invalid, with dbt deps failing, as opposed to hitting the server multiple times to confirm. A transmission error could result in a corrupted tar file, however this is such an edge case I'm not sure it's worth guarding against.

I agree that this is rare, but it does happen. The user will still find out that the tar file is invalid if that is indeed the issue, rather than a transmission error we could recover from. Rare as it is, it is worth protecting against: in dbt Cloud, deps are installed every time a job runs, so if a transmission error can be recovered from with a simple retry, we want to retry rather than fail an entire job.
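
As a rough sketch of the retry idea described here (not the helper dbt-core actually uses), a bounded retry wrapper could look like the following. retry, CorruptDownloadError and make_flaky_download are hypothetical names made up for this example; the "download" is simulated so the snippet runs offline.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class CorruptDownloadError(Exception):
    """Hypothetical error raised when a downloaded archive fails validation."""


def retry(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call `action` until it succeeds or `max_attempts` is exhausted.

    Only exceptions listed in `retry_on` trigger another attempt; the last
    failure is re-raised so the user still sees the real error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
    raise RuntimeError("unreachable")


def make_flaky_download(failures: int) -> Callable[[], str]:
    """Simulate a download that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def download() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise CorruptDownloadError("archive failed validation")
        return "package.tar.gz"

    return download


if __name__ == "__main__":
    download = make_flaky_download(failures=1)
    print(retry(download, retry_on=(CorruptDownloadError,), max_attempts=3))
```

With this shape, a genuinely invalid tarball still fails after the final attempt, while a one-off transmission error is absorbed by a retry.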

Unrelated: You'll want to pull in the latest on the main branch as we've added some auto formatting/checking and that is why you have a failing test. You can read more about it here.

Member

@emmyoop emmyoop left a comment

This is great progress! I know you're not done yet but I've left a few comments to change course a bit. Let me know if you have any questions!

core/dbt/deps/tarball.py (outdated, resolved)
core/dbt/events/types.py (outdated, resolved)
core/dbt/events/types.py (outdated, resolved)
@timle2 timle2 closed this Dec 3, 2022
@timle2 timle2 reopened this Dec 3, 2022
@timle2
Contributor Author

timle2 commented Dec 3, 2022

@timle2 I just want to add that you only need to open an issue for a new feature here and a PM will get the docs written up.

However, adding info on tarball.py to the README would be perfect.

@emmyoop Thanks for flagging this! I've added/altered the readme with dc8ba0a

@timle2 timle2 requested a review from emmyoop December 3, 2022 23:06
def _install(self, project, renderer):
    metadata = self.fetch_metadata(project, renderer)

    tar_name = "{}.{}.tar.gz".format(self.package, self.version)
Contributor

@iknox-fa iknox-fa Dec 4, 2022

Minor detail: We prefer using the more modern f-string style python string formatting for new commits to dbt-core.

Contributor Author

Thanks Ian - I've changed it over to f-string with 8787ba4
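
For reference, the change requested in this thread amounts to something like the sketch below. The two functions are hypothetical wrappers written for this example; only the tar_name expression comes from the diff under review.

```python
def tar_name_with_format(package: str, version: str) -> str:
    """Original style from the diff: str.format with positional fields."""
    return "{}.{}.tar.gz".format(package, version)


def tar_name_with_fstring(package: str, version: str) -> str:
    """Same result written as an f-string."""
    return f"{package}.{version}.tar.gz"


if __name__ == "__main__":
    print(tar_name_with_format("dbt_utils", "0.6.5"))
    print(tar_name_with_fstring("dbt_utils", "0.6.5"))
```

Both forms build the same string, so the switch is purely stylistic.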

metadata = self.fetch_metadata(project, renderer)

tar_name = "{}.{}.tar.gz".format(self.package, self.version)
tar_path = os.path.realpath(os.path.join(get_downloads_path(), tar_name))
Contributor

Minor detail: We're trying to avoid using os.path for new commits to dbt-core (preferring pathlib instead). This is due to the many odd caveats when running in non-POSIX environments.

Contributor Author

This block is refactored from [core/dbt/deps/registry](https://github.com/dbt-labs/dbt-core/blob/16f529e1d4e067bdbb6a659a622bead442f24b4e/core/dbt/deps/registry.py#L62). I'm not certain it's my place to be doing this os.path -> pathlib conversion, since I'm far from the expert on downstream effects on registry.py. But I took a stab with b06a662, and we can revert if you change your mind on this.
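
A minimal sketch of what an os.path -> pathlib conversion of the tar_path line could look like, assuming the downloads directory is passed in as a plain string. The function names and the downloads_dir parameter are hypothetical stand-ins (the diff uses get_downloads_path()), and this is not necessarily what b06a662 does.

```python
import os
import tempfile
from pathlib import Path


def tar_path_os(downloads_dir: str, tar_name: str) -> str:
    """Original style from the diff: os.path.join followed by realpath."""
    return os.path.realpath(os.path.join(downloads_dir, tar_name))


def tar_path_pathlib(downloads_dir: str, tar_name: str) -> Path:
    """pathlib equivalent: join with `/`, then resolve symlinks and `..`."""
    return (Path(downloads_dir) / tar_name).resolve()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as downloads_dir:
        name = "dbt_utils.0.6.5.tar.gz"
        print(tar_path_os(downloads_dir, name))
        print(tar_path_pathlib(downloads_dir, name))
```

One thing to watch for in a conversion like this is that the pathlib version returns a Path rather than a str, so downstream callers that expect a string may need str(...) or os.fspath(...).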

Contributor

@iknox-fa iknox-fa left a comment

LGTM other than the few mentioned minor details.
Edit: removed approval since this was tagged as language; @emmyoop's got you covered (it'd be nice to fix the f-strings and pathlib bits, though).

@iknox-fa iknox-fa requested review from iknox-fa and removed request for iknox-fa December 4, 2022 16:07
'tarball' as version so that the temp files format nicely:
[tempfile_location]/dbt_utils_2..tar.gz # old
vs
[tempfile_location]/dbt_utils_1.tarball.tar.gz # current
core/dbt/deps/tarball.py (outdated, resolved)
@Mathyoub
Contributor

Mathyoub commented Dec 5, 2022

Changes look great @timle2! Just left one minor comment to lowercase "tarball" in the output!

@timle2
Contributor Author

timle2 commented Dec 6, 2022

Tests failing with b06a662 due to tests in test/unit/test_deps.py that still referenced a version input to tarball class. Fixed with e9f2121. pytest passing now on local machine.

@dbeatty10
Contributor

@timle2 we'll do some investigation and troubleshooting on our end for the two CI tests that aren't passing:

  • Generate CLI API docs / check if generation needed
  • Tests and Code Checks / integration test / python 3.8 / windows-latest

@dbeatty10
Contributor

dbeatty10 commented Dec 6, 2022

@timle2 we'll do some investigation and troubleshooting on our end for the two CI tests that aren't passing:

  • Generate CLI API docs / check if generation needed
  • Tests and Code Checks / integration test / python 3.8 / windows-latest

👍 After investigation, we are good from the perspective of CI tests.

Justification

Confirmed with @emmyoop and @stu-k that Generate CLI API docs can be ignored for this PR. Alternatively, if you do a fresh merge with main, it will skip this particular check in the future.

The windows-latest Tests and Code Checks was flaky and addressed by just manually re-running it. A future issue/PR combo with these updates should be able to address that flakiness.

@timle2
Contributor Author

timle2 commented Dec 7, 2022

@timle2 we'll do some investigation and troubleshooting on our end for the two CI tests that aren't passing:

  • Generate CLI API docs / check if generation needed
  • Tests and Code Checks / integration test / python 3.8 / windows-latest

👍 After investigation, we are good from the perspective of CI tests.

Justification

Confirmed with @emmyoop and @stu-k that Generate CLI API docs can be ignored for this PR. Alternatively, if you do a fresh merge with main, it will skip this particular check in the future.

The windows-latest Tests and Code Checks was flaky and addressed by just manually re-running it. A future issue/PR combo with these updates should be able to address that flakiness.

Thanks for this Doug! 🙏

@timle2
Contributor Author

timle2 commented Dec 7, 2022

@emmyoop I think we are complete on the PR. Lmk if there is anything needed from me before Merge!

Member

@emmyoop emmyoop left a comment

@timle2 thank you again for all your work (and persistence) on this! This is such a great community contribution! We'll be including it in the 1.4 release!

@emmyoop emmyoop merged commit 99f27de into dbt-labs:main Dec 7, 2022
@timle2
Contributor Author

timle2 commented Dec 7, 2022

@timle2 thank you again for all your work (and persistence) on this! This is such a great community contribution! We'll be including it in the 1.4 release!

Amazing! Thanks to you as well for your patience and for guiding me through all the first-time contributor stuff. Looking forward to making some more (and much smoother) contributions in the future!

@jtcohen6
Contributor

jtcohen6 commented Dec 7, 2022

Nice work team !!

Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] add new dbt.deps type: url to internally hosted tarball
9 participants