Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1512] [Bug] The profiles.yml file is read multiple times. #6263

Closed
2 tasks done
akashrn5 opened this issue Nov 16, 2022 · 5 comments
Closed
2 tasks done

[CT-1512] [Bug] The profiles.yml file is read multiple times. #6263

akashrn5 opened this issue Nov 16, 2022 · 5 comments
Labels
duplicate This issue or pull request already exists tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality

Comments

@akashrn5
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a dbt run/compile/deps/docs generate are executed the profiles.yml is being read twice which is unnecessary.

Expected Behavior

The reading of profiles.yml has to be once per operation.

Steps To Reproduce

  1. In any environment
  2. setup a dbt project
  3. Add log in dbt.config.profile.read_profile
  4. run dbt run, or dbt compile, the profiles.yml file will be read twice.

Relevant log output

No response

Environment

- OS: mac OS
- Python: 3.8.9
- dbt: 1.4.0a1

Which database adapter are you using with dbt?

spark

Additional Context

No response

@akashrn5 akashrn5 added bug Something isn't working triage labels Nov 16, 2022
@github-actions github-actions bot changed the title [Bug] The profiles.yml file is read multiple times. [CT-1512] [Bug] The profiles.yml file is read multiple times. Nov 16, 2022
@jtcohen6 jtcohen6 added Refinement Maintainer input needed and removed triage labels Nov 16, 2022
@jtcohen6
Copy link
Contributor

@akashrn5 Thanks for opening. You're right, profiles.yml is accessed twice today during config loading. While this does feel unnecessary, it's because we access the UserConfig first (to resolve Flags), and then load the actual profiles. To clarify, does this cause any functional issues for you today? Or just that it feels suboptimal?

It looks like the approach you're taking in #6264 is to hold onto the raw_profiles in memory (Python dict from yaml, before Jinja rendering), and re-render it at the appropriate time, without needing to re-access the files. I'll be interested to hear what some of the Core engineers think about this approach.

There's definitely merit in being able to pass a raw or already-rendered connection profile, as a Python dict from memory, into config generation methods, rather than needing to read it from the file system at all. That's a desirable outcome of our "API-ification" work.

Related issues:

@akashrn5
Copy link
Author

@jtcohen6 thanks for your views on this. I am a new user to dbt. I observed this thing during debugging the code and i felt its not ok to read the file two times when you have the option to read once and reuse it. It didn't create any functional issue for me as such. But, i thought we should avoid it. Since profiles.yml file is the base to start your any dbt operation, i guess keeping in memory won't harm in anyway. yeah, lets here it from other engineers as well.

@jtcohen6
Copy link
Contributor

@akashrn5 Heard! Amazing to see you, as a newer dbt user, already diving into this part of the code & opening a PR to improve it :)

In terms of sequencing this work:

  • We really like the idea of passing a dictionary of profile contents directly into config instantiation. This will make it much easier to programmatically generate credentials/configs, without requiring file access, and it makes the logic much easier to follow. We're likely to do something similar as part of our "API-ification" work (the issue linked above).
  • Let's keep UserConfig and Profile logically separate, for now, because we know we want to eventually move UserConfig elsewhere. The fact that we need to read profiles.yml twice today remains a good signal that we've coupled two things that shouldn't be. (Even after this change, we'll still need to look for it in profiles.yml, for backwards-compatibility reasons, at least for a few versions.)

Given that - we're going to hold off on merging the exact changes you're proposing in #6264, until we've had a chance to push the API-ification work around profiles to our feature branch (feature/click-cli).

@jtcohen6 jtcohen6 added tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality and removed bug Something isn't working labels Nov 17, 2022
@akashrn5
Copy link
Author

@jtcohen6 Thank you for your welcome. DBT is a great project and would love to work on it, develop and optimize it.

I understood your point. Since you are already working on the refactoring part, you want the current changes to be as it is. So i guess it will be open then. But if the "API-fication" is going to take long time and won't be considered for next release, it would be ok to consider this change. This is just a suggestion as i'm not aware of the timeline, if its on the finish line, thats really great. Also i would love to work on the feature if there are some tasks where i can contribute.

Thanks again :)

@jtcohen6 jtcohen6 added duplicate This issue or pull request already exists and removed Refinement Maintainer input needed labels Feb 17, 2023
@jtcohen6
Copy link
Contributor

Let's remove the double read of profiles.yml as part of splitting out UserConfig, and cleaning up our runtime config / initialization story: #6207 (comment)

Closing in favor of that issue

@jtcohen6 jtcohen6 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality
Projects
None yet
2 participants