[CT-2114] In-repo documentation (readmes): architecture/sequence diagram of core execution flow #6969

Closed
Tracked by #6706
jtcohen6 opened this issue Feb 14, 2023 · 4 comments
Labels
python_api (Issues related to dbtRunner Python entry point), spike

Comments

@jtcohen6
Contributor

jtcohen6 commented Feb 14, 2023

I'm envisioning a diagram that includes:

  • breakdown of major steps in core execution flow
  • inputs & outputs of each step

Is this a "flow diagram" or a "sequence diagram"? I didn't even realize that these are different things, but plenty of people have thought about this in more depth than I have.

I don't think we should aim to auto-generate this. It should be one level higher than specific methods / codepaths. Rather, this should instruct & inform on the patterns we're seeking to actualize in our execution APIs.

Potential inspiration (internal Notion docs / diagrams from 2021!):

Out of scope for this: flow/sequence diagram that breaks down parsing. This is also important, but for the sake of core's execution flow, parsing can be understood as a single step with clear inputs & outputs.

@github-actions github-actions bot changed the title from "In-repo documentation (readmes): architecture/sequence diagram of core execution flow" to "[CT-2114] In-repo documentation (readmes): architecture/sequence diagram of core execution flow" Feb 14, 2023
@jtcohen6 jtcohen6 added the Team:Execution and python_api (Issues related to dbtRunner Python entry point) labels Feb 14, 2023
@stu-k
Contributor

stu-k commented Feb 20, 2023

@jtcohen6 As part of backlog grooming, we aren't going to point this issue, since it has no acceptance criteria and we aren't sure what work needs to be done.

Outstanding questions

  • who will this diagram be for (particularly helpful to understand the necessary complexity of the diagram)
  • where should this live
  • who should own this
  • how will this be maintained over time

@jtcohen6
Contributor Author

@stu-k Good questions, thanks for asking them!

who will this diagram be for (particularly helpful to understand the necessary complexity of the diagram)

This is not intended for the average end user of dbt. The target audience is:

  • Maintainers of dbt-core (us & future team members)
  • Our engineering colleagues at dbt Labs, who want to develop a mental model of dbt-core's execution flow
  • Community contributors to dbt-core (more technical than the average end user of dbt)

where should this live

A README in this repository

who should own this

@dbt-labs/core-execution !

how will this be maintained over time

I'd leave that for the Core-Execution team to determine. It might inform implementation decisions about which tool to use, and how to keep it up-to-date. As I said in the issue description, I don't think this is something we should be auto-generating from code; we could look to define it using code that lives in this repo, but I'd also be happy with a Lucid diagram.

@jtcohen6
Contributor Author

To make the point more strongly — IMO these are medium/high-priority exit criteria for Phase 2 of API-ification:

  1. Our team is capable of making a high-level diagram of the dbt-core task/command execution flow. Any member of the Core-Execution team can contribute to it and reason about it.
  2. This diagram exists for public reference. Improvements to our actual codepaths & abstractions are essential, but if we don't do a good job of documenting them, I don't believe we'll have accomplished our goal as maintainers of an application (erstwhile "tool" and future "library") that lots of people want/need to comprehend and contribute to.

@stu-k stu-k added the spike label Feb 28, 2023
@stu-k
Contributor

stu-k commented Feb 28, 2023

We're going to call this a spike with the assumption that the output is some sort of diagram that depicts the dbt-core task/command execution flow. Timebox at 2 days.

@jtcohen6 jtcohen6 closed this as not planned Dec 3, 2023