Terminology: pipeline #516

@treysp

It's critical for the documentation to be internally consistent with respect to terminology. We're not trying to figure out the "true" definition of terms in the sense of how the words are used in the wild; we just need a specific and stable definition for use in the docs.

The term pipeline is ubiquitous in both data engineering lingo and our docs. We don't need to come up with a perfect definition, but we want to make sure it's being used at the correct level of generality in the docs.

Specifically, I'd like to clarify the relationship between pipelines and other terms we're using, and I'd like to understand how pipelines are enumerated (that is, how we would count them).

Here is the definition in the SQLMesh glossary:
"The set of tools and processes for moving data from one system to another. Datasets are then organized, transformed, and inserted into some type of database, tool, or app, where data scientists, engineers, and analysts can access the data for analysis, insights, and reporting."

Here is Iaroslav's comment on a concepts/overview.md pull request:
I don't think we have a concept of a "pipeline" internally, but even if we did, it would be something like a series of execution steps applied to a single model: e.g. create a table, evaluate the model's logic, write results, run audits, etc.
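
To make the model-level reading in that comment concrete, here is a minimal sketch of a "pipeline" as an ordered series of steps applied to one model. Every name in it (`ModelPipeline`, the step labels, the `orders` model) is hypothetical and chosen for illustration; it is not SQLMesh's API and does not reflect how SQLMesh is implemented internally.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical step labels taken from the comment above; not SQLMesh identifiers.
DEFAULT_STEPS = [
    "create_table",
    "evaluate_model_logic",
    "write_results",
    "run_audits",
]


@dataclass
class ModelPipeline:
    """One pipeline per model: an ordered list of steps for a single model."""

    model_name: str
    steps: List[str] = field(default_factory=lambda: list(DEFAULT_STEPS))

    def run(self) -> List[str]:
        # Record each step in order instead of doing real work,
        # so the sketch stays self-contained.
        return [f"{self.model_name}: {step}" for step in self.steps]


if __name__ == "__main__":
    for line in ModelPipeline("orders").run():
        print(line)
```

Under this reading, "how many pipelines are there?" has the same answer as "how many models are there?", which is what makes the term usable in specific technical statements.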

Questions to answer in this issue:

  1. What is the right level of generality for the docs?
    • We could keep it very general (closer to the glossary definition), which would preclude statements like "there are X models in this pipeline." This would allow us to use it frequently, but not in specific technical contexts.
    • We could define it at the model level (each model constitutes one pipeline). This would make it very clear how to use it in technical contexts, but would restrict how we use it in introductory paragraphs and other non-technical contexts.
  2. How many pipelines are in each of A-D? Is that even a sensible question?

[Image: the scenarios labeled A-D referenced in question 2]

Labels: Documentation (improvements or additions to documentation)
