We are re-building dbt-docs for speed & scale #13080

eliasdefaria · 2026-06-01T13:56:39Z

Collaborator

Drew built the first version of dbt-docs in 2018, and how it’s worked! You run dbt docs generate && dbt docs serve on your command line, and dbt generates metadata about your project and an HTML site to explore it. “Auto-generated and easy-to-host data documentation” was one of the biggest reasons teams adopted dbt over the years. It also motivated them to add descriptions and other metadata to their version-controlled project code, much to the benefit of LLMs today.

Meanwhile, the vast majority of dbt developers are generating dbt code with the help of AI, and dbt projects are larger than ever before (mo’ data, mo’ problems). For the humans on data teams, project documentation is more than a useful resource for visually inspecting the DAG and searching across business logic: it’s an essential way to see what their agents are up to.

But you and your agents liked adding to the docs so much that the original design simply didn’t scale. Docs worked by loading the entire manifest.json and catalog.json into the browser, then rendering the static website based on all the valuable metadata packed into those artifacts. When Drew first built it, nobody imagined a single manifest could be 50 MB, let alone a gigabyte. But that is exactly where many large dbt projects have ended up. We have heard from countless teams running dbt at scale that docs can become slow, unstable, and difficult to use as projects grow.

Today, we open sourced the Fusion runtime as Core v2.0, and that made two things clear:

We could not ship a major version upgrade of the framework without full feature parity, including dbt Docs.
We finally had the right technical foundation to rebuild dbt-docs for speed and scale.

Fusion is a Rust-based runtime built for scale: it has the performance characteristics large projects need, and it still produces the same rich manifest data people rely on. The other key difference is that docs no longer depends on loading massive manifest.json and catalog.json blobs in the browser. Instead, the same metadata is emitted as Parquet artifacts that are scalable, joinable, and analyzable, so the UI can query only what it needs.

That’s why, this time around, Grace and I built dbt docs for v2. And oh how we think it will work!

“This is the most fun I had in the last month” - Elias

Goodbye static site, hello server

I want to analyze the dependencies for my project with thousands of models, without crashing my browser or bloating my agent’s context window

dbt-docs was originally powered by raw project artifacts (giant JSON blobs like manifest.json), inlined in their entirety into the HTML payload and loaded directly into your browser. This was expedient and made it easy to self-host, but it broke down once projects reached even medium scale.

In 2023, we solved the scaling problem for customers of dbt platform with dbt Catalog, featuring a new faster React frontend and powered by the dbt platform Discovery API on the backend. This was great for solving the immediate need of exploring big projects with speed, but there wasn’t a natural pathway from the small self-hosted thing to the large distributed solution.

The next generation of dbt-docs must instead be powered by a real, unified backend that stores project context and metadata in a way that’s easily consumable by browsers, agents, and humans alike.

It must also include richer (bigger) metadata, including column-level lineage, inferred column grains, and type-aware impact analysis powered by the native SQL comprehension in the dbt Fusion engine.
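To make “column-level lineage” and “impact analysis” more concrete, here is a minimal sketch of a downstream impact query over a column lineage graph. Everything in it is hypothetical: the edge list, the `model.column` naming, and the `downstream_impact` helper are illustrations, not the shape of Fusion’s column lineage artifact. It is also a plain graph traversal and does not model column types.

```python
from collections import deque

# Hypothetical column-level lineage (not Fusion's artifact format): each edge
# points from an upstream column to a column derived from it.
COLUMN_EDGES = [
    ("stg_orders.order_id", "fct_orders.order_id"),
    ("stg_orders.amount", "fct_orders.amount"),
    ("fct_orders.amount", "rpt_revenue.total_revenue"),
    ("stg_customers.customer_id", "dim_customers.customer_id"),
]


def downstream_impact(edges, changed_column):
    """Return every column that transitively depends on changed_column."""
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    impacted = set()
    queue = deque([changed_column])
    while queue:  # breadth-first walk down the lineage graph
        column = queue.popleft()
        for child in children.get(column, []):
            if child not in impacted:  # skip columns already reached via another path
                impacted.add(child)
                queue.append(child)
    return impacted


print(sorted(downstream_impact(COLUMN_EDGES, "stg_orders.amount")))
# ['fct_orders.amount', 'rpt_revenue.total_revenue']
```

Changing `stg_orders.amount` flags both the fact column and the report column built on it, while unrelated columns such as `stg_orders.order_id`'s descendants are left out.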

dbt-docs will look less like a static web page and more like a database and server. It should be easy to get started on your own laptop, then scale up naturally to a stateful production deployment.

Why a database? Three reasons:

It’s efficient. Rather than loading all project metadata into the browser at once, the new UI can make selective requests (queries) to the underlying database for just the project context it needs: lineage, information about a single model, etc.
It can be stateful. It’s fast and easy to spin up an empty database and load in the current project context for quick exploration. When hosted in production and hooked up to metadata ingestion pipelines, that same database can persist historical metadata, power change tracking over time, and span multiple dbt projects. We believe this should be possible for the entire dbt community to run and host themselves. dbt Catalog will continue to provide a fully managed, scaled, and stateful version in dbt platform, exposing that rich context over API, CLI, and MCP with high uptime and reliability.
You can ask complex questions by writing SQL, the language you know and love. Parquet is a columnar format designed to be queried. In the future, this could power other metadata features of the dbt framework (including checks), and help agents get relevant dbt project context during iterative development.
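As a rough illustration of asking questions in SQL, here is a minimal sketch that uses Python’s built-in sqlite3 as a stand-in query engine. The `nodes` and `edges` tables and their columns are hypothetical; this post does not describe the schema of the Parquet artifacts, and a real setup would more likely point a Parquet-capable engine at them. The recursive CTE fetches the upstream lineage of a single model, the kind of selective request described above, rather than loading every node.

```python
import sqlite3

# Hypothetical tables standing in for the Parquet index; not the real dbt schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (unique_id TEXT PRIMARY KEY, resource_type TEXT);
    CREATE TABLE edges (parent_id TEXT, child_id TEXT);
    INSERT INTO nodes VALUES
        ('source.raw.orders', 'source'),
        ('model.stg_orders', 'model'),
        ('model.fct_orders', 'model'),
        ('model.rpt_revenue', 'model');
    INSERT INTO edges VALUES
        ('source.raw.orders', 'model.stg_orders'),
        ('model.stg_orders', 'model.fct_orders'),
        ('model.fct_orders', 'model.rpt_revenue');
    """
)

# Walk upstream from one model only. The DAG is acyclic, so recursion terminates.
UPSTREAM_SQL = """
WITH RECURSIVE upstream(unique_id, depth) AS (
    SELECT parent_id, 1 FROM edges WHERE child_id = ?
    UNION
    SELECT e.parent_id, u.depth + 1
    FROM edges AS e
    JOIN upstream AS u ON e.child_id = u.unique_id
)
SELECT u.unique_id, n.resource_type, MIN(u.depth) AS depth
FROM upstream AS u
JOIN nodes AS n ON n.unique_id = u.unique_id
GROUP BY u.unique_id, n.resource_type
ORDER BY depth
"""


def upstream_of(model_id):
    """Return (unique_id, resource_type, distance) for every ancestor, nearest first."""
    return conn.execute(UPSTREAM_SQL, (model_id,)).fetchall()


for row in upstream_of("model.rpt_revenue"):
    print(row)
```

Running it prints the three ancestors of `model.rpt_revenue`, nearest first. `MIN(depth)` keeps the shortest distance when a node is reachable through more than one path.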

Docs v2 gets better when it is connected

One important thing about docs v2: it still works locally, and it is still something you can self-host. That matters. A lot of people love dbt Docs precisely because it is generated from their project and easy to serve wherever they want.

But there is also a reality we should be honest about: the best documentation experience is not always possible from local artifacts alone.

Some context lives outside your repo. Some of it lives across projects. Some of it lives in BI tools downstream of dbt. And some of it lives in orchestration history, once dbt can see how your project is actually being run over time.

When docs v2 is connected to dbt Platform, we can start bringing that context into the experience. That unlocks things like:

Cross-project lineage visualization
Auto exposures
Historical runtime metadata for models orchestrated in dbt Platform

We are going to build out these connections intentionally. Not because we want self-hosted docs to feel incomplete, but because we do not want “self-hosted” to mean “cut off from the best experience.” If your team wants to host your own catalog, great. If you also want to connect it to Platform and get richer context, that path should be easy too.

You may also see a few places in the product that encourage you to try the connected experience. We know that can feel like advertising space, so let me say the quiet part plainly: We believe the connected experience can be better, and we want people to be able to discover those additional capabilities, even if they haven’t seen a demo or read our roadmap.

A simple early example is column-level lineage. In v2, dbt-docs can visualize column lineage when Fusion’s SQL comprehension has produced the column lineage artifact locally. This feature is free, but it does require you to be on Fusion and logged in.

The goal is not to make docs less open. The goal is to make docs much more useful, whether you run it locally, connect it to Platform, or eventually decide the managed experience is the right fit for your team.

The fun part: try it out!

Here’s the big idea: this still feels like dbt docs generate && dbt docs serve.

dbt parse|compile|run|build --write-index == dbt docs generate. In addition to (and maybe someday instead of) generating metadata artifacts as giant JSON files, any command that uses --write-index will push your project metadata into scalable Parquet artifacts (on your laptop, in a database you’re hosting yourself, or in a data store hosted in dbt platform). One key update is that different commands produce different types and amounts of metadata, all of which can be used to enrich the documentation experience. (Hint: use --static-analysis strict in Fusion to generate column-level lineage metadata.)
dbt docs serve → Load up the new, beautiful, and performant UI, served directly from the dbt Core v2.0 CLI

Here’s what it looks like in action today:

models_tab.mov

macros_tab.mov

We're in alpha right now, and there's more work ahead before this is ready for primetime. The biggest things we’re looking for feedback on:

Does any aspect of it feel like a regression from docs v1? Is there a feature you love that’s missing?
If you could query the parquet directly with SQL, what would you build? What else would you want to analyze? Would you use a standalone SQL query interface in the docs site for this? (see demo below)
Does the feature gating make sense? Does this direction concern you? Or is enriching local documentation with a platform connection a vision you’re excited about?

Plus, here’s a sneak peek at a new feature we’re thinking about that relates to question two… 🙂

query_tab_demo.mp4

jc00ke · 2026-06-03T21:16:09Z


I'm commenting to advocate for a static option. We work in a very constrained environment (state department of health) and we don't have access to compute to run the docs server. We can host static sites via the forge (GitLab Pages in our case) which is what we're doing now on dbt-core. Thanks for your consideration!

3 replies

eliasdefaria Jun 5, 2026
Collaborator Author

Hey @jc00ke! Thanks so much for your feedback. This has been the most common piece of feedback thus far, so we're considering iterating to introduce a static option for teams that don't need the scale. Can I ask about the scale of your current dbt deployment? How many models are you deploying, and have you found the docs v1 site to be slow from a performance standpoint (e.g., initial load, usability when switching between pages)?

jc00ke Jun 6, 2026

Hi @eliasdefaria! It's not that we don't need the scale, it's that we don't have compute resources available to us, so if there ended up not being a static option, we'd either have to stay on dbt-core, use something else, or build our own. As for number of models, dozens... fewer than 100. @jenna-jordan can answer that better, plus we just spun up another project that'll probably have at most dozens, not hundreds.

I don't think we've noticed perf issues with initial or subsequent page loads, but we have seen issues with building the manifest.json file. File size tops out at 2 MB (I think more like 1.4 or 1.8, can't recall exactly), but we use Vertica and the view that compiles comments on objects is terribly implemented. I've rewritten it and need to merge it into our fork, but that's an adapter issue that only affects dbt docs generate.

jenna-jordan Jun 8, 2026

So our project has ~250 models, and we are not experiencing any of the common performance issues with the docs site that larger projects often run into. I anticipate that our dbt projects will always be in the hundreds-of-models range, never the thousands, and I would argue that the vast majority of dbt projects, particularly those that rely on open source Core (none of the enterprise product offerings), will tend toward the smaller size. I'll point toward this (admittedly a couple of years out of date) visualization to provide some evidence for my point:

(source: https://www.getdbt.com/blog/introducing-cross-platform-dbt-mesh)

Static sites are great, because at this point if you are using a platform like GitLab or GitHub for your code sharing & version control, you get a static site for free with it (GitHub/Lab Pages), and this is very easily integrated into CI pipelines. In essence, minimal extra lift to get an operational docs site, and reasonably within a data team's expected skill set.

I fully agree that dbt docs should be able to serve the needs of larger projects, and I'm willing to trust y'all's judgement that a server is the best way to accomplish this. However, don't throw the baby out with the bathwater. I bet there is a way to back into a static site deployment option even with a server-style deployment as the prioritized/preferred option, especially with tools like DuckDB WASM out there. For example, take a look at what evidence.dev is doing (querying Parquet files with DuckDB WASM). For projects that are not hitting (and may never hit) the scale where performance starts to be an issue, a static site is the easiest starting point. And for some of us, it is literally the only option available: static site or bust. If there is no static site option, we don't get a data catalog, and dbt 2.0 loses one of the most valuable features (imho) from 1.0.

joshuataylor · 2026-06-05T08:30:58Z


Will this be opensource?

The current docs, as they exist in dbt Core v1, are sluggish because they're a single massive file, and as far as I'm aware they're pretty much unmaintained/not cared about.

We ended up giving customers who want schema documentation generated documentation by SchemaSpy, which compiles into a static site, but split into files. Not everything needs to be a single page!! It's really basic, but is good enough.

Context: where I work, we have a feature that delivers tables built with dbt into our customers' databases, synced via Fivetran.

Having looked at OpenMetadata, and seeing how they structure tables/columns/etc as a standardised schema, here is what I've been thinking lately:

Generate an OpenMetadata structure of the built database (tables, columns, etc.) from the dbt-generated artifacts (similar to https://docs.open-metadata.org/v1.12.x/connectors/database/dbt).
This can then be built into a documentation website.

3 replies

eliasdefaria Jun 5, 2026
Collaborator Author

Hey @joshuataylor! Thanks for your feedback.

Will this be opensource?

Yes it will. The current version in the dbt-core repo only contains the minified HTML, JS, and CSS, but all that code is fully OSS under Apache 2 (https://github.com/dbt-labs/dbt-core/tree/main/crates/dbt-docs-server/web/dist). This code is produced by a more readable web app framework, and we're working on OSSing the readable components as well.

eliasdefaria Jun 5, 2026
Collaborator Author

We ended up giving customers who want schema documentation generated documentation by SchemaSpy, which compiles into a static site, but split into files. Not everything needs to be a single page!! It's really basic, but is good enough.

This is an interesting idea and something we considered. I'm open to fleshing it out a bit more, but getting that to work with Parquet files would ultimately require loading them into the browser and running data operations on them there. Otherwise, we'd need to change our storage format to one that JS could natively parse and deal with per page (like a JSON artifact). Alternatively, we could try this with DuckDB WASM, but that's not quite as scalable as a server-side implementation. We're actively discussing going this direction based on all the feedback, so we'll keep you posted on the direction we go.
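One way to picture the “per-page JSON artifact” alternative: split a large manifest into one small JSON file per node plus a lightweight index, so a static host (GitHub or GitLab Pages) only ships the file a page actually needs. This is a hypothetical sketch, not a planned dbt feature. It assumes a manifest-like dict with a top-level `nodes` mapping keyed by unique ID, as in the v1 manifest.json, and the `split_manifest` helper and output layout are made up for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def split_manifest(manifest, out_dir):
    """Write one JSON file per node plus a small index.json (hypothetical layout)."""
    out = Path(out_dir)
    (out / "nodes").mkdir(parents=True, exist_ok=True)
    index = {}
    for unique_id, node in manifest["nodes"].items():
        # Hash the unique ID so every file name is short and filesystem-safe.
        file_name = hashlib.sha256(unique_id.encode("utf-8")).hexdigest() + ".json"
        (out / "nodes" / file_name).write_text(json.dumps(node), encoding="utf-8")
        # Keep the index small: a page loads it once, then fetches one node file.
        index[unique_id] = {"name": node.get("name"), "file": f"nodes/{file_name}"}
    (out / "index.json").write_text(json.dumps(index), encoding="utf-8")
    return index


# Usage with a tiny manifest-like dict.
example = {
    "nodes": {
        "model.shop.orders": {"name": "orders", "description": "One row per order."},
        "model.shop.customers": {"name": "customers", "description": "One row per customer."},
    }
}
with tempfile.TemporaryDirectory() as tmp:
    built_index = split_manifest(example, tmp)
    print(sorted(built_index))  # ['model.shop.customers', 'model.shop.orders']
    print(len(list(Path(tmp, "nodes").glob("*.json"))))  # 2
```

A static page could fetch index.json for search and navigation, then request a single node file on demand. The trade-off raised in the reply still applies: cross-node questions such as lineage would need either more precomputed files or an in-browser engine like DuckDB WASM.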

joshuataylor Jun 5, 2026

Also, just wanted to say: the UI looks amazing on this.

I actually started a project at work to figure out how to introspect the main Rails application to provide table documentation/lineage, showing how tables trace back to Rails models/controllers/routes, etc. (it's a fairly large Rails app), so this has been good timing :-).

I'm really impressed how far the web has come in the last few years for "static HTML".

pempey · 2026-06-05T13:10:23Z


My Criticism

I will be blunt, I am once again getting a lot of "trust me I know best" energy from this post and docs v2 in general. This post feels like an attempt at justification of a decision and not a discussion.

The previous site was an artifact produced by dbt that could be branded and modified. The new site is part of dbt and can only be modified if I fork the repo and compile dbt. That is a rather big regression in my opinion. The need to have and maintain hardware that not only has to act as a server but also has to be able to run dbt is a regression over the many ways to host a static website.

Why are there features of docs v2 that are only useful if you have a way to combine all of your run data? Yes, dbt Platform can do that, but there has been no guidance or support on how one would do that without dbt Platform. The features that require dbt Platform sit as unusable clutter in the user experience. It feels like friction has been added for the express purpose of driving people to the managed service of dbt Platform. That does not feel like being a good steward of an open source tool.

My Supportive Feedback

I think that the dbt Index data is very nice, but I feel like there are ways it can be used with a static site. Can that be an option? Maybe bring back the generate command to output a static version that can be powered by the Parquet files? Can we get a config file that lets us brand the site and choose which parts of it are displayed?

I have other feedback about the functioning of the site that I have already shared on the Slack channel and I will link that here.
https://getdbt.slack.com/archives/C088YCAB6GH/p1780410502991229

4 replies

graciegoheen Jun 8, 2026
Maintainer

Hi @pempey - thanks for sharing the candid feedback. We are still in an alpha phase here, so are very much open for discussion! That's exactly why we started this discussion forum - to get people trying it out and providing feedback so we can iterate ahead of beta and final release.

Even though dbt-docs is used heavily by many dbt-core users (I personally cited it as my favorite dbt feature when I first applied to come work here!), it currently breaks down at scale:

Our goal with this project is to provide a re-vamped dbt-docs experience that is performant and modern.

That is a rather big regression in my opinion.

We definitely do not want this to feel like a regression, so thank you for flagging. Could you say more about the branding and modifications you were making to the previous html artifact? What were you changing and how?

pempey Jun 8, 2026

@graciegoheen I am aware of the scale issues; the GitLab docs site has in the past been used as an example of the problems with the docs v1 site, and I have long asked and looked for ways to make it better. When I saw that an improved docs site was one of the first issues in what was the public side of Fusion development, I was very happy.

At present we maintain the index.html that is produced by docs generate, with a modified <head> that points to our own stylesheet and logos, as part of our repo. We have an overview docs block, as per the documentation, to provide custom instructions and links on the landing page. We keep the <body> of the HTML updated as part of our dbt version upgrades. This allows us to control logos, fonts, colors, and so forth so that when our business users use the page it feels like an internal site and not a product site. When our docs site was public, we also had to remove the row counts from it, which we did by editing the catalog.json before it was hosted on the static site. This was part of our CI job that manages the docs site.
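The catalog.json edit described above could look something like this minimal sketch for a CI step. It assumes the v1 catalog layout, where each entry under `nodes` and `sources` carries a `stats` object, and it assumes row counts sit under keys such as `row_count` or `num_rows`; stat names are adapter-specific, so check your own catalog.json. The `strip_row_counts` and `main` names and the default path are hypothetical.

```python
import json

# Assumed stat keys that hold row counts; the exact names vary by adapter.
ROW_COUNT_KEYS = ("row_count", "num_rows")


def strip_row_counts(catalog, keys=ROW_COUNT_KEYS):
    """Remove row-count stats from every node and source entry, in place."""
    removed = 0
    for section in ("nodes", "sources"):
        for entry in catalog.get(section, {}).values():
            stats = entry.get("stats", {})
            for key in keys:
                if stats.pop(key, None) is not None:
                    removed += 1
    return removed


def main(path="target/catalog.json"):
    """Rewrite catalog.json without row counts, e.g. after dbt docs generate in CI."""
    with open(path, encoding="utf-8") as f:
        catalog = json.load(f)
    count = strip_row_counts(catalog)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(catalog, f)
    print(f"removed {count} row-count stats from {path}")
```

Other stats (such as byte sizes) are left untouched, so only the keys you list disappear from the published site.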

eliasdefaria Jun 8, 2026
Collaborator Author

Hey @pempey thanks for your feedback as always! Few quick comments on customization:

I actually added support for the overview docs macro in Fusion a while back for docs v1 parity. We should definitely continue to support this in v2. Thanks for pointing that out.
We continue to use a single HTML file as the starting point for the web app that is served. See here. What if we let you pass in your own version via a flag? I'm thinking dbt docs serve --html path/to/file.html. Then we'd just serve that one instead.

Lmk what you think

pempey Jun 10, 2026

@eliasdefaria being able to point to another HTML file may allow for some custom configuration. With the current site, our CSS file overrides what is coded into the static HTML file, but the v2 docs have their own CSS, and I wonder how hard it would be to maintain our overrides. If v2 is updated as infrequently as v1 was, this may not be an issue, but if v2 is moving to be a maintained feature, then the workaround may turn into a maintenance headache.

graciegoheen · 2026-06-08T21:24:46Z

Maintainer

@dtaniwaki I saw you opened a PR to add a new page to dbt-docs to display config information about a given node. Including it in this discussion to get feedback on what other community members think!

0 replies

dbeatty10 · 2026-06-09T17:19:25Z

dbeatty10
Jun 9, 2026
Maintainer

We're gonna host a community feedback / office hours session coming up in two weeks, and we'd love to have you join and hear your feedback!

Wednesday, 24 June, 9am Pacific: dbt docs for v2

👆 Click here to register, and we'll email you a link to join.

0 replies

eckesru · 2026-06-10T21:12:48Z


I just wanted to mention that this sounds extremely exciting. Documentation is very important, and dbt docs v1 has been a great way of providing very good automated documentation in our company. We have a growing project (500+ models), where all these new features sound very exciting.

What I want to bring to attention is that it is important to keep providing a way to produce a static version of the documentation, even if it lacks some of the exciting features you are planning, so customers can easily host it on GitLab or GitHub Pages or wherever. I'm confident that most data teams won't have the resources, capacity, or budget to run a server that hosts the dbt docs. It would be a real pity to put all this great effort into dbt docs v2 only for users not to use it because they are forced to host it. The same goes for getting users to migrate from v1 to v2.

Btw, we have not seen any freezing in the browser or anything similar with our dbt docs, even with 50-100 nodes in the lineage.

0 replies
