Export documentation site as a set of static pages. #53

drewbanin · 2019-11-12T17:27:31Z

@philippe-lavoie commented on Tue Nov 12 2019

Describe the feature

Shipping a zip of static HTML pages makes it easy to ship internal documentation to others. I'd use something like Gatsby to create the static pages. But to stick to a Python pipeline, something like Pelican could be used. I have never tried it though.

Describe alternatives you've considered

An alternative would be to generate a PDF file corresponding to the documentation, but it would have to include the node graph. In practice, this might be a true alternative in the sense that it improves portability.

Additional context

I wanted to ship what I had to a client, but I can't expose an internal site, nor can I easily create a locked-down public site for them. Shipping a site.zip would make this trivial.

Who will this benefit?

Anyone that wants to more easily share what's going on inside all those DBT models. Also, with a static site, you can add the documentation as a sub-folder of a larger site, which makes exposing documentation even easier.

@philippe-lavoie commented on Tue Nov 12 2019

The documentation states that the site is static. However, when loading directly from the index.html page, I get

dbt Docs was unable to load the mantifest file at path: manifest.json?cb=1573569439241

Perhaps just fix the fetch to use the above, but when it fails, default to manifest.json instead?

@drewbanin commented on Tue Nov 12 2019

hey @philippe-lavoie - the site is "static" in that you don't need to run a database or an application webserver to use it. The site does however load a few .json files that contain information about your project code and the state of the database.

You're seeing this problem because your web browser is preventing the docs site from requesting other local files (e.g. file:///Users/drew/my_project/target/manifest.json). This is a security feature intended to prevent sites from reading arbitrary files on a user's hard drive.

I think that if we were to do something here, it would be to embed the json data directly into the index.html file. That way, you wouldn't need to send along a .zip file with a bunch of html files inside of it - you'd just have a single index.html file that showed all of the docs.

I'm going to transfer this issue over to the docs repo for further discussion. Let me know if you have any questions or thoughts!

larsbkrogvig · 2020-06-15T20:52:05Z

Hey, let me add a +1 to this one. I'm trying to host the docs website with some sort of authentication, but I seem to run into problems with this whenever I enable it. Could be that I'm missing something here since this is well outside my comfort zone, but it feels related!

justinwagg · 2020-09-13T19:26:05Z

Is there a recommended solution to this? I'm running into the same thing. The docs state you can host from s3 but I'm also getting

dbt Docs was unable to load the manifest file at path: 
  manifest.json?cb=1600024044199

larsbkrogvig · 2020-09-14T09:19:56Z

Since posting above I found a solution for GCP users that works quite well: Host the docs website with AppEngine and wrap it in Google Auth with Identity-Aware Proxy (IAP). This is relatively (but not quite) straightforward and does exactly what I wanted.

drewbanin · 2020-09-14T17:42:22Z

@justinwagg sounds like you might just need to upload the manifest.json, catalog.json, and run_results.json files to the same path as the index.html file in S3 (or similar!). The index.html file just contains the skeleton of the website, but all of the actual docs information comes from these json files. Want to give that a spin and let us know how it goes?
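
A quick pre-upload check along the lines of the suggestion above might look like the following sketch. It assumes the docs were generated into dbt's default target/ folder; the function name missing_docs_files is made up for illustration, and the file list is simply the one named in the comment above.

```python
from pathlib import Path

# Files the docs site expects side by side, per the comment above.
REQUIRED_FILES = ("index.html", "manifest.json", "catalog.json", "run_results.json")


def missing_docs_files(target_dir):
    """Return the required docs files that are absent from target_dir."""
    target = Path(target_dir)
    return [name for name in REQUIRED_FILES if not (target / name).is_file()]


if __name__ == "__main__":
    # "target" is assumed to be the folder that `dbt docs generate` wrote to.
    missing = missing_docs_files("target")
    if missing:
        print("Missing before upload:", ", ".join(missing))
    else:
        print("All docs files present; upload them to the same path.")
```

This only confirms the files exist locally; it says nothing about bucket permissions, which can still block the requests.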

justinwagg · 2020-09-14T20:46:54Z

Interesting @larsbkrogvig, I will have a look into it, thank you for the tip! @drewbanin my bucket looks like this

dbt-docs-bucket
├── catalog.json
├── graph.gpickle
├── index.html
├── manifest.json
├── partial_parse.pickle
├── run_results.json
├── sources.json
├── compiled/
└── run/

but I'm getting

dbt Docs was unable to load the manifest file at path: 
  manifest.json?cb=1600116304939

Error: Bad Request (400)

The dbt Docs site may not work as expected if this file cannot be found. Please try again, and contact support if this error persists.

when accessing index.html. The page loads, but obviously without data. Perhaps not a dbt issue and something more to do with permissions given the 400.

drewbanin · 2020-09-15T00:48:06Z

huh! a 400?? That status code would be served up by S3 I think.... I'm super with you that this points to a permissions or configuration issue. The bucket ls you showed here looks 100% right to me!

I'm unsure if this is the right answer for your org, but more docs on s3 permissions can be found here: https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteAccessPermissionsReqd.html

justinwagg · 2020-09-15T14:10:29Z

Yes, I agree and think this has more to do with GCS permissioning/setup than dbt. Once/if I figure it out I can follow up with step by step for future users who want to host privately for their org via gcs bucket. Thanks for the follow ups!

larsbkrogvig · 2020-09-15T14:41:55Z

Hosting the docs website on GCS without any authentication should work; I've been able to get it to work before. What I said above about AppEngine is for when you don't want the docs to be publicly available.

RMHogervorst · 2021-03-12T15:40:05Z

But this should work locally right? Because I get the same error, dbt Docs was unable to load the manifest file at path on both Chrome and Firefox.

RMHogervorst · 2021-03-12T15:57:59Z

For local development you could use the python native server:
navigate to target/ and run python -m http.server (or python3 -m http.server)
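
The same idea can be written as a small script, which avoids having to change directory first. This is only a sketch: the function name make_docs_server and the port 8080 are arbitrary choices, and it binds to localhost only.

```python
import functools
import http.server


def make_docs_server(directory="target", port=8080):
    """Build an HTTP server that serves `directory` on localhost only."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    server = make_docs_server()
    print("Serving docs at http://127.0.0.1:%d/" % server.server_address[1])
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        server.server_close()
```

Because the page is then loaded over http:// rather than file://, the browser allows index.html to request manifest.json and catalog.json.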

RMHogervorst · 2021-03-13T10:42:55Z

And I just realized there is also a dbt docs serve command that does the same thing... 🤦

alieus · 2021-04-09T20:47:12Z

@justinwagg any chance you were able to find a solution? I am facing the same issue trying to host on an azure WebApp

philipp-heinrich · 2021-04-15T08:35:30Z

@larsbkrogvig do you mind sharing some insights on how you set it up?

larsbkrogvig · 2021-04-17T09:00:50Z

This is my app.yaml

service: default
runtime: python37

handlers:

- url: /
  static_files: public/index.html
  upload: public/index.html

- url: /
  static_dir: public

- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto

This is my deploy script:

#!/bin/sh
dbt docs generate --project-dir my-dbt --target prod

cp my-dbt/target/catalog.json my-docs/public/
cp my-dbt/target/manifest.json my-docs/public/
cp my-dbt/target/run_results.json my-docs/public/
cp my-dbt/target/index.html my-docs/public/

gcloud app deploy my-docs --project my-project --quiet

And then I control access to it with IAP.

philipp-heinrich · 2021-04-26T15:59:51Z

@larsbkrogvig Thanks a lot! runs perfectly.

worknate · 2021-10-05T20:33:30Z

We were never able to get the static files working locally. I suspect this is a security feature of modern browsers (i.e. https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSRequestNotHttp)

For local development, we instruct developers to use dbt docs generate && dbt docs serve
Our project file has the line target-path: "docs" and we configure our (private) GitHub Pages to master branch /docs folder

RMHogervorst · 2021-10-06T05:19:26Z

I did get it to deploy on github like here https://github.com/RMHogervorst/dbt_postgresql

giovanni-girelli-sdg · 2021-10-07T12:54:48Z

After exploring a bit, the issue with GCS has to do with the fact that when you load a page containing an authenticated link to a GCS bucket file, i.e. something like

https://storage.cloud.google.com/REGION-BUCKET/PATH?authuser=0

you get redirected to a completely different URL. Again, I'm no expert, but when I discussed this with Google, they told me that it is entirely impossible to make this work. Either you use a domain, or you go through App Engine or the solution they push the most now, Cloud Run.

I know this doesn't help, but I never stopped hoping and now I can.

vergenzt · 2021-10-14T18:29:32Z

I went down a bit of a rabbit hole researching this issue so am snapshotting some thoughts I had here. They're not very well structured, so apologies in advance if it's a bit of a brain dump.

Main idea

The contents of manifest.json and catalog.json could be inlined into index.html using <script type="application/json" id="..."> tags, the inclusion of which could be triggered by an --inline-dbt-assets flag passed to dbt docs generate.

One way to accomplish this could be to add a static string placeholder to src/index.html similar to:

diff --git a/src/index.html b/src/index.html
index 23681ca..ec11238 100644
--- a/src/index.html
+++ b/src/index.html
@@ -26,3 +26,5 @@
         <div ui-view></div>
+
+        <!-- PLACEHOLDER__DBT_DOCS_GENERATE_INLINED_RESOURCES -->
     </body>
 </html>

and then if dbt docs generate sees an --inline-dbt-assets flag, then it could do a regex replace on dbt-core's compiled index.html asset at user-runtime, replacing the placeholder comment with:

<script type="application/json" id="inlined-manifest"> ... (insert html-escaped contents of manifest.json here) </script>
<script type="application/json" id="inlined-catalog"> ... (insert html-escaped contents of catalog.json here) </script>

Using an actual HTML parser for the injection would probably be more elegant, but also would be heavier and likely add otherwise-unnecessary dependencies. As long as the placeholder comment is sufficiently unique and care is taken to HTML-escape the contents of the artifacts, I don't see what else would go wrong.
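
The replacement step described above could be sketched roughly as follows. This is not dbt-core code: the function names are hypothetical, and the placeholder string and element ids are taken from the proposal. One deliberate difference is the escaping. Browsers do not decode HTML entities inside script elements, so instead of HTML-entity escaping, the sketch escapes "<" as \u003c inside the JSON, which JSON.parse would turn back into "<".

```python
import json

# Placeholder comment from the proposed diff to src/index.html.
PLACEHOLDER = "<!-- PLACEHOLDER__DBT_DOCS_GENERATE_INLINED_RESOURCES -->"


def json_script_tag(element_id, data):
    """Render data as a <script type="application/json"> tag."""
    # Escaping "<" keeps sequences such as "</script>" inside the data
    # from closing the tag early; "<" can only occur inside JSON strings.
    payload = json.dumps(data).replace("<", "\\u003c")
    return '<script type="application/json" id="%s">%s</script>' % (element_id, payload)


def inline_artifacts(index_html, manifest, catalog):
    """Swap the placeholder comment for the two inlined artifacts."""
    if PLACEHOLDER not in index_html:
        raise ValueError("placeholder comment not found in index.html")
    tags = "\n".join(
        [
            json_script_tag("inlined-manifest", manifest),
            json_script_tag("inlined-catalog", catalog),
        ]
    )
    return index_html.replace(PLACEHOLDER, tags, 1)
```

A plain string replace is enough here because the placeholder is a fixed literal, so no regex or HTML parser is involved.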

The only other change would be to this project's src/app/services/project_service.js to first check if there are #inlined-manifest / #inlined-catalog elements in the DOM before sending an XHR request, and short circuit with the inlined content if so.

One challenge I foresee with this approach though is that it could be easy for an inlined target/index.html to get out of sync with target/manifest.json without it being apparent that that's the case.

One possible solution to that challenge could be adding a small header or footer to the docs site itself that includes the dbt version used to generate the site and the date and time when the site was generated, both of which would be sourced from manifest.json's metadata key. This way there is at least a mechanism for determining if you're looking at an up-to-date copy of your project's docs site.

E.g. <footer><small>Generated by dbt v0.21.0. Project manifest compiled <time datetime="2021-10-14T17:31:22.660230Z">on Oct 14, 2021, 1:31 PM EDT</time>. Database catalog compiled <time datetime="2021-10-14T17:31:33.518894Z">on Oct 14, 2021, 1:31 PM EDT</time>.</small></footer>
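
A footer like that could be assembled from the two artifacts with something like the sketch below. The function name build_footer is hypothetical, and the key names (metadata.dbt_version and metadata.generated_at) are assumptions about the artifact layout that should be checked against your dbt version. For simplicity it prints the raw timestamp instead of a localized date.

```python
import html


def build_footer(manifest, catalog):
    """Build a footer snippet from the artifacts' metadata blocks."""
    # Assumed keys: metadata.dbt_version and metadata.generated_at.
    manifest_meta = manifest.get("metadata", {})
    catalog_meta = catalog.get("metadata", {})
    version = html.escape(str(manifest_meta.get("dbt_version", "unknown")))
    manifest_at = html.escape(str(manifest_meta.get("generated_at", "unknown")))
    catalog_at = html.escape(str(catalog_meta.get("generated_at", "unknown")))
    return (
        "<footer><small>Generated by dbt v{v}. "
        'Project manifest compiled <time datetime="{m}">{m}</time>. '
        'Database catalog compiled <time datetime="{c}">{c}</time>.'
        "</small></footer>"
    ).format(v=version, m=manifest_at, c=catalog_at)
```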

4sushi · 2022-01-12T13:36:51Z

What is the reason that the contents of manifest.json and catalog.json are not inserted directly into index.html during documentation generation?

If you do this, you don't need a web server to see the documentation. There are significant benefits:

Locally, you can open index.html directly without CORS restrictions (https://en.wikipedia.org/wiki/Cross-origin_resource_sharing)
You can upload and host the documentation on cloud storage, like Google Cloud Storage. For the moment, that is not possible, so we need to use web services from the cloud provider.
You have a single page application, so you can share it by email.

Python script to build a single page documentation

To solve this problem and host my documentation on Google Cloud Storage (GCS), I wrote this Python script:

import json

# Minified snippet in the generated index.html that fetches the two artifacts.
search_str = 'o=[i("manifest","manifest.json"+t),i("catalog","catalog.json"+t)]'

with open('target/index.html', 'r') as f:
    content_index = f.read()

with open('target/manifest.json', 'r') as f:
    json_manifest = json.load(f)

with open('target/catalog.json', 'r') as f:
    json_catalog = json.load(f)

# Replace the fetch calls with the artifact data inlined as JS object literals.
new_str = "o=[{label: 'manifest', data: "+json.dumps(json_manifest)+"},{label: 'catalog', data: "+json.dumps(json_catalog)+"}]"
new_content = content_index.replace(search_str, new_str)

with open('target/index2.html', 'w') as f:
    f.write(new_content)

Source: https://data-banana.github.io/dbt-generate-doc-in-one-static-html-file.html

Now you can use index2.html as single page documentation.

amirbtb · 2022-01-14T23:33:44Z

@4sushi your solution is far from perfect but it saved my life!
To avoid an oversized index2.html, I'm forced to use node selection when running dbt docs generate, plus IGNORE_PROJECTS like you did in "DBT - Generate doc in one static HTML file".
Thank you for the hack 🙏🏽

chris-gaia-lens · 2022-03-22T17:10:26Z

@4sushi Thank you for your snippet, worked for me

Crimsonabyss · 2022-06-23T07:01:05Z

@4sushi thanks very much for your script, worked for me!

bbrewington · 2022-08-09T14:06:50Z

@drewbanin any update on this? It looks like @4sushi's implementation works well - see comment above on 2022-01-12

What is needed to get this added to the product? That solution uses Python, so not entirely sure where it would fit in the actual docs generation process, but would love for this to get prioritized since it would make working with the docs MUCH easier!

sithson · 2022-11-21T17:04:30Z

> @drewbanin any update on this? It looks like @4sushi's implementation works well - see comment above on 2022-01-12
>
> What is needed to get this added to the product? That solution uses Python, so not entirely sure where it would fit in the actual docs generation process, but would love for this to get prioritized since it would make working with the docs MUCH easier!

Well..... dbt is written in Python, so I guess the odds are fairly low.
Also, creating multiple files is more complex than using a single file only.

gbatiz · 2023-04-21T13:50:27Z

Any update on this one?

mescanne · 2023-09-07T17:58:38Z

This would simplify things for many use cases if we can.

I'd like to propose a solution (and I'll likely put forward a PR) to do the following:

Modify loadFile to check a local inline dictionary that has mappings for manifest.json and catalog.json. The default values will be unique strings. If the inline dictionary has a string (not a dictionary), then the inline dictionary will be skipped.
Modify GenerateTask to have an additional flag to generate a static index. If the flag is set, the code path towards the end of generation will re-open the index.html and substitute the manifest.json and catalog.json data based on the unique strings directly in the index.html.
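
The lookup rule in the first point can be modeled in a few lines. The real change would live in the docs site's JavaScript, so the Python below only illustrates the logic; the placeholder strings and the load_file signature are hypothetical, not the actual ones used by the project.

```python
# Hypothetical placeholders; the proposal does not give the real unique strings.
INLINE_FILES = {
    "manifest.json": "PLACEHOLDER_MANIFEST_JSON",
    "catalog.json": "PLACEHOLDER_CATALOG_JSON",
}


def load_file(name, inline_files, fetch):
    """Return inlined data for `name` if present, else fall back to fetch(name)."""
    inlined = inline_files.get(name)
    if isinstance(inlined, dict):
        # The generation step replaced the placeholder with real data.
        return inlined
    # Still a placeholder string (or not listed at all): load the file as before.
    return fetch(name)
```

With the default placeholders the function always falls through to fetch, which matches the goal of keeping the existing behaviour unless the static flag was used.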

Why?

This respects the existing webpack process as much as possible.
It retains the existing functionality and behaviour, while adding a new option for those who desire it.

Any feedback on this proposal is appreciated. I will likely translate it into a PR in either case.

github-actions · 2024-03-21T01:43:41Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

vergenzt · 2024-03-22T02:01:47Z

Still relevant and would still be helpful IMO. (Commenting to chase off Stalebot.)

mescanne · 2024-03-22T07:00:14Z

I think this is done and should be closed. @vergenzt - what is missing?

vergenzt · 2024-03-22T14:04:45Z

Oooh I hadn't seen that feature! 😄 Never mind then! Yay! Resolved by dbt-labs/dbt-core#8615 it looks like 🙂

drewbanin mentioned this issue Nov 12, 2019

Export documentation site as a set of static pages. dbt-labs/dbt-core#1916

Closed

github-actions bot added the triage label Sep 7, 2023

mescanne mentioned this issue Sep 11, 2023

Enable easy creation of static index.html pages #465

Merged

5 tasks

mescanne mentioned this issue Sep 11, 2023

[CT-3105] [Feature] Generate static documentation (static_index.html) dbt-labs/dbt-core#8614

Closed

3 tasks

jtcohen6 removed the triage label Sep 22, 2023

joppevos mentioned this issue Dec 5, 2023

Extend DbtDocsLocalOperator with --static flag astronomer/astronomer-cosmos#746

Closed

github-actions bot added the Stale label Mar 21, 2024

github-actions bot added the triage label Mar 22, 2024

github-actions bot removed the Stale label Mar 23, 2024
