fix: add output: 'blog' to publish workflow to fix /output/ URL leak#179

Merged
alamb merged 1 commit into apache:main from kevinjqliu:fix/publish-output-blog on May 11, 2026

Conversation

kevinjqliu (Contributor) commented May 10, 2026

What changes are included in this PR?

Relates to #178

Adds output: 'blog' to publish-site.yml, matching stage-site.yml.

The publish workflow was missing this parameter, so the pelican action defaulted to output: 'output'. This caused content to land in output/ on asf-site instead of blog/. Since .asf.yaml uses subdir: blog, an .htaccess with rewrite rules was added as a workaround (see INFRA-27512) to internally map requests into output/.

Those rewrite rules have a regex bug: the file-extension check (\.[^./]+$) matches .0 at the end of version-number slugs, skipping the trailing-slash redirect. Apache's mod_dir then adds the slash but exposes the internal output/ path:

  • /blog/2026/04/18/datafusion-comet-0.15.0 → 301 → /blog/output/...
  • /blog/2026/04/02/datafusion-53.0.0 → 301 → /blog/output/...
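
The false match described above can be reproduced outside Apache. The following is a minimal sketch, assuming only the pattern quoted in this description; the surrounding rewrite rules are not shown here, so the helper `looks_like_file` and the two dot-free and asset paths are made up for illustration.

```python
import re

# The extension check quoted in the PR description. How the real .htaccess
# uses it (conditions, flags, neighbouring rules) is not shown in the PR,
# so this models the pattern alone.
FILE_EXTENSION_CHECK = re.compile(r"\.[^./]+$")


def looks_like_file(path: str) -> bool:
    """Hypothetical helper: True when the check would treat `path` as a file."""
    return FILE_EXTENSION_CHECK.search(path) is not None


paths = [
    "/blog/2026/04/18/datafusion-comet-0.15.0",  # slug from the PR
    "/blog/2026/04/02/datafusion-53.0.0",        # slug from the PR
    "/blog/2026/04/18/example-post",             # made-up slug without a dot
    "/blog/theme/style.css",                     # made-up static asset
]

for path in paths:
    match = FILE_EXTENSION_CHECK.search(path)
    matched = match.group(0) if match else None
    print(f"{path!r}: looks_like_file={looks_like_file(path)}, matched={matched!r}")
```

Both versioned slugs end in `.0`, which the pattern accepts as an extension in the same way it accepts `.css`, so they are treated as files and skip the trailing-slash redirect.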

What this PR does

With output: 'blog', the pelican action puts built content into blog/ on asf-site, matching what .asf.yaml (subdir: blog) expects. This is the same configuration stage-site.yml already uses for the staging site.

Deployment safety

This PR is non-disruptive. After it deploys:

  • blog/ is created with fresh content on asf-site
  • output/ still exists with the old content
  • .htaccess continues to rewrite requests to output/ — the site keeps working exactly as before
  • No user-visible change until the follow-up PR (Remove stale output/ directory and simplify .htaccess #180) is merged

Follow-up

After this deploys, PR #180 (targeting asf-site) should:

  1. Remove the rewrite rules from .htaccess (keeping only the CSP directive)
  2. Remove the stale output/ directory

At that point, .asf.yaml's subdir: blog serves content directly from blog/ — the same pattern the staging site uses successfully.

publish-site.yml was missing the output parameter, so the pelican
action defaulted to 'output' instead of 'blog'. This caused a mismatch
with .asf.yaml (subdir: blog), requiring .htaccess rewrite rules that
incorrectly exposed /output/ in redirects for blog posts with version
numbers in their slug (e.g. datafusion-comet-0.15.0).

Aligns publish-site.yml with stage-site.yml which already sets
output: 'blog'.

alamb (Contributor) commented May 11, 2026

Thanks @kevinjqliu -- this looks better to me. I got really confused at some point when the publishing flow got changed and I couldn't figure out why the blog wasn't published anymore

See some more backstory here:

alamb (Contributor) left a comment

Sounds good -- let's give it a try

@alamb alamb merged commit 0d36814 into apache:main May 11, 2026
4 checks passed
@kevinjqliu kevinjqliu deleted the fix/publish-output-blog branch May 11, 2026 13:55

alamb (Contributor) commented May 12, 2026

Thus I reverted that

However, I can't revert this PR by myself -- @kevinjqliu can you please make a PR to do so (so I can approve and merge that one)?

I am worried we now have two copies of the site (in output and in blog) -- I want a single output
