New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop rewriting 404s #170

Open
GoogleCodeExporter opened this Issue Apr 6, 2015 · 11 comments

Comments

Projects
None yet
5 participants
@GoogleCodeExporter

GoogleCodeExporter commented Apr 6, 2015

What steps will reproduce the problem?
1. Go to http://www.modpagespeed.com/mod_pagespeed_beacon
2. Look at the source and notice that it was rewritten even though it's a 404 
page

My first thought was that a 404 page should not be rewritten. But maybe it 
doesn't hurt. In this case we are adding instrumentation to the 404 page, which 
seems like a bad idea because it seems like a 404 page shouldn't have resources.

What does everyone else think?

Original issue reported on code.google.com by sligocki@google.com on 30 Dec 2010 at 9:28

@GoogleCodeExporter

This comment has been minimized.

GoogleCodeExporter commented Apr 6, 2015

Skip 'em.  At best we end up using extra resources in the face of a DoS 
attempt.  I think I've seen this before.

But the failures on mod_pagespeed_beacon are particularly odd.  I tried these 
on my machine and mod_pagespeed_beacon worked...

Original comment by jmaes...@google.com on 30 Dec 2010 at 10:57

@GoogleCodeExporter

This comment has been minimized.

GoogleCodeExporter commented Apr 6, 2015

Original comment by sligocki@google.com on 11 Jan 2011 at 9:50

  • Changed state: Started
  • Added labels: Type-Enhancement
  • Removed labels: Type-Other
@GoogleCodeExporter

This comment has been minimized.

GoogleCodeExporter commented Apr 6, 2015

Fixed in r373 in trunk

Original comment by sligocki@google.com on 13 Jan 2011 at 10:56

  • Changed state: Fixed
@GoogleCodeExporter

This comment has been minimized.

GoogleCodeExporter commented Apr 6, 2015

Original comment by sligocki@google.com on 20 Jan 2011 at 4:39

  • Added labels: release-note
@GoogleCodeExporter

This comment has been minimized.

GoogleCodeExporter commented Apr 6, 2015

Original comment by sligocki@google.com on 20 Jan 2011 at 4:40

  • Changed title: Stop rewriting 404s
@jeffkaufman

This comment has been minimized.

Contributor

jeffkaufman commented Jun 15, 2015

It may make sense to revisit this. 404 pages are often relatively rich, with lots to optimize. Neither the introduction of PLT beacons nor the DOS attack reasoning makes much sense to me. Is there any other reason to keep this restriction?

@jeffkaufman jeffkaufman reopened this Jun 15, 2015

@jeffkaufman

This comment has been minimized.

Contributor

jeffkaufman commented Jun 15, 2015

@sligocki

This comment has been minimized.

Contributor

sligocki commented Jun 15, 2015

My 4 year old memory is a bit fuzzy on this :) but the reasoning I remember is: A) We do not expect much content on 404s, B) We don't expect high percentage of traffic to be for 404, C) We really don't want to break 404 pages (ex: somehow obfuscate the error message).

But if we're not worried about (C), then maybe there's no harm. I certainly don't feel strongly and think it would be fine if we changed this back.

@jmaessen

This comment has been minimized.

Contributor

jmaessen commented Jun 15, 2015

On Mon, Jun 15, 2015 at 5:39 PM, Shawn Ligocki notifications@github.com
wrote:

My 4 year old memory is a bit fuzzy on this :) but the reasoning I
remember is: A) We do not expect much content on 404s, B) We don't expect
high percentage of traffic to be for 404, C) We really don't want to break
404 pages (ex: somehow obfuscate the error message).

But if we're not worried about (C), then maybe there's no harm. I
certainly don't feel strongly and think it would be fine if we changed this
back.

I seem to recall in particular that folks serving huge 404 pages and sites
with lots of missing links (or just a missing favicon.ico) ran into
problems whenever (say) googlebot came a-visiting.

Arguably we should actually be replacing some 404 pages with
content-length: 0 404s, as this has produced high returns for Chrome mobile
bandwidth reduction.


Reply to this email directly or view it on GitHub
#170 (comment)
.

@jeffkaufman

This comment has been minimized.

Contributor

jeffkaufman commented Jun 16, 2015

Arguably we should actually be replacing some 404 pages with
content-length: 0 404s, as this has produced high returns for Chrome mobile
bandwidth reduction.

I don't think this is something we could do by default. Many sites use their 404 pages to offer people suggestions about how to find what they're looking for.

I do think 0-length 404s would be good for favicon.ico, apple-touch-icon.png, and apple-touch-icon-precomposed.png. On mobile browsers that fetch these by default (not Chrome anymore) they were accounting for 3%-4% of bytes downloaded: https://bugs.webkit.org/show_bug.cgi?id=104585

@morlovich

This comment has been minimized.

Contributor

morlovich commented Jun 16, 2015

Note that there are practical difficulties rewriting 404s on Apache 2.4
with our current filter and the way it's configured.
I don't think it's unsolvable, but config backwards compat may be tricky.

IIRC it implements AddOutputFilterByType in mod_filter (2.2 had a
different impl in core.c), which eventually dispatches to:

static apr_status_t filter_harness(ap_filter_t *f, apr_bucket_brigade *bb)
{
apr_status_t ret;
#ifndef NO_PROTOCOL
const char *cachecontrol;
#endif
harness_ctx *ctx = f->ctx;
ap_filter_rec_t *filter = f->frec;

if (f->r->status != 200
    && !apr_table_get(f->r->subprocess_env, "filter-errordocs")) {
    ap_remove_output_filter(f);
    return ap_pass_brigade(f->next, bb);
}

... I guess the filter-errordocs may be an option, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment