This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

need a way to selectively keep some comments #237

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 9 comments

Comments

@GoogleCodeExporter

Some Google services rely on specific HTML comments to better interpret the
semantics of a web page's content.

For example, the Google AdSense crawler (user-agent Mediapartner-Google) uses
the comments:

<!-- google_ad_section_start(weight=ignore) --> 
...
<!-- google_ad_section_end --> 

and

<!-- google_ad_section_start --> 
...
<!-- google_ad_section_end --> 

to ignore (or emphasize) sections of text, so that ads are targeted at the
page's significant content.

See AdSense section targeting:
https://www.google.com/adsense/support/bin/answer.py?hl=en&answer=23168

Stripping those comments would degrade ad targeting on pages that use
Google AdSense.

So mod_pagespeed should have a way to selectively keep particular comments
(e.g. a text file listing the comments to keep).

Original issue reported on code.google.com by loupi...@gmail.com on 12 Mar 2011 at 3:30
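A minimal sketch of the text-file idea proposed above, in Python. The file format (one comment body per line, lines starting with `#` ignored) and the helper names `load_keep_list` and `should_keep` are hypothetical, not part of mod_pagespeed; matching is exact after trimming surrounding whitespace.

```python
# Hypothetical keep-list format: one comment body per line; blank lines and
# lines starting with "#" are ignored. Not an actual mod_pagespeed option.
def load_keep_list(text):
    keep = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            keep.add(line)
    return keep


def should_keep(comment_body, keep):
    """True if the comment body, trimmed of surrounding whitespace, is listed."""
    return comment_body.strip() in keep


KEEP_LIST_TEXT = """
# AdSense section targeting markers
google_ad_section_start(weight=ignore)
google_ad_section_start
google_ad_section_end
"""

keep = load_keep_list(KEEP_LIST_TEXT)
print(should_keep(" google_ad_section_end ", keep))  # True
print(should_keep(" navigation ", keep))             # False
```

Exact matching means every variant, such as a different `weight=` argument, needs its own line, which is one reason the discussion below moves toward patterns.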

@GoogleCodeExporter

I think this is a good general purpose enhancement request.

I've also heard this request for copyright notices, so a regex or wildcard
pattern for comment retention may be sensible; it would also be ad-vendor
agnostic.

This should probably apply not just to HTML comments but to CSS and JavaScript comments too.

Original comment by jmara...@google.com on 14 Mar 2011 at 2:25

  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect
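For CSS, the same retention rule could be applied while stripping `/* ... */` comments. This is a naive Python sketch, not mod_pagespeed's CSS minifier: the regex does not understand string literals, and the `keep` predicate (keep anything mentioning "Copyright", following the copyright use case above) and the sample stylesheet are illustrative assumptions.

```python
import re

# Naive: CSS comments do not nest, so a lazy match finds each /* ... */ span.
# Caveat: "/*" inside a string literal would wrongly be treated as a comment.
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_css_comments(css, keep):
    """Remove every CSS comment except those for which keep(comment) is true."""
    return CSS_COMMENT.sub(lambda m: m.group(0) if keep(m.group(0)) else "", css)


css = "/* Copyright 2011 Example Inc. */\nbody { color: red; } /* debug note */\n"
result = strip_css_comments(css, lambda comment: "Copyright" in comment)
print(result)
```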

@GoogleCodeExporter

Yeah, a regexp would be a good solution, and it would cover all cases.

e.g.
<!-- google_ad_section_[^>]*-->

would work for AdSense targeting, I suppose.

Just make sure to document exactly which "flavor" of regexp is used: in some
regexp engines \< means "word start", so angle brackets must not be
backslash-escaped, while in others (Perl?) a backslash-escaped angle bracket is
just a literal bracket.

Original comment by loupi...@gmail.com on 14 Mar 2011 at 9:36
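The proposed pattern can be checked with Python's `re` module, used here only as one concrete flavor; it is not necessarily the engine mod_pagespeed would use. In this flavor `<` and `>` are ordinary characters, so no escaping is needed, and `fullmatch` tests each comment on its own. The sample comments are the AdSense markers quoted in the original request.

```python
import re

# Pattern from the comment above. "<" and ">" are literal in Python's re;
# [^>]* cannot run past a ">", so the match stays inside one comment.
ADSENSE = re.compile(r"<!-- google_ad_section_[^>]*-->")

comments = [
    "<!-- google_ad_section_start(weight=ignore) -->",
    "<!-- google_ad_section_start -->",
    "<!-- google_ad_section_end -->",
    "<!-- ordinary comment -->",
]
for c in comments:
    print(c, bool(ADSENSE.fullmatch(c)))
```

The first three comments match and the last does not. A comment whose body itself contains `>` would not match this pattern.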

@GoogleCodeExporter

How would you feel about using simple wildcards for this instead of regexps?
We already support wildcards for ModPagespeedAllow/Disallow and domain
configuration. Is there something you'd want regexps for in this case where
wildcards would not suffice?

Original comment by jmara...@google.com on 14 Mar 2011 at 9:45

@GoogleCodeExporter

I would vastly prefer regexps. Shell-type wildcards work for file names, but
the problem is that it's difficult to control what the * matches (I suppose *
is not "greedy", i.e. it matches the shortest string, otherwise it would not
work at all).
Wildcards are much less powerful for special cases.

Original comment by loupi...@gmail.com on 14 Mar 2011 at 10:00
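Whether `*` is greedy matters mostly when a pattern is run over raw page text rather than over one comment at a time. The Python sketch below is an illustration assuming regexp semantics, not a description of any mod_pagespeed matcher; it contrasts greedy `.*` with lazy `.*?`.

```python
import re

page = "<!-- google_ad_section_start --> Article text <!-- google_ad_section_end -->"

# Searching raw page text: greedy .* runs to the LAST "-->", lazy .*? stops at the first.
greedy = re.search(r"<!--.*-->", page).group(0)
lazy = re.search(r"<!--.*?-->", page).group(0)
print(greedy)  # the whole page string, both comments included
print(lazy)    # <!-- google_ad_section_start -->

# Testing one already-extracted comment: both forms agree.
comment = "<!-- google_ad_section_end -->"
print(bool(re.fullmatch(r"<!--.*-->", comment)))   # True
print(bool(re.fullmatch(r"<!--.*?-->", comment)))  # True
```

If the filter is applied to each comment individually, as a pattern anchored to the whole comment, greedy and lazy matching give the same yes/no answer.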

@GoogleCodeExporter

That makes sense. The flip side is that our existing interfaces use wildcards
and (less importantly) wildcard matching is already well integrated into our
architecture. Most importantly, IMO regexps are hard to use for simple things,
though they are strictly more powerful.

Can you give a specific example of a comment pattern that you'd want to match
with regexps that would be harder with wildcards? The example you gave above,
written as a wildcard, is just

<!-- google_ad_section_*-->

Original comment by jmara...@google.com on 15 Mar 2011 at 1:32
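A sketch of whole-string wildcard matching in Python, assuming `*` matches any run of characters (including none) and `?` exactly one character. This is an illustrative implementation, not mod_pagespeed's wildcard code. Because the match is anchored to the entire comment, greediness does not change whether a comment matches.

```python
def wildcard_match(pattern, text):
    """Whole-string match: '*' = any run of characters (possibly empty),
    '?' = exactly one character, anything else matches itself."""
    p = t = 0
    star = -1     # position of the most recent '*' in pattern, or -1
    star_t = 0    # text position where that '*' match currently ends
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star, star_t = p, t      # first let '*' match the empty string
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star != -1:
            star_t += 1              # backtrack: '*' absorbs one more character
            p, t = star + 1, star_t
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1                       # trailing '*'s can match the empty string
    return p == len(pattern)


PATTERN = "<!-- google_ad_section_*-->"
print(wildcard_match(PATTERN, "<!-- google_ad_section_start(weight=ignore) -->"))  # True
print(wildcard_match(PATTERN, "<!-- google_ad_section_end -->"))                   # True
print(wildcard_match(PATTERN, "<!-- navigation -->"))                              # False
```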

@GoogleCodeExporter

These comments are used to hide content from the ht://Dig search spider:

<!--htdig_noindex-->
<!--/htdig_noindex-->

The wildcard pattern would be:

<!--*htdig_noindex-->

Original comment by dun...@chirp.com.au on 17 Mar 2011 at 9:52

@GoogleCodeExporter

Another use case from mod-pagespeed-discuss where wildcard matching would help:

On Thu, Apr 7, 2011 at 10:41 AM, Dave <davidcroda@gmail.com> wrote:
....
Just wondering if it would be a feasible feature.  I like the html
comment stripping functionality, but I cannot use it while running a
Google Website Optimizer multivariate experiment because it strips the
various <!-- utmx section name="section-name" --> comments.

It would be nice if there was a way to exclude these, but probably not
high priority.

Original comment by jmara...@google.com on 7 Apr 2011 at 2:56
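The patterns collected in this thread (AdSense, ht://Dig, and the Website Optimizer `utmx` sections) can be written as wildcards and translated into anchored regexps, which also shows that every such wildcard has a regexp equivalent. In this Python sketch, `wildcard_to_regex`, `KEEP_PATTERNS`, and `keep_comment` are hypothetical names, and the `utmx` pattern is an assumption based on the single example quoted above.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a '*'/'?' wildcard into a regex; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL so '*' can also span newlines inside a multi-line comment.
    return re.compile("".join(parts), re.DOTALL)


KEEP_PATTERNS = [wildcard_to_regex(p) for p in (
    "<!-- google_ad_section_*-->",   # AdSense section targeting
    "<!--*htdig_noindex-->",         # ht://Dig noindex markers
    "<!-- utmx section name=*-->",   # Website Optimizer sections (assumed form)
)]


def keep_comment(comment):
    """Keep a comment if it matches any retention pattern in full."""
    return any(rx.fullmatch(comment) for rx in KEEP_PATTERNS)


print(keep_comment("<!--/htdig_noindex-->"))                     # True
print(keep_comment('<!-- utmx section name="section-name" -->'))  # True
print(keep_comment("<!-- TODO: remove -->"))                      # False
```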

@GoogleCodeExporter

One would never be able to implement something like

http://00f.net/2010/09/22/transparent-client-side-fragment-cache/

without the ability to selectively preserve comments.

Original comment by webmas...@clubsilver.org on 11 Apr 2011 at 2:18

@GoogleCodeExporter

Fixed in r652.  Note that this has not been released as a binary yet.

Original comment by jmara...@google.com on 28 Apr 2011 at 7:45

  • Changed state: Fixed
