Reverse proxy with body content modify ability #1164

Open
gate2 opened this Issue Nov 6, 2014 · 5 comments

gate2 commented Nov 6, 2014

Hello, I want to thank you all for developing this great web server with its easy-to-configure admin interface.
I would like to make a suggestion for the reverse proxy module:
The reverse proxy module returns the HTML body unmodified, so when a website contains hard-coded absolute links (e.g. example.com/url instead of /url) in its HTML, those links go to the original source. Could you please add a filter function to the reverse proxy module that lets the user set URLs to be replaced with new URLs?
There is a similar module for Apache (I haven't tried it): http://httpd.apache.org/docs/trunk/mod/mod_substitute.html and one for nginx (haven't tried it either): http://wiki.nginx.org/HttpSubsModule
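
To make the request concrete, here is a minimal sketch of the kind of substitution filter being asked for, written in Python purely for illustration. It is not the reverse proxy module's code; `rewrite_body`, the `rules` mapping, and the `backend.example.com` host are hypothetical names, and it assumes the body is already decoded, uncompressed text.

```python
from typing import Mapping


def rewrite_body(body: str, replacements: Mapping[str, str]) -> str:
    """Replace each configured upstream URL prefix with its public counterpart.

    Plain literal replacement: every occurrence is rewritten, whether it sits
    in an attribute, in visible text, or inside a script.
    """
    # Apply longer patterns first so a specific prefix wins over a shorter one.
    # Rules are applied one after another, so the output of one rule can be
    # matched again by a later rule.
    for old in sorted(replacements, key=len, reverse=True):
        body = body.replace(old, replacements[old])
    return body


if __name__ == "__main__":
    html = '<a href="http://backend.example.com/url">link</a>'
    rules = {"http://backend.example.com/": "/"}  # hypothetical user-set rule
    print(rewrite_body(html, rules))
```

Literal string replacement is the simplest option and is easy to reason about, but it rewrites every match in the body regardless of context.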

Borkason changed the title from "Suggestion: Reserve proxy with body content modify ability" to "Suggestion: Reverse proxy with body content modify ability" Nov 6, 2014

Member

skinkie commented Nov 6, 2014

I think regular expressions could be a nice addition. But there are some hairy details, such as gzip-encoded bodies. I am certainly not opposed to this, but it should be user friendly :)
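
The gzip detail mentioned here can be sketched as follows: a compressed body has to be decompressed before any substitution and recompressed afterwards. This is an illustrative Python sketch only, not the server's implementation; `rewrite_encoded_body` and its parameters are hypothetical, and it assumes the whole body is available in memory rather than streamed in chunks.

```python
import gzip


def rewrite_encoded_body(raw: bytes, content_encoding: str,
                         old: bytes, new: bytes) -> bytes:
    """Apply a literal substitution to a body that may be gzip-encoded."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        # Substituting inside the compressed bytes would corrupt the stream,
        # so decompress, rewrite, and compress again.
        plain = gzip.decompress(raw)
        return gzip.compress(plain.replace(old, new))
    if encoding in ("", "identity"):
        return raw.replace(old, new)
    # Any other encoding is passed through untouched in this sketch.
    return raw


if __name__ == "__main__":
    page = b'<a href="http://backend.example.com/url">link</a>'
    out = rewrite_encoded_body(gzip.compress(page), "gzip",
                               b"http://backend.example.com/", b"/")
    print(gzip.decompress(out))
```

Because the rewritten body usually differs in size, a proxy doing this would also have to recompute `Content-Length` (or switch to chunked transfer) before sending the response.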

skinkie changed the title from "Suggestion: Reverse proxy with body content modify ability" to "Reverse proxy with body content modify ability" Nov 6, 2014

einsty commented Nov 6, 2014

I agree with the hairy details comment... be very careful here. HTML is not a string (see the first answer to this Stack Overflow post: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags).

Those Apache and nginx modules exist and they do work, but they are _extremely_ sensitive, and rules for them should not be developed in a live web server environment. You _will_ crash your server, or at a minimum destroy the output of your site at some point, regardless of how awesome you are at regular expressions. There must be a full test suite provided alongside this feature if it is to be offered in a management UI. (Just my two cents.)

Additionally, there is at least one Java library that supports modifying (X)HTML documents in a much more robust fashion. Perhaps a port of a library like jsoup or htmlcleaner would be a more effective option than just plain ol' regex?
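
To illustrate the parser-based idea, here is a rough sketch that swaps in Python's standard-library `html.parser` for jsoup or htmlcleaner, so it shows the approach rather than either of those libraries. `LinkRewriter`, `rewrite_links`, `URL_ATTRS`, and the `backend.example.com` host are hypothetical names. The sketch rewrites only URL-bearing attributes and re-serializes everything else, which normalizes tag case and attribute quoting along the way.

```python
from html import escape
from html.parser import HTMLParser

URL_ATTRS = {"href", "src", "action"}  # assumed set of attributes to rewrite


class LinkRewriter(HTMLParser):
    """Re-emit a document, rewriting a URL prefix only inside URL attributes."""

    def __init__(self, old_prefix, new_prefix):
        super().__init__(convert_charrefs=False)
        self.old_prefix, self.new_prefix = old_prefix, new_prefix
        self.out = []

    def _rewrite(self, name, value):
        if name in URL_ATTRS and value.startswith(self.old_prefix):
            return self.new_prefix + value[len(self.old_prefix):]
        return value

    def _emit_tag(self, tag, attrs, closing):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # attribute without a value, e.g. "disabled"
                parts.append(name)
            else:  # the parser unescaped the value, so escape it again
                parts.append('%s="%s"' % (name, escape(self._rewrite(name, value))))
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)  # visible text and script bodies pass through

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def rewrite_links(html, old_prefix, new_prefix):
    parser = LinkRewriter(old_prefix, new_prefix)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    page = ('<p>See http://backend.example.com/docs or '
            '<a href="http://backend.example.com/url">this</a></p>')
    print(rewrite_links(page, "http://backend.example.com/", "/"))
```

In the example, the `href` is rewritten while the same URL in the visible text is left alone, which is the kind of context sensitivity a plain regex over the whole body does not give you.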

Member

skinkie commented Nov 6, 2014

After you port the Java lib to C it would be worthwhile to consider its use ;)

einsty commented Nov 6, 2014

tough but fair :)

gate2 commented Nov 7, 2014

I'm not a programmer, but I tried to find a C port of jsoup/htmlcleaner and heard these suggested for C:
http://www.netsurf-browser.org/projects/libdom/
and
https://github.com/google/gumbo-parser
and for Python:
http://www.crummy.com/software/BeautifulSoup/
Will these help?
