Unexpected responses by the WebDAV-interface #1442

Closed
funkyfuture opened this Issue May 16, 2017 · 20 comments

Contributor

funkyfuture commented May 16, 2017

We're running into an issue with the WebDAV interface: it always returns the same contents, which leads to infinite recursion.

We access the eXist instance through an nginx proxy with this configuration:

location ~* /collections/?.* {
    rewrite ^/collections/?(.*) /exist/webdav/db/$2 break;
    proxy_pass http://localhost:8000;
}

Here's an example of the XML response shown in a web browser; the behaviour is identical with a DAV client such as the one the Oxygen editor provides:

http://ourhost/collections

<exist:result xmlns="http://exist.sourceforge.net/NS/exist" xmlns:exist="http://exist.sourceforge.net/NS/exist">
    <exist:collection xmlns:exist="http://exist.sourceforge.net/NS/exist" name="db" created="2017-05-05T16:31:42.538+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x">
        <exist:collection name="OurProject" created="2017-05-16T12:31:53.103+02:00" owner="admin" group="dba" permissions="rwxr-xr-x"/>
        <exist:collection name="apps" created="2017-05-05T16:34:12.242+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x"/>
        <exist:collection name="system" created="2017-05-05T16:31:42.557+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x"/>
    </exist:collection>
</exist:result>

Yet http://ourhost/collections/OurProject yields the same:

<exist:result xmlns="http://exist.sourceforge.net/NS/exist" xmlns:exist="http://exist.sourceforge.net/NS/exist">
    <exist:collection xmlns:exist="http://exist.sourceforge.net/NS/exist" name="db" created="2017-05-05T16:31:42.538+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x">
        <exist:collection name="OurProject" created="2017-05-16T12:31:53.103+02:00" owner="admin" group="dba" permissions="rwxr-xr-x"/>
        <exist:collection name="apps" created="2017-05-05T16:34:12.242+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x"/>
        <exist:collection name="system" created="2017-05-05T16:31:42.557+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x"/>
    </exist:collection>
</exist:result>

And so on.

Looking at nginx's logs, we noticed that every request after the initial one simply appended an additional /db to the URI.

So our first attempt at a workaround was this:

location ~* /collections/?.* {
    rewrite ^/collections(/db)?(/?.*) /exist/webdav/db$2 break;
    proxy_pass http://localhost:8000;
}
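A closer look at the two `rewrite` rules, assuming nginx's documented behaviour that a matching `rewrite` replaces the whole URI and that a reference to a capture group that doesn't exist (or didn't take part in the match) expands to an empty string: the first rule has only one group, `(.*)`, yet its replacement uses `$2`. The Python sketch below emulates both rules so their output can be checked; `nginx_rewrite` is a hypothetical helper, and Python's `re` stands in for nginx's PCRE, which behaves the same for these two patterns.

```python
import re
from typing import Optional

def nginx_rewrite(pattern: str, replacement: str, uri: str) -> Optional[str]:
    """Emulate `rewrite <pattern> <replacement>` for one URI (hypothetical helper)."""
    m = re.match(pattern, uri)
    if m is None:
        return None  # rule does not apply; nginx would leave the URI unchanged

    def expand(ref: "re.Match[str]") -> str:
        n = int(ref.group(1))
        if n > m.re.groups:
            return ""  # assumed: $n for a group that doesn't exist expands to ""
        return m.group(n) or ""  # an optional group that didn't match also yields ""

    # nginx replaces the whole URI with the expanded replacement string
    return re.sub(r"\$(\d)", expand, replacement)

ORIGINAL = (r"^/collections/?(.*)", "/exist/webdav/db/$2")
WORKAROUND = (r"^/collections(/db)?(/?.*)", "/exist/webdav/db$2")

for uri in ("/collections", "/collections/OurProject", "/collections/db/OurProject"):
    print(uri, "->", nginx_rewrite(*ORIGINAL, uri), "|", nginx_rewrite(*WORKAROUND, uri))
```

Under that assumption, the first rule forwards every request below /collections to /exist/webdav/db/, which by itself would explain why /collections/OurProject returns the root listing; with `$1` the path suffix would be passed through. The workaround rule also maps a collection whose name merely begins with `db` (e.g. /collections/dbfoo) to /exist/webdav/dbfoo.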

Now everything looks as expected in the web browser, but the DAV client always shows an additional (ghost) directory inside every directory, with the same name as the containing directory; requesting its contents yields a proper 404.

I can provide additional information as needed.

This is observed with eXist-db 3.2.0 (691bcd6) and nginx 1.10.0 on Ubuntu Xenial, but the issue is also known from previous versions in conjunction with Apache httpd.

Member

dizzzz commented May 19, 2017

From what we have seen (the WebDAV code has been around for quite some time), the interface is pretty stable. When accessing WebDAV directly, there are very few issues.

So from my point of view, it looks like nginx is breaking stuff, and the solution should be found in nginx.

So the next step is to compare the actual HTTP calls (headers, data) going through nginx to eXist-db with the calls made when connecting to eXist-db directly.
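For that comparison, a minimal Python sketch (standard library only) that sends the same `PROPFIND` with `Depth: 1` once directly to eXist-db and once through the proxy, and pulls the `DAV:href` values out of the 207 Multi-Status body so the two listings can be diffed. The `allprop` request body is an assumption, authentication is left out, and host names, ports and paths are placeholders.

```python
import http.client
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
)

def propfind(host: str, port: int, path: str) -> Tuple[int, Dict[str, str], bytes]:
    """Send a Depth: 1 PROPFIND and return status, response headers and raw body."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(
            "PROPFIND",
            path,
            body=PROPFIND_BODY.encode("utf-8"),
            headers={"Depth": "1", "Content-Type": "application/xml; charset=utf-8"},
        )
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()

def hrefs(multistatus: bytes) -> List[str]:
    """All DAV:href values in a 207 Multi-Status body, in document order."""
    root = ET.fromstring(multistatus)
    return [(el.text or "").strip() for el in root.iter("{DAV:}href")]
```

Usage would be along the lines of `hrefs(propfind("localhost", 8000, "/exist/webdav/db/")[2])` for the direct endpoint and the equivalent call for the proxied one. Since DAV clients navigate by these hrefs, it is worth noting whether the proxied listing carries the upstream prefix (/exist/webdav/db/...) rather than the public one.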

Member

dizzzz commented May 19, 2017

Another thing: the output you report

<exist:result xmlns="http://exist.sourceforge.net/NS/exist" xmlns:exist="http://exist.sourceforge.net/NS/exist">
    <exist:collection xmlns:exist="http://exist.sourceforge.net/NS/exist" name="db" created="2017-05-05T16:31:42.538+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x">
        <exist:collection name="OurProject" created="2017-05-16T12:31:53.103+02:00" owner="admin" group="dba" permissions="rwxr-xr-x"/>
        <exist:collection name="apps" created="2017-05-05T16:34:12.242+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x"/>
        <exist:collection name="system" created="2017-05-05T16:31:42.557+02:00" owner="SYSTEM" group="dba" permissions="rwxr-xr-x"/>
    </exist:collection>
</exist:result>

is NOT part of the WebDAV specification! This XML document has been added as a convenience for end users, but it is not WebDAV. WebDAV is far more complex. This is not the interface Oxygen uses, but I agree it is the same data that Oxygen receives.

Member

dizzzz commented May 19, 2017

I think I remember seeing ghost directories a long time ago when things were mounted on exist/webdav/ instead of exist/webdav/db/ ... and maybe the / at the end makes a difference too?

Contributor

funkyfuture commented May 21, 2017

Thanks for your responses.

"So from my point, it looks like nginx is breaking stuff, and the solution should be found in nginx."

As Apache httpd does the same, I'd consider this to be at least a documentation bug. Are such deployments behind proxies intended to be supported at all? Another issue we're confronted with is that the web root response of an eXist instance links to assets under the /resources path, which can't be matched uniquely when we prefix different instances with arbitrary paths. (Yes, this is a separate issue, but it raises the question.)

"is NOT part of the webdav specification! This XML document has been added as a convenience for an enduser, but it is not webdav."

Yep, and that convenience output was easier to post here; the behaviour is identical with the XML output and with a WebDAV client.

"I think I remember that I have seen ghosts directories a long time ago when things are mounted on …"

Can you elaborate on what exactly you mean by 'mounted' here? On the server, or on a client that has mounted a WebDAV resource from or to those paths?

"So the next step to do is to compare the actual HTTP calls between NGINX and eXist-db (headers, data) with the calls done when connected with eXist-db directly."

Shall we start with the endless recursion or with the ghost folders?

Member

dizzzz commented May 22, 2017

"are such deployments behind proxies intended to be supported anyway?"

Well, it is not part of a test plan, nor did we (I?) intend to have these interfaces tested behind any of the many potential third-party reverse proxies. The 'core' interface works, that should count :-)

"can you elaborate more specifically what you mean with 'mounted' here?"

When you mount the WebDAV interface from e.g. the macOS Finder, the mount point should be 'http://host:8080/exist/webdav/db/', as documented on http://exist-db.org/exist/apps/doc/webdav.xml. From a long time ago [the extension was developed 5 years ago!] I remember that mounting on ..../exist/webdav/db or ..../exist/webdav/ could yield strange results, with these ghost directories. If nginx strips this last "/", there is nothing we can do on the eXist-db side.
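One way the trailing "/" can matter, assuming the client resolves hrefs from the server's responses against the URL it was given (RFC 3986 reference resolution, which Python's `urllib.parse.urljoin` implements): when the base URL has no trailing slash, its last path segment is dropped during resolution. The base URLs below are illustrative, and whether eXist-db emits relative or absolute-path hrefs is not established in this thread.

```python
from urllib.parse import urljoin

with_slash = "http://host:8080/exist/webdav/db/"
without_slash = "http://host:8080/exist/webdav/db"

# A relative reference keeps the last segment only if the base ends in "/".
print(urljoin(with_slash, "apps/"))     # http://host:8080/exist/webdav/db/apps/
print(urljoin(without_slash, "apps/"))  # http://host:8080/exist/webdav/apps/

# An absolute-path reference replaces the whole path, including a proxy's public prefix.
print(urljoin("http://ourhost/collections/", "/exist/webdav/db/apps/"))  # http://ourhost/exist/webdav/db/apps/
```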

The extension has worked for 5+ years now, more or less without too many issues.

Contributor

funkyfuture commented May 24, 2017

Okay, we looked at the issue with this condensed nginx configuration (not including the more general settings):

location /davtests/ {
    proxy_pass http://localhost:8002/davtests/;
}

location /caspars_vault/ {
    proxy_pass http://localhost:8002/davtests/webdav/db/apps/caspar/documents/;
}

Any request that matches the first location works fine; the aliased access shows a ghost folder named documents in Oxygen.

Comparing both dimensions (working/non-working, proxied/unproxied) reveals no differences in requests or responses, except for cookie-related headers that shouldn't matter at all (the DAV-related RFCs only mention 'cooking recipes' once).
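Identical upstream responses are consistent with the ghost folder under one assumption about client behaviour: in a `Depth: 1` PROPFIND reply (RFC 4918) the collection itself is listed alongside its members, and the client has to recognise that entry as "self" by comparing its href with the URL it requested. Behind the `/caspars_vault/` alias, an href that still carries the upstream path `/davtests/webdav/db/apps/caspar/documents/` (assumed here, not verified) no longer matches, so the collection would appear as its own child named `documents`. A Python sketch of such a hypothetical client; `listed_children` and the member `a.xml` are invented for illustration.

```python
from typing import List
from urllib.parse import unquote, urlsplit

def _path(url_or_path: str) -> str:
    # Compare paths only: drop scheme/host, decode %-escapes, ignore a trailing slash.
    return unquote(urlsplit(url_or_path).path).rstrip("/")

def listed_children(request_path: str, hrefs: List[str]) -> List[str]:
    """Hypothetical client: the href equal to the requested path is 'self';
    every other href is shown as a child named after its last path segment."""
    me = _path(request_path)
    return [_path(h).rsplit("/", 1)[-1] for h in hrefs if _path(h) != me]

upstream_hrefs = [
    "/davtests/webdav/db/apps/caspar/documents/",       # the collection itself
    "/davtests/webdav/db/apps/caspar/documents/a.xml",  # a member (hypothetical name)
]

print(listed_children("/davtests/webdav/db/apps/caspar/documents/", upstream_hrefs))  # ['a.xml']
print(listed_children("/caspars_vault/", upstream_hrefs))  # ['documents', 'a.xml']
```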

I also investigated other clients' behaviour, and so far Oxygen is the most forgiving with its ghost folders; others do not work at all on aliased locations.

I'll look further into possible solutions and will probably post a proposal to extend the docs next week.

Contributor

funkyfuture commented Jun 22, 2017

Okay, I tested the whole setup with another WebDAV implementation as the upstream server, with exactly the same results. Now I wonder:

  1. Should I add a note to the docs that our naive and optimistic expectation isn't worth trying to accomplish?
  2. Would a feature request for aliasing of DAV resources, facilitated by eXist-db, be something that could reasonably be solved?

Member

dizzzz commented Jun 22, 2017

Hi, if something really does not work, as you investigated, some explanation in the documentation makes sense.

For the second point, I am not sure what exactly you want or need.

Contributor

funkyfuture commented Jun 27, 2017

The second thing would be 'native' support in eXist-db for aliased access to WebDAV resources. But it's probably too much effort, as a third-party library is used for the WebDAV service, right?

htInEdin commented Oct 31, 2018

Same problem with Konqueror and curl: they report an error, but the whole file is actually transferred. To reproduce, open a WebDAV folder in Konqueror and try to copy an XML file to another folder with Ctrl-C, ..., Ctrl-V. You'll get an error popup and a file called orig.xml.part, which actually contains the whole file.

Member

dizzzz commented Oct 31, 2018

@htInEdin please specify the eXist-db version, curl version and Konqueror version, and the form of the WebDAV HTTP endpoint definition in your app.

Control-C control-V sounds to me like a KDE-specific operation, so maybe it is a bug there? I certainly have no idea what WebDAV calls are made under the hood; please check the logging.

htInEdin commented Oct 31, 2018

Contributor

funkyfuture commented Oct 31, 2018

@htInEdin could you also post your proxy configuration and the Jetty config?

"control-c control-v sounds to me like a KDE specific operation"

I think this refers to the keyboard shortcuts for clipboard interactions.

Member

dizzzz commented Oct 31, 2018

@htInEdin what is the URL you are connecting to? (Leave out the hostname for obvious privacy reasons.) It is the WebDAV URL you typed in to connect to the database.

With the URL I should be able to replay the scenario in curl. Could you please share the exact curl command with me?

@funkyfuture yes, true, but I have no idea what is happening under the hood.

htInEdin commented Nov 1, 2018

The URL which gives the bad behaviour in Konqueror is
webdav://localhost:8080/exist/webdav/db/apps/mabudungun/repo.xml
As @funkyfuture said above, Ctrl-C/Ctrl-V are just Konqueror's keyboard shortcuts for Copy and Paste.

The complete relevant curl session is

> curl -o repo.xml -u xx:yy http://localhost:8080/exist/webdav/db/apps/mabudungun/repo.xml
  % Total    % Received % Xferd 
  9  4096    9   401    0     0  
curl: (18) transfer closed with 3695 bytes remaining to read

The file is indeed 401 bytes long, and it's all there.

htInEdin commented Nov 1, 2018

No proxy. I'm not sure which file you mean by the Jetty config, but all XML files under tools/jetty are unchanged from the 4.4 release.

Member

dizzzz commented Nov 1, 2018

Ah, OK, thanks. It is all clear to me now.

In eXist-db there is no such thing as an actual document size: the size of the document as serialized out of the database model depends heavily on the serialization parameters, indentation, inclusion of PIs, etc.

As a result, we estimate the size from the number of data pages the document uses: number of pages × page size (4k) = total size. That explains the 4k value.
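Applied to the curl session above, and assuming the 401-byte repo.xml occupies a single 4 KiB data page:

```latex
\begin{aligned}
\text{Content-Length} &= n_{\text{pages}} \times 4096 = 1 \times 4096 = 4096 \text{ bytes} \\
\text{bytes remaining} &= 4096 - 401 = 3695
\end{aligned}
```

This matches both the 4096 in curl's progress output and its "transfer closed with 3695 bytes remaining to read".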

It is implemented this way for performance reasons. The only alternative is to pre-serialize the whole document and measure its size before the document is actually transferred, which is a very expensive operation.

Most WebDAV clients work OK with this approach. Some clients don't. Or did and don't. Or didn't and do. You get the idea.

Leaving out the Content-Length header cannot be the default, although the spec allows it: some clients really expect this header to be set.

So nothing is wrong with the curl command: the document is transferred correctly, and curl reports that a Content-Length was announced but fewer bytes were actually transferred.

So I wrote some instructions in the source code on how to work around this. But you are right, this is missing from the documentation.

So the workaround is to set the Java system property "org.exist.webdav.PROPFIND_METHOD_XML_SIZE" and/or "org.exist.webdav.GET_METHOD_XML_SIZE" (i.e. a -D option for the JVM) to one of the values NULL, EXACT or APPROXIMATE.
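To make the three values concrete, a hypothetical Python model of what each setting could mean for the Content-Length header, based only on the explanation above: NULL omits the header, EXACT serializes first and measures, APPROXIMATE uses pages × 4k. The function name and signature are invented for illustration; the authoritative behaviour is in eXist-db's WebDAV source.

```python
from typing import Dict, Optional

PAGE_SIZE = 4096  # the 4k data page size mentioned above

def content_length_header(mode: str, serialized: Optional[bytes], page_count: int) -> Dict[str, str]:
    """Hypothetical model of the NULL / EXACT / APPROXIMATE size settings."""
    if mode == "NULL":
        return {}  # no Content-Length; the client reads until the connection closes
    if mode == "EXACT":
        if serialized is None:
            raise ValueError("EXACT needs the fully serialized document (the expensive path)")
        return {"Content-Length": str(len(serialized))}
    if mode == "APPROXIMATE":
        return {"Content-Length": str(page_count * PAGE_SIZE)}  # cheap, but usually too large
    raise ValueError(f"unknown mode: {mode}")
```

With the 401-byte repo.xml on one page, APPROXIMATE announces 4096 bytes while EXACT announces 401, which is the mismatch curl reported.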

As you can read in the code, this part is a can of worms with WebDAV clients; it is never OK for everyone.

htInEdin commented Nov 2, 2018

OK, following your suggestion solves the original problem, but it took three of us all morning to actually implement.

When you document this, you should explain how to do it. I believe the right way is to edit $EXIST_HOME/vm.properties and replace

vmoptions=-Dfile.encoding=UTF-8

with

vmoptions.linux=-Dfile.encoding=UTF-8 -Dorg.exist.webdav.PROPFIND_METHOD_XML_SIZE=EXACT

The change to vmoptions.linux (or whatever your OS is) is needed because the value of vmoptions as such is not split before being stored, so if you use vmoptions=... you end up with a very long and useless value for the file.encoding property.
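The splitting problem can be illustrated with a small Python sketch of how the JVM treats `-D` arguments: each argument becomes one system property, split at the first `=`. If the launcher passes the whole vmoptions value as a single argument, as described above, everything after the first `=` lands in file.encoding. `parse_system_properties` is a hypothetical helper, and `shlex.split` stands in for splitting the option string into separate arguments.

```python
import shlex
from typing import Dict, List

def parse_system_properties(args: List[str]) -> Dict[str, str]:
    """Collect -Dkey=value arguments: one property per argument, split at the first '='."""
    props: Dict[str, str] = {}
    for arg in args:
        if arg.startswith("-D"):
            key, _, value = arg[2:].partition("=")
            props[key] = value
    return props

vmoptions = "-Dfile.encoding=UTF-8 -Dorg.exist.webdav.PROPFIND_METHOD_XML_SIZE=EXACT"

unsplit = parse_system_properties([vmoptions])             # passed as one argument
split = parse_system_properties(shlex.split(vmoptions))    # passed as two arguments
print(unsplit)
print(split)
```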

Member

adamretter commented Nov 2, 2018

@htInEdin Would you be able to send a Pull Request to improve our documentation? https://github.com/exist-db/documentation

htInEdin commented Nov 5, 2018

Will try, but given the way the code is written it's not easy. Am I allowed to suggest reverting a change to the code (in launcher/LauncherWrapper.java)?
