GridFTP incompatibilities with Globus Online #3545
Could you find the FTP client (GO) operations for the successful transfer? The access log file should contain all operations and dCache's response. A simple grep using the session (
Hi Paul, I'm not very familiar with GO. I can't find the failed transfer I mentioned here, but this one looks identical and failed at the same time:
The successful attempt only mentions this:
Here is the grepped session log from our GridFTP door:
Thanks for the information. The message […] Unfortunately, the access log you found is almost certainly not from the connection that experienced the problem. There is certainly no indication of a problem in that access log file. Instead, it shows the FTP client (GO) disconnecting shortly after starting a new transfer, apparently unprovoked. I have seen this behaviour before. It comes from the recovery procedure GO uses: it disconnects all FTP connections when there is a problem with any one connection. Therefore, it is quite likely that the problem was with some other FTP connection, either on the same GridFTP door or on another GridFTP door. You could try restricting GO to a single transfer at a time and try to recreate the problem there. This should make it easier to discover why GO is aborting.
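As a rough aid for the "grep by session" approach, here is a minimal Python sketch that keeps only the access-log lines belonging to one GridFTP door session. It assumes each line carries a `[door:<session>@<domain>:...]` field, as in the access-log lines quoted in the original report (for example `GFTP-plum15-AAVaAuwI7mA`); `session_of` and `lines_for_session` are hypothetical helpers, not part of dCache.

```python
import re

# Hypothetical helpers (not dCache code). Assumes access-log lines contain a
# "[door:<session>@<domain>:...]" field, as in the excerpts quoted in this issue.
DOOR_FIELD = re.compile(r"\[door:([^@\]]+)@")


def session_of(line):
    """Return the door session ID of a log line, or None if there is none."""
    m = DOOR_FIELD.search(line)
    return m.group(1) if m else None


def lines_for_session(lines, session):
    """Keep only the lines whose door session ID matches `session` exactly."""
    return [line for line in lines if session_of(line) == session]


if __name__ == "__main__":
    sample = [
        '09.25 15:14:52 [door:GFTP-plum15-AAVaAuwI7mA@gridftp-plum15Domain:request] ... {0:""}',
        '09.25 15:14:52 [door:GFTP-plum15-AAVaAuwI6ng@gridftp-plum15Domain:request] ... {451:"Aborting transfer due to session termination"}',
    ]
    for line in lines_for_session(sample, "GFTP-plum15-AAVaAuwI6ng"):
        print(line)
```

Matching the extracted ID exactly, rather than as a substring as `grep` would, avoids picking up a session whose ID merely contains the one you are looking for.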
Would it be possible to try the latest dCache version (3.2)? Well, it's not yet released, but we're just putting together the release notes. This version has a couple of features that GO requires (dynamic checksum calculation and command pipelining). Perhaps you could set up a small test system just to see whether GO works better with this version of dCache.
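Command pipelining means the client sends several commands without waiting for each reply, and the server answers them in order. A minimal Python sketch of the client side, assuming plain RFC 959 reply framing; `pipeline` and `split_replies` are hypothetical names, and this is not how GO or dCache actually implement it.

```python
# Sketch of FTP command pipelining (illustrative only, not dCache or GO code).
# The client writes several commands at once, then reads the replies, which
# the server must send back in the same order. Reply framing follows RFC 959:
# a multi-line reply starts with "ddd-" and ends with a line starting "ddd ".


def pipeline(commands):
    """Encode several FTP commands as one CRLF-terminated byte buffer."""
    return "".join(cmd + "\r\n" for cmd in commands).encode("ascii")


def split_replies(buffer):
    """Split server output into complete replies as (code, lines) pairs.

    A trailing, still-incomplete reply is ignored in this simple sketch.
    """
    replies, current = [], []
    for line in buffer.decode("ascii").split("\r\n"):
        if not line:
            continue
        current.append(line)
        code = current[0][:3]
        # The reply ends on a line that repeats the opening code followed by a space.
        if line[:3] == code and line[3:4] == " ":
            replies.append((int(code), current))
            current = []
    return replies


if __name__ == "__main__":
    sent = ["TYPE I", "FEAT", "NOOP"]
    print(pipeline(sent))
    received = b"200 Type set\r\n211-Features:\r\n SIZE\r\n211 End\r\n200 NOOP ok\r\n"
    for cmd, (code, lines) in zip(sent, split_replies(received)):
        print(cmd, "->", code, lines[-1])
```

Because replies carry no reference to the command they answer, a pipelining client relies entirely on ordering; a server that drops or reorders a reply desynchronises the whole session.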
Hi Paul, Upgrading our test system once version 3.2 is released should not be much of a problem, but getting a firewall exception for that system is another matter. Any idea how to simulate the GO client with, e.g., globus-url-copy? GO looks like a black-box client to me. Is prometheus.desy.de public, so that we could use it for compatibility tests? Thanks!
If you like, you can take one of the latest 3.2 pre-release builds and try that:
Unfortunately, I'm not sure how to emulate GO with globus-url-copy. In my experiments, I created a virtual machine and ran the GO packaged server there. However, mostly it was a case of observing what GO does when interacting with dCache, instrumenting error cases, and the occasional inspired detective work to understand what was going wrong and get it to work with dCache. Yes, you can certainly use prometheus for testing -- that's one of its major reasons for existing. Various VOs are already authorised, but I can also create an account specifically for you (tied to your DN). Just drop me an email if that would be useful.
Hello, this is Gonzalo from IceCube @ UW-Madison. We have a data archive service here that issues Globus Online transfers to archive data from endpoint A to endpoint B. If you think it might be of use for your testing, we could quite easily direct some arbitrary transfer load to a test endpoint that you point us to. Gonzalo
Hi Gonzalo, Sorry for the delay in getting back in touch -- what I propose is giving you an account on our test system called 'prometheus'. This would allow a much faster turn-around for getting to the bottom of any problems with GO. Could you send me the output of
(replacing Cheers, Paul.
Hello Paul, Are there any updates on the debugging of this issue? Thanks!
My apologies for the delay in replying. There was an unresolved issue with the update that prevented me from updating dCache so it supports the Globus transfer-service. It turns out the problem was not with the patch, but with the existing dCache code. That problem is now fixed, so I've deployed the patch. Currently the patch is in our 'master' branch. This means it appears in prometheus test system right now, so you should be able to verify that it works there. We will back-port the fix to our stable branches, going back to dCache v3.2. It's too late to do that for this release cycle (due out tomorrow), but it should be available as part of the next release cycle (due next Tuesday: 2018-03-06). |
Nice. Thanks!
Hello Paul,
No problem. Thanks for looking into this.
Yesterday I submitted a sync transfer of a few hundred files from UW-Madison to Prometheus. The transfer eventually completed, but a few errors appeared around midnight, Central US Time.
I paste some of these errors below. I don't know if they are relevant, or if you can correlate them with errors in the logs on your side. Tell me if you see something, or if you would like me to run a specific test.
-----
2018-03-15 12:01 am
unknown error
Error (transfer)
Endpoint: Prometheus DESY test server (eca64ab0-b811-11e7-b125-22000a92523b)
Server: prometheus.desy.de:2811
File: /Users/gmerino/0101/PFFilt_PhysicsFiltering_Run00129005_Subrun00000000_00000260.tar.bz2
Command: CKSM MD5 0 -1 /Users/gmerino/0101/PFFilt_PhysicsFiltering_Run00129005_Subrun00000000_00000260.tar.bz2
Message: Fatal FTP response
---
Details: 550 Error retrieving /Users/gmerino/0101/PFFilt_PhysicsFiltering_Run00129005_Subrun00000000_00000260.tar.bz2: Transfer was forcefully killed\r\n
----
2018-03-15 12:02 am
connection failed
{
  "context": [
    {
      "endpoint": "Prometheus DESY test server (eca64ab0-b811-11e7-b125-22000a92523b)",
      "operation": "File Transfer - Capability Check"
    }
  ],
  "error": {
    "details": "Error (connect)\nEndpoint: Prometheus DESY test server (eca64ab0-b811-11e7-b125-22000a92523b)\nServer: prometheus.desy.de:2811\nMessage: Could not connect to server\n---\nDetails: globus_xio: Unable to connect to prometheus.desy.de:2811\\nglobus_xio: System error in connect: Connection refused\\nglobus_xio: A system call failed: Connection refused\\n\n",
    "type": "GSHError"
  }
}
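The first error above was returned for `CKSM MD5 0 -1 <path>`, which asks the server for an MD5 checksum over part of a file: offset 0 and length -1, which we read as "to the end of the file". A minimal Python sketch for computing the same value locally to cross-check a server's answer, assuming that reading of `-1`; `partial_checksum` is a hypothetical helper, not a Globus or dCache API.

```python
import hashlib
import os
import tempfile


def partial_checksum(path, algorithm="md5", offset=0, length=-1, chunk_size=1 << 20):
    """Checksum `length` bytes of `path` starting at `offset`.

    Assumption: a negative length means "read to end of file", mirroring our
    reading of the "-1" in "CKSM MD5 0 -1 <path>".
    """
    digest = hashlib.new(algorithm)
    remaining = None if length < 0 else length
    with open(path, "rb") as f:
        f.seek(offset)
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            block = f.read(size)
            if not block:  # reached end of file
                break
            digest.update(block)
            if remaining is not None:
                remaining -= len(block)
    return digest.hexdigest()


if __name__ == "__main__":
    # Self-contained demo on a temporary file instead of a real archive file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    try:
        print(partial_checksum(tmp.name))            # whole file, like "0 -1"
        print(partial_checksum(tmp.name, offset=6))  # "world" only
    finally:
        os.remove(tmp.name)
```

Reading in fixed-size chunks keeps memory use flat, which matters for multi-gigabyte archive files like the `.tar.bz2` in the error.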
Thanks for doing this testing, Gonzalo. Every day at 06:00 CET/CEST, prometheus is wiped clean and reinstalled from scratch. This isn't normal dCache behaviour -- it's something special to prometheus, as it always runs the latest dCache version. I believe that this time currently corresponds to midnight in Central US time. Looking at the logs, I see the Globus transfer service connections (acting on your behalf) starting at 2018-03-15T05:13:00.669+0100, with the last one connecting at 2018-03-15T08:07:40.427+0100. So I believe this explains the "few errors" you described. Could you retry the transfers, starting them somewhat earlier to avoid midnight?
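The time-zone reasoning can be checked with a short Python sketch. It uses fixed UTC offsets rather than a time-zone database, which is an assumption valid only for mid-March 2018: Germany was still on CET (UTC+1), while US Central time had already switched to CDT (UTC-5). The two connection timestamps are the ones quoted from the prometheus logs.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for mid-March 2018 (stated assumption, not read from a tz database):
# Germany was still on CET (UTC+1) because EU summer time began on 2018-03-25,
# while US Central time had moved to CDT (UTC-5) on 2018-03-11.
CET = timezone(timedelta(hours=1), "CET")
CDT = timezone(timedelta(hours=-5), "CDT")

# Daily prometheus reinstall at 06:00 local (CET) time on the day of the test.
reinstall = datetime(2018, 3, 15, 6, 0, tzinfo=CET)

# First and last Globus connections, as quoted from the prometheus logs.
fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
first_conn = datetime.strptime("2018-03-15T05:13:00.669+0100", fmt)
last_conn = datetime.strptime("2018-03-15T08:07:40.427+0100", fmt)

print(reinstall.astimezone(CDT))           # 2018-03-15 00:00:00-05:00
print(first_conn < reinstall < last_conn)  # True: the wipe fell inside the transfer window
```

The reinstall lands at exactly 00:00 CDT, which lines up with the errors Gonzalo saw at 12:01 am and 12:02 am Central time. Once Europe switches to CEST at the end of March, the same 06:00 wipe would map to 23:00 the previous evening in Central time.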
There doesn't seem to have been much progress on this ticket. To be clear, I believe this problem is solved: I am able to transfer many files between two dCache instances using Globus.
Hi,
dCache version: 2.16.47
We are suffering from a long-standing incompatibility with Globus Online's GridFTP implementation. In our case, IceCube suffers from this problem. GO transfers files in parallel, but as soon as one file is transferred successfully, it cancels all other ongoing transfers.
Here is an example of a finished transfer:
09.25 15:14:52 [door:GFTP-plum15-AAVaAuwI7mA@gridftp-plum15Domain:request] ["/DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=jade/jade-lta.icecube.wisc.edu":16892:248:184.73.189.163] [00009B888CD56F6447778A58209BCDFF1237,176851709905] [/pnfs/ifh.de/acs/icecube/archive/data/exp/IceCube/2015/unbiased/PFDST/0318/9e20f17c-b429-44d3-bcba-1f8e8c0480dd.zip] icecube:pfdst@osm 1810144 0 {0:""}
And here is a transfer that gets cancelled at the same moment:
09.25 15:14:52 [door:GFTP-plum15-AAVaAuwI6ng@gridftp-plum15Domain:request] ["/DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=jade/jade-lta.icecube.wisc.edu":16892:248:184.73.189.163] [000013047092D1644B3CB9C32F368DC4CCDD,0] [/pnfs/ifh.de/acs/icecube/archive/data/exp/IceCube/2015/unbiased/PFDST/1211/feaab171-9c23-46ee-8a6b-f1a53bf173f3.zip] icecube:pfdst@osm 1810481 0 {451:"Aborting transfer due to session termination"}
I guess GO uses a GridFTP feature that dCache doesn't support (pipelining?). Any idea how to interoperate smoothly with GO?
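To spot the cancelled transfers in a larger access log, one could pull out the trailing `{code:"message"}` field of each line. A minimal Python sketch, assuming that field layout holds generally; it is inferred only from the two lines quoted above, not from a documented format, and `transfer_status` and `is_session_abort` are hypothetical helpers.

```python
import re

# Hypothetical parser (not dCache code) for the trailing {code:"message"} field.
# The layout is inferred from two example access-log lines, not from a spec.
STATUS_FIELD = re.compile(r'\{(\d+):"(.*)"\}\s*$')


def transfer_status(line):
    """Return (code, message) from the end of an access-log line, or None."""
    m = STATUS_FIELD.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def is_session_abort(line):
    """True if the transfer ended with 451 because its session was terminated."""
    status = transfer_status(line)
    return status is not None and status[0] == 451 and "session termination" in status[1]


if __name__ == "__main__":
    finished = '... icecube:pfdst@osm 1810144 0 {0:""}'
    cancelled = '... icecube:pfdst@osm 1810481 0 {451:"Aborting transfer due to session termination"}'
    for line in (finished, cancelled):
        print(transfer_status(line), is_session_abort(line))
```

Counting such 451 aborts per timestamp would show whether each successful transfer (code 0) coincides with a burst of cancellations, as in the example above.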