dcache-xrootd: fix TPC rendezvous to work with token authorization
Motivation:

A bit of history.

When xrootd first implemented TPC, it used a scheme
whereby the initiating client would do an open on both
the source and destination servers, passing to
them a generated "rendezvous key"; when the third-party
client then connects to the source, it should have that
key in its possession; the source server validates that
the key is the same as the one the initiating client used
on open, and then allows the third-party client to proceed
to open the file (in our case, start the mover).

After delegation was implemented, this strategy could
be short-circuited (the client avoids calling open on
the source); this variant is designated "TPC Lite."

Because the rendezvous token carries only implicit
authorization and no authentication, in order to
support a third-party client that connects without
authenticating (for example, without presenting a
certificate), the code was modified to make the TPC
Subject = ROOT, since it would only be reading the
file, never writing.

However, when JWT token authorization was introduced,
this strategy was accidentally defeated: the code treated
the presence of a token as meaning that the open could
take place immediately.  While this may be true for
the TPC client, it is not true for the initiating
client.  In the case where the TPC client has no
token but the initiating client does, the former
will sit there waiting for the rendezvous key forever.

Modification:

Change the logic to create the rendezvous key even in
the presence of the authz CGI, except on the TPC client.
This will allow for the rendezvous authorization of the third-party
client without a token even if the initiator originally was
authorized/authenticated or is using a JWT token.

If the TPC client is in fact presenting a JWT token,
the rendezvous store-and-wait is aborted.
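
The decision described above can be sketched roughly as follows. This is
a simplified illustration, not the door's actual code: the class name,
the Action enum and the initiatorOpened flag are hypothetical, the
"authz" key name is an assumption standing in for Cgi.AUTHZ.key(), and
the delegation check, the "tpc.src" case and the verify step (READY,
CANCELLED, ERROR) are left out.

```java
import java.util.Map;

public class TpcRendezvousSketch {

    /** Hypothetical outcomes; the real handler returns xrootd responses or null. */
    enum Action { START_MOVER, WAIT_RETRY, RETURN_HANDLE }

    /**
     * @param opaque          CGI key/value pairs from the open request
     * @param initiatorOpened whether the initiating client has already
     *                        registered the rendezvous key on this source door
     */
    static Action decide(Map<String, String> opaque, boolean initiatorOpened) {
        boolean hasToken = opaque.containsKey("authz"); // assumed key name

        if (opaque.containsKey("tpc.org")) {
            // Open from the third-party (destination) client.
            if (hasToken) {
                // It can authorize itself: abort the store-and-wait.
                return Action.START_MOVER;
            }
            // No token: rely on the rendezvous with the initiating client.
            return initiatorOpened ? Action.START_MOVER : Action.WAIT_RETRY;
        }

        if (opaque.containsKey("tpc.dst")) {
            // Open from the initiating client on the source. The key is
            // stored whether or not a token is present (the change here).
            return Action.RETURN_HANDLE;
        }

        return Action.START_MOVER; // not a rendezvous case in this sketch
    }

    public static void main(String[] args) {
        // TPC client with a token: no rendezvous needed.
        System.out.println(decide(Map.of("tpc.org", "o", "authz", "t"), false));
        // TPC client without a token, initiator not yet seen: wait and retry.
        System.out.println(decide(Map.of("tpc.org", "o"), false));
        // Initiating client with a token: key is still stored.
        System.out.println(decide(Map.of("tpc.dst", "d", "authz", "t"), false));
    }
}
```

The point of the sketch is that the token check sits inside the TPC
client branch only, so an initiating client that carries a token still
registers the key for a token-less third-party client to match.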

Result:

Rendezvous TPC without requiring a JWT token to be passed
by the third-party client is possible (again).

Target: master
Request: 8.0
Request: 7.2
Request: 7.1
Request: 7.0
Request: 6.2
Patch: https://rb.dcache.org/r/13502/
Requires-notes: yes
Requires-book: no
Ackd-by: Dmitry
alrossi authored and mksahakyan committed May 6, 2022
1 parent b719355 commit 93d7afa
Showing 2 changed files with 77 additions and 69 deletions.
@@ -1037,6 +1037,18 @@ public XrootdTpcInfo createOrGetRendezvousInfo(String key) {
}
}

public void removeTpcPlaceholder(String key) {
synchronized (_tpcFdIndex) {
if (!key.equals(TPC_PLACEMENT)) {
XrootdTpcInfo info = _tpcInfo.remove(key);
if (info != null) {
_tpcFdIndex.remove(info.getFd());
_log.debug("key {} was removed.", key);
}
}
}
}

public boolean removeTpcPlaceholder(int fd) {
synchronized (_tpcFdIndex) {
String tpc = _tpcFdIndex.remove(fd);
@@ -83,6 +83,7 @@
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import javax.security.auth.Subject;
import org.dcache.auth.LoginReply;
@@ -99,7 +100,6 @@
import org.dcache.xrootd.core.XrootdException;
import org.dcache.xrootd.core.XrootdSession;
import org.dcache.xrootd.protocol.XrootdProtocol;
import org.dcache.xrootd.protocol.messages.AwaitAsyncResponse;
import org.dcache.xrootd.protocol.messages.CloseRequest;
import org.dcache.xrootd.protocol.messages.DirListRequest;
import org.dcache.xrootd.protocol.messages.DirListResponse;
@@ -117,6 +117,7 @@
import org.dcache.xrootd.protocol.messages.StatResponse;
import org.dcache.xrootd.protocol.messages.StatxRequest;
import org.dcache.xrootd.protocol.messages.StatxResponse;
import org.dcache.xrootd.protocol.messages.WaitRetryResponse;
import org.dcache.xrootd.protocol.messages.XrootdResponse;
import org.dcache.xrootd.tpc.XrootdTpcInfo;
import org.dcache.xrootd.tpc.XrootdTpcInfo.Cgi;
@@ -193,6 +194,7 @@ public boolean isLoggedIn() {
private final LoginSessionInfo _defaultLoginSessionInfo;
private final Deque<LoginSessionInfo> _logins;
private final FsPath _rootPath;
private final AtomicInteger openRetry = new AtomicInteger(0);

/**
* Custom entries for kXR_Qconfig requests.
@@ -240,10 +242,10 @@ public void exceptionCaught(ChannelHandlerContext ctx, Throwable t) {
* For third-party copy where dCache is the source, the interactions are as follows:
* <p>
* 1. The client opens the file to check availability (the 'placement' stage). An OK response
* is followed by the client closing the file. 2. The client opens the file again with
* rendezvous metadata. The client will close the file only when notified by the destination
* server that the transfer has completed. 3. The destination server will open the file for the
* actual read.
* is followed by the client closing the file. 2. Full TPC: The client opens the file again
* with rendezvous metadata. The client will close the file only when notified by the
* destination server that the transfer has completed. If TPC Lite (delegation), #2 is skipped.
* 3. The destination server will open the file for the actual read.
* <p>
* The order of 2, 3 is not deterministic; hence the response here must provide for the
* possibility that the destination server attempts an open before the client specifies a
@@ -256,15 +258,14 @@ public void exceptionCaught(ChannelHandlerContext ctx, Throwable t) {
* seconds; otherwise, if the request matches and occurs within the ttl, the mover will be
* started and the destination redirected to the pool. Response to the client will carry a file
* handle but will not actually open a mover. The close from the client is handled at the door
* by removing the rendezvous information.
* by removing the rendezvous information. All of this is skipped if the third-party client has
* been delegated a credential, in which case it connects and is treated as if it were a normal
* two-party read.
* <p>
* Third-party copy where dCache is the destination should proceed with the usual upload
* transfer creation, but when the client is redirected to the pool and calls kXR_open there, a
* third-party client will be started which does read requests from the source and then writes
* the data to the mover channel.
* <p>
* NOTE: with the changed TPC Lite protocol, the client is not required to open the source
* again during the copy phase (2) if delegation is being used.
*/
@Override
protected XrootdResponse<OpenRequest> doOnOpen(ChannelHandlerContext ctx, OpenRequest req) {
@@ -476,9 +477,8 @@ protected XrootdResponse<OpenRequest> doOnOpen(ChannelHandlerContext ctx, OpenRe
* need to wait for the rendezvous destination check by comparing the open from the source.</p>
*
* <p>There is also the case where no delegated proxy exists but
* a different authentication protocol (like ZTN/scitokens) is being used. It seems that even
* with delegation in this case the initiating client does not call open. A check for authz in
* the opaque data has been added (03/21/2021).</p>
* a different authentication protocol (like ZTN/scitokens) is being used. If --tpc delegate
* only has been used, we allow rendezvous to take </p>
*/
private XrootdResponse<OpenRequest>
conditionallyHandleThirdPartyRequest(OpenRequest req,
@@ -514,101 +514,109 @@ protected XrootdResponse<OpenRequest> doOnOpen(ChannelHandlerContext ctx, OpenRe
return null; // proceed as usual with mover + redirect
}

if (opaque.containsKey(Cgi.AUTHZ.key())) {
_log.debug("{} –– request contains authorization token.", req);
return null; // proceed as usual with mover + redirect
}

enforceClientTlsIfDestinationRequiresItForTpc(opaque);

/*
* Check the session for the delegated credential to avoid hanging
* in the case that tpc cgi have been passed by the destination
* server even with TPC with delegation.
* Check the session for the delegated credential first, to avoid hanging
* in the case that tpc cgi have been passed anyway by the destination server
* to the TPC client.
*/
if (req.getSession().getDelegatedCredential() != null) {
_log.debug("{} –– third-party request with delegation.", req);
return null; // proceed as usual with mover + redirect
}

String slfn = req.getPath();
enforceClientTlsIfDestinationRequiresItForTpc(opaque);

String slfn = req.getPath();
XrootdTpcInfo info = _door.createOrGetRendezvousInfo(tpcKey);

/*
* The request originated from the TPC destination server.
* If the client has not yet opened the file here,
* The request originated from the destination TPC client.
* If the initiating client has not yet opened the file here,
* tells the destination to wait. If the verification, including
* time to live, fails, the request is cancelled. Otherwise,
* the destination is allowed to open the mover and get the
* normal redirect response.
*
* Note that the tpc info is created by either the client or the
* server, whichever gets here first. Verification of the key
* Note that the tpc info is created by either the initiating client or the
* destination client, whichever gets here first. Verification of the key
* itself is implicit (it has been found in the map); correctness is
* further satisfied by matching org, host and file name.
*/
if (opaque.containsKey("tpc.org")) {
info.addInfoFromOpaque(slfn, opaque);
if (opaque.containsKey(Cgi.AUTHZ.key())) {
/*
* Since it possesses a bearer token, this means that --tpc delegate only
* was called, and therefore that the client will not do a second
* open with the tpcKey on the source. Thus we should
* remove the key and return immediately.
*/
_door.removeTpcPlaceholder(tpcKey);
_log.debug("{} –– request contains authorization token.", req);
return null; // proceed as usual with mover + redirect
}

info.addInfoFromOpaque(slfn, opaque); /** updates the status **/
switch (info.verify(remoteHost, slfn, opaque.get("tpc.org"))) {
case READY:
_log.debug("Open request {} from destination server, info {}: "
+ "OK to proceed.",
req, info);
/*
* This means that the destination server open arrived
* second, the client server open succeeded with
* This means that the tpc client open arrived
* second, the initiating client open succeeded with
* the correct permissions; proceed as usual
* with mover + redirect.
*/
return null;
case PENDING:
_log.debug("Open request {} from destination server, info {}: "
+ "PENDING client open.",
+ "PENDING client open; sending WAIT-RETRY.",
req, info);
/*
* This means that the destination server open arrived
* first; return a wait-retry reply.
* This means that the tpc client open arrived
* first, the initiating client open has not yet taken place;
* tell the tpc client to wait and retry.
*
* Keep track of the retries and fail after 10.
*/
return new AwaitAsyncResponse<>(req, 3);
case CANCELLED:
String error = info.isExpired() ? "ttl expired" : "dst, path or org"
+ " did not match";
_log.warn("Open request {} from destination server, info {}: "
+ "CANCELLED: {}.",
req, info, error);
_door.removeTpcPlaceholder(info.getFd());
return withError(req, kXR_InvalidRequest,
"tpc rendezvous for " + tpcKey
+ ": " + error);
if (openRetry.incrementAndGet() < 10) {
return new WaitRetryResponse<>(req, 1);
}
/* fall through to ERROR condition */
case ERROR:
/*
* This means that the destination server requested open
* before the client did, and the client did not have
* read permissions on this file.
*/
error = "invalid open request (file permissions).";
String error = "invalid open request (file permissions).";
_log.warn("Open request {} from destination server, info {}: "
+ "ERROR: {}.",
req, info, error);
_door.removeTpcPlaceholder(info.getFd());
return withError(req, kXR_InvalidRequest,
"tpc rendezvous for " + tpcKey
+ ": " + error);
case CANCELLED:
error = info.isExpired() ? "ttl expired" : "dst, path or org"
+ " did not match";
_log.warn("Open request {} from destination server, info {}: "
+ "CANCELLED: {}.",
req, info, error);
_door.removeTpcPlaceholder(info.getFd());
return withError(req, kXR_InvalidRequest,
"tpc rendezvous for " + tpcKey
+ ": " + error);
}
}

/*
* The request originated from the TPC client, indicating door
* is the source.
* The request originated from the client, indicating that this door is the source.
*/
if (opaque.containsKey("tpc.dst")) {
_log.debug("Open request {} from client to door as source, "
+ "info {}: OK.", req, info);
FileStatus status = _door.getFileStatus(fsPath,
subject,
restriction,
remoteHost);
FileStatus status = _door.getFileStatus(fsPath, subject, restriction, remoteHost);
int flags = status.getFlags();

if ((flags & kXR_readable) != kXR_readable) {
@@ -621,28 +629,16 @@ protected XrootdResponse<OpenRequest> doOnOpen(ChannelHandlerContext ctx, OpenRe
"not allowed to read file.");
}

info.addInfoFromOpaque(slfn, opaque);
info.addInfoFromOpaque(slfn, opaque); /** updates the status **/
return new OpenResponse(req, info.getFd(),
null, null,
status);
}

/*
* The request originated from the TPC client, indicating door
* is the destination.
*
* First check for TLS capability if this is required.
*
* Remove the rendezvous info (not needed),
* allow mover to start and redirect the client to the pool.
*
* It is not necessary to delegate the tpc information through the
* protocol, particularly the rendezvous key, because it is part of
* the opaque data, and if any of the opaque tpc info is missing
* from redirected call to the pool, the transfer will fail.
*
* However, the calling method will need to fetch a delegated
* proxy credential and add that to the protocol.
* The request originated from the client, indicating that this door is the destination.
* There is no need for tpcInfo stored on the destination, so we remove it and
* allow the write mover to be started on the selected pool.
*/
if (opaque.containsKey("tpc.src")) {
_log.debug("Open request {} from client to door as destination: OK;"
@@ -1289,8 +1285,8 @@ private FsPath createFullPath(String path)
}

/**
* Stack of maximum depth = 2. The first object present is considered the main login info.
* The second is valid only once and then should be discarded. This is to allow for passing (or
* Stack of maximum depth = 2. The first object present is considered the main login info. The
* second is valid only once and then should be discarded. This is to allow for passing (or
* not) multiple authorization tokens on the same session/connection.
*
* @return current info.
