Skip to content

[C++][FlightRPC] FlightClient.authenticate is not thread safe #41919

@jcferretti

Description

@jcferretti

Describe the bug, including details regarding any error messages, version, and platform.

This is the cython code (pyx) implementation of flight_client.authenticate:

def authenticate(self, auth_handler, options: FlightCallOptions = None):

   1440     def authenticate(self, auth_handler, options: FlightCallOptions = None):
   1441         """Authenticate to the server.
   1442
   1443         Parameters
   1444         ----------
   1445         auth_handler : ClientAuthHandler
   1446             The authentication mechanism to use.
   1447         options : FlightCallOptions
   1448             Options for this call.
   1449         """
   1450         cdef:
   1451             unique_ptr[CClientAuthHandler] handler
   1452             CFlightCallOptions* c_options = FlightCallOptions.unwrap(options)
   1453
   1454         if not isinstance(auth_handler, ClientAuthHandler):
   1455             raise TypeError(
   1456                 "FlightClient.authenticate takes a ClientAuthHandler, "
   1457                 "not '{}'".format(type(auth_handler)))
   1458         handler.reset((<ClientAuthHandler> auth_handler).to_handler())
   1459         with nogil:
   1460             check_flight_status(
   1461                 self.client.get().Authenticate(deref(c_options),
   1462                                                move(handler)))

Note the object being stored inside the handler object is returned by to_handler defined later on the same file:

cdef PyClientAuthHandler* to_handler(self):

   2482 cdef class ClientAuthHandler(_Weakrefable):
   2483     """Authentication plugin for a client."""
   2484 
   [...]
   2500 
   2501     cdef PyClientAuthHandler* to_handler(self):
   2502         cdef PyClientAuthHandlerVtable vtable
   2503         vtable.authenticate = _client_authenticate
   2504         vtable.get_token = _get_token
   2505         return new PyClientAuthHandler(self, vtable)

The call to Authenticate on _flight.pyx line 1461 is to grpc_client.cc line 860:

Status Authenticate(const FlightCallOptions& options,

    860   Status Authenticate(const FlightCallOptions& options,
    861                       std::unique_ptr<ClientAuthHandler> auth_handler) override {
    862     auth_handler_ = std::move(auth_handler);
    863     ClientRpc rpc(options);
    864     return AuthenticateInternal(rpc);
    865   }

Many calls in the same file use the auth_handler_ member of the struct

std::shared_ptr<ClientAuthHandler> auth_handler_;

that is being assigned on line 862 above, eg, DoPut:

RETURN_NOT_OK(rpc->SetToken(auth_handler_.get()));

    997   Status DoPut(const FlightCallOptions& options,
    998                std::unique_ptr<internal::ClientDataStream>* out) override {
    999     using GrpcStream = ::grpc::ClientReaderWriter<pb::FlightData, pb::PutResult>;
   1000
   1001     auto rpc = std::make_shared<ClientRpc>(options);
   1002     RETURN_NOT_OK(rpc->SetToken(auth_handler_.get()));
   1003     std::shared_ptr<GrpcStream> stream = stub_->DoPut(&rpc->context);
   1004     *out = std::make_unique<GrpcClientPutStream>(std::move(rpc), std::move(stream));
   1005     return Status::OK();
   1006   }

SetToken on the same file:

Status SetToken(ClientAuthHandler* auth_handler) {

     86   /// \brief Add an auth token via an auth handler
     87   Status SetToken(ClientAuthHandler* auth_handler) {
     88     if (auth_handler) {
     89       std::string token;
     90       RETURN_NOT_OK(auth_handler->GetToken(&token));
     91       context.AddMetadata(kGrpcAuthHeader, token);
     92     }
     93     return Status::OK();
     94   }

There is a race between calling auth_handler_.get() to get out a raw pointer out of the auth_handler_ shared_ptr and using it inside SetToken, and another thread changing the value of auth_handler_ via the its assignment operator call in Authenticate, which can trigger the deletion of the previously held pointer value. That deletion can happen in another thread after the call to auth_handler_.get() to get the raw pointer value and before SetToken using that raw pointer value.

My team builds a server and client libraries based on flight. One of our customers ran into an issue while using hundreds of concurrent sessions to our service. Our client library was using calls to FlightClient.authenticate triggered by a timer once every 5 minutes; that concurrently with hundreds of calls to DoPut triggered the problem. We were able to reproduce the problem by artificially increasing the frequency of calls to FlightClient.authenticate to 3 seconds; we saw the same problem with either concurrent DoPut or DoGet. We saw the problem on pyarrow 16.0.0 on Ubuntu Linux 22.04.

More details about the symptoms we saw here: deephaven/deephaven-core#5489

Component(s)

FlightRPC

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions