Introduction
This document proposes an architecture for implementing
thread-safe access to the netcdf-c library. Here, the term
"thread-safe" means that multiple threads can access the
netcdf-c library safely (i.e. without interference or
deadlock or race conditions).
Thread-safe operation is to be implemented by protecting every call to
the netCDF-c API with a binary semaphore using a lock-unlock protocol.
All calls to the API are therefore "serialized": each API call completes
before any other call to the API can begin. As a result, all threads in
a multi-threaded environment can safely access the netCDF-c library.
This approach implies some limitations on safety.
If two different threads attempt to access the same file,
then interference is still possible.
Using thread-safe access simultaneously with MPI parallelism may
not be safe. This is still untested.
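The first limitation can be illustrated with a toy model. The sketch below is not netcdf-c code: `toy_redef`, `toy_enddef`, `toy_put` and the shared `toy_define_mode` flag are hypothetical stand-ins, loosely modeled on a file that is either in define mode or in data mode. Each call is individually serialized by the global lock, yet a call made by one thread can still change shared per-file state between two calls made by another thread.

```c
#include <pthread.h>

/* Hypothetical global lock and per-file state; not part of netcdf-c. */
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static int toy_define_mode = 0;   /* 1 = define mode, 0 = data mode */
static int toy_value = 0;

#define TOY_NOERR     0
#define TOY_EINDEFINE (-1)        /* illustrative error code */

/* Switch the shared file into define mode. */
int toy_redef(void)
{
    pthread_mutex_lock(&toy_lock);
    toy_define_mode = 1;
    pthread_mutex_unlock(&toy_lock);
    return TOY_NOERR;
}

/* Switch the shared file back into data mode. */
int toy_enddef(void)
{
    pthread_mutex_lock(&toy_lock);
    toy_define_mode = 0;
    pthread_mutex_unlock(&toy_lock);
    return TOY_NOERR;
}

/* Write a value; fails if some caller left the file in define mode. */
int toy_put(int v)
{
    int stat = TOY_NOERR;
    pthread_mutex_lock(&toy_lock);
    if (toy_define_mode) stat = TOY_EINDEFINE;
    else toy_value = v;
    pthread_mutex_unlock(&toy_lock);
    return stat;
}

/* Read back the last value written. */
int toy_get(void)
{
    int v;
    pthread_mutex_lock(&toy_lock);
    v = toy_value;
    pthread_mutex_unlock(&toy_lock);
    return v;
}
```

If thread A calls `toy_redef()` while thread B is between two `toy_put()` calls, B's second write fails even though no call ever ran concurrently with another. Avoiding this kind of interference would need coordination above the level of individual API calls.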
Architectural Considerations
At the moment, the implementation of the netcdf-c API
resides in files in the libdispatch directory. Basically,
all the code in libdispatch falls into one of the following categories.
1. Dispatch functions -- These functions directly invoke methods
in the dispatch table and typically have this form.
    int nc_xxx(...)
    {
        NC* ncp;
        /* Map the ncid argument to its NC structure. */
        int stat = NC_check_id(ncid,&ncp);
        if(stat != NC_NOERR) return stat;
        /* Forward the call through the dispatch table. */
        return ncp->dispatch->XXX(...);
    }
2. Extension functions -- These functions just invoke some other
function in the API, but possibly with some special values for the
arguments of the called function. Here is an example.
    int nc_inq_varname(int ncid, int varid, char *name)
    {
        return nc_inq_var(ncid, varid, name, NULL, NULL, NULL, NULL);
    }
3. Complex functions -- These functions do complex computation
including calling a variety of internal functions.
4. Internal functions -- All other code in libdispatch is considered internal.
Functions in classes 1 and 3 are considered to be part of the API
core. Figure 1 shows the notional relationship between the
function classes.
[[img src=api1.png alt=API1]]
Locking Regime
The simplest approach to thread-safety is to surround all
calls to API functions with a LOCK/UNLOCK protocol.
Our proposal is to implement locking using a single, global binary
semaphore. This is extremely simple and is well-supported under all
versions of *nix (using pthreads) as well as Windows (built-in).
One consequence of this decision is that there must be no recursive calls
to locked functions: a thread that already holds the lock would block
forever trying to acquire it again, causing a deadlock. This means
specifically that core functions and internal functions cannot invoke
core functions (directly or transitively).
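The no-recursion rule can be demonstrated without actually hanging. The following sketch assumes a pthreads platform and uses hypothetical names (`rec_lock`, `rec_core_inner`, `rec_core_outer`); it is not the proposed netcdf-c implementation. It creates the global lock as an error-checking mutex, so a second lock attempt by the owning thread returns `EDEADLK` rather than blocking as a plain binary semaphore would.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical single global lock. */
static pthread_mutex_t rec_lock;

/* Create the lock as an error-checking mutex so that relocking by the
   owning thread is reported (EDEADLK) instead of hanging. */
int rec_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(&rec_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* A locked "core" function that is safe when called on its own. */
int rec_core_inner(void)
{
    int rc = pthread_mutex_lock(&rec_lock);
    if (rc != 0) return rc;      /* EDEADLK if this thread holds the lock */
    pthread_mutex_unlock(&rec_lock);
    return 0;
}

/* A locked "core" function that breaks the rule by calling another
   locked function while still holding the lock. */
int rec_core_outer(void)
{
    int rc = pthread_mutex_lock(&rec_lock);
    if (rc != 0) return rc;
    rc = rec_core_inner();       /* recursive acquisition attempt */
    pthread_mutex_unlock(&rec_lock);
    return rc;
}
```

With a non-error-checking lock, the call inside `rec_core_outer` would never return. An error-checking mutex of this kind might be useful in debug builds to locate such calls.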
The following example shows locking added to a core function.
    int nc_xxx(...)
    {
        NC* ncp;
        int stat = NC_NOERR;
        LOCK();
        if((stat=NC_check_id(ncid,&ncp)) != NC_NOERR) goto done;
        stat = ncp->dispatch->XXX(...);
    done:
        UNLOCK();
        return stat;
    }
The done label is used to provide a single exit to ensure that UNLOCK
is invoked before exiting the function.
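The single-exit pattern can be checked with a small self-contained sketch. Everything here is hypothetical (`EXIT_LOCK`, `EXIT_UNLOCK`, `exit_check_id`, `exit_core` and the error code are illustrative, and a pthread mutex stands in for the binary semaphore). The point is only that the lock is released on the error path as well as on the success path.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the LOCK/UNLOCK macros. */
static pthread_mutex_t exit_lock = PTHREAD_MUTEX_INITIALIZER;
#define EXIT_LOCK()   pthread_mutex_lock(&exit_lock)
#define EXIT_UNLOCK() pthread_mutex_unlock(&exit_lock)

#define EXIT_NOERR  0
#define EXIT_EBADID (-1)   /* illustrative error code */

/* Toy id check: only id 1 is considered open. */
static int exit_check_id(int id)
{
    return (id == 1) ? EXIT_NOERR : EXIT_EBADID;
}

/* Core function with a single exit: every path passes through done. */
int exit_core(int id, int *out)
{
    int stat = EXIT_NOERR;
    EXIT_LOCK();
    if ((stat = exit_check_id(id)) != EXIT_NOERR) goto done;
    *out = 42;
done:
    EXIT_UNLOCK();
    return stat;
}

/* Test helper: returns 1 if the lock is currently free. */
int exit_lock_is_free(void)
{
    if (pthread_mutex_trylock(&exit_lock) != 0) return 0;
    pthread_mutex_unlock(&exit_lock);
    return 1;
}
```

An early `return stat;` in place of `goto done;` would leave the lock held after a bad id, and every later API call from any thread would then block.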
Note that we do not need to add locking to our class 2 (Extension)
functions since they just invoke a core function (class 1 or 3)
that does the actual locking. Because of this, it will pay to try
to convert as many API calls as possible to be extension functions.
Currently, there are a number of class 1/3 functions that could
be converted with small effort by revising the set of core functions.
Note also that we assume that all internal functions will be invoked
either by other internal functions or by core API functions that use
a locking protocol. Hence these internal functions do not need to
use a locking protocol.
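As a rough illustration of why extension functions need no locking of their own, the sketch below pairs a locked toy core function with an unlocked extension function that delegates to it. All names (`ext_inq_var`, `ext_inq_varname`, `ext_lock_count`) and the returned values are hypothetical, and the counter exists only to make the number of lock acquisitions observable.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

static pthread_mutex_t ext_lock = PTHREAD_MUTEX_INITIALIZER;
static int ext_locks_taken = 0;   /* counts lock acquisitions */

/* Toy core function: takes the global lock and does the real work.
   Only variable id 0 exists in this toy model. */
int ext_inq_var(int varid, char *name, int *ndimsp)
{
    int stat = 0;
    pthread_mutex_lock(&ext_lock);
    ext_locks_taken++;
    if (varid != 0) { stat = -1; goto done; }
    if (name != NULL) strcpy(name, "temperature");
    if (ndimsp != NULL) *ndimsp = 2;
done:
    pthread_mutex_unlock(&ext_lock);
    return stat;
}

/* Toy extension function: no locking here, it only forwards to the
   core function with some arguments fixed to NULL. */
int ext_inq_varname(int varid, char *name)
{
    return ext_inq_var(varid, name, NULL);
}

/* Test helper: how many times the lock has been acquired so far. */
int ext_lock_count(void)
{
    return ext_locks_taken;
}
```

Each call to the extension function acquires the lock exactly once, inside the core function it forwards to.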
Problem 1: Mostly Extension Functions
It turns out that there are a few functions that are mostly
extension functions except that they invoke some internal functions
to get information not available through the standard netcdf-c API.
One example is the NCDEFAULT_get_vars function.
It invokes two internal functions:
NC_is_recvar
NC_getshape
The solution is to "expose" these internal functions in the core API
by providing wrappers for them that use the locking regime. Using this
approach, it should be possible to increase the number of extension functions
that do not need to directly use locking.
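A minimal sketch of the "expose" approach follows. The helpers are toy stand-ins and do not reproduce the real signatures or behavior of NC_is_recvar or NC_getshape; `exp_is_recvar`, `exp_getshape` and `exp_default_count` are hypothetical names. The internal helpers stay lock-free, the wrappers add the locking, and the formerly mostly-extension function then calls only the wrappers.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t exp_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy internal helpers: no locking of their own. */
static int exp_is_recvar_internal(int varid) { return varid == 0; }
static size_t exp_getshape_internal(int varid) { return varid == 0 ? 10 : 4; }

/* Core-API wrapper exposing the first helper under the locking regime. */
int exp_is_recvar(int varid, int *isrecp)
{
    pthread_mutex_lock(&exp_lock);
    *isrecp = exp_is_recvar_internal(varid);
    pthread_mutex_unlock(&exp_lock);
    return 0;
}

/* Core-API wrapper exposing the second helper under the locking regime. */
int exp_getshape(int varid, size_t *lenp)
{
    pthread_mutex_lock(&exp_lock);
    *lenp = exp_getshape_internal(varid);
    pthread_mutex_unlock(&exp_lock);
    return 0;
}

/* Formerly "mostly extension": it now calls only locked wrappers, so it
   needs no locking itself. The rule it computes is arbitrary: one
   element for a record variable, the full length otherwise. */
int exp_default_count(int varid, size_t *countp)
{
    int isrec = 0;
    size_t len = 0;
    int stat;
    if ((stat = exp_is_recvar(varid, &isrec)) != 0) return stat;
    if ((stat = exp_getshape(varid, &len)) != 0) return stat;
    *countp = isrec ? 1 : len;
    return 0;
}
```

One consequence of this design is that the two wrapper calls are serialized individually, not as a pair, so another thread may run between them.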
Problem 2: Internal to Core Function calls
This is the big problem in implementing thread-safety.
It turns out that some internal code invokes core API functions.
This mostly occurs inside the libdap2 and libdap4 code. This is a problem
because it violates the no-recursive-call rule and will lead to deadlock.
The simplest solution to this problem is to change every call from
internal code into the core API so that it no longer goes through the
core API. In most cases, such a call can instead go directly into the
dispatch layer. The cost is increased complexity in the internal code.
To some degree, this complexity can be hidden behind macros. In a few
cases, some extra internal functions may have to be introduced into
the libdispatch code to make this change possible or to simplify the
required changes.
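The sketch below shows one way the dispatch-layer approach might look. The types and names (`dsp_file`, `dsp_dispatch`, `DSP_INQ_NVARS`, and so on) are hypothetical and much simpler than the real NC structure and dispatch table. An error-checking mutex is used so that the deadlock caused by the recursive version is reported as `EDEADLK` instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical, heavily simplified file object and dispatch table. */
typedef struct dsp_file dsp_file;
typedef struct { int (*inq_nvars)(dsp_file *f, int *nvarsp); } dsp_dispatch;
struct dsp_file { const dsp_dispatch *dispatch; int nvars; };

static pthread_mutex_t dsp_lock;

/* Create the global lock as an error-checking mutex. */
int dsp_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(&dsp_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Dispatch-table implementation: does the work, takes no lock. */
static int dsp_impl_inq_nvars(dsp_file *f, int *nvarsp)
{
    *nvarsp = f->nvars;
    return 0;
}
static const dsp_dispatch dsp_table = { dsp_impl_inq_nvars };

void dsp_file_init(dsp_file *f, int nvars)
{
    f->dispatch = &dsp_table;
    f->nvars = nvars;
}

/* Core API function: takes the lock, then dispatches. */
int dsp_inq_nvars(dsp_file *f, int *nvarsp)
{
    int stat = pthread_mutex_lock(&dsp_lock);
    if (stat != 0) return stat;
    stat = f->dispatch->inq_nvars(f, nvarsp);
    pthread_mutex_unlock(&dsp_lock);
    return stat;
}

/* Macro that hides the direct call into the dispatch layer. */
#define DSP_INQ_NVARS(f, nvarsp) ((f)->dispatch->inq_nvars((f), (nvarsp)))

/* Internal code, assumed to run with the lock already held. */
static int dsp_internal_bad(dsp_file *f, int *n)  { return dsp_inq_nvars(f, n); }
static int dsp_internal_good(dsp_file *f, int *n) { return DSP_INQ_NVARS(f, n); }

/* Core function that calls internal code while holding the lock. */
int dsp_core_op(dsp_file *f, int use_dispatch, int *n)
{
    int stat = pthread_mutex_lock(&dsp_lock);
    if (stat != 0) return stat;
    stat = use_dispatch ? dsp_internal_good(f, n) : dsp_internal_bad(f, n);
    pthread_mutex_unlock(&dsp_lock);
    return stat;
}
```

Calling `dsp_core_op` with `use_dispatch` set to 0 takes the recursive path and fails with `EDEADLK`; with `use_dispatch` set to 1 the internal code bypasses the locked API and succeeds.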
Steps to Implementing Proposed Architecture
The key to implementing the proposed architecture
is to gradually refactor the code in libdispatch
so that the extension functions, the core API, and
the internal code are properly segregated.
I propose the following sequence of actions.
1. Create two new files: libdispatch/dextend.c
and libdispatch/dapi.c.
2. Move extension functions into dextend.c and the
core API functions into dapi.c.
3. Add extra functions to dapi.c that expose internal functions
like NC_getshape (see above).
4. Rewrite the mostly-extension functions (see Problem 1)
using the exposed functions in #3.
5. Identify the recursive calls in internal code. This can
be accomplished by temporarily renaming the functions in
dapi.c and dextend.c and then recompiling. That should flush
out all such recursive calls.
Added this to the 'Thread Safety' GitHub project. Projects are a relatively new feature, and this feels like a good chance to evaluate whether they add anything to our workflow. I'm not suggesting that we enforce any particular convention for using it; rather, let's see if it finds a natural place as a useful tool.