Thread Safe netcdf-c library #382

DennisHeimbigner · 2017-03-23T19:06:39Z

Introduction

This document proposes an architecture for implementing
thread-safe access to the netcdf-c library. Here, the term
"thread-safe" means that multiple threads can access the
netcdf-c library safely (i.e. without interference or
deadlock or race conditions).

The proposal is to implement thread-safe operation by protecting
every call to the netcdf-c API with a binary semaphore
using a lock-unlock protocol. All calls to the API are thereby
"serialized" in the sense that each API call is completed before any
other call to the API can be executed. As a result, all threads in a
multi-threaded environment can safely access the
netcdf-c library.

This approach implies some limitations on safety.

1. If two different threads attempt to access the same file,
   then interference is still possible.
2. Using thread-safe access simultaneously with MPI parallelism may
   not be safe. This is still untested.
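
The first limitation can be illustrated with a small mock. This is only a sketch under stated assumptions: the "define mode" flag and the functions mock_redef, mock_enddef and mock_def_var are hypothetical stand-ins, not the netcdf-c API, and a pthread mutex stands in for the binary semaphore. Each call is individually serialized, yet a sequence of calls made by one thread can still be disturbed by a call from another thread on the same file.

```c
#include <pthread.h>

static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_define_mode = 0;  /* mock per-file state shared by all threads */
static int nvars = 0;           /* number of variables defined so far */

/* Hypothetical call: enter define mode (individually locked). */
int mock_redef(void)
{
    pthread_mutex_lock(&file_lock);
    in_define_mode = 1;
    pthread_mutex_unlock(&file_lock);
    return 0;
}

/* Hypothetical call: leave define mode (individually locked). */
int mock_enddef(void)
{
    pthread_mutex_lock(&file_lock);
    in_define_mode = 0;
    pthread_mutex_unlock(&file_lock);
    return 0;
}

/* Hypothetical call: define a variable; fails outside define mode. */
int mock_def_var(void)
{
    int stat = 0;
    pthread_mutex_lock(&file_lock);
    if (!in_define_mode)
        stat = -1;              /* mock error code */
    else
        nvars++;
    pthread_mutex_unlock(&file_lock);
    return stat;
}
```

If thread A calls mock_redef(), thread B then calls mock_enddef(), and thread A finally calls mock_def_var(), the last call fails even though no two calls ever overlapped. Avoiding this would require coordination above the level of individual API calls.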

Architectural Considerations

At the moment, the implementation of the netcdf-c API
resides in files in the libdispatch directory. Basically,
all the code in libdispatch falls into one of the following categories.

1. Dispatch functions -- These functions directly invoke methods
   in the dispatch table and typically have this form.

        int nc_xxx(...)
        {
            NC* ncp;
            int stat = NC_check_id(ncid,&ncp);
            if(stat != NC_NOERR) return stat;
            return ncp->dispatch->XXX(...);
        }

2. Extension functions -- These functions just invoke some other
   function in the API, but possibly with some special values for the
   arguments of the called function. Here is an example.

        int nc_inq_varname(int ncid, int varid, char *name)
        {
           return nc_inq_var(ncid, varid, name, NULL, NULL, NULL, NULL);
        }

3. Complex functions -- These functions do complex computation,
   including calling a variety of internal functions.
4. Internal functions -- All other code in libdispatch is considered internal.

Functions in classes 1 (dispatch) and 3 (complex) are considered to be
part of the API core. Figure 1 shows the notional relationship between the
function classes.

[[img src=api1.png alt=API1]]

Locking Regime

The simplest approach to thread-safety is to surround all
calls to API functions with a LOCK/UNLOCK protocol.

Our proposal is to implement locking using a single, global binary
semaphore. This is extremely simple and is well supported on all
versions of *nix (using pthreads) as well as Windows (built-in).
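
As a rough illustration of the pthreads case, the lock could be sketched as follows. This is an assumption-laden sketch, not the library's implementation: a pthread mutex stands in for the binary semaphore, and NC_global_lock, counter and locked_increment are hypothetical names introduced for the example. The proposal itself only names LOCK and UNLOCK.

```c
#include <pthread.h>

/* Hypothetical single global lock shared by every core API function. */
static pthread_mutex_t NC_global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Assumed definitions of the LOCK/UNLOCK protocol. */
#define LOCK()   pthread_mutex_lock(&NC_global_lock)
#define UNLOCK() pthread_mutex_unlock(&NC_global_lock)

/* Hypothetical shared state standing in for library internals. */
static int counter = 0;

/* Stand-in for a core function body: all work happens under the lock. */
int locked_increment(void)
{
    int value;
    LOCK();
    value = ++counter;  /* only one thread at a time executes this */
    UNLOCK();
    return value;
}
```

Hiding the primitive behind two macros would also make it possible to substitute a Windows primitive without touching the API functions themselves.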

One consequence of this decision is that there must be no recursive calls
to locked functions. Any such call will cause a deadlock, because the
calling thread would wait for a semaphore that it already holds. This means
specifically that core functions and internal functions cannot invoke
core functions (directly or transitively).
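
The hazard can be demonstrated without actually hanging a program. The sketch below assumes a default (non-recursive) pthread mutex in place of the binary semaphore and uses pthread_mutex_trylock so that the nested acquisition reports failure instead of blocking forever. The functions core_inner and core_outer are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical core function: reports whether it could take the lock. */
int core_inner(void)
{
    int rc = pthread_mutex_trylock(&demo_lock);
    if (rc != 0)
        return rc;      /* EBUSY: the lock is already held */
    pthread_mutex_unlock(&demo_lock);
    return 0;
}

/* Hypothetical core function that wrongly calls another core function
   while holding the lock. */
int core_outer(void)
{
    int rc;
    pthread_mutex_lock(&demo_lock);
    rc = core_inner();  /* with a blocking lock, this would never return */
    pthread_mutex_unlock(&demo_lock);
    return rc;
}
```

Called on its own, core_inner succeeds. Called from core_outer, it finds the lock busy, which is exactly the situation in which a blocking LOCK would deadlock.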

The following example shows locking added to a core function.

        int nc_xxx(...)
        {
            NC* ncp;
            int stat = NC_NOERR;
            LOCK();
            if((stat=NC_check_id(ncid,&ncp)) != NC_NOERR) goto done;
            stat = ncp->dispatch->XXX(...);
        done:
            UNLOCK();
            return stat;
        }

The done label provides a single exit point, which ensures that UNLOCK
is invoked before the function returns.

Note that we do not need to add locking to our class 2 (extension)
functions since they just invoke a core function (class 1 or 3)
that does the actual locking. Because of this, it pays to
convert as many API calls as possible into extension functions.
Currently, there are a number of class 1 and class 3 functions that could
be converted with little effort by revising the set of core functions.
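
A small mock may make the division of labor concrete. It is a sketch only: mock_inq_var and mock_inq_varname are hypothetical functions loosely shaped like the nc_inq_var / nc_inq_varname pair shown earlier, a pthread mutex stands in for the semaphore, and lock_acquisitions exists purely to instrument the example.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

static pthread_mutex_t api_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_acquisitions = 0;  /* instrumentation for this example only */

/* Hypothetical core function: does the locking and the real work. */
int mock_inq_var(int varid, char *name, int *ndimsp)
{
    int stat = 0;
    pthread_mutex_lock(&api_lock);
    lock_acquisitions++;
    if (varid != 0) { stat = -1; goto done; }  /* mock: only variable 0 exists */
    if (name != NULL) strcpy(name, "temperature");
    if (ndimsp != NULL) *ndimsp = 2;
done:
    pthread_mutex_unlock(&api_lock);
    return stat;
}

/* Hypothetical extension function: no locking of its own, it only
   forwards to the core function with a special argument value. */
int mock_inq_varname(int varid, char *name)
{
    return mock_inq_var(varid, name, NULL);
}
```

Each call to mock_inq_varname acquires the lock exactly once, inside the core function, so the extension function can never be the source of a recursive acquisition.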

Note also that we assume that all internal functions will be invoked
either by other internal functions or by core API functions that use
a locking protocol. Hence these internal functions do not need to
use a locking protocol.

Problem 1: Mostly Extension Functions

It turns out that there are a few functions that are mostly
extension functions except that they invoke some internal functions
to get information not available through the standard netcdf-c API.
One example is the NCDEFAULT_get_vars function.
It invokes two internal functions:

NC_is_recvar
NC_getshape

The solution is to "expose" these internal functions in the core API
by providing wrappers for them that use the locking regime. Using this
approach, it should be possible to increase the number of extension functions
that do not need to directly use locking.
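
One way this could look is sketched below. Everything here is hypothetical: internal_getshape only mimics the idea of a shape query (the real NC_getshape may have a different signature), exposed_getshape is an invented wrapper name, and a pthread mutex again stands in for the semaphore.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t core_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical internal function: takes no lock itself. */
static int internal_getshape(int varid, size_t *shape, size_t ndims)
{
    size_t i;
    if (varid != 0) return -1;  /* mock: only variable 0 exists */
    for (i = 0; i < ndims; i++) shape[i] = 10 * (i + 1);
    return 0;
}

/* Hypothetical core wrapper exposing the internal function under the lock. */
int exposed_getshape(int varid, size_t *shape, size_t ndims)
{
    int stat;
    pthread_mutex_lock(&core_lock);
    stat = internal_getshape(varid, shape, ndims);
    pthread_mutex_unlock(&core_lock);
    return stat;
}

/* A formerly "mostly extension" function: it now uses only the exposed
   wrapper and therefore needs no locking of its own. */
int ext_total_elements(int varid, size_t ndims, size_t *totalp)
{
    size_t shape[8];
    size_t i, total = 1;
    int stat;
    if (ndims > 8) return -1;
    stat = exposed_getshape(varid, shape, ndims);
    if (stat != 0) return stat;
    for (i = 0; i < ndims; i++) total *= shape[i];
    *totalp = total;
    return 0;
}
```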

Problem 2: Internal to Core Function calls

This is the big problem in implementing thread-safety.

It turns out that some internal code invokes core API functions.
This mostly occurs inside the libdap2 and libdap4 code. This is a problem
because it violates the no-recursive-call rule and will lead to deadlock.

The simplest solution to this problem is to
change the internal code so that it no longer
calls the core API at all. Instead, these calls can, in most
cases, be changed to call directly into the
dispatch layer. The cost is increased
complexity in the internal code. To some degree,
this complexity can be hidden behind macros.
In a few cases, some extra
internal functions may have to be introduced into
the libdispatch code to make this change possible
or to simplify the required changes.
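
The following sketch shows how a macro might hide the lookup-then-dispatch sequence that internal code would use in place of a core API call. It is built on a heavily reduced mock: MockNC, MockDispatch, mock_check_id, MOCK_EBADID and DISPATCH_INQ_NDIMS are all hypothetical names, loosely modeled on the nc_xxx pattern shown earlier rather than taken from the library.

```c
#include <stddef.h>

#define MOCK_EBADID (-1)  /* mock error code */

/* Hypothetical, heavily reduced dispatch table and file object. */
typedef struct MockDispatch {
    int (*inq_ndims)(int ncid, int *ndimsp);
} MockDispatch;

typedef struct MockNC {
    int ext_ncid;
    const MockDispatch *dispatch;
} MockNC;

static int impl_inq_ndims(int ncid, int *ndimsp)
{
    (void)ncid;
    *ndimsp = 3;
    return 0;
}

static const MockDispatch mock_table = { impl_inq_ndims };
static MockNC mock_file = { 42, &mock_table };

/* Stand-in for an id-to-object lookup such as NC_check_id. */
static int mock_check_id(int ncid, MockNC **ncpp)
{
    if (ncid != mock_file.ext_ncid) return MOCK_EBADID;
    *ncpp = &mock_file;
    return 0;
}

/* Macro hiding the lookup and the dispatch; it never touches the lock. */
#define DISPATCH_INQ_NDIMS(ncid, ndimsp, stat) do { \
    MockNC *ncp_ = NULL; \
    (stat) = mock_check_id((ncid), &ncp_); \
    if ((stat) == 0) \
        (stat) = ncp_->dispatch->inq_ndims((ncid), (ndimsp)); \
} while (0)

/* Internal code, assumed to run with the lock already held by its caller. */
int internal_count_dims(int ncid, int *ndimsp)
{
    int stat;
    DISPATCH_INQ_NDIMS(ncid, ndimsp, stat);
    return stat;
}
```

Because the macro goes straight to the dispatch table, the internal function never re-enters a locked core function.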

Steps to Implementing Proposed Architecture

The key to implementing the proposed architecture
is to gradually refactor the code in libdispatch
so that the extension functions, the core API, and
the internal code are properly segregated from one another.

I propose the following sequence of actions.

1. Create two new files: libdispatch/dextend.c
   and libdispatch/dapi.c.
2. Move extension functions into dextend.c and the
   core API functions into dapi.c.
3. Add extra functions to dapi.c that expose internal functions
   like NC_getshape (see above).
4. Move, where possible, code from dapi.c to dextend.c
   using the functions exposed in step 3.
5. Identify the recursive calls in internal code. This can
   be accomplished by temporarily renaming the functions in
   dapi.c and dextend.c and then recompiling. That should flush
   out all such recursive calls.


WardF · 2017-03-23T19:31:29Z

Added this to the 'Thread Safety' GitHub project. Projects are a relatively new feature, and this feels like a good chance to evaluate whether or not they add anything to our workflow. I'm not suggesting that we enforce any particular convention for using them; rather, let's see if they find a natural place as a useful tool.

DennisHeimbigner · 2017-03-25T02:20:19Z

I can already see a problem with projects. There is no
obvious way to create inter-note links.

dopplershift · 2017-03-25T04:19:20Z

Just promote the notes to issues. Ryan


-- Ryan May, Ph.D. Software Engineer UCAR/Unidata Boulder, CO

DennisHeimbigner · 2017-03-25T18:19:29Z

Too bad if that is the only way because it pollutes the issue space. =Dennis


kenash0625 · 2021-07-22T07:28:06Z

Hello community:
Can I have one writer thread and one reader thread simultaneously access the same netCDF-4 file?
Could you show me some examples, please?

DennisHeimbigner added type/enhancement type/feature request labels Mar 23, 2017

DennisHeimbigner assigned DennisHeimbigner and WardF Mar 23, 2017

WardF added this to the future milestone Mar 23, 2017

WardF added this to Information in Thread Safety Mar 23, 2017

WardF modified the milestones: 4.4.3, future Mar 24, 2017

WardF modified the milestones: 4.6.0, future Jan 25, 2018
