Thread Safe netcdf-c library #382

Open
DennisHeimbigner opened this issue Mar 23, 2017 · 5 comments

Comments

@DennisHeimbigner
Collaborator

Introduction

This document proposes an architecture for implementing
thread-safe access to the netcdf-c library. Here, the term
"thread-safe" means that multiple threads can access the
netcdf-c library safely (i.e. without interference or
deadlock or race conditions).

The proposal is to protect every call to the netcdf-c API with a
binary semaphore using a lock-unlock protocol. API calls are thereby
"serialized": each call completes before any other call to the API
can begin. As a result, all threads in a multi-threaded environment
can safely access the netcdf-c library.

This approach implies some limitations on safety.

  1. If two different threads attempt to access the same file,
    then interference is still possible.
  2. Using thread-safe access simultaneously with MPI parallelism may
    not be safe. This is still untested.

Architectural Considerations

At the moment, the implementation of the netcdf-c API
resides in files in the libdispatch directory. Basically,
all the code in libdispatch falls into one of the following categories.

  1. Dispatch functions -- These functions directly invoke methods
    in the dispatch table and typically have this form.
        int nc_xxx(...)
        {
            NC* ncp;
            int stat = NC_check_id(ncid,&ncp);
            if(stat != NC_NOERR) return stat;
            return ncp->dispatch->XXX(...);
        }
  2. Extension functions -- These functions just invoke some other
    function in the API, possibly with special values for some of the
    arguments of the called function. Here is an example.
        int nc_inq_varname(int ncid, int varid, char *name)
        {
           return nc_inq_var(ncid, varid, name, NULL, NULL, NULL, NULL);
        }
  3. Complex functions -- These functions do complex computation
    including calling a variety of internal functions.
  4. Internal functions -- All other code in libdispatch is considered internal.

Functions in classes 1 and 3 are considered to be part of the API
core. See Figure 1 to see the notional relationship between the
function classes.

[Figure 1: api1.png (not shown)]

Locking Regime

The simplest approach to thread-safety is to surround all
calls to API functions with a LOCK/UNLOCK protocol.

Our proposal is to implement locking using a single, global binary
semaphore. This is extremely simple and is well-supported under all
versions of *nix (using pthreads) as well as Windows (built-in).
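
As a rough illustration of the single global lock, here is a minimal sketch for the pthreads case. It uses a pthread mutex as the binary semaphore, which is a substitution on my part, and the names nc_global_lock, fake_api_call, and run_lock_demo are hypothetical. Only LOCK and UNLOCK come from the proposal.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical global lock; the name is an assumption, not netcdf-c code. */
static pthread_mutex_t nc_global_lock = PTHREAD_MUTEX_INITIALIZER;

/* LOCK/UNLOCK mapped onto a pthread mutex (standing in for the semaphore). */
#define LOCK() \
    do { int rc_ = pthread_mutex_lock(&nc_global_lock); \
         assert(rc_ == 0); (void)rc_; } while (0)
#define UNLOCK() \
    do { int rc_ = pthread_mutex_unlock(&nc_global_lock); \
         assert(rc_ == 0); (void)rc_; } while (0)

#define NTHREADS 4
#define NCALLS   10000

/* Stand-in for library state that every API call may touch. */
static long shared_counter = 0;

/* Stand-in for a locked API call. */
static void fake_api_call(void)
{
    LOCK();
    shared_counter++;
    UNLOCK();
}

/* Each worker thread issues NCALLS serialized "API calls". */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NCALLS; i++)
        fake_api_call();
    return NULL;
}

/* Start NTHREADS workers, wait for them, and return the final count. */
long run_lock_demo(void)
{
    pthread_t tids[NTHREADS];
    shared_counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    return shared_counter;
}
```

Because every increment happens between LOCK and UNLOCK, run_lock_demo() should return NTHREADS * NCALLS. A Windows build would presumably map the same two macros onto a native primitive such as a critical section.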

One consequence of this decision is that there must be no recursive calls
to locked functions. Any such call will deadlock, because the thread would
wait on a semaphore that it already holds. Specifically, neither core
functions nor internal functions may invoke core functions (directly or
transitively).
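
To make the recursion hazard concrete, here is a small sketch that assumes a pthreads implementation. It uses an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the second acquisition reports EDEADLK instead of hanging, which is what an ordinary lock would do. The functions core_inner, core_outer, and run_recursion_demo are hypothetical stand-ins, not netcdf-c functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_lock;

/* Create demo_lock as an error-checking mutex: relocking by the owner
 * returns EDEADLK rather than blocking forever. */
static void demo_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&demo_lock, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical core function: acquires the lock, works, releases it. */
static int core_inner(void)
{
    int rc = pthread_mutex_lock(&demo_lock);
    if (rc != 0)
        return rc;              /* EDEADLK if the caller already holds it */
    pthread_mutex_unlock(&demo_lock);
    return 0;
}

/* Hypothetical core function that (wrongly) calls another core function
 * while still holding the lock. */
static int core_outer(void)
{
    int rc = pthread_mutex_lock(&demo_lock);
    if (rc != 0)
        return rc;
    rc = core_inner();          /* recursive acquisition attempt */
    pthread_mutex_unlock(&demo_lock);
    return rc;
}

/* Returns the status of the nested call. */
int run_recursion_demo(void)
{
    demo_lock_init();
    int rc = core_outer();
    pthread_mutex_destroy(&demo_lock);
    return rc;
}
```

With the error-checking mutex, run_recursion_demo() returns EDEADLK. With a default mutex or a plain binary semaphore, the nested lock attempt would be expected to block indefinitely.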

The following example shows locking added to a core function.

        int nc_xxx(...)
        {
            NC* ncp;
            int stat = NC_NOERR;
            LOCK();
            if((stat=NC_check_id(ncid,&ncp)) != NC_NOERR) goto done;
            stat = ncp->dispatch->XXX(...);
        done:
            UNLOCK();
            return stat;
        }

The done label is used to provide a single exit to ensure that UNLOCK
is invoked before exiting the function.

Note that we do not need to add locking to our class 2 (Extension)
functions since they just invoke a core function (class 1 or 3)
that does the actual locking. Because of this, it will pay to
convert as many API calls as possible into extension functions.
Currently, there are a number of class 1/3 functions that could
be converted with small effort by revising the set of core functions.

Note also that we assume that all internal functions will be invoked
either by other internal functions or by core API functions that use
a locking protocol. Hence these internal functions do not need to
use a locking protocol.

Problem 1: Mostly Extension Functions

It turns out that there are a few functions that are mostly
extension functions except that they invoke some internal functions
to get information not available through the standard netcdf-c API.
One example is the NCDEFAULT_get_vars function.
It invokes two internal functions:

  • NC_is_recvar
  • NC_getshape

The solution is to "expose" these internal functions in the core API
by providing wrappers for them that use the locking regime. Using this
approach, it should be possible to increase the number of extension functions
that do not need to directly use locking.
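
A minimal sketch of this wrapper idea follows. Everything in it is hypothetical: internal_getshape is a toy stand-in for an internal function such as NC_getshape (its signature and fixed shape are assumptions), demo_getshape is the locked wrapper, and demo_var_nelems is an extension-style function that needs no locking of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical global lock and LOCK/UNLOCK mapping (assumptions). */
static pthread_mutex_t nc_global_lock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK()   pthread_mutex_lock(&nc_global_lock)
#define UNLOCK() pthread_mutex_unlock(&nc_global_lock)

/* Toy internal function: does no locking and relies on its caller.
 * It knows a single 2-D variable (varid 0) of shape 3 x 4. */
static int internal_getshape(int varid, size_t *shape, int ndims)
{
    static const size_t toy_shape[2] = {3, 4};
    assert(shape != NULL);
    if (varid != 0 || ndims != 2)
        return -1;
    for (int i = 0; i < ndims; i++)
        shape[i] = toy_shape[i];
    return 0;
}

/* Hypothetical core API wrapper: exposes the internal function under
 * the locking regime, with a single exit so UNLOCK always runs. */
int demo_getshape(int varid, size_t *shape, int ndims)
{
    int stat;
    LOCK();
    stat = internal_getshape(varid, shape, ndims);
    UNLOCK();
    return stat;
}

/* Hypothetical extension function: calls only core functions, so it
 * never touches the lock directly. */
int demo_var_nelems(int varid, size_t *nelemsp)
{
    size_t shape[2];
    int stat = demo_getshape(varid, shape, 2);
    if (stat != 0)
        return stat;
    *nelemsp = shape[0] * shape[1];
    return 0;
}
```

For the toy variable, demo_var_nelems(0, &n) sets n to 12. Any other varid returns -1 from the wrapper, and the lock has already been released by then.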

Problem 2: Internal to Core Function calls

This is the big problem in implementing thread-safety.

It turns out that some internal code invokes core API functions.
This mostly occurs inside the libdap2 and libdap4 code. This is a problem
because it violates the no recursive call rule and will lead to deadlock.

The simplest solution to this problem is to change
the internal code so that it no longer calls the
core API. In most cases, those calls can instead
go directly to the dispatch layer. The cost is
increased complexity in the internal code. To some
degree, macros can hide this complexity. In a few
cases, some extra internal functions may have to be
introduced into the libdispatch code to make this
change possible or to simplify the required changes.
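
The following sketch shows what such a change might look like on a toy dispatch table. None of these types or names exist in netcdf-c. Demo_Dispatch, Demo_NC, DISPATCH_INQ_NDIMS, and the demo_* functions are all made up for illustration. The point is that the internal helper reaches the dispatch method through a macro and never re-enters a locked core function.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t nc_global_lock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK()   pthread_mutex_lock(&nc_global_lock)
#define UNLOCK() pthread_mutex_unlock(&nc_global_lock)

/* Toy dispatch table with a single method (hypothetical). */
typedef struct Demo_Dispatch {
    int (*inq_ndims)(int ncid, int *ndimsp);
} Demo_Dispatch;

typedef struct Demo_NC {
    int ncid;
    const Demo_Dispatch *dispatch;
} Demo_NC;

static int toy_inq_ndims(int ncid, int *ndimsp)
{
    (void)ncid;
    *ndimsp = 3;
    return 0;
}

static const Demo_Dispatch toy_dispatch = { toy_inq_ndims };
static Demo_NC toy_file = { 1, &toy_dispatch };

/* Internal lookup, no locking: maps an id to its file object. */
static int demo_check_id(int ncid, Demo_NC **ncpp)
{
    if (ncid != toy_file.ncid)
        return -1;
    *ncpp = &toy_file;
    return 0;
}

/* Macro that hides the direct call into the dispatch layer. */
#define DISPATCH_INQ_NDIMS(ncp, ndimsp) \
    ((ncp)->dispatch->inq_ndims((ncp)->ncid, (ndimsp)))

/* Internal helper: runs with the lock already held by its core caller,
 * so it uses the dispatch layer instead of a locked core function. */
static int internal_count_dims(Demo_NC *ncp, int *ndimsp)
{
    assert(ncp != NULL && ndimsp != NULL);
    return DISPATCH_INQ_NDIMS(ncp, ndimsp);
}

/* Core function: takes the lock once, then uses only internal code. */
int demo_complex_op(int ncid, int *resultp)
{
    Demo_NC *ncp;
    int stat;
    int ndims = 0;
    LOCK();
    if ((stat = demo_check_id(ncid, &ncp)) != 0) goto done;
    if ((stat = internal_count_dims(ncp, &ndims)) != 0) goto done;
    *resultp = ndims * 2;
done:
    UNLOCK();
    return stat;
}
```

Here demo_complex_op(1, &r) sets r to 6 and acquires the lock exactly once. Had internal_count_dims called a locked core function instead, the call would have tried to take the lock a second time.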

Steps to Implementing Proposed Architecture

The key to implementing the proposed architecture
is to slowly refactor the code in libdispatch
to properly segregate the extension functions from
the core API from the internal code.

I propose the following sequence of actions.

  1. Create two new files: libdispatch/dextend.c
    and libdispatch/dapi.c.
  2. Move extension functions into dextend.c and the
    core api functions into dapi.c.
  3. Add extra functions to dapi.c that expose internal functions
    like NC_getshape (see above).
  4. Move, where possible, code from dapi.c to dextend.c
    using the functions exposed in step 3.
  5. Identify the recursive calls in internal code. This can
    be accomplished by temporarily renaming the functions in
    dapi.c and dextend.c and then recompiling. That should flush
    out all such recursive calls.
@WardF
Member

WardF commented Mar 23, 2017

Added this to the 'Thread Safety' GitHub project. Projects are a relatively new feature, and this feels like a good chance to evaluate whether they add anything to our workflow. I'm not suggesting that we enforce any particular convention for using it; rather, let's see if it finds a natural place as a useful tool.

@WardF WardF added this to Information in Thread Safety Mar 23, 2017
@WardF WardF modified the milestones: 4.4.3, future Mar 24, 2017
@DennisHeimbigner
Collaborator Author

I can already see a problem with projects. There is no
obvious way to create inter-note links.

@dopplershift
Member

dopplershift commented Mar 25, 2017 via email

@DennisHeimbigner
Collaborator Author

DennisHeimbigner commented Mar 25, 2017 via email

@WardF WardF modified the milestones: 4.6.0, future Jan 25, 2018
@kenash0625

kenash0625 commented Jul 22, 2021

Hello community:
Can I have one writer thread and one reader thread access the same netCDF-4 file simultaneously?
Could you show me some examples, please?
