Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-threading issue in WCSLIB even if input data is copied #16245

Open
astrofrog opened this issue Mar 27, 2024 · 38 comments · May be fixed by #16279 or #16411
Open

Multi-threading issue in WCSLIB even if input data is copied #16245

astrofrog opened this issue Mar 27, 2024 · 38 comments · May be fixed by #16279 or #16411
Labels
Bug external PRs and issues related to external packages vendored with Astropy (astropy.extern) multi-threading Upstream Fix Required wcs

Comments

@astrofrog
Copy link
Member

astrofrog commented Mar 27, 2024

In #16244 I mentioned a multi-threading issue that is due to the input array being passed through to the WCS C extension and being subsequently modified in-place. There is however another issue that occurs even if the input data is copied, and is illustrated by the following example:

import numpy as np
from astropy.wcs import WCS
from multiprocessing.pool import ThreadPool

wcs = WCS(naxis=2)
wcs.wcs.crpix = [-234.75, 8.3393]
wcs.wcs.cdelt = np.array([-0.066667, 0.066667])
wcs.wcs.crval = [0, -90]
wcs.wcs.ctype = ["RA---AIR", "DEC--AIR"]
wcs.wcs.set()

N = 1_000_000

pixel = np.random.randint(-1000, 1000, N * 2).reshape((N, 2)).astype(float)


def round_trip_transform(pixel):
    world = wcs.wcs.p2s(pixel.copy(), 1)['world']
    wcs.wcs.lng  # this access causes issues, without it all works
    pixel = wcs.wcs.s2p(world, 1)['pixcrd']
    return pixel


pool = ThreadPool(8)
results = pool.map(round_trip_transform, (pixel,) * 8)

for pixel2 in results:
    print(
        f"Mismatching elements: {np.sum(~np.isclose(pixel, pixel2))}"
    )

In this case I am using p2s and s2p but the issue can be seen also with wcs_pix2world and wcs_world2pix – the reason for using p2s and s2p directly is just to try and simplify the problem as much as possible to use just calls to the WCS C extension. The above example gives:

Mismatching elements: 92
Mismatching elements: 102
Mismatching elements: 116
Mismatching elements: 70
Mismatching elements: 106
Mismatching elements: 48
Mismatching elements: 88
Mismatching elements: 96

One of the weird things is that the line just accessing:

wcs.wcs.lng

appears to be required to trigger this issue. Calling wcs.wcs.lat also causes the issue, but not for example wcs.wcs.equinox which then works fine. Removing this line solves the issue.

Changing the function so that it does a copy of the WCS object:

def round_trip_transform(pixel):
    wcs2 = wcs.deepcopy()
    world = wcs2.wcs.p2s(pixel.copy(), 1)['world']
    wcs2.wcs.lng
    pixel = wcs2.wcs.s2p(world, 1)['pixcrd']
    return pixel

also fixes the issue.

@manodeep and I have been discussing this and he is investigating (but if anyone else has ideas feel free to chime in!)

@manodeep
Copy link
Contributor

manodeep commented Apr 1, 2024

I made some progress on this issue. The race condition could be created by an access (and only an access) to any of the four fields - lng, lat, lngtyp, lattyp. It turns out that all four fields have getters in Python that invoke the wcsset C function (via PyWcsprm_cset). And that extra call to wcsset is causing the race condition. I have confirmed that I can recreate a similar race condition effect by replacing the wcs.wcs.lng access with wcs.wcs.set()

I will report back as I get further insight on what exactly within wcsset is causing the race condition to occur (and why using a deepcopy avoids that race condition).

@manodeep
Copy link
Contributor

manodeep commented Apr 2, 2024

wcsset calls wcs_types - which, sets wcs->types[0:1] := 0 on the first call. However, on the second third and future calls to the wcs_types function, wcs->types[0:1] gets set to [2200, 2201]. This change triggers the race condition. When I reset wcs->types := [0, 0] at the end of the wcs_types function, the race condition disappears.

On a deepcopy, wcs->types shows the same behaviour - i.e., wcs->types is set to [0,0] initially and then [2200, 2201] on later calls. So something else is also happening on the deepcopy that prevents the race condition.

@astrofrog
Copy link
Member Author

What I don't quite understand is that wcs.wcs.set() is called before the multi threading so I don't understand why types is set to 0,0 at any point - shouldn't it be 2200,2201 and remain that?

@manodeep
Copy link
Contributor

manodeep commented Apr 3, 2024

I don't understand it either but presumably wcs->x.flag is altering the behaviour. For example, with the python code you posted + print statements:

print("getting wcs...",file=sys.stderr)
wcs = WCS(naxis=2)
print("getting wcs...done",file=sys.stderr)
wcs.wcs.crpix = [-234.75, 8.3393]
wcs.wcs.cdelt = np.array([-0.066667, 0.066667])
wcs.wcs.crval = [0, -90]
wcs.wcs.ctype = ["RA---AIR", "DEC--AIR"]
print("setting wcs...", file=sys.stderr)
wcs.wcs.set()
print("setting wcs...done",file=sys.stderr)

And, at the end of wcs_types() within wcs.c

  for(int i=0;i<naxis;i++) {
      fprintf(stderr,"Re-assigned: wcs->types[%d] = %d\n", i, wcs->types[i]);
  }

Plus, markers like this at the start and end of wcsset, wcs_types in wcs.c:

fprintf(stderr,"Entered %s in line %d\n",__FUNCTION__, __LINE__);
fprintf(stderr,"Exited %s in line %d\n",__FUNCTION__, __LINE__);

and recompiling and running the test code, I get the following:

getting wcs...
Entered wcsset in line 2496
Entered wcs_types in line 2896
Re-assigned: wcs->types[0] = 0
Re-assigned: wcs->types[1] = 0
Exited wcs_types in line 3178
Exited wcsset in line 2885
Entered wcsset in line 2496
Entered wcs_types in line 2896
Re-assigned: wcs->types[0] = 0
Re-assigned: wcs->types[1] = 0
Exited wcs_types in line 3178
Exited wcsset in line 2885
getting wcs...done
setting wcs...
Entered wcsset in line 2496
Entered wcs_types in line 2896
Re-assigned: wcs->types[0] = 2200
Re-assigned: wcs->types[1] = 2201
Exited wcs_types in line 3178
Exited wcsset in line 2885
setting wcs...done

So the third call (not the second) to wcs_types is what is setting the wcs_types array to [2200, 2201].

edit It's not wcs->flag/m_flag, wcs->types gets set when wcs->ctype is set.

(Sidenote: If wcs_types is translating the various forms of wcs->ctype, then it seems like a waste of cpu-time to call wcs_types when wcs->ctype is not set)

@manodeep
Copy link
Contributor

manodeep commented Apr 3, 2024

Another thing I don't understand is why re-setting types to [0,0] fixes the race. What does the types array control? For example, does resetting types mean that the transformations like world = wcs2.wcs.p2s(pix.copy(), 1)['world'] would be wrong?

@astrofrog
Copy link
Member Author

Just for the record here is an example I have been playing around with to try and see the issue @manodeep mentioned above:

import time
import numpy as np
from astropy.wcs import WCS

def show_repr_differences(repr1, repr2):
    for line1, line2 in zip(repr1.splitlines(), repr2.splitlines()):
        if line1 != line2:
            print(line1, line2)

wcs = WCS(naxis=2)
wcs.wcs.crpix = [-234.75, 8.3393]
wcs.wcs.cdelt = [-0.066667, 0.066667]
wcs.wcs.crval = [0, -90]
wcs.wcs.ctype = ["RA---AIR", "DEC--AIR"]
wcs.wcs.set()
r1 = repr(wcs.wcs)
wcs.wcs.set()
time.sleep(2.0)
r2 = repr(wcs.wcs)
show_repr_differences(r1, r2)

This returns, most of the time (but not fully deterministically), something like:

      types: 0x60000096c280       types: 0x600000968410
     tmpcrd: 0x60000096c290      tmpcrd: 0x600000968400

If the sleep is removed, the issue happens more rarely.

@manodeep
Copy link
Contributor

manodeep commented Apr 4, 2024

Ooo - using repr to diff within python is a great idea! I was printing everything and vimdiff'ing :)

The generated diff above, however, is expected. Calling wcs.wcs.set() invokes the C wcsset function, which in turn, calls the C function wcs_types. Within wcs_types, the pointer wcs->types is first freed (if already allocated), and then re-allocated. Presumably, without an intervening sleep, the OS simply re-allocates the last freed space that fulfills the allocation request and hence the types pointer usually gets back the same memory address.

Similarly, tmpcrd is freed and re-allocated from a call to linset in wcsset

@astrofrog
Copy link
Member Author

astrofrog commented Apr 4, 2024

@manodeep - ok sounds good, so out of curiosity what if you were to tweak the wcs_types code to avoid the freeing + reallocation, and just keeping it allocated if it already is and allocating if not? (of course I'm not suggesting this as an actual fix but more as an experiment to see if that also avoids the issues we are seeing).

@astrofrog
Copy link
Member Author

astrofrog commented Apr 4, 2024

So could it be that if one thread re-allocates types to a new address then between then and when the values are actually set, types could point to nonsense data in memory and another thread could be doing a conversion that could be affected by these garbage values? I guess maybe not since calloc is used so there shouldn't be garbage data and it should get initialized straight away?

@manodeep
Copy link
Contributor

manodeep commented Apr 4, 2024

@manodeep - ok sounds good, so out of curiosity what if you were to tweak the wcs_types code to avoid the freeing + reallocation, and just keeping it allocated if it already is and allocating if not? (of course I'm not suggesting this as an actual fix but more as an experiment to see if that also avoids the issues we are seeing).

In a similar attempt, I added an immediate return within wcsset as if (wcs->flag == WCSSET) return WCSERR_SUCCESS; - adding that if condition eliminated the race condition. However, I think that will produce incorrect results if someone changes wcs.ctype etc, after having previously set those values (so that wcs->flag == WCSSET but the wcs->types will correspond to the previous set of wcs->ctype and not the new values)

@manodeep
Copy link
Contributor

manodeep commented Apr 5, 2024

So could it be that if one thread re-allocates types to a new address then between then and when the values are actually set, types could point to nonsense data in memory and another thread could be doing a conversion that could be affected by these garbage values? I guess maybe not since calloc is used so there shouldn't be garbage data and it should get initialized straight away?

So what happens is that OS marks the memory area as free - the contents are NOT touched (unless specifically cleared out by user-code, which isn't the case here). So, either the other thread accesses the old memory area (with the correct values) or it accesses the new memory area (set to 0's by calloc). However, as I have seen, resetting types to [0, 0] seems to remove the race condition - so the calloc'ed memory should also be fine.

I still don't have a proper theory about why the race condition occurs. It looks to be related to wcs->types - since I can remove the race by resetting wcs->types = 0 within the wcs_types function. One option is that p2s finishes on one thread and s2p starts up before the p2s finishes on the second thread, and then the (small amount of) time where the different threads are doing different operations causes the incorrect results. But that would require types to depend on the operation (p2s/s2p) and I don't think that's true

@manodeep
Copy link
Contributor

manodeep commented Apr 5, 2024

So could it be that if one thread re-allocates types to a new address then between then and when the values are actually set, types could point to nonsense data in memory and another thread could be doing a conversion that could be affected by these garbage values? I guess maybe not since calloc is used so there shouldn't be garbage data and it should get initialized straight away?

So what happens is that OS marks the memory area as free - the contents are NOT touched (unless specifically cleared out by user-code, which isn't the case here). So, either the other thread accesses the old memory area (with the correct values) or it accesses the new memory area (set to 0's by calloc). However, as I have seen, resetting types to [0, 0] seems to remove the race condition - so the calloc'ed memory should also be fine.

May be what's happening is that the first part of the thread's work is done with the correct wcs->types and the second part is done with the wcs->types = 0, and of course, then the round-trip transform wouldn't yield the same results. This is a working possibility. I am going to check the world array values between a deepcopied wcs and a view wcs.

@manodeep
Copy link
Contributor

manodeep commented Apr 5, 2024

Removing the free and re-alloc did not fix the race.

To check again whether types might be the issue, I added an immediate return at the top of wcs_types with:

  if(wcs->types != NULL) return WCSERR_SUCCESS;

and that fixed the race condition. To me this confirms that the race condition is caused by something within the wcs_types function. One way to preserve correct behaviour would be to keep a static char * var (or adding another field within the wcsprm struct) with the previously set value of ctype[0:naxis] and then to return immediately from wcs_types if the current ctype matches the old ctype. That would avoid an error in the case where the user has changed ctype (I am not even sure if that's a thing that an user might/is-allowed-to do), plus avoiding unnecessary calculations when the types have already been set correctly in a previous invocation.

@manodeep
Copy link
Contributor

manodeep commented Apr 5, 2024

I might have a solution that does not break existing behaviour - assuming that it is okay to add a field to the struct wcsprm.

diff --git a/cextern/wcslib/C/wcs.c b/cextern/wcslib/C/wcs.c
index 1018371ce..7d7173c5e 100644
--- a/cextern/wcslib/C/wcs.c
+++ b/cextern/wcslib/C/wcs.c
@@ -188,6 +188,7 @@ int wcsinit(
     if (wcs->flag == -1) {
       wcs->tab   = 0x0;
       wcs->types = 0x0;
+      wcs->m_types = 0x0;
       wcs->lin.flag = -1;
     }
 
@@ -1729,6 +1730,7 @@ int wcsfree(struct wcsprm *wcs)
 
     // Allocated unconditionally by wcsset().
     if (wcs->types) free(wcs->types);
+    if (wcs->m_types) free(wcs->m_types);
 
     if (wcs->lin.crpix == wcs->m_crpix) wcs->lin.crpix = 0x0;
     if (wcs->lin.pc    == wcs->m_pc)    wcs->lin.pc    = 0x0;
@@ -1762,6 +1764,7 @@ int wcsfree(struct wcsprm *wcs)
   wcs->m_wtb = 0x0;
 
   wcs->types = 0x0;
+  wcs->m_types = 0x0;
 
   wcserr_clear(&(wcs->err));
 
@@ -2888,6 +2891,7 @@ int wcs_types(struct wcsprm *wcs)
 
 {
   static const char *function = "wcs_types";
+  static int prev_naxis = 0;
 
   const int  nalias = 6;
   const char aliases [6][4] = {"NCP", "GLS", "TPU", "TPV", "TNX", "ZPX"};
@@ -2908,12 +2912,37 @@ int wcs_types(struct wcsprm *wcs)
   const char *alt = "";
   if (*(wcs->alt) != ' ') alt = wcs->alt;
 
-
   int naxis = wcs->naxis;
-  if (wcs->types) free(wcs->types);
-  if ((wcs->types = calloc(naxis, sizeof(int))) == 0x0) {
+  if(wcs->types != NULL) {
+    if(prev_naxis != naxis) {
+      //to force the realloc and reset of wcs->m_types
+      prev_naxis = -1;
+    } else {
+      int changed_type = 0;
+      for(int i=0;i<naxis;i++) {
+        changed_type += (wcs->m_types[i] == wcs->types[i]) ? 0:1;
+      }
+      //this can also be true if naxis == 0 -> however, that shouldn't be
+      //the case since we are within a if (wcs->types != NULL): MS
+      if (changed_type == 0) {
+        return WCSERR_SUCCESS;//No change in wcs->type -> can return early
+      }
+    }
+  }
+
+
+  if (wcs->types == NULL || prev_naxis < naxis) {
+    wcs->types = reallocf(wcs->types, naxis*sizeof(wcs->types[0]));
+    memset(wcs->types, 0, naxis*sizeof(wcs->types[0]));
+    wcs->m_types = reallocf(wcs->m_types, naxis*sizeof(wcs->m_types[0]));
+    if(wcs->types == NULL || wcs->m_types == NULL) {
       return wcserr_set(WCS_ERRMSG(WCSERR_MEMORY));
     }
+    for(int i=0;i<naxis;i++) {
+      wcs->m_types[i] = wcs->types[i];
+    }
+    prev_naxis = naxis;
+  }
 
   int *ndx = 0x0;
   for (int i = 0; i < naxis; i++) {
diff --git a/cextern/wcslib/C/wcs.h b/cextern/wcslib/C/wcs.h
index da430a2a4..3d8a958c6 100644
--- a/cextern/wcslib/C/wcs.h
+++ b/cextern/wcslib/C/wcs.h
@@ -2207,6 +2207,7 @@ struct wcsprm {
   int    *m_colax;
   char  (*m_cname)[72];
   double *m_crder, *m_csyer, *m_czphs, *m_cperi;
+  int *m_types; //stores previous value of wcs->types. used to compare with new value and return early from wcs_types() if they are the same
   struct auxprm *m_aux;
   struct tabprm *m_tab;
   struct wtbarr *m_wtb;

Made the following changes:

  • Added a new field (wcs->m_types of type int *) to struct wcsprm
  • Upon entry into wcs_types() function, if wcs->types is not NULL, then the values are checked against wcs->m_types. If both arrays are identical, then the wcs_types() returns successfully immediately.
  • if the values are not identical, then prev_naxis is set to -1 (which results in the condition in the next item being satisfied)
  • If wcs->types is NULL or if prev_naxis < naxis (i.e., more memory is required to store wcs->types), then the wcs->types/m_types arrays are reallocated (with a special reallocf function that frees the source pointer - need to confirm that that's available on Windows)
  • the naxis value is stored in a static int prev_naxis variable
  • rest of the wcs_types() function stays the same
  • added the corresponding free and reset for wcs->m_types into wcsfree()

This solves the race condition on my M2 Macbook laptop, not sure about other OS'. If this patch seems sensible then I can put that into a PR. Perhaps adding the test_wcs function as a regression test is also a good idea.

I am unsure whether modifying the struct wcsprm has knock-on effects and whether it would be better to add the patch upstream anyway.

@astrofrog
Copy link
Member Author

@manodeep - thanks for identifying a good fix for WCSLIB? I was wondering whether it would be easy enough to come up with a minimal C example that reproduces the bug, so that we can show this to Mark Calabretta? Ideally this fix should go straight into WCSLIB otherwise it will get undone next time we update WCSLIB.

@pllim pllim added external PRs and issues related to external packages vendored with Astropy (astropy.extern) Upstream Fix Required labels Apr 5, 2024
@pllim
Copy link
Member

pllim commented Apr 5, 2024

Gonna ping @mcara and @nden here for their FYI.

@mcara
Copy link
Contributor

mcara commented Apr 6, 2024

FMI, this is interesting... I tried modifying code as follows (in wcs.c) (I already know this does not work from a previous comment by @astrofrog ):

  int *new_types;
  int *old_types = wcs->types;
  if ((new_types = calloc(naxis, sizeof(int))) == 0x0)
  {
    return wcserr_set(WCS_ERRMSG(WCSERR_MEMORY));
  }
  if (wcs->types) {
    memcpy(new_types, old_types, naxis * sizeof(int));
  }
  wcs->types = new_types;
  free(old_types);

This minimizes the time between switching buffers. In fact, wcs->types = new_types; should be atomic. I am still getting race condition. Maybe changes to WCSLIB should be more significant to make it thread-safe.

I think it is a great idea to make an easily reproducible example in C and bring this up to Mark Calabretta's attention.

@manodeep
Copy link
Contributor

manodeep commented Apr 6, 2024

After some more thought, I realise that my patch isn't quite right. Since wcs->types isn't exposed externally, my patch would not catch the case where the user has set ctype -- so the comparison has to be with the string prev_ctype and not the integer prev_types.

@mcara I am not sure I follow what you are trying to do. Did you mean to copy old_types into new_types and then just allow wcs->types to be set correctly (but with a race condition) within wcs_types()?

In any case, I will note that my (incorrect) patch still would not make the wcslib threadsafe. There are a few more memory assignments that occur, and memory assignments without a mutex within a parallel region are almost certain to have race conditions. To make wcslib fully thread-safe would be a bit more of an undertaking - mostly by avoiding/protecting the memory writes.

@manodeep
Copy link
Contributor

manodeep commented Apr 6, 2024

There might be an easier way out. Looks like all of the python setters call the note_change function which resets wcs->flag = 0. So, might be fine to return early from wcs_types() if wcs->flag == WCSSET (instead of going through cumbersome and potentially unsafe string comparisons)

@manodeep
Copy link
Contributor

manodeep commented Apr 6, 2024

This patch, based on checking the wcs->flag, works on my laptop:

diff --git a/cextern/wcslib/C/wcs.c b/cextern/wcslib/C/wcs.c
index 1018371ce..4eed871c6 100644
--- a/cextern/wcslib/C/wcs.c
+++ b/cextern/wcslib/C/wcs.c
@@ -2888,6 +2888,7 @@ int wcs_types(struct wcsprm *wcs)
 
 {
   static const char *function = "wcs_types";
+  static int prev_naxis = 0;
 
   const int  nalias = 6;
   const char aliases [6][4] = {"NCP", "GLS", "TPU", "TPV", "TNX", "ZPX"};
@@ -2910,10 +2911,18 @@ int wcs_types(struct wcsprm *wcs)
 
 
   int naxis = wcs->naxis;
-  if (wcs->types) free(wcs->types);
-  if ((wcs->types = calloc(naxis, sizeof(int))) == 0x0) {
+  if(wcs->types != NULL && wcs->flag == WCSSET && prev_naxis == naxis) {
+    return WCSERR_SUCCESS;//No change in wcs->type (otherwise, `note_change()` would reset wcs-flag to 0 -> can return early
+  }
+
+  if (wcs->types == NULL || prev_naxis < naxis) {
+    wcs->types = reallocf(wcs->types, naxis*sizeof(wcs->types[0]));
+    memset(wcs->types, 0, naxis*sizeof(wcs->types[0]));
+    if(wcs->types == NULL) {
       return wcserr_set(WCS_ERRMSG(WCSERR_MEMORY));
     }
+    prev_naxis = naxis;
+  }
 
   int *ndx = 0x0;
   for (int i = 0; i < naxis; i++) {

Upside: Does not require a new field in the wcsprm struct, or the associated book-keeping for comparing strings (and potentially other fields to detect change).
Downside: Only works to solve the race condition within astropy but will not solve it within wcslib (or any other libraries that use the C wcslib directly)

@astrofrog
Copy link
Member Author

astrofrog commented Apr 6, 2024

@manodeep if the latest fix would only work for astropy anyway, what about just checking the flag variable in astropy and only calling wcsset if actually needed?

@manodeep
Copy link
Contributor

manodeep commented Apr 6, 2024

@astrofrog Yes but to avoid prefixing very wcsset call with a check for wcs->flag, it would be better to put the wcs->flag check and early return within wcsset.

@astrofrog
Copy link
Member Author

Ok sounds good - just to put it on the table, it is also an option to have our own wcsset wrapper that does the check to avoid prefexing every call, and the replacing the wrapper back once/if the fix ends up in WCSLIB. But maybe not worth it if the WCSLIB happens soon-ish.

@manodeep
Copy link
Contributor

manodeep commented Apr 7, 2024

Ooo - good idea! There is already the PyWcsprm_cset function that calls wcsset. We can add the check for wcs->flag == WCSSET and avoid the wcsset function call.

One slight wrinkle is that the WCSSET value is defined as static int within multiple files, and therefore, not available to use within wcslib_wrap.c. I can add a similar static int variable within wcslib_wrap.c for now, but may be I will check with Mark (by email) whether WCSSET can be defined as a macro within wcs.h.

@manodeep
Copy link
Contributor

manodeep commented Apr 7, 2024

This tiny patch fixes the race on my laptop and does not require any changes in the external wcslib

diff --git a/astropy/wcs/src/wcslib_wrap.c b/astropy/wcs/src/wcslib_wrap.c
index aa5b9da1c..0fc70db67 100755
--- a/astropy/wcs/src/wcslib_wrap.c
+++ b/astropy/wcs/src/wcslib_wrap.c
@@ -42,6 +42,8 @@
  * Helper functions                                                        *
  ***************************************************************************/
 
+static int WCSSET = 137;
+
 enum e_altlin {
   has_pc = 1,
   has_cd = 2,
@@ -1623,7 +1625,10 @@ PyWcsprm_cset(
   int status = 0;
 
   if (convert) wcsprm_python2c(&self->x);
+  if(self->x.flag != WCSSET) {
     status = wcsset(&self->x);
+  }
   if (convert) wcsprm_c2python(&self->x);
 
   if (status == 0) {

@astrofrog
Copy link
Member Author

@manodeep thabks! Is there any reason not to include some of the other code such as the c2python and python2c calls and some of the code below in the if statement?

@astrofrog
Copy link
Member Author

Could you open a PR with this change and also include a comment about the fact this check is important to make sure multi-threading works properly? (Just in case anyone ever looks in future)

@manodeep
Copy link
Contributor

manodeep commented Apr 7, 2024

@manodeep thabks! Is there any reason not to include some of the other code such as the c2python and python2c calls and some of the code below in the if statement?

Not sure I follow - the python2c and c2python codes (and everything else) are unchanged. I am only protecting the actual wcsset call within the if condition. The patch is literally 3 extra lines of code - one for the static int WCSSET declaration, and two for the if condition, including the closing braces. I can add the if condition separately, but that would still require the python2c and c2python functions - so made sense to add the if condition to the actual wcsset call site.

@manodeep
Copy link
Contributor

manodeep commented Apr 7, 2024

Could you open a PR with this change and also include a comment about the fact this check is important to make sure multi-threading works properly? (Just in case anyone ever looks in future)

Yes, and your python test code also should be included (though I am not sure how that test would be setup within astropy).

I will mention it again - this patch only solves one specific kind of thread-unsafe behaviour. Given that there are other memory writes to shared memory locations within struct wcs, wcslib remains thread-unsafe (at least within regions marked with Py_BEGIN_ALLOW_THREADS)

@astrofrog
Copy link
Member Author

astrofrog commented Apr 7, 2024

Not sure I follow - the python2c and c2python codes (and everything else) are unchanged. I am only protecting the actual wcsset call within the if condition

I guess what I mean is, is any of the remaining code in the function needed if we aren't actually calling wcsset?

In other words, why not just exit the cset function early before doing anything else?

@manodeep
Copy link
Contributor

manodeep commented Apr 7, 2024

Not sure I follow - the python2c and c2python codes (and everything else) are unchanged. I am only protecting the actual wcsset call within the if condition

I guess what I mean is, is any of the remaining code in the function needed if we aren't actually calling wcsset?

In other words, why not just exit the cset function early before doing anything else?

Yup - that works as well. That arrangement requires another c2python call but skips the rest of the function body with the early return. Happy to put in a PR with either one - is there one that is better aligned with the design philosophy of the rest of the codebase?

diff --git a/astropy/wcs/src/wcslib_wrap.c b/astropy/wcs/src/wcslib_wrap.c
index aa5b9da1c..afc71a4c5 100755
--- a/astropy/wcs/src/wcslib_wrap.c
+++ b/astropy/wcs/src/wcslib_wrap.c
@@ -42,6 +42,8 @@
  * Helper functions                                                        *
  ***************************************************************************/
 
+static int WCSSET = 137;
+
 enum e_altlin {
   has_pc = 1,
   has_cd = 2,
@@ -1623,6 +1625,11 @@ PyWcsprm_cset(
   int status = 0;
 
   if (convert) wcsprm_python2c(&self->x);
+  if(self->x.flag == WCSSET) {
+    if (convert) wcsprm_c2python(&self->x);
+    return 0;
+  }
+
   status = wcsset(&self->x);
   if (convert) wcsprm_c2python(&self->x);
 

@astrofrog
Copy link
Member Author

Just to make sure I understand, is python2c followed by c2python not a no op when we don't call wcsset? As in can't we skip the conversions altogether in that case?

@manodeep
Copy link
Contributor

manodeep commented Apr 7, 2024

I had kept those python2c and c2python functions in since that seemed to be the convention while calling C functions. But looks like the python2c and the c2python effectively translate nan<->undefined, which is not relevant in this case. Plus, the wcs->flag is also not included in the list of fields being fixed within wcsprm_fix_values (in pyutil.c). Since this will be an early return it should be okay to skip the round-trip conversion:

diff --git a/astropy/wcs/src/wcslib_wrap.c b/astropy/wcs/src/wcslib_wrap.c
index aa5b9da1c..2d31ca515 100755
--- a/astropy/wcs/src/wcslib_wrap.c
+++ b/astropy/wcs/src/wcslib_wrap.c
@@ -42,6 +42,8 @@
  * Helper functions                                                        *
  ***************************************************************************/
 
+static int WCSSET = 137;
+
 enum e_altlin {
   has_pc = 1,
   has_cd = 2,
@@ -1620,6 +1622,10 @@ PyWcsprm_cset(
     PyWcsprm* self,
     const int convert) {
 
+  if(self->x.flag == WCSSET) {
+    return 0;
+  }
+
   int status = 0;
 
   if (convert) wcsprm_python2c(&self->x);

This patch also solves the race condition

manodeep added a commit to manodeep/astropy that referenced this issue Apr 7, 2024
@manodeep manodeep linked a pull request Apr 7, 2024 that will close this issue
1 task
@manodeep
Copy link
Contributor

Here is a C code reproducer: https://gist.github.com/manodeep/23e5a2cc037752ce861bdb2998314137

@manodeep
Copy link
Contributor

Since I can't help myself, I profiled the C-code reproducer with the native Instruments app on my Macbook Air M2:

Seems that wcsp2s takes nearly 6x the time for wcss2p (specifically within this code section in airx2s)

        for (k = 0; k < 100; k++) {
          // Weighted division of the interval.
          lambda = (r2-r)/(r2-r1);
          if (lambda < 0.1) {
            lambda = 0.1;
          } else if (lambda > 0.9) {
            lambda = 0.9;
          }
          cosxi = x2 - lambda*(x2-x1);

          tanxi = sqrt(1.0-cosxi*cosxi)/cosxi;
          rt = -(log(cosxi)/tanxi + prj->w[1]*tanxi);

          if (rt < r) {
            if (r-rt < tol) break;
            r1 = rt;
            x1 = cosxi;
          } else {
            if (rt-r < tol) break;
            r2 = rt;
            x2 = cosxi;
          }
        }

Output in Instruments (Macbook Air M2 -> branch predictions not good compared to x86_64 branch predictions -> if's are expensive)

29.64% rt = -(log(cosxi)/tanxi + prj->w[1]*tanxi);
17.42% } else if (lambda > 0.9) {
13.57% *phip = atan2d(xj, -yj);
12.44% lambda = 0.1;
7.31% if (r-rt < tol) break;
5.33% cosxi = x2 - lambda*(x2-x1);
3.47% xi = acosd(cosxi);
3.04% lambda = 0.9;
1.95% if (rt-r < tol) break;

@pllim
Copy link
Member

pllim commented Apr 15, 2024

Just a drive-by comment and I am no expert, so feel free to ignore if I make no sense. If most cases are going to else if, would it make sense to swap so that the else if becomes the primary if instead?

@mcara
Copy link
Contributor

mcara commented Apr 15, 2024

Just a drive-by comment and I am no expert, so feel free to ignore if I make no sense. If most cases are going to else if, would it make sense to swap so that the else if becomes the primary if instead?

Yes, it would make sense. Depending on CPU, it may be able to predict the path. If you know for sure that one branch will be execute more than the other one, there may be a benefit (small or large depending on the CPU, if the code is in a loop, etc.)

@manodeep
Copy link
Contributor

Just a drive-by comment and I am no expert, so feel free to ignore if I make no sense. If most cases are going to else if, would it make sense to swap so that the else if becomes the primary if instead?

The percent values to the left are the amount of time taken by that line relative to the execution time, which may or may not relate to how frequently the branch was taken. On the usual x86_64 architecture, it does make sense to swap the orders to put the more likely taken branch up front, and that might be the case here too. A potentially better method might be to eliminate the if condition completely and use ternary expressions (? operators) instead (see this SO article)

(Your comment caused me to go and try to find out about the branch predictor hardware in ARMv8 - so your "drive-by" comments are appreciated :))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug external PRs and issues related to external packages vendored with Astropy (astropy.extern) multi-threading Upstream Fix Required wcs
Projects
None yet
4 participants