Multi-threading issue in WCSLIB even if input data is copied #16245
Comments
I made some progress on this issue. The race condition could be created by an access (and only an access) to any of the four fields - I will report back as I get further insight on what exactly within |
wcsset calls wcs_types - which, sets On a deepcopy, |
What I don't quite understand is that wcs.wcs.set() is called before the multi threading so I don't understand why types is set to 0,0 at any point - shouldn't it be 2200,2201 and remain that? |
I don't understand it either but presumably
import sys
import numpy as np
from astropy.wcs import WCS
print("getting wcs...",file=sys.stderr)
wcs = WCS(naxis=2)
print("getting wcs...done",file=sys.stderr)
wcs.wcs.crpix = [-234.75, 8.3393]
wcs.wcs.cdelt = np.array([-0.066667, 0.066667])
wcs.wcs.crval = [0, -90]
wcs.wcs.ctype = ["RA---AIR", "DEC--AIR"]
print("setting wcs...", file=sys.stderr)
wcs.wcs.set()
print("setting wcs...done",file=sys.stderr)
And, at the end of
for(int i=0;i<naxis;i++) {
    fprintf(stderr,"Re-assigned: wcs->types[%d] = %d\n", i, wcs->types[i]);
}
Plus, markers like this at the start and end of
fprintf(stderr,"Entered %s in line %d\n",__FUNCTION__, __LINE__);
fprintf(stderr,"Exited %s in line %d\n",__FUNCTION__, __LINE__);
and recompiling and running the test code, I get the following:
getting wcs...
Entered wcsset in line 2496
Entered wcs_types in line 2896
Re-assigned: wcs->types[0] = 0
Re-assigned: wcs->types[1] = 0
Exited wcs_types in line 3178
Exited wcsset in line 2885
Entered wcsset in line 2496
Entered wcs_types in line 2896
Re-assigned: wcs->types[0] = 0
Re-assigned: wcs->types[1] = 0
Exited wcs_types in line 3178
Exited wcsset in line 2885
getting wcs...done
setting wcs...
Entered wcsset in line 2496
Entered wcs_types in line 2896
Re-assigned: wcs->types[0] = 2200
Re-assigned: wcs->types[1] = 2201
Exited wcs_types in line 3178
Exited wcsset in line 2885
setting wcs...done
So the third call (not the second) to edit It's not (Sidenote: If |
Another thing I don't understand is why re-setting |
Just for the record here is an example I have been playing around with to try and see the issue @manodeep mentioned above:
import time
import numpy as np
from astropy.wcs import WCS
def show_repr_differences(repr1, repr2):
    for line1, line2 in zip(repr1.splitlines(), repr2.splitlines()):
        if line1 != line2:
            print(line1, line2)
wcs = WCS(naxis=2)
wcs.wcs.crpix = [-234.75, 8.3393]
wcs.wcs.cdelt = [-0.066667, 0.066667]
wcs.wcs.crval = [0, -90]
wcs.wcs.ctype = ["RA---AIR", "DEC--AIR"]
wcs.wcs.set()
r1 = repr(wcs.wcs)
wcs.wcs.set()
time.sleep(2.0)
r2 = repr(wcs.wcs)
show_repr_differences(r1, r2)
This returns, most of the time (but not fully deterministically), something like:
If the sleep is removed, the issue happens more rarely. |
Ooo - using repr to diff within python is a great idea! I was printing everything and vimdiff'ing :) The generated diff above, however, is expected. Calling Similarly, |
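For anyone wanting a slightly more robust version of the repr-diffing trick, here is a minimal sketch using the standard-library difflib module. The helper name repr_diff and the two sample strings are made up for illustration; they only stand in for repr(wcs.wcs) captured at two different times and are not real astropy output.

```python
import difflib


def repr_diff(before: str, after: str) -> list[str]:
    """Return only the removed ('-') and added ('+') lines between two reprs."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    # The first two lines are the '---'/'+++' file headers; '@@' hunk
    # headers are dropped by the prefix filter below.
    body = list(diff)[2:]
    return [line for line in body if line.startswith(("+", "-"))]


# Hypothetical stand-ins for repr(wcs.wcs) before and after a second set().
r1 = "flag: 137\ntypes: 2200 2201\nnaxis: 2"
r2 = "flag: 137\ntypes: 0 0\nnaxis: 2"

changes = repr_diff(r1, r2)
print(changes)
```

Unlike a plain zip over the two line lists, this also reports lines that exist in only one of the two reprs.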
@manodeep - ok sounds good, so out of curiosity what if you were to tweak the wcs_types code to avoid the freeing + reallocation, and just keeping it allocated if it already is and allocating if not? (of course I'm not suggesting this as an actual fix but more as an experiment to see if that also avoids the issues we are seeing). |
So could it be that if one thread re-allocates |
In a similar attempt, I added an immediate return within |
So what happens is that OS marks the memory area as free - the contents are NOT touched (unless specifically cleared out by user-code, which isn't the case here). So, either the other thread accesses the old memory area (with the correct values) or it accesses the new memory area (set to 0's by I still don't have a proper theory about why the race condition occurs. It looks to be related to |
Maybe what's happening is that the first part of the thread's work is done with the correct |
Removing the free and re-alloc did not fix the race. To check again whether types might be the issue, I added an immediate return at the top of if(wcs->types != NULL) return WCSERR_SUCCESS; and that fixed the race condition. To me this confirms that the race condition is caused by something within the |
I might have a solution that does not break existing behaviour - assuming that it is okay to add a field to the
diff --git a/cextern/wcslib/C/wcs.c b/cextern/wcslib/C/wcs.c
index 1018371ce..7d7173c5e 100644
--- a/cextern/wcslib/C/wcs.c
+++ b/cextern/wcslib/C/wcs.c
@@ -188,6 +188,7 @@ int wcsinit(
if (wcs->flag == -1) {
wcs->tab = 0x0;
wcs->types = 0x0;
+ wcs->m_types = 0x0;
wcs->lin.flag = -1;
}
@@ -1729,6 +1730,7 @@ int wcsfree(struct wcsprm *wcs)
// Allocated unconditionally by wcsset().
if (wcs->types) free(wcs->types);
+ if (wcs->m_types) free(wcs->m_types);
if (wcs->lin.crpix == wcs->m_crpix) wcs->lin.crpix = 0x0;
if (wcs->lin.pc == wcs->m_pc) wcs->lin.pc = 0x0;
@@ -1762,6 +1764,7 @@ int wcsfree(struct wcsprm *wcs)
wcs->m_wtb = 0x0;
wcs->types = 0x0;
+ wcs->m_types = 0x0;
wcserr_clear(&(wcs->err));
@@ -2888,6 +2891,7 @@ int wcs_types(struct wcsprm *wcs)
{
static const char *function = "wcs_types";
+ static int prev_naxis = 0;
const int nalias = 6;
const char aliases [6][4] = {"NCP", "GLS", "TPU", "TPV", "TNX", "ZPX"};
@@ -2908,12 +2912,37 @@ int wcs_types(struct wcsprm *wcs)
const char *alt = "";
if (*(wcs->alt) != ' ') alt = wcs->alt;
-
int naxis = wcs->naxis;
- if (wcs->types) free(wcs->types);
- if ((wcs->types = calloc(naxis, sizeof(int))) == 0x0) {
+ if(wcs->types != NULL) {
+ if(prev_naxis != naxis) {
+ //to force the realloc and reset of wcs->m_types
+ prev_naxis = -1;
+ } else {
+ int changed_type = 0;
+ for(int i=0;i<naxis;i++) {
+ changed_type += (wcs->m_types[i] == wcs->types[i]) ? 0:1;
+ }
+ //this can also be true if naxis == 0 -> however, that shouldn't be
+ //the case since we are within a if (wcs->types != NULL): MS
+ if (changed_type == 0) {
+ return WCSERR_SUCCESS;//No change in wcs->type -> can return early
+ }
+ }
+ }
+
+
+ if (wcs->types == NULL || prev_naxis < naxis) {
+ wcs->types = reallocf(wcs->types, naxis*sizeof(wcs->types[0]));
+ memset(wcs->types, 0, naxis*sizeof(wcs->types[0]));
+ wcs->m_types = reallocf(wcs->m_types, naxis*sizeof(wcs->m_types[0]));
+ if(wcs->types == NULL || wcs->m_types == NULL) {
return wcserr_set(WCS_ERRMSG(WCSERR_MEMORY));
}
+ for(int i=0;i<naxis;i++) {
+ wcs->m_types[i] = wcs->types[i];
+ }
+ prev_naxis = naxis;
+ }
int *ndx = 0x0;
for (int i = 0; i < naxis; i++) {
diff --git a/cextern/wcslib/C/wcs.h b/cextern/wcslib/C/wcs.h
index da430a2a4..3d8a958c6 100644
--- a/cextern/wcslib/C/wcs.h
+++ b/cextern/wcslib/C/wcs.h
@@ -2207,6 +2207,7 @@ struct wcsprm {
int *m_colax;
char (*m_cname)[72];
double *m_crder, *m_csyer, *m_czphs, *m_cperi;
+ int *m_types; //stores previous value of wcs->types. used to compare with new value and return early from wcs_types() if they are the same
struct auxprm *m_aux;
struct tabprm *m_tab;
struct wtbarr *m_wtb;
Made the following changes:
This solves the race condition on my M2 Macbook laptop, not sure about other OSes. If this patch seems sensible then I can put that into a PR. Perhaps adding the I am unsure whether modifying the |
@manodeep - thanks for identifying a good fix for WCSLIB! I was wondering whether it would be easy enough to come up with a minimal C example that reproduces the bug, so that we can show this to Mark Calabretta? Ideally this fix should go straight into WCSLIB, otherwise it will get undone next time we update WCSLIB. |
FMI, this is interesting... I tried modifying code as follows (in
int *new_types;
int *old_types = wcs->types;
if ((new_types = calloc(naxis, sizeof(int))) == 0x0)
{
return wcserr_set(WCS_ERRMSG(WCSERR_MEMORY));
}
if (wcs->types) {
memcpy(new_types, old_types, naxis * sizeof(int));
}
wcs->types = new_types;
free(old_types);
This minimizes the time between switching buffers. In fact, I think it is a great idea to make an easily reproducible example |
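To make the difference between "zero the shared buffer, then refill it" and "fill a private buffer, then switch" easier to see, here is a toy Python model. ToyWcs, refill_in_place, refill_by_swap and the observe callback are all invented for this sketch and are not part of WCSLIB or astropy. The observe calls mark points where a concurrent reader could look at the buffer. The model says nothing about use-after-free: in C a reader that still holds the old pointer after free() is a separate problem that Python lists cannot show.

```python
class ToyWcs:
    """Toy stand-in for the C struct; only the `types` buffer is modelled."""

    def __init__(self, types):
        self.types = list(types)


def refill_in_place(wcs, codes, observe):
    # Mirrors free + calloc: the shared buffer is zeroed before it is refilled.
    wcs.types = [0] * len(codes)
    observe(tuple(wcs.types))  # a reader here sees zeros
    for i, code in enumerate(codes):
        wcs.types[i] = code
    observe(tuple(wcs.types))


def refill_by_swap(wcs, codes, observe):
    # Fill a private buffer first; the shared one is untouched meanwhile.
    new_types = [0] * len(codes)
    for i, code in enumerate(codes):
        new_types[i] = code
    observe(tuple(wcs.types))  # a reader here still sees the old values
    wcs.types = new_types      # publish with a single assignment
    observe(tuple(wcs.types))


codes = [2200, 2201]
seen_in_place, seen_swap = [], []
refill_in_place(ToyWcs(codes), codes, seen_in_place.append)
refill_by_swap(ToyWcs(codes), codes, seen_swap.append)
print(seen_in_place)
print(seen_swap)
```

In this model the swap variant never exposes the all-zero state, which matches the intuition behind shrinking the switch window, but it does not by itself make the C code thread-safe.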
After some more thought, I realise that my patch isn't quite right. Since @mcara I am not sure I follow what you are trying to do. Did you mean to copy In any case, I will note that my (incorrect) patch still would not make the |
There might be an easier way out. Looks like all of the python setters call the |
This patch, based on checking the
diff --git a/cextern/wcslib/C/wcs.c b/cextern/wcslib/C/wcs.c
index 1018371ce..4eed871c6 100644
--- a/cextern/wcslib/C/wcs.c
+++ b/cextern/wcslib/C/wcs.c
@@ -2888,6 +2888,7 @@ int wcs_types(struct wcsprm *wcs)
{
static const char *function = "wcs_types";
+ static int prev_naxis = 0;
const int nalias = 6;
const char aliases [6][4] = {"NCP", "GLS", "TPU", "TPV", "TNX", "ZPX"};
@@ -2910,10 +2911,18 @@ int wcs_types(struct wcsprm *wcs)
int naxis = wcs->naxis;
- if (wcs->types) free(wcs->types);
- if ((wcs->types = calloc(naxis, sizeof(int))) == 0x0) {
+ if(wcs->types != NULL && wcs->flag == WCSSET && prev_naxis == naxis) {
+ return WCSERR_SUCCESS;//No change in wcs->type (otherwise, `note_change()` would reset wcs->flag to 0) -> can return early
+ }
+
+ if (wcs->types == NULL || prev_naxis < naxis) {
+ wcs->types = reallocf(wcs->types, naxis*sizeof(wcs->types[0]));
+ memset(wcs->types, 0, naxis*sizeof(wcs->types[0]));
+ if(wcs->types == NULL) {
return wcserr_set(WCS_ERRMSG(WCSERR_MEMORY));
}
+ prev_naxis = naxis;
+ }
int *ndx = 0x0;
for (int i = 0; i < naxis; i++) {
Upside: Does not require a new field in the |
@manodeep if the latest fix would only work for astropy anyway, what about just checking the flag variable in astropy and only calling wcsset if actually needed? |
@astrofrog Yes but to avoid prefixing every wcsset call with a check for wcs->flag, it would be better to put the wcs->flag check and early return within wcsset. |
Ok sounds good - just to put it on the table, it is also an option to have our own wcsset wrapper that does the check to avoid prefixing every call, and then replacing the wrapper back once/if the fix ends up in WCSLIB. But maybe not worth it if the WCSLIB fix happens soon-ish. |
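As a rough illustration of the wrapper idea, here is a pure-Python model of "skip the expensive set when the flag says nothing changed". ToyWcsprm, _expensive_set and cset are hypothetical names for this sketch only; the value 137 is the one the patches in this thread use for WCSSET. The model assumes, as in the test case discussed here, that set() is called once before any threads start, so the worker threads only ever take the early-return path.

```python
import threading

WCSSET = 137  # flag value used by the patches in this thread


class ToyWcsprm:
    """Toy model: setters invalidate the flag, cset() only works when needed."""

    def __init__(self):
        self.flag = 0
        self.set_calls = 0  # how often the expensive path actually ran
        self._ctype = ["", ""]

    @property
    def ctype(self):
        return list(self._ctype)

    @ctype.setter
    def ctype(self, value):
        self._ctype = list(value)
        self.flag = 0  # any parameter change marks derived state as stale

    def _expensive_set(self):
        # Stand-in for the real setup routine that rewrites shared state.
        self.set_calls += 1
        self.flag = WCSSET

    def cset(self):
        if self.flag == WCSSET:
            return 0  # nothing changed: do not touch shared state
        self._expensive_set()
        return 0


wcs = ToyWcsprm()
wcs.ctype = ["RA---AIR", "DEC--AIR"]
wcs.cset()  # done once, before any threads exist


def worker():
    for _ in range(1000):
        wcs.cset()


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
calls_after_threads = wcs.set_calls

wcs.ctype = ["RA---TAN", "DEC--TAN"]  # setter resets the flag
wcs.cset()
calls_after_change = wcs.set_calls
print(calls_after_threads, calls_after_change)
```

The guard only protects the case where every thread finds the flag already set; if a setter runs while other threads are working, the check-then-set sequence is itself racy and would need a lock.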
Ooo - good idea! There is already the PyWcsprm_cset function that calls One slight wrinkle is that the |
This tiny patch fixes the race on my laptop and does not require any changes in the external wcslib
diff --git a/astropy/wcs/src/wcslib_wrap.c b/astropy/wcs/src/wcslib_wrap.c
index aa5b9da1c..0fc70db67 100755
--- a/astropy/wcs/src/wcslib_wrap.c
+++ b/astropy/wcs/src/wcslib_wrap.c
@@ -42,6 +42,8 @@
* Helper functions *
***************************************************************************/
+static int WCSSET = 137;
+
enum e_altlin {
has_pc = 1,
has_cd = 2,
@@ -1623,7 +1625,10 @@ PyWcsprm_cset(
int status = 0;
if (convert) wcsprm_python2c(&self->x);
+ if(self->x.flag != WCSSET) {
status = wcsset(&self->x);
+ }
if (convert) wcsprm_c2python(&self->x);
if (status == 0) { |
@manodeep thanks! Is there any reason not to include some of the other code such as the c2python and python2c calls and some of the code below in the if statement? |
Could you open a PR with this change and also include a comment about the fact this check is important to make sure multi-threading works properly? (Just in case anyone ever looks in future) |
Not sure I follow - the python2c and c2python codes (and everything else) are unchanged. I am only protecting the actual |
Yes, and your python test code also should be included (though I am not sure how that test would be setup within astropy). I will mention it again - this patch only solves one specific kind of thread-unsafe behaviour. Given that there are other memory writes to shared memory locations within |
I guess what I mean is, is any of the remaining code in the function needed if we aren't actually calling wcsset? In other words, why not just exit the cset function early before doing anything else? |
Yup - that works as well. That arrangement requires another
diff --git a/astropy/wcs/src/wcslib_wrap.c b/astropy/wcs/src/wcslib_wrap.c
index aa5b9da1c..afc71a4c5 100755
--- a/astropy/wcs/src/wcslib_wrap.c
+++ b/astropy/wcs/src/wcslib_wrap.c
@@ -42,6 +42,8 @@
* Helper functions *
***************************************************************************/
+static int WCSSET = 137;
+
enum e_altlin {
has_pc = 1,
has_cd = 2,
@@ -1623,6 +1625,11 @@ PyWcsprm_cset(
int status = 0;
if (convert) wcsprm_python2c(&self->x);
+ if(self->x.flag == WCSSET) {
+ if (convert) wcsprm_c2python(&self->x);
+ return 0;
+ }
+
status = wcsset(&self->x);
if (convert) wcsprm_c2python(&self->x);
|
Just to make sure I understand, is python2c followed by c2python not a no op when we don't call wcsset? As in can't we skip the conversions altogether in that case? |
I had kept those python2c and c2python functions in since that seemed to be the convention while calling C functions. But looks like the python2c and the c2python effectively translate nan<->undefined, which is not relevant in this case. Plus, the
diff --git a/astropy/wcs/src/wcslib_wrap.c b/astropy/wcs/src/wcslib_wrap.c
index aa5b9da1c..2d31ca515 100755
--- a/astropy/wcs/src/wcslib_wrap.c
+++ b/astropy/wcs/src/wcslib_wrap.c
@@ -42,6 +42,8 @@
* Helper functions *
***************************************************************************/
+static int WCSSET = 137;
+
enum e_altlin {
has_pc = 1,
has_cd = 2,
@@ -1620,6 +1622,10 @@ PyWcsprm_cset(
PyWcsprm* self,
const int convert) {
+ if(self->x.flag == WCSSET) {
+ return 0;
+ }
+
int status = 0;
if (convert) wcsprm_python2c(&self->x);
This patch also solves the race condition |
Here is a C code reproducer: https://gist.github.com/manodeep/23e5a2cc037752ce861bdb2998314137 |
Since I can't help myself, I profiled the C-code reproducer with the native Instruments app on my Macbook Air M2: Seems that
for (k = 0; k < 100; k++) {
// Weighted division of the interval.
lambda = (r2-r)/(r2-r1);
if (lambda < 0.1) {
lambda = 0.1;
} else if (lambda > 0.9) {
lambda = 0.9;
}
cosxi = x2 - lambda*(x2-x1);
tanxi = sqrt(1.0-cosxi*cosxi)/cosxi;
rt = -(log(cosxi)/tanxi + prj->w[1]*tanxi);
if (rt < r) {
if (r-rt < tol) break;
r1 = rt;
x1 = cosxi;
} else {
if (rt-r < tol) break;
r2 = rt;
x2 = cosxi;
}
}
Output in Instruments (Macbook Air M2 -> branch predictions not good compared to x86_64 branch predictions -> if's are expensive)
29.64% rt = -(log(cosxi)/tanxi + prj->w[1]*tanxi);
17.42% } else if (lambda > 0.9) {
13.57% *phip = atan2d(xj, -yj);
12.44% lambda = 0.1;
7.31% if (r-rt < tol) break;
5.33% cosxi = x2 - lambda*(x2-x1);
3.47% xi = acosd(cosxi);
3.04% lambda = 0.9;
1.95% if (rt-r < tol) break; |
Just a drive-by comment and I am no expert, so feel free to ignore if I make no sense. If most cases are going to |
Yes, it would make sense. Depending on CPU, it may be able to predict the path. If you know for sure that one branch will be executed more than the other one, there may be a benefit (small or large depending on the CPU, if the code is in a loop, etc.) |
The percent values to the left are the amount of time taken by that line relative to the execution time, which may or may not relate to how frequently the branch was taken. On the usual (Your comment caused me to go and try to find out about the branch predictor hardware in ARMv8 - so your "drive-by" comments are appreciated :)) |
In #16244 I mentioned a multi-threading issue that is due to the input array being passed through to the WCS C extension and being subsequently modified in-place. There is however another issue that occurs even if the input data is copied, and is illustrated by the following example:
In this case I am using p2s and s2p but the issue can be seen also with wcs_pix2world and wcs_world2pix; the reason for using p2s and s2p directly is just to try and simplify the problem as much as possible to use just calls to the WCS C extension. The above example gives:
One of the weird things is that the line just accessing:
appears to be required to trigger this issue. Calling wcs.wcs.lat also causes the issue, but not for example wcs.wcs.equinox which then works fine. Removing this line solves the issue.
Changing the function so that it does a copy of the WCS object:
also fixes the issue.
@manodeep and I have been discussing this and he is investigating (but if anyone else has ideas feel free to chime in!)