the `desitarget.geomask.match()` function only works if there are no duplicates in the input arguments.
this is correctly stated in the docstring; however, not everyone reads a docstring carefully :)
if I'm correct, with the current version, when the arguments contain duplicates the code just proceeds, but the outputs are not meaningful; so it's likely the user will get hurt by that.
that's why it would be better/safer to add a check that neither argument has duplicates, and just raise an error if one does.
something like:
if np.unique(A).size != len(A):
    msg = "A contains duplicates"
    log.error(msg)
    raise ValueError(msg)
if np.unique(B).size != len(B):
    msg = "B contains duplicates"
    log.error(msg)
    raise ValueError(msg)
also, while writing the code snippet above, I'm wondering whether A and B should be restricted to 1-d arrays...
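to make the 1-d question concrete, here is a minimal sketch of a helper that does both checks in one place. the name `check_unique_1d` is hypothetical (it is not an existing desitarget function), and it uses only plain NumPy, with no logger:

```python
import numpy as np


def check_unique_1d(arr, name):
    """Hypothetical helper: validate one input of a match()-style function.

    Raises ValueError if `arr` is not 1-d or if it contains duplicates;
    otherwise returns the input as a NumPy array.
    """
    arr = np.asarray(arr)
    if arr.ndim != 1:
        raise ValueError(f"{name} must be a 1-d array, got shape {arr.shape}")
    # np.unique sorts and de-duplicates, so a size mismatch means duplicates.
    if np.unique(arr).size != arr.size:
        raise ValueError(f"{name} contains duplicates")
    return arr
```

restricting to 1-d also removes an ambiguity in the snippet above: for a 2-d array, `len(A)` counts rows while `np.unique(A).size` counts distinct elements, so the two numbers are not comparable.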
add an argument that controls whether the check for duplicates is done;
set the default to True;
e.g.: `match(A, B, check_for_dups=True)`; and the same for the "twin" function: `match_to(A, B, check_for_dups=True)`.
would that make sense?
the motivation is that this check can take a few seconds for arrays of 1e8 elements, and if one is really after speed, and already knows that the inputs don't have duplicates, this would be a "loss" of time.
[context: I have an in-development fiberassign update using match(), to speed up the code when running several hundred tiles at once, with several tens of millions of targets]
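a rough sketch of what the optional check could look like. this is not the desitarget implementation: `match_sketch` is a hypothetical stand-in that uses `np.intersect1d` to do the matching, only to show where a `check_for_dups` argument would sit:

```python
import numpy as np


def match_sketch(A, B, check_for_dups=True):
    """Stand-in for a match()-style function (NOT the desitarget code).

    Returns index arrays Aii, Bii such that A[Aii] == B[Bii].
    The result is only meaningful if A and B contain no duplicates.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    if check_for_dups:
        # Each np.unique call sorts its input, which is the cost that
        # callers who already trust their inputs may want to skip.
        for name, arr in (("A", A), ("B", B)):
            if np.unique(arr).size != arr.size:
                raise ValueError(f"{name} contains duplicates")
    # assume_unique=True: uniqueness was checked above, or is the
    # caller's responsibility when check_for_dups=False.
    _, Aii, Bii = np.intersect1d(A, B, assume_unique=True, return_indices=True)
    return Aii, Bii
```

with `check_for_dups=False` the function goes straight to the matching step, so duplicated inputs are not caught and the returned indexes should not be trusted.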
Ah, yes, sorry, this fell off my plate for a while.
Would it be better to set the default to check_for_dups=False for strict backward-compatibility? It's likely people use this function in other places where speed is important.