Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement of the "refpts" argument #547

Merged
merged 13 commits into from
Jan 14, 2023
Merged

Improvement of the "refpts" argument #547

merged 13 commits into from
Jan 14, 2023

Conversation

lfarv
Copy link
Contributor

@lfarv lfarv commented Dec 30, 2022

This is major upgrade concerning the definition of the common "refpts" argument.

Up to now, the "refpts" value is restricted to two forms : integer form and boolean form. Three utility functions: get_cells, get_refpts and get_elements can convert other forms like element FamNames, element types or filter functions to the two basic forms.

The refpts argument can be processed by 4 main functions:

  • uint32_refpts: sanity check and conversion to the integer form
  • bool_refpts: sanity check and conversion to the bool form
  • refpts_count: count of the selected lattice elements
  • refpts_iterator: iterate through the selected lattice elements

Here we will upgrade these 4 functions to accept any form accepted until now by get_cells, get_refpts... So now, the refpts argument may one of:

  1. an integer in the range [-len(lattice), len(lattice)-1] selecting an element according to the python
    indexing rules. As a special case, len(lattice) is allowed and refers to the end of the last element,
  2. an ordered list of such integers without duplicates,
  3. a numpy array of booleans of maximum length len(lattice)+1, where selected elements are True,
  4. None, meaning an empty selection,
  5. All, new keyword meaning "all possible reference points": the entrance of all
    elements plus the end of the last element,
  6. End, new keyword selecting the end of the last element,
  7. an element type, selecting all the elements of that type in
    the lattice, e.g. at.Sextupole,
  8. a string, selecting all the elements whose FamName attribute matches it.
    Unix shell-style wildcards are accepted, e.g. "Q[FD]*",
  9. a callable filtfunc such that filtfunc(elem) is True for selected elements

This opens new possibilities without breaking anything.

@lfarv
Copy link
Contributor Author

lfarv commented Dec 30, 2022

Consequences

Directly indexing a Lattice with the new formats also works. Examples:

  • ring["Q[FD]*"] is the list of elements whose name matches "Q[FD]*"
  • ring[Quadrupole] is the list of all quadrupoles

The three utility functions mentioned above still behave as before, but are now redondant:

  • ring.get_cells(key) is now almost equivalent to ring.bool_refpts(key): it converts any form to an array of bool. Except for the treatment of str arguments: it is considered as an attribute name in get_cells and as an element name if bool_refpts,
  • ring.get_refpts(key) is exactly the same as ring.uint32_refpts(key): it converts any form to an array of int,
  • ring.get_elements(key) is almost the same as ring[key], except that the result is either a list or a Lattice.

Caution

The new features should automatically appear in all functions having a "refpts" argument, without any modifications.

  • There is however a restriction:
    Unlike the corresponding Lattice methods, the 3 functions uint32_refpts, bool_refpts and refpts_count do not have a "ring" argument, so they cannot accept the new formats. Consequently, all functions using them instead of the corresponding Lattice method will still be limited to the present behaviour. Switching to the Lattice method will open the new features. For instance this has been done in lattice_pass and the find_orbit* functions

  • There may be some AT functions implicitly assuming that "refpts" is an array of integers, which is wrong. They will fail up to now if it is an array of bool. They will a fortiori fail with the new "refpts" formats. Before making use of refpts, one should always make sure it has been converted to the desired form

@lfarv
Copy link
Contributor Author

lfarv commented Dec 30, 2022

Compatibility

A few changes have been made for cleanliness, but can be reverted:

  • In the case of a wrong integer input, a ValueError was raised. It is now an IndexError, which is more correct,
  • An option for printing the name of matching elements (for name-based selection) is removed,
  • Indexing with lists of float instead of int was allowed. It now raises a TypeError,
  • In addition to the selection based on element classes, get_refpts also offered a selection based on element instances. It is removed from the list of "refpts" formats.

@lfarv
Copy link
Contributor Author

lfarv commented Jan 2, 2023

There are large modifications to the documentation (lattice.utils and lattice.lattice_object). You can have a preview of the new version here.

@lfarv lfarv added enhancement Python For python AT code labels Jan 2, 2023
get_cells(ring, attrname) -> BoolRefpts
get_cells(ring, attrname, attrvalue) -> BoolRefpts
Returns a bool array of element indices, selecting ring elements.

With minor modifications, this function can be replaced by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you trying to indicate with this comment? You have already mentioned elsewhere that this does not have the same behaviour as Lattice.bool_refpts.

Are we to treat one of these as deprecated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we to treat one of these as deprecated?

I would say so (see below) …

@MJGaughran
Copy link
Contributor

I think a lot of my confusion when reading these changes is due to how the original functions behave. E.g. get_cells() returns bool refpts (and what is a cell here?), while get_refpts() returns integer refpts etc. Not only that, get_refpts() actually requires refpts as input, so its name is even stranger.

This pull request 'opens new possibilities', but I wonder whether we should put more effort into directing people to a subset of these functions?

Are we able to consider deprecating certain functions, now that we have more experience using pyat? We still haven't made a major version release of pyat.

I'm much more interested in hearing your thoughts on this topic, than actually demanding these changes as part of your pull request. Perhaps you already have long-term plans for the full release of pyat that I am not aware of.

@lfarv
Copy link
Contributor Author

lfarv commented Jan 3, 2023

@MJGaughran : You are pointing out some of the reasons driving these changes:

We try here to ease a very common task in AT: selecting elements in a lattice, whether you want to act of these elements, or want to define "points of interest" to get results. The present behaviour is the exact copy of the Matlab way: indexing a lattice with either an array of integers or an array of booleans (by the way, the name of get_cells also comes from that: a lattice is a "cell array", and an element is a "cell").

So in most usual cases, this requires a 2-step sequence: 1st generate the indexing array with get_cells (or get_refpts), and 2nd provide this array to the "computing" function.

What I propose here is to make this simpler by skipping the 1st step and having all functions having a refpts argument (there are a lot) accept all the various selectors accepted by get_cells/get_refpts.

but I wonder whether we should put more effort into directing people to a subset of these functions?

Are we able to consider deprecating certain functions, now that we have more experience using pyat? We still haven't made a major version release of pyat.

I agree with you, a subset of the existing function would now be sufficient. For naming reasons, I would favour using bool_refpts and uint32_refpts, and forgetting get_cells and get_refpts. In simple cases, they are not needed any more. For more sophisticated selections , the intermediate form of a bool array is very convenient by allowing the use of the element-wise boolean operators "|", "&" to complement or combine selections. bool_refpts is good enough for that.

However, I don't think we can still make breaking changes to the interface: there is already too much user code around. But there is a lot to do in the documentation, to, as you said, 'direct people to a subset'. For instance rewrite this. Should we also hide some functions from the API doc (get_cells, get_refpts, get_elements) ?

Now, concerning a 'full release' of PyAT… I'd rather think that we are already in an infinitely evolutive situation. We missed version 1.0.0, but we could jump to it any time…

@MJGaughran
Copy link
Contributor

To me, the clear interface would include get_elements and get_refpts. Why do we need two different but effectively equivalent forms for the refpts (bool and int)?

A function such as get_elements also makes sense in terms of OOP in Python, I would have thought.

@lfarv
Copy link
Contributor Author

lfarv commented Jan 4, 2023

Why do we need two different but effectively equivalent forms for the refpts (bool and int)?

These two forms are exactly the ones used in numpy's advanced indexing. Each has its advantages, depending what you want to do. In particular, boolean array indexing is very convenient, as I showed above. I think we should stick to that!

To me, the clear interface would include get_elements and get_refpts

  • ring.get_elements(key) is the same as ring[key). get_elements will remain for compatibility, the question is whether we want to deprecate it or not,
  • get_refpts is now strictly identical to uint32_refpts. I prefer uint32_refpts because the name clearly shows what it does (it's a matter of taste). As for get_elements, it will remain available for compatibility, and if you think we should keep it in the doc, no problem.

So I understand that you would rather keep all functions documented as they are now, that's fine for me. If you have ideas to modify and improve the updated documentation, tell me.

@MJGaughran
Copy link
Contributor

You have made good arguments for picking those. I prefer having verbs in method names but that's only an opinion.

My inclination is to mark the other functions as deprecated, regardless of whether we actually come to remove them. Could removing these functions from documentation make it harder to know how to migrate to the preferred methods?

@lfarv
Copy link
Contributor Author

lfarv commented Jan 6, 2023

I prefer having verbs in method names but that's only an opinion.

That makes sense, I take note…

My inclination is to mark the other functions as deprecated, regardless of whether we actually come to remove them. Could removing these functions from documentation make it harder to know how to migrate to the preferred methods?

So I'll add a deprecation message in the 3 functions, more precise than the present sentence.

@swhite2401
Copy link
Contributor

All this looks fine for me, I support having self explanatory function name in fact the names: get_uint32_refpts and get_bool_repfts would make more sense to me. get_elements is perfectly fine in this sense.

However, I am wondering what a deprecation warning means if we do not intend to remove the function, is it really necessary or useful?
Can't we consider removing all "useless" functions in a future release? As long as this is clearly documented and associated with a single release and version number increment, I suppose this is fine. This would make the code easier to maintain over the long run rather that keeping a large number of functions doing the same thing no?

@MJGaughran
Copy link
Contributor

I have to agree with all of Simon's points.

A deprecation warning means new users will know to use the improved interface. It means we can drop the older functions from any tutorials etc, and using this module is less confusing.

I would have pushed to remove these functions for a major version 2, regardless of what we agreed at this point.

@MJGaughran
Copy link
Contributor

MJGaughran commented Jan 9, 2023

If this is our intention, it is perhaps a better idea to more carefully construct our preferred interface at this stage, rather than trying to bodge things to keep backwards compatibility.

If we can merge in any outstanding behaviour changes for a version 1 release, we can then look towards an improved interface for version 2.

@lfarv
Copy link
Contributor Author

lfarv commented Jan 9, 2023

Trying to summarise your comments:

Now (this PR):

  • we switch to get_bool_repfts and get_uint32_refpts for the 2 remaining functions. Unless you have ideas for a shorter name,
  • we put a deprecation warning in the doc of all other functions for the exact reason given by @MJGaughran : new users are pushed to use the new interface.

On the medium term:

I'm very reluctant to remove the useless functions. Backward compatibility is crucial, I'm very concerned by comments like "My code was working 6 months ago and it fails now…". Keep aliases does not cost anything and we can keep the code clean by moving them all to a dedicated module (deprecated.py, it can be done right now). With that, they don't clutter the code any more.
We can emit a warning at run time, pushing users to update their code

On the long term:

At some point we can first completely remove the useless functions from the doc
Then, decide to remove them or not…

@swhite2401
Copy link
Contributor

Ok for me, a separate module to hold deprecated functions is a good compromise.
I would keep get_elements in the standard distribution without any warning though.

@lfarv
Copy link
Contributor Author

lfarv commented Jan 10, 2023

@swhite2401 : Ok to leave get_elements as it is.
No deprecation mention, but still a line mentioning that ring[refpts] is a shorter and more intuitive way to get the same as ring.get_elements(refpts)

@lfarv
Copy link
Contributor Author

lfarv commented Jan 10, 2023

Replacing Lattice.bool_refpts by Lattice.get_bool_refpts everywhere in PyAT means modifying many files. I would not do that in this branch.

So I'll move useless files in a deprecated.py module, create and document the new methods right now in this PR and upgrade all the files already modified until now. The rest of the upgrade will be done later in another PR.

@lfarv
Copy link
Contributor Author

lfarv commented Jan 10, 2023

@swhite2401 and @MJGaughran : Is it ok for you now ?

@lfarv
Copy link
Contributor Author

lfarv commented Jan 11, 2023

@swhite and @MJGaughran: A last question before finalising te thing:
Since we are now defining the final interface, I come back to the naming. What is your preference for bool_refpts (similar for uint32_refpts:

  • bool_refpts: present name, best compatibility,
  • get_bool_refpts: the present choice in this PR,
  • get_bool_index: the most explicit, my preference,
  • get_bool: shorter…
  • anything shirt and explicit that you may suggest

Thanks for your help…

@swhite2401
Copy link
Contributor

Hello @lfarv, in fact I do agree with you refpts is a historical naming that is a bit obscure for new users, changing it to index would be much nicer

@lfarv lfarv merged commit a28a44c into master Jan 14, 2023
@lfarv lfarv deleted the refpts_improvement branch January 14, 2023 19:46
@lfarv lfarv mentioned this pull request Jun 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Python For python AT code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants