Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalize sorting of class bodies #11

Open
bwhmather opened this issue Jun 14, 2021 · 12 comments
Open

Finalize sorting of class bodies #11

bwhmather opened this issue Jun 14, 2021 · 12 comments
Labels
help wanted Extra attention is needed question Further information is requested
Milestone

Comments

@bwhmather
Copy link
Owner

Before sorting topologically, ssort currently pre-sorts class members in the following order:

  • Docstring.
  • Special attributes (__slots__ and __doc__).
  • Regular class attributes.
  • Lifecycle d'under methods (e.g. __init__ or __del__) in fixed order.
  • Regular methods in original order.
  • All other d'under methods, in fixed order.

I think this is basically sensible but, unlike topological sorting of top-level statements, there is no strong argument for it to be this way.

I'd like to invite comments or opinions on how this order could be improved before the 1.0 release.

@bwhmather bwhmather added help wanted Extra attention is needed question Further information is requested labels Jun 14, 2021
@bwhmather bwhmather added this to the 1.0 milestone Jun 14, 2021
@bwhmather bwhmather changed the title Sorting of class bodies Finalize sorting of class bodies Jun 14, 2021
@bwhmather
Copy link
Owner Author

One issue, raised by hacker new user drcongo, is the use of Meta classes in django models. These currently get mixed in with regular methods. One solution might be to special case methods rather than properties and group everything that isn't a method at the top.

@drcongo
Copy link

drcongo commented Feb 9, 2022

Hi @bwhmather - here's the order in our internal guidelines I mentioned on HN, given that this is quite Django specific, it doesn't go as far as your proposed list (ie: there's a lot of dunder methods you never see in Django models), and it also accounts for some Django specifics like @cached_property decorators...

  • Docstring
  • Inner classes
  • attributes
  • class methods
  • methods
  • defined properties (using the @Property or @cached_property decorators
  • private methods
  • __repr__ / __str__ etc. methods

Like I say, this is very Django specific, and also possibly quite specific to the way we do things, so could be to niche for ssort but also could be helpful to think about for future use cases.

edit: Oh, and private methods in that list is just the underscore prefixed methods.

@bwhmather
Copy link
Owner Author

@drcongo - I think ssort is always going to group private methods above any public methods for consistency with function sorting and the "always scroll up for dependencies, down for uses" rule. Do you group lifecycle (__init__/__new__) methods with class methods?

@bwhmather
Copy link
Owner Author

I've been thinking about this a bit more since last week, and I think there are two questions I have which it might be possible to answer by scanning a large representative corpus of python code:

  1. How common are hard references between class members? What categories do they typically fall into?
  2. Are inner classes commonly used for anything other than the django Meta or pydantic Config pattern?

@bwhmather
Copy link
Owner Author

Order proposed by @AnonymouX47 in #26:

  • __slots__
  • Class data attributes
    • Public
    • Private
    • Mangled
  • Special methods
    • Instantiation-related come first e.g __init__, __new__.
    • Others come after in alphabetical order.
  • Properties
  • Methods
  • Public methods
  • "Private" methods (Starting with single underscore)
  • Mangled-name methods (Starting with double underscore)
  • etc...

@bwhmather
Copy link
Owner Author

One thing that annoyed me today. I have a class that controls the lifecycle of a resource. It has a startup method, a request_shutdown method, and a few methods in between them that allow it to respond to events. The methods should naturally be sorted in the order in which they are called.

Unfortunately, some of the error handling paths in the event handling methods need to be able to shutdown the resource. This means that all the shutdown code needs to go at the top. I don't like this.

@AnonymouX47
Copy link

AnonymouX47 commented Feb 17, 2022

The fact is... no matter how much deliberation happens, the group order chosen will be opinionated in one way or the other.

So, I propose that a reasonable default order be chosen but beyond that, a user should be able to customize the grouping and group ordering as desired and even (maybe later) be able to include user-defined groups and change the group a statement belongs to.


So, ssort can define a default set of groups and a default group ordering for top-level statements and class members separately.

For the custom group ordering, there could be:

  • a CLI option or config option for a global setting
  • a particularly-styled comment at the top of a module (after imports) for module-specific setting
  • a particularly-styled comment at the top of a class (after the docstring) for class-specific setting

For the addition of custom groups and to change the group a statement belongs:

  • a particularly-styled comment on the same line as a statement (or its header, for definitions) or just immediately above the statement.

A statement can belong to only one group.


The particularly-styled comment can be of the form:

# ssort: <group1>, <group2>, <group3>

for custom group ordering

and

# ssort: <group>

for custom grouping.
If <group> isn't among the default, then it's added to the set of groups for that module or class.

@AnonymouX47
Copy link

My suggestion above goes beyond the scope of the topic, I simply put it all down as it came to mind... but please help reference it in a more appropriate issue, if any.

Thanks.

@drcongo
Copy link

drcongo commented Feb 17, 2022

I quite like @AnonymouX47's order, it's not dissimilar to ours and makes sense to my brain, but yeah, it's likely to be a very personal preference for most users.

@bwhmather
Copy link
Owner Author

I think we're all broadly in agreement. I'm inclined to follow the spirit of @AnonymouX47's opening paragraph but am leaning towards enabling customization by trying to make the default only enforce things which are relatively uncontroversial.

I like the idea of customized group ordering. It works well in isort, where you can also add rules for creating custom groups, but it's a little beyond what I feel confident dedicating the time to implement right now. It's also something we can do post 1.0 without breaking backwards compatibility. I think I will open a new issue with a "help wanted" tag, and possibly revisit it later.

Regarding a default grouping, there certainly seems to be no disagreement that we should have one, or on the positioning of docstrings, __slots__, class attributes, and lifecycle methods (__init__, __new__, etc). @AnonymouX47 puts all d'under methods at the top, whereas @drcongo and I would put them at the bottom. ssort currently puts private methods towards as a side-effect of topological sorting. ssort also doesn't currently differentiate between class, static and regular methods.

My current feeling is that there are really only two unforgivable problems with the current order:

  • Inner-classes get shifted randomly into the middle.
  • Topological sorting based on self references is a bit annoying in practice as it tends to mess up the groupings.

The first is bad enough that we can probably just treat it as a bug, and fix it. For the second, I think there are basically two options for how we can dial it back:

  • Abandon topological sorting based on self references entirely.
  • Apply it only to private methods.

@AnonymouX47
Copy link

AnonymouX47 commented Feb 17, 2022

@AnonymouX47 puts all d'under methods at the top, whereas @drcongo and I would put them at the bottom.

I personally do that just to keep them together with the lifecycle methods, since they're also d'under methods.

ssort currently puts private methods towards as a side-effect of topological sorting.

I think that's okay. When custom group ordering is implemented, maybe an optional "private" group could be added.

ssort also doesn't currently differentiate between class, static and regular methods.

I feel that's okay for the default... personally, I don't.
An option may also be added to enable such grouping.


I think inner classes should definitely be on top as @drcongo suggested.

As for topological sorting based on self references, I think applying it to private methods only will be a better option.


Just wanna put this out there...

I feel ssort should be a tool to help any programmer achieve what they want, not one enforcing opinionated conventions (without a valid reason) but with reasonable defaults.

*Personally, I'm not a fan of conventions lacking a valid reason.

@cgahr
Copy link
Collaborator

cgahr commented Apr 29, 2023

I thought about this as well, since ssort resorted my class in an inconsistent way I did not like.

Consider the following class:

class Config:
    def _read(self):
        return {}

    def __init__(self):
        self.reset()
        self._config = self._read()

    def _write(self):

    def reset(self):
        self._write()

    def reload(self):
        self._config = self._read()

    def __getitem__(self, name):
        return self._config[name]

    def __setitem__(self, name, value):
        self._config[name] = value
        self._write()

    def __str__(self):
        return ""

The above sorting is the one proposed by ssort.

My main issue is that

  • _read is before __init__
  • _write is not before __init__
  • while __init__ uses both (with reset being the step in between).

Solution

Assuming we have a class (or a module) without recursion, I propose the following approach:

Step 1: build the dependency graph

For my example it looks as follows:

class Config
├── def __init__: ...
│   ├── def _read: ...
│   └── def reset: ...
│       └── def _write: ...
├── def reload: ...
│   └── def _write: ...
├── def __getitem__: ...
├── def __setitem__: ...
│   └── def _write: ...
└── def __str__: ...

The dependency graph forms a acyclic tree. We can use this property to sort it.

Step 2: Sort all childs of the root

The root of the graph is the Config class. It's children are:

  • __init__
  • __getitem__
  • __setitem__
  • __str__
  • reload
    We sort these methods. The sort order could be lexiographic or a any partial ordering on the methods.

One choice could be lexiographic but __new__ before __init__ and __str__ at the end. Using this ordering we get:

def __init__: ...
├── def _read: ...
└── def reset: ...
    └── def _write: ...
def __getitem__: ...
def __setitem__: ...
└── def _write: ...
def reload: ...
└── def _write: ...
def __str__: ...

The relative position of these methods is now fixed.

Step 3: Repeat step 2 with new roots.

Next we remove the root from the dependency graph, consider each of its childs as new roots and repeat step 2 for each tree in the forest. Recall that all trees in the forest are already sorted!

Again, we add each roots child before the root and sort lexigraphically (inside of each tree):

def _read: ...
def reset: ...
└── def _write: ...
def __init__: ...
def __getitem__: ...
def _write: ...
def __setitem__: ...
def _write: ...
def reload: ...
def __str__: ...

Step 4+: We repeat this procedure recursively until there are no trees left:

def _read: ...
def _write: ...
def reset: ...
def __init__: ...
def __getitem__: ...
def _write: ...
def __setitem__: ...
def _write: ...
def reload: ...
def __str__: ...

Finally: remove duplicates

As a last step, we remove all methods the appear multiple times, starting from the bottom and we get the following order:

def _read: ...
def _write: ...
def reset: ...
def __init__: ...
def __getitem__: ...
def __setitem__: ...
def reload: ...
def __str__: ...

This way of sorting would be, IMO, the most logical way of sorting functions or methods in a class.

TL, DR (now for modules, not classes)

  1. build the dependency tree between all functions
  2. root functions are functions that no other function depends on.
  3. we sort the root functions (given some order)
  4. we remove the root functions from the dependency tree
  5. repeat step 2. to 4. until the dependency tree is empty
  6. remove all duplicates starting from the bottom

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants