Make groups, irreps, and gspaces into stateless, picklable singletons #78
Hi @kalekundert This is amazing, thanks a lot for all this precious work!! 😄 If you don't mind, this will take me a bit of time to really process :/ In principle, I fully support the singleton strategy! Let me reply to a couple of points in the meantime:
This method should indeed not be expected to have a "deterministic" behaviour.
A function of this kind should internally ensure all irreps up to frequency 3 have been instantiated so far. I agree that this situation is a bit tricky, though. Indeed, as you pointed out, some unittests might fail sometimes depending on their execution order (this is because
Unfortunately, neither of these options is possible.
The user should mostly rely on the second methods for deterministic behaviour, but this requires a lot of ad-hoc code for each group. This is also why I preferred preserving the
If we deprecate
I didn't do that for groups and representations since it would make the already verbose representation worse.
Does this break the singleton strategy? Two identical gspaces might be generated 1) manually by setting a particular subgroup id and 2) via a factory function. In these two cases, they will have a different representation (so I suppose also a different private attribute).
Amazing, thaaaanks! Again, this is quite some changes to process and I need to get more familiar with the metaclasses before properly commenting on this PR, sorry. However, I will try to keep this discussion going (as I get more familiar with your code) since I think this is a really important issue in the library. I really appreciate your PR, it will be very useful! 🎉 Best,
No worries, I definitely understand. Take your time, and let me know if you have any questions about how anything works.
Would it be possible to limit irreps based on their size? That is, generate all irreps that are 4x4 or smaller (in arbitrary order)? And if this is possible, would it be useful? Would the functions calling
If we deprecate That said, I assume that the
In case it's useful, here are all the tests I found that seem to depend on the state of the
I only looked at the last one in detail. The error ultimately comes from
The singleton strategy treats names as a special case. They aren't really part of the object state, since they don't affect how the object behaves, but they're useful for debugging and therefore worth keeping around. So what happens is that the names are ignored when (i) deciding whether to create a new instance or reuse an existing one, (ii) comparing two objects, and (iii) hashing an object. But they're pickled and unpickled. Going back to your example with two identical gspaces created in different ways, the name would come from whichever was created first (because that's the only object that will ever be created). I can see that being confusing in some situations, but I think it's made up for by the extra clarity in the most common situations. It's also worth noting that groups already have names (and in fact have this same issue where you can get different names depending on whether or not you use a factory, e.g. the Klein 4 group), so I had to come up with some way of handling names anyways. Once I did that, it was natural to apply the same logic to gspaces. Strictly speaking, it would be more principled to remove the name arguments completely (since they're not state), and to let each group/gspace subclass figure out what its name should be from the information it has. For some of the groups with factory functions, this would look something like:
The biggest downside to this approach is that it would make it impossible for users to give names to novel groups/gspaces that they construct. But that's just another way of saying that these classes are immutable, which is what they should be. I initially dismissed this approach for being too limiting, but now that I think about it more, I actually think it's better than what I did. Let me know if you have any objections, otherwise I'll probably switch to something more like this.
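To make the name handling discussed above concrete (names ignored for comparison and hashing, but kept when pickling), here is a minimal sketch. It is not the metaclass from this PR: `ToyGSpace` and its fields are hypothetical, and the sketch leans on `dataclasses.field(compare=False)` to get the same observable behaviour for a toy object.

```python
import pickle
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyGSpace:
    """Hypothetical stand-in for a gspace; not part of escnn."""
    group_key: str
    dimension: int
    # The name is descriptive only, so it is excluded from __eq__ and __hash__.
    name: str = field(default="", compare=False)


a = ToyGSpace("so2", 2, name="made by a factory")
b = ToyGSpace("so2", 2, name="made by hand")

same_object_state = (a == b)             # names do not affect equality
same_hash = (hash(a) == hash(b))         # ... or hashing
restored = pickle.loads(pickle.dumps(b))
name_survives_pickle = restored.name     # the name is still pickled
```

Unlike a true singleton, this toy still creates two distinct objects; it only illustrates which operations see the name and which do not.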
The issue for me comes from composite objects like For what it's worth, here's what the python docs have to say about the
Reading this now, I realize that I probably should've used the angle-bracket convention instead of my ad-hoc square-bracket convention. I do think it's best to include type names in repr strings, like the docs recommend, but I also don't care too strongly one way or the other, so I'm happy to go back to the old reprs if you prefer them. After all, you'll probably have to read these reprs more often than me. 😉
This also wouldn't work :( Think about SO(2): all the irreps are 2 dimensional. I am not aware of any simple and general way to generate a finite set of irreps procedurally for any group unfortunately...
This is the main reason why I want to keep the maximum_frequency keyword to initialize the cache. Because the tensor product decomposition is a general code which is agnostic of the group, it can only access the list of irreps cached so far. I thought about this for some time in the past, but I couldn't come up with a better strategy than the current one: the user must pre-cache sufficiently many irreps and then the backend of the library access the I hope this clarifies why I care about this maximum_frequency argument.
I think this decomposition of tensor products is essentially the only place in the Maybe a good solution is considering groups as singleton as you recommend and the irreps cache as a separate entity.
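One way to picture "groups as singletons, irreps cache as a separate entity" is the rough sketch below. Every name here (`ToyGroup`, `IrrepCache`, `get`, `cached`) is an assumption made for illustration and does not correspond to the escnn API; irreps are represented by plain strings.

```python
class ToyGroup:
    """Hypothetical immutable group: irreps are computed on demand, never stored."""

    def __init__(self, name):
        self._name = name

    def irrep(self, frequency):
        # Stand-in for an expensive irrep construction.
        return f"{self._name}-irrep-{frequency}"


class IrrepCache:
    """Mutable cache that lives next to the group instead of inside it."""

    def __init__(self, group, maximum_frequency):
        self.group = group
        self._irreps = {}
        # Pre-populate the cache, mirroring the role of maximum_frequency.
        for frequency in range(maximum_frequency + 1):
            self.get(frequency)

    def get(self, frequency):
        if frequency not in self._irreps:
            self._irreps[frequency] = self.group.irrep(frequency)
        return self._irreps[frequency]

    def cached(self):
        # Only this cache's own contents are visible here.
        return [self._irreps[k] for k in sorted(self._irreps)]


group = ToyGroup("so2")
small = IrrepCache(group, maximum_frequency=1)
large = IrrepCache(group, maximum_frequency=3)
small.get(5)  # grows `small` only; `large` and `group` are unaffected
```

With this split, code that needs "all irreps cached so far" would receive an explicit cache object, so its result no longer depends on what other parts of the program did to a shared group.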
This solution sounds good for me!
I'd avoid this solution, since the same abstract group could have different names.
Do you think this can be more easily fixed within the repr of the R3Conv class?
This is also a really good point actually. I will think a bit more about this. Thanks for the helpful discussion!
Ok, I know you've told me twice now that there's no way to generate the necessary finite sets of irreps for arbitrary groups, so I apologize if I'm beating a dead horse, but I want to revisit the infinite generator idea. I think the reason I'm so stuck on trying to find a way to generate irreps on the fly is that (i) it seems like the most conceptually "right" thing to do and (ii) everything works if the group is initialized with enough irreps, so it seems like it should be possible (in the worst case) to find some way of continually re-initializing the group until there are enough irreps.
It's clear how an infinite irrep generator would be implemented for finite groups (just yield all the irreps) and rotational groups (yield the irreps in frequency order). The only other groups in the codebase, I believe, are the direct product and double groups. The double group is basically just a direct product between a group and itself, so the direct product case is the only one we need to consider.
Right now, the direct product group considers all pairs of cached irreps from the two groups being combined. For rotational groups, this means it considers all the irreps below the frequency threshold that was used to populate the cache. We can write an infinite generator that similarly considers all pairs below any given frequency before any pairs above that frequency. The trick is to alternate between the two groups every time a new irrep is needed:
    from itertools import cycle

    def direct_product_irreps(g1, g2):
        irrep_iterators = [
            g1.yield_all_irreps(),
            g2.yield_all_irreps(),
        ]
        irreps = [[], []]
        for i in cycle([0, 1]):
            irrep_i = next(irrep_iterators[i])
            irreps[i].append(irrep_i)
            for irrep_j in irreps[(i + 1) % 2]:
                yield (irrep_i, irrep_j)[::1 if i == 0 else -1]
        # Some extra logic would be needed to terminate in the case of finite
        # groups, but that wouldn't change anything conceptually.
This would start by yielding all the same irreps as if the groups were both initialized with
Is there something I'm overlooking? I think this would effectively behave in the same way as the code already does, but without requiring any externally observable state. In fact, it might even result in fewer Clebsch-Gordan coefficients needing to be calculated, since it goes in order from low to high frequencies. (Although I don't know if those calculations have any meaningful effect on runtime. Probably not, because they only need to be calculated once.)
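For anyone who wants to try the alternating idea in isolation, here is a self-contained variant that runs without escnn. It is only a sketch: plain iterables of integers or characters stand in for the per-group irrep generators, `interleaved_pairs` is a hypothetical name, and the termination handling for finite inputs is one possible choice rather than anything taken from the branch.

```python
from itertools import count, islice


def interleaved_pairs(first, second):
    """Yield every pair (a, b), alternating which input supplies the next new item."""
    iterators = [iter(first), iter(second)]
    seen = [[], []]
    exhausted = [False, False]
    i = 0
    while not all(exhausted):
        if not exhausted[i]:
            try:
                item = next(iterators[i])
            except StopIteration:
                # A finite input ran out; keep drawing from the other one.
                exhausted[i] = True
            else:
                # Pair the new item with everything already seen on the other side.
                for other in seen[1 - i]:
                    yield (item, other) if i == 0 else (other, item)
                seen[i].append(item)
        i = 1 - i


# Two infinite "frequency" streams: the first nine pairs are exactly the
# pairs with both frequencies at most 2.
first_nine = list(islice(interleaved_pairs(count(), count()), 9))

# Two finite inputs: the generator stops once all pairs have been produced.
finite_pairs = sorted(interleaved_pairs([0, 1], "ab"))
```

Because each new item is paired only with items already drawn from the other side, no pair is produced twice, and low-frequency pairs always appear before higher-frequency ones.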
In order for this to work, the group wouldn't be able to provide access to the cache, because doing so would effectively make the group mutable. The cache itself also couldn't be a global object, because the whole point is that global objects should never be mutable. So the cache would either have to be a wrapper around the group, or provided alongside the group. The former would probably be easier to implement, especially if the cache wrapper were to implement the whole group interface. I think the end result would actually be pretty similar (most of the time) to the way this branch behaves right now, where the maximum frequency is just considered part of the group's state. This might not be as bad of a solution as I thought earlier. What it really means is that if you instantiate (for example) O2 twice with different maximum frequencies, you'll get two different objects that each have their own separate caches. There's probably some potential for confusion if you try to have two different O2 instances interact with each other (because the fact that they're different might not be obvious), but I think the most common thing anyways is just to create a single gspace with a single group and to use it for everything. I do think that if we take this approach, we'd still have to modify the
Ok, I'll leave it as it is.
Well, it was pretty easy to fix by changing the gspace/representation/irrep reprs (I didn't actually change the group repr, since I never had any trouble recognizing group names), so I wouldn't say that it would be easier to fix by changing
My reason for making this PR is that I wanted to be able to pickle escnn modules for the purpose of making checkpoints during training. Before I started, I noticed #37, which has the same goal. However, it hasn't been merged because @Gabri95 thought it would be better to take advantage of the fact that group, irrep, and gspace instances don't really have state. The problems come from trying to pickle complicated internal caches maintained by those objects, but in principle there should be no need to even try to do that.
I really agree with this approach, so that's what I tried to implement here. A big part of my implementation is a general-purpose singleton class. Groups are already singletons, but in a way that relies on (i) each subclass properly implementing the `_keys()` and `_generator()` methods and (ii) everyone using factory functions instead of constructors. I simplified this by moving all the singleton logic into a metaclass. Metaclasses can control every aspect of the object instantiation process (here's a good intro if you're not familiar), and this one intercepts the arguments to the constructor and uses them to decide whether to instantiate a new object or return an existing one. It also uses this information to pickle, compare, and hash the objects in question.
Irreps are also already singletons, although they're simpler because they have a single natural factory (the `Group.irrep()` method) and they don't have any subclasses. In this case, I don't think my new singleton metaclass would be much of an improvement. It wouldn't really simplify anything, as groups would still need to keep track of their own irreps, so I just added the necessary pickling code and kept everything else mostly as is. I did move the singleton logic out of the various group subclasses and into the base class, to reduce code duplication.
GSpaces are not currently singletons, but it would make sense for them to be, so I applied my singleton metaclass. That was the only change I really had to make.
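As a rough illustration of the metaclass idea described above (intercept the constructor arguments, reuse an existing instance when they match, and pickle only those arguments), here is a minimal sketch. It is not the implementation in this branch: `Singleton`, `ToyGroup`, `_rebuild`, and `_singleton_args` are all hypothetical names, and details such as normalizing positional versus keyword arguments are deliberately left out.

```python
import pickle


class Singleton(type):
    """Hypothetical metaclass: one shared instance per set of constructor arguments."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = {}  # a separate registry for every class

    def __call__(cls, *args, **kwargs):
        # Constructor arguments must be hashable to serve as a registry key.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cls._instances:
            obj = super().__call__(*args, **kwargs)
            obj._singleton_args = (args, kwargs)  # remembered for pickling
            cls._instances[key] = obj
        return cls._instances[key]


def _rebuild(cls, args, kwargs):
    # Unpickling goes back through the metaclass, so it returns the
    # existing instance whenever one is already registered.
    return cls(*args, **kwargs)


class ToyGroup(metaclass=Singleton):
    def __init__(self, order):
        self.order = order
        self.expensive_cache = {}  # internal cache, never pickled

    def __reduce__(self):
        args, kwargs = self._singleton_args
        return (_rebuild, (type(self), args, kwargs))


g = ToyGroup(4)
g.expensive_cache["fn"] = lambda x: x  # a lambda cannot be pickled directly
clone = pickle.loads(pickle.dumps(g))  # works: only the arguments are pickled
```

Since there is at most one instance per argument set, the default identity-based `==` and `hash()` already agree with argument equality in this toy. One caveat of the sketch is that `ToyGroup(4)` and `ToyGroup(order=4)` produce different keys; a fuller version would normalize the arguments first.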
Even after doing all of the above, it was still not possible to pickle `R3Conv`. The issue was that `FieldType` references a `Representation`, and representations need to have a function that they can use to turn group elements into matrices. The default function is a direct sum of irreps, possibly with a change-of-basis. I made this function pickleable by using `functools.partial()`, although another option would've been to turn it into a class (as in #37).
Important unresolved issue!
This PR is not ready to merge yet, because I ran into one conceptual issue that I'm not sure how to resolve. The issue relates to (i) the `maximum_frequency` argument to all the orthogonal groups (O2, SO2, O3, SO3), (ii) the `Group.irreps()` method, and (iii) the fact that singleton objects should be immutable.
When you instantiate (for example) the O2 group, you can specify a maximum frequency. This determines how many irreps are pre-calculated (although you can always get more by calling `irrep()` with the necessary arguments). If you later try to instantiate the O2 group again, because groups are singletons, you'll always get the same original object. But if you asked for more frequencies than before, that object will be updated with more irreps.
This is a problem because of the `irreps()` method, which simply returns all of the irreps that have been requested up to that point. Since this can change, either as more objects are instantiated with higher maximum frequencies, or as the `irrep()` method is called, these groups are not immutable. This leads to fragile, long-range dependencies in the code. For example:
This code works, because the first instantiation of the O2 group puts the second in the right state to succeed. But if the first line is changed, the second will mysteriously stop working. It turns out that this exact situation happens a bunch of times in the unit tests, and probably happens in real-life code as well. The potential for this kind of "spooky action at a distance" is why singleton objects (or any form of global state) should be immutable.
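The order dependence described above can be reproduced with a toy stand-in that does not use escnn at all. `ToyO2` below is a hypothetical class that only mimics the relevant behaviour: a single shared instance whose set of cached frequencies grows every time it is requested with a larger `maximum_frequency`.

```python
class ToyO2:
    """Hypothetical mutable singleton; frequencies stand in for cached irreps."""

    _instance = None

    def __new__(cls, maximum_frequency):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._cached = set()
        # Every instantiation may silently grow the shared cache.
        cls._instance._cached.update(range(maximum_frequency + 1))
        return cls._instance

    def irreps(self):
        return sorted(self._cached)


def needs_frequency_three():
    # This function only asks for frequencies up to 1 ...
    group = ToyO2(maximum_frequency=1)
    # ... but its result depends on what other code requested earlier.
    return 3 in group.irreps()


before = needs_frequency_three()  # nothing else has touched the cache yet
ToyO2(maximum_frequency=3)        # unrelated code elsewhere in the program
after = needs_frequency_three()   # same call, different answer
```

The two calls to `needs_frequency_three()` are textually identical, yet they disagree, which is the same kind of hidden coupling that makes test outcomes depend on execution order.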
As I see it, the problem is the `irreps()` method. It shouldn't reveal the number of cached irreps. That means it either needs to accept an argument specifying how many irreps to return (i.e. a dimension or something like that), or it needs to be a generator that can literally yield an infinite number of irreps. I'm not sure which approach is best, and both seem like they could be very disruptive. So I want to get your thoughts on the matter before doing anything.
For what it's worth, this branch currently treats the maximum frequency parameter as part of the group's "state". I made that decision back when I didn't understand the code as well, and I've come to realize that it's basically the worst of both worlds. Conceptually, it's wrong because it's the same group no matter how many frequencies you look at. Practically, it doesn't change the fact that `irreps()` is mutable, e.g. if `irrep()` is called.
Minor changes that aren't directly related to the main goal of the PR:
I added the class name to the `__repr__()` for a number of classes, because without that information I was having a hard time debugging things. I tried to stick to the following convention: `ClassName(*args, **kwargs)` for reprs that would actually reconstruct the object if copy-and-pasted into a python interpreter, and `ClassName[extra info]` for reprs that have more free-form formatting. I think this is a substantial improvement, but I can revert it if you prefer things the way they were.
I gave the GSpaces names based on the factory function that was used to create them.
I reimplemented some hash functions. Now that groups, irreps, and gspaces are all hashable, they can easily be incorporated into hashes as necessary.
I refactored some test code, mostly to remove code duplication, and I fixed some failing tests.
I copied the Python `.gitignore` file from github/gitignore into the project. It recognizes a bunch of files that tend to pop up in python projects.