Skip to content

Fixed by checking if tile needs to be re-created#626

Merged
coreylammie merged 8 commits intomasterfrom
jub/fix-609
Apr 24, 2024
Merged

Fixed by checking if tile needs to be re-created#626
coreylammie merged 8 commits intomasterfrom
jub/fix-609

Conversation

@jubueche
Copy link
Collaborator

Related issues

Issue #609

Description

Tile is re-created and reference between optimizer and tile is cut.

Details

For the torch tile, we don't re-create the tile unless we need to.

@jubueche jubueche requested a review from maljoras March 11, 2024 13:55
Signed-off-by: Julian Buechel <jub@zurich.ibm.com>
Signed-off-by: Julian Buechel <jub@zurich.ibm.com>
Signed-off-by: Julian Buechel <jub@zurich.ibm.com>
need_to_recreate = not hasattr(
self, "rpu_config"
) or "TorchInferenceRPUConfig" not in str(self.rpu_config.__class__)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In base.py it should not be tested for any specific RPUConfig that is not scalable and will introduce hard to find errors later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed that.

self._create_simulator_tile(x_size, d_size, self.rpu_config)
if need_to_recreate
else self.tile
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like that this is really trying to fix things at the wrong place. The tile itself should handle the need to recreate. You can just overload _create_simulator_tile for the TorchTile as a no-op if you do not want it to be created

Copy link
Collaborator Author

@jubueche jubueche Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this won't work because we also use it in the __init__ of SimulatorTileWrapper.

Signed-off-by: Julian Buechel <jub@zurich.ibm.com>
Signed-off-by: Julian Buechel <jub@zurich.ibm.com>
@jubueche
Copy link
Collaborator Author

@maljoras @kkvtran there seem to be some rounding issues in the tests.

@jubueche
Copy link
Collaborator Author

@kkvtran there is an issue in the tests with python 3.8 in a part of the code that I didn't touch.

@kkvtran
Copy link
Contributor

kkvtran commented Mar 28, 2024

I restarted the travis builds and everything built successfully now.

Copy link
Collaborator

@maljoras maljoras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if you write the test with the AIHWKIT Testcases as provided in the helpers. Than one can use the test for more than just one tile class. This would improve the coverage

from aihwkit.simulator.configs.utils import BoundManagementType, NoiseManagementType
from aihwkit.nn.conversion import convert_to_analog


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the tile paramteric tests in tests/helper so that one can easily use this test for all AnalogTile and not just TorchInferenceTile ? Also there is actually a test that should have triggered this, why was that test passing? See https://github.com/IBM/aihwkit/blob/master/tests/test_utils.py#L200


# Recreate the tile.
self.tile = self._create_simulator_tile(x_size, d_size, self.rpu_config)
self.tile = self._recreate_simulator_tile(x_size, d_size, self.rpu_config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that might be a solution, however, does the "load_rpu_config" mechanism works with that change still?

I still think that there is just some issue with this call here:

super(Module, self).__setstate__(state)

It might be that there is some tricky inheritance issue, maybe introduced with a new torch version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, OK, probably the issue is that torch interferes with the recreation because in case of the TorchTIle the SimulatorTile is also a Module so torch messes around here as well.
https://github.com/IBM/aihwkit/blob/master/src/aihwkit/simulator/tiles/torch_tile.py#L37
The recreate solution might be OK and quite clean if everything works otherwise

@coreylammie coreylammie dismissed maljoras’s stale review April 24, 2024 13:57

Adressed. Let's merge this so it is no longer pending and add the suggsted additional test casses in a seperate PR.

@coreylammie coreylammie merged commit 0a28281 into master Apr 24, 2024
@coreylammie coreylammie deleted the jub/fix-609 branch April 24, 2024 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants