Fixed by checking if tile needs to be re-created #626
Signed-off-by: Julian Buechel <jub@zurich.ibm.com>
src/aihwkit/simulator/tiles/base.py
    need_to_recreate = not hasattr(
        self, "rpu_config"
    ) or "TorchInferenceRPUConfig" not in str(self.rpu_config.__class__)
base.py should not test for any specific RPUConfig. That is not scalable and will introduce hard-to-find errors later.
src/aihwkit/simulator/tiles/base.py
        self._create_simulator_tile(x_size, d_size, self.rpu_config)
        if need_to_recreate
        else self.tile
    )
This looks like an attempt to fix the problem in the wrong place. The tile itself should handle the need to re-create. You can just override _create_simulator_tile for the TorchTile as a no-op if you do not want it to be created.
I think this won't work because we also use it in the __init__ of SimulatorTileWrapper.
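The objection above can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not the aihwkit classes; the only assumption taken from the discussion is that the constructor calls the same factory method that a no-op override would disable.

```python
class WrapperSketch:
    """Hypothetical stand-in for a tile wrapper (not the aihwkit class)."""

    def __init__(self, x_size, d_size):
        # The constructor relies on the same factory method
        # that is later used to re-create the tile.
        self.tile = self._create_simulator_tile(x_size, d_size)

    def _create_simulator_tile(self, x_size, d_size):
        # Placeholder for building the real low-level tile.
        return {"x_size": x_size, "d_size": d_size}


class NoOpWrapperSketch(WrapperSketch):
    """Hypothetical subclass that overrides the factory as a no-op."""

    def _create_simulator_tile(self, x_size, d_size):
        # Nothing is created, which also affects construction time.
        return None


regular = WrapperSketch(4, 2)
no_op = NoOpWrapperSketch(4, 2)

print(regular.tile)  # {'x_size': 4, 'd_size': 2}
print(no_op.tile)    # None
```

Because the override applies to every call site, the no-op variant never gets a tile in the first place, which is why a separate hook for re-creation is needed.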
@kkvtran there is an issue in the tests with Python 3.8 in a part of the code that I didn't touch.
I restarted the Travis builds and everything built successfully now.
maljoras left a comment
It would be nice if you wrote the test with the AIHWKIT test cases provided in the helpers. Then the test can be used for more than just one tile class, which would improve the coverage.
    from aihwkit.simulator.configs.utils import BoundManagementType, NoiseManagementType
    from aihwkit.nn.conversion import convert_to_analog
Could you use the tile parametric tests in tests/helper so that this test can easily be used for all AnalogTile classes and not just TorchInferenceTile? Also, there is actually a test that should have triggered this; why was that test passing? See https://github.com/IBM/aihwkit/blob/master/tests/test_utils.py#L200
      # Recreate the tile.
    - self.tile = self._create_simulator_tile(x_size, d_size, self.rpu_config)
    + self.tile = self._recreate_simulator_tile(x_size, d_size, self.rpu_config)
OK, that might be a solution. However, does the "load_rpu_config" mechanism still work with that change?
I still think that there is just some issue with this call here: aihwkit/src/aihwkit/simulator/tiles/module.py, line 116 in cfccd18.
It might be that there is some tricky inheritance issue, maybe introduced with a new torch version.
But OK, the issue is probably that torch interferes with the re-creation: in the case of the TorchTile, the SimulatorTile is also a Module, so torch gets involved here as well.
https://github.com/IBM/aihwkit/blob/master/src/aihwkit/simulator/tiles/torch_tile.py#L37
The recreate solution might be OK and quite clean if everything works otherwise
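A minimal sketch of the re-create hook discussed above, under stated assumptions: only the method names _create_simulator_tile and _recreate_simulator_tile and their (x_size, d_size, rpu_config) arguments come from the diff. The class names, the restore method, and all method bodies are hypothetical and do not reproduce the aihwkit implementation.

```python
class SimTileSketch:
    """Hypothetical low-level tile object."""

    def __init__(self, x_size, d_size):
        self.x_size = x_size
        self.d_size = d_size


class BaseTileSketch:
    """Hypothetical base tile: re-creation defaults to plain creation."""

    def __init__(self, x_size, d_size, rpu_config=None):
        self.rpu_config = rpu_config
        self.tile = self._create_simulator_tile(x_size, d_size, rpu_config)

    def _create_simulator_tile(self, x_size, d_size, rpu_config):
        return SimTileSketch(x_size, d_size)

    def _recreate_simulator_tile(self, x_size, d_size, rpu_config):
        # Default behavior: build a fresh tile object.
        return self._create_simulator_tile(x_size, d_size, rpu_config)

    def restore(self, x_size, d_size):
        # Hypothetical call site standing in for state loading.
        self.tile = self._recreate_simulator_tile(x_size, d_size, self.rpu_config)


class ReusingTileSketch(BaseTileSketch):
    """Hypothetical tile that keeps its existing tile object on re-creation."""

    def _recreate_simulator_tile(self, x_size, d_size, rpu_config):
        # Reuse the existing object so outside references stay valid.
        return self.tile


base = BaseTileSketch(4, 2)
old_base = base.tile
base.restore(4, 2)

reusing = ReusingTileSketch(4, 2)
old_reusing = reusing.tile
reusing.restore(4, 2)

print(base.tile is old_base)        # False
print(reusing.tile is old_reusing)  # True
```

With this split, construction still goes through _create_simulator_tile for every tile class, and base.py does not need to test for a specific RPUConfig.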
Addressed. Let's merge this so it is no longer pending and add the suggested additional test cases in a separate PR.
Related issues
Issue #609
Description
The tile is re-created and the reference between the optimizer and the tile is cut.
Details
For the torch tile, we don't re-create the tile unless we need to.
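The problem stated in the description can be sketched in plain Python. All classes here are hypothetical toy stand-ins (no torch or aihwkit objects are used); the sketch only shows why replacing an object that an optimizer already references leaves the optimizer updating the old object.

```python
class ParamSketch:
    """Hypothetical trainable object held by a tile."""

    def __init__(self, value):
        self.value = value


class OptimizerSketch:
    """Hypothetical optimizer that stores references to its parameters."""

    def __init__(self, params):
        self.params = list(params)

    def step(self):
        for param in self.params:
            param.value -= 0.5


class HolderSketch:
    """Hypothetical module that owns a tile."""

    def __init__(self):
        self.tile = ParamSketch(1.0)


holder = HolderSketch()
optimizer = OptimizerSketch([holder.tile])

# Re-creating the tile replaces the object the optimizer refers to.
holder.tile = ParamSketch(1.0)
optimizer.step()

print(holder.tile.value)          # 1.0, the new tile was never updated
print(optimizer.params[0].value)  # 0.5, the update went to the old tile
```

Keeping the existing tile object, as the fix does for the torch tile, avoids this disconnect.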