Conversation
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
/te-ci L0
Maybe add a field in the recipe that has this expected tensor class? Also, I guess for completeness we should not only deal with QuantizedTensor, but also with the *Base (e.g. Float8TensorBase) classes (I know that they are not used for weights in fp8_model_init).
Maybe add a field in the recipe that has this expected tensor class?
That is what I would prefer too. I did not do it initially because of a circular import.
I have now fixed it with a local import, if that is acceptable.
But we probably need some redesign to properly resolve this circular import issue (in another PR).
In the future, I would also propose adding a field in the recipe with the quantizer class etc.
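The local-import workaround mentioned above can be illustrated with a minimal, self-contained sketch. The module names `recipe_mod`/`tensor_mod` and the `Float8Tensor` stand-in are hypothetical, created on the fly just to reproduce the cycle; the point is only that moving the import inside the function defers it until after both modules are loaded:

```python
import pathlib
import sys
import tempfile
import textwrap

# Create two modules that would import each other at module level.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "recipe_mod.py").write_text(textwrap.dedent("""
    # A module-level `import tensor_mod` here would be circular,
    # since tensor_mod imports recipe_mod when it is loaded.
    def expected_tensor_class():
        from tensor_mod import Float8Tensor  # local import breaks the cycle
        return Float8Tensor
"""))
(tmp / "tensor_mod.py").write_text(textwrap.dedent("""
    import recipe_mod  # safe: recipe_mod does not import us at import time

    class Float8Tensor:
        pass
"""))

sys.path.insert(0, str(tmp))
import recipe_mod  # noqa: E402

print(recipe_mod.expected_tensor_class().__name__)  # prints "Float8Tensor"
```

By the time `expected_tensor_class()` is first called, both modules are fully initialized, so the deferred import succeeds where a module-level one would have raised.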
Also, I guess for completeness we should not only deal with QuantizedTensor, but also with the *Base (e.g. Float8TensorBase) classes (I know that they are not used for weights in fp8_model_init).
Fixed
@timmoon10 I vaguely recall that getattr from nn.Module was slow and that you created a faster function for it at some point; do you remember the details?
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
Force-pushed from ecb8b96 to e64188b
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
/te-ci L0
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
TE1.x did not save the recipe in the state. With this fix, a TE1.x checkpoint can be loaded in ToT TE2.x (for both TE versions, the same ToT Megatron-LM was used).
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
/te-ci L0
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
/te-ci L0
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
/te-ci L0
_NUM_MAX_UB_STREAMS = 3
_MIN_STREAM_PRIORITY, _MAX_STREAM_PRIORITY = None, None
layers_atomic_ring_exchange = []
_QUANTIZED_WEIGHT_TENSOR_TYPES = (
We recently merged the change introducing the QuantizedTensorBase class, so you should be able to remove this.
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
/te-ci L0
from transformer_engine.pytorch.tensor.float8_blockwise_tensor import Float8BlockwiseQTensor
from transformer_engine.pytorch.tensor.float8_tensor import Float8Tensor
from transformer_engine.pytorch.tensor.mxfp8_tensor import MXFP8Tensor
from transformer_engine.pytorch.tensor.quantized_tensor import QuantizedTensor
Actually you can't do it that way: this recipe class is in common, so it is used in the JAX version as well. You will need to do the import and the setting in the __post_init__ method instead (with a try/except block).
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
/te-ci L0
def __post_init__(self) -> None:
    assert self.fp8_format != Format.E5M2, "Pure E5M2 training is not supported."
    try:
        from transformer_engine.pytorch.tensor.float8_tensor import Float8Tensor
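Read in isolation, the diff excerpt above was presumably heading toward something like the following sketch. The `expected_tensor_class` field and the silent fallback are assumptions for illustration (the real recipe class carries many more fields); the point is the guarded lazy import that keeps the common module loadable from a JAX-only build:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedScaling:
    # Hypothetical field: filled lazily so that this common module does not
    # hard-depend on the PyTorch package (JAX builds import it too).
    expected_tensor_class: Optional[type] = None

    def __post_init__(self) -> None:
        try:
            from transformer_engine.pytorch.tensor.float8_tensor import Float8Tensor

            self.expected_tensor_class = Float8Tensor
        except ImportError:
            # PyTorch extension not available (e.g. a JAX-only install);
            # leave the field unset rather than failing at import time.
            pass


recipe = DelayedScaling()  # constructs with or without the PyTorch package
```

This is exactly the pattern the next comment pushes back on: the try/except makes it work everywhere, but it still plants framework-specific knowledge inside the common library.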
Not a fan of adding framework-specific logic in the core lib at all, even if it is guarded with a try/except. So far we haven't had to do this, and we probably shouldn't introduce it here unless there is a strong reason to.
How about we reverse the logic: instead of including tensor info in the recipe, we include some recipe info in the tensor. That should achieve the same goal of being able to match the two together, right?
if isinstance(tensor, QuantizedTensorBase) and not isinstance(
    tensor, recipe.expected_tensor_class
):
Suggested change (replacing the expected_tensor_class check):

if isinstance(tensor, QuantizedTensorBase) and not tensor.recipe_name() == recipe.__class__.__name__:
Something like this to address my above comment, maybe? This would require some hardcoding, i.e. each of the tensor types would be associated with a fixed string as the recipe class name, but I think that should be ok? In my example, MXFP8Tensor.recipe_name() would return "MXFP8BlockScaling". We could think of better ideas.
This is a good approach, although string-based comparison is not that robust (what if a name changes? inheritance is not naturally supported, etc.).
What about the following update to your proposal:

# In each tensor class (e.g., Float8Tensor)
@classmethod
def get_compatible_recipe_types(cls):
    return [DelayedScaling, Float8CurrentScaling]

and then we just check isinstance(recipe, tuple(tensor.__class__.get_compatible_recipe_types())).
But this is still a workaround, fitted to this particular case.
Another option could be a static recipe_to_tensor_map defined in the pytorch module.
Although, if there is stronger demand for framework-specific logic in the recipe, we might consider defining abstract base recipe classes in the common module and moving the framework-specific implementations to their respective modules (class DelayedScaling(BaseDelayedScaling)). But I am not sure we need that.
The framework-specific logic lives in the quantizers; the recipe itself makes sense in common.
I like the get_compatible_recipe_types idea. Some early nitpicks to help review later:

@classmethod
def get_compatible_recipes(cls):
    return (DelayedScaling, Float8CurrentScaling)

and we can check isinstance(recipe, tensor.get_compatible_recipes()).
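A rough sketch of this idea, using stand-in classes: the real QuantizedTensorBase, tensor, and recipe classes live in transformer_engine, so everything below is simplified placeholders that only demonstrate the compatibility-check mechanism:

```python
# Stand-in recipe classes mirroring the discussion above.
class Recipe: ...
class DelayedScaling(Recipe): ...
class Float8CurrentScaling(Recipe): ...
class MXFP8BlockScaling(Recipe): ...


class QuantizedTensorBase:
    @classmethod
    def get_compatible_recipes(cls):
        return ()


class Float8Tensor(QuantizedTensorBase):
    @classmethod
    def get_compatible_recipes(cls):
        # One tensor class may be valid for several recipes, and
        # isinstance() handles recipe subclasses for free (unlike
        # the string comparison proposed earlier).
        return (DelayedScaling, Float8CurrentScaling)


class MXFP8Tensor(QuantizedTensorBase):
    @classmethod
    def get_compatible_recipes(cls):
        return (MXFP8BlockScaling,)


def check_compatibility(tensor, recipe):
    """Raise if a quantized weight tensor does not match the active recipe."""
    if isinstance(tensor, QuantizedTensorBase) and not isinstance(
        recipe, tensor.get_compatible_recipes()
    ):
        raise ValueError(
            f"{type(tensor).__name__} is incompatible with "
            f"recipe {type(recipe).__name__}"
        )


check_compatibility(Float8Tensor(), DelayedScaling())  # passes silently
```

This keeps the recipe classes in common untouched: only the framework-side tensor classes know which recipes they can serve, which matches the "recipe info in the tensor" direction suggested above.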
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
/te-ci L0
* Check tensor-recipe compatibility
* Tensor class in recipe, checking for *Base
* Extend recipe __repr__ with recipe_type
* Warn about recipe change
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Enable dynamic recipe change: clear fp8 workspace
* TE 1.x checkpoint compatibility
* Disable warning for recipe wrappers
* Test recipe change
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Use QuantizedTensorBase
* Fix circular import
* Revert previous circular import fix
* Fix pytorch imports in common
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Let quantizer know about the recipe
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Fix imports

Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Description
Enables recipe update on the fly; a user warning is emitted when this happens.
Enables TE1.x checkpoint loading.
Implements a check that is supposed to catch a user bug where there is a mismatch between the recipes in fp8_model_init and fp8_autocast.
Example case to check: the recipe is DelayedScaling (DelayedScaling is set in fp8_autocast()), but the weight tensor is MXFP8Tensor (MXFP8BlockScaling is set in fp8_model_init()).
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: