TE Gemma tutorial attempt#2 #1839
base: main
Conversation
This tutorial is not visible in the docs.
I was focusing more on the documentation. The first part, with HF finetuning, is okay (I have some small suggestions, but it would be fine even if we merged it without any changes).
For the second part, I generally agree with the high-level concepts, but it needs polishing. It would be good if you read it once more looking for minor issues, and then I will take another look. I left some comments on areas where something is missing or where leftovers remain from removed parts.
I also think that if we present something as an example, we should be sure it works correctly, so maybe consider copying the code from all the cells into a test file and adding it to the CI pipeline. But that can be a separate PR; this one is big enough.
I did not look deeply into the code. It seems some code is left commented out, and I am not sure whether this is the final form. If you want me to look into that, let me know.
I also think this is a good moment to think more about the future of the TE docs. We have been adding many unrelated tutorials/examples lately (attention, export to ONNX, offloading in the near future), and our docs seem quite unstructured at this point. We have a "Tutorials and examples" section that mixes code explaining some FP8 recipes with these tutorials. Maybe it is worth splitting it into a tutorials part and an examples part: the tutorials part would contain only descriptions of the most important features, and the examples part would contain compositions of those features, such as Gemma/Llama finetuning and generation.
We also lack good descriptions of many features we have inside: blockwise scaling is not in the docs, nor are MoE support or context parallelism. Some refactoring, adding more structure, and adding descriptions of these things could be a good idea for the future. @ptrendx
docs/examples/te_gemma/tutorial_accelerate_hf_gemma_finetuning_with_te.ipynb
This file is nice in its current form: a simple extension of the Llama tutorial.
Having the Llama and Gemma tutorials next to each other seems a bit odd, but I do not see a better solution, and IMO we should leave it as it is.
"source": [ | ||
"\n", | ||
"<figure align=\"center\">\n", | ||
"<img src=\"./media/plot.svg\">\n", |
I now think this picture is overkill. I plan to remove it.
@@ -266,6 +271,11 @@ def pre_step(
        for k, v in self.sequences.items():
            self.sequences_pre_step[k] = v - step_dict[k]

        pre_step_seqlens = torch.Tensor(list(self.sequences_pre_step.values())).to(
            dtype=torch.int32, device="cpu"
Why is it on CPU? I haven't yet tried to understand deeply what's going on here.
There's a self.pre_step_seqlens variable in InferenceParams which is on GPU, and this temporary variable is used to populate it. But I see the confusion: this call is redundant, since torch.Tensor is created on CPU by default. I'll remove it.
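For reference, a minimal sketch of the point above: torch.Tensor constructs on CPU by default, so the explicit device="cpu" in the diff is redundant. The GPU-side buffer name follows the thread; the lengths dict and copy step are illustrative, not the PR's actual code.

```python
import torch

# torch.Tensor(...) allocates on CPU by default, so .to(device="cpu") is redundant.
sequences_pre_step = {0: 7, 1: 12, 2: 3}  # hypothetical per-sequence lengths
pre_step_seqlens = torch.Tensor(list(sequences_pre_step.values())).to(dtype=torch.int32)
assert pre_step_seqlens.device.type == "cpu"

# The GPU-resident InferenceParams.pre_step_seqlens buffer (per the thread)
# can then be populated from this temporary CPU tensor:
if torch.cuda.is_available():
    gpu_seqlens = torch.zeros(len(sequences_pre_step), dtype=torch.int32, device="cuda")
    gpu_seqlens.copy_(pre_step_seqlens, non_blocking=True)
```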
Description
Adds a tutorial to showcase how to:
- Use a TransformerEngine layer in place of HuggingFace's GemmaDecoderLayer in Gemma models.
- Use fp8_model_init to optimize generation times.

Attempt #1: #829
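A rough sketch of the two techniques the tutorial demonstrates is below; the wrapper class TEGemmaDecoderLayer and its argument mapping are illustrative assumptions, not the notebook's actual code, and a real drop-in replacement also has to handle forward-signature differences and weight loading.

```python
import transformer_engine.pytorch as te
from transformer_engine.pytorch import fp8_model_init
import transformers.models.gemma.modeling_gemma as modeling_gemma


# 1) Replace HuggingFace's GemmaDecoderLayer with a TE TransformerLayer by
#    patching the class before the model is instantiated. TEGemmaDecoderLayer
#    is a hypothetical wrapper mapping Gemma's config onto te.TransformerLayer.
class TEGemmaDecoderLayer(te.TransformerLayer):
    def __init__(self, config, layer_idx):
        super().__init__(
            hidden_size=config.hidden_size,
            ffn_hidden_size=config.intermediate_size,
            num_attention_heads=config.num_attention_heads,
        )


modeling_gemma.GemmaDecoderLayer = TEGemmaDecoderLayer

# 2) Create module weights directly in FP8 to cut memory use and speed up
#    generation (shown here on a standalone te.Linear for brevity).
with fp8_model_init(enabled=True):
    proj = te.Linear(4096, 4096)
```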
Type of change