10 changes: 7 additions & 3 deletions models/Gemma/README.md
@@ -1,7 +1,7 @@
 # Gemma
 
 [Gemma](https://ai.google.dev/gemma/docs) is a family of decoder-only, text-to-text large language models for the English language, built from the same research and technology used to create the [Gemini models](https://blog.google/technology/ai/google-gemini-ai/). Gemma models have open weights and offer pre-trained variants and instruction-tuned variants. These models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
-For more details, refer the the [Gemma model card](https://ai.google.com/gemma/docs/model_card) released by Google.
+For more details, refer to the [Gemma model card](https://ai.google.dev/gemma/docs/model_card) released by Google.
 
 
 ## Customizing Gemma with NeMo Framework
@@ -53,7 +53,11 @@ docker pull nvcr.io/nvidia/nemo:24.01.gemma
 The best way to run this notebook is from within the container. You can do that by launching the container with the following command
 
 ```bash
-docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma
+docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma
 ```
 
 Then, from within the container, start the jupyter server with
+
+```bash
+jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0
+```
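Once the server is up, a quick way to confirm the container actually sees your GPUs is a short PyTorch check. This is a sketch, not part of the README; it only assumes the `torch` package already bundled in the NeMo container:

```python
# Sanity check inside the container: list the GPUs PyTorch can see.
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    # total_memory is in bytes; the tutorials call for 40-80 GB per GPU
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
```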
8 changes: 6 additions & 2 deletions models/Gemma/lora.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"[Gemma](https://ai.google.com/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
"[Gemma](https://ai.google.dev/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
"\n",
"In this tutorial, we'll go over a specific kind of customization -- Low-rank adapter tuning to follow a specific output format (also known as LoRA). To learn how to perform full parameter supervised fine-tuning for instruction following (also known as SFT), see the [companion notebook](./sft.ipynb). For LoRA, we'll perform all operations within the notebook on a single GPU. The compute resources needed for training depend on which Gemma model you use. For the 7 billion parameter variant of Gemma, you'll need a GPU with 80GB of memory. For the 2 billion parameter model, 40GB will do. \n",
"\n",
@@ -74,10 +74,14 @@
"The best way to run this notebook is from within the container. You can do that by launching the container with the following command\n",
"\n",
"```bash\n",
"docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
"docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
"```\n",
"\n",
"Then, from within the container, start the jupyter server with\n",
"\n",
"```bash\n",
"jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0\n",
"```\n",
"\n"
]
},
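The LoRA tuning that lora.ipynb performs boils down to freezing a pretrained weight matrix W and learning a small low-rank update BA on top of it. Here is a minimal, framework-agnostic sketch of that idea in plain PyTorch. It is illustrative only, not NeMo's actual implementation; the class and argument names are invented for the example:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = base(x) + (alpha / r) * B(A(x)).

    A minimal sketch of the idea, not NeMo's implementation.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)   # down-projection
        self.lora_b = nn.Linear(r, base.out_features, bias=False)  # up-projection
        nn.init.zeros_(self.lora_b.weight)  # BA = 0, so training starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scale

layer = LoRALinear(nn.Linear(1024, 1024), r=8)
y = layer(torch.randn(2, 1024))  # identical to the frozen base layer at init
```

Because B starts at zero, the wrapped layer initially matches the pretrained one exactly, and only the two small adapter matrices receive gradients, which is why the 2B and 7B models fit on a single 40GB or 80GB GPU.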
9 changes: 6 additions & 3 deletions models/Gemma/sft.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"[Gemma](https://ai.google.com/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
"[Gemma](https://ai.google.dev/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
"\n",
"In this tutorial, we'll go over a specific kind of customization -- full parameter supervised fine-tuning for instruction following (also known as SFT). To learn how to perform Low-rank adapter (LoRA) tuning to follow a specific output format, see the [companion notebook](./lora.ipynb). For LoRA, we'll show how you can kick off a multi-GPU training job with an example script so that you can train on 8 GPUs. The exact number of GPUs needed will depend on which model you use and what kind of GPUs you use, but we recommend using 8 A100-80GB GPUs.\n",
"\n",
@@ -72,11 +72,14 @@
"The best way to run this notebook is from within the container. You can do that by launching the container with the following command\n",
"\n",
"```bash\n",
"docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
"docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
"```\n",
"\n",
"Then, from within the container, start the jupyter server with\n",
"\n"
"\n",
"```bash\n",
"jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0\n",
"```"
]
},
{
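By contrast, the SFT notebook updates every parameter of the model, which is what drives the 8x A100-80GB recommendation. A toy PyTorch sketch of one full-parameter SFT step, again illustrative only, with an invented stand-in model rather than Gemma or any NeMo class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# ToyLM is a hypothetical stand-in; the notebook trains Gemma via NeMo.
class ToyLM(nn.Module):
    def __init__(self, vocab_size=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids):
        return self.head(self.embed(input_ids))

model = ToyLM()
# Unlike LoRA, *all* parameters get gradients plus AdamW optimizer state,
# roughly tripling memory relative to the weights alone.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

input_ids = torch.randint(0, 256, (2, 16))  # fake instruction+response batch
labels = input_ids.clone()
labels[:, :8] = -100  # mask the prompt so the loss covers only the response
# (the one-position label shift a real LM head uses is omitted for brevity)

logits = model(input_ids)
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                       ignore_index=-100)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```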