diff --git a/models/Gemma/README.md b/models/Gemma/README.md
index d2650c0a4..fdd78b46d 100644
--- a/models/Gemma/README.md
+++ b/models/Gemma/README.md
@@ -1,7 +1,7 @@
 # Gemma
 
 [Gemma](https://ai.google.dev/gemma/docs) is a family of decoder-only, text-to-text large language models for English language, built from the same research and technology used to create the [Gemini models](https://blog.google/technology/ai/google-gemini-ai/). Gemma models have open weights and offer pre-trained variants and instruction-tuned variants. These models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
 
-For more details, refer the the [Gemma model card](https://ai.google.com/gemma/docs/model_card) released by Google.
+For more details, refer to the [Gemma model card](https://ai.google.dev/gemma/docs/model_card) released by Google.
 
 ## Customizing Gemma with NeMo Framework
 
@@ -53,7 +53,11 @@ docker pull nvcr.io/nvidia/nemo:24.01.gemma
 The best way to run this notebook is from within the container. You can do that by launching the container with the following command
 
 ```bash
-docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma
+docker run -it --rm --gpus all --ipc host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma
 ```
 
-Then, from within the container, start the jupyter server with
\ No newline at end of file
+Then, from within the container, start the jupyter server with
+
+```bash
+jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0
+```
\ No newline at end of file
diff --git a/models/Gemma/lora.ipynb b/models/Gemma/lora.ipynb
index a8509185a..a88c98753 100644
--- a/models/Gemma/lora.ipynb
+++ b/models/Gemma/lora.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "[Gemma](https://ai.google.com/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
+    "[Gemma](https://ai.google.dev/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
     "\n",
     "In this tutorial, we'll go over a specific kind of customization -- Low-rank adapter tuning to follow a specific output format (also known as LoRA). To learn how to perform full parameter supervised fine-tuning for instruction following (also known as SFT), see the [companion notebook](./sft.ipynb). For LoRA, we'll perform all operations within the notebook on a single GPU. The compute resources needed for training depend on which Gemma model you use. For the 7 billion parameter variant of Gemma, you'll need a GPU with 80GB of memory. For the 2 billion parameter model, 40GB will do.\n",
     "\n",
@@ -74,10 +74,14 @@
     "The best way to run this notebook is from within the container. You can do that by launching the container with the following command\n",
     "\n",
     "```bash\n",
-    "docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
+    "docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
     "```\n",
     "\n",
     "Then, from within the container, start the jupyter server with\n",
+    "\n",
+    "```bash\n",
+    "jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0\n",
+    "```\n",
     "\n"
    ]
   },
   {
diff --git a/models/Gemma/sft.ipynb b/models/Gemma/sft.ipynb
index f5e63357e..a78fb64a7 100644
--- a/models/Gemma/sft.ipynb
+++ b/models/Gemma/sft.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "[Gemma](https://ai.google.com/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
+    "[Gemma](https://ai.google.dev/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
     "\n",
     "In this tutorial, we'll go over a specific kind of customization -- full parameter supervised fine-tuning for instruction following (also known as SFT). To learn how to perform Low-rank adapter (LoRA) tuning to follow a specific output format, see the [companion notebook](./lora.ipynb). For LoRA, we'll show how you can kick off a multi-GPU training job with an example script so that you can train on 8 GPUs. The exact number of GPUs needed will depend on which model you use and what kind of GPUs you use, but we recommend using 8 A100-80GB GPUs.\n",
     "\n",
@@ -72,11 +72,14 @@
     "The best way to run this notebook is from within the container. You can do that by launching the container with the following command\n",
     "\n",
     "```bash\n",
-    "docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
+    "docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
     "```\n",
     "\n",
     "Then, from within the container, start the jupyter server with\n",
-    "\n"
+    "\n",
+    "```bash\n",
+    "jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0\n",
+    "```"
    ]
   },
   {