From b999b836d3915f638c109a33ed6740ed48f6c24b Mon Sep 17 00:00:00 2001
From: Neal Vaidya
Date: Wed, 21 Feb 2024 06:19:31 -0800
Subject: [PATCH 1/2] add command to start jupyterlab

---
 models/Gemma/README.md  | 8 ++++++--
 models/Gemma/lora.ipynb | 6 +++++-
 models/Gemma/sft.ipynb  | 7 +++++--
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/models/Gemma/README.md b/models/Gemma/README.md
index d2650c0a4..c5b838c67 100644
--- a/models/Gemma/README.md
+++ b/models/Gemma/README.md
@@ -53,7 +53,11 @@ docker pull nvcr.io/nvidia/nemo:24.01.gemma
 The best way to run this notebook is from within the container. You can do that by launching the container with the following command
 
 ```bash
-docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma
+docker run -it --rm --gpus all --ipc host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma
 ```
 
-Then, from within the container, start the jupyter server with
\ No newline at end of file
+Then, from within the container, start the jupyter server with
+
+```bash
+jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0
+```
\ No newline at end of file
diff --git a/models/Gemma/lora.ipynb b/models/Gemma/lora.ipynb
index a8509185a..0314b5afa 100644
--- a/models/Gemma/lora.ipynb
+++ b/models/Gemma/lora.ipynb
@@ -74,10 +74,14 @@
     "The best way to run this notebook is from within the container. You can do that by launching the container with the following command\n",
     "\n",
     "```bash\n",
-    "docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
+    "docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
     "```\n",
     "\n",
     "Then, from within the container, start the jupyter server with\n",
+    "\n",
+    "```bash\n",
+    "jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0\n",
+    "```\n",
     "\n"
    ]
   },
diff --git a/models/Gemma/sft.ipynb b/models/Gemma/sft.ipynb
index f5e63357e..7d40ea4fc 100644
--- a/models/Gemma/sft.ipynb
+++ b/models/Gemma/sft.ipynb
@@ -72,11 +72,14 @@
     "The best way to run this notebook is from within the container. You can do that by launching the container with the following command\n",
     "\n",
     "```bash\n",
-    "docker run -it --rm --gpus all --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
+    "docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.gemma\n",
     "```\n",
     "\n",
     "Then, from within the container, start the jupyter server with\n",
-    "\n"
+    "\n",
+    "```bash\n",
+    "jupyter lab --no-browser --port=8080 --allow-root --ip 0.0.0.0\n",
+    "```"
    ]
   },
  {

From 851e5681f17eda3818ae2a32eb8701a6bf607a85 Mon Sep 17 00:00:00 2001
From: Neal Vaidya
Date: Wed, 21 Feb 2024 06:24:11 -0800
Subject: [PATCH 2/2] Fix link to gemma model card

---
 models/Gemma/README.md  | 2 +-
 models/Gemma/lora.ipynb | 2 +-
 models/Gemma/sft.ipynb  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/models/Gemma/README.md b/models/Gemma/README.md
index c5b838c67..fdd78b46d 100644
--- a/models/Gemma/README.md
+++ b/models/Gemma/README.md
@@ -1,7 +1,7 @@
 # Gemma
 [Gemma](https://ai.google.dev/gemma/docs) is a family of decoder-only, text-to-text large language models for English language, built from the same research and technology used to create the [Gemini models](https://blog.google/technology/ai/google-gemini-ai/). Gemma models have open weights and offer pre-trained variants and instruction-tuned variants. 
 These models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
-For more details, refer the the [Gemma model card](https://ai.google.com/gemma/docs/model_card) released by Google.
+For more details, refer to the [Gemma model card](https://ai.google.dev/gemma/docs/model_card) released by Google.
 
 ## Customizing Gemma with NeMo Framework
 
diff --git a/models/Gemma/lora.ipynb b/models/Gemma/lora.ipynb
index 0314b5afa..a88c98753 100644
--- a/models/Gemma/lora.ipynb
+++ b/models/Gemma/lora.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "[Gemma](https://ai.google.com/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
+    "[Gemma](https://ai.google.dev/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
     "\n",
     "In this tutorial, we'll go over a specific kind of customization -- Low-rank adapter tuning to follow a specific output format (also known as LoRA). To learn how to perform full parameter supervised fine-tuning for instruction following (also known as SFT), see the [companion notebook](./sft.ipynb). For LoRA, we'll perform all operations within the notebook on a single GPU. The compute resources needed for training depend on which Gemma model you use. For the 7 billion parameter variant of Gemma, you'll need a GPU with 80GB of memory. For the 2 billion parameter model, 40GB will do. \n",
     "\n",
diff --git a/models/Gemma/sft.ipynb b/models/Gemma/sft.ipynb
index 7d40ea4fc..a78fb64a7 100644
--- a/models/Gemma/sft.ipynb
+++ b/models/Gemma/sft.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "[Gemma](https://ai.google.com/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
+    "[Gemma](https://ai.google.dev/gemma/docs/model_card) is a groundbreaking new open model in the Gemini family of models from Google. Gemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. Gemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize Gemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
     "\n",
     "In this tutorial, we'll go over a specific kind of customization -- full parameter supervised fine-tuning for instruction following (also known as SFT). To learn how to perform Low-rank adapter (LoRA) tuning to follow a specific output format, see the [companion notebook](./lora.ipynb). For LoRA, we'll show how you can kick off a multi-GPU training job with an example script so that you can train on 8 GPUs. The exact number of GPUs needed will depend on which model you use and what kind of GPUs you use, but we recommend using 8 A100-80GB GPUs.\n",
     "\n",