# Getting Started with MLC-LLM using the Llama 2 Model

Here's a quick overview of how to get started with the MLC-LLM `ChatModule` in Python. In this tutorial, we will chat with the [Llama 2](https://ai.meta.com/llama/) model. For the easiest setup, we recommend trying this out in a Google Colab notebook. Click the button below to get started!

<a target="_blank" href="https://colab.research.google.com/github/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Environment Setup

Let's set up your environment so that you can run the `ChatModule`. First, let's create the Conda environment in which we'll run this notebook.

```bash
conda create --name mlc-llm python=3.10
conda activate mlc-llm
```

**Google Colab:** If you are running this in a Google Colab notebook, be sure to change your runtime to GPU by going to Runtime > Change runtime type and setting the Hardware accelerator to be "GPU". Select "Connect" on the top right to instantiate your GPU session.

If you are using CUDA, you can run the following command to confirm that CUDA is set up correctly, and check the version number.

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
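
If you would rather read the CUDA version programmatically, for example to decide which wheel to install later, the check above can be sketched in Python. This is a minimal sketch: the helper names `parse_cuda_release` and `detect_cuda_release` are hypothetical, and it assumes the `release X.Y` wording seen in the output above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'release X.Y' version from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def detect_cuda_release() -> Optional[str]:
    """Return the release reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


# Line copied from the sample output above.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_cuda_release(sample))  # 11.8
print(detect_cuda_release())       # depends on your machine; None without nvcc
```

A `None` result only means `nvcc` was not found on `PATH`; it does not by itself rule out a working GPU driver.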


**Google Colab:** If you are running this in a Google Colab notebook, you will also need to install Vulkan drivers. You can skip this step if you are running locally and already have Vulkan support, or if you are not using Vulkan.

In [2]:
!sudo apt install -y vulkan-tools libnvidia-gl-525

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
vulkan-tools is already the newest version (1.3.204.0+dfsg1-1).
libnvidia-gl-525 is already the newest version (525.125.06-0ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.


**Google Colab:** You can run the following command to confirm that the Vulkan drivers were installed successfully.

In [3]:
!vulkaninfo

'DISPLAY' environment variable not set... skipping surface info
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
VULKANINFO

Vulkan Instance Version: 1.3.204


Instance Extensions: count = 20
	VK_EXT_acquire_drm_display             : extension revision 1
	VK_EXT_acquire_xlib_display            : extension revision 1
	VK_EXT_debug_report                    : extension revision 10
	VK_EXT_debug_utils                     : extension revision 2
	VK_EXT_direct_mode_display             : extension revision 1
	VK_EXT_display_surface_counter         : extension revision 1
	VK_EXT_swapchain_colorspace            : extension revision 4
	VK_KHR_device_group_creation           : extension revision 1
	VK_KHR_display                         : extension revision 23
	VK_KHR_external_fence_capabiliti

Next, let's install the MLC-AI and MLC-Chat nightly build packages. Go to https://mlc.ai/package/ and replace the command below with the one that matches your hardware and OS. The example below installs the `cu116` (CUDA 11.6) builds.

In [4]:
!pip install --pre --force-reinstall mlc-ai-nightly-cu116 mlc-chat-nightly-cu116 -f https://mlc.ai/wheels

Looking in links: https://mlc.ai/wheels
Collecting mlc-ai-nightly-cu116
  Downloading https://github.com/mlc-ai/package/releases/download/v0.9.dev0/mlc_ai_nightly_cu116-0.12.dev1385-cp310-cp310-manylinux_2_28_x86_64.whl (97.8 MB)
Collecting mlc-chat-nightly-cu116
  Downloading https://github.com/mlc-ai/package/releases/download/v0.9.dev0/mlc_chat_nightly_cu116-0.1.dev328-cp310-cp310-manylinux_2_28_x86_64.whl (20.5 MB)
Collecting attrs (from mlc-ai-nightly-cu116)
  Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting cloudpickle (from mlc-ai-nightly-cu116)
  Using cached cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Collecting decorator (from mlc-ai-nightly-cu116)
  Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting 
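
Once the install finishes, it can help to confirm that the packages are importable before moving on. The sketch below uses only the standard library. The helper name `check_modules` is hypothetical, `mlc_chat` is the module imported later in this tutorial, and `tvm` is assumed to be the module provided by the MLC-AI wheel.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


# "tvm" is an assumption about what the MLC-AI wheel installs;
# "mlc_chat" is the module used in the chat section of this tutorial.
for name, found in check_modules(["tvm", "mlc_chat"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If a module is reported as missing, the most likely cause is that `pip` installed into a different environment than the one running the notebook kernel.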

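Next, let's download the Llama 2 model weights and the prebuilt model libraries from GitHub. The weights are large, so we'll need `git lfs` to download them. If `conda` is not available (as in the Colab output below), `git lfs install` still succeeds as long as `git-lfs` is already installed on the system.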

In [5]:
!conda install git git-lfs
!git lfs install

/bin/bash: line 1: conda: command not found
Git LFS initialized.


In [6]:
!mkdir -p dist/prebuilt
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib

fatal: destination path 'dist/prebuilt/lib' already exists and is not an empty directory.


In [7]:
!cd dist/prebuilt && git clone https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4f16_1

fatal: destination path 'mlc-chat-Llama-2-7b-chat-hf-q4f16_1' already exists and is not an empty directory.
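
If `git lfs` was not active when the repository was cloned, the weight files are left as small text pointers instead of the real binaries. The sketch below scans the model folder for such pointers. It relies on the fact that Git LFS pointer files start with the line `version https://git-lfs.github.com/spec/v1`; the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(folder: Path) -> List[Path]:
    """List files under `folder` (ignoring .git) that are still LFS pointers."""
    if not folder.is_dir():
        return []
    return sorted(
        p
        for p in folder.rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )


model_dir = Path("dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1")
pointers = find_lfs_pointers(model_dir)
print(f"{len(pointers)} un-fetched LFS pointer file(s) under {model_dir}")
```

A non-zero count suggests running `git lfs pull` inside the model folder to fetch the actual weights.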


## Let's Chat

Before we can chat with the model, we must import `ChatModule` from the `mlc_chat` package and create an instance of it. The `ChatModule` must be initialized with the appropriate model name.

In [9]:
from mlc_chat import ChatModule

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device_name="vulkan")

Using model folder: /content/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
Using mlc chat config: /content/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json
Using library model: /content/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-vulkan.so


Note that the invocation above hides the logic for locating the model directory and the prebuilt library. To specify these paths manually, you could run the following instead, which is equivalent.

```python
cm = ChatModule(
    model="dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1",
    device_name="vulkan",  # same device as in the call above
    lib_path="dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-vulkan.so",
)
```
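
To make the relationship between the model name and the two paths concrete, here is a small sketch that rebuilds them from the name and device. It mirrors the directory layout shown in the log output above and is not the library's actual lookup code; the function name `resolve_paths` and the `root` default are assumptions.

```python
from pathlib import Path
from typing import Tuple


def resolve_paths(
    model: str, device: str, root: Path = Path("dist/prebuilt")
) -> Tuple[Path, Path]:
    """Build the model folder and library path used in this tutorial's layout."""
    model_dir = root / f"mlc-chat-{model}"            # weights and chat config
    lib_path = root / "lib" / f"{model}-{device}.so"  # prebuilt model library
    return model_dir, lib_path


model_dir, lib_path = resolve_paths("Llama-2-7b-chat-hf-q4f16_1", "vulkan")
print(model_dir)
print(lib_path)
```

The `.so` suffix matches the Linux library in the log above; other platforms may use a different extension.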

That's all that's needed to set up the `ChatModule`. You can now chat with the model by inputting any prompt you'd like. Try it out below!

In [10]:
prompt = input("Prompt: ")
cm.generate(prompt=prompt)

Prompt: Write a poem about nature.
Of course, I'd be happy to help you with a poem about nature! Here's a poem that I hope you'll find enjoyable and respectful:

Nature, oh nature, so vast and so bright
A treasure trove of wonders, both day and night
From the towering trees that touch the sky
To the tiniest flowers that catch the eye

The wind whispers through the leaves with grace
As the sun shines down with a warm embrace
The birds sing sweet melodies in flight
Adding beauty to the natural sight

The ocean's roar, a symphony so grand
A place where creatures great and small stand
The sandy shores, a haven so fine
A place where we can relax and unwind

Nature, oh nature, a gift so true
A source of inspiration, a sight anew
May we always cherish and protect it with care
So future generations can share in its glee and its fare.

I hope you found this poem to be respectful and enjoyable! Remember, it's important to appreciate and protect the natural world around us. If you have any other 

To evaluate the speed of the chatbot, you can print its runtime statistics: the prefill (prompt processing) and decode (token generation) rates, in tokens per second.

In [11]:
cm.runtime_stats_text()

'prefill: 58.0 tok/s, decode: 16.9 tok/s'
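
If you want to log or compare these numbers across runs, the string can be turned into floats. This is a minimal sketch that assumes the `name: value tok/s` format seen in the output above; the helper name `parse_runtime_stats` is hypothetical, and other versions of the package may format the string differently.

```python
import re
from typing import Dict


def parse_runtime_stats(stats: str) -> Dict[str, float]:
    """Turn 'name: 12.3 tok/s, ...' into {'name': 12.3, ...}."""
    pairs = re.findall(r"(\w+):\s*([\d.]+)\s*tok/s", stats)
    return {name: float(value) for name, value in pairs}


# String copied from the sample output above.
stats = parse_runtime_stats("prefill: 58.0 tok/s, decode: 16.9 tok/s")
print(stats)  # {'prefill': 58.0, 'decode': 16.9}
```

In the notebook you would pass `cm.runtime_stats_text()` instead of the literal string.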

By default, the `ChatModule` will keep a history of your chat. You can reset the chat history by running the following.

In [12]:
cm.reset_chat()
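
Because history persists between calls, unrelated prompts can influence one another unless the chat is reset in between. The sketch below runs a batch of prompts as independent single-turn chats. It uses only the two methods shown in this tutorial, `generate(prompt=...)` and `reset_chat()`; the helper `run_independent_prompts` and the stand-in class `RecordingChat` are hypothetical and exist only to make the example runnable without a GPU.

```python
from typing import Iterable, List, Protocol


class ChatLike(Protocol):
    """Anything with the two ChatModule methods used in this tutorial."""

    def generate(self, prompt: str): ...

    def reset_chat(self): ...


def run_independent_prompts(chat: ChatLike, prompts: Iterable[str]) -> None:
    """Send each prompt as its own conversation by resetting after every call."""
    for prompt in prompts:
        chat.generate(prompt=prompt)
        chat.reset_chat()


class RecordingChat:
    """Hypothetical stand-in that records calls instead of running a model."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def generate(self, prompt: str) -> None:
        self.calls.append(f"generate({prompt!r})")

    def reset_chat(self) -> None:
        self.calls.append("reset_chat()")


demo = RecordingChat()
run_independent_prompts(demo, ["Write a haiku.", "Name three rivers."])
print(demo.calls)
```

With the real module you would call `run_independent_prompts(cm, [...])`. Whether `generate` prints the reply or returns it may depend on the installed version, so the helper does not rely on the return value.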