# Getting Started with MLC-LLM using the Llama 2 Model

Here's a quick overview of how to get started with the MLC-LLM `ChatModule` in Python. In this tutorial, we will chat with the [Llama 2](https://ai.meta.com/llama/) model. For the easiest setup, we recommend trying this out in a Google Colab notebook. Click the button below to get started!

<a target="_blank" href="https://colab.research.google.com/github/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Environment Setup

Let's set up your environment so you can run the `ChatModule`. First, let's create the Conda environment in which we'll run this notebook. (Google Colab does not provide `conda`, so skip this step there.)

```bash
conda create --name mlc-llm python=3.10
conda activate mlc-llm
```

**Google Colab:** If you are running this in a Google Colab notebook, be sure to change your runtime to GPU by going to Runtime > Change runtime type and setting the Hardware accelerator to be "GPU". Select "Connect" on the top right to instantiate your GPU session.

If you are using CUDA, you can run the following command to confirm that CUDA is set up correctly, and check the version number.

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
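
If you want to read the CUDA release programmatically, for example to decide which wheel to pick later, a small parser over the `nvcc --version` text is enough. This is a minimal sketch: `cuda_release` and `wheel_suffix` are hypothetical helpers, and the `cuXYZ` naming is only extrapolated from the `cu116` suffix used in the install command of this tutorial, so check https://mlc.ai/package/ to see which builds actually exist.

```python
import re
from typing import Optional


def cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (e.g. '11.8') found in `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def wheel_suffix(release: str) -> str:
    """Turn '11.8' into 'cu118' (assumed naming, not an official rule)."""
    major, minor = release.split(".")
    return f"cu{major}{minor}"


sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(cuda_release(sample))   # 11.8
print(wheel_suffix("11.8"))   # cu118
```

In a notebook you could feed it the real text with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.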


**Google Colab:** If you are running this in a Google Colab notebook, you will also need to install Vulkan drivers. If you are running locally and already have Vulkan support, or are not using Vulkan, you can skip this step.

In [2]:
!sudo apt install -y vulkan-tools libnvidia-gl-525

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libnvidia-common-525 libvulkan1 mesa-vulkan-drivers
The following NEW packages will be installed:
  libnvidia-common-525 libnvidia-gl-525 libvulkan1 mesa-vulkan-drivers
  vulkan-tools
0 upgraded, 5 newly installed, 0 to remove and 16 not upgraded.
Need to get 196 MB of archives.
After this operation, 497 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libvulkan1 amd64 1.3.204.1-2 [128 kB]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  libnvidia-common-525 525.125.06-0ubuntu1 [10.5 kB]
Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  libnvidia-gl-525 525.125.06-0ubuntu1 [186 MB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 mesa-vulkan-drivers amd64 23.0.4-0ubuntu1~22.04.1 [9,521 kB]
Get:5 http://archive

**Google Colab:** You can run the following command to confirm that the Vulkan drivers were installed successfully.

In [3]:
!vulkaninfo

'DISPLAY' environment variable not set... skipping surface info
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
VULKANINFO

Vulkan Instance Version: 1.3.204


Instance Extensions: count = 20
	VK_EXT_acquire_drm_display             : extension revision 1
	VK_EXT_acquire_xlib_display            : extension revision 1
	VK_EXT_debug_report                    : extension revision 10
	VK_EXT_debug_utils                     : extension revision 2
	VK_EXT_direct_mode_display             : extension revision 1
	VK_EXT_display_surface_counter         : extension revision 1
	VK_EXT_swapchain_colorspace            : extension revision 4
	VK_KHR_device_group_creation           : extension revision 1
	VK_KHR_display                         : extension revision 23
	VK_KHR_external_fence_capabiliti

Next, let's install the MLC-AI and MLC-Chat nightly build packages. Go to https://mlc.ai/package/ and replace the command below with the one that matches your hardware and OS (for example, the CUDA build that matches your CUDA version).

In [4]:
!pip install --pre --force-reinstall mlc-ai-nightly-cu116 mlc-chat-nightly-cu116 -f https://mlc.ai/wheels

Looking in links: https://mlc.ai/wheels
Collecting mlc-ai-nightly-cu116
  Downloading https://github.com/mlc-ai/package/releases/download/v0.9.dev0/mlc_ai_nightly_cu116-0.12.dev1385-cp310-cp310-manylinux_2_28_x86_64.whl (97.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.8/97.8 MB 8.9 MB/s eta 0:00:00
Collecting mlc-chat-nightly-cu116
  Downloading https://github.com/mlc-ai/package/releases/download/v0.9.dev0/mlc_chat_nightly_cu116-0.1.dev328-cp310-cp310-manylinux_2_28_x86_64.whl (20.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.5/20.5 MB 71.0 MB/s eta 0:00:00
Collecting attrs (from mlc-ai-nightly-cu116)
  Downloading attrs-23.1.0-py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 1.2 MB/s eta 0:00:00
Collecting cloudpickle (from mlc-ai-nightly-cu116)
  Downloading cloudpickle-2.2.1-py3-none-any.whl (2

Next, we can clone the [MLC-LLM project](https://github.com/mlc-ai/mlc-llm).

In [5]:
!git clone https://github.com/mlc-ai/mlc-llm.git
!cd mlc-llm && git submodule update --init --recursive

Cloning into 'mlc-llm'...
remote: Enumerating objects: 6045, done.
remote: Counting objects: 100% (953/953), done.
remote: Compressing objects: 100% (265/265), done.
remote: Total 6045 (delta 722), reused 810 (delta 682), pack-reused 5092
Receiving objects: 100% (6045/6045), 20.38 MiB | 13.37 MiB/s, done.
Resolving deltas: 100% (3789/3789), done.
Submodule '3rdparty/argparse' (https://github.com/p-ranav/argparse) registered for path '3rdparty/argparse'
Submodule '3rdparty/googletest' (https://github.com/google/googletest.git) registered for path '3rdparty/googletest'
Submodule '3rdparty/tokenizers-cpp' (https://github.com/mlc-ai/tokenizers-cpp) registered for path '3rdparty/tokenizers-cpp'
Submodule '3rdparty/tvm' (https://github.com/mlc-ai/relax.git) registered for path '3rdparty/tvm'
Cloning into '/content/mlc-llm/3rdparty/argparse'...
Cloning into '/content/mlc-llm/3rdparty/googletest'...
Cloning into '/content/mlc-llm/3rdparty/tokenizers-cpp'...
Cloning into '/content/m

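Next, let's download the prebuilt model libraries from GitHub and the Llama 2 model weights from Hugging Face. To download the large weight files, we'll have to use `git lfs`. On Google Colab the `conda install` line fails because `conda` is not available, but `git lfs install` still succeeds, as the output shows.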

In [6]:
!conda install git git-lfs
!git lfs install

/bin/bash: line 1: conda: command not found
Git LFS initialized.
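
Because a missing command only shows up as a line in the cell output, it can help to check up front which tools are on your `PATH`. The sketch below uses the standard library's `shutil.which`; `missing_tools` and the `required` list are illustrative names chosen here, and you should trim the list to your setup (`nvcc` only matters for CUDA, `vulkaninfo` only for Vulkan).

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Tools used in this tutorial; adjust to the backend you plan to use.
required = ["git", "git-lfs", "nvcc", "vulkaninfo"]
print("Missing tools:", missing_tools(required))
```

An empty list means every listed command was found; anything else tells you what to install before continuing.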


In [7]:
!mkdir -p mlc-llm/dist/prebuilt
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git mlc-llm/dist/prebuilt/lib

Cloning into 'mlc-llm/dist/prebuilt/lib'...
remote: Enumerating objects: 202, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 202 (delta 45), reused 44 (delta 24), pack-reused 135
Receiving objects: 100% (202/202), 48.36 MiB | 16.88 MiB/s, done.
Resolving deltas: 100% (138/138), done.
Updating files: 100% (51/51), done.


In [8]:
!cd mlc-llm/dist/prebuilt && git clone https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4f16_1

Cloning into 'mlc-chat-Llama-2-7b-chat-hf-q4f16_1'...
remote: Enumerating objects: 126, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 126 (delta 0), reused 3 (delta 0), pack-reused 123
Receiving objects: 100% (126/126), 497.08 KiB | 9.75 MiB/s, done.
Filtering content: 100% (116/116), 3.53 GiB | 82.94 MiB/s, done.
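
If `git lfs` was not active when you cloned, the weight files are small text pointers instead of real data. Git LFS pointer files start with the line `version https://git-lfs.github.com/spec/v1`, so a quick scan can reveal them. This is a minimal sketch with hypothetical helper names (`is_lfs_pointer`, `unfetched_files`); the directory path is the one created by the clone command above and assumes you run it from the folder that contains `mlc-llm`.

```python
from pathlib import Path
from typing import List

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file begins with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unfetched_files(model_dir: Path) -> List[Path]:
    """List files under model_dir (outside .git) that are still LFS pointers."""
    found = []
    for path in model_dir.rglob("*"):
        inside_git = ".git" in path.relative_to(model_dir).parts
        if path.is_file() and not inside_git and is_lfs_pointer(path):
            found.append(path)
    return sorted(found)


model_dir = Path("mlc-llm/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1")
if model_dir.is_dir():
    print("Unfetched LFS files:", unfetched_files(model_dir))
```

If the list is not empty, running `git lfs pull` inside the model folder should fetch the real files.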


## Let's Chat

Before we can chat with the model, we must first import a few libraries and create a `ChatModule` instance.

In [9]:
%cd mlc-llm

/content/mlc-llm


In [10]:
from mlc_chat import ChatModule
import tvm

from IPython.display import clear_output

We must instantiate the `ChatModule` with the device type that matches your setup, such as `vulkan` or `cuda`.

In [11]:
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device_name="vulkan")

Using model folder: /content/mlc-llm/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
Using mlc chat config: /content/mlc-llm/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json
Using library model: /content/mlc-llm/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-vulkan.so
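
The log above suggests that the prebuilt library is named `<model>-<device>.so` inside `dist/prebuilt/lib`. Assuming that pattern holds (it is inferred from the one log line here, not from the library's lookup code, and the `.so` suffix is specific to Linux), you can list which device names have a prebuilt library for your model. `available_devices` is a hypothetical helper, and the path assumes your working directory is `mlc-llm`.

```python
from pathlib import Path
from typing import List


def available_devices(lib_dir: Path, model: str) -> List[str]:
    """Collect <device> from files named '<model>-<device>.so' in lib_dir."""
    prefix = f"{model}-"
    devices = []
    for lib in lib_dir.iterdir():
        if lib.is_file() and lib.suffix == ".so" and lib.name.startswith(prefix):
            # Strip the model prefix from the name without the '.so' suffix.
            devices.append(lib.stem[len(prefix):])
    return sorted(devices)


lib_dir = Path("dist/prebuilt/lib")
if lib_dir.is_dir():
    print(available_devices(lib_dir, "Llama-2-7b-chat-hf-q4f16_1"))
```

If the device you pass as `device_name` is not in the list, the matching library is probably missing from the clone.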


That's all that's needed to set up the `ChatModule`. You can now chat with the model by inputting any prompt you'd like. Try it out below!

In [12]:
prompt = input("Prompt: ")
cm.generate(prompt=prompt)

Prompt: Write a poem about nature.
Of course, I'd be happy to help you with a poem about nature! Here's a poem that I hope you'll find enjoyable and respectful:

In nature's embrace I find my peace,
Beneath the trees, my worries cease.
The birds sing sweetly in the sky,
And flowers bloom with vibrant grace.

The wind whispers secrets in my ear,
And sunshine warms my soul so dear.
The forest's majesty I adore,
And nature's beauty I can't ignore.

From mountains high to oceans wide,
Nature's wonders never subside.
The earth is full of wonder and might,
A treasure trove of beauty in sight.

So let us cherish and protect this earth,
For it gives us life, and all its worth.
With gratitude and respect we'll stand,
For nature's beauty is our greatest land.

I hope you enjoy this poem about nature! Remember, it's important to be respectful and considerate of others, and to always act in a socially unbiased and positive manner. If you have any other questions or requests, feel free to ask!


To evaluate the speed of the chatbot, you can print some statistics.

In [13]:
cm.runtime_stats_text()

'prefill: 58.9 tok/s, decode: 16.5 tok/s'
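
If you want these numbers as values rather than text, you can parse the string. The sketch below assumes the exact `name: value tok/s` format shown in the output above; other versions of the package may format it differently. `parse_runtime_stats` and `estimated_seconds` are hypothetical helpers, and the estimate is only a rough back-of-the-envelope figure.

```python
import re
from typing import Dict


def parse_runtime_stats(text: str) -> Dict[str, float]:
    """Extract '<name>: <number> tok/s' pairs into a dict of floats."""
    pairs = re.findall(r"(\w+): ([\d.]+) tok/s", text)
    return {name: float(value) for name, value in pairs}


def estimated_seconds(stats: Dict[str, float], prompt_tokens: int, new_tokens: int) -> float:
    """Rough time estimate: prompt processed at prefill speed, output at decode speed."""
    return prompt_tokens / stats["prefill"] + new_tokens / stats["decode"]


stats = parse_runtime_stats("prefill: 58.9 tok/s, decode: 16.5 tok/s")
print(stats)  # {'prefill': 58.9, 'decode': 16.5}
print(round(estimated_seconds(stats, prompt_tokens=20, new_tokens=200), 1), "seconds")
```

With the real module you would pass `cm.runtime_stats_text()` instead of the literal string.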

By default, the `ChatModule` will keep a history of your chat. You can reset the chat history by running the following.

In [14]:
cm.reset_chat()
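
Because the history carries over between calls, a small helper that resets first and then sends a list of prompts keeps separate conversations from leaking into each other. This is a sketch only: `run_conversation` is a hypothetical helper, and `RecordingChat` is a stand-in written for this example so the code runs without a GPU. It relies on nothing beyond the two calls used in this tutorial, `generate(prompt=...)` and `reset_chat()`.

```python
from typing import Iterable, List


class RecordingChat:
    """Stand-in for a chat object (not the real ChatModule); it just records prompts."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def generate(self, prompt: str) -> None:
        self.history.append(prompt)

    def reset_chat(self) -> None:
        self.history.clear()


def run_conversation(chat, prompts: Iterable[str]) -> None:
    """Start from a clean history, then send each prompt in order."""
    chat.reset_chat()
    for prompt in prompts:
        chat.generate(prompt=prompt)


chat = RecordingChat()
chat.generate(prompt="left over from an earlier session")
run_conversation(chat, ["Write a poem about nature.", "Now make it shorter."])
print(chat.history)  # only the two prompts of the new conversation
```

With the module created earlier, you would call `run_conversation(cm, [...])` instead of using the stand-in.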