description

title

keywords

image

slug

last_update

no_comments

Quantized Llama2-7B with MLC LLM on Jetson

Edge

reComputer

Jetson

Llama2

MLC LLM

https://files.seeedstudio.com/wiki/wiki-platform/S-tempor.png

/Quantized_Llama2_7B_with_MLC_LLM_on_Jetson

date	author
04/1/2024	Jiahao

false

Quantized Llama2-7B with MLC LLM on Jetson

Introduction

In recent years, large language models such as GPT-3 have revolutionized natural language processing tasks. However, most of these models are trained on large-scale datasets, which require powerful computing resources and are not suitable for deployment on edge devices. To address this issue, researchers have developed quantization techniques to compress large models into smaller ones without sacrificing performance.

In this project, we introduce a quantized version of Llama2-7B, a large language model trained on 1.5TB of data, and deploy it on the Jetson Orin. We also leverage the Machine Learning Compiler Large Language Modle(MLC LLM) to accelerate the inference speed of the model. By deploying the quantized Llama2-7B with MLC LLM on the Jetson Orin NX, developers can build powerful natural language processing applications that deliver high accuracy and low latency on edge devices.

Hardware components

reComputer(Or other devices based on Jetson)

Get One Now 🖱️

Install dependencies:

sudo apt-get update && sudo apt-get install git python3-pip

git clone --depth=1 https://github.com/dusty-nv/jetson-containers

cd jetson-containers pip3 install -r requirements.txt

cd ./data && git clone https://github.com/LJ-Hao/MLC-LLM-on-Jetson-Nano.git && cd ..

Install and run contiainer

first step: install image

./run.sh --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN> $(./autotag mlc) /bin/bash -c 'ln -s $(huggingface-downloader meta-llama/Llama-2-7b-chat-hf) /data/models/mlc/dist/models/Llama-2-7b-chat-hf'

use sudo docker images to check wether the image is installed or not

second step: Install Llama2-7b-chat-hf and Use MLC quantify the model

./run.sh $(./autotag mlc) \
python3 -m mlc_llm.build \
--model Llama-2-7b-chat-hf \
--quantization q4f16_ft \
--artifact-path /data/models/mlc/dist \
--max-seq-len 4096 \
--target cuda \
--use-cuda-graph \
--use-flash-attn-mqa

Thrid step: Run and enter docker

./run.sh <YOUR IMAGE NAME> 
#for me dustynv/mlc:51fb0f4-builder-r35.4.1 check result of first step

Let's run it

run Llama without quanifing without MLC LLM quantified

cd /data/MLC-LLM-on-Jetson && python3 Llama-2-7b-chat-hf.py

you can see without quanifing with MLC, Jetson Nano 16GB can load the model but cant not run.

run Llama with quanifing with MLC LLM quantified

cd /data/MLC-LLM-on-Jetson && python3 Llama-2-7b-chat-hf-q4f16_ft.py

here is the result:

Video of running Llama with MLC on Jetson Orin NX 16GB:

Project Outlook

In this project, we have demonstrated how to deploy a quantized version of Llama2-7B with MLC LLM on the Jetson Orin. With the powerful computing capabilities of the Jetson Orin, developers can build natural language processing applications that deliver high accuracy and low latency on edge devices. In the future, we will continue to explore the potential of deploying large language models on edge devices and develop more efficient and optimized deployment methods.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quantized_Llama2_7B_with_MLC_LLM_on_Jetson.md

Quantized_Llama2_7B_with_MLC_LLM_on_Jetson.md

Quantized Llama2-7B with MLC LLM on Jetson

Introduction

Hardware components

Install dependencies:

Install and run contiainer

first step: install image

second step: Install Llama2-7b-chat-hf and Use MLC quantify the model

Thrid step: Run and enter docker

Let's run it

run Llama without quanifing without MLC LLM quantified

run Llama with quanifing with MLC LLM quantified

Video of running Llama with MLC on Jetson Orin NX 16GB:

Project Outlook

Files

Quantized_Llama2_7B_with_MLC_LLM_on_Jetson.md

Latest commit

History

Quantized_Llama2_7B_with_MLC_LLM_on_Jetson.md

File metadata and controls

Quantized Llama2-7B with MLC LLM on Jetson

Introduction

Hardware components

Install dependencies:

Install and run contiainer

first step: install image

second step: Install Llama2-7b-chat-hf and Use MLC quantify the model

Thrid step: Run and enter docker

Let's run it

run Llama without quanifing without MLC LLM quantified

run Llama with quanifing with MLC LLM quantified

Video of running Llama with MLC on Jetson Orin NX 16GB:

Project Outlook