Meta Llama: Next generation of Meta's Language Model

TorchServe supports serving Meta Llama in a number of ways. The examples in this document range from an introductory chat app for someone new to TorchServe, to advanced use of micro-batching and streaming responses with Meta Llama.

🦙💬 Meta Llama Chatbot

This example shows how to deploy a Llama chat app using TorchServe. We use Streamlit to create the app.

This example uses llama-cpp-python.

You can run this example on your laptop to learn how to use TorchServe, how to scale TorchServe backend workers up and down, and how changing batch_size affects inference time.
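As a rough illustration, the sketch below shows how a llama-cpp-python model might be wrapped in a TorchServe custom handler; the GGUF file name and generation parameters are placeholders, not the values used by the example.

```python
# A minimal sketch of a TorchServe custom handler wrapping llama-cpp-python.
# The GGUF file name and max_tokens value below are placeholders.
from llama_cpp import Llama
from ts.torch_handler.base_handler import BaseHandler


class LlamaCppHandler(BaseHandler):
    def initialize(self, ctx):
        model_dir = ctx.system_properties.get("model_dir")
        # llama-cpp-python runs the quantized GGUF model on CPU, so this works on a laptop
        self.model = Llama(model_path=f"{model_dir}/llama-model.gguf")
        self.initialized = True

    def preprocess(self, requests):
        # TorchServe delivers a (micro-)batch of requests; extract the prompt from each
        prompts = []
        for req in requests:
            data = req.get("body") or req.get("data")
            prompts.append(data.decode("utf-8") if isinstance(data, (bytes, bytearray)) else str(data))
        return prompts

    def inference(self, prompts):
        return [self.model(prompt, max_tokens=128) for prompt in prompts]

    def postprocess(self, outputs):
        # One response per request in the batch
        return [out["choices"][0]["text"] for out in outputs]
```

Once the model is registered, workers can be scaled up or down through TorchServe's management API, for example `curl -X PUT "http://localhost:8081/models/<model_name>?min_worker=2"`, and batch_size is set when the model is registered, which is how the example lets you observe its effect on inference time.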

Chatbot Architecture

Meta Llama with HuggingFace

This example shows how to serve the meta-llama/Meta-Llama-3-70B-Instruct model with limited resources using HuggingFace. It shows the following optimizations: 1) HuggingFace accelerate, which can be activated with low_cpu_mem_usage=True; 2) quantization from bitsandbytes, using load_in_8bit=True. The model is first created on the meta device (with empty weights), and the state dict is then loaded into it (shard by shard in the case of a sharded checkpoint).
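A minimal sketch of this loading path is shown below; `device_map="auto"` is an added assumption here (it lets accelerate place the quantized layers on the available devices), and newer transformers releases express the 8-bit option through `BitsAndBytesConfig` instead of the flag used here.

```python
# A minimal sketch of the two optimizations described above; the gated
# meta-llama/Meta-Llama-3-70B-Instruct checkpoint requires access approval.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,  # accelerate: create the model on the meta device, then load shards into it
    load_in_8bit=True,       # bitsandbytes 8-bit quantization
    device_map="auto",       # assumption: let accelerate place layers on the available devices
)
```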

Llama 2 on Inferentia

This example shows how to serve the Llama 2 model on AWS Inferentia2 for text completion with micro batching and streaming response support.

Inferentia2 uses the Neuron SDK, which is built on top of the PyTorch XLA stack. For large model inference, the transformers-neuronx package handles model partitioning and running inference.
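The sketch below illustrates the general flow with transformers-neuronx, assuming the `LlamaForSampling` API; the checkpoint path, `tp_degree`, and sequence length are placeholders, and exact entry points can vary between Neuron SDK releases. The TorchServe example layers micro-batching and streaming responses on top of this.

```python
# A minimal sketch of compiling and sampling a Llama 2 model with transformers-neuronx.
# Paths and parallelism settings are placeholders, not the example's actual configuration.
import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

model = LlamaForSampling.from_pretrained(
    "llama-2-13b-split",  # placeholder path to a saved/split checkpoint
    batch_size=1,
    tp_degree=8,          # tensor-parallel degree across NeuronCores
    amp="f16",
)
model.to_neuron()         # trace and compile the model for Inferentia2

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")
input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.inference_mode():
    generated = model.sample(input_ids, sequence_length=128)
print([tokenizer.decode(seq, skip_special_tokens=True) for seq in generated])
```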

Inferentia 2 Software Stack