# Wasm Chat

`Wasm-chat` lets you chat with LLMs in [GGUF](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/README.md) format, either locally or through a chat service.


`Wasm-chat` is powered by [WasmEdge Runtime](https://wasmedge.org/), a popular WebAssembly runtime. It is also an attempt to run AI tasks inside Wasm containers.

## Chat Locally

### Setup

To deploy WasmEdge Runtime and `wasm-infer.wasm` on your local system, run the following command in your terminal:

In [None]:
!curl -sSf https://raw.githubusercontent.com/second-state/wasm-llm/main/deploy.sh | bash

### Get model

For demonstration purposes, we use a small model, `TinyLlama-1.1B-Chat-v0.3`. Run the following command to download it:

In [None]:
!curl -LO https://huggingface.co/second-state/TinyLlama-1.1B-Chat-v0.3-GGUF/resolve/main/tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf

More models and sample code can be found in [GGUF Models](https://github.com/second-state/wasm-llm/blob/main/MODEL.md).
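
Before loading the model, it can be useful to confirm that the download really is a GGUF file and not, for example, an HTML error page. The sketch below relies on the fact that GGUF files start with the four ASCII bytes `GGUF`; the helper name `looks_like_gguf` is hypothetical and is not part of `Wasm-chat`.

```python
from pathlib import Path

# GGUF files begin with these four magic bytes.
GGUF_MAGIC = b"GGUF"


def looks_like_gguf(path):
    """Return True if `path` is an existing file that starts with the GGUF magic."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


# File name taken from the download command above.
model_file = "tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf"
print(f"{model_file} looks like GGUF: {looks_like_gguf(model_file)}")
```

This only inspects the header, so it catches a failed or redirected download but does not prove that the file is complete.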

### Code: One-turn Conversation

The following code shows a one-turn conversation with the model:

In [2]:
from langchain.chat_models.wasm_chat import WasmChatLocal, PromptTemplateType
from langchain.schema.messages import AIMessage, HumanMessage, SystemMessage

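# Path to the GGUF model downloaded in the previous step
model_file = "tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf"

# Create a local chat model that formats prompts with the ChatML template
chat = WasmChatLocal(model_file=model_file, prompt_template=PromptTemplateType.ChatML)

# Build the conversation: a system instruction followed by the user's question
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of France?")
messages = [system_message, user_message]

# Run inference and print the model's reply
chat_result = chat(messages)
print(f"[Bot] {chat_result.content}")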

assert isinstance(chat_result, AIMessage)
assert isinstance(chat_result.content, str)
assert "Paris" in chat_result.content

## Chat via API Service

Compared with chatting locally, chatting via an API service is more convenient: you don't need to install WasmEdge Runtime or download a model. You chat with the model by sending an HTTP request to the API service.

More importantly, by following the steps in the [README](https://github.com/second-state/llama-utils/tree/main/api-server#readme), you can host your own API service and chat with any model you like from any device that has an internet connection.
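
To make the "HTTP request" concrete, here is a rough sketch of what such a request could look like, using only the standard library. It assumes the service exposes an OpenAI-style `/v1/chat/completions` endpoint that accepts a JSON list of role/content messages. The path, the payload shape, and the helper name `build_chat_request` are assumptions for illustration and are not taken from the `WasmChatService` source. The request is built but not sent.

```python
import json
from urllib import request


def build_chat_request(ip_addr, port, messages):
    """Build (but do not send) a POST request carrying a chat conversation.

    `messages` is a list of (role, content) pairs, e.g. ("user", "Hi").
    """
    # Assumption: the service serves chat completions at this path.
    url = f"http://{ip_addr}:{port}/v1/chat/completions"
    # Assumption: the body is a JSON object with a "messages" list.
    payload = {
        "messages": [
            {"role": role, "content": content} for role, content in messages
        ]
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address and port; replace with those of your own service.
req = build_chat_request(
    "127.0.0.1",
    "8080",
    [
        ("system", "You are an AI assistant"),
        ("user", "What is the capital of France?"),
    ],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; the shape of the response depends on the service you are running.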

In [None]:
from langchain.chat_models.wasm_chat import WasmChatService
from langchain.schema.messages import AIMessage, HumanMessage, SystemMessage

ip_addr = "<service ip address>"
port = "<service port>"

chat = WasmChatService(service_ip_addr=ip_addr, service_port=port)
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of France?")
messages = [system_message, user_message]
chat_result = chat(messages)

assert isinstance(chat_result, AIMessage)
assert isinstance(chat_result.content, str)
assert "Paris" in chat_result.content