llama.cs for Unity

llama.cs for Unity is a simple LLM chat implementation built on top of llama.cs, the C# binding for llama.cpp. The asset includes llama.cs itself, high-level APIs such as LLM and LLMHost, and a Chat UI.


YouTube: https://www.youtube.com/watch?v=6V4KhO6lM04

Introduction

This asset is a starting point for exploring and using Large Language Models (LLMs) within the Unity environment. It ships with the llama.cs binding, precompiled binaries for Windows (instructions for compiling binaries for other platforms are available in the llama.cpp repository), and a straightforward user interface. Whether you are a developer looking to integrate LLMs into your Unity projects or an enthusiast eager to experiment with language models, this asset provides the essential tools.

Features

  • Single-file llama.cs: The C# binding for llama.cpp
  • High-level LLM wrapper implementation
  • Optimized Chat UI with Virtualized Scroll View

System Requirements

Please refer to the memory/disk requirements section of the llama.cpp documentation for information on model memory and disk usage. A machine with an 8GB NVIDIA GPU, a modern processor, and 32GB of RAM is recommended.

Getting Started

  1. Open the Assets/Battlehub/Chat scene
  2. Download orca-2-7b.Q5_K_S.gguf (see the download link in the notes below)
  3. Copy orca-2-7b.Q5_K_S.gguf to the Assets/StreamingAssets folder
  4. Enter Play mode

Note You can use the following link to download the model: https://huggingface.co/TheBloke/Orca-2-7B-GGUF/resolve/main/orca-2-7b.Q5_K_S.gguf?download=true

Note You might want to replace llama.dll and llava_shared.dll with builds that fit your platform (see the Assets/Battlehub/LLama/Plugins folder): https://github.com/ggerganov/llama.cpp/releases/tag/b2667

Note If you have an NVIDIA GPU, use the CUDA build for best performance.

Note The current supported version is b2667.

Note To build llama.cpp from source, please refer to the following section: https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage

(Screenshot: Getting Started result)

Definitions

llama.cs

A single-file P/Invoke binding for llama.cpp.
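To picture what a single-file P/Invoke binding involves, here is a minimal sketch of a few declarations against the llama.cpp C API. This is an illustration, not an excerpt from llama.cs; the shipped binding may declare these entry points differently.

using System.Runtime.InteropServices;

// Illustrative sketch of P/Invoke declarations against llama.dll.
// The entry points exist in the llama.cpp C API around b2667, but
// this is not copied from llama.cs itself.
public static class NativeSketch
{
    private const string LibName = "llama"; // resolves to llama.dll on Windows

    // One-time process-wide initialization of the llama.cpp backend.
    [DllImport(LibName, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llama_backend_init();

    // Releases backend resources at shutdown.
    [DllImport(LibName, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llama_backend_free();

    // Returns true if the native build supports GPU offload.
    [DllImport(LibName, CallingConvention = CallingConvention.Cdecl)]
    [return: MarshalAs(UnmanagedType.I1)]
    public static extern bool llama_supports_gpu_offload();
}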

LLM.cs

High-level wrapper for the llama.cs API.
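LLM implements the same ILLM contract used by the Dummy LLM example below (Initialize, Input, Loop, Dispose). As a rough sketch of what that contract means, the hypothetical helper below pumps an ILLM by hand; in practice LLMHost does this for you on a dedicated thread.

using System.Collections.Generic;
using UnityEngine;

namespace Battlehub.LLama.Examples
{
    // Hypothetical sketch: pumping an ILLM by hand. LLMHost normally
    // does this on a dedicated thread; DummyLLM (see the example below)
    // stands in for the real model here.
    public static class LLMDriverSketch
    {
        public static void Run(ILLM llm, Queue<string> inputs)
        {
            // Drain the initialization coroutine first.
            IEnumerator<Response> init = llm.Initialize();
            while (init.MoveNext())
            {
                if (init.Current.Data != null) Debug.Log(init.Current.Data);
            }

            // Then pump the chat loop, feeding inputs whenever it goes idle.
            IEnumerator<Response> loop = llm.Loop();
            while (loop.MoveNext())
            {
                Response r = loop.Current;
                if (r.ResponseType == ResponseType.Eos)
                {
                    // The model is idle; feed the next input ("Stop" ends DummyLLM).
                    llm.Input(inputs.Count > 0 ? inputs.Dequeue() : "Stop");
                }
                else if (r.Data != null)
                {
                    Debug.Log(r.Data);
                }
            }

            llm.Dispose();
        }
    }
}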

LLMHost.cs

MonoBehaviour responsible for loading/unloading the LLM and for dispatching calls from the main thread to the LLM thread.
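The dispatch pattern can be pictured with the minimal sketch below. It is an assumed illustration of the main-thread-to-worker handoff, not the actual LLMHost source; every member name in it is hypothetical.

using System.Collections.Concurrent;
using System.Threading;
using UnityEngine;

// Assumed sketch of the dispatch pattern a host like LLMHost implements.
public class DispatchSketch : MonoBehaviour
{
    private readonly ConcurrentQueue<string> m_requests = new ConcurrentQueue<string>();
    private Thread m_llmThread;
    private volatile bool m_running;

    private void Start()
    {
        m_running = true;
        // Inference runs off the main thread so it never blocks rendering.
        m_llmThread = new Thread(() =>
        {
            while (m_running)
            {
                if (m_requests.TryDequeue(out string input))
                {
                    // Feed the input to the model here.
                }
                else
                {
                    Thread.Sleep(10);
                }
            }
        });
        m_llmThread.Start();
    }

    // Called from the Unity main thread.
    public void SendRequest(string input) => m_requests.Enqueue(input);

    private void OnDestroy()
    {
        m_running = false;
        m_llmThread?.Join();
    }
}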

Chat UI

Implementation of the Chat UI, comprising ChatUI.prefab, ChatUI.cs, and VirtualScroll.cs.
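The idea behind the virtualized scroll view is that only the rows currently inside the viewport get UI objects, which are pooled and re-bound as the list scrolls. The sketch below illustrates the technique; it is not the actual VirtualScroll.cs, and every member name in it is hypothetical.

using System.Collections.Generic;
using UnityEngine;
using UnityEngine.UI;

// Conceptual sketch of list virtualization: a small pool of row objects
// is repositioned and re-bound to whichever messages are visible.
public class VirtualListSketch : MonoBehaviour
{
    [SerializeField] private float m_itemHeight = 40f;
    [SerializeField] private float m_viewportHeight = 400f;
    [SerializeField] private List<RectTransform> m_pool = new List<RectTransform>();

    private readonly List<string> m_messages = new List<string>();

    // Call whenever the scroll position changes.
    public void Refresh(float scrollOffset)
    {
        int first = Mathf.FloorToInt(scrollOffset / m_itemHeight);
        int visible = Mathf.CeilToInt(m_viewportHeight / m_itemHeight) + 1;
        for (int i = 0; i < visible && i < m_pool.Count; i++)
        {
            int dataIndex = first + i;
            bool inRange = dataIndex >= 0 && dataIndex < m_messages.Count;
            m_pool[i].gameObject.SetActive(inRange);
            if (!inRange) continue;
            // Reuse a pooled row: move it into place and rebind its text.
            m_pool[i].anchoredPosition = new Vector2(0, -dataIndex * m_itemHeight);
            m_pool[i].GetComponentInChildren<Text>().text = m_messages[dataIndex];
        }
    }
}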

Config Files

Config files are JSON files containing a serialized gpt_params structure (llama.cpp's parameter struct). Example config files can be found in the Assets/StreamingAssets/Configs folder: empty-gpt_params.json is an empty template listing all available parameters, while orca-2-7b-gpt_params.json is the config used for demonstration in the Getting Started section.

To load a config file using the Chat UI, use the "Load gpt params" button.

(Screenshot: the "Load gpt params" button)
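For illustration, a trimmed config might look like the excerpt below. The field names are assumed to mirror llama.cpp's gpt_params (model path, context size, batch size, thread count, GPU layers, prompt); check empty-gpt_params.json for the exact set of parameters the asset serializes.

{
  "model": "Assets/StreamingAssets/orca-2-7b.Q5_K_S.gguf",
  "n_ctx": 4096,
  "n_batch": 512,
  "n_threads": 8,
  "n_gpu_layers": 33,
  "prompt": "You are a helpful assistant."
}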

Examples

Dummy LLM Implementation

This example demonstrates how to replace the default LLM implementation with your own.

using System.Collections.Generic;
using System.Threading;

namespace Battlehub.LLama.Examples
{
    // A stub ILLM that echoes the user's input back token by token.
    public class DummyLLM : ILLM
    {
        private string m_input;

        public IEnumerator<Response> Initialize()
        {
            yield return new Response("Initializing", ResponseType.Log);
            yield return new Response("Initialized", ResponseType.InitCompleted);
        }

        // Stores the latest input; Loop picks it up on its next iteration.
        public void Input(string input, byte[][] images = null)
        {
            m_input = input;
        }

        public IEnumerator<Response> Loop()
        {
            yield return new Response("Chat started. Enter \"Stop\" to end.", ResponseType.Log);
            while (true)
            {
                // Eos signals that the LLM is idle and ready for the next input.
                yield return new Response(ResponseType.Eos);

                if (m_input == "Stop")
                {
                    break;
                }

                // Bos marks the beginning of a new response.
                yield return new Response(ResponseType.Bos);

                // Echo the input back word by word, simulating token streaming.
                string[] tokens = m_input.Split(' ');
                for (int i = 0; i < tokens.Length; i++)
                {
                    Thread.Sleep(100);
                    yield return new Response($"{tokens[i]} ", ResponseType.Token);
                }
            }

            yield return new Response("Chat Ended", ResponseType.Log);
            yield return new Response(ResponseType.EndOfText);
        }

        public void Dispose()
        {
        }
    }
}

namespace Battlehub.LLama.Examples
{
    public class DummyLLMHost : LLMHost
    {
        protected override ILLM CreateLLM()
        {
            return new DummyLLM();
        }
    }
}

Note
The complete example can be found in Assets/Battlehub/LLama/Examples/DummyLLM.

(Screenshot: Dummy LLM example)
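To try the dummy host from code, add DummyLLMHost instead of the default LLMHost; everything else (the Response event, SendRequest) works the same way, as the LLM Client example below shows. This bootstrap snippet is illustrative and its class name is hypothetical.

using UnityEngine;

namespace Battlehub.LLama.Examples
{
    // Hypothetical bootstrap: swaps in DummyLLMHost for the default LLMHost.
    public class DummyLLMBootstrap : MonoBehaviour
    {
        private void Start()
        {
            ILLMHost host = gameObject.AddComponent<DummyLLMHost>();
            host.Response += response => Debug.Log(response.Data);
        }
    }
}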

LLM Client

This example illustrates how to use the LLM host to communicate with the LLM.

using System.IO;
using System.Text;
using System.Threading.Tasks;
using UnityEngine;

namespace Battlehub.LLama.Examples
{
    public class LLMClient : MonoBehaviour
    {
        private ILLMHost m_host;

        [SerializeField]
        private string m_configPath;

        private void Start()
        {
            if (!File.Exists($"{Application.streamingAssetsPath}/orca-2-7b.Q5_K_S.gguf"))
            {
                Debug.LogWarning("Download orca-2-7b.Q5_K_M.gguf and move it to the StreamingAssets folder. <a href=\"https://huggingface.co/TheBloke/Orca-2-7B-GGUF/resolve/main/orca-2-7b.Q5_K_S.gguf?download=true\">https://huggingface.co/TheBloke/Orca-2-7B-GGUF/resolve/main/orca-2-7b.Q5_K_S.gguf?download=true</a>");
            }

            if (string.IsNullOrEmpty(m_configPath))
            {
                m_configPath = $"{Application.streamingAssetsPath}/Configs/orca-2-7b-gpt_params.json";
            }

            m_host = gameObject.AddComponent<LLMHost>();
            m_host.Response += OnResponse;
            m_host.ConfigPath = m_configPath;
        }

        private StringBuilder m_stringBuilder = new StringBuilder();
        private async void OnResponse(Response response)
        {
            if (response.ResponseType == ResponseType.InitCompleted)
            {
                Debug.Log("InitCompleted");
            }
            else if (response.ResponseType == ResponseType.Bos)
            {
                m_stringBuilder.Clear();
            }
            else if (response.ResponseType == ResponseType.Token)
            {
                m_stringBuilder.Append(response.Data);
            }
            else if (response.ResponseType == ResponseType.Eos)
            {
                // Yield so the follow-up request is sent on a later
                // main-thread tick, not synchronously from inside the
                // Response event handler.
                await Task.Yield();

                if (m_stringBuilder.Length == 0)
                {
                    const string prompt1 = "Hi, Friend. Come up with a few lines of a story";
                    Debug.Log(prompt1);

                    m_host.SendRequest(prompt1);
                }
                else
                {
                    Debug.Log(m_stringBuilder.ToString());
                    m_stringBuilder.Clear();

                    const string prompt2 = "What happened next?";
                    Debug.Log(prompt2);

                    m_host.SendRequest(prompt2);
                }
            }
            else
            {
                Debug.Log(response.Data);
            }
        }
    }
}

Note
The complete example can be found in Assets/Battlehub/LLama/Examples/LLMClient.

(Screenshot: LLM Client example)

Support

If you cannot find something in the documentation or have any questions, please feel free to send an email to Battlehub@outlook.com or ask directly in this support group. Keep up the great work in your development journey! 😊
