# Intro to HoloViz

HoloViz is a suite of high-level Python tools that are designed to work together to make visualizing data a breeze, from conducting exploratory data analysis to deploying complex dashboards.

The core HoloViz projects are as follows:

- [Panel](https://panel.holoviz.org): Create interactive dashboards in Jupyter notebooks or standalone apps
- [hvPlot](https://hvplot.holoviz.org): Quickly and interactively explore data with a familiar API
- [HoloViews](https://holoviews.org): Declare interactive plots by annotating your data
- [GeoViews](http://geoviews.org): Geographic extension of HoloViews
- [Datashader](https://datashader.org): Render big data images in a browser
- [Lumen](https://lumen.holoviz.org/): Construct no-code dashboards from simple YAML specifications
- [Colorcet](https://colorcet.holoviz.org/): Plot with perceptually based colormaps
- [Param](https://param.holoviz.org): Declaratively code in Python

## What is Panel

Today, the focus is on Panel.

Panel packs many pre-built frontend components that are **usable with Python**.

That means you can convert your static Python scripts into interactive ones--**no JavaScript necessary**!

In [None]:
import panel as pn
pn.extension()

## Basic Panel Tutorial

Let's start out building an interactive app that allows the user to print a custom message.

Currently, the message is hard-coded to `"Hello World!"`.

In [None]:
print("Hello World!")

We can give the user more control by introducing a `TextInput` widget.

In [None]:
message_input = pn.widgets.TextInput(value="Hello World!")

Then, we can use `pn.bind` to bind the widget's `param.value` to the callback, `echo_message`, which simply echoes the input value whenever it changes.

Note: it's important to prefix `value` with `param`--without it, there will be no updates!

In [None]:
def echo_message(message):
    return f"<i>{message}</i>"

message_ref = pn.bind(echo_message, message=message_input.param.value)
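
To see why the `param` prefix matters, here is a plain-Python analogy. It is a toy sketch, not Panel's implementation; `Box`, `frozen`, and `live` are hypothetical names. Passing `message_input.value` hands over the string as it is right now, much like `frozen` below, whereas `message_input.param.value` hands over a reference that can be watched for changes, which behaves more like `live`.

```python
from functools import partial


class Box:
    """Hypothetical stand-in for a widget; it only stores a value."""

    def __init__(self, value):
        self.value = value


def echo_message(message):
    return f"<i>{message}</i>"


box = Box("Hello World!")

# The string is read once, here, and baked into the partial.
frozen = partial(echo_message, message=box.value)


# The value is looked up again every time the function is called.
def live():
    return echo_message(message=box.value)


box.value = "Changed"

print(frozen())  # still built from the old string
print(live())    # reflects the new value
```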

Next, create a simple layout to see the results.

Try typing something unique in the widget to see the message update!

In [None]:
pn.Column(message_input, message_ref)

To recap, we:

1. instantiated a widget (`TextInput`)
2. defined a function (`echo_message`)
3. bound the function to the widget's *param* value
4. laid out the widget and the bound reference

![recap](images/.png)

Here are all the code cells collected into one!

In [None]:
import panel as pn
pn.extension()

message_input = pn.widgets.TextInput(value="Hello World!")

def echo_message(message):
    return f"<i>{message}</i>"

message_ref = pn.bind(echo_message, message=message_input.param.value)

pn.Column(message_input, message_ref)

Doing this repeatedly is key to creating more complex apps with Panel, so let's do a quick exercise.

Your goal is to fill in the ellipses (`...`) so that a new widget toggles the message to upper case when activated!

Hint: check out the [Component gallery](https://panel.holoviz.org/reference/index.html) to see what widgets are available to accomplish this goal (one of them starts with a T, but there are multiple solutions!).

In [None]:
import panel as pn
pn.extension()

message_input = pn.widgets.TextInput(value="Hello World!")
toggle_upper = ...

def echo_message(message, toggle_upper):
    ...
    return f"<i>{message}</i>"

message_ref = pn.bind(echo_message, message=message_input.param.value, toggle_upper=...)

pn.Column(message_input, message_ref)

Congrats on building an interactive Panel app! 🎉

## Basic Panel ChatInterface

Now, introducing `pn.chat.ChatInterface`, which is a component that packages all the steps you just learned to provide convenient features for developing a Chat UI with LLMs!

Try typing a message and pressing enter to send!

In [None]:
chat = pn.chat.ChatInterface()
chat

You might have noticed that it echoes the message you entered, but it doesn't reply... not fun (yet).

To make it reply, all we have to do is set a `callback`. It works much like the function we passed to `pn.bind`, with one caveat: it must accept these three arguments: `contents`, `user`, and `instance`.

Now when you try sending a message in the chat interface, it will be echoed back in italics!

In [None]:
def echo_message(contents: str, user: str, instance: pn.chat.ChatInterface):
    return f"<i>{contents}</i>"

chat.callback = echo_message

You might have seen services, like OpenAI and Mistral, stream tokens as they arrive.

We can simulate streaming tokens by looping through the contents of the user's input, concatenating the characters to the final message, and `yield`ing it.

Since there's no serious computation, it'll run too fast for us to perceive streaming--thus `time.sleep`.

Here's the latest code collected into one (this time passing `callback` at instantiation).

In [None]:
import time
import panel as pn
pn.extension()

def stream_echo_message(contents: str, user: str, instance: pn.chat.ChatInterface):
    message = ""
    for char in contents:
        time.sleep(0.1)  # to simulate a serious computation
        message += char
        yield f"<i>{message}</i>"

chat = pn.chat.ChatInterface(callback=stream_echo_message)
chat
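
To see what the callback hands back at each step, it can help to run the same loop outside Panel. The sketch below is a stripped-down, hypothetical variant (`stream_echo` and its `delay` parameter are illustrative names, and the italics markup is left out). Note that each `yield` carries the whole message so far, not just the newest character.

```python
import time


def stream_echo(contents, delay=0.0):
    """Yield the accumulated text after each character."""
    message = ""
    for char in contents:
        time.sleep(delay)  # set delay > 0 to make the streaming visible
        message += char
        yield message


snapshots = list(stream_echo("Hi!"))
print(snapshots)
```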

Awesome! Now let's make it much more interesting by connecting an LLM, like the quantized Mistral Instruct 7B model through ExLlama (so no API key necessary)!

Here, we:
1. download the quantized model (if it doesn't exist already) in exl2 format
2. instantiate the model; first checking the cache
3. call the chat completion through the streaming generator
4. stream the chunks

In [None]:
import panel as pn
from huggingface_hub import snapshot_download
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2Sampler, ExLlamaV2StreamingGenerator

pn.extension()

model_directory = snapshot_download(
    repo_id="turboderp/Mistral-7B-v0.2-exl2", revision="2.5bpw"
)  # 1.

# 2. Cache the generator together with the tokenizer and sampler settings,
# because the callback needs all three, even on a cache hit.
if model_directory in pn.state.cache:
    generator, tokenizer, settings = pn.state.cache[model_directory]
else:
    config = ExLlamaV2Config()
    config.model_dir = model_directory
    config.prepare()
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)
    model.load_autosplit(cache)
    tokenizer = ExLlamaV2Tokenizer(config)
    settings = ExLlamaV2Sampler.Settings()
    settings.temperature = 0.85
    settings.top_k = 50
    settings.top_p = 0.8
    settings.token_repetition_penalty = 1.01
    settings.disallow_tokens(tokenizer, [tokenizer.eos_token_id])
    generator = ExLlamaV2StreamingGenerator(model, cache, tokenizer)
    pn.state.cache[model_directory] = (generator, tokenizer, settings)


def stream_response(contents: str, user: str, instance: pn.chat.ChatInterface):
    input_ids = tokenizer.encode(contents, add_bos=False)
    generator.begin_stream_ex(input_ids, settings)  # 3.

    message = ""
    for _ in range(256):
        result = generator.stream_ex()
        if result["eos"]:
            break
        message += result["chunk"]  # 4.
        yield message


chat = pn.chat.ChatInterface(callback=stream_response)
chat
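
The check-then-store pattern in step 2 can be sketched with an ordinary dictionary. This is a toy illustration: the `cache` dict is a hypothetical stand-in for `pn.state.cache`, and `load_model` is a placeholder for the expensive model setup.

```python
cache = {}  # hypothetical stand-in for pn.state.cache
load_count = 0


def load_model(key):
    """Placeholder for an expensive load; counts how often it runs."""
    global load_count
    load_count += 1
    return {"loaded_from": key}


def get_model(key):
    # Only pay the loading cost when the key has not been seen before.
    if key not in cache:
        cache[key] = load_model(key)
    return cache[key]


first = get_model("some/model/dir")
second = get_model("some/model/dir")
print(load_count)
```

The second call returns the very same object as the first, so re-running the cell does not reload the model.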

For completeness, we can also run quantized models with `llama-cpp-python`!

`llama-cpp-python` can run on both CPU and GPU, and has an API that mimics OpenAI's API. Personally, I use it because I don't have any spare GPUs lying around and it runs extremely well on my local Mac M2 Pro! It also handles chat template formats internally, so it's just a matter of specifying the proper `chat_format` key.

Here, we:
1. download the quantized model (if it doesn't exist already) in GGUF format
2. instantiate the model; first checking the cache
3. serialize all messages into the `transformers` format, a list of `role`/`content` dictionaries (new in this example)
4. call the OpenAI-like chat completion API on the messages
5. stream the chunks

In [1]:
import llama_cpp
import panel as pn
from huggingface_hub import hf_hub_download
pn.extension()

model_path = hf_hub_download(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q5_K_M.gguf",
)  # 1.

# 2.
if model_path in pn.state.cache:
    llama = pn.state.cache[model_path]
else:
    llama = llama_cpp.Llama(
        model_path=model_path,
        n_gpu_layers=-1,
        chat_format="mistral-instruct",
        n_ctx=2048,
        logits_all=True,
        verbose=False,
    )
    pn.state.cache[model_path] = llama

def stream_response(contents: str, user: str, instance: pn.chat.ChatInterface):
    messages = instance.serialize()  # 3.
    response = llama.create_chat_completion_openai_v1(messages=messages, stream=True)  # 4.

    message = ""
    for chunk in response:
        part = chunk.choices[0].delta.content or ""
        message += part
        yield message  # 5.

chat = pn.chat.ChatInterface(callback=stream_response)
chat
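
The `or ""` guard in the loop is there because some chunks in an OpenAI-style stream may carry no text at all. The sketch below fakes such a stream with `SimpleNamespace` objects. The chunk shapes and their order are assumptions for illustration, and `fake_chunk` is a hypothetical helper, not part of `llama-cpp-python`.

```python
from types import SimpleNamespace


def fake_chunk(content):
    """Build an object shaped like chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Assumed stream: the first and last chunks have no content.
response = [
    fake_chunk(None),
    fake_chunk("Hello"),
    fake_chunk(" there"),
    fake_chunk(None),
]

message = ""
snapshots = []
for chunk in response:
    part = chunk.choices[0].delta.content or ""  # None becomes ""
    message += part
    snapshots.append(message)

print(snapshots)
```

Without the guard, `message += None` would raise a `TypeError` on the first empty chunk.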

We can even give the model a personality by setting a system message!

Note: Mistral Instruct does NOT support the `system` role, so we pass the system message as a `user` message instead.

In [None]:
import llama_cpp
import panel as pn
from huggingface_hub import hf_hub_download

pn.extension()

system_message = "You are an excessively passionate Pythonista."

model_path = hf_hub_download(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q5_K_M.gguf",
)  # 1.

# 2.
if model_path in pn.state.cache:
    llama = pn.state.cache[model_path]
else:
    llama = llama_cpp.Llama(
        model_path=model_path,
        n_gpu_layers=-1,
        chat_format="mistral-instruct",
        n_ctx=2048,
        logits_all=True,
        verbose=False,
    )
    pn.state.cache[model_path] = llama

def stream_response(contents: str, user: str, instance: pn.chat.ChatInterface):
    messages = [
        {"role": "user", "content": system_message}
    ] + instance.serialize()  # 3.
    response = llama.create_chat_completion_openai_v1(
        messages=messages, stream=True
    )  # 4.

    message = ""
    for chunk in response:
        part = chunk.choices[0].delta.content or ""
        message += part
        yield message  # 5.


chat = pn.chat.ChatInterface(callback=stream_response)
chat
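
Prepending the instruction as its own `user` message means the history starts with two `user` turns in a row. If that turns out to be a problem for the chat format in use, one possible alternative is to fold the instruction into the first user turn. The sketch below assumes the serialized history is a list of `{"role": ..., "content": ...}` dictionaries; `merge_system_message` is a hypothetical helper, not a Panel or `llama-cpp-python` function.

```python
def merge_system_message(system_message, messages):
    """Return a new list with the system text folded into the first user turn."""
    merged = [dict(m) for m in messages]  # copy so the history is not mutated
    for m in merged:
        if m["role"] == "user":
            m["content"] = f"{system_message}\n\n{m['content']}"
            return merged
    # No user turn yet: fall back to a standalone user message.
    return [{"role": "user", "content": system_message}] + merged


history = [
    {"role": "user", "content": "What is a list comprehension?"},
    {"role": "assistant", "content": "A compact way to build lists."},
]
result = merge_system_message("You are an excessively passionate Pythonista.", history)
print(result[0]["content"])
```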

We can polish this chat UI further by wrapping it in a template.

In [None]:
template = pn.template.FastListTemplate(main=[chat], title="Chatbot", accent="#A01346")
template.show()

Your turn! Try combining everything you've learned to customize the chatbot's personality on the fly!

Again, replace the ellipses with the appropriate code snippets!

In [None]:
import llama_cpp
import panel as pn
from huggingface_hub import hf_hub_download

pn.extension()

system_message = "You are an excessively passionate Pythonista."

model_path = hf_hub_download(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q5_K_M.gguf",
)  # 1.

# 2.
if model_path in pn.state.cache:
    llama = pn.state.cache[model_path]
else:
    llama = llama_cpp.Llama(
        model_path=model_path,
        n_gpu_layers=-1,
        chat_format="mistral-instruct",
        n_ctx=2048,
        logits_all=True,
        verbose=False,
    )
    pn.state.cache[model_path] = llama

def stream_response(contents: str, user: str, instance: pn.chat.ChatInterface):
    messages = [
        {"role": "user", "content": ...}
    ] + instance.serialize()  # 3.
    response = llama.create_chat_completion_openai_v1(
        messages=messages, stream=True
    )  # 4.

    message = ""
    for chunk in response:
        part = chunk.choices[0].delta.content or ""
        message += part
        yield message  # 5.


system_input = ...
chat = pn.chat.ChatInterface(callback=stream_response)
template = pn.template.FastListTemplate(
    main=[chat], sidebar=[...], title="Chatbot", accent="#A01346"
)
template.show()