![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/llama.cpp/PromptAssember_with_AutoGGUFModel.ipynb)

# PromptAssembler with AutoGGUFModel

Let's keep in mind a few things before we start 😊

- llama.cpp support in the form of the `AutoGGUFModel` was introduced in `Spark NLP 5.5.0`, enabling quantized LLM inference on a wide range of devices. Please make sure you have upgraded to the latest Spark NLP release.
- The `PromptAssembler` was introduced in `Spark NLP 5.5.1` to enable the construction of message prompts.

This notebook will show you how you can construct your own message prompts for the AutoGGUFModel.

## Install and Start Spark NLP

- Let's install and set up Spark NLP (if running in Google Colab)
- This part is pretty easy via our simple script

In [None]:
# Only execute this if you are on Google Colab
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Let's start Spark with Spark NLP included via our simple `start()` function

In [None]:
import sparknlp

# Start Spark with Spark NLP and GPU support enabled. If no GPU is available, remove the gpu parameter.
spark = sparknlp.start(gpu=True)
print(sparknlp.version())

Let's create a `PromptAssembler` and use it to recreate the following conversation between a chatbot and a user:

```
SYSTEM: You are a helpful assistant.
ASSISTANT: Hello there! How can I help you today?
USER: I need help with organizing my room, give me some advice.
```

First we need to structure our messages correctly in a Spark DataFrame. For each row, the `PromptAssembler` expects an array of two-tuples, where the first field is the role and the second field is the message. We will call this column `messages`.

In [None]:
messages = [
    ("system", "You are a helpful assistant."),
    ("assistant", "Hello there! How can I help you today?"),
    ("user", "I need help with organizing my room, give me some advice."),
]
df = spark.createDataFrame([[messages]]).toDF("messages")
df.show(truncate=False)

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|messages                                                                                                                                                        |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[{system, You are a helpful assistant.}, {assistant, Hello there! How can I help you today?}, {user, I need help with organizing my room, give me some advice.}]|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
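
Conversations are often stored as lists of role/content dictionaries rather than tuples. The following is a minimal sketch, not part of Spark NLP, of converting such records into the (role, message) two-tuples described above. The helper name `to_role_message_tuples` and the set of accepted roles are assumptions made for illustration.

```python
# Hypothetical helper (not part of Spark NLP): converts chat records given as
# dicts with "role" and "content" keys into (role, message) two-tuples.
ALLOWED_ROLES = {"system", "assistant", "user"}  # assumption: the roles used in this notebook


def to_role_message_tuples(records):
    tuples = []
    for record in records:
        role = record["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Unexpected role: {role!r}")
        tuples.append((role, record["content"]))
    return tuples


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "assistant", "content": "Hello there! How can I help you today?"},
    {"role": "user", "content": "I need help with organizing my room, give me some advice."},
]
converted = to_role_message_tuples(chat)
```

The resulting list should be usable in place of the hand-written `messages` list, for example `spark.createDataFrame([[converted]]).toDF("messages")`.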



Let's create the PromptAssembler to generate the prompts. We will use the template from [llama3.1 (extracted from the gguf file)](https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF?show_file_info=Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf).

By default, the `addAssistant` parameter is set to `True`, so an assistant header will be appended to the end of the prompt.

In [None]:
from sparknlp.base import PromptAssembler


template = (
    "{{- bos_token }} {%- if custom_tools is defined %} {%- set tools = custom_tools %} {%- "
    "endif %} {%- if not tools_in_user_message is defined %} {%- set tools_in_user_message = true %} {%- "
    'endif %} {%- if not date_string is defined %} {%- set date_string = "26 Jul 2024" %} {%- endif %} '
    "{%- if not tools is defined %} {%- set tools = none %} {%- endif %} {#- This block extracts the "
    "system message, so we can slot it into the right place. #} {%- if messages[0]['role'] == 'system' %}"
    " {%- set system_message = messages[0]['content']|trim %} {%- set messages = messages[1:] %} {%- else"
    ' %} {%- set system_message = "" %} {%- endif %} {#- System message + builtin tools #} {{- '
    '"<|start_header_id|>system<|end_header_id|>\\n\\n" }} {%- if builtin_tools is defined or tools is '
    'not none %} {{- "Environment: ipython\\n" }} {%- endif %} {%- if builtin_tools is defined %} {{- '
    '"Tools: " + builtin_tools | reject(\'equalto\', \'code_interpreter\') | join(", ") + "\\n\\n"}} '
    '{%- endif %} {{- "Cutting Knowledge Date: December 2023\\n" }} {{- "Today Date: " + date_string '
    '+ "\\n\\n" }} {%- if tools is not none and not tools_in_user_message %} {{- "You have access to '
    'the following functions. To call a function, please respond with JSON for a function call." }} {{- '
    '\'Respond in the format {"name": function name, "parameters": dictionary of argument name and its'
    ' value}.\' }} {{- "Do not use variables.\\n\\n" }} {%- for t in tools %} {{- t | tojson(indent=4) '
    '}} {{- "\\n\\n" }} {%- endfor %} {%- endif %} {{- system_message }} {{- "<|eot_id|>" }} {#- '
    "Custom tools are passed in a user message with some extra guidance #} {%- if tools_in_user_message "
    "and not tools is none %} {#- Extract the first user message so we can plug it in here #} {%- if "
    "messages | length != 0 %} {%- set first_user_message = messages[0]['content']|trim %} {%- set "
    'messages = messages[1:] %} {%- else %} {{- raise_exception("Cannot put tools in the first user '
    "message when there's no first user message!\") }} {%- endif %} {{- "
    "'<|start_header_id|>user<|end_header_id|>\\n\\n' -}} {{- \"Given the following functions, please "
    'respond with a JSON for a function call " }} {{- "with its proper arguments that best answers the '
    'given prompt.\\n\\n" }} {{- \'Respond in the format {"name": function name, "parameters": '
    'dictionary of argument name and its value}.\' }} {{- "Do not use variables.\\n\\n" }} {%- for t in '
    'tools %} {{- t | tojson(indent=4) }} {{- "\\n\\n" }} {%- endfor %} {{- first_user_message + '
    "\"<|eot_id|>\"}} {%- endif %} {%- for message in messages %} {%- if not (message.role == 'ipython' "
    "or message.role == 'tool' or 'tool_calls' in message) %} {{- '<|start_header_id|>' + message['role']"
    " + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' }} {%- elif 'tool_calls' in "
    'message %} {%- if not message.tool_calls|length == 1 %} {{- raise_exception("This model only '
    'supports single tool-calls at once!") }} {%- endif %} {%- set tool_call = message.tool_calls[0]'
    ".function %} {%- if builtin_tools is defined and tool_call.name in builtin_tools %} {{- "
    "'<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}} {{- \"<|python_tag|>\" + tool_call.name + "
    '".call(" }} {%- for arg_name, arg_val in tool_call.arguments | items %} {{- arg_name + \'="\' + '
    'arg_val + \'"\' }} {%- if not loop.last %} {{- ", " }} {%- endif %} {%- endfor %} {{- ")" }} {%- '
    "else %} {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}} {{- '{\"name\": \"' + "
    'tool_call.name + \'", \' }} {{- \'"parameters": \' }} {{- tool_call.arguments | tojson }} {{- "}" '
    "}} {%- endif %} {%- if builtin_tools is defined %} {#- This means we're in ipython mode #} {{- "
    '"<|eom_id|>" }} {%- else %} {{- "<|eot_id|>" }} {%- endif %} {%- elif message.role == "tool" '
    'or message.role == "ipython" %} {{- "<|start_header_id|>ipython<|end_header_id|>\\n\\n" }} {%- '
    "if message.content is mapping or message.content is iterable %} {{- message.content | tojson }} {%- "
    'else %} {{- message.content }} {%- endif %} {{- "<|eot_id|>" }} {%- endif %} {%- endfor %} {%- if '
    "add_generation_prompt %} {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }} {%- endif %} "
)

promptAssembler = (
    PromptAssembler()
    .setInputCol("messages")
    .setOutputCol("prompt")
    .setChatTemplate(template)
)
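
To make the effect of the template and of `addAssistant` easier to see, the sketch below renders the plain-chat path of the Llama 3.1 template by hand in pure Python. It is an approximation for illustration only, not the code that `PromptAssembler` executes: the function name `render_llama31` is hypothetical, tool handling is omitted, the default date string is copied from the template, and the BOS token value and exact whitespace are assumptions.

```python
def render_llama31(messages, add_assistant=True,
                   bos_token="<|begin_of_text|>",  # assumption: Llama 3.1 BOS token
                   date_string="26 Jul 2024"):     # default taken from the template
    """Hand-written approximation of the template's path without tools."""
    messages = list(messages)

    # A leading system message is pulled out and placed in the system header.
    system_message = ""
    if messages and messages[0][0] == "system":
        system_message = messages[0][1].strip()
        messages = messages[1:]

    parts = [
        bos_token,
        "<|start_header_id|>system<|end_header_id|>\n\n",
        "Cutting Knowledge Date: December 2023\n",
        f"Today Date: {date_string}\n\n",
        system_message,
        "<|eot_id|>",
    ]

    # Every remaining message gets its own role header and end-of-turn marker.
    for role, content in messages:
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"
        )

    # Counterpart of addAssistant: open an assistant turn for the model to complete.
    if add_assistant:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_messages = [
    ("system", "You are a helpful assistant."),
    ("assistant", "Hello there! How can I help you today?"),
    ("user", "I need help with organizing my room, give me some advice."),
]
prompt = render_llama31(example_messages)
print(prompt)
```

With `add_assistant=False` the sketch stops after the last `<|eot_id|>`, which mirrors what disabling `addAssistant` is described as doing.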

Let's see what the final prompt looks like.

In [None]:
promptAssembler.transform(df).select("prompt.result").show(truncate=False)

jsl-llama: Extracted 'libjllama.so' to '/tmp/libjllama.so'
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                                                                                                                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                                                                                

Now you can feed the prompt to a Llama 3.1 model loaded with `AutoGGUFModel`. Depending on your messages, you might need to adjust the chat template or system prompt settings in the `AutoGGUFModel`. For example:

In [None]:
from sparknlp.annotator import AutoGGUFModel

autoGGUFModel = (
    AutoGGUFModel.loadSavedModel("path/to/llama3.1", spark)
    .setInputCols("prompt")
    .setOutputCol("completions")
    .setBatchSize(4)
    .setNGpuLayers(99)
    .setUseChatTemplate(False)  # Don't apply the chat template
    .setSystemPrompt(
        "Your system prompt"
    )  # Set custom system prompt if not specified in the messages. Leave empty for default.
)
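
The template used in this notebook only treats a message as the system message when it is the first entry of the conversation. If you prefer to keep the system message in the data rather than on the model, one option is to prepend a default whenever none is present. This is a minimal sketch; `ensure_system_message` is a hypothetical helper, not a Spark NLP function.

```python
def ensure_system_message(messages, default_system_prompt):
    """Return a copy of messages that starts with a ("system", ...) tuple."""
    if messages and messages[0][0] == "system":
        # The conversation already opens with a system message: keep it as-is.
        return list(messages)
    return [("system", default_system_prompt)] + list(messages)


without_system = [("user", "I need help with organizing my room, give me some advice.")]
with_system = ensure_system_message(without_system, "You are a helpful assistant.")
```

The returned list has the same two-tuple shape as `messages`, so it can be placed in the `messages` column before running the `PromptAssembler`.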