<a href="https://colab.research.google.com/github/TrelisResearch/trelis-colab-chat/blob/function-calling/Jupyter_Llama_Function_Calling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# *About Jupyter Llama - with function calling!!!*

A Chat Assistant built on Llama 2.
- Upload PDF or text files for analysis.
- No data goes to OpenAI.
- No data is used for training language models.

Built by Trelis. Find us on [HuggingFace](https://huggingface.co/Trelis).

---
## Getting Set Up
- You can run Trelis Chat on a free Google Colab Notebook.
- Save a copy of this notebook via File -> Save a copy in Drive (optional, but needed if you want to make changes).
- Go to the menu -> Runtime -> Change Runtime Type - Select GPU (T4).
- Then go to Runtime -> Run all.
- Installation takes about 5-7 mins* and happens entirely in the cloud, within this notebook.
- Once all cells have run, you'll find the chat interface at the bottom.

Trelis has no access to your data when you run this notebook. All of your data remains within your Google Drive and Google's computers.

*Optionally, you can uncomment the code below to mount Google Drive. The model is then downloaded to your Google Drive, which brings the total start time on later runs down to about 3 mins.

#### Internet Search Code

In [1]:
from getpass import getpass
import requests
import os
import json

def test_bing_search(api_key, endpoint, query="test"):
    """Make a test request to the Bing API and return whether it was successful."""
    try:
        # Append the API path to the endpoint
        endpoint = endpoint + "/v7.0/search"
        headers = { 'Ocp-Apim-Subscription-Key': api_key }
        params = { 'q': query }
        response = requests.get(endpoint, headers=headers, params=params)
        response.raise_for_status()
        return True
    except Exception as e:
        print(f"An error occurred during the request: {str(e)}")
        return False

def initialize_bing_search():
    # Ask the user if they want to enable Bing Search
    enable_bing_search = input("Do you want to enable Internet Search? (yes/no): ").lower()

    # Check the user's response
    if enable_bing_search == "yes":
        # Provide a link to the instructions for getting the Bing API Key and Endpoint
        print("\nTo enable Internet Search, you'll need a Bing API Key and Endpoint.")
        print("Sign in to https://portal.azure.com")
        print("Click 'Create Resource'")
        print("Search for 'Bing Search v7'")
        print("Click 'Create' to get your key and endpoint")

        while True:
            # Ask the user for their Bing API Key
            API_KEY = getpass("\nPlease enter your Bing API Key: ")
            BING_ENDPOINT = getpass("Please enter your Bing Endpoint: ")

            # Validate the user's inputs
            if API_KEY.strip() == "" or BING_ENDPOINT.strip() == "":
                print("Both API Key and Endpoint are required. Please try again.")
            elif not test_bing_search(API_KEY, BING_ENDPOINT):
                print("Unable to make a request with the provided API Key and Endpoint. Please check them and try again.")
            else:
                return (API_KEY, BING_ENDPOINT)

    elif enable_bing_search == "no":
        print("Internet Search disabled.")
        return None, None

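    else:
        print("Unrecognized input; Internet Search disabled. Re-run this cell and respond with 'yes' or 'no'.")
        # Return a pair so that callers can always unpack the result.
        return None, None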

Do you want to enable Internet Search? (yes/no): yes

To enable Internet Search, you'll need a Bing API Key and Endpoint.
Sign in to https://portal.azure.com
Click 'Create Resource'
Search for 'Bing Search v7'
Click 'Create' to get your key and endpoint

Please enter your Bing API Key: ··········
Please enter your Bing Endpoint: ··········


## Set up Bing Search

In [None]:
API_KEY, BING_ENDPOINT = initialize_bing_search()
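
Both `test_bing_search` and `search_bing` build the request URL by appending `/v7.0/search` to whatever endpoint was typed in, so an endpoint pasted with a trailing slash would end up with a double slash. A minimal sketch of a more forgiving join follows. `build_search_url` is a hypothetical helper, not part of this notebook, and the example endpoint is only an illustrative value.

```python
def build_search_url(endpoint: str, api_path: str = "/v7.0/search") -> str:
    """Join a Bing endpoint and the API path without doubling slashes."""
    # Hypothetical helper: strip whitespace and trailing slashes before joining.
    base = endpoint.strip().rstrip("/")
    # Avoid appending the path twice if the full search URL was pasted.
    if base.endswith(api_path):
        return base
    return base + api_path


print(build_search_url("https://api.bing.microsoft.com/"))
```

The same helper could be called from both functions so that the key check and the real search hit exactly the same URL.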

## Google Drive Mounting (optional, default is off)
- Allows you to download the model to Google Drive for faster startup next time.
- Uncomment the code below to mount Google Drive

In [12]:
import os
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [13]:
## Allow the model to be saved to Google Drive for faster startup next time

# This is the path to the Google Drive folder.
drive_path = "/content/drive"

# This is the path where you want to store your cache.
cache_dir_path = os.path.join(drive_path, "My Drive/huggingface_cache")

# Check if the Google Drive folder exists. If it does, use it as the cache_dir.
# If not, set cache_dir to None to use the default Hugging Face cache location.
if os.path.exists(drive_path):
    cache_dir = cache_dir_path
    os.makedirs(cache_dir, exist_ok=True) # Ensure the directory exists
else:
    cache_dir = None

print(cache_dir)

/content/drive/My Drive/huggingface_cache
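
The cache-directory logic above can also be wrapped in a small function, which makes the fallback to the default Hugging Face cache explicit. This is only a sketch: `pick_cache_dir` is a hypothetical name, and a temporary directory stands in for `/content/drive` so the example runs outside Colab.

```python
import os
import tempfile
from typing import Optional


def pick_cache_dir(drive_root: str, subdir: str = "My Drive/huggingface_cache") -> Optional[str]:
    """Return a cache directory under drive_root, or None if Drive is not mounted."""
    if not os.path.isdir(drive_root):
        return None  # caller falls back to the default Hugging Face cache
    cache_dir = os.path.join(drive_root, subdir)
    os.makedirs(cache_dir, exist_ok=True)  # create the folder on first use
    return cache_dir


# A temporary directory plays the role of the mounted Drive here.
with tempfile.TemporaryDirectory() as fake_drive:
    print(pick_cache_dir(fake_drive))

# An unmounted path yields None.
print(pick_cache_dir("/path/that/does/not/exist"))
```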


# Language Model Installation



## Set Runtime

In [2]:
# Set the runtime to cpu or gpu. Leave as gpu for Google Colab.
runtime = "gpu"  # OR "cpu"

if runtime == "cpu":
    runtimeFlag = "cpu"
elif runtime == "gpu":
    runtimeFlag = "cuda:0"
else:
    print("Invalid runtime. Please set it to either 'cpu' or 'gpu'.")
    runtimeFlag = None

cache_dir = None # by default, don't set a cache directory
print("Runtime flag is:", runtimeFlag)

Runtime flag is: cuda:0
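
The cell above prints a warning and leaves `runtimeFlag` as `None` when the runtime string is invalid, so the error only surfaces later at model load. One alternative, sketched here under the assumption that failing early is preferable, is a lookup that raises immediately. `resolve_device` is a hypothetical name.

```python
def resolve_device(runtime: str) -> str:
    """Map the notebook's runtime setting to a torch device string."""
    devices = {"cpu": "cpu", "gpu": "cuda:0"}
    key = runtime.strip().lower()
    if key not in devices:
        # Fail at configuration time rather than during model loading.
        raise ValueError(f"Invalid runtime {runtime!r}; use 'cpu' or 'gpu'.")
    return devices[key]


print("Runtime flag is:", resolve_device("gpu"))
```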


## Set up Internet Search

In [3]:
pip install azure-cognitiveservices-search-websearch msrest

Collecting azure-cognitiveservices-search-websearch
  Downloading azure_cognitiveservices_search_websearch-2.0.0-py2.py3-none-any.whl (35 kB)
Collecting msrest
  Downloading msrest-0.7.1-py3-none-any.whl (85 kB)
Collecting msrestazure<2.0.0,>=0.4.32 (from azure-cognitiveservices-search-websearch)
  Downloading msrestazure-0.6.4-py2.py3-none-any.whl (40 kB)
Collecting azure-common~=1.1 (from azure-cognitiveservices-search-websearch)
  Downloading azure_common-1.1.28-py2.py3-none-any.whl (14 kB)
Collecting azure-core>=1.24.0 (from msrest)
  Downloading azure_core-1.28.0-py3-none-any.whl (185 kB)
Collecting isodate>=0.6.0 

In [73]:
def search_bing(query):
    try:
        # Append the API path to the endpoint
        endpoint = BING_ENDPOINT + "/v7.0/search"
        headers = { 'Ocp-Apim-Subscription-Key': API_KEY }
        params = { 'q': query }
        response = requests.get(endpoint, headers=headers, params=params)
        response.raise_for_status()

        result = response.json()

        # "webPages" may be missing from the response when there are no web results
        pages = result.get("webPages", {}).get("value", [])

        if pages:
            markdown_results = ""
            for item in pages:
                markdown_results += f"#### [{item['name']}]({item['url']}) \n {item['snippet']} \n\n"
            return markdown_results
        else:
            return "No results found"
    except Exception as e:
        return f"An unexpected error occurred: {e}"

search_bing_metadata = {
        "function": "search_bing",
        "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
        "arguments": [
            {
                "name": "query",
                "type": "string",
                "description": "The search query string"
            }
        ]
    }

In [5]:
FUNCTIONS = {
    "search_bing": search_bing
}
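
The `FUNCTIONS` registry maps the function name the model emits to a Python callable. A minimal sketch of how a model reply might be routed through such a registry is shown below. `dispatch`, `DEMO_FUNCTIONS`, and `echo_query` are hypothetical names, and `echo_query` stands in for `search_bing` so the example runs without an API key.

```python
import json


def echo_query(query):
    """Stand-in for search_bing; returns a string like the real function does."""
    return f"results for: {query}"


# Hypothetical registry with the same shape as FUNCTIONS.
DEMO_FUNCTIONS = {"search_bing": echo_query}


def dispatch(reply: str):
    """Run the requested function if reply is a valid call; otherwise return None."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return None  # plain-text answer, not a function call
    if not isinstance(call, dict):
        return None
    name = call.get("function")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in DEMO_FUNCTIONS:
        return None
    if not isinstance(args, dict):
        return None
    return DEMO_FUNCTIONS[name](**args)


print(dispatch('{"function": "search_bing", "arguments": {"query": "llama"}}'))
print(dispatch("Just a normal answer."))
```

Looking the name up in a dictionary, rather than calling `eval` on model output, keeps the set of callable functions explicit.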

In [6]:
print(search_bing("OpenAI GPT-4"))


[{'id': 'https://api.bing.microsoft.com/api/v7/#WebPages.0', 'name': 'GPT-4 - OpenAI', 'url': 'https://openai.com/product/gpt-4', 'isFamilyFriendly': True, 'displayUrl': 'https://openai.com/product/gpt-4', 'snippet': 'GPT-4 is more creative and collaborative than ever before. It can generate, edit, and iterate with users on creative and technical writing tasks, such as composing songs, writing screenplays, or learning a user’s writing style. Input', 'dateLastCrawled': '2023-08-02T06:07:00.0000000Z', 'language': 'en', 'isNavigational': True}, {'id': 'https://api.bing.microsoft.com/api/v7/#WebPages.1', 'name': 'GPT-4 - OpenAI', 'url': 'https://openai.com/research/gpt-4', 'isFamilyFriendly': True, 'displayUrl': 'https://openai.com/research/gpt-4', 'snippet': 'We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 

In [7]:
# Call function with a test query
test_query = "OpenAI"
results = search_bing(test_query)

print(results)

[{'id': 'https://api.bing.microsoft.com/api/v7/#WebPages.0', 'name': 'OpenAI', 'url': 'https://openai.com/', 'isFamilyFriendly': True, 'displayUrl': 'https://openai.com', 'snippet': 'Careers at OpenAI. Developing safe and beneficial AI requires people from a wide range of disciplines and backgrounds. View careers. I encourage my team to keep learning. Ideas in different topics or fields can often inspire new ideas and broaden the potential solution space. Lilian Weng Applied AI at OpenAI.', 'deepLinks': [{'name': 'API', 'url': 'https://openai.com/product', 'snippet': 'GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses. Learn about GPT-4. Tabs. Advanced reasoning; Creativity; Visual input; Longer context; With broad general knowledge and domain expertise, GPT-4 can follow complex instructions in natural language and solve difficult problems with accuracy.'}, {'name': 'Research', 'url': 'https://openai.com/research/', 'snippet': 'A research agenda for asses

In [67]:
# Set the SYSTEM PROMPT

DEFAULT_SYSTEM_PROMPT = 'You are a helpful assistant that provides accurate and concise responses. Respond in markdown.'

# Assumption: neither flag is defined elsewhere in this notebook, so both are
# derived here from whether Bing Search was configured above.
search_bing_status = API_KEY is not None and BING_ENDPOINT is not None
function_calling = search_bing_status

if function_calling:
    SYSTEM_PROMPT = DEFAULT_SYSTEM_PROMPT + " The following functions are available for you to fetch further data to answer user questions, if relevant:\n\n"

    # Add the metadata of each enabled function
    if search_bing_status:
        SYSTEM_PROMPT += json.dumps(search_bing_metadata, indent=4, separators=(',', ': '))

    SYSTEM_PROMPT += """
                \n\nTo call a function, respond - immediately and only - with a JSON object of the following format:\n{
    "function": "function_name",
    "arguments": {
        "argument1": value1,
        "argument2": value2
    }
}
"""

    SYSTEM_PROMPT += "\n\nDon't make use of functions unless the user asks you to search the web/online/internet."

else:
    SYSTEM_PROMPT = DEFAULT_SYSTEM_PROMPT

print(SYSTEM_PROMPT)


You are a helpful assistant that provides accurate and concise responses. Respond in markdown. The following functions are available for you to fetch further data to answer user questions, if relevant:

{
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}
                

To call a function, respond - immediately and only - with a JSON object of the following format:
{
    "function": "function_name",
    "arguments": {
        "argument1": value1,
        "argument2": value2
    }
}


Don't make use of functions unless the user asks you to search the web/online/internet.
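
The prompt printed above is assembled from three parts: the base instruction, the JSON metadata of each function, and the calling convention. A sketch of the same assembly for an arbitrary list of functions follows. `build_system_prompt` is a hypothetical helper, and the wording is copied from the prompt above rather than from any official Llama 2 specification.

```python
import json


def build_system_prompt(base: str, functions: list) -> str:
    """Append function metadata and calling instructions to a base prompt."""
    if not functions:
        return base  # no function calling: plain assistant prompt
    parts = [
        base,
        "The following functions are available for you to fetch further data "
        "to answer user questions, if relevant:",
    ]
    # One pretty-printed JSON block per available function.
    parts += [json.dumps(meta, indent=4) for meta in functions]
    parts.append(
        "To call a function, respond - immediately and only - with a JSON "
        'object of the following format:\n{"function": "function_name", '
        '"arguments": {"argument1": value1}}'
    )
    return "\n\n".join(parts)


demo_metadata = {
    "function": "search_bing",
    "description": "Search the web for content on Bing.",
    "arguments": [{"name": "query", "type": "string", "description": "The search query string"}],
}
print(build_system_prompt("You are a helpful assistant.", [demo_metadata]))
```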


## Select Language Model

In [11]:
### Select the language model

# # 7B model
# model_name_or_path = "Trelis/Llama-2-7b-chat-hf-function-calling"

# 13B model (recommended for better function prompting precision)
model_name_or_path = "Trelis/Llama-2-13b-chat-hf-function-calling"

### Install

In [14]:
# https://stackoverflow.com/questions/56081324/why-are-google-colab-shell-commands-not-working
import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

In [15]:
## Some of these may not be needed, but bitsandbytes is required for the 4-bit quantized load below
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U einops
!pip install -q -U safetensors
!pip install -q -U torch
!pip install -q -U xformers
!pip install -q -U bitsandbytes
!pip install -q -U pdfminer.six

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for transformers (pyproject.toml) ... done
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for accelerate (pyproject.toml) ... done

### Import

In [16]:
import transformers
import torch
import os
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig

## Load Model
If you have connected to Google Drive, the model will load from there (unless this is your first time connecting, in which case the model will be saved to Drive).
- Takes about 6 mins the first time around.
- Takes about 30s from the second time onwards with Google Drive.
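
A rough back-of-the-envelope estimate shows why 4-bit loading matters on a T4, which has 16 GB of GPU memory. The sketch below assumes a nominal 13 billion parameters and counts weights only; quantization constants, activations, and the KV cache are ignored, so real usage is higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumption: 13e9 is the nominal parameter count of the 13B model.
for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(13e9, bits):.1f} GiB")
```

Under these assumptions the 16-bit weights come to roughly 24 GiB, which does not fit on a T4, while the 4-bit weights come to roughly 6 GiB.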

In [17]:
# A factor of 2.0 allows a max sequence length of 8192 tokens (~6k words).
# Unfortunately, that requires Colab Pro and a V100 or A100 to have sufficient RAM.
extrapolation_factor = 1.0

if runtime == "gpu":
    # Load the model in 4-bit to allow it to fit in a free Google Colab runtime with a CPU and T4 GPU
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,  # also quantizes the quantization constants, saving memory with minimal loss of quality
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        quantization_config=bnb_config,
        device_map='auto',  # for inference use 'auto', for training use device_map={"":0}
        trust_remote_code=True,
        rope_scaling={"type": "dynamic", "factor": extrapolation_factor},  # a factor of 2.0 extends the max sequence length to 8192 tokens
        cache_dir=cache_dir)
else:
    # As far as we know, bitsandbytes cannot be used on CPU only, so the model loads unquantized.
    # This can easily exhaust Colab RAM. Note that bfloat16 can't be used on cpu.
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map=runtimeFlag, trust_remote_code=True, cache_dir=cache_dir)

Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]

In [18]:
print(model.config)

LlamaConfig {
  "_name_or_path": "Trelis/Llama-2-13b-chat-hf-function-calling",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 40,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "quantization_config": {
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 1.0,
    "type": "dynamic"
  },
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.32.

In [19]:
print(model.config.max_position_embeddings*extrapolation_factor)

4096.0


## Set up the Tokenizer

In [20]:
# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir, use_fast=True) # will use the Rust fast tokenizer if available

In [21]:
print("BOS token:", tokenizer.bos_token)
print("EOS token:", tokenizer.eos_token)

BOS token: <s>
EOS token: </s>


In [22]:
from IPython.display import display, HTML, clear_output, Markdown
import textwrap, json
import ipywidgets as widgets
import re, time
from google.colab import files
from pdfminer.high_level import extract_text
import io

In [23]:
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

# max_doc_length = 50
max_doc_length = int(0.75 * model.config.max_position_embeddings*extrapolation_factor)  # max doc length is 75% of the context length
max_doc_words = int(0.75*max_doc_length)
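
The percentages used in this notebook translate into concrete token budgets. The sketch below reproduces the arithmetic from the cell above together with the 85% prompt limit and 10% generation limit used in `generate_response`. `token_budget` is a hypothetical helper for illustration only.

```python
def token_budget(max_position_embeddings: int, extrapolation_factor: float = 1.0) -> dict:
    """Reproduce the notebook's context-length arithmetic."""
    context = max_position_embeddings * extrapolation_factor
    max_prompt_len = int(0.85 * context)   # prompt may use 85% of the context
    max_doc_length = int(0.75 * context)   # uploaded documents are cut to 75%
    return {
        "max_prompt_len": max_prompt_len,
        "max_gen_len": int(0.10 * max_prompt_len),    # new tokens per reply
        "max_doc_length": max_doc_length,
        "max_doc_words": int(0.75 * max_doc_length),  # rough tokens-to-words ratio
    }


print(token_budget(4096, 1.0))
# {'max_prompt_len': 3481, 'max_gen_len': 348, 'max_doc_length': 3072, 'max_doc_words': 2304}
```

With an extrapolation factor of 2.0 every budget roughly doubles, at the cost of the extra memory noted in the model-loading cell.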

In [74]:
def generate_response(dialogs, temperature=0.01, top_p=0.9, logprobs=False):
    torch.cuda.empty_cache()
    # print(json.dumps(dialogs, indent=4))
    max_prompt_len = int(0.85 * model.config.max_position_embeddings*extrapolation_factor)
    max_gen_len = int(0.10 * max_prompt_len)

    prompt_tokens = []
    for dialog in dialogs:
        if dialog[0]["role"] != "system":
            dialog = [
                {
                    "role": "system",
                    "content": SYSTEM_PROMPT,
                }
            ] + dialog
        dialog_tokens = [tokenizer(
            f"{B_INST} {B_SYS}{(dialog[0]['content']).strip()}{E_SYS}{(dialog[1]['content']).strip()} {E_INST}",
            return_tensors="pt",
            add_special_tokens=True
        ).input_ids.to(runtimeFlag)]
        # Walk (assistant, user) pairs; stop before a trailing message that has no partner
        for i in range(2, len(dialog) - 1, 2):
            user_tokens = tokenizer(
                f"{B_INST} {(dialog[i+1]['content']).strip()} {E_INST}",
                return_tensors="pt",
                add_special_tokens=True
            ).input_ids.to(runtimeFlag)
            assistant_w_eos = dialog[i]['content'].strip() + tokenizer.eos_token
            assistant_tokens = tokenizer(
                            assistant_w_eos,
                            return_tensors="pt",
                            add_special_tokens=False
                        ).input_ids.to(runtimeFlag)
            tokens = torch.cat([assistant_tokens, user_tokens], dim=-1)
            dialog_tokens.append(tokens)
        prompt_tokens.append(torch.cat(dialog_tokens, dim=-1))

    input_ids = prompt_tokens[0]
    if len(input_ids[0]) > max_prompt_len:
        return "\n\n **The language model's input limit has been reached. Clear the chat and start afresh!**"

    # print(tokenizer.decode(input_ids[0], skip_special_tokens=True))

    generation_output = model.generate(
        input_ids=input_ids,
        do_sample=True,
        max_new_tokens=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )

    new_tokens = generation_output[0][input_ids.shape[-1]:]
    new_assistant_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    try:
        json_response = json.loads(new_assistant_response)
        if isinstance(json_response, dict) and 'function' in json_response:
            func_name = json_response['function']
            if func_name in FUNCTIONS:
                func_args = json_response.get('arguments', {})
                try:
                    # Add the JSON response to the last dialog as assistant's response
                    dialogs[-1].append({
                        "role": "assistant",
                        "content": new_assistant_response
                    })

                    with output_log:
                        clear_output()
                        for message in dialog_history:
                            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

                    func_result = FUNCTIONS[func_name](**func_args)

                    # print(f"Calling function {func_name} with arguments {func_args}")

                    # Trim the function response to be no more than 3000 characters
                    func_result = func_result[:3000]

                    response = f"Here is the response from that function call: {func_result}. \n\nUse this information, if relevant, to assist in responding to my last question. Do not call a function."
                    # print(response)

                    # Add the function result as user's response to the last dialog
                    dialogs[-1].append({
                        "role": "user",
                        "content": response
                    })

                    with output_log:
                        clear_output()
                        for message in dialog_history:
                            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

                    return generate_response(dialogs, temperature, top_p, logprobs)
                except TypeError as e:
                    return f"An error occurred when calling the function: {e}. Check your arguments."
                except Exception as e:
                    return f"An unexpected error occurred: {e}"
    except json.JSONDecodeError:
        # print(f"Failed to parse the following as JSON: {new_assistant_response}")
        pass

    return new_assistant_response
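
`generate_response` concatenates token IDs turn by turn, which can make the resulting prompt layout hard to picture. The sketch below renders the same layout as a plain string, with the BOS and EOS tokens written out. `render_prompt` is a hypothetical helper, and the whitespace between segments is an approximation, since the real code joins token tensors rather than strings.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = "<s>", "</s>"


def render_prompt(dialog):
    """dialog is [system, user, assistant, user, ...] and ends with a user turn."""
    system, first_user = dialog[0], dialog[1]
    # The system prompt is folded into the first user turn.
    text = (
        f"{BOS}{B_INST} {B_SYS}{system['content'].strip()}{E_SYS}"
        f"{first_user['content'].strip()} {E_INST}"
    )
    # Each later pair is: assistant reply + EOS, then a new BOS-prefixed user turn.
    for i in range(2, len(dialog) - 1, 2):
        assistant, user = dialog[i], dialog[i + 1]
        text += f" {assistant['content'].strip()}{EOS}"
        text += f"{BOS}{B_INST} {user['content'].strip()} {E_INST}"
    return text


demo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Search the web for Llama 2."},
]
print(render_prompt(demo))
```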

In [75]:
def print_wrapped(text):
    # Regular expression pattern to detect code blocks
    code_pattern = r'```(.+?)```'
    matches = list(re.finditer(code_pattern, text, re.DOTALL))

    if not matches:
        # If there are no code blocks, display the entire text as Markdown
        display(Markdown(text))
        return

    start = 0
    for match in matches:
        # Display the text before the code block as Markdown
        before_code = text[start:match.start()].strip()
        if before_code:
            display(Markdown(before_code))

        # Display the code block
        code = match.group(0).strip()  # Extract code block
        display(Markdown(code))  # Display code block

        start = match.end()

    # Display the text after the last code block as Markdown
    after_code = text[start:].strip()  # Text after the last code block
    if after_code:
        display(Markdown(after_code))


def grab_and_shorten_text(max_doc_length):

    uploaded = files.upload()

    file_name = list(uploaded.keys())[0]

    # Check the file extension
    if file_name.endswith('.txt'):
        text = uploaded[file_name].decode()
    elif file_name.endswith('.pdf'):
        pdf_bytes = io.BytesIO(uploaded[file_name])
        text = extract_text(pdf_bytes)
    else:
        raise ValueError('Unsupported file type. Please upload a .txt or .pdf file.')

    with alert_out:
        clear_output()  # Clear the previous alert
        print("Shortening the text...")

    tokens = tokenizer.encode(text, truncation=True, max_length=max_doc_length, return_tensors='pt')

    shortened_text = tokenizer.decode(tokens[0], skip_special_tokens=True)

    return file_name, shortened_text

dialog_history = [{"role": "system", "content": SYSTEM_PROMPT}]

button = widgets.Button(description="Send")
upload_button = widgets.Button(description="Upload .txt or .pdf")
text = widgets.Textarea(layout=widgets.Layout(width='800px'))

output_log = widgets.Output()

# Define the 'Send' button click event handler
def on_button_clicked(b):
    user_input = text.value
    dialog_history.append({"role": "user", "content": user_input})

    text.value = ''

    button.description = 'Processing...'

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

    assistant_response = generate_response([dialog_history])

    button.description = 'Send'

    dialog_history.append({"role": "assistant", "content": assistant_response})

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

button.on_click(on_button_clicked)

# Create an output widget for alerts
alert_out = widgets.Output()

# Define the 'Upload' button click event handler
def on_upload_button_clicked(b):

    file_name, uploaded_text = grab_and_shorten_text(max_doc_length)

    with alert_out:
        clear_output()  # Clear the previous alert
        print(f"Upload successful: {file_name}, processing the file...")

    user_input = f"Uploaded document [{file_name}]: {uploaded_text}"
    dialog_history.append({"role": "user", "content": user_input})

    time.sleep(0.1)  # slight delay to ensure order

    assistant_input = f"You have uploaded text from {file_name}"
    dialog_history.append({"role": "assistant", "content": assistant_input})

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

    with alert_out:
        clear_output()  # Clear the previous alert
        # print(f"File processing completed.")

upload_button.on_click(on_upload_button_clicked)

clear_button = widgets.Button(description="Clear Chat")

def on_clear_button_clicked(b):
    # Clear the dialog history
    dialog_history.clear()
    # Add back the initial system prompt
    dialog_history.append({"role": "system", "content": SYSTEM_PROMPT})
    # Clear the output log
    with output_log:
        clear_output()

clear_button.on_click(on_clear_button_clicked)

def save_chat(b):
    # Serialize the chat history into a JSON string
    chat_json = json.dumps(dialog_history)

    # Write the chat history to a temporary file
    with open('chat_history.json', 'w') as f:
        f.write(chat_json)

    # Download the file
    files.download('chat_history.json')

save_button = widgets.Button(description="Save Chat")
save_button.on_click(save_chat)
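
`print_wrapped` splits a message into prose and fenced code blocks before rendering each piece. The splitting step on its own can be sketched without any display calls, as below. `split_segments` is a hypothetical helper, and the fence marker is built from a repeated backtick purely to keep this example easy to embed.

```python
import re

FENCE = "`" * 3  # three backticks
CODE_PATTERN = re.compile(re.escape(FENCE) + r"(.+?)" + re.escape(FENCE), re.DOTALL)


def split_segments(text):
    """Split text into ("text", ...) and ("code", ...) segments, in order."""
    segments = []
    start = 0
    for match in CODE_PATTERN.finditer(text):
        before = text[start:match.start()].strip()
        if before:
            segments.append(("text", before))
        segments.append(("code", match.group(0).strip()))  # keep the fences
        start = match.end()
    after = text[start:].strip()
    if after:
        segments.append(("text", after))
    return segments


sample = "Here is code:\n" + FENCE + "print(1)" + FENCE + "\nDone."
for kind, body in split_segments(sample):
    print(kind, "->", repr(body))
```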

In [76]:
# Define the function to upload chat
def upload_chat(b):
    # Upload the file
    uploaded = files.upload()

    # Get the file name
    file_name = list(uploaded.keys())[0]

    # Ensure the file is a .json file
    if not file_name.endswith('.json'):
        print('Error: Incorrect file type. Please upload a .json file.')
        return

    # Load the content of the file
    chat_data = uploaded[file_name].decode()

    # Load the JSON data from the file
    try:
        global dialog_history
        dialog_history = json.loads(chat_data)
    except json.JSONDecodeError:
        print('Error: File is not in the correct format. Please upload a properly formatted .json file.')
        return

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

# Create the upload button and set the on_click event handler
upload_chat_button = widgets.Button(description="Upload Chat")
upload_chat_button.on_click(upload_chat)
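
`upload_chat` accepts any valid JSON, so a file with the wrong shape would only fail later, when the messages are rendered. A sketch of a stricter parser is shown below, assuming the same list-of-messages format that `save_chat` writes. `parse_chat_history` and `VALID_ROLES` are hypothetical names.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def parse_chat_history(raw: str) -> list:
    """Parse a saved chat and check that every message has a role and content."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("Chat history must be a JSON list of messages.")
    for index, message in enumerate(data):
        if not isinstance(message, dict):
            raise ValueError(f"Message {index} is not an object.")
        role = message.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            raise ValueError(f"Message {index} has an invalid role: {role!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"Message {index} has no text content.")
    return data


saved = json.dumps([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
])
print(len(parse_chat_history(saved)), "messages loaded")
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone to handle both malformed JSON and a wrong structure.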


In [77]:
from ipywidgets import HBox, VBox

# Create the title with Markdown
title = f"#Trelis Chat\n (uploaded files will be shortened to {max_doc_words} words)\n" + "\n"

# output_log, alert_out, and text are the widgets created in the cells above
first_row = HBox([button, clear_button, upload_button])  # Arrange these buttons horizontally
second_row = HBox([save_button, upload_chat_button])  # Arrange these buttons horizontally

# Arrange the two rows of buttons and other display elements vertically
layout = VBox([output_log, alert_out, text, first_row, second_row])

# Chat

In [78]:
display(Markdown(title))
display(layout)

#Trelis Chat
 (uploaded files will be shortened to 2304 words)



VBox(children=(Output(), Output(), Textarea(value='', layout=Layout(width='800px')), HBox(children=(Button(des…