# *About Jupyter Code Llama*
---
**Jupyter Code Llama**

A Chat Assistant built on Llama 2.

**Jupyter Code Llama PRO**
- Save and re-load chats.
- Upload PDF or text files for analysis.
- Get access [here](https://buy.stripe.com/bIYbJu09v6XN6wE7sG).

Copyright © 2023 Trelis LTD. Commercial License.

Find Trelis on [HuggingFace](https://huggingface.co/Trelis) and [YouTube](https://www.youtube.com/@TrelisResearch).

**Disclaimers:**
- Language models can generate false/inaccurate results.
- This notebook installs software, including llama.cpp, on your computer. Differences between operating systems can cause problems during installation or attempted installation.
- By running this notebook, users accept they are doing so at their sole risk.
- Report any bugs to ronan [at] trelis [dot] com.

## Setup and Installation

[Full instructions](https://github.com/TrelisResearch/install-guides/blob/main/jupyter-lab-setup.md)

In [1]:
!pip install requests


## Install llama.cpp (only needed once)
The cells below build llama.cpp on a Mac, with or without an M1 chip.
For other operating systems, comment out those cells and follow the instructions [here](https://github.com/TrelisResearch/llamacpp-install-basics/blob/main/instructions.md).

In [2]:
# Set the SYSTEM PROMPT
DEFAULT_SYSTEM_PROMPT = 'You are a helpful pair-coding assistant.'
SYSTEM_PROMPT = DEFAULT_SYSTEM_PROMPT

print(SYSTEM_PROMPT)

You are a helpful pair-coding assistant.


In [3]:
# Path of the model file on Hugging Face (the download itself happens in a later cell)
model_name = 'TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q2_K.gguf'

In [4]:
pure_name = model_name.split('/')[-1]
print(pure_name)

codellama-7b-instruct.Q2_K.gguf


In [5]:
parts = model_name.split('/')
model_path = f"{parts[0]}/{parts[1]}"

print(model_path)

TheBloke/CodeLlama-7B-Instruct-GGUF
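
The two cells above split `model_name` by hand. As a minimal sketch, the same pieces can be derived in one place. `parse_model_name` is a hypothetical helper, not part of this notebook, and it assumes the `owner/repo/resolve/branch/file` layout used above.

```python
def parse_model_name(model_name):
    """Split an 'owner/repo/resolve/branch/file' path into its parts (hypothetical helper)."""
    parts = model_name.split('/')
    repo_id = f"{parts[0]}/{parts[1]}"            # the Hugging Face repository
    file_name = parts[-1]                         # the model file name
    url = f"https://huggingface.co/{model_name}"  # full address of the file
    return repo_id, file_name, url

repo_id, file_name, url = parse_model_name(
    'TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q2_K.gguf'
)
print(repo_id)
print(file_name)
print(url)
```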


In [6]:
import os
from IPython.display import display, Markdown

os_choice = '' # Declare the variable outside the function so it's available in all cells

def initialize_os_choice():
    global os_choice

    # Ask the user to select their Operating System
    os_choice = input("Select your Operating System: \n'a' for Mac, \n'b' for Mac with M1 chip, or \n'c' Windows\nYour choice: ").lower()

    # Check the user's response
    if os_choice == 'b':
        print("You chose Mac with M1 chip.")
        process_os_choice()
    elif os_choice == 'a':
        print("You chose Mac (not M1).")
        process_os_choice()
    elif os_choice == 'c':
        print("You chose Windows.")
        display(Markdown("[Click here for llama.cpp install instructions for Windows. Make sure to clone llama.cpp into the same folder as this jupyter notebook](https://github.com/TrelisResearch/llamacpp-install-basics/blob/main/instructions.md)"))
    else:
        print("Invalid input. Please enter a, b, or c based on your Operating System.")

def process_os_choice():
    """Process the choice of OS made by the user."""
    
    if not os.path.exists('llama.cpp'):
        print("Cloning llama.cpp...")
        !git clone https://github.com/ggerganov/llama.cpp
        %cd llama.cpp

        if os_choice == 'a': # Mac (not M1)
            print("Compiling for Mac...")
            !make
            print("Compilation completed!")
        elif os_choice == 'b': # Mac with M1 chip
            print("Compiling for Mac with M1 chip...")
            !LLAMA_METAL=1 make
            print("Compilation completed!")
        else:
            print("You need to install llama.cpp manually.")
        
        %cd ../

    else:
        print("llama.cpp has already been cloned into this directory!")

initialize_os_choice()

Select your Operating System: 
'a' for Mac, 
'b' for Mac with M1 chip, or 
'c' Windows
Your choice:  b


You chose Mac with M1 chip.
llama.cpp has already been cloned into this directory!
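
The cell above asks the user to type their operating system. As an alternative sketch, the standard `platform` module can often infer it. `detect_os_choice` is a hypothetical helper that returns the same `'a'`/`'b'`/`'c'` codes; it assumes Apple Silicon Macs report `arm64` as their machine type, which may not hold if Python runs under Rosetta.

```python
import platform

def detect_os_choice(system=None, machine=None):
    """Map the platform to the notebook's 'a'/'b'/'c' codes; return '' if unrecognised."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == 'Darwin':
        # Assumption: Apple Silicon reports 'arm64', Intel Macs report 'x86_64'
        return 'b' if machine == 'arm64' else 'a'
    if system == 'Windows':
        return 'c'
    return ''

print(detect_os_choice())
```

An empty result would mean falling back to the manual prompt.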


In [7]:
import os

%cd llama.cpp

if not os.path.exists(pure_name):
    !wget https://huggingface.co/{model_name}
else:
    print(f"{pure_name} already exists!")

%cd ../

/Users/ronanmcgovern/jupyter_code_llama/llama.cpp
codellama-7b-instruct.Q2_K.gguf already exists!
/Users/ronanmcgovern/jupyter_code_llama
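
The download cell relies on `wget`, which is not present on every system. A minimal sketch of the same "download only if missing" step using the Python standard library follows. `download_if_missing` is a hypothetical name, and the demo uses a throwaway file so that it runs without network access.

```python
import os
import tempfile
import urllib.request

def download_if_missing(url, dest_path):
    """Download url to dest_path unless the file already exists. Returns True if downloaded."""
    if os.path.exists(dest_path):
        print(f"{dest_path} already exists!")
        return False
    # Download under a temporary name so an interrupted transfer is not mistaken for a full file
    tmp_path = dest_path + '.part'
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, dest_path)
    return True

# Demo: the file is already there, so nothing is fetched
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'model.gguf')
    open(existing, 'wb').close()
    skipped = download_if_missing('https://example.invalid/model.gguf', existing)
```

In this notebook the call would look like `download_if_missing(f"https://huggingface.co/{model_name}", os.path.join('llama.cpp', pure_name))`.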


In [8]:
!pip install transformers #for the tokenizer so we can shorten text.
!pip install torch


In [9]:
!pip install PyPDF2
!pip install ipywidgets
!pip install jupyterlab_widgets


In [10]:
# Import necessary libraries if not already imported
import ipywidgets as widgets
from IPython.display import display

# Define the radio button widget with default value
radio = widgets.RadioButtons(
    options=['High Speed (512 tokens)', 'Long Context Length (4096 tokens)'],
    value='Long Context Length (4096 tokens)',  # Set default value
    description='Preference:',
    disabled=False,
)

# Default to Long Context Length (4096 tokens); initialize_preference() below can change this
context_length = 4096
max_doc_length = int(0.75 * context_length)
max_doc_tokens = int(0.75 * max_doc_length)
n_predict = int(0.2 * context_length)

def initialize_preference():
    global context_length, max_doc_length, max_doc_tokens, n_predict
    
    # Ask the user to choose their preference
    choice = input("Select your preference: \n'a' for High Speed (512 tokens), \n'b' for Long Context Length (4096 tokens)\nYour choice: ").lower()

    # Check the user's response
    if choice == 'a':
        context_length = 512
        print("You chose High Speed (512 tokens).")
    elif choice == 'b':
        context_length = 4096
        print("You chose Long Context Length (4096 tokens).")
    else:
        print(f"Invalid input. Keeping the current context length ({context_length} tokens).")
    
    # Recompute the values derived from context_length
    max_doc_length = int(0.75 * context_length) # Makes inference faster
    max_doc_tokens = int(0.75 * max_doc_length)
    n_predict = int(0.2 * context_length)

initialize_preference()

Select your preference: 
'a' for High Speed (512 tokens), 
'b' for Long Context Length (4096 tokens)
Your choice:  a


You chose High Speed (512 tokens).
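
To make the arithmetic in the cell above concrete, here is a small sketch that applies the same ratios to both context lengths. `token_budget` is a hypothetical helper introduced only for illustration.

```python
def token_budget(context_length):
    """Derive the limits from context_length using the ratios in initialize_preference()."""
    max_doc_length = int(0.75 * context_length)
    max_doc_tokens = int(0.75 * max_doc_length)
    n_predict = int(0.2 * context_length)
    return {
        "max_doc_length": max_doc_length,
        "max_doc_tokens": max_doc_tokens,
        "n_predict": n_predict,
    }

for ctx in (512, 4096):
    print(ctx, token_budget(ctx))
```

With 512 tokens of context this leaves 288 tokens for an uploaded document and 102 tokens for the reply; with 4096 tokens the figures are 2304 and 819.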


In [11]:
import os
import subprocess
import threading

def start_server():
    base_command = f"./server -m {pure_name} -c {context_length} --port 8081"
    
    # If the user has a Mac with M1 chip (os_choice 'b'), offload layers to the GPU
    if os_choice == 'b':
        command = f"{base_command} -ngl 48"
    else:
        command = base_command
    
    # Run from the llama.cpp folder via cwd rather than os.chdir, which would change
    # the working directory of the whole notebook for as long as the server runs
    process = subprocess.Popen(command, shell=True, cwd='llama.cpp', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() blocks until the server exits, so the output is printed only at shutdown
    out, err = process.communicate()
    
    print(out.decode())
    if err:
        print(err.decode())

thread = threading.Thread(target=start_server)
thread.start()
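
The server starts in a background thread, so later cells may run before it accepts connections. A minimal sketch for waiting until the port is open follows. `wait_for_port` is a hypothetical helper; an open port only shows that something is listening, not that the model is ready to answer.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet: wait a little and try again
            time.sleep(interval)
    return False

# Usage with the port chosen above:
# if not wait_for_port('127.0.0.1', 8081):
#     print("The server did not start in time.")
```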

## Set up the User Interface

In [12]:
# !pip install --upgrade pip

In [13]:
from IPython.display import display, HTML, clear_output, Markdown, FileLink
import textwrap, json
import ipywidgets as widgets
import re, time
# from google.colab import files
import io
from PyPDF2 import PdfReader

In [14]:
import PyPDF2
# if this fails you may need to restart the kernel: Menu -> Kernel -> Restart

In [15]:
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
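
These markers delimit instruction blocks and the system prompt in the Llama 2 chat format. As a minimal sketch of the layout for a single turn, assuming one system prompt and one user message, the pieces fit together as follows. `build_single_turn_prompt` is a hypothetical helper for illustration.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_single_turn_prompt(system_prompt, user_message):
    """Wrap a system prompt and one user message in the chat markers."""
    return f"{B_INST} {B_SYS}{system_prompt.strip()}{E_SYS}{user_message.strip()} {E_INST}"

prompt = build_single_turn_prompt(
    'You are a helpful pair-coding assistant.',
    'Write a function that reverses a string.',
)
print(prompt)
```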

In [16]:
from transformers import AutoTokenizer
import requests
import json

tokenizer_path = 'Trelis/Llama-2-7b-chat-hf-function-calling-GGML'

tokenizer=AutoTokenizer.from_pretrained(tokenizer_path)

def generate_response(dialogs, n_predict):
    # Endpoint for the completion
    url = 'http://127.0.0.1:8081/completion'
    
    prompt_tokens = []

    for dialog in dialogs:
        if dialog[0]["role"] != "system":
            dialog = [{
                "role": "system",
                "content": SYSTEM_PROMPT,
            }] + dialog
    
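        dialog_string = f"{B_INST} "
        for entry in dialog:
            content = entry["content"].strip()
            if entry["role"] == "system":
                dialog_string += f"{B_SYS}{content}{E_SYS}"
            elif entry["role"] == "user":
                dialog_string += f"{content} {E_INST} "  # close the instruction block
            else:
                # An assistant reply is followed by a new [INST] block for the next user turn
                dialog_string += f"{content} {B_INST} "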
        
        prompt_tokens.append(dialog_string.strip())

    # Combine all dialogs into one string
    prompt_text = " ".join(prompt_tokens)
    
    # Tokenize without truncation so that an over-long prompt can actually be detected
    input_ids = tokenizer.encode(prompt_text)
    
    if len(input_ids) > context_length:
        return "**The language model's input limit has been reached. Clear the chat and start afresh!**"
    
    # Parameters for the request
    params = {
        "prompt": prompt_text,
        "n_predict": n_predict,
        "n_ctx": context_length
    }
    
    # Send the POST request
    response = requests.post(url, json=params)

    if response.status_code != 200:
        return f"Error: {response.status_code}, {response.text}"

    new_assistant_response = json.loads(response.text)['content']

    return new_assistant_response

In [18]:
def print_wrapped(text):
    # Regular expression pattern to detect code blocks
    code_pattern = r'```(.+?)```'
    matches = list(re.finditer(code_pattern, text, re.DOTALL))

    if not matches:
        # If there are no code blocks, display the entire text as Markdown
        display(Markdown(text))
        return

    start = 0
    for match in matches:
        # Display the text before the code block as Markdown
        before_code = text[start:match.start()].strip()
        if before_code:
            display(Markdown(before_code))

        # Display the code block
        code = match.group(0).strip()  # Extract code block
        display(Markdown(code))  # Display code block

        start = match.end()

    # Display the text after the last code block as Markdown
    after_code = text[start:].strip()  # Text after the last code block
    if after_code:
        display(Markdown(after_code))

dialog_history = [{"role": "system", "content": SYSTEM_PROMPT}]

button = widgets.Button(description="Send")
upload_button = widgets.Button(description="Upload .txt or .pdf")
text = widgets.Textarea(layout=widgets.Layout(width='800px'))

output_log = widgets.Output()

from functools import partial

def on_button_clicked(b):
    user_input = text.value
    dialog_history.append({"role": "user", "content": user_input})

    text.value = ''

    # Change button description and color, and disable it
    button.description = 'Processing...'
    button.style.button_color = '#ff6e00'  # Use hex color codes for better color choices
    button.disabled = True  # Disable the button when processing

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

    assistant_response = generate_response([dialog_history], n_predict)

    # Re-enable the button, reset description and color after processing
    button.description = 'Send'
    button.style.button_color = 'lightgray'
    button.disabled = False

    dialog_history.append({"role": "assistant", "content": assistant_response})

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')

button.on_click(on_button_clicked)

# Create an output widget for alerts
alert_out = widgets.Output()

clear_button = widgets.Button(description="Clear Chat")

def on_clear_button_clicked(b):
    # Clear the dialog history
    dialog_history.clear()
    # Add back the initial system prompt
    dialog_history.append({"role": "system", "content": SYSTEM_PROMPT})
    # Clear the output log
    with output_log:
        clear_output()

clear_button.on_click(on_clear_button_clicked)
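
`print_wrapped` above walks through a reply and alternates between prose and fenced code. The same splitting logic can be sketched as a pure function that returns the segments instead of displaying them, which makes it easier to inspect. `split_code_blocks` is a hypothetical helper, not part of the notebook.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example does not contain a literal fence

def split_code_blocks(text):
    """Return ('text' | 'code', segment) tuples in the order they appear."""
    pattern = re.compile(f"{FENCE}(.+?){FENCE}", re.DOTALL)
    segments = []
    start = 0
    for match in pattern.finditer(text):
        before = text[start:match.start()].strip()
        if before:
            segments.append(('text', before))
        segments.append(('code', match.group(0).strip()))
        start = match.end()
    after = text[start:].strip()
    if after:
        segments.append(('text', after))
    return segments

sample = f"Here is code:\n{FENCE}python\nprint('hi')\n{FENCE}\nDone."
segments = split_code_blocks(sample)
for kind, segment in segments:
    print(kind, repr(segment))
```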


In [19]:
from IPython.display import display, HTML
from ipywidgets import HBox, VBox

# Create the title with HTML
title = f"<h1 style='color: #ff6e00;'>Jupyter Code Llama 🦙 💻</h1> <p>(uploaded files will be shortened to {max_doc_tokens} tokens)</p>"

# output_log, alert_out, and text are the widgets created in the previous cell
first_row = HBox([button, clear_button])  # Arrange the buttons horizontally

# Stack the chat log, alerts, text box, and button row vertically
layout = VBox([output_log, alert_out, text, first_row])
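
The title above says uploaded files are shortened to `max_doc_tokens` tokens, but the code that does this is not part of this notebook. A minimal sketch of how such shortening could work follows, assuming a tokenizer object with `encode` and `decode` methods like the Hugging Face tokenizer loaded earlier. `shorten_to_tokens` and `WhitespaceTokenizer` are hypothetical; the latter is a stand-in so the example runs without downloading anything.

```python
class WhitespaceTokenizer:
    """Stand-in for illustration only: one token per whitespace-separated word."""
    def encode(self, text, add_special_tokens=False):
        return text.split()

    def decode(self, tokens):
        return " ".join(tokens)

def shorten_to_tokens(text, tokenizer, max_tokens):
    """Keep at most max_tokens tokens from the start of text."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    if len(token_ids) <= max_tokens:
        return text
    return tokenizer.decode(token_ids[:max_tokens])

shortened = shorten_to_tokens("one two three four five", WhitespaceTokenizer(), 3)
print(shortened)
```

With the real tokenizer the call would be `shorten_to_tokens(document_text, tokenizer, max_doc_tokens)`, where `document_text` is whatever text was extracted from the file.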

In [20]:
display(HTML(title))  # Use HTML function to display the title
display(layout)

VBox(children=(Output(), Output(), Textarea(value='', layout=Layout(width='800px')), HBox(children=(Button(des…