## Start Jupyter Lab

The first time you run the project, you need to set up JupyterLab. The `venv` module ships with Python 3, so it does not need to be installed with `pip`. Open a terminal and create a virtual environment:

```zsh
python -m venv myvenv
```

Activate the virtual environment:

On macOS, run:
```zsh
source /Users/{username}/.../moho_bot/myvenv/bin/activate
```

Install jupyterlab:
```zsh
pip install jupyterlab
```

Install ipykernel:
```zsh
pip install ipykernel
```

Make the virtual environment available to JupyterLab as a kernel with:
```zsh
python -m ipykernel install --user --name=myvenv
```

Start Jupyter with the command:
```zsh
jupyter lab
```
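
Once JupyterLab is running, it can be worth confirming that the selected kernel really uses the virtual environment. The check below is a minimal sketch and not part of the original setup; the helper name `in_virtualenv` is illustrative. It relies on the standard-library behavior that `sys.prefix` differs from `sys.base_prefix` inside a `venv`.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("Running inside a virtual environment:", in_virtualenv())
```

If this prints `False` in a notebook cell, the notebook is probably using a different kernel than `myvenv`.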

## Prerequisites

Set these directories to match your machine before running the rest of the notebook:

In [1]:
llamacpp_directory = '/Users/astridz/Documents/llama.cpp'
local_directory = '/Users/astridz/Documents/moho_bot'
docker_directory = '/Users/astridz'
check_model_directory = '/Users/astridz/Documents'
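
Because every later cell depends on these paths, a quick existence check can catch typos early. The helper below is a sketch under stated assumptions: `missing_directories` is a hypothetical name, and the example paths are placeholders rather than the notebook's real directories.

```python
import os
from typing import Dict, List

def missing_directories(paths: Dict[str, str]) -> List[str]:
    """Return the names of the entries whose path is not an existing directory."""
    return [name for name, path in paths.items() if not os.path.isdir(path)]

# Example with placeholder paths; in the notebook you would pass the
# variables defined above (llamacpp_directory, local_directory, ...).
example = {"home": os.path.expanduser("~"), "bogus": "/path/that/does/not/exist"}
print("Missing:", missing_directories(example))
```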

Install the other essential packages by uncommenting the lines below. The remaining packages should already have been installed during the setup step:

In [2]:
# %pip install -q ipywidgets 
# %pip install -q PyPDF2
# %pip install -q langchain
# %pip install -q -r requirements.txt

# Initialize the Epsilla Vector Database

Docker must already be installed on your computer. The cells below start Docker, then run two commands (`docker pull` and `docker run`) to install and launch the Epsilla vector database Docker image. Project page:

https://github.com/epsilla-cloud/vectordb

In [3]:
# killall Docker
# open -a Docker

Open Docker:

In [4]:
!open -a Docker

Pull the Epsilla image and start the container:

In [5]:
import os
# <!-- docker installed in "/Users/astridz/Documents/Moho_Bot/myvenv/lib/python3.10/site-packages/docker" -->
#service docker restart
os.chdir(docker_directory)
!docker pull epsilla/vectordb
!docker run --pull=always -d -p 8888:8888 epsilla/vectordb
os.chdir(local_directory)

Using default tag: latest
latest: Pulling from epsilla/vectordb

b88252ef: Already exists
bd3bcbe5: Pulling fs layer
ed6c869c: Pulling fs layer
0303ba53: Pull complete
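
After the container starts, the database should be listening on port 8888 (the port mapped by `-p 8888:8888` above). The sketch below, which is not part of the original notebook, checks this with a plain TCP connection; `is_port_open` is a hypothetical helper name. A successful connection only shows that something is listening on the port, not that it is Epsilla.

```python
import socket

def is_port_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False

print("Port 8888 reachable:", is_port_open())
```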

## Train vector space

source code: https://api.python.langchain.com/en/latest/_modules/langchain/vectorstores/epsilla.html

In [33]:
from langchain.document_loaders import TextLoader 
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import Epsilla
from sentence_transformers import SentenceTransformer
from typing import List
from glob import glob
from pyepsilla import vectordb
from langchain.vectorstores import Chroma

In [34]:
def reading(fileName):
    # Read and return the whole file; the context manager closes it even on errors
    with open(fileName, "r") as f:
        return f.read()

# Local embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')
class LocalEmbeddings():
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return model.encode(texts).tolist()
embeddings = LocalEmbeddings()


files = glob("./Documents_collection/*")
splitted_documents = []
splitter = RecursiveCharacterTextSplitter( separators=[" ", ",", "\n"],chunk_size=1000, chunk_overlap=200)

for file in files:
    loader = TextLoader(file)
    documents = loader.load()
    split_docs = splitter.split_documents(documents)
    # print("Splitted document chunk size for current file:", len(split_docs))
    splitted_documents.extend(split_docs)
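
To illustrate what `chunk_size=1000` and `chunk_overlap=200` mean, here is a simplified fixed-window sketch in plain Python. It is an assumption-laden illustration, not the algorithm used by `RecursiveCharacterTextSplitter`, which additionally tries to break on the listed separators; `chunk_text` is a hypothetical helper.

```python
from typing import List

def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Small numbers make the overlap easy to see
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

Each chunk repeats the last two characters of the previous one, which is how overlap preserves context across chunk boundaries.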

Initialize vector database:

In [35]:
client = vectordb.Client()

[INFO] Connected to localhost:8888 successfully.


Train vectordb:

In [9]:

try:
    vector_store = Epsilla.from_documents(
        splitted_documents,
        embeddings,
        client,
        db_path="/tmp/localchatdb",
        db_name="LocalChatDB",
        collection_name="LocalChatCollection"
    )
except Exception as e:
    print("An error occurred:", e)

# Install llama.cpp and llama2 model

In [36]:
# Download the model file
model_name = 'TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf'
pure_name = model_name.split('/')[-1]
parts = model_name.split('/')
model_path = f"{parts[0]}/{parts[1]}"
print("Your model name is", pure_name)

Your model name is llama-2-7b-chat.Q4_K_M.gguf
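
The string handling above can be wrapped in a small function, which makes it easier to swap in another model. This is a sketch: `split_model_reference` is a hypothetical name, and the URL pattern is the `https://huggingface.co/{model_name}` form that the notebook's `wget` cell uses.

```python
from typing import Tuple

def split_model_reference(model_name: str) -> Tuple[str, str, str]:
    """Split 'owner/repo/resolve/main/file.gguf' into (repo_id, filename, url)."""
    parts = model_name.split("/")
    repo_id = "/".join(parts[:2])   # e.g. 'TheBloke/Llama-2-7b-Chat-GGUF'
    filename = parts[-1]            # e.g. 'llama-2-7b-chat.Q4_K_M.gguf'
    url = f"https://huggingface.co/{model_name}"
    return repo_id, filename, url

repo_id, filename, url = split_model_reference(
    "TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf"
)
print(repo_id, filename, url, sep="\n")
```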


In [37]:
os.chdir(check_model_directory)
if not os.path.exists('llama.cpp'):
    print("Cloning llama.cpp...")
    !git clone https://github.com/ggerganov/llama.cpp
    %cd llama.cpp
    print("Compiling for Mac with M1 chip...")
    !LLAMA_METAL=1 make
    print("Compilation completed!")
else:
    print("llama.cpp has already been cloned into this directory!")
# Return to the project directory set in the Prerequisites cell
%cd {local_directory}
print("current directory is ", os.getcwd())

llama.cpp has already been cloned into this directory!
/Users/astridz/Documents/moho_bot
current directory is  /Users/astridz/Documents/moho_bot


In [38]:
# Set the working directory to llama.cpp
os.chdir(llamacpp_directory)
if not os.path.exists(pure_name):
    !wget https://huggingface.co/{model_name}
else:
    print(f"{pure_name} already exists!")
# Return to the project directory set in the Prerequisites cell
%cd {local_directory}

llama-2-7b-chat.Q4_K_M.gguf already exists!
/Users/astridz/Documents/moho_bot


## Set up the User Interface

In [39]:
import io
import json
import os
import re
import subprocess
import textwrap
import threading
import time
from functools import partial

import ipywidgets as widgets
import PyPDF2
from IPython.display import display, HTML, clear_output, Markdown, FileLink
from ipywidgets import HBox, VBox
from PyPDF2 import PdfReader

In [40]:
def print_wrapped(text):
    # Regular expression pattern to detect code blocks
    code_pattern = r'```(.+?)```'
    matches = list(re.finditer(code_pattern, text, re.DOTALL))
    if not matches:
        # If there are no code blocks, display the entire text as Markdown
        display(Markdown(text))
        return
    start = 0
    for match in matches:
        # Display the text before the code block as Markdown
        before_code = text[start:match.start()].strip()
        if before_code:
            display(Markdown(before_code))
        # Display the code block
        code = match.group(0).strip()  # Extract code block
        display(Markdown(code))  # Display code block
        start = match.end()
    # Display the text after the last code block as Markdown
    after_code = text[start:].strip()  # Text after the last code block
    if after_code:
        display(Markdown(after_code))
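
The splitting logic in `print_wrapped` can be checked without a notebook front end by returning the segments instead of displaying them. The sketch below mirrors the same regular expression; `split_segments` is a hypothetical helper, and the fence marker is built with `"`" * 3` only to avoid writing three literal backticks inside this example.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # the three-backtick code fence marker
CODE_PATTERN = re.compile(FENCE + r"(.+?)" + FENCE, re.DOTALL)

def split_segments(text: str) -> List[Tuple[str, str]]:
    """Split text into ('text', ...) and ('code', ...) segments, in order."""
    segments = []
    start = 0
    for match in CODE_PATTERN.finditer(text):
        before = text[start:match.start()].strip()
        if before:
            segments.append(("text", before))
        # Keep the fence markers so the segment still renders as code in Markdown
        segments.append(("code", match.group(0).strip()))
        start = match.end()
    after = text[start:].strip()
    if after:
        segments.append(("text", after))
    return segments

sample = "Run this:\n" + FENCE + "print('hi')" + FENCE + "\nDone."
for kind, segment in split_segments(sample):
    print(kind, "->", repr(segment))
```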

In [None]:
button = widgets.Button(description="Send")
usertext = widgets.Textarea(layout=widgets.Layout(width='800px'))
output_log = widgets.Output()

['./main', '-m', 'llama-2-7b-chat.Q4_K_M.gguf', '--multiline-input', '-ngl', '8', '-p', "[INST]<<SYS>>\nAnswer the user question followed the rules:     1. Do not copy the context in your answer. Try to understand the context and rephrase them.     2. Please don't make things up or say things not mentioned in the Context.     3. You can trust the context.     4. Give a short response! Your answer should be in 200 words.    The context is: conducted in Spanish unless indicated otherwise.Students contemplating study abroad in Spain or Latin America are encouraged to elect a Spanish course in the first semester of their first year.Course OfferingsSPAN-101  Elementary SpanishFall and Spring.Credits: 4An interactive introduction to the Spanish language and Hispanic cultures.This course emphasizes communication through extensive oral practice in class in order to provide students with an immersion experience.Covers basic grammar structures to equip students to communicate about personal info

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Log start
main: build = 1376 (9e24cc6)
main: built with  for unknown
main: seed  = 1701194798
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.








## Vector

In [42]:
from langchain.vectorstores import Epsilla
from pyepsilla import vectordb
from sentence_transformers import SentenceTransformer
from typing import List
from langchain import PromptTemplate
from langchain import LLMChain
from langchain.llms import CTransformers
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

In [43]:
model = SentenceTransformer('all-MiniLM-L6-v2')
class LocalEmbeddings():
    def embed_query(self, text: str) -> List[float]:
        return model.encode(text).tolist()
    
embeddings = LocalEmbeddings()
# Connect to Epsilla as knowledge base.
client = vectordb.Client()
vector_store = Epsilla(
    client,
    embeddings,
    db_path="/tmp/localchatdb",
    db_name="LocalChatDB"
)

vector_store.use_collection("LocalChatCollection")

[INFO] Connected to localhost:8888 successfully.
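
Conceptually, a similarity search embeds the question with `embed_query` and returns the stored chunks whose vectors are closest to it. The toy example below shows that idea with cosine similarity in plain Python. It is an illustration only: the notebook does not state which distance metric Epsilla uses, and `cosine_similarity` and `top_k` are hypothetical helpers, not part of `pyepsilla` or LangChain.

```python
import math
from typing import List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-dimensional "embeddings"; real ones from all-MiniLM-L6-v2 are much longer
stored = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], stored, k=2))
```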


# Run Chatbot

In [44]:
DEFAULT_PROMPT = '''You are a helpful question answer assistant.'''
# SYSTEM_PROMPT ='''Answer the user question followed the rules:1. Do not copy the context in your answer. Try to understand the Context and rephrase them. 2. Please don't make things up or say things not mentioned in the Context. 3. You can trust the context. 4. Give a short response! Your answer should be in 200 words. The context is: '''

In [45]:
#initialize the dialog
os.chdir(local_directory)
dialog_history = [{"role": "system", "content": DEFAULT_PROMPT}]

In [46]:
def generate_response(process, output_widget):
    # The subprocess output is expected to start with an echo of the prompt,
    # so every line is ignored until the closing [/INST] tag has been seen.
    prompt_done = False
    first_line = True
    while True:
        output = process.stdout.readline()
        if output == '':
            # readline() returns '' only at end of stream: the process has finished
            break
        if not prompt_done:
            inst_index = output.find('[/INST]')
            if inst_index == -1:
                # Still inside the echoed prompt ([INST], <<SYS>>, rules, context)
                continue
            prompt_done = True
            # Keep only the text after [/INST]
            output = output[inst_index + len('[/INST]'):]
        assistant_response = output.strip()
        if not assistant_response:
            continue
        dialog_history.append({"role": "assistant", "content": assistant_response})
        # Label only the first line of the answer
        prefix = f'**{"assistant".capitalize()}**: ' if first_line else ''
        first_line = False
        # Update the output widget
        with output_widget:
            print_wrapped(f'{prefix}{assistant_response}\n')
    # Close the pipe once, after the whole output has been read
    process.stdout.close()
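
The filtering idea in `generate_response` (discard the echoed prompt, keep what follows `[/INST]`) can be tested without launching llama.cpp by feeding it a list of lines. The sketch below assumes the subprocess echoes the prompt before the answer, as the `[/INST]` check in the function implies; `extract_reply` is a hypothetical helper.

```python
from typing import Iterable, List

def extract_reply(lines: Iterable[str], end_tag: str = "[/INST]") -> List[str]:
    """Return the non-empty answer lines that follow the first end_tag."""
    reply = []
    prompt_done = False
    for line in lines:
        if not prompt_done:
            if end_tag not in line:
                continue  # still inside the echoed prompt
            prompt_done = True
            line = line.split(end_tag, 1)[1]  # keep only the text after the tag
        line = line.strip()
        if line:
            reply.append(line)
    return reply

# Made-up sample output, for illustration only
sample_output = [
    "[INST]<<SYS>>\n",
    "Answer the user question...\n",
    "<</SYS>>\n",
    "\n",
    "What is SPAN-101?[/INST] It is an introductory course.\n",
    "\n",
    "It is offered in Fall and Spring.\n",
]
print(extract_reply(sample_output))
```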

In [50]:
# Called when the user clicks the Send button
def on_button_clicked(b):
    question = usertext.value
    dialog_history.append({"role": "user", "content": question})
    usertext.value = ''

    # Change button description and color, and disable it
    button.description = 'Processing...'
    button.style.button_color = '#ff6e00'  # Use hex color codes for better color choices
    button.disabled = True  # Disable the button when processing

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')
            
    context = ''.join(map(lambda doc: doc.page_content, vector_store.similarity_search(question, k = 1)))
    
    # Build the system prompt that carries the rules and the retrieved context
    SYSTEM_PROMPT = f'''Answer the user question followed the rules: \
    1. Do not copy the context in your answer. Try to understand the context and rephrase them. \
    2. Please don't make things up or say things not mentioned in the Context. \
    3. You can trust the context. \
    4. Give a short response! Your answer should be in 200 words.\
    The context is: {context}.'''
    # Wrap it in the Llama 2 chat markers around the user question
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    template = B_INST + B_SYS + SYSTEM_PROMPT + E_SYS + question + E_INST

    # Start the subprocess and the threading to handle its output
    if (os.getcwd() != llamacpp_directory):
        os.chdir(llamacpp_directory)
    args = ['./main', '-m', pure_name, '-c', '1024', '--multiline-input', '-ngl', '8', '-p', template]
    print(args)
    process = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    # Start the thread that will handle the subprocess output
    output_thread = threading.Thread(target=generate_response, args=(process, output_log))
    output_thread.start()

    # Wait for the subprocess and thread to finish
    process.wait()
    output_thread.join()

    # Re-enable the button, reset description and color after processing
    button.description = 'Send'
    button.style.button_color = 'lightgray'
    button.disabled = False
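
The prompt assembly inside `on_button_clicked` can be isolated into a pure function, which makes the exact string passed to `-p` easy to inspect. This is a sketch that reuses the same markers as the cell above; `build_llama2_prompt` is a hypothetical name.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_llama2_prompt(system_prompt: str, question: str) -> str:
    """Wrap a system prompt and a user question in the Llama 2 chat markers."""
    return B_INST + B_SYS + system_prompt + E_SYS + question + E_INST

# repr() makes the embedded newlines visible
print(repr(build_llama2_prompt("Be brief.", "What is SPAN-101?")))
```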

In [48]:
button.on_click(on_button_clicked)
alert_out = widgets.Output()
clear_button = widgets.Button(description="Clear Chat")
quit_button = widgets.Button(description="Force Quit")
text = widgets.Textarea(layout=widgets.Layout(width='800px'))

def on_clear_button_clicked(b):
    # Clear the dialog history
    dialog_history.clear()
    # Add back the initial system prompt
    dialog_history.append({"role": "system", "content": DEFAULT_PROMPT})
    # Clear the output log
    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')
            
clear_button.on_click(on_clear_button_clicked)
# Create the title with HTML
title = f"<h1 style='color: #ff6e00;'>MohoBot 🦙</h1> <p> Enter your question! </p>"
with output_log:
    clear_output()
# Assuming that output_log, alert_out, and text are other widgets or display elements...
first_row = HBox([button, clear_button, quit_button])  # Arrange these buttons horizontally
# Arrange the two rows of buttons and other display elements vertically
layout = VBox([output_log, alert_out, usertext, first_row])
display(HTML(title))  # Use HTML function to display the title
display(layout)

VBox(children=(Output(), Output(), Textarea(value='', layout=Layout(width='800px')), HBox(children=(Button(des…
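
The `Force Quit` button is created above, but no click handler is attached to it in this cell. As a sketch of what such a handler could call, the function below stops a running subprocess. It is an assumption, not the notebook's implementation: `stop_process` is a hypothetical helper, and wiring it to `quit_button.on_click` would also require keeping a reference to the running llama.cpp process.

```python
import subprocess
import sys

def stop_process(process: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate a subprocess if it is still running and return its exit code."""
    if process.poll() is None:          # None means the process is still running
        process.terminate()             # polite request to stop
        try:
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()              # force the process to stop
            process.wait()
    return process.returncode

# Demonstration with a harmless stand-in process instead of llama.cpp
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
stop_process(demo)
print("Stand-in process stopped:", demo.poll() is not None)
```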