## Start Jupyter Lab

The first time you run the notebook, you need to set up Jupyter Lab. The `venv` module ships with Python 3, so it does not need to be installed with `pip`. Open a terminal and create a virtual environment:

```zsh
python -m venv myvenv
```

Activate the virtual environment:

On macOS, run:
```zsh
source /Users/{username}/.../moho_bot/myvenv/bin/activate
```

Install jupyterlab:
```zsh
pip install jupyterlab
```

Install ipykernel:
```zsh
pip install ipykernel
```

Make the virtual environment available to Jupyter Lab as a kernel:
```zsh
python -m ipykernel install --user --name=myvenv
```

Start Jupyter with the command:
```zsh
jupyter lab
```
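
Once Jupyter Lab is running, it can be useful to confirm that the notebook kernel is really using the virtual environment. The following is a minimal sketch, assuming a standard `venv`-style environment; the helper name `in_virtualenv` is hypothetical and not part of this project:

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, the notebook is probably running on a different kernel than `myvenv`.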

## Prerequisites

Update the directory paths below to match your machine before running the rest of the notebook:

In [1]:
llamacpp_directory = '/Users/astridz/Documents/llama.cpp'
local_directory = '/Users/astridz/Documents/moho_bot'
docker_directory = '/Users/astridz'
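
Because later cells call `os.chdir` on these paths, a wrong path only fails much later. A small sketch of an early check, assuming nothing beyond the standard library; `missing_directories` is a hypothetical helper, and the paths in the example are placeholders rather than the notebook's own:

```python
from pathlib import Path

def missing_directories(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]

# Placeholder paths for illustration; in the notebook you would pass
# [llamacpp_directory, local_directory, docker_directory] instead.
example_paths = [str(Path.cwd()), str(Path.cwd() / "no_such_dir_moho_bot_check")]
print("Missing:", missing_directories(example_paths))
```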

Install the other essential packages (uncomment lines as needed). The remaining packages should already have been installed during the set-up step:

In [2]:
%pip install -q ipywidgets 
%pip install -q PyPDF2
%pip install -q langchain
# %pip install -q -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


# Initialize the Epsilla Vector Database

Docker must already be installed on your computer. The two `docker` commands below pull the Epsilla vector database image and start it on your machine. See the project repository for details:

https://github.com/epsilla-cloud/vectordb

import os
# docker installed in "/Users/astridz/Documents/Moho_Bot/myvenv/lib/python3.10/site-packages/docker"
os.chdir(docker_directory)
!open -a Docker
!docker pull epsilla/vectordb
!docker run --pull=always -d -p 8888:8888 epsilla/vectordb
os.chdir(local_directory)
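
The container is started in detached mode, so the cell returns before the database is necessarily ready. A minimal sketch for checking that something is listening on the mapped port (8888, taken from the `docker run` command above); the helper `port_is_open` is hypothetical, and a successful TCP connection only suggests, but does not prove, that Epsilla itself is ready:

```python
import socket

def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable
        return False

print("Port 8888 reachable:", port_is_open())
```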

## Train vector space

In [3]:
from glob import glob
from typing import List

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Epsilla
from pyepsilla import vectordb
from sentence_transformers import SentenceTransformer

In [4]:
def reading(fileName):
    # Read a whole text file and return its contents
    with open(fileName, "r") as f:
        return f.read()

# Local embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

class LocalEmbeddings():
    # Embed a batch of texts with the local SentenceTransformer model
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return model.encode(texts).tolist()

embeddings = LocalEmbeddings()
files = glob("./Documents_collection/*")
splitted_documents = []
splitter = RecursiveCharacterTextSplitter(separators=[" ", ",", "\n"], chunk_size=1000, chunk_overlap=200)

for file in files:
    loader = TextLoader(file)
    documents = loader.load()
    split_docs = splitter.split_documents(documents)
    # print("Splitted document chunk size for current file:", len(split_docs))
    splitted_documents.extend(split_docs)
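
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts a string into fixed-size windows sharing some characters with their neighbours. This is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter cuts on the given separators and then merges pieces); the function `chunk_text` is hypothetical and only demonstrates the overlap idea:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split text into fixed-size windows that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(len(piece), piece)
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.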

In [5]:
client = vectordb.Client()

# Connect to Epsilla as knowledge base.
vector_store = Epsilla.from_documents(
    splitted_documents,
    embeddings,
    client,
    db_path="/tmp/localchatdb",
    db_name="LocalChatDB",
    collection_name="LocalChatCollection"
)

[INFO] Connected to localhost:8888 successfully.


# Install llama.cpp and the Llama 2 model

In [6]:
# Derive the model file name from its Hugging Face path
model_name = 'TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf'
pure_name = model_name.split('/')[-1]
parts = model_name.split('/')
model_path = f"{parts[0]}/{parts[1]}"
print("Your model name is", pure_name)

Your model name is llama-2-7b-chat.Q4_K_M.gguf
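
The string handling above can be wrapped in one small function. This is only a sketch: `describe_model` is a hypothetical helper, and the URL pattern `https://huggingface.co/{model_name}` is the one used by the commented-out `wget` cell further down in this notebook:

```python
def describe_model(model_name: str) -> dict:
    """Split a 'user/repo/resolve/branch/file' path into its useful parts."""
    parts = model_name.split("/")
    return {
        "repo_id": "/".join(parts[:2]),   # e.g. user/repo
        "file_name": parts[-1],           # the file that ends up on disk
        "url": f"https://huggingface.co/{model_name}",
    }

info = describe_model(
    "TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf"
)
print(info["repo_id"], info["file_name"])
```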


In [7]:
# os.chdir(local_directory)
# if not os.path.exists('llama.cpp'):
#     print("Cloning llama.cpp...")
#     !git clone https://github.com/ggerganov/llama.cpp
#     %cd llama.cpp
#     print("Compiling for Mac with M1 chip...")
#     !LLAMA_METAL=1 make
#     print("Compilation completed!")
#     %cd ../
#     %cd Moho.Bot
# else:
#     print("llama.cpp has already been cloned into this directory!")
#     %cd MOHO_BOT
# print("current directory is ", os.getcwd())

In [8]:
# # set directory to llama.cpp
# os.chdir('/Users/astridz/Documents/llama.cpp/')
# if not os.path.exists(pure_name):
#     !wget https://huggingface.co/{model_name}
# else:
#     print(f"{pure_name} already exists!")
# %cd ../Moho_bot

## Set up the User Interface

In [9]:
import io
import json
import os
import re
import subprocess
import textwrap
import threading
import time
from functools import partial

import ipywidgets as widgets
import PyPDF2
from IPython.display import display, HTML, clear_output, Markdown, FileLink
from ipywidgets import HBox, VBox
from PyPDF2 import PdfReader

In [10]:

def print_wrapped(text):
    # Regular expression pattern to detect code blocks
    code_pattern = r'```(.+?)```'
    matches = list(re.finditer(code_pattern, text, re.DOTALL))
    if not matches:
        # If there are no code blocks, display the entire text as Markdown
        display(Markdown(text))
        return
    start = 0
    for match in matches:
        # Display the text before the code block as Markdown
        before_code = text[start:match.start()].strip()
        if before_code:
            display(Markdown(before_code))
        # Display the code block
        code = match.group(0).strip()  # Extract code block
        display(Markdown(code))  # Display code block
        start = match.end()
    # Display the text after the last code block as Markdown
    after_code = text[start:].strip()  # Text after the last code block
    if after_code:
        display(Markdown(after_code))
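
The splitting logic in `print_wrapped` can be tried outside a notebook by separating it from the `display` calls. The sketch below returns the segments instead of rendering them; `split_segments` is a hypothetical name, and the fence string is built at runtime only so that this example nests cleanly inside a Markdown document:

```python
import re

FENCE = "`" * 3  # three backticks
CODE_PATTERN = re.compile(re.escape(FENCE) + r"(.+?)" + re.escape(FENCE), re.DOTALL)

def split_segments(text):
    """Return a list of (kind, content) tuples, where kind is 'text' or 'code'."""
    segments = []
    start = 0
    for match in CODE_PATTERN.finditer(text):
        before = text[start:match.start()].strip()
        if before:
            segments.append(("text", before))
        # Keep the fences so the block still renders as code in Markdown
        segments.append(("code", match.group(0).strip()))
        start = match.end()
    after = text[start:].strip()
    if after:
        segments.append(("text", after))
    return segments

sample = "Intro\n" + FENCE + "print(1)" + FENCE + "\nOutro"
print(split_segments(sample))
```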

In [None]:
button = widgets.Button(description="Send")
usertext = widgets.Textarea(layout=widgets.Layout(width='800px'))
output_log = widgets.Output()

context:  Students will synthesize many topics learned in prior courses as well as explore new technologies required to complete a specific project. Programming intensive. Applies to requirement(s): Math  Sciences B. Lerner Prereq: COMSC-215 or COMSC-225 . COMSC-322  Operating Systems  Fall and Spring. Credits: 4   An introduction to the issues involved in orchestrating the use of computer resources. Topics include operating system evolution, memory management, virtual memory, resource scheduling, multiprogramming, deadlocks, concurrent processes, protection, and design principles. Course emphasis: understanding the implications of OS design on the programs you run and write (i.e., on their security, performance, etc.). This course is programming intensive . Applies to requirement(s): Math  Sciences B. Lerner, J. McCauley Prereq: COMSC-221 , and either COMSC-211 or COMSC-225 . COMSC-334  Artificial Intelligence  Not Scheduled for This Year. Credits: 4   Artificial Intelligence, as a fi

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Log start
main: build = 1376 (9e24cc6)
main: built with  for unknown
main: seed  = 1699988324
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.

assistant:  <s>[INST]<<SYS>>




system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 1024, n_batch = 512, n_predict = -1, n_keep = 0




assistant: Answer the user question followed the rules:1. Do not copy the context in your answer. Try to understand the Context and rephrase them. 2. Please don't make things up or say things not mentioned in the Context. 3. You can trust the context. 4. Give a short response! Your answer should be in 200 words. The context is: Students will synthesize many topics learned in prior courses as well as explore new technologies required to complete a specific project. Programming intensive. Applies to requirement(s): Math  Sciences B. Lerner Prereq: COMSC-215 or COMSC-225 . COMSC-322  Operating Systems  Fall and Spring. Credits: 4   An introduction to the issues involved in orchestrating the use of computer resources. Topics include operating system evolution, memory management, virtual memory, resource scheduling, multiprogramming, deadlocks, concurrent processes, protection, and design principles. Course emphasis: understanding the implications of OS design on the programs you run and wr

assistant: <</SYS>>



assistant: 



assistant: what is operating system in computer science taught?[/INST]  In Computer Science, Operating System (OS) is a course that teaches students about the issues involved in coordinating and managing computer resources. Students will learn about various topics such as:



assistant: 1. OS evolution and history.



assistant: 2. Memory management techniques such as virtual memory.



assistant: 3. Resource scheduling mechanisms for efficient allocation of system resources to multiple tasks and programs.



assistant: 4. Multiprogramming, including the concept of deadlocks and how they can occur in computer systems.



assistant: 5. Process coordination and control, including protection methods to prevent unauthorized access or interference with system resources or programs.



assistant: 6. Design principles for operating systems that prioritize security, performance, and reliability.



assistant: This course is programming intensive, meaning students will apply their knowledge of OS design principles by completing practical projects that demonstrate the implications of different design choices on program behavior and performance. The course applies to requirements in Math Sciences B, specifically Lerner and McCauley's courses, and prepares students for more advanced studies or careers in computer science, software engineering, or related fields.


 [end of text]

llama_print_timings:        load time =   11506.16 ms
llama_print_timings:      sample time =     284.16 ms /   235 runs   (    1.21 ms per token,   826.99 tokens per second)
llama_print_timings: prompt eval time =    7806.28 ms /   644 tokens (   12.12 ms per token,    82.50 tokens per second)
llama_print_timings:        eval time =   26989.43 ms /   234 runs   (  115.34 ms per token,     8.67 tokens per second)
llama_print_timings:       total time =   36620.64 ms
ggml_metal_free: deallocating
Log end


assistant: 
Subprocess has completed.


context:  It is an introduction to computer architecture. Specific topics include assembly language programming, memory, and parallelism. This course is programming intensive.Applies to requirement(s): Math  SciencesL. Ballesteros, J. McCauleyPrereq: COMSC-201, COMSC-205, or COMSC-205PY; and MATH-232. Coreq: COMSC-221L.Advisory: The department recommends, but does not require, that students take COMSC-225 prior to COMSC-221.
COMSC-225  Software Design and Development
Fall and Spring. Credits: 4


Building large software systems introduces new challenges to software development. Appropriate design decisions and programming methodology can make a major difference in developing software that is correct and maintainable. In this course, students will learn techniques and tools that are used to build correct and maintainable software, improving their skills in designing, writing, debugging, and testing software. Topics include object-oriented design, testing, design patterns, and softwareIt

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Log start
main: build = 1376 (9e24cc6)
main: built with  for unknown
main: seed  = 1699988630
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.

assistant:  <s>[INST]<<SYS>>




system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 1024, n_batch = 512, n_predict = -1, n_keep = 0




assistant: Answer the user question followed the rules:1. Do not copy the context in your answer. Try to understand the Context and rephrase them. 2. Please don't make things up or say things not mentioned in the Context. 3. You can trust the context. 4. Give a short response! Your answer should be in 200 words. The context is: It is an introduction to computer architecture. Specific topics include assembly language programming, memory, and parallelism. This course is programming intensive.Applies to requirement(s): Math  SciencesL. Ballesteros, J. McCauleyPrereq: COMSC-201, COMSC-205, or COMSC-205PY; and MATH-232. Coreq: COMSC-221L.Advisory: The department recommends, but does not require, that students take COMSC-225 prior to COMSC-221.



assistant: COMSC-225  Software Design and Development



assistant: Fall and Spring. Credits: 4



assistant: 



assistant: 



assistant: Building large software systems introduces new challenges to software development. Appropriate design decisions and programming methodology can make a major difference in developing software that is correct and maintainable. In this course, students will learn techniques and tools that are used to build correct and maintainable software, improving their skills in designing, writing, debugging, and testing software. Topics include object-oriented design, testing, design patterns, and softwareIt is an introduction to computer architecture. Specific topics include assembly language programming, memory, and parallelism. This course is programming intensive.Applies to requirement(s): Math  SciencesL. Ballesteros, J. McCauleyPrereq: COMSC-201, COMSC-205, or COMSC-205PY; and MATH-232. Coreq: COMSC-221L.Advisory: The department recommends, but does not require, that students take COMSC-225 prior to COMSC-221.



assistant: COMSC-225  Software Design and Development



assistant: Fall and Spring. Credits: 4



assistant: 



assistant: 



assistant: Building large software systems introduces new challenges to software development. Appropriate design decisions and programming methodology can make a major difference in developing software that is correct and maintainable. In this course, students will learn techniques and tools that are used to build correct and maintainable software, improving their skills in designing, writing, debugging, and testing software. Topics include object-oriented design, testing, design patterns, and software



assistant: <</SYS>>



assistant: 



assistant: what is comsc322 prerequirement? [/INST]  COMSC-322 is an advanced computer architecture course that builds upon the concepts introduced in COMSC-201, COMSC-205, or COMSC-205PY, and MATH-232. According to the prerequisites listed in the context, students must have completed one of these courses before enrolling in COMSC-322. The specific prerequisites are:



assistant: * COMSC-201, COMSC-205, or COMSC-205PY: Introduction to Computer Organization and Architecture



assistant: * MATH-232: Discrete Mathematics for Computer Science



assistant: By requiring students to complete these courses before enrolling in COMSC-322, the department aims to provide a solid foundation in computer architecture and mathematics for more advanced study.


 [end of text]

llama_print_timings:        load time =    9184.76 ms
llama_print_timings:      sample time =     457.27 ms /   186 runs   (    2.46 ms per token,   406.76 tokens per second)
llama_print_timings: prompt eval time =    7321.89 ms /   620 tokens (   11.81 ms per token,    84.68 tokens per second)
llama_print_timings:        eval time =   16656.26 ms /   185 runs   (   90.03 ms per token,    11.11 tokens per second)
llama_print_timings:       total time =   24979.49 ms
ggml_metal_free: deallocating
Log end


assistant: 
Subprocess has completed.


## Connect to the vector store

In [12]:
from langchain.vectorstores import Epsilla
from pyepsilla import vectordb
from sentence_transformers import SentenceTransformer
from typing import List

In [13]:
model = SentenceTransformer('all-MiniLM-L6-v2')
class LocalEmbeddings():
    def embed_query(self, text: str) -> List[float]:
        return model.encode(text).tolist()
    
embeddings = LocalEmbeddings()
# Connect to Epsilla as knowledge base.
client = vectordb.Client()
vector_store = Epsilla(
    client,
    embeddings,
    db_path="/tmp/localchatdb",
    db_name="LocalChatDB"
)

vector_store.use_collection("LocalChatCollection")

[INFO] Connected to localhost:8888 successfully.
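
`similarity_search` returns the stored chunks whose embeddings are closest to the query embedding. The notebook does not show which distance metric Epsilla applies, so the sketch below assumes cosine similarity and uses toy two-dimensional vectors; `cosine_similarity` and `top_k` are hypothetical helpers, not part of `pyepsilla` or LangChain:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [index for index, _ in scored[:k]]

docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], docs, k=2))  # indices of the two closest vectors
```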


# Run Chatbot

In [14]:
DEFAULT_PROMPT = '''You are a helpful question answer assistant.'''
SYSTEM_PROMPT ='''Answer the user question followed the rules:1. Do not copy the context in your answer. Try to understand the Context and rephrase them. 2. Please don't make things up or say things not mentioned in the Context. 3. You can trust the context. 4. Give a short response! Your answer should be in 200 words. The context is: '''
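
The system prompt above is later combined with the retrieved context and the user's question in the Llama 2 chat format (the `prompt_template` string in `on_button_clicked`). A minimal sketch of that assembly as a standalone function; the name `build_prompt` is hypothetical and the example strings are placeholders:

```python
def build_prompt(system_prompt: str, context: str, question: str) -> str:
    """Wrap the system prompt, retrieved context, and question in Llama 2 chat tags."""
    return (
        f"<s>[INST]<<SYS>>\n{system_prompt}{context}\n<</SYS>>\n\n"
        f"{question}[/INST]"
    )

prompt = build_prompt("The context is: ", "Some retrieved text.", "What is taught?")
print(prompt)
```

Keeping the template in one function makes it easier to inspect the exact text that is passed to the model.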

In [15]:
#initialize the dialog
os.chdir(local_directory)
dialog_history = [{"role": "system", "content": DEFAULT_PROMPT}]

In [16]:
def generate_response(process, output_widget):
    # Stream the subprocess's stdout line by line until it exits
    while True:
        output = process.stdout.readline()
        # readline() returns '' at end of stream; stop once the process has exited
        if process.poll() is not None and output == '':
            print("Subprocess has completed.")
            break
        print("assistant:", output)
        # Testing:
        # if output:
        #         # Update the output widget
        #     with output_widget:
        #         print_wrapped(f'{output}\n')
                
        # if output:
        #     if '[INST]' in output or '<>' in output:
        #         continue
        #     if '[/INST]' in output:
        #         inst_index = output.find('[/INST]')
        #         # Check if [/INST] is found in the text
        #         if inst_index != -1:
        #             # Print everything after [/INST]
        #             assistant_response = output[inst_index + len('[/INST]'):].strip()
        #     else:
        #         assistant_response = f"{output.strip()}"
            
        #     dialog_history.append({"role": "assistant", "content": assistant_response})
            
        #     if assistant_response:
        #         # Update the output widget
        #         with output_widget:
        #             print_wrapped(f'{assistant_response}\n')
        # else:
        #     break
    process.stdout.close()
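
The model's standard output echoes the prompt before the answer, as the logs in this notebook show: the answer starts after `[/INST]` and ends at `[end of text]`. The following sketch extracts only the answer, assuming the whole output has first been collected into one string; `extract_answer` is a hypothetical helper and the two markers are taken from the logs above:

```python
INST_END = "[/INST]"
END_MARKER = "[end of text]"

def extract_answer(raw_output: str) -> str:
    """Return the text between the [/INST] tag and the end-of-text marker."""
    _, found, tail = raw_output.partition(INST_END)
    if not found:
        return ""  # the prompt echo has not finished yet
    answer, _, _ = tail.partition(END_MARKER)
    return answer.strip()

raw = "<s>[INST]<<SYS>>\nrules\n<</SYS>>\n\nquestion?[/INST]  The answer.\n\n [end of text]\n"
print(extract_answer(raw))
```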

In [17]:
# Called when the user clicks the Send button
def on_button_clicked(b):
    question = usertext.value
    dialog_history.append({"role": "user", "content": question})
    usertext.value = ''

    # Change button description and color, and disable it
    button.description = 'Processing...'
    button.style.button_color = '#ff6e00'  # Use hex color codes for better color choices
    button.disabled = True  # Disable the button when processing

    with output_log:
        clear_output()
        for message in dialog_history:
            print_wrapped(f'**{message["role"].capitalize()}**: {message["content"]}\n')
            
    context = ''.join(map(lambda doc: doc.page_content, vector_store.similarity_search(question, k = 2)))
    print("context: ", context)
    prompt_template = f'''<s>[INST]<<SYS>>\n{SYSTEM_PROMPT}{context}\n<</SYS>>\n\n{question}[/INST]'''
    
    # Start the subprocess and the threading to handle its output
    if (os.getcwd() != llamacpp_directory):
        os.chdir(llamacpp_directory)
    args = ['./main', '-m', pure_name, '-c', '1024', '-ngl', '48', '-p', prompt_template]
    process = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    # Start the thread that will handle the subprocess output
    output_thread = threading.Thread(target=generate_response, args=(process,output_log))
    output_thread.start()

    # Wait for the subprocess and thread to finish
    process.wait()
    output_thread.join()

    # Re-enable the button, reset description and color after processing
    button.description = 'Send'
    button.style.button_color = 'lightgray'
    button.disabled = False
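
The pattern used in `on_button_clicked` (start a subprocess, read its output in a separate thread, then wait for both) can be tried in isolation. The sketch below uses a short Python child process as a stand-in for llama.cpp's `./main`; `stream_lines` is a hypothetical helper that collects lines in a list instead of printing them:

```python
import subprocess
import sys
import threading

def stream_lines(process, sink):
    """Append each stdout line to `sink` until the stream reaches end of file."""
    for line in process.stdout:
        sink.append(line.rstrip("\n"))
    process.stdout.close()

# A stand-in child process; the notebook launches ./main from llama.cpp instead.
args = [sys.executable, "-c", "print('hello'); print('world')"]
process = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)

lines = []
reader = threading.Thread(target=stream_lines, args=(process, lines))
reader.start()
reader.join()    # the reader finishes when the child closes its stdout
process.wait()   # then collect the exit status

print(lines)
```

Reading in a thread keeps the pipe drained while the child is running, so the child cannot stall on a full pipe buffer.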

In [18]:
button.on_click(on_button_clicked)
alert_out = widgets.Output()
clear_button = widgets.Button(description="Clear Chat")
# Note: no click handler is attached to quit_button in this notebook
quit_button = widgets.Button(description="Force Quit")
text = widgets.Textarea(layout=widgets.Layout(width='800px'))

def on_clear_button_clicked(b):
    # Clear the dialog history
    dialog_history.clear()
    # Add back the initial system prompt (the same one used when the dialog was first created)
    dialog_history.append({"role": "system", "content": DEFAULT_PROMPT})
    # Clear the output log
    with output_log:
        clear_output()
        
clear_button.on_click(on_clear_button_clicked)

# Create the title with HTML
title = f"<h1 style='color: #ff6e00;'>MohoBot 🦙</h1> <p> Enter your question! </p>"

# Arrange the buttons horizontally in a single row
first_row = HBox([button, clear_button, quit_button])
# Stack the chat log, alerts, input box, and button row vertically
layout = VBox([output_log, alert_out, usertext, first_row])
display(HTML(title))  # Render the title as HTML
display(layout)

VBox(children=(Output(), Output(), Textarea(value='', layout=Layout(width='800px')), HBox(children=(Button(des…