### Lesson 7: Chat


In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_community.chat_models.cohere import ChatCohere
from langchain_community.embeddings.cohere import CohereEmbeddings
from langchain_community.vectorstores.chroma import Chroma

In [3]:
persist_directory = "./.chroma/"

In [4]:
embedding = CohereEmbeddings()

In [5]:
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding,
)

In [6]:
question = "What are major topics for this class?"

docs = vectordb.similarity_search(question, k=3)

In [7]:
print(docs[0].page_content)

statistics for a while or maybe algebra, we'll go over those in the discussion sections as a 
refresher for those of you that want one.  
Later in this quarter, we'll also use the disc ussion sections to go over extensions for the 
material that I'm teaching in the main lectur es. So machine learning is a huge field, and 
there are a few extensions that we really want  to teach but didn't have time in the main 
lectures for.
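
What `similarity_search(question, k=3)` does conceptually can be sketched in plain Python: embed the query, score every stored vector against it, and keep the `k` best. This is only an illustration. The two-dimensional vectors and the helper names `cosine` and `top_k` are made up, and cosine similarity is an assumption here, since the metric actually used depends on how the Chroma collection is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in scored[:k]]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions or more.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
nearest = top_k([1.0, 0.0], toy_vectors, k=3)
print(nearest)
```

A real vector store typically uses an index rather than scoring every vector, but the returned documents play the same role as `nearest` does here.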


In [8]:
llm = ChatCohere(temperature=0)

llm.predict("Hello world!")

'Hello to you as well! How can I assist you today? If you would like, we can have a conversation about a variety of topics. I could tell you more about artificial intelligence or large language models if you wish to know more about the field I operate in. Alternatively, we can discuss any other subject that interests you, explore a new topic, or I can help you with any specific questions or tasks you may have. Feel free to let me know how I can help you and we can get started!'

In [9]:
template = """Use the following pieces of context to answer the question \
at the end. If you don't know the answer, just say that you don't know, \
don't try to make up an answer. Use three sentences maximum. \
Keep the answer as concise as possible. Always say "thanks for asking!" \
at the end of the answer.
{context}
Question: {question}
Helpful Answer:\n
"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template=template)
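
To make the role of the `{context}` and `{question}` placeholders concrete, here is a rough sketch of how a "stuff"-style chain could fill such a template: the retrieved chunks are joined into one block and substituted together with the question. The names `toy_template` and `build_prompt` are hypothetical, and joining chunks with a blank line is an assumption about the default behaviour, not something this notebook shows.

```python
# A shortened stand-in for the template above (hypothetical).
toy_template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_prompt(chunks, question):
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return toy_template.format(context=context, question=question)


prompt_text = build_prompt(["chunk one", "chunk two"], "Is probability covered?")
print(prompt_text)
```

Because every retrieved chunk lands in a single prompt, the number and size of the chunks determine how much of the model's context window the prompt consumes.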

In [10]:
question = "Is probability a topic from the lecture material that you can retrieve?"

In [11]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

In [12]:
result = qa_chain({"query": question})

print(result["result"])

Yes, probability is indeed part of the lecture material for this course. Probability theory is fundamental to both algebra and statistics, and it is crucial for understanding many aspects of machine learning and the topics you've mentioned, such as convex optimization and hidden Markov models. 

So, in the earlier discussion, when the instructor mentioned "we'll go over algebra and statistics as refreshers", they were likely including probability within the scope of these subjects. 

Is there anything specific you would like to further inquire about regarding probability, or any other topic related to this course? Remember to attend the discussion sections if you'd like to delve deeper into extensions such as convex optimization and hidden Markov models. 

Feel free to let me know if you have any other questions about your coursework or academic pursuits. Always remember to stay curious and keep learning! 

Thanks for asking!


#### Memory


In [13]:
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
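
The idea behind a buffer memory is simple: keep every turn, in order, and hand the whole list back on the next call. The class below is a hypothetical stand-in written only to show that idea. `ToyBufferMemory`, `save_turn`, and `load` are invented names and are not the `ConversationBufferMemory` API.

```python
class ToyBufferMemory:
    """Hypothetical stand-in that stores every message in order."""

    def __init__(self):
        self.messages = []

    def save_turn(self, human, ai):
        # One turn is a human message followed by the model's reply.
        self.messages.append(("human", human))
        self.messages.append(("ai", ai))

    def load(self):
        # Return a copy so callers cannot mutate the stored history.
        return list(self.messages)


toy_memory = ToyBufferMemory()
toy_memory.save_turn("Is probability a class topic?", "Yes.")
toy_memory.save_turn("Why is it needed?", "It underpins the later material.")
history = toy_memory.load()
print(len(history))
```

Since nothing is ever dropped or summarized, the history grows with every turn, which is the main trade-off of a plain buffer.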

#### ConversationalRetrievalChain


In [14]:
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectordb.as_retriever(),
    memory=memory,
)
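
Before retrieving, a conversational retrieval chain first asks the LLM to rewrite the follow-up question as a standalone question using the chat history, and only then queries the retriever with that rewritten question. The sketch below shows roughly what such a condensing prompt could look like. The wording approximates the idea and is not the library's exact default prompt, and `format_history` and `build_condense_prompt` are hypothetical helpers.

```python
def format_history(turns):
    """Render (question, answer) pairs as a plain-text transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in turns)


def build_condense_prompt(turns, follow_up):
    """Build a prompt asking the LLM for a standalone version of follow_up."""
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_history(turns)}\n"
        f"Follow Up Input: {follow_up}\n"
        "Standalone question:"
    )


condense_prompt = build_condense_prompt(
    [("Is probability a class topic?", "Yes.")],
    "Why is it needed?",
)
print(condense_prompt)
```

The LLM's completion of this prompt, not the raw follow-up, is what gets embedded and sent to the vector store, so a vague follow-up such as "Why is it needed?" can still retrieve relevant chunks.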

In [15]:
question = "Is probability a topic of the lecture material that you can retrieve?"

result = qa({"question": question})

In [16]:
print(result["answer"])

Yes, I do have access to information on the topic of probability. Probability is a fundamental branch of mathematics that deals with the measurement and quantification of uncertainty. It provides a mathematical framework for analyzing and understanding the likelihood or chance of various events occurring.

In more detail, probability allows us to:

1. Formulate uncertain situations: We can express uncertainty by assigning probabilities to different outcomes of an experiment or event.

2. Make predictions: With probability, we can estimate how likely it is that a particular event will occur, allowing us to make informed decisions. For example, in business, probabilities are used to assess the likelihood of success or failure of a new venture.

3. Compare scenarios: Probability enables us to compare different alternatives and choose the best option by analyzing their probabilities of success.

4. Understand randomness: Probability helps us comprehend and analyze random phenomena, which a

In [17]:
question = "What is the name of the topic I just asked?"

result = qa({"question": question})

In [18]:
print(result["answer"])

Thank you for the prompt explanation. Indeed, the field of mathematics that deals with the measurement and quantification of uncertainty is called probability theory. 

Probability theory is a fundamental branch of mathematics that provides a framework for analyzing uncertainty by assigning probabilities to different events. It offers a mathematical language to describe the likelihood of various outcomes occurring in a given situation. 

Key concepts in probability theory include:
1. Sample Space: The sample space is the set of all possible outcomes of an experiment. For example, when rolling a fair six-sided die, the sample space consists of the numbers 1 through 6.
2. Events: Events are subsets of the sample space, representing specific outcomes or combinations of outcomes.
3. Probability Measure: Probability is a measure assigned to an event, indicating the likelihood of that event occurring. It falls within the range of 0 (will not occur) to 1 (certainly will occur).
4. Mutual Excl

#### Create a chatbot that works on your documents


In [19]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores.docarray import DocArrayInMemorySearch

In [20]:
def load_db(file, chain_type, k):
    loader = PyPDFLoader(file_path=file)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    docs = text_splitter.split_documents(documents=documents)

    embeddings = CohereEmbeddings()
    db = DocArrayInMemorySearch.from_documents(documents=docs, embedding=embeddings)

    retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": k})

    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=ChatCohere(temperature=0),
        chain_type=chain_type,
        retriever=retriever,
        return_source_documents=True,
        return_generated_question=True,
    )

    return qa_chain
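
The `chunk_size` and `chunk_overlap` arguments in `load_db` control how the text is cut up before embedding. The sketch below uses a plain sliding window over characters to show what the overlap does. It is a deliberate simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this hypothetical `split_with_overlap` helper ignores.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size chunks that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


toy_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(toy_chunks)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.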

In [None]:
import panel as pn
import param


class cbfs(param.Parameterized):
    chat_history = param.List([])
    answer = param.String("")
    db_query = param.String("")
    db_response = param.List([])

    def __init__(self, **params):
        super().__init__(**params)
        self.panels = []
        self.loaded_file = "docs/cs229_lectures/MachineLearning-Lecture01.pdf"
        self.qa = load_db(self.loaded_file, "stuff", 4)

    def call_load_db(self, count):
        if count == 0 or file_input.value is None:  # first render, or no file selected
            return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")
        else:
            file_input.save("temp.pdf")  # local copy
            self.loaded_file = file_input.filename
            button_load.button_style = "outline"
            self.qa = load_db("temp.pdf", "stuff", 4)
            button_load.button_style = "solid"
        self.clr_history()
        return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")

    def convchain(self, query):
        if not query:
            return pn.WidgetBox(
                pn.Row("User:", pn.pane.Markdown("", width=600)), scroll=True
            )
        result = self.qa({"question": query, "chat_history": self.chat_history})
        self.chat_history.extend([(query, result["answer"])])
        self.db_query = result["generated_question"]
        self.db_response = result["source_documents"]
        self.answer = result["answer"]
        self.panels.extend(
            [
                pn.Row("User:", pn.pane.Markdown(query, width=600)),
                pn.Row(
                    "ChatBot:",
                    pn.pane.Markdown(
                        self.answer, width=600, styles={"background-color": "#F6F6F6"}
                    ),
                ),
            ]
        )
        inp.value = ""  # resetting the input also clears the loading indicator
        return pn.WidgetBox(*self.panels, scroll=True)

    @param.depends(
        "db_query",
    )
    def get_lquest(self):
        if not self.db_query:
            return pn.Column(
                pn.Row(
                    pn.pane.Markdown(
                        "Last question to DB:", styles={"background-color": "#F6F6F6"}
                    )
                ),
                pn.Row(pn.pane.Str("no DB accesses so far")),
            )
        return pn.Column(
            pn.Row(
                pn.pane.Markdown("DB query:", styles={"background-color": "#F6F6F6"})
            ),
            pn.pane.Str(self.db_query),
        )

    @param.depends(
        "db_response",
    )
    def get_sources(self):
        if not self.db_response:
            return
        rlist = [
            pn.Row(
                pn.pane.Markdown(
                    "Result of DB lookup:", styles={"background-color": "#F6F6F6"}
                )
            )
        ]
        for doc in self.db_response:
            rlist.append(pn.Row(pn.pane.Str(doc)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    @param.depends("convchain", "clr_history")
    def get_chats(self):
        if not self.chat_history:
            return pn.WidgetBox(
                pn.Row(pn.pane.Str("No History Yet")), width=600, scroll=True
            )
        rlist = [
            pn.Row(
                pn.pane.Markdown(
                    "Current Chat History variable",
                    styles={"background-color": "#F6F6F6"},
                )
            )
        ]
        for exchange in self.chat_history:
            rlist.append(pn.Row(pn.pane.Str(exchange)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    def clr_history(self, count=0):
        self.chat_history = []
        return

#### Create a chatbot


In [None]:
cb = cbfs()

file_input = pn.widgets.FileInput(accept=".pdf")
button_load = pn.widgets.Button(name="Load DB", button_type="primary")
button_clearhistory = pn.widgets.Button(name="Clear History", button_type="warning")
button_clearhistory.on_click(cb.clr_history)
inp = pn.widgets.TextInput(placeholder="Enter text here…")

bound_button_load = pn.bind(cb.call_load_db, button_load.param.clicks)
conversation = pn.bind(cb.convchain, inp)

jpg_pane = pn.pane.Image("./img/convchain.jpg")

tab1 = pn.Column(
    pn.Row(inp),
    pn.layout.Divider(),
    pn.panel(conversation, loading_indicator=True, height=300),
    pn.layout.Divider(),
)
tab2 = pn.Column(
    pn.panel(cb.get_lquest),
    pn.layout.Divider(),
    pn.panel(cb.get_sources),
)
tab3 = pn.Column(
    pn.panel(cb.get_chats),
    pn.layout.Divider(),
)
tab4 = pn.Column(
    pn.Row(file_input, button_load, bound_button_load),
    pn.Row(
        button_clearhistory,
        pn.pane.Markdown("Clears the chat history. Use this to start a new topic."),
    ),
    pn.layout.Divider(),
    pn.Row(jpg_pane.clone(width=400)),
)
dashboard = pn.Column(
    pn.Row(pn.pane.Markdown("# ChatWithYourData_Bot")),
    pn.Tabs(
        ("Conversation", tab1),
        ("Database", tab2),
        ("Chat History", tab3),
        ("Configure", tab4),
    ),
)
dashboard