In [1]:
from dotenv import load_dotenv
import openai

In [2]:
load_dotenv('/home/tom/two/envapi/my-env')

True

In [3]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings


In [4]:
persist_directory = '/home/tom/two/vstore/chroma/'

In [6]:
embedding = OpenAIEmbeddings()

In [7]:
vectordb = Chroma(
    persist_directory=persist_directory, 
    embedding_function=embedding
)

In [8]:
question = "what are the major topics for the class?"

In [9]:
docs = vectordb.similarity_search(question, k=3)
docs

[Document(page_content="statistics for a while or maybe algebra, we'll go over those in the discussion sections as a \nrefresher for those of you that want one.  \nLater in this quarter, we'll also use the disc ussion sections to go over extensions for the \nmaterial that I'm teaching in the main lectur es. So machine learning is a huge field, and \nthere are a few extensions that we really want  to teach but didn't have time in the main \nlectures for.", metadata={'page': 8, 'source': 'https://see.stanford.edu/materials/aimlcs229/transcripts/machinelearning-lecture01.pdf'}),
 Document(page_content="statistics for a while or maybe algebra, we'll go over those in the discussion sections as a \nrefresher for those of you that want one.  \nLater in this quarter, we'll also use the disc ussion sections to go over extensions for the \nmaterial that I'm teaching in the main lectur es. So machine learning is a huge field, and \nthere are a few extensions that we really want  to teach but didn

In [16]:
def pretty_print(docs):
    """Print each document's text under a numbered header line."""
    for i, d in enumerate(docs):
        print(f'Document {i}"s content----------------------')
        print(d.page_content, end='\n\n')

In [17]:
pretty_print(docs)

Document 0"s content----------------------
statistics for a while or maybe algebra, we'll go over those in the discussion sections as a 
refresher for those of you that want one.  
Later in this quarter, we'll also use the disc ussion sections to go over extensions for the 
material that I'm teaching in the main lectur es. So machine learning is a huge field, and 
there are a few extensions that we really want  to teach but didn't have time in the main 
lectures for.

Document 1"s content----------------------
statistics for a while or maybe algebra, we'll go over those in the discussion sections as a 
refresher for those of you that want one.  
Later in this quarter, we'll also use the disc ussion sections to go over extensions for the 
material that I'm teaching in the main lectur es. So machine learning is a huge field, and 
there are a few extensions that we really want  to teach but didn't have time in the main 
lectures for.

Document 2"s content----------------------
middle of c
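
Documents 0 and 1 above are identical, which suggests the same chunk was stored in the vector store more than once. A minimal sketch of dropping exact duplicates after retrieval follows. The `Doc` class and `dedupe_docs` helper are hypothetical stand-ins for illustration (not LangChain APIs), and the sample texts are shortened from the output above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_docs(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for d in docs:
        key = d.page_content.strip()
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


# Shortened sample mirroring the shape of the result above: two identical chunks, one distinct.
sample = [
    Doc("statistics for a while or maybe algebra"),
    Doc("statistics for a while or maybe algebra"),
    Doc("middle of c"),
]
unique_docs = dedupe_docs(sample)
print(len(unique_docs))  # prints 2
```

This only removes exact text matches. Near-duplicates would need a similarity-based approach instead.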

In [18]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
    model_name='gpt-3.5-turbo',
    temperature=0
)

In [19]:
from langchain.prompts import PromptTemplate

In [21]:
template = """
    Use the following context to answer the question that follows. If you don't know the answer, clearly say 'I don't know.'
    without making up an answer.
    {context}

    Question: {question}
    Answer:
"""
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template
)
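
To see roughly what the model receives, the sketch below fills a prompt of the same shape by hand. It assumes the retrieved chunks are simply joined and substituted for `{context}`. The `build_prompt` helper, the shortened template, and the blank-line separator are illustrative assumptions, and plain `str.format` stands in for `PromptTemplate`.

```python
# Hypothetical, shortened stand-in for the notebook's prompt template.
template = (
    "Use the following context to answer the question.\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(chunks, question):
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return template.format(context=context, question=question)


# Two sentences taken from the lecture transcript, with extraction spacing repaired.
chunks = [
    "I also assume familiarity with basic probability and statistics.",
    "Lastly, I also assume familiarity with basic linear algebra.",
]
prompt_text = build_prompt(chunks, "Is probability a class topic?")
print(prompt_text)
```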

In [22]:

# Run the chain
from langchain.chains import RetrievalQA
question = "Is probability a class topic?"


In [23]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={'prompt': QA_CHAIN_PROMPT}
)

In [24]:
result = qa_chain({'query':question})
result

{'query': 'Is probability a class topic?',
 'result': 'Yes, probability is a class topic.',
 'source_documents': [Document(page_content="of this class will not be very program ming intensive, although we will do some \nprogramming, mostly in either MATLAB or Octa ve. I'll say a bit more about that later.  \nI also assume familiarity with basic proba bility and statistics. So most undergraduate \nstatistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna \nassume all of you know what ra ndom variables are, that all of you know what expectation \nis, what a variance or a random variable is. And in case of some of you, it's been a while \nsince you've seen some of this material. At some of the discussion sections, we'll actually \ngo over some of the prerequisites, sort of as  a refresher course under prerequisite class. \nI'll say a bit more about that later as well.  \nLastly, I also assume familiarity with basi c linear algebra. And again, most undergr

In [26]:
pretty_print(result['source_documents'])

Document 0"s content----------------------
of this class will not be very program ming intensive, although we will do some 
programming, mostly in either MATLAB or Octa ve. I'll say a bit more about that later.  
I also assume familiarity with basic proba bility and statistics. So most undergraduate 
statistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna 
assume all of you know what ra ndom variables are, that all of you know what expectation 
is, what a variance or a random variable is. And in case of some of you, it's been a while 
since you've seen some of this material. At some of the discussion sections, we'll actually 
go over some of the prerequisites, sort of as  a refresher course under prerequisite class. 
I'll say a bit more about that later as well.  
Lastly, I also assume familiarity with basi c linear algebra. And again, most undergraduate 
linear algebra courses are more than enough. So if you've taken courses like Math 51, 
103, Mat
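
The source text above contains extraction artifacts such as "proba bility", so a plain substring check for "probability" would miss it. A minimal sketch of a whitespace-insensitive check for spotting which retrieved chunks mention a term follows. The `mentions` helper is hypothetical, and because it ignores all whitespace it can also match across word boundaries.

```python
import re


def mentions(term, text):
    """Case-insensitive substring search that ignores all whitespace in both strings."""
    def squash(s):
        return re.sub(r"\s+", "", s).lower()
    return squash(term) in squash(text)


snippet = "I also assume familiarity with basic proba bility and statistics."
print("probability" in snippet)          # False: the word is split by a stray space
print(mentions("probability", snippet))  # True
```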

In [27]:
from langchain.memory import ConversationBufferMemory


In [28]:
memory = ConversationBufferMemory(
    memory_key='chat_history',
    return_messages=True
)

In [35]:
from langchain.chains import ConversationalRetrievalChain
retriever = vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
    chain_type='stuff',
    verbose=True
)

In [36]:
question='is probability a class topic?'
result = qa({'question':question})
result

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: is probability a class topic?
Assistant: Yes, probability is a topic that will be covered in this class. The instructor assumes familiarity with basic probability and statistics, so it is expected that students have prior knowledge of random variables, expectation, variance, and other related concepts.
Human: why those prerequisites are required?
Assistant: The prerequisites are required because they provide the foundational knowledge and skills necessary to understand and apply the concepts taught in the class. 

Familiarity with basic probability and statistics is necessary because machine learning involves analyzing and making predictions based on data, which requires an understanding of probability and statistical concepts such as rando

{'question': 'is probability a class topic?',
 'chat_history': [HumanMessage(content='is probability a class topic?'),
  AIMessage(content='Yes, probability is a topic that will be covered in this class. The instructor assumes familiarity with basic probability and statistics, so it is expected that students have prior knowledge of random variables, expectation, variance, and other related concepts.'),
  HumanMessage(content='why those prerequisites are required?'),
  AIMessage(content='The prerequisites are required because they provide the foundational knowledge and skills necessary to understand and apply the concepts taught in the class. \n\nFamiliarity with basic probability and statistics is necessary because machine learning involves analyzing and making predictions based on data, which requires an understanding of probability and statistical concepts such as random variables, expectation, variance, and probability distributions.\n\nKnowledge of basic linear algebra is important

In [37]:
question = 'why those prerequisites are required?'
result = qa({'question':question})
result

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: is probability a class topic?
Assistant: Yes, probability is a topic that will be covered in this class. The instructor assumes familiarity with basic probability and statistics, so it is expected that students have prior knowledge of random variables, expectation, variance, and other related concepts.
Human: why those prerequisites are required?
Assistant: The prerequisites are required because they provide the foundational knowledge and skills necessary to understand and apply the concepts taught in the class. 

Familiarity with basic probability and statistics is necessary because machine learning involves analyzing and making predictions based on data, which requires an understanding of probability and statistical concepts such as rando

{'question': 'why those prerequisites are required?',
 'chat_history': [HumanMessage(content='is probability a class topic?'),
  AIMessage(content='Yes, probability is a topic that will be covered in this class. The instructor assumes familiarity with basic probability and statistics, so it is expected that students have prior knowledge of random variables, expectation, variance, and other related concepts.'),
  HumanMessage(content='why those prerequisites are required?'),
  AIMessage(content='The prerequisites are required because they provide the foundational knowledge and skills necessary to understand and apply the concepts taught in the class. \n\nFamiliarity with basic probability and statistics is necessary because machine learning involves analyzing and making predictions based on data, which requires an understanding of probability and statistical concepts such as random variables, expectation, variance, and probability distributions.\n\nKnowledge of basic linear algebra is i

In [38]:
result['chat_history']

[HumanMessage(content='is probability a class topic?'),
 AIMessage(content='Yes, probability is a topic that will be covered in this class. The instructor assumes familiarity with basic probability and statistics, so it is expected that students have prior knowledge of random variables, expectation, variance, and other related concepts.'),
 HumanMessage(content='why those prerequisites are required?'),
 AIMessage(content='The prerequisites are required because they provide the foundational knowledge and skills necessary to understand and apply the concepts taught in the class. \n\nFamiliarity with basic probability and statistics is necessary because machine learning involves analyzing and making predictions based on data, which requires an understanding of probability and statistical concepts such as random variables, expectation, variance, and probability distributions.\n\nKnowledge of basic linear algebra is important because many machine learning algorithms and techniques involve m

In [39]:
result['answer']

'The prerequisites of basic probability and statistics are required for this class because machine learning heavily relies on statistical concepts and techniques. Understanding probability and statistics is essential for analyzing and interpreting data, making predictions, and evaluating the performance of machine learning models. These concepts are used to quantify uncertainty, estimate parameters, and assess the reliability of results. Without a solid foundation in probability and statistics, it would be challenging to grasp the fundamental principles and methodologies of machine learning.'
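
The verbose output above shows the stored messages rendered as "Human: ..." and "Assistant: ..." lines before the follow-up question is rephrased. A minimal sketch of that flattening step follows, using plain tuples in place of `HumanMessage` and `AIMessage`. The `format_history` helper is hypothetical and only mirrors the visible format, not the library's internal implementation.

```python
def format_history(messages):
    """Render (role, content) pairs as 'Human: ...' / 'Assistant: ...' lines."""
    labels = {"human": "Human", "ai": "Assistant"}
    return "\n".join(f"{labels[role]}: {content}" for role, content in messages)


# First exchange from the conversation above, with the answer shortened.
history = [
    ("human", "is probability a class topic?"),
    ("ai", "Yes, probability is a topic that will be covered in this class."),
]
history_text = format_history(history)
print(history_text)
```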