In [1]:
!pip install langchain langchain-openai openai chromadb tiktoken pypdf

Collecting langchain
  Downloading langchain-0.1.5-py3-none-any.whl (806 kB)
Collecting langchain-openai
  Downloading langchain_openai-0.0.5-py3-none-any.whl (29 kB)
Collecting openai
  Downloading openai-1.11.1-py3-none-any.whl (226 kB)
Collecting chromadb
  Downloading chromadb-0.4.22-py3-none-any.whl (509 kB)
Collecting tiktoken
  Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Collecting pypdf
  Downloading pypdf-4.0.1-py3-none-any.w

In [2]:
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
OPENAI_Embedding_KEY = userdata.get('Embeddings')
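
The cell above only works inside Colab, because `google.colab.userdata` is not available elsewhere. A minimal sketch of a fallback, assuming the key is also exported as an environment variable when running outside Colab; the helper name `get_secret` is hypothetical and not part of any library:

```python
import os


def get_secret(name, default=None):
    """Read a secret from Colab if available, else from the environment.

    `get_secret` is a hypothetical helper for illustration only.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return os.environ.get(name, default)

    try:
        value = userdata.get(name)
    except Exception:
        # Assumption: a missing or inaccessible secret raises; fall back below.
        value = None
    return value if value is not None else os.environ.get(name, default)


OPENAI_API_KEY = get_secret("OPENAI_API_KEY")
```

With this in place the rest of the notebook can use `OPENAI_API_KEY` unchanged, whichever environment it runs in.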

In [3]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain

1. Load the PDF.

In [4]:
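loader = PyPDFLoader("/content/book.pdf")
# load() returns one Document per page; chunking is done explicitly in step 2,
# so the loader's built-in load_and_split() (which applies its own default splitter) is not needed.
pages = loader.load()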

2. Split the data.

In [5]:
text_splitter = RecursiveCharacterTextSplitter(
    # Chunks of up to 800 characters; neighbouring chunks may share up to 300 characters.
    chunk_size=800,
    chunk_overlap=300,
    length_function=len,
    is_separator_regex=False,
)

In [6]:
docs = text_splitter.split_documents(pages)
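
To make the effect of `chunk_size=800` and `chunk_overlap=300` concrete, here is a simplified sketch in plain Python. It uses a fixed sliding window over characters, which is an assumption for illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, so its real chunks will not line up exactly like this. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Simplified illustration only; not the algorithm LangChain uses.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Toy input: 2000 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(2000))
chunks = sliding_chunks(sample, chunk_size=800, chunk_overlap=300)

for i, chunk in enumerate(chunks):
    print(i, len(chunk))
```

Each window starts 500 characters after the previous one, so the last 300 characters of one chunk are repeated at the start of the next. A large overlap like this reduces the chance that a sentence is cut off at a chunk boundary, at the cost of storing and embedding more text.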

3. Create Embeddings.

In [11]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", openai_api_key=OPENAI_API_KEY)

4. Store in DB.

In [12]:
db = Chroma.from_documents(docs, embeddings)



In [15]:
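# Retrieve the 3 chunks whose embeddings are closest to the query's embedding
query = "What is machine learning?"
matching_docs = db.similarity_search(query, k=3)

# Print the raw Document objects (text plus page/source metadata)
print(matching_docs)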



[Document(page_content="Deep learning  (DL) is a  subset of ML that uses a large number of artificial neurons \n(known as an artificial neural network ) to learn, which is similar to how a human \nbrain learns. An example of a deep learning-based solution  is the Amazon  Echo  \nvirtual assistant . To better understand how ML works, let's first talk about the different \napproaches taken by machines to learn. They are as follows:\n• Supervised ML\n• Unsupervised machine learning\n• Reinforcement learning\nLet's have a look at each one of them in detail.", metadata={'page': 22, 'source': '/content/book.pdf'}), Document(page_content='understanding languages, or driving cars. Having AI capability does not necessarily mean \na system has to be powered only by ML. An AI system can also be powered by other \ntechniques, such as rule-based engines. ML is a form of AI that learns how to perform a \ntask using different learning techniques, such as learning from examples using historical \ndata

In [16]:
# Print only the text of each returned document
for doc in matching_docs:
    print(f"{doc.page_content}\n")

Deep learning  (DL) is a  subset of ML that uses a large number of artificial neurons 
(known as an artificial neural network ) to learn, which is similar to how a human 
brain learns. An example of a deep learning-based solution  is the Amazon  Echo  
virtual assistant . To better understand how ML works, let's first talk about the different 
approaches taken by machines to learn. They are as follows:
• Supervised ML
• Unsupervised machine learning
• Reinforcement learning
Let's have a look at each one of them in detail.

understanding languages, or driving cars. Having AI capability does not necessarily mean 
a system has to be powered only by ML. An AI system can also be powered by other 
techniques, such as rule-based engines. ML is a form of AI that learns how to perform a 
task using different learning techniques, such as learning from examples using historical 
data or learning by trial and error. An example of ML would be making credit decisions 
using an ML algorithm with acce
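
Conceptually, the similarity search above embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea in plain Python with made-up 3-dimensional vectors and cosine similarity. These are assumptions for illustration: real embeddings have far more dimensions, and Chroma uses its own index and distance settings, so this is not how the library is implemented. All names and data here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy store: (chunk text, embedding vector).
toy_store = [
    ("chunk about supervised learning", [0.9, 0.1, 0.0]),
    ("chunk about cooking", [0.0, 0.2, 0.9]),
    ("chunk about neural networks", [0.7, 0.6, 0.1]),
    ("chunk about rule-based engines", [0.4, 0.8, 0.2]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded query

results = top_k(query_vec, toy_store, k=3)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The unrelated "cooking" chunk scores lowest and is dropped, which mirrors how `k = 3` limits the results passed on to the next step.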

5. Query using an LLM.

Using load_qa_chain

In [17]:
# temperature=0 keeps answers as deterministic as the model allows
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

In [18]:
chain = load_qa_chain(llm, chain_type="stuff")
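
With `chain_type="stuff"`, all retrieved chunks are placed ("stuffed") into a single prompt together with the question, and the model is called once. A rough sketch of that idea in plain Python follows. The prompt wording and the function name `build_stuff_prompt` are hypothetical; LangChain's actual template differs.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate every chunk into one context block followed by the question.

    Illustrative only; the template text is an assumption, not LangChain's.
    """
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks standing in for Document.page_content values.
example_chunks = [
    "ML is a form of AI that learns how to perform a task from data.",
    "Deep learning (DL) is a subset of ML that uses artificial neural networks.",
]
prompt = build_stuff_prompt(example_chunks, "What is machine learning?")
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window, which is one reason to keep `k` and `chunk_size` modest.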

In [19]:
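query = "What is machine learning?"

# similarity_search returns the 4 closest chunks when k is not given
matching_results = db.similarity_search(query)

# chain.run() is deprecated in LangChain 0.1; invoke() returns a dict whose
# "output_text" entry holds the answer string
response = chain.invoke(
    {"input_documents": matching_results, "question": query}
)["output_text"]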



In [20]:
response

'Machine learning (ML) is a form of artificial intelligence (AI) that involves training a computer system to learn and make predictions or decisions without being explicitly programmed. It is a process where machines learn from data and improve their performance over time. ML algorithms analyze large amounts of data, identify patterns, and make predictions or take actions based on those patterns. ML can be supervised, unsupervised, or reinforcement learning, each with its own approach to learning.'

In [26]:
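query = "Explain different ML life cycle?"

# similarity_search returns the 4 closest chunks when k is not given
matching_results = db.similarity_search(query)

# chain.run() is deprecated in LangChain 0.1; invoke() returns a dict whose
# "output_text" entry holds the answer string
response = chain.invoke(
    {"input_documents": matching_results, "question": query}
)["output_text"]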



In [27]:
print(response)

The ML life cycle refers to the different stages involved in developing and deploying a machine learning model. The typical ML life cycle includes the following steps:

1. Business Understanding: This step involves gaining a clear understanding of the business goals and objectives that the ML project aims to achieve. It includes defining the problem statement, identifying the key performance metrics, and understanding the requirements of the project.

2. Data Acquisition and Understanding: In this step, relevant data is collected from various sources. The data is then analyzed and explored to gain insights and understand its quality, completeness, and relevance to the problem at hand.

3. Data Preparation: This step involves cleaning and preprocessing the data to make it suitable for training the ML model. It includes tasks such as handling missing values, removing outliers, normalizing or scaling the data, and transforming categorical variables.

4. Model Building: In this step, the M