In [1]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
import textwrap

In [2]:
def load_pdf_data(file_path):
    # Creating a PyMuPDFLoader object with file_path
    loader = PyMuPDFLoader(file_path=file_path)
    
    # loading the PDF file
    docs = loader.load()
    
    # returning the loaded document
    return docs

In [3]:
# Responsible for splitting the documents into several chunks
def split_docs(documents, chunk_size=1024, chunk_overlap=20):
    
    # Initializing the RecursiveCharacterTextSplitter with
    # chunk_size and chunk_overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    
    # Splitting the documents into chunks
    chunks = text_splitter.split_documents(documents=documents)
    
    # returning the document chunks
    return chunks
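
The chunk_size and chunk_overlap parameters above can be illustrated with a much simpler splitter. The sketch below is not the algorithm RecursiveCharacterTextSplitter uses (that one prefers to break on separators such as paragraphs and newlines); it is a plain sliding window, and the name sliding_chunks is hypothetical. It only shows how neighbouring chunks end up sharing chunk_overlap characters.

```python
def sliding_chunks(text, chunk_size=1024, chunk_overlap=20):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Each new window starts this many characters after the previous one
    step = chunk_size - chunk_overlap

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


demo = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

A small overlap keeps a sentence that straddles a chunk boundary at least partly visible in both chunks, at the cost of storing a few duplicated characters.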

In [4]:
# function for loading the embedding model
def load_embedding_model(model_path, normalize_embedding=True):
    return HuggingFaceEmbeddings(
        model_name=model_path,
        model_kwargs={'device':'cpu'}, # here we will run the model with CPU only
        encode_kwargs = {
            'normalize_embeddings': normalize_embedding # keep True to compute cosine similarity
        }
    )


# Function for building a FAISS vector store from the chunks
def create_embeddings(chunks, embedding_model, storing_path="vectorstore"):
    # Embedding the chunks with embedding_model and indexing the vectors with FAISS
    vectorstore = FAISS.from_documents(chunks, embedding_model)
    
    # Saving the FAISS index to storing_path (relative to the current directory)
    vectorstore.save_local(storing_path)
    
    # returning the vectorstore
    return vectorstore
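
The comment on normalize_embeddings says to keep it True for cosine similarity. A minimal sketch of why, using only the standard library: once vectors are scaled to unit length, their dot product equals their cosine similarity, and their squared Euclidean distance is 2 - 2 * cosine, so ranking by either measure gives the same order. The helper names here are hypothetical, and whether the index ranks by Euclidean distance or inner product depends on how the FAISS store is configured.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

cos = cosine_similarity(a, b)
squared_l2 = sum((x - y) ** 2 for x, y in zip(a_unit, b_unit))

print(cos)                  # cosine similarity of the raw vectors
print(dot(a_unit, b_unit))  # dot product of the normalized vectors
print(squared_l2, 2 - 2 * cos)
```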

In [5]:
prompt = """
### System:
You are an AI Assistant that follows instructions extremely well. \
Help as much as you can.
Please say I don't know if you don't know.

### User:
{prompt}

### Response:

"""

In [6]:
template = """
### System:
You are a respectful and honest assistant. You have to answer the user's \
questions using only the context provided to you. If you don't know the answer, \
just say you don't know. Don't try to make up an answer.

### Context:
{context}

### User:
{question}

### Response:
"""

In [7]:
def load_qa_chain(retriever, llm, prompt):
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever, # here we are using the vectorstore as a retriever
        chain_type="stuff",
        return_source_documents=True, # including source documents in output
        chain_type_kwargs={'prompt': prompt} # customizing the prompt
    )
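
With chain_type="stuff", every retrieved chunk is placed into a single prompt as the {context} value. The sketch below imitates that step without LangChain. The function names are hypothetical, and the defaults (four chunks, joined by a blank line) are assumptions about typical retriever and chain settings rather than values taken from this notebook.

```python
def top_k(scored_chunks, k=4):
    """Return the k chunk texts with the highest similarity scores."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [text for text, _score in ranked[:k]]


def stuff(chunks, separator="\n\n"):
    """Concatenate the chunks into one context string."""
    return separator.join(chunks)


# (chunk text, similarity to the query) pairs with made-up scores
scored = [
    ("Education: Korea University", 0.71),
    ("Skills: Github, Python", 0.22),
    ("Research interests: quantum machine learning", 0.64),
]

context = stuff(top_k(scored, k=2))
print(context)
```

Because everything is packed into one prompt, the number and size of the retrieved chunks must fit inside the model's context window.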

In [8]:
def get_response(query, chain):
    # Getting response from chain (invoke replaces the deprecated direct call)
    response = chain.invoke({'query': query})
    
    # Wrapping the text for better output in Jupyter Notebook
    wrapped_text = textwrap.fill(response['result'], width=100)
    print(wrapped_text)

In [9]:
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate

In [10]:
# Loading llama3:70b from Ollama
llm = Ollama(model="llama3:70b", temperature=0)

# Loading the Embedding Model
embed = load_embedding_model(model_path="all-MiniLM-L6-v2")


In [11]:
# loading and splitting the documents
docs = load_pdf_data(file_path="data/choisoonwook.pdf")
documents = split_docs(documents=docs)

# creating vectorstore
vectorstore = create_embeddings(documents, embed)

# converting vectorstore to a retriever
retriever = vectorstore.as_retriever()

In [12]:
# Creating the prompt from the template which we created before
prompt = PromptTemplate.from_template(template)

# Creating the chain
chain = load_qa_chain(retriever, llm, prompt)
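
PromptTemplate.from_template reads the {context} and {question} placeholders out of the template string, and the chain fills them in at query time. The effect is close to Python's own str.format, which the sketch below uses on a shortened stand-in for the template; demo_template and fill_template are hypothetical names, not part of the notebook.

```python
# Shortened stand-in for the notebook's template (same two placeholders)
demo_template = (
    "### Context:\n{context}\n\n"
    "### User:\n{question}\n\n"
    "### Response:\n"
)


def fill_template(template, context, question):
    """Substitute both placeholders, as an f-string style template would."""
    return template.format(context=context, question=question)


filled = fill_template(
    demo_template,
    context="(retrieved chunk text goes here)",
    question="Who is SoonWook Choi",
)
print(filled)
```

Any literal curly braces in such a template have to be doubled ({{ and }}) so they are not mistaken for placeholders.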

In [13]:
get_response("Summarize the document", chain)


The document appears to be a personal profile or CV of Soon-Wook Choi, a bachelor student at Korea
University. The profile includes:  1. Conferences: Two poster presentations in 2023 on information
security and cryptography. 2. Education: Bachelor student at Korea University, Department of AI
Cyber Security, with a GPA of 3.68/4.5. 3. Research Experiences: Research student at Artificial
Intelligence Cyber Security, Korea University Sejong, Korea since March 2021. 4. Research Interests:
Quantum machine learning, post-quantum cryptography, binary code analysis, network architecture, and
network security. 5. Projects: Three ongoing research projects on quantum computing, cryptography,
and AI-based binary similarity analysis. 6. Skills and Techniques: Proficient in using Github for
version management, Python for CNN code implementation, and quantum circuit implementation.
Overall, the document highlights Soon-Wook Choi's academic background, research experiences, and
skills in the field of

In [14]:
get_response("What is QAOA", chain)

I don't know what QAOA is. The context provided does not mention QAOA. It seems to be a personal
profile of SoonWook Choi, and it discusses his education, research experiences, projects, skills,
and conferences related to AI cyber security, quantum machine learning, and post-quantum
cryptography.


In [15]:
get_response("What is BERT", chain)

I don't know what BERT is based on the provided context. The context only provides information about
conferences, education, research experiences, research interests, projects, and skills and
techniques related to SoonWook Choi, but it does not mention BERT.


In [16]:
get_response("Who is SoonWook Choi", chain)

Soon-Wook Choi is a Bachelor student at Korea University, Department of AI Cyber Security.


In [17]:
resp1 = llm.invoke("What is QAOA")

In [18]:
print(resp1)

QAOA stands for Quantum Approximate Optimization Algorithm, (pronounced "kuh-oh-ah"). It's a quantum algorithm designed to solve complex optimization problems more efficiently than classical algorithms.

**The Problem:**
Optimization problems involve finding the best solution among many possible solutions. These problems are ubiquitous in various fields, as they can model real-world scenarios like:

* Scheduling tasks
* Allocating resources
* Portfolio optimization (finance)
* Logistics and supply chain management

Classical computers struggle to solve these problems efficiently due to their inherent complexity.

**The Solution:**
QAOA is a hybrid quantum-classical algorithm that leverages the power of quantum computing to approximate optimal solutions. It's designed to tackle complex optimization problems by exploiting the principles of quantum mechanics, such as superposition and entanglement.

Here's a high-level overview of how QAOA works:

1. **Problem Formulation**: Define the op

In [19]:
resp2 = llm.invoke("What is BERT")
print(resp2)

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model developed by Google in  , which has revolutionized the field of Natural Language Processing (NLP).

**What does BERT do?**

BERT is designed to understand natural language processing tasks, such as:

1. **Language Modeling**: Predicting the next word in a sentence based on the context.
2. **Text Classification**: Classifying text into categories like spam/not spam, positive/negative sentiment, etc.
3. **Named Entity Recognition**: Identifying named entities (people, organizations, locations) in text.
4. **Question Answering**: Answering questions based on the content of a passage.

**How does BERT work?**

BERT uses a multi-layer bidirectional transformer encoder to generate contextualized representations of words in a sentence. These representations are then fine-tuned for specific NLP tasks.

Here's a high-level overview of the architecture:

1. **Input Embeddings**: Word embeddings (e.g., 

In [20]:
resp3 = llm.invoke("Who is SoonWook Choi")
print(resp3)

Soon-Wook Choi is a South Korean professional golfer who has had a successful career in the sport. Here are some key facts about him:

**Early Life and Amateur Career**: Born on June  , 1963, in Seoul, South Korea, Choi started playing golf at the age of 12. He won several amateur tournaments in his home country before turning pro in 1986.

**Professional Career**: Choi has had a prolific career, winning over 20 professional tournaments worldwide, including multiple victories on the Asian Tour, Japanese Tour, and Korean Tour. His most notable win came at the 2003 Korea Open, a co-sanctioned event with the European Tour.

**International Appearances**: Choi has represented South Korea in several international team competitions, including the Presidents Cup (1996, 2000), the Dunhill Cup (1989-1991), and the World Cup (1989-1992).

**Achievements and Honors**: Choi was named the Asian Tour's Rookie of the Year in 1987 and Player of the Year in 1994. He has also been recognized for his con

In [21]:
import os
pdf = os.listdir("./data")
for file_name in pdf:
    print(file_name)

.DS_Store
A Tutorial on Quantum Approximate Optimization Algorithm QAOA Fundamentals and Applications.pdf
self_rag_arxiv.pdf
BERT_arxiv.pdf
crag_arxiv.pdf
RAG_arxiv.pdf
choisoonwook.pdf
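
The listing shows that the QAOA and BERT papers sit in ./data, but only choisoonwook.pdf was indexed in cell 11, which is consistent with the chain answering "I don't know" for those two questions. A minimal sketch for collecting every PDF in the folder while skipping entries such as .DS_Store follows; list_pdfs is a hypothetical helper, and the commented usage assumes the load_pdf_data, split_docs and create_embeddings functions defined earlier.

```python
from pathlib import Path


def list_pdfs(data_dir="./data"):
    """Return the PDF files in data_dir, sorted by path."""
    return sorted(Path(data_dir).glob("*.pdf"))


# Possible usage with the helpers defined earlier in this notebook:
# all_docs = []
# for pdf_path in list_pdfs("./data"):
#     all_docs.extend(load_pdf_data(file_path=str(pdf_path)))
# vectorstore = create_embeddings(split_docs(documents=all_docs), embed)
```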
