In [3]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
import textwrap

In [4]:
def load_pdf_data(file_path):
    # Creating a PyMuPDFLoader object with file_path
    loader = PyMuPDFLoader(file_path=file_path)
    
    # loading the PDF file
    docs = loader.load()
    
    # returning the loaded document
    return docs

In [5]:
# Responsible for splitting the documents into several chunks
def split_docs(documents, chunk_size=1024, chunk_overlap=20):
    
    # Initializing the RecursiveCharacterTextSplitter with
    # chunk_size and chunk_overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    
    # Splitting the documents into chunks
    chunks = text_splitter.split_documents(documents=documents)
    
    # returning the document chunks
    return chunks
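
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch using fixed-width windows. It is a simplification: `RecursiveCharacterTextSplitter` splits on separators (paragraphs, lines, words) before falling back to characters, so its chunk boundaries will differ. The helper name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text, chunk_size=1024, chunk_overlap=20):
    """Fixed-width windows over text; NOT LangChain's recursive algorithm."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts (chunk_size - chunk_overlap) characters later
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters of placeholder text
chunks = sliding_chunks(sample, chunk_size=12, chunk_overlap=4)
for c in chunks:
    print(repr(c))
```

Each window repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is short enough.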

In [6]:
# function for loading the embedding model
def load_embedding_model(model_path, normalize_embedding=True):
    return HuggingFaceEmbeddings(
        model_name=model_path,
        model_kwargs={'device':'cpu'}, # here we will run the model with CPU only
        encode_kwargs = {
            'normalize_embeddings': normalize_embedding # keep True to compute cosine similarity
        }
    )
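
The comment on `normalize_embeddings` can be checked with a few lines of plain Python. For unit-length vectors the dot product equals cosine similarity, and the squared Euclidean distance is `2 - 2 * cosine`, so ranking by L2 distance (which, as far as I know, is the default for LangChain's FAISS store) gives the same order as ranking by cosine similarity. The helper names and the two example vectors are hypothetical.

```python
import math


def l2_normalize(v):
    # Scale the vector to unit length
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 2.0]  # placeholder vectors, not real embeddings
ua, ub = l2_normalize(a), l2_normalize(b)

# Squared Euclidean distance between the normalized vectors
sq_l2 = sum((x - y) ** 2 for x, y in zip(ua, ub))

print(math.isclose(dot(ua, ub), cosine(a, b)))
print(math.isclose(sq_l2, 2 - 2 * cosine(a, b)))
```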


# Function for building a FAISS vector store from the document chunks
def create_embeddings(chunks, embedding_model, storing_path="vectorstore"):
    # Embedding the chunks and indexing the vectors with FAISS
    vectorstore = FAISS.from_documents(chunks, embedding_model)
    
    # Saving the index to disk under storing_path (relative to the current directory)
    vectorstore.save_local(storing_path)
    
    # returning the vectorstore
    return vectorstore

In [7]:
prompt = """
### System:
You are an AI Assistant that follows instructions extremely well. \
Help as much as you can.
Please say I don't know if you don't know.

### User:
{prompt}

### Response:

"""

In [8]:
template = """
### System:
You are a respectful and honest assistant. You have to answer the user's \
questions using only the context provided to you. If you don't know the answer, \
just say you don't know. Don't try to make up an answer.

### Context:
{context}

### User:
{question}

### Response:
"""

In [9]:
def load_qa_chain(retriever, llm, prompt):
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever, # here we are using the vectorstore as a retriever
        chain_type="stuff",
        return_source_documents=True, # including source documents in output
        chain_type_kwargs={'prompt': prompt} # customizing the prompt
    )
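
The `"stuff"` chain type simply pastes every retrieved chunk into the `{context}` slot of the prompt and makes a single LLM call. The sketch below imitates that step with `str.format`. It assumes chunks are joined by a blank line (which I believe matches the chain's default separator) and uses a shortened, hypothetical template and placeholder chunks rather than the real retriever output.

```python
# Shortened stand-in for the notebook's template (hypothetical)
TEMPLATE = """### Context:
{context}

### User:
{question}

### Response:
"""


def stuff_prompt(chunks, question, template=TEMPLATE, separator="\n\n"):
    # Concatenate all retrieved chunks into one context string
    context = separator.join(chunks)
    # Fill the two placeholders, as the chain does before calling the LLM
    return template.format(context=context, question=question)


retrieved = ["Chunk about education.", "Chunk about research interests."]
print(stuff_prompt(retrieved, "Who is SoonWook Choi"))
```

Because everything goes into one prompt, the combined chunks must fit in the model's context window, which is one reason `chunk_size` matters.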

In [10]:
def get_response(query, chain):
    # Getting response from chain
    response = chain.invoke({'query': query})
    
    # Wrapping the text for better output in Jupyter Notebook
    wrapped_text = textwrap.fill(response['result'], width=100)
    print(wrapped_text)

In [11]:
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate

In [12]:
# Loading llama3:70b from Ollama
llm = Ollama(model="llama3:70b", temperature=0)

# Loading the Embedding Model
embed = load_embedding_model(model_path="all-MiniLM-L6-v2")


In [13]:
# loading and splitting the documents
docs = load_pdf_data(file_path="data/choisoonwook.pdf")
documents = split_docs(documents=docs)

# creating vectorstore
vectorstore = create_embeddings(documents, embed)

# converting vectorstore to a retriever
retriever = vectorstore.as_retriever()

In [14]:
# Creating the prompt from the template which we created before
prompt = PromptTemplate.from_template(template)

# Creating the chain
chain = load_qa_chain(retriever, llm, prompt)

In [15]:
get_response("Summarize the document", chain)


The document appears to be a personal profile or CV of Soon-Wook Choi, a bachelor student at Korea
University's Department of AI Cyber Security. The profile includes:  * Conference presentations (2
posters) * Education details (GPA, advisor, and current status) * Research experiences (research
student at Artificial Intelligence Cyber Security) * Research interests (quantum machine learning,
post-quantum cryptography, etc.) * Projects (3 ongoing projects related to quantum computing, binary
code analysis, and network security) * Skills and techniques (version management using Github,
Python implementation of CNN codes, and quantum circuit implementation)  Overall, the document
showcases Soon-Wook Choi's academic background, research experiences, and skills in the field of AI
cyber security.


In [16]:
get_response("What is QAOA", chain)

I don't know what QAOA stands for or what it refers to based on the provided context. The context
mentions quantum machine learning and quantum circuit implementation, but it does not explicitly
mention QAOA. If you could provide more information about QAOA, I may be able to help further.


In [17]:
get_response("What is BERT", chain)

I don't know what BERT is based on the provided context. The context only provides information about
SoonWook Choi's conferences, education, research experiences, interests, projects, and skills, but
it does not mention BERT.


In [18]:
get_response("Who is SoonWook Choi", chain)

Soon-Wook Choi is a Bachelor student at Korea University, Department of AI Cyber Security. He is
also a Research Student at Artificial Intelligence Cyber Security, Korea University Sejong, Korea.


In [19]:
resp1 = llm.invoke("What is QAOA")
print(resp1)

QAOA stands for Quantum Approximate Optimization Algorithm. It's a quantum algorithm designed to solve combinatorial optimization problems, which are notoriously difficult to solve exactly in a reasonable amount of time.

**The Problem:**
Combinatorial optimization problems involve finding the best solution among an exponentially large number of possible solutions. Examples include:

1. MaxCut: Divide a graph into two subsets to maximize the number of edges between them.
2. Traveling Salesman Problem (TSP): Find the shortest route that visits a set of cities and returns to the starting point.
3. Scheduling: Schedule tasks or jobs to minimize delays or maximize efficiency.

These problems are NP-hard, meaning that the running time of traditional algorithms increases exponentially with the size of the problem instance.

**The Solution:**
QAOA is a hybrid quantum-classical algorithm that leverages the power of quantum computing to approximate the solution to these optimization problems. T

In [20]:
resp2 = llm.invoke("What is BERT")
print(resp2)

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model developed by Google in 2018. It's a deep learning model that has revolutionized the field of Natural Language Processing (NLP).

**What makes BERT special?**

1. **Contextual understanding**: BERT is trained to understand natural language text in context, unlike traditional language models that focus on individual words or phrases.
2. **Bidirectional encoding**: BERT uses a bidirectional encoder, which means it considers the entire input sequence (left and right context) when generating representations for each word.
3. **Transformer architecture**: BERT is built using the Transformer architecture, introduced in the paper "Attention Is All You Need" by Vaswani et al. in 2017. This architecture relies on self-attention mechanisms to model relationships between input tokens.

**How does BERT work?**

1. **Pre-training**: BERT is pre-trained on a large corpus of text data (e.g., Wikipedia, BookC

In [21]:
resp3 = llm.invoke("Who is SoonWook Choi")
print(resp3)

Soon-Wook Choi is a South Korean professional golfer who has had a successful career in the sport. Here are some key facts about him:

**Early Life and Amateur Career**: Born on December 20, 1968, in Seoul, South Korea, Choi started playing golf at the age of 12. He won several amateur tournaments in his home country before turning pro in 1991.

**Professional Career**: Choi has won numerous professional tournaments around the world, including:

* 2 PGA Tour victories: The 2007 Barclays Classic and the 2011 Players Championship
* 4 Japan Golf Tour wins
* 3 Asian Tour wins
* 2 Korean Tour wins

He has also represented South Korea in several international team competitions, such as the Presidents Cup (2003, 2007) and the World Cup (1995, 1996).

**Notable Achievements**: Choi's win at the 2011 Players Championship was particularly notable, as it made him the first Asian-born player to win the tournament. He also became the oldest player to win the event at the age of 42.

**Style and Imp

In [22]:
import os
pdf = os.listdir("./data")
for name in pdf:
    print(name)

.DS_Store
A Tutorial on Quantum Approximate Optimization Algorithm QAOA Fundamentals and Applications.pdf
self_rag_arxiv.pdf
BERT_arxiv.pdf
crag_arxiv.pdf
RAG_arxiv.pdf
choisoonwook.pdf
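
The listing shows PDFs on QAOA and BERT sitting in `./data`, yet only `choisoonwook.pdf` was indexed, which is consistent with the chain answering "I don't know" for those two questions. A first step toward indexing the whole folder would be to collect just the PDF files and skip entries such as `.DS_Store`. The helper `list_pdfs` is hypothetical, and the demo runs against a temporary directory with empty placeholder files rather than the real data.

```python
import tempfile
from pathlib import Path


def list_pdfs(data_dir):
    # Keep regular files with a .pdf extension, sorted for a stable order
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files mimicking part of the listing above
    for name in (".DS_Store", "BERT_arxiv.pdf", "choisoonwook.pdf"):
        (Path(tmp) / name).touch()
    names = [p.name for p in list_pdfs(tmp)]

print(names)

# In the notebook, one could then (untested here) load every file:
# docs = [d for p in list_pdfs("./data") for d in load_pdf_data(str(p))]
```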


In [24]:
pid = os.getpid()
pid

78220

In [25]:
# os.kill(pid, 0)