# Following the [tutorial](https://python.langchain.com/docs/use_cases/question_answering/#step-1-load) from the LangChain website

Steps to implement:
1. **Loading:** First we need to load our data. Unstructured data can be loaded from many sources; the LangChain integration hub lists the full set of loaders. Each loader returns data as a list of LangChain `Document` objects.
2. **Splitting:** Text splitters break Documents into splits of a specified size.
3. **Storage:** A storage layer (often a vector store) holds the splits and usually their embeddings.
4. **Retrieval:** The app retrieves splits from storage, typically those whose embeddings are most similar to the embedding of the input question.
5. **Generation:** An LLM produces an answer using a prompt that includes the question and the retrieved data.
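To make the five steps concrete before bringing in LangChain, here is a toy, dependency-free sketch of the same pipeline. It is an illustration under simplifying assumptions, not how LangChain implements it: the "embedding" is a bag-of-words count vector instead of `LlamaCppEmbeddings`, the store is a plain list instead of Chroma, and the generation step stops at building the prompt an LLM would receive.

```python
import math
import re
from collections import Counter

# Step 1 (load): a made-up corpus standing in for the loaded Documents
corpus = [
    "Virtual reality can support vocabulary learning in language courses.",
    "Bibliometric analysis maps the literature on VR in education.",
    "The Normans were the people who gave their name to Normandy.",
]

def split(text, max_words=50):
    # Step 2 (split): naive word-count chunks, a stand-in for a real text splitter
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    # Toy bag-of-words "embedding" (assumption: replaces LlamaCppEmbeddings)
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 3 (store): a list of (chunk, vector) pairs instead of a vector database
store = [(chunk, embed(chunk)) for doc in corpus for chunk in split(doc)]

def retrieve(question, k=2):
    # Step 4 (retrieve): rank stored chunks by similarity to the question
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, chunks):
    # Step 5 (generate): the LLM call is omitted; this is the prompt it would receive
    context = "\n\n".join(chunks)
    return f"Question: {question}\nContext: {context}\nAnswer:"

question = "How is virtual reality used in language learning?"
top = retrieve(question)
print(build_prompt(question, top))
```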

## Imports

```python
import os
import logging
import json
import pandas as pd
from pprint import pprint
from textwrap import wrap
from collections import defaultdict, ChainMap

from langchain.document_loaders import JSONLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Chroma
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
```

## 1. Load

```python
# loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
# data = loader.load()

# jq expression to parse the SQuAD dataset: one record per paragraph, with its questions and answers
jq_exp = ".data[].paragraphs[] | {context:.context, queries:[.qas[] | {question:.question, id:.id, answers:[.answers[].text]}]}"

# Keep questions and answers in the document metadata to check them later.
# Chroma only accepts str/int/float/bool metadata values, so the list is stored as a JSON string.
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["queries"] = json.dumps(record.get("queries"))
    return metadata

loader = JSONLoader("../documents/train-v2.0.json", jq_schema=jq_exp, content_key="context", metadata_func=metadata_func)
docs = loader.load()
```
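The jq expression above emits one record per SQuAD paragraph. A rough plain-Python equivalent, run on a tiny made-up record with the SQuAD v2.0 layout (fields the expression ignores, such as `answer_start`, are left out), shows the shape each record takes. It also shows the round trip for the `queries` metadata: as far as I know Chroma only accepts scalar metadata values (str, int, float, bool), so serialising the list to a JSON string is one way around that; worth checking against the installed Chroma version.

```python
import json

# Made-up example with the same nesting as SQuAD v2.0's train-v2.0.json
squad = {
    "data": [{
        "title": "Normans",
        "paragraphs": [{
            "context": "The Normans gave their name to Normandy, a region in France.",
            "qas": [{
                "question": "In what country is Normandy located?",
                "id": "q1",
                "answers": [{"text": "France"}],
                "is_impossible": False,
            }],
        }],
    }]
}

def to_records(data):
    # Mirrors: .data[].paragraphs[] | {context: .context, queries: [.qas[] | {...}]}
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            yield {
                "context": paragraph["context"],
                "queries": [
                    {"question": qa["question"], "id": qa["id"],
                     "answers": [a["text"] for a in qa["answers"]]}
                    for qa in paragraph["qas"]
                ],
            }

records = list(to_records(squad))

# Store the list of dicts as a JSON string, then decode it when checking answers
metadata = {"queries": json.dumps(records[0]["queries"])}
restored = json.loads(metadata["queries"])
print(restored[0]["answers"])  # ['France']
```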

## 2. Split

```python
# Splitting should rarely be needed since SQuAD contexts are short; contexts longer than 2048 characters are still split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=0)
all_splits = text_splitter.split_documents(docs)
print(len(all_splits))
# Keep only the first 100 splits to avoid loading too much data into the vector store
all_splits = all_splits[:100]
```

```
19102
```
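`chunk_size=2048` is measured in characters, and a text that already fits comes back as a single chunk, so the split count only exceeds the document count when some contexts are longer than 2,048 characters. The sketch below is a simplified version of the recursive idea behind `RecursiveCharacterTextSplitter`: try coarse separators first and fall back to finer ones only for pieces that are still too long. It is not LangChain's exact algorithm; it ignores `chunk_overlap` (0 here anyway) and LangChain's whitespace and separator-keeping options.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting; assumes chunk_size >= 1."""
    if len(text) <= chunk_size:
        return [text]
    # Use the first separator that occurs in the text ("" always matches)
    sep = next(s for s in separators if s == "" or s in text)
    finer = separators[separators.index(sep) + 1:]
    pieces = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for piece in pieces:
        # Greedily merge pieces while the chunk stays within chunk_size
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks

print(recursive_split("alpha beta gamma delta", 11))  # ['alpha beta', 'gamma delta']
```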


## 3. Store

```python
# Llama 2 model configs shared by the embeddings and the LLM
# Some configs taken from https://python.langchain.com/docs/guides/local_llms#llamacpp
llama_cpp_shared_configs = {
    "n_gpu_layers": 35,                                 # number of layers offloaded to the GPU (35 according to the model's metadata)
    "n_batch": 512,                                     # number of tokens processed in parallel
    "n_ctx": 2048,                                      # context window length (Llama 2 supports up to 4096)
    "f16_kv": True                                      # keep the key/value cache in half precision to save memory
}

# Embedding-specific configs (they take precedence over the shared ones in the ChainMap below)
llama_embeddings_configs = {
    "model_path": "../models/llama-2-q4_0.gguf",
    "n_gpu_layers": 1
}

# Instantiate the embeddings used to turn documents into vectors before storing them
embeddings = LlamaCppEmbeddings(**ChainMap(llama_embeddings_configs, llama_cpp_shared_configs))

# Vector store location on disk
vectorstore_path = "../vectorstore/"

# Reuse the persisted store if the directory exists and is not empty.
# Note: in that case all_splits is NOT embedded; the store keeps whatever an earlier run persisted.
if os.path.isdir(vectorstore_path) and os.listdir(vectorstore_path):
    vectorstore = Chroma(
        embedding_function=embeddings,
        persist_directory=vectorstore_path
    )
else:
    # Otherwise embed the splits and create a new store
    vectorstore = Chroma.from_documents(
        documents=all_splits,  # the first 100 splits prepared above
        embedding=embeddings,
        persist_directory=vectorstore_path
    )
    # Persist the embedded text on disk to avoid redoing the same work
    vectorstore.persist()
```

```
llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from ../models/llama-2-q4_0.gguf (version GGUF V2 (latest))
```
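The embeddings and the LLM share `llama_cpp_shared_configs` through `ChainMap`. Lookups search the mappings from left to right, so a key in the first mapping overrides the same key in the shared one; that is how the embeddings end up with `n_gpu_layers=1` instead of 35. A minimal check with plain dictionaries holding the same values as above:

```python
from collections import ChainMap

shared = {"n_gpu_layers": 35, "n_batch": 512, "n_ctx": 2048, "f16_kv": True}
embedding_overrides = {"model_path": "../models/llama-2-q4_0.gguf", "n_gpu_layers": 1}

# The first mapping wins on conflicting keys; the rest fall through to `shared`
merged = ChainMap(embedding_overrides, shared)
kwargs = dict(merged)  # what ** unpacking effectively passes to the constructor
print(kwargs["n_gpu_layers"], kwargs["n_batch"])  # 1 512
```

The same merge could be written as `{**shared, **embedding_overrides}`, where the later dictionary wins; `ChainMap` avoids copying and keeps the shared dictionary untouched.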

## 4. Retrieve

```python
# Question to ask
question = "What is the aim of each of the papers?"
```

```python
# Simplest approach: similarity search directly against the vector store
unique_docs = vectorstore.similarity_search(question)

# A more sophisticated approach uses MultiQueryRetriever together with an LLM
# logging.basicConfig()
# logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

llama_llm_configs = {
    "model_path": "../models/llama-2-chat-q4_0.gguf",
    "temperature": 1,
    "callback_manager": CallbackManager([StreamingStdOutCallbackHandler()]),  # stream generated tokens to stdout
    "verbose": True,
}
llm = LlamaCpp(**ChainMap(llama_llm_configs, llama_cpp_shared_configs))

# retriever_from_llm = MultiQueryRetriever.from_llm(
#     retriever=vectorstore.as_retriever(),
#     llm=llm
# )
# unique_docs = retriever_from_llm.get_relevant_documents(query=question)
```


```
llama_print_timings:        load time =  2295.00 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =   949.95 ms /    11 tokens (   86.36 ms per token,    11.58 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   950.76 ms
llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from ../models/llama-2-chat-q4_0.gguf (version GGUF V2 (latest))
```
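`vectorstore.similarity_search(question)` embeds the question and returns the nearest stored chunks (4 by default in LangChain's Chroma wrapper). `MultiQueryRetriever` instead asks the LLM for several rephrasings of the question, retrieves for each one and keeps the unique union of the results. A small sketch of both ideas, assuming cosine similarity as the ranking metric for illustration (Chroma's default distance may differ; as far as I know it is L2, configurable per collection) and hypothetical 2-dimensional vectors in place of real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_search(query_vec, store, k=4):
    # Rank (doc_id, vector) pairs by similarity to the query and keep the top k ids
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

def multi_query_search(query_vecs, store, k=4):
    # Retrieve for every query variant, then keep the union without duplicates
    unique = []
    for vec in query_vecs:
        for doc_id in similarity_search(vec, store, k):
            if doc_id not in unique:
                unique.append(doc_id)
    return unique

# Hypothetical 2-d "embeddings"; real vectors from the Llama model are far larger
store = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0]), ("d", [0.1, 0.9])]
print(similarity_search([1.0, 0.05], store, k=2))                      # ['a', 'b']
print(multi_query_search([[1.0, 0.05], [0.05, 1.0]], store, k=2))      # ['a', 'b', 'c', 'd']
```

In the multi-query case the query vectors would come from embedding the LLM's rephrasings; here they are simply written out by hand.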

```python
for i, d in enumerate(unique_docs):
    m = d.metadata
    c = '\n'.join(wrap(d.page_content.strip(), width=120))
    # PDF loaders set "page"; JSONLoader sets "seq_num" instead, so avoid a KeyError
    page = m.get("page", m.get("seq_num"))
    print("Source {}, Page {} Part {}:\n {}\n".format(m["source"], page, i, c))
```

```
Source ../documents/VR_paper_2.pdf, Page 2 Part 0:
 ing the objectives and the research questions. A second section presents a review of the literature on bibliometric
analysis. The methodology used is defined in the third section, indicating the search procedures used to identify the
literature on VR in education. The fourth section presents the results, and these are discussed in the fifth section.
Finally, in the sixth section, the conclusions are drawn, and future lines of research are suggested.

Source ../documents/VR_paper.pdf, Page 3 Part 1:
 related to the use of                                 virtual reality in language learning.       2.4. Application of
inclusion and exclusion criteria for refining the VR corpus   Studies were eligible for inclusion in the VR corpus if
they conformed with the following criteria:   1. Include empirical data related to the application of Virtual Reality in
a language course (thus literature                                   reviews, editoria
```

## 5. Generate

```python
template = """
[INST]<<SYS>> You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.<</SYS>> 
Question: {question} 
Context: {context} 
Answer:[/INST]
"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    return_source_documents=True)

result = qa_chain({"query": question})
```


```
llama_print_timings:        load time =  2295.00 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =  3490.93 ms /    11 tokens (  317.36 ms per token,     3.15 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =  3493.72 ms

The aim of each paper in the corpus is to investigate and provide scholarly knowledge on the use of virtual reality (VR) in language learning. Specifically, the papers aim to:

1. Examine the effectiveness of VR in improving language learning outcomes, such as vocabulary acquisition and grammar comprehension.
2. Investigate the role of VR in enhancing learners' motivation and engagement in language courses.
3. Analyze the potential of VR to create immersive and interactive language learning experiences.
4. Explore the challenges and limitations of implementing VR in language education, such as cost and technological accessibility.

By providing a comprehensive review of the literature on VR in language learning, this corpus aims to contribute to the growing body of research on the use of VR in education and to inform future studies and practical applications in this field.

llama_print_timings:        load time =  1159.19 ms
llama_print_timings:      sample time =    81.87 ms /   200 runs   (    0.41 ms per token,  2442.90 tokens per second)
llama_print_timings: prompt eval time =  2148.61 ms /   819 tokens (    2.62 ms per token,   381.18 tokens per second)
llama_print_timings:        eval time = 16098.81 ms /   199 runs   (   80.90 ms per token,    12.36 tokens per second)
llama_print_timings:       total time = 18743.67 ms
```
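`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places every retrieved chunk into the single `{context}` slot of the prompt. Roughly, the assembly step looks like the sketch below, assuming the default `"\n\n"` separator between documents; `Doc` is a hypothetical stand-in for LangChain's `Document`, and the template is a shortened version of the one above. This is also why the whole prompt has to fit into `n_ctx=2048` tokens: in the run above the prompt took 819 tokens (`prompt eval time ... / 819 tokens`).

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    # Hypothetical minimal stand-in for a LangChain Document
    page_content: str
    metadata: dict = field(default_factory=dict)

template = (
    "[INST]<<SYS>> Use the following pieces of retrieved context to answer the question. "
    "<</SYS>>\nQuestion: {question}\nContext: {context}\nAnswer:[/INST]"
)

def stuff_prompt(question, docs, separator="\n\n"):
    # Concatenate all retrieved chunks into the single {context} slot
    context = separator.join(d.page_content for d in docs)
    return template.format(question=question, context=context)

docs = [Doc("First retrieved chunk.", {"page": 2}), Doc("Second retrieved chunk.", {"page": 3})]
prompt = stuff_prompt("What is the aim of each of the papers?", docs)
print(prompt)
```

Because everything goes into one prompt, more or longer chunks (for example a larger `k` in the retriever) trade answer context against the remaining room in the context window.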


```python
# Print the metadata of the documents used to produce the response.
# Documents were split into chunks, so several chunks can come from the same source page;
# key on (source, page) to list each one only once.
source_docs = defaultdict(dict)

for doc in result["source_documents"]:
    m = doc.metadata
    # PDF loaders set "page"; JSONLoader sets "seq_num" instead
    source_docs[(m["source"], m.get("page", m.get("seq_num")))].update(m)

for k, v in source_docs.items():
    print("-" * 100)
    pprint(v, width=120)
```

```
----------------------------------------------------------------------------------------------------
{'page': 2, 'source': '../documents/VR_paper_2.pdf'}
----------------------------------------------------------------------------------------------------
{'page': 3, 'source': '../documents/VR_paper.pdf'}
----------------------------------------------------------------------------------------------------
{'page': 4, 'source': '../documents/VR_paper.pdf'}
----------------------------------------------------------------------------------------------------
{'page': 1, 'source': '../documents/VR_paper.pdf'}
```
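The deduplication above boils down to keeping the first occurrence of each `(source, page)` key. A compact equivalent uses `dict.fromkeys`, which preserves insertion order; the metadata below is illustrative, and the `seq_num` fallback covers documents from `JSONLoader`, which have no `page` key.

```python
def unique_sources(metadatas):
    # One entry per (source, page); fall back to seq_num when there is no page
    keys = ((m["source"], m.get("page", m.get("seq_num"))) for m in metadatas)
    return list(dict.fromkeys(keys))

# Illustrative metadata only
metas = [
    {"source": "../documents/VR_paper.pdf", "page": 3},
    {"source": "../documents/VR_paper.pdf", "page": 3},
    {"source": "../documents/train-v2.0.json", "seq_num": 7},
]
print(unique_sources(metas))
# [('../documents/VR_paper.pdf', 3), ('../documents/train-v2.0.json', 7)]
```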


```python
# Examine results
print(result.keys())
print("")
pprint(result["query"])
print("")
print(result["result"])
```

```
dict_keys(['query', 'result', 'source_documents'])

'What is the aim of each of the papers?'

The aim of each paper in the corpus is to investigate and provide scholarly knowledge on the use of virtual reality (VR) in language learning. Specifically, the papers aim to:

1. Examine the effectiveness of VR in improving language learning outcomes, such as vocabulary acquisition and grammar comprehension.
2. Investigate the role of VR in enhancing learners' motivation and engagement in language courses.
3. Analyze the potential of VR to create immersive and interactive language learning experiences.
4. Explore the challenges and limitations of implementing VR in language education, such as cost and technological accessibility.

By providing a comprehensive review of the literature on VR in language learning, this corpus aims to contribute to the growing body of research on the use of VR in education and to inform future studies and practical applications in this field.
```


## Next steps
1. Compare LlamaCpp with LlamaCppEmbeddings and see whether the same object can serve both. (No: I had to check this when running out of GPU memory.)
2. Add a proper evaluation of the model's performance on the task.
3. Investigate vector stores further.
4. Formalise the process for adding new documents.
5. Investigate and better understand how documents are vectorised.
6. Verify that answers actually come only from the provided documents.
7. Distribute the system across multiple machines (e.g. the vector store and the LLM can run independently of the main program).
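For item 2, the SQuAD question/answer pairs kept in each document's metadata give a natural starting point: SQuAD's own exact-match and token-level F1 metrics. The sketch below follows the answer normalization of the official SQuAD v2.0 evaluation script as I understand it (lowercase, strip punctuation, drop a/an/the, collapse whitespace); compare it against the official script before relying on the numbers. Free-form LLM answers are much longer than SQuAD answer spans, so exact match will be harsh and F1 is likely the more informative of the two.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)

def normalize_answer(text):
    # Lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def _f1_single(prediction, gold):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Unanswerable case: full score only if both sides are empty
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def exact_match(prediction, gold_answers):
    # SQuAD v2.0 unanswerable questions have no gold answers: treat them as [""]
    golds = gold_answers or [""]
    return max(float(normalize_answer(prediction) == normalize_answer(g)) for g in golds)

def f1_score(prediction, gold_answers):
    # Best F1 over all gold answers for the question
    golds = gold_answers or [""]
    return max(_f1_single(prediction, g) for g in golds)

print(exact_match("The Normans", ["Normans"]))      # 1.0
print(round(f1_score("in France", ["France"]), 3))  # 0.667
```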