# LangChain

In [None]:
!pip install -q -U ragatouille
!pip install -q langchain
!pip install -q langchain-openai
!pip install -q langchain-core
!pip install -q langchain-community
!pip install -q pypdf

In [None]:
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("Orca_paper.pdf")
pages = loader.load_and_split()



In [None]:
len(pages)

55

In [None]:
# Join the chunks with a newline so words at chunk boundaries don't run together
full_document = "\n".join(page.page_content for page in pages)

In [None]:
print(full_document)

Orca: Progressive Learning from Complex
Explanation Traces of GPT-4
Subhabrata Mukherjee∗†, Arindam Mitra∗
Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
Microsoft Research
Abstract
Recent research has focused on enhancing the capability of smaller models
through imitation learning, drawing on the outputs generated by large
foundation models (LFMs). A number of issues impact the quality of these
models, ranging from limited imitation signals from shallow LFM outputs;
small scale homogeneous training data; and most notably a lack of rigorous
evaluation resulting in overestimating the small model’s capability as they
tend to learn to imitate the style, but not the reasoning process of LFMs . To
address these challenges, we develop Orca, a 13-billion parameter model
that learns to imitate the reasoning process of LFMs. Orca learns from
rich signals from GPT-4 including explanation traces; step-by-step thought
processes; and other complex instructions, guided by teacher assi

In [None]:
type(full_document)

str

In [None]:
RAG.index(
    collection=[full_document],
    index_name="orca_paper",
    max_document_length=512,
    split_documents=True,
)

This is a behaviour change from RAGatouille 0.8.0 onwards.
This works fine for most users and smallish datasets, but can be considerably slower than FAISS and could cause worse results in some situations.
If you're confident with FAISS working on your machine, pass use_faiss=True to revert to the FAISS-using behaviour.
--------------------


[Mar 30, 21:05:39] #> Creating directory .ragatouille/colbert/indexes/orca_paper 


[Mar 30, 21:05:43] [0] 		 #> Encoding 87 passages..
[Mar 30, 21:05:46] [0] 		 avg_doclen_est = 359.0804748535156 	 len(local_sample) = 87
[Mar 30, 21:05:46] [0] 		 Creating 2,048 partitions.
[Mar 30, 21:05:46] [0] 		 *Estimated* 31,240 embeddings.
[Mar 30, 21:05:46] [0] 		 #> Saving the indexing plan to .ragatouille/colbert/indexes/orca_paper/plan.json ..
used 19 iterations (0.5773s) to cluster 29678 items into 2048 clusters
[Mar 30, 21:05:47] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
[Mar 30, 21:07:2

0it [00:00, ?it/s]

[Mar 30, 21:08:59] [0] 		 #> Encoding 87 passages..


1it [00:01,  1.22s/it]
100%|██████████| 1/1 [00:00<00:00, 1219.63it/s]

[Mar 30, 21:09:01] #> Optimizing IVF to store map from centroids to list of pids..
[Mar 30, 21:09:01] #> Building the emb2pid mapping..
[Mar 30, 21:09:01] len(emb2pid) = 31240



100%|██████████| 2048/2048 [00:00<00:00, 57013.09it/s]

[Mar 30, 21:09:01] #> Saved optimized IVF to .ragatouille/colbert/indexes/orca_paper/ivf.pid.pt
Done indexing!





'.ragatouille/colbert/indexes/orca_paper'
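With `split_documents=True`, RAGatouille cuts each document in `collection` into passages of at most `max_document_length` tokens before encoding. That is how the single concatenated string above became the 87 passages reported in the log. The hypothetical `split_into_passages` below is a rough illustration only: it uses fixed windows of whitespace-separated words. RAGatouille's own splitter counts model tokens and may place boundaries differently, so its passage counts will not match this sketch.

```python
def split_into_passages(text: str, max_words: int = 512) -> list[str]:
    """Hypothetical splitter: fixed windows of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Toy input: 1,200 words -> windows of 512, 512 and 176 words.
toy_document = " ".join(f"w{i}" for i in range(1200))
toy_passages = split_into_passages(toy_document, max_words=512)
print(len(toy_passages), [len(p.split()) for p in toy_passages])  # 3 [512, 512, 176]
```

A smaller `max_document_length` gives more, shorter passages. This makes matches more precise but spreads each section of the paper across more entries.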

### Do Retrieval

In [None]:
results = RAG.search(query="What is instruction tuning?", k=3)


In [None]:
results

[{'content': "For multimodal tasks, instruction tuning has been used to generate\nsynthetic instruction-following data for language-image tasks, such as image captioning [ 23]\nand visual question answering [24].\nA wide range of works in recent times, including Alpaca [ 7], Vicuna [ 9], WizardLM [ 8] and\nKoala [14], have adopted instruction-tuning to train smaller language models with outputs\ngenerated from large foundation models from the GPT family. As outlined in Section 1.1,\na significant drawback with all these works has been both limited task diversity, query\ncomplexity and small-scale training data in addition to limited evaluation overstating the\nbenefits of such approach.\n2.2 Role of System Instructions\nVanilla instruction-tuning (refer to Figure 4 for examples) often uses input, response pairs\nwith short and terse responses. Such responses when used to train smaller models, as in\nexisting works, give them limited ability to trace the reasoning process of the LFM. In
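ColBERTv2 ranks passages by late interaction (the MaxSim operator). Every query token embedding is compared against every token embedding of a passage. The best similarity for each query token is kept, and those maxima are summed. With `k=3`, the search then returns the three highest-scoring passages. The toy sketch below shows only that scoring rule, using hand-made 2-D unit vectors (all data is hypothetical). The real index works on 128-dimensional compressed token embeddings and first prunes candidates via centroids, which this sketch does not model.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def maxsim_score(query_embs: list[list[float]], passage_embs: list[list[float]]) -> float:
    """Sum over query tokens of the best dot product with any passage token."""
    return sum(
        max(sum(q * p for q, p in zip(q_vec, p_vec)) for p_vec in passage_embs)
        for q_vec in query_embs
    )


# Hypothetical toy data: 2 query token vectors, 3 passages of 2 token vectors each.
query = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
toy_passage_embs = {
    "p1": [normalize([1.0, 0.0]), normalize([0.0, 1.0])],    # matches both query tokens
    "p2": [normalize([1.0, 0.0]), normalize([1.0, 0.0])],    # matches only the first
    "p3": [normalize([-1.0, 0.0]), normalize([0.0, -1.0])],  # matches neither
}

scores = {pid: maxsim_score(query, embs) for pid, embs in toy_passage_embs.items()}
top_k = heapq.nlargest(2, scores, key=scores.get)  # analogous to k=2
print(scores, top_k)  # {'p1': 2.0, 'p2': 1.0, 'p3': 0.0} ['p1', 'p2']
```

Passage `p2` matches only one of the two query tokens well, so it scores half as much as `p1`. Summing the per-token maxima rewards passages that cover every query term.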

### Use as LangChain Retriever

In [None]:
retriever = RAG.as_langchain_retriever(k=3)

In [None]:
retriever.invoke("What is instruction tuning?")

[Document(page_content="For multimodal tasks, instruction tuning has been used to generate\nsynthetic instruction-following data for language-image tasks, such as image captioning [ 23]\nand visual question answering [24].\nA wide range of works in recent times, including Alpaca [ 7], Vicuna [ 9], WizardLM [ 8] and\nKoala [14], have adopted instruction-tuning to train smaller language models with outputs\ngenerated from large foundation models from the GPT family. As outlined in Section 1.1,\na significant drawback with all these works has been both limited task diversity, query\ncomplexity and small-scale training data in addition to limited evaluation overstating the\nbenefits of such approach.\n2.2 Role of System Instructions\nVanilla instruction-tuning (refer to Figure 4 for examples) often uses input, response pairs\nwith short and terse responses. Such responses when used to train smaller models, as in\nexisting works, give them limited ability to trace the reasoning process of t

### Create a Chain

In [None]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('openai')

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""
)

llm = ChatOpenAI()

document_chain = create_stuff_documents_chain(llm, prompt)


retrieval_chain = create_retrieval_chain(retriever, document_chain)
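`create_stuff_documents_chain` "stuffs" every retrieved document into the prompt's `{context}` variable. `create_retrieval_chain` supplies those documents by running the retriever on the same `{input}`. The minimal pure-Python sketch below shows roughly what prompt string reaches the LLM. It assumes documents are joined with a blank line, which is our understanding of LangChain's default separator. `stuff_prompt` and `retrieved_docs` are hypothetical, and no model is called.

```python
PROMPT_TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(question: str, docs: list[str], separator: str = "\n\n") -> str:
    """Join all retrieved passages into one context block and fill the template."""
    return PROMPT_TEMPLATE.format(context=separator.join(docs), input=question)


# Hypothetical passages standing in for the retriever's k=3 results.
retrieved_docs = ["Passage about instruction tuning.", "Passage about system instructions."]
prompt_text = stuff_prompt("What is instruction tuning?", retrieved_docs)
print(prompt_text)
```

Because every retrieved passage goes into one prompt, `k` directly controls prompt length. This is why a small `k` such as 3 is a common choice for this chain type.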

In [None]:
retrieval_chain.invoke({"input": "What is instruction tuning?"})

{'input': 'What is instruction tuning?',
 'context': [Document(page_content="For multimodal tasks, instruction tuning has been used to generate\nsynthetic instruction-following data for language-image tasks, such as image captioning [ 23]\nand visual question answering [24].\nA wide range of works in recent times, including Alpaca [ 7], Vicuna [ 9], WizardLM [ 8] and\nKoala [14], have adopted instruction-tuning to train smaller language models with outputs\ngenerated from large foundation models from the GPT family. As outlined in Section 1.1,\na significant drawback with all these works has been both limited task diversity, query\ncomplexity and small-scale training data in addition to limited evaluation overstating the\nbenefits of such approach.\n2.2 Role of System Instructions\nVanilla instruction-tuning (refer to Figure 4 for examples) often uses input, response pairs\nwith short and terse responses. Such responses when used to train smaller models, as in\nexisting works, give the

In [None]:
response = retrieval_chain.invoke({"input": "What is instruction tuning?"})

In [None]:
response["answer"]

'Instruction tuning is a technique that allows pre-trained language models to learn from input (natural language descriptions of the task) and response pairs. It has been applied to both language-only and multimodal tasks, where it has been shown to improve the zero-shot and few-shot performance of models on various benchmarks. In the context provided, instruction tuning has specifically been used to generate synthetic instruction-following data for language-image tasks like image captioning and visual question answering.'

# LlamaIndex

In [None]:
!pip install -q llama-index
!pip install -q llama-hub
!pip install -q llama-index-core
!pip install -q llama-index-llms-openai

In [None]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["Orca_paper.pdf"])
docs = reader.load_data()

In [None]:
# docs

In [None]:
from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
RAGatouilleRetrieverPack = download_llama_pack(
    "RAGatouilleRetrieverPack", "./ragatouille_pack"
)

In [None]:
from llama_index.llms.openai import OpenAI

In [None]:
# create the pack
ragatouille_pack = RAGatouilleRetrieverPack(
    docs,  # List[Document]
    llm=OpenAI(model="gpt-3.5-turbo"),
    index_name="orca_paper",
    top_k=5,
)

This is a behaviour change from RAGatouille 0.8.0 onwards.
This works fine for most users and smallish datasets, but can be considerably slower than FAISS and could cause worse results in some situations.
If you're confident with FAISS working on your machine, pass use_faiss=True to revert to the FAISS-using behaviour.
--------------------


[Mar 30, 21:25:16] #> Note: Output directory .ragatouille/colbert/indexes/orca_paper already exists


[Mar 30, 21:25:16] #> Will delete 10 files already at .ragatouille/colbert/indexes/orca_paper in 20 seconds...
[Mar 30, 21:25:39] [0] 		 #> Encoding 219 passages..
[Mar 30, 21:25:40] [0] 		 avg_doclen_est = 155.7305908203125 	 len(local_sample) = 219
[Mar 30, 21:25:40] [0] 		 Creating 2,048 partitions.
[Mar 30, 21:25:40] [0] 		 *Estimated* 34,104 embeddings.
[Mar 30, 21:25:40] [0] 		 #> Saving the indexing plan to .ragatouille/colbert/indexes/orca_paper/plan.json ..
used 20 iterations (0.4053s) to cluster 32400 items into 2048 clusters
[0.034, 0.03

0it [00:00, ?it/s]

[Mar 30, 21:25:40] [0] 		 #> Encoding 219 passages..


1it [00:01,  1.13s/it]
100%|██████████| 1/1 [00:00<00:00, 1167.03it/s]

[Mar 30, 21:25:42] #> Optimizing IVF to store map from centroids to list of pids..
[Mar 30, 21:25:42] #> Building the emb2pid mapping..
[Mar 30, 21:25:42] len(emb2pid) = 34105



100%|██████████| 2048/2048 [00:00<00:00, 34108.16it/s]

[Mar 30, 21:25:42] #> Saved optimized IVF to .ragatouille/colbert/indexes/orca_paper/ivf.pid.pt





Done indexing!
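This run encoded 219 passages with `avg_doclen_est` of about 156, compared with 87 passages at about 359 earlier. One plausible contributor is that `SimpleDirectoryReader` typically returns one `Document` per PDF page, and splitting happens within each document. Every page boundary then forces a passage break, producing more and shorter passages than the single concatenated string did. The pack's exact chunking settings are not shown here, so treat this as a partial explanation. The hypothetical `count_passages` below illustrates the effect on a toy corpus, using word counts in place of tokens.

```python
def count_passages(documents: list[str], max_words: int) -> int:
    """Hypothetical: split each document independently into windows of <= max_words words."""
    total = 0
    for doc in documents:
        n_words = len(doc.split())
        total += -(-n_words // max_words)  # ceiling division; empty documents add 0
    return total


# Hypothetical toy corpus: 10 "pages" of 300 words each.
toy_pages = [" ".join(["word"] * 300) for _ in range(10)]

per_page = count_passages(toy_pages, max_words=512)                  # each page split on its own
concatenated = count_passages([" ".join(toy_pages)], max_words=512)  # one long string
print(per_page, concatenated)  # 10 6
```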


In [None]:
response = ragatouille_pack.run("What is instruction tuning? ")


Loading searcher for index orca_paper for the first time... This may take a few seconds
[Mar 30, 21:26:30] #> Loading codec...
[Mar 30, 21:26:30] #> Loading IVF...
[Mar 30, 21:26:30] #> Loading doclens...


100%|██████████| 1/1 [00:00<00:00, 1390.68it/s]

[Mar 30, 21:26:30] #> Loading codes and residuals...



100%|██████████| 1/1 [00:00<00:00, 186.82it/s]

Searcher loaded!

#> QueryTokenizer.tensorize(batch_text[0], batch_background[0], bsize) ==
#> Input: . What is instruction tuning? , 		 True, 		 None
#> Output IDs: torch.Size([32]), tensor([  101,     1,  2054,  2003,  7899, 17372,  1029,   102,   103,   103,
          103,   103,   103,   103,   103,   103,   103,   103,   103,   103,
          103,   103,   103,   103,   103,   103,   103,   103,   103,   103,
          103,   103], device='cuda:0')
#> Output Mask: torch.Size([32]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
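The debug output above shows ColBERT's query augmentation. The query text is prefixed with `. `, and the `.` position is filled by a query-marker token (ID `1` in this log). The sequence is then padded to a fixed 32 tokens with `[MASK]` (ID `103`) rather than ordinary padding, while the attention mask is `1` only for the eight real tokens. The sketch below rebuilds that layout from the word-piece IDs printed in the log. `augment_query` and its constants are hypothetical names, and tokenization itself is not modeled.

```python
CLS_ID, SEP_ID, MASK_ID = 101, 102, 103  # BERT special-token IDs seen in the log
QUERY_MARKER_ID = 1                      # marker ID occupying the "." slot in the log
QUERY_MAXLEN = 32                        # fixed query length seen in the log


def augment_query(token_ids: list[int], maxlen: int = QUERY_MAXLEN) -> tuple[list[int], list[int]]:
    """Hypothetical re-creation of the logged layout: [CLS] [Q] tokens [SEP], then [MASK] padding."""
    ids = [CLS_ID, QUERY_MARKER_ID] + token_ids[: maxlen - 3] + [SEP_ID]
    mask = [1] * len(ids)      # real tokens are attended to
    padding = maxlen - len(ids)
    return ids + [MASK_ID] * padding, mask + [0] * padding  # mask tokens get 0 here


# Word-piece IDs for "What is instruction tuning?" as printed in the log.
ids, mask = augment_query([2054, 2003, 7899, 17372, 1029])
print(ids)
print(mask)
```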






In [None]:
response

Response(response='Instruction tuning is a technique that allows pre-trained language models to learn from input and response pairs. It involves training smaller models with pairs of user instructions, input, and corresponding outputs to improve their performance on various tasks.', source_nodes=[NodeWithScore(node=TextNode(id_='a89b493f-28fb-4d94-84e0-f0b340ca04de', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Given user instructions for a task and an input,\nthe system generates a response. Existing works like Alpaca [ 7], Vicuna [ 9] and variants\nfollow a similar template to train small models with ⟨{user instruction, input}, output ⟩.\n2 Preliminaries\n2.1 Instruction Tuning\nInstruction tuning [ 22] is a technique that allows pre-trained language models to learn\nfrom input (natural language descriptions of the task) and response pairs, for example,\n{"instruction": "Arrange the words in the given sentence to

In [None]:
print(response)

Instruction tuning is a technique that allows pre-trained language models to learn from input and response pairs. It involves training smaller models with pairs of user instructions, input, and corresponding outputs to improve their performance on various tasks.
