In [None]:
!pip3 install langchain
!pip3 install llama-index==0.6.0
!pip3 install pymongo
!pip3 install nltk
!pip3 install Pillow
!pip3 install python-dotenv


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m23.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m23.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m23.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m23.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: 

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [None]:
from llama_index import (
    LLMPredictor,
    GPTVectorStoreIndex,
    GPTListIndex,
    GPTSimpleKeywordTableIndex,
    download_loader
)

from langchain.chat_models import ChatOpenAI
from llama_index.response.notebook_utils import display_response



### INTRO

At a basic level, LlamaIndex takes your documents and breaks them into chunks called nodes.

Workflow:
1) Connect your private knowledge sources using LlamaIndex connectors.
2) Load the documents. A LlamaIndex `Document` is a lightweight container around a data source.
3) Parse the `Document` objects into `Node` objects. Nodes represent chunks of a source `Document` (e.g., a text chunk). These node objects can be persisted in a MongoDB collection.
4) Construct an index from the nodes. LlamaIndex offers several kinds of index, such as the list index (which stores nodes as a sequential chain) and the vector store index (which stores each node with a corresponding embedding in a vector store). Depending on its type, an index can be persisted in a MongoDB collection or a vector database.
5) Finally, query the index. The query is parsed, relevant nodes are retrieved through the index, and those nodes are provided as input to the large language model (LLM). Different types of queries can use different indexes.
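
To make the "documents become nodes" step concrete, here is a minimal, library-free sketch of chunking. It is not how LlamaIndex's own node parser works (that one splits on tokens, not characters); `ToyNode`, `chunk_text`, and the `chunk_size`/`overlap` values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node: one chunk of text plus an ID."""
    node_id: str
    text: str


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[ToyNode]:
    """Split text into overlapping character windows (an assumption for this demo)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    nodes = []
    for start in range(0, len(text), step):
        nodes.append(ToyNode(node_id=str(uuid4()), text=text[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return nodes


document = "LlamaIndex splits each document into nodes. " * 3
toy_nodes = chunk_text(document)
print(len(toy_nodes), "nodes; first chunk:", repr(toy_nodes[0].text))
```

The overlap means neighbouring chunks share some text, so a sentence that straddles a boundary still appears whole in at least one chunk.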


Use of Indexes:
For summarization, you have two options: `GPTListIndex`, or `GPTVectorStoreIndex` with `response_mode="tree_summarize"`. The difference lies in how the summary is generated: a list index uses every node in the index, while a vector index uses only the top-k most similar nodes.

For Q&A, use `GPTVectorStoreIndex`. At query time, the system fetches the top-k nodes most relevant to your query text and passes them to the LLM as context for synthesizing an answer.
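
The difference between "every node" and "top-k nodes" can be sketched without any library. The three-dimensional vectors below are made-up stand-ins for embeddings, and `retrieve_top_k` / `retrieve_all` are hypothetical helper names; a real vector index would obtain embeddings from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical embeddings, hand-written for this demo.
toy_embeddings = {
    "node-bar-exam": [0.9, 0.1, 0.0],
    "node-rlhf": [0.1, 0.9, 0.1],
    "node-hallucination": [0.0, 0.2, 0.9],
}


def retrieve_top_k(query_embedding, k=2):
    """Vector-index style: rank nodes by similarity and keep the k best."""
    ranked = sorted(
        toy_embeddings,
        key=lambda node_id: cosine(query_embedding, toy_embeddings[node_id]),
        reverse=True,
    )
    return ranked[:k]


def retrieve_all():
    """List-index style: every node is passed on, in stored order."""
    return list(toy_embeddings)


query = [1.0, 0.2, 0.0]  # pretend this encodes a question about the bar exam
print(retrieve_top_k(query))
print(retrieve_all())
```

With top-k retrieval the LLM sees only a few nodes per query, which keeps Q&A cheap; passing every node costs more but gives a summary access to the whole document.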

### Initialize OpenAI and MongoDB

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
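
`os.environ[...]` raises a bare `KeyError` when a variable is missing. If you prefer a more readable failure, a small helper along these lines is one option. `require_env` and `EXAMPLE_SETTING` are hypothetical names used only for this sketch.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a readable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Demo with a throwaway variable so the sketch runs anywhere.
os.environ.setdefault("EXAMPLE_SETTING", "demo")
print(require_env("EXAMPLE_SETTING"))
```

In this notebook the same helper could be called with `"OPENAI_API_KEY"` and `"MONGO_URI"` right after `load_dotenv()`.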



#### Load Documents

In [None]:
# In this example we load the GPT-4 paper
import requests
from pathlib import Path

PDFReader = download_loader("PDFReader")
loader = PDFReader()

out_dir = Path("data")
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / "paper.pdf"

# Download the PDF only once; later runs reuse the local copy
if not out_path.exists():
    url = 'https://arxiv.org/pdf/2303.08774.pdf'
    r = requests.get(url)
    r.raise_for_status()  # fail early rather than saving an error page as a PDF
    with open(out_path, 'wb') as f:
        f.write(r.content)

# The loader returns a list of documents; this example keeps the first one
doc = loader.load_data(file=out_path)[0]


#### Parse into Nodes
Document stores contain ingested document chunks, which LlamaIndex calls 'Node' objects.


By default, the SimpleDocumentStore stores Node objects in-memory.

In [None]:
from llama_index.node_parser import SimpleNodeParser
nodes = SimpleNodeParser().get_nodes_from_documents([doc])
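
Conceptually, an in-memory document store is little more than a dictionary keyed by node ID. The class below is a hypothetical toy, not `SimpleDocumentStore` itself; it only mirrors the `add_documents` call and `docs` attribute used in this notebook.

```python
class ToyDocumentStore:
    """Hypothetical in-memory store: a dict from node ID to node text."""

    def __init__(self):
        self.docs = {}

    def add_documents(self, nodes):
        """Store (node_id, text) pairs; re-adding an ID overwrites the old text."""
        for node_id, text in nodes:
            self.docs[node_id] = text

    def get_document(self, node_id):
        """Look up a node by ID; raises KeyError if it was never added."""
        return self.docs[node_id]


toy_store = ToyDocumentStore()
toy_store.add_documents([("n1", "first chunk"), ("n2", "second chunk")])
print(len(toy_store.docs), toy_store.get_document("n1"))
```

Because everything lives in a Python dict, the contents vanish when the process exits.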

## Persisting nodes and indexes to MongoDB
Nodes can also be persisted as an actual MongoDB collection using `MongoDocumentStore`, which is what we do here.
Storing the LlamaIndex documents and indexes in a database becomes necessary in several scenarios:
(a) Large datasets that require more than in-memory storage.
(b) Ingesting and processing data from various sources (for example, PDFs, Google Docs, Slack).
(c) Continuously maintaining updates from the underlying data sources.

Persisting this data lets you process it once and then query it from various downstream applications. You can reconnect to your MongoDB collection and reload the index by re-initializing a `MongoIndexStore` with an existing `db_name` and `namespace`.

MongoDB offers a free-forever Atlas cluster in the public cloud service of your choice. Create one by following this [tutorial](https://www.mongodb.com/developer/products/atlas/free-atlas-cluster/), or get started directly [here](https://www.mongodb.com/cloud/atlas/register).


In [None]:
MONGO_URI = os.environ["MONGO_URI"]
MONGODB_DATABASE = "gpt4_paper"
# Note: You can configure the db_name and namespace when instantiating MongoDocumentStore & MongoIndexStore,
# otherwise they default to db_name="db_docstore" and namespace="docstore"

#### Add Nodes to MongoDB backed Docstore

In [None]:
from llama_index.storage.docstore import MongoDocumentStore
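# Pass db_name so the nodes land in the same database that is used in later cells
docstore = MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE)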

docstore.add_documents(nodes)

This creates two new collections in MongoDB: `docstore/data` and `docstore/metadata`.
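
The following toy sketch illustrates the idea of namespaced collections and of reconnecting to existing data. It uses a JSON file in a temporary directory instead of MongoDB, and `ToyPersistentDocstore` is a hypothetical class. The default `db_name` and `namespace` values are taken from the note above; what the metadata collection holds here (a content hash) is an assumption made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class ToyPersistentDocstore:
    """Hypothetical file-backed stand-in for a database-backed docstore."""

    def __init__(self, root: Path, db_name: str = "db_docstore", namespace: str = "docstore"):
        self._path = Path(root) / f"{db_name}.json"
        self._data_key = f"{namespace}/data"
        self._meta_key = f"{namespace}/metadata"
        # Reuse the existing "collections" if this database was written before.
        self._db = json.loads(self._path.read_text()) if self._path.exists() else {}
        self._db.setdefault(self._data_key, {})
        self._db.setdefault(self._meta_key, {})

    def add_documents(self, nodes: dict) -> None:
        """Write node texts and a hash per node, then flush to disk."""
        for node_id, text in nodes.items():
            self._db[self._data_key][node_id] = text
            self._db[self._meta_key][node_id] = hashlib.sha256(text.encode()).hexdigest()
        self._path.write_text(json.dumps(self._db))

    @property
    def docs(self) -> dict:
        return dict(self._db[self._data_key])

    def collections(self) -> list:
        return sorted(self._db)


root = Path(tempfile.mkdtemp())
first = ToyPersistentDocstore(root, db_name="gpt4_paper")
first.add_documents({"n1": "chunk one", "n2": "chunk two"})

# A fresh object pointed at the same db_name sees the previously stored nodes.
second = ToyPersistentDocstore(root, db_name="gpt4_paper")
print(second.collections())
print(len(second.docs))
```

A store opened with a different `db_name` starts empty, which is why the same database name should be used consistently across cells.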

![MongoDocumentStore](https://drive.google.com/uc?export=view&id=1PrMet1I8bWfd-6pf4YK8RtQmRYFpLdVu)


### Define Indexes & Store them in MongoDB


Each index uses the same underlying Docstore.
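
As a rough mental model, an index can be thought of as a structure of node IDs that points back into one shared store of node contents. The sketch below builds a list-style and a keyword-table-style index over the same dictionary. All names and sample strings are hypothetical, and the keyword extraction is a naive regex, not what `GPTSimpleKeywordTableIndex` actually does.

```python
import re
from collections import defaultdict

# One shared node store (node ID -> text).
shared_docstore = {
    "n1": "notes on the bar exam",
    "n2": "notes on RLHF fine-tuning",
    "n3": "bar exam scores compared with GPT-3.5",
}

# List-style index: just the node IDs, in order.
toy_list_index = list(shared_docstore)

# Keyword-table-style index: keyword -> set of node IDs containing it.
toy_keyword_index = defaultdict(set)
for node_id, text in shared_docstore.items():
    for word in re.findall(r"[a-z0-9.\-]+", text.lower()):
        toy_keyword_index[word].add(node_id)


def lookup(keyword):
    """Resolve a keyword to node texts through the shared store."""
    node_ids = sorted(toy_keyword_index.get(keyword.lower(), ()))
    return [shared_docstore[node_id] for node_id in node_ids]


print(toy_list_index)
print(lookup("bar"))
```

Neither index copies the node text, so adding a third index type costs only the extra ID structure.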

In [None]:
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore
from llama_index.storage.storage_context import StorageContext

storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
)



In [None]:
list_index = GPTListIndex(nodes, storage_context=storage_context)


INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens


In [None]:
vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context)


INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 97155 tokens
> [build_index_from_nodes] Total embedding token usage: 97155 tokens


In [None]:
keyword_table_index = GPTSimpleKeywordTableIndex(nodes, storage_context=storage_context)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens


This creates a new collection called `index_store/data` in MongoDB.

![MongoIndexStore](https://drive.google.com/uc?export=view&id=1JkpyWyJjXLLC-0i1Q2NCflDG5RyDUQbk)

### Retrieve Nodes from MongoDB Docstore

(This step is optional. If you have been following along, the nodes are already loaded in memory.)

In [None]:
from llama_index.storage.docstore import MongoDocumentStore
docstore = MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE)
nodes = list(docstore.docs.values())

# NOTE: Verify that the docstore still has the same nodes
len(docstore.docs)


## Test out some Queries

In [None]:
vector_response = vector_index.as_query_engine().query("How does GPT4 do on the bar exam?")
display_response(vector_response)

INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> [retrieve] Total embedding token usage: 11 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1996 tokens
> [get_response] Total LLM token usage: 1996 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens


**`Final Response:`** GPT-4 performs well on the Uniform Bar Exam, achieving a score in the top 10% of test takers (Table 1, Figure 4).

---

**`Source Node 1/2`**

**Document ID:** 4ecd6a4d-b2f3-423a-bff8-28971258a752<br>**Similarity:** 0.8548833172733542<br>**Text:** knowledge) 86 % 86 % 58 %
Advanced Sommelier (theory knowledge) 77 % 77 % 46 %
Leetcode (easy) 31...<br>

---

**`Source Node 2/2`**

**Document ID:** cb9d2c43-2f71-4047-a7ce-d53115827dd2<br>**Similarity:** 0.8320178432122729<br>**Text:** 213 / 400 (~10th)
LSAT 163 (~88th) 161 (~83rd) 149 (~40th)
SAT Evidence-Based Reading & Writing 7...<br>

{'4ecd6a4d-b2f3-423a-bff8-28971258a752': None,
 'cb9d2c43-2f71-4047-a7ce-d53115827dd2': None}

In [None]:
vector_response = vector_index.as_query_engine().query("What issues were observed after fine-tuning GPT-4 with RLHF?")
display_response(vector_response)

INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 17 tokens
> [retrieve] Total embedding token usage: 17 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1890 tokens
> [get_response] Total LLM token usage: 1890 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens


**`Final Response:`** After RLHF fine-tuning, GPT-4 was observed to be overly cautious in certain ways, refusing innocuous requests and excessively hedging or "overrefusing". Additionally, the post-training process was observed to reduce the calibration of the model.

---

**`Source Node 1/2`**

**Document ID:** a438be10-f028-4b53-aa7c-280b48e10716<br>**Similarity:** 0.842341961779629<br>**Text:** 5-shot RLHF0%10%20%30%40%50%60%70%
ModelAccuracyAccuracy on adversarial questions (TruthfulQA mc1...<br>

---

**`Source Node 2/2`**

**Document ID:** e17b25d7-070f-449b-8235-830ae763f0a6<br>**Similarity:** 0.8396913346010052<br>**Text:** and
improve how users experience the model (e.g., to reduce risk of overreliance).27
3.1 Model Mi...<br>

{'a438be10-f028-4b53-aa7c-280b48e10716': None,
 'e17b25d7-070f-449b-8235-830ae763f0a6': None}

In [None]:
vector_response = vector_index.as_query_engine().query("What is RBRM?")
display_response(vector_response)

In [None]:
vector_response = vector_index.as_query_engine().query("How much better is GPT-4 in reducing hallucinations over GPT-3.5?")
display_response(vector_response)

In [None]:
# Note: This will take a while to execute, since a list index uses every node
# Optionally set use_async=True and response_mode="tree_summarize" in as_query_engine()
query_engine = list_index.as_query_engine()

list_response = query_engine.query(
    "What is a summary of this document?"
)

display_response(list_response)
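
The `tree_summarize` response mode mentioned earlier can be pictured, roughly, as summarizing groups of chunks and then summarizing those summaries until a single one remains. The sketch below shows only that shape. `toy_summarize` is a hypothetical stand-in for an LLM call, and `fan_in` is an assumed parameter name, not a LlamaIndex option.

```python
def toy_summarize(texts):
    """Stand-in for an LLM call: keep the first three words of each input."""
    return " / ".join(" ".join(t.split()[:3]) for t in texts)


def tree_summarize(chunks, fan_in=2):
    """Repeatedly merge groups of `fan_in` texts until one summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # Each pass shrinks the list, so the loop always terminates.
        level = [
            toy_summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


print(tree_summarize(["a b c d", "e f g h", "i j k l"]))
```

The groups within one level do not depend on each other, which is what makes running them concurrently possible.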