<a href="https://colab.research.google.com/github/Nov05/Google-Colaboratory/blob/master/20231214_chatgpt_3_5_turbo_%2B_llama_index_(RAG)_with_the_book_of_mormon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **\<TOP\>**  
* 2023-12-14  
* go to [the short course folder](https://drive.google.com/drive/folders/1NUa9wsPDILnlhnSZu95iylnccYYzv27g)
* 20231214_L3-Sentence_window_retrieval.ipynb  

---

* [check OpenAI billing](https://platform.openai.com/account/billing/overview) (the VPN web protection must be turned off first)
* [Using the Book of Mormon to Answer Life’s Questions](https://www.churchofjesuschrist.org/youth/activities/new/using-the-book-of-mormon-to-answer-lifes-questions?lang=eng)    

In [None]:
## download a folder
# !gdown --no-check-certificate --folder https://drive.google.com/drive/folders/10LpvZD_trQ7t0J3NeMoP6zFvU8ZmwvCv
# !cp /content/l3_files/* /content
# !rm -r /content/l3_files

## download a file
!gdown --no-check-certificate 'https://drive.google.com/uc?export=download&id=1qHEVMw6wZV7NdOV3E68nZySkmvBhPZ5-'

In [None]:
!pip install openai==1.3.5
!pip install llama-index==0.9.8
!pip install python-dotenv
!pip install trulens-eval==0.18.1
!pip install pypdf
!pip uninstall -y transformers ## 4.35.2 pre-installed by colab; -y skips the confirmation prompt
!pip install transformers==4.33.2
!pip install sentence-transformers==2.2.2
## restart the session

In [19]:
from google.colab import output
output.enable_custom_widget_manager()
# output.disable_custom_widget_manager()

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
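## confirm the key is set without echoing the secret into the cell output
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))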

# **Lesson 3: Sentence Window Retrieval**

In [3]:
import utils
import os
import openai
from google.colab import userdata
openai.api_key = utils.get_openai_api_key()

In [6]:
from llama_index import SimpleDirectoryReader
path = '/content/The Book of Mormon_50 pages.pdf'
documents = SimpleDirectoryReader(
    input_files=[path]
).load_data()
print(type(documents))
print(len(documents))
print(type(documents[0]))
print(documents[0])

<class 'list'>
50
<class 'llama_index.schema.Document'>
Doc ID: b51dac0d-f1cd-4638-b8a4-8138f3df91a4
Text: The   Book of MorMon Another T estament of   Jesus Christ
Published by   The Church of Jesus Christ of Latter-day Saints Salt
Lake City, Utah, USA


In [8]:
from llama_index import Document
document = Document(text="\n\n".join([doc.text for doc in documents]))

In [4]:
import os
from llama_index import ServiceContext, VectorStoreIndex, StorageContext
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index import load_index_from_storage


def build_sentence_window_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    sentence_window_size=3,
    save_dir="sentence_index",
):
    # create the sentence window node parser w/ default settings
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=sentence_window_size,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )
    sentence_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )
    if not os.path.exists(save_dir):
        sentence_index = VectorStoreIndex.from_documents(
            documents, service_context=sentence_context
        )
        sentence_index.storage_context.persist(persist_dir=save_dir)
    else:
        sentence_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=sentence_context,
        )

    return sentence_index


def get_sentence_window_query_engine(
    sentence_index, similarity_top_k=6, rerank_top_n=2
):
    # define postprocessors
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )

    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
    )
    return sentence_window_engine
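
To make the two postprocessors above concrete, here is a minimal, library-free sketch of the query-time flow: retrieve by single sentence, swap each hit for its stored window, then rerank the windows. The names `tokens`, `overlap_score` and `query_pipeline` are hypothetical, and plain word overlap is only a stand-in for both the embedding similarity and the `BAAI/bge-reranker-base` model, so this illustrates the order of the steps rather than what llama-index does internally.

```python
import re


def tokens(text):
    # Lowercased word set; punctuation is dropped.
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query, text):
    # Toy relevance score (assumption): fraction of query words found in the text.
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


def query_pipeline(query, nodes, similarity_top_k=6, rerank_top_n=2):
    # 1. Retrieve: score each node on its single sentence and keep the top k.
    retrieved = sorted(
        nodes,
        key=lambda n: overlap_score(query, n["original_text"]),
        reverse=True,
    )[:similarity_top_k]
    # 2. Metadata replacement: hand the wider window, not the sentence, downstream.
    windows = [n["window"] for n in retrieved]
    # 3. Rerank: rescore the windows and keep only the top n for the LLM prompt.
    return sorted(
        windows, key=lambda w: overlap_score(query, w), reverse=True
    )[:rerank_top_n]


sentences = [
    "The library opens at nine.",
    "Members can borrow five books.",
    "Late returns cost one dollar.",
    "The cafe sells coffee.",
]
# Each toy node keeps its sentence plus one neighbour on each side as its window.
nodes = [
    {"original_text": s, "window": " ".join(sentences[max(0, i - 1): i + 2])}
    for i, s in enumerate(sentences)
]
result = query_pipeline(
    "How many books can members borrow?", nodes, similarity_top_k=2, rerank_top_n=1
)
print(result[0])
```

The point of the split is that matching happens on a short, precise sentence, while the LLM still receives enough surrounding context to answer.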

In [9]:
from llama_index.llms import OpenAI
index = build_sentence_window_index(
    [document],
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    save_dir="./sentence_index",
)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.



In [None]:
# %%time
# query_engine = get_sentence_window_query_engine(index, similarity_top_k=6)

# **TruLens Evaluation**  

In [11]:
eval_questions = []
path = '/content/questions.txt'
with open(path, 'r') as file:
    for line in file:
        # Strip the trailing newline and surrounding whitespace
        item = line.strip()
        eval_questions.append(item)
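
The loop above also appends an empty string for every blank line in the file, and each of those would later be sent to the query engine as a question. A small sketch of a loader that skips them is below; `load_questions` is a hypothetical helper name, and the temporary file only stands in for `/content/questions.txt`.

```python
import tempfile
from pathlib import Path


def load_questions(path):
    """Return one question per non-empty line, with surrounding whitespace removed."""
    with open(path, "r", encoding="utf-8") as file:
        return [line.strip() for line in file if line.strip()]


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "questions.txt"
    demo_path.write_text("First question?\n\n  Second question?  \n", encoding="utf-8")
    demo_questions = load_questions(demo_path)

print(demo_questions)
```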

In [12]:
from trulens_eval import Tru
def run_evals(eval_questions, tru_recorder, query_engine):
    # Run every question inside the recorder context so TruLens logs the call
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)

In [13]:
from utils import get_prebuilt_trulens_recorder
from trulens_eval import Tru
Tru().reset_database()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


# **Sentence window size = 3**  
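
As a rough illustration of what `sentence_window_size=3` means: each sentence is stored with up to three neighbours on each side, so a window holds at most seven sentences and fewer near the start or end of the text. The sketch below is plain Python under that assumption. `split_sentences` and `build_window_nodes` are hypothetical names, the regex splitter is much cruder than the punkt tokenizer seen in the download log above, and the exact slice boundaries in the installed llama-index version may differ.

```python
import re


def split_sentences(text):
    # Naive splitter for illustration only: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_window_nodes(sentences, window_size=3):
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)                 # clip at the start of the text
        end = min(len(sentences), i + window_size + 1)  # clip at the end of the text
        nodes.append(
            {
                "original_text": sentence,               # what gets embedded and matched
                "window": " ".join(sentences[start:end]),  # what is shown to the LLM
            }
        )
    return nodes


sentences = split_sentences(" ".join(f"Sentence {n}." for n in range(10)))
nodes = build_window_nodes(sentences, window_size=3)
print(nodes[5]["window"])  # sentences 2 through 8
print(nodes[0]["window"])  # sentences 0 through 3
```

A larger window gives the LLM more context per hit at the cost of longer prompts, which is the trade-off the evaluation runs in this section are meant to expose.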

In [16]:
%%time
sentence_index_3 = build_sentence_window_index(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    embed_model="local:BAAI/bge-small-en-v1.5",
    sentence_window_size=3,
    save_dir="sentence_index_3",
)
sentence_window_engine_3 = get_sentence_window_query_engine(
    sentence_index_3
)
tru_recorder_3 = get_prebuilt_trulens_recorder(
    sentence_window_engine_3,
    app_id='sentence window engine 3'
)
## if OPENAI_API_KEY is not set before running this cell, it fails with:
## OpenAIError: The api_key client option must be set either by passing api_key to the client
## or by setting the OPENAI_API_KEY environment variable

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
CPU times: user 14 s, sys: 1.11 s, total: 15.1 s
Wall time: 16 s


In [20]:
%%time
run_evals(eval_questions, tru_recorder_3, sentence_window_engine_3)

CPU times: user 2min 19s, sys: 15.3 s, total: 2min 35s
Wall time: 4min 25s


In [18]:
Tru().run_dashboard()

Starting dashboard ...
npx: installed 22 in 4.847s

Go to this url and submit the ip given here. your url is: https://fruity-squids-hammer.loca.lt

  Submit this IP Address: 34.80.55.246



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

<img src="https://github.com/Nov05/pictures/blob/master/DLAI/20231213_Building%20and%20Evaluating%20Advanced%20RAG/2023-12-14%2011_01_34-Settings.jpg?raw=true" width=800>  

<img src="https://github.com/Nov05/pictures/blob/master/DLAI/20231213_Building%20and%20Evaluating%20Advanced%20RAG/2023-12-14%2011_10_03-Settings.jpg?raw=true" width=800>  

<img src="https://github.com/Nov05/pictures/blob/master/DLAI/20231213_Building%20and%20Evaluating%20Advanced%20RAG/2023-12-14%2011_14_04-Evaluations.jpg?raw=true" width=800>

In [47]:
## display output texts
import textwrap
records, feedback = Tru().get_records_and_feedback(app_ids=[]) ## records is a pandas dataframe
n_rows = 10 ## rows i and i+n_rows are assumed to hold answers to the same question
for i in range(n_rows):
    ## renamed from input/output to avoid shadowing the built-in input()
    question, answer = records.loc[i, ['input', 'output']]
    print('👉', textwrap.fill(question, 100))
    print('▪️', textwrap.fill(answer, 100))
    answer = records.loc[i+n_rows, 'output']
    print('▪️', textwrap.fill(answer, 100), '\n')

👉 "What is the purpose of life? (Alma 34)"
▪️ "The purpose of life is not mentioned in the given context information."
▪️ "The purpose of life is not directly mentioned in the given context information." 

👉 "Does God know me? (Alma 5:38, 58)"
▪️ "Yes, God knows you. (Alma 5:38, 58)"
▪️ "Yes, God knows you. (Alma 5:38, 58)" 

👉 "How does God answer prayers? (Enos 1)"
▪️ "God answers prayers by communicating with individuals through visions and messengers. In Enos 1,
the individual had a vision where a messenger related to him the commandments and instructions from
God. The individual then obeyed and went to his father to share the vision, and his father confirmed
that it was from God. This suggests that God answers prayers by providing guidance and direction
through divine communication."
▪️ "God answers prayers by communicating with individuals through visions and messengers. In Enos 1,
the individual had a vision and received commandments from a messenger sent by God. This shows that

# **single questions**

In [21]:
%%time
from llama_index.response.notebook_utils import display_response
window_response = sentence_window_engine_3.query(
    "What does Jesus Christ expect of me? (2 Nephi 9)"
)
display_response(window_response)

**`Final Response:`** Jesus Christ expects you to diligently seek Him and His teachings. He desires that you have faith in Him and strive to keep His commandments. He wants you to come to know Him through the power of the Holy Ghost, which is a gift from God. By seeking Him and following His teachings, you can receive His tender mercies and be blessed with deliverance.

In [22]:
%%time
from llama_index.response.notebook_utils import display_response
window_response = sentence_window_engine_3.query(
    "What is the purpose of life? (Alma 34)"
)
display_response(window_response)

**`Final Response:`** The purpose of life is not mentioned in the given context information.

CPU times: user 7.86 s, sys: 21.9 ms, total: 7.89 s
Wall time: 19 s


In [23]:
%%time
question = 'Summarize The First Book of Nephi.' ## check page 23
window_response = sentence_window_engine_3.query(question)
display_response(window_response)

**`Final Response:`** The First Book of Nephi is an account of Lehi and his family, including his wife Sariah and his four sons: Laman, Lemuel, Sam, and Nephi. Lehi is warned by the Lord to leave Jerusalem because he prophesies about the people's iniquity and they want to kill him. They embark on a three-day journey into the wilderness. Later, Nephi witnesses the power of God with the Gentiles who were delivered from captivity. He also sees that they prosper in the land and encounters an angel who asks him if he knows the meaning of a book, to which Nephi admits he does not.

CPU times: user 5.01 s, sys: 13 ms, total: 5.03 s
Wall time: 14.6 s


In [28]:
%%time
question = 'Summarize 1 Nephi Chapter 3.' ## check page 27
window_response = sentence_window_engine_3.query(question)
display_response(window_response)

**`Final Response:`** In 1 Nephi Chapter 3, Nephi and his brothers are commanded by the Lord to return to Jerusalem to obtain the plates of brass, which contain the record of the Jews and the genealogy of their forefathers. Laban, the keeper of the plates, refuses to give them up. Nephi persuades his brothers to be faithful in keeping the commandments of God. They gather their gold, silver, and precious things and return to Laban's house.

CPU times: user 6.54 s, sys: 496 ms, total: 7.04 s
Wall time: 13.3 s


# **others**  

* [javascript 1](https://stackoverflow.com/questions/71456390/how-to-keep-the-google-colab-running-without-disconnecting-in-2022), [javascript 2](https://stackoverflow.com/questions/76595272/how-to-stop-google-colab-from-automatically-disconnecting-from-google-drive) that keeps colab running  

> Keep your session active: Even though this should not be necessary when a script is actively running, you could use a JavaScript code snippet that will press the Colab connect button for you every few minutes. It's not an ideal solution but has been reported to help some users. Please remember this might be against the terms of service, use with caution.  

> You should open the browser's developer tools (usually F12), go to the Console tab, paste the script below and hit Enter.  

```javascript
function ConnectButton(){
    console.log("Connect pushed");
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click()
}
setInterval(ConnectButton,60000);
```

# **\<BOTTOM\>**  