# LlamaIndex LLM Retrieval and Reranking with OpenAI

This tutorial shows a two-stage retrieval pass. First, use embedding-based retrieval with a high top-k value
to maximize recall and collect a large set of candidate nodes. Then, use LLM-based reranking
to select the nodes that are actually relevant to the query.
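
To make the two stages concrete, here is a minimal, library-free sketch. It is not LlamaIndex code: the toy vectors, the `two_stage` helper, and the `keyword_judge` function (a stand-in for the LLM relevance judgment) are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage(query_vec, docs, rerank_score, top_k=10, top_n=3):
    """docs is a list of (text, vector); rerank_score maps text -> float."""
    # Stage 1: cheap embedding similarity with a generous top_k (favors recall).
    candidates = sorted(
        docs, key=lambda d: cosine(query_vec, d[1]), reverse=True
    )[:top_k]
    # Stage 2: a more expensive judge reorders only the candidates.
    reranked = sorted(candidates, key=lambda d: rerank_score(d[0]), reverse=True)
    return [text for text, _ in reranked[:top_n]]


def keyword_judge(text):
    """Hypothetical stand-in for an LLM relevance score."""
    return float("withholding" in text)


docs = [
    ("form 1099 filing rules", [0.9, 0.1]),
    ("backup withholding", [0.8, 0.3]),
    ("ordering publications", [0.7, 0.6]),
    ("unrelated appendix", [0.0, 1.0]),
]

wide = two_stage([1.0, 0.0], docs, keyword_judge, top_k=3, top_n=1)
narrow = two_stage([1.0, 0.0], docs, keyword_judge, top_k=1, top_n=1)
print(wide, narrow)
```

With `top_k=1` the chunk the judge prefers never reaches the second stage, which is why the first stage is run with a high top-k.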

In [None]:
!pip install ../dependencies/boto3-1.28.21-py3-none-any.whl
!pip install ../dependencies/botocore-1.31.21-py3-none-any.whl
!pip install langchain
!pip install pypdf
!pip install llama-index
!pip install openai

In [2]:
import nest_asyncio

nest_asyncio.apply()

In [3]:
import logging
import sys

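logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# Note: this second stdout handler has no formatter, so each log record is
# printed twice (once formatted, once as the bare message; see the output below).
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))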
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
    LLMPredictor,
    get_response_synthesizer,
    set_global_service_context
)
from llama_index.indices.postprocessor import LLMRerank
from llama_index.llms import OpenAI
from IPython.display import Markdown, display
from llama_index.indices.document_summary import DocumentSummaryIndex


INFO:numexpr.utils:NumExpr defaulting to 2 threads.
NumExpr defaulting to 2 threads.


In [21]:
import os
import openai

#os.environ["OPENAI_API_KEY"] = "your api key"
openai.api_key = os.getenv("OPENAI_API_KEY")
#print(openai.api_key)


## Load Data, Build Index

In [5]:
# LLM Predictor (gpt-3.5-turbo) + service context
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)
#service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, chunk_size=1024)
#set_global_service_context(service_context)

In [11]:
documents = SimpleDirectoryReader(input_files=["data/p1212.pdf"]).load_data()

In [12]:
documents

[Document(id_='b49e6efe-3e07-40e2-bcab-6c4e4bebd51c', embedding=None, metadata={'page_label': '1', 'file_name': 'p1212.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='87e6ebdad45bd554023db01a711839acacc80eb11e7ad459cab8f940c08369d7', text='Contents\nIntroduction .................. 2\nDefinitions ................... 2\nDebt Instruments in the OID \nTables ................... 3\nDebt Instruments Not in the OID \nTables ................... 3\nInformation for Brokers and Other \nMiddlemen ................ 4\nShort-Term Obligations \nRedeemed at Maturity ........ 4\nLong-Term Debt Instruments ...... 4\nCertificates of Deposit .......... 5\nBearer Bonds and Coupons ....... 5\nBackup Withholding ........... 5\nInformation for Owners of OID Debt \nInstruments ............... 5\nForm 1099-OID .............. 6\nHow To Report OID ........... 7\nFiguring OID on Long-Term \nDebt Instruments ........... 7\nDebt Instruments Issued \nAfter July 1, 1982, a

In [13]:
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

## Retrieval

In [14]:
from llama_index.retrievers import VectorIndexRetriever
from llama_index.indices.query.schema import QueryBundle
import pandas as pd
from IPython.display import display, HTML


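# None disables column-width truncation (a negative value is no longer accepted by recent pandas).
pd.set_option("display.max_colwidth", None)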


def get_retrieved_nodes(
    query_str, vector_top_k=10, reranker_top_n=3, with_reranker=False
):
    query_bundle = QueryBundle(query_str)
    # configure retriever
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=vector_top_k,
    )
    retrieved_nodes = retriever.retrieve(query_bundle)

    if with_reranker:
        # configure reranker
        reranker = LLMRerank(
            choice_batch_size=5, top_n=reranker_top_n, service_context=service_context
        )
        retrieved_nodes = reranker.postprocess_nodes(retrieved_nodes, query_bundle)

    return retrieved_nodes


def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n", "<br>")))


def visualize_retrieved_nodes(nodes) -> None:
    # Collect each node's score and text into a table and render it as HTML.
    result_dicts = []
    for node in nodes:
        result_dict = {"Score": node.score, "Text": node.node.get_text()}
        result_dicts.append(result_dict)

    pretty_print(pd.DataFrame(result_dicts))
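
The `choice_batch_size` and `top_n` arguments passed to `LLMRerank` above can be read roughly as follows: candidates are judged in batches, and only the best-scoring few are kept. The sketch below illustrates that idea in plain Python under stated assumptions. `rerank_in_batches` and `fake_judge` are hypothetical names, `fake_judge` replaces the LLM call with a trivial length-based score, and none of this is the library's actual implementation.

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rerank_in_batches(candidates, score_batch, choice_batch_size=5, top_n=3):
    """Score candidates batch by batch, then keep the top_n overall."""
    scored = []
    for batch in batched(candidates, choice_batch_size):
        scores = score_batch(batch)  # stand-in for one LLM call per batch
        scored.extend(zip(batch, scores))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


calls = []


def fake_judge(batch):
    """Hypothetical judge: records the batch size and scores by text length."""
    calls.append(len(batch))
    return [float(len(text)) for text in batch]


candidates = ["chunk-" + "x" * i for i in range(10)]
top = rerank_in_batches(candidates, fake_judge, choice_batch_size=5, top_n=3)
print(calls, [score for _, score in top])
```

Under this reading, 10 candidates with a batch size of 5 cost two judge calls, so raising `vector_top_k` increases the number of LLM calls roughly in proportion.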


In [15]:
new_nodes = get_retrieved_nodes(
    "Who Must File Form 8300?", vector_top_k=3, with_reranker=False
)

In [16]:
visualize_retrieved_nodes(new_nodes)

Unnamed: 0,Score,Text
0,0.82367,"be- cause the information in the OID tables has generally not been verified by the IRS as cor- rect, the following tax matters are subject to change upon examination by the IRS. •The OID reported by owners of a debt in- strument on their income tax returns. •The issuer's classification of an instrument as debt for federal income tax purposes. •The adjusted basis of a debt instrument. Instructions for issuers of OID debt instru- ments. In general, issuers of publicly offered OID debt instruments must file Form 8281 within 30 days after the date of issuance, and, if registered with the Securities and Exchange Commission (SEC), within 30 days after regis- tration with the SEC. A separate Form 8281 must be filed for each issuance or SEC registra- tion. For more information, see Form 8281 and its instructions.Issuers should report errors in and omissions from the list in writing at the following address: IRS OID Publication Project SE:W:CAR:MP:TFP 1111 Constitution Ave. NW, IR-6526 Washington, DC 20224 REMIC and CDO information reporting re- quirements. Brokers and other middlemen must follow special information reporting re- quirements for real estate mortgage investment conduit (REMIC) regular interests, and collater- alized debt obligation (CDO) interests. The rules are explained in Pub. 938. Holders of interests in REMICs and CDOs should see chapter 1 of Pub. 550 for informa- tion on REMICs and CDOs. Comments and suggestions. We welcome your comments about this publication and sug- gestions for future editions. You can send us comments through IRS.gov/FormComments . Or, you can write to the Internal Revenue Service, Tax Forms and Publications, 1111 Constitution Ave. NW, IR-6526, Washington, DC"
1,0.820786,"OID for each actual owner, showing the OID for the owner. Show the owner of the debt instrument as the “recipient” and you as the “payer.” Complete Form 1099 -OID and Form 1096 and file the forms with the Internal Revenue Service Center for your area. See Where To File in the Instructions for Form 1096. You must also give a copy of the Form 1099 -OID to the actual owner. However, you are not required to file a nominee return to show amounts belong- ing to your spouse. See the Form 1099 -OID in- structions for more information. When preparing your tax return, follow the instructions under Showing an OID adjustment , later. How To Report OID You report your taxable interest and OID in- come on the interest line of Form 1040 or 1040-SR. Where to report. List each payer's name (if a brokerage firm gave you a Form 1099, list the brokerage firm as the payer), and the amount received from each payer on Schedule B (Form 1040), line 1. Include all OID and qualified sta- ted interest shown on any Form 1099 -OID, boxes 1, 2, and 8, you received for the tax year. Also include any other OID and interest income for which you did not receive a Form 1099. Showing an OID adjustment. To report more or less OID than shown in box 1 or box 8 on Form 1099 -OID, list the full OID on Schedule B (Form 1040), Part I, line 1, and follow the in- structions under (1) or (2) next. 1.If the OID, as adjusted, is less than the amount shown on Form 1099-OID, show the adjustment as follows. a.Under your last entry on line 1, subto- tal all interest and OID income listed on line 1. b.Below the subtotal, write “Nominee Distribution”"
2,0.806637,"•Foreign obligations not traded in the Uni- ted States and obligations not issued in the United States. Information for Brokers and Other Middlemen The following discussions contain specific in- structions for brokers and middlemen who hold or redeem a debt instrument for the owner. In general, you must file a Form 1099 -INT or Form 1099 -OID for the debt instrument if the in- terest or OID to be included in the owner's in- come for a calendar year totals $10 or more. You must also file a Form 1099 -INT or Form 1099- OID if you were required to deduct and withhold tax, even if the interest or OID is less than $10. See Backup Withholding , later. If you must file a Form 1099 -INT or Form 1099- OID, furnish a copy to the owner of the debt instrument by January 31 in the year it is due, or February 15 in the year it is due if the Form 1099 -INT or Form 1099 -OID is furnished as part of a consolidated reporting statement. File all your Forms 1099 with the IRS, accompa- nied by Form 1096, by February 28 in the year they are due (March 31 if you file electronically). Electronic payee statements. You can issue Form 1099-INT or Form 1099-OID electronically with the consent of the recipient. More information. For more information, in- cluding penalties for failure to file (or furnish) re- quired information returns or statements, see the current General Instructions for Certain Information Returns , available at IRS.gov/ 1099GeneralInstructions . Short-Term Obligations Redeemed at Maturity If you redeem a short -term discount obligation for the owner at maturity, you must report the discount as interest on Form 1099-INT. To figure the discount, use the purchase price shown on the owner's copy of"


In [17]:
new_nodes = get_retrieved_nodes(
    "Who Must File Form 8300?",
    vector_top_k=10,
    reranker_top_n=3,
    with_reranker=True,
)

In [18]:
visualize_retrieved_nodes(new_nodes)

Unnamed: 0,Score,Text
0,10.0,"1111 Constitution Ave. NW, IR-6526, Washington, DC 20224. Although we can’t respond individually to each comment received, we do appreciate your feedback and will consider your comments and suggestions as we revise our tax forms, instruc- tions, and publications. Don’t send tax ques- tions, tax returns, or payments to the above ad- dress. Getting answers to your tax questions. If you have a tax question not answered by this publication or the How To Get Tax Help section at the end of this publication, go to the IRS In- teractive Tax Assistant page at IRS.gov/ Help/ITA where you can find topics by using the search feature or viewing the categories listed. Getting tax forms, instructions, and pub- lications. Go to IRS.gov/Forms to download current and prior -year forms, instructions, and publications. Ordering tax forms, instructions, and publications. Go to IRS.gov/OrderForms to order current forms, instructions, and publica- tions; call 800 -829- 3676 to order prior -year forms and instructions. The IRS will process your order for forms and publications as soon as possible. Don’t resubmit requests you’ve al- ready sent us. You can get forms and publica- tions faster online. Useful Items You may want to see: Publication 515 Withholding of Tax on Nonresident Aliens and Foreign Entities 550 Investment Income and Expenses 938 Real Estate Mortgage Investment Conduits (REMICs) Reporting Information (And Other Collateralized Debt Obligations (CDOs))  515  550  938Form (and Instructions) 1096 Annual Summary and Transmittal of U.S. Information Returns 1099-B Proceeds From Broker and Barter Exchange Transactions 1099-INT Interest"
1,8.0,"•Foreign obligations not traded in the Uni- ted States and obligations not issued in the United States. Information for Brokers and Other Middlemen The following discussions contain specific in- structions for brokers and middlemen who hold or redeem a debt instrument for the owner. In general, you must file a Form 1099 -INT or Form 1099 -OID for the debt instrument if the in- terest or OID to be included in the owner's in- come for a calendar year totals $10 or more. You must also file a Form 1099 -INT or Form 1099- OID if you were required to deduct and withhold tax, even if the interest or OID is less than $10. See Backup Withholding , later. If you must file a Form 1099 -INT or Form 1099- OID, furnish a copy to the owner of the debt instrument by January 31 in the year it is due, or February 15 in the year it is due if the Form 1099 -INT or Form 1099 -OID is furnished as part of a consolidated reporting statement. File all your Forms 1099 with the IRS, accompa- nied by Form 1096, by February 28 in the year they are due (March 31 if you file electronically). Electronic payee statements. You can issue Form 1099-INT or Form 1099-OID electronically with the consent of the recipient. More information. For more information, in- cluding penalties for failure to file (or furnish) re- quired information returns or statements, see the current General Instructions for Certain Information Returns , available at IRS.gov/ 1099GeneralInstructions . Short-Term Obligations Redeemed at Maturity If you redeem a short -term discount obligation for the owner at maturity, you must report the discount as interest on Form 1099-INT. To figure the discount, use the purchase price shown on the owner's copy of"
2,6.0,"be- cause the information in the OID tables has generally not been verified by the IRS as cor- rect, the following tax matters are subject to change upon examination by the IRS. •The OID reported by owners of a debt in- strument on their income tax returns. •The issuer's classification of an instrument as debt for federal income tax purposes. •The adjusted basis of a debt instrument. Instructions for issuers of OID debt instru- ments. In general, issuers of publicly offered OID debt instruments must file Form 8281 within 30 days after the date of issuance, and, if registered with the Securities and Exchange Commission (SEC), within 30 days after regis- tration with the SEC. A separate Form 8281 must be filed for each issuance or SEC registra- tion. For more information, see Form 8281 and its instructions.Issuers should report errors in and omissions from the list in writing at the following address: IRS OID Publication Project SE:W:CAR:MP:TFP 1111 Constitution Ave. NW, IR-6526 Washington, DC 20224 REMIC and CDO information reporting re- quirements. Brokers and other middlemen must follow special information reporting re- quirements for real estate mortgage investment conduit (REMIC) regular interests, and collater- alized debt obligation (CDO) interests. The rules are explained in Pub. 938. Holders of interests in REMICs and CDOs should see chapter 1 of Pub. 550 for informa- tion on REMICs and CDOs. Comments and suggestions. We welcome your comments about this publication and sug- gestions for future editions. You can send us comments through IRS.gov/FormComments . Or, you can write to the Internal Revenue Service, Tax Forms and Publications, 1111 Constitution Ave. NW, IR-6526, Washington, DC"
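
The two tables report scores on different scales (similarity values below 1 in the first, values such as 10.0 and 8.0 in the second), so it is easier to compare them by rank position than by score. The helper below is a small sketch for doing that. `rank_changes` is a hypothetical name, and the short labels are shorthand for the chunks shown above rather than real node text.

```python
def rank_changes(before, after):
    """Map each text in `after` to (old_rank, new_rank); old_rank is None if new."""
    old = {text: i for i, text in enumerate(before)}
    return {text: (old.get(text), i) for i, text in enumerate(after)}


# Shorthand labels for the chunks in the two result tables, in ranked order.
vector_only = ["issuer instructions", "how to report OID", "brokers and middlemen"]
reranked = ["comments and tax help", "brokers and middlemen", "issuer instructions"]

changes = rank_changes(vector_only, reranked)
for text, (old_rank, new_rank) in changes.items():
    print(f"{text}: {old_rank} -> {new_rank}")
```

A `None` old rank marks a chunk that only surfaced because the first stage retrieved 10 candidates instead of 3.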
