## Search, Retrieve & Summarize with GPT-4
This notebook uses GPT-4 to search, retrieve, and summarize text documents. The workflow has four steps:
<br>
* Configuration and Setup
* Encoding and Indexing Text Content
* Document Retrieval
* Summarization Using MapReduce

#### 1. Configuration and Setup
1.1. Import Necessary Libraries

In [1]:
# Libraries for embeddings, chains, document loading, chat models, prompts, vector stores, etc.
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain, StuffDocumentsChain
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import JSONLoader
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.vectorstores import FAISS
import logging
import yaml
import os 

1.2. Logging Setup

In [2]:
# Set up the logging mechanism to monitor progress and capture any issues
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
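Re-running the logging cell in a notebook attaches one more `StreamHandler` each time, so every message is printed repeatedly. A minimal sketch of an idempotent setup follows; the helper name `get_notebook_logger` and the logger name `summarizer-demo` are hypothetical and not part of the notebook.

```python
import logging


def get_notebook_logger(name: str, level: int = logging.DEBUG) -> logging.Logger:
    """Return a logger that carries a single StreamHandler, however often this runs."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only if none is present yet; later calls reuse it.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        logger.addHandler(logging.StreamHandler())
    return logger


logger = get_notebook_logger("summarizer-demo")
logger = get_notebook_logger("summarizer-demo")  # simulates re-running the cell
```

Calling the helper twice leaves one handler on the logger, so each log line appears once.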

1.3. Load API Key and Set Paths


In [3]:
# Load the API key for OpenAI from the credentials file
with open('./../credentials/oai-key.yml', 'rb') as f:
    credentials = yaml.safe_load(f)
    
api_key = credentials['key']
os.environ['OPENAI_API_KEY'] = api_key

# Define paths for input and output data
LOCAL_INPUT_DIR = './DATA/INPUT'
LOCAL_OUTPUT_DIR = './DATA/OUTPUT'
MODEL_NAME = 'gpt-4'
FILE_NAME = 'file-2'
text_embeder = OpenAIEmbeddings()
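The lookup `credentials['key']` implies that the YAML file holds a top-level `key` entry, and it fails with a bare `KeyError` if that entry is missing. A small sketch of a more descriptive check on the parsed YAML is shown below; the helper `read_api_key` and the placeholder value are assumptions made for illustration.

```python
def read_api_key(credentials: object) -> str:
    """Return the API key from parsed YAML content, or raise a descriptive error."""
    if not isinstance(credentials, dict):
        raise ValueError("credentials file must contain a YAML mapping")
    key = credentials.get("key")
    if not isinstance(key, str) or not key.strip():
        raise ValueError("credentials file must define a non-empty 'key' entry")
    return key.strip()


# Example with an obviously fake placeholder instead of a real key
api_key = read_api_key({"key": "sk-placeholder"})
```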

#### 2. Encoding and Indexing Text Content
2.1. Define Metadata Extraction Function

In [4]:
# Function to extract necessary metadata from a record
def extract_metadata(record: dict, metadata: dict) -> dict:
    metadata['doc_name'] = record.get('doc_name')
    metadata['page_num'] = record.get('page_num')
    return metadata
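To see what `extract_metadata` produces, it can be called by hand on one record. The sketch below assumes the loader passes in a dictionary that already holds its own defaults (shown here as `source` and `seq_num`, which is an assumption about the loader's behavior); the record values are made up.

```python
def extract_metadata(record: dict, metadata: dict) -> dict:
    metadata['doc_name'] = record.get('doc_name')
    metadata['page_num'] = record.get('page_num')
    return metadata


# Hypothetical record, shaped like one line of the JSONL file
record = {"doc_name": "file-2", "page_num": 7, "page_content": "Sample page text."}
# Hypothetical defaults supplied by the loader
default_metadata = {"source": "file-2.jsonl", "seq_num": 7}

result = extract_metadata(record, dict(default_metadata))
```

The function adds `doc_name` and `page_num` to the defaults and leaves the existing entries untouched. A record that lacks either field yields `None` for it, because `dict.get` is used.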

2.2. Load Documents and Create Vector Store

In [5]:
# Load documents from a JSONL file and index them in a FAISS vector store
loader = JSONLoader(file_path=f'{LOCAL_OUTPUT_DIR}/{FILE_NAME}/TEXT/{FILE_NAME}.jsonl', 
                    jq_schema='.', 
                    metadata_func=extract_metadata, 
                    content_key='page_content', 
                    json_lines=True)

segments = loader.load()  # Here, segments correspond to pages 
vector_store = FAISS.from_documents(segments, text_embeder)
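The loader settings (`jq_schema='.'`, `content_key='page_content'`, `json_lines=True`) together with `extract_metadata` imply one JSON object per line, each carrying `page_content`, `doc_name`, and `page_num`. The sketch below writes and re-reads such a file using only the standard library; the file name and page texts are invented for illustration.

```python
import json
import os
import tempfile

# Hypothetical pages in the layout the loader configuration implies
pages = [
    {"doc_name": "sample", "page_num": 1, "page_content": "First page text."},
    {"doc_name": "sample", "page_num": 2, "page_content": "Second page text."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    # JSON Lines: one serialized object per line
    with open(path, "w", encoding="utf-8") as f:
        for page in pages:
            f.write(json.dumps(page) + "\n")
    # Read the file back line by line, skipping blank lines
    with open(path, encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]
```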

#### 3. Document Retrieval
3.1. Setup Retrieval Chain

In [6]:
# Build a retriever over the FAISS index; MMR search returns 10 pages, balancing relevance and diversity
retriever = vector_store.as_retriever(search_type='mmr', search_kwargs={'k': 10})

# Initialize LLM
llm = ChatOpenAI(model_name=MODEL_NAME, 
                 temperature=0.0, 
                 max_tokens=2048)
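The retriever is configured with `search_type='mmr'`, that is, maximal marginal relevance. The sketch below illustrates the general technique in plain Python: each step picks the candidate that is most similar to the query and least similar to what has already been picked. It is not the vector store's internal implementation, and the vectors and the `lambda_mult` values are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int,
        lambda_mult: float = 0.5) -> List[int]:
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        best_idx, best_score = candidates[0], -math.inf
        for i in candidates:
            relevance = cosine(query, docs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.12], [0.5, -0.5]]  # docs 0 and 1 are near-duplicates

diverse = mmr(query, docs, k=2, lambda_mult=0.5)         # trades relevance for diversity
relevance_only = mmr(query, docs, k=2, lambda_mult=1.0)  # plain similarity ranking
```

With `lambda_mult=1.0` the near-duplicate document is picked second; with `0.5` the less similar but non-redundant document is preferred.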

3.2. Retrieve Documents Based on Query

In [7]:
# Execute a query and log the results
query = "derivative contracts"
pages = retriever.get_relevant_documents(query)

for page in pages:
    logger.info(f'Content => {page.page_content}')
    logger.info(f'Metadata => {page.metadata}')
    logger.info(f"\n{'-' * 200}\n")


Content =>   
 
 savings association elected to treat the derivative  contract as multiple derivative contracts under 
§ 3.113(e)(6); 
(D) For derivative contracts that are not cleared through a QCCP, the cash collateral 
received by the recipient counterparty is not segregated (by law, regulation, or an agreement with 
the counterparty); 
(E) Variation margin is calculated and transferred on a daily basis based on the mark-to-
fair value of the derivative contract; 
(F) The variation margin transferred under the derivative contract or the governing rules 
of the CCP or QCCP for a cleared transaction is the full amount that is necessary to fully extinguish the net current credit exposure to the counterparty of the derivative contracts, subject 
to the threshold and minimum transfer amounts applicable to the counterparty under the terms of 
the derivative contract or the governing rules for a cleared transaction; 
(G) The variation margin is in the form of cash in the same currency as

#### 4. Summarization Using MapReduce
4.1. Map Chain for Page Summarization

In [8]:
# Define the map chain using a prompt template to provide a summarized version of each individual page
map_template = """You are a Derivatives Risk Analyst. Your job is to create a detailed SUMMARY of financial documents.
Your summary should include all important facts, math formulas, and technical details, and leave out extra information.
Your goal is to provide a clear and precise SUMMARY of the PAGE below that shows your knowledge of finance, with high accuracy and quality.
---
PAGE => {PAGE}
---
SUMMARY =>"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, 
                     prompt=map_prompt, 
                     verbose=True)
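In the map step, each retrieved page is substituted into the `{PAGE}` placeholder, so the model receives one prompt per page. The sketch below mimics that substitution with `str.format`; the two page texts are invented, and the snippet only approximates what the prompt template does.

```python
map_template = """You are a Derivatives Risk Analyst. Your job is to create a detailed SUMMARY of financial documents.
Your summary should include all important facts, math formulas, and technical details, and leave out extra information.
Your goal is to provide a clear and precise SUMMARY of the PAGE below that shows your knowledge of finance, with high accuracy and quality.
---
PAGE => {PAGE}
---
SUMMARY =>"""

# Hypothetical page texts standing in for retrieved documents
sample_pages = [
    "Variation margin is calculated and transferred on a daily basis.",
    "Cash collateral received by the counterparty is not segregated.",
]

# One formatted prompt per page: this is the "map" in map-reduce
prompts = [map_template.format(PAGE=text) for text in sample_pages]
```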


4.2. Reduce Chain for Consolidated Summarization

In [9]:
# Define the reduce chain using a prompt template to consolidate individual page summaries into a cohesive summary
reduce_template = """The following is a set of SUMMARIES:
---
SUMMARIES => {SUMMARIES}
---
Combine the summaries into a single cohesive CONSOLIDATED summary without distilling or removing any information. Add sections wherever applicable to clearly delineate topics or themes.
CONSOLIDATED SUMMARY =>"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
reduce_chain = LLMChain(llm=llm, 
                        prompt=reduce_prompt, 
                        verbose=True)

combine_documents_chain = StuffDocumentsChain(llm_chain=reduce_chain, 
                                              document_variable_name='SUMMARIES', 
                                              verbose=True)

reduce_documents_chain = ReduceDocumentsChain(combine_documents_chain=combine_documents_chain, 
                                              collapse_documents_chain=combine_documents_chain, 
                                              token_max=2048, 
                                              verbose=True)
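The `token_max` setting bounds how much text is combined in one call: when the page summaries together exceed the limit, they are first collapsed in groups and only then merged into the final summary. The sketch below illustrates that idea in plain Python. It is a rough model rather than the chain's actual code; the word-count token proxy, the helper names, and the stub `fake_combine` are all assumptions.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words (a real tokenizer counts differently)
    return len(text.split())


def split_into_groups(texts: List[str], token_max: int) -> List[List[str]]:
    """Pack texts greedily, in order, into groups of at most token_max tokens."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        n = count_tokens(text)
        if current and used + n > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        groups.append(current)
    return groups


def reduce_summaries(summaries: List[str], combine: Callable[[List[str]], str],
                     token_max: int) -> str:
    """Collapse groups until everything fits in token_max, then combine once more."""
    while sum(count_tokens(s) for s in summaries) > token_max:
        groups = split_into_groups(summaries, token_max)
        if len(groups) == len(summaries):
            break  # every group holds one summary; collapsing cannot make progress
        summaries = [combine(group) for group in groups]
    return combine(summaries)


calls: List[int] = []


def fake_combine(group: List[str]) -> str:
    # Stand-in for the LLM call: keep the first three words of each summary
    calls.append(len(group))
    return " ".join(" ".join(s.split()[:3]) for s in group)


final = reduce_summaries(["a b c d e f"] * 4, fake_combine, token_max=12)
```

In this run the four summaries (24 words) are collapsed in two groups of two before the final combine, so the stub is called three times.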

4.3. Execute the MapReduce Chain

In [10]:
# Execute the map-reduce chain to get the final consolidated summary
map_reduce_chain = MapReduceDocumentsChain(llm_chain=map_chain, 
                                           reduce_documents_chain=reduce_documents_chain, 
                                           document_variable_name='PAGE', 
                                           return_intermediate_steps=False, 
                                           verbose=True)
completion = map_reduce_chain.run(pages)
completion = completion.strip()
logger.info(completion)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
You are a Derivatives Risk Analyst. Your job is to create a detailed SUMMARY of financial documents.
Your summary should include all important facts, math formulas, and technical details, and leave out extra information.
Your goal is to provide a clear and precise SUMMARY of the PAGE below that shows your knowledge of finance, with high accuracy and quality.
---
PAGE =>   
 
 savings association elected to treat the derivative  contract as multiple derivative contracts under 
§ 3.113(e)(6); 
(D) For derivative contracts that are not cleared through a QCCP, the cash collateral 
received by the recipient counterparty is not segregated (by law, regulation, or an agreement with 
the counterparty); 
(E) Variation margin is calculated and transferred on a daily basis based on the mark-to-
fair value of the derivative contract; 
(F) The variation margin tr

The document provides a comprehensive discussion on the treatment of derivative contracts, particularly those not cleared through a Qualified Central Counterparty (QCCP). It outlines the process of calculating and transferring variation margin daily based on the mark-to-fair value of the derivative contract. The derivative contract and the variation margin should be governed by a qualifying master netting agreement between the legal entities that are the counterparties to the derivative contract or by the governing rules for a cleared transaction.

Section 1: Derivative Risk Calculation
The document provides two formulas for calculating derivative risk, with variables representing the sum of the adjusted derivative contract amounts within different hedging sets. It also outlines the calculation of hedging set amounts for exchange rate derivative contracts, credit derivative contracts, and equity derivative contracts.

Section 2: Regulatory Alignment and Revisions
The document proposes 


> Finished chain.

> Finished chain.


4.4. Write the Final Summary to a File

In [11]:
# Write the final consolidated summary to a file for future reference
SUMMARY_OUT_PATH = f'{LOCAL_OUTPUT_DIR}/{FILE_NAME}/SUMMARY'
os.makedirs(SUMMARY_OUT_PATH, exist_ok=True)

with open(f'{SUMMARY_OUT_PATH}/{FILE_NAME}-summary-oai.txt', 'w', encoding='utf-8') as f:
    f.write(completion)