# Actuarial Standards of Practice (ASOP) Q&A Machine using Retrieval Augmented Generation (RAG)
This project builds a Retrieval-Augmented Generation (RAG) process that lets actuaries ask questions about a set of Actuarial Standards of Practice (ASOP) documents. The RAG process uses a Large Language Model (LLM) to answer each question from passages retrieved from the ASOPs.

However, RAG is not without challenges, such as hallucination and inaccuracy. To make its answers verifiable, this code returns the context it used to arrive at each answer, so actuaries can validate the information provided by the LLM against the source text and make informed decisions. Combining the capabilities of the LLM with this verifiability gives actuaries a practical way to use LLM technology effectively.

The current example uses OpenAI's GPT-3.5 Turbo and outputs results under two different retrieval parameter settings for comparison purposes.  

# 1. Initial Setup
This setup includes loading environment variables from a `.env` file, setting the required environment variables, and importing the necessary modules for further processing. It ensures that the code has access to the required APIs and functions for the subsequent tasks.


In [1]:
# Initial set up
from dotenv import load_dotenv
import os

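# Load the variables from .env file and set the API key (or user may manually set the API key)
load_dotenv()  # This loads the variables from .env (not part of repo) into os.environ
# Fail early with a clear message if the key is missing
# (assigning None to os.environ would raise a less helpful TypeError)
if not os.getenv('OPENAI_API_KEY'):
    raise EnvironmentError("OPENAI_API_KEY is not set; add it to .env or set it manually")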
#os.environ["LANGCHAIN_TRACING_V2"] = "true" # use when you want to debug or monitor the performance of your langchain applications
#os.environ["LANGCHAIN_API_KEY"] = os.getenv('LANGCHAIN_API_KEY') # use when accessing cloud-based language models or services that langchain integrates with

# Import the necessary modules
import bs4
from langchain import hub
from langchain_community.document_loaders import PyPDFLoader  # community package, consistent with the imports below
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables import RunnableParallel # for RAG with source
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from IPython.display import display, Markdown, Latex
import glob
import chromadb

In [2]:
# use_OpenAI
embeddings_model = OpenAIEmbeddings()
db_directory = "../data/chroma_db1"
llm = ChatOpenAI(model_name="gpt-3.5-turbo-0125", 
                 temperature=0) # context window size 16k for GPT 3.5 Turbo

# 2. Load PDF Files and Convert to a Vector DB
1. Define a function, `load_pdfs_from_folder()`, that takes a folder path as input and returns a list of text documents extracted from the PDF files in that folder.

2. In the example, the folder path `../data/ASOP` is used, but you can modify it to point to your desired folder.

3. By calling the `load_pdfs_from_folder()` function with the folder path, the code loads the PDF files, extracts the text using the PyPDFLoader (one document per PDF page, with the source file and page number kept as metadata), and stores the extracted text documents in the `docs` list.

4. After loading and extracting the text, a `RecursiveCharacterTextSplitter` object is created with specific parameters for chunking the documents. The `split_documents()` method is then used to split the documents into smaller chunks based on the specified parameters.

5. Finally, a Chroma vectorstore is created from the document splits. The vectorstore uses the defined embedding model for embedding the chunks and is saved to the predefined directory.
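
To make the chunking parameters in step 4 concrete, here is a minimal sketch of how `chunk_size` and `chunk_overlap` interact. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and line breaks, whereas the hypothetical `sliding_window_chunks()` helper below uses plain fixed-size character windows.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Simplified illustration only; not how RecursiveCharacterTextSplitter is implemented.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Synthetic 2,500-character text (an assumption for illustration, not ASOP content)
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])  # consecutive chunks share the overlap region
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.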

In [3]:
# Run only when the DB directory is missing or empty
if not os.path.exists(db_directory) or not os.listdir(db_directory):
    # Define a function to load and extract text from PDFs in a folder
    def load_pdfs_from_folder(folder_path):
        # Get a list of PDF files in the specified folder
        pdf_files = glob.glob(f"{folder_path}/*.pdf")
        docs = []
        for pdf_file in pdf_files:
            # Load the PDF file using the PyPDFLoader
            loader = PyPDFLoader(pdf_file) 
            # Extract the text from the PDF and add it to the docs list
            docs.extend(loader.load())
        return docs
    
    # Example folder path
    folder_path = '../data/ASOP'
    
    # Call the function to load and extract text from PDFs in the specified folder
    docs = load_pdfs_from_folder(folder_path)
    
    # Create a text splitter object with specified parameters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, 
        chunk_overlap=200,
        length_function=len,)
    
    # Split the documents into chunks using the text splitter
    splits = text_splitter.split_documents(docs)
    
    # Create a Chroma vector database from the document splits, using OpenAIEmbeddings for embedding
    vectorstore = Chroma.from_documents(documents=splits, 
                                        embedding=embeddings_model, 
                                        persist_directory=db_directory)

# 3. Retrieve from the Vector DB 

In [3]:
# Load the persisted Chroma vector database from db_directory, using the same embedding model
vectorstore = Chroma(embedding_function=embeddings_model, 
                     persist_directory=db_directory)

In [25]:
# This is where you may change the parameters
n_k = 4 # Number of documents to output (different from the number of documents to fetch in the algorithm)
lambda_1 = 1.0 # lambda_mult for the first retriever: 1 being the least diverse, 0 being the most diverse
lambda_2 = 0.0 # lambda_mult for the second retriever: 1 being the least diverse, 0 being the most diverse
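
To show what the two lambda values change, the following is a minimal pure-Python sketch of MMR (Maximal Marginal Relevance) selection. It follows the usual greedy formulation, scoring each candidate as `lambda_mult * relevance - (1 - lambda_mult) * redundancy`; it is not LangChain's own code and may differ from it in details. The helper names and the 2-dimensional toy vectors are assumptions for illustration. In the retriever, the candidate pool is the set of documents fetched before re-ranking (the `fetch_k` search argument), from which `k` are returned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Return the indices of up to k documents chosen greedily by MMR score."""
    if not doc_vecs or k <= 0:
        return []
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    # Start with the document most similar to the query
    selected = [max(range(len(doc_vecs)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(doc_vecs)):
        best_idx, best_score = None, -math.inf
        for i in range(len(doc_vecs)):
            if i in selected:
                continue
            # Redundancy: similarity to the closest already-selected document
            redundancy = max(cosine(doc_vecs[i], doc_vecs[j]) for j in selected)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


# Toy embeddings (hypothetical): documents 0 and 1 are near-duplicates, document 3 is unrelated
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8], [0.0, 1.0]]
least_diverse = mmr_select(query, docs, k=2, lambda_mult=1.0)
most_diverse = mmr_select(query, docs, k=2, lambda_mult=0.0)
print(least_diverse, most_diverse)
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking, so near-duplicates can fill the result; with `lambda_mult=0.0` every pick after the first ignores relevance and only avoids what is already selected.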

In [26]:
## Retrieve and RAG chain

# Create a retriever using the vector database as the search source
retriever = vectorstore.as_retriever(search_type="mmr", 
                                     search_kwargs={'k': n_k, 'lambda_mult': lambda_1}) 

## Alternative retriever using a different lambda_mult (same vector DB and LLM)
retriever_ALT = vectorstore.as_retriever(search_type="mmr", 
                                     search_kwargs={'k': n_k, 'lambda_mult': lambda_2})

# MMR (Maximal Marginal Relevance) finds a set of documents that are both similar to the input query and diverse among themselves
# To tune, increase the number of documents to return (k) or adjust diversity via lambda_mult (default 0.5; 0 = most diverse, 1 = least diverse)

# Load the RAG (Retrieval-Augmented Generation) prompt
prompt = hub.pull("rlm/rag-prompt")

# Define a function to format the documents with their sources and pages
def format_docs_with_sources(docs):
    formatted_docs = "\n\n".join(doc.page_content for doc in docs)
    sources_pages = "\n".join(f"{doc.metadata['source']} (Page {doc.metadata['page'] + 1})" for doc in docs)
    # Added 1 to the page number assuming 'page' starts at 0 and we want to present it in a user-friendly way

    return f"Documents:\n{formatted_docs}\n\nSources and Pages:\n{sources_pages}"

# Create a RAG chain using the formatted documents as the context
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs_with_sources(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

# Create a parallel chain for retrieving and generating answers
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

## Alternative method using different parameters
# Create a parallel chain for retrieving and generating answers
rag_chain_with_source_ALT = RunnableParallel(
    {"context": retriever_ALT, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
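
The shape of the result returned by the chains above can be pictured with a plain-Python stand-in. This is only a sketch of the data flow: the parallel step builds a dictionary with `context` and `question`, and `.assign()` then adds an `answer` key computed from that dictionary. The functions `run_with_source()`, `fake_retrieve()`, and `fake_answer()` are hypothetical and replace the retriever, prompt, and LLM.

```python
def run_with_source(question, retrieve, answer_from):
    """Mimic the data flow of the chain with ordinary function calls."""
    # Parallel step: retrieve the context and pass the question through unchanged
    state = {"context": retrieve(question), "question": question}
    # Assign step: add an "answer" key computed from the state built so far
    return {**state, "answer": answer_from(state)}


def fake_retrieve(question):
    # Hypothetical stand-in for the retriever; returns fixed chunks
    return ["chunk A", "chunk B"]


def fake_answer(state):
    # Hypothetical stand-in for prompt | llm | StrOutputParser()
    return "Answer to {!r} based on {} chunks".format(state["question"], len(state["context"]))


result = run_with_source("What is catastrophe risk?", fake_retrieve, fake_answer)
print(sorted(result))
```

Because the retrieved documents stay in the result under `context`, the calling code can show them next to the answer for verification.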


# 4. Generate Q&A Functions

In [27]:
def generate_output():
    # Prompt the user for a question on ASOP
    usr_input = input("What is your question on ASOP?: ")

    # Invoke the RAG chain with the user input as the question
    output = rag_chain_with_source.invoke(usr_input)
    output_ALT = rag_chain_with_source_ALT.invoke(usr_input)

    # Generate the Markdown output with the question, answer, and context
    markdown_output = "### Question\n{}\n\n### First Answer with {}\n{}\n\n".format(output['question'], lambda_1, output['answer'])
    markdown_output += "### Second Answer with {}\n{}\n\n### First Context\n".format(lambda_2, output_ALT['answer'])

    last_page_content = None  # Variable to store the last page content
    i = 1 # Source indicator

    # Iterate over the context documents to format and include them in the output
    for doc in output['context']:
        current_page_content = doc.page_content.replace('\n', '  \n')  # Get the current page content
        
        # Check if the current content is different from the last one
        if current_page_content != last_page_content:
            markdown_output += "- **First Source {}**: {}, page {}:\n\n{}\n".format(i, doc.metadata['source'], doc.metadata['page'], current_page_content)
            i = i + 1
        last_page_content = current_page_content  # Update the last page content

    markdown_output += "\n\n### Second Context\n"

    last_page_content = None  # Variable to store the last page content
    i = 1 # Source indicator

    # Iterate over the context documents to format and include them in the output
    for doc in output_ALT['context']:
        current_page_content = doc.page_content.replace('\n', '  \n')  # Get the current page content
        
        # Check if the current content is different from the last one
        if current_page_content != last_page_content:
            markdown_output += "- **Second Source {}**: {}, page {}:\n\n{}\n".format(i, doc.metadata['source'], doc.metadata['page'], current_page_content)
            i = i + 1
        last_page_content = current_page_content  # Update the last page content

    
    # Display the Markdown output
    display(Markdown(markdown_output))
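
The two context loops in `generate_output()` differ only in their label, so they could be folded into one helper. The sketch below keeps the same behavior, including skipping a chunk only when it repeats the immediately preceding one. `format_context()` and `FakeDoc` are hypothetical names; `FakeDoc` stands in for a LangChain document so the example runs without a vector DB.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes the loop relies on."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, label):
    """Format retrieved documents as Markdown bullets, skipping consecutive duplicates."""
    parts = []
    last_content = None
    i = 1  # Source indicator
    for doc in docs:
        # Two trailing spaces before a newline force a Markdown line break
        content = doc.page_content.replace("\n", "  \n")
        if content != last_content:
            parts.append("- **{} Source {}**: {}, page {}:\n\n{}\n".format(
                label, i, doc.metadata["source"], doc.metadata["page"], content))
            i += 1
        last_content = content
    return "".join(parts)


# Toy documents (assumed values); the second repeats the first and is skipped
toy_docs = [
    FakeDoc("a\nb", {"source": "x.pdf", "page": 0}),
    FakeDoc("a\nb", {"source": "x.pdf", "page": 0}),
    FakeDoc("c", {"source": "x.pdf", "page": 1}),
]
formatted = format_context(toy_docs, "First")
print(formatted)
```

With such a helper, the body of `generate_output()` would call it once per chain, for example with the labels `"First"` and `"Second"`.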

# Example questions related to ASOPs
- Explain ASOP No. 14
- What are the considerations in choosing what methods to use for asset adequacy testing?
- How are expenses reflected in cash flow testing based on ASOP No. 22?
- What is catastrophe risk?
- When do I update assumptions?
- What should I do when I do not have credible data to develop non-economic assumptions?

In [28]:
generate_output()

What is your question on ASOP?:  What should I do when I do not have credible data to develop non-economic assumptions?


### Question
What should I do when I do not have credible data to develop non-economic assumptions?

### First Answer with 1.0
When credible data is not available to develop non-economic assumptions, consider other information sources such as pricing or reserving practices of similar insurance businesses. Obtain input from individuals with relevant expertise and ensure internal consistency of assumptions used in the appraisal. Use professional judgment and available sources of data to set assumptions when no relevant historical experience is available.

### Second Answer with 0.0
When credible data is not available to develop non-economic assumptions, consider other information sources such as pricing practices in the insurance business or the experience of other similar insurance companies. Obtain input from individuals with relevant expertise and ensure that each set of assumptions used is internally consistent. If additional expertise is needed, seek input from knowledgeable individuals and give due weight to their input.

### First Context
- **First Source 1**: ../data/ASOP/asop019_137.pdf, page 7:

information is available. When experience of the business is unavailable or insufficient to   
provide a credible basis on which to develop assumptions, the actuary should consider   
other information sources in setting assumptions. Other information sources may include the pricing or reserving practices applicable to the insurance business and the available   
experience of other insurance businesses with comparable policies or contracts, markets,   
and operating environment.    
  In developing assumptions for which the actuary believes additional expertise is needed,   
the actuary should obtain necessary input from persons possessing the relevant   
knowledge or expertise, and should give due weight to their input.    
  When setting assumptions for use in an appraisal, the actuary should take reasonable   
steps to ensure that each set of assumptions used is internally consistent.   
   
3.4 Discount Rate  
⎯If the appraisal is based on the discounted value of projected earnings,
- **First Source 2**: ../data/ASOP/asop027_197.pdf, page 10:

d. take into account other general considerations, when applicable (section 3.5); and   
   
 e. select a reasonable assumption (section 3.6).    
   
After completing these steps for each economic assumption, the actuary should review the   
set of economic assumptions for consistency (section 3.12) and make appropriate   
adjustments if necessary.   
   
3.4 Relevant Data—To evaluate relevant data, the actuary should review appropriate recent   
and long-term historical economic data. The actuary should not give undue weight to recent   
experience. The actuary should take into account the possibility that some historical   
economic data may not be appropriate for use in developing assumptions for future periods   
due to changes in the underlying environment.     
   
3.5 General Considerations—The actuary should take into account the following when   
applicable:   
   
 3.5.1  Adverse Deviation or Plan Provisions That Are Difficult to Measure—Depending
- **First Source 3**: ../data/ASOP/asop054_193.pdf, page 11:

3.4.1.3 Assumptions When There Is No Relevant Historical Experience  
—In   
some instances, no relevant historical experience is available to the   
actuary. In this situation, the actuary should use professional judgment,   
considering available sources of data, when setting assumptions.    
   
3.4.2 Assumption Margins—The actuary should consider the appropriateness of   
including a margin in the assumptions. When setting a margin, the actuary should   
consider the following:    
   
a. the degree to which there is uncertainty around the assumptions due to   
lack of relevant, credible company or industry experience data to support   
the assumptions;
- **First Source 4**: ../data/ASOP/asop052_189.pdf, page 13:

company.  Where no relevant  and credible company experience is available, the actuary should use   
professional judgment in advising on the adoption and modification of other sources of   
experience data. Examples of items that may result in modifications to the experience   
data include the company’s underwriting and administrative practices, market   
demographics, product design, and economic and regulatory environments.   
 Section 9 of VM-20 requires sensitivity testing to determine which assumptions have the   
most significant impact on reserves. The actuary should consider performing more   
extensive analyses in setting assumptions that have a significant impact on valuation   
results.  The actuary should consider granularity in setting assumptions given the model   
structure. The actuary should use professional judgment to set granularity to reflect   
expected experience appropriately.


### Second Context
- **Second Source 1**: ../data/ASOP/asop019_137.pdf, page 7:

information is available. When experience of the business is unavailable or insufficient to   
provide a credible basis on which to develop assumptions, the actuary should consider   
other information sources in setting assumptions. Other information sources may include the pricing or reserving practices applicable to the insurance business and the available   
experience of other insurance businesses with comparable policies or contracts, markets,   
and operating environment.    
  In developing assumptions for which the actuary believes additional expertise is needed,   
the actuary should obtain necessary input from persons possessing the relevant   
knowledge or expertise, and should give due weight to their input.    
  When setting assumptions for use in an appraisal, the actuary should take reasonable   
steps to ensure that each set of assumptions used is internally consistent.   
   
3.4 Discount Rate  
⎯If the appraisal is based on the discounted value of projected earnings,
- **Second Source 2**: ../data/ASOP/asop051_188.pdf, page 24:

ASOP No. 51—September 2017   
   
   
 Section 3.4, Assumptions for Assessment of Risk (now section 3.5)   
Comment   
   
   
Response One commentator suggested that empirical data be used to select assumptions rather than using   
professional judgment.   
   
The reviewers note that the section Assumptions for Assessment of Risk reads “The assumptions used for assessment of risk may be based on economic and demographic data and analyses,” but   
believe “the actuary should use professional judgment in selecting [these] assumptions” and made   
no change in response to this comment.   
Comment    
   
Response Two commentators suggested the term “plausible” is not clear and also that implausible outcomes   
should be considered.   
   
The reviewers believe the term “plausible,” combined with the requirement for the actuary to use professional judgment, is appropriate for this standard and made no change in response to this   
comment.   
Comment
- **Second Source 3**: ../data/ASOP/asop007_128.pdf, page 16:

3.10.6 Limitations of Models, Assumptions, and Data  
—Cash flow estimates can vary   
considerably as a result of the model used, the assumptions selected, and the data.   
When results are highly volatile, additional analysis may be appropriate.   
 3.11 Negative Interim Earnings  
—The actuary should consider the impact of any negative interim   
earnings during the cash flow projection period, if it is appropriate for the purpose of the   
analysis.   
   Section 4.  Communications and Disclosures  
   
 4.1 Reliance on Others for Data, Projections, and Supporting Analysis  
—The actuary may rely   
on data, projections, and supporting analysis supplied by others. In doing so, the actuary   
should disclose both the fact and the extent of such reliance. Such disclosure may follow the   
forms prescribed in the applicable NAIC model laws and regulations. The accuracy and comprehensiveness of data, projections, or supporting analysis supplied by others are the
- **Second Source 4**: ../data/ASOP/asop046_165.pdf, page 12:

ASOP No. 46—September 2012    
   
 b. prices in the marketplace;   
   
c. opinions of other experts;   
   
d. the fit of the assumed distribution to available data;   
   
e. the ability of the assumed distribution to reflect possible extreme values;    
   
f. sensitivity of results to changes in assumptions;    
   
g. internal consistency of the assumptions; and   
   
h. consistency in the application of assumptions.   
 3.3.5   Validation of the Economic Capital Model  
—Economic capital is often   
determined based on the results of stochastic models that produce a large number   
of outcomes. The actuary should devise appropriate tests of the distribution of   
outcomes calculated by the model (for example, in comparison to the range of results in similar models or to historical outcomes over time) and the sensitivity of   
those distributions to changes in the assumptions and parameters. The actuary should also perform validation tests to determine whether the model results are


# 5. References
- https://www.actuarialstandardsboard.org/standards-of-practice/
- https://python.langchain.com/docs/use_cases/question_answering/quickstart
- https://python.langchain.com/docs/use_cases/question_answering/sources
- https://python.langchain.com/docs/integrations/text_embedding/
- https://docs.gpt4all.io/gpt4all_python_embedding.html#gpt4all.gpt4all.Embed4All
- https://chat.langchain.com/