# Actuarial Standards of Practice (ASOP) Q&A Machine using Retrieval Augmented Generation (RAG)
This project aims to create a Retrieval-Augmented Generation (RAG) process for actuaries to ask questions on a set of Actuarial Standards of Practice (ASOP) documents. The RAG process utilizes the power of the Large Language Model (LLM) to provide answers to questions on ASOPs.

However, RAG is not without challenges, i.e., hallucination and inaccuracy. This code allows verifiability by providing the context it used to arrive at those answers. This process enables actuaries to validate the information provided by the LLM, empowering them to make informed decisions. By combining the capabilities of LLM with verifiability, this code offers actuaries a robust tool to leverage LLM technology effectively and extract maximum value.

The current example uses either OpenAI's GPT 3.5 turbo or a local LLM. Using local LLM can address potential data privacy or security concerns.

# 1. Initial Setup
This setup includes loading environment variables from a `.env` file, setting the required environment variables, and importing the necessary modules for further processing. It ensures that the code has access to the required APIs and functions for the subsequent tasks.


In [6]:
# Initial set up
from dotenv import load_dotenv
import os

# Load the variables from .env file and set the API key (or user may manually set the API key)
load_dotenv()  # This loads the variables from .env (not part of repo)
os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY')
#os.environ["LANGCHAIN_TRACING_V2"] = "true" # use when you want to debug or monitor the performance of your langchain applications
#os.environ["LANGCHAIN_API_KEY"] = os.getenv('LANGCHAIN_API_KEY') # use when accessing cloud-based language models or services that langchain integrates with

# Import the necessary modules
import bs4
from langchain import hub
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables import RunnableParallel # for RAG with source
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from IPython.display import display, Markdown, Latex
import glob
import chromadb

# 2. Load PDF Files and Convert to a Vector DB
1. Create a function to load and extract text from PDF files in a specified folder. It defines a function called `load_pdfs_from_folder()` that takes a folder path as input and returns a list of extracted text documents from the PDF files in that folder.

2. In the example, the folder path `../data/ASOP` is used, but you can modify it to point to your desired folder.

3. By calling the `load_pdfs_from_folder()` function with the folder path, the code loads the PDF files, extracts the text using the PyPDFLoader, and stores the extracted text documents in the `docs` list.

4. After loading and extracting the text, a `RecursiveCharacterTextSplitter` object is created with specific parameters for chunking the documents. The `split_documents()` method is then used to split the documents into smaller chunks based on the specified parameters.

5. Finally, a Chroma vectorstore is created from the document splits. The vectorstore uses the `OpenAIEmbeddings` for embedding the chunks and is persisted to the directory `../data/chroma_db1`.

In [2]:
'''# Uncomment when creating your own vector database for the first time
# Define a function to load and extract text from PDFs in a folder
def load_pdfs_from_folder(folder_path):
    # Get a list of PDF files in the specified folder
    pdf_files = glob.glob(f"{folder_path}/*.pdf")
    docs = []
    for pdf_file in pdf_files:
        # Load the PDF file using the PyPDFLoader
        loader = PyPDFLoader(pdf_file) 
        # Extract the text from the PDF and add it to the docs list
        docs.extend(loader.load())
    return docs

# Example folder path
folder_path = '../data/ASOP'

# Call the function to load and extract text from PDFs in the specified folder
docs = load_pdfs_from_folder(folder_path)

# Create a text splitter object with specified parameters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=200,
    length_function=len,)

# Split the documents into chunks using the text splitter
splits = text_splitter.split_documents(docs)

# Create a Chroma vector database from the document splits, using OpenAIEmbeddings for embedding
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings(), 
                                    persist_directory="../data/chroma_db1")
''' # Uncomment when creating your own vector database for the first time

'# Uncomment when creating your own vector database for the first time\n# Define a function to load and extract text from PDFs in a folder\ndef load_pdfs_from_folder(folder_path):\n    # Get a list of PDF files in the specified folder\n    pdf_files = glob.glob(f"{folder_path}/*.pdf")\n    docs = []\n    for pdf_file in pdf_files:\n        # Load the PDF file using the PyPDFLoader\n        loader = PyPDFLoader(pdf_file) \n        # Extract the text from the PDF and add it to the docs list\n        docs.extend(loader.load())\n    return docs\n\n# Example folder path\nfolder_path = \'../data/ASOP\'\n\n# Call the function to load and extract text from PDFs in the specified folder\ndocs = load_pdfs_from_folder(folder_path)\n\n# Create a text splitter object with specified parameters\ntext_splitter = RecursiveCharacterTextSplitter(\n    chunk_size=1000, \n    chunk_overlap=200,\n    length_function=len,)\n\n# Split the documents into chunks using the text splitter\nsplits = text_splitter.sp

# 3. Retrieve from the Vector DB
Once a vector database is created, Section 2 can be commented out.  

In [2]:
# Create a Chroma vector database with specified parameters
vectorstore = Chroma(embedding_function=OpenAIEmbeddings(), 
                     persist_directory="../data/chroma_db1")

In [7]:
## Retrieve and RAG chain

# Create a retriever using the vector database as the search source
retriever = vectorstore.as_retriever(search_type="mmr", 
                                     search_kwargs={'k': 6, 'lambda_mult': 0.25}) 
# Use MMR (Maximum Marginal Relevance) to find a set of documents that are both similar to the input query and diverse among themselves
# Increase the number of documents to get, and increase diversity (lambda mult 0.5 being default, 0 being the most diverse, 1 being the least)

# Load the RAG (Retrieval-Augmented Generation) prompt
prompt = hub.pull("rlm/rag-prompt")

# Create a ChatOpenAI language model for augmented generation
# llm = ChatOpenAI(model_name="gpt-3.5-turbo-0125", 
#                 temperature=0) # context window size 16k for GPT 3.5 Turbo

# Create a local large language model for augmented generation
# Ollama is one way to easily run inference (especially on macOS)
llm = Ollama(model="solar:10.7b-instruct-v1-q5_K_M")

# Define a function to format the documents with their sources and pages
def format_docs_with_sources(docs):
    formatted_docs = "\n\n".join(doc.page_content for doc in docs)
    sources_pages = "\n".join(f"{doc.metadata['source']} (Page {doc.metadata['page'] + 1})" for doc in docs)
    # Added 1 to the page number assuming 'page' starts at 0 and we want to present it in a user-friendly way

    return f"Documents:\n{formatted_docs}\n\nSources and Pages:\n{sources_pages}"

# Create a RAG chain using the formatted documents as the context
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs_with_sources(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

# Create a parallel chain for retrieving and generating answers
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

# 4. Generate Q&A

In [8]:
def generate_output():
    # Prompt the user for a question on ASOP
    usr_input = input("What is your question on ASOP?: ")

    # Invoke the RAG chain with the user input as the question
    output = rag_chain_with_source.invoke(usr_input)

    # Generate the Markdown output with the question, answer, and context
    markdown_output = "### Question\n{}\n\n### Answer\n{}\n\n### Context\n".format(output['question'], output['answer'])

    last_page_content = None  # Variable to store the last page content
    i = 1 # Source indicator

    # Iterate over the context documents to format and include them in the output
    for doc in output['context']:
        current_page_content = doc.page_content.replace('\n', '  \n')  # Get the current page content
        
        # Check if the current content is different from the last one
        if current_page_content != last_page_content:
            markdown_output += "- **Source {}**: {}, page {}:\n\n{}\n".format(i, doc.metadata['source'], doc.metadata['page'], current_page_content)
            i = i + 1
        last_page_content = current_page_content  # Update the last page content
    
    # Display the Markdown output
    display(Markdown(markdown_output))

### Example questions related to ASOPs
- explain ASOP No. 14
- How are expenses relfected in cash flow testing based on ASOP No. 22?
- What is catastrophe risk?
- When do I update assumptions?
- What should I do when I do not have credible data to develop non-economic assumptions?

In [9]:
generate_output()

What is your question on ASOP?:  explain ASOP No. 14


### Question
explain ASOP No. 14

### Answer
 ASOP No. 14, which was related to cash flow testing, has been repealed and its relevant parts were incorporated into ASOP No. 7 and ASOP No. 22 in their revisions. The Actuarial Standards Board (ASB) voted to adopt these revised standards and defer the effective date of ASOP No. 7 while addressing concerns related to property/casualty practice. ASOPs, or Actuarial Standards of Practice, provide guidelines for appropriate actuarial practices in the United States. The repealed ASOP No. 14's topics have been covered by other existing standards and professional conduct codes.

### Context
- **Source 1**: ../data/ASOP/asop007_128.pdf, page 5:

virelated to cash flow testing. Finally, the ASB has adopted a new format for standards, and this   
standard has been rewritten to conform to that new format.  In addition to ASOP No. 7, as part of the project  to look at all cash flow testing standards of   
practice, ASOP No. 14 and ASOP No. 22 were al so reviewed. Relevant portions of ASOP No.   
14 were incorporated within the 2001 revisions of ASOP No. 7 and ASOP No. 22.   At its September 2001 meeting, the ASB voted to adopt the revised ASOP No. 7 and ASOP No. 22 and to repeal ASOP No. 14. In April 2002, the ASB voted to defer the effective date of ASOP No. 7 to July 15, 2002 while it reviewed concerns raised by the Academy’s Casualty Practice Council regarding the standard’s applicability to property/casualty practice. At its June 2002 meeting, the ASB amended the scope to conform to generally accepted casualty actuarial practice. Please see appendix 3 for further information.   Exposure Draft
- **Source 2**: ../data/ASOP/asop004_173.pdf, page 31:

are found in the current version of ASOP No. 4. The reviewers believe the reference to Precept 8   
remains appropriate. The reviewers do not believe that the proposed change significantly improves   
the language included in the current version of ASOP No. 4, and made no change.
- **Source 3**: ../data/ASOP/asop009_105.pdf, page 2:

that the topics in ASOP No. 9 are adequately covered in ASOP No. 41, other ASOPs, and the Code of Professional Conduct , and concluded that ASOP No. 9 should be repealed.    
 Exposure Draft  
   
 The exposure draft of this repeal document was issued in June 2007 with a comment deadline of   
August 15, 2007. Seven comment letters were receive d and were considered in finalizing this   
repeal document. For a summary of the substan tive issues and the reviewers’ responses, please   
see appendix 2.
- **Source 4**: ../data/ASOP/asop017_192.pdf, page 3:

ASOP No. 17—Doc. No. 192   
   
iv   
 ASOP No. 17 Task Force   
          
   
David R. Godofsky, Chairperson   
 James P. Galasso   Lawrence J. Sher  Carl M. Harris    Margaret Tiller Sherwood  Adam Reese    
   
   
General Committee of the ASB   
   
Margaret Tiller Sherwood, Chairperson   
Shawna S. Ackerman    Susan E. Pantely    
Ralph S. Blanchard III   Judy K. Stromback     
Andrew M. Erman  
    Hal Tepfer   
Dale S. Hagstrom   Christian J. Wolfe       
Actuarial Standards Board   
   
Beth E. Fitzgerald, Chairperson   
Christopher S. Carlson  Darrell D. Knapp  Maryellen J. Coggins   Cande J. Olsen Robert M. Damler   Kathleen A. Riley  Mita D. Drazilov   Barbara L. Snyder                    
   
The Actuarial Standards Board (ASB) sets standards for appropriate actuarial practice in the United   
States through the development and promulgation of Actuarial Standards of Practice (ASOPs). These   
ASOPs describe the procedures an actuary should follow when performing actuarial services and
- **Source 5**: ../data/ASOP/asop011_199.pdf, page 30:

after ASOP No. 11 was initially exposed.    
Section 3.14, Reliance on Experts (now section 3.16, Reliance on the Expertise of Others)    
Comment   
   
   
Response  One commentator said that this section appears to have been drawn from ASOP No. 56, and   
suggested deleting duplicative language and adding a reference to ASOP No. 56 instead.   
   
The reviewers believe the guidance is  not limited to modeling and made no change.
- **Source 6**: ../data/ASOP/asop024_184.pdf, page 0:

Actuarial Standard    
of Practice    
No. 24   
   
   
   
Compliance with the    
NAIC Life Insurance Illustrations    
Model Regulation   
   
   
Revised Edition    
    
Developed by the   
Task Force to Revise ASOP No. 24 of the   
Life Committee of the   
Actuarial Standards Board   
    
Adopted by the   
Actuarial Standards Board   
December 2016   
   
   
Doc. No. 184


# 5. References
- https://www.actuarialstandardsboard.org/standards-of-practice/
- https://python.langchain.com/docs/use_cases/question_answering/quickstart
- https://python.langchain.com/docs/use_cases/question_answering/sources
- https://chat.langchain.com/