# Challenge 2: Reverse Transactions Analyzer

This notebook implements an AI solution for the "Reverse Transactions" category of the "Strengthening the Adoption of Standards in Islamic Finance with Artificial Intelligence" Hackathon.

## Objective
Given "out-of-context" financial entries (journal entries and brief context), our AI solution identifies the AAOIFI Financial Accounting Standard(s) (FAS) that govern the transaction. If several standards could apply, the system assigns each a weighted probability and explains its reasoning.

## Approach
1. Process and index the AAOIFI standards (FAS and SS) using RAG techniques
2. Create a specialized prompt engineering system for reverse transaction analysis 
3. Provide weighted probability estimations for applicable standards
4. Justify the analysis with references to specific sections of the standards

In [9]:
# Import required libraries
import os
import pdfplumber
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
import re
from dotenv import load_dotenv
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
import json

# Load environment variables
load_dotenv()

True

## Step 1: Data Extraction from FAS and SS Standards
First, we'll extract text from the relevant AAOIFI standards documents (FAS and SS). These documents contain the rules and guidelines that determine the appropriate accounting treatments for various Islamic finance transactions.

In [10]:
# # Function to extract text from PDF documents
# def extract_pdf_text(pdf_path):
#     text = ""
#     try:
#         with pdfplumber.open(pdf_path) as pdf:
#             for page in pdf.pages:
#                 page_text = page.extract_text()
#                 if page_text:
#                     text += page_text + "\n"
#         return text
#     except Exception as e:
#         print(f"Error extracting text from {pdf_path}: {e}")
#         return ""

# # Get all PDF files from the data directory
# data_dir = "../data"
# pdf_files = [f for f in os.listdir(data_dir) if f.endswith('.pdf') or f.endswith('.PDF')]

# # Extract text from each PDF
# documents = []
# for pdf_file in pdf_files:
#     pdf_path = os.path.join(data_dir, pdf_file)
#     text = extract_pdf_text(pdf_path)
#     if text:
#         documents.append({
#             "source": pdf_file,
#             "text": text
#         })
#         print(f"Successfully extracted text from {pdf_file}")
#     else:
#         print(f"Failed to extract text from {pdf_file}")

# print(f"\nTotal documents processed: {len(documents)}")

## Step 2: Text Preprocessing and Chunking
To optimize the retrieval process, we need to divide the document text into smaller, meaningful chunks. This enables more precise matching and retrieval when handling specific queries about accounting standards.

In [11]:
# # Initialize text splitter for chunking
# splitter = RecursiveCharacterTextSplitter(
#     chunk_size=1000,  # Larger chunks to capture more context
#     chunk_overlap=200,  # Significant overlap to preserve context across chunks
#     separators=["\n\n", "\n", ". ", " ", ""]
# )

# # Process documents and create chunks with metadata
# chunks = []
# for doc in documents:
#     for i, chunk in enumerate(splitter.split_text(doc["text"])):
#         # Extract standard number from filename if possible
#         standard_match = re.search(r'(FAS|SS)[\s_-]*(\d+)', doc["source"], re.IGNORECASE)
#         standard_type = standard_match.group(1).upper() if standard_match else "Unknown"
#         standard_number = standard_match.group(2) if standard_match else "Unknown"
        
#         chunks.append({
#             "source": doc["source"],
#             "text": chunk,
#             "chunk_id": i,
#             "standard_type": standard_type,
#             "standard_number": standard_number
#         })

# print(f"Total chunks created: {len(chunks)}")
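
The filename parsing used for chunk metadata can be tried in isolation. The sketch below reuses the same regular expression as the cell above; the helper name `parse_standard_id` and the sample filenames are hypothetical and only serve as illustration.

```python
import re

# Same pattern as the chunking cell: "FAS" or "SS", optional separators, then digits.
STANDARD_PATTERN = re.compile(r"(FAS|SS)[\s_-]*(\d+)", re.IGNORECASE)


def parse_standard_id(filename):
    """Return (standard_type, standard_number), or ("Unknown", "Unknown") if no match."""
    match = STANDARD_PATTERN.search(filename)
    if not match:
        return ("Unknown", "Unknown")
    return (match.group(1).upper(), match.group(2))


# Hypothetical filenames, for illustration only.
for name in ["FAS_28.pdf", "ss-9 Ijarah.PDF", "fas 4.pdf", "glossary.pdf"]:
    print(name, "->", parse_standard_id(name))
```

Because the pattern is not anchored to a word boundary, a name such as `Class 10.pdf` would also be read as `SS 10`. If the data directory contains files like that, adding `\b` before the group is one possible tightening.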

## Step 3: Building the Vector Database
Now we'll create a vector database to store and retrieve our document chunks. This enables semantic search capabilities, allowing us to find the most relevant standard sections for a given financial transaction.

In [12]:
# # Define path for Chroma persistence
# CHROMA_PATH = "../vector_db/standards_reverse_transactions"

# # Initialize OpenAI embeddings
# embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# # Extract chunk texts and prepare metadata
# chunk_texts = [chunk["text"] for chunk in chunks]
# chunk_metadatas = [
#     {
#         "source": chunk["source"],
#         "chunk_id": chunk["chunk_id"],
#         "standard_type": chunk["standard_type"],
#         "standard_number": chunk["standard_number"]
#     } 
#     for chunk in chunks
# ]

# # Check if the vector store already exists
# if os.path.exists(CHROMA_PATH):
#     # Load existing vector store
#     print("Loading existing Chroma vector store...")
#     vector_store = Chroma(
#         persist_directory=CHROMA_PATH,
#         embedding_function=embeddings
#     )
# else:
#     # Create a new vector store
#     print("Creating new Chroma vector store...")
#     vector_store = Chroma.from_texts(
#         texts=chunk_texts,
#         embedding=embeddings,
#         metadatas=chunk_metadatas,
#         persist_directory=CHROMA_PATH
#     )
#     # Persist the vector store to disk
#     vector_store.persist()
#     print(f"Saved vector store to {CHROMA_PATH}")
    
# print(f"Vector store contains {vector_store._collection.count()} documents")
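
To make "semantic search" concrete, here is a toy sketch of ranking stored vectors against a query vector. It uses cosine similarity, which is one common measure; the metric Chroma actually applies depends on how the collection is configured. The three-dimensional vectors and chunk labels are made up for illustration, whereas real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_labels(query_vec, indexed, k=2):
    """Return the labels of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Made-up vectors standing in for embedded chunks.
toy_index = [
    ("chunk about Musharakah", [0.9, 0.1, 0.0]),
    ("chunk about Ijarah", [0.1, 0.9, 0.0]),
    ("chunk about Zakah", [0.0, 0.1, 0.9]),
]

print(top_k_labels([0.8, 0.2, 0.0], toy_index, k=2))
```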

## Step 4: Retrieval Functions
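Let's create functions to retrieve the most relevant standards for a given financial transaction. These functions will help us find the right context to determine which FAS applies to a particular journal entry. They rely on the `vector_store` object built in Step 3, so that cell (commented out above) must have been run before they are called.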

In [13]:
def retrieve_relevant_standards(query, top_k=5):
    """
    Retrieve the most relevant chunks from our vector store for a given query.
    
    Args:
        query (str): The text query about a financial transaction
        top_k (int): Number of results to retrieve
        
    Returns:
        list: List of document chunks with metadata
    """
    # Use Chroma's similarity search to find relevant chunks
    docs = vector_store.similarity_search(query, k=top_k)
    
    # Convert the returned documents to our expected format
    results = []
    for doc in docs:
        results.append({
            "source": doc.metadata["source"],
            "standard_type": doc.metadata["standard_type"],
            "standard_number": doc.metadata["standard_number"],
            "text": doc.page_content
        })
    return results

def filter_by_standard(results, standard_type=None, standard_number=None):
    """
    Filter retrieved results by standard type and/or number.
    
    Args:
        results (list): List of document chunks
        standard_type (str, optional): Filter by standard type (FAS, SS)
        standard_number (str, optional): Filter by standard number
        
    Returns:
        list: Filtered list of document chunks
    """
    filtered = results
    
    if standard_type:
        filtered = [r for r in filtered if r["standard_type"].upper() == standard_type.upper()]
    
    if standard_number:
        filtered = [r for r in filtered if r["standard_number"] == standard_number]
        
    return filtered
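
A quick way to see how the filter behaves is to run it on a few hand-written results. In the sketch below the function is repeated so the snippet runs on its own, and the `toy_results` entries are invented for illustration. Note that `standard_number` is stored as a string in the metadata, so it has to be passed as a string as well.

```python
def filter_by_standard(results, standard_type=None, standard_number=None):
    """Copy of the filter above, repeated so this snippet is self-contained."""
    filtered = results
    if standard_type:
        filtered = [r for r in filtered if r["standard_type"].upper() == standard_type.upper()]
    if standard_number:
        filtered = [r for r in filtered if r["standard_number"] == standard_number]
    return filtered


# Invented retrieval results, shaped like the output of retrieve_relevant_standards.
toy_results = [
    {"source": "FAS_4.pdf", "standard_type": "FAS", "standard_number": "4", "text": "..."},
    {"source": "FAS_28.pdf", "standard_type": "FAS", "standard_number": "28", "text": "..."},
    {"source": "SS_9.pdf", "standard_type": "SS", "standard_number": "9", "text": "..."},
]

fas_only = filter_by_standard(toy_results, standard_type="fas")  # type match ignores case
fas_28 = filter_by_standard(toy_results, standard_type="FAS", standard_number="28")
as_int = filter_by_standard(toy_results, standard_number=28)  # int never equals the stored string

print(len(fas_only), len(fas_28), len(as_int))
```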

## Step 5: FAS Identification and Weighting
The core of our solution is the ability to identify which AAOIFI standards are most relevant to a given financial transaction. We'll use an LLM to analyze the transaction details and provide weighted probabilities for each potentially applicable standard.

In [14]:
def analyze_transaction(transaction_description, journal_entry=None, top_k=7):
    """
    Analyze a financial transaction and identify relevant AAOIFI standards.
    
    Args:
        transaction_description (str): Description of the transaction
        journal_entry (str, optional): Journal entry related to the transaction
        top_k (int): Number of relevant chunks to retrieve
        
    Returns:
        dict: Analysis results with weighted probabilities
    """
    # Construct a comprehensive query combining description and journal entry
    query = transaction_description
    if journal_entry:
        query += f"\n{journal_entry}"
    
    # Enhance the query with accounting terminology to improve retrieval
    enhanced_query = f"""
    Financial transaction analysis:
    {query}
    
    Relevant AAOIFI standards, accounting treatments, and financial instruments
    """
    
    # Retrieve relevant chunks
    relevant_chunks = retrieve_relevant_standards(enhanced_query, top_k=top_k)
    
    # Filter to focus on FAS documents
    fas_chunks = filter_by_standard(relevant_chunks, standard_type="FAS")
    
    # Extract unique FAS numbers for consideration
    potential_standards = []
    seen_standards = set()
    
    for chunk in fas_chunks:
        standard_id = f"FAS {chunk['standard_number']}"
        if standard_id not in seen_standards and chunk['standard_number'] != "Unknown":
            seen_standards.add(standard_id)
            potential_standards.append({
                "id": standard_id,
                "context": chunk['text'][:500]  # Brief context from the standard
            })
    
    # Also include SS standards as they often contain complementary guidance
    ss_chunks = filter_by_standard(relevant_chunks, standard_type="SS")
    ss_references = []
    
    for chunk in ss_chunks:
        standard_id = f"SS {chunk['standard_number']}"
        if standard_id not in seen_standards and chunk['standard_number'] != "Unknown":
            seen_standards.add(standard_id)
            ss_references.append({
                "id": standard_id,
                "context": chunk['text'][:300]  # Brief context from the standard
            })
    
    # Combine all relevant contexts for the LLM
    context_text = ""
    for std in potential_standards:
        context_text += f"Standard {std['id']}:\n{std['context']}\n\n"
    
    for std in ss_references:
        context_text += f"Supporting standard {std['id']}:\n{std['context']}\n\n"
    
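    # Create LLM prompt to analyze and weight the standards
    # Raise max_tokens above the completion default so the JSON answer is not cut off mid-object
    llm = OpenAI(temperature=0, max_tokens=1024)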
    
    prompt = PromptTemplate(
        input_variables=["transaction", "journal_entry", "context", "standard_ids"],
        template="""
You are an expert in Islamic finance and AAOIFI standards. Given a financial transaction and relevant excerpts from AAOIFI standards, analyze which standards are most applicable.

Transaction description:
{transaction}

Journal entry (if provided):
{journal_entry}

Excerpts from potentially relevant standards:
{context}

Potential standards to consider: {standard_ids}

Based on your analysis, provide the following in JSON format:
1. The FAS standards that apply to this transaction, with probability weights (0-100) totaling 100%
2. A brief reasoning for each standard's applicability
3. A determination if the journal entry appears to comply with the identified standards
4. Any relevant Shariah considerations from the SS standards

Return only the JSON object without additional commentary. Example format:
{{
  "applicable_standards": [
    {{
      "standard": "FAS 4",
      "probability": 70,
      "reasoning": "This standard applies because..."
    }},
    {{
      "standard": "FAS 10",
      "probability": 30,
      "reasoning": "This standard is somewhat relevant because..."
    }}
  ],
  "compliance_assessment": "The journal entry appears to comply with FAS 4 because...",
  "shariah_considerations": "According to SS 9, this transaction should also consider..."
}}
"""
    )
    
    # Prepare standard IDs for the prompt
    standard_ids = ", ".join([std["id"] for std in potential_standards]) or "None identified from retrieval"
    
    # Run the analysis
    chain = LLMChain(llm=llm, prompt=prompt)
    response = chain.run(
        transaction=transaction_description,
        journal_entry=journal_entry if journal_entry else "No journal entry provided",
        context=context_text,
        standard_ids=standard_ids
    )
    
    # Parse the JSON output
    try:
        result = json.loads(response)
        return result
    except json.JSONDecodeError:
        # If parsing fails, try to extract JSON from the response
        match = re.search(r'({.*})', response, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(1))
            except json.JSONDecodeError:
                pass
        
        # Return error if parsing fails
        return {
            "error": "Failed to parse LLM response",
            "raw_response": response
        }
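
The prompt asks for weights that total 100, but nothing in the code checks that the model complied. A minimal post-processing sketch follows, assuming the result has the `applicable_standards` shape shown in the prompt's example. The helper name `normalize_probabilities` is hypothetical and not part of the notebook.

```python
def normalize_probabilities(analysis):
    """Return a copy of `analysis` with probabilities rescaled to sum to 100 (up to rounding).

    Results without usable weights (for example the error dict) are returned unchanged.
    """
    standards = analysis.get("applicable_standards", [])
    weights = []
    for item in standards:
        try:
            # Negative or non-numeric weights are treated as zero.
            weights.append(max(float(item.get("probability", 0)), 0.0))
        except (TypeError, ValueError):
            weights.append(0.0)

    total = sum(weights)
    if total == 0:
        return analysis

    rescaled = [
        dict(item, probability=round(100 * weight / total, 1))
        for item, weight in zip(standards, weights)
    ]
    # Highest-weighted standard first.
    rescaled.sort(key=lambda item: item["probability"], reverse=True)
    return dict(analysis, applicable_standards=rescaled)


# Invented example in which the weights only add up to 80.
raw = {
    "applicable_standards": [
        {"standard": "FAS 10", "probability": 20, "reasoning": "..."},
        {"standard": "FAS 4", "probability": 60, "reasoning": "..."},
    ]
}
print(normalize_probabilities(raw)["applicable_standards"])
```

Rescaling preserves the relative ranking the model produced. Whether to rescale silently or to flag the response as malformed is a design choice that depends on how the weights are used downstream.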

## Step 6: Testing with Example Cases
Let's test our implementation with the example cases provided in the hackathon challenge:

### Example 1: GreenTech Exit and Buyout
- Context: GreenTech exits in Year 3, and Al Baraka Bank buys out its stake
- Adjustments: Buyout Price: $1,750,000; Bank Ownership: 100%
- Journal Entry: Dr. GreenTech Equity $1,750,000 / Cr. Cash $1,750,000

### Example 2: Contract Reversal
- Context: Client cancels a change order, reverting to original contract terms
- Adjustments: Revised Contract Value back to $5,000,000; Timeline Restored: 2 years
- Journal Entry: Dr. Accounts Payable $1,000,000 / Cr. Work-in-Progress $1,000,000

In [15]:
# Test Case 1: GreenTech Exit and Buyout
test_case_1 = {
    "description": """
    GreenTech exits in Year 3, and Al Baraka Bank buys out its stake. 
    After this buyout, the bank's ownership becomes 100%. 
    The accounting treatment involves derecognition of GreenTech's equity and recognition of the acquisition expense.
    """,
    "journal_entry": "Dr. GreenTech Equity $1,750,000 / Cr. Cash $1,750,000"
}

# Run the analysis
print("Analyzing Test Case 1: GreenTech Exit and Buyout\n")
result_1 = analyze_transaction(test_case_1["description"], test_case_1["journal_entry"])

# Display the results
print(json.dumps(result_1, indent=2))

Analyzing Test Case 1: GreenTech Exit and Buyout



RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

In [None]:
# Test Case 2: Contract Reversal
test_case_2 = {
    "description": """
    The client decided to cancel a change order that had previously modified the contract terms. 
    This cancellation reverts all terms back to the original contract. 
    The contract value is revised back to $5,000,000, and the project timeline is restored to the original 2 years. 
    This requires adjustment of revenue and cost projections, as well as a reversal of additional cost accruals.
    """,
    "journal_entry": "Dr. Accounts Payable $1,000,000 / Cr. Work-in-Progress $1,000,000"
}

# Run the analysis
print("Analyzing Test Case 2: Contract Reversal\n")
result_2 = analyze_transaction(test_case_2["description"], test_case_2["journal_entry"])

# Display the results
print(json.dumps(result_2, indent=2))

Analyzing Test Case 2: Contract Reversal

{
  "applicable_standards": [
    {
      "standard": "FAS 28",
      "probability": 80,
      "reasoning": "This standard applies because it deals with accounting for changes in contract terms and the cancellation of a change order would require adjustments to revenue and cost projections."
    },
    {
      "standard": "FAS 30",
      "probability": 20,
      "reasoning": "This standard is somewhat relevant because it deals with accounting for revenue and cost adjustments, which would be necessary in this transaction."
    }
  ],
  "compliance_assessment": "The journal entry appears to comply with FAS 28 because it correctly debits Accounts Payable and credits Work-in-Progress, reflecting the cancellation of the change order and the reversion to the original contract terms.",
  "shariah_considerations": "According to SS 9, this transaction should also consider the principles of fairness and transparency in the cancellation of the change orde

## Step 7: Feedback and Correction Mechanism
A critical feature of our solution is a feedback step. When the system's prediction differs from the correct answer, an LLM-generated reconciliation explains the discrepancy, and those insights can be used to improve prompts and retrieval for future predictions.
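
Before asking an LLM to explain a discrepancy, the match level itself can be computed deterministically with set operations. The sketch below is one way to do that; `classify_match` is a hypothetical helper, and the inputs in the example are the standards from Test Case 2 above.

```python
def classify_match(predicted, correct):
    """Compare predicted and correct standard IDs, ignoring case and surrounding spaces."""
    predicted_set = {s.strip().upper() for s in predicted}
    correct_set = {s.strip().upper() for s in correct}
    overlap = predicted_set & correct_set

    if correct_set and predicted_set == correct_set:
        label = "full match"
    elif overlap:
        label = "partial match"
    else:
        label = "no match"

    return {
        "match": label,
        "missed": sorted(correct_set - predicted_set),  # correct but not predicted
        "extra": sorted(predicted_set - correct_set),   # predicted but not correct
    }


# Test Case 2: the system predicted FAS 28 and FAS 30, the correct answer is FAS 10.
print(classify_match(["FAS 28", "FAS 30"], ["FAS 10"]))
```

This check ignores priority order and probability weights. It only reports which standards overlap, which is enough to decide whether a full LLM reconciliation is worth running.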

In [None]:
def reconcile_with_correct_answer(analysis_result, correct_standards, transaction_data):
    """
    Compare our analysis with the correct answer and provide reconciliation insights.
    
    Args:
        analysis_result (dict): Our system's analysis result
        correct_standards (list): List of correct standards in priority order
        transaction_data (dict): Original transaction data
        
    Returns:
        dict: Reconciliation analysis with improvement insights
    """
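    # Allow a longer completion than the default so the JSON answer is not truncated
    llm = OpenAI(temperature=0, max_tokens=1024)
    
    # Format our results and the correct answer
    # (analysis_result may be an error dict without "applicable_standards")
    our_standards = [item["standard"] for item in analysis_result.get("applicable_standards", [])]
    our_analysis = json.dumps(analysis_result, indent=2)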
    
    prompt = PromptTemplate(
        input_variables=["transaction", "journal_entry", "our_analysis", "our_standards", "correct_standards"],
        template="""
You are an expert Islamic finance compliance advisor. Compare our system's analysis of a transaction with the correct standards and explain any discrepancies.

Transaction description:
{transaction}

Journal entry:
{journal_entry}

Our system's analysis:
{our_analysis}

Our predicted standards: {our_standards}
Correct standards (in priority order): {correct_standards}

Provide a detailed analysis addressing the following in JSON format:
1. Whether our prediction matched the correct standards (full match, partial match, or no match)
2. For each missed or incorrectly weighted standard, explain why it should have been identified
3. Key patterns in the transaction text and journal entry that indicate the correct standards
4. Specific contextual clues our system may have missed
5. How our analysis should be improved in the future

Return only the JSON object without additional commentary.
"""
    )
    
    # Run the reconciliation analysis
    chain = LLMChain(llm=llm, prompt=prompt)
    response = chain.run(
        transaction=transaction_data["description"],
        journal_entry=transaction_data.get("journal_entry") or "No journal entry provided",
        our_analysis=our_analysis,
        our_standards=", ".join(our_standards),
        correct_standards=", ".join(correct_standards)
    )
    
    # Parse the JSON output
    try:
        result = json.loads(response)
        return result
    except json.JSONDecodeError:
        # If parsing fails, try to extract JSON from the response
        match = re.search(r'({.*})', response, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(1))
            except json.JSONDecodeError:
                pass
        
        # Return error if parsing fails
        return {
            "error": "Failed to parse LLM response",
            "raw_response": response
        }
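
The same parse-then-fallback logic appears in both `analyze_transaction` and `reconcile_with_correct_answer`. One possible refactor is a shared helper along the following lines. The name `parse_llm_json` is hypothetical, and the error dict mirrors the one already returned by the two functions.

```python
import json
import re


def parse_llm_json(response):
    """Parse an LLM response as JSON, tolerating text around the JSON object."""
    # First attempt: the whole response is valid JSON.
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        pass

    # Second attempt: take the span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass

    # Nothing parseable (for example, a response truncated before the closing brace).
    return {"error": "Failed to parse LLM response", "raw_response": response}


print(parse_llm_json('Here is the result: {"prediction_match": "partial match"}'))
print(parse_llm_json('{"prediction_match": "partial ma'))
```

A response that is cut off before its closing brace cannot be recovered by this helper. In that case the fix lies upstream, for example by allowing a longer completion.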

In [None]:
# Correct answers from the challenge description
correct_answer_1 = ["FAS 4", "FAS 20", "FAS 32"]  # In priority order
correct_answer_2 = ["FAS 10"]  # In priority order

# Reconcile our predictions with correct answers
print("Reconciliation Analysis for Test Case 1:\n")
reconciliation_1 = reconcile_with_correct_answer(result_1, correct_answer_1, test_case_1)
print(json.dumps(reconciliation_1, indent=2))

print("\nReconciliation Analysis for Test Case 2:\n")
reconciliation_2 = reconcile_with_correct_answer(result_2, correct_answer_2, test_case_2)
print(json.dumps(reconciliation_2, indent=2))

Reconciliation Analysis for Test Case 1:

{
  "error": "Failed to parse LLM response",
  "raw_response": "\n{\n  \"prediction_match\": \"partial match\",\n  \"missed_standards\": [\n    {\n      \"standard\": \"FAS 4\",\n      \"reasoning\": \"Our system did not identify this standard, which deals with the accounting treatment for investments in subsidiaries, because it only focused on associates and joint ventures.\"\n    },\n    {\n      \"standard\": \"FAS 20\",\n      \"reasoning\": \"Our system did not identify this standard, which deals with the accounting treatment for equity method investments, because it only focused on associates and joint ventures.\"\n    },\n    {\n      \"standard\": \"FAS 32\",\n      \"reasoning\": \"Our system did not identify this standard, which deals with the accounting treatment for business combinations, because it only focused on the acquisition of associates and joint ventures.\"\n    }\n  ],\n  \"key_patterns\": [\n    \"buyout\",\n    \"ownersh

## Step 8: Building a Streamlined User Interface Function
Let's create a simplified user interface function that allows users to input transaction details and receive a comprehensive analysis.

In [None]:
def analyze_financial_transaction(description, journal_entry=None, correct_standards=None):
    """
    Comprehensive function to analyze a financial transaction and identify relevant AAOIFI standards.
    
    Args:
        description (str): Description of the transaction
        journal_entry (str, optional): Journal entry related to the transaction
        correct_standards (list, optional): List of correct standards for feedback
        
    Returns:
        dict: Complete analysis results
    """
    # Step 1: Analyze the transaction
    analysis = analyze_transaction(description, journal_entry)
    
    # Create a formatted output
    result = {
        "input": {
            "description": description,
            "journal_entry": journal_entry
        },
        "analysis": analysis
    }
    
    # Step 2: If correct standards are provided, add reconciliation
    if correct_standards:
        reconciliation = reconcile_with_correct_answer(
            analysis, 
            correct_standards, 
            {"description": description, "journal_entry": journal_entry}
        )
        result["reconciliation"] = reconciliation
    
    return result
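
For quick inspection, the nested dictionary returned by `analyze_financial_transaction` can be condensed into a few lines of text. The sketch below assumes the result shape built above (`"analysis"` holding either `applicable_standards` or an `error` key). `summarize_analysis` is a hypothetical helper and the sample input is invented.

```python
def summarize_analysis(result):
    """Return a short, human-readable summary of an analysis result."""
    analysis = result.get("analysis", {})
    if "error" in analysis:
        return f"Analysis failed: {analysis['error']}"

    # List standards from highest to lowest probability.
    ranked = sorted(
        analysis.get("applicable_standards", []),
        key=lambda item: item.get("probability", 0),
        reverse=True,
    )
    lines = [f"{item.get('standard', '?')}: {item.get('probability', 0)}%" for item in ranked]
    return "\n".join(lines) if lines else "No applicable standards returned."


# Invented result, shaped like the output of analyze_financial_transaction.
sample = {
    "input": {"description": "...", "journal_entry": "..."},
    "analysis": {
        "applicable_standards": [
            {"standard": "FAS 10", "probability": 30, "reasoning": "..."},
            {"standard": "FAS 4", "probability": 70, "reasoning": "..."},
        ]
    },
}
print(summarize_analysis(sample))
```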

In [None]:
# Example usage of our streamlined interface
new_transaction = {
    "description": """
    On 1 January 2023, Islamic Bank A (the Bank) entered into a Musharakah agreement with 
    Company B to finance a commercial real estate development project. The Bank contributed 
    60% (USD 6 million) while Company B contributed 40% (USD 4 million) of the total project 
    cost of USD 10 million. Company B will manage the project. Profits will be shared 50:50, 
    while losses will be borne in proportion to capital contribution.
    
    On 30 June 2023, after construction delays and rising costs, the partners agreed to amend 
    the agreement. The Bank contributed an additional USD 2 million, increasing its partnership 
    share to 67%. The profit-sharing ratio was adjusted to 60:40 in favor of the Bank.
    """,
    "journal_entry": "Dr. Musharakah Investment (Company B Project) $2,000,000 / Cr. Cash $2,000,000"
}

# Run the analysis with our unified function
musharakah_analysis = analyze_financial_transaction(
    new_transaction["description"], 
    new_transaction["journal_entry"]
)

# Display the results
print("Analysis of Musharakah Transaction:\n")
print(json.dumps(musharakah_analysis, indent=2))

Analysis of Musharakah Transaction:

{
  "input": {
    "description": "\n    On 1 January 2023, Islamic Bank A (the Bank) entered into a Musharakah agreement with \n    Company B to finance a commercial real estate development project. The Bank contributed \n    60% (USD 6 million) while Company B contributed 40% (USD 4 million) of the total project \n    cost of USD 10 million. Company B will manage the project. Profits will be shared 50:50, \n    while losses will be borne in proportion to capital contribution.\n\n    On 30 June 2023, after construction delays and rising costs, the partners agreed to amend \n    the agreement. The Bank contributed an additional USD 2 million, increasing its partnership \n    share to 67%. The profit-sharing ratio was adjusted to 60:40 in favor of the Bank.\n    ",
    "journal_entry": "Dr. Musharakah Investment (Company B Project) $2,000,000 / Cr. Cash $2,000,000"
  },
  "analysis": {
    "error": "Failed to parse LLM response",
    "raw_response": 

## Conclusion

This notebook demonstrates an AI-based approach to analyzing financial transactions in the context of Islamic finance, particularly addressing the "Reverse Transactions" challenge. Our solution:

1. Processes and indexes AAOIFI standards documents to create a knowledge base
2. Retrieves relevant standard excerpts for a given transaction
3. Analyzes transactions to identify applicable standards with weighted probabilities
4. Provides reasoning for each standard's applicability
5. Compares predictions with correct answers and provides reconciliation insights
6. Offers a streamlined interface for transaction analysis

This approach can be further enhanced by:
1. Incorporating more detailed metadata extraction from the standards
2. Implementing a fine-tuned model specifically for Islamic finance transaction classification
3. Expanding the knowledge base with additional interpretive guidance and scholarly opinions
4. Adding an interactive feedback loop to continuously improve the system's accuracy