# 🧾 Financial Content Assistant - Project Notebook

Welcome! This notebook walks through the complete process of building a Financial Content Assistant using Retrieval-Augmented Generation (RAG) with LangChain and OpenAI models.


# 📜 Abstract

This project builds a **Financial Content Assistant** using a Retrieval-Augmented Generation (RAG) architecture.  
The system allows users to upload financial documents (PDFs, TXT files, spreadsheets) and ask natural language questions.  
It intelligently retrieves relevant document sections and generates coherent, contextual answers using OpenAI’s models.  

Key Features:
- **Multi-file support:** PDF, TXT, CSV, XLSX.
- **Metadata extraction:** Date, Company Name, Document Type (e.g., 10-K, 10-Q).
- **Document chunking:** Optimized retrieval granularity.
- **Semantic Search:** Vector database-backed document retrieval.
- **Interactive Assistant Interface:** User-friendly UI with badges for question type detection (e.g., Investment, Financial Analysis).
- **Evaluation Metrics:** Assess answer quality and source attribution.

This project shows how document retrieval, embeddings, and a chat model can be combined to query financial documents and produce answers grounded in their content.


## System Architecture
![System Architecture](https://www.plantuml.com/plantuml/png/dLRVRzis47xNNt5eWMu3wjhDMYm3kg2UdHy6OjIBdRniUpYJ9vimJGeaLKSR-h-Ff4LBTk8QcG8e9E_xZhoFToT_jWwDAzSoiokX2ZIuXWBP2XSqTPKaCCHSoyOfnrAw6JswlXGBjIRbWFaNcHPgxZqiZ2uL7sYf3RpEuzD2ACq9_iq0Vdy2_JNy1Oisd4oz-Y4-kzNPKh2L8clXVVQjvRPlZDxiKmwCBUQxivRHBf7hZL0Bo55QoYJb3fm68nPB1rm47OHMMUQ4amIVvXTNGg5Y0YKCj791XxwrmZhqemvSFlebJXocbNKKDcTBuRoJqj2dxlmYotw0tuuytdHloS0eo9eZ8xZ5Yipt6FMTmIj_iEUuOSjNy9eZdUSi2lzdXFLBjoJb1CCGtL_gnozXvUodlm7jWJ5m_YZ9Xh5tNvGEti0TsK8hIvvvRP0BtPqwGxYMmcrz9CRzAF0zl9JEmb3OGcBb7TnunRIfF7yYjPDLTuok22CnZmuImFE_dT1ig5aPyt5YD2Dh_MM-JBXZ3f9w-JaZLgktmzxxqo9lzFQHN0b9NXNUlPcDRp_6C6hOLeLWXnjKNFeTGkJLk9mChxgJohzhxGCcGG8dL9oNQqmEEJeC-MMGrPTsHmjwx4vzBDMMdM0DAlWOB4kr2a6DDIybUU8jgiLWYKAzXVUzNgzxrkk1_af6Oh5rb2WdnNyWvhI18JgSetrXnZvBJgBd-LaI_Kj1gZ-2o_MSEFUdiZu4vuNfK5hf8CTEhrUBs02ZxFtHT9eEElKllts4HSF925Cc_-_qEkeW5UT7EkSVjf7kiBEg81YIetJNWz8wlOQxKfBTBx403TpGZaAst94pWhwYzAoPBZghxMtMh1gr0qKcigK_LVU1ZFKE_J0c5nVsBTngXZB9jdLHmqygbve4N9CY4vl9ovGxvg1HZlr2vrlbfWM5fYp-8Hw4NXSEPhGk93eAwJ96LQ9Cbakr48h8-2XYIFNwZj1Akkl9hDtj_BJpyxkFC2lDN4D_CehyW4gWU1XT6aRGTyw8UHdF_GO6miKGR62qRO5nQOmsgJcqAlN1XyInzX0khV8tiX1iksra9TBfHacvK0Ivr1iB6-6MmBGFc1n1Q1hHy3qF-zy5wCHlPEUYwN5fq9Zssbbx0XskDyeZdcbPJvSIkkbBQT1irIhwPfY8TQrdpMUf8UxLKNkIo4zhvd0XEKlx7BUdsgd_DuR333oODy1sghN63Q2TMQrn0sW-hGs_2NO_f_Pcpxd8FfFYuOVaVm00)


## Table of Contents
1. 📦 [Setup and Installations](#Setup-and-Installations)
2. 🔑 [Configuration](#Configuration)
3. 📄 [Uploading and Processing Documents](#Uploading-and-Processing-Documents)
4. 🧠 [Metadata Extraction](#Metadata-Extraction)
5. ✂️ [Splitting Documents](#Splitting-Documents)
6. 🗃️ [Creating Vector Database](#Creating-Vector-Database)
7. 🔗 [Retrieval-Augmented Generation (RAG) Chain](#Retrieval-Augmented-Generation-RAG-Chain)
8. 📊 [Evaluation Metrics](#Evaluation-Metrics)
9. 🧪 [Basic Testing](#Basic-Testing)
10. 🚀 [Full Pipeline Execution](#Full-Pipeline-Execution)
11. 💬 [Interactive Assistant Interface](#Interactive-Assistant-Interface)
12. 🎯 [Final Launch](#Final-Launch)
13. 💾 [Save Progress](#Save-Progress)

---



# 📦 Setup and Installations
<a id="Setup-and-Installations"></a>

This section installs and imports the libraries required by the Financial Content Assistant.


In [None]:
# Install necessary libraries
!pip install -U langchain langchain-openai langchain-community openai faiss-cpu tiktoken pypdf
!pip install sentence-transformers ipywidgets
!pip install pandas openpyxl  # pandas reads CSV/XLSX uploads; openpyxl is the engine pandas uses for .xlsx


In [None]:
# Import Required Libraries
import os
import re
import json
import getpass
import tempfile

import pandas as pd  # Used to read CSV/XLSX uploads
from google.colab import files
from IPython.display import clear_output
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# 🔑 Configuration
This section sets up API keys and environment variables.


In [None]:
# Set OpenAI API Key
import os
import getpass

# Securely input your API key (it won't be visible in the notebook)
openai_api_key = getpass.getpass("Enter your OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key

# Verify the key is set (this will only show "True" or "False", not the actual key)
print(f"API key is set: {bool(os.environ.get('OPENAI_API_KEY'))}")

Enter your OpenAI API Key: ··········
API key is set: True



# 📄 Uploading and Processing Documents
Upload your financial documents (PDF, TXT, CSV, XLSX) and prepare them for analysis.
- Automatically detects file types
- Handles different encodings
- Converts spreadsheets to plain text
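
The encoding handling can be illustrated with a minimal sketch. The helper name `decode_with_fallback` is hypothetical and not part of the notebook; it tries a list of candidate encodings in order and returns the first one that decodes cleanly. Note that `latin-1` maps every possible byte to a character and therefore never fails, so it only makes sense as the last candidate.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes `raw` cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # This candidate failed, try the next one
    raise ValueError("None of the candidate encodings could decode the data")


# "é" encoded as Latin-1 is the single byte 0xE9, which is not valid UTF-8,
# so the helper falls through to the next candidate
text, used = decode_with_fallback("café".encode("latin-1"))
print(text, used)
```

Ordering the candidates from strictest to most permissive is the design choice that matters here: a permissive encoding placed early would silently hide the stricter ones after it.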


In [None]:
def upload_and_process_documents():
    """Upload and process financial documents (TXT, PDF, CSV, XLSX)"""
    print("Upload your financial documents (TXT, PDF, CSV, or XLSX files)")
    uploaded = files.upload()

    documents = []

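    for filename, content in uploaded.items():
        temp_file = None  # Defined before the try block so the finally clause can always reference it
        try:
            # Write the uploaded bytes to a temporary file that keeps the original extension
            _, file_extension = os.path.splitext(filename)
            file_extension = file_extension.lower()
            temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=file_extension)
            temp_file.write(content)
            temp_file.close()  # Close the handle so the loaders below can reopen the file by name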

            # Process based on file type
            if file_extension == '.pdf':
                print(f"Processing PDF: {filename}")
                loader = PyPDFLoader(temp_file.name)
                file_docs = loader.load()

                for doc in file_docs:
                    doc.metadata["source"] = filename
                    # Enrich each page with extracted metadata (date, company, document type)
                    metadata = extract_metadata(doc, filename)
                    doc.metadata.update(metadata)  # Add the extracted metadata

                documents.extend(file_docs)
                print(f"Successfully loaded PDF: {filename} ({len(file_docs)} pages)")

            elif file_extension == '.txt':
                print(f"Processing TXT: {filename}")
                successful = False
                for encoding in ['utf-8', 'cp1252', 'latin-1']:  # latin-1 decodes any byte sequence, so it must come last
                    try:
                        with open(temp_file.name, 'r', encoding=encoding) as f:
                            text = f.read()

                        doc = Document(
                            page_content=text,
                            metadata={"source": filename}
                        )
                        metadata = extract_metadata(doc, filename)
                        doc.metadata.update(metadata)

                        documents.append(doc)
                        print(f"Successfully loaded TXT: {filename} with {encoding} encoding")
                        successful = True
                        break
                    except UnicodeDecodeError:
                        continue
                if not successful:
                    print(f"Failed to decode {filename} with all attempted encodings")

            elif file_extension in ['.csv', '.xlsx', '.xls']:
                print(f"Processing spreadsheet: {filename}")

                if file_extension == '.csv':
                    df = pd.read_csv(temp_file.name)
                else:
                    df = pd.read_excel(temp_file.name)

                text = df.to_string(index=False)
                doc = Document(
                    page_content=text,
                    metadata={"source": filename}
                )
                metadata = extract_metadata(doc, filename)
                doc.metadata.update(metadata)

                documents.append(doc)
                print(f"Successfully loaded spreadsheet: {filename}")

            else:
                print(f"Unsupported file type: {file_extension} for {filename}")

        except Exception as e:
            print(f"Error processing {filename}: {str(e)}")

        finally:
            # Always clean up the temporary file
            try:
                os.unlink(temp_file.name)
            except Exception as cleanup_error:
                print(f"Error deleting temp file: {cleanup_error}")

    return documents



# 🧠 Metadata Extraction
Extract metadata like date, company name, and document type from uploaded content.
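
One detail worth illustrating is how `re.findall` treats capture groups, because it decides whether a date pattern returns the whole date or only a fragment. The sketch below uses a made-up sample sentence (not taken from any real filing) and hypothetical variable names. When a pattern has exactly one capturing group, `re.findall` returns only what that group matched, so a group around the month alone yields just the month name.

```python
import re

# Hypothetical sample text for illustration only
sample = "Acme Corporation annual report for the year ended December 31, 2023."

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Capturing group around the month only: findall returns just the month
month_only = re.findall(rf"({MONTHS})\s+\d{{1,2}},\s+\d{{4}}", sample)

# Non-capturing group (?:...) for the month, wrapped in one capturing group:
# findall returns the complete date string
full_date = re.findall(rf"((?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}})", sample)

# Same idea for a simple company-name pattern
company = re.findall(r"([A-Z][A-Za-z]+\s+Corporation)", sample)

print(month_only, full_date, company)
```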


In [None]:
def extract_metadata(document, filename):
    """Extract and add richer metadata to documents"""
    metadata = {"source": filename}

    # Extract date information if available
    import re
    date_patterns = [
        r'(\d{4}-\d{2}-\d{2})',  # YYYY-MM-DD
        r'(\d{2}/\d{2}/\d{4})',  # MM/DD/YYYY
        r'((?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},\s+\d{4})'  # Month DD, YYYY (non-capturing month group so the full date is returned)
    ]

    for pattern in date_patterns:
        matches = re.findall(pattern, document.page_content[:1000])  # Look in first 1000 chars
        if matches:
            metadata["date"] = matches[0]
            break

    # Extract company name if available
    company_patterns = [
        r'([A-Z][A-Za-z]+,?\s+Inc\.)',
        r'([A-Z][A-Za-z]+\s+Corporation)',
        r'([A-Z][A-Za-z]+\s+Company)'
    ]

    for pattern in company_patterns:
        matches = re.findall(pattern, document.page_content[:1000])
        if matches:
            metadata["company"] = matches[0]
            break

    # Add document type
    if "10-K" in filename or "10K" in filename:
        metadata["doc_type"] = "Annual Report"
    elif "10-Q" in filename or "10Q" in filename:
        metadata["doc_type"] = "Quarterly Report"

    return metadata

# ✂️ Splitting Documents
Split large documents into smaller manageable chunks:
- Chunk Size: 1000 characters
- Chunk Overlap: 200 characters

Essential for high-quality semantic search and RAG performance.
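
To make the effect of chunk size and overlap concrete, here is a simplified fixed-width sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); the function name `chunk_text` is hypothetical and only illustrates how consecutive chunks share their boundary text.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split `text` into fixed-width chunks where neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The window already reached the end of the text
    return chunks


# 2,500 characters of digits so that overlapping regions can be compared
text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
print([len(c) for c in chunks])
```

With the defaults, the last 200 characters of one chunk reappear as the first 200 characters of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.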

# 🗃️ Creating Vector Database
Use document embeddings to create a searchable vector store for fast semantic retrieval.
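
The idea behind semantic retrieval can be sketched with a brute-force similarity search over toy vectors. This is an illustration under stated assumptions, not how FAISS is implemented: the three-dimensional "embeddings" are made up, the function names are hypothetical, and cosine similarity is used here as a stand-in for whatever distance metric the vector store is configured with.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (index, score) pairs with the highest similarity to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional embeddings; real embeddings have hundreds of dimensions or more
doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query_vec = [1.0, 0.0, 0.0]

results = top_k(query_vec, doc_vecs)
print(results)
```

A real vector store replaces the linear scan with an index, but the contract is the same: embed the query, score it against stored vectors, and return the closest chunks.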


In [None]:
def split_documents(documents, chunk_size=1000, chunk_overlap=200):
    """Split documents into chunks for processing"""
    if not documents:
        print("No documents were successfully loaded.")
        return []

    # Split documents
    print("\nSplitting documents into chunks...")
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Created {len(chunks)} chunks from {len(documents)} documents/pages")
    return chunks

def create_vector_db(chunks):
    """Create a vector database from document chunks"""
    if not chunks:
        print("No chunks to create vector database with.")
        return None

    # Create vector store with in-memory persistence
    print("\nCreating vector database...")
    embeddings = OpenAIEmbeddings()

    # Use FAISS for in-memory storage to avoid SQLite issues in Colab
    # vector_db = Chroma.from_documents(
    #     documents=chunks,
    #     embedding=embeddings,
    #     persist_directory=None  # This forces in-memory storage
    # )
    vector_db = FAISS.from_documents(chunks, embeddings) # Changed to FAISS

    print("Vector database created successfully!")
    return vector_db

# 🔗 Retrieval-Augmented Generation (RAG) Chain
Set up a LangChain chain that:
- Retrieves top relevant chunks from the vector database
- Passes them to an OpenAI chat model to generate the final answer
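
The two steps above can be sketched without any library: the retrieved chunks are concatenated ("stuffed") into a single context block and substituted into a prompt template together with the question. The template text, the function name `build_stuff_prompt`, and the sample chunks are hypothetical and only illustrate the pattern; the chain itself handles this assembly internally.

```python
PROMPT_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"


def build_stuff_prompt(question, retrieved_chunks, template=PROMPT_TEMPLATE):
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return template.format(context=context, question=question)


# Made-up chunks standing in for retriever output
retrieved_chunks = [
    "Total assets were $10 (hypothetical figure).",
    "Cash is held in bank accounts.",
]
prompt = build_stuff_prompt("What were total assets?", retrieved_chunks)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the number of chunks and the chunk size together determine how much of the model's context window is consumed.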


In [None]:
def create_rag_chain(vector_db):
    """Create a RAG-based question answering chain"""
    if not vector_db:
        print("No vector database provided to create RAG chain!")
        return None

    # Create QA chain
    print("\nCreating question-answering chain...")
    template = """
    You are a sophisticated financial analyst specializing in explaining financial information
    from company reports and financial documents.

    Use the following pieces of context to answer the user's question about financial data.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    When describing financial information, be precise about numbers and date periods.
    Clearly indicate which financial metrics come from which reporting periods.

    Context:
    {context}

    Question: {question}

    Answer:
    """

    prompt = PromptTemplate(
        template=template,
        input_variables=["context", "question"]
    )

    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)

    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vector_db.as_retriever(search_kwargs={"k": 4}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt}
    )

    print("RAG chain created successfully!")
    return qa_chain

# 📊 Evaluation Metrics
An LLM-as-judge function that rates each answer from 1 to 10 on:
- Accuracy
- Completeness
- Relevance
- Clarity

The scores give a repeatable way to compare changes to the pipeline.
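
Model replies that are supposed to be JSON often arrive wrapped in extra text, so parsing them defensively helps. The sketch below is one possible approach, assuming the reply contains a single JSON object; the helper name `parse_evaluation` and the sample reply are made up for illustration. It extracts the object, parses it, and recomputes the average locally instead of trusting the model's arithmetic.

```python
import json
import re

METRICS = ("accuracy", "completeness", "relevance", "clarity")


def parse_evaluation(raw_reply: str) -> dict:
    """Extract the JSON object from a model reply and recompute the average score."""
    match = re.search(r"\{.*\}", raw_reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in the reply")
    scores = json.loads(match.group(0))
    # Recompute the average from the individual metrics
    scores["average"] = sum(scores[m] for m in METRICS) / len(METRICS)
    return scores


# Made-up reply with surrounding text, as a chat model might produce
raw_reply = (
    "Here is my rating:\n"
    '{"accuracy": 8, "completeness": 6, "relevance": 9, "clarity": 7, "feedback": "ok"}\n'
    "Let me know if you need more detail."
)
evaluation = parse_evaluation(raw_reply)
print(evaluation["average"])
```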


In [None]:
def evaluate_response_quality(question, answer, sources):
    """Evaluate the quality of RAG responses"""
    import json  # Needed to parse the evaluator's JSON reply

    # Use ChatGPT to evaluate response quality
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

    prompt = f"""
    Evaluate the quality of this financial response:

    Question: {question}

    Answer: {answer}

    Sources: {sources}

    Rate on a scale of 1-10:
    1. Accuracy (how factually correct)
    2. Completeness (how comprehensive)
    3. Relevance (how well it addresses the question)
    4. Clarity (how easy to understand)

    Format your response as JSON:
    {{
        "accuracy": X,
        "completeness": X,
        "relevance": X,
        "clarity": X,
        "average": X,
        "feedback": "brief feedback"
    }}
    """

    try:
        response = llm.invoke(prompt)
        evaluation = json.loads(response.content)
        return evaluation
    except Exception as e:
        # Fall back to zero scores if the call fails or the reply is not valid JSON
        return {
            "accuracy": 0,
            "completeness": 0,
            "relevance": 0,
            "clarity": 0,
            "average": 0,
            "feedback": f"Error evaluating response: {e}"
        }

# 🧪 Basic Testing
Run quick tests on the Assistant to validate pipeline correctness.


In [None]:
def test_financial_assistant(qa_chain, question):
    """Test the financial assistant with a sample question"""
    if not qa_chain:
        print("No QA chain available!")
        return

    print(f"\nQuestion: {question}")
    print("\nSearching and generating response...")

    try:
        result = qa_chain.invoke({"query": question})  # .invoke() replaces the deprecated direct call

        print("\n======= ANSWER =======")
        print(result["result"])
        print("\n======= SOURCES =======")

        sources_seen = set()
        for doc in result["source_documents"][:3]:  # Show top 3 sources
            source = doc.metadata.get("source", "Unknown")
            page = doc.metadata.get("page", "")
            page_info = f" (Page {page})" if page != "" else ""

            if source not in sources_seen:
                sources_seen.add(source)
                print(f"- {source}{page_info}")

    except Exception as e:
        print(f"Error generating response: {str(e)}")

# 🚀 Full Pipeline Execution
Upload → Metadata → Chunk → Embed → RAG → Query

The entire system comes together here!


In [None]:
# Main execution flow
print("==== Financial Content Assistant Setup ====")
print("This assistant helps answer financial questions based on your documents\n")

# Step 1: Upload and process documents
print("Step 1: Upload and process documents")
documents = upload_and_process_documents()

if not documents:
    print("No documents were loaded. Please run this cell again and upload documents.")
else:
    # Step 2: Split documents into chunks
    print("\nStep 2: Creating document chunks for semantic search")
    chunks = split_documents(documents)

    if not chunks:
        print("No document chunks were created. Please check your documents.")
    else:
        # Step 3: Create vector database
        print("\nStep 3: Creating vector database for semantic search")
        vector_db = create_vector_db(chunks)

        # Step 4: Create RAG chain
        print("\nStep 4: Creating the RAG-based question answering system")
        qa_chain = create_rag_chain(vector_db)

        print("\n==== Setup Complete! ====")
        print("You can now test your financial assistant with a sample question.")

        # Optional: Test with a sample question
        sample_question = "What are the main financial highlights from these documents?"
        print("\nTesting with a sample question:")
        test_financial_assistant(qa_chain, sample_question)

        print("\nSetup complete! Run the interactive interface below to ask more questions.")

==== Financial Content Assistant Setup ====
This assistant helps answer financial questions based on your documents

Step 1: Upload and process documents
Upload your financial documents (TXT, PDF, CSV, or XLSX files)


Saving jsespublic2023.pdf to jsespublic2023.pdf
Processing PDF: jsespublic2023.pdf
Successfully loaded PDF: jsespublic2023.pdf (10 pages)

Step 2: Creating document chunks for semantic search

Splitting documents into chunks...
Created 25 chunks from 10 documents/pages

Step 3: Creating vector database for semantic search

Creating vector database...
Vector database created successfully!

Step 4: Creating the RAG-based question answering system

Creating question-answering chain...
RAG chain created successfully!

==== Setup Complete! ====
You can now test your financial assistant with a sample question.

Testing with a sample question:

Question: What are the main financial highlights from these documents?

Searching and generating response...





The main financial highlights from the documents provided include:
- The financial statement was prepared in conformity with accounting principles generally accepted in the United States of America.
- The Company operates as a single segment under Accounting Standards Codification ("ASC") 280, Segment Reporting.
- The Statement of Financial Condition is for Jane Street Execution Services, LLC as of December 31, 2023.
- Cash includes amounts maintained in bank accounts, and the Company may maintain cash in deposit accounts in excess of Federal Deposit Insurance Corporation limits.
- Commission revenues are generated by the Company for acting as an agent on behalf of its customers, including certain affiliates, with the performance obligation consisting of trade execution services.
- The Company's financial statement preparation requires management to make estimates and assumptions that could affect the reported amounts of assets and liabilities. Actual amounts could differ from these e

# 💬 Interactive Assistant Interface
Run a simple text-based question loop:
- Type a question and read the answer with up to three cited sources
- Type 'exit', 'quit', or 'q' to stop

The enhanced UI (question-type badges, chat summary, visualization) is built in the later cells.



In [None]:
# Interactive Interface
def interactive_financial_assistant(qa_chain):
    """Interactive interface for the financial assistant"""
    if not qa_chain:
        print("No QA chain available. Please set up the assistant first.")
        return

    print("\n===== Financial Content Assistant =====")
    print("Type your financial questions below. Type 'exit' to quit.")

    while True:
        question = input("\nYour question: ")

        if question.lower() in ['exit', 'quit', 'q']:
            print("Thank you for using the Financial Content Assistant!")
            break

        if not question.strip():
            continue

        try:
            # Clear previous output for better readability
            clear_output(wait=True)

            print("===== Financial Content Assistant =====")
            print(f"Question: {question}")
            print("\nProcessing...")

            result = qa_chain.invoke({"query": question})  # .invoke() replaces the deprecated direct call

            print("\nAnswer:")
            print(result["result"])

            print("\nSources:")
            sources_seen = set()
            for doc in result["source_documents"][:3]:  # Show top 3 sources
                source = doc.metadata.get("source", "Unknown")
                page = doc.metadata.get("page", "")
                page_info = f" (Page {page})" if page != "" else ""

                if source not in sources_seen:
                    sources_seen.add(source)
                    print(f"- {source}{page_info}")

            print("\nType your next question or 'exit' to quit.")

        except Exception as e:
            print(f"Error: {str(e)}")
            print("\nType your next question or 'exit' to quit.")


# 🎯 Final Launch
Launch the text-based assistant using the `qa_chain` created in the pipeline cell.


In [None]:
# Run the interactive assistant
try:
    # This will only run if qa_chain exists from the previous cell
    interactive_financial_assistant(qa_chain)
except NameError:
    print("You need to run the setup cell first to create the qa_chain.")
    print("Please run the previous cell and then run this cell again.")

===== Financial Content Assistant =====
Question: what is the figure for total assests

Processing...

Answer:
The figure for total assets as of December 31, 2023, for Jane Street Execution Services, LLC is $103,234,117.

Sources:
- jsespublic2023.pdf (Page 5)

Type your next question or 'exit' to quit.

Your question: quit
Thank you for using the Financial Content Assistant!


Now that we've successfully built the core RAG functionality, let's enhance it with a polished UI, additional features, and testing capabilities. The remaining cells build this up step by step.

In [None]:
# 📦 Import Required Libraries
from IPython.display import clear_output, HTML, display
import ipywidgets as widgets
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
import time
from google.colab import files


# Define Utility Functions

In [None]:
# 💬 Utility Functions

def show_loading_animation():
    """Display a simple loading animation in the output area"""
    with output_area:
        for _ in range(3):
            for dots in [".", "..", "..."]:
                clear_output(wait=True)
                display(HTML(f"<div style='color:#666;'><i>Processing{dots}</i></div>"))
                time.sleep(0.3)

def detect_question_type(question):
    question_lower = question.lower()
    if any(term in question_lower for term in [
        "analyze", "analysis", "performance", "how did", "how well",
        "financial results", "compare", "trend", "growth"
    ]):
        return "financial_analysis"
    if any(term in question_lower for term in [
        "invest", "stock", "share", "dividend", "return", "portfolio",
        "should i", "worth", "recommendation", "outlook"
    ]):
        return "investment"
    if any(pattern in question_lower for pattern in [
        "what is", "what are", "define", "explain", "meaning of", "definition",
        "describe", "elaborate on", "tell me about", "how does", "concept of"
    ]):
        return "term_explanation"
    return "general"


# Define Core Assistant Logic

In [None]:
# 🧠 Core Assistant Logic

def enhanced_query_assistant(question):
    question_type = detect_question_type(question)
    result = qa_chain.invoke({"query": question})  # .invoke() replaces the deprecated direct call

    sources = []
    sources_seen = set()
    for doc in result["source_documents"]:
        source = doc.metadata.get("source", "Unknown")
        page = doc.metadata.get("page", "")
        page_info = f" (Page {page})" if page != "" else ""
        source_citation = f"{source}{page_info}"
        if source_citation not in sources_seen:
            sources_seen.add(source_citation)
            sources.append(source_citation)

    return {
        "answer": result["result"],
        "sources": sources[:3],
        "question_type": question_type
    }


# Define UI Widgets

In [None]:
# Initialize chat history
if 'chat_history' not in globals():
    chat_history = []  # Module-level assignment; no `global` statement is needed here

# Create UI widgets
question_input = widgets.Text(
    placeholder='Ask a financial question...', description='Question:', layout=widgets.Layout(width='70%')
)
ask_button = widgets.Button(
    description='Ask', button_style='primary', tooltip='Ask your question', layout=widgets.Layout(width='15%')
)
clear_button = widgets.Button(
    description='Clear Chat', button_style='danger', tooltip='Clear chat history', layout=widgets.Layout(width='15%')
)
output_area = widgets.Output(
    layout={'border': '1px solid #ddd', 'padding': '10px', 'height': '400px', 'overflow': 'auto'}
)
chat_summary_button = widgets.Button(
    description='View Chat Summary', button_style='info', tooltip='View all chat history'
)

# Button Functions

In [None]:
def on_ask_clicked(b):
    question = question_input.value
    if not question.strip():
        return
    question_input.value = ''
    with output_area:
        display(HTML(f"<div style='background-color:#f0f0f0; padding:10px;'><b>You:</b> {question}</div>"))
        show_loading_animation()
        try:
            result = enhanced_query_assistant(question)
            clear_output()
            display(HTML(f"<div style='background-color:#f0f0f0; padding:10px;'><b>You:</b> {question}</div>"))
            badge = ""
            if result["question_type"] == "financial_analysis":
                badge = "<span style='background-color:#007bff; color:white; padding:3px 6px;'>Financial Analysis</span>"
            elif result["question_type"] == "investment":
                badge = "<span style='background-color:#28a745; color:white; padding:3px 6px;'>Investment</span>"
            elif result["question_type"] == "term_explanation":
                badge = "<span style='background-color:#ffc107; color:black; padding:3px 6px;'>Term Explanation</span>"
            else:
                badge = "<span style='background-color:#6c757d; color:white; padding:3px 6px;'>General</span>"
            display(HTML(f"<div style='background-color:#e6f7ff; padding:10px;'><b>Financial Assistant {badge}:</b> {result['answer']}</div>"))
            sources_html = ", ".join(result["sources"])
            display(HTML(f"<div style='font-size:12px; color:#666;'>Sources: {sources_html}</div>"))
            chat_history.append({
                "question": question,
                "answer": result["answer"],
                "sources": result["sources"],
                "question_type": result["question_type"]
            })
        except Exception as e:
            clear_output()
            display(HTML(f"<div style='color:red;'>Error: {str(e)}</div>"))

def on_clear_clicked(b):
    with output_area:
        clear_output()
        chat_history.clear()
        display(HTML("<div style='color:#666;'>Chat history cleared.</div>"))

ask_button.on_click(on_ask_clicked)
clear_button.on_click(on_clear_clicked)

# Visualization, Summary, Export, Help

In [None]:
# 📊 Chat summary, visualization, export, and help

def display_chat_summary():
    if not chat_history:
        print("No chat history to display.")
        return
    summary_data = []
    for i, exchange in enumerate(chat_history):
        summary_data.append({
            "Q#": i+1,
            "Question": exchange["question"][:50] + "..." if len(exchange["question"]) > 50 else exchange["question"],
            "Type": exchange["question_type"].replace("_", " ").title(),
            "Sources": ", ".join([s.split(" (Page")[0] for s in exchange["sources"]])
        })
    display(pd.DataFrame(summary_data))


def add_visualization_tab():
    output = widgets.Output()
    with output:
        labels = ['Financial Analysis', 'Investment', 'Term Explanation', 'General']
        counts = [
            sum(1 for item in chat_history if item['question_type'] == 'financial_analysis'),
            sum(1 for item in chat_history if item['question_type'] == 'investment'),
            sum(1 for item in chat_history if item['question_type'] == 'term_explanation'),
            sum(1 for item in chat_history if item['question_type'] == 'general')
        ]
        plt.figure(figsize=(10, 6))
        plt.bar(labels, counts, color=['#007bff', '#28a745', '#ffc107', '#6c757d'])
        plt.title('Questions by Type')
        plt.xlabel('Question Type')
        plt.ylabel('Count')
        plt.tight_layout()
        plt.show()
    return output

def add_export_functionality():
    """Add functionality to export conversation history"""
    def export_to_markdown():
        """Export chat history to markdown file"""
        markdown_content = "# Financial Content Assistant - Chat History\n\n"
        markdown_content += f"Generated on: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"

        for i, exchange in enumerate(chat_history):
            markdown_content += f"## Q{i+1}: {exchange['question']}\n\n"
            markdown_content += f"**Question Type**: {exchange['question_type'].replace('_', ' ').title()}\n\n"
            markdown_content += f"### Answer:\n\n{exchange['answer']}\n\n"
            markdown_content += f"**Sources**: {', '.join(exchange['sources'])}\n\n"
            markdown_content += "---\n\n"

        # Save to file
        filename = "financial_assistant_history.md"
        with open(filename, "w", encoding="utf-8") as f:
            f.write(markdown_content)
        return filename
    export_button = widgets.Button(
        description='Export to Markdown', button_style='success', tooltip='Export chat history'
    )
    def on_export_clicked(b):
        filename = export_to_markdown()
        print(f"Exported chat history to {filename}")
        files.download(filename)
    export_button.on_click(on_export_clicked)
    return export_button

# Add Help Section
def add_help_section():
    output = widgets.Output()
    with output:
        display(HTML("""
        <div style="padding: 20px; background-color: #f9f9f9; border-radius: 10px;">
            <h2>Financial Content Assistant - Help</h2>
            <p>This application uses RAG to answer questions based on your documents.</p>
        </div>
        """))
    return output

chat_summary_button.on_click(lambda b: display_chat_summary())

# Financial Assistant Function

In [None]:
def run_enhanced_financial_assistant():
    """Launch the full financial assistant UI"""
    display(HTML("""
    <div style='background-color:#4CAF50; color:white; padding:15px; border-radius:10px; margin-bottom:15px;'>
        <h2 style='margin:0;'>Enhanced Financial Content Assistant</h2>
        <p style='margin:5px 0 0 0;'>Ask questions about your financial documents with advanced features</p>
    </div>
    """))

    input_area = widgets.HBox([question_input, ask_button, clear_button])
    display(input_area)

    with output_area:
        clear_output(wait=True)
        display(HTML("<div style='color:#666;'>Welcome to your Enhanced Financial Content Assistant! Ask any question about your uploaded documents.</div>"))

    export_button = add_export_functionality()

    chat_tab = widgets.VBox([
        output_area,
        widgets.HBox([chat_summary_button, export_button], layout=widgets.Layout(margin='10px 0 0 0'))
    ])

    tabs = widgets.Tab()
    tabs.children = [chat_tab, add_visualization_tab(), add_help_section()]
    for i, title in enumerate(["Chat", "Visualization", "Help"]):
        tabs.set_title(i, title)

    display(tabs)

    print("\nEnhanced Financial Content Assistant launched successfully!")


# Final Launch - Run Interface


In [None]:
run_enhanced_financial_assistant()




Enhanced Financial Content Assistant launched successfully!


| Q# | Question | Type | Sources |
|---|---|---|---|
| 1 | what is openai | Term Explanation | jsespublic2023.pdf, jsespublic2023.pdf, jsespu... |
| 2 | what is the total libility | Term Explanation | jsespublic2023.pdf, jsespublic2023.pdf, jsespu... |
