# Risk Summary Assistant

## Executive-Level Risk Review Using Generative AI

This prototype demonstrates how generative AI can support financial analysts by summarizing risk disclosures from SEC 10-K filings. Using Caterpillar Inc.’s Item 1A section, we extract relevant paragraphs, identify concrete risk statements, and produce a concise executive summary along with the top five company-specific risks. This tool is intended to accelerate risk review, support internal due diligence, and surface exposure patterns in a scalable way.


In [1]:
# Libraries: regex for paragraph filtering, IPython display for Markdown rendering,
# Colab helpers for file upload and secrets, and the OpenAI client
import re

import openai
from google.colab import files, userdata
from IPython.display import Markdown, display

## Data Input and Scope

For this demonstration, we uploaded the full *Item 1A. Risk Factors* section from Caterpillar Inc.’s 10-K filing as a plain text file. This approach allowed us to focus on the core generative AI functionality—extracting and summarizing meaningful risk language—without introducing complexity from document retrieval or parsing.

In a production setting, this risk section would be extracted automatically using one or more of the following methods:

- **Web scraping** from SEC EDGAR (HTML/XBRL filings)
- **PDF parsing** for downloaded or scanned reports
- **Full-document processing** using structured formats (e.g., XML, JSON, iXBRL)

This pipeline step would isolate the relevant section (Item 1A), clean the formatting, and feed it into the AI workflow for downstream risk extraction and summarization. For now, the uploaded `.txt` file serves as a stand-in for this preprocessing stage.
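As a rough illustration of the isolation step, the sketch below pulls Item 1A out of a 10-K that has already been converted to plain text. The helper name `isolate_item_1a` and the heading patterns are assumptions for illustration only: real filings vary in punctuation and layout, and the table of contents repeats the Item 1A heading, so the sketch keeps the longest matching span.

```python
import re

# Hypothetical heading patterns; real filings may need more variants.
# Matches lines such as "Item 1A. Risk Factors" or "ITEM 1A - RISK FACTORS".
ITEM_1A = re.compile(
    r"^\s*item\s+1a\.?\s*[-:.\u2014]?\s*risk\s+factors",
    re.IGNORECASE | re.MULTILINE,
)
# Item 1A usually ends where Item 1B, Item 1C (newer filings) or Item 2 begins.
NEXT_ITEM = re.compile(r"^\s*item\s+(1b|1c|2)\b", re.IGNORECASE | re.MULTILINE)


def isolate_item_1a(filing_text):
    """Return the longest Item 1A span, or an empty string if none is found.

    The table of contents also lists "Item 1A. Risk Factors", so every
    heading match is tried and the longest section body wins.
    """
    best = ""
    for start in ITEM_1A.finditer(filing_text):
        # Look for the next Item heading after this Item 1A heading
        end = NEXT_ITEM.search(filing_text, start.end())
        stop = end.start() if end else len(filing_text)
        section = filing_text[start.end():stop].strip()
        if len(section) > len(best):
            best = section
    return best
```

The output of a helper like this could then be passed to `extract_risk_sections()` in place of the uploaded `.txt` file.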


In [2]:
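# Let the user upload a .txt file; files.upload() returns {original filename: file bytes}
uploaded = files.upload()
if not uploaded:
    raise ValueError("No file was uploaded.")
filename = next(iter(uploaded))

# Decode the uploaded bytes directly. Colab may save a repeat upload under a new name
# (e.g. "risk_cat (3).txt"), so reopening `filename` from disk could read a stale copy.
risk_text = uploaded[filename].decode("utf-8")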


Saving risk_cat.txt to risk_cat (3).txt


## Risk Section Extraction

The `extract_risk_sections()` function identifies the most relevant paragraphs from the uploaded 10-K risk disclosure. It filters for paragraphs that:

- Contain **risk-related keywords** (e.g., "interest rate", "regulation", "geopolitical")
- Meet a **minimum length threshold** to exclude boilerplate or headings

This helps reduce noise and focus the AI model on content that is likely to contain substantive business risks.

### Why Use It?

In this prototype, we uploaded only the *Item 1A. Risk Factors* section—so nearly all content is relevant. However, in a full production pipeline, where entire 10-Ks or other filings are processed, this kind of filtering becomes essential.

### Could We Do Better?

Yes. In a more advanced version, a language model could replace this rule-based filter to **evaluate each paragraph’s relevance** more intelligently. This would allow:

- Detection of risk language without relying on fixed keywords
- Better handling of varied writing styles across industries
- Reduced chance of missing less obvious but important disclosures

For now, the keyword approach offers a lightweight and explainable way to isolate likely risk paragraphs.
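A minimal sketch of the model-based alternative, assuming one yes/no prompt per paragraph. The function names are hypothetical, and `ask_model` stands in for whatever call sends a prompt to an LLM and returns its text reply, so the filtering logic can be exercised without an API key.

```python
def build_relevance_prompt(paragraph):
    """Build a yes/no question asking whether a paragraph discloses a business risk."""
    return (
        "Does the following paragraph from a 10-K describe a substantive, "
        "company-specific business risk? Answer with YES or NO only.\n\n"
        f"{paragraph}"
    )


def parse_relevance(reply):
    """Interpret the reply; anything that does not start with YES counts as not relevant."""
    return reply.strip().upper().startswith("YES")


def filter_with_model(paragraphs, ask_model):
    """Keep the paragraphs the model judges relevant.

    ask_model: any callable that takes a prompt string and returns the
    model's reply as a string (hypothetical interface for this sketch).
    """
    return [p for p in paragraphs if parse_relevance(ask_model(build_relevance_prompt(p)))]
```

In this notebook, `ask_model` could wrap `client.chat.completions.create(...)` and return `response.choices[0].message.content`. One call per paragraph adds cost and latency, which is part of why the keyword filter remains the default here.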

In [6]:
def extract_risk_sections(text, min_length=300):
    """
    Extracts paragraphs from a 10-K or similar financial document that likely describe business risks.

    The function filters paragraphs that:
    1. Contain at least one risk-related keyword.
    2. Meet a minimum character length threshold to avoid noise (e.g., headers or footnotes).

    Args:
        text (str): Full text of a risk disclosure section (e.g., Item 1A from a 10-K).
        min_length (int): Minimum number of characters required for a paragraph to be considered relevant.

    Returns:
        list of str: A list of paragraphs that likely contain meaningful risk-related content.
    """

    # Normalize Windows/old-Mac line endings, then split on blank lines
    # (two or more newlines, optionally separated by whitespace)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)

    # Define a list of keywords commonly associated with business risks
    keywords = [
        "risk", "uncertainty", "volatility", "instability", "regulation",
        "compliance", "inflation", "interest rate", "geopolitical",
        "supply chain", "exposure", "macroeconomic", "liquidity", "credit", "market condition"
    ]

    relevant = []

    for para in paragraphs:
        para = para.strip()   # Drop surrounding whitespace before measuring length
        lower = para.lower()  # Normalize case for keyword matching

        # Keep paragraphs that mention a keyword (substring match) and are long enough to be meaningful
        if len(para) >= min_length and any(k in lower for k in keywords):
            relevant.append(para)

    return relevant

# Call the extract_risk_sections function on the previously loaded risk_text
relevant_chunks = extract_risk_sections(risk_text)

# Show the number of extracted paragraphs as a sanity check
len(relevant_chunks)

30

## Executive-Level Meta Summary

The `generate_meta_summary()` function produces a high-level overview of the company’s risk landscape using OpenAI’s GPT model. It takes the full set of extracted risk paragraphs and prompts the model to:

- Write a concise **executive summary** describing the overall risk environment
- Identify and explain the **top 5 most significant, company-specific risks**

This step shifts the focus from paragraph-level extraction to a more strategic synthesis—useful for decision-makers who need a quick understanding of key exposures without reviewing each section individually.

While the summaries are LLM-generated, the prompt is written to discourage generic language and emphasize **Caterpillar’s own framing** of risk. The response is rendered as Markdown for clear, readable display.


In [4]:
def generate_meta_summary(all_risks_text, model="gpt-4", temperature=0.3):
    """
    Generates a high-level executive summary and lists the top 5 most significant risks
    from a full set of risk disclosures using OpenAI's chat model.

    Args:
        all_risks_text (str): The combined text of the extracted risk paragraphs (chunk or condense it first if it exceeds the model's context window).
        model (str): The OpenAI model to use (default: "gpt-4").
        temperature (float): Sampling temperature for output randomness (default: 0.3).

    Returns:
        str: A combined executive summary and top 5 risk list.
    """

    # Read the API key from Colab Secrets; userdata.get raises if the secret is missing
    try:
        api_key = userdata.get("OPENAI_API_KEY")
    except userdata.SecretNotFoundError:
        api_key = None
    if not api_key:
        raise ValueError("OPENAI_API_KEY not set in Colab Secrets.")

    # Create an OpenAI API client (openai>=1.0 interface)
    client = openai.OpenAI(api_key=api_key)

    # System message sets the AI's role and expertise.
    # This helps guide the tone and focus of the model throughout the interaction.
    system_msg = (
        "You are a financial analyst skilled at identifying and summarizing business risks "
        "from corporate filings such as 10-K reports."
    )

    # User message defines the actual task to perform on the input text.
    # This prompt tells the model to:
    # 1. Write a short executive summary of the company's risk landscape.
    # 2. List the top 5 specific, company-grounded risks — not generic ones.
    # It also asks for Markdown formatting to support nice rendering in Colab.
    user_msg = f"""
                You are reviewing the risk disclosures of Caterpillar Inc. from their 10-K filing.
                Here is the full set of extracted risk paragraphs:

                {all_risks_text}

                Your task:
                1. Write a concise 3–4 sentence executive summary that describes the overall risk landscape.
                2. Then list the top 5 most concrete, company-specific risks with brief explanations (1–2 sentences each).
                Avoid generic or boilerplate phrasing and use Caterpillar's framing where possible.
                Format the response in Markdown, with a heading for each part and a numbered list for the risks.
                """

    # Send the request to the Chat Completions API
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        temperature=temperature
    )

    return response.choices[0].message.content.strip()
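Because `generate_meta_summary()` sends all extracted paragraphs in a single request, a longer filing could exceed the model's context window. One option is to pack paragraphs into chunks under a character budget before sending them. The helper below is hypothetical; `max_chars=12000` is an arbitrary default, and the common rule of thumb of roughly four characters per English token is only an approximation, not a tokenizer count.

```python
def pack_paragraphs(paragraphs, max_chars=12000, sep="\n\n"):
    """Group paragraphs into chunks whose joined length stays within max_chars.

    A paragraph longer than max_chars on its own becomes a single chunk
    rather than being split mid-sentence.
    """
    chunks, current, size = [], [], 0
    for para in paragraphs:
        # Cost of adding this paragraph, including the separator if the chunk is non-empty
        extra = len(para) + (len(sep) if current else 0)
        if current and size + extra > max_chars:
            chunks.append(sep.join(current))
            current, size = [], 0
            extra = len(para)
        current.append(para)
        size += extra
    if current:
        chunks.append(sep.join(current))
    return chunks
```

Each chunk could be condensed in its own call and the condensed pieces joined and passed to `generate_meta_summary()`. This map-then-reduce pattern is one common approach, not something the prototype currently does.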


## Displaying the Summary

To ensure the final executive summary is easily readable within the notebook, we use a helper function that renders the output using `IPython.display.Markdown`. This allows the model’s Markdown-formatted response—headings, bullet points, spacing—to be rendered cleanly, without scrollbars or code formatting issues.

Separating display from generation also keeps the logic modular: the summary text can be reused, saved, or reformatted independently of how it’s presented in the notebook.
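Because the summary is just a string, it can be saved alongside the notebook. A small sketch, assuming a local `summaries/` folder and a date-stamped file name (both illustrative choices, and `save_summary` is a hypothetical helper):

```python
from datetime import date
from pathlib import Path


def save_summary(text, company, out_dir="summaries"):
    """Write a Markdown summary to <out_dir>/<company>_risk_summary_<YYYY-MM-DD>.md and return the path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Turn "Caterpillar Inc." into a filesystem-friendly slug like "caterpillar_inc"
    slug = "".join(ch.lower() if ch.isalnum() else "_" for ch in company).strip("_")
    path = folder / f"{slug}_risk_summary_{date.today().isoformat()}.md"
    path.write_text(text, encoding="utf-8")
    return path
```

For example, `save_summary(meta_summary, "Caterpillar Inc.")` would write to `summaries/caterpillar_inc_risk_summary_<date>.md`.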


In [7]:
def display_markdown(text):
    """
    Displays markdown-formatted text cleanly in Google Colab.

    Args:
        text (str): Markdown-formatted text to display.
    """
    display(Markdown(text))

# Combine the extracted paragraphs, generate the summary, and render it as Markdown
combined_text = "\n\n".join(relevant_chunks)
meta_summary = generate_meta_summary(combined_text)
display_markdown(meta_summary)

Executive Summary:
Caterpillar Inc. faces a diverse set of risks stemming from global economic conditions, industry-specific factors, and operational challenges. The company's performance is significantly influenced by cyclical demand for its products, which can be impacted by economic downturns, commodity price volatility, and infrastructure spending rates. Additionally, Caterpillar's global operations expose it to risks such as currency exchange rate fluctuations, political instability, and stringent environmental regulations.

Top 5 Company-Specific Risks:

1. Economic Sensitivity: Caterpillar's business is highly sensitive to global and regional economic conditions, with demand for its products being cyclical and potentially reduced during periods of economic weakness. This could lead to increased expenses and adversely affect the company's financial condition.

2. Commodity Price Volatility: The company's key industries (energy, transportation, and mining) are significantly influenced by commodity price dynamics. Volatility in these markets, especially in the post-COVID period, may lead to reduced capital expenditures and decreased demand for Caterpillar's products and services.

3. Supply Chain Disruptions: Caterpillar relies heavily on suppliers for materials required for manufacturing its products. Disruptions in deliveries, decreased availability of raw materials, or production challenges at suppliers could adversely affect the company's ability to meet customer demand and increase operating costs.

4. Impact of Natural Disasters and Pandemics: Major natural disasters, power losses, software or hardware malfunctions, pandemics, or cyber-attacks that the company's disaster recovery plans do not adequately address could adversely affect its operations, reputation, and financial condition.

5. Regulatory Changes: Caterpillar's operations are subject to a wide range of trade and anti-corruption laws and regulations. Changes in these regulations or additional regulations could add significant cost or operational constraints, potentially adversely affecting the company's results of operations and financial condition.
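Since the prompt asks for a specific shape (an executive summary plus five numbered risks), a lightweight check can flag responses that drift from it before they reach a reader. The sketch below is a heuristic assumed for illustration (`check_summary_structure` is hypothetical): it counts lines beginning with a number followed by "." or ")" and looks for the phrase "executive summary", so a differently formatted but valid response could still be flagged.

```python
import re


def check_summary_structure(summary, expected_risks=5):
    """Return a list of problems; an empty list means the basic structure looks right."""
    problems = []
    # Count lines that start with a number followed by "." or ")" (e.g. "1. ...")
    numbered = re.findall(r"^\s*\d+[.)]\s+\S", summary, flags=re.MULTILINE)
    if len(numbered) != expected_risks:
        problems.append(f"expected {expected_risks} numbered risks, found {len(numbered)}")
    if "executive summary" not in summary.lower():
        problems.append("no executive summary heading found")
    return problems
```

`check_summary_structure(meta_summary)` returns an empty list when both checks pass.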

## Summary & Next Steps

This prototype demonstrates a workflow for leveraging generative AI to extract and summarize business risks from SEC 10-K filings, using Caterpillar Inc.’s Item 1A section as a case study. The system filters relevant content, applies an LLM to generate an executive-level summary and a ranked list of the top five risks, and displays the results in a clear, readable format.

The goal is not to replace human analysts, but to serve as a **productivity multiplier**—reducing time spent reading dense disclosures and surfacing the most important risk narratives more quickly.

### Next Steps

- **Validate with Human Analysts**  
  Collaborate with subject-matter experts to test the usefulness and trustworthiness of the generated summaries. Analyst feedback will guide refinement of prompts, output tone, and integration into real workflows.

- **Refine Prompting and Evaluation**  
  Continue iterating on the LLM prompt design to improve specificity, reduce repetition, and align outputs with internal standards and expectations.

- **Incorporate Dynamic Context** *(Planned)*  
  Expand the model’s situational awareness by integrating:
  - **Live market data** (e.g., commodity prices, interest rates)
  - **Recent news and headlines** relevant to disclosed risks
  - **Macro and geopolitical trends** to contextualize risk impact  
  This would allow the tool to surface which risks are not only stated but **currently active or evolving**.

- **Automate Ingestion**  
  Replace manual uploads with automated ingestion of 10-K filings from sources like the SEC EDGAR system (HTML, XBRL, or PDF) to support scaling across companies and filings.

- **Enable Risk Classification & Tagging**  
  Add functionality to group and label risks by category (e.g., regulatory, supply chain, operational), enabling integration with dashboards or portfolio monitoring systems.
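For the classification step, a first pass could reuse the keyword approach from the extraction stage. The category names, keyword lists, and `tag_risks` helper below are hypothetical placeholders, not a standard taxonomy; a production version would likely need a curated taxonomy or an LLM classifier.

```python
# Hypothetical category map for illustration only
CATEGORY_KEYWORDS = {
    "regulatory": ["regulation", "compliance", "sanction", "anti-corruption", "tax law"],
    "supply chain": ["supplier", "supply chain", "raw material", "logistics"],
    "market": ["commodity", "interest rate", "currency", "inflation", "demand"],
    "operational": ["cyber", "disaster", "pandemic", "information technology", "labor"],
}


def tag_risks(paragraph, categories=CATEGORY_KEYWORDS):
    """Return the sorted list of categories whose keywords appear in the paragraph."""
    text = paragraph.lower()
    return sorted(
        name for name, words in categories.items()
        if any(word in text for word in words)
    )
```

`{i: tag_risks(p) for i, p in enumerate(relevant_chunks)}` would tag each extracted paragraph; a paragraph can carry several tags or none.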

This lays the groundwork for a practical AI-powered risk assistant that can accelerate risk review, surface emerging exposures, and scale intelligently across coverage areas.


---

> **Disclaimer:**  
> This analysis is part of a prototype and was generated using a large language model. The accuracy and completeness of the summaries have not been independently validated and should not be relied upon to assess the actual business risks faced by Caterpillar Inc. or any other entity. This tool is intended for demonstration and exploratory purposes only.

