# Actionable Insights

## Introduction

This section of the project uses Large Language Models (LLMs) to identify potential concerns within financial entities and to support a timely supervisory response. Specifically, it applies phi-4 to earnings call transcripts to extract critical statements and generate targeted follow-up questions for supervisory engagement.

Designed for technical users within the Prudential Regulation Authority (PRA), this document explains the methodology, implementation, and capabilities of our LLM-driven intelligence extraction process. It aims to be transparent about the model's application, potential, and limitations.

Key Deliverables of this Notebook:

* Step-by-step demonstration of using phi-4 for targeted statement extraction.
* Examples of generating contextually relevant follow-up questions.
* Insights into prompt engineering strategies.
* Code examples for replication and development.

## Initial setup

This section outlines the process for accessing and preparing the necessary data for our analysis. We will begin by defining key constants, including the Git repository URL where our project data is stored. Subsequently, we will clone this repository into the Google Colab environment, ensuring all required transcripts and supplementary files are available for processing by the LLM.

In [1]:
# Import libraries

import os
import sys
import logging

In [2]:
# setup consts

IS_COLAB = 'google.colab' in sys.modules
OUTPUT_PROCESSED_FILES = True # TODO: Set to True to save processed output files (optional - see below)

REPO_URL = "https://github.com/EErlando/Quarterly-Bytes.git"
REPO_NAME = "src"
REPO_BRANCH = "main" # TODO: UPDATE THIS TO YOUR BRANCH - DEFAULTS TO MAIN
ALL_TRANSCRIPTS_PATH = "data/raw/JP Morgan/Transcripts"
NOTEBOOK_DIR = "notebooks" # TODO: UPDATE THIS TO YOUR NOTEBOOK DIRECTORY (e.g. 1_data_extraction_and_processing)

In [3]:
# get github tokens

if IS_COLAB:
    from google.colab import userdata
    GITHUB_USERNAME = userdata.get('github_user')
    GITHUB_TOKEN = userdata.get('github_token')
    GITHUB_EMAIL = userdata.get('github_email')

In [4]:
# Clone and Pull Latest from Repository - Colab Specific

if IS_COLAB:
    # --global is needed here: without it git expects to be run inside a repository
    !git config --global pull.rebase false
    if os.path.exists(REPO_NAME):
        print(f"Directory '{REPO_NAME}' already exists. Pulling latest changes...")
        %cd {REPO_NAME}
        !git pull origin {REPO_BRANCH} --quiet
        %cd ..
    else:
        print(f"Cloning repository into '{REPO_NAME}'...")
        !git clone --quiet --branch {REPO_BRANCH} {REPO_URL} {REPO_NAME}
        print("Clone complete.")

    sys.path.append('/content/src/')
    %cd /content/src/
    !pip install -r requirements.txt
else:
    !pip install -r requirements.txt
    if os.path.basename(os.getcwd()) == NOTEBOOK_DIR:
        os.chdir('../') # TODO: UPDATE THIS TO ROOT OF REPO

logging.basicConfig(level=logging.ERROR, format='%(levelname)s: %(message)s')

fatal: not in a git directory
Directory 'src' already exists. Pulling latest changes...
/content/src
/content
/content/src


## Loading Processed Transcripts

With the repository in place, the next step is to load the processed CSV files, which contain the cleaned earnings call Q&A transcripts, into pandas DataFrames. This gives the text a tabular structure that is easy to filter and group before it is passed to the Large Language Model (phi-4).

### Import data

In [5]:
# import libraries
import pandas as pd

In [6]:
# load the processed Q&A transcripts

GOLDMAN_SACHS_QNA_PATH = 'data/processed/Goldman Sachs/qna_df.csv'
JP_MORGAN_QNA_PATH = 'data/processed/JP Morgan/qna_df.csv'

df_gs = pd.read_csv(GOLDMAN_SACHS_QNA_PATH)
df_jpm = pd.read_csv(JP_MORGAN_QNA_PATH)

In [7]:
df_jpm.head(10)

Unnamed: 0,question_order,question_answer_group_id,speaker,role,company,content,year,quarter
0,0,1,John E. McDonald,Analyst,Autonomous Research,"Thank you. Morning, Jeremy. Was wondering abou...",2022,1
1,1,1,Jeremy Barnum,Chief Financial Officer,JPMorgan Chase & Co.,"Good morning, John. Good question. Yeah, look,...",2022,1
2,2,1,John E. McDonald,Analyst,Autonomous Research,"Okay. And as my follow up, could you give us s...",2022,1
3,3,1,Jeremy Barnum,Chief Financial Officer,JPMorgan Chase & Co.,Yeah. I guess I would direct you to my comment...,2022,1
4,4,1,John E. McDonald,Analyst,Autonomous Research,Okay. Thanks.,2022,1
5,5,1,Jeremy Barnum,Chief Financial Officer,JPMorgan Chase & Co.,"Thanks, John .",2022,1
6,0,2,Ken Usdin,Analyst,Jefferies LLC,"Hi. Thanks. Good morning. Jeremy, just wanted ...",2022,1
7,1,2,Jeremy Barnum,Chief Financial Officer,JPMorgan Chase & Co.,Yeah. Thanks. So let me just give some high le...,2022,1
8,2,2,Ken Usdin,Analyst,Jefferies LLC,"And just a follow up there too, is there anyth...",2022,1
9,3,2,Jeremy Barnum,Chief Financial Officer,JPMorgan Chase & Co.,"Yeah. I think – I guess in general, we haven't...",2022,1


In [8]:
df_gs.head(10)

Unnamed: 0,question_order,question_answer_group_id,speaker,role,company,content_type,content,quarter,year
0,0,0,Glenn Schorr,,Evercore,question,"so, trading question, i mean, markets busines...",3,2024
1,1,0,David Solomon,"Chairman, Chief Executive Ofﬁcer",Goldman Sachs,answer,"i appreciate the question, glenn, and i mean,...",3,2024
2,2,0,Glenn Schorr,,Evercore,question,i appreciate that. this one will be a short f...,3,2024
3,3,0,Denis Coleman,Chief Financial Ofﬁcer,Goldman Sachs,answer,"glenn, its denis. i guess what i would sugges...",3,2024
4,0,1,Ebrahim Poonawala,,Bank of America,question,i just had a follow-up ﬁrst on trading and ma...,3,2024
5,1,1,David Solomon,"Chairman, Chief Executive Ofﬁcer",Goldman Sachs,answer,"yes. sure, ebrahim, and i appreciate the ques...",3,2024
6,2,1,Ebrahim Poonawala,,Bank of America,question,understood. and just a follow-up on -- back t...,3,2024
7,3,1,David Solomon,"Chairman, Chief Executive Ofﬁcer",Goldman Sachs,answer,"well, i appreciate the question. and we've be...",3,2024
8,0,2,Christian Bolu,,Autonomous Research,question,can you hear me? i just have a question on th...,3,2024
9,1,2,David Solomon,"Chairman, Chief Executive Ofﬁcer",Goldman Sachs,answer,"when you have leading market shares, our glob...",3,2024
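The two previews above show slightly different schemas: the Goldman Sachs frame carries a `content_type` column, while the JP Morgan frame does not, and the column order differs. The sketch below shows one possible way to align them before any shared processing. The helper name `harmonise_qna`, the toy rows, and the rule that rows with role `Analyst` are questions are all assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Shared column set, in a fixed order (taken from the two previews above).
COMMON_COLUMNS = [
    "question_order", "question_answer_group_id", "speaker",
    "role", "company", "content_type", "content", "year", "quarter",
]

def harmonise_qna(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df restricted to the shared columns, in a fixed order."""
    out = df.copy()
    # "Unnamed: 0" is typically a saved index column; drop it if present.
    out = out.drop(columns=["Unnamed: 0"], errors="ignore")
    if "content_type" not in out.columns:
        # Assumption: rows whose role is "Analyst" are questions, all others answers.
        is_question = out["role"].eq("Analyst")
        out["content_type"] = is_question.map({True: "question", False: "answer"})
    return out[COMMON_COLUMNS]

# Toy frame shaped like the JP Morgan preview (hypothetical values).
jpm_like = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "question_order": [0, 1],
    "question_answer_group_id": [1, 1],
    "speaker": ["Analyst A", "Executive B"],
    "role": ["Analyst", "Chief Financial Officer"],
    "company": ["Research Co", "Bank Co"],
    "content": ["A question?", "An answer."],
    "year": [2022, 2022],
    "quarter": [1, 1],
})

harmonised = harmonise_qna(jpm_like)
print(harmonised["content_type"].tolist())
```

The role-based rule would not work for the Goldman Sachs frame, where analyst roles are missing, which is why the sketch only derives `content_type` when the column is absent.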


### Filtering for Q2 2022 Transcripts for Generative AI

To keep the analysis focused and manageable, this section filters the loaded JP Morgan DataFrame down to the Q2 2022 transcripts. Only this subset is used in the generative AI tasks that follow, and each question-and-answer group is flattened into a single text block for the LLM.

In [9]:
# extract the Q&A for 2022 Q2
df_jpm_2022_q2 = df_jpm[(df_jpm['year'] == 2022) & (df_jpm['quarter'] == 2)]

# Group rows by question/answer group (one group per analyst exchange)
grouped_jpm_2022_q2 = df_jpm_2022_q2.groupby('question_answer_group_id')

jpm_2022_q2_transcripts = []

# Loop through the groups
for group_id, group_df in grouped_jpm_2022_q2:
    transcript = ""
    for _, row in group_df.iterrows():
        transcript += f"{row['speaker']}: {row['content']}\n\n---------------------------\n\n"
    jpm_2022_q2_transcripts.append(transcript)

In [10]:
jpm_2022_q2_transcripts[0]

"Steven Chubak: Hey. Good morning,  Jeremy. Hey, good morning, Jamie. Wanted to start off with a question on capital targets. I don't believe you provided an\nupdate on your firm wide CET1 target of 12.5% to 13% . And given the new higher SCB, future increases in your GSIB surcharge to 4.5% , your\nregulatory minimum is slated to increase beyond 13% by 2024, which is also beyond the horizon reflected on slide 3. And just given that high\nregulatory minimum, elevated SCB volatility in recent years, what do you believe is an appropriate capital tar get for you to manage to from\nhere over the long term?\n\n---------------------------\n\nJeremy Barnum: Yeah, Steve, good question. So, obviously, you're right in the sense that we didn't talk about 2024 on the slide. And as you note, we have two\nGSIB bucket increases coming: one in the first quarter of 2023 and the other one in the first quarter of 2024. So we had work ed all that out at\nInvestor Day and talked about 12.5% to 13% target, w
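The filtering and grouping steps above could also be wrapped in a reusable helper so that other quarters or banks can be processed the same way. This is a minimal sketch under stated assumptions: the function name `build_group_transcripts` is hypothetical, the toy rows are invented, and unlike the original loop it sorts turns by `question_order`, keys the result by group id, and places the separator only between turns rather than after every turn.

```python
import pandas as pd

SEPARATOR = "\n\n---------------------------\n\n"

def build_group_transcripts(df, year, quarter):
    """Map each question_answer_group_id to its concatenated speaker turns."""
    subset = df[(df["year"] == year) & (df["quarter"] == quarter)]
    # Sort explicitly so turn order does not depend on the row order in the CSV.
    subset = subset.sort_values(["question_answer_group_id", "question_order"])
    transcripts = {}
    for group_id, group_df in subset.groupby("question_answer_group_id"):
        turns = [
            f"{speaker}: {content}"
            for speaker, content in zip(group_df["speaker"], group_df["content"])
        ]
        transcripts[group_id] = SEPARATOR.join(turns)
    return transcripts

# Toy data (hypothetical), deliberately out of order within group 1.
toy = pd.DataFrame({
    "question_order": [1, 0, 0],
    "question_answer_group_id": [1, 1, 2],
    "speaker": ["Executive B", "Analyst A", "Analyst C"],
    "content": ["An answer.", "A question?", "Another question?"],
    "year": [2022, 2022, 2022],
    "quarter": [2, 2, 2],
})

transcripts = build_group_transcripts(toy, 2022, 2)
print(transcripts[1])
```

Keying the result by group id makes it possible to trace a model output back to the exchange it came from, which a plain list does not guarantee once some groups are skipped.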

## Phi-4 Model Setup and Pipeline Creation

This section imports the necessary classes from the transformers library, instantiates the phi-4 model, and wraps the model and its tokenizer in a text-generation pipeline. That pipeline is then used to process the prepared transcripts for the statement extraction and question generation tasks.

In [11]:
# Quote the version specifier so the shell does not treat ">" as an output redirect
%pip install "transformers>=4.43.0" accelerate

Load the pretrained model and the model tokenizer. The tokenizer is responsible for converting text into a format (like numerical tokens) that the model can understand, and vice versa.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the model name
phi_4_model_id = "microsoft/phi-4"

# Load the pre-trained model
phi_4_model = AutoModelForCausalLM.from_pretrained(
    phi_4_model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Load the tokenizer
phi_4_tokenizer = AutoTokenizer.from_pretrained(phi_4_model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Create a text generation pipeline using the transformers library, which simplifies the process of using a pre-trained language model for text generation tasks.

The pipeline handles all the underlying complexities of tokenization, model inference, and output decoding.

In [None]:
from transformers import pipeline

phi_4_pipe = pipeline(
    "text-generation",
    model=phi_4_model,
    tokenizer=phi_4_tokenizer
)

Device set to use cuda:0


## Pipeline for Extracting Statements of Concern

With the Q2 2022 transcripts prepared, this section builds the pipeline that extracts statements of concern from them. Using phi-4, the pipeline identifies sentences or passages that match predefined criteria reflecting the PRA's supervisory interests. Applying the same criteria to every question group keeps retrieval consistent and targeted, and provides the input for the follow-up question generation later in the notebook.

Our approach relies on prompt engineering, beginning with a PRA analyst persona for the LLM. This persona is combined with a predefined set of criteria for identifying areas of concern. The message list also includes few-shot examples (in-context learning) that demonstrate the expected structured output, including a case with no concerns, before the final prompt is issued. Together these elements guide the LLM's behavior and encourage consistent, relevant outputs.
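As an illustration of the persona plus few-shot structure described above, the sketch below assembles the message list programmatically from example pairs. It is a minimal sketch rather than the notebook's own method: the function name `build_few_shot_messages`, the shortened persona, and the toy examples are hypothetical. Serialising the expected answers with `json.dumps` is one way to make sure every demonstrated answer is valid JSON.

```python
import json

USER_TEMPLATE = (
    "Analyse the following transcript for any concerns that the PRA "
    "should investigate.\n\nTranscript:\n{transcript}\n"
)
NO_CONCERNS_MESSAGE = (
    "No significant PRA concerns identified in this transcript "
    "based on the specified areas."
)

def build_few_shot_messages(system_prompt, examples):
    """Build a system message followed by alternating user/assistant examples.

    examples is a list of (transcript, expected) pairs, where expected is a
    list of concern dicts or a plain string for the no-concern case.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for transcript, expected in examples:
        messages.append(
            {"role": "user", "content": USER_TEMPLATE.format(transcript=transcript)}
        )
        # json.dumps guarantees the demonstrated answer parses as JSON.
        answer = expected if isinstance(expected, str) else json.dumps(expected, indent=2)
        messages.append({"role": "assistant", "content": answer})
    return messages

# Hypothetical, shortened examples for illustration only.
concern = {
    "statement": "Our capital ratio has dipped below our internal target.",
    "speaker": "Speaker A",
    "relevant_pra_concern_area": "1. Capital Adequacy & Solvency",
    "explanation": "A ratio below target bears directly on capital adequacy.",
}
examples = [
    ("Speaker A (CFO): Our capital ratio has dipped below our internal target.", [concern]),
    ("Speaker B (CFO): Our capital position remains well within our target range.", NO_CONCERNS_MESSAGE),
]

messages = build_few_shot_messages("You are a PRA financial risk analyst.", examples)
print([m["role"] for m in messages])
```

A new transcript is then appended as a final user message, in the same way the notebook does with its helper function.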

In [13]:
areas_of_concern_llm_messages = [
    {"role": "system", "content": """
You are a highly experienced and vigilant financial risk analyst working for the Bank of England's Prudential Regulation Authority (PRA). Your paramount objective is to proactively identify *any potential or emerging risks*, that could compromise the safety, soundness, or long-term viability of supervised financial firms, or their ability to meet regulatory expectations.

Your expertise lies in understanding the context of financial disclosures, distinguishing between routine business commentary or proactive risk management, and genuine emerging concerns. You recognize that firms often discuss known risks and regulatory impacts transparently, and your role is to identify **unanticipated, unmanaged, or ambiguously communicated risks**, rather than normal fluctuations or proactive capital adjustments.

Your analysis MUST prioritize the following key areas of PRA concern:

1.  **Capital Adequacy & Solvency:** Any statements indicating a material weakening or unexpected, unmanaged volatility in capital ratios (e.g., CET1, leverage ratio), significant challenges in capital generation, or large, unhedged, or poorly managed risks that could materially erode capital. This includes significant, unexplained deviations from expected capital trajectories or reliance on highly uncertain future events for capital restoration.
2.  **Liquidity & Funding Risk:** Issues with short-term or long-term funding stability, over-reliance on volatile or concentrated funding sources, or concerns about meeting liquidity requirements (e.g., LCR, NSFR, or internal stress tests). This also covers any signs of difficulty accessing funding markets or managing contingent liquidity needs.
3.  **Risk Management & Governance:** Deficiencies in internal controls, systems, or data quality for risk identification, aggregation, measurement, or reporting.
4.  **Profitability & Business Model Viability:** Persistent or significant declines in core profitability that fundamentally undermine long-term sustainability.
5.  **Operational Resilience:** Mentions of significant or persistent IT system failures, widespread data integrity issues, or prolonged disruptions to critical business services.

For each identified concern, provide:
- The exact, relevant statement(s) or implied meaning from the transcript.
- The speaker of the statement.
- The specific PRA concern area(s) it directly relates to (using the numbered list above). If more than one area applies, list them all.
- A precise, concise explanation of *why* this constitutes a potential concern for the PRA, explicitly linking it to potential impacts on firm safety, soundness, financial stability, or regulatory expectations.

Only output the analysis of the transcript and no more. If, after thorough analysis, you find no statements or strong, negative implications that fall under these specific PRA concern areas, state: "No significant PRA concerns identified in this transcript based on the specified areas." Avoid speculating on management's "optimism" unless there is clear, direct evidence within the transcript of misrepresentation or severe misjudgment, or if it contradicts other stated facts. Focus on the actual content and its direct implications, not general market sentiment or analyst interpretations.
"""},
    {"role": "user", "content": """Analyse the following transcript for any concerns that the PRA should investigate.

Transcript:
John McDonald (Analyst): What is your current Tier 1 capital ratio and your outlook for capital generation?
-----------------------
Ken Usidin (CEO): Our Q3 Tier 1 capital ratio has dipped to 9.5%, which is slightly below our internal target, but we're confident we can bring it back up by year-end through asset sales.
-----------------------
John McDonald (Analyst): Can you elaborate on the internal controls for your new AI models, particularly regarding human oversight?
-----------------------
Ken Usidin (CTO): We've implemented a new AI-driven credit scoring model, and it's mostly automated now, so we don't really have human oversight on every decision. It's too complex.
-----------------------
John McDonald (Analyst): Given the recent rate movements, how confident are you in your net interest income outlook for the coming quarters?
-----------------------
Ken Usidin (CFO): Our NII guidance remains at $X billion, but given the significant market volatility, we need to re-run our bottom-up process, and we'll provide more precise figures at the next investor day.
"""},
    {"role": "assistant", "content": """
[
  {
    "statement": "Our Q3 Tier 1 capital ratio has dipped to 9.5%, which is slightly below our internal target, but we're confident we can bring it back up by year-end through asset sales.",
    "speaker": "Ken Usidin",
    "relevant_pra_concern_area": "1. Capital Adequacy & Solvency",
    "explanation": "A dip in the Tier 1 capital ratio, even if 'slightly' below internal targets, is a concern for the PRA as it directly impacts capital adequacy. The reliance on future 'asset sales' to rectify this indicates a potential dependence on market conditions and execution risk to restore capital, which the PRA would scrutinize closely for firm safety and soundness."
  },
  {
    "statement": "We've implemented a new AI-driven credit scoring model, and it's mostly automated now, so we don't really have human oversight on every decision. It's too complex.",
    "speaker": "Ken Usidin",
    "relevant_pra_concern_area": "3. Risk Management & Governance",
    "explanation": "Lack of human oversight over a 'mostly automated' and 'too complex' AI-driven credit scoring model is a significant concern. The PRA requires robust governance and controls, especially for models impacting credit risk. This statement suggests potential 'black box' risk, inadequate validation, and a failure to embed proper accountability, which directly impacts the firm's safety and soundness and adherence to model risk management expectations."
  },
  {
    "statement": "Our NII guidance remains at $X billion, but given the significant market volatility, we need to re-run our bottom-up process, and we'll provide more precise figures at the next investor day.",
    "speaker": "Ken Usidin",
    "relevant_pra_concern_area": "3. Risk Management & Governance, 4. Profitability & Business Model Viability",
    "explanation": "The need to 're-run our bottom-up process' due to 'significant market volatility' despite maintaining NII guidance suggests potential challenges in the bank's ability to accurately forecast and manage a core revenue stream. This raises concerns for the PRA regarding the robustness of internal NII forecasting models and the bank's overall sensitivity to interest rate risk, impacting both profitability stability and risk management capabilities."
  }
]
    """},
    {"role": "user", "content": """Analyse the following transcript for any concerns that the PRA should investigate.

Transcript:
Speaker C (Analyst): "Could you give us some thoughts about the markets related NII? What things should we think about there, whether it's seasonality or how it's affected by rising rates?"
-----------------------
Speaker D (CEO): "Markets-related NII can be quite noisy and affected by obscure balance sheet composition issues, so we don't focus too heavily on that specific number."
-----------------------
Speaker I (Analyst): "Could you discuss your capital trajectory considering the latest regulatory changes and market volatility?"
-----------------------
Speaker J (CFO): "Our capital levels remain robust, well above our minimums and internal targets. While we did see some RWA movements this quarter, primarily from market factors which we actively hedge, our strong earnings generation continues to accrete capital. We are fully prepared for upcoming regulatory requirements like Basel III Endgame and view our current position as very strong to support future growth and withstand potential shocks."
"""},
    {"role": "assistant", "content": """
[
  {
    "statement": "Markets-related NII can be quite noisy and affected by obscure balance sheet composition issues, so we don't focus too heavily on that specific number.",
    "speaker": "Speaker D",
    "relevant_pra_concern_area": "3. Risk Management & Governance",
    "explanation": "Characterizing a component of NII as 'noisy' and affected by 'obscure balance sheet composition issues' indicates a lack of granular understanding or control over a segment of the bank's interest rate exposure. The statement 'we don't focus too heavily on that specific number' is a significant concern for the PRA, as it implies insufficient oversight or transparency over a potential source of earnings volatility and risk, potentially compromising effective risk management."
  }
]
    """},
    {"role": "user", "content": """Analyse the following transcript for any concerns that the PRA should investigate.

Transcript:
Speaker K (Analyst): "What are your expectations for capital levels given the known impacts of SA-CCR and any upcoming regulatory adjustments?"
-----------------------
Speaker L (CFO): "We entered the quarter with ample capital, proactively building buffers for the expected RWA inflation from SA-CCR, which materialized as anticipated. While we saw some minor adjustments from AOCI during the period, these are temporary and our internal models show these effects will reverse. We are closely monitoring the upcoming SCB results and have various capital levers available, including active management of callable instruments, to ensure we maintain a very strong capital position well within our target range. We feel comfortable with our trajectory and ability to support our strategic growth."
    """},
    {"role": "assistant", "content": "No significant PRA concerns identified in this transcript based on the specified areas."}
]

In [14]:
# Helper function for creating the user message for the areas of concern LLM

def create_areas_of_concern_user_message(transcript):
  return {"role": "user", "content": f"""Analyse the following transcript for any concerns that the PRA should investigate.

Transcript:
{transcript}
"""}

The pipeline loops over the question-and-answer groups (one per analyst exchange), sends each to the model, and records the structured output where concerns are identified. Processing each group separately keeps the prompts short while still covering the whole Q&A section.

In [18]:
import json

# arguments to be passed into the pipeline function
generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    "do_sample": False
}

# array to store all the areas of concern for 2022/Q2
jpm_2022_q2_areas_of_concern = []

# loop through the set of analyst questions
for transcript in jpm_2022_q2_transcripts:
  # create the messages to be fed into the LLM pipe
  areas_of_concern_messages = areas_of_concern_llm_messages + [create_areas_of_concern_user_message(transcript)]

  # generate the output
  phi_4_output = phi_4_pipe(areas_of_concern_messages, **generation_args)

  # the model returns a JSON array when it finds concerns; any other response
  # (the "no concerns" sentence, but also malformed JSON) is reported as no concerns
  try:
    jpm_2022_q2_areas_of_concern.append(json.loads(phi_4_output[0]['generated_text']))
  except json.JSONDecodeError:
    print("No areas of concern")

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
No areas of concern
No areas of concern
No areas of concern
No areas of concern
No areas of concern
No areas of concern
No areas of concern
[[{'statement': 'And given the new higher SCB, future increases in your GSIB surcharge to 4.5%, your regulatory minimum is slated to increase beyond 13% by 2024, which is also beyond the horizon reflected on slide 3.', 'speaker': 'Steven Chubak', 'relevant_pra_concern_area': '1. Capital Adequacy & Solvency', 'explanation': "The statement highlights concerns about the firm's capital adequacy in light of increasing regulatory minimums due to higher SCB and GSIB surcharge. The PRA would be interested in how the firm plans to manage these changes, especially since the current capital target may not be sufficient to meet future regulatory requirements. This could impact the firm's solvency and its ability to absorb potential losses."}, {'statement': "So we had worked all that out at Investor Day and talked about 12.5% to 13% target, which implies sort of a modest buffer to be used flexibly based on what we expected would be some increase in SCB. Obvio

In [28]:
# Print the areas of concern from the first analyst set of questions
for area_of_concern in jpm_2022_q2_areas_of_concern[0]:
  print(f"{area_of_concern['speaker']}: {area_of_concern['statement']}")
  print(f"Area of Concern: {area_of_concern['relevant_pra_concern_area']}")
  print(f"Explanation: {area_of_concern['explanation']}")
  print("")

Steven Chubak: And given the new higher SCB, future increases in your GSIB surcharge to 4.5%, your regulatory minimum is slated to increase beyond 13% by 2024, which is also beyond the horizon reflected on slide 3.
Area of Concern: 1. Capital Adequacy & Solvency
Explanation: The statement highlights concerns about the firm's capital adequacy in light of increasing regulatory minimums due to higher SCB and GSIB surcharge. The PRA would be interested in how the firm plans to manage these changes, especially since the current capital target may not be sufficient to meet future regulatory requirements. This could impact the firm's solvency and its ability to absorb potential losses.

Jeremy Barnum: So we had worked all that out at Investor Day and talked about 12.5% to 13% target, which implies sort of a modest buffer to be used flexibly based on what we expected would be some increase in SCB. Obviously, the increase came in a bit higher than expected, so for now, we're really focused on 1
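The extraction loop above treats every response that fails `json.loads` as "No areas of concern", so a malformed JSON answer cannot be told apart from the model's explicit no-concern sentence. The sketch below shows one possible way to separate the three cases. The function name `parse_concerns`, the status labels, and the fence-stripping step are assumptions for illustration; only the no-concern sentence and the four keys come from the prompt defined earlier.

```python
import json

NO_CONCERNS_MESSAGE = (
    "No significant PRA concerns identified in this transcript "
    "based on the specified areas."
)
REQUIRED_KEYS = {"statement", "speaker", "relevant_pra_concern_area", "explanation"}
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable

def parse_concerns(generated_text):
    """Classify one model response as "concerns", "none" or "malformed"."""
    text = generated_text.strip()
    if text == NO_CONCERNS_MESSAGE:
        return "none", []
    # Assumption: the model may wrap its JSON in a Markdown fence; strip it if so.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):].strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return "malformed", []
    if not isinstance(parsed, list):
        return "malformed", []
    for item in parsed:
        # Every concern must be an object carrying all four expected keys.
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            return "malformed", []
    return ("concerns" if parsed else "none"), parsed

example = json.dumps([{
    "statement": "s",
    "speaker": "Speaker A",
    "relevant_pra_concern_area": "1. Capital Adequacy & Solvency",
    "explanation": "e",
}])
results = {
    "valid": parse_concerns(example)[0],
    "sentinel": parse_concerns(NO_CONCERNS_MESSAGE)[0],
    "free_text": parse_concerns("Here is my analysis of the transcript.")[0],
    "fenced": parse_concerns(FENCE + "json\n" + example + "\n" + FENCE)[0],
}
print(results)
```

Storing the status alongside each group, rather than appending only successful parses to a list, would also keep results aligned with the transcripts they came from.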

With the prompt engineering and model setup described above, the pipeline produced structured output that maps individual statements to the PRA's areas of interest. In this run, seven question groups returned no parseable concerns. The extracted concerns now form the basis for generating targeted follow-up questions.

## Creating Insightful Follow-up Questions

With statements of concern extracted, the next step is to formulate targeted follow-up questions for the PRA team. This section describes how those questions are generated, with particular emphasis on prompt design. The prompt guides phi-4 to produce questions that are relevant to the identified concerns, probe the underlying detail, and support deeper supervisory inquiry. In this way, extracted observations become concrete discussion points for engagement with regulated entities.
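The follow-up persona defined in this section describes its input as a JSON array. Interpolating a Python list directly into an f-string produces Python's `repr` (with single-quoted keys) rather than JSON, so the sketch below serialises the concerns explicitly before they go into the prompt. The helper name `format_analysis_for_prompt` and the sample concern are hypothetical.

```python
import json

def format_analysis_for_prompt(areas_of_concern):
    """Serialise extracted concerns as a JSON array for the user message."""
    return json.dumps(areas_of_concern, indent=2, ensure_ascii=False)

# Hypothetical concern, shaped like the extraction output.
concerns = [{
    "statement": "We're confident we can rebuild capital through asset sales.",
    "speaker": "Speaker A",
    "relevant_pra_concern_area": "1. Capital Adequacy & Solvency",
    "explanation": "Reliance on asset sales introduces execution risk.",
}]

as_repr = f"{concerns}"                        # Python repr, not JSON
as_json = format_analysis_for_prompt(concerns)  # valid JSON

def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(as_repr), is_valid_json(as_json))
```

Whether the model is sensitive to this difference is an empirical question, but matching the format the persona describes removes one source of ambiguity.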

In [29]:
follow_up_llm_persona = """You are an AI assistant acting as a Prudential Regulation Authority (PRA) supervisor. Your task is to generate specific, probing follow-up questions for a financial firm based on provided analyses of their earnings call transcript.

**Input:** You will receive an analysis in the following JSON format:

[
  {
    "statement": "...",
    "speaker": "...",
    "relevant_pra_concern_area": "...",
    "explanation": "..."
  },
  // ... more analysis objects
]

**Task:** For each analysis object provided, generate 2-4 highly specific and probing follow-up questions that a PRA supervisor would ask the firm to gain a deeper understanding of the identified concerns.

When generating questions, consider the following from a prudential regulatory perspective:

Clarity and Detail: Seek specific data, methodologies, and underlying assumptions. Challenge vague statements or general assurances.
Risk Mitigation and Controls: Inquire about specific actions, strategies, frameworks, and internal controls in place to manage or mitigate the identified risks.
Governance and Oversight: Probe into accountability, decision-making processes, the role of the board and senior management, and the robustness of internal challenge.
Quantitative Measures: Ask for specific metrics, thresholds, targets, and how these are monitored and reported.
Future Plans and Contingency: Understand forward-looking strategies, scenario planning, stress testing results, and what happens if assumptions or external conditions do not hold true.
Compliance: Implicitly or explicitly check adherence to PRA rules, supervisory statements, and guidance (e.g., on capital planning, risk management frameworks, model governance, interest rate risk).
Impact Assessment: How does the concern, or the firm's approach, impact their safety and soundness, capital adequacy, liquidity, business model viability, or broader financial stability?
Output Format: For each analysis object, present the questions clearly. Use a professional, direct, and non-judgmental tone, as if directly addressing the firm.

Example of desired output structure for one analysis object:

Based on the statement from [Speaker's Name] regarding [brief context of statement/topic]:

[Question 1: Focusing on detail/quantification]
[Question 2: Focusing on risk mitigation/controls]
[Question 3: Focusing on governance/future plans]
[Question 4: (Optional) Focusing on contingency/broader impact]
"""

In [35]:
def create_follow_up_messages(areas_of_concern):
  follow_up_prompt = f"""Generate potential PRA follow-up questions based on the following analysis:
  {areas_of_concern}
  """

  return [
      {"role": "system", "content": follow_up_llm_persona},
      {"role": "user", "content": """Generate potential PRA follow-up questions based on the following analysis:
[
  {
    "statement": "We don't look at it that way, like we're reinvesting NII. The investing stuff we look at all the time. We're investing a lot of money for the future kind of across the board.",
    "speaker": "Jamie Dimon",
    "relevant_pra_concern_area": "4. Profitability & Business Model Viability",
    "explanation": "The CEO's statement suggests a strategic approach to reinvesting net interest income (NII) benefits, which could imply a significant allocation of capital towards growth initiatives. While this is not inherently a concern, the PRA would be interested in understanding how these investments align with maintaining capital adequacy and profitability targets, especially in a volatile rate environment. The lack of specific guidance on how NII benefits are being reinvested could raise questions about the firm's ability to manage its capital efficiently and sustain its long-term profitability."
  },
  {
    "statement": "The investment plan is a strategic plan that recognizes that sort of confidence in the 17%. The fact that that moment may be getting pulled forward as a result of the Fed's reaction to the economy has no impact on how we think about spending.",
    "speaker": "Jeremy Barnum",
    "relevant_pra_concern_area": "4. Profitability & Business Model Viability",
    "explanation": "The CFO's confidence in achieving a 17% return on tangible equity (ROTCE) through the cycle, despite acknowledging short-term headwinds, suggests a strong belief in the firm's strategic investments. However, the PRA would be concerned about the firm's sensitivity to interest rate changes and the potential impact on its profitability and capital generation. The statement implies a commitment to a significant investment plan regardless of external economic conditions, which could affect the firm's financial stability if not managed carefully."
  }
]
      """},
      {"role": "assistant", "content": """
Based on the statement from Jamie Dimon regarding the strategic reinvestment of net interest income (NII):

1. **Focusing on detail/quantification**: Could you provide specific examples of the types of investments being made with the NII benefits, and how these align with the firm's capital adequacy and profitability targets?
2. **Focusing on risk mitigation/controls**: What risk management frameworks and internal controls are in place to ensure that these reinvestments do not adversely affect the firm's capital position, especially in a volatile interest rate environment?
3. **Focusing on governance/future plans**: How does the board and senior management oversee and evaluate the effectiveness of these reinvestment strategies in maintaining long-term profitability and business model viability?
4. **Focusing on contingency/broader impact**: What contingency plans are in place if the expected returns from these investments do not materialize, and how might this impact the firm's overall financial stability?

Based on the statement from Jeremy Barnum regarding the firm's investment plan and confidence in achieving a 17% return on tangible equity (ROTCE):
1. **Focusing on detail/quantification**: Can you provide a detailed breakdown of the strategic investments contributing to the projected 17% ROTCE, and how these investments are expected to perform under different interest rate scenarios?
2. **Focusing on risk mitigation/controls**: What specific measures are being taken to mitigate the risks associated with interest rate fluctuations, and how are these risks being monitored and reported to the board?
3. **Focusing on governance/future plans**: How does the firm's governance structure ensure that the investment plan remains aligned with the firm's long-term profitability and capital generation goals, even in the face of economic headwinds?
4. **Focusing on contingency/broader impact**: What are the potential impacts on the firm's financial stability if the anticipated returns are not achieved, and how does the firm plan to adjust its strategy in response to changing economic conditions?
      """},
      {"role": "user", "content": follow_up_prompt}
  ]
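Hand-writing JSON inside a prompt string makes brace escaping easy to get wrong, because `{{` only collapses to `{` inside an f-string or `str.format` call. The following is a minimal sketch, not the notebook's own method, of rendering an example turn with `json.dumps` instead. The helper name `build_user_turn` and the placeholder record are assumptions introduced for illustration; only the field names are taken from the analysis objects above.

```python
import json

# Hypothetical placeholder record; only the field names come from the notebook.
example_analysis = [
    {
        "statement": "Example statement quoted from a transcript.",
        "speaker": "Example Speaker",
        "relevant_pra_concern_area": "4. Profitability & Business Model Viability",
        "explanation": "Example explanation of why the statement matters.",
    }
]


def build_user_turn(analysis):
    """Render a list of analysis objects as a user chat message (hypothetical helper)."""
    header = "Generate potential PRA follow-up questions based on the following analysis:"
    # json.dumps emits valid JSON with single braces, so no manual escaping is needed.
    return {"role": "user", "content": f"{header}\n{json.dumps(analysis, indent=2)}"}


user_turn = build_user_turn(example_analysis)
print(user_turn["content"])
```

The same helper could serve both the worked example and the real request, which keeps the two user turns in an identical format.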

In [36]:
jpm_2022_q2_follow_up_questions = []

for areas_of_concern in jpm_2022_q2_areas_of_concern:
  follow_up_llm_messages = create_follow_up_messages(areas_of_concern)
  phi_4_output = phi_4_pipe(follow_up_llm_messages, **generation_args)

  jpm_2022_q2_follow_up_questions.append(phi_4_output[0]['generated_text'])

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
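
The warning above is usually emitted by `transformers` when a sampling-only flag such as `temperature` is passed while sampling is disabled (`do_sample=False`); in that case the flag is ignored and generation proceeds greedily. The sketch below, which is an assumption about how one might tidy the arguments rather than the notebook's actual configuration, drops sampling-only keys before the call. `clean_generation_args`, `SAMPLING_ONLY_KEYS`, and the example values are hypothetical; the real `generation_args` is defined earlier in the notebook and may differ.

```python
# Keys that only take effect when do_sample=True (assumed list for illustration).
SAMPLING_ONLY_KEYS = {"temperature", "top_p", "top_k"}


def clean_generation_args(args):
    """Return a copy of args without sampling-only keys when sampling is off."""
    if args.get("do_sample", False):
        # Sampling is enabled, so every key is meaningful; return an unchanged copy.
        return dict(args)
    return {key: value for key, value in args.items() if key not in SAMPLING_ONLY_KEYS}


# Hypothetical values; substitute the notebook's own generation_args.
example_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

cleaned = clean_generation_args(example_args)
print(cleaned)
```

Passing the cleaned dictionary to the pipeline (for example `phi_4_pipe(messages, **cleaned)`) should leave the generated text unchanged while silencing the repeated warning.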


In [37]:
print(jpm_2022_q2_follow_up_questions[0])

Based on the statement from Steven Chubak regarding the increase in regulatory minimums due to higher SCB and GSIB surcharge:

1. **Focusing on detail/quantification**: What specific strategies are being implemented to ensure that the firm's capital levels will meet or exceed the projected regulatory minimum of beyond 13% by 2024?
2. **Focusing on risk mitigation/controls**: How is the firm planning to manage the potential impact of these increased capital requirements on its solvency and ability to absorb losses?
3. **Focusing on governance/future plans**: What role does the board and senior management play in overseeing the firm's capital planning process to address these regulatory changes?
4. **Focusing on contingency/broader impact**: What contingency plans are in place if the firm is unable to meet the increased regulatory capital requirements, and how might this affect the firm's overall financial stability?

Based on the statement from Jeremy Barnum regarding the higher-than-ex

Initial outputs suggest that the LLM generates questions that are relevant and useful to the PRA. Some analysis objects may yield more questions than are needed; the prompt can be adjusted to cap the number of questions and to focus on specific topics, giving the PRA flexibility over the scope of the output.
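
One way to make that adjustment is to append explicit constraints to the user prompt. This is a minimal sketch under stated assumptions: `build_follow_up_prompt`, `max_questions`, and `focus_topics` are hypothetical names that do not appear elsewhere in this notebook, and the wording of the added instructions has not been evaluated against phi-4.

```python
def build_follow_up_prompt(areas_of_concern, max_questions=None, focus_topics=None):
    """Build the user prompt, optionally capping question count and narrowing topics."""
    lines = [
        "Generate potential PRA follow-up questions based on the following analysis:",
        str(areas_of_concern),
    ]
    if max_questions is not None:
        # Explicit cap, stated per analysis object to match the output format.
        lines.append(f"Ask at most {max_questions} questions per analysis object.")
    if focus_topics:
        lines.append("Focus only on these topics: " + ", ".join(focus_topics) + ".")
    return "\n".join(lines)


# Hypothetical usage with a placeholder analysis string.
prompt = build_follow_up_prompt(
    '[{"statement": "Example statement.", "speaker": "Example Speaker"}]',
    max_questions=2,
    focus_topics=["capital adequacy", "liquidity"],
)
print(prompt)
```

Because the constraints live in the user turn, the persona and worked example can stay unchanged; whether the model respects the cap would still need to be checked on real outputs.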

## Conclusion
This notebook has demonstrated an application of the phi-4 Large Language Model to support the PRA's supervisory capabilities. By engineering prompts, establishing a PRA analyst persona, and supplying few-shot examples, we have developed a pipeline that can:

* Extract specific statements of concern from earnings call transcripts.
* Generate relevant follow-up questions, transforming raw textual data into actionable intelligence.

The ability to identify nuanced risks and formulate precise inquiries directly supports the PRA team in their mandate for prudential regulation. This AI-driven approach offers significant potential for increasing efficiency, improving the depth of analysis, and enabling more proactive supervisory engagement. This project provides a foundational framework that can be extended and integrated into existing workflows, paving the way for more data-driven and intelligent regulatory oversight.