# Risk Control Review AI Agent - Starter Notebook

This notebook guides you through building an AI agent that reviews risk controls, notes exceptions, and generates a summary report. Follow the steps to set up, code, and test the agent. Upload the provided `Risk_Controls.pdf` and `Audit_Logs.csv` files to start.

## Objectives
- Build a functional prototype using LangChain, FAISS, and an LLM.
- Learn prompt engineering, retrieval-augmented generation (RAG), and exception detection.


## Prerequisites
- Google Colab account (free).
- API key for Gemini,OpenAI(or xAI;.
- Sample files: `Risk_Controls.pdf`, `Audit_Logs.csv`.

## Setup Instructions
1. Upload `Risk_Controls.pdf` and `Audit_Logs.csv` to Colab (use the file explorer on the left).
2. Set your API key in the setup cell.
3. Run cells in order. Follow comments for guidance.
4. Debug issues with your instructor or refer to the Step 3 guide.


### Environment setup

In [3]:
# Setup Environment
!pip install langchain faiss-cpu PyPDF2 pandas -q
!pip install langchain-google-genai pypdf langchain-community -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/49.4 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.4/49.4 kB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.4 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m1.4/1.4 MB[0m [31m49.2 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.4/1.4 MB[0m [31m27.8 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.6.18 which is incompatible.[0m[31m
[0m

In [None]:
#another import
!pip install -U langchain-community -q

### Import into working environment

In [7]:
import os
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
import pandas as pd

In [9]:

# Set your API key (replace with your own or instructor-provided key)
os.environ["GOOGLE_API_KEY"] = '[YOUR_APi_KEY]'

In [11]:
# Verify files are uploaded
if not os.path.exists('Risk controls.pdf') or not os.path.exists('Audit_Logs.csv'):
    raise FileNotFoundError('Please upload Risk_Controls.pdf and Audit_Logs.csv')
print('Setup complete. Files found.')

Setup complete. Files found.


In [16]:
# Extract Risk Controls with LLM
# Initialize LLM and extract key controls from the PDF

# Initialize model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
)
#use prompt
prompt = PromptTemplate(
    input_variables=['text'],
    template='Extract key risk control rules from this text in a concise list: {text}'
)
chain = LLMChain(llm=llm, prompt=prompt)

# Load and process PDF
loader = PyPDFLoader('Risk controls.pdf')
docs = loader.load()
controls = chain.run(text=docs[0].page_content)

print('Extracted Controls:')
print(controls)

# TODO: Refine the prompt if output is unclear or incomplete

Extracted Controls:
Here's a concise list of key risk control rules extracted from the text:

*   **Transaction Verification:** 2FA required for transactions > $10,000; verification logged with timestamp and user ID.
*   **User Access:** Only Managers or above can approve transactions > $5,000; access logs maintained.
*   **Data Retention:** Transaction records retained for 7 years; early deletion requires Compliance Officer approval.
*   **Suspicious Activity:** Transactions with 3+ retries in 24 hours flagged; manual audit required within 48 hours.


## Use the pdf controls against the sample audit files

In [20]:
# Detect Exceptions
# Compare audit logs against controls to flag exceptions

#TODO: read from various sources

df = pd.read_csv('Audit_Logs.csv')
exceptions = []

In [23]:
prompt = PromptTemplate(
    input_variables=['entry', 'controls'],
    template='Does this log entry violate any controls? Entry: {entry}. Controls: {controls}.Provide a brief explanation.'
)
chain = LLMChain(llm=llm, prompt=prompt)

In [24]:
for _, row in df.iterrows():
    analysis = chain.run(entry=row.to_dict(), controls=controls)
    if 'violate' in analysis.lower():
        exceptions.append({'Transaction_ID': row['Transaction_ID'],
                           'Reason': analysis})

print('Exceptions Found:')
for exc in exceptions:
    print(f"Transaction {exc['Transaction_ID']}: {exc['Reason']}")

# TODO: Add a rule-based check (e.g., Amount > 10000 and Verified_2FA == False)

Exceptions Found:
Transaction TX001: Yes, the log entry violates the **Transaction Verification** control.

**Explanation:**

The transaction amount ($12,000) exceeds the $10,000 threshold requiring 2FA. However, the log entry shows `Verified_2FA: False`.  This indicates that the transaction was processed without the required 2FA verification, violating the control.
Transaction TX002: The log entry **does not violate** any of the listed controls. Here's why:

*   **Transaction Verification:** The transaction amount ($8000) is *not* greater than $10,000, so the 2FA requirement is not triggered. However, the log *does* show that 2FA was used (`Verified_2FA': True`), which is a positive sign.
*   **User Access:** The transaction amount ($8000) *is* greater than $5,000, and the `Approver_Role` is 'Manager', which satisfies the requirement that only Managers or above can approve such transactions.
*   **Data Retention:** This control is about the lifecycle of the data, not the transaction i

In [25]:
# Generate Summary Report
# Create a structured report from exceptions

report_prompt = PromptTemplate(
    input_variables=['exceptions'],
    template='Generate a report with: 1) Summary of findings, 2) List of exceptions, 3) Recommendations. Exceptions: {exceptions}'
)
report_chain = LLMChain(llm=llm, prompt=report_prompt)

report = report_chain.run(exceptions=exceptions)
with open('report.md', 'w') as f:
    f.write(report)

print('Generated Report:')
print(report)

# TODO: Check report for clarity; refine prompt if needed

Generated Report:
## Transaction Log Audit Report

**Date:** October 26, 2023 (Assumed - Please update with actual date)

**1. Summary of Findings:**

This report summarizes the findings of an audit performed on a set of transaction logs. The audit focused on verifying compliance with four key controls: Transaction Verification (2FA for transactions over $10,000), User Access (Manager or above approval for transactions over $5,000), Data Retention (not directly assessed in these exceptions), and Suspicious Activity (flagging transactions with 3 or more retries).

Out of the five transactions reviewed, three (TX001, TX003, and TX004) were found to be in violation of at least one control. TX001 violated the Transaction Verification control due to a missing 2FA verification for a transaction exceeding the threshold. TX003 violated the User Access control because a transaction exceeding the approval threshold was approved by an Analyst instead of a Manager or above. TX004 violated the Susp

## Next Steps
- **Debugging**: Check for errors (e.g., missing controls, false positives).
- **Customization**: Add a rule-based check in Cell 4 (e.g., flag high-amount unverified transactions).
- **Homework**: Adapt the agent for your industry (e.g., new dataset or control).
- **Resources**:
  - LangChain docs: https://python.langchain.com/docs
  - xAI API: https://x.ai/api
  - Debugging tips: Ask your instructor for the Step 3 guide handout.

## Troubleshooting
- **API Key Error**: Ensure your key is valid and set correctly.
- **File Not Found**: Verify `Risk_Controls.pdf` and `Audit_Logs.csv` are uploaded.
- **LLM Output Issues**: Refine prompts for clarity; reduce temperature for consistency.

In [5]:
!pip install -U langchain-community -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.5 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━[0m [32m1.7/2.5 MB[0m [31m52.3 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m41.7 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/45.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.2/45.2 kB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/50.9 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.9/50.9 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
[?25h

In [14]:
!pip install pypdf -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/310.5 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m307.2/310.5 kB[0m [31m11.3 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m310.5/310.5 kB[0m [31m7.6 MB/s[0m eta [36m0:00:00[0m
[?25h