<a href="https://colab.research.google.com/github/Pyfin5/Machine_Learning/blob/main/Compliance_Rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Live by the pip, die by the pip
!pip install google-genai langchain-docling langchain-core langchain-huggingface langchain_milvus langchain python-dotenv
!pip install weaviate-client  --upgrade

In [None]:
# Docling converts PDFs to Markdown; pypdf reads PDFs directly
!pip install docling pypdf

In [23]:
#Library imports
from google import genai
from google.colab import userdata
from google.genai import types
from google.genai.types import EmbedContentConfig, VertexRagStore
import os
from docling.document_converter import DocumentConverter
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import PdfFormatOption
from pypdf import PdfReader
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai
from IPython.display import display, Markdown

In [4]:
# Authentication if using Colab
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

In [5]:
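# Mount Google Drive and switch to the folder that holds the files to ingest in RAG
from google.colab import drive
drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/Colab-Notebooks/Compliance_RAG/')
print(os.getcwd())
!dir
# Remember the working directory so full file paths can be built from it later
path = os.getcwd()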

Mounted at /content/drive
/content/drive/MyDrive/Colab-Notebooks/Compliance_RAG
cross-industry-guidance-on-operational-resilience.pdf
Effective\ TPRM\ Foundations\ Building\ Business\ Continuity\ And\ Operational\ Resilience\ To\ Strengthen\ Supply\ Chains.pdf
osfi_resilience.pdf
osfi_tprm.md
osfi_tprm.pdf
PayLink_BCP.pdf
Risk\ Domains\ 2025.pdf


In [None]:
#Define paths for each of the files to ingest in a RAG solution
tprm_guidelines_path = "osfi_tprm.pdf"
opres_guidelines_path = "osfi_resilience.pdf"
ireland_opres_path = "cross-industry-guidance-on-operational-resilience.pdf"
paylink_bcp = "PayLink_BCP.pdf"

pdf_files = [os.path.join(path,tprm_guidelines_path),
             os.path.join(path,opres_guidelines_path),
             os.path.join(path,ireland_opres_path),
             os.path.join(path,paylink_bcp)]

print(pdf_files)
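
Before these paths go to the converter, it can help to confirm that each file is actually present, since a typo in a filename would otherwise only surface partway through conversion. A minimal sketch using only the standard library; `find_missing` is a hypothetical helper, not something the notebook defines:

```python
import os

def find_missing(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]
```

In the notebook this could be called as `find_missing(pdf_files)`, stopping early if the returned list is not empty.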

In [None]:
# Use the basic Docling document converter to turn each PDF into Markdown,
# a more machine-readable format for the RAG pipeline
pdf_markdown = {}

converter = DocumentConverter()

for source in pdf_files:
  print("Converting", source)
  result = converter.convert(source)
  # result.document is the parsed document; export it as Markdown text
  doc = result.document
  pdf_markdown[source] = doc.export_to_markdown()
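
Converting PDFs can be slow, so re-running this cell repeats work that has not changed. One option is to cache each Markdown export next to its PDF and reuse it on later runs. The sketch below is an illustration under that assumption; `load_or_convert` and `convert_fn` are hypothetical names, and a stale cached `.md` file would be reused as-is:

```python
from pathlib import Path

def load_or_convert(pdf_path, convert_fn):
    """Return Markdown for pdf_path, reusing a cached .md file when one exists.

    convert_fn is any callable that takes a PDF path and returns Markdown text.
    """
    cache_path = Path(pdf_path).with_suffix(".md")
    if cache_path.exists():
        # A previous run already converted this PDF; reuse its output
        return cache_path.read_text(encoding="utf-8")
    markdown_text = convert_fn(pdf_path)
    cache_path.write_text(markdown_text, encoding="utf-8")
    return markdown_text
```

In this notebook, `convert_fn` could be `lambda p: converter.convert(p).document.export_to_markdown()`, which is the same call chain the loop above uses.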

In [8]:
print(pdf_markdown.keys())

# Keep the PayLink BCP Markdown in its own dictionary, keyed by the same full path.
# Building the key from path and paylink_bcp avoids repeating the hard-coded Drive path.
paylink_key = os.path.join(path, paylink_bcp)
paylink_markdown = {paylink_key: pdf_markdown[paylink_key]}

dict_keys(['/content/drive/MyDrive/Colab-Notebooks/Compliance_RAG/osfi_tprm.pdf', '/content/drive/MyDrive/Colab-Notebooks/Compliance_RAG/osfi_resilience.pdf', '/content/drive/MyDrive/Colab-Notebooks/Compliance_RAG/cross-industry-guidance-on-operational-resilience.pdf', '/content/drive/MyDrive/Colab-Notebooks/Compliance_RAG/PayLink_BCP.pdf'])


In [9]:
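# Read the Google Cloud project ID from Colab secrets (key: PROJECT_ID)
PROJECT_ID = userdata.get('PROJECT_ID')
# Note: REGION is not referenced again; the cells below pass their own locations
REGION = "us-east4"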



In [10]:
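# Gen AI SDK client, using the Vertex AI backend in us-central1
client = genai.Client(vertexai=True, project=PROJECT_ID, location="us-central1")
# Vertex AI SDK, used by the RAG calls below, initialized in europe-west3
vertexai.init(project=PROJECT_ID, location="europe-west3")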

In [None]:
# Write each Markdown export to a temporary file, because the RAG corpus
# ingests local uploads or files in Colab or Google Drive rather than in-memory strings.
import tempfile

file_path = []

for source, markdown_text in pdf_markdown.items():
  # delete=False keeps the file on disk after the with-block closes it,
  # so its name can still be passed to the upload call later.
  # suffix='.md' marks the file as Markdown.
  with tempfile.NamedTemporaryFile(delete=False, mode='w', encoding='utf-8', suffix='.md') as tmp_file:
      tmp_file.write(markdown_text)
      file_path.append(tmp_file.name)
  print(f"Temporary file for {source} created at: {tmp_file.name}")

print(file_path)
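
Because the temporary files are created with `delete=False`, they stay on disk until something removes them. A minimal cleanup sketch for use once the uploads have finished; `remove_files` is a hypothetical helper, not part of the notebook:

```python
import os

def remove_files(paths):
    """Delete each path that still exists and return the ones that were removed."""
    removed = []
    for p in paths:
        if os.path.exists(p):
            os.remove(p)
            removed.append(p)
    return removed
```

In the notebook this could be called as `remove_files(file_path)` after the corpus upload step.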

In [None]:
loc_rag = "europe-west3"

rag_engine_config = rag.rag_data.get_rag_engine_config(
    name=f"projects/{PROJECT_ID}/locations/{loc_rag}/ragEngineConfig"
)

print(rag_engine_config)
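
The resource name above embeds a location, and the notebook names locations in several places (`REGION`, the `genai.Client` call, `vertexai.init`, and `loc_rag`). One way to keep them from drifting apart is to hold the project and RAG location in a single object and derive resource names from it. A minimal sketch; `GcpSettings` and its fields are hypothetical names, and only the name format is taken from the cell above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GcpSettings:
    """Single source of truth for the project and the RAG location."""
    project_id: str
    rag_location: str

    @property
    def rag_engine_config_name(self):
        # Same format as the name passed to get_rag_engine_config above
        return (
            f"projects/{self.project_id}"
            f"/locations/{self.rag_location}/ragEngineConfig"
        )
```

With this in place, the location passed to `vertexai.init` and the one used in the resource name would both come from `rag_location`.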

In [17]:

# Set the managed RAG database tier for this project and location to Basic
new_rag_engine_config = rag.RagEngineConfig(
    name=f"projects/{PROJECT_ID}/locations/{loc_rag}/ragEngineConfig",
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Basic()),
)

updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)

# Create the RAG corpus, embedding chunks with text-embedding-005
embedding_model_config = rag.RagEmbeddingModelConfig(
    vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
        publisher_model='publishers/google/models/text-embedding-005')
)

rag_corpus = rag.create_corpus(
    display_name='compliance_corpus',
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=embedding_model_config
    )
)

# Upload a file to the corpus. Only file_path[0] is uploaded here, which is the
# first converted document (osfi_tprm.pdf); repeat the call for the other
# entries in file_path to ingest them as well.
rag_file = rag.upload_file(
    rag_corpus.name,
    file_path[0],
    transformation_config=rag.TransformationConfig(
        chunking_config=rag.ChunkingConfig(
            chunk_size=400,
            chunk_overlap=100
        )
    )
)
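
To make `chunk_size = 400` and `chunk_overlap = 100` concrete, the sketch below splits a list of tokens into overlapping windows. It is an illustration only: whitespace-separated words stand in for tokens, and the service's actual tokenizer and splitting rules are not shown in this notebook. `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks

words = [f"w{i}" for i in range(1000)]
chunks = chunk_tokens(words, chunk_size=400, chunk_overlap=100)
print(len(chunks), [len(c) for c in chunks])  # prints: 3 [400, 400, 400]
```

Each window starts 300 tokens after the previous one, so the last 100 tokens of one chunk reappear as the first 100 of the next. The overlap gives a passage that straddles a boundary a chance to appear whole in at least one chunk.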

In [18]:

# Create RAG retrieval config: up to 3 chunks per query, with a vector distance threshold of 0.5
rag_retrieval_config = rag.RagRetrievalConfig(
    top_k = 3,
    filter = rag.Filter(vector_distance_threshold=0.5)
)

In [19]:
# Create RAG retrieval tool
rag_retrieval_tool = Tool.from_retrieval(
    retrieval = rag.Retrieval(
        source = rag.VertexRagStore(
            rag_resources = [
                rag.RagResource(
                    rag_corpus = rag_corpus.name
                )
            ],
            rag_retrieval_config=rag_retrieval_config,
        ),
    )
)
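
To make `top_k` and `vector_distance_threshold` concrete, the sketch below ranks a few toy vectors against a query. It assumes cosine distance and a strict less-than comparison against the threshold; the notebook does not show which metric the managed vector database uses, so treat both as assumptions. `cosine_distance` and `retrieve` are hypothetical helpers.

```python
import math

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def retrieve(query_vec, chunk_vecs, top_k, distance_threshold):
    """Return (index, distance) pairs for the closest chunks under the threshold."""
    scored = [(i, cosine_distance(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    # Drop chunks that are too far from the query, then keep the top_k closest
    kept = [pair for pair in scored if pair[1] < distance_threshold]
    return sorted(kept, key=lambda pair: pair[1])[:top_k]

toy_chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]
hits = retrieve([1.0, 0.0], toy_chunks, top_k=3, distance_threshold=0.5)
print([index for index, _ in hits])  # prints: [0, 3, 2]
```

The orthogonal vector at index 1 has distance 1.0 and is filtered out by the threshold, so it is excluded even though `top_k` would have had room for it.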

In [20]:
# Create the Gemini model instance, with the RAG retrieval tool attached, to use as a RAG chatbot
rag_model = GenerativeModel(
    model_name = "gemini-2.5-flash",
    tools = [rag_retrieval_tool]
)



In [31]:
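# Query and document to evaluate.
# Pass the BCP's Markdown text rather than the whole dict, whose repr would
# add the file path and escaped newlines to the prompt.
paylink_text = next(iter(paylink_markdown.values()))
respuser_query = f"""Evaluate the PayLink BCP document below against the operational resilience criteria
set out in the OSFI Resilience Guidelines document. Identify the document's strengths and where control weaknesses exist.

{paylink_text}"""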
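
When a prompt is assembled with an f-string, interpolating a dictionary inserts its `repr` (braces, key, and escaped newlines) rather than the document text. The sketch below shows the difference and a small prompt builder that takes the text directly and wraps it in delimiters. `build_evaluation_prompt` is a hypothetical helper, and the sample document content is made up for illustration:

```python
def build_evaluation_prompt(document_name, document_markdown, guideline_name):
    """Build a prompt that embeds the document text between clear delimiters."""
    return (
        f"Evaluate the document '{document_name}' against the criteria in "
        f"{guideline_name}. List the document's strengths and any control "
        "weaknesses.\n\n"
        f"--- BEGIN {document_name} ---\n"
        f"{document_markdown}\n"
        f"--- END {document_name} ---"
    )

# Made-up sample content, only to show the two behaviours side by side
sample = {"PayLink_BCP.pdf": "# Plan\nRecovery steps"}
as_dict = f"{sample}"  # contains the key and a literal backslash-n
as_text = build_evaluation_prompt(
    "PayLink_BCP.pdf", sample["PayLink_BCP.pdf"], "the OSFI Resilience Guidelines"
)
```

Here `as_dict` carries the dictionary syntax into the prompt, while `as_text` keeps the Markdown line breaks intact and tells the model where the document starts and ends.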

In [32]:

#Generate the response
response = rag_model.generate_content(respuser_query)

In [33]:
print(response)


In [34]:
display(Markdown("### Comparison"))
display(Markdown(response.text))

### Comparison

PayLink's Business Continuity Plan (BCP) demonstrates several strengths in alignment with general resilience principles, but also has areas for improvement when evaluated against OSFI Resilience Guidelines for Financial Institutions (FRFI) concerning third-party arrangements.

**Strengths:**
*   **Clear Objectives and Scope:** The BCP clearly outlines its purpose, objectives, and critical business functions with defined Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
*   **Risk Identification:** It identifies key risks, including third-party service failure, cybersecurity incidents, and cloud outages.
*   **Robust Recovery Strategies:** The plan details multiple recovery strategies, such as redundant cloud infrastructure, database replication, automated failover, dual data centers, and backup vendor contracts, indicating strong internal resilience.
*   **Structured Testing and Maintenance:** PayLink commits to regular testing (quarterly tabletop, bi-annual failover drills, annual full BCP test) and semi-annual reviews, demonstrating a proactive approach to maintaining the plan's effectiveness.
*   **Communication Protocols:** It includes specific communication protocols for internal and external stakeholders, including regulator notifications.
*   **Compliance Awareness:** The BCP notes compliance with various industry standards (ISO 22301, PCI DSS, SOC 2), indicating a commitment to established best practices.

**Areas for Improvement / Control Weaknesses against OSFI Guidelines:**
*   **Third-Party Agreement Requirements (OSFI Section 2.3.4.1):** While PayLink mentions "backup vendor contracts," the BCP does not explicitly detail how its own agreements with its third-party providers *require* them to outline continuity measures, test regularly, notify PayLink of test results, or address material deficiencies.
*   **FRFI's Access to Records/Information (OSFI Section 2.3.4.1):** The BCP does not specify how PayLink ensures its clients (FRFIs) have possession of, or ready access to, necessary records to sustain their business operations, meet statutory obligations, and provide information to OSFI in the event of a disruption to PayLink's services.
*   **Joint Design and Testing (OSFI Section 2.3.4.1):** The BCP does not indicate if joint design and testing of business continuity plans are considered or performed between PayLink and its FRFI clients, commensurate with service criticality.
*   **Comprehensive Contingency and Exit Strategies (OSFI Section 2.3.5.1):** PayLink's BCP lacks a dedicated section or detailed plans for contingency and exit strategies specific to its critical third-party arrangements. OSFI guidelines require FRFIs to establish such plans, including triggers for invocation, activities for stressed/non-stressed exit, reference to contractual provisions, and sufficient detail to allow rapid execution. While "backup vendor contracts" are mentioned, this does not fulfill the comprehensive requirements for an exit strategy as outlined by OSFI.
*   **Addressing Severe/Plausible Scenarios for Third-Party Failure (OSFI Section 2.3.4.1 & 2.3.5.1):** While PayLink addresses general severe scenarios for its own operations, the BCP does not explicitly elaborate on how it addresses prolonged disruptions or multiple simultaneous disruptions specifically arising from its *own critical third-party service providers*, particularly in the context of ensuring continuity for its FRFI clients.
*   **Facilitation of OSFI/FRFI Evaluation and Audits (OSFI Section 2.3.4):** The BCP does not explicitly describe how PayLink facilitates its FRFI clients' or OSFI's ability to evaluate risks arising from their arrangement, appoint independent auditors, or access audit reports related to the services PayLink provides.