In [1]:
import nest_asyncio

nest_asyncio.apply()

In [2]:
from llama_index.core import SimpleDirectoryReader

# Load the PDF; the reader returns one document per page
reader = SimpleDirectoryReader(input_files=["../data/master-direction-rbi/Digital Payment Security Controls.pdf"])
docs = reader.load_data()

In [3]:
print(docs[0].metadata)

{'page_label': '1', 'file_name': 'Digital Payment Security Controls.pdf', 'file_path': '../data/master-direction-rbi/Digital Payment Security Controls.pdf', 'file_type': 'application/pdf', 'file_size': 267719, 'creation_date': '2025-01-17', 'last_modified_date': '2025-01-17'}


In [4]:
# Configure the LLM and the embedding model for the RAG pipeline
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core import Settings
from llama_index.llms.mistralai import MistralAI
import os
from dotenv import load_dotenv

# Read MISTRAL_API_KEY from a .env file
load_dotenv()
api_key = os.getenv("MISTRAL_API_KEY")

llm = MistralAI(api_key=api_key)
# Embeddings come from an Ollama server (local by default)
embed_model = OllamaEmbedding(model_name="mxbai-embed-large")
Settings.llm = llm
Settings.embed_model = embed_model
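
`os.getenv` returns `None` when the variable is missing, which may only surface later as a less obvious error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of `python-dotenv` or llama_index.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file.")
    return value


# Hypothetical usage in the cell above:
# api_key = require_env("MISTRAL_API_KEY")
```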

In [5]:
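# Build the vector store index, or load it from disk if it was persisted earlier.
# Note: passing documents straight to the constructor indexes each page as one node;
# VectorStoreIndex.from_documents(docs) would split them into chunks first.
index_directory = '../data/master-direction-rbi/index'
persist_dir = os.path.join(index_directory, 'md-rbi-structured')
if not os.path.exists(persist_dir):
    index = VectorStoreIndex(docs)
    index.storage_context.persist(persist_dir=persist_dir)
else:
    index = load_index_from_storage(StorageContext.from_defaults(persist_dir=persist_dir))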

In [6]:
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

reranker = FlagEmbeddingReranker(
    top_n=5,
    model="BAAI/bge-reranker-large"
)
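
Conceptually, the reranker re-scores each retrieved passage against the query and keeps the `top_n` best. The sketch below illustrates that step only: a toy word-overlap score stands in for the `BAAI/bge-reranker-large` model, and `overlap_score` and `rerank` are hypothetical names, not llama_index APIs.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that also appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(
    query: str,
    passages: List[str],
    top_n: int,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[float, str]]:
    """Score every passage against the query and return the top_n, best first."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    # sort() is stable, so passages with equal scores keep their retrieval order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


candidates = [
    "mobile app inactivity",
    "card security",
    "card payment security controls",
]
for score, passage in rerank("card payment security", candidates, top_n=2):
    print(f"{score:.2f}  {passage}")
```

Note that the query engine further down retrieves with `similarity_top_k=5` and this reranker keeps `top_n=5`, so here the reranker can reorder the five retrieved nodes but has nothing to drop. Retrieving more candidates than `top_n` would be needed for it to filter.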

In [7]:
from pydantic import BaseModel, Field
from typing import List

class Output(BaseModel):
    """Output containing the response, page numbers, and confidence."""

    response: str = Field(..., description="The answer to the question.")
    page_numbers: List[int] = Field(..., description="The page numbers of the sources used to answer this question. Do not include a page number if the context is irrelevant.")
    confidence: float = Field(..., description="Confidence value between 0-1 of the correctness of the result.")
    confidence_explanation: str = Field(..., description="Explanation for the confidence score.")

# Wrap the LLM so that its replies are parsed into the Output schema
sllm = llm.as_structured_llm(output_cls=Output)
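
The `description` strings guide the model but do not enforce anything, so a confidence outside 0-1 would still pass the schema as written. A minimal sketch of a post-hoc check on the parsed payload, using only the standard library; `REQUIRED_FIELDS` and `check_output` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List

# Field names and types mirror the Output model above
REQUIRED_FIELDS = {
    "response": str,
    "page_numbers": list,
    "confidence": (int, float),
    "confidence_explanation": str,
}


def check_output(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed payload (empty list means it looks fine)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for field: {name}")
    confidence = payload.get("confidence")
    if isinstance(confidence, (int, float)) and not 0.0 <= confidence <= 1.0:
        problems.append("confidence outside [0, 1]")
    return problems
```

An alternative is to put the bound in the schema itself with pydantic's numeric constraints, for example `Field(..., ge=0, le=1)`, so that out-of-range values fail validation.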

In [8]:
# Build the query engine: retrieve, rerank, then synthesize a structured answer
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[reranker],
    llm=sllm,
    response_mode="tree_summarize"
)
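
`response_mode="tree_summarize"` builds the answer bottom-up: groups of chunks are summarized, then the summaries are summarized again until a single answer remains. The sketch below is a simplified illustration of that recursion, not the library's implementation: a plain `summarize` callable stands in for the LLM call, and a fixed `fan_out` replaces whatever packing strategy the library actually applies to fit chunks into a prompt.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_out: int = 2,
) -> str:
    """Repeatedly summarize groups of `fan_out` texts until one text is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    level = list(chunks)
    while True:
        # One pass: replace each group of texts with its summary
        level = [summarize(level[i:i + fan_out]) for i in range(0, len(level), fan_out)]
        if len(level) == 1:
            return level[0]


def join_group(group: List[str]) -> str:
    """Stand-in for an LLM summary: wrap the group so the tree shape stays visible."""
    return "(" + "+".join(group) + ")"


print(tree_summarize(["a", "b", "c"], join_group))
```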

In [9]:
response = query_engine.query("What controls are suggested for card payments security?")
print(str(response))

{"response":"Several controls are suggested for card payments security:\n\n1. **Payment Card Standards**: Implement various payment card standards such as PCI-PIN, PCI-PTS, PCI-HSM, and PCI-P2PE to ensure comprehensive payment card security.\n\n2. **Terminal Validation**: Ensure that terminals installed at merchants for capturing card details are validated against the PCI-P2PE program and that PoS terminals with PIN entry are approved by the PCI-PTS program.\n\n3. **Infrastructure Security**: Secure card payment infrastructure using methods like Unique Key Per Terminal (UKPT), Derived Unique Key Per Transaction (DUKPT), or Terminal Line Encryption (TLE).\n\n4. **Hardware Security Module (HSM) Controls**: Implement specific security controls for HSMs, including logging, access control, and secure backups.\n\n5. **Staff Training**: Educate and train staff, especially those in fraud control, about fraud analysis, investigative techniques, and fraud prevention measures.","page_numbers":[18

In [10]:
import json

import pandas as pd
from IPython.display import Markdown, display


def result_display(response):
    """Render the structured response (a JSON string) as Markdown."""
    raw_json = json.loads(str(response))
    display(Markdown('### Response:\n ' + raw_json['response']))
    display(Markdown('**Page Source:** ' + str(raw_json['page_numbers'])))
    display(Markdown('**Confidence:** ' + str(raw_json['confidence'])))
    display(Markdown('**Explanation**: ' + raw_json['confidence_explanation']))


# Load the same structured response into a one-row DataFrame
df = pd.DataFrame([json.loads(str(response))])
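
`json.loads` raises `json.JSONDecodeError` if the model returns malformed or cut-off JSON. A small sketch of a tolerant parser that could sit in front of the display code; `parse_structured` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Optional


def parse_structured(text: str) -> Optional[Dict[str, Any]]:
    """Parse a JSON object from text; return None if it is invalid or not an object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Hypothetical usage:
# payload = parse_structured(str(response))
# if payload is None:
#     print("Model did not return a valid JSON object")
```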

In [11]:
# Show page numbers as a comma-separated string instead of a list
df['page_numbers'] = df['page_numbers'].apply(lambda x: ', '.join(map(str, x)))
df

   response                                           page_numbers  confidence  confidence_explanation
0  Several controls are suggested for card paymen...  18, 12        0.95        The response is based on the specific sections...


In [12]:
result_display(response)

### Response:
 Several controls are suggested for card payments security:

1. **Payment Card Standards**: Implement various payment card standards such as PCI-PIN, PCI-PTS, PCI-HSM, and PCI-P2PE to ensure comprehensive payment card security.

2. **Terminal Validation**: Ensure that terminals installed at merchants for capturing card details are validated against the PCI-P2PE program and that PoS terminals with PIN entry are approved by the PCI-PTS program.

3. **Infrastructure Security**: Secure card payment infrastructure using methods like Unique Key Per Terminal (UKPT), Derived Unique Key Per Transaction (DUKPT), or Terminal Line Encryption (TLE).

4. **Hardware Security Module (HSM) Controls**: Implement specific security controls for HSMs, including logging, access control, and secure backups.

5. **Staff Training**: Educate and train staff, especially those in fraud control, about fraud analysis, investigative techniques, and fraud prevention measures.

**Page Source:** [18, 12]

**Confidence:** 0.95

**Explanation**: The response is based on the specific sections of the document that detail the security controls for card payments, including the implementation of payment card standards, terminal validation, infrastructure security, HSM controls, and staff training.

In [16]:
# Next query: multi-factor authentication
response = query_engine.query("What provisions related to multi factor authentication are mentioned in documents?")
result_display(response)

### Response:
 Multi-factor authentication (MFA) is required for electronic payments, fund transfers, and cash withdrawals from ATMs or digital payment applications. At least one authentication method should be dynamic or non-replicable, such as One Time Password (OTP), mobile devices (device binding and SIM), biometric/PKI/hardware tokens, or EMV chip cards with server-side verification. Adaptive authentication can be used based on risk assessment and user behavior. MFA and alerts should be applied to all payment transactions, account linkages, and changes to account details or fund transfer limits. Measures should be implemented to minimize exposure to man-in-the-middle attacks, and authenticated sessions should remain intact throughout the interaction. The maximum number of failed login or authentication attempts should be set, after which access is blocked. Device binding of mobile applications is required, and alternatives to SMS-based OTP authentication mechanisms should be considered. Mobile applications should require re-authentication after periods of inactivity and when connecting to new or unsecured networks.

**Page Source:** [10, 11, 17]

**Confidence:** 1.0

**Explanation**: All the points in the response are present in the given pages.
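
The confidence score and the page list are both self-reported by the model, so a value of 1.0 is not independent evidence. One cheap sanity check is to confirm that every reported page was actually among the retrieved nodes. The sketch below assumes page labels are available as strings, as in the `page_label` metadata printed earlier; `unsupported_pages` is a hypothetical helper, and reading the labels from `response.source_nodes` is an assumption about the response object rather than something shown in this notebook.

```python
from typing import Iterable, List


def unsupported_pages(reported: Iterable[int], retrieved_labels: Iterable[str]) -> List[int]:
    """Return reported page numbers that do not match any retrieved page label."""
    retrieved = set()
    for label in retrieved_labels:
        # Labels are strings; skip non-numeric ones (for example roman numerals)
        if label.strip().isdigit():
            retrieved.add(int(label))
    return sorted(page for page in set(reported) if page not in retrieved)


# Hypothetical usage, assuming each source node exposes its metadata:
# labels = [n.node.metadata.get("page_label", "") for n in response.source_nodes]
# print(unsupported_pages(json.loads(str(response))["page_numbers"], labels))
```

An empty result only shows that the cited pages were retrieved; it does not confirm that the answer is faithful to them.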

In [15]:
# Next query: encryption
response = query_engine.query("What provisions related to encryption are outlined in the documents?")
result_display(response)

### Response:
 The documents outline several provisions related to encryption:

1. **General Encryption Standards**: The encryption standards, key lengths, algorithms, and protocols used must be strong and adhere to internationally accepted standards that are not deprecated or demonstrated to be insecure.

2. **Mobile Application Security**: Mobile applications should not store sensitive information in unencrypted form. Any sensitive information written to temporary files must be encrypted or masked securely.

3. **Database Security**: Sensitive information should be written to the database in an encrypted form to protect against vulnerabilities such as SQL injection.

4. **Communication Protocols**: The communication protocol in digital payment channels, especially over the Internet, must adhere to a secure standard with appropriate levels of encryption.

5. **Certificate Management**: Digital certificates used in the digital payment ecosystem should be renewed well in time to ensure continuous security.

**Page Source:** [17, 7]

**Confidence:** 0.95

**Explanation**: The confidence score is high because the document explicitly mentions several provisions related to encryption, including standards for encryption, mobile application security, database security, communication protocols, and certificate management.