<a href="https://colab.research.google.com/github/MAY2704/ML_QEA_usecases/blob/main/KYC__use_case1_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install spaCy and download the small English model
!pip install spacy
import spacy
!python -m spacy download en_core_web_sm

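from transformers import pipeline  # Hugging Face pipeline for the generative model
# Note: facebook/bart-base is an encoder-decoder checkpoint. Under "text-generation" it is
# loaded as BartForCausalLM with some weights newly initialized (see the warning in the output).
bart_model = pipeline("text-generation", model="facebook/bart-base")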

# Knowledge base definition
knowledge_base = {
  "customer_types": {
    "individual": {
      "definition": "A natural person acting outside their trade, business, craft or profession.",
      "verification_methods": ["government-issued ID (passport, driving license)", "proof of residence (utility bill, bank statement)"],
      "risk_factors": ["politically exposed person (PEP)", "high net worth individual (HNWI)"]
    },
    "company": {
      "definition": "A legal entity formed for the purpose of conducting business.",
      "verification_methods": ["company registration documents", "beneficial ownership information"],
        "risk_factors": [
    "shell companies",
    "countries with high money laundering risk",
    "cash-intensive business",
    "frequent transactions with sanctioned entities or high-risk countries",
    "negative news articles or regulatory actions",
    "complex corporate structures or offshore jurisdictions",
    "unexplained or suspicious activity patterns (e.g., sudden surge in transactions, rapid turnover of employees)",
    "licensing or regulatory compliance issues"]
    }
  },
  "actions": {
    "verify_identity": {
      "description": "The process of confirming a customer's identity.",
      "associated_concepts": ["customer_types", "verification_methods"]
    },
    "open_account": {
      "description": "The process of opening an account.",
      "associated_concepts": ["account opening"]
    },
    "assess_risk": {
      "description": "The process of evaluating a customer's money laundering and terrorist financing risk.",
      "associated_concepts": ["customer_types", "risk_factors"]
    }
  },
  "aml_kyc_regulations": {
    "5th_aml_directive": {
      "description": "EU Directive 2018/843 (5th AML Directive) on the prevention of money laundering and terrorist financing.",
      "applicability": "All financial institutions operating in the EU."
    },
    "dutch_wwft": {
      "description": "Wet ter voorkoming van witwassen en financieren van terrorisme (Wwft - Dutch Money Laundering and Terrorist Financing Prevention Act).",
      "applicability": "Financial institutions incorporated in the Netherlands."
    }
  }
}

# Get user input for KYC user story
user_story = input("Enter a user story for test generation: ")

# Combine processing and explanation (assuming GenAI models handle context)
def process_explain(user_story):
  nlp = spacy.load("en_core_web_sm")
  doc = nlp(user_story)
  customer_type = None
  action = None
  for ent in doc.ents:
    if ent.label_ == "ORG":
      customer_type = "company"
  # Look for keywords in the user story to infer the action.
  # Keys match the entries under knowledge_base["actions"].
  action_verbs = {
      "verify_identity": ["verify", "confirm"],
      "open_account": ["open", "establish"],
      "assess_risk": ["assess", "evaluate"]
  }
  story_lower = user_story.lower()
  for action_type, verbs in action_verbs.items():
    if any(verb in story_lower for verb in verbs):
      action = action_type
      break  # Stop at the first action whose keyword appears in the story

  # Build the prompt from the inferred customer type and action, then generate the
  # explanation once, reusing the pipeline loaded at the top of the cell.
  explanation_prompt = f"User story: {user_story}. Customer type: {customer_type}. Action: {action}. Explain the reasoning based on KYC regulations in the knowledge base."
  explanation_predictions = bart_model(explanation_prompt, max_length=100, num_return_sequences=1)
  explanation = explanation_predictions[0]["generated_text"].strip()

  return {"customer_type": customer_type, "action": action, "explanation": explanation}

# Process user story and get results
processed_story = process_explain(user_story)
customer_type = processed_story["customer_type"]
action = processed_story["action"]
explanation = processed_story["explanation"]

print(f"Explanation: {explanation}")

# Generate test case (using knowledge base)
def generate_test_case(user_story, customer_type, action, knowledge_base):
  test_case = []
  verification_methods, risk_factors = get_verification_methods_and_risk_factors(action, customer_type, knowledge_base)
  test_case.append(f"**User Story:** {user_story}")
  if customer_type:
    test_case.append(f"**Customer Type:** {customer_type}")
  test_case.append(f"**Action:** {action}")
  if verification_methods:
    test_case.append("Verify:")
    for method in verification_methods:
      test_case.append(f"- {method}")
  else:
    test_case.append("Use risk assessment parameters.")
  if risk_factors:
    test_case.append("Assess risk based on:")
    for factor in risk_factors:
      test_case.append(f"- {factor}")
  test_case.append("Record KYC information.")
  test_case.append("Determine customer risk level.")
  return test_case

# Access data from knowledge base

def get_verification_methods_and_risk_factors(action, customer_type, knowledge_base):
  """
  This function retrieves verification methods and risk factors from the knowledge base
  based on the provided action and customer type.

  Args:
      action (str): The action to be performed (e.g., verify_identity, open_account).
      customer_type (str): The customer type (e.g., individual, company).
      knowledge_base (dict): The dictionary containing KYC knowledge base information.

  Returns:
      tuple: A tuple containing two lists:
          - verification_methods (list): List of verification methods for the action and customer type.
          - risk_factors (list): List of risk factors for the customer type.
  """

  verification_methods = []
  risk_factors = []

  # Access knowledge base based on action and customer type
  if action in knowledge_base["actions"]:
    action_info = knowledge_base["actions"][action]
    # Check if associated concepts include "customer_types"
    if "customer_types" in action_info["associated_concepts"]:
      # Access verification methods based on customer type
      if customer_type in knowledge_base["customer_types"]:
        verification_methods = knowledge_base["customer_types"][customer_type]["verification_methods"]
        print(f"Retrieved verification methods for {customer_type}: {verification_methods}")  # Added print statement for debugging

  # Access risk factors based on customer type
  if customer_type in knowledge_base["customer_types"]:
    risk_factors = knowledge_base["customer_types"][customer_type]["risk_factors"]

  return verification_methods, risk_factors

# Generate and print test case
test_case = generate_test_case(user_story, customer_type, action, knowledge_base)
print("Test Case:")
for step in test_case:
  print(step)
print("Test case generated")
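
The run recorded in the output reports `Action: None` because none of the listed verbs occurs in the input story, and plain substring matching can also misfire on longer words (for example, "open" inside "reopen"). The block below is a minimal sketch of one alternative, assuming word-prefix matching and a first-match-wins policy. The helper name `infer_action`, the table `ACTION_STEMS`, and the extra "onboard" stem are hypothetical additions for illustration, not part of the notebook code.

```python
import re

# Hypothetical stem table: keys mirror the "actions" entries of the knowledge base.
# The "onboard" stem is an illustrative assumption, not part of the original mapping.
ACTION_STEMS = {
    "verify_identity": ("verif", "confirm"),
    "open_account": ("open", "establish", "onboard"),
    "assess_risk": ("assess", "evaluat"),
}


def infer_action(user_story):
    """Return the first action whose stem starts a word in the story, else None."""
    # Split the story into lowercase alphabetic words.
    words = re.findall(r"[a-z]+", user_story.lower())
    for action, stems in ACTION_STEMS.items():
        # str.startswith accepts a tuple, so each word is checked against every stem.
        if any(word.startswith(stems) for word in words):
            return action
    return None


print(infer_action("A new company needs to be onboarded prepare KYC test case"))  # open_account
print(infer_action("Please reopen the ticket"))  # None
```

Matching on word prefixes keeps inflected forms such as "onboarded" or "verifying" while rejecting words that merely contain a keyword. Whether onboarding should map to `open_account` is a modelling choice that the knowledge base would need to confirm.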





Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 22.8 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['decoder.embed_tokens.weight', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Enter a user story for test generation: A new company needs to be onboarded prepare KYC test case


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['decoder.embed_tokens.weight', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Explanation: User story: A new company needs to be onboarded prepare KYC test case. Customer type: company. Action: None. Explain the reasoning based on KYC regulations in the knowledge base.asures ont ontasuresasuresinstall mandate plotting invollinkslinkslinks ontasures ont plottingasures plotting plotting invol plottinglinks invol invollinks invollinks485linkslinks invol formally formally plotting plottingSIZEasuresasuresasures migrating migrating involdiagn plottingSIZE invol invol plotting invol involdiagnasuresasures Ships involasuresasures invol invol invol
Test Case:
**User Story:** A new company needs to be onboarded prepare KYC test case
**Customer Type:** company
**Action:** None
Use risk assesment parameters.
Assess risk based on:
- shell companies
- countries with high money laundering risk
- cash-intensive business
- frequent transactions with sanctioned entities or high-risk countries
- negative news articles or regulatory actions
- complex corporate structures or offsho