# Use Case 6.0: Automating Text-filling tasks with LLMs

### Problem statement
  The dashboard includes a function for communicating with loan applicants about the following matters:
  - Scheduling physical meetings for discussing loan contracts
  - Remediation of documentation discrepancies (i.e. missing documents, wrong application type, years at current job contradicting the entered year, etc.)
  - Short-hand remediation by means of discussing top risk factors and how applicants can improve their loan risk or creditworthiness score

However, this task is rather taxing on officers, as time still needs to be spent crafting a message that can represent the company.  
  
From personal experience, banks and financial authorities have come up with a weak solution to this: a GUI for selecting pre-written text snippets for standardised tasks. This reduces officers' workload and goes some way towards standardising enterprise-ready responses, but it is still rather troublesome.

----

### Proposed solution and workflow

We can automate these tasks with the use of LLMs via the following workflow:

1. An officer summarises a few points that they want to convey to the user
2. The officer then optionally selects a text snippet they'd like to use to respond to the user and "control" generation
3. A preprocessing module takes this information and parses it into a prompt to be fed into an LLM. If a text snippet has already been selected, the LLM will use it as a reference; otherwise we can use RAG to select the most appropriate text snippet for us.
4. The LLM then generates a polite response from the prompt and presents it to the officer, who can then vet it before sending it to the applicant.
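
Step 3 of the workflow above could look roughly like the sketch below. It is only an illustration: `build_prompt`, its parameters, and the prompt wording are hypothetical names chosen here, and the retriever is a stub standing in for the RAG step.

```python
from typing import Callable, Optional


def build_prompt(
    officer_points: list[str],
    snippet: Optional[str] = None,
    retrieve_snippet: Optional[Callable[[str], str]] = None,
) -> str:
    """Assemble an LLM prompt from the officer's notes and a reference snippet."""
    if not officer_points:
        raise ValueError("The officer must provide at least one point to convey.")

    # Use the officer's chosen snippet if given, otherwise fall back to retrieval.
    if snippet is None and retrieve_snippet is not None:
        snippet = retrieve_snippet(" ".join(officer_points))

    points = "\n".join(f"- {p}" for p in officer_points)
    prompt = (
        "You are a loan officer writing a polite email to a loan applicant.\n"
        f"Points to convey:\n{points}\n"
    )
    if snippet:
        prompt += f"\nUse this template as a reference:\n{snippet}\n"
    return prompt


# Hypothetical usage with a stub retriever standing in for the RAG step.
example = build_prompt(
    ["schedule meeting for loan discussion"],
    retrieve_snippet=lambda query: "Subject: Scheduling Your Personal Loan Discussion",
)
print(example)
```

Keeping the retriever as an injected callable means the same function works whether the snippet comes from the officer, from a vector DB, or from nowhere at all.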

----
### Impact and risks

Doing so will enable the officer to work more efficiently: the content is generated automatically, and their task shifts to assessing whether it is suitable for communicating with applicants and editing it accordingly, which lightens their workload for such menial tasks.

Furthermore, there is evidence that [people spot errors more reliably when editing text than when reviewing their own writing, owing to cognitive bias and familiarity](https://www.mentalfloss.com/article/633063/reason-its-so-hard-spot-your-own-typos), so this would help mitigate human error when communicating with applicants.
  
However, we are aware that this may lead officers to send responses without properly reviewing them first, so it is important that they understand the ethical considerations of AI use and act accordingly.

----

### Possible areas for future improvements to be made:

1. Automated prompt engineering can be used to optimize the format of the prompt produced by the pre-processing module. We can adapt a workflow from [this article](https://medium.com/mantisnlp/automatic-prompt-engineering-part-ii-building-an-ape-workflow-88f2b2d68fc5) where:
  1. Variations of the prompt format are entered and the LLM's response to each is collected
  2. Another LLM judges the responses against selected metrics (to be decided by MLOps engineers), or the responses are simply scored with ROUGE (Recall-Oriented Understudy for Gisting Evaluation), since we are effectively performing a summarisation task
  3. The top responses are picked and Automatic Prompt Optimization ([proposed in this paper](https://arxiv.org/pdf/2305.03495.pdf)) is performed on our best prompts so far until we reach a satisfactory metric score.

2. We can fine-tune an LLM for our task using PEFT (Parameter-Efficient Fine-Tuning) and related memory-efficient techniques like [RoSA](https://arxiv.org/abs/2401.04679) and [GaLore](https://arxiv.org/abs/2403.03507) to obtain better responses.

3. This process can be further automated by inferring what changes need to be made from the information provided by use cases 3.0 and 3.1, where we can introduce reasoning modules and train AI agents to assess how best to address the top-k risk factors, while also accounting for the document summary and tailoring a strategy for improving loan viability.
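
As a rough illustration of the ROUGE-based scoring mentioned in item 1, the sketch below ranks prompt variants by a simplified ROUGE-1 recall computed on whitespace tokens. The function names and the example strings are made up for illustration, and a production setup would more likely rely on an established ROUGE implementation.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    if not ref_counts:
        return 0.0
    # Clip each token's overlap by how often it appears in the candidate.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


def rank_prompts(reference: str, responses: dict[str, str]) -> list[tuple[str, float]]:
    """Sort prompt variants by ROUGE-1 recall of their responses, best first."""
    scored = [(name, rouge1_recall(reference, text)) for name, text in responses.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical responses produced by two prompt variants.
reference = "please send the missing bank statements"
responses = {
    "prompt_a": "please send your bank statements",
    "prompt_b": "hello there",
}
print(rank_prompts(reference, responses))
```

The top-ranked variants would then be the ones carried forward into the optimization step.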

In [3]:
import pandas as pd

Note, the dataset of text snippets was AI generated with the following prompts:  

"  
you are a loan approval officer working at Krungsri, Bank of Ayudhya. Give me some specific examples that fall under your job scope, being:
- Scheduling physical meetings for discussing loan contracts
- Remediation of documentation discrepancies (ie missing documents, wrong application type, years at current job contradict entered year, etc)
- Short-hand remediation by means of discussing top risk factors and how applicants can improve their loan risk or creditworthiness score  
"  
  
Followed by:

"You are a loan approval officer working at Krungsri, Bank of Ayudhya. Write an email (in english) to a loan applicant representing the company informing them that the number of years that they have worked for their current job doesn't match the provided documentation in their loan application."  
  
  and
  
"now write an email addressing this issue: \<insert appropriate issue\>"

We used 5 such prompts, yielding 5 text snippets, to simulate information retrieval.

# Data Generation

In [4]:
df = pd.DataFrame(columns=['text_snippet'])
df.head()

Unnamed: 0,text_snippet


In [5]:
df.loc[len(df)]= r'''Subject: Discrepancy in Loan Application Documentation

Dear [Applicant's Name],

I hope this email finds you well.

Thank you for submitting your loan application to Krungsri, Bank of Ayudhya. We have reviewed your documents and noticed a discrepancy regarding the number of years you have worked in your current job, as stated in your application versus the supporting documentation provided.

To proceed with your application, we kindly request clarification or updated documentation that accurately reflects your employment history. You may provide an updated employment verification letter, recent pay slips, or any other official documents that confirm your tenure with your current employer.

Please submit the requested documents at your earliest convenience to [email address] or visit any Krungsri branch. If you have any questions or need assistance, feel free to contact us at [phone number].

We appreciate your prompt attention to this matter and look forward to assisting you further.

Best regards,
[Your Full Name]
Loan Approval Officer
Krungsri, Bank of Ayudhya
[Your Contact Information]
[Bank Address (optional)]'''

df.loc[len(df)]= r'''Subject: Scheduling Your Personal Loan Discussion – Convenient Meeting Options

Dear [Client's Name],

Thank you for considering Krungsri, Bank of Ayudhya for your personal loan needs. We appreciate the opportunity to assist you and understand your preference for an in-person meeting to discuss the terms in detail.

To ensure we address all aspects of your loan—including interest rates, repayment schedules, and any applicable fees—I have coordinated with your Relationship Manager, [RM's Name], to arrange a meeting at your convenience. We are happy to accommodate your schedule and can meet at any of our branches or a location of your preference.

Please let us know your availability over the next few days, and we will confirm the appointment promptly. Alternatively, you may contact [RM's Name] directly at [RM's Email] or [RM's Phone Number] to schedule a time that works best for you.

We look forward to providing you with tailored solutions that align with your financial objectives. Should you have any immediate questions, feel free to reach out.

Warm regards,
[Your Full Name]
[Your Job Title]
Krungsri, Bank of Ayudhya
[Your Contact Information]
[Bank Address (if applicable)]'''

df.loc[len(df)]= r'''Subject: Action Required: Resolving Discrepancies in Your Loan Application

Dear [Applicant's Name],

Thank you for submitting your loan application to Krungsri, Bank of Ayudhya. To proceed with your request, we need your assistance in resolving a few discrepancies to ensure a smooth and efficient approval process.

1. Missing Documents:
Your application is missing the last 6 months of bank statements, which are required for review. For faster processing, please refer to the attached checklist and submit the outstanding documents via [secure upload link/email/branch visit].

2. Incorrect Application Type:
It appears you selected "Personal Loan" instead of "Car Loan." To correct this, we recommend:

Canceling the current application (if you haven’t already).

Reapplying under the correct category using [correct application link].
Our team is happy to assist if you need guidance—just reply to this email or call us at [phone number].

3. Employment History Inconsistency:
Your application states 5 years at your current job, but the submitted tax documents reflect 3 years. To clarify, please provide:

A letter of explanation detailing the discrepancy, or

An updated employment certification from your employer.

Next Steps:
To avoid delays, kindly submit the missing/corrected documents by [deadline, if applicable]. If you have any questions or need assistance, feel free to contact us at [phone number] or [email].

We appreciate your prompt attention and look forward to finalizing your application.

Best regards,
[Your Full Name]
[Your Position]
Krungsri, Bank of Ayudhya
[Contact Information]'''

df.loc[len(df)]= r'''Subject: Improving Your Loan Eligibility – Key Recommendations

Dear [Applicant's Name],

Thank you for considering Krungsri, Bank of Ayudhya for your financial needs. After reviewing your application, we’ve identified a few factors that may impact your eligibility. Below are tailored recommendations to help strengthen your creditworthiness and increase your chances of approval:

1. Low Credit Score (CIC Score < 600)
Your credit history shows past late payments, which can affect your score. To improve it:

Reduce credit card balances to lower your credit utilization ratio (aim for <30% of your limit).

Avoid new credit applications for 3–6 months before reapplying, as multiple inquiries can further lower your score.

2. Unstable Income (Freelancers/Gig Workers)
As a self-employed/independent worker, we understand income variability. To better demonstrate stability:

Submit 12 months of bank statements (instead of the standard 3–6 months) to show consistent cash flow.

Consider adding a creditworthy co-signer (e.g., a spouse or family member with stable income) to strengthen your application.

3. High Debt-to-Income Ratio (DTI > 40%)
Your current debt obligations are high relative to your income. To lower your DTI:

Pay off small debts first (e.g., credit cards or personal loans) to reduce monthly payments.

Opt for a longer loan tenor (if applicable) to lower monthly payments—though this may increase total interest paid over time.

Next Steps
If you’d like to discuss these strategies in detail, we’d be happy to schedule a consultation.

Once adjustments are made, you may reapply with updated documentation for reconsideration.

We’re committed to helping you achieve your financial goals and are here to support you through this process. Feel free to reply to this email or call us at [phone number] for further guidance.

Best regards,
[Your Full Name]
[Your Position]
Krungsri, Bank of Ayudhya
[Contact Information]'''

df.loc[len(df)]= r'''Subject: Resolution of Collateral Valuation for Your Business Loan Application

Dear [Client's Name],

Thank you for your business loan application with Krungsri, Bank of Ayudhya. We appreciate the opportunity to support your restaurant’s expansion plans.

During our standard collateral review process, we identified a discrepancy in the valuation of your commercial property. While you estimated its value at THB 5 million, our internal appraisal currently places it at THB 4.2 million, resulting in a lower-than-expected Loan-to-Value (LTV) ratio.

Our Review & Findings
We thoroughly examined the valuation report to ensure fairness, including:

Assessing whether recent renovations or location advantages (e.g., proximity to BTS) were fully accounted for.

Comparing the appraisal with recent market transactions of similar properties in your area.

While we stand by our initial valuation, we want to work with you to find a viable solution.

Available Options
To proceed with your loan request, we can explore the following alternatives:

Adjust the Loan Amount

Approve a reduced amount based on the current LTV (e.g., THB 2.94 million at 70% LTV of THB 4.2 million).

Supplement with Additional Collateral

Pledge additional assets, such as equipment, another property, or a personal guarantee, to cover the shortfall.

Request a Re-Evaluation

If you believe our valuation does not reflect the property’s true worth, we can facilitate a third-party appraisal (at your cost) for reconsideration.

Alternative Loan Structure

If collateral remains insufficient, we could structure a blended loan (partially secured/unsecured) with adjusted terms.

Strengthening Your Application
To improve approval chances, you may also:

Provide updated financial statements (e.g., demonstrating revenue growth, which you’ve already shown at 20% YOY).

Submit a revised business plan highlighting future cash flow stability.

Next Steps
Based on our discussion, you’ve kindly agreed to supplement with a personal guarantee and have provided updated financials. This allows us to approve your loan at a revised LTV. The final terms will be shared in a separate offer letter.

We’re committed to supporting your business growth and are happy to clarify any details. Please reply to this email or call me directly at [phone number] to discuss further.

Best regards,
[Your Full Name]
[Your Position]
Krungsri, Bank of Ayudhya
[Contact Information]'''

df

Unnamed: 0,text_snippet
0,Subject: Discrepancy in Loan Application Docum...
1,Subject: Scheduling Your Personal Loan Discuss...
2,Subject: Action Required: Resolving Discrepanc...
3,Subject: Improving Your Loan Eligibility – Key...
4,Subject: Resolution of Collateral Valuation fo...


In [6]:
# index=False avoids writing the row index as an extra 'Unnamed: 0' column
df.to_csv('./text_snippets.csv', index=False)

# Creating vectorDB using ChromaDB

In [7]:
#installing modules
!pip install chromadb sentence-transformers
!pip install --upgrade huggingface_hub



In [8]:
import pandas as pd
import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer

First, we populate the bot with information that is already obtained prior to generation:

In [9]:
client = 'Somchai Prakong'
Officer = 'Somsak Phakdee'
position = 'Loan Officer'
contact_info = '+66 05-2137475'

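Next, we verify that the dataset is valid (in case it was not generated in the section above). It should contain a column named `text_snippet`; the extra `Unnamed` columns in the output below are leftover index columns from saving the CSV and can be ignored: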

In [10]:
df = pd.read_csv('https://raw.githubusercontent.com/Zypperman/DBTT_G1_GRP3/main/Data/text_snippets.csv')
df.head(3)

Unnamed: 0.1,Unnamed: 0,text_snippet
0,0,Subject: Discrepancy in Loan Application Docum...
1,1,Subject: Scheduling Your Personal Loan Discuss...
2,2,Subject: Action Required: Resolving Discrepanc...
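
The validity check described above can also be made explicit in code. The helper below is a minimal sketch (the name `validate_snippet_columns` is hypothetical); it works on any list of column names, so in the notebook it could be called as `validate_snippet_columns(df.columns)`.

```python
def validate_snippet_columns(columns, required="text_snippet"):
    """Raise if the required column is missing; return any extra columns."""
    columns = list(columns)
    if required not in columns:
        raise ValueError(f"Expected a '{required}' column, found: {columns}")
    # Extra columns (e.g. leftover index columns) are reported, not rejected.
    return [c for c in columns if c != required]


# Column names taken from the output above.
extras = validate_snippet_columns(["Unnamed: 0.1", "Unnamed: 0", "text_snippet"])
print(extras)  # ['Unnamed: 0.1', 'Unnamed: 0']
```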


Next, we create an embedding model and encode each text snippet before passing the embeddings into the vector DB for storage and retrieval.  
  
  Note: the embedding model can be swapped for a higher-quality one.

In [11]:
model = SentenceTransformer('BAAI/bge-small-en')
embeddings = model.encode(df['text_snippet'].tolist())

Now, we initialize ChromaDB:

In [12]:
# Named chroma_client so it does not overwrite the applicant name stored in `client`
chroma_client = chromadb.Client(Settings())
collection_name = 'snippet_search'
collection = chroma_client.get_or_create_collection(name=collection_name)

# Use each row's position in the dataframe as its id so results map back to df
ids = [str(i) for i in range(len(df))]
collection.add(ids=ids, embeddings=embeddings.tolist())

In [13]:
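# Retrieve the ids of the stored snippets most similar to the query
def vector_search(query, model, collection, top_n=3):
  # Embed the query with the same model used for the stored snippets
  query_embedding = model.encode(query)
  # ChromaDB expects a list of embeddings, so wrap the single query vector
  results = collection.query(query_embeddings=[query_embedding.tolist()], n_results=top_n)
  return results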
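
To make the retrieval step less of a black box, here is a brute-force sketch of what a nearest-neighbour query does conceptually. It is not how ChromaDB is implemented (ChromaDB uses an approximate index), the squared-L2 distance is an assumption about the metric in use, and the 2-D vectors are made-up toy values rather than real embeddings.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query_vec, stored, top_n=3):
    """Return the ids of the top_n stored vectors closest to query_vec."""
    ranked = sorted(stored, key=lambda item_id: squared_l2(query_vec, stored[item_id]))
    return ranked[:top_n]


# Toy 2-D "embeddings" keyed by snippet id (made-up values for illustration only).
toy_store = {"0": [0.9, 0.1], "1": [0.1, 0.9], "2": [0.5, 0.5]}
print(brute_force_search([0.2, 0.8], toy_store, top_n=2))  # ['1', '2']
```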

# LLM response parsing

We can now consider a situation where the officer has not provided a text snippet for responding to a client, so we have to retrieve an appropriate one.

In [14]:
query = 'schedule meeting for loan discussion' # officer keyed in this information
search_results = vector_search(query,model,collection)
print(search_results['ids'][0][0])

1


In practice, the vector DB will hold far more than these 5 snippets, and we will consider larger models for more precise embeddings, along with a formal database and SQL for retrieving the full snippet text. For now, we simply retrieve our snippet with pandas.

In [17]:
snippet = df.loc[int(search_results['ids'][0][0])]['text_snippet']
print(snippet)

Subject: Scheduling Your Personal Loan Discussion – Convenient Meeting Options

Dear [Client's Name],

Thank you for considering Krungsri, Bank of Ayudhya for your personal loan needs. We appreciate the opportunity to assist you and understand your preference for an in-person meeting to discuss the terms in detail.

To ensure we address all aspects of your loan—including interest rates, repayment schedules, and any applicable fees—I have coordinated with your Relationship Manager, [RM's Name], to arrange a meeting at your convenience. We are happy to accommodate your schedule and can meet at any of our branches or a location of your preference.

Please let us know your availability over the next few days, and we will confirm the appointment promptly. Alternatively, you may contact [RM's Name] directly at [RM's Email] or [RM's Phone Number] to schedule a time that works best for you.

We look forward to providing you with tailored solutions that align with your financial objectives. Sho

Next, we authenticate with Hugging Face to use the new Gemma-2b model:

In [15]:
import os
# Do not commit a real token to the notebook: paste a token from an account that you own, or load it from a secure store.
HF_ACCESS_TOKEN = '<your Hugging Face access token>'
os.environ['HUGGINGFACE_TOKEN'] = HF_ACCESS_TOKEN
!huggingface-cli login --token $HUGGINGFACE_TOKEN --add-to-git-credential


Token is valid (permission: fineGrained).
The token `access_gemma` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.

git config --global credential.helper store

Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.[0m
Token has not been saved to git credential helper.
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `access_gemma`


In [19]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
LLM = AutoModelForCausalLM.from_pretrained("google/gemma-2b")




In [25]:
input_text = f"""You are a loan officer, communicating with a loan applicant via email. here is an email template. Populate it with this key information:


[Client's Name] = Somchai Prakong
[Your Full Name] = 'Somsak Phakdee'
[Your Job Title] = 'Loan Officer'
[Your Contact Information] = '+66 05-2137475'
please replace the fields where applicable.

TEMPLATE:
{snippet}


"""
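# Tokenize the prompt into PyTorch tensors (input ids and attention mask)
inputs = tokenizer(input_text, return_tensors="pt")

# Generate a continuation with the model's default settings, then decode it back to text
outputs = LLM.generate(**inputs)
print(tokenizer.decode(outputs[0]))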

<bos>You are a loan officer, communicating with a loan applicant via email. here is an email template. Populate it with this key information:


[Client's Name] = Somchai Prakong
[Your Full Name] = 'Somsak Phakdee'
[Your Job Title] = 'Loan Officer'
[Your Contact Information] = '+66 05-2137475'
please replace the fields where applicable.

TEMPLATE:
Subject: Scheduling Your Personal Loan Discussion – Convenient Meeting Options

Dear [Client's Name],

Thank you for considering Krungsri, Bank of Ayudhya for your personal loan needs. We appreciate the opportunity to assist you and understand your preference for an in-person meeting to discuss the terms in detail.

To ensure we address all aspects of your loan—including interest rates, repayment schedules, and any applicable fees—I have coordinated with your Relationship Manager, [RM's Name], to arrange a meeting at your convenience. We are happy to accommodate your schedule and can meet at any of our branches or a location of your preference

While the generated text for this prompt isn't ideal (we wanted the placeholders replaced, but the output shown mostly repeats the prompt), we could tentatively run the system without an LLM and rely on manual text replacement rules. Furthermore, we can use stronger LLMs such as DeepSeek's R1-Zero or Qwen.
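
A minimal sketch of such manual replacement rules is shown below, assuming placeholders always take the `[Field Name]` form used in the snippets. The `fill_template` helper is a hypothetical name; it also reports fields it could not fill, so the officer knows what still needs attention before sending.

```python
import re

# Matches bracketed placeholders such as [Client's Name]
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Replace [Placeholder] fields and report the ones left unfilled."""
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key in values:
            return values[key]
        missing.append(key)
        return match.group(0)  # leave unknown placeholders untouched

    return PLACEHOLDER.sub(substitute, template), missing


values = {
    "Client's Name": "Somchai Prakong",
    "Your Full Name": "Somsak Phakdee",
    "Your Job Title": "Loan Officer",
    "Your Contact Information": "+66 05-2137475",
}
text, missing = fill_template(
    "Dear [Client's Name],\nContact [RM's Name].\n[Your Full Name]", values
)
print(text)
print(missing)  # ["RM's Name"]
```

Because the replacement is deterministic, this path never rewrites the template's wording, which may be preferable for standardised messages where only the fields change.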