# Bank Statement Analysis with Gen AI

## Table of Contents
1. Introduction
2. Examples
3. References and Further Reading

<a id='2'></a>

## 1. Introduction

Generative AI can support data engineering work, particularly the construction of efficient and accurate data pipelines, in several ways:

1. Code Generation: Automatically generating data engineering scripts based on high-level descriptions of tasks.
2. Optimization: Improving existing data engineering code based on performance feedback and best practices.
3. Schema Understanding: Interpreting data schemas to inform code generation and optimization.
4. Error Detection and Correction: Identifying and fixing errors in data engineering code through automated analysis.
5. Code Translation: Converting code between different programming languages and frameworks used in data engineering.
6. Complex Workflow Creation: Generating complex data workflows and pipelines based on user requirements.
7. Result Interpretation: Translating data processing results into human-readable reports and summaries.
8. Data Quality Checks: Generating code for validating data quality and consistency in pipelines.
9. Documentation Generation: Creating detailed documentation for data engineering code and workflows automatically.

Using Gen AI for these tasks offers several benefits:

- Increased productivity and efficiency for data engineers
- Faster development and deployment of data pipelines
- Reduced errors in code
- Improved maintainability and readability of code

In [2]:
!pip install openai pandas scikit-learn matplotlib



In [3]:
import openai
import os
import json
import pandas as pd
from openai import OpenAI
import sklearn

# Read the OpenAI API key from the environment; never hardcode a secret in a notebook
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def clean(dict_variable):
    # Return the first value stored in the dictionary
    return next(iter(dict_variable.values()))
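
The `clean` helper is defined above but not called in this section. One plausible use, assumed here rather than confirmed by the notebook, is unwrapping a model reply that arrives as a JSON object with a single top-level key. The reply string and its key names below are made up for illustration.

```python
import json

def clean(dict_variable):
    # Return the first value stored in the dictionary
    return next(iter(dict_variable.values()))

# Hypothetical model reply wrapped in a single top-level key
raw_reply = '{"decision": {"status": "Approved", "months_checked": 3}}'

# Parse the JSON text, then drop the wrapper key
payload = clean(json.loads(raw_reply))
print(payload["status"])
```

Because Python dictionaries preserve insertion order, "first value" here means the value of the first key in the parsed JSON.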

<a id='3'></a>
## 2. Example 1: Data cleaning

In [5]:
df = pd.read_csv('ApprovedCase.csv')

In [6]:
df

| Unnamed: 0 | Type | Category | Description | Date | Credit | Debit | Running Balance |
|---|---|---|---|---|---|---|---|
| 0 | Credit | Salary/Regular Income | W.H | 31-01-2022 | 511.0 | 0.00 | 716.65 |
| 1 | Debit | Uncategorisable | W.H | 31-01-2022 | 0.0 | 60.00 | 531.20 |
| 2 | Debit | Uncategorisable | W.H | 31-01-2022 | 0.0 | 40.00 | 591.20 |
| 3 | Debit | Uncategorisable | W.H | 31-01-2022 | 0.0 | 40.00 | 631.20 |
| 4 | Debit | General Stores, Superstores | TESCO STORES | 31-01-2022 | 0.0 | 19.65 | 671.20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 841 | Debit | Automotive Expenses | TT2 LTD | 30-05-2022 | 0.0 | 31.90 | 111.79 |
| 842 | Debit | General Stores, Superstores | TESCO STORES | 30-05-2022 | 0.0 | 3.84 | 143.69 |
| 843 | Debit | General Stores, General | NISA TYNEMOUTH | 30-05-2022 | 0.0 | 4.29 | 174.53 |
| 844 | Debit | Entertainment, Restaurants/Dining | COOPLANDS BAKERY | 30-05-2022 | 0.0 | 3.00 | 178.82 |


In [7]:
# Parse the day-first date strings (DD-MM-YYYY)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Extract the year and month for grouping
df['YearMonth'] = df['Date'].dt.to_period('M')

# Calculate disposable amount (Credit - Debit) for each month
monthly_disposable = df.groupby('YearMonth').apply(lambda x: x['Credit'].sum() - x['Debit'].sum(), include_groups=False)

# Convert the result to a DataFrame
monthly_disposable_df = monthly_disposable.reset_index(name='DisposableAmount')

print(monthly_disposable_df)

  YearMonth  DisposableAmount
0   2022-01          -2311.39
1   2022-02            -77.14
2   2022-03           6539.40
3   2022-04           2497.11
4   2022-05           1552.60
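
To make the monthly calculation easy to check by hand, here is a minimal sketch on a made-up four-row frame. The `toy` data is invented for illustration. The sketch sums the `Credit` and `Debit` columns per month and subtracts them, which should give the same figures as the `apply` call above without depending on the `include_groups` argument.

```python
import pandas as pd

# Made-up transactions using the same column names as the statement above
toy = pd.DataFrame({
    "Date": ["31-01-2022", "31-01-2022", "15-02-2022", "20-02-2022"],
    "Credit": [500.0, 0.0, 0.0, 300.0],
    "Debit": [0.0, 120.0, 80.0, 0.0],
})

# Parse day-first dates and derive the month used for grouping
toy["Date"] = pd.to_datetime(toy["Date"], dayfirst=True)
toy["YearMonth"] = toy["Date"].dt.to_period("M")

# Sum credits and debits per month, then take the difference
monthly = toy.groupby("YearMonth")[["Credit", "Debit"]].sum()
toy_disposable = (monthly["Credit"] - monthly["Debit"]).reset_index(name="DisposableAmount")

print(toy_disposable)
```

For this toy data, January works out to 500 - 120 = 380 and February to 300 - 80 = 220.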


In [7]:
prompt = f"""Arrive at a decision on lending finance based on the following customer data.
{monthly_disposable_df}
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

result = response.choices[0].message.content
print(result)

To arrive at a decision regarding lending finance to the customer based on the provided data, we first need to analyze the "DisposableAmount" over the specified months.

Here’s a breakdown of the provided data:

- **January 2022**: -2311.39 (Deficit)
- **February 2022**: -77.14 (Slight deficit)
- **March 2022**: 6539.40 (Surplus)
- **April 2022**: 2497.11 (Surplus)
- **May 2022**: 1552.60 (Surplus)

### Analysis:

1. **Negative Disposable Amount in January and February**:
   - The customer had a significant deficit in January and a minor one in February. This suggests potential cash flow issues during these months.

2. **Substantial Surplus in March**:
   - In March, the customer experienced a significant surplus of 6539.40. This indicates a positive cash flow at this point, which is important for servicing any potential debt.

3. **Continued Surplus in April and May**:
   - The customer maintained positive disposable amounts in April (2497.11) and May (1552.60). This trend indicates i
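
Interpolating a DataFrame into an f-string inserts its printed representation, which pandas shortens for long frames according to its display settings. With five rows that is not an issue, but an explicit serialization keeps the prompt independent of those settings. The sketch below uses `to_csv(index=False)` as one option. The figures are copied from the output above, and `YearMonth` is stored as plain strings here rather than as periods.

```python
import pandas as pd

# Monthly figures copied from the output above (YearMonth kept as strings)
monthly_disposable_df = pd.DataFrame({
    "YearMonth": ["2022-01", "2022-02", "2022-03", "2022-04", "2022-05"],
    "DisposableAmount": [-2311.39, -77.14, 6539.40, 2497.11, 1552.60],
})

# Serialize explicitly so the prompt does not depend on pandas display settings
table_text = monthly_disposable_df.to_csv(index=False)

prompt = (
    "Arrive at a decision on lending finance based on the following customer data.\n"
    f"{table_text}"
)
print(prompt)
```

CSV output is compact and carries every row. `to_string()` or `to_markdown()` are alternatives if an aligned table is preferred.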

In [8]:
prompt = f"""Arrive at a lending decision with status Approved, Declined, or Deferred to Underwriter by reviewing the data below.
{monthly_disposable_df}
Approve only if the disposable amount is more than 1000 for each of the last 3 months. Decline if it is less than 1000 for all months. Defer to Underwriter if the disposable amount is more than 1000 for 1 or more months.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

result = response.choices[0].message.content
print(result)

To arrive at a lending decision based on the provided criteria, we will analyze the `DisposableAmount` for the last three months (i.e., March, April, and May of 2022).

Here are the relevant `DisposableAmount` values for the last three months:

- **March 2022**: 6539.40
- **April 2022**: 2497.11
- **May 2022**: 1552.60

Next, we evaluate these amounts against the conditions:

1. **Check if the disposable amounts are all greater than 1000**:
   - March: 6539.40 (greater than 1000)
   - April: 2497.11 (greater than 1000)
   - May: 1552.60 (greater than 1000)

Since all three months (March, April, May) have disposable amounts greater than 1000, the decision is:

**Approved.**
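
Because the rules in the prompt are simple thresholds, the model's answer can be cross-checked with a few lines of plain Python. The sketch below is one reading of those rules, not part of the original notebook. The function name `lending_decision` is hypothetical, and the order of the checks (Approved first, then Declined, otherwise Deferred) is an assumption, since the prompt's conditions overlap and do not say what happens at exactly 1000.

```python
def lending_decision(amounts, threshold=1000.0):
    """Apply the prompt's rules to monthly disposable amounts (oldest first)."""
    last_three = amounts[-3:]
    # Approved: each of the last three months is above the threshold
    if len(last_three) == 3 and all(a > threshold for a in last_three):
        return "Approved"
    # Declined: every month is below the threshold
    if all(a < threshold for a in amounts):
        return "Declined"
    # Assumption: every remaining case is referred to a human underwriter
    return "Deferred to Underwriter"

# Monthly disposable amounts from the output above, January to May 2022
amounts = [-2311.39, -77.14, 6539.40, 2497.11, 1552.60]
decision = lending_decision(amounts)
print(decision)  # prints "Approved"
```

A deterministic check like this gives a reference result to compare against the model's free-text decision.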
