In [8]:
## AI Development Workflow - Fraud Detection API

This notebook supports the assignment _"Understanding the AI Development Workflow"_ for the PLP Academy.

**Use Case**: Real-time fraud detection during loan applications.

**Objective**: Apply the AI development workflow from problem definition to deployment using a realistic banking scenario. This notebook includes mock data, a basic model, and evaluation metrics.

In [9]:
## 📑 Table of Contents

1. Problem Definition
2. Data Collection & Preprocessing
3. Model Development
4. Evaluation & Deployment
5. Ethics & Bias Considerations
6. AI Development Workflow Diagram
7. Reflection

In [10]:
## 1. Problem Definition

- **Problem:** Detect potentially fraudulent loan applications in real time.
- **Objectives:**
  1. Accurately classify fraud vs. legitimate applications.
  2. Minimize false positives to avoid hurting genuine clients.
  3. Enable fast alerts for investigation teams.
- **Stakeholders:**
  - Bank fraud analysts
  - Loan applicants
- **KPI:** F1 score on the fraud class (the harmonic mean of precision and recall, so it penalizes both false positives and false negatives)
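
To make the KPI concrete, the following minimal sketch derives the F1 score from confusion-matrix counts. The function name `f1_from_counts` and the example counts are illustrative assumptions, not values from this project.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score of the positive (fraud) class from raw counts."""
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of flagged applications that were fraud
    recall = tp / (tp + fn)     # share of fraud cases that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 frauds caught, 2 false alarms, 4 frauds missed.
score = f1_from_counts(tp=8, fp=2, fn=4)
print(f"F1 = {score:.3f}")  # F1 = 0.727
```

Because both error types enter the formula, reducing false alarms at the cost of many missed frauds (or the reverse) lowers the score.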


In [13]:
import pandas as pd
import numpy as np

# Scikit-learn: for modeling and evaluation
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

In [4]:
!pip install pandas scikit-learn matplotlib seaborn



In [6]:
## 2. Data Collection & Preprocessing

- **Data Sources:**
  - Client behavior logs
  - SIM swap records
- **Potential Bias:** Under-representation of rural clients.
- **Preprocessing Steps:**
  1. Handle missing values (e.g., impute location or SIM swap gaps).
  2. Encode categorical variables (e.g., one-hot encoding).
  3. Normalize numerical features.
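
The three preprocessing steps above could be combined as in the sketch below. It is only one possible arrangement: the sample rows are hypothetical (gaps were added on purpose), and the choices of median imputation, most-frequent imputation, and min-max scaling are assumptions rather than the method used later in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical rows that mimic the mock dataset, with missing values added.
raw = pd.DataFrame({
    "location": pd.Series(["Midrand", "Soweto", np.nan, "Ballito"], dtype=object),
    "sim_swap_days": [1.0, 90.0, np.nan, 0.0],
    "loan_amount": [10000.0, 25000.0, 15000.0, 50000.0],
})

numeric_cols = ["sim_swap_days", "loan_amount"]
categorical_cols = ["location"]

preprocess = ColumnTransformer([
    # Steps 1 and 3: fill numeric gaps with the median, then scale to [0, 1].
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", MinMaxScaler()),
    ]), numeric_cols),
    # Steps 1 and 2: fill missing locations with the most frequent value,
    # then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

features = preprocess.fit_transform(raw)
print(features.shape)  # 2 numeric columns + 3 one-hot location columns
```

Fitting the transformer on training data only, and reusing it on new applications, keeps information from the test set out of the imputation and scaling statistics.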

In [5]:
# Load dataset from uploaded CSV
import pandas as pd

df = pd.read_csv('mock_fraud_clients.csv')
df.head()

   client_id   location  sim_swap_days  loan_amount  time_of_application  label
0        101    Midrand              1        10000                10:15      1
1        102     Soweto             90        25000                14:30      0
2        103  Cape Town             10        15000                09:50      1
3        104    Ballito              0        50000                16:45      0
4        105   Houghton              3        18000                11:00      1


In [7]:
import pandas as pd

# Convert time to hour
df['application_hour'] = pd.to_datetime(df['time_of_application'], format='%H:%M').dt.hour

# Drop time column and client ID
df = df.drop(['time_of_application', 'client_id'], axis=1)

# One-hot encode location
df = pd.get_dummies(df, columns=['location'])

# Define features and label
X = df.drop('label', axis=1)
y = df['label']

In [11]:
## 3. Model Development

We use a Random Forest classifier because it generally performs well on tabular data with little tuning and can be trained on the small mock dataset used here.

### Hyperparameters to Tune:
- `n_estimators` (number of trees)
- `max_depth` (depth of each tree)

In [18]:
from sklearn.ensemble import RandomForestClassifier

# Train/test split
from sklearn.model_selection import train_test_split

# Split your data (assumes X and y already defined)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train the model
model = RandomForestClassifier(max_depth=5)
model.fit(X_train, y_train)

In [16]:
# Evaluate predictions
y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

Confusion Matrix:
 [[1 0]
 [0 1]]


In [19]:
## 4. Evaluation & Deployment

### Evaluation Metrics:
- **F1 Score**: Balances precision and recall, key for fraud detection.
- **Confusion Matrix**: Helps visualize model performance on each class.
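
The test split above contains only two samples, so its scores are not a stable estimate. One common alternative is to compute both metrics from stratified cross-validation, sketched here on synthetic data; the dataset, fold count, and variable names are assumptions, not part of the original experiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in data (assumption: roughly 20% fraud).
X_cv, y_cv = make_classification(
    n_samples=300, n_features=6, weights=[0.8, 0.2], random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = RandomForestClassifier(max_depth=5, random_state=0)

# Each sample is predicted exactly once, by a model that never saw it.
y_cv_pred = cross_val_predict(clf, X_cv, y_cv, cv=cv)

# For binary labels, ravel() returns the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_cv, y_cv_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print("F1 (fraud class):", round(f1_score(y_cv, y_cv_pred), 3))
```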

### Concept Drift:
Fraud patterns change over time, so model performance should be monitored weekly and the model retrained when it degrades.
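
A weekly check of this kind could be sketched as follows. The threshold value, the week labels, the labels themselves, and the helper `weeks_needing_retrain` are all hypothetical; in practice the true labels would come from confirmed investigation outcomes.

```python
from sklearn.metrics import f1_score

F1_ALERT_THRESHOLD = 0.75  # hypothetical value, to be agreed with the fraud team


def weeks_needing_retrain(weekly_batches, threshold=F1_ALERT_THRESHOLD):
    """Return the week labels whose F1 score falls below the threshold.

    weekly_batches maps a week label to (true_labels, predicted_labels).
    """
    flagged = []
    for week, (y_true, y_hat) in weekly_batches.items():
        score = f1_score(y_true, y_hat, zero_division=0)
        print(f"{week}: F1 = {score:.2f}")
        if score < threshold:
            flagged.append(week)
    return flagged


# Made-up labels for two weeks of scored applications.
batches = {
    "2024-W01": ([1, 0, 1, 1, 0, 0], [1, 0, 1, 1, 0, 0]),
    "2024-W02": ([1, 0, 1, 1, 0, 0], [0, 0, 1, 0, 1, 0]),
}
print("Retrain after:", weeks_needing_retrain(batches))
```

In this made-up data the second week scores 0.40 and is the only one flagged.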

### Deployment Challenge:
Scalability and latency are key if the model is integrated into a real-time loan application system.
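
Before integration, single-request latency can be estimated with a rough timing loop such as the one below. It is a sketch on synthetic data with an assumed model size; real latency also depends on feature lookup, network overhead, and hardware, none of which are measured here.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data and a model of assumed size.
X_lat, y_lat = make_classification(n_samples=500, n_features=6, random_state=1)
scorer = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
scorer.fit(X_lat, y_lat)

# Time single-row predictions, which is how a real-time endpoint would call the model.
latencies_ms = []
for row in X_lat[:50]:
    start = time.perf_counter()
    scorer.predict(row.reshape(1, -1))
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"median: {np.median(latencies_ms):.1f} ms")
print(f"p95:    {np.percentile(latencies_ms, 95):.1f} ms")
```

Reporting a high percentile alongside the median matters because slow outliers, not the typical request, usually decide whether a latency budget is met.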

In [20]:
y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

Confusion Matrix:
 [[1 0]
 [0 1]]


In [21]:
## 5. Ethics & Bias Considerations

- **Risk:** If the model is trained on biased data, it could unfairly flag applicants from certain locations.
- **Strategy:** Use diverse datasets, apply fairness constraints, and test model outcomes across demographic slices.
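
Testing outcomes across slices can start with something as simple as comparing false positive rates per group. The sketch below uses a made-up table of scored applications; the `area` column and its values are assumptions chosen to mirror the rural under-representation risk noted earlier.

```python
import pandas as pd

# Hypothetical scored applications: area type, true label, model prediction.
scored = pd.DataFrame({
    "area":      ["urban", "urban", "urban", "urban", "rural", "rural", "rural", "rural"],
    "label":     [0, 0, 0, 1, 0, 0, 0, 1],
    "predicted": [0, 0, 1, 1, 1, 1, 0, 1],
})

# False positive rate per slice: share of legitimate applicants flagged as fraud.
legit = scored[scored["label"] == 0]
fpr_by_area = legit.groupby("area")["predicted"].mean()
print(fpr_by_area)

# A large gap between slices is a signal to investigate, not proof of bias.
gap = fpr_by_area.max() - fpr_by_area.min()
print(f"FPR gap between slices: {gap:.2f}")
```

In this made-up data, legitimate rural applicants are flagged twice as often as urban ones (2 of 3 versus 1 of 3), which is the kind of disparity the strategy above is meant to surface.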

In [22]:
## 6. AI Development Workflow Diagram

A visual flowchart of the steps followed in this project:

1. Problem Definition
2. Data Collection
3. Preprocessing
4. Model Development
5. Evaluation
6. Deployment
7. Monitoring

In [23]:
## 7. Reflection

- **Challenge:** Designing realistic mock data and choosing the right features.
- **What I’d Improve:** With more time, I’d gather real data, tune the hyperparameters, and add model explainability (e.g., SHAP).
