# Offline RL Agent for LendingClub (IQL) — Notebook

This notebook contains an end-to-end offline RL pipeline using `d3rlpy`.

**What it does:**
- Load and preprocess a sampled subset of `accepted_2007_to_2018.csv`.
- Create a compact feature set and preprocessing pipeline.
- Synthesize `deny` actions (reward 0) so the dataset includes both approve and deny decisions.
- Build a `d3rlpy` `MDPDataset`, train an IQL agent, and estimate the learned policy's value on a held-out test split.

**Notes:**
- Place `accepted_2007_to_2018.csv` in the same directory before running cells.
- For quicker iteration, reduce `SAMPLE_N` (rows sampled) or `N_STEPS` (training steps).


In [2]:
!pip install d3rlpy==2.3.0


Collecting d3rlpy==2.3.0
  Downloading d3rlpy-2.3.0-py3-none-any.whl.metadata (10 kB)
Collecting torch>=2.0.0 (from d3rlpy==2.3.0)
  Downloading torch-2.9.0-cp312-cp312-win_amd64.whl.metadata (30 kB)
Collecting gym>=0.26.0 (from d3rlpy==2.3.0)
  Downloading gym-0.26.2.tar.gz (721 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting structlog (from d3rlpy==2.3.0)
  Downloading structlog-25.4.0-py3-none-any.whl.metadata (7.6 kB)
Collecting dataclasses-json (from d3rlpy==2.3

ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\\Python312\\Scripts\\torchfrtrace.exe' -> 'C:\\Python312\\Scripts\\torchfrtrace.exe.deleteme'


[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [5]:
import sys
!{sys.executable} -m pip install --upgrade pip


Collecting pip
  Downloading pip-25.3-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-25.3-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 25.2
    Uninstalling pip-25.2:
      Successfully uninstalled pip-25.2
  Rolling back uninstall of pip
  Moving to c:\users\dkpra\appdata\roaming\python\python312\scripts\pip.exe
   from C:\Users\dkpra\AppData\Local\Temp\pip-uninstall-0xmj1qqz\pip.exe
  Moving to c:\users\dkpra\appdata\roaming\python\python312\scripts\pip3.12.exe
   from C:\Users\dkpra\AppData\Local\Temp\pip-uninstall-0xmj1qqz\pip3.12.exe
  Moving to c:\users\dkpra\appdata\roaming\python\python312\scripts\pip3.exe
   fro

ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\Python312\\Lib\\site-packages\\pip\\__init__.py'
Consider using the `--user` option or check the permissions.


[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [None]:
import os
import gc
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer   
from sklearn.preprocessing import OrdinalEncoder
import joblib
from d3rlpy.dataset import MDPDataset
from d3rlpy.algos import IQL
from tqdm import tqdm

print('Libraries imported')


In [None]:
def map_loan_status(status):
    s = str(status).lower()
    if 'fully paid' in s:
        return 0
    if 'charged off' in s or 'default' in s:
        return 1
    return None

def compute_reward_row(row):
    if row['target'] == 0:
        return row['loan_amnt'] * (row['int_rate'] / 100.0)
    else:
        return - float(row['loan_amnt'])

print('Utility functions defined')
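
As a quick sanity check of the two helpers above, the sketch below restates them as plain functions and evaluates a hypothetical $10,000 loan at 12% interest. `reward_for` and the sample values are illustrative and not part of the notebook; the reward rule (principal times the quoted rate for a repaid loan, the full principal lost on a default) mirrors `compute_reward_row`.

```python
def map_loan_status(status):
    """Return 0 for repaid loans, 1 for charged-off/defaulted loans, None otherwise."""
    s = str(status).lower()
    if 'fully paid' in s:
        return 0
    if 'charged off' in s or 'default' in s:
        return 1
    return None


def reward_for(loan_amnt, int_rate, target):
    """Illustrative scalar version of compute_reward_row (hypothetical helper)."""
    if target == 0:
        return loan_amnt * (int_rate / 100.0)   # interest earned on a repaid loan
    return -float(loan_amnt)                    # principal lost on a default


paid = reward_for(10_000, 12.0, map_loan_status('Fully Paid'))
lost = reward_for(10_000, 12.0, map_loan_status('Charged Off'))
print(paid, lost)
print(map_loan_status('Current'))  # in-progress loans map to None and are filtered out later
```

Because a default costs the whole principal while a repaid loan earns only the interest, one default in this example offsets roughly eight repaid loans of the same size, which pushes a reward-maximizing policy toward denying risky applicants.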


In [None]:
# Parameters
CSV_PATH = 'accepted_2007_to_2018.csv'  
SAMPLE_N = 200000   # set to None to use full filtered dataset (may be large)
RANDOM_STATE = 42
N_STEPS = 20000     # steps for IQL training (adjust for speed/quality)

# Load CSV
assert os.path.exists(CSV_PATH), f"CSV not found at {CSV_PATH}. Place the file in the working dir."
print('Loading... (this may take a while)')
df = pd.read_csv(CSV_PATH, low_memory=False)
print('Loaded dataframe shape:', df.shape)


In [None]:

df['target'] = df['loan_status'].map(map_loan_status)
print('Value counts (including NaN):')
print(df['target'].value_counts(dropna=False))

before = df.shape[0]
df = df[df['target'].notnull()].copy()
after = df.shape[0]
print(f'Filtered to final statuses: kept {after} rows (dropped {before-after})')

# Optionally sample for quick runs
if SAMPLE_N is not None and SAMPLE_N < len(df):
    df = df.sample(SAMPLE_N, random_state=RANDOM_STATE).reset_index(drop=True)
    print('Sampled down to', len(df))


In [None]:
# Feature selection
features = [
    'loan_amnt', 'term', 'int_rate', 'installment', 'annual_inc', 'dti',
    'emp_length', 'home_ownership', 'verification_status', 'purpose',
    'pub_rec', 'delinq_2yrs', 'inq_last_6mths', 'open_acc', 'total_acc', 'grade'
]
features = [f for f in features if f in df.columns]
print('Using features:', features)

if 'int_rate' in df.columns and df['int_rate'].dtype == object:
    df['int_rate'] = df['int_rate'].str.rstrip('%').astype(float)

if 'emp_length' in df.columns:
    def emp_len_to_num(x):
        try:
            if pd.isna(x):
                return np.nan
            s = str(x)
            if '10+' in s:
                return 10.0
            if '<' in s:
                return 0.0
            import re
            m = re.search(r"(\d+)", s)
            if m:
                return float(m.group(1))
        except Exception:
            return np.nan
        return np.nan
    df['emp_length'] = df['emp_length'].apply(emp_len_to_num)

# Prepare X and y
X_raw = df[features].copy()
y = df['target'].astype(int).values
print('Prepared X_raw with shape', X_raw.shape)
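
The `emp_length` conversion above can be checked in isolation. The following is a minimal standalone sketch of the same rules; `parse_emp_length` is a hypothetical name, and the sample strings assume the usual LendingClub formatting such as `'10+ years'` and `'< 1 year'`.

```python
import math
import re


def parse_emp_length(value):
    """Map an employment-length string to years as a float; NaN when unknown."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    text = str(value)
    if '10+' in text:
        return 10.0          # top-coded bucket
    if '<' in text:
        return 0.0           # "less than one year"
    match = re.search(r'(\d+)', text)
    return float(match.group(1)) if match else math.nan


samples = ['10+ years', '< 1 year', '3 years', None, 'n/a']
parsed = [parse_emp_length(s) for s in samples]
print(parsed)
```

Checking for `'<'` before the digit search matters: otherwise `'< 1 year'` would be parsed as 1.0 rather than 0.0.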


In [None]:
numeric_cols = X_raw.select_dtypes(include=[np.number]).columns.tolist()
cat_cols = [c for c in X_raw.columns if c not in numeric_cols]

ord_cols = []
if 'grade' in cat_cols:
    ord_cols.append('grade')
    cat_cols.remove('grade')

print('Numeric cols:', numeric_cols)
print('Categorical cols:', cat_cols)
print('Ordinal cols:', ord_cols)

num_pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler())
])

cat_pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))  # `sparse` was renamed to `sparse_output` in scikit-learn 1.2
])

transformers = []
if len(numeric_cols) > 0:
    transformers.append(('num', num_pipeline, numeric_cols))
if len(cat_cols) > 0:
    transformers.append(('cat', cat_pipeline, cat_cols))
if len(ord_cols) > 0:
    transformers.append(('ord', OrdinalEncoder(categories=[['A','B','C','D','E','F','G']]), ord_cols))

preprocessor = ColumnTransformer(transformers, remainder='drop')

print('Fitting preprocessor...')
X_processed = preprocessor.fit_transform(X_raw)
print('Processed shape:', X_processed.shape)

joblib.dump(preprocessor, 'preprocessor.joblib')
print('Saved preprocessor to preprocessor.joblib')


In [None]:
if 'loan_amnt' in df.columns:
    df['loan_amnt'] = df['loan_amnt'].astype(float)
if 'int_rate' in df.columns:
    df['int_rate'] = df['int_rate'].astype(float)

# Compute reward if approve
df['reward_if_approve'] = df.apply(compute_reward_row, axis=1)

observations = X_processed.astype('float32')

n = observations.shape[0]
observations_dup = np.zeros((n*2, observations.shape[1]), dtype=np.float32)
actions = np.zeros((n*2,), dtype=np.int64)
rewards = np.zeros((n*2,), dtype=np.float32)
terminals = np.ones((n*2,), dtype=bool)

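# Interleave rows without a Python loop: even rows = approve, odd rows = synthesized deny
observations_dup[0::2] = observations
observations_dup[1::2] = observations
actions[0::2] = 1
rewards[0::2] = df['reward_if_approve'].to_numpy(dtype=np.float32)
# Odd rows keep the zero-initialised action (deny) and reward (0.0)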

print('Built duplicated dataset with shape', observations_dup.shape)

# Train/test split
obs_train, obs_test, act_train, act_test, rew_train, rew_test = train_test_split(
    observations_dup, actions, rewards, test_size=0.2, random_state=RANDOM_STATE
)

print('Train size:', obs_train.shape[0], 'Test size:', obs_test.shape[0])

# MDPDataset for d3rlpy
dataset = MDPDataset(obs_train, act_train, rew_train, terminals[:len(obs_train)])
print('MDPDataset created')
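
Because every applicant appears twice in the duplicated dataset, a row-level random split can place an applicant's approve row in one split and the synthesized deny row in the other. One possible alternative, sketched here with NumPy only, is to split applicant indices first and then expand each index to its two rows. The `split_by_applicant` helper and the toy size of 10 applicants are illustrative assumptions, not the notebook's code.

```python
import numpy as np


def split_by_applicant(n_applicants, test_size=0.2, seed=42):
    """Return (train_rows, test_rows) indices into the 2*n duplicated arrays.

    Row 2*i is applicant i's approve row and row 2*i+1 the deny row,
    matching the interleaved layout used above.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_applicants)
    n_test = int(round(n_applicants * test_size))
    test_ids, train_ids = order[:n_test], order[n_test:]

    def expand(ids):
        # Each applicant id contributes both of its rows
        return np.sort(np.concatenate([2 * ids, 2 * ids + 1]))

    return expand(train_ids), expand(test_ids)


train_rows, test_rows = split_by_applicant(n_applicants=10)
print(len(train_rows), len(test_rows))
# Applicants shared between the two splits (expected to be empty)
print(set(train_rows // 2) & set(test_rows // 2))
```

The returned index arrays could then be used to slice the duplicated arrays, for example `observations_dup[train_rows]` and `rewards[test_rows]`.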


In [None]:
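# Train IQL (may take time)
# d3rlpy 2.x builds algorithms from config objects. IQL has no temp_learning_rate
# (that option belongs to SAC), so only the actor/critic learning rates are set.
# Caution: d3rlpy lists IQL as a continuous-action algorithm, so the integer
# approve/deny actions may be rejected; a discrete algorithm such as DiscreteCQL is an alternative.
from d3rlpy.algos import IQLConfig

algo = IQLConfig(actor_learning_rate=3e-4, critic_learning_rate=3e-4).create()

print('Starting training...')
algo.fit(dataset, n_steps=N_STEPS, show_progress=True)  # `verbose` is not a fit() argument in 2.x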

# Save model
algo.save_model('iql_offline_model')
print('Saved model to iql_offline_model')
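
For context on what is being trained: IQL fits its value function with expectile regression, an asymmetric squared loss that weights positive and negative errors differently. A minimal NumPy sketch of that loss follows. It is not d3rlpy's implementation, and `tau=0.7` is used only for illustration.

```python
import numpy as np


def expectile_loss(diff, tau=0.7):
    """Mean asymmetric squared error |tau - 1(diff < 0)| * diff**2.

    `diff` stands for Q(s, a) - V(s). With tau > 0.5, cases where V
    underestimates Q (diff > 0) are penalised more than overestimates.
    """
    diff = np.asarray(diff, dtype=np.float64)
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


# Same magnitude of error, opposite signs
under = expectile_loss([1.0], tau=0.7)    # V below Q
over = expectile_loss([-1.0], tau=0.7)    # V above Q
print(under, over)
```

With `tau = 0.5` the loss reduces to half the ordinary mean squared error; pushing `tau` toward 1 makes the value estimate track the better actions seen in the data rather than the average one.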


In [None]:
# Evaluate: compute estimated policy value on test subset
mask_approve_in_test = (act_test == 1)
states_to_eval = obs_test[mask_approve_in_test]
true_rewards_if_approve = rew_test[mask_approve_in_test]

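# Flatten so the comparison is element-wise regardless of the returned shape
predicted_actions = np.asarray(algo.predict(states_to_eval)).reshape(-1)

# The logged reward is realised only when the policy approves; a deny earns 0
est_rewards = np.where(predicted_actions == 1, true_rewards_if_approve, 0.0)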
est_policy_value = np.mean(est_rewards)
baseline_always_approve = np.mean(true_rewards_if_approve)

print(f'Estimated policy value (avg reward per app) on test subset: {est_policy_value:.4f}')
print(f'Baseline always-approve avg reward on test subset: {baseline_always_approve:.4f}')
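
To make the metric concrete, here is a toy calculation with four hypothetical test applicants (all values invented for illustration). The policy earns the logged reward when it approves and 0 when it denies, which is the same rule the cell above applies.

```python
import numpy as np

# Logged reward for approving each applicant (two repaid, two defaulted)
rewards_if_approve = np.array([1200.0, -10000.0, 500.0, -2000.0])
# Hypothetical policy decisions: 1 = approve, 0 = deny
policy_actions = np.array([1, 0, 1, 0])

policy_value = np.where(policy_actions == 1, rewards_if_approve, 0.0).mean()
always_approve = rewards_if_approve.mean()
print(policy_value, always_approve)
```

This estimate is available only because every applicant in the source file was in fact approved, so the reward for approving is always observed.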


## Wrap up

- `preprocessor.joblib` saved for feature transforms.
- `iql_offline_model` holds the saved IQL model parameters.

**Caveats:** the deny actions were synthesized with zero reward rather than observed; for production use you need real rejection data or careful off-policy evaluation (OPE).

You can now inspect `iql_offline_model`, tune `N_STEPS`, or try other `d3rlpy` algorithms such as CQL or AWAC.
