# NLP Modeling (BERT on Claim Descriptions)

## ✅ **GOAL**

Build a binary classification model using BERT for natural language inputs (e.g., claim descriptions). The model predicts the probability of a positive class (e.g., fraud = 1).

### 📦 Import Required Libraries

TFBertModel: Pretrained BERT model from HuggingFace in TensorFlow.

BertTokenizer: For tokenizing raw text into BERT inputs (input_ids, attention_mask).

Keras layers: To build the neural network.

Lambda: To wrap BERT inside a custom functional layer.

### 🤖 Load Pretrained BERT Model

Downloads and loads BERT-base (uncased) pretrained on general English text (Wikipedia + BookCorpus).

It outputs a hidden state for each token in the input, as well as the pooled output from the [CLS] token.

### 🧩 Define Input Placeholders

input_ids: Encoded token IDs from text input (max 128 tokens).

attention_mask: Indicates which tokens are actual input (1) and which are padding (0).
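To make the relationship between the two inputs concrete, here is a minimal pure-Python sketch of what padding to a fixed length does. `pad_and_mask` is a hypothetical helper, not part of the tokenizer API, and the word-piece IDs (7001, 7002, 7003) are made up; only the special-token IDs (101 for [CLS], 102 for [SEP], 0 for [PAD]) come from the bert-base-uncased vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token IDs in bert-base-uncased


def pad_and_mask(token_ids, max_length):
    """Hypothetical helper: roughly mimics padding='max_length' with truncation."""
    body = token_ids[: max_length - 2]      # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)                   # 1 = real token
    n_pad = max_length - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad  # 0 = padding


# Made-up word-piece IDs for a three-token claim description
ids, mask = pad_and_mask([7001, 7002, 7003], max_length=8)
print(ids)
print(mask)
```

In the notebook itself the real `BertTokenizer` produces both arrays; the sketch only illustrates why the mask is needed to tell real tokens from padding.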

### 🧠 Custom Function to Get [CLS] Token

The [CLS] token at position 0 is typically used in BERT for sentence-level classification.

last_hidden_state[:, 0, :] extracts that 768-dimensional vector from BERT's output.
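The slicing can be illustrated without loading BERT. The sketch below uses nested lists with toy sizes (hidden size 3 instead of 768) and made-up values; it only shows which elements `[:, 0, :]` selects.

```python
batch_size, seq_len, hidden = 2, 4, 3  # toy sizes; BERT-base uses hidden = 768

# last_hidden_state[b][t] is the hidden vector for token t of example b
last_hidden_state = [
    [[float(b * 100 + t * 10 + h) for h in range(hidden)] for t in range(seq_len)]
    for b in range(batch_size)
]

# Nested-list equivalent of last_hidden_state[:, 0, :]:
# keep only position 0 (the [CLS] token) of every example
cls_vectors = [example[0] for example in last_hidden_state]
print(cls_vectors)
```

The result has one vector per example, i.e. shape (batch_size, hidden), which is what the classification layers consume.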

#### 🔁 Wrap BERT in Lambda Layer

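This is necessary because the HuggingFace TensorFlow model may not accept symbolic Keras inputs directly (notably under Keras 3), and Keras can't infer the output shape on its own, so output_shape is passed explicitly. One caveat: Keras does not track variables that are only used inside a Lambda layer, so BERT's weights are likely not updated by model.fit in this setup; only the classification layers are trained.

Output is the [CLS] vector (shape: (batch_size, 768)), which serves as a summary representation of the input text.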

### 🧱 Add Classification Layers

Dropout(0.3): Reduces overfitting by randomly zeroing 30% of the [CLS] vector's units during training; it is inactive at inference.

Dense(128, relu): Learns a non-linear representation of the [CLS] vector.

Dense(1, sigmoid): Outputs a probability between 0 and 1, used for binary classification.
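As a rough illustration of what the head computes at inference time (when dropout does nothing), here is a pure-Python forward pass. The weights, the 3-dimensional input and the 2-unit hidden layer are toy stand-ins for the 768-dimensional [CLS] vector and the Dense(128) layer; none of these values come from the trained model.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(v, weights, bias):
    # weights holds one row per output unit
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


cls_vector = [0.5, -1.0, 2.0]               # toy stand-in for the [CLS] vector
W1 = [[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]]    # toy stand-in for Dense(128, relu)
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0]]                          # toy stand-in for Dense(1, sigmoid)
b2 = [0.0]

hidden = relu(dense(cls_vector, W1, b1))
logit = dense(hidden, W2, b2)[0]
prob = sigmoid(logit)
print(hidden, logit, prob)
```

The sigmoid squashes the single logit into (0, 1), which is why the output can be read as the probability of the positive class.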

### 🧪 Compile the Model

Inputs: BERT-formatted data (input_ids, attention_mask)

Output: Probability of the positive class (e.g., fraud = 1)

Loss: Binary cross-entropy (standard for 2-class problems)

Optimizer: Adam (adaptive per-parameter learning rates; the Keras default learning rate is 0.001)
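For reference, binary cross-entropy averages -[y·log(p) + (1 - y)·log(1 - p)] over the examples. The sketch below is a plain-Python version for intuition only; the function name and the clipping constant `eps` are assumptions of this example, not the Keras implementation.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps clipping avoids log(0) (assumed value)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])
print(confident_right, uninformed, confident_wrong)
```

Confident correct predictions give a loss near 0, a constant 0.5 gives log 2 (about 0.693), and confident wrong predictions are penalized heavily.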

### 📋 Print Model Architecture

This prints a table with:

Input and output shapes of each layer

Number of trainable parameters (BERT-base has ~110M; these may not appear in the summary because BERT is called inside a Lambda layer)

In [None]:
# %pip install tf-keras

In [58]:
# Import text data
import pandas as pd

text_data = pd.read_csv('../data/text_data.csv')

# Import target labels (y)
y = pd.read_csv('../data/y.csv')

# Import structured features (X)
X_structured = pd.read_csv('../data/X_structured.csv')

In [23]:
# Define maximum sequence length (typically 128 or 512 for BERT)
max_length = 128  # You can change this based on your needs

In [None]:
from transformers import TFBertModel, BertTokenizer
from tensorflow.keras.layers import Input, Dropout, Dense, Lambda
from tensorflow.keras.models import Model
import tensorflow as tf

# Load pretrained BERT model
bert_model = TFBertModel.from_pretrained("bert-base-uncased")

# Define inputs
input_ids = Input(shape=(max_length,), dtype=tf.int32, name="input_ids")
attention_mask = Input(shape=(max_length,), dtype=tf.int32, name="attention_mask")

# Wrap BERT in Lambda layer with output_shape defined
def extract_bert_cls(inputs):
    ids, mask = inputs
    outputs = bert_model([ids, mask])
    return outputs.last_hidden_state[:, 0, :]  # CLS token

# Define output shape: (batch_size, hidden_size)
cls_output = Lambda(extract_bert_cls, output_shape=(768,), name="bert_cls_output")([input_ids, attention_mask])

# Add classification layers
x = Dropout(0.3)(cls_output)
x = Dense(128, activation='relu')(x)
output = Dense(1, activation='sigmoid')(x)

# Build model
model = Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Show model summary
model.summary()



Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w

['../data/model.pkl']

In [None]:
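from joblib import dump  # needed for dump() below

# The model expects tokenized text, not X_structured; run this after `encodings` is created below.
predictions = model.predict({"input_ids": encodings["input_ids"], "attention_mask": encodings["attention_mask"]})
dump(predictions, "bert_predictions.joblib")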

In [60]:
texts = text_data['claim_description'].tolist()  # Replace with your text column

In [61]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encodings = tokenizer(texts, truncation=True, padding='max_length', max_length=max_length, return_tensors='tf')

# train the model
history = model.fit(
    x={'input_ids': encodings['input_ids'], 'attention_mask': encodings['attention_mask']},
    y=y['fraud_reported'],  # Replace with your target column
    batch_size=32,
    epochs=3,
    validation_split=0.1
)


Epoch 1/3
29/29 ━━━━━━━━━━━━━━━━━━━━ 26s 907ms/step - accuracy: 0.7609 - loss: 0.5439 - val_accuracy: 0.7900 - val_loss: 0.4072
Epoch 2/3
29/29 ━━━━━━━━━━━━━━━━━━━━ 26s 898ms/step - accuracy: 0.7778 - loss: 0.4755 - val_accuracy: 0.7700 - val_loss: 0.3973
Epoch 3/3
29/29 ━━━━━━━━━━━━━━━━━━━━ 25s 876ms/step - accuracy: 0.7810 - loss: 0.4553 - val_accuracy: 0.7800 - val_loss: 0.3940


In [62]:
predictions = model.predict({
    "input_ids": encodings["input_ids"],
    "attention_mask": encodings["attention_mask"]
})


[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 839ms/step


In [64]:
import numpy as np
np.save("../data/bert_predictions.npy", predictions)
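The saved array holds probabilities, one per claim. If hard 0/1 labels are needed, a threshold has to be chosen. The sketch below assumes a 0.5 cutoff and uses made-up values in place of `predictions.ravel()` and `y['fraud_reported']`; `to_labels` and `accuracy` are hypothetical helpers written for this example.

```python
def to_labels(probabilities, threshold=0.5):
    """Map predicted probabilities to 0/1 labels (0.5 cutoff is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up values standing in for the model's probabilities and the true labels
probs = [0.91, 0.12, 0.47, 0.66]
labels = [1, 0, 1, 0]

predicted = to_labels(probs)
print(predicted)
print(accuracy(labels, predicted))
```

Note that the predictions above were made on the same rows the model was fitted on (Keras holds out only the last 10% via validation_split), so any score computed from them is optimistic compared with a held-out test set.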
