# Neuro-Symbolic Reasoner: Rules + Linear Model (Toy but Industrial)

This notebook demonstrates a practical neuro-symbolic pattern:

- a **learned model** produces a probability (linear classifier)
- a **rule engine** enforces hard constraints / overrides
- evaluation compares: model-only vs rules-only vs hybrid

Use-case: transaction risk scoring with compliance rules.
Outputs are saved in the notebook when executed.

In [1]:
import numpy as np
import pandas as pd
from dataclasses import dataclass
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

SEED = 1337
rng = np.random.default_rng(SEED)
pd.set_option('display.max_columns', 50)

## 1) Synthetic risk dataset

In [2]:
n = 12000
amount = rng.lognormal(mean=4.2, sigma=0.6, size=n)  # median ~67; most values fall in ~10..400
country_risk = rng.choice([0,1,2], size=n, p=[0.75,0.2,0.05])  # 0 low, 2 high
is_new_device = rng.integers(0,2,size=n)
failed_logins = rng.poisson(0.4, size=n)
hour = rng.integers(0,24,size=n)

# latent logit (log-odds); converted to a probability below
logit = (
  0.002*(amount-200) +
  0.9*(country_risk==2) + 0.35*(country_risk==1) +
  0.7*is_new_device +
  0.25*np.minimum(failed_logins, 5) +
  0.25*((hour<=5) | (hour>=23)) +
  rng.normal(0, 0.7, size=n)
)
p = 1/(1+np.exp(-logit))
y = (p > 0.6).astype(int)

df = pd.DataFrame({
  'amount': amount,
  'country_risk': country_risk,
  'is_new_device': is_new_device,
  'failed_logins': failed_logins,
  'hour': hour,
  'fraud': y
})
df.head(), df['fraud'].mean()

(       amount  country_risk  is_new_device  failed_logins  hour  fraud
 0   68.235225             0              0              0     3      0
 1   88.620381             0              0              1    12      0
 2   61.396477             0              1              1    14      1
 3   28.974013             0              1              0     9      1
 4  302.496955             2              0              2     2      1,
 np.float64(0.5035))
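
Because the label is a hard cut on the noisy latent probability, `p > 0.6` is equivalent to a cut on the logit itself. A small sketch of that equivalence follows; the names `sigmoid` and `LOGIT_CUT` are illustrative and not part of the notebook.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, same form as 1/(1+exp(-logit)) in the generator."""
    return 1.0 / (1.0 + math.exp(-z))

# p > 0.6  <=>  logit > log(0.6 / 0.4)
LOGIT_CUT = math.log(0.6 / 0.4)

# Spot-check the equivalence on a few logit values around the cut
for z in (-1.0, 0.0, 0.4, 0.41, 1.0):
    assert (sigmoid(z) > 0.6) == (z > LOGIT_CUT)

print(round(LOGIT_CUT, 4))  # 0.4055
```

The `rng.normal(0, 0.7, size=n)` term is the only part of the logit that the features cannot explain, so it is one source of label noise the classifier in the next section has to live with.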

## 2) Learned model (logistic regression)

In [3]:
X = df[['amount','country_risk','is_new_device','failed_logins','hour']].values
y = df['fraud'].values
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=SEED, stratify=y)

clf = LogisticRegression(max_iter=400, class_weight='balanced')
clf.fit(Xtr, ytr)
p_model = clf.predict_proba(Xte)[:,1]
pred_model = (p_model >= 0.5).astype(int)
print('MODEL ONLY')
print(classification_report(yte, pred_model))

MODEL ONLY
              precision    recall  f1-score   support

           0       0.71      0.66      0.69      1489
           1       0.69      0.74      0.71      1511

    accuracy                           0.70      3000
   macro avg       0.70      0.70      0.70      3000
weighted avg       0.70      0.70      0.70      3000



## 3) Symbolic rules (hard constraints)
Example rules (a rule that fires takes precedence over the model; fraud rules are checked first):
- if country_risk=2 and amount>1500 => fraud
- if failed_logins>=4 and is_new_device=1 => fraud
- if amount<20 and country_risk=0 and failed_logins=0 => legit
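
The same three rules can also be written as boolean masks over the whole feature matrix, which avoids a Python loop over rows. This is a sketch under assumptions: the name `rules_vectorized` and the `-1` sentinel for "no rule fires" are choices made here, and the column order is assumed to be amount, country_risk, is_new_device, failed_logins, hour.

```python
import numpy as np

def rules_vectorized(X: np.ndarray) -> np.ndarray:
    """Apply the three rules to all rows at once.

    Returns 1 (forced fraud), 0 (forced legit), or -1 where no rule fires.
    Assumed column order: amount, country_risk, is_new_device,
    failed_logins, hour.
    """
    amt, cr, nd, fl = X[:, 0], X[:, 1], X[:, 2], X[:, 3]
    fraud = ((cr == 2) & (amt > 1500)) | ((fl >= 4) & (nd == 1))
    legit = (amt < 20) & (cr == 0) & (fl == 0)
    # np.select uses the first matching condition, so fraud is checked first
    # (these particular rules cannot overlap, so the order never matters here)
    return np.select([fraud, legit], [1, 0], default=-1)

# Tiny hand-made batch (hypothetical rows, not drawn from the dataset)
X_demo = np.array([
    [2000.0, 2, 0, 0, 12],  # high-risk country, large amount -> fraud
    [  50.0, 0, 1, 4,  3],  # many failed logins on a new device -> fraud
    [  10.0, 0, 0, 0, 15],  # tiny amount, low risk, no failed logins -> legit
    [ 100.0, 1, 0, 1, 20],  # no rule fires
])
print(rules_vectorized(X_demo).tolist())  # [1, 1, 0, -1]
```

An integer sentinel keeps the result a plain integer array; a masked array or a separate boolean "fired" mask would work equally well.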

In [4]:
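def rules(row):
  """Return 1 (forced fraud), 0 (forced legit), or None when no rule fires."""
  # column order matches X: amount, country_risk, is_new_device, failed_logins, hour
  amt, cr, nd, fl, _hr = row  # hour is not used by any rule
  # forced fraud rules
  if cr == 2 and amt > 1500:
    return 1
  if fl >= 4 and nd == 1:
    return 1
  # forced legit rules
  if amt < 20 and cr == 0 and fl == 0:
    return 0
  return None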

pred_rules = []
for r in Xte:
  out = rules(r)
  pred_rules.append(out if out is not None else 0)
pred_rules = np.array(pred_rules)
print('RULES ONLY (fallback=0)')
print(classification_report(yte, pred_rules))

RULES ONLY (fallback=0)
              precision    recall  f1-score   support

           0       0.50      1.00      0.66      1489
           1       1.00      0.00      0.00      1511

    accuracy                           0.50      3000
   macro avg       0.75      0.50      0.33      3000
weighted avg       0.75      0.50      0.33      3000
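
The near-zero recall for class 1 follows from how rarely the fraud rules can fire under the generator in section 1. The back-of-envelope check below uses only the standard library and the generator's own parameters. It assumes the features are independent (as they are sampled) and ignores the stratified split, so the numbers are approximations rather than notebook output.

```python
import math
from statistics import NormalDist

# Generator parameters copied from section 1
MU, SIGMA = 4.2, 0.6                 # lognormal amount
P_CR = {0: 0.75, 1: 0.20, 2: 0.05}   # country_risk probabilities
P_NEW_DEVICE = 0.5                   # rng.integers(0, 2)
LAM = 0.4                            # Poisson rate for failed_logins

def p_amount_below(x: float) -> float:
    """P(amount < x) for the lognormal amount."""
    return NormalDist().cdf((math.log(x) - MU) / SIGMA)

def p_poisson_at_most(k: int, lam: float) -> float:
    """P(N <= k) for N ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# Independent features, so the probabilities of the conditions multiply
p_rule1 = P_CR[2] * (1 - p_amount_below(1500))                      # forced fraud
p_rule2 = (1 - p_poisson_at_most(3, LAM)) * P_NEW_DEVICE            # forced fraud
p_rule3 = p_amount_below(20) * P_CR[0] * p_poisson_at_most(0, LAM)  # forced legit

n_test = 3000
for name, p in [("rule1", p_rule1), ("rule2", p_rule2), ("rule3", p_rule3)]:
    print(f"{name}: p={p:.2e}, expected hits in {n_test} rows ~ {p * n_test:.2f}")
```

Under these assumptions the amount>1500 rule essentially never fires, the failed-logins rule fires about once per 3000 rows, and the forced-legit rule fires a few dozen times. With a fallback of 0, rules-only therefore predicts class 0 almost everywhere.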



## 4) Hybrid: model + rule overrides

In [5]:
pred_hybrid = pred_model.copy()
overrides = 0
for i, r in enumerate(Xte):
  out = rules(r)
  if out is not None:
    pred_hybrid[i] = out
    overrides += 1

print('HYBRID (overrides=', overrides, ')')
print(classification_report(yte, pred_hybrid))

HYBRID (overrides= 28 )
              precision    recall  f1-score   support

           0       0.71      0.66      0.69      1489
           1       0.69      0.74      0.71      1511

    accuracy                           0.70      3000
   macro avg       0.70      0.70      0.70      3000
weighted avg       0.70      0.70      0.70      3000
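
The hybrid report matches the model-only report to two decimals, so the aggregate metrics do not show what the 28 overrides actually did. One way to look closer is to count only the rows where an override changed the model's prediction and check whether each change helped or hurt. The helper `audit_overrides` below is a hypothetical addition, not part of the notebook, and the demo labels are made up.

```python
def audit_overrides(y_true, pred_model, pred_hybrid):
    """Summarize what rule overrides changed relative to the model.

    All arguments are equal-length sequences of 0/1 labels.
    """
    flipped = helped = hurt = 0
    for yt, pm, ph in zip(y_true, pred_model, pred_hybrid):
        if pm == ph:
            continue  # rule agreed with the model, or no rule fired
        flipped += 1
        if ph == yt:
            helped += 1  # override corrected a model error
        else:
            hurt += 1    # override broke a correct model prediction
    return {"flipped": flipped, "helped": helped, "hurt": hurt}

# Hypothetical toy labels, not notebook output
y_demo      = [0, 0, 1, 1, 0]
model_demo  = [1, 0, 1, 0, 0]
hybrid_demo = [0, 0, 1, 0, 1]
print(audit_overrides(y_demo, model_demo, hybrid_demo))
# {'flipped': 2, 'helped': 1, 'hurt': 1}
```

In the notebook this would be called as `audit_overrides(yte, pred_model, pred_hybrid)`; the function iterates element-wise, so NumPy arrays work as well as lists.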

