
# Latent Class Regression (Finite Mixture of Regressions) — Python Notebook

**When to Use**  
- Your data are a **mixture of subpopulations** (segments) with **different regression relationships** (e.g., price‑sensitive vs brand‑loyal customers).  
- You want to **discover unobserved classes** and estimate a separate regression for each class while learning **class membership probabilities**.

**Best Application**  
- **Segmentation with outcomes** (e.g., spend, CLV) where segments have distinct slopes/intercepts.  
- Marketing response modeling where treatment effects vary by **unobserved groups**.  
- As a precursor to **targeting** (assign customers to the most likely class).

**When Not to Use**  
- If heterogeneity is **continuous** rather than clustered, prefer **hierarchical/mixed models**.  
- If the class count is very large or unstable and you need **individual‑level draws**, consider **hierarchical Bayes (HB)** models with random coefficients (e.g., mixed logit for choice data).

**How to Interpret Results**  
- Each class has its own **regression coefficients** and **variance**.  
- **Mixing weights** indicate the prevalence of each segment.  
- **Responsibilities** (posterior class probabilities) provide **soft assignments** per observation; use them to profile segments.  
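- Use **BIC**/**AIC** (lower is better) to select the number of classes K, and refit from several seeds to check stability, since EM only finds a local optimum and class labels are arbitrary across runs.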
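
For reference, the quantities in the list above can be written out explicitly. This is a sketch under the Gaussian-error assumption used in this notebook; the symbols $r_{ik}$ (responsibility), $\ell$ (maximized log-likelihood), and $p$ (number of regression coefficients per class, intercept included) are notation introduced here.

$$
f(y_i \mid x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(y_i \mid x_i^\top \beta_k,\ \sigma_k^2\right),
\qquad \sum_{k=1}^{K} \pi_k = 1
$$

$$
r_{ik} = \frac{\pi_k \, \mathcal{N}\!\left(y_i \mid x_i^\top \beta_k,\ \sigma_k^2\right)}
              {\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(y_i \mid x_i^\top \beta_j,\ \sigma_j^2\right)}
$$

$$
\mathrm{BIC} = -2\,\ell + \big(K(p+1) + (K-1)\big)\log n
$$

The parameter count is $p$ coefficients plus one variance per class, plus $K-1$ free mixing weights.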


In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from numpy.linalg import inv

pd.set_option('display.max_columns', 200)
plt.rcParams['figure.figsize'] = (8,4)
rng = np.random.default_rng(123)


### Data: Synthetic segments with distinct regression slopes

In [None]:

n = 1200
# True segments (K=3)
z_true = rng.choice([0,1,2], size=n, p=[0.45, 0.35, 0.20])

X1 = rng.normal(0, 1, size=n)
X2 = rng.normal(0, 1, size=n)
X = np.c_[np.ones(n), X1, X2]  # intercept + two features

# Class-specific betas and sigmas
betas = np.array([
    [5.0,  2.0,  0.0],   # class 0: strong X1
    [3.0, -1.0,  2.5],   # class 1: negative X1, strong X2
    [1.5,  0.0, -2.0],   # class 2: strong negative X2
])
sigmas = np.array([1.0, 1.3, 0.8])

y = np.array([X[i] @ betas[z_true[i]] + rng.normal(0, sigmas[z_true[i]]) for i in range(n)])

df = pd.DataFrame({'y': y, 'x1': X1, 'x2': X2, 'true_class': z_true})
df.head()


### EM Algorithm for Finite Mixture of Regressions
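
As a reading aid for the cell that follows, the M-step can be summarized as weighted least squares per class, with the responsibilities $r_{ik}$ (posterior class probabilities from the E-step) as weights. The notation $N_k$, $W_k$, and the small ridge term $\epsilon$ (set to `1e-8` in the code) is introduced here for illustration.

$$
N_k = \sum_{i=1}^{n} r_{ik}, \qquad \pi_k = \frac{N_k}{n}
$$

$$
\beta_k = \left(X^\top W_k X + \epsilon I\right)^{-1} X^\top W_k\, y,
\qquad W_k = \mathrm{diag}(r_{1k}, \dots, r_{nk})
$$

$$
\sigma_k^2 = \frac{1}{N_k} \sum_{i=1}^{n} r_{ik}\left(y_i - x_i^\top \beta_k\right)^2
$$

The convergence check monitors the observed-data log-likelihood

$$
\ell = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(y_i \mid x_i^\top \beta_k,\ \sigma_k^2\right).
$$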

In [None]:

def em_mixture_regression(y, X, K=2, max_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Initialize responsibilities randomly
    resp = rng.dirichlet(alpha=np.ones(K), size=n)  # n x K

    # Parameter containers
    pis = np.ones(K) / K
    betas = np.zeros((K, p))
    sig2 = np.ones(K)

    ll_history = []

    for it in range(max_iter):
        # M-step
        Nk = resp.sum(axis=0) + 1e-12
        pis = Nk / n

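        for k in range(K):
            # Weighted least squares; scale columns of X.T instead of building an n x n diagonal matrix
            XtW = X.T * resp[:,k]   # p x n, equal to X.T @ diag(resp[:,k])
            beta_k = np.linalg.solve(XtW @ X + 1e-8*np.eye(p), XtW @ y)
            betas[k] = beta_k
            resid = y - X @ beta_k
            # Weighted residual variance, floored so a class cannot collapse to zero variance
            sig2[k] = max((resp[:,k] * resid**2).sum() / Nk[k], 1e-8)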

        # E-step: update responsibilities
        dens = np.zeros((n, K))
        for k in range(K):
            resid = y - X @ betas[k]
            dens[:,k] = pis[k] * (1/np.sqrt(2*np.pi*sig2[k])) * np.exp(-0.5 * (resid**2) / sig2[k])

        # avoid zeros
        dens = np.clip(dens, 1e-300, None)
        ll = np.sum(np.log(dens.sum(axis=1)))
        ll_history.append(ll)

        resp = dens / dens.sum(axis=1, keepdims=True)

        if it > 0 and abs(ll_history[-1] - ll_history[-2]) < tol:
            break

    # BIC = -2*LL + n_params*log(n)
    # Free parameters: K*(p betas + 1 variance) plus (K-1) mixing weights, since the weights sum to 1
    n_params = K * (p + 1) + (K - 1)
    bic = -2*ll_history[-1] + n_params * np.log(n)
    return {
        'pis': pis, 'betas': betas, 'sig2': sig2,
        'resp': resp, 'loglike': ll_history[-1], 'bic': bic, 'll_hist': ll_history
    }

Xmat = np.c_[np.ones(len(df)), df['x1'].values, df['x2'].values]
yvec = df['y'].values
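
The E-step above works with raw densities and clips them at `1e-300`, which is adequate for this well-scaled synthetic data. For data with outliers or very small variances, a log-space version is usually safer. The following is a minimal sketch of that alternative, not the notebook's method; the function name `log_space_e_step` is hypothetical and its arguments mirror the arrays used inside `em_mixture_regression`.

```python
import numpy as np

def log_space_e_step(y, X, pis, betas, sig2):
    """Return (responsibilities, log-likelihood) computed with log-sum-exp.

    y: (n,), X: (n, p), pis: (K,), betas: (K, p), sig2: (K,)
    """
    resid = y[:, None] - X @ betas.T                 # n x K residuals
    # log(pi_k) + log N(y_i | x_i' beta_k, sig2_k) for every row and class
    log_dens = (np.log(pis)
                - 0.5 * np.log(2 * np.pi * sig2)
                - 0.5 * resid**2 / sig2)
    # Subtract the row maximum before exponentiating so at least one term is exp(0) = 1
    row_max = log_dens.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(log_dens - row_max).sum(axis=1, keepdims=True))
    resp = np.exp(log_dens - log_norm)               # rows sum to 1
    return resp, float(log_norm.sum())

# Small illustration with made-up parameters (K=2, intercept + one feature)
y_demo = np.array([0.0, 1.0, 50.0])
X_demo = np.c_[np.ones(3), np.array([0.0, 1.0, 2.0])]
resp_demo, ll_demo = log_space_e_step(
    y_demo, X_demo,
    pis=np.array([0.5, 0.5]),
    betas=np.array([[0.0, 1.0], [1.0, -1.0]]),
    sig2=np.array([1.0, 2.0]),
)
```

Because the normalization happens in log space, a row whose densities would all underflow still receives well-defined responsibilities instead of falling back on the clipping floor.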


### Fit Models with K=2 and K=3 Classes; Select via BIC

In [None]:

res2 = em_mixture_regression(yvec, Xmat, K=2, seed=7)
res3 = em_mixture_regression(yvec, Xmat, K=3, seed=7)

comp = pd.DataFrame({
    'K': [2,3],
    'loglike': [res2['loglike'], res3['loglike']],
    'BIC': [res2['bic'], res3['bic']]
})
comp


In [None]:

best = res3 if res3['bic'] < res2['bic'] else res2
K = 3 if best is res3 else 2
K, best['pis'], best['betas'], best['sig2']


### Posterior Class Probabilities (Responsibilities) and Assignments

In [None]:

resp = best['resp']
assign = resp.argmax(axis=1)
df['class_hat'] = assign
for k in range(K):
    df[f'resp_{k}'] = resp[:,k]

df[['class_hat'] + [f'resp_{k}' for k in range(K)]].head()
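
Since the data here are synthetic, the hard assignments can be checked against `true_class`. Estimated class labels are arbitrary, so they have to be matched to the true labels before computing accuracy. The sketch below does this by brute force over permutations, which is only practical for small K; the helper name `match_labels` is hypothetical, and it assumes both label arrays take values in `0..K-1` (that is, the fitted K equals the true number of classes).

```python
from itertools import permutations
import numpy as np

def match_labels(true_labels, est_labels, K):
    """Relabel estimated classes to maximize agreement with the true labels."""
    true_labels = np.asarray(true_labels)
    est_labels = np.asarray(est_labels)
    # confusion[t, e] = number of rows with true class t and estimated class e
    confusion = np.zeros((K, K), dtype=int)
    np.add.at(confusion, (true_labels, est_labels), 1)
    # perm[e] is the true label proposed for estimated class e; search all K! options
    best_perm = max(
        permutations(range(K)),
        key=lambda perm: sum(confusion[perm[e], e] for e in range(K)),
    )
    mapping = np.array(best_perm)
    relabeled = mapping[est_labels]
    accuracy = float((relabeled == true_labels).mean())
    return relabeled, accuracy, confusion

# Toy illustration with made-up labels
true_demo = np.array([0, 0, 1, 1, 2, 2])
est_demo = np.array([2, 2, 0, 0, 1, 0])
relabeled_demo, acc_demo, conf_demo = match_labels(true_demo, est_demo, K=3)
```

In this notebook the call would look like `match_labels(df['true_class'], df['class_hat'], K=3)`, provided the selected model has three classes.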


### Segment Profiles and Coefficients

In [None]:

pis = best['pis']
betas = best['betas']
sig2 = best['sig2']

coef_table = []
for k in range(K):
    coef_table.append({
        'class': k,
        'mix_weight': pis[k],
        'intercept': betas[k,0],
        'beta_x1': betas[k,1],
        'beta_x2': betas[k,2],
        'sigma': np.sqrt(sig2[k])
    })
coef_df = pd.DataFrame(coef_table).round(4)
coef_df


In [None]:

# Profile by means of x1/x2 and y within each assigned class
profile = df.groupby('class_hat')[['x1','x2','y']].mean().rename(columns=lambda c: c+'_mean')
pd.concat([coef_df.set_index('class'), profile], axis=1)


### Visualization: Scatter with Class Colors and Fitted Lines

In [None]:

colors = np.array(['tab:blue','tab:orange','tab:green','tab:red','tab:purple'])
cmap = colors[assign]

plt.scatter(df['x1'], df['y'], c=cmap, alpha=0.4, s=15)
xline = np.linspace(df['x1'].min(), df['x1'].max(), 100)
Xline = np.c_[np.ones_like(xline), xline, np.full_like(xline, df['x2'].mean())]

for k in range(K):
    yk = Xline @ betas[k]
    # Pin each line to its class color so it matches the scatter points
    plt.plot(xline, yk, color=colors[k], lw=2, label=f'class {k} fit')

plt.xlabel('x1'); plt.ylabel('y')
plt.title('Latent Class Regression: y vs x1 (lines at avg x2)')
plt.legend(); plt.show()


### Predict for New Observations (mixture expectation and classwise)

In [None]:

def predict_mixture(Xnew, res):
    pis, betas = res['pis'], res['betas']
    yk = Xnew @ betas.T  # n_new x K
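    # Marginal expectation E[y | x] = sum_k pi_k * x'beta_k; the weights are the mixing
    # proportions because y is unobserved for new rows (the noise term has mean zero)
    return (yk * pis).sum(axis=1), yk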

Xnew = np.c_[np.ones(5), np.linspace(-1.5, 1.5, 5), np.linspace(-1.0, 1.0, 5)]
y_mix, y_by_class = predict_mixture(Xnew, best)

pd.DataFrame({
    'x1': Xnew[:,1], 'x2': Xnew[:,2], 'y_mixture_pred': np.round(y_mix,3),
    **{f'y_class{k}': np.round(y_by_class[:,k],3) for k in range(K)}
})
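
The mixture expectation can hide a lot of spread when the class-wise predictions disagree. One way to quantify this is the predictive standard deviation of the mixture, which follows from the law of total variance: Var[y | x] = Σ π_k (σ_k² + μ_k²) − (Σ π_k μ_k)², with μ_k = x'β_k. The sketch below is an illustrative addition; the function name `mixture_predictive_sd` is hypothetical and it takes the same `pis`, `betas`, and `sig2` arrays returned by the EM routine.

```python
import numpy as np

def mixture_predictive_sd(Xnew, pis, betas, sig2):
    """Return (mixture mean, mixture standard deviation) for each row of Xnew."""
    mu_k = Xnew @ betas.T                                  # n_new x K class-wise means
    mean = (mu_k * pis).sum(axis=1)                        # E[y | x]
    second_moment = ((sig2 + mu_k**2) * pis).sum(axis=1)   # E[y^2 | x]
    # Guard against tiny negative values from floating-point cancellation
    var = np.maximum(second_moment - mean**2, 0.0)
    return mean, np.sqrt(var)

# Toy illustration: two equally likely classes with means 0 and 2, unit variance
mean_demo, sd_demo = mixture_predictive_sd(
    Xnew=np.ones((1, 1)),
    pis=np.array([0.5, 0.5]),
    betas=np.array([[0.0], [2.0]]),
    sig2=np.array([1.0, 1.0]),
)
```

With the fitted model this would be called as `mixture_predictive_sd(Xnew, best['pis'], best['betas'], best['sig2'])`. A large standard deviation relative to the class-wise `sigma` values signals that the classes predict very different outcomes for that row.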



---

### Practical Guidance
- Run EM from **multiple random seeds** for each K and keep the run with the highest log‑likelihood (for a fixed K this is also the lowest BIC).  
- Standardize features before modeling when scales differ; consider **regularization** for stability.  
- Use **soft assignments** (responsibilities) to avoid hard thresholding when targeting.  
- Compare against **HB/random‑coefficient** models to check whether heterogeneity is continuous rather than clustered.
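
The multi-seed advice above can be sketched as a small helper. This is an illustrative pattern, not part of the notebook's original code: `best_of_seeds` and `toy_fit` are hypothetical names, and the helper only assumes that the fit function returns a dict with a `'loglike'` entry, as `em_mixture_regression` does.

```python
import math

def best_of_seeds(fit_fn, seeds):
    """Run fit_fn(seed) for each seed and keep the run with the highest log-likelihood."""
    results = {seed: fit_fn(seed) for seed in seeds}
    # Ignore runs that diverged to NaN or +/-inf
    finite = {s: r for s, r in results.items() if math.isfinite(r['loglike'])}
    if not finite:
        raise ValueError("no run produced a finite log-likelihood")
    best_seed = max(finite, key=lambda s: finite[s]['loglike'])
    logliks = {s: r['loglike'] for s, r in results.items()}
    return best_seed, finite[best_seed], logliks

# Stand-in fit function so the sketch runs on its own; its log-likelihood peaks at seed 3
def toy_fit(seed):
    return {'loglike': -100.0 - (seed - 3) ** 2}

best_seed, best_run, all_logliks = best_of_seeds(toy_fit, range(6))
```

In this notebook the fit function could be `lambda s: em_mixture_regression(yvec, Xmat, K=3, seed=s)`. Inspecting the returned log-likelihoods is also informative: if many seeds reach nearly the same value, the solution is likely stable, whereas widely scattered values suggest local optima or too many classes.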

### References (non‑link citations)
1. Wedel & Kamakura — *Market Segmentation: Conceptual and Methodological Foundations*.  
2. McLachlan & Peel — *Finite Mixture Models*.  
3. Rossi, Allenby & McCulloch — *Bayesian Statistics and Marketing*.
