## Naive Bayes

1. **Bayes’ Theorem**

   $$
   P(C \mid \mathbf{x})
   = \frac{P(\mathbf{x}\mid C)\,P(C)}{P(\mathbf{x})}
   $$

   * To classify, compare the numerators $P(\mathbf{x}\mid C)\,P(C)$ across classes; the denominator $P(\mathbf{x})$ is the same for every class and can be ignored.

<br>
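
As a small numeric illustration of comparing the numerators, here is a sketch with two classes. All probabilities are invented for the example and do not come from the CIFAR-10 experiments.

```python
import numpy as np

# Hypothetical two-class problem; every number here is made up for illustration.
priors = np.array([0.7, 0.3])          # P(C) for class 0 and class 1
likelihoods = np.array([0.02, 0.10])   # P(x | C) for one observed x

scores = likelihoods * priors          # numerators P(x | C) P(C)
posteriors = scores / scores.sum()     # dividing by P(x), the sum of the numerators

predicted = int(np.argmax(scores))     # argmax is unchanged by the division
print(scores, posteriors, predicted)
```

Class 1 wins despite its smaller prior because its likelihood is five times larger.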

2. **“Naive” Independence**

   $$
   P(\mathbf{x}\mid C)
   = \prod_{i=1}^n P(x_i\mid C)
   $$

   * Cuts the number of likelihood parameters per class from $O(k^n)$ to $O(n\,k)$, where $k$ is the number of values each feature can take.
   * Makes training feasible for high-dimensional data.

<br>
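
To make the saving concrete, the sketch below counts free parameters per class with and without the independence assumption. It assumes every feature takes the same number of values `k`; the helper names are hypothetical.

```python
def full_joint_params(k: int, n: int) -> int:
    """Free parameters per class for an unrestricted joint over n features with k values each."""
    return k ** n - 1


def naive_bayes_params(k: int, n: int) -> int:
    """Free parameters per class when features are independent given the class."""
    return n * (k - 1)


# Binary features (k = 2) with a growing number of features.
for n in (5, 10, 20):
    print(n, full_joint_params(2, n), naive_bayes_params(2, n))
```

With ten binary features the full joint needs 1023 parameters per class, while the naive model needs 10.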

3. **Training**

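   * **Priors**: $P(C)=\tfrac{\text{count}(C)}{m}$, where $m$ is the number of training samples.
   * **Likelihoods**:

     * *Categorical*: $\displaystyle P(x_i=v\mid C)\!=\!\frac{N_{i,C}(v)+\alpha}{N_C+\alpha\,V_i}$, where $N_{i,C}(v)$ counts class-$C$ samples with $x_i=v$, $N_C$ is the number of class-$C$ samples, $V_i$ is the number of values feature $i$ can take, and $\alpha$ is the smoothing strength.
     * *Gaussian*: estimate $\mu_{i,C}, \sigma^2_{i,C}$ per feature.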

<br>
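
The smoothed categorical estimate can be sketched for a single feature within a single class as follows. The function name and the toy data are assumptions made for illustration.

```python
import numpy as np


def smoothed_categorical(values, n_values, alpha=1.0):
    """Estimate P(x_i = v | C) from the integer-coded values of one feature in one class."""
    counts = np.bincount(values, minlength=n_values)   # N_{i,C}(v) for v = 0..n_values-1
    return (counts + alpha) / (len(values) + alpha * n_values)


# Toy feature with V_i = 3 possible values; value 2 never occurs in this class.
values_in_class = np.array([0, 0, 1, 0, 1])
probs = smoothed_categorical(values_in_class, n_values=3, alpha=1.0)
print(probs)
```

The unseen value still receives probability 1/8 rather than zero, and the three estimates sum to one.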

4. **Prediction**

   $$
   \log P(C\mid \mathbf{x})
   = \log P(C)
   +\sum_{i=1}^n \log P(x_i\mid C)
   - \log P(\mathbf{x})
   $$

   * The last term is the same for every class, so choose the class with the highest log-score $\log P(C)+\sum_i \log P(x_i\mid C)$.

<br>
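
If normalized posteriors are wanted in addition to the predicted class, one option is to subtract the log-sum-exp of the per-class log-scores. The sketch below uses `scipy.special.logsumexp`; the three log-scores are invented values of the magnitude one might see with thousands of features.

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical joint log-scores log P(C) + sum_i log P(x_i | C) for three classes.
log_scores = np.array([-2105.3, -2101.9, -2110.0])

predicted = int(np.argmax(log_scores))

# log P(x) equals the log-sum-exp of the joint log-scores.
log_posteriors = log_scores - logsumexp(log_scores)
posteriors = np.exp(log_posteriors)
print(predicted, posteriors)
```

Exponentiating the raw log-scores directly would give zeros, whereas the normalized version stays well within floating-point range.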

5. **Common Variants**

|                                 Variant                                 |                                                                            When to use                                                                            |                                                                 Likelihood Model                                                                |
| :---------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------: |
| **Multinomial** | When features are **counts or frequencies** (e.g., number of times a word appears). Best for **text classification** with Bag-of-Words or TF vectors. | $P(\mathbf{x} \mid C) \propto \prod_i \theta_{i,C}^{x_i}$, with $\theta_{i,C} = \displaystyle\frac{\text{count of }i\text{ in }C + \alpha}{\sum_j \bigl(\text{count of }j\text{ in }C\bigr) + \alpha V}$ |
|                              **Bernoulli**                              |                       When features are **binary** (e.g., whether a word is present or not). Good for **short text** or sparse binary input.                      |                                    $P(x_i = 1 \mid C) = \displaystyle\frac{N_{i,C} + \alpha}{N_C + 2\alpha}$                                    |
|                               **Gaussian**                              |           When features are **continuous and approximately normally distributed**. Common in **sensor data**, **medical data**, and numerical features.           |        $P(x_i \mid C) = \displaystyle\frac{1}{\sqrt{2\pi\sigma_{i,C}^2}}\exp\!\Bigl(-\tfrac{(x_i - \mu_{i,C})^2}{2\sigma_{i,C}^2}\Bigr)$        |
| **Beta** | When features are **real values constrained to $[0,1]$** (e.g., proportions, probabilities, normalized pixel intensities). Useful when a Gaussian would put probability mass outside that range. | $P(x_i \mid C) = \displaystyle\frac{x_i^{\alpha_{i,C}-1}(1-x_i)^{\beta_{i,C}-1}}{B(\alpha_{i,C},\,\beta_{i,C})}$ |
| **Mixture of Gaussians** | When feature distributions are **multi-modal** and you need more flexibility than a single Gaussian. With diagonal covariances, features are still treated as **independent** within each mixture component. | $P(x_i \mid C) = \displaystyle\sum_{m=1}^{M} w_{i,C,m}\;\mathcal{N}\bigl(x_i\mid\mu_{i,C,m},\,\sigma^2_{i,C,m}\bigr)$ |
| **KDE** | When you want **no strong assumption** on the data distribution. It’s a **non-parametric** method that places a kernel at **every training point**, so it can capture arbitrarily complex shapes but is **computationally expensive** at inference. | $P(x_i \mid C) = \displaystyle\frac{1}{n\,h\,\sqrt{2\pi}}\sum_{j=1}^{n}\exp\!\Bigl(-\tfrac{(x_i - x_{ij})^2}{2h^2}\Bigr)$ for a Gaussian kernel with bandwidth $h$, where $x_{ij}$ is feature $i$ of the $j$-th of the $n$ training points in class $C$ |





<br>

6. **Pros & Cons**

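   * **Pros**: Fast to train and predict, works with high-dimensional sparse data, and needs relatively little training data because each feature's parameters are estimated separately.
   * **Cons**: The independence assumption is often violated, which tends to make the predicted probabilities poorly calibrated; zero-frequency issues arise for unseen feature values (addressed by smoothing).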

<br>

---

> **Extra note (why logs?):**
> Multiplying many small $P(x_i\mid C)$ values can underflow to zero in floating point. Taking $\log$ turns the product into a sum, which avoids underflow, and it keeps the same class ordering because $\log$ is strictly increasing.
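
The underflow is easy to reproduce. The sketch below draws 3072 made-up per-feature likelihoods (one per CIFAR-10 pixel value, purely as an illustration) and compares the direct product with the sum of logs.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3072 hypothetical per-feature likelihoods, each between 0.01 and 0.2.
p = rng.uniform(0.01, 0.2, size=3072)

direct_product = np.prod(p)       # far below the smallest float64, so it becomes 0.0
log_score = np.sum(np.log(p))     # a large negative but finite number

print(direct_product, log_score)
```

Once every class's product has collapsed to zero the classes can no longer be ranked, while the log-scores remain comparable.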


In [14]:
import pickle
import numpy as np
import os
from tqdm import tqdm
from unpickle_CIFAR10 import get_data, get_test_data

### Load & Preprocess CIFAR-10

In [15]:
# 1) Load
x_train, y_train = get_data(1)
x_test, y_test = get_test_data()

# 2) Normalize to [0,1] and flatten
x_train = x_train.astype(np.float32) / 255.0
x_test = x_test.astype(np.float32) / 255.0

n_samples, h, w, c = x_train.shape  # h=32, w=32, c=3
x_train = x_train.reshape(n_samples, h * w * c)  # flatten to (N, 3072)
x_test = x_test.reshape(x_test.shape[0], h * w * c)  # flatten to (N, 3072)

### Estimate Class Priors

In [16]:
classes = np.unique(y_train)
n_classes = len(classes)

# Count how many training samples per class and divide by total
priors = np.array([np.mean(y_train == c) for c in np.unique(y_train)])  # Prior probabilities P(C)

### Compute Feature-Conditional Statistics
Gaussian assumption: each pixel feature is modeled per class by its sample mean and variance.

In [17]:
# For each class 𝐶 and pixel-feature 𝑖, compute the sample mean and variance
means = np.zeros((n_classes, x_train.shape[1]), dtype=np.float32)
vars_ = np.zeros((n_classes, x_train.shape[1]), dtype=np.float32)

for idx, c in enumerate(classes):
    # select samples of class c
    x_c = x_train[y_train == c]

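    means[idx, :] = x_c.mean(axis=0)  # μ_{i,C}
    vars_[idx, :] = x_c.var(axis=0) + 1e-9  # σ²_{i,C}; small floor avoids division by zero


def log_gaussian_pdf(x, mean, var):
    """Element-wise log N(x | mean, var) for arrays of matching shape."""
    return -0.5 * (np.log(2 * np.pi * var) + ((x - mean) ** 2 / var))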

In [18]:
def predict(x):
    log_probs = np.log(priors)  # log P(C) 
    for idx in range(n_classes):
        log_probs[idx] += np.sum(log_gaussian_pdf(x, means[idx], vars_[idx]))
    return classes[np.argmax(log_probs)]


# Predict each test image in turn (a Python loop over rows, not vectorized)
y_pred = np.array([predict(x) for x in x_test])


accuracy = np.mean(y_pred == y_test)
print(f"CIFAR-10 test accuracy (Gaussian NB from scratch): {accuracy*100:.2f}%")

CIFAR-10 test accuracy (Gaussian NB from scratch): 29.30%
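
The per-image loop can be replaced by a batched computation. The following is a minimal sketch, not the notebook's code: `predict_gaussian_batch` is a hypothetical helper that expands the squared term so that no `(N, K, D)` array is built, and it is checked against a per-row loop on small synthetic data.

```python
import numpy as np


def predict_gaussian_batch(X, priors, means, variances):
    """Return the argmax class index for every row of X, where X has shape (N, D)."""
    inv_var = 1.0 / variances                                    # (K, D)
    # sum_i (x_i - mu_i)^2 / var_i, expanded into three matrix terms.
    quad = (
        (X ** 2) @ inv_var.T
        - 2.0 * X @ (means * inv_var).T
        + np.sum(means ** 2 * inv_var, axis=1)
    )                                                            # (N, K)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    log_joint = np.log(priors) - 0.5 * (log_norm + quad)         # (N, K)
    return np.argmax(log_joint, axis=1)


# Synthetic check against a straightforward per-row computation.
rng = np.random.default_rng(0)
K, D, N = 3, 5, 8
priors = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = rng.uniform(0.5, 2.0, size=(K, D))
X = rng.normal(size=(N, D))


def predict_one(x):
    log_pdf = -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances)
    return np.argmax(np.log(priors) + log_pdf.sum(axis=1))


batch = predict_gaussian_batch(X, priors, means, variances)
loop = np.array([predict_one(x) for x in X])
print(np.array_equal(batch, loop))
```

The function returns class indices; indexing `classes` with the result would recover the labels. With very small variances the expanded form can lose precision, so float64 inputs are the safer choice.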


<h1 style="font-size: 40px;">Compare variants</h1>

In [19]:
# ── 1) Gaussian Naive Bayes ────────────────────────────────────────────────

def train_gaussian_nb(x, y):
    classes = np.unique(y)
    n_classes = len(classes)
    n_features = x.shape[1]

    means  = np.zeros((n_classes, n_features), dtype=np.float32)
    vars_  = np.zeros((n_classes, n_features), dtype=np.float32)
    for idx, c in enumerate(classes):
        xc = x[y == c]
        means[idx] = xc.mean(axis=0)
        vars_[idx] = xc.var(axis=0) + 1e-9
    return means, vars_

def predict_gaussian(x, priors, means, vars_):
    logp = np.log(priors).copy()
    for idx in range(n_classes):
        m, v = means[idx], vars_[idx]
        logp[idx] += np.sum(-0.5*(np.log(2*np.pi*v) + ((x-m)**2)/v))
    return classes[np.argmax(logp)]


# ── 2) Bernoulli Naive Bayes ───────────────────────────────────────────────

def train_bernoulli_nb(x, y, thr=0.5, alpha=1.0):
    classes = np.unique(y)
    n_classes = len(classes)
    n_features = x.shape[1]

    xb = (x > thr).astype(int)
    probs  = np.zeros((n_classes, n_features))
    for idx, c in enumerate(classes):
        xc = xb[y == c]
        probs[idx] = (xc.sum(axis=0) + alpha) / (len(xc) + 2*alpha)
    return probs

def predict_bernoulli(x, priors, probs, thr=0.5):
    xb = (x > thr).astype(int)
    logp = np.log(priors).copy()
    for idx in range(n_classes):
        p = probs[idx]
        logp[idx] += np.sum(xb * np.log(p) + (1-xb)*np.log(1-p))
    return classes[np.argmax(logp)]


# ── 3) Multinomial Naive Bayes (with discretization) ───────────────────────

def train_multinomial_nb(x, y, bins=8, alpha=1.0):
    classes = np.unique(y)
    n_classes = len(classes)
    n_features = x.shape[1]

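    edges = np.linspace(0, 1, bins+1)
    # Clip so that x == 1.0 (and any out-of-range value) falls in a valid bin,
    # matching the clipping done in predict_multinomial.
    xd = np.clip(np.digitize(x, bins=edges) - 1, 0, bins-1)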
    cond = np.zeros((n_classes, n_features, bins))
    for idx, c in enumerate(classes):
        xc = xd[y == c]
        counts = np.stack([np.sum(xc==k, axis=0) for k in range(bins)], axis=1)
        cond[idx] = (counts + alpha) / (len(xc) + alpha*bins)
    return cond, edges

def predict_multinomial(x, priors, cond, edges):
    bins = cond.shape[2]
    xd = np.clip(np.digitize(x, bins=edges)-1, 0, bins-1)
    n_features = x.shape[0]
    logp = np.log(priors).copy()
    for idx in range(n_classes):
        logp[idx] += np.sum(np.log(cond[idx, np.arange(n_features), xd]))
    return classes[np.argmax(logp)]


# ── 4) Beta-distribution Naive Bayes ──────────────────────────────────────
from scipy.special import betaln

def train_beta_nb(x, y, eps=1e-3):
    classes = np.unique(y)
    n_classes = len(classes)
    n_features = x.shape[1]

    alphas = np.zeros((n_classes, n_features))
    betas  = np.zeros((n_classes, n_features))
    for idx, c in enumerate(classes):
        xc = np.clip(x[y==c], eps, 1-eps)
        mu = xc.mean(axis=0)
        var = xc.var(axis=0) + 1e-9
        tmp = (mu*(1-mu)/var) - 1
        alphas[idx] = np.maximum(mu*tmp, eps)
        betas[idx]  = np.maximum((1-mu)*tmp, eps)
    return alphas, betas

def predict_beta(x, priors, alphas, betas, eps=1e-6):
    x = np.clip(x, eps, 1-eps)
    logp = np.log(priors).copy()
    for idx in range(n_classes):
        a, b = alphas[idx], betas[idx]
        log_pdf = (a-1)*np.log(x) + (b-1)*np.log(1-x) - betaln(a, b)
        logp[idx] += np.sum(log_pdf)
    return classes[np.argmax(logp)]


# ── 5) Mixture-of-Gaussians Naive Bayes ────────────────────────────────────
from sklearn.mixture import GaussianMixture

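def train_mog_nb(x, y, n_components=3):
    """Fit one diagonal-covariance GMM per class on the full feature vector.

    Features are independent within each mixture component, but the components
    are shared across features, so this is not a separate mixture per feature.
    """
    classes = np.unique(y)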

    gmms   = []
    for c in classes:
        xc = x[y == c]  # shape (N_c, n_features)
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type='diag',
            max_iter=100,
            random_state=0
        )
        gmm.fit(xc)
        gmms.append(gmm)
    return gmms

def predict_mog_nb(x, priors, gmms):
    logp = np.log(priors).copy()
    for idx, gmm in enumerate(gmms):
        # gmm.score_samples returns log p(x|class)
        logp[idx] += gmm.score_samples(x.reshape(1, -1))[0]
    return classes[np.argmax(logp)]


# # ── 6) KDE-NB ──────────────────────────────────────────────────────────────

# def train_kde_nb(x, y, kernel_bandwidth=None):
#     classes = np.unique(y)
#     n_classes = len(classes)
#     n_features = x.shape[1]

#     # collect per-class training values
#     values = [x[y==c] for c in classes]  # each is (N_c, n_features)
#     # estimate bandwidth per class+feature if not given
#     if kernel_bandwidth is None:
#         kernel_bandwidth = []
#         for xc in values:
#             # Silverman’s rule per feature
#             std = xc.std(axis=0) + 1e-9
#             n_c = xc.shape[0]
#             h   = 1.06 * std * (n_c ** -0.2)
#             kernel_bandwidth.append(h)
#     return values, kernel_bandwidth

# def predict_kde(x, priors, values, bandwidths):
#     logp = np.log(priors).copy()
#     for idx in range(n_classes):
#         xc = values[idx]      # (N_c, n_features)
#         h  = bandwidths[idx]  # (n_features,)
#         # compute per-feature KDE estimate
#         # P(x_i|C) = 1/(N_c * h_i * sqrt(2π)) * sum_j exp(-0.5*((x_i-xc_j)/h_i)^2)
#         diffs = (x[None, :] - xc) / h[None, :]     # shape (N_c, n_features)
#         kernel_vals = np.exp(-0.5 * diffs**2)      # same shape
#         # sum over training points
#         sums = kernel_vals.sum(axis=0)             # (n_features,)
#         logp[idx] += np.sum(np.log(sums) - np.log(xc.shape[0]*h*np.sqrt(2*np.pi)))
#     return classes[np.argmax(logp)]
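
For reference, a runnable version of the per-feature KDE likelihood might look like the sketch below. It is not the notebook's implementation: `kde_log_likelihood` and `silverman_bandwidth` are hypothetical helpers, the bandwidth follows the same `1.06 * std * N^(-1/5)` rule of thumb as the commented code, and `logsumexp` is used so that distant training points cannot produce `log(0)`.

```python
import numpy as np
from scipy.special import logsumexp


def silverman_bandwidth(xc):
    """Rule-of-thumb bandwidth 1.06 * std * N^(-1/5), computed per feature."""
    return 1.06 * (xc.std(axis=0) + 1e-9) * xc.shape[0] ** -0.2


def kde_log_likelihood(x, xc, h):
    """Sum over features of log p(x_i | C) under a per-feature Gaussian KDE.

    x  : (D,)   query point
    xc : (N, D) training points of one class
    h  : (D,)   per-feature bandwidths
    """
    z = (x[None, :] - xc) / h[None, :]              # (N, D) scaled differences
    log_kernels = -0.5 * z ** 2                     # log of exp(-z^2 / 2)
    log_sums = logsumexp(log_kernels, axis=0)       # (D,) log of the kernel sums
    log_norm = np.log(xc.shape[0] * h * np.sqrt(2 * np.pi))
    return np.sum(log_sums - log_norm)


# Synthetic check: a point at the centre of class A should score higher under A.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
class_b = rng.normal(loc=3.0, scale=1.0, size=(200, 4))
x = np.zeros(4)

ll_a = kde_log_likelihood(x, class_a, silverman_bandwidth(class_a))
ll_b = kde_log_likelihood(x, class_b, silverman_bandwidth(class_b))
print(ll_a, ll_b)
```

Each prediction touches every training point of every class, which is why this variant is costly on 3072-dimensional images.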

In [20]:
# ── RUN & COMPARE ─────────────────────────────────────────────────────────

# priors
priors = np.array([np.mean(y_train == c) for c in classes])

means_g, vars_g = train_gaussian_nb(x_train, y_train)
probs_b = train_bernoulli_nb(x_train, y_train)
cond_m, edges_m = train_multinomial_nb(x_train, y_train, bins=8)
alphas_bt, betas_bt = train_beta_nb(x_train, y_train)
gmms = train_mog_nb(x_train, y_train, n_components=3)
# values_k, bw_k = train_kde_nb(x_train, y_train)

# predictions
y_pred_g   = np.array([predict_gaussian(x, priors, means_g, vars_g)         for x in tqdm(x_test, desc="Gaussian")])
y_pred_b   = np.array([predict_bernoulli(x, priors, probs_b)            for x in tqdm(x_test, desc="Bernoulli")])
y_pred_m   = np.array([predict_multinomial(x, priors, cond_m, edges_m)          for x in tqdm(x_test, desc="Multinomial")])
y_pred_bt  = np.array([predict_beta(x, priors, alphas_bt, betas_bt)         for x in tqdm(x_test, desc="Beta")])
y_pred_mog = np.array([predict_mog_nb(x, priors, gmms)          for x in tqdm(x_test, desc="Mix-Gaussians")])
# y_pred_kde = np.array([predict_kde(x, priors, values_k, bw_k)           for x in tqdm(x_test, desc="KDE")])

# accuracies
acc_g   = np.mean(y_pred_g   == y_test)
acc_b   = np.mean(y_pred_b   == y_test)
acc_m   = np.mean(y_pred_m   == y_test)
acc_bt  = np.mean(y_pred_bt  == y_test)
acc_mog = np.mean(y_pred_mog == y_test)
# acc_kde = np.mean(y_pred_kde == y_test)

print(" \n Test accuracies on CIFAR-10:")
print(f" • Gaussian NB:     {acc_g*100:6.2f}%")
print(f" • Bernoulli NB:    {acc_b*100:6.2f}%")
print(f" • Multinomial NB:  {acc_m*100:6.2f}%")
print(f" • Beta NB:         {acc_bt*100:6.2f}%")
print(f" • MoG NB:          {acc_mog*100:.2f}%")
# print(f" • KDE NB:          {acc_kde*100:6.2f}%")

Gaussian: 100%|██████████| 10000/10000 [00:01<00:00, 6637.43it/s]
Bernoulli: 100%|██████████| 10000/10000 [00:02<00:00, 4197.23it/s]
Multinomial: 100%|██████████| 10000/10000 [00:02<00:00, 4042.31it/s]
Beta: 100%|██████████| 10000/10000 [00:11<00:00, 846.13it/s]
Mix-Gaussians: 100%|██████████| 10000/10000 [00:08<00:00, 1140.71it/s]

 
 Test accuracies on CIFAR-10:
 • Gaussian NB:      29.30%
 • Bernoulli NB:     27.71%
 • Multinomial NB:   30.15%
 • Beta NB:          28.45%
 • MoG NB:          32.51%
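
`train_multinomial_nb` and `predict_multinomial` map each value to a bin with `np.digitize`. The small check below uses the same 8 equal-width edges on [0, 1] and shows where boundary and out-of-range values land before and after clipping; the sample values are made up.

```python
import numpy as np

bins = 8
edges = np.linspace(0, 1, bins + 1)
x = np.array([-0.5, 0.0, 0.1249, 0.125, 0.999, 1.0, 1.7])

raw = np.digitize(x, bins=edges) - 1      # may fall outside 0..bins-1
clipped = np.clip(raw, 0, bins - 1)       # forced into a valid bin index

print(raw)      # [-1  0  0  1  7  8  8]
print(clipped)  # [ 0  0  0  1  7  7  7]
```

A value of exactly 1.0, which corresponds to pixel intensity 255 after scaling, lands one past the last bin unless it is clipped.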





<h1 style="font-size: 40px;">Integrate filters</h1>

# All features together

In [21]:
from image_preprocessing import (
    extract_raw_pixels,
    extract_color_histogram,
    extract_hog,
    extract_lbp,
)

In [22]:
def batch_extract(fn, X):
    """Apply single‐image fn over a batch X of shape (n,H,W,C)."""
    return np.stack([fn(im) for im in X], axis=0)


x_train, y_train = get_data(1)
x_test, y_test = get_test_data()

Xraw_train = batch_extract(extract_raw_pixels, x_train)
Xraw_test = batch_extract(extract_raw_pixels, x_test)

Xhist_train = batch_extract(extract_color_histogram, x_train)
Xhist_test = batch_extract(extract_color_histogram, x_test)

Xhog_train = batch_extract(extract_hog, x_train)
Xhog_test = batch_extract(extract_hog, x_test)

Xlbp_train = batch_extract(extract_lbp, x_train)
Xlbp_test = batch_extract(extract_lbp, x_test)



eps = 1e-6

# Raw & HOG & LBP: zero‐mean, unit‐var
def fit_standard(X):
    mu   = X.mean(axis=0)
    sigma= X.std(axis=0) + eps
    return mu, sigma

def transform_standard(X, mu, sigma):
    return (X - mu) / sigma

mu_raw, sd_raw = fit_standard(Xraw_train)
Xraw_train_standard = transform_standard(Xraw_train, mu_raw, sd_raw)
Xraw_test_standard = transform_standard(Xraw_test, mu_raw, sd_raw)

mu_hog, sd_hog = fit_standard(Xhog_train)
Xhog_train_standard = transform_standard(Xhog_train, mu_hog, sd_hog)
Xhog_test_standard = transform_standard(Xhog_test, mu_hog, sd_hog)

mu_lbp, sd_lbp = fit_standard(Xlbp_train)
Xlbp_train_standard = transform_standard(Xlbp_train, mu_lbp, sd_lbp)
Xlbp_test_standard = transform_standard(Xlbp_test, mu_lbp, sd_lbp)

# Color histogram: min–max to [0,1]
def fit_minmax(X):
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return lo, hi

def transform_minmax(X, lo, hi):
    return (X - lo) / (hi - lo + eps)

lo_hist, hi_hist = fit_minmax(Xhist_train)
Xhist_train_minmax = transform_minmax(Xhist_train, lo_hist, hi_hist)
Xhist_test_minmax = transform_minmax(Xhist_test, lo_hist, hi_hist)

In [None]:
X_train_all = np.hstack([Xraw_train_standard, Xhog_train_standard, Xlbp_train_standard, Xhist_train_minmax])
X_test_all  = np.hstack([Xraw_test_standard, Xhog_test_standard, Xlbp_test_standard, Xhist_test_minmax])

# ── RUN & COMPARE ─────────────────────────────────────────────────────────

# priors
classes = np.unique(y_train)
priors = np.array([np.mean(y_train == c) for c in classes])


means_g, vars_g = train_gaussian_nb(X_train_all, y_train)
probs_b = train_bernoulli_nb(X_train_all, y_train)
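# Caution: the multinomial bin edges and the Beta model both assume inputs in [0, 1],
# but the standardized raw/HOG/LBP columns are not confined to that range.
cond_m, edges_m = train_multinomial_nb(X_train_all, y_train, bins=8)
alphas_bt, betas_bt = train_beta_nb(X_train_all, y_train)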
gmms = train_mog_nb(X_train_all, y_train, n_components=3)
# values_k, bw_k = train_kde_nb(X_train_all, y_train)

# predictions
classes = np.unique(y_train)
n_classes = len(classes)

y_pred_g   = np.array([predict_gaussian(x, priors, means_g, vars_g)         for x in tqdm(X_test_all, desc="Gaussian")])
y_pred_b   = np.array([predict_bernoulli(x, priors, probs_b)            for x in tqdm(X_test_all, desc="Bernoulli")])
y_pred_m   = np.array([predict_multinomial(x, priors, cond_m, edges_m)          for x in tqdm(X_test_all, desc="Multinomial")])
y_pred_bt  = np.array([predict_beta(x, priors, alphas_bt, betas_bt)         for x in tqdm(X_test_all, desc="Beta")])
y_pred_mog = np.array([predict_mog_nb(x, priors, gmms)          for x in tqdm(X_test_all, desc="Mix-Gaussians")])
# y_pred_kde = np.array([predict_kde(x, priors, values_k, bw_k)           for x in tqdm(X_test_all, desc="KDE")])

# accuracies
acc_g   = np.mean(y_pred_g   == y_test)
acc_b   = np.mean(y_pred_b   == y_test)
acc_m   = np.mean(y_pred_m   == y_test)
acc_bt  = np.mean(y_pred_bt  == y_test)
acc_mog = np.mean(y_pred_mog == y_test)
# acc_kde = np.mean(y_pred_kde == y_test)

print(" \n Test accuracies on CIFAR-10:")
print(f" • Gaussian NB:     {acc_g*100:6.2f}%")
print(f" • Bernoulli NB:    {acc_b*100:6.2f}%")
print(f" • Multinomial NB:  {acc_m*100:6.2f}%")
print(f" • Beta NB:         {acc_bt*100:6.2f}%")
print(f" • MoG NB:          {acc_mog*100:.2f}%")
# print(f" • KDE NB:          {acc_kde*100:6.2f}%")

Gaussian: 100%|██████████| 10000/10000 [00:04<00:00, 2382.79it/s]
Bernoulli: 100%|██████████| 10000/10000 [00:05<00:00, 1952.63it/s]
Multinomial: 100%|██████████| 10000/10000 [00:06<00:00, 1639.62it/s]
Beta: 100%|██████████| 10000/10000 [00:28<00:00, 352.81it/s]
Mix-Gaussians: 100%|██████████| 10000/10000 [00:14<00:00, 699.71it/s]

 
 Test accuracies on CIFAR-10:
 • Gaussian NB:      23.17%
 • Bernoulli NB:     27.94%
 • Multinomial NB:   18.53%
 • Beta NB:          10.12%
 • MoG NB:          28.67%
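
`train_beta_nb` fits each Beta distribution by matching the sample mean and variance, which presumes inputs in (0, 1). The sketch below mirrors that formula in a hypothetical single-feature helper and applies it to genuine Beta samples and to standardized (zero-mean, unit-variance) values. The synthetic data is an assumption for illustration, not the notebook's features.

```python
import numpy as np


def beta_method_of_moments(x, eps=1e-3):
    """Moment-matched Beta(a, b) for one feature, following train_beta_nb's formula."""
    x = np.clip(x, eps, 1 - eps)
    mu, var = x.mean(), x.var() + 1e-9
    common = mu * (1 - mu) / var - 1
    return max(mu * common, eps), max((1 - mu) * common, eps)


rng = np.random.default_rng(0)

# Case 1: data that really is Beta(2, 5); the estimates land close to (2, 5).
a_hat, b_hat = beta_method_of_moments(rng.beta(2.0, 5.0, size=50_000))
print(a_hat, b_hat)

# Case 2: standardized values; clipping piles most of the mass onto the boundaries.
a_z, b_z = beta_method_of_moments(rng.normal(size=50_000))
print(a_z, b_z)
```

In the second case both fitted parameters fall below 1, which describes a U-shaped density driven by the clipped values rather than by the feature itself. That may be one reason the Beta variant does poorly on standardized inputs.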





# Majority vote

In [None]:
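def majority_vote(preds):
    """Return the most frequent label per sample; ties go to the label seen first."""
    stacked = np.vstack(preds).T  # shape (n_samples, n_models)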
    majority = []
    for row in stacked:
        counts = {}
        for label in row:
            counts[label] = counts.get(label, 0) + 1
        majority_label = max(counts, key=counts.get)
        majority.append(majority_label)
    return np.array(majority)



means_raw, vars_raw = train_gaussian_nb(Xraw_train.reshape(len(y_train), -1), y_train)
probs_hist = train_bernoulli_nb(Xhist_train, y_train)
cond_hog, edges_hog = train_multinomial_nb(Xhog_train, y_train)
alphas_lbp, betas_lbp = train_beta_nb(Xlbp_train, y_train)


classes = np.unique(y_train)
n_classes = len(classes)

y_pred_raw = np.array([
    predict_gaussian(x.reshape(-1), priors, means_raw, vars_raw)
    for x in tqdm(Xraw_test.reshape(len(y_test), -1), desc="Raw-Gaussian")
])
y_pred_hist = np.array([
    predict_bernoulli(x, priors, probs_hist)
    for x in tqdm(Xhist_test, desc="Hist-Bernoulli")
])
y_pred_hog = np.array([
    predict_multinomial(x, priors, cond_hog, edges_hog)
    for x in tqdm(Xhog_test, desc="HOG-Multinomial")
])
y_pred_lbp = np.array([
    predict_beta(x, priors, alphas_lbp, betas_lbp)
    for x in tqdm(Xlbp_test, desc="LBP-Beta")
])



y_pred_ensemble = majority_vote([y_pred_raw, y_pred_hist, y_pred_hog, y_pred_lbp])

acc_raw      = np.mean(y_pred_raw == y_test)
acc_hist     = np.mean(y_pred_hist == y_test)
acc_hog      = np.mean(y_pred_hog == y_test)
acc_lbp      = np.mean(y_pred_lbp == y_test)
acc_ensemble = np.mean(y_pred_ensemble == y_test)

print("\nTest accuracies on CIFAR-10:")
print(f" • Raw (Gaussian NB):       {acc_raw*100:6.2f}%")
print(f" • Color Hist (Bernoulli):  {acc_hist*100:6.2f}%")
print(f" • HOG (Multinomial):       {acc_hog*100:6.2f}%")
print(f" • LBP (Beta NB):           {acc_lbp*100:6.2f}%")
print(f" • Ensemble (majority vote):{acc_ensemble*100:6.2f}%")

Raw-Gaussian: 100%|██████████| 10000/10000 [00:01<00:00, 6814.78it/s]
Hist-Bernoulli: 100%|██████████| 10000/10000 [00:03<00:00, 3128.58it/s]
HOG-Multinomial: 100%|██████████| 10000/10000 [00:00<00:00, 17391.28it/s]
LBP-Beta: 100%|██████████| 10000/10000 [00:00<00:00, 15853.47it/s]


Test accuracies on CIFAR-10:
 • Raw (Gaussian NB):        29.30%
 • Color Hist (Bernoulli):   17.11%
 • HOG (Multinomial):        44.08%
 • LBP (Beta NB):            26.27%
 • Ensemble (majority vote): 34.82%
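
With an even number of voters, ties are common, and `majority_vote` resolves them in favor of whichever label appears first in the list of models. The sketch below repeats the function's logic on four made-up prediction vectors to show that behavior.

```python
import numpy as np


def majority_vote(preds):
    """Most frequent label per sample; ties go to the label seen first."""
    stacked = np.vstack(preds).T          # (n_samples, n_models)
    out = []
    for row in stacked:
        counts = {}
        for label in row:
            counts[label] = counts.get(label, 0) + 1
        out.append(max(counts, key=counts.get))  # first key with the top count
    return np.array(out)


# Hypothetical predictions from four models on three samples.
model_a = np.array([3, 5, 1])
model_b = np.array([3, 2, 7])
model_c = np.array([8, 2, 4])
model_d = np.array([8, 5, 9])

votes = majority_vote([model_a, model_b, model_c, model_d])
print(votes)  # [3 5 1]
```

Every sample here is a tie or a four-way split, and each time the first model's label wins. In the ensemble above the first model is the raw-pixel Gaussian classifier, so reordering the list or weighting the votes could change the ensemble's result.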



