# **DEPENDENCIES**

In [152]:
import numpy as np
import pandas as pd
import gdown

from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import StandardScaler


# **DATASET**

**Data Set Description:**
The task is to predict whether a citizen is happy living in their city, based on how citizens rated several aspects of the city on a scale of 1 to 5 in a survey.

**Attribute Information:**

- **D (Decision/Class Attribute):**  
  - Values: 0 (unhappy) and 1 (happy)  
  - Column 1 of the file
- **X1 (Availability of Information about City Services):**  
  - Values: 1 to 5  
  - Column 2 of the file
- **X2 (Cost of Housing):**  
  - Values: 1 to 5  
  - Column 3 of the file
- **X3 (Overall Quality of Public Schools):**  
  - Values: 1 to 5  
  - Column 4 of the file
- **X4 (Trust in the Local Police):**  
  - Values: 1 to 5  
  - Column 5 of the file
- **X5 (Maintenance of Streets and Sidewalks):**  
  - Values: 1 to 5  
  - Column 6 of the file
- **X6 (Availability of Social Community Events):**  
  - Values: 1 to 5  
  - Column 7 of the file



In [None]:
gdown.download(
    "https://drive.google.com/uc?id=1QiwDpdCitGx9MWvB29Kc4LuifFBAO6OJ",
    "Xtr.txt",
)
gdown.download(
    "https://drive.google.com/uc?id=1aRKIeM2B2ZBmn-tKcylfyb6BYFHT6wwt",
    "Xte.txt",
)

In [88]:
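# Take the first parsed column; the split below assumes it holds each whole
# row as a single comma-separated string
Xtr = pd.read_csv("Xtr.txt", header=None).values[:, 0]
Xte = pd.read_csv("Xte.txt", header=None).values[:, 0]
Xtr = np.array(list(map(lambda x: x.split(","), Xtr)))
Xte = np.array(list(map(lambda x: x.split(","), Xte)))
# The first row holds the column names (D, X1, ..., X6); the rest are integer values
Xtr = pd.DataFrame(Xtr[1:].astype(int), columns=Xtr[0])
Xte = pd.DataFrame(Xte[1:].astype(int), columns=Xte[0])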
print(Xtr.head())
print(Xte.head())

   D  X1  X2  X3  X4  X5  X6
0  0   3   3   3   4   2   4
1  0   3   2   3   5   4   3
2  1   5   3   3   3   3   5
3  0   5   4   3   3   3   5
4  0   5   4   3   3   3   5
   D  X1  X2  X3  X4  X5  X6
0  0   5   1   4   4   4   5
1  0   5   2   2   4   4   5
2  0   5   3   5   4   5   5
3  1   3   4   4   5   1   3
4  1   5   1   5   5   5   5
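The loading cell above takes the first parsed column and splits every row on commas by hand. A more direct alternative is sketched below. It assumes each line of `Xtr.txt`/`Xte.txt` is one comma-separated row, possibly wrapped in double quotes, with the header line `D,X1,...,X6` first; this format is inferred from the splitting above, not confirmed. `load_ratings` is a hypothetical helper name.

```python
import io

import pandas as pd


def load_ratings(source):
    """Hypothetical helper: parse a header line followed by integer rows.

    `source` may be a file path or any file-like object. Double quotes around
    a line (if present) are stripped before splitting it on commas.
    """
    if hasattr(source, "read"):
        lines = source.read().splitlines()
    else:
        with open(source, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
    rows = [line.strip().strip('"').split(",") for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    return pd.DataFrame(data, columns=header).astype(int)


# Usage with an in-memory sample (two rows copied from the training head above)
sample = io.StringIO('"D,X1,X2,X3,X4,X5,X6"\n"0,3,3,3,4,2,4"\n"1,5,3,3,3,3,5"\n')
df = load_ratings(sample)
print(df)
```

With the downloaded files, `load_ratings("Xtr.txt")` would replace the six parsing lines, provided the format assumption holds.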


# **BAYES CLASSIFIER**

**Bayes' Theorem**:
Bayes' theorem is the foundation of the Bayes Classifier. It is used to calculate the probability of a class $C$ (here, unhappy or happy) given a set of features $X$. The formula is as follows:

$$P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}$$

Where:
- $P(C|X)$ is the posterior probability of class $C$ given the features $X$.
- $P(X|C)$ is the likelihood, which is the probability of observing the features $X$ given class $C$.
- $P(C)$ is the prior probability of class $C$.
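- $P(X)$ is the evidence, the probability of observing the features $X$ regardless of the class. It is the same for every class, so it can be ignored when only the most probable class is needed.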
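The classifier implemented in this section is a *naive* Bayes classifier: it assumes the features are conditionally independent given the class, and it models each feature as binary (Bernoulli). Writing $p_{i,C} = P(x_i = 1|C)$ for $d$ features $x_i \in \{0, 1\}$, and dropping $P(X)$ because it is the same for every class, the likelihood and the decision rule become:

$$P(X|C) = \prod_{i=1}^{d} P(x_i|C) = \prod_{i=1}^{d} p_{i,C}^{\,x_i} \, (1 - p_{i,C})^{1 - x_i}$$

$$\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{d} p_{i,C}^{\,x_i} \, (1 - p_{i,C})^{1 - x_i}$$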

**Laplace Smoothing (Additive Smoothing)**:
Laplace smoothing addresses zero probabilities, which occur when a feature value never appears together with a class in the training data. Because the likelihood multiplies the per-feature probabilities, a single zero factor would make the whole likelihood zero. For a binary feature $x_i$, the smoothed estimate is:

$$P(x_i = 1|C) = \frac{N_{i,C} + \alpha}{N_C + 2\alpha}$$

Where:
- $N_{i,C}$ is the number of training samples of class $C$ in which feature $x_i$ equals 1.
- $N_C$ is the number of training samples in class $C$.
- $\alpha$ is the smoothing parameter (usually set to 1).
- The factor 2 is the number of values a binary feature can take. For a feature with $k$ possible values, the denominator becomes $N_C + \alpha k$.
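As a small numeric check of the smoothed estimate, the sketch below uses a made-up binary matrix (not the survey data), $\alpha = 1$, and the binary denominator $N_C + 2\alpha$ that `calculate_likelihood` uses. In class 0, the third feature is never 1, yet its smoothed probability is $(0 + 1)/(3 + 2) = 0.2$ rather than 0. `smoothed_likelihoods` is a hypothetical helper name.

```python
import numpy as np

# Made-up binary data: 5 samples, 3 features, labels 0/1 (illustrative only)
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
])
y = np.array([0, 0, 0, 1, 1])
alpha = 1


def smoothed_likelihoods(X, y, alpha=1):
    """Return P(x_i = 1 | C) with one row per class and one column per feature."""
    classes = np.unique(y)
    out = np.empty((len(classes), X.shape[1]))
    for row, c in enumerate(classes):
        Xc = X[y == c]
        n_ones = (Xc == 1).sum(axis=0)  # N_{i,C}: number of 1s per feature
        # 2 = number of values a binary feature can take
        out[row] = (n_ones + alpha) / (len(Xc) + 2 * alpha)
    return out


probs = smoothed_likelihoods(X, y, alpha)
print(probs)
```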

**Bayes Classifier Implementation**:

1. **Initialization**: The constructor takes the training data $X$, the corresponding labels $y$, the Laplace smoothing parameter $\alpha$ (default 1), and an optional `scaler` (for example `StandardScaler`) that is fitted to $X$ before training. It computes the class priors and feature likelihoods immediately.

2. **Priors**: The `calculate_prior` method computes the prior probability $P(C)$ of each class as the fraction of training samples that belong to it.

3. **Likelihoods**: The `calculate_likelihood` method estimates, for each class, the Laplace-smoothed probability that each feature equals 1. It sums each feature column over the samples of the class, which counts the 1s when the features are binary (0/1).

4. **Prediction**: The `predict` method takes new samples and, for each one, multiplies the class prior by $P(x_i = 1|C)$ for every feature equal to 1 and by $1 - P(x_i = 1|C)$ for every feature equal to 0 (other values contribute no factor). It returns the class with the largest product; $P(X)$ is not computed because it does not change which class wins.

5. **Scoring**: The `score` method returns the accuracy, i.e. the fraction of predictions that match the true labels.

In practice, Laplace smoothing ensures that a feature-class combination absent from the training data still gets a small nonzero probability instead of zeroing out the whole product, which makes the classifier more robust.
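The prediction step can also be carried out in log space, which turns the product of per-feature probabilities into a sum and avoids numerical underflow when there are many features (with six features this is not a practical concern). The sketch below also normalizes the scores with the log-sum-exp trick, which recovers $P(X)$ and yields actual posterior probabilities. `log_posteriors` is a hypothetical helper, not part of the class below, and the priors and likelihoods are made-up inputs.

```python
import numpy as np


def log_posteriors(x, priors, likelihoods):
    """Return log P(C | x) for every class, for one binary sample x.

    priors:       shape (n_classes,), P(C)
    likelihoods:  shape (n_classes, n_features), P(x_i = 1 | C)
    """
    x = np.asarray(x)
    # log P(x | C): log p where x_i == 1, log(1 - p) otherwise (x assumed binary)
    log_lik = np.where(x == 1, np.log(likelihoods), np.log1p(-likelihoods)).sum(axis=1)
    joint = np.log(priors) + log_lik  # log [P(x | C) P(C)], unnormalized
    # log P(x) = log sum_C exp(joint), computed stably via log-sum-exp
    m = joint.max()
    log_evidence = m + np.log(np.exp(joint - m).sum())
    return joint - log_evidence


priors = np.array([0.6, 0.4])
likelihoods = np.array([[0.6, 0.6, 0.2],
                        [0.5, 0.75, 0.75]])
post = np.exp(log_posteriors([1, 1, 0], priors, likelihoods))
print(post, post.argmax())
```

Here the unnormalized scores are $0.6 \cdot 0.6 \cdot 0.6 \cdot 0.8 = 0.1728$ and $0.4 \cdot 0.5 \cdot 0.75 \cdot 0.25 = 0.0375$, so class 0 wins with a posterior of about 0.82.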

In [149]:
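class BayesClassifier:
    """Naive Bayes classifier with a Bernoulli likelihood (expects 0/1 features)."""

    def __init__(self, X, y, alpha=1, scaler=None):
        # Optionally fit the scaler on the training features and transform them
        if scaler is not None:
            self.scaler = scaler
            X = self.scaler.fit_transform(X)
        else:
            self.scaler = None
        self.X = X
        self.y = y
        self.alpha = alpha  # Laplace smoothing parameter
        self.classes = np.unique(y)
        self.n_classes = len(self.classes)
        self.n_features = X.shape[1]
        self.priors = self.calculate_prior()
        self.likelihoods = self.calculate_likelihood()

    def calculate_prior(self):
        # P(C): fraction of training samples in each class
        priors = np.zeros(self.n_classes)
        for i, c in enumerate(self.classes):
            priors[i] = np.mean(self.y == c)
        return priors

    def calculate_likelihood(self):
        # likelihoods[i, j] estimates P(x_j = 1 | C = classes[i])
        likelihoods = np.zeros((self.n_classes, self.n_features))
        for i, c in enumerate(self.classes):
            # Laplace smoothing: (sum of feature values in class + alpha) / (class size + 2 * alpha);
            # the sum is a count of 1s only when the features are binary (0/1)
            likelihoods[i, :] = (np.sum(self.X[self.y == c], axis=0) + self.alpha) / (
                np.sum(self.y == c) + self.alpha * 2
            )
        return likelihoods

    def predict(self, X):
        if self.scaler is not None:
            X = self.scaler.transform(X)
        y_pred = np.zeros(len(X))
        # Loop over samples; X must be a NumPy array here, because iterating
        # over a pandas DataFrame yields its column labels rather than its rows
        for i, x in enumerate(X):
            posteriors = np.zeros(self.n_classes)
            for j, c in enumerate(self.classes):
                # Bernoulli likelihood: p for features equal to 1, (1 - p) for
                # features equal to 0; any other value contributes no factor
                likelihood = np.prod(self.likelihoods[j, :][x == 1]) * np.prod(
                    1 - self.likelihoods[j, :][x == 0]
                )
                # Unnormalized posterior P(X|C) * P(C); P(X) does not change the argmax
                posteriors[j] = likelihood * self.priors[j]
            y_pred[i] = self.classes[np.argmax(posteriors)]
        return y_pred

    def score(self, X, y):
        # Accuracy: fraction of predictions that match the true labels
        return np.mean(self.predict(X) == y)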

Testing

In [150]:
clf_ns = BayesClassifier(
    Xtr.iloc[:, 1:], Xtr.iloc[:, 0], scaler=None,
)
clf_ws = BayesClassifier(
    Xtr.iloc[:, 1:], Xtr.iloc[:, 0], scaler=StandardScaler(),
)
print("Score without scaling (Tr):", clf_ns.score(Xtr.iloc[:, 1:], Xtr.iloc[:, 0]))
print("Score with scaling (Tr):", clf_ws.score(Xtr.iloc[:, 1:], Xtr.iloc[:, 0]))
print("Score without scaling (Te):", clf_ns.score(Xte.iloc[:, 1:], Xte.iloc[:, 0]))
print("Score with scaling (Te):", clf_ws.score(Xte.iloc[:, 1:], Xte.iloc[:, 0]))

Score without scaling (Tr): 0.4418604651162791
Score with scaling (Tr): 0.5426356589147286
Score without scaling (Te): 0.5
Score with scaling (Te): 0.5
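These accuracies hover around 0.5, and two properties of the cells above likely contribute. First, for the unscaled model `predict` receives a pandas DataFrame, and `enumerate(X)` over a DataFrame iterates its column labels, not its rows, so only the first six entries of `y_pred` are ever set and the rest keep their initial value 0. Second, the model is Bernoulli, but the features are ratings from 1 to 5: `calculate_likelihood` sums the raw ratings, which can give "probabilities" above 1 (making `1 - p` negative), and after `StandardScaler` the features are continuous, so the `x == 1` and `x == 0` masks rarely match anything. The sketch below is one possible correction, not the author's method. It converts inputs to NumPy arrays, binarizes each rating with an assumed threshold (a rating of 4 or 5 counts as 1), and fits a vectorized Bernoulli naive Bayes with the same smoothing. The class name `BinarizedBernoulliNB`, the threshold, and the synthetic data are illustrative assumptions.

```python
import numpy as np


class BinarizedBernoulliNB:
    """Hypothetical Bernoulli naive Bayes on ratings binarized at a threshold."""

    def __init__(self, alpha=1.0, threshold=4):
        self.alpha = alpha
        self.threshold = threshold  # assumption: rating >= threshold -> 1

    def _binarize(self, X):
        # np.asarray handles DataFrames, so rows (not column labels) are used
        return (np.asarray(X) >= self.threshold).astype(int)

    def fit(self, X, y):
        Xb, y = self._binarize(X), np.asarray(y)
        self.classes_ = np.unique(y)
        counts = np.array([(y == c).sum() for c in self.classes_])
        ones = np.array([Xb[y == c].sum(axis=0) for c in self.classes_])
        self.log_prior_ = np.log(counts / counts.sum())
        # Laplace smoothing for binary features: (N_{i,C} + alpha) / (N_C + 2 alpha)
        p = (ones + self.alpha) / (counts[:, None] + 2 * self.alpha)
        self.log_p_, self.log_q_ = np.log(p), np.log1p(-p)
        return self

    def predict(self, X):
        Xb = self._binarize(X)
        # Joint log-likelihood for all samples at once: shape (n_samples, n_classes)
        jll = Xb @ self.log_p_.T + (1 - Xb) @ self.log_q_.T + self.log_prior_
        return self.classes_[np.argmax(jll, axis=1)]

    def score(self, X, y):
        return np.mean(self.predict(X) == np.asarray(y))


# Usage on synthetic ratings (not the survey data)
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
# In this toy setup, class 1 tends to give higher ratings (values stay in 1..5)
X_demo = np.clip(rng.integers(1, 5, size=(200, 6)) + y_demo[:, None], 1, 5)
clf = BinarizedBernoulliNB(alpha=1).fit(X_demo, y_demo)
print("Training accuracy (synthetic):", clf.score(X_demo, y_demo))
```

With the notebook's data, the same class would be used as `BinarizedBernoulliNB().fit(Xtr.iloc[:, 1:], Xtr.iloc[:, 0])`. Working in log space and with matrix products also removes the per-sample Python loop.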


Comparing with sklearn.naive_bayes.BernoulliNB

In [154]:
clf_sk = BernoulliNB(alpha=1)
clf_sk.fit(Xtr.iloc[:, 1:], Xtr.iloc[:, 0])
print("Training accuracy:", clf_sk.score(Xtr.iloc[:, 1:], Xtr.iloc[:, 0]))
print("Test accuracy:", clf_sk.score(Xte.iloc[:, 1:], Xte.iloc[:, 0]))

Training accuracy: 0.5426356589147286
Test accuracy: 0.5
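`BernoulliNB` binarizes its inputs with `binarize=0.0` by default, mapping every value greater than 0 to 1. Every rating from 1 to 5 therefore becomes 1, all samples look identical to the model, and it predicts the same class for every input. The sketch below, on synthetic ratings rather than the survey data, shows that effect and two alternatives. The first is an explicit threshold, `binarize=3`, so that only ratings 4 and 5 count as 1 (the cut-off is an assumption). The second is `CategoricalNB`, which treats each rating value as its own category; `min_categories=6` reserves category indices 0 to 5 so that every possible rating has a slot even if it never appears in the training data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, CategoricalNB

# Synthetic 1-5 ratings for 6 features (not the survey data)
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300)
X = np.clip(rng.integers(1, 5, size=(300, 6)) + y[:, None], 1, 5)

models = {
    # Default binarize=0.0: every rating 1-5 maps to 1, so all rows look identical
    "BernoulliNB(binarize=0.0)": BernoulliNB(alpha=1.0),
    # Assumed cut-off: ratings greater than 3 (i.e. 4-5) -> 1, ratings 1-3 -> 0
    "BernoulliNB(binarize=3)": BernoulliNB(alpha=1.0, binarize=3),
    # One category per rating value; indices 0-5 are reserved via min_categories
    "CategoricalNB": CategoricalNB(alpha=1.0, min_categories=6),
}
preds = {}
for name, model in models.items():
    model.fit(X, y)
    preds[name] = model.predict(X)
    print(f"{name}: training accuracy = {model.score(X, y):.3f}, "
          f"distinct predictions = {np.unique(preds[name]).size}")
```

The same calls accept `Xtr.iloc[:, 1:]` and `Xtr.iloc[:, 0]` directly.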
