
**Naive Bayes from scratch**


This code defines a class called `GaussianNaiveBayes`, which implements Gaussian Naive Bayes classification. Here's a breakdown of the code's structure and functionality:

1. The code starts with importing the necessary dependencies, specifically the `numpy` library for mathematical operations.

2. The `GaussianNaiveBayes` class is defined.

3. The `fit` method is responsible for training the Naive Bayes classifier. It takes two parameters: `X`, the input features, and `y`, the corresponding class labels. The method calculates the mean, variance, and prior probability for each class.

4. Inside the `fit` method, the number of samples (`n_samples`) and number of features (`n_features`) are extracted from the input data `X`. The unique class labels (`self._classes`) are determined using `np.unique(y)`, and the number of classes (`n_classes`) is computed.

5. Arrays for storing the mean (`self._mean`), variance (`self._var`), and prior probabilities (`self._priors`) are allocated and initialized with zeros.

6. The code then enters a loop to calculate the mean, variance, and prior probability for each class. It iterates over the unique class labels using `enumerate(self._classes)`.

7. Inside the loop, `X_for_class_c` is assigned the subset of data `X` that belongs to the current class `c`. The mean of each feature is computed using `X_for_class_c.mean(axis=0)` and stored in the `self._mean` array at the corresponding class index `i`. The variance of each feature is computed using `X_for_class_c.var(axis=0)` and stored in `self._var` at the same index, and the prior is the fraction of training samples that belong to class `c`, stored in `self._priors`.
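
The per-class statistics described in the steps above can be checked by hand on a tiny dataset. The following is a minimal sketch; the toy arrays and the names `means`, `variances`, and `priors` are made up for illustration and do not come from the original code.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features, 2 classes.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [10.0, 20.0],
              [14.0, 24.0]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
means = np.zeros((len(classes), X.shape[1]))
variances = np.zeros((len(classes), X.shape[1]))
priors = np.zeros(len(classes))

for i, c in enumerate(classes):
    X_c = X[y == c]                         # rows that belong to class c
    means[i] = X_c.mean(axis=0)             # per-feature mean
    variances[i] = X_c.var(axis=0)          # per-feature population variance (ddof=0)
    priors[i] = X_c.shape[0] / X.shape[0]   # fraction of samples in class c

print(means)      # [[ 2.  3.] [12. 22.]]
print(variances)  # [[1. 1.] [4. 4.]]
print(priors)     # [0.5 0.5]
```

Note that `var(axis=0)` uses the population variance (it divides by the number of samples, not by one less).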



```python
# Annotated walkthrough of the `GaussianNaiveBayes` class, which implements
# the Gaussian Naive Bayes classification algorithm.

# Import the only dependency (`numpy`).
import numpy as np


class GaussianNaiveBayes:

    def fit(self, X, y):
        # Receives the training data `X` and the corresponding labels `y`.
        # Read the number of samples and the number of features from the data.
        n_samples, n_features = X.shape
        # Determine the unique classes in the labels.
        self._classes = np.unique(y)
        # Count the classes.
        n_classes = len(self._classes)

        # Initialize arrays to store the mean, variance, and prior of each class.
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        # Iterate over the classes. For each one, extract its subset of the
        # data and calculate the mean, variance, and prior P(H).
        for i, c in enumerate(self._classes):
            X_for_class_c = X[y == c]
            self._mean[i, :] = X_for_class_c.mean(axis=0)
            self._var[i, :] = X_for_class_c.var(axis=0)
            self._priors[i] = X_for_class_c.shape[0] / float(n_samples)

    def _calculate_likelihood(self, class_idx, x):
        # Calculates the likelihood of each feature value given a class,
        # using the Gaussian probability density function.
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        num = np.exp(-(x - mean) ** 2 / (2 * var))  # numerator
        denom = np.sqrt(2 * np.pi * var)  # denominator
        # Returns the likelihoods for all features.
        return num / denom

    def predict(self, X):
        # Receives the test data `X` and calls `_classify_sample`
        # for each sample in `X`.
        y_pred = [self._classify_sample(x) for x in X]
        # Returns the predictions as a NumPy array.
        return np.array(y_pred)

    def _classify_sample(self, x):
        # Receives a sample `x` to classify.
        # Initializes an empty list to store the (log) posterior of each class.
        posteriors = []

        # Iterates over each class, adding the log prior to the sum of the
        # per-feature log-likelihoods.
        for i, c in enumerate(self._classes):
            prior = np.log(self._priors[i])
            posterior = np.sum(np.log(self._calculate_likelihood(i, x)))
            posterior = prior + posterior
            posteriors.append(posterior)

        # Return the class with the highest posterior probability.
        return self._classes[np.argmax(posteriors)]
```
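
The decision rule that `_classify_sample` implements can be written compactly. The symbols below are notation introduced here, not taken from the code: $P(c)$ stands for `self._priors[i]`, $\mu_{c,j}$ and $\sigma_{c,j}^{2}$ for the entries of `self._mean` and `self._var`, $x_j$ for the $j$-th feature of the sample, and $d$ for the number of features.

$$
\hat{y} = \arg\max_{c}\left[\log P(c) + \sum_{j=1}^{d}\log\left(\frac{1}{\sqrt{2\pi\sigma_{c,j}^{2}}}\exp\left(-\frac{(x_j-\mu_{c,j})^{2}}{2\sigma_{c,j}^{2}}\right)\right)\right]
$$

The bracketed quantity is the logarithm of the unnormalized posterior. The normalizing constant is the same for every class, so it does not affect which class attains the maximum.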


In [4]:
import numpy as np

class GaussianNaiveBayes:

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        # calculating the mean, variance and prior P(H) for each class
        for i, c in enumerate(self._classes):
            X_for_class_c = X[y == c]
            self._mean[i, :] = X_for_class_c.mean(axis=0)
            self._var[i, :] = X_for_class_c.var(axis=0)
            self._priors[i] = X_for_class_c.shape[0] / float(n_samples)

    def _calculate_likelihood(self, class_idx, x):
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        num = np.exp(-(x - mean) ** 2 / (2 * var))  # numerator
        denom = np.sqrt(2 * np.pi * var)  # denominator
        return num / denom

    def predict(self, X):
        y_pred = [self._classify_sample(x) for x in X]
        return np.array(y_pred)

    def _classify_sample(self, x):
        posteriors = []
        # calculating posterior probability for each class
        for i, c in enumerate(self._classes):
            prior = np.log(self._priors[i])
            posterior = np.sum(np.log(self._calculate_likelihood(i, x)))
            posterior = prior + posterior
            posteriors.append(posterior)
        # return the class with highest posterior probability
        return self._classes[np.argmax(posteriors)]
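
One numerical detail of `_classify_sample` is worth illustrating: it takes `np.log` of a density that was computed with `np.exp`, so a feature value far from the class mean can underflow to `0.0` and yield `-inf`. The following is a minimal sketch of an alternative that computes the log-density directly. The helper names `log_via_pdf` and `log_gaussian` are hypothetical and are not part of the class above.

```python
import numpy as np

def log_via_pdf(x, mean, var):
    """Log-likelihood computed as in the class above: density first, then log."""
    pdf = np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    with np.errstate(divide="ignore"):   # silence the log(0) warning
        return np.log(pdf)

def log_gaussian(x, mean, var):
    """Same quantity, expanded analytically so that exp() is never called."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

mean, var = np.array([0.0]), np.array([1.0])
near, far = np.array([1.0]), np.array([40.0])

# Close to the mean, both versions agree.
print(log_via_pdf(near, mean, var), log_gaussian(near, mean, var))

# Far from the mean, exp(-800) underflows to 0.0, so its log is -inf ...
print(log_via_pdf(far, mean, var))
# ... while the direct formula stays finite.
print(log_gaussian(far, mean, var))
```

Whether this matters depends on the data. On well-scaled features both versions usually give the same predictions.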



This code snippet performs the following tasks:

1. Import necessary libraries and modules:
   - `train_test_split` from `sklearn.model_selection`: Used to split the dataset into training and testing sets.
   - `accuracy_score` from `sklearn.metrics`: Used to evaluate the accuracy of the predictions.
   - `datasets` from `sklearn`: Provides dataset loaders and generators, including `make_classification` for synthetic data.
   - `time`: Used to measure the execution time.

2. Generate a synthetic dataset:
   - The `make_classification` function from `datasets` is used to create a synthetic dataset.
   - It generates 1000 samples with 20 features and 2 classes.
   - The `random_state` parameter is set to 42 for reproducibility.

3. Split the dataset into training and testing sets:
   - The `train_test_split` function is used to split the dataset into training and testing sets.
   - The testing set size is set to 25% of the whole dataset.
   - The `random_state` parameter is set to 42 for reproducibility.

4. Record the starting time:
   - The `time.perf_counter()` function is used to read a high-resolution timer before the work starts.

5. Create and train a Gaussian Naive Bayes classifier:
   - An instance of the `GaussianNaiveBayes` classifier is created.
   - The `fit` method is called to train the classifier on the training data.

6. Make predictions on the testing data:
   - The `predict` method is called to make predictions on the testing data.

7. Record the ending time:
   - The `time.perf_counter()` function is used again to measure the current time.

8. Print the accuracy of the predictions:
   - The `accuracy_score` function is used to calculate the accuracy of the predicted labels compared to the true labels.
   - The accuracy score is printed to the console.

9. Print the time taken to train and predict:
   - The difference between the ending time and starting time is calculated to measure the execution time.
   - The execution time is printed to the console.

The code demonstrates the process of training a Gaussian Naive Bayes classifier on a synthetic dataset and evaluating its accuracy on unseen data. The timed region covers both training and prediction, so the reported time is a rough measure of their combined cost.

The code below runs a classification experiment with the `GaussianNaiveBayes` class defined above; scikit-learn supplies only the synthetic data, the train/test split, and the accuracy metric. Let's go through it step by step:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets
import time
```

The code begins by importing necessary libraries and modules. `train_test_split` is imported from `sklearn.model_selection` for splitting the dataset into training and testing sets. `accuracy_score` from `sklearn.metrics` is imported to evaluate the accuracy of the predictions. The `datasets` module from scikit-learn is imported to generate a synthetic dataset for classification. The `time` module is imported to measure the execution time of the algorithm.

```python
X, y = datasets.make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
```

The code uses the `make_classification` function from `sklearn.datasets` to generate a synthetic dataset for classification. It creates 1000 samples with 20 features and 2 classes. The `random_state` parameter is set to 42 to ensure reproducibility.

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```

The dataset is split into training and testing sets using the `train_test_split` function. 75% of the data is used for training (`X_train` and `y_train`), and 25% is used for testing (`X_test` and `y_test`). Again, the `random_state` parameter is set to 42 for reproducibility.
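
A quick way to confirm the split is to inspect the array shapes. This sketch repeats the two calls above so that it runs on its own.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)   # (750, 20) (250, 20)
print(y_train.shape, y_test.shape)   # (750,) (250,)
```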

```python
start = time.perf_counter()
```

A timestamp is taken with `time.perf_counter()`, a high-resolution clock intended for measuring short durations, so that the execution time can be computed afterwards.

```python
nb = GaussianNaiveBayes()
nb.fit(X_train, y_train)
```

An instance of the Gaussian Naive Bayes classifier is created using `GaussianNaiveBayes()`. This is the class defined earlier in the notebook, so no import is needed as long as that cell has been run first. The classifier is then trained on the training data using the `fit` method.

```python
predictions = nb.predict(X_test)
```

The trained classifier is used to make predictions on the testing set (`X_test`) using the `predict` method.

```python
end = time.perf_counter()
```

The current time is recorded again to calculate the execution time of the algorithm.

```python
print(f"NumPy Naive Bayes accuracy: {accuracy_score(y_test, predictions)}")
```

The accuracy of the Naive Bayes classifier is calculated by comparing the predicted labels (`predictions`) with the true labels (`y_test`). The `accuracy_score` function is used for this calculation, and the result is printed.
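
For plain label arrays like these, the accuracy reported by `accuracy_score` is the fraction of predictions that equal the true labels. The following is a minimal sketch of that calculation with NumPy alone; the two label arrays are made up for illustration.

```python
import numpy as np

# Hypothetical labels, chosen only to illustrate the calculation.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

accuracy = np.mean(y_true == y_pred)   # fraction of matching positions
print(accuracy)                        # 0.8
```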

```python
print(f'Finished in {round(end-start, 3)} second(s)')
```

The total execution time of the algorithm is calculated by subtracting the start time from the end time. It is then printed to the console, rounded to three decimal places.
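
When the same start/end pattern is needed in several places, it can be wrapped in a small helper. The following is a sketch only: the `timed` context manager is a hypothetical helper introduced here, not something the notebook defines, and the summation is a stand-in workload.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the enclosed block took (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label} finished in {round(elapsed, 3)} second(s)")

# Stand-in workload; in the notebook this would be the fit and predict calls.
with timed("sum of squares"):
    total = sum(i * i for i in range(100_000))
```

Timing `fit` and `predict` in separate `with` blocks would show how the total splits between training and prediction.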

Overall, this code generates a synthetic dataset, splits it into training and testing sets, trains a Gaussian Naive Bayes classifier on the training data, makes predictions on the testing data, calculates and prints the accuracy of the classifier, and measures and prints the execution time of the algorithm. It's a simple example to demonstrate the basic usage of the Naive Bayes classifier for classification tasks.



In [3]:
# Import necessary libraries and modules

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets
import time
# Generate a synthetic dataset with 1000 samples, 20 features, and 2 classes

X, y = datasets.make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split the dataset into training and testing sets (75% for training, 25% for testing)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Record the starting time

start = time.perf_counter()

# Create an instance of the Gaussian Naive Bayes classifier

nb = GaussianNaiveBayes()

# Train the classifier on the training data

nb.fit(X_train, y_train)

# Make predictions on the testing data

predictions = nb.predict(X_test)

# Record the ending time

end = time.perf_counter()

# Print the accuracy of the predictions using the testing labels

print(f"NumPy Naive Bayes accuracy: {accuracy_score(y_test, predictions)}")

# Print the time taken to train and predict in seconds

print(f'Finished in {round(end-start, 3)} second(s)')


NumPy Naive Bayes accuracy: 0.796
Finished in 0.013 second(s)


***Using the scikit-learn library***

The same experiment, this time with scikit-learn's built-in classifier:

```python
from sklearn.naive_bayes import GaussianNB
start = time.perf_counter()
sk_nb = GaussianNB()
sk_nb.fit(X_train, y_train)
sk_predictions = sk_nb.predict(X_test)
end = time.perf_counter()
print(f"scikit-learn Naive Bayes accuracy: {accuracy_score(y_test, sk_predictions)}")
print(f'Finished in {round(end-start, 3)} second(s)')
```

This code snippet demonstrates the usage of scikit-learn's Gaussian Naive Bayes classifier (`GaussianNB`). Let's break it down step by step:

1. The code begins by importing the `GaussianNB` class from the `sklearn.naive_bayes` module. This class represents the Gaussian Naive Bayes classifier, which is a probabilistic machine learning algorithm based on Bayes' theorem.

2. The `start` variable is assigned a timestamp from the `time.perf_counter()` function. This is used to measure the execution time of the code. The `time` module, `accuracy_score`, and the train/test arrays all come from the earlier cells.

3. An instance of the `GaussianNB` class is created and assigned to the variable `sk_nb`. This will be our Naive Bayes classifier.

4. The `fit()` method is called on the `sk_nb` object, with `X_train` and `y_train` as its arguments. This trains the classifier on the provided training data `X_train` (features) and `y_train` (labels). The classifier estimates the mean and variance of each feature within each class, along with the class priors.

5. The `predict()` method is called on the trained `sk_nb` classifier, passing `X_test` as its argument. This predicts the labels for the test data `X_test` from the fitted parameters.

6. The predicted labels are assigned to the variable `sk_predictions`.

7. The `end` variable is assigned a second timestamp using `time.perf_counter()`. This marks the end of the timed region.

8. The code prints two lines of output:
   - The first line displays the accuracy of the predictions made by the scikit-learn Naive Bayes classifier. It uses the `accuracy_score()` function, which calculates the accuracy by comparing the predicted labels (`sk_predictions`) with the true labels (`y_test`).
   - The second line displays the time taken to train and predict, calculated by subtracting the `start` time from the `end` time and rounding the result to three decimal places.

This approach suits classification tasks in which each feature is assumed to follow a Gaussian distribution within each class and the features are treated as conditionally independent given the class. Naive Bayes classifiers are applied to a wide range of problems, such as spam detection, sentiment analysis, or medical diagnosis; the Gaussian variant is the one intended for continuous-valued features. By using scikit-learn's implementation, developers can train and evaluate the model without having to implement the algorithm from scratch.
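
To see that scikit-learn estimates the same kind of per-class statistics as the from-scratch `fit` method, the fitted attributes can be compared with values computed by hand. This is a minimal sketch that relies on the `classes_`, `theta_` (per-class feature means), and `class_prior_` attributes of a fitted `GaussianNB`. The `manual_means` and `manual_priors` names are introduced here for illustration, and the data setup is repeated so that the block runs on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = datasets.make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

sk_nb = GaussianNB().fit(X_train, y_train)

# Recompute the statistics by hand, as the from-scratch `fit` does.
manual_means = np.array([X_train[y_train == c].mean(axis=0) for c in sk_nb.classes_])
manual_priors = np.array([np.mean(y_train == c) for c in sk_nb.classes_])

means_match = np.allclose(sk_nb.theta_, manual_means)
priors_match = np.allclose(sk_nb.class_prior_, manual_priors)
print(means_match, priors_match)
```

The variances are left out of the comparison on purpose: scikit-learn adds a small smoothing term to them (controlled by the `var_smoothing` parameter), so they are not expected to be bit-identical to `var(axis=0)`.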

In [5]:
from sklearn.naive_bayes import GaussianNB
start = time.perf_counter()
sk_nb = GaussianNB()
sk_nb.fit(X_train, y_train)
sk_predictions = sk_nb.predict(X_test)
end = time.perf_counter()
print(f"scikit-learn Naive Bayes accuracy: {accuracy_score(y_test, sk_predictions)}")
print(f'Finished in {round(end-start, 3)} second(s)')

scikit-learn Naive Bayes accuracy: 0.796
Finished in 0.003 second(s)
