**Aim:** Demonstrate the text classifier using the Naïve Bayes classifier algorithm.
 
**Program:** Write a program to implement the Naïve Bayes classifier for a sample training
data set stored as a `.CSV` file. Compute the accuracy of the classifier on a few test
data sets.

In [1]:
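```python
import csv
import math
import random

def load_csv(filename):
    """Read a numeric CSV file, skipping the header row."""
    with open(filename, newline='') as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row
        return [list(map(float, row)) for row in reader]

def split_dataset(dataset, ratio):
    """Shuffle a copy of the dataset and split it into train and test parts."""
    train_size = int(len(dataset) * ratio)
    shuffled = dataset[:]  # copy so the caller's list is not reordered
    random.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

def separate_by_class(dataset):
    """Group rows by class label, which is the last value in each row."""
    separated = {}
    for row in dataset:
        cls = row[-1]
        separated.setdefault(cls, []).append(row)
    return separated

def mean(numbers):
    return sum(numbers) / len(numbers)

def stdev(numbers):
    """Sample standard deviation (n - 1 in the denominator)."""
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / (len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    """Return one (mean, stdev) pair per feature column; the label column is skipped."""
    columns = list(zip(*dataset))
    return [(mean(col), stdev(col)) for col in columns[:-1]]

def calculate_probability(x, mu, sigma):
    """Gaussian probability density of x for the given mean and standard deviation."""
    return (1 / (math.sqrt(2 * math.pi) * sigma)) * math.exp(-((x - mu) ** 2 / (2 * sigma ** 2)))

def predict(summaries, x):
    """Return the class whose product of per-feature likelihoods is largest.

    Class priors are not included, so all classes are treated as equally likely.
    """
    probabilities = {}
    for cls, stats in summaries.items():
        probabilities[cls] = 1
        for i, (mu, sigma) in enumerate(stats):
            probabilities[cls] *= calculate_probability(x[i], mu, sigma)
    return max(probabilities, key=probabilities.get)

def get_accuracy(test_set, predictions):
    """Percentage of test rows whose label matches the prediction."""
    correct = sum(1 for row, predicted in zip(test_set, predictions) if row[-1] == predicted)
    return (correct / len(test_set)) * 100

dataset = load_csv('diabetes.csv')
train_set, test_set = split_dataset(dataset, 0.8)
print(f'Split {len(dataset)} rows into training={len(train_set)} and testing={len(test_set)} rows')

# Fit: one list of (mean, stdev) pairs per class
separated = separate_by_class(train_set)
summaries = {cls: summarize(rows) for cls, rows in separated.items()}

# Predict from the features only (the label in the last column is dropped)
predictions = [predict(summaries, row[:-1]) for row in test_set]
print(f'Classification Accuracy: {get_accuracy(test_set, predictions):.2f}%')
```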

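Sample output (the shuffle is not seeded, so the exact accuracy varies from run to run):

```text
Split 768 rows into training=614 and testing=154 rows
Classification Accuracy: 75.32%
```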


### 1. What is the purpose of the summarize function in the code?

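The `summarize` function computes the mean and sample standard deviation of every feature column in the rows it receives, skipping the final class-label column. Because the program calls it once per class, on the rows returned by `separate_by_class`, the result is one `(mean, stdev)` pair per feature per class. These pairs are the parameters of the Gaussian likelihoods evaluated by `calculate_probability`.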
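As a small worked check of what `summarize` returns, the sketch below repeats the three helper functions from the program and applies them to a made-up set of rows for a single class. The name `toy_rows` and its values are hypothetical and chosen so the statistics are easy to verify by hand.

```python
import math

def mean(numbers):
    return sum(numbers) / len(numbers)

def stdev(numbers):
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / (len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    columns = list(zip(*dataset))
    return [(mean(col), stdev(col)) for col in columns[:-1]]

# Hypothetical rows for one class: two features followed by the class label.
toy_rows = [
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 0.0],
    [3.0, 30.0, 0.0],
]

toy_summary = summarize(toy_rows)
print(toy_summary)  # [(2.0, 1.0), (20.0, 10.0)]
```

The label column produces no entry: two features go in, and two `(mean, stdev)` pairs come out.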

### 2. Why do we need to separate the dataset by class?

Separating the dataset by class allows us to calculate statistics (like mean and standard deviation) for each feature within each class, which are then used to compute the conditional probabilities required by the Naive Bayes classifier.
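The grouping step can be illustrated on a few made-up rows. The sketch below uses the same logic as `separate_by_class` in the program; `toy_dataset` is a hypothetical example, with the class label stored as a float in the last column to match how `load_csv` parses the file.

```python
def separate_by_class(dataset):
    separated = {}
    for row in dataset:
        # The last value in each row is the class label.
        separated.setdefault(row[-1], []).append(row)
    return separated

# Hypothetical rows: two features followed by the class label.
toy_dataset = [
    [1.0, 10.0, 0.0],
    [5.0, 50.0, 1.0],
    [2.0, 20.0, 0.0],
]

groups = separate_by_class(toy_dataset)
print(groups[0.0])  # [[1.0, 10.0, 0.0], [2.0, 20.0, 0.0]]
print(groups[1.0])  # [[5.0, 50.0, 1.0]]
```

Each group still carries its label column, which is why `summarize` drops the last column before computing statistics.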

### 3. How does the Naive Bayes classifier work?

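The Naive Bayes classifier applies Bayes' theorem: the posterior probability of a class given the input features is proportional to the class prior multiplied by the likelihood of the features under that class. Assuming the features are independent given the class, that likelihood factorizes into a product of per-feature likelihoods. The classifier then predicts the class with the highest resulting score. The `predict` function in this program multiplies only the per-feature likelihoods, which is equivalent to assuming that all classes are equally likely a priori.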
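One possible variant of the prediction step, sketched below, includes the class prior and sums log-likelihoods instead of multiplying raw densities, which avoids floating-point underflow when there are many features. This is not the program's own method. The names `log_gaussian`, `predict_with_priors`, `priors`, and `toy_summaries` are hypothetical, and the prior values are made up. In the program, priors could be estimated as the fraction of training rows in each class.

```python
import math

def log_gaussian(x, mu, sigma):
    """Log of the Gaussian density with mean mu and standard deviation sigma."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

def predict_with_priors(summaries, priors, x):
    """Return the class maximising log P(class) + sum of log P(x_i | class)."""
    scores = {}
    for cls, stats in summaries.items():
        score = math.log(priors[cls])
        for value, (mu, sigma) in zip(x, stats):
            score += log_gaussian(value, mu, sigma)
        scores[cls] = score
    return max(scores, key=scores.get)

# Hypothetical per-class (mean, stdev) summaries for a single feature.
toy_summaries = {0.0: [(0.0, 1.0)], 1.0: [(2.0, 1.0)]}

# x = 1.0 is equidistant from both means, so the likelihoods tie
# and the prior decides the prediction.
print(predict_with_priors(toy_summaries, {0.0: 0.9, 1.0: 0.1}, [1.0]))  # 0.0
print(predict_with_priors(toy_summaries, {0.0: 0.1, 1.0: 0.9}, [1.0]))  # 1.0

# A value far from class 0.0's mean outweighs that class's larger prior.
print(predict_with_priors(toy_summaries, {0.0: 0.9, 1.0: 0.1}, [4.0]))  # 1.0
```

Because the logarithm is monotonic, ranking classes by summed log scores gives the same winner as ranking them by the product of prior and likelihoods.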

### 4. What assumptions does the Naive Bayes classifier make?

The Naive Bayes classifier assumes that the features are conditionally independent given the class label, which is often referred to as the "naive" assumption.
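Written out, the assumption lets the joint likelihood of the features be replaced by a product of one-dimensional terms. The symbols below are introduced only for illustration: $C$ is the class label and $x_1, \dots, x_n$ are the feature values of one data point.

$$
P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
\quad\Longrightarrow\quad
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$

Each factor $P(x_i \mid C)$ depends on a single feature, so it can be estimated separately from the training rows of class $C$.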

### 5. Why do we use the Gaussian distribution in this implementation?

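Naive Bayes itself does not prescribe a distribution for the features. The Gaussian variant implemented here models each continuous feature, within each class, as normally distributed. Under that model, the likelihood of a feature value is the Gaussian density evaluated with the class's mean and standard deviation for that feature, which is what `calculate_probability` returns.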
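As a sanity check on the density formula, the sketch below compares the hand-written Gaussian function with `statistics.NormalDist.pdf` from the Python standard library (available from Python 3.8). The input values are arbitrary and not taken from the data set. The parameter names `mu` and `sigma` are used here to avoid confusion with the `mean` and `stdev` helper functions.

```python
import math
from statistics import NormalDist

def calculate_probability(x, mu, sigma):
    """Gaussian probability density of x for mean mu and standard deviation sigma."""
    return (1 / (math.sqrt(2 * math.pi) * sigma)) * math.exp(-((x - mu) ** 2 / (2 * sigma ** 2)))

# Arbitrary illustrative values.
density = calculate_probability(120.0, 110.0, 25.0)
reference = NormalDist(mu=110.0, sigma=25.0).pdf(120.0)
print(math.isclose(density, reference))  # True

# The function returns a density, not a probability, so it can exceed 1
# when the standard deviation is small.
narrow_density = calculate_probability(0.0, 0.0, 0.1)
print(narrow_density > 1)  # True
```

Values above 1 do not break the classifier, because only the relative size of the scores across classes matters when picking the prediction.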