# WEEK 08: NAÏVE BAYES CLASSIFIER

1. Implement the following problems in Python using Bayes' theorem.
a) In a college, 60% of the students reside in the hostel and 40% are day scholars. The previous year's
results show that 30% of the students who stay in the hostel scored an A grade and 20% of the day scholars
scored an A grade. At the end of the year, one student is chosen at random and found to have an A grade.
What is the probability that the student is a hosteler?

In [41]:
P_A_given_hosteler = 0.30  # Probability of scoring A grade given a student is a hosteler
P_A_given_day_scholar = 0.20  # Probability of scoring A grade given a student is a day scholar
P_hosteler = 0.60  # Probability that a student is a hosteler
P_day_scholar = 0.40  # Probability that a student is a day scholar

# Overall probability of an A grade (law of total probability)
P_A_grade = (P_A_given_hosteler * P_hosteler) + (P_A_given_day_scholar * P_day_scholar)

# Bayes' theorem
P_hosteler_given_A_grade = (P_A_given_hosteler * P_hosteler) / P_A_grade

print("The probability that the student is a hosteler given an A grade is={:.2f}".format(P_hosteler_given_A_grade))


The probability that the student is a hosteler given an A grade is=0.69
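
The same calculation can be wrapped in a small reusable function. The sketch below is illustrative only: the name `bayes_posterior` and its parameter names are assumptions, not part of the original exercise. It handles the two-hypothesis case (a hypothesis and its complement) used in both parts of problem 1.

```python
def bayes_posterior(prior, likelihood, likelihood_given_complement):
    """Return P(H | E) when the only alternatives are H and not-H.

    prior                        -> P(H)
    likelihood                   -> P(E | H)
    likelihood_given_complement  -> P(E | not H)
    """
    # Law of total probability gives the evidence term P(E)
    evidence = likelihood * prior + likelihood_given_complement * (1 - prior)
    return likelihood * prior / evidence


# Hosteler example: P(hosteler) = 0.60, P(A | hosteler) = 0.30, P(A | day scholar) = 0.20
posterior = bayes_posterior(prior=0.60, likelihood=0.30, likelihood_given_complement=0.20)
print(f"P(hosteler | A grade) = {posterior:.2f}")
```

Because day scholars are exactly the complement of hostelers here, `1 - prior` stands in for `P_day_scholar`.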


b) Suppose you're testing for a rare disease, and you have the following information:
• The disease has a prevalence of 0.01 (1% of the population has the disease).
• The test is not perfect:
    • The test correctly identifies the disease (true positive) 99% of the time (sensitivity).
    • The test incorrectly indicates the disease (false positive) 2% of the time (1 - specificity).
Calculate the probability of having the disease given a positive test result using Bayes' theorem.

In [42]:
P_disease = 0.01  # Prevalence of the disease
P_positive_given_disease = 0.99  # Sensitivity (true positive rate)
P_positive_given_no_disease = 0.02  # False positive rate

# Probability of not having the disease
P_no_disease = 1 - P_disease

# Overall probability of testing positive (law of total probability)
P_positive = (P_positive_given_disease * P_disease) + (P_positive_given_no_disease * P_no_disease)

# Bayes' theorem
P_disease_given_positive = (P_positive_given_disease * P_disease) / P_positive

# Print the result
print("The probability of having the disease given a positive test result is={:.2f}".format(P_disease_given_positive))

The probability of having the disease given a positive test result is=0.33
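
A posterior of about one third can look surprisingly low for a test with 99% sensitivity. One way to see why is to count people rather than multiply probabilities. The sketch below assumes a hypothetical cohort of 10,000 people (a size chosen only so that every count is a whole number) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

population = 10_000  # hypothetical cohort size, not part of the exercise

diseased = population * Fraction(1, 100)      # 1% prevalence
healthy = population - diseased

true_positives = diseased * Fraction(99, 100)  # 99% sensitivity
false_positives = healthy * Fraction(2, 100)   # 2% false positive rate

# Among everyone who tests positive, the share who actually have the disease
posterior = true_positives / (true_positives + false_positives)

print(f"True positives:  {true_positives}")
print(f"False positives: {false_positives}")
print(f"P(disease | positive) = {posterior} = {float(posterior):.2f}")
```

Healthy people outnumber diseased people 99 to 1, so even a 2% false positive rate produces twice as many false positives as true positives.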


2. Write a program to implement the naïve Bayesian classifier, without using the scikit-learn library, for the
following sample training data set stored as a .CSV file. Calculate the accuracy, precision, and recall for your
train/test data set. Use the classifier to answer: if the weather is sunny, should the player play or not?
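
The program below reads a column named `outlook` and a column named `play`, with `Yes`/`No` labels. As a rough guide to the expected file layout, the sketch below writes and re-reads a tiny CSV with those two columns. The three rows and the file name `nbc_sample.csv` are hypothetical placeholders, not the assignment's data set.

```python
import csv
import os
import tempfile

# Hypothetical rows for illustration only; they are NOT the assignment's data.
sample_rows = [
    {"outlook": "Sunny", "play": "No"},
    {"outlook": "Overcast", "play": "Yes"},
    {"outlook": "Rainy", "play": "Yes"},
]

# Write the rows with a header line so csv.DictReader can map columns by name
path = os.path.join(tempfile.mkdtemp(), "nbc_sample.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["outlook", "play"])
    writer.writeheader()
    writer.writerows(sample_rows)

# Read the file back: each row becomes a dict keyed by the header names
with open(path, newline="") as f:
    loaded = list(csv.DictReader(f))

print(loaded[0]["outlook"], loaded[0]["play"])
```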

In [44]:
import csv

# Step 1: Load and preprocess the dataset
def load_dataset(file_path):
    with open(file_path, 'r') as file:
        reader = csv.DictReader(file)
        data = [row for row in reader]
    return data

def preprocess_data(data):
    outlook_counts = {}
    play_counts = {}
    total_count = len(data)

    for row in data:
        outlook = row['outlook']
        play = row['play']
        if outlook not in outlook_counts:
            outlook_counts[outlook] = 0
        if play not in play_counts:
            play_counts[play] = 0
        outlook_counts[outlook] += 1
        play_counts[play] += 1

    return outlook_counts, play_counts, total_count

# Step 2: Build the Naïve Bayesian classifier
def calculate_probabilities(outlook_counts, play_counts, total_count):
    outlook_probs = {}
    play_probs = {}

    for outlook, count in outlook_counts.items():
        outlook_probs[outlook] = count / total_count

    for play, count in play_counts.items():
        play_probs[play] = count / total_count

    return outlook_probs, play_probs

# Step 3: Classify a single outlook value
# Note: both scores multiply the class prior by the same marginal P(outlook),
# so the comparison reduces to comparing P(Yes) with P(No). The class-conditional
# likelihood P(outlook | play) is not used here.
def classify(outlook_probs, play_probs, test_outlook):
    play_yes_prob = play_probs['Yes']
    play_no_prob = play_probs['No']
    outlook_prob = outlook_probs[test_outlook]

    yes_score = play_yes_prob * outlook_prob
    no_score = play_no_prob * outlook_prob

    return 'Yes' if yes_score > no_score else 'No'

# Step 4: Calculate accuracy, precision, and recall
def evaluate_classifier(data, outlook_probs, play_probs):
    true_positive = true_negative = false_positive = false_negative = 0

    for row in data:
        predicted_play = classify(outlook_probs, play_probs, row['outlook'])
        actual_play = row['play']

        if predicted_play == 'Yes' and actual_play == 'Yes':
            true_positive += 1
        elif predicted_play == 'No' and actual_play == 'No':
            true_negative += 1
        elif predicted_play == 'Yes' and actual_play == 'No':
            false_positive += 1
        elif predicted_play == 'No' and actual_play == 'Yes':
            false_negative += 1

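    accuracy = (true_positive + true_negative) / len(data)
    # Guard against division by zero when no positives are predicted or present
    predicted_positive = true_positive + false_positive
    actual_positive = true_positive + false_negative
    precision = true_positive / predicted_positive if predicted_positive else 0.0
    recall = true_positive / actual_positive if actual_positive else 0.0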

    return accuracy, precision, recall

# Step 5: Display probabilities and decide the best class
# Note: the values printed here are class priors taken from play_probs,
# not posteriors P(play | outlook).
def display_probabilities(test_outlook, play_probs):
    play_prob = play_probs['Yes'] if test_outlook == 'Sunny' else play_probs['No']
    no_play_prob = 1 - play_prob

    print(f"Probability of Player should play: {play_prob}")
    print(f"Probability of Player shouldn't play: {no_play_prob}")

# Returns the class with the larger prior (the majority class)
def decide_best_class(play_probs):
    return 'Yes' if play_probs['Yes'] > play_probs['No'] else 'No'

# Load and preprocess data
data = load_dataset('nbc.csv')
outlook_counts, play_counts, total_count = preprocess_data(data)

# Calculate probabilities
outlook_probs, play_probs = calculate_probabilities(outlook_counts, play_counts, total_count)

# Evaluate the classifier
accuracy, precision, recall = evaluate_classifier(data, outlook_probs, play_probs)

# Print results
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")

# Test on a new case: If the weather is sunny, then the Player should play or not
test_outlook = 'Sunny'
display_probabilities(test_outlook, play_probs)

# Decide the best class
best_class = decide_best_class(play_probs)
print(f"The best class is: {best_class}")


Accuracy: 0.7142857142857143
Precision: 0.7142857142857143
Recall: 1.0
Probability of Player should play: 0.7142857142857143
Probability of Player shouldn't play: 0.2857142857142857
The best class is: Yes
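
In the program above, each class prior is multiplied by the same P(outlook), so the prediction never changes with the weather. This is consistent with the recall of 1.0 and the precision equal to the accuracy. A naïve Bayes score normally uses the class-conditional likelihood instead: P(play) × P(outlook | play). The sketch below shows one possible way to do that with add-one (Laplace) smoothing and a separate test set. The data rows, the function names (`fit`, `posterior`, `predict`, `evaluate`), and the choice of smoothing are all assumptions made for illustration; the rows are not the contents of `nbc.csv`.

```python
from collections import Counter, defaultdict

# Hypothetical (outlook, play) pairs for illustration; NOT the contents of nbc.csv.
train = [
    ("Sunny", "No"), ("Sunny", "No"), ("Sunny", "Yes"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "Yes"), ("Rainy", "No"),
]
test = [("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes")]


def fit(rows):
    """Count class frequencies and per-class outlook frequencies."""
    class_counts = Counter(label for _, label in rows)
    cond_counts = defaultdict(Counter)
    for outlook, label in rows:
        cond_counts[label][outlook] += 1
    outlooks = sorted({outlook for outlook, _ in rows})
    return class_counts, cond_counts, outlooks


def posterior(model, outlook, alpha=1):
    """Return normalized P(label | outlook) using add-alpha smoothing."""
    class_counts, cond_counts, outlooks = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        prior = n / total  # P(label)
        # Smoothed P(outlook | label)
        likelihood = (cond_counts[label][outlook] + alpha) / (n + alpha * len(outlooks))
        scores[label] = prior * likelihood
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


def predict(model, outlook):
    """Return the label with the highest posterior probability."""
    probs = posterior(model, outlook)
    return max(probs, key=probs.get)


def evaluate(model, rows, positive="Yes"):
    """Return (accuracy, precision, recall) on labelled rows."""
    tp = fp = fn = correct = 0
    for outlook, actual in rows:
        predicted = predict(model, outlook)
        correct += predicted == actual
        tp += predicted == positive and actual == positive
        fp += predicted == positive and actual != positive
        fn += predicted != positive and actual == positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return correct / len(rows), precision, recall


model = fit(train)
print("Posterior for Sunny:", posterior(model, "Sunny"))
print("Prediction for Sunny:", predict(model, "Sunny"))

accuracy, precision, recall = evaluate(model, test)
print(f"Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}")
```

With these made-up rows, the prediction now depends on the outlook value, and the metrics are computed on rows the model was not fitted on. On the real `nbc.csv`, the numbers would differ.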
