# Multi-Class Bayesian Classifier for Housing Prices

This notebook implements a Bayesian classifier from scratch for a **multi-class** (3 price categories) problem using two discrete features: **number of bedrooms** and **parking spots**.

This implementation estimates the **joint likelihood**, the probability of observing a specific combination of bedrooms and parking spots together, by directly counting occurrences within each class. Unlike naive Bayes, it does not assume that the two features are conditionally independent given the class.
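
To make the counting concrete, here is a minimal sketch on a made-up toy sample. The rows and the helper name `joint_likelihood` are illustrative assumptions, not part of the notebook or of `Housing.csv`. It estimates P(bedrooms, parking | class) as the fraction of one class's rows that match the pair exactly.

```python
from collections import Counter

# Hypothetical toy rows for a single class: (bedrooms, parking)
toy_class_rows = [(3, 1), (3, 1), (4, 2), (2, 0)]

def joint_likelihood(rows, pair):
    """Fraction of rows in one class that equal `pair` exactly."""
    if not rows:
        return 0.0
    return Counter(rows)[pair] / len(rows)

print(joint_likelihood(toy_class_rows, (3, 1)))  # 0.5 (2 of 4 rows match)
print(joint_likelihood(toy_class_rows, (4, 4)))  # 0.0 (pair never observed)
```

A pair that never occurs in a class gets likelihood zero, which is why the classifier later needs a special case for unseen combinations.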

In [1]:
import pandas as pd

## 1. Classifier Function Definition

This function contains the core logic. It takes the processed data and a new data point, then calculates the prior, likelihood, evidence, and posterior probabilities for all three classes.
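
In symbols, the four quantities can be written as follows. The notation is introduced here for illustration only: $C_k$ is price class $k$, $x$ is the observed (bedrooms, parking) pair, $N$ is the total number of samples, $N_k$ is the number of samples in class $k$, and $n_k(x)$ is the number of samples in class $k$ that equal $x$.

$$
P(C_k) = \frac{N_k}{N}, \qquad
P(x \mid C_k) = \frac{n_k(x)}{N_k}
$$

$$
P(x) = \sum_{j=1}^{3} P(x \mid C_j)\, P(C_j), \qquad
P(C_k \mid x) = \frac{P(x \mid C_k)\, P(C_k)}{P(x)}
$$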

In [2]:
def bayesian_classifier(price, collected_data, given_data):
    """Return the posterior P(class | data) for the three price classes.

    price          -- class label (0, 1, or 2) of every training sample
    collected_data -- [bedrooms, parking], each a list of three per-class lists
    given_data     -- [bedrooms, parking] of the house to classify
    """
    bedrooms, parking = collected_data
    n_classes = 3

    # Prior P(class): relative frequency of each class label
    prior = [price.count(c) / len(price) for c in range(n_classes)]

    # Count how often the exact (bedrooms, parking) pair occurs in each class
    temp = [
        sum(1 for b, p in zip(bedrooms[c], parking[c])
            if b == given_data[0] and p == given_data[1])
        for c in range(n_classes)
    ]

    # Likelihood P(data | class); an empty class gets 0 instead of dividing by zero
    likelihood = [temp[c] / len(bedrooms[c]) if bedrooms[c] else 0.0
                  for c in range(n_classes)]

    # Evidence P(data): sum over classes of P(data | class) * P(class)
    evidence = sum(prior[c] * likelihood[c] for c in range(n_classes))

    # Posterior P(class | data); all zeros if the combination was never seen
    if evidence == 0:
        hp = [0.0, 0.0, 0.0]
    else:
        hp = [likelihood[c] * prior[c] / evidence for c in range(n_classes)]

    # Print the calculation breakdown
    print("--- Bayes Calculation Breakdown ---")
    print(f"Prior: {prior}")
    print(f"Likelihoods: {likelihood}")
    print(f"Evidence: {evidence}")
    print(f"Posterior Probabilities:\n\tClass 1 (<4M): {hp[0]}\n\tClass 2 (4-8M): {hp[1]}\n\tClass 3 (>8M): {hp[2]}")
    print("-----------------------------------")
    
    return hp
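
The function returns all-zero posteriors when the given pair was never observed in any class. One common alternative, which this notebook does not use, is additive (Laplace) smoothing of the likelihood. The sketch below is only an illustration of that technique: the helper name `smoothed_likelihood`, the parameter `alpha`, and the figure of 24 possible feature pairs are assumptions, not values taken from the dataset.

```python
def smoothed_likelihood(match_count, class_size, n_combinations, alpha=1.0):
    """Additive smoothing: (count + alpha) / (class_size + alpha * n_combinations)."""
    return (match_count + alpha) / (class_size + alpha * n_combinations)

# Assumed for illustration: 6 bedroom values x 4 parking values = 24 possible pairs
N_COMBINATIONS = 24

# A pair never seen in a class of 37 samples still gets a small non-zero likelihood
unseen = smoothed_likelihood(0, 37, N_COMBINATIONS)   # 1 / (37 + 24)
seen = smoothed_likelihood(8, 37, N_COMBINATIONS)     # 9 / (37 + 24)
print(unseen, seen)
```

With smoothing the evidence can never be zero, so the special case disappears, at the cost of assigning some probability to pairs the data never showed.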

## 2. Data Loading and Preprocessing

Here, we load the `Housing.csv` dataset and categorize each house into one of three price classes. We then segregate the feature data (`bedrooms`, `parking`) based on these classes.

In [3]:
# Set the path to your CSV file.
# For best results, place 'Housing.csv' in the same folder as this notebook.
path = r"Housing.csv"

try:
    file = pd.read_csv(path)
    print(f"File '{path}' loaded successfully!")
    
    # Extract data into lists
    price = list(file["price"]) 
    bedrooms_data = list(file["bedrooms"])
    parking_data = list(file["parking"])
    
    # Set thresholds to define three price classes
    thresholds = [4_000_000, 8_000_000]
    
    # Initialize lists to hold segregated feature data
    bedrooms, parking = [[], [], []], [[], [], []]

    # Replace each price with its class label and file its features under that class
    for i in range(len(price)):
        if price[i] < thresholds[0]:        # Class 1: price < 4M
            label = 0
        elif price[i] < thresholds[1]:      # Class 2: 4M <= price < 8M
            label = 1
        else:                               # Class 3: price >= 8M
            label = 2
        price[i] = label
        bedrooms[label].append(bedrooms_data[i])
        parking[label].append(parking_data[i])

    # Combine into the final structure for the classifier
    collected_data = [bedrooms, parking]
    print(f"Data processed into {len(bedrooms[0])} samples for Class 1, {len(bedrooms[1])} for Class 2, and {len(bedrooms[2])} for Class 3.")

except FileNotFoundError:
    print(f"Error: The file was not found at '{path}'")
    print("Please make sure the 'Housing.csv' file is in the same directory as the notebook, or update the 'path' variable.")
    collected_data = None

File 'Housing.csv' loaded successfully!
Data processed into 219 samples for Class 1, 289 for Class 2, and 37 for Class 3.
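
The three-way branch in the preprocessing loop is a threshold lookup. As a sketch of an equivalent, the standard-library function `bisect.bisect_right` maps a price to the same class index, including at the boundaries. The helper name `price_class` and the sample prices are made up for illustration.

```python
from bisect import bisect_right

thresholds = [4_000_000, 8_000_000]

def price_class(price):
    """Return 0 for price < 4M, 1 for 4M <= price < 8M, 2 for price >= 8M."""
    # bisect_right counts how many thresholds are <= price
    return bisect_right(thresholds, price)

for p in (3_999_999, 4_000_000, 7_999_999, 8_000_000):
    print(p, price_class(p))
```

This form extends to more price bands by adding entries to `thresholds`, without adding branches.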


## 3. Making a Prediction

Finally, we define a new data point (a house with a given number of bedrooms and parking spots) and use the classifier to predict its price category. We also report the probability of error for this prediction, taken as one minus the largest posterior probability.

In [4]:
if collected_data is not None:
    # Define the new data point we want to classify
    given_bedrooms = 4 
    given_parking = 2 
    given_data = [given_bedrooms, given_parking]
    print(f"\nPredicting for a house with {given_bedrooms} bedrooms and {given_parking} parking spots...\n")
    
    # Call the classifier to get the posterior probabilities
    result = bayesian_classifier(price, collected_data, given_data)

    # --- Make a Decision ---
    labels = ["less than 40,00,000", "between 40,00,000 and 80,00,000", "greater than 80,00,000"]
    if not any(result):
        # All posteriors are zero: the combination never occurred in the data
        print("This combination of features has never been seen before. Cannot make a prediction.")
    else:
        # Pick the class with the highest posterior probability (first class wins ties)
        best = result.index(max(result))
        print(f"\nPrediction: Price is {labels[best]}")

        # Probability of error: 1 minus the posterior of the predicted class
        p_error = 1 - result[best]
        print(f"Probability of error: {p_error:.4f}")


Predicting for a house with 4 bedrooms and 2 parking spots...

--- Bayes Calculation Breakdown ---
Prior: [0.4018348623853211, 0.5302752293577981, 0.06788990825688074]
Likelihoods: [0.0091324200913242, 0.05190311418685121, 0.21621621621621623]
Evidence: 0.045871559633027525
Posterior Probabilities:
	Class 1 (<4M): 0.08
	Class 2 (4-8M): 0.6
	Class 3 (>8M): 0.32
-----------------------------------

Prediction: Price is between 40,00,000 and 80,00,000
Probability of error: 0.4000
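
As a sanity check on the output above: the likelihood is a within-class match count divided by the class size, and the prior is the class size divided by the total, so the class sizes cancel and each posterior equals that class's share of the matching houses. The sketch below redoes the arithmetic with exact fractions. The match counts 2, 15, and 8 are inferred from the printed likelihoods and class sizes (for example, 0.0091324... × 219 = 2), not read from the dataset directly.

```python
from fractions import Fraction

class_sizes = [219, 289, 37]      # from the preprocessing output
match_counts = [2, 15, 8]         # inferred: printed likelihood * class size
total = sum(class_sizes)          # 545 samples in all

prior = [Fraction(n, total) for n in class_sizes]
likelihood = [Fraction(c, n) for c, n in zip(match_counts, class_sizes)]
evidence = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]

print([float(x) for x in posterior])   # [0.08, 0.6, 0.32]
print(float(1 - max(posterior)))       # 0.4
```

Of the 25 houses with 4 bedrooms and 2 parking spots, 15 fall in Class 2, which gives the posterior of 0.6 and the error probability of 0.4.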
