# Bayesian Classifier for Housing Prices

This notebook implements a two-class Bayesian classifier from scratch. It predicts whether a house's price is below **4,000,000** or at or above it, based on two discrete features: the **number of bedrooms** and the number of **parking spots**.

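The classifier calculates the **joint likelihood** by directly counting how often that exact combination of bedrooms and parking spots occurs in each price category. The joint likelihood is the probability of observing that combination of bedrooms and parking spots together. Unlike Naive Bayes, it does not assume that the two features are independent within a class.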
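Here is the same idea written with counts. This notation is introduced here and does not appear in the code:

- $N$ is the number of training houses.
- $n_c$ is the number of houses in class $c \in \{0, 1\}$.
- $k_c$ is the number of class-$c$ houses with exactly the queried pair $\mathbf{x} = (\text{bedrooms}, \text{parking})$.

With the frequency estimates the classifier uses, the class sizes cancel. The posterior reduces to the share of matching houses that fall in each class:

$$
P(c) = \frac{n_c}{N}, \qquad
P(\mathbf{x} \mid c) = \frac{k_c}{n_c}, \qquad
P(\mathbf{x}) = \sum_{c'=0}^{1} P(\mathbf{x} \mid c')\,P(c') = \frac{k_0 + k_1}{N},
$$

$$
P(c \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid c)\,P(c)}{P(\mathbf{x})} = \frac{k_c / N}{(k_0 + k_1) / N} = \frac{k_c}{k_0 + k_1}.
$$

This also shows why the posterior is undefined when $k_0 + k_1 = 0$, that is, when the combination never appears in the data.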

```python
import pandas as pd
```

## 1. Classifier Function Definition

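This function contains the core logic. It takes three inputs:

- `price`: the class labels, already converted to 0/1.
- `collected_data`: the per-class feature lists, indexed as `collected_data[feature][class]`, with feature 0 = bedrooms and feature 1 = parking.
- `given_data`: the new data point, as `[bedrooms, parking]`.

It then computes the prior, likelihood, evidence, and posterior probabilities for the two price classes.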

```python
def bayesian_classifier(price, collected_data, given_data):
    """Two-class Bayesian classifier using joint counts of (bedrooms, parking).

    price:          list of class labels (0 = price < 4M, 1 = price >= 4M)
    collected_data: [bedrooms, parking], each split per class as [[class 0], [class 1]]
    given_data:     [bedrooms, parking] of the house to classify
    """
    # Prior P(Class): relative frequency of each label
    prior = [price.count(0) / len(price), price.count(1) / len(price)]

    # Count how many times the exact given_data combination appears in each class
    temp = [0, 0]
    for i in range(len(collected_data[0][0])):  # label 0 (< 4M)
        if collected_data[0][0][i] == given_data[0] and collected_data[1][0][i] == given_data[1]:
            temp[0] += 1
    for i in range(len(collected_data[0][1])):  # label 1 (>= 4M)
        if collected_data[0][1][i] == given_data[0] and collected_data[1][1][i] == given_data[1]:
            temp[1] += 1

    # Likelihood P(Data | Class): share of each class with this exact combination
    likelihoods = [temp[0] / len(collected_data[0][0]),
                   temp[1] / len(collected_data[0][1])]

    # Evidence P(Data), by the law of total probability
    evidence = prior[0] * likelihoods[0] + prior[1] * likelihoods[1]

    # Posterior P(Class | Data), by Bayes' rule.
    # If evidence is 0, the combination never occurs in the training data and
    # the posterior is undefined, so [0.0, 0.0] is returned as a sentinel.
    if evidence == 0:
        hp = [0.0, 0.0]
    else:
        hp = [(likelihoods[0] * prior[0]) / evidence, (likelihoods[1] * prior[1]) / evidence]

    # Print the calculation breakdown
    print("--- Bayes Calculation Breakdown ---")
    print(f"Prior: {prior}")
    print(f"Likelihoods: {likelihoods}")
    print(f"Evidence: {evidence}")
    print(f"Posterior Probabilities: \n\tClass 1 (<4M): {hp[0]}\n\tClass 2 (>4M): {hp[1]}")
    print("-----------------------------------")

    return hp
```
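The two counting loops in `bayesian_classifier` can be written more compactly with `collections.Counter`. Each house is stored as a `(bedrooms, parking)` tuple and counted directly.

This is a sketch under a few assumptions:

- It uses a different data layout from the notebook: one list of tuples per class.
- The helper name `counting_posteriors` and the toy data are hypothetical.
- It returns `None` instead of `[0.0, 0.0]` for a combination that never appears.

```python
from collections import Counter

def counting_posteriors(features_by_class, query):
    """Hypothetical helper: posterior P(class | query) from joint counts.

    features_by_class[c] is a list of (bedrooms, parking) tuples for class c.
    Returns None when the query combination appears in no class.
    Assumes every class list is non-empty.
    """
    n_total = sum(len(rows) for rows in features_by_class)
    # Joint count of the exact (bedrooms, parking) pair in each class;
    # Counter returns 0 for keys it has never seen.
    counts = [Counter(rows)[query] for rows in features_by_class]
    priors = [len(rows) / n_total for rows in features_by_class]
    likelihoods = [k / len(rows) for k, rows in zip(counts, features_by_class)]
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    if evidence == 0:
        return None  # unseen combination: posterior is undefined
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Toy data, made up for illustration (not taken from Housing.csv)
class0 = [(2, 0), (3, 1), (4, 2), (2, 0)]
class1 = [(4, 2), (4, 2), (3, 1), (5, 3)]
print(counting_posteriors([class0, class1], (4, 2)))
```

Using tuples as `Counter` keys gives the same joint-likelihood estimate as the loops. It avoids relying on matching index positions across two parallel lists.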

## 2. Data Loading and Preprocessing

Here, we load the `Housing.csv` dataset and label each house with its price class: 0 for a price below 4,000,000, and 1 otherwise. We then split the feature values (`bedrooms`, `parking`) by class.

```python
# Path to the CSV file. The simplest setup is to place 'Housing.csv'
# in the same folder as this notebook.
path = r"Housing.csv"

try:
    file = pd.read_csv(path)
    print(f"File '{path}' loaded successfully!")

    # Extract the relevant columns into plain lists
    price = list(file["price"])
    bedrooms_data = list(file["bedrooms"])
    parking_data = list(file["parking"])

    # Price threshold separating the two classes: 4,000,000
    threshold = 4_000_000

    # Per-class feature lists: index 0 = price < 4M, index 1 = price >= 4M
    bedrooms = [[], []]
    parking = [[], []]

    # Label each record and file its features under that class.
    # Note: price[i] is overwritten in place with the label (0 or 1);
    # bayesian_classifier computes the priors from these labels.
    for i in range(len(price)):
        if price[i] < threshold:  # label 0: price < 4M
            price[i] = 0
            bedrooms[0].append(bedrooms_data[i])
            parking[0].append(parking_data[i])
        else:  # label 1: price >= 4M
            price[i] = 1
            bedrooms[1].append(bedrooms_data[i])
            parking[1].append(parking_data[i])

    # Combine into the nested structure the classifier expects:
    # collected_data[feature][class], with feature 0 = bedrooms, 1 = parking
    collected_data = [bedrooms, parking]
    print(f"Data processed: {len(bedrooms[0])} samples in Class 1 (<4M) and {len(bedrooms[1])} samples in Class 2 (>4M).")

except FileNotFoundError:
    print(f"Error: The file was not found at '{path}'")
    print("Please make sure the 'Housing.csv' file is in the same directory as the notebook, or update the 'path' variable.")
    collected_data = None
```

```text
File 'Housing.csv' loaded successfully!
Data processed: 219 samples in Class 1 (<4M) and 326 samples in Class 2 (>4M).
```
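The per-row loop can also be replaced by vectorized pandas operations. This is a sketch on a small made-up DataFrame that uses the same column names as `Housing.csv`; the values are invented.

It differs from the notebook in two ways:

- It produces one list of `(bedrooms, parking)` pairs per class, rather than the notebook's `[bedrooms, parking]` nesting.
- It leaves the original `price` column intact and writes the labels to a new `label` column. The column name is chosen here for illustration.

```python
import pandas as pd

# Toy frame with the same column names as Housing.csv; values are made up
df = pd.DataFrame({
    "price": [3_500_000, 4_200_000, 3_900_000, 6_000_000, 4_000_000],
    "bedrooms": [2, 3, 2, 4, 3],
    "parking": [0, 1, 0, 2, 1],
})

threshold = 4_000_000
# Label 0 for price < threshold, 1 otherwise (same rule as the loop above)
df["label"] = (df["price"] >= threshold).astype(int)

# One list of (bedrooms, parking) pairs per class; .to_numpy().tolist()
# converts the values to plain Python ints
by_class = []
for c in (0, 1):
    rows = df.loc[df["label"] == c, ["bedrooms", "parking"]].to_numpy().tolist()
    by_class.append([tuple(r) for r in rows])

print(df["label"].value_counts().sort_index().tolist())  # class sizes
print(by_class)
```

Note that a price of exactly 4,000,000 lands in class 1, matching the `else` branch of the original loop.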


## 3. Making a Prediction

Finally, we define a new data point: a house with a given number of bedrooms and parking spots. We use the classifier to predict its price class, then report the conditional probability of error for that decision, which is 1 minus the posterior of the chosen class.
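The decision below follows the minimum-error, or maximum a posteriori (MAP), rule: pick the class with the larger posterior $P(c \mid \mathbf{x})$, where $c \in \{0, 1\}$ is the price class and $\mathbf{x}$ is the (bedrooms, parking) pair. With only two classes, the conditional probability of error is the posterior of the class that was not picked:

$$
\hat{c}(\mathbf{x}) = \arg\max_{c \in \{0, 1\}} P(c \mid \mathbf{x}), \qquad
P(\text{error} \mid \mathbf{x}) = 1 - \max_{c} P(c \mid \mathbf{x}) = \min_{c} P(c \mid \mathbf{x}).
$$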

```python
if collected_data is not None:
    # The new data point to classify
    given_bedrooms = 4
    given_parking = 2
    given_data = [given_bedrooms, given_parking]
    print(f"\nPredicting for a house with {given_bedrooms} bedrooms and {given_parking} parking spots...\n")

    # Posterior probabilities [P(label 0 | data), P(label 1 | data)]
    result = bayesian_classifier(price, collected_data, given_data)

    # --- Decision and probability of error ---
    # [0.0, 0.0] is the sentinel for a combination absent from the training data
    if not any(result):
        print("This combination of features has never been seen before. Cannot make a prediction.")
    elif result[0] > result[1]:
        print("\nPrediction: Price is less than 4,000,000")
        # P(error | data) is 1 minus the posterior of the chosen class (label 0)
        print(f"Probability of error: {1 - result[0]:.4f}")
    else:
        print("\nPrediction: Price is greater than 4,000,000")
        # Same rule: 1 minus the posterior of the chosen class (label 1)
        print(f"Probability of error: {1 - result[1]:.4f}")
```


```text
Predicting for a house with 4 bedrooms and 2 parking spots...

--- Bayes Calculation Breakdown ---
Prior: [0.4018348623853211, 0.5981651376146789]
Likelihoods: [0.0091324200913242, 0.0705521472392638]
Evidence: 0.045871559633027525
Posterior Probabilities: 
	Class 1 (<4M): 0.08
	Class 2 (>4M): 0.92
-----------------------------------

Prediction: Price is greater than 4,000,000
Probability of error: 0.0800
```
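As a check on the printed numbers, the likelihoods correspond to 2 of the 219 cheaper houses and 23 of the 326 pricier houses having exactly 4 bedrooms and 2 parking spots (2/219 ≈ 0.00913 and 23/326 ≈ 0.07055). The counts 2 and 23 are inferred from the printed likelihoods. Recomputing with exact fractions confirms that the posteriors are 2/25 and 23/25. In other words, they are the shares of the 25 matching houses that fall in each class.

```python
from fractions import Fraction

# Counts implied by the printed output: 219 and 326 samples per class,
# of which 2 and 23 have exactly 4 bedrooms and 2 parking spots
n = [219, 326]
k = [2, 23]
N = sum(n)

prior = [Fraction(nc, N) for nc in n]
likelihood = [Fraction(kc, nc) for kc, nc in zip(k, n)]
evidence = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]

print(evidence)            # 5/109, i.e. 25/545
print(posterior)           # [Fraction(2, 25), Fraction(23, 25)]
print(1 - max(posterior))  # 2/25, the 0.08 probability of error reported above
```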
