# Discrete Bayesian Classifier for Housing Prices

This notebook implements a simple Naive Bayes classifier from scratch. It predicts whether a house price is above a threshold of 4,000,000, using the number of bedrooms as the only feature.

In [1]:
import pandas as pd

## 1. Classifier Function Definition

Here we define the core function that implements the Naive Bayes logic. It calculates the **prior**, **likelihood**, and **evidence** to determine the **posterior probability** for each class.
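In symbols, the four quantities named above are related by Bayes' theorem. The notation is introduced here for illustration only: $C_k$ stands for a price class and $b$ for the observed number of bedrooms.

$$
P(C_k \mid b) = \frac{P(b \mid C_k)\, P(C_k)}{P(b)},
\qquad
P(b) = \sum_{j} P(b \mid C_j)\, P(C_j)
$$

The left-hand side is the posterior, $P(b \mid C_k)$ is the likelihood, $P(C_k)$ is the prior, and $P(b)$ is the evidence. Because the evidence is the same for every class, it rescales the posteriors so they sum to 1 but does not change which class wins.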

In [2]:
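def bayesian_classifier(price, bedroom_class_1, bedroom_class_2, given_bedrooms):
    """
    Predicts the price class based on the number of bedrooms using Naive Bayes.

    price: list of class labels (0 = at or below the threshold, 1 = above it)
    bedroom_class_1: bedroom counts of the houses labelled 0
    bedroom_class_2: bedroom counts of the houses labelled 1
    given_bedrooms: bedroom count of the house to classify
    """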
    # Calculate the Prior probabilities of each class
    # P(Class 1) and P(Class 2)
    prior = [price.count(0) / len(price), price.count(1) / len(price)]

    # Calculate the Likelihood of observing 'given_bedrooms' for each class
    # P(Bedrooms | Class 1) and P(Bedrooms | Class 2)
    likelihoods = [bedroom_class_1.count(given_bedrooms) / len(bedroom_class_1),
                   bedroom_class_2.count(given_bedrooms) / len(bedroom_class_2)]

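    # Calculate the Evidence (overall probability of the given bedrooms)
    # P(Bedrooms)
    evidence = prior[0] * likelihoods[0] + prior[1] * likelihoods[1]

    # If the bedroom count never occurs in the data, the evidence is zero and
    # the posteriors are undefined; return early instead of dividing by zero.
    if evidence == 0:
        return f"Prediction: undefined, no houses with {given_bedrooms} bedrooms in the data"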

    # Calculate the Posterior probability for Class 1 using Bayes' Theorem
    # P(Class 1 | Bedrooms)
    hp1 = (likelihoods[0] * prior[0]) / evidence

    # Calculate the Posterior probability for Class 2
    # P(Class 2 | Bedrooms)
    hp2 = (likelihoods[1] * prior[1]) / evidence

    # Print the components for clarity
    print("--- Bayes Calculation Breakdown ---")
    print(f"Prior P(<4M): {prior[0]:.3f} | P(>4M): {prior[1]:.3f}")
    print(f"Likelihood P({given_bedrooms} beds | <4M): {likelihoods[0]:.3f} | P({given_bedrooms} beds | >4M): {likelihoods[1]:.3f}")
    print(f"Evidence P({given_bedrooms} beds): {evidence:.3f}")
    print(f"Posterior P(<4M | {given_bedrooms} beds): {hp1:.3f}")
    print(f"Posterior P(>4M | {given_bedrooms} beds): {hp2:.3f}")
    print("-----------------------------------")
    
    # Compare posterior probabilities for classification
    if hp1 > hp2:
        return "Prediction: Price is less than 4,000,000"
    else:
        return "Prediction: Price is greater than 4,000,000"
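To make the steps easier to test in isolation, here is a minimal sketch of the same calculation that returns the two posteriors instead of printing them. The function name `posteriors` and the toy lists are hypothetical and are not part of the notebook or its dataset. The sketch returns `None` when the requested bedroom count never appears, since the evidence is zero in that case.

```python
def posteriors(labels, beds_class_0, beds_class_1, given_bedrooms):
    """Return (P(class 0 | beds), P(class 1 | beds)), or None if the evidence is zero."""
    n = len(labels)
    prior = [labels.count(0) / n, labels.count(1) / n]
    likelihood = [
        beds_class_0.count(given_bedrooms) / len(beds_class_0),
        beds_class_1.count(given_bedrooms) / len(beds_class_1),
    ]
    evidence = prior[0] * likelihood[0] + prior[1] * likelihood[1]
    if evidence == 0:
        return None  # bedroom count not seen in either class
    return tuple(l * p / evidence for l, p in zip(likelihood, prior))


# Hypothetical toy data: 4 houses in class 0 and 6 houses in class 1
toy_labels = [0] * 4 + [1] * 6
toy_beds_0 = [2, 2, 3, 2]
toy_beds_1 = [3, 4, 3, 2, 4, 3]

p0, p1 = posteriors(toy_labels, toy_beds_0, toy_beds_1, 2)
print(f"{p0:.3f} {p1:.3f}")                                  # 0.750 0.250
print(posteriors(toy_labels, toy_beds_0, toy_beds_1, 7))     # None
```

With the toy data, the priors are 0.4 and 0.6 and the likelihoods of two bedrooms are 3/4 and 1/6, so the evidence is 0.4 and the posteriors are 0.75 and 0.25.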

## 2. Data Loading and Preprocessing

Load the housing data from a CSV file and process it into the required format.
- **Binarize Prices:** Convert the continuous price data into two classes (0 for prices at or below 4M, 1 for prices above 4M).
- **Segregate Bedrooms:** Separate the bedroom data based on the price class.

In [3]:
# --- Acquiring the data ---


path = "Housing.csv"  # location of the dataset; adjust if the file lives elsewhere

try:
    file = pd.read_csv(path)
    print("File loaded successfully!")
except FileNotFoundError:
    print(f"Error: The file was not found at '{path}'")
    print("Please update the 'path' variable to the correct location of your 'Housing.csv' file.")
    file = None # Set file to None to prevent errors in subsequent cells

if file is not None:
    price = list(file["price"])        # price data
    bedrooms = list(file["bedrooms"])    # data for number of bedrooms
    threshold = 4 * (10 ** 6)         # price threshold
    bedrooms_class_1 = []             # number of bedrooms in class 1 (price <= threshold)
    bedrooms_class_2 = []             # number of bedrooms in class 2 (price > threshold)

    # --- Managing the data ---
    for i in range(len(price)):
        if price[i] > threshold:
            price[i] = 1 # Class 2
            bedrooms_class_2.append(bedrooms[i])
        else:
            price[i] = 0 # Class 1
            bedrooms_class_1.append(bedrooms[i])
            
    print(f"Data processed: {price.count(0)} samples in Class 1 (<4M) and {price.count(1)} samples in Class 2 (>4M).")

File loaded successfully!
Data processed: 219 samples in Class 1 (<4M) and 326 samples in Class 2 (>4M).
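The binarize and segregate steps can also be written more compactly without index-based loops. The following is a sketch on a small hypothetical sample (the values are made up and do not come from `Housing.csv`); the names `sample_prices`, `sample_bedrooms`, and `beds_by_class` are illustrative.

```python
# Hypothetical sample; the notebook reads these columns from Housing.csv
sample_prices = [3_500_000, 4_000_000, 5_250_000, 9_100_000, 2_800_000]
sample_bedrooms = [2, 3, 4, 4, 2]
threshold = 4_000_000

# Binarize: 1 if the price is strictly above the threshold, otherwise 0
labels = [int(p > threshold) for p in sample_prices]

# Segregate: collect bedroom counts under their class label
beds_by_class = {0: [], 1: []}
for label, beds in zip(labels, sample_bedrooms):
    beds_by_class[label].append(beds)

print(labels)         # [0, 0, 1, 1, 0]
print(beds_by_class)  # {0: [2, 3, 2], 1: [4, 4]}
```

Note that a price exactly equal to the threshold falls into class 0, which matches the `price[i] > threshold` test in the cell above.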


## 3. Making a Prediction

Now, let's use our classifier to predict the price category for a house with a specific number of bedrooms.

In [11]:
# Define the evidence: we want to predict the price for a house with 2 bedrooms.
given_bedrooms = 2

# Call the classifier function only if the file was loaded successfully.
if file is not None:
    print(f"Predicting price category for a house with {given_bedrooms} bedrooms:\n")
    prediction = bayesian_classifier(price, bedrooms_class_1, bedrooms_class_2, given_bedrooms)
    print(prediction)

Predicting price category for a house with 2 bedrooms:

--- Bayes Calculation Breakdown ---
Prior P(<4M): 0.402 | P(>4M): 0.598
Likelihood P(2 beds | <4M): 0.443 | P(2 beds | >4M): 0.120
Evidence P(2 beds): 0.250
Posterior P(<4M | 2 beds): 0.713
Posterior P(>4M | 2 beds): 0.287
-----------------------------------
Prediction: Price is less than 4,000,000
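As a sanity check, the printed breakdown can be reproduced by hand from the priors and likelihoods. The sketch below uses the values exactly as printed above, which are rounded to three decimals, so it is an approximation of what the function computes at full precision.

```python
# Values copied from the printed breakdown (rounded to three decimals)
prior = (0.402, 0.598)        # P(<4M), P(>4M)
likelihood = (0.443, 0.120)   # P(2 beds | <4M), P(2 beds | >4M)

# Evidence: total probability of observing 2 bedrooms
evidence = prior[0] * likelihood[0] + prior[1] * likelihood[1]

# Posteriors via Bayes' theorem
posterior_low = prior[0] * likelihood[0] / evidence
posterior_high = prior[1] * likelihood[1] / evidence

print(f"Evidence: {evidence:.3f}")                               # Evidence: 0.250
print(f"Posteriors: {posterior_low:.3f}, {posterior_high:.3f}")  # Posteriors: 0.713, 0.287
```

Although houses above 4M are more common overall (prior 0.598), two-bedroom houses are much more frequent in the lower class, which is enough to pull the posterior toward the lower class.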
