# HOUSE PRICE PREDICTION

## Libraries Used

- pandas
- NumPy
- scikit-learn
- Matplotlib
- pickle (standard library)
- time (standard library)
- Sweetviz
- ipywidgets
- geopy

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pickle
import time

In [4]:
from sklearn.datasets import fetch_california_housing            # Built-in California housing dataset
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.preprocessing import LabelEncoder 

In [5]:
try:
    import sweetviz as sv
    SWEETVIZ_LOADED = True
except ImportError:
    print("Sweetviz not installed. Skipping EDA report generation.")
    print("Install using: pip install sweetviz")
    SWEETVIZ_LOADED = False

In [6]:
try:
    from geopy.geocoders import Nominatim
    from geopy.exc import GeocoderTimedOut, GeocoderServiceError
    GEOPY_LOADED = True
except ImportError:
    print("Geopy not installed. Geocoding feature engineering will be skipped unless location_cache.pickle exists.")
    print("Install using: pip install geopy")
    GEOPY_LOADED = False

In [7]:
print("Fetching California housing dataset...")
data = fetch_california_housing()
print("Dataset fetched.")

Fetching California housing dataset...
Dataset fetched.


In [8]:
print(data.DESCR)  # Description of the data

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census block group.

In [9]:
# Independent variables (features)

df = pd.DataFrame(data.data, columns=data.feature_names)

In [10]:
# Dependent variable (target)
df['Target'] = data.target # Target is median house value in $100,000s

In [11]:
print("Initial DataFrame head:")
print(df.head())
print("\nInitial DataFrame info:")
df.info()

Initial DataFrame head:
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   

   Longitude  Target  
0    -122.23   4.526  
1    -122.22   3.585  
2    -122.24   3.521  
3    -122.25   3.413  
4    -122.25   3.422  

Initial DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   MedInc      20640 non-null  float64
 1   HouseAge    20640 non-null  float64
 2   AveRooms    20640 non-null  float64
 3   AveBedrms   20640 non-null  float64
 4

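## Exploratory Data Analysis

Sweetviz is an open-source, pandas-based library that automates much of the initial EDA. `sv.analyze(df)` profiles every column (distributions, missing values, associations between features), and `report.show_html(path)` writes the results to a self-contained HTML report. A full overview takes about two lines of code.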

In [13]:
if SWEETVIZ_LOADED:
    print("\nGenerating Sweetviz EDA report (this may take a moment)...")
    try:
        report = sv.analyze(df)
        report_path = "./california_housing_report.html"
        report.show_html(report_path)  # Write the report to an HTML file (and try to open it in a browser)
        print(f"Sweetviz report saved to {report_path}")
    except Exception as e:
        print(f"Could not generate Sweetviz report: {e}")
else:
    print("\nSkipping Sweetviz EDA report generation.")



Generating Sweetviz EDA report (this may take a moment)...



Report ./california_housing_report.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
Sweetviz report saved to ./california_housing_report.html


## Feature Engineering  

In [15]:
# Set to True to run the time-consuming geocoding process.
# Set to False to attempt loading pre-computed data from 'location_cache.pickle'.
RUN_GEOCODING = False
LOCATION_CACHE_FILE = "location_cache.pickle" # File to save/load geocoding results
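This gives a rough idea of why geocoding is flagged as slow. `get_location()` (defined further down) pauses with `time.sleep(1)` around each API request, so run time grows linearly with the number of coordinate pairs that still need fetching. The sketch below takes a worst case: it assumes all 20,640 rows need their own lookup and that each request itself takes no time. In practice, rows that share coordinates reduce the count, and network latency adds to every request. `estimate_geocoding_hours` is a hypothetical helper, not part of the notebook.

```python
def estimate_geocoding_hours(n_requests: int, delay_s: float = 1.0, latency_s: float = 0.0) -> float:
    """Estimate wall-clock hours for n_requests sequential, rate-limited API calls."""
    # Each request costs its own latency plus the fixed pause that accompanies it
    return n_requests * (delay_s + latency_s) / 3600

# Worst case for this dataset: every one of the 20,640 rows needs a lookup
print(f"{estimate_geocoding_hours(20640):.2f} hours")  # 5.73 hours
```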

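## About the geopy and pickle Modules

`geopy` is a Python client for several geocoding web services. It converts addresses to coordinates (geocoding) and coordinates to addresses (reverse geocoding). This notebook uses reverse geocoding through the `Nominatim` geocoder, which queries OpenStreetMap data.

`Nominatim` must be created with a `user_agent` string because the Nominatim service requires every request to identify the application that sends it. This notebook uses `user_agent='house_predictor_agent_v1'`.
- The user agent can be any string that uniquely identifies your application.
- The public Nominatim service allows at most one request per second, which is why the geocoding code pauses between requests.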
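One way to enforce that one-request-per-second spacing is to wrap the geocoding call in a small limiter. The stdlib sketch below does this. `MinIntervalLimiter` is a hypothetical class written for illustration; geopy also ships its own `geopy.extra.rate_limiter.RateLimiter` wrapper for the same purpose. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Any, Callable, Optional

class MinIntervalLimiter:
    """Wrap a function so successive calls start at least `min_delay` seconds apart."""

    def __init__(self, func: Callable[..., Any], min_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.func = func
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self._last_call: Optional[float] = None  # clock() value when the previous call started

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._last_call is not None:
            wait = self.min_delay - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)  # block until the minimum spacing has passed
        self._last_call = self.clock()
        return self.func(*args, **kwargs)
```

With a real geolocator, usage would look like `reverse = MinIntervalLimiter(geolocator.reverse, min_delay=1.0)` followed by `reverse(("37.88", "-122.23"), exactly_one=True, language="en")`.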

The `pickle` module serializes (saves) and deserializes (loads) Python objects. This is helpful when you need to:
- Store location data for future use instead of repeatedly making API calls.
- Cache results to speed up processing.
- Save and load complex Python objects (such as dictionaries or lists) for later use.

Here, pickle saves the location cache dictionary (`location_cache`, which maps coordinate pairs to their `road` and `county`) so it can be reloaded on later runs.

## Why Use pickle Here?

- Avoid unnecessary API calls: if location data is already saved, load it instead of making another API request.
- Persistent storage: the data survives after the program exits.
- Fast retrieval: loading from a local file is much faster than querying an external API.
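All three benefits come from one pattern: look the key up in a dictionary first, and call the slow service only on a miss. The stdlib sketch below shows that pattern in isolation. `cached_lookup` and the counting `fake_reverse` function are hypothetical stand-ins for `get_location()` and `geolocator.reverse`.

```python
from typing import Callable, Dict, Tuple

Coord = Tuple[str, str]

def cached_lookup(coord: Coord, cache: Dict[Coord, dict], fetch: Callable[[Coord], dict]) -> dict:
    """Return cache[coord], calling fetch() only if the coordinate has not been seen."""
    if coord not in cache:
        cache[coord] = fetch(coord)  # slow path: one API call, then remember the answer
    return cache[coord]

calls = []
def fake_reverse(coord: Coord) -> dict:
    calls.append(coord)  # record every simulated network request
    return {"road": "Example Road", "county": "Example County"}

cache: Dict[Coord, dict] = {}
for coord in [("37.88", "-122.23"), ("37.86", "-122.22"), ("37.88", "-122.23")]:
    cached_lookup(coord, cache, fake_reverse)

print(len(calls))  # 2: the repeated coordinate was served from the cache
```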


- `pickle.dump(obj, file)` → saves (serializes) an object to a file.
- `pickle.load(file)` → loads (deserializes) an object from a file.
- `wb`: write binary (used for saving objects).
- `rb`: read binary (used for loading objects).
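Combining `pickle.dump` on a file opened in `wb` mode with `pickle.load` on a file opened in `rb` mode gives the round trip below. The sketch writes a small cache to a temporary file and reads it back. The cache has the same shape `get_location()` stores: string-tuple keys and `road`/`county` dict values. The file name and sample values are illustrative only.

```python
import os
import pickle
import tempfile

cache = {("37.88", "-122.23"): {"road": "Tunnel Road", "county": "Alameda County"}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "location_cache.pickle")
    with open(path, "wb") as f:   # "wb": pickle writes bytes, not text
        pickle.dump(cache, f)
    with open(path, "rb") as f:   # "rb": read the same bytes back
        restored = pickle.load(f)

print(restored == cache)  # True: tuple keys and nested dicts survive the round trip
```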


In [19]:
# Geocoding Setup (Needed if RUN_GEOCODING is True) 
if GEOPY_LOADED:
    geolocator = Nominatim(user_agent='house_predictor_agent_v1', timeout=10)  # timeout: seconds to wait per request
else:
    geolocator = None

In [20]:
# Reverse geocoding (coordinates to address), kept commented out for manual exploration:
# geolocator.reverse("37.88" + "," + "-122.23")[0]
# geolocator.reverse("37.88" + "," + "-122.23").raw['address']
# geolocator.reverse("37.88" + "," + "-122.23")

In [21]:
def get_location(coords, cache, geolocator_instance):
    """Fetch location data (road, county) for a coordinate pair, using a cache."""
    if not geolocator_instance:
        print("Geopy not available, cannot fetch location.")
        return {"road": None, "county": None}

    lat, lon = map(str, coords)
    coord_tuple = (lat, lon)  # String tuple used as the cache key

    # Check the cache first
    if coord_tuple in cache:
        return cache[coord_tuple]

    # Fetch from the API on a cache miss
    print(f"Cache miss for {coord_tuple}. Querying API...")
    loc_data = {"road": None, "county": None}
    try:
        location = geolocator_instance.reverse(coord_tuple, exactly_one=True, language='en')
        # Nominatim returns the address breakdown under raw['address']
        address_data = location.raw.get('address', {}) if location else {}
        loc_data = {"road": address_data.get('road'), "county": address_data.get('county')}
        cache[coord_tuple] = loc_data  # Only successful lookups are cached
    except GeocoderTimedOut:
        print(f"Warning: Geocoder timed out for {coord_tuple}. Returning None.")
    except GeocoderServiceError as e:
        print(f"Warning: Geocoder service error for {coord_tuple}: {e}. Returning None.")
    except Exception as e:
        print(f"Warning: An unexpected error occurred during geocoding for {coord_tuple}: {e}. Returning None.")
    finally:
        time.sleep(1)  # Stay within one request per second, even after a failed request

    return loc_data

In [22]:
# Function to Load/Save Cache 
def load_location_data(filepath):
    """Loads location cache from a pickle file."""
    try:
        with open(filepath, "rb") as f:
            loc_update = pickle.load(f)
            print(f"Loaded location cache from {filepath}")
            return loc_update if isinstance(loc_update, dict) else {}
    except (FileNotFoundError, EOFError):
        print(f"Cache file {filepath} not found or empty. Starting with an empty cache.")
        return {}
    except Exception as e:
        print(f"Error loading cache file {filepath}: {e}. Starting with an empty cache.")
        return {}


In [23]:
def save_location_data(cache, filepath):
    """Saves location cache to a pickle file."""
    try:
        with open(filepath, "wb") as f:
            pickle.dump(cache, f)
            print(f"Saved location cache to {filepath}")
    except Exception as e:
        print(f"Error saving cache file {filepath}: {e}")

In [24]:
# Row-ordered coordinate keys, in the same string format get_location() uses
coord_keys = [(str(lat), str(lon)) for lat, lon in df[['Latitude', 'Longitude']].values]
loc_df = pd.DataFrame()  # Stays empty unless usable location data is found

if RUN_GEOCODING and GEOPY_LOADED and geolocator:
    print("\n--- Starting Geocoding Process (This will take a very long time!) ---")
    location_cache = load_location_data(LOCATION_CACHE_FILE)  # Load existing data first

    # Unique coordinates not cached yet (dict.fromkeys drops duplicates, keeps order)
    coords_to_process = [c for c in dict.fromkeys(coord_keys) if c not in location_cache]
    print(f"Total rows: {len(df)}. Cached entries: {len(location_cache)}. Need to fetch: {len(coords_to_process)}")

    start_time = time.time()
    save_interval = 100  # Save progress every N requests

    for i, coord_tuple in enumerate(coords_to_process):
        # get_location() stores successful lookups in location_cache itself;
        # failed lookups stay uncached, so they are retried on the next run
        get_location(coord_tuple, location_cache, geolocator)

        if (i + 1) % save_interval == 0:
            save_location_data(location_cache, LOCATION_CACHE_FILE)
            elapsed = time.time() - start_time
            print(f"Processed {i+1}/{len(coords_to_process)} new locations. Time elapsed: {elapsed:.2f}s. Cache saved.")

    save_location_data(location_cache, LOCATION_CACHE_FILE)  # Final save after the loop
    print("Geocoding process finished.")
else:
    if RUN_GEOCODING:
        print("RUN_GEOCODING is True but geopy is unavailable; falling back to the cache file.")
    print(f"\n--- Attempting to load location data from {LOCATION_CACHE_FILE} ---")
    location_cache = load_location_data(LOCATION_CACHE_FILE)

if location_cache:
    # Rebuild the location table in the same row order as df
    loc_df = pd.DataFrame([location_cache.get(c, {"road": None, "county": None}) for c in coord_keys],
                          index=df.index)
    if not {'road', 'county'}.issubset(loc_df.columns):
        print("Warning: Loaded cache seems incomplete (missing 'road' or 'county'). Location features might be unusable.")
        loc_df = pd.DataFrame()  # Reset if invalid
    else:
        print(f"Successfully created DataFrame from cached location data. Found {len(loc_df)} entries.")
else:
    print("Could not load location data from cache. Location features ('road', 'county') will not be added.")



--- Attempting to load location data from location_cache.pickle ---
Cache file location_cache.pickle not found or empty. Starting with an empty cache.
Could not load location data from cache. Location features ('road', 'county') will not be added.


In [25]:
# --- Merge Location Data (if available) ---
if not loc_df.empty:
    print("\n--- Merging location data into main DataFrame ---")
    # loc_df was built with index=df.index, so rows line up in the concatenation
    df = pd.concat([df, loc_df[['road', 'county']]], axis=1)
    print("Location data merged.")
    # print("\nDataFrame head after merging location:")
    # print(df.head())
    # print("\nDataFrame info after merging location:")
    # df.info()
else:
    print("\n--- Skipping location data merge ---")


--- Skipping location data merge ---


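## Drop Original Latitude/Longitude

Note: this cell drops the coordinates whether or not `road`/`county` columns were merged. In the run shown, no location data was available, so the model below is trained without any location information.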

In [27]:
print("\n--- Dropping Latitude and Longitude columns ---")
cols_to_drop = ['Latitude', 'Longitude']
existing_cols_to_drop = [col for col in cols_to_drop if col in df.columns]
if existing_cols_to_drop:
    df = df.drop(labels=existing_cols_to_drop, axis=1)
    print(f"Dropped columns: {existing_cols_to_drop}")
else:
    print("Latitude/Longitude columns not found or already dropped.")


--- Dropping Latitude and Longitude columns ---
Dropped columns: ['Latitude', 'Longitude']


In [28]:
#  Handle Missing Values in 'road' and 'county' (using Mode Imputation) 
# This step is crucial if geocoding failed for some entries or cache was incomplete


if 'road' in df.columns:
    if df['road'].isnull().any():
        print("\n--- Imputing missing values in 'road' column using mode ---")
        road_mode = df['road'].mode()[0]
        # Assign back instead of using inplace=True on a column, which newer
        # pandas versions warn about as chained assignment
        df['road'] = df['road'].fillna(road_mode)
        print(f"Missing 'road' values filled with: {road_mode}")
    else:
        print("\nNo missing values found in 'road' column.")
else:
    print("\n'road' column not found, skipping imputation and encoding.")



'road' column not found, skipping imputation and encoding.


In [29]:
if 'county' in df.columns:
    if df['county'].isnull().any():
        print("\n--- Imputing missing values in 'county' column using mode ---")
        county_mode = df['county'].mode()[0]
        # Assign back instead of using inplace=True on a column (chained assignment)
        df['county'] = df['county'].fillna(county_mode)
        print(f"Missing 'county' values filled with: {county_mode}")
    else:
        print("\nNo missing values found in 'county' column.")
else:
     print("\n'county' column not found, skipping imputation and encoding.")


'county' column not found, skipping imputation and encoding.
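Mode imputation replaces every missing entry with the column's most frequent value. The stdlib sketch below does the same thing to a plain list using `collections.Counter`. `impute_mode` is a hypothetical helper. Because pandas' `Series.mode()` returns ties in sorted order, `mode()[0]` picks the smallest tied value, and the sketch breaks ties the same way. The helper raises when every value is missing. That is the case where `df['road'].mode()[0]` would also fail, because `mode()` then returns an empty Series.

```python
from collections import Counter
from typing import List, Optional

def impute_mode(values: List[Optional[str]]) -> List[str]:
    """Replace None entries with the most frequent non-missing value."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        raise ValueError("cannot impute: every value is missing")
    top = max(counts.values())
    # Like pandas' mode()[0], break ties by taking the smallest (sorted-first) value
    mode = min(v for v, c in counts.items() if c == top)
    return [mode if v is None else v for v in values]

print(impute_mode(["Alameda County", None, "Marin County", "Alameda County", None]))
# ['Alameda County', 'Alameda County', 'Marin County', 'Alameda County', 'Alameda County']
```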


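## Label Encode Categorical Features ('road', 'county')

Label encoding converts a categorical variable into numbers by assigning each distinct category an integer code. The codes follow the sorted order of the category names, not any real-world ranking. scikit-learn's `LabelEncoder` learns the categories during `fit` and cannot encode categories it did not see then, which is why the fitted encoders are saved for use at prediction time.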
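Concretely, `LabelEncoder.fit` stores the sorted unique values in `classes_`, and `transform` maps each value to its index in that array. `transform` raises a `ValueError` for values it never saw. The stdlib sketch below mimics that mapping to show where the integer codes come from. `SimpleLabelEncoder` is a hypothetical illustration, not scikit-learn's implementation.

```python
from typing import Dict, List, Sequence

class SimpleLabelEncoder:
    """Minimal stand-in for LabelEncoder: sorted classes, codes = positions."""

    def fit(self, values: Sequence[str]) -> "SimpleLabelEncoder":
        self.classes_: List[str] = sorted(set(values))
        self._index: Dict[str, int] = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, values: Sequence[str]) -> List[int]:
        unseen = sorted(set(values) - set(self._index))
        if unseen:
            raise ValueError(f"previously unseen labels: {unseen}")
        return [self._index[v] for v in values]

    def fit_transform(self, values: Sequence[str]) -> List[int]:
        return self.fit(values).transform(values)

enc = SimpleLabelEncoder()
print(enc.fit_transform(["Marin County", "Alameda County", "Marin County"]))  # [1, 0, 1]
print(enc.classes_)  # ['Alameda County', 'Marin County']
```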

### Initialize encoders

In [31]:
le_road = LabelEncoder()
le_county = LabelEncoder()
ROAD_ENCODER_FILE = 'le_road_encoder.pkl'
COUNTY_ENCODER_FILE = 'le_county_encoder.pkl'

In [32]:
if 'road' in df.columns:
    print("\n--- Label Encoding 'road' column ---")
    df['road'] = le_road.fit_transform(df['road'])
    # Save the encoder
    with open(ROAD_ENCODER_FILE, 'wb') as f:
        pickle.dump(le_road, f)
    print(f"'road' column encoded. Encoder saved to {ROAD_ENCODER_FILE}")

In [33]:
if 'county' in df.columns:
    print("\n--- Label Encoding 'county' column ---")
    df['county'] = le_county.fit_transform(df['county'])
    # Save the encoder
    with open(COUNTY_ENCODER_FILE, 'wb') as f:
        pickle.dump(le_county, f)
    print(f"'county' column encoded. Encoder saved to {COUNTY_ENCODER_FILE}")

In [34]:
print("\n--- Final DataFrame before splitting ---")
print(df.head())
print("\nFinal DataFrame info:")
df.info() # Check for NaNs and data types


--- Final DataFrame before splitting ---
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Target
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556   4.526
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842   3.585
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260   3.521
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945   3.413
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467   3.422

Final DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   MedInc      20640 non-null  float64
 1   HouseAge    20640 non-null  float64
 2   AveRooms    20640 non-null  float64
 3   AveBedrms   20640 non-null  float64
 4   Population  20640 non-null  float64
 5   AveOccup    20640 non-null  float64
 6   Target      20640 non-null  float64
dtypes: float64(7)
memory usage:

## Prepare Data for Modeling

In [36]:
print("\n--- Preparing data for modeling ---")
if 'Target' not in df.columns:
     raise ValueError("Target column is missing from the DataFrame before splitting.")

y = df['Target'].values
X = df.drop('Target', axis=1)
X_cols = X.columns.tolist() # Store column names/order for prediction consistency
X = X.values # Convert to numpy array

print(f"Features (X shape): {X.shape}")
print(f"Target (y shape): {y.shape}")
print(f"Feature columns used for training: {X_cols}")


--- Preparing data for modeling ---
Features (X shape): (20640, 6)
Target (y shape): (20640,)
Feature columns used for training: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup']
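`X_cols` matters because `X` is converted to a NumPy array, so the model sees only column positions, not names. Any new input must therefore be laid out in exactly the training order. The sketch below shows that step in plain Python. `to_feature_row` is a hypothetical helper that does what `inp_df[X_cols].values` does later in the notebook.

```python
from typing import Dict, List, Sequence

def to_feature_row(record: Dict[str, float], feature_order: Sequence[str]) -> List[float]:
    """Return the record's values in training-column order, failing loudly on gaps."""
    missing = [name for name in feature_order if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Extra keys in `record` are ignored; only the trained columns are used, in order
    return [record[name] for name in feature_order]

X_cols = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup']
record = {'AveOccup': 2.555556, 'MedInc': 8.3252, 'HouseAge': 41.0,
          'Population': 322.0, 'AveRooms': 6.984127, 'AveBedrms': 1.023810}
print(to_feature_row(record, X_cols))
# [8.3252, 41.0, 6.984127, 1.02381, 322.0, 2.555556]
```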


## Train-Test Split

In [38]:
print("\n--- Splitting data into Training and Test sets ---")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")


--- Splitting data into Training and Test sets ---
X_train shape: (16512, 6)
X_test shape: (4128, 6)
y_train shape: (16512,)
y_test shape: (4128,)
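The shapes above follow directly from `test_size=0.20`. scikit-learn rounds the test size up, so the test set gets ceil(0.20 × 20,640) = 4,128 rows and the training set gets the remaining 16,512. The stdlib sketch below reproduces that arithmetic with a seeded shuffle-then-slice split. It is a simplified illustration: `simple_train_test_split` is a hypothetical helper, and its shuffle will not select the same rows as `train_test_split(..., random_state=42)`.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def simple_train_test_split(rows: Sequence[T], test_size: float = 0.20,
                            seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `rows` with a fixed seed, then slice off the test set."""
    n_test = math.ceil(test_size * len(rows))  # round the test share up
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)       # seeded: same split on every run
    return shuffled[n_test:], shuffled[:n_test]

train, test = simple_train_test_split(range(20640))
print(len(train), len(test))  # 16512 4128
```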


## Model Training (Random Forest)

In [40]:
print("\n--- Training RandomForestRegressor model ---")
model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1) # Example parameters
model.fit(X_train, y_train)
print("Model training complete.")


--- Training RandomForestRegressor model ---
Model training complete.
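A random forest regressor fits many decision trees, each on a bootstrap sample of the training rows, and predicts the average of the trees' outputs. The sketch below checks that averaging on a small synthetic dataset rather than the housing data. It compares `predict` with the mean over the fitted trees in `estimators_`. The dataset (from `make_regression`) and its sizes are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic regression problem, purely for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=25, random_state=42)
forest.fit(X_demo, y_demo)

# Average the individual trees' predictions by hand
per_tree = np.stack([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

print(np.allclose(manual_mean, forest.predict(X_demo[:5])))  # True
```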


## Model Prediction

In [42]:
print("\n--- Making predictions on the test set ---")
y_pred = model.predict(X_test) 


--- Making predictions on the test set ---


## Evaluate Model Performance (R² Score)

In [44]:
print("\n--- Evaluating model performance ---")
r2 = r2_score(y_test, y_pred)  # Coefficient of determination, not a classification accuracy
print(f"Model R2 Score on Test Set: {r2:.4f} ({r2*100:.2f}%)")


--- Evaluating model performance ---
Model R2 Score on Test Set: 0.6772 (67.72%)
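R² is not a classification-style accuracy. It compares the model's squared errors with those of a baseline that always predicts the mean of the true values: $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$, with $SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$ and $SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$. A score of 1 is a perfect fit, 0 matches the mean baseline, and negative values are worse than the baseline. Read that way, 0.6772 means the model explains about 68% of the variance in the test-set target. The stdlib sketch below computes the same quantity as `r2_score` for a single output. `r_squared` is a hypothetical helper.

```python
from typing import Sequence

def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # mean-baseline squared error
    return 1 - ss_res / ss_tot

y_true = [3.0, 2.0, 4.0, 5.0]
print(r_squared(y_true, [3.0, 2.0, 4.0, 5.0]))  # 1.0 (perfect predictions)
print(r_squared(y_true, [3.5, 3.5, 3.5, 3.5]))  # 0.0 (always predicting the mean)
```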


## Predict on Custom/New Data

In [46]:
print("\n--- Example: Predicting on new data ---")

# 1. Define new data with raw categorical values
# Use column names matching the original df before Target was dropped
raw_input_data = {
    'MedInc': 8.3252,       # Example value from the dataset
    'HouseAge': 41.0,
    'AveRooms': 6.984127,
    'AveBedrms': 1.023810,
    'Population': 322.0,
    'AveOccup': 2.555556,
    # --- Categorical features need to exist if they were used in training ---
    'county': 'Alameda County',  # Example: Must be a county seen during training
    'road': 'Tunnel Road'      # Example: Must be a road seen during training
}


--- Example: Predicting on new data ---


In [47]:
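# Remove the categorical inputs if the model was not trained on them (e.g. geocoding was skipped).
# pop(..., None) keeps this cell safe to re-run after a key has already been removed.
if 'county' not in X_cols:
    raw_input_data.pop('county', None)
if 'road' not in X_cols:
    raw_input_data.pop('road', None)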

In [48]:
# 2. Load the saved Label Encoders

encoders_loaded = True
try:
    if 'county' in raw_input_data:
        with open(COUNTY_ENCODER_FILE, 'rb') as f:
            le_county_loaded = pickle.load(f)
        print(f"Loaded County encoder from {COUNTY_ENCODER_FILE}")
    if 'road' in raw_input_data:
        with open(ROAD_ENCODER_FILE, 'rb') as f:
            le_road_loaded = pickle.load(f)
        print(f"Loaded Road encoder from {ROAD_ENCODER_FILE}")
except FileNotFoundError:
    print("Error: Could not load saved encoders. Cannot transform new categorical data.")
    encoders_loaded = False
except Exception as e:
    print(f"Error loading encoders: {e}")
    encoders_loaded = False

In [49]:
if encoders_loaded or ('county' not in raw_input_data and 'road' not in raw_input_data):
    # 3. Create a DataFrame and transform categorical features
    inp_df = pd.DataFrame([raw_input_data])
    prediction_possible = True
    inp_final = None

    try:
        # LabelEncoder.transform raises ValueError for labels unseen during training,
        # so check each label first and stop with a clear message instead
        if 'county' in inp_df.columns:
            county_label = inp_df['county'].iloc[0]
            if county_label not in le_county_loaded.classes_:
                print(f"Error: County '{county_label}' was not seen during training; it cannot be encoded.")
                prediction_possible = False
            else:
                inp_df['county'] = le_county_loaded.transform(inp_df['county'])

        if 'road' in inp_df.columns:
            road_label = inp_df['road'].iloc[0]
            if road_label not in le_road_loaded.classes_:
                print(f"Error: Road '{road_label}' was not seen during training; it cannot be encoded.")
                prediction_possible = False
            else:
                inp_df['road'] = le_road_loaded.transform(inp_df['road'])

    except ValueError as e:
        print(f"Error transforming new data: {e}. This usually means an unknown category was provided.")
        prediction_possible = False
    except NameError as e:
        # Raised if a column exists but its encoder was never loaded
        print(f"Error: Encoder not loaded for a required column. {e}")
        prediction_possible = False

    if prediction_possible:
        # 4. Ensure feature order matches the training data (X_cols)
        try:
            inp_final = inp_df[X_cols].values  # Select columns in the training order
        except KeyError as e:
            print(f"Error: Missing expected column in input data: {e}")
            prediction_possible = False

    if prediction_possible and inp_final is not None:
        # 5. Make the prediction (the target is in units of $100,000)
        new_prediction_scaled = model.predict(inp_final)
        new_prediction_dollars = new_prediction_scaled[0] * 100000  # Scale back to dollars

        print("\nInput Data (Processed):")
        print(inp_df[X_cols])  # Show the numeric data fed to the model
        print(f"\nPredicted Median House Value: ${new_prediction_dollars:,.2f}")

else:
    print("Skipping prediction on new data due to issues with loading encoders or transforming input.")



Input Data (Processed):
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup
0  8.3252      41.0  6.984127    1.02381       322.0  2.555556

Predicted Median House Value: $473,342.77


### Manual Input Prediction Section 

In [73]:
# First, check if the model and necessary variables exist 


if 'model' not in locals() or model is None:
    print("Error: Model has not been trained yet. Please run the training cells.")
elif 'X_cols' not in locals() or not X_cols:
    print("Error: Feature list 'X_cols' used for training is not defined.")
else:
    print("\n--- Enter values below to predict house price ---")
    # # Inform the user which features are needed
    # print(f"(Model expects the following {len(X_cols)} features: {X_cols})")

    input_features = {}
    prediction_possible = True

    # Iterate over the columns the model was ACTUALLY trained on
    for feature in X_cols:
        while True:
            try:
                # Get input for the specific feature
                val_str = input(f"Enter value for '{feature}': ")
                # Convert to float - handle potential errors
                val = float(val_str)
                input_features[feature] = val
                break # Exit inner loop (while True) on successful input
            except ValueError:
                print(f"  Invalid input. '{feature}' requires a numeric value.")
            except EOFError:
                 print("\nInput interrupted. Cannot predict.")
                 prediction_possible = False
                 break # Exit inner loop (while True)
            except Exception as e:
                print(f"  An unexpected error occurred during input: {e}")
                # Decide if you want to retry or abort
                retry = input("  Retry input for this feature? (y/n): ").lower()
                if retry != 'y':
                    prediction_possible = False
                    break # Exit inner loop (while True)
        if not prediction_possible:
             break # Exit outer loop (for feature in X_cols)

    # --- Proceed only if all inputs were collected successfully ---
    if prediction_possible and len(input_features) == len(X_cols):
        # Convert collected features to a DataFrame
        # The keys in input_features automatically become column names
        user_input_df = pd.DataFrame([input_features])

        # --- Reorder columns to match the exact training order (CRITICAL) ---
        # Even if the dictionary likely preserves order, explicit reordering is safer.
        try:
            user_input_df = user_input_df[X_cols]
        except KeyError as e:
             print(f"\nError: Mismatch between input features and expected columns: {e}")
             print("Cannot proceed with prediction.")
             prediction_possible = False # Redundant but clear

        if prediction_possible:
            try:
                # Predict using the model
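                # user_input_df holds the training features in the training order. Passing
                # .values matches how the model was fitted (a plain array without column
                # names) and avoids scikit-learn's feature-name warning
                predicted_price_scaled = model.predict(user_input_df.values)[0]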

                # Scale the prediction back to dollars (assuming target was in $100,000s)
                predicted_price_dollars = predicted_price_scaled * 100000

                print("-" * 30) # Separator
                print(f"\n Predicted Median House Value: ${predicted_price_dollars:,.2f} ")
                print("-" * 30) # Separator

            except Exception as e:
                print(f"\nError during model prediction: {e}")

    elif prediction_possible:
        # Defensive check: reached only if the loop ended without collecting every feature
        print("\nError: Did not collect the correct number of features. Cannot predict.")
    # else: (prediction_possible is False - error message already printed during input)


--- Enter values below to predict house price ---


Enter value for 'MedInc':  


  Invalid input. 'MedInc' requires a numeric value.


Enter value for 'MedInc':  


  Invalid input. 'MedInc' requires a numeric value.


KeyboardInterrupt: Interrupted by user
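Because `input()` blocks, the interactive cell is awkward to test or re-run, as the interrupted run above shows. One option is to keep the parsing and ordering logic in a plain function and feed it strings from any source, such as a form, a file, or a test. `parse_feature_inputs` below is a hypothetical helper. The feature list reuses the six columns the model was trained on in this run.

```python
from typing import Dict, List, Sequence

def parse_feature_inputs(raw: Dict[str, str], feature_order: Sequence[str]) -> List[float]:
    """Convert user-entered strings to floats, in training-column order."""
    values: List[float] = []
    errors: List[str] = []
    for name in feature_order:
        text = raw.get(name, "").strip()
        try:
            values.append(float(text))
        except ValueError:
            errors.append(f"'{name}' requires a numeric value (got {text!r})")
    if errors:
        raise ValueError("; ".join(errors))  # report every bad field at once
    return values

X_cols = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup']
raw = {'MedInc': '8.3252', 'HouseAge': '41', 'AveRooms': '6.984127',
       'AveBedrms': '1.02381', 'Population': '322', 'AveOccup': '2.555556'}
print(parse_feature_inputs(raw, X_cols))
# [8.3252, 41.0, 6.984127, 1.02381, 322.0, 2.555556]
```

The returned list, wrapped as a single row (`[values]`), has the shape `model.predict` expects.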