<a href="https://colab.research.google.com/github/aish466-p/AIML-2025/blob/main/2303A51466_batch_7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Prediction of real estate valuation based on geo-referencing

1. Identify the top 5 house age values with the highest total estate value.
2. Do MRT stations or convenience stores contribute to real estate value?
3. Identify the maximum and minimum house price per unit area.
4. Identify the date on which the most real estate transactions happened.
5. Identify the closest MRT distance and the highest convenience store count among high-value properties.

In [14]:

import pandas as pd

# Load the dataset from the Colab working directory.
file_path = '/content/Real estate valuation data set.csv'
df = pd.read_csv(file_path)

# Show the original column names.
print(df.columns)

# Rename the columns used in the analysis to shorter identifiers.
df.rename(columns={
    'X1 transaction date': 'transaction_date',
    'X2 house age': 'house_age',
    'X3 distance to the nearest MRT station': 'distance_to_MRT',
    'X4 number of convenience stores': 'convenience_stores',
    'Y house price of unit area': 'price_per_unit_area'
}, inplace=True)

# Confirm the rename.
print(df.columns)


Index(['No', 'X1 transaction date', 'X2 house age',
       'X3 distance to the nearest MRT station',
       'X4 number of convenience stores', 'X5 latitude', 'X6 longitude',
       'Y house price of unit area'],
      dtype='object')
Index(['No', 'transaction_date', 'house_age', 'distance_to_MRT',
       'convenience_stores', 'X5 latitude', 'X6 longitude',
       'price_per_unit_area'],
      dtype='object')


In [15]:
if 'house_age' in df.columns and 'price_per_unit_area' in df.columns:
    # Total price per unit area for each distinct house age (a sum, so ages
    # that appear in many rows accumulate larger totals).
    house_age_value = df.groupby('house_age')['price_per_unit_area'].sum()
    top_5_house_ages = house_age_value.sort_values(ascending=False).head(5)
    print("Top 5 house age values with the most estate value:")
    print(top_5_house_ages)
else:
    print("Columns 'house_age' or 'price_per_unit_area' not found in the dataset.")


Top 5 house age values with the most estate value:
house_age
0.0     920.3
13.3    256.7
16.4    255.6
1.1     248.9
13.6    227.4
Name: price_per_unit_area, dtype: float64
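
The ranking above uses a sum, so a house age that appears in many rows (0.0 here) can lead simply because it has more transactions. A minimal sketch of how sum, mean and row count could be compared side by side, assuming a mean-based ranking is also of interest; the `sample` frame and its values are hypothetical and only stand in for `df`:

```python
import pandas as pd

# Hypothetical stand-in for df; values are illustrative, not from the dataset.
sample = pd.DataFrame({
    "house_age": [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 12.0],
    "price_per_unit_area": [40.0, 45.0, 50.0, 30.0, 70.0, 80.0, 60.0],
})

# Aggregate sum, mean and row count per house age in one pass.
by_age = sample.groupby("house_age")["price_per_unit_area"].agg(["sum", "mean", "count"])

# Ranking by sum favours ages with many rows; ranking by mean does not.
top_by_sum = by_age["sum"].idxmax()
top_by_mean = by_age["mean"].idxmax()

print(by_age.sort_values("mean", ascending=False))
print(f"Top by sum: {top_by_sum}, top by mean: {top_by_mean}")
```

In this toy data the two rankings disagree, which is the situation worth checking for in the real dataset before interpreting the totals.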


In [16]:
if 'distance_to_MRT' in df.columns and 'convenience_stores' in df.columns and 'price_per_unit_area' in df.columns:
    # Pearson correlation of each feature with the price per unit area.
    mrt_correlation = df['distance_to_MRT'].corr(df['price_per_unit_area'])
    stores_correlation = df['convenience_stores'].corr(df['price_per_unit_area'])

    print(f"Correlation between MRT distance and price: {mrt_correlation:.2f}")
    print(f"Correlation between stores and price: {stores_correlation:.2f}")

    # Correlation indicates association, not causation, so the messages are worded accordingly.
    if mrt_correlation < 0:
        print("Properties closer to an MRT station tend to have higher estate value.")
    if stores_correlation > 0:
        print("Properties with more convenience stores tend to have higher estate value.")
else:
    print("Columns 'distance_to_MRT', 'convenience_stores', or 'price_per_unit_area' not found.")


Correlation between MRT distance and price: -0.67
Correlation between stores and price: 0.57
Properties closer to an MRT station tend to have higher estate value.
Properties with more convenience stores tend to have higher estate value.
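
`Series.corr` defaults to the Pearson coefficient, which measures linear association only. A rough sketch of a rank-based (Spearman) check, computed here as the Pearson correlation of the ranks; the `sample` values are hypothetical and chosen to be monotonic but not linear:

```python
import pandas as pd

# Hypothetical stand-in for df; values are illustrative, not from the dataset.
sample = pd.DataFrame({
    "distance_to_MRT": [50.0, 100.0, 400.0, 1500.0, 4000.0],
    "price_per_unit_area": [60.0, 55.0, 40.0, 30.0, 15.0],
})

# Pearson: strength of the linear relationship.
pearson = sample["distance_to_MRT"].corr(sample["price_per_unit_area"])

# Spearman: Pearson correlation applied to the ranks of each column,
# which captures any monotonic relationship.
ranks = sample.rank()
spearman = ranks["distance_to_MRT"].corr(ranks["price_per_unit_area"])

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

If the two coefficients differ noticeably on the real data, the relationship is probably monotonic but not linear, and a transformed distance may describe it better.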


In [17]:
if 'price_per_unit_area' in df.columns:
    max_price = df['price_per_unit_area'].max()
    min_price = df['price_per_unit_area'].min()

    max_price_row = df[df['price_per_unit_area'] == max_price]
    min_price_row = df[df['price_per_unit_area'] == min_price]

    print("Maximum house price per unit area:")
    print(max_price_row)

    print("\nMinimum house price per unit area:")
    print(min_price_row)
else:
    print("Column 'price_per_unit_area' not found.")


Maximum house price per unit area:
      No  transaction_date  house_age  distance_to_MRT  convenience_stores  \
270  271          2013.333       10.8         252.5822                   1   

     X5 latitude  X6 longitude  price_per_unit_area  
270      24.9746     121.53046                117.5  

Minimum house price per unit area:
      No  transaction_date  house_age  distance_to_MRT  convenience_stores  \
113  114          2013.333       14.8         393.2606                   6   

     X5 latitude  X6 longitude  price_per_unit_area  
113     24.96172     121.53812                  7.6  
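
The boolean mask used above returns every row tied at the extreme value. A short sketch contrasting it with `idxmax`/`idxmin`, which return only the first matching index label; the `sample` frame is hypothetical and deliberately contains a tie:

```python
import pandas as pd

# Hypothetical stand-in for df with a tie at the maximum price.
sample = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "price_per_unit_area": [37.9, 117.5, 7.6, 117.5],
})

price = sample["price_per_unit_area"]

# Boolean mask: keeps all rows that share the maximum.
all_max_rows = sample[price == price.max()]

# idxmax / idxmin: index label of the first extreme value only.
first_max_row = sample.loc[price.idxmax()]
min_row = sample.loc[price.idxmin()]

print(all_max_rows)
print(first_max_row)
print(min_row)
```

The mask is the safer choice when ties matter; `idxmax` is convenient when a single representative row is enough.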


In [18]:
if 'transaction_date' in df.columns:
    transaction_counts = df['transaction_date'].value_counts()
    most_transactions_date = transaction_counts.idxmax()
    most_transactions_count = transaction_counts.max()

    print(f"Date with most transactions: {most_transactions_date} ({most_transactions_count} transactions)")
else:
    print("Column 'transaction_date' not found.")


Date with most transactions: 2013.417 (58 transactions)
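
The transaction date is stored as a fractional year, so 2013.417 is not directly readable as a calendar date. A minimal sketch of one possible conversion, assuming the fraction encodes the month as month/12 (for example, 2013.250 would mean March 2013); this convention is an assumption and is not confirmed by the notebook, and `fractional_year_to_month` is a hypothetical helper:

```python
def fractional_year_to_month(value):
    """Convert a fractional year to (year, month).

    Assumes the fractional part is month / 12, so a fraction of 0
    is read as December of the previous year. This is an assumption.
    """
    year = int(value)
    month = round((value - year) * 12)
    if month == 0:
        # Under the assumed convention, x.000 is December of year x - 1.
        return year - 1, 12
    return year, month


year, month = fractional_year_to_month(2013.417)
print(f"{year}-{month:02d}")
```

Under that assumption the busiest value, 2013.417, would correspond to May 2013.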


In [19]:
if 'price_per_unit_area' in df.columns and 'distance_to_MRT' in df.columns and 'convenience_stores' in df.columns:
    threshold = df['price_per_unit_area'].quantile(0.90)  # Top 10% high-value properties
    high_value_properties = df[df['price_per_unit_area'] >= threshold]

    # Smallest MRT distance and largest store count within the high-value subset.
    # convenience_stores is a count, not a distance, so the maximum is taken.
    closest_MRT_distance = high_value_properties['distance_to_MRT'].min()
    max_store_count = high_value_properties['convenience_stores'].max()

    print(f"Closest MRT distance for high-value properties: {closest_MRT_distance}")
    print(f"Highest convenience store count for high-value properties: {max_store_count}")

    # Rows that match either condition (logical OR).
    closest_rows = high_value_properties[
        (high_value_properties['distance_to_MRT'] == closest_MRT_distance) |
        (high_value_properties['convenience_stores'] == max_store_count)
    ]
    print("\nHigh-value properties with the closest MRT distance or the highest store count:")
    print(closest_rows)
else:
    print("Columns 'price_per_unit_area', 'distance_to_MRT', or 'convenience_stores' not found.")


Closest MRT distance for high-value properties: 49.66105
Highest convenience store count for high-value properties: 10

High-value properties with the closest MRT distance or the highest store count:
      No  transaction_date  house_age  distance_to_MRT  convenience_stores  \
160  161          2012.917        3.5         49.66105                   8   
236  237          2013.167        3.6        373.83890                  10   
377  378          2013.333        3.9         49.66105                   8   

     X5 latitude  X6 longitude  price_per_unit_area  
160     24.95836     121.53756                 57.8  
236     24.98322     121.53765                 61.9  
377     24.95836     121.53756                 56.8  
