# Real Estate Amenities Distance Calculation

This notebook calculates the Euclidean distance from each real estate property to the nearest metro station, hospital, and school. The distances are then added as new features to the real estate dataset, which is saved as a CSV file.

## 1. Setting Up the Environment and Loading Data

First, we import the necessary libraries. If you don't have them installed, you can install them using `pip`:
`pip install pandas numpy`

Then, we load the real estate data and the separate datasets for metro stations, hospitals, and schools. These CSV files must be located in a 'CSV data' subfolder relative to the notebook; otherwise, update the paths accordingly. All files are read with `latin1` encoding.

In [ ]:
import pandas as pd
import numpy as np

# Load datasets with latin1 encoding
# Ensure 'CSV data' folder exists and contains the specified CSV files
realestate_df = pd.read_csv('CSV data/Delhi_RealEstate.csv', encoding='latin1')
metro_df = pd.read_csv('CSV data/MetroStations_Data.csv', encoding='latin1')
hospital_df = pd.read_csv('CSV data/Hospitals_Data.csv', encoding='latin1')
school_df = pd.read_csv('CSV data/Schools_Data.csv', encoding='latin1')
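
Every later step reads the `X` and `Y` columns from each DataFrame, so it can help to confirm they exist right after loading. The following is a minimal sketch; the helper name `missing_xy_columns` and the toy values are hypothetical and are not part of the original notebook.

```python
import pandas as pd

def missing_xy_columns(df, required=("X", "Y")):
    """Return the required coordinate columns that are absent from df."""
    return [col for col in required if col not in df.columns]

# Toy frame standing in for one of the loaded datasets (values are made up)
sample_df = pd.DataFrame({"X": [77.21, 77.10], "Y": [28.61, 28.70]})
print(missing_xy_columns(sample_df))  # → []
```

In the notebook, the same call could be applied to `realestate_df`, `metro_df`, `hospital_df`, and `school_df`; a non-empty result means the distance cells would raise a `KeyError`.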

## 2. Defining Distance Calculation Functions

This section defines two helper functions:
- `euclidean_distance`: Calculates the standard Euclidean distance between two points given their (X, Y) coordinates.
- `get_nearest_distance`: Applies the Euclidean distance calculation to find the minimum distance from a given point to any point in a target DataFrame (e.g., nearest metro station).

In [ ]:
def euclidean_distance(x1, y1, x2, y2):
    # Calculates the Euclidean distance between two points
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

def get_nearest_distance(x, y, locations_df):
    # Finds the minimum Euclidean distance from (x, y) to any point in locations_df
    return locations_df.apply(lambda row: euclidean_distance(x, y, row['X'], row['Y']), axis=1).min()
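
Row-wise `apply` calls a Python function once per amenity, which can be slow on large files. As a sketch of one possible alternative, the function below computes the same minimum with NumPy array arithmetic. The name `get_nearest_distance_vectorized` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def get_nearest_distance_vectorized(x, y, locations_df):
    # Coordinate differences from (x, y) to every row of locations_df at once
    dx = locations_df["X"].to_numpy() - x
    dy = locations_df["Y"].to_numpy() - y
    # Euclidean distance to each location, then the smallest one
    return float(np.sqrt(dx**2 + dy**2).min())

# Toy locations: one at the origin, one at (3, 4)
toy_locations = pd.DataFrame({"X": [0.0, 3.0], "Y": [0.0, 4.0]})
print(get_nearest_distance_vectorized(3.0, 0.0, toy_locations))  # → 3.0
```

It takes the same arguments as `get_nearest_distance`, so it could be swapped in without changing the calling code.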

## 3. Calculating Distances to Amenities

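We apply the `get_nearest_distance` function to each real estate property to calculate its distance to the nearest metro station, hospital, and school. Each property is compared against every amenity of a given type, so the running time grows with the product of the two table sizes. The results are stored in new columns in `realestate_df`.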

In [ ]:
realestate_df['distance_nearest_metro'] = realestate_df.apply(
    lambda row: get_nearest_distance(row['X'], row['Y'], metro_df), axis=1
)

realestate_df['distance_nearest_hospital'] = realestate_df.apply(
    lambda row: get_nearest_distance(row['X'], row['Y'], hospital_df), axis=1
)

realestate_df['distance_nearest_school'] = realestate_df.apply(
    lambda row: get_nearest_distance(row['X'], row['Y'], school_df), axis=1
)
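
If the nested row-wise loops above are too slow, the nearest distance for all properties can be computed in one step with NumPy broadcasting. This is a sketch under the assumption that the full properties-by-amenities distance matrix fits in memory; `nearest_distances` and the toy frames are hypothetical names and data.

```python
import numpy as np
import pandas as pd

def nearest_distances(properties_df, locations_df):
    # Column vectors for properties, row vectors for locations, so that
    # subtraction broadcasts to an (n_properties, n_locations) matrix
    px = properties_df["X"].to_numpy()[:, None]
    py = properties_df["Y"].to_numpy()[:, None]
    lx = locations_df["X"].to_numpy()[None, :]
    ly = locations_df["Y"].to_numpy()[None, :]
    # Euclidean distance for every pair, then the minimum per property
    return np.hypot(px - lx, py - ly).min(axis=1)

# Toy data: two properties and two amenities (values are made up)
toy_properties = pd.DataFrame({"X": [0.0, 3.0], "Y": [0.0, 0.0]})
toy_amenities = pd.DataFrame({"X": [0.0, 3.0], "Y": [1.0, 4.0]})
print(nearest_distances(toy_properties, toy_amenities))
```

The returned array has one value per property, in row order, so it could be assigned directly to a column such as `realestate_df['distance_nearest_metro']`.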

## 4. Converting Distances to Kilometers and Final Save

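The calculated distances are in the same units as the X, Y coordinates; if those are longitude/latitude, the distances are in degrees. They are converted to kilometers using an approximate factor of 111 km per degree. This factor holds for latitude, but a degree of longitude covers less ground away from the equator, so east-west distances are somewhat overestimated. The intermediate distance columns are then dropped, and the final enriched DataFrame is saved to a CSV file.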

In [ ]:
conversion_factor = 111  # Approximate km per degree of latitude; overestimates east-west distances
realestate_df['distance_nearest_metro_km'] = realestate_df['distance_nearest_metro'] * conversion_factor
realestate_df['distance_nearest_hospital_km'] = realestate_df['distance_nearest_hospital'] * conversion_factor
realestate_df['distance_nearest_school_km'] = realestate_df['distance_nearest_school'] * conversion_factor

# Drop the original degree-based distance columns
realestate_df = realestate_df.drop(['distance_nearest_metro', 'distance_nearest_hospital', 'distance_nearest_school'], axis=1)

# Save the enriched DataFrame to a CSV file
output_path = 'CSV data/delhi_realestate_with_distances.csv'
realestate_df.to_csv(output_path, index=False, encoding='latin1')

print(f"CSV file saved as '{output_path}'")
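
The single 111 km-per-degree factor treats a degree of longitude and a degree of latitude as equal lengths. Where more accurate distances are needed, the haversine formula is a common alternative. The sketch below assumes that `X` holds longitude and `Y` holds latitude in decimal degrees, which the dataset does not confirm; `haversine_km` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    # Great-circle distance in km between two points given in decimal degrees
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude at latitude 28.6 vs. one degree of latitude
east_west = haversine_km(77.0, 28.6, 78.0, 28.6)
north_south = haversine_km(77.0, 28.0, 77.0, 29.0)
print(round(east_west, 1), round(north_south, 1))
```

At this latitude the north-south degree comes out close to 111 km, while the east-west degree is noticeably shorter, which is the gap the flat conversion factor ignores.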