## Filling Missing Coordinates in Rental Listings Dataset

This notebook addresses the missing geographic coordinates (latitude and longitude) in the rental listings dataset. These coordinates are essential for spatial analysis, mapping, and understanding the geographic distribution of rental properties.




In [1]:
import pandas as pd
import geopandas as gpd
from utilities import find_location_id, get_coordinates

In [2]:
rent_data = pd.read_parquet("/root/project-2-group-real-estate-industry-project-34/data/landing/feature_selected_rental_listings.parquet")
sf = gpd.read_file("/root/MAST30034_Python/data/vic_zones/SA2_2021_AUST_GDA2020.shp")


In [3]:
rent_data.count()

address               196908
state                 196908
suburb                196908
bedrooms              196908
bathrooms             196908
propertyTypes         196898
carspaces             196908
date_listed           196908
latitude              191569
longitude             191569
is_new_development    196908
price                 196908
propertyId            131200
is_furnished          196908
year                  196908
month                 196908
day                   196908
dtype: int64
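
`count()` reports non-null values per column, so the gaps have to be read off by comparing each figure against the full row count; here latitude, longitude, propertyTypes and propertyId all fall short of it. A minimal sketch of a helper that reports the missing values directly is shown below. The name `missing_summary` and the toy frame are illustrative only and are not part of the original notebook.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns with gaps."""
    total = len(df)
    missing = df.isna().sum()
    summary = pd.DataFrame(
        {
            "missing": missing,
            "pct_missing": (missing / total * 100).round(2),
        }
    )
    # Keep only columns that actually have gaps, largest gap first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data standing in for rent_data (hypothetical values)
toy = pd.DataFrame(
    {
        "address": ["1 A St", "2 B St", "3 C St", "4 D St"],
        "latitude": [-37.8, None, None, -37.9],
        "propertyId": [None, None, None, 42.0],
    }
)
summary = missing_summary(toy)
```

On the real data, `missing_summary(rent_data)` would list the same four columns that the `count()` output shows as incomplete.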

Filling Missing Coordinates: We loop over each listing that is missing latitude and longitude (5,339 of the 196,908 rows, per the counts above). For each one, we call the get_coordinates() function from the utilities module, which likely geocodes the listing's address through the Google Maps API.

In [4]:
# Fill missing latitude and longitude values by geocoding each listing's address
missing_coords = rent_data["latitude"].isnull() | rent_data["longitude"].isnull()
for index, row in rent_data[missing_coords].iterrows():
    address = row["address"]
    if pd.notna(address) and address:  # Skip null or empty addresses
        lat, lon = get_coordinates(address)  # Geocode the address
        rent_data.at[index, "latitude"] = lat
        rent_data.at[index, "longitude"] = lon
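
The loop above makes one geocoding call per listing. If the same address appears in several listings, one possible refinement is to geocode each distinct address once and reuse the result. The sketch below is an illustration under stated assumptions, not the notebook's method: `fill_missing_coordinates` is a hypothetical helper, and `geocode` stands for any callable that takes an address and returns a `(lat, lon)` tuple, which is how `get_coordinates()` is used above. A stub geocoder is used so the example runs without network access.

```python
import pandas as pd


def fill_missing_coordinates(df: pd.DataFrame, geocode) -> pd.DataFrame:
    """Return a copy of df with missing latitude/longitude filled via geocode(address)."""
    out = df.copy()
    missing = out["latitude"].isna() | out["longitude"].isna()

    # Geocode each distinct, non-empty address once
    cache = {}
    for address in out.loc[missing, "address"].dropna().unique():
        if not address.strip():
            continue
        try:
            cache[address] = geocode(address)
        except Exception:
            cache[address] = None  # Leave the gap if the lookup fails

    # Write cached results back to every row that needs them
    for index, address in out.loc[missing, "address"].items():
        coords = cache.get(address)
        if coords is not None and None not in coords:
            out.at[index, "latitude"], out.at[index, "longitude"] = coords
    return out


# Stub geocoder (hypothetical) that records which addresses were looked up
calls = []


def fake_geocode(address):
    calls.append(address)
    return (-37.8, 144.9)


toy = pd.DataFrame(
    {
        "address": ["1 A St", "2 B St", "2 B St", ""],
        "latitude": [-37.0, None, None, None],
        "longitude": [145.0, None, None, None],
    }
)
filled = fill_missing_coordinates(toy, fake_geocode)
```

In the notebook this would be called as `fill_missing_coordinates(rent_data, get_coordinates)`, assuming `get_coordinates()` returns a latitude/longitude pair or raises on failure. Rows whose address is empty or cannot be geocoded keep their missing values.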


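Mapping to SA2 Regions: Once the coordinates are filled, we map each property to its corresponding SA2 (Statistical Area Level 2) region by passing its latitude and longitude, together with the SA2 shapefile loaded earlier, to the find_location_id() function. The result is stored in a new SA2_CODE21 column.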

In [5]:
# Apply the 'find_location_id' function to each row in 'rent_data'
rent_data['SA2_CODE21'] = rent_data.apply(
    lambda row: find_location_id(row['latitude'], row['longitude'], sf), axis=1
)
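
Applying a function row by row performs one lookup per listing. A common vectorized alternative for this kind of point-in-polygon assignment is a spatial join. The sketch below shows that technique; it is not a description of what `find_location_id()` does internally. It assumes the listing coordinates are WGS84 (EPSG:4326), that the zones frame has an `SA2_CODE21` column, and a geopandas version that supports the `predicate` argument of `sjoin` (0.10 or later). The helper name `assign_sa2` and the toy polygons are hypothetical.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def assign_sa2(listings: pd.DataFrame, zones: gpd.GeoDataFrame,
               code_col: str = "SA2_CODE21") -> pd.DataFrame:
    """Return a copy of listings with the code of the zone containing each point."""
    # Rows without coordinates cannot be placed in a zone
    valid = listings.dropna(subset=["latitude", "longitude"])
    points = gpd.GeoDataFrame(
        valid[["latitude", "longitude"]].copy(),
        geometry=gpd.points_from_xy(valid["longitude"], valid["latitude"]),
        crs="EPSG:4326",  # Assumption: coordinates are WGS84
    ).to_crs(zones.crs)

    joined = gpd.sjoin(
        points, zones[[code_col, "geometry"]], how="left", predicate="within"
    )
    # Keep one match per listing in case polygons overlap
    codes = joined[~joined.index.duplicated(keep="first")][code_col]

    out = listings.copy()
    out[code_col] = codes.reindex(out.index)
    return out


# Toy zones: two adjacent unit squares (hypothetical codes)
zones = gpd.GeoDataFrame(
    {"SA2_CODE21": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
listings = pd.DataFrame(
    {"latitude": [0.5, 0.5, 5.0, None], "longitude": [0.5, 1.5, 5.0, None]}
)
result = assign_sa2(listings, zones)
```

Listings that fall outside every polygon, or that still lack coordinates, end up with a missing `SA2_CODE21`, which makes them easy to count before saving.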

In [6]:
# Save the cleaned DataFrame to a Parquet file
rent_data.to_parquet("preprocessed_rent_data.parquet", index=False)