# Data Collection & Cleaning

## Objective

This step gathers the relevant real estate transaction and rent contract data from sources such as the Dubai Land Department (DLD) and Dubai Pulse, then cleans and prepares the datasets for analysis and modeling. Data cleaning is critical because inconsistencies and errors in the raw data would reduce the accuracy of our models and of the insights provided to investors.

**Load the datasets**

- Transactions dataset

- Rent Contracts dataset

- Projects dataset

- Developers dataset

In [1]:
# Importing libraries
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

In [2]:
# Loading the datasets
transactions = pd.read_csv('../data/raw/transactions.csv')
rent_contracts = pd.read_csv('../data/raw/rent_contracts.csv')
projects = pd.read_csv('../data/raw/projects.csv')
developers = pd.read_csv('../data/raw/developers.csv')

In [3]:
# Setting pandas options to display all columns and rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [4]:
# Displaying the first few rows of the datasets
print("Transactions dataset:")
display(transactions.head())

print("\nRent contracts dataset:")
display(rent_contracts.head())

print("\nProjects dataset:")
display(projects.head())

Transactions dataset:


Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_ar,trans_group_en,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,property_usage_ar,property_usage_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
0,1-11-2024-10138,11,1,مبايعات,Sales,بيع,Sell,19-03-2024,4,فيلا,Villa,,,,سكني,Residential,1,العقارات القائمة,Existing Properties,278,منخول,Mankhool,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو بنك أبوظبي التجاري,ADCB Metro Station,مول دبي,Dubai Mall,,,0,1305.29,5769000.0,4419.71,,,8.0,1.0,0.0
1,3-9-2002-39,9,3,هبات,Gifts,هبه,Grant,25-03-2002,1,أرض,Land,,,,سكني,Residential,1,العقارات القائمة,Existing Properties,365,الحضيبه,Al Hudaiba,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو بنك أبوظبي التجاري,ADCB Metro Station,مول دبي,Dubai Mall,,,0,1466.94,1105300.0,753.47,,,1.0,1.0,0.0
2,1-11-2016-12930,11,1,مبايعات,Sales,بيع,Sell,02-11-2016,4,فيلا,Villa,,,,سكني,Residential,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو أبراج الإمارات,Emirates Towers Metro Station,مول دبي,Dubai Mall,,,0,390.0,2089900.0,5358.72,,,1.0,1.0,0.0
3,1-11-2005-300028,11,1,مبايعات,Sales,بيع,Sell,28-02-2005,4,فيلا,Villa,,,,سكني,Residential,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,,,0,396.09,511612.0,1291.66,,,1.0,1.0,0.0
4,1-11-2010-17709,11,1,مبايعات,Sales,بيع,Sell,09-12-2010,2,مبنى,Building,,,,سكني / تجاري,Residential / Commercial,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,,,0,559.18,5700000.0,10193.5,,,1.0,1.0,0.0



Rent contracts dataset:


Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_ar,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_ar,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_type_ar,ejari_property_sub_type_id,ejari_property_sub_type_en,ejari_property_sub_type_ar,property_usage_en,property_usage_ar,project_number,project_name_ar,project_name_en,master_project_ar,master_project_en,area_id,area_name_ar,area_name_en,actual_area,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,tenant_type_id,tenant_type_ar,tenant_type_en
0,CRT1012981266,1,جديد,New,07-04-2019,06-04-2020,85000,85000,1,1,1.0,2.0,وحدة,Unit,2.0,Office,مكتب,422.0,Office,مكتب,Commercial,تجاري,467.0,إمباير هايتس,EMPIRE HEIGHTS,الخليج التجاري,Business Bay,526.0,الخليج التجارى,Business Bay,140.0,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person
1,CRT1012983196,1,جديد,New,20-04-2019,19-04-2020,110000,110000,1,1,1.0,4.0,فيلا,Villa,841.0,Villa,فيلا,2.0,2 bed rooms+hall,غرفتين و صالة,Residential,سكني,,,,قرية جميرا المثلثة,Jumeirah Village Triangle,442.0,البرشاء جنوب الخامسة,Al Barsha South Fifth,734.0,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,1.0,شخص,Person
2,CRT1012984226,1,جديد,New,11-04-2019,10-04-2020,100000,100000,1,1,1.0,4.0,فيلا,Villa,841.0,Villa,فيلا,3.0,3 bed rooms+hall,ثلاثة غرفة و صالة,Residential,سكني,1488.0,ريم - ميرا أوسيس كوميونتي,REEM - MIRA OASIS COMMUNITY,,,506.0,اليلايس 1,Al Yelayiss 1,324.0,دورة دبي للدراجات,Dubai Cycling Course,,,,,1.0,شخص,Person
3,CRT1012984996,2,تجديد,Renew,18-03-2019,17-03-2020,150000,150000,1,1,1.0,4.0,فيلا,Villa,841.0,Villa,فيلا,3.0,3 bed rooms+hall,ثلاثة غرفة و صالة,Residential,سكني,1377.0,أرابيان رانشز - مجمع بالما,ARABIAN RANCHES - PALMA COMMUNITY,المرابع العربية 2 - بالما,Arabian Ranches II - PALMA,463.0,وادي الصفا 7,Wadi Al Safa 7,405.0,موتور سيتي,Motor City,,,,,1.0,شخص,Person
4,CRT1012986616,1,جديد,New,15-04-2019,14-04-2020,95000,95000,1,1,1.0,2.0,وحدة,Unit,842.0,Flat,شقه,1.0,1bed room+Hall,غرفة و صالة,Residential,سكني,,,,جميرا بيتش ريزيدنس - الجيه بي آر,Jumeriah Beach Residence - JBR,330.0,مرسى دبي,Marsa Dubai,103.0,برج العرب,Burj Al Arab,مساكن شاطئ جميرا,Jumeirah Beach Residency,مارينا مول,Marina Mall,,,



Projects dataset:


Unnamed: 0,project_id,project_number,project_name,developer_id,developer_number,developer_name,master_developer_id,master_developer_number,master_developer_name,project_start_date,project_end_date,project_type_id,project_type_ar,project_classification_id,project_classification_ar,escrow_agent_id,escrow_agent_name,project_status,project_status_ar,percent_completed,completion_date,cancellation_date,project_description_ar,project_description_en,property_id,area_id,area_name_ar,area_name_en,master_project_ar,master_project_en,zoning_authority_id,zoning_authority_ar,zoning_authority_en,no_of_lands,no_of_buildings,no_of_villas,no_of_units
0,139,139,دايموند,48.0,48.0,مدينة دبي الرياضية (ش. ذ. م. م),200,200,مدينة دبي الرياضية (ش. ذ. م. م),01-03-2008,15-11-2009,1,عادي,1,مباني,33.0,بنك المشرق (شركة مساهمة عامة),FINISHED,منجز,100.0,15-11-2009,,مشروع بناية سكنية للتملك الحر تقع في منطقة مدي...,Freehold residential building located in Dubai...,1100101900,435.0,الحبيه الرابعة,Al Hebiah Fourth,مدينة دبي الرياضية,Dubai Sports City,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,128
1,146,146,نيلوفر,54.0,54.0,دبي للعقارات (ش.ذ.م.م),850,850,دبي للعقارات (ش.ذ.م.م),19-02-2008,23-05-2017,1,عادي,1,مباني,1.0,بنك دبي الاسلامي (شركة مساهمة عامة),FINISHED,منجز,100.0,23-05-2017,,مشروع مبنى سكني للتملك الحر يقع في منطقة الجدا...,Freehold residential building development loca...,1100129352,334.0,الجداف,Al Jadaf,قرية الثقافة,Culture Village,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,94
2,159,159,برج برايم,65.0,65.0,الخليج التجاري (ش.ذ.م.م),898,898,الخليج التجاري (ش.ذ.م.م),01-12-2007,18-11-2014,1,عادي,1,مباني,1.0,بنك دبي الاسلامي (شركة مساهمة عامة),FINISHED,منجز,100.0,18-11-2014,,مشروع برج مكاتب للتملك الحر يقع في منطقة الخلي...,Freehold commercial tower development in Dubai...,1100169566,526.0,الخليج التجارى,Business Bay,الخليج التجاري,Business Bay,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,206
3,41,41,51 @ بزنس باي,15.0,15.0,دبي للعقارات (ش.ذ.م.م),850,850,دبي للعقارات (ش.ذ.م.م),12-02-2008,09-02-2012,1,عادي,1,مباني,1.0,بنك دبي الاسلامي (شركة مساهمة عامة),FINISHED,منجز,100.0,09-02-2012,,مشروع برج تجاري للتملك الحر يقع في منطقة الخلي...,Freehold commercial development in Business Ba...,1100166188,526.0,الخليج التجارى,Business Bay,الخليج التجاري,Business Bay,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,176
4,46,46,برج برلنجتون,15.0,15.0,الخليج التجاري (ش.ذ.م.م),898,898,الخليج التجاري (ش.ذ.م.م),30-04-2007,30-12-2012,1,عادي,1,مباني,1.0,بنك دبي الاسلامي (شركة مساهمة عامة),FINISHED,منجز,100.0,19-09-2013,,مشروع برج تجاري للتملك الحر يقع في منطقة الخلي...,Freehold commercial development in Business Ba...,1100166979,526.0,الخليج التجارى,Business Bay,الخليج التجاري,Business Bay,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,501


In [5]:
# Displaying information about each dataset
# (info() prints its report directly and returns None, so it is not wrapped in print())
print("Transactions dataset info:")
transactions.info()

print("\nRent contracts dataset info:")
rent_contracts.info()

print("\nProjects dataset info:")
projects.info()

Transactions dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1324062 entries, 0 to 1324061
Data columns (total 46 columns):
 #   Column                Non-Null Count    Dtype  
---  ------                --------------    -----  
 0   transaction_id        1324062 non-null  object 
 1   procedure_id          1324062 non-null  int64  
 2   trans_group_id        1324062 non-null  int64  
 3   trans_group_ar        1324062 non-null  object 
 4   trans_group_en        1324062 non-null  object 
 5   procedure_name_ar     1324062 non-null  object 
 6   procedure_name_en     1324062 non-null  object 
 7   instance_date         1324062 non-null  object 
 8   property_type_id      1324062 non-null  int64  
 9   property_type_ar      1324062 non-null  object 
 10  property_type_en      1324062 non-null  object 
 11  property_sub_type_id  1037827 non-null  float64
 12  property_sub_type_ar  1037827 non-null  object 
 13  property_sub_type_en  1037827 non-null  object 
 14  propert


**Datasets Observations**

1. **Transactions Dataset**:

    - **Size & Coverage**: With over 1.32 million records, this dataset offers extensive coverage of Dubai’s real estate transactions, encompassing a variety of property types, transaction types (sales, mortgages), and relevant location data.

    - **Missing Data**:

        - **Property Subtypes and Building Names**: A significant share of records lack building names (~30% missing), property subtypes (~22% missing), and project numbers, which may affect more granular location-based analyses.

        - **Rent-related Columns**: The columns related to rent (`rent_value` and `meter_rent_price`) are sparsely populated (only ~35,000 non-null values), suggesting that rental information is incomplete in this dataset.

        - **Proximity Features**: Information on landmarks, metro stations, and malls is incomplete, with approximately 20-30% missing values across these columns. However, the presence of this data for a majority of transactions still offers valuable insights.

    - **Key Columns for Analysis**:

        - **Transaction Price**: Columns like `procedure_area`, `meter_sale_price`, and `actual_worth` are fully populated, providing a solid basis for price prediction and property trend analysis.

        - **Proximity to Landmarks**: These columns will be crucial for understanding the impact of location on property values, but missing data needs careful handling.

        - **Property Types and Usage**: The dataset includes a wide range of property types (villas, units, land, buildings) and usage categories (residential, commercial, etc.). A focus on residential properties (villas and units) will be most relevant for predicting property prices and rental yields.


2. **Rent Contracts Dataset**:

    - **Size & Coverage**: The dataset contains over 8.1 million rental contracts, offering extensive data for understanding the rental market and trends in Dubai.

    - **Missing Data**:

        - **Proximity Data**: Like the transactions dataset, proximity data to metro stations, landmarks, and malls is missing for a substantial portion of entries (30%+), which might impact location-based rent predictions.

        - **Property Subtypes and Tenant Type**: Some missing values exist for property subtypes and tenant types, but the dataset still provides a robust base for rental analysis.

    - **Key Columns for Analysis**:

        - **Contract Amount & Annual Amount**: These columns will be vital for calculating rental yields and identifying trends in rental prices over time.

        - **Property Details**: The dataset includes property types (villas, units, office spaces), which aligns well with the property types found in the transactions dataset, allowing us to bridge rental trends with sales data.

        - **Contract Dates**: With clear start and end dates for contracts, this dataset enables a longitudinal analysis of rental contracts, helping to project future rental yields and trends.


3. **Projects Dataset**:

    - **Size & Coverage**: This is a smaller dataset (2,362 entries), but it offers valuable information about real estate development projects in Dubai, which will be crucial for off-plan property analysis and understanding future supply.

    - **Missing Data**:

        - **Completion Dates & Percent Completion**: About 23% of records lack `completion_date` and about 15% lack `project_end_date`, while `percent_completed` is missing for only ~0.25% of records and `project_status` is fully populated. Handling the missing dates will be important for analyzing future supply and development risks.

        - **Project Descriptions**: The dataset includes rich textual information about each project, which could be leveraged for more qualitative insights. The English description is missing for ~2% of projects and the Arabic one for ~17%.

    - **Key Columns for Analysis**:

        - **Project Status & Percent Completion**: These fields will be essential for tracking the progress of off-plan properties and understanding their impact on future market conditions.

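        - **Project Types**: The dataset classifies projects through `project_type_id` and `project_classification_id` (for example buildings, villas, and villa complexes in the previewed rows), and the project descriptions indicate residential, commercial, or mixed use. These fields can help segment the analysis of off-plan versus existing properties.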

4. **General Observations**:

    - **Missing Data Handling**:

        - Across all datasets, proximity-related features (e.g., nearest mall, metro, and landmark) have significant missing data, which may limit their use for certain models. Imputation or feature engineering may be required to handle these columns effectively.

        - **Rent Contracts**: Since rent data is sparse in the transactions dataset but well-covered in the rent contracts dataset, using both datasets together will provide a more complete picture of the Dubai real estate market.

    - **Residential Focus**: Both the transactions and rent contracts datasets contain detailed data on residential properties (villas and units). Focusing on these property types for price and rental yield prediction will align with the primary goals of helping investors make informed decisions.
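
As a rough illustration of how the two datasets could be combined, the sketch below computes a gross rental yield per area as median annual rent divided by median sale price. The helper name `gross_yield_by_area` and the small sample frames are hypothetical; only the column names (`area_name_en`, `annual_amount`, `actual_worth`) come from the datasets above. Using per-area medians is a simplifying assumption that ignores property size and type.

```python
import pandas as pd


def gross_yield_by_area(sales: pd.DataFrame, rents: pd.DataFrame) -> pd.DataFrame:
    """Gross rental yield (%) per area: median annual rent / median sale price.

    Hypothetical helper for illustration only.
    """
    median_price = sales.groupby("area_name_en")["actual_worth"].median()
    median_rent = rents.groupby("area_name_en")["annual_amount"].median()

    # The inner join keeps only areas that appear in both datasets
    result = pd.concat(
        [median_price.rename("median_price"), median_rent.rename("median_rent")],
        axis=1,
        join="inner",
    )
    result["gross_yield_pct"] = result["median_rent"] / result["median_price"] * 100
    return result.sort_values("gross_yield_pct", ascending=False)


# Made-up sample values, only to show the shape of the result
sample_sales = pd.DataFrame({
    "area_name_en": ["Business Bay", "Business Bay", "Marsa Dubai"],
    "actual_worth": [1_000_000.0, 2_000_000.0, 2_000_000.0],
})
sample_rents = pd.DataFrame({
    "area_name_en": ["Business Bay", "Marsa Dubai", "Al Bada"],
    "annual_amount": [90_000, 100_000, 50_000],
})
sample_yields = gross_yield_by_area(sample_sales, sample_rents)
print(sample_yields)
```

In practice the same function could be applied to the residential subsets of both datasets, ideally after grouping by property type as well as area.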

First, I'll filter the transactions and rent contracts datasets down to recent records (from the calendar year five years before the current one onward) to capture recent trends and avoid outdated data.

In [6]:
# Displaying a sample of the "instance_date" column in the transactions dataset
print(transactions['instance_date'].sample(5))

735089    08-11-2021
857528    05-10-2022
848065    02-11-2010
524313    18-04-2018
53306     27-03-2024
Name: instance_date, dtype: object


In [7]:
from datetime import datetime

# Converting the "instance_date" column to datetime; values that do not match DD-MM-YYYY become NaT
transactions['instance_date'] = pd.to_datetime(transactions['instance_date'], format='%d-%m-%Y', errors='coerce')

# Keeping transactions from the calendar year five years before the current one onward
# (.copy() avoids SettingWithCopyWarning when the subset is modified later)
transactions_5y = transactions[transactions['instance_date'].dt.year >= datetime.now().year - 5].copy()

# Comparing the shapes of the transactions datasets
print("Transactions dataset shape before filtering:", transactions.shape)
print("Transactions dataset shape after filtering:", transactions_5y.shape)

Transactions dataset shape before filtering: (1324062, 46)
Transactions dataset shape after filtering: (654551, 46)
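
Note that the filter above works on calendar years, so it keeps every record from the start of the year five years before the current one and can therefore span up to six calendar years. If a strict rolling window is preferred, something like the following sketch could be used instead. The function name `last_n_years` and the fixed `as_of` date are assumptions for illustration. The sketch also counts dates that failed to parse, since `errors='coerce'` converts them to `NaT` silently.

```python
import pandas as pd


def last_n_years(df, date_col, n_years=5, as_of=None):
    """Keep rows whose date lies within the n_years before `as_of` (inclusive).

    Hypothetical helper; `as_of` defaults to today's date.
    """
    as_of = pd.Timestamp.now().normalize() if as_of is None else pd.Timestamp(as_of)
    cutoff = as_of - pd.DateOffset(years=n_years)
    # NaT never falls inside the interval, so unparseable dates are dropped here
    mask = df[date_col].between(cutoff, as_of)
    return df.loc[mask].copy()


# Made-up sample in the same DD-MM-YYYY format as the raw data
sample = pd.DataFrame(
    {"instance_date": ["19-03-2024", "25-03-2002", "02-11-2016", "not a date"]}
)
sample["instance_date"] = pd.to_datetime(
    sample["instance_date"], format="%d-%m-%Y", errors="coerce"
)

n_unparsed = int(sample["instance_date"].isna().sum())
recent = last_n_years(sample, "instance_date", n_years=5, as_of="2024-12-31")
print(f"Unparseable dates: {n_unparsed}")
print(recent)
```

Passing an explicit `as_of` keeps the result reproducible when the notebook is re-run at a later date.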


In [8]:
# Displaying random samples of rent contracts date columns
rent_contracts[[col for col in rent_contracts.columns if "date" in col]].sample(5)

Unnamed: 0,contract_start_date,contract_end_date
3511139,01-10-2021,30-01-2022
4565729,01-11-2022,31-10-2023
4203237,04-11-2022,03-11-2023
1826498,11-10-2012,10-10-2013
5338360,09-03-2024,08-03-2025


In [9]:
# Converting "contract_start_date" and "contract_end_date" to datetime; values that do not match DD-MM-YYYY become NaT
for col in ['contract_start_date', 'contract_end_date']:
    rent_contracts[col] = pd.to_datetime(rent_contracts[col], format='%d-%m-%Y', errors='coerce')

# Keeping rent contracts that start in the calendar year five years before the current one or later
# (.copy() avoids SettingWithCopyWarning when the subset is modified later)
rent_contracts_5y = rent_contracts[rent_contracts['contract_start_date'].dt.year >= datetime.now().year - 5].copy()

# Comparing the shapes of the rent contracts datasets
print("Rent contracts dataset shape before filtering:", rent_contracts.shape)
print("Rent contracts dataset shape after filtering:", rent_contracts_5y.shape)

Rent contracts dataset shape before filtering: (8147996, 40)
Rent contracts dataset shape after filtering: (4838119, 40)


In [10]:
# Checking both datasets' information after filtering
# (info() prints its report directly and returns None, so it is not wrapped in print())
print("Transactions dataset info:")
transactions_5y.info()

print("\nRent contracts dataset info:")
rent_contracts_5y.info()

Transactions dataset info:
<class 'pandas.core.frame.DataFrame'>
Index: 654551 entries, 0 to 1324061
Data columns (total 46 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   transaction_id        654551 non-null  object        
 1   procedure_id          654551 non-null  int64         
 2   trans_group_id        654551 non-null  int64         
 3   trans_group_ar        654551 non-null  object        
 4   trans_group_en        654551 non-null  object        
 5   procedure_name_ar     654551 non-null  object        
 6   procedure_name_en     654551 non-null  object        
 7   instance_date         654551 non-null  datetime64[ns]
 8   property_type_id      654551 non-null  int64         
 9   property_type_ar      654551 non-null  object        
 10  property_type_en      654551 non-null  object        
 11  property_sub_type_id  537807 non-null  float64       
 12  property_sub_type_ar  537807 non-nu

In [11]:
# Checking the percentages of missing data in transactions and rent contracts datasets
print("Transactions dataset missing data percentage:")
print(transactions_5y.isnull().sum() / transactions_5y.shape[0] * 100)

print("\nRent contracts dataset missing data percentage:")
print(rent_contracts_5y.isnull().sum() / rent_contracts_5y.shape[0] * 100)

Transactions dataset missing data percentage:
transaction_id           0.000000
procedure_id             0.000000
trans_group_id           0.000000
trans_group_ar           0.000000
trans_group_en           0.000000
procedure_name_ar        0.000000
procedure_name_en        0.000000
instance_date            0.000000
property_type_id         0.000000
property_type_ar         0.000000
property_type_en         0.000000
property_sub_type_id    17.835738
property_sub_type_ar    17.835738
property_sub_type_en    17.835738
property_usage_ar        0.000000
property_usage_en        0.000000
reg_type_id              0.000000
reg_type_ar              0.000000
reg_type_en              0.000000
area_id                  0.000000
area_name_ar             0.000000
area_name_en             0.000000
building_name_ar        28.971157
building_name_en        28.905769
project_number          18.169554
project_name_ar         18.169554
project_name_en         18.169554
master_project_en       21.266945
ma
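
The proximity columns (nearest landmark, metro, and mall) are only partly populated, so they need explicit handling before modeling. One possible approach is sketched here, under the assumption that missingness itself may be informative: keep a boolean indicator per column and fill the gaps with an explicit `"Unknown"` category. The helper `flag_and_fill` and the placeholder label are illustrative choices, not something the datasets prescribe.

```python
import pandas as pd

# Column names as they appear in the transactions and rent contracts headers
PROXIMITY_COLS = ["nearest_landmark_en", "nearest_metro_en", "nearest_mall_en"]


def flag_and_fill(df, cols, placeholder="Unknown"):
    """Add `<col>_missing` indicator columns and fill NaN with a placeholder.

    Hypothetical helper; returns a new frame and leaves `df` untouched.
    """
    out = df.copy()
    for col in cols:
        out[f"{col}_missing"] = out[col].isna()
        out[col] = out[col].fillna(placeholder)
    return out


# Made-up rows with the kind of gaps seen in the real data
sample = pd.DataFrame({
    "nearest_landmark_en": ["Burj Khalifa", None],
    "nearest_metro_en": [None, "Creek Metro Station"],
    "nearest_mall_en": ["Dubai Mall", "City Centre Mirdif"],
})
cleaned = flag_and_fill(sample, PROXIMITY_COLS)
print(cleaned[["nearest_metro_en", "nearest_metro_en_missing"]])
```

An explicit category keeps every row available for modeling, while the indicator lets a model learn whether the absence of proximity information carries signal.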

**Filtering On Residential Properties**

We’ll start by filtering the **transactions** and **rent contracts** datasets to include only residential properties.
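
A minimal sketch of that filter, assuming `property_usage_en` is the column that marks residential records (it appears in both dataset headers) and that the label is exactly `Residential`. The helper name `keep_residential` is hypothetical, and mixed categories such as `Residential / Commercial` are deliberately excluded here.

```python
import pandas as pd


def keep_residential(df, usage_col="property_usage_en"):
    """Return only rows whose usage label is exactly 'Residential'.

    Hypothetical helper; whitespace and letter case are normalized first.
    """
    usage = df[usage_col].str.strip().str.lower()
    # Comparisons against missing values evaluate to False, so rows without a usage label are dropped
    return df.loc[usage == "residential"].copy()


# Made-up rows covering the usage labels seen in the previews
sample = pd.DataFrame({
    "property_usage_en": ["Residential", "Commercial", "Residential / Commercial", None],
    "actual_worth": [5769000.0, 1105300.0, 5700000.0, 950695.0],
})
residential = keep_residential(sample)
print(residential)
```

The same helper could be applied to both `transactions_5y` and `rent_contracts_5y`, since they share the column name.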

In [12]:
# Displaying random observations from the transactions dataset
transactions_5y.sample(5)

Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_ar,trans_group_en,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,property_usage_ar,property_usage_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
904240,1-133-2019-328,133,1,مبايعات,Sales,تسجيل حق منفعة,Development Registration,2019-11-07,3,وحدة,Unit,60.0,شقه سكنيه,Flat,سكني,Residential,1,العقارات القائمة,Existing Properties,484,ند حصة,Nadd Hessa,سبرينج,Spring,,,,Silicon Oasis,واحة السيليكون,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,غرفة,1 B/R,1,81.33,950695.0,11689.35,,,1.0,1.0,0.0
1217347,1-102-2021-13855,102,1,مبايعات,Sales,بيع - تسجيل مبدئى,Sell - Pre registration,2021-08-19,3,وحدة,Unit,60.0,شقه سكنيه,Flat,سكني,Residential,0,على الخارطة,Off-Plan Properties,526,الخليج التجارى,Business Bay,نورث سايد 15 - تاور 1,15 Northside - Tower 1,2214.0,15 نورث سايد,15 Northside,Business Bay,الخليج التجاري,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفتين,2 B/R,1,103.67,2060000.0,19870.74,,,1.0,1.0,0.0
1105331,1-102-2024-40242,102,1,مبايعات,Sales,بيع - تسجيل مبدئى,Sell - Pre registration,2024-06-06,3,وحدة,Unit,60.0,شقه سكنيه,Flat,سكني,Residential,0,على الخارطة,Off-Plan Properties,526,الخليج التجارى,Business Bay,رو? منازل مراسي دراي?,Rove Home Marasi Drive,2966.0,روف منازل مراسي درايف,Rove Home Marasi Drive,Business Bay,الخليج التجاري,وسط مدينة دبي,Downtown Dubai,محطة مترو الخليج التجاري,Business Bay Metro Station,مول دبي,Dubai Mall,غرفتين,2 B/R,1,82.98,2658888.0,32042.52,,,1.0,1.0,0.0
203142,1-11-2022-1876,11,1,مبايعات,Sales,بيع,Sell,2022-02-03,3,وحدة,Unit,60.0,شقه سكنيه,Flat,سكني,Residential,1,العقارات القائمة,Existing Properties,390,برج خليفة,Burj Khalifa,ميسكا 2,MISKA 2,,,,Burj Khalifa,برج خليفة,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفتين,2 B/R,1,21.76,250000.0,11488.97,,,1.0,1.0,0.0
485730,1-102-2022-21471,102,1,مبايعات,Sales,بيع - تسجيل مبدئى,Sell - Pre registration,2022-08-04,3,وحدة,Unit,60.0,شقه سكنيه,Flat,سكني,Residential,0,على الخارطة,Off-Plan Properties,447,الخيران الأولى,Al Khairan First,أوركيد في شاطئ الخور المبنى 1,Orchid at Creek Beach Building 1,2383.0,شاطئ الخور - اوركيد,Creek Beach - Orchid,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفة,1 B/R,1,66.58,1335888.0,20064.4,,,1.0,2.0,0.0


In [160]:
# Checking the percentage of missing data per column in the projects dataset
projects.isnull().sum() / projects.shape[0] * 100

project_id                    0.000000
project_number                0.000000
project_name                  0.000000
developer_id                  0.084674
developer_number              0.084674
developer_name                0.000000
master_developer_id           0.000000
master_developer_number       0.000000
master_developer_name         0.000000
project_start_date            0.211685
project_end_date             15.325995
project_type_id               0.000000
project_type_ar               0.000000
project_classification_id     0.000000
project_classification_ar     0.000000
escrow_agent_id               6.773920
escrow_agent_name             6.773920
project_status                0.000000
project_status_ar             0.000000
percent_completed             0.254022
completion_date              22.777307
cancellation_date            98.899238
project_description_ar       16.596105
project_description_en        2.413209
property_id                   0.000000
area_id                  

In [162]:
# Comparing the number of rows with the number of distinct project names
print(projects.shape[0])
print(projects['project_name'].nunique())

2362
2355


In [163]:
# Checking that project_number is unique (one distinct value per row)
projects['project_number'].nunique()

2362
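
Because `project_number` is unique in the projects dataset (2,362 distinct values for 2,362 rows) while `project_name` is not, it looks like the safer key for attaching project attributes to transactions. The sketch below shows one way this could be done. The helper name `attach_project_info` and the sample rows are hypothetical. It assumes `project_number` is stored as a float with gaps in transactions (as the previews suggest) and as an integer in projects, so both sides are cast to a nullable integer first.

```python
import pandas as pd


def attach_project_info(transactions, projects, cols):
    """Left-join selected project columns onto transactions via project_number.

    Hypothetical helper for illustration only.
    """
    left = transactions.copy()
    right = projects[["project_number", *cols]].copy()
    # Align key dtypes: float with NaN on the left, plain integer on the right
    left["project_number"] = left["project_number"].astype("Int64")
    right["project_number"] = right["project_number"].astype("Int64")
    # validate raises MergeError if project_number is not unique in `right`;
    # indicator adds a `_merge` column showing which rows found a match
    return left.merge(right, on="project_number", how="left",
                      validate="many_to_one", indicator=True)


# Made-up rows; the status values are invented for the example
sample_tx = pd.DataFrame({
    "transaction_id": ["tx-with-project", "tx-without-project"],
    "project_number": [2214.0, None],
})
sample_projects = pd.DataFrame({
    "project_number": [2214, 948],
    "project_status": ["ACTIVE", "FINISHED"],
})
merged = attach_project_info(sample_tx, sample_projects, ["project_status"])
print(merged)
```

The `_merge` column makes it easy to count how many transactions have no matching project, which is expected for the records that lack a `project_number`.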

In [170]:
# Displaying values where the project_name is duplicated
projects[projects['project_name'].duplicated(keep=False)]

Unnamed: 0,project_id,project_number,project_name,developer_id,developer_number,developer_name,master_developer_id,master_developer_number,master_developer_name,project_start_date,project_end_date,project_type_id,project_type_ar,project_classification_id,project_classification_ar,escrow_agent_id,escrow_agent_name,project_status,project_status_ar,percent_completed,completion_date,cancellation_date,project_description_ar,project_description_en,property_id,area_id,area_name_ar,area_name_en,master_project_ar,master_project_en,zoning_authority_id,zoning_authority_ar,zoning_authority_en,no_of_lands,no_of_buildings,no_of_villas,no_of_units
26,297,297,بوتانيكا,190.0,190.0,اعمار العقارية (ش . م. ع),555,555,اعمار العقارية (ش . م. ع),01-06-2009,10-11-2011,1,عادي,1,مباني,1.0,بنك دبي الاسلامي (شركة مساهمة عامة),FINISHED,منجز,100.0,10-11-2011,,مشروع برج سكني للتملك الحر يقع في منطقة مرسى د...,Freehold residential tower development of 42 f...,1100096320,330.0,مرسى دبي,Marsa Dubai,دبي مارينا,Dubai Marina,1,بلدية دبي,Dubai Municipality,0,1,0,371
120,21225977,1649,بوتانيكا,21223993.0,1074.0,قرية جميرا (ش.ذ.م.م),102,102,قرية جميرا (ش.ذ.م.م),01-07-2014,21-04-2016,1,عادي,1,مباني,19.0,بنك أبوظبي الأول ش.م.ع.,FINISHED,منجز,100.0,25-04-2016,,.2B+G+4 المشروع عبارة عن شقق سكنية في منطقة د...,PROJECT CONSISTS OF 2B+G+4 RESIDENTIAL APARTME...,1100115348,441.0,البرشاء جنوب الرابعة,Al Barsha South Fourth,قرية جميرا الدائرية,Jumeirah Village Circle,4,تراخيص,Trakhees,0,1,0,121
310,596302407,3003,أورا,41647020.0,1162.0,نشاما للعقارات لمالكها نشمي ديفلوبمنت شركة الش...,41647020,1162,نشاما للعقارات لمالكها نشمي ديفلوبمنت شركة الش...,25-09-2024,,1,عادي,1,مباني,19.0,بنك أبوظبي الأول ش.م.ع.,NOT_STARTED,تحت الانشاء,0.0,,,أرضي + 2 باركنج + 14 طابق + باركنج أرضي + بوديوم,G+2P+14+ parking in Ground floor + Podium floor,1100294834,507.0,اليلايس 2,Al Yelayiss 2,تاون سكوير,TOWN SQUARE,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,157
465,202099775,2105,مها تاون هاوس,41647020.0,1162.0,نشاما للعقارات لمالكها نشمي ديفلوبمنت شركة الش...,41647020,1162,نشاما للعقارات لمالكها نشمي ديفلوبمنت شركة الش...,08-12-2018,,1,عادي,3,مجمع فلل,7.0,بنك الإمارات دبي الوطني (ش.م.ع),PENDING,قيد التسجيل,0.0,,,292 VILLA G+1 Townhouses,292 VILLA G+1 Townhouses,1100226020,,,,,,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,0,0,0
558,576906053,2924,كيتوراه ريزيرف,462273479.0,1518.0,مجموعة ميدان (ش.ذ.م.م),53873272,1181,مجموعة ميدان (ش.ذ.م.م),01-12-2022,30-06-2027,1,عادي,3,مجمع فلل,7.0,بنك الإمارات دبي الوطني (ش.م.ع),NOT_STARTED,تحت الانشاء,0.0,30-06-2027,,يتكون من 93 تاون هاوس 2 مبنى سكني أرضي + 14 ع...,comprises of 93 townhouse 2 Nos . G+14 residen...,1100289862,482.0,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,حدائق الشيخ محمد بن راشد - ديستركت 7,HADAEQ SHEIKH MOHAMMED BIN RASHID - DISRICT 7,1,بلدية دبي,Dubai Municipality,10,6,0,547
780,358540450,2259,أورا,160608474.0,1228.0,ماجد الفطيم لتشغيل مشاريع المدن المتكاملة الام...,159428374,1227,ماجد الفطيم لتشغيل مشاريع المدن المتكاملة الام...,06-11-2021,31-08-2024,1,عادي,3,مجمع فلل,7.0,بنك الإمارات دبي الوطني (ش.م.ع),ACTIVE,فعال,68.0,31-08-2024,,"808 تاون هاوس,أرضي + أول + سطح,أرضي + أول + ثا...","808 Townhouses,(G+1+R) & (G+2+R)",1100244177,435.0,الحبيه الرابعة,Al Hebiah Fourth,تلال الغاف,TILAL AL GHAF,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),274,0,808,0
862,447598953,2490,مها تاون هاوس,41647020.0,1162.0,نشاما للعقارات لمالكها نشمي ديفلوبمنت شركة الش...,41647020,1162,نشاما للعقارات لمالكها نشمي ديفلوبمنت شركة الش...,01-03-2023,31-08-2025,1,عادي,3,مجمع فلل,19.0,بنك أبوظبي الأول ش.م.ع.,ACTIVE,فعال,60.0,31-08-2025,,,"500 X G+1 Townhouse Townhouses, The project de...",1100261293,507.0,اليلايس 2,Al Yelayiss 2,تاون سكوير,TOWN SQUARE,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),61,0,500,0
978,13583892,1545,سفير تاور 2,12940965.0,1014.0,الخليج التجاري (ش.ذ.م.م),898,898,الخليج التجاري (ش.ذ.م.م),01-03-2014,29-08-2017,1,عادي,1,مباني,21.0,نور بنك (مساهمة عامة),FINISHED,منجز,100.0,29-08-2017,,مبنى سكني مكون من 2 سرداب + ارضي + بوديوم + 1...,Residential building consists of 2B + G + P + ...,1100205345,526.0,الخليج التجارى,Business Bay,الخليج التجاري,Business Bay,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,185
1159,574636839,2913,ذا بالم كراون,103.0,103.0,شركة نخيل (ش.م.خ),100,100,شركة نخيل (ش.م.خ),30-04-2024,,1,عادي,2,فلل,1.0,بنك دبي الاسلامي (شركة مساهمة عامة),PENDING,قيد التسجيل,0.0,,,تطوير 38 فيلا فاخرة بما في ذلك تنفيذ اعمال الب...,Development of 38 luxury villa and infrastruct...,1100290799,410.0,نخلة جميرا,Palm Jumeirah,نخلة جميرا,Palm Jumeirah,4,تراخيص,Trakhees,0,0,0,0
1162,576343966,2922,كيتوراه ريزيرف,142291327.0,1216.0,مجموعة ميدان (ش.ذ.م.م),53873272,1181,مجموعة ميدان (ش.ذ.م.م),01-12-2022,,1,عادي,3,مجمع فلل,5.0,بنك ابوظبى التجارى,PENDING,قيد التسجيل,2.0,,,يتكون من 93 تاون هاوس 2 مبنى سكني أرضي + 14 ع...,comprises of 93 townhouse 2 Nos . G+14 residen...,1100250724,482.0,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,حدائق الشيخ محمد بن راشد - ديستركت 7,HADAEQ SHEIKH MOHAMMED BIN RASHID - DISRICT 7,1,بلدية دبي,Dubai Municipality,0,0,0,0


In [178]:
# Setting print option to display full text
pd.set_option('display.max_colwidth', None)

In [183]:
# Inspecting project 948 with the full description text visible
projects[projects['project_number'] == 948]

Unnamed: 0,project_id,project_number,project_name,developer_id,developer_number,developer_name,master_developer_id,master_developer_number,master_developer_name,project_start_date,project_end_date,project_type_id,project_type_ar,project_classification_id,project_classification_ar,escrow_agent_id,escrow_agent_name,project_status,project_status_ar,percent_completed,completion_date,cancellation_date,project_description_ar,project_description_en,property_id,area_id,area_name_ar,area_name_en,master_project_ar,master_project_en,zoning_authority_id,zoning_authority_ar,zoning_authority_en,no_of_lands,no_of_buildings,no_of_villas,no_of_units
40,948,948,مارينا أركيد,542.0,542.0,اعمار العقارية (ش . م. ع),555,555,اعمار العقارية (ش . م. ع),01-02-2010,26-03-2017,1,عادي,1,مباني,2.0,مصرف الامارات الاسلامي مساهمة عامة,FINISHED,منجز,100.0,26-03-2017,,مشروع مختلط للتملك الحر يقع في منطقة مرسى دبي. المشروع عبارة عن برج سكني يتالف من 2 سرداب+طابق ارضي+ 47 طابق علوي بهيكل خرساني مع انهاءاته الخارجية و الداخلية ويتضمن محلات تجارية و نادي صحي و حوض سباحة,"Mixed use development located in the Dubai Marina area. The project comprises a tower of 2B+G+47 Typical Floors to be constructed out of concrete with internal & external finishes and to includes retail shops, health club and swimming pool.",1100096366,330.0,مرسى دبي,Marsa Dubai,دبي مارينا,Dubai Marina,1,بلدية دبي,Dubai Municipality,0,1,0,660


In [182]:
projects[projects['project_number'] == 1878]

Unnamed: 0,project_id,project_number,project_name,developer_id,developer_number,developer_name,master_developer_id,master_developer_number,master_developer_name,project_start_date,project_end_date,project_type_id,project_type_ar,project_classification_id,project_classification_ar,escrow_agent_id,escrow_agent_name,project_status,project_status_ar,percent_completed,completion_date,cancellation_date,project_description_ar,project_description_en,property_id,area_id,area_name_ar,area_name_en,master_project_ar,master_project_en,zoning_authority_id,zoning_authority_ar,zoning_authority_en,no_of_lands,no_of_buildings,no_of_villas,no_of_units
129,54058440,1878,دبي وورف - برج 1,14241293.0,1031.0,دبي للعقارات (ش.ذ.م.م),850,850,دبي للعقارات (ش.ذ.م.م),30-10-2014,30-03-2017,1,عادي,1,مباني,33.0,بنك المشرق (شركة مساهمة عامة),FINISHED,منجز,100.0,30-03-2017,,برج سكني مكون من عدد 128 وحدة سكنية يقع في منطقة الجداف ويتكون من 3 سراديب ومنصة (مركز تسوق) كلها مشتركة مع 3 أبراج (3سراديب+منصة+ طابق ارضي+7 طوابق) والبناء مصمم من هيكل خرساني مع التشطيبات الداخلية والخارجية,"Residential Tower of 128 units located in Aljadaf area, shared with 3 towers in a Retail podium and 3 levels of abasement(3B+G+P+7),the tower designed of concrete structure with internal and external finishes.,",1100138578,334.0,الجداف,Al Jadaf,قرية الثقافة,Culture Village,2,سلطة دبي للتطوير,Dubai Development Authority (DDA),0,1,0,128


In [184]:
projects[projects['project_number'] == 3114]

Unnamed: 0,project_id,project_number,project_name,developer_id,developer_number,developer_name,master_developer_id,master_developer_number,master_developer_name,project_start_date,project_end_date,project_type_id,project_type_ar,project_classification_id,project_classification_ar,escrow_agent_id,escrow_agent_name,project_status,project_status_ar,percent_completed,completion_date,cancellation_date,project_description_ar,project_description_en,property_id,area_id,area_name_ar,area_name_en,master_project_ar,master_project_en,zoning_authority_id,zoning_authority_ar,zoning_authority_en,no_of_lands,no_of_buildings,no_of_villas,no_of_units
159,618474655,3114,ميزون الإنجليزية,537363812.0,1712.0,شركة نخيل (ش.م.خ),100,100,شركة نخيل (ش.م.خ),01-08-2024,,1,عادي,1,مباني,1.0,بنك دبي الاسلامي (شركة مساهمة عامة),PENDING,قيد التسجيل,0.0,,,مبنى سكني وتجاري 2 ق + أرضي + 2 منصة + 20 دور + روف,Residential and Commercial Building 2 B+G+2Podium+20 Floor+Roof,1100176487,445.0,جبل علي الأولى,Jabal Ali First,الفرجان,Al Furjan,4,تراخيص,Trakhees,0,0,0,0


In [171]:
projects['project_number'].unique()

array([ 139,  146,  159,   41,   46,  221,  227,  461, 2045, 2549, 1995,
       2824,  331,  334,  336,  401,  705,  257,  320,  323,  327,  510,
       2223,  632,  287,  295,  297, 2239,  506,   99, 1913,  982,  610,
        687, 2519,  720,  637,  641, 2850, 2609,  948, 1042, 1067, 2678,
       1303, 1136, 1261, 1182, 1185, 1190, 1196, 1199, 1130, 1214, 1204,
        437,  523,  556, 1986,  356,  617,  657, 1892, 1893, 1353, 1354,
       1346, 1644, 1317, 1319, 1377, 1810, 1813, 1897, 1657, 1597, 1341,
       1899, 1987, 1554, 2236, 2216, 1395, 1773, 1575, 1806, 1549, 1673,
       1523, 1367, 1535, 1088, 1089, 1428, 1429, 1444, 1452, 1474, 1676,
       1776, 1335, 1393, 1691, 1692, 1521, 1668, 1717, 2277, 1403, 2233,
       2234,  450, 1408, 1769, 1615, 1858, 2589, 1642, 1766, 1638, 1649,
       1681, 1698, 1713, 1569, 1977, 1370, 1643, 1357, 1878, 1768, 1940,
       2794, 1958, 2220, 2059, 1832, 1746, 1924, 1983, 1804, 1876, 1956,
       1999, 1973, 1927, 1961, 1780, 1880, 2000, 17

In [175]:
projects_name_english = {
    139 : "The Diamond", 
    146 : "Niloofar",
    159 : "Prime Tower",
    41 : "51@Business Bay",
    46 : "Burlington Tower",
    221 : "Princess Tower",
    227 : "Elite Residence",
    461 : "Saba Tower 1",
    2045 : "V2",
    2549 : "Mudon Al Ranim 4",
    1995 : "Azizi Riviera 8",
    2824 : "Violet Tower",
    331 : "Elite 3 Sports Residence",
    334 : "Elite 5 Sports Residence",
    336 : "Hera Tower",
    401 : "Lotus Park",
    705 : "Noora Residence 1",
    257 : "Torch Tower",
    320 : "Gallery Villas",
    323 : "Canal Residence West (Phase 1)",
    327 : "Artistic Heights",
    510 : "Eden Garden",
    2223 : "The Haven Residences",
    632 : "Tiffany Towers",
    287 : "Niki Lauda Tower",
    295 : "Syann Park 1",
    297 : "Botanica",
    2239 : "Sobha Hartland Waves",
    506 : "The Pearl",
    99 : "The Matrix",
    1913 : "J One",
    982 : "German Supreme Tower 2",
    610 : "Oasis High Park",
    687 : "Alshera Tower",
    2519 : "Fairway Villas",
    720 : "Park Corner",
    637 : "Knightsbridge Court",
    641 : "Dunes Village",
    2850 : "Grove",
    2609 : "Arabian Ranches 3 - Anya 2",
    948 : "Marina Arcade Tower",
    1042 : "Ritaj",
    1067 : "Uniestate Prime Tower",
    2678 : "Sunridge",
    1303 : "Golf Tower",
    1136 : "Olympic Park 3",
    1261 : "Axis Residences 7",
    1182 : "Queue Point Liwan - Plot R017",
    1185 : "Queue Point Liwan - Plot R002",
    1190 : 'Queue Point Liwan - Plot R009',
    1196 : "Queue Point Liwan - Plot R051",
    1199 : "Queue Point Liwan - Plot R053",
    1130 : "Hds Business Centre",
    1214 : "Queue Point Liwan - Plot R096",
    1204 : "Queue Point Liwan - Plot R072",
    437 : "Lake View",
    523 : "Lake Central",
    556 : "Mars Residences",
    1986 : "Keturah Resort",
    356 : "Royal Amwaj",
    617 : "Olive Point",
    657 : "Sunrise Boulevard",
    1892 : "Green Diamond 1",
    1893 : "Haven Villas",
    1353 : "Damac Hills - Whitefield 2",
    1354 : "Damac Hills - Silver Springs",
    1346 : "The Hills",
    1644 : "Oceana Hotel and Apartments",
    1317 : "Palm Views - West",
    1319 : "Alma 2",
    1377 : "Arabian Ranches - Palma Community",
    1810 : "Azizi Farishta Residence",
    1813 : "Creek Gate",
    1897 : "Azizi Aura Residences",
    1657 : "Al Falak Residence",
    1597 : "The Dubai Creek Residences",
    1341 : "Millennium Estates",
    1899 : "Maple 3",
    1987 : "Beverly Residence",
    1554 : "Serenity Lakes 5",
    2236 : "Damac Hills - Bel Air",
    2216 : "Loci Residences",
    1395 : "Damac Hills - Richmond",
    1773 : "Act One | Act Two",
    1575 : "Western Residence - North - 25 Villas",
    1806 : "The Pulse Boulevard Apartments",
    1549 : "Damac Hills 2 - Claret",
    1673 : "Mag 5 Boulevard",
    1523 : "Damac Hills - Artesia",
    1367 : "Prime Villas",
    1535 : "Park Villas",
    1088 : "Mirabella 2",
    1089 : "Mirabella 4",
    1428 : "Diamond Views 3 - Villas B",
    1429 : "Diamond Views 3 - Villas A",
    1444 : "Lincoln Park (West Side & Lincoln Park - B)",
    1452 : "Sunrise Boulevard 10",
    1474 : "Le Grand Chateau",
    1676 : "The One JBR",
    2234 : "La Rosa 4",
    450 : "Victory Heights",
    1408 : "Damac Hills - Longview",
    1769 : "Burj Sabah",
    1615 : "Glitz Residence 2",
    1858 : "Al Fouad",
    2589 : "Mar Casa",
    1642 : "Downtown Views",
    1766 : "Laya Residences",
    1638 : "Town Square Hayat",
    1649 : "Botanica",
    1681 : "Altia Residence",
    1698 : "W Residences Dubai - The Palm",
    1713 : "Belgravia",
    1569 : "Avenue Residence 2",
    1977 : "Al Helal Al Zahaby 2",
    1370 : "Avenue Residence 1",
    1643 : "Skyhills Residences",
    1357 : "Damac Hills - Piccadilly Green",
    1878 : "Dubai Wharf",
    1768 : "Parkway Vistas",
    1940 : "Azizi Riviera 7",
    2794 : "Manhattan 1 By SD",
    1958 : "Azizi Riviera 1",
    2220 : "La Rosa 3",
    2059 : "Kappa Acca 3",
    1832 : "Arabian Ranches 2 - Reem Community",
    1746 : "Aykon City 2",
    1924 : "Remraam - Al Ramth",
    1983 : "The Grand",
    1804 : "The Pulse Townhouses",
    1876 : "Park Heights 1",
    1956 : "Azizi Riviera 6",
    1999 : "MBR City, District One Phase 3, Residences 5",
    1973 : "Time 1",
    1927 : "Azizi Riviera 5",
    1961 : "Prime Views",
    1780 : "City Walk Building 23",
    1880 : "Urbana 3",
    2000 : "MBR City, District One Phase 3, Residences 3",
    1742 : "Palace Estates",
    1962 : "Seven Hotel and Apartments - The Palm",
    1847 : "The 50",
    1923 : "Dar Al Jawhara",
    2018 : "Reva Residences",
    2167 : "Sur La Mer",
    2087 : "Dolphin Tower",
    3106 : "Muraba Veil",
    3107 : "One By Binghatti",
    3114 : ""
}


ValueError: If using all scalar values, you must pass an index
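
The call that raised this `ValueError` is not shown above. pandas raises this message when a dictionary of scalar values is passed directly to `pd.DataFrame`, so the sketch below assumes that is what happened. It shows two ways to use the dictionary without hitting the error, on a shortened dictionary and a toy stand-in for `projects`. The column name `project_name_en` is hypothetical and is not part of the original projects dataset.

```python
import pandas as pd

# Shortened stand-in for the projects_name_english dictionary above
projects_name_english = {948: "Marina Arcade Tower", 1878: "Dubai Wharf", 3114: ""}

# Toy stand-in for the projects DataFrame (only the key column is needed here)
projects_toy = pd.DataFrame({"project_number": [948, 1878, 3114, 9999]})

# Option 1: map the dictionary onto the key column.
# Project numbers missing from the dictionary become NaN.
# 'project_name_en' is a hypothetical column name chosen for this sketch.
projects_toy["project_name_en"] = projects_toy["project_number"].map(projects_name_english)

# Option 2: build a two-column lookup table from the dictionary items,
# which can later be merged on 'project_number'.
name_lookup = pd.DataFrame(
    list(projects_name_english.items()),
    columns=["project_number", "project_name_en"],
)

print(projects_toy)
print(name_lookup)
```

Mapping keeps the row order of the DataFrame and makes unmatched project numbers easy to spot as missing values.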

In [13]:
# Checking the values in the property usage columns
print("Different values count in the 'property_usage_ar' column in the transactions dataset:")
print(transactions_5y['property_usage_ar'].value_counts(dropna=False))

print("\nDifferent values count in the 'property_usage_en' column in the transactions dataset:")
print(transactions_5y['property_usage_en'].value_counts(dropna=False))

Different values count in the 'property_usage_ar' column in the transactions dataset:
property_usage_ar
سكني                     563393
تجاري                     54578
ضيافة                     21860
أخرى                       8090
سكني / تجاري               3124
متعدد الاستخدامات          1849
صناعي                      1197
زراعي                       216
صناعي / تجاري               177
تخزين                        39
صناعي /  تجاري / سكني        28
Name: count, dtype: int64

Different values count in the 'property_usage_en' column in the transactions dataset:
property_usage_en
Residential                              563393
Commercial                                54578
Hospitality                               21860
Other                                      8090
Residential / Commercial                   3124
Multi-Use                                  1849
Industrial                                 1197
Agricultural                                216
Industrial / Commercial      

In [14]:
# Filtering transactions dataset to include only residential properties
transactions_residential_5y = transactions_5y[transactions_5y['property_usage_en'] == 'Residential']

# Comparing the shapes of the original and filtered transactions datasets
print("Transactions dataset shape before filtering:", transactions_5y.shape)
print("Transactions dataset shape after filtering:", transactions_residential_5y.shape)

Transactions dataset shape before filtering: (654551, 46)
Transactions dataset shape after filtering: (563393, 46)


In [15]:
# Displaying random observations from the rent contracts dataset
rent_contracts_5y.sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_ar,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_ar,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_type_ar,ejari_property_sub_type_id,ejari_property_sub_type_en,ejari_property_sub_type_ar,property_usage_en,property_usage_ar,project_number,project_name_ar,project_name_en,master_project_ar,master_project_en,area_id,area_name_ar,area_name_en,actual_area,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,tenant_type_id,tenant_type_ar,tenant_type_en
5522064,CNT2112631468,1,جديد,New,2024-05-01,2025-04-30,22000,22000,1,1,0.0,2.0,وحدة,Unit,2.0,Office,مكتب,422.0,Office,مكتب,Commercial,تجاري,,,,,,319.0,راس الخور الصناعيه الثانيه,Ras Al Khor Industrial Second,21.0,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,2.0,جهة,Authority
8039155,CNT921965191,2,تجديد,Renew,2019-01-05,2020-01-04,142077,142077,13,1,0.0,5.0,وحدة افتراضية,Virtual Unit,4.0,Labor Camps,سكن عمال,12.0,Room in labor Camp,غرفه سكن عمال,Residential,سكني,,,,,,360.0,محيصنه الثانيه,Muhaisanah Second,,مطار دبي الدولي,Dubai International Airport,محطة مترو اتصالات,Etisalat Metro Station,سيتي سنتر مردف,City Centre Mirdif,2.0,جهة,Authority
5205011,CNT2105043944,1,جديد,New,2023-12-18,2024-12-17,92000,92000,1,1,1.0,2.0,وحدة,Unit,842.0,Flat,شقه,1.0,1bed room+Hall,غرفة و صالة,Residential,سكني,52.0,ميفير ريزيدنس,MAYFAIR RESIDENCY,الخليج التجاري,Business Bay,526.0,الخليج التجارى,Business Bay,57.0,وسط مدينة دبي,Downtown Dubai,محطة مترو الخليج التجاري,Business Bay Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person
2183224,CNT1122506016,1,جديد,New,2019-11-16,2020-11-15,23000,23000,1,1,0.0,2.0,وحدة,Unit,842.0,Flat,شقه,11.0,Studio,أستوديو,Residential,سكني,,,,,,367.0,السوق الكبير,Al Suq Al Kabeer,16.0,برج خليفة,Burj Khalifa,محطة مترو الغبيبة,Al Ghubaiba Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person
3192344,CNT1533355137,1,جديد,New,2021-08-05,2022-08-04,23000,23000,1,1,0.0,2.0,وحدة,Unit,2.0,Office,مكتب,422.0,Office,مكتب,Commercial,تجاري,,,,,,237.0,السبخه,Al Sabkha,27.0,مطار دبي الدولي,Dubai International Airport,محطات مترو نخلة ديرة,Palm Deira Metro Stations,مول دبي,Dubai Mall,1.0,شخص,Person


In [16]:
# Checking the values in the property usage columns
print("Different values count in the 'property_usage_ar' column in the rent contracts dataset:")
print(rent_contracts_5y['property_usage_ar'].value_counts(dropna=False))

print("\nDifferent values count in the 'property_usage_en' column in the rent contracts dataset:")
print(rent_contracts_5y['property_usage_en'].value_counts(dropna=False))

Different values count in the 'property_usage_ar' column in the rent contracts dataset:
property_usage_ar
سكني                     3474488
تجاري                    1306759
صناعي                      26777
صناعي / تجاري              11169
متعدد الاستخدامات           5375
NaN                         4863
صناعي /  تجاري / سكني       2923
تخزين                       2205
منشأه سياحيه                1752
منشأه تعليميه                783
منشأه صحيه                   689
سكني / تجاري                 260
زراعة                         76
Name: count, dtype: int64

Different values count in the 'property_usage_en' column in the rent contracts dataset:
property_usage_en
Residential                              3474488
Commercial                               1306759
Industrial                                 26777
Industrial / Commercial                    11169
Multi Usage                                 5375
NaN                                         4863
Industrial / Commercial / Residential     

In [17]:
# Filtering rent contracts dataset to include only residential properties
rent_contracts_residential_5y = rent_contracts_5y[rent_contracts_5y['property_usage_en'] == 'Residential']

# Comparing the shapes of the original and filtered rent contracts datasets
print("Rent contracts dataset shape before filtering:", rent_contracts_5y.shape)
print("Rent contracts dataset shape after filtering:", rent_contracts_residential_5y.shape)

Rent contracts dataset shape before filtering: (4838119, 40)
Rent contracts dataset shape after filtering: (3474488, 40)


In [18]:
# Dropping property usage columns from both datasets
transactions_residential_5y = transactions_residential_5y.drop(columns=['property_usage_ar', 'property_usage_en'])
rent_contracts_residential_5y = rent_contracts_residential_5y.drop(columns=['property_usage_ar', 'property_usage_en'])

# Confirming shapes of both datasets
print("Transactions dataset shape after dropping columns:", transactions_residential_5y.shape)
print("Rent contracts dataset shape after dropping columns:", rent_contracts_residential_5y.shape)

Transactions dataset shape after dropping columns: (563393, 44)
Rent contracts dataset shape after dropping columns: (3474488, 38)


After filtering both the transactions and rent contracts datasets to focus exclusively on residential properties, the next step is to assess the quality of the data. To do this, I will calculate the percentage of missing (null) values across all relevant columns in both datasets. This will help identify which columns may need further cleaning, imputation, or removal.
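
As a sketch of how this assessment could be turned into a ranked, actionable table, the helper below sorts columns by missing percentage and flags those above a cutoff. The function name `missing_summary`, the 20% threshold, and the toy data are assumptions for illustration only and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df, threshold=20.0):
    """Return per-column missing percentages, sorted descending,
    with a flag for columns whose percentage exceeds `threshold`."""
    pct = df.isnull().mean() * 100
    summary = pct.sort_values(ascending=False).to_frame("missing_pct")
    summary["above_threshold"] = summary["missing_pct"] > threshold
    return summary

# Toy data standing in for a slice of the transactions dataset
toy = pd.DataFrame({
    "area_id": [330, 334, 445, 526],
    "project_number": [948.0, np.nan, 3114.0, 443.0],
    "rent_value": [np.nan, np.nan, np.nan, 216666.0],
})

summary = missing_summary(toy)
print(summary)
```

A flagged column is only a candidate for review. Whether to drop, impute, or keep it still depends on what the column means.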

In [19]:
# Checking percentages of missing values in transactions and rent contracts datasets
print("Transactions dataset missing data percentage:")
print(transactions_residential_5y.isnull().sum() / transactions_residential_5y.shape[0] * 100)

print("\nRent contracts dataset missing data percentage:")
print(rent_contracts_residential_5y.isnull().sum() / rent_contracts_residential_5y.shape[0] * 100)

Transactions dataset missing data percentage:
transaction_id           0.000000
procedure_id             0.000000
trans_group_id           0.000000
trans_group_ar           0.000000
trans_group_en           0.000000
procedure_name_ar        0.000000
procedure_name_en        0.000000
instance_date            0.000000
property_type_id         0.000000
property_type_ar         0.000000
property_type_en         0.000000
property_sub_type_id    12.400935
property_sub_type_ar    12.400935
property_sub_type_en    12.400935
reg_type_id              0.000000
reg_type_ar              0.000000
reg_type_en              0.000000
area_id                  0.000000
area_name_ar             0.000000
area_name_en             0.000000
building_name_ar        25.322466
building_name_en        25.247030
project_number          18.310842
project_name_ar         18.310842
project_name_en         18.310842
master_project_en       21.275202
master_project_ar       21.275202
nearest_landmark_ar     19.877776
ne

In [20]:
# Displaying random observations from both datasets
print("Transactions dataset sample:")
display(transactions_residential_5y.sample(5))

print("\nRent contracts dataset sample:")
display(rent_contracts_residential_5y.sample(5))

Transactions dataset sample:


Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_ar,trans_group_en,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1045953,1-102-2022-8921,102,1,مبايعات,Sales,بيع - تسجيل مبدئى,Sell - Pre registration,2022-04-04,4,فيلا,Villa,4.0,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,,,2258.0,المرابع العربية ااا - بليس,Arabain Ranches lll - Bliss,,,,,,,,,ثلاث غرف,3 B/R,0,128.7,1962888.0,15251.66,,,1.0,1.0,0.0
90102,1-11-2024-7482,11,1,مبايعات,Sales,بيع,Sell,2024-02-29,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,485,معيصم الأول,Me'Aisem First,ليك سايد سي,LAKESIDE C,436.0,ليكسايد,LAKESIDE,International Media Production Zone,المنطقة العالمية للإنتاج الإعلامي,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,استوديو,Studio,1,33.93,350000.0,10315.36,,,1.0,1.0,0.0
477607,1-11-2022-16059,11,1,مبايعات,Sales,بيع,Sell,2022-07-06,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,435,الحبيه الرابعة,Al Hebiah Fourth,برمودا فيوز,BERMUDA VIEWS,616.0,برمودا فيوز,BERMUDA VIEWS,Dubai Sports City,مدينة دبي الرياضية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,استوديو,Studio,1,49.51,290000.0,5857.4,,,2.0,1.0,0.0
942364,1-102-2022-38800,102,1,مبايعات,Sales,بيع - تسجيل مبدئى,Sell - Pre registration,2022-11-22,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,شوبا هارتلاند ? ذا كرست تاور اي,Sobha Hartland - The Crest Tower A,2447.0,شوبا هارتلاند - ذا كرست,Sobha Hartland - The Crest,SOBHA HARTLAND,شوبها هارتلاند,,,,,,,ثلاث غرف,3 B/R,1,145.81,3021268.0,20720.58,,,1.0,1.0,0.0
1227478,1-102-2020-17035,102,1,مبايعات,Sales,بيع - تسجيل مبدئى,Sell - Pre registration,2020-12-31,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,334,الجداف,Al Jadaf,بنغاتي أفنيو,Binghatti Avenue,2185.0,بن غاطي افينو,Binghatti Avenue,Dubai Health Care City Phase 2,مدينة دبي الطبية المرحلة الثانية,وسط مدينة دبي,Downtown Dubai,محطة مترو مدينة الرعاية الصحية,Healthcare City Metro Station,مول دبي,Dubai Mall,غرفتين,2 B/R,1,122.59,843588.0,6881.38,,,1.0,1.0,0.0



Rent contracts dataset sample:


Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_ar,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_ar,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_type_ar,ejari_property_sub_type_id,ejari_property_sub_type_en,ejari_property_sub_type_ar,project_number,project_name_ar,project_name_en,master_project_ar,master_project_en,area_id,area_name_ar,area_name_en,actual_area,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,tenant_type_id,tenant_type_ar,tenant_type_en
4609761,CNT2079582294,1,جديد,New,2023-04-01,2024-03-31,120000,120000,10,9,0.0,2.0,وحدة,Unit,4.0,Labor Camps,سكن عمال,12.0,Room in labor Camp,غرفه سكن عمال,,,,,,347.0,القوز الثالثه,Al Goze Third,15.0,وسط مدينة دبي,Downtown Dubai,محطة مترو نور بنك,Noor Bank Metro Station,مول الإمارات,Mall of the Emirates,2.0,جهة,Authority
4190820,CNT2012123391,2,تجديد,Renew,2022-04-04,2023-04-03,30000,30000,1,1,1.0,2.0,وحدة,Unit,842.0,Flat,شقه,1.0,1bed room+Hall,غرفة و صالة,1251.0,المحور 5 ريزيدنس,AXIS RESIDENCES 5,واحة السيليكون,Silicon Oasis,484.0,ند حصة,Nadd Hessa,78.0,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,1.0,شخص,Person
5034187,CNT2098171897,2,تجديد,Renew,2023-09-20,2024-09-19,50000,50000,1,1,0.0,2.0,وحدة,Unit,842.0,Flat,شقه,1.0,1bed room+Hall,غرفة و صالة,,,,,,325.0,ام هرير الاولى,Um Hurair First,111.0,مطار دبي الدولي,Dubai International Airport,محطة مترو برجمان,Burjuman Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person
159553,CRT1234536436,2,تجديد,Renew,2020-06-24,2021-06-23,21000,21000,1,1,1.0,2.0,وحدة,Unit,842.0,Flat,شقه,11.0,Studio,أستوديو,436.0,ليكسايد,LAKESIDE,المنطقة العالمية للإنتاج الإعلامي,International Media Production Zone,485.0,معيصم الأول,Me'Aisem First,34.0,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,1.0,شخص,Person
2673291,CNT1307715668,1,جديد,New,2020-05-01,2021-04-30,1046707,1046707,115,20,1.0,2.0,وحدة,Unit,842.0,Flat,شقه,1.0,1bed room+Hall,غرفة و صالة,,,,مجمع دبي للاستثمار الثاني,Dubai Investment Park Second,459.0,مجمع دبي للاستثمار الثاني,Dubai Investment Park Second,12.0,موقع إكسبو 2020,Expo 2020 Site,,,,,2.0,جهة,Authority


Upon inspecting the missing values in the transactions dataset, I noticed a significant percentage of missing data in the `rent_value` and `meter_rent_price` columns. Initially, the plan was to drop these columns due to their high number of missing values. However, before proceeding, I decided to investigate further to understand the context behind these missing values. 

In [21]:
# Displaying random observations from transactions dataset where rent_value is not null
transactions_residential_5y[transactions_residential_5y['rent_value'].notnull()].sample(5)

Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_ar,trans_group_en,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
332793,2-110-2021-395,110,2,رهون,Mortgages,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2021-12-07,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,435,الحبيه الرابعة,Al Hebiah Fourth,جيوفاني بوتيك سويتس,Giovanni Boutique Suites,531.0,جيوفاني بوتيك,GIOVANNI BOUTIQUE,Dubai Sports City,مدينة دبي الرياضية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,استوديو,Studio,1,35.36,216666.0,6127.43,216666.0,6127.43,2.0,2.0,2.0
139264,2-110-2022-232,110,2,رهون,Mortgages,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2022-06-24,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,523,الحبية الثالثة,Al Hebiah Third,داماك هيلز - جولف فيستا - تاور أي,DAMAC HILLS - GOLF VISTA - TOWER A,1362.0,داماك هيلز - جولف فيستا,DAMAC HILLS - GOLF VISTA,DAMAC HILLS,داماك هيليز,موتور سيتي,Motor City,,,,,غرفتين,2 B/R,1,140.33,1017600.0,7251.48,1017600.0,7251.48,2.0,2.0,2.0
216100,2-110-2020-80,110,2,رهون,Mortgages,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2020-03-25,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,312,محيصنه الاولى,Muhaisanah First,قمر 11,Qamar 11,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو الراشدية,Rashidiya Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفة,1 B/R,1,98.38,769594.0,7822.67,769594.0,7822.67,2.0,2.0,2.0
643483,2-110-2020-145,110,2,رهون,Mortgages,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2020-07-21,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,390,برج خليفة,Burj Khalifa,ب د 29 بوليفارد مبنى رقم 2,BD 29 BLVD T2,1276.0,29 بوليفارد,29 BOULEVARD,Burj Khalifa,برج خليفة,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,74.24,2515994.0,33890.01,2515994.0,33890.01,2.0,0.0,0.0
588208,2-715-2021-18,715,2,رهون,Mortgages,تسجيل إيجاره تنتهى بالتملك على بيع مبدئى,Delayed Sell Lease to Own Registration,2021-08-02,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,335,ند الشبا الاولى,Nad Al Shiba First,,,1601.0,جاردن فيوز,GRAND VIEWS,,,وسط مدينة دبي,Downtown Dubai,محطة مترو الخليج التجاري,Business Bay Metro Station,مول دبي,Dubai Mall,,,0,512.07,4200000.0,8202.0,4200000.0,8202.0,2.0,2.0,2.0


In [22]:
# Checking if the transactions groups are different where rent_value is not null
transactions_residential_5y[transactions_residential_5y['rent_value'].notnull()]['trans_group_en'].value_counts()

trans_group_en
Mortgages    3306
Name: count, dtype: int64

Inspecting the `rent_value` and `meter_rent_price` columns showed that all 3,306 rows with a non-null `rent_value` belong to the **Mortgages** transaction group, and the sampled rows are lease-to-own registrations in which `rent_value` equals `actual_worth`. These fields therefore appear to record the agreed value of a lease-to-own arrangement rather than a market rent, which explains why they are missing for every sales and gift transaction and for most mortgage records.
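
A minimal sketch of how this reading could be checked more systematically: break the non-null `rent_value` rows down by procedure name and measure how often `rent_value` matches `actual_worth`. The toy rows below are illustrative stand-ins, not real records. On real data, `np.isclose` may be preferable to exact equality for floating-point values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for transactions_residential_5y (illustrative values only)
toy = pd.DataFrame({
    "trans_group_en": ["Mortgages", "Mortgages", "Sales", "Gifts"],
    "procedure_name_en": [
        "Lease to Own Registration",
        "Mortgage Registration",
        "Sell",
        "Grant",
    ],
    "actual_worth": [216666.0, 500000.0, 350000.0, 592520.0],
    "rent_value": [216666.0, np.nan, np.nan, np.nan],
})

# Rows where a rent value was recorded
with_rent = toy[toy["rent_value"].notnull()]

# Which procedures carry a rent value?
procedure_counts = with_rent["procedure_name_en"].value_counts()

# Share of those rows where rent_value is identical to actual_worth
share_equal = (with_rent["rent_value"] == with_rent["actual_worth"]).mean()

print(procedure_counts)
print("Share with rent_value == actual_worth:", share_equal)
```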

In [23]:
# Inspecting the unique values of the transaction group columns and checking their quality
print("Unique values count in 'trans_group_id' column is:")
print(transactions_residential_5y['trans_group_id'].value_counts(dropna=False))

print("\nUnique values count in 'trans_group_en' column is:")
print(transactions_residential_5y['trans_group_en'].value_counts(dropna=False))

print("\nUnique values count in 'trans_group_ar' column is:")
print(transactions_residential_5y['trans_group_ar'].value_counts(dropna=False))

Unique values count in 'trans_group_id' column is:
trans_group_id
1    434199
2    106943
3     22251
Name: count, dtype: int64

Unique values count in 'trans_group_en' column is:
trans_group_en
Sales        434199
Mortgages    106943
Gifts         22251
Name: count, dtype: int64

Unique values count in 'trans_group_ar' column is:
trans_group_ar
مبايعات    434199
رهون       106943
هبات        22251
Name: count, dtype: int64


To refine the dataset further, I examined the distribution of transaction types in the `trans_group_id`, `trans_group_en`, and `trans_group_ar` columns. All three columns show the same counts, with no missing values: **sales** (434,199), **mortgages** (106,943), and **gifts** (22,251).

Based on the project's objectives, I decided to keep only sales transactions. Mortgage and gift records generally do not reflect a price negotiated between a buyer and a seller, so restricting the dataset to sales should make the price forecasts and rental yield estimates more reliable for investors.
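
Matching counts across the three group columns suggest they encode the same category, but counts alone do not prove it. The sketch below shows one way to confirm a one-to-one mapping before filtering on a single column. The helper name `is_one_to_one` and the toy frames are hypothetical.

```python
import pandas as pd

def is_one_to_one(df, columns):
    """Return True if each value in every listed column appears in exactly
    one distinct combination of the listed columns."""
    combos = df[columns].drop_duplicates()
    return all(combos[col].is_unique for col in columns)

group_columns = ["trans_group_id", "trans_group_en", "trans_group_ar"]

# Toy frame where id, English label, and Arabic label agree
toy_ok = pd.DataFrame({
    "trans_group_id": [1, 1, 2, 3],
    "trans_group_en": ["Sales", "Sales", "Mortgages", "Gifts"],
    "trans_group_ar": ["مبايعات", "مبايعات", "رهون", "هبات"],
})

# Toy frame where id 1 is paired with two different English labels
toy_bad = toy_ok.copy()
toy_bad.loc[1, "trans_group_en"] = "Gifts"

consistent_ok = is_one_to_one(toy_ok, group_columns)
consistent_bad = is_one_to_one(toy_bad, group_columns)

print("Consistent mapping:", consistent_ok)
print("Inconsistent mapping detected:", not consistent_bad)
```

If the check passes on the real data, filtering on `trans_group_en` alone and then dropping all three columns loses no information.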

In [24]:
# Displaying random observations from the gift transactions
transactions_residential_5y[transactions_residential_5y['trans_group_en'] == 'Gifts'].sample(10)

Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_ar,trans_group_en,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1155583,3-9-2022-4013,9,3,هبات,Gifts,هبه,Grant,2022-12-07,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,368,البرشاء الاولى,Al Barsha First,المراد تاورز,AL MURAD TOWERS,,,,,,برج العرب,Burj Al Arab,محطة مترو شرف دي جي,Sharaf Dg Metro Station,مول الإمارات,Mall of the Emirates,غرفة,1 B/R,1,53.35,592520.0,11106.29,,,1.0,1.0,0.0
1122272,3-120-2019-107,120,3,هبات,Gifts,هبه - تسجيل مبدئى,Grant Pre-Registration,2019-09-11,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,سيرنتي ليكس 5,Serenity Lakes 5,1554.0,بحيرات الصفاء 5,SERENITY LAKES 5,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,مدينة دبي للإنترنت,Dubai Internet City,مارينا مول,Marina Mall,استوديو,Studio,1,35.59,268160.0,7534.72,,,1.0,1.0,0.0
606782,3-9-2024-5813,9,3,هبات,Gifts,هبه,Grant,2024-10-10,1,أرض,Land,,,,1,العقارات القائمة,Existing Properties,317,جميرا الاولى,Jumeirah First,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,,,0,958.65,16800005.0,17524.65,,,1.0,1.0,0.0
443066,3-9-2023-5070,9,3,هبات,Gifts,هبه,Grant,2023-10-11,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,364,الوصل,Al Wasl,ستي ووك ريزيدينشال بيلدنج 7,Citywalk Residential Building 7,,,,City Walk,ستي ووك,برج خليفة,Burj Khalifa,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفتين,2 B/R,1,158.84,3195391.0,20117.04,,,1.0,1.0,0.0
33134,3-219-2022-136,219,3,هبات,Gifts,هبة على بيع مبدئى,Grant on Delayed Sell,2022-08-26,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,507,اليلايس 2,Al Yelayiss 2,بارك سايد,Parkside,1836.0,شقق روضة,RAWDA APARTMENTS,TOWN SQUARE,تاون سكوير,دورة دبي للدراجات,Dubai Cycling Course,,,,,غرفتين,2 B/R,1,85.11,814863.0,9574.24,,,1.0,1.0,0.0
1318919,3-9-2021-2109,9,3,هبات,Gifts,هبه,Grant,2021-08-31,4,فيلا,Villa,4.0,فيلا,Villa,1,العقارات القائمة,Existing Properties,507,اليلايس 2,Al Yelayiss 2,,,1823.0,نور تاون هاوس,NOOR TOWNHOUSES,TOWN SQUARE,تاون سكوير,,,,,,,ثلاث غرف,3 B/R,0,188.21,1235831.0,6566.24,,,1.0,2.0,0.0
473968,3-9-2023-3951,9,3,هبات,Gifts,هبه,Grant,2023-07-26,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,232,مردف,Mirdif,,,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو الراشدية,Rashidiya Metro Station,سيتي سنتر مردف,City Centre Mirdif,,,0,929.03,4700000.0,5059.04,,,1.0,1.0,0.0
1024893,3-59-2024-122,59,3,هبات,Gifts,هبة حق المنفعة,Grant Development,2024-08-09,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,484,ند حصة,Nadd Hessa,يونيفيرستي فيو بي,UNIVERSITY VIEW B,,,,Silicon Oasis,واحة السيليكون,,,,,,,غرفة,1 B/R,1,79.23,503902.0,6360.0,,,1.0,1.0,0.0
576041,3-219-2024-46,219,3,هبات,Gifts,هبة على بيع مبدئى,Grant on Delayed Sell,2024-02-14,4,فيلا,Villa,4.0,فيلا,Villa,1,العقارات القائمة,Existing Properties,435,الحبيه الرابعة,Al Hebiah Fourth,,,2205.0,إيلان,Elan,TILAL AL GHAF,تلال الغاف,,,,,,,أربع غرف,4 B/R,0,222.38,3194489.0,14365.0,,,1.0,1.0,0.0
265070,3-9-2023-5674,9,3,هبات,Gifts,هبه,Grant,2023-11-16,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,526,الخليج التجارى,Business Bay,(داماك تاورز باي باراماونت (ايه,DAMAC TOWERS BY PARAMOUNT (A),443.0,داماك تاورز من باراماونت,DAMAC TOWERS BY PARAMOUNT,Business Bay,الخليج التجاري,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفتين,2 B/R,1,140.73,2268946.0,16122.69,,,1.0,1.0,0.0


In [25]:
# Filtering transactions dataset to "Sales" transactions only
transactions_residential_sales_5y = transactions_residential_5y[transactions_residential_5y['trans_group_en'] == 'Sales']

# Dropping the transaction group columns (constant after filtering to "Sales")
transactions_residential_sales_5y = transactions_residential_sales_5y.drop(columns=['trans_group_id', 'trans_group_en', 'trans_group_ar'])

# Comparing the shapes of the original and filtered transactions datasets
print("Transactions dataset shape before filtering:", transactions_residential_5y.shape)
print("Transactions dataset shape after filtering:", transactions_residential_sales_5y.shape)

Transactions dataset shape before filtering: (563393, 44)
Transactions dataset shape after filtering: (434199, 41)


In [26]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,procedure_id,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1189551,1-102-2022-18768,102,بيع - تسجيل مبدئى,Sell - Pre registration,2022-07-06,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,447,الخيران الأولى,Al Khairan First,كريك بالاس,Creek Palace,2191.0,كريك بالاس,Creek Palace,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,ثلاث غرف,3 B/R,1,154.41,2919888.0,18909.97,,,1.0,1.0,0.0
1266663,1-102-2023-48324,102,بيع - تسجيل مبدئى,Sell - Pre registration,2023-09-18,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,إليتز 2 من دانوب - 1,Elitz 2 By Danube - 1,2693.0,إليتز 2 من دانوب,Elitz 2 By Danube,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,مدينة دبي للإنترنت,Dubai Internet City,مول الإمارات,Mall of the Emirates,استوديو,Studio,1,35.86,666000.0,18572.23,,,1.0,1.0,0.0
795112,1-11-2023-7483,11,بيع,Sell,2023-03-15,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,343,ورسان الاولى,Al Warsan First,,,1386.0,ورسان فيلج - D,WARSAN VILLAGE - D,International City Phase 1,المدينة العالمية - المرحلة الاولى,,,محطة مترو الراشدية,Rashidiya Metro Station,سيتي سنتر مردف,City Centre Mirdif,,,0,51.33,510000.0,9935.71,,,2.0,1.0,0.0
114138,1-102-2019-7234,102,بيع - تسجيل مبدئى,Sell - Pre registration,2019-05-14,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,عزيزي ريفييرا 33 \t,Azizi Riviera 33,2040.0,عزيزي ريفييرا 33,Azizi Riviera 33,,,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,48.39,759415.0,15693.64,,,1.0,1.0,0.0
803241,1-11-2024-20680,11,بيع,Sell,2024-06-06,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,447,الخيران الأولى,Al Khairan First,كريك ايدج تاور 1,CREEK EDGE Tower 1,2083.0,كريك ايدج,CREEK EDGE,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفة,1 B/R,1,65.66,1700000.0,25890.95,,,1.0,1.0,0.0


In [27]:
# Inspecting the procedure columns in the transactions dataset
print("Unique values count in 'procedure_id' column is:")
print(transactions_residential_sales_5y['procedure_id'].value_counts(dropna=False))

print("\nUnique values count in 'procedure_name_en' column is:")
print(transactions_residential_sales_5y['procedure_name_en'].value_counts(dropna=False))

print("\nUnique values count in 'procedure_name_ar' column is:")
print(transactions_residential_sales_5y['procedure_name_ar'].value_counts(dropna=False))

Unique values count in 'procedure_id' column is:
procedure_id
102    246579
11     136035
41      40333
45       4303
110      2870
851      1205
460      1158
133      1003
95        200
715       185
93        127
852        54
814        38
107        36
361        27
4          25
371        21
Name: count, dtype: int64

Unique values count in 'procedure_name_en' column is:
procedure_name_en
Sell - Pre registration                       246579
Sell                                          136035
Delayed Sell                                   40333
Sell Development                                4303
Lease to Own Registration                       2870
Development Registration Pre-Registration       1205
Sale On Payment Plan                            1158
Development Registration                        1003
Delayed Development                              200
Delayed Sell Lease to Own Registration           185
Delayed Sell Development                         127
Sell Development -

**Procedure Columns Insights**

1. **Common Sale Types**:

    - **Sell - Pre registration** (procedure_id 102): The most frequent type (246,579 transactions), likely referring to sales recorded before the property is fully registered.

    - **Sell** (procedure_id 11): Likely a standard sale transaction (136,035 transactions).

    - **Delayed Sell** (procedure_id 41): 40,333 transactions. The Arabic label (بيع مبدئى) translates roughly as “preliminary sale”, so the name may not indicate an actual delay.

2. **Special Sale Types**: Less frequent procedure types (such as “Lease to Own Registration” and “Sale On Payment Plan”) appear to reflect special agreements or sale conditions that differ from regular sales.
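To gauge how much of the data the common procedure types cover, a cumulative share can be computed from the value counts. The following is a minimal sketch: the helper name `procedure_share` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def procedure_share(df: pd.DataFrame, column: str = "procedure_name_en") -> pd.DataFrame:
    """Return count, share and cumulative share per category, most frequent first."""
    # value_counts sorts in descending order by default
    counts = df[column].value_counts(dropna=False)
    summary = counts.to_frame(name="count")
    summary["share"] = summary["count"] / summary["count"].sum()
    summary["cumulative_share"] = summary["share"].cumsum()
    return summary


# Toy data for illustration only; in the notebook, pass transactions_residential_sales_5y.
toy = pd.DataFrame(
    {"procedure_name_en": ["Sell - Pre registration"] * 5 + ["Sell"] * 3 + ["Delayed Sell"] * 2}
)
share_summary = procedure_share(toy)
print(share_summary)
```

Applied to the counts printed above, the three most frequent procedures account for 422,947 of the 434,199 sales transactions, or about 97.4%.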

In [28]:
# Inspecting random observations where procedure_name_en is "Sell - Pre registration"
transactions_residential_sales_5y[transactions_residential_sales_5y['procedure_name_en'] == 'Sell - Pre registration'].sample(5)

Unnamed: 0,transaction_id,procedure_id,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
940775,1-102-2021-10385,102,بيع - تسجيل مبدئى,Sell - Pre registration,2021-07-02,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,332,زعبيل الثانيه,Zaabeel Second,T1 II داون تاون فيوز,Downtown Views II T1,1849.0,II داون تاون فيوز,Downtown Views II,,,برج خليفة,Burj Khalifa,المركز المالي,Financial Centre,مول دبي,Dubai Mall,غرفتين,2 B/R,1,117.18,2630888.0,22451.68,,,1.0,1.0,0.0
832337,1-102-2023-27040,102,بيع - تسجيل مبدئى,Sell - Pre registration,2023-06-07,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,لاكشري فاميلي ريزيدنس,Luxury Family Residence,2353.0,لوكجري فاميلى ريزيدنس,Luxury Family Residence,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,مدينة دبي للإنترنت,Dubai Internet City,مول الإمارات,Mall of the Emirates,غرفتين,2 B/R,1,85.55,1394759.0,16303.45,,,1.0,1.0,0.0
968539,1-102-2019-21344,102,بيع - تسجيل مبدئى,Sell - Pre registration,2019-11-26,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,464,وادي الصفا 2,Wadi Al Safa 2,ويفز ريزيدنس,WAVEZ RESIDENCE,2175.0,ويفز رزيدنس,Wavez Residence,Liwan,ليوان,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,استوديو,Studio,1,37.32,375000.0,10048.23,,,1.0,2.0,0.0
282933,1-102-2024-83466,102,بيع - تسجيل مبدئى,Sell - Pre registration,2024-10-09,4,فيلا,Villa,4.0,فيلا,Villa,0,على الخارطة,Off-Plan Properties,459,مجمع دبي للاستثمار الثاني,Dubai Investment Park Second,,,3171.0,داماك ريفرسايد - سيج,DAMAC RIVERSIDE - SAGE,,,,,,,,,أربع غرف,4 B/R,0,144.0,2698000.0,18736.11,,,1.0,1.0,0.0
114496,1-102-2023-19303,102,بيع - تسجيل مبدئى,Sell - Pre registration,2023-04-25,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,447,الخيران الأولى,Al Khairan First,شاطئ الخور - سيدار المبنى رقم 3,Creek Beach - Cedar Building 3,2551.0,شاطئ الخور - سافانا-سيدار-مانجروف,Creek Beach - Savanna-Cedar-Mangrove,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفة,1 B/R,1,65.18,1598888.0,24530.35,,,1.0,1.0,0.0


In [29]:
# Checking the different "reg_type_en" values where procedure_name_en is "Sell - Pre registration"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['procedure_name_en'] == 'Sell - Pre registration']['reg_type_en'].value_counts(dropna=False)

reg_type_en
Off-Plan Properties    246579
Name: count, dtype: int64

In [30]:
# Inspecting random observations where procedure_name_en is "Sell"
transactions_residential_sales_5y[transactions_residential_sales_5y['procedure_name_en'] == 'Sell'].sample(5)

Unnamed: 0,transaction_id,procedure_id,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1307194,1-11-2023-36191,11,بيع,Sell,2023-11-08,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,390,برج خليفة,Burj Khalifa,فورته T2,Forte T2,1660.0,فورتي,FORTE,Burj Khalifa,برج خليفة,برج خليفة,Burj Khalifa,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,57.12,1950000.0,34138.66,,,1.0,1.0,0.0
568663,1-11-2022-29624,11,بيع,Sell,2022-11-24,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,482,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,كولكتيف تووير 1,Collective Tower 1,2047.0,كولكتيف,Collective,DUBAI HILLS,دبي هيليز,موتور سيتي,Motor City,محطة مترو بنك أبوظبي الأول,First Abu Dhabi Bank Metro Station,مول الإمارات,Mall of the Emirates,غرفتين,2 B/R,1,68.84,1540000.0,22370.71,,,1.0,1.0,0.0
492056,1-11-2024-13050,11,بيع,Sell,2024-04-15,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,412,المركاض,Al Merkadh,شوبا هارتلاند ويفز,Sobha Hartland Waves,2239.0,شوبا هارتلاند ويفز,Sobha Hartland Waves,,,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,59.81,1200000.0,20063.53,,,1.0,1.0,0.0
71959,1-11-2021-19855,11,بيع,Sell,2021-11-09,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,485,معيصم الأول,Me'Aisem First,الاندلس مبنى سي,Al Andalus Building C,1846.0,الأندلس المرحلة الثانية,AL Andalus Phase 2,Jumeirah Golf Estates,جميرا غولف للعقارات,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,أربع غرف,4 B/R,1,268.19,2500000.0,9321.75,,,1.0,1.0,0.0
1023472,1-11-2024-19910,11,بيع,Sell,2024-06-04,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,447,الخيران الأولى,Al Khairan First,كريك هوريزون تاور 1,CREEK HORIZON TOWER 1,1772.0,كريك هورايزون,CREEK HORIZON,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفتين,2 B/R,1,104.82,2550000.0,24327.42,,,1.0,1.0,0.0


In [31]:
# Checking the different "reg_type_en" values where procedure_name_en is "Sell"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['procedure_name_en'] == 'Sell']['reg_type_en'].value_counts(dropna=False)

reg_type_en
Existing Properties    136035
Name: count, dtype: int64

In [32]:
# Inspecting random observations where procedure_name_en is "Delayed Sell"
transactions_residential_sales_5y[transactions_residential_sales_5y['procedure_name_en'] == 'Delayed Sell'].sample(5)

Unnamed: 0,transaction_id,procedure_id,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
801440,1-41-2020-2364,41,بيع مبدئى,Delayed Sell,2020-04-16,1,أرض,Land,,,,1,العقارات القائمة,Existing Properties,531,الحبيه السادسة,Al Hebiah Sixth,,,1877.0,أرابيلا 3,Arabella 3,Mudon,مدن,دورة دبي للدراجات,Dubai Cycling Course,,,,,,,0,273.0,2104000.0,7706.96,,,1.0,1.0,0.0
1217271,1-41-2021-10820,41,بيع مبدئى,Delayed Sell,2021-10-14,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,447,الخيران الأولى,Al Khairan First,كريك سايد 18 بي,Creekside 18 B,1663.0,كريك سايد 18,CREEKSIDE 18,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفتين,2 B/R,1,125.52,2227888.0,17749.27,,,1.0,1.0,0.0
1002318,1-41-2020-7522,41,بيع مبدئى,Delayed Sell,2020-12-17,4,فيلا,Villa,4.0,فيلا,Villa,1,العقارات القائمة,Existing Properties,482,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,,,1645.0,مابل,MAPLE,DUBAI HILLS - MAPLE 1,دبي هيليز - مابل 1,موتور سيتي,Motor City,محطة مترو بنك أبوظبي الأول,First Abu Dhabi Bank Metro Station,مول الإمارات,Mall of the Emirates,خمس غرف,5 B/R,0,278.93,2750000.0,9859.1,,,1.0,1.0,0.0
84529,1-41-2023-2353,41,بيع مبدئى,Delayed Sell,2023-01-31,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,مارينا اركيد,MARINA ARCADE,948.0,مارينا أركيد,MARINA ARCADE,Dubai Marina,دبي مارينا,برج العرب,Burj Al Arab,مينا السياحي,Mina Seyahi,مارينا مول,Marina Mall,غرفتين,2 B/R,1,112.87,1906000.0,16886.68,,,1.0,1.0,0.0
141509,1-41-2020-655,41,بيع مبدئى,Delayed Sell,2020-02-11,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,482,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,ملبيري آت بارك هايتس مبنى A1,MULBERRY at PARK HEIGHTS Building A1,1489.0,مولبيري في بارك هايتس,MULBERRY at PARK HEIGHTS,DUBAI HILLS - PARK,دبي هيليز - بارك,موتور سيتي,Motor City,محطة مترو بنك أبوظبي الأول,First Abu Dhabi Bank Metro Station,مول الإمارات,Mall of the Emirates,ثلاث غرف,3 B/R,1,182.02,2778133.0,15262.79,,,1.0,1.0,0.0


In [33]:
# Checking the different "reg_type_en" values where procedure_name_en is "Delayed Sell"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['procedure_name_en'] == 'Delayed Sell']['reg_type_en'].value_counts(dropna=False)

reg_type_en
Existing Properties    40333
Name: count, dtype: int64

In [34]:
# Inspecting random observations where procedure_name_en is "Sell Development"
transactions_residential_sales_5y[transactions_residential_sales_5y['procedure_name_en'] == 'Sell Development'].sample(5)

Unnamed: 0,transaction_id,procedure_id,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
881940,1-45-2024-519,45,بيع حق منفعة,Sell Development,2024-04-25,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,484,ند حصة,Nadd Hessa,بن غاطى ابارتمنتس,BINGHATTI APARTMENTS,1630.0,بن غاطي ابارتمنتس,BINGHATTI APARTMENTS,Silicon Oasis,واحة السيليكون,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,استوديو,Studio,1,40.9,380000.0,9290.95,,,1.0,1.0,0.0
74912,1-45-2023-1486,45,بيع حق منفعة,Sell Development,2023-12-18,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,484,ند حصة,Nadd Hessa,اويسيس ستار,OASIS STAR,808.0,نجم الواحة,OASIS STAR,Silicon Oasis,واحة السيليكون,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,غرفة,1 B/R,1,40.74,260000.0,6381.93,,,1.0,1.0,0.0
591745,1-45-2023-526,45,بيع حق منفعة,Sell Development,2023-05-18,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,484,ند حصة,Nadd Hessa,بن غاطى ابارتمنتس,BINGHATTI APARTMENTS,1630.0,بن غاطي ابارتمنتس,BINGHATTI APARTMENTS,Silicon Oasis,واحة السيليكون,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,استوديو,Studio,1,42.55,345000.0,8108.11,,,1.0,1.0,0.0
306609,1-45-2024-26,45,بيع حق منفعة,Sell Development,2024-01-08,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,459,مجمع دبي للاستثمار الثاني,Dubai Investment Park Second,CENTURION RESIDENCE - TOWER 1,CENTURION RESIDENCE - TOWER 1,1626.0,سنتوريون ريزدينس,CENTURION RESIDENCE,Dubai Investment Park Second,مجمع دبي للاستثمار الثاني,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,,,,,غرفتين,2 B/R,1,134.76,750000.0,5565.45,,,1.0,1.0,0.0
691929,1-45-2021-674,45,بيع حق منفعة,Sell Development,2021-09-06,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,458,مجمع دبي للاستثمار الاول,Dubai Investment Park First,جرين كوميونيتي غرب الثالث,GREEN COMMUNITY WEST - EXTENSION - PHASE III,1667.0,المجتمع الأخضر الغرب - التوسع - المرحلة الثالثة,GREEN COMMUNITY WEST- EXTENSION- PHASE III,Dubai Investment Park First,مجمع دبي للاستثمار الاول,موقع إكسبو 2020,Expo 2020 Site,محطة مترو الدانوب,DANUBE Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,غرفتين,2 B/R,1,265.04,1953131.0,7369.19,,,1.0,1.0,0.0


In [35]:
# Checking the different "reg_type_en" values where procedure_name_en is "Sell Development"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['procedure_name_en'] == 'Sell Development']['reg_type_en'].value_counts(dropna=False)

reg_type_en
Existing Properties    4303
Name: count, dtype: int64

In [36]:
# Inspecting random observations where procedure_name_en is "Lease to Own Registration"
transactions_residential_sales_5y[transactions_residential_sales_5y['procedure_name_en'] == 'Lease to Own Registration'].sample(5)

Unnamed: 0,transaction_id,procedure_id,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
653690,1-110-2024-667,110,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2024-09-02,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,523,الحبية الثالثة,Al Hebiah Third,داماك هيلز - جولف تيراس (أ),DAMAC HILLS - GOLF TERRACE (A),1365.0,داماك هيلز - جولف تيراس,DAMAC HILLS - GOLF TERRACE,DAMAC HILLS,داماك هيليز,موتور سيتي,Motor City,,,,,ثلاث غرف,3 B/R,1,339.01,3300000.0,9734.23,,,2.0,2.0,2.0
1001245,1-110-2020-94,110,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2020-05-05,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,,,,,,Al Furjan,الفرجان,موقع إكسبو 2020,Expo 2020 Site,محطة مترو ابن بطوطة,Ibn Battuta Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,,,0,609.14,3000000.0,4924.98,,,2.0,2.0,2.0
840240,1-110-2023-504,110,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2023-10-20,1,أرض,Land,,,,1,العقارات القائمة,Existing Properties,531,الحبيه السادسة,Al Hebiah Sixth,,,1877.0,أرابيلا 3,Arabella 3,Mudon,مدن,دورة دبي للدراجات,Dubai Cycling Course,,,,,,,0,182.0,2200000.0,12087.91,,,2.0,2.0,2.0
138754,1-110-2021-9,110,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2021-01-14,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,463,وادي الصفا 7,Wadi Al Safa 7,سي 32,C32,,,,Al Waha Villas,فلل الواحة,دورة دبي للدراجات,Dubai Cycling Course,,,,,غرفتين,2 B/R,0,169.46,750000.0,4425.82,,,2.0,2.0,2.0
1285338,1-110-2019-476,110,تسجيل إيجارة تنتهى بالتملك,Lease to Own Registration,2019-12-19,4,فيلا,Villa,4.0,فيلا,Villa,1,العقارات القائمة,Existing Properties,352,الثنيه الرابعة,Al Thanayah Fourth,,,1047.0,روعة الإمارات ? الينابيع 10,Emirates Living - Springs 10,Springs - 2,الينابيع - 2,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,ثلاث غرف,3 B/R,0,342.52,2420000.0,7065.28,,,4.0,2.0,2.0


In [37]:
# Checking the different "reg_type_en" values where procedure_name_en is "Lease to Own Registration"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['procedure_name_en'] == 'Lease to Own Registration']['reg_type_en'].value_counts(dropna=False)

reg_type_en
Existing Properties    2870
Name: count, dtype: int64

In [38]:
# Inspecting random observations where procedure_name_en is "Sale On Payment Plan"
transactions_residential_sales_5y[transactions_residential_sales_5y['procedure_name_en'] == 'Sale On Payment Plan'].sample(5)

Unnamed: 0,transaction_id,procedure_id,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
618454,1-460-2023-438,460,بيع مقيد بسنوات الدفع,Sale On Payment Plan,2023-12-14,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,467,وادي الصفا 5,Wadi Al Safa 5,سكاي كورتس D,Skycourts Tower D,794.0,سكاي كورتس,SKY COURTS,Residential Complex,ريزيدينتشل كموبليكس,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,غرفتين,2 B/R,1,119.53,836329.0,6996.81,,,1.0,1.0,0.0
1115676,1-460-2023-445,460,بيع مقيد بسنوات الدفع,Sale On Payment Plan,2023-12-26,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,467,وادي الصفا 5,Wadi Al Safa 5,سكاي كورتس C,Skycourts Tower C,794.0,سكاي كورتس,SKY COURTS,Residential Complex,ريزيدينتشل كموبليكس,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,غرفتين,2 B/R,1,119.53,836329.0,6996.81,,,1.0,1.0,0.0
224866,1-460-2024-334,460,بيع مقيد بسنوات الدفع,Sale On Payment Plan,2024-10-14,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,467,وادي الصفا 5,Wadi Al Safa 5,سكاي كورتس D,Skycourts Tower D,794.0,سكاي كورتس,SKY COURTS,Residential Complex,ريزيدينتشل كموبليكس,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,غرفتين,2 B/R,1,196.79,1376836.0,6996.48,,,1.0,1.0,0.0
121961,1-460-2020-164,460,بيع مقيد بسنوات الدفع,Sale On Payment Plan,2020-10-28,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,467,وادي الصفا 5,Wadi Al Safa 5,سكاي كورتس B,Skycourts Tower B,794.0,سكاي كورتس,SKY COURTS,Residential Complex,ريزيدينتشل كموبليكس,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,غرفة,1 B/R,1,78.29,844925.0,10792.26,,,1.0,1.0,0.0
202571,1-460-2022-52,460,بيع مقيد بسنوات الدفع,Sale On Payment Plan,2022-09-06,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,232,مردف,Mirdif,نسايم افنيو 3,NASAYEM AVENUE 3,1834.0,تلال مردف - نسايم افنيو,MIRDIF HILLS- NASAYEM AVENUE,,,مطار دبي الدولي,Dubai International Airport,محطة مترو الراشدية,Rashidiya Metro Station,سيتي سنتر مردف,City Centre Mirdif,أربع غرف,4 B/R,1,300.57,3568874.0,11873.69,,,1.0,2.0,0.0


In [39]:
# Checking the different "reg_type_en" values where procedure_name_en is "Sale On Payment Plan"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['procedure_name_en'] == 'Sale On Payment Plan']['reg_type_en'].value_counts(dropna=False)

reg_type_en
Existing Properties    1158
Name: count, dtype: int64

After inspecting the different procedure types in the transactions dataset, I found that the “Sell - Pre registration” procedure is exclusively associated with off-plan properties, while the “Sell” procedure pertains to existing properties. The other procedures inspected (“Delayed Sell”, “Sell Development”, “Lease to Own Registration”, and “Sale On Payment Plan”) also relate exclusively to existing properties but involve specific terms.

To streamline the dataset and maintain focus on key predictors of property prices, I decided to drop the procedure-related columns (`procedure_id`, `procedure_name_en`, and `procedure_name_ar`) rather than filter on specific transaction subtypes like **“Sell”** or **“Sell - Pre registration”**. For the procedure types inspected, the off-plan versus existing distinction is already captured by the `reg_type` columns, so the procedure columns add detail that is unlikely to matter much for property valuation. This gives a cleaner, more concise dataset for modeling and forecasting.
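The per-procedure checks above can also be run in a single pass by cross-tabulating `procedure_name_en` against `reg_type_en`, which covers the rarer procedure types that were not inspected individually. This is a minimal sketch assuming the same column names; the helpers `reg_type_by_procedure` and `mixed_procedures` and the toy rows are hypothetical and should be run before the procedure columns are dropped.

```python
import pandas as pd


def reg_type_by_procedure(df: pd.DataFrame) -> pd.DataFrame:
    """Count transactions for every (procedure, registration type) pair."""
    # Replace missing values with an explicit label so they show up in the table
    procedure = df["procedure_name_en"].fillna("Missing")
    reg_type = df["reg_type_en"].fillna("Missing")
    return pd.crosstab(procedure, reg_type)


def mixed_procedures(table: pd.DataFrame) -> list:
    """List procedures that occur under more than one registration type."""
    return table.index[(table > 0).sum(axis=1) > 1].tolist()


# Toy rows for illustration only; in the notebook, pass transactions_residential_sales_5y.
toy = pd.DataFrame(
    {
        "procedure_name_en": ["Sell - Pre registration", "Sell - Pre registration", "Sell", "Delayed Sell"],
        "reg_type_en": ["Off-Plan Properties", "Off-Plan Properties", "Existing Properties", "Existing Properties"],
    }
)
table = reg_type_by_procedure(toy)
mixed = mixed_procedures(table)
print(table)
print("Procedures with more than one registration type:", mixed)
```

If the list of mixed procedures is empty on the real data, each procedure maps to a single registration type, which would support relying on the `reg_type` columns alone.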

In [40]:
# Shape of the dataset before dropping columns
print("Transactions dataset shape before dropping columns:", transactions_residential_sales_5y.shape)

# Dropping the procedure-related columns
transactions_residential_sales_5y = transactions_residential_sales_5y.drop(columns=['procedure_id', 'procedure_name_en', 'procedure_name_ar'])

# Shape of the dataset after dropping columns
print("Transactions dataset shape after dropping columns:", transactions_residential_sales_5y.shape)



Transactions dataset shape before dropping columns: (434199, 41)
Transactions dataset shape after dropping columns: (434199, 38)


In [41]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
24814,1-11-2023-8598,2023-03-23,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,390,برج خليفة,Burj Khalifa,ار بيه هايتس,RP HEIGHTS,1605.0,أر بي هايتس,RP HEIGHTS,Burj Khalifa,برج خليفة,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,85.43,1472000.0,17230.48,,,1.0,1.0,0.0
1288246,1-102-2024-83141,2024-10-08,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,بن غاطي فينيكس,Binghatti Phoenix,3009.0,بن غاطي فينيكس,Binghatti Phoenix,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,غرفة,1 B/R,1,71.06,900000.0,12665.35,,,1.0,1.0,0.0
93986,1-102-2021-7622,2021-05-31,4,فيلا,Villa,4.0,فيلا,Villa,0,على الخارطة,Off-Plan Properties,317,جميرا الاولى,Jumeirah First,,,2167.0,سور لامير,Sur La Mer,LA MER,لامير,برج خليفة,Burj Khalifa,محطة مترو أبراج الإمارات,Emirates Towers Metro Station,مول دبي,Dubai Mall,خمس غرف,5 B/R,0,755.87,13000000.0,17198.72,,,1.0,1.0,0.0
960333,1-102-2022-24883,2022-08-29,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,شوبا هارتلاند ون بارك افينيو,Sobha Hartland One Park Avenue,2166.0,شوبا هارتلاند ون بارك افينيو,Sobha Hartland One Park Avenue,,,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,ثلاث غرف,3 B/R,1,135.57,2451590.0,18083.57,,,1.0,1.0,0.0
720461,1-102-2024-79792,2024-10-02,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,350,الثنيه الخامسة,Al Thanyah Fifth,فيردي باي شوبا,Verde by Sobha,2583.0,فيردي باي شوبا,Verde by Sobha,Jumeirah Lakes Towers,ابراج بحيرات الجميرا,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو مارينا مول,Marina Mall Metro Station,مارينا مول,Marina Mall,غرفة,1 B/R,1,96.27,2051775.0,21312.71,,,1.0,2.0,0.0


In [42]:
# Checking the different Property Types columns in the transactions dataset
print("Unique values count in 'property_type_id' column is:")
print(transactions_residential_sales_5y['property_type_id'].value_counts(dropna=False))

print("\nUnique values count in 'property_type_en' column is:")
print(transactions_residential_sales_5y['property_type_en'].value_counts(dropna=False))

print("\nUnique values count in 'property_type_ar' column is:")
print(transactions_residential_sales_5y['property_type_ar'].value_counts(dropna=False))

Unique values count in 'property_type_id' column is:
property_type_id
3    348871
4     71204
1     13429
2       695
Name: count, dtype: int64

Unique values count in 'property_type_en' column is:
property_type_en
Unit        348871
Villa        71204
Land         13429
Building       695
Name: count, dtype: int64

Unique values count in 'property_type_ar' column is:
property_type_ar
وحدة    348871
فيلا     71204
أرض      13429
مبنى       695
Name: count, dtype: int64


The distribution of property types in the dataset reveals a clear dominance of residential properties (Units and Villas):

1. **Units**: 348,871 entries, which represent the majority of transactions.

2. **Villas**: 71,204 entries, also a significant portion but much smaller than Units.

3. **Land**: 13,429 entries, which are relatively few compared to residential properties.

4. **Buildings**: 695 entries, representing a very small fraction of the dataset.

This shows that the dataset is heavily skewed towards residential transactions, which aligns well with our focus on **Units** and **Villas** for real estate analysis and forecasting. We can safely filter out the **Land** and **Building** property types, as they are not the main focus of our project. This will reduce dataset size and make our analysis more focused on the types of properties investors are most interested in.
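To put the skew into numbers, the counts printed above can be turned into percentage shares. This is only a sketch: the counts are copied by hand from the output, and the names `type_counts`, `type_share_pct`, and `kept_rows` are illustrative rather than part of the notebook.

```python
import pandas as pd

# Counts copied from the value_counts() output above
type_counts = pd.Series(
    {"Unit": 348871, "Villa": 71204, "Land": 13429, "Building": 695},
    name="count",
)

# Share of each property type as a percentage of all rows
type_share_pct = (type_counts / type_counts.sum() * 100).round(2)
print(type_share_pct)

# Number of rows that remain if only "Unit" and "Villa" are kept
kept_rows = int(type_counts[["Unit", "Villa"]].sum())
print("Rows kept:", kept_rows)
```

On the real DataFrame, `value_counts(normalize=True)` would give the same shares directly.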

In [43]:
# Shape of the dataset before filtering
print("Transactions dataset shape before filtering:", transactions_residential_sales_5y.shape)

# Filtering transactions dataset to include only "Unit" & "Villa"
transactions_residential_sales_5y = transactions_residential_sales_5y[
    transactions_residential_sales_5y['property_type_en'].isin(['Unit', 'Villa'])
]

# Shape of the dataset after filtering
print("Transactions dataset shape after filtering:", transactions_residential_sales_5y.shape)

Transactions dataset shape before filtering: (434199, 38)
Transactions dataset shape after filtering: (420075, 38)


In [44]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
104251,1-102-2024-55290,2024-07-29,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,330,مرسى دبي,Marsa Dubai,ايتيرنيتاس تاور,AETERNITAS TOWER,2797.0,ايتيرنيتاس تاور,Aeternitas Tower,Dubai Marina,دبي مارينا,برج العرب,Burj Al Arab,ابراج مارينا,Marina Towers,مارينا مول,Marina Mall,غرفتين,2 B/R,1,153.72,3210000.0,20882.12,,,1.0,2.0,0.0
396189,1-110-2022-584,2022-12-20,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,410,نخلة جميرا,Palm Jumeirah,الدباس,Al Dabas,,,,Palm Jumeirah,نخلة جميرا,برج العرب,Burj Al Arab,نخلة جميرا,Palm Jumeirah,مارينا مول,Marina Mall,غرفة,1 B/R,1,66.91,2300000.0,34374.53,,,2.0,2.0,2.0
519372,1-11-2023-29859,2023-09-19,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,390,برج خليفة,Burj Khalifa,العنوان رزيدنسز دبي اوبرا T1,The Address Residences Dubai Opera T1,1695.0,العنوان أوبرا دبي,THE ADDRESS DUBAI OPERA,Burj Khalifa,برج خليفة,برج خليفة,Burj Khalifa,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,ثلاث غرف,3 B/R,1,145.27,8200000.0,56446.62,,,1.0,1.0,0.0
475771,1-102-2022-33373,2022-10-25,3,وحدة,Unit,60.0,شقه سكنيه,Flat,0,على الخارطة,Off-Plan Properties,330,مرسى دبي,Marsa Dubai,شورز مارينا,MARINA SHORES,2425.0,مارينا شورز,MARINA SHORES,Dubai Marina,دبي مارينا,برج العرب,Burj Al Arab,مرسى دبي,Dubai Marina,مارينا مول,Marina Mall,غرفة,1 B/R,1,69.61,1543888.0,22179.11,,,1.0,1.0,0.0
132868,1-11-2021-17319,2021-09-30,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,,,1283.0,الفرجان,AL FURJAN,Al Furjan,الفرجان,موقع إكسبو 2020,Expo 2020 Site,محطة مترو ابن بطوطة,Ibn Battuta Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,,,0,224.0,2650000.0,11830.36,,,1.0,2.0,0.0


In [45]:
# Checking the different Property Sub Types columns in the transactions dataset
print("Unique values count in 'property_sub_type_id' column is:")
print(transactions_residential_sales_5y['property_sub_type_id'].value_counts(dropna=False))

print("\nUnique values count in 'property_sub_type_en' column is:")
print(transactions_residential_sales_5y['property_sub_type_en'].value_counts(dropna=False))

print("\nUnique values count in 'property_sub_type_ar' column is:")
print(transactions_residential_sales_5y['property_sub_type_ar'].value_counts(dropna=False))

Unique values count in 'property_sub_type_id' column is:
property_sub_type_id
60.0    348750
4.0      53589
NaN      17615
75.0       121
Name: count, dtype: int64

Unique values count in 'property_sub_type_en' column is:
property_sub_type_en
Flat                  348750
Villa                  53589
NaN                    17615
Stacked Townhouses       121
Name: count, dtype: int64

Unique values count in 'property_sub_type_ar' column is:
property_sub_type_ar
شقه سكنيه        348750
فيلا              53589
NaN               17615
منازل متلاصقة       121
Name: count, dtype: int64


The distribution of the property subtypes gives us more detailed insights into the types of residential properties:

1. **Flats**: There are 348,750 entries labeled as “Flat,” which dominate the dataset, making them the most common property subtype.

2. **Villas**: 53,589 entries are labeled as “Villa,” representing a significant portion of the data but still much smaller compared to flats.

3. **Missing Values (NaN)**: There are 17,615 missing values in this column, which will need to be addressed during cleaning.

4. **Stacked Townhouses**: Only 121 entries are marked as “Stacked Townhouses,” making them a very rare property subtype.

What stands out is that the dataset is overwhelmingly focused on flats, followed by villas. The stacked townhouses subtype is so rare that it might not provide meaningful insights for modeling, and the missing values (NaN) in this column may indicate the need to either impute or drop these entries. This also suggests that the `property_sub_type` column may not be as critical to the overall analysis, especially given the high prevalence of flats and villas.
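The missing values only show up in the counts above because `value_counts` was called with `dropna=False`. A small sketch on made-up values (the Series `sub_types` is hypothetical, not taken from the dataset) illustrates the difference:

```python
import numpy as np
import pandas as pd

# Toy values for illustration only
sub_types = pd.Series(["Flat", "Flat", "Villa", np.nan, np.nan])

# Default behaviour: rows with NaN are left out of the counts
default_counts = sub_types.value_counts()

# With dropna=False, NaN gets its own row in the result
full_counts = sub_types.value_counts(dropna=False)

# Direct count of missing entries
missing_rows = int(sub_types.isna().sum())

print(default_counts)
print(full_counts)
print("Missing rows:", missing_rows)
```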

In [46]:
# Inspecting a random sample where the property sub type is null
transactions_residential_sales_5y[transactions_residential_sales_5y['property_sub_type_en'].isnull()].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
182051,1-110-2022-333,2022-08-17,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,298,الورقاء الثالثه,Al Warqa Third,,,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو الراشدية,Rashidiya Metro Station,سيتي سنتر مردف,City Centre Mirdif,,,0,1264.78,4950000.0,3913.72,,,2.0,2.0,4.0
53037,1-11-2021-805,2021-01-18,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,467,وادي الصفا 5,Wadi Al Safa 5,,,,,,The Villa,ذا فيلا,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,,,0,582.59,3400000.0,5836.01,,,1.0,1.0,0.0
74469,1-11-2021-20809,2021-11-22,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,523,الحبية الثالثة,Al Hebiah Third,,,1396.0,دماك هيلز - روك وود,DAMAC HILLS - ROCKWOOD,DAMAC HILLS,داماك هيليز,موتور سيتي,Motor City,,,,,,,0,266.36,2150000.0,8071.78,,,1.0,1.0,0.0
1096893,1-11-2021-5745,2021-04-09,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,,,,,,Jumeirah Islands,جزر جميرا,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,أبراج بحيرات جميرا,Jumeirah Lakes Towers,مارينا مول,Marina Mall,,,0,988.18,3225000.0,3263.58,,,1.0,2.0,0.0
401412,1-41-2019-4754,2019-11-19,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,412,المركاض,Al Merkadh,,,1496.0,محمد بن راشد ال مكتوم ستي ديسترك ون فيس 2 فلل,MOHAMMED BIN RASHID AL MAKTOUM CITY-DISTRICT O...,,,وسط مدينة دبي,Downtown Dubai,محطة مترو الخليج التجاري,Business Bay Metro Station,مول دبي,Dubai Mall,,,0,745.11,15777500.0,21174.73,,,1.0,1.0,0.0


In [47]:
# Inspecting the property type where the property sub type is null
transactions_residential_sales_5y[transactions_residential_sales_5y['property_sub_type_en'].isnull()]['property_type_en'].value_counts()

property_type_en
Villa    17615
Name: count, dtype: int64

In [48]:
# Inspecting a random sample where the property sub type is "Stacked Townhouses"
transactions_residential_sales_5y[transactions_residential_sales_5y['property_sub_type_en'] == 'Stacked Townhouses'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
322843,1-41-2020-5530,2020-09-22,3,وحدة,Unit,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 32,The Pulse Townhouses Cluster 32,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,170.91,725000.0,4242.0,,,1.0,1.0,0.0
446870,1-11-2023-31581,2023-10-03,3,وحدة,Unit,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 20,The Pulse Townhouses Cluster 20,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,ثلاث غرف,3 B/R,1,302.44,1700000.0,5620.95,,,1.0,1.0,0.0
778450,1-41-2020-6628,2020-11-09,3,وحدة,Unit,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 35,The Pulse Townhouses Cluster 35,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,162.2,720000.0,4438.96,,,2.0,1.0,0.0
343377,1-11-2023-36931,2023-11-13,3,وحدة,Unit,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 7,The Pulse Townhouses Cluster 7,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,ثلاث غرف,3 B/R,1,332.17,1600000.0,4816.81,,,3.0,1.0,0.0
75617,1-110-2022-260,2022-07-06,3,وحدة,Unit,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 39,The Pulse Townhouses Cluster 39,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,ثلاث غرف,3 B/R,1,332.48,1406000.0,4228.83,,,2.0,2.0,2.0


In [49]:
# Inspecting the property type unique values count where the property sub type is "Stacked Townhouses"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['property_sub_type_en'] == 'Stacked Townhouses']['property_type_en'].value_counts()

property_type_en
Unit    121
Name: count, dtype: int64

During my data inspection, I discovered that the stacked townhouses were registered under the property type “Unit.” However, given the nature of stacked townhouses, categorizing them as “Villas” would be more appropriate for our analysis.

To ensure consistency and accuracy, I will modify their property type to “Villa” before proceeding with the rest of the analysis. This adjustment aligns with the actual characteristics of the properties and helps improve the quality of our dataset for further modeling.
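One way to sketch this kind of reclassification is to build the boolean condition once and reuse it for the id, English, and Arabic columns, so the three labels cannot drift apart. The toy DataFrame and the names `toy`, `is_stacked_unit`, and `new_labels` below are assumptions made for illustration only.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "property_type_id": [3, 3, 4],
    "property_type_en": ["Unit", "Unit", "Villa"],
    "property_type_ar": ["وحدة", "وحدة", "فيلا"],
    "property_sub_type_en": ["Flat", "Stacked Townhouses", "Villa"],
})

# Build the condition once, before any column is modified
is_stacked_unit = (
    (toy["property_sub_type_en"] == "Stacked Townhouses")
    & (toy["property_type_en"] == "Unit")
)

# Apply the same mask to all three label columns
new_labels = {
    "property_type_id": 4,
    "property_type_en": "Villa",
    "property_type_ar": "فيلا",
}
for column, value in new_labels.items():
    toy.loc[is_stacked_unit, column] = value

print(toy)
```

Because the mask is computed before the first assignment, it does not matter in which order the three columns are updated.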

In [50]:
# Property type counts before reclassifying stacked townhouses (baseline for comparison)
print(transactions_residential_sales_5y['property_type_id'].value_counts(dropna=False))
print(transactions_residential_sales_5y['property_type_en'].value_counts(dropna=False))
print(transactions_residential_sales_5y['property_type_ar'].value_counts(dropna=False))

property_type_id
3    348871
4     71204
Name: count, dtype: int64
property_type_en
Unit     348871
Villa     71204
Name: count, dtype: int64
property_type_ar
وحدة    348871
فيلا     71204
Name: count, dtype: int64


In [51]:
# Converting "Unit" to "Villa" in the property type column where the property sub type is "Stacked Townhouses"
transactions_residential_sales_5y.loc[
    (transactions_residential_sales_5y['property_sub_type_en'] == 'Stacked Townhouses') & 
    (transactions_residential_sales_5y['property_type_en'] == 'Unit'), 'property_type_id'] = 4

transactions_residential_sales_5y.loc[
    (transactions_residential_sales_5y['property_sub_type_en'] == 'Stacked Townhouses') & 
    (transactions_residential_sales_5y['property_type_en'] == 'Unit'), 'property_type_en'] = 'Villa'

transactions_residential_sales_5y.loc[
    (transactions_residential_sales_5y['property_sub_type_en'] == 'Stacked Townhouses') & 
    (transactions_residential_sales_5y['property_type_ar'] == 'وحدة'), 'property_type_ar'] = 'فيلا'

# Displaying a random sample of "Stacked Townhouses" rows to verify the updated property type
transactions_residential_sales_5y[
    transactions_residential_sales_5y['property_sub_type_en'] == 'Stacked Townhouses'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
757859,1-41-2021-7122,2021-07-13,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 20,The Pulse Townhouses Cluster 20,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,167.73,1037000.0,6182.56,,,1.0,1.0,0.0
178515,1-11-2023-5731,2023-02-28,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 36,The Pulse Townhouses Cluster 36,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,169.12,1150000.0,6799.91,,,1.0,1.0,0.0
757853,1-41-2021-1258,2021-02-08,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 14,The Pulse Townhouses Cluster 14,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,ثلاث غرف,3 B/R,1,332.17,1075000.0,3236.29,,,4.0,2.0,0.0
1254938,1-41-2020-7388,2020-12-14,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 8,The Pulse Townhouses Cluster 8,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,ثلاث غرف,3 B/R,1,332.48,1024739.0,3082.11,,,4.0,2.0,0.0
882076,1-41-2020-6665,2020-11-09,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 35,The Pulse Townhouses Cluster 35,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,170.27,850000.0,4992.07,,,1.0,1.0,0.0


In [52]:
# Counting transactions whose project name contains "townhouse" (case-insensitive)
transactions_residential_sales_5y[
    transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)
].shape

(3164, 38)

In [53]:
# Displaying a random sample of transactions in townhouse projects
transactions_residential_sales_5y[
    transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)
].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1073257,1-11-2024-20906,2024-06-10,4,فيلا,Villa,4.0,فيلا,Villa,1,العقارات القائمة,Existing Properties,507,اليلايس 2,Al Yelayiss 2,,,1823.0,نور تاون هاوس,NOOR TOWNHOUSES,TOWN SQUARE,تاون سكوير,,,,,,,ثلاث غرف,3 B/R,0,190.77,2100000.0,11008.02,,,1.0,2.0,0.0
1005349,1-11-2024-11035,2024-03-26,4,فيلا,Villa,4.0,فيلا,Villa,1,العقارات القائمة,Existing Properties,507,اليلايس 2,Al Yelayiss 2,,,2289.0,ريم تاون هاوس,REEM TOWNHOUSES,TOWN SQUARE,تاون سكوير,,,,,,,ثلاث غرف,3 B/R,0,187.92,2365000.0,12585.14,,,1.0,2.0,0.0
385426,1-102-2023-18943,2023-04-19,4,فيلا,Villa,4.0,فيلا,Villa,0,على الخارطة,Off-Plan Properties,507,اليلايس 2,Al Yelayiss 2,,,2289.0,ريم تاون هاوس,REEM TOWNHOUSES,TOWN SQUARE,تاون سكوير,,,,,,,أربع غرف,4 B/R,0,255.32,2340000.0,9164.97,,,1.0,1.0,0.0
1297996,1-11-2020-10553,2020-11-11,4,فيلا,Villa,,,,1,العقارات القائمة,Existing Properties,531,الحبيه السادسة,Al Hebiah Sixth,,,1714.0,أرابيلا 2 - تاون هاوسز في مدن,ARABELLA 2 - TOWNHOUSES AT MUDON,Mudon,مدن,دورة دبي للدراجات,Dubai Cycling Course,,,,,,,0,268.4,1700000.0,6333.83,,,1.0,2.0,0.0
426926,1-11-2020-5899,2020-08-04,4,فيلا,Villa,4.0,فيلا,Villa,1,العقارات القائمة,Existing Properties,507,اليلايس 2,Al Yelayiss 2,,,1823.0,نور تاون هاوس,NOOR TOWNHOUSES,TOWN SQUARE,تاون سكوير,,,,,,,ثلاث غرف,3 B/R,0,186.87,1150000.0,6154.01,,,1.0,1.0,0.0


In [54]:
# Property type counts for transactions in townhouse projects
transactions_residential_sales_5y[
    transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)
]['property_type_en'].value_counts()

property_type_en
Villa    3157
Unit        7
Name: count, dtype: int64

In [55]:
# Listing townhouse-project transactions that are still registered as "Unit"
transactions_residential_sales_5y[
    (transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)) &
    (transactions_residential_sales_5y['property_type_en'] == 'Unit')
]

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
75618,1-41-2023-8139,2023-05-01,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 23,The Pulse Townhouses Cluster 23,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,166.47,1050000.0,6307.44,,,1.0,1.0,0.0
96258,1-11-2023-29981,2023-09-20,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 29,The Pulse Townhouses Cluster 29,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,ثلاث غرف,3 B/R,1,302.24,1600000.0,5293.81,,,1.0,1.0,0.0
364225,1-11-2023-42940,2023-12-26,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 29,The Pulse Townhouses Cluster 29,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,ثلاث غرف,3 B/R,1,300.8,1775000.0,5900.93,,,1.0,2.0,0.0
426069,1-41-2021-12796,2021-11-25,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 9,The Pulse Townhouses Cluster 9,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,162.64,1026060.0,6308.78,,,1.0,1.0,0.0
550383,1-41-2021-1231,2021-02-09,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 29,The Pulse Townhouses Cluster 29,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,160.36,957418.0,5970.43,,,1.0,1.0,0.0
612666,1-41-2021-12845,2021-11-24,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 29,The Pulse Townhouses Cluster 29,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,158.74,992377.0,6251.59,,,1.0,1.0,0.0
840665,1-41-2021-13381,2021-12-08,3,وحدة,Unit,60.0,شقه سكنيه,Flat,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 23,The Pulse Townhouses Cluster 23,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,166.47,1050032.0,6307.64,,,1.0,1.0,0.0


In [56]:
# Converting "Unit" to "Villa" in the property type columns where the project name contains "townhouse"
transactions_residential_sales_5y.loc[
    (transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)) & 
    (transactions_residential_sales_5y['property_type_en'] == 'Unit'), 'property_type_id'] = 4

transactions_residential_sales_5y.loc[
    (transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)) &
    (transactions_residential_sales_5y['property_type_en'] == 'Unit'), 'property_type_en'] = 'Villa'

transactions_residential_sales_5y.loc[
    (transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)) & 
    (transactions_residential_sales_5y['property_type_ar'] == 'وحدة'), 'property_type_ar'] = 'فيلا'

# Displaying a random sample of "Stacked Townhouses" rows to confirm they remain labeled as "Villa"
transactions_residential_sales_5y[
    transactions_residential_sales_5y['property_sub_type_en'] == 'Stacked Townhouses'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
840668,1-11-2023-26550,2023-08-22,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 34,The Pulse Townhouses Cluster 34,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,162.45,1350000.0,8310.25,,,1.0,1.0,0.0
529621,1-11-2022-12692,2022-06-07,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 10,The Pulse Townhouses Cluster 10,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,أربع غرف,4 B/R,1,359.86,1750000.0,4863.0,,,1.0,2.0,0.0
219577,1-11-2022-60,2022-01-04,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 1,The Pulse Townhouses Cluster 1,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,169.67,1000114.0,5894.47,,,1.0,2.0,0.0
322843,1-41-2020-5530,2020-09-22,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 32,The Pulse Townhouses Cluster 32,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,170.91,725000.0,4242.0,,,1.0,1.0,0.0
219569,1-41-2024-14594,2024-07-18,4,فيلا,Villa,75.0,منازل متلاصقة,Stacked Townhouses,1,العقارات القائمة,Existing Properties,462,مدينة المطار,Madinat Al Mataar,النبض المنازل مجموعه 25,The Pulse Townhouses Cluster 25,1804.0,النبض المنازل,THE PULSE TOWNHOUSES,Dubai South Residential District,المدينة السكنية بدبي الجنوب,موقع إكسبو 2020,Expo 2020 Site,,,,,غرفتين,2 B/R,1,203.31,1133100.0,5573.26,,,1.0,1.0,0.0


In [57]:
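# Confirming that no townhouse-project transactions remain registered as "Unit" (expects an empty result)
transactions_residential_sales_5y[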
    (transactions_residential_sales_5y['project_name_en'].str.lower().str.contains("townhouse", na=False)) &
    (transactions_residential_sales_5y['property_type_en'] == 'Unit')
]

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3


Now that I’ve updated the property type from “Unit” to “Villa” for stacked townhouses and for the remaining units in townhouse projects, the property subtype columns no longer provide meaningful additional information. They mostly overlap with the property type column, and the variation is minimal. To simplify our analysis and avoid redundancy, I believe it’s best to drop the property subtype columns moving forward.
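Before dropping a column on the grounds that it adds little, it can help to measure how much it actually varies within each property type. The sketch below does this on a toy DataFrame; the rows and the names `toy`, `subtype_share`, and `dominant_share` are hypothetical and only illustrate the check.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "property_type_en": ["Unit", "Unit", "Unit", "Villa", "Villa", "Villa", "Villa"],
    "property_sub_type_en": [
        "Flat", "Flat", "Flat",
        "Villa", "Villa", np.nan, "Stacked Townhouses",
    ],
})

# Row-normalised cross-tabulation: share of each subtype within each type.
# Missing subtypes are labeled explicitly so they are not silently dropped.
subtype_share = pd.crosstab(
    toy["property_type_en"],
    toy["property_sub_type_en"].fillna("Missing"),
    normalize="index",
)

# Share held by the most common subtype within each property type
dominant_share = subtype_share.max(axis=1)

print(subtype_share)
print(dominant_share)
```

A dominant share close to 1 for every property type would support treating the subtype as largely redundant.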

In [58]:
# Dropping property sub type columns from the transactions dataset
transactions_residential_sales_5y = transactions_residential_sales_5y.drop(
    columns=['property_sub_type_id', 'property_sub_type_en', 'property_sub_type_ar']
    )

# Displaying the shape of the dataset after dropping columns
print("Transactions dataset shape after dropping columns:", transactions_residential_sales_5y.shape)

Transactions dataset shape after dropping columns: (420075, 35)


In [59]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
997809,1-102-2024-15826,2024-03-13,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,رايز ريزدنسز,Rise Residences,2731.0,مساكن الارتفاع,Rise Residences,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,مدينة دبي للإنترنت,Dubai Internet City,مول الإمارات,Mall of the Emirates,غرفة,1 B/R,1,65.9,771645.0,11709.33,,,1.0,1.0,0.0
779166,1-102-2024-5266,2024-01-29,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,523,الحبية الثالثة,Al Hebiah Third,داماك هيلز - جولف جرينز 2 - تاور اي,DAMAC HILLS - GOLF GREENS 2 -TOWER A,2780.0,داماك هيلز - جولف جرينز 2,DAMAC HILLS - GOLF GREENS 2,DAMAC HILLS,داماك هيليز,موتور سيتي,Motor City,,,,,غرفة,1 B/R,1,90.9,1468000.0,16149.61,,,1.0,2.0,0.0
949194,1-102-2022-24105,2022-08-23,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,عزيزي ريفييرا 15\t,Azizi Riviera 15,1975.0,عزيزي ريفييرا 15,Azizi Riviera 15,,,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,50.93,727390.0,14282.15,,,1.0,1.0,0.0
448416,1-11-2024-4423,2024-02-07,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,463,وادي الصفا 7,Wadi Al Safa 7,,,2217.0,روكان 3,Rukan 3,Rukan,ركان,,,,,,,غرفة,1 B/R,0,59.44,780000.0,13122.48,,,1.0,1.0,0.0
152592,1-11-2021-3761,2021-03-09,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,,,1283.0,الفرجان,AL FURJAN,Al Furjan,الفرجان,موقع إكسبو 2020,Expo 2020 Site,محطة مترو ابن بطوطة,Ibn Battuta Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,,,0,600.0,3100000.0,5166.67,,,1.0,2.0,0.0


In [60]:
# Inspecting the Registration Type columns in the transactions dataset
print("Unique values in 'reg_type_id' column is:")
print(transactions_residential_sales_5y['reg_type_id'].value_counts())

print("\nUnique values in 'reg_type_en' column is:")
print(transactions_residential_sales_5y['reg_type_en'].value_counts())

print("\nUnique values in 'reg_type_ar' column is:")
print(transactions_residential_sales_5y['reg_type_ar'].value_counts())

Unique values in 'reg_type_id' column is:
reg_type_id
0    247874
1    172201
Name: count, dtype: int64

Unique values in 'reg_type_en' column is:
reg_type_en
Off-Plan Properties    247874
Existing Properties    172201
Name: count, dtype: int64

Unique values in 'reg_type_ar' column is:
reg_type_ar
على الخارطة         247874
العقارات القائمة    172201
Name: count, dtype: int64


Based on the unique value counts of the registration columns, we can observe that the dataset is divided into two main categories:

- **Off-Plan Properties (247,874 entries)**: These represent properties that are still under development or not yet completed.

- **Existing Properties (172,201 entries)**: These are properties that are already constructed and available for sale or rent.

Off-plan properties account for roughly 59% of the entries and existing properties for about 41%, so a majority of the dataset covers upcoming or under-construction projects. Both categories are well represented, which is useful for analyzing trends and forecasting in both the current real estate market and future developments.
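The three registration columns carry the same information as an id, an English label, and an Arabic label, which the matching counts above suggest. A more direct consistency check could look like the following sketch, where the toy rows and the names `toy`, `label_map`, and `labels_consistent` are assumptions for illustration.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "reg_type_id": [0, 0, 1, 1, 0],
    "reg_type_en": [
        "Off-Plan Properties", "Off-Plan Properties",
        "Existing Properties", "Existing Properties",
        "Off-Plan Properties",
    ],
    "reg_type_ar": [
        "على الخارطة", "على الخارطة",
        "العقارات القائمة", "العقارات القائمة",
        "على الخارطة",
    ],
})

# Distinct (id, English, Arabic) combinations present in the data
label_map = toy[["reg_type_id", "reg_type_en", "reg_type_ar"]].drop_duplicates()

# If every id has exactly one English/Arabic label pair, the number of
# distinct combinations equals the number of distinct ids
labels_consistent = len(label_map) == toy["reg_type_id"].nunique()

print(label_map)
print("Labels consistent:", labels_consistent)
```

The same pattern would apply to the `property_type_*` columns inspected earlier.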

In [61]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
494365,1-41-2019-388,2019-02-12,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,,,1686.0,الفرجان 4,ALFURJAN PACKAGE 4,Al Furjan,الفرجان,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو ابن بطوطة,Ibn Battuta Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,,,0,618.46,5077000.0,8209.1,,,1.0,2.0,0.0
325809,1-11-2020-3247,2020-03-23,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,451,الحبيه الخامسة,Al Hebiah Fifth,الثمام 7,AL THAMAM 7,975.0,رمرام,REMRAAM,Remraam,رمرام,موتور سيتي,Motor City,,,,,غرفتين,2 B/R,1,93.73,570000.0,6081.3,,,1.0,1.0,0.0
50222,1-102-2020-13670,2020-10-21,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,478,جبل علي الصناعية الثانية,Jabal Ali Industrial Second,عزيزي أورا ريسيدنس,Azizi Aura Residences,1897.0,عزيزي أورا ريزيدنزس,Azizi Aura Residences,Down Town Jabal Ali,دون تاون جبل علي,موقع إكسبو 2020,Expo 2020 Site,محطة مترو الإمارات العربية المتحدة للصرافة,UAE Exchange Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,غرفة,1 B/R,1,59.73,700034.0,11719.97,,,1.0,1.0,0.0
230415,1-11-2020-2583,2020-03-04,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,409,البرشاء جنوب الثالثة,Al Barshaa South Third,سايان بارك 1,SYANN PARK 1,295.0,سين بارك 1,SYANN PARK1,Arjan,أرجان,موتور سيتي,Motor City,محطة مترو شرف دي جي,Sharaf Dg Metro Station,مول الإمارات,Mall of the Emirates,غرفة,1 B/R,1,76.08,478000.0,6282.86,,,1.0,1.0,0.0
1175343,1-102-2020-8659,2020-06-22,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,,,2157.0,غرين فيو,Greenview,Dubai World Central,دبي ورلد سنترال,مطار آل مكتوم الدولي,Al Makhtoum International Airport,,,,,ثلاث غرف,3 B/R,0,141.04,1117888.0,7926.04,,,1.0,1.0,0.0


In [62]:
# Inspecting the Area columns in the transactions dataset
print("Unique values in 'area_id' column is:")
print(transactions_residential_sales_5y['area_id'].value_counts())

print("\nUnique values in 'area_name_en' column is:")
print(transactions_residential_sales_5y['area_name_en'].value_counts())

print("\nUnique values in 'area_name_ar' column is:")
print(transactions_residential_sales_5y['area_name_ar'].value_counts())

Unique values in 'area_id' column is:
area_id
441    39458
330    31972
526    31454
412    25550
482    20160
467    18566
390    17532
350    15646
447    15156
409    13349
462    13338
445    11914
435    11549
343     9366
507     9332
410     9197
485     6673
333     6612
523     6044
484     5925
463     5726
465     5623
469     5605
364     5505
442     4708
334     4548
351     3799
414     3649
451     3457
370     3418
444     3371
464     3261
405     3241
352     3197
531     2836
317     2611
335     2547
376     2434
458     2232
332     2203
434     2180
505     1998
331     1984
366     1965
506     1956
452     1927
459     1602
483     1602
466     1474
232     1466
371     1074
478     1073
432     1039
312      840
348      830
527      403
443      401
453      386
307      269
437      265
266      214
300      211
375      189
329      156
230      126
276      110
374      109
233      106
522      103
382       94
303       82
368       72
264       69
393  

From the inspection of the area columns, we observe the following:

1. **Area Variety**: The dataset covers a wide variety of areas, with over 100 unique area IDs and names. The most frequent areas include:

	- **Al Barsha South Fourth** (39,458 entries)

	- **Marsa Dubai** (31,972 entries)

	- **Business Bay** (31,454 entries)

	- **Al Merkadh** (25,550 entries)

These high frequencies suggest that these are popular regions for property transactions, especially in central or high-demand locations in Dubai.

2. **Top Areas in Transactions**: The high transaction counts for areas such as **Business Bay** and **Marsa Dubai** align with their reputation as well-known commercial and residential hubs. These areas likely have a significant influence on the market and may represent prime locations for investment.

3. **Tail-End Areas**: The areas with lower counts, such as **Al Waheda** or **Muhaisanah First**, suggest smaller or less active regions in terms of transactions. These regions may not have as much data to offer or may represent niche markets.

4. **Consistency Between English and Arabic Columns**: The values in the Arabic (`area_name_ar`) and English (`area_name_en`) columns appear to align well, with each Arabic name having a corresponding English counterpart. This will be useful for merging and referencing in further analysis, ensuring data consistency.
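The alignment between `area_id`, `area_name_en`, and `area_name_ar` noted in point 4 is based on visual inspection; it could also be checked programmatically. The following is a minimal sketch under stated assumptions: the helper `inconsistent_ids` and the small `areas` frame are hypothetical, with ids and names copied from rows displayed in this notebook.

```python
import pandas as pd

# Hypothetical sample; ids and names are copied from rows shown in this notebook.
areas = pd.DataFrame({
    "area_id": [441, 441, 330, 412, 409],
    "area_name_en": ["Al Barsha South Fourth", "Al Barsha South Fourth",
                     "Marsa Dubai", "Al Merkadh", "Al Barshaa South Third"],
    "area_name_ar": ["البرشاء جنوب الرابعة", "البرشاء جنوب الرابعة",
                     "مرسى دبي", "المركاض", "البرشاء جنوب الثالثة"],
})


def inconsistent_ids(df, id_col, name_cols):
    """Return ids that map to more than one value in any of the name columns."""
    per_id = df.groupby(id_col)[name_cols].nunique(dropna=False)
    return per_id[(per_id > 1).any(axis=1)]


bad = inconsistent_ids(areas, "area_id", ["area_name_en", "area_name_ar"])
print("ids with conflicting names:", len(bad))
```

An empty result would support treating `area_id` as the single key for area-level joins; any id returned would point to spelling variants that need harmonizing first.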

In [63]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
131419,1-102-2024-78365,2024-09-30,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,444,الحبيه الاولى,Al Hebiah First,شوبا أوربيس تاور إي,Sobha Orbis Tower E,3070.0,شوبا أوربيس,Sobha Orbis,Motor City,موتور ستي,موتور سيتي,Motor City,محطة مترو شرف دي جي,Sharaf Dg Metro Station,مول الإمارات,Mall of the Emirates,غرفة,1 B/R,1,65.17,1297738.0,19913.12,,,1.0,1.0,0.0
407343,1-102-2023-55423,2023-10-23,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,عزيزي ريفيرا 27,Azizi Riviera 27,2121.0,عزيزي ريفيرا 27,Azizi Riviera 27,,,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,استوديو,Studio,1,32.65,675200.0,20679.94,,,1.0,1.0,0.0
134542,1-102-2021-23625,2021-12-08,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,447,الخيران الأولى,Al Khairan First,ذي جراند,The Grand,1983.0,ذي جراند,The Grand,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,مول دبي,Dubai Mall,غرفتين,2 B/R,1,120.19,2183888.0,18170.3,,,1.0,1.0,0.0
567819,1-102-2024-57735,2024-08-05,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,452,الحبيه الثانية,Al Hebiah Second,عزيزي ميراج 1 - تاور 2,AZIZI Mirage 1 - TOWER 2,2034.0,عزيزي ميراج 1,Azizi Mirage 1,Dubai Studio City,مدينة دبي للستديو,موتور سيتي,Motor City,مدينة دبي للإنترنت,Dubai Internet City,,,غرفة,1 B/R,1,55.63,737600.0,13259.03,,,1.0,2.0,0.0
798376,1-102-2021-185,2021-01-05,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,,,2176.0,لاروزا المرحله الثانيه,La Rosa Phase 2,,,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,ثلاث غرف,3 B/R,0,173.97,1415000.0,8133.59,,,1.0,1.0,0.0


In [64]:
# Inspecting the Building columns in the transactions dataset
print("Unique values in 'building_name_en' column is:")
print(transactions_residential_sales_5y['building_name_en'].value_counts(dropna=False))

print("\nUnique values in 'building_name_ar' column is:")
print(transactions_residential_sales_5y['building_name_ar'].value_counts(dropna=False))

Unique values in 'building_name_en' column is:
building_name_en
NaN                                                                           71204
Seven City JLT                                                                 3261
Grande                                                                         1221
Regalia                                                                        1207
Bayz 101 By Danube                                                             1101
Peninsula Four                                                                 1095
THE EDGE                                                                       1071
Burj Royale                                                                    1017
Sobha Creek Vistas Reserve                                                      984
Peninsula Three                                                                 979
Sobha Hartland - Crest Grande                                                   967
Palace Resid

In [65]:
# Displaying the count of unique values in the building name columns
print("Unique values count in 'building_name_en' column is:")
print(transactions_residential_sales_5y['building_name_en'].nunique(dropna=False))

print("\nUnique values count in 'building_name_ar' column is:")
print(transactions_residential_sales_5y['building_name_ar'].nunique(dropna=False))

Unique values count in 'building_name_en' column is:
3305

Unique values count in 'building_name_ar' column is:
3302


From the inspection of the building columns, we observe the following:

1. **Missing Building Names**: The `building_name_en` column contains **71,204** missing values (the `NaN` count in the output above), while **63,072** are recorded for the `building_name_ar` column. I will investigate further to determine whether the missing building names are related to the inclusion of villas in the data.

2. **Top 5 Buildings in Transactions**: The buildings with the most transactions are:

	- **Seven City JLT** (3,261 entries)

	- **Grande** (1,221 entries)

	- **Regalia** (1,207 entries)

	- **Bayz 101 By Danube** (1,101 entries)

	- **Peninsula Four** (1,095 entries)

This likely indicates that these buildings are highly active in the real estate market, potentially driven by demand in specific areas, off-plan developments, or attractive investment opportunities.

3. **Tail-End Building Names**: Buildings with very few transactions may have limited market activity or niche appeal, possibly due to their location, type (e.g., stacked townhouses), or the fact that they cater to a small subset of buyers. Some of these buildings might also be newly launched or still under construction, meaning that their sales volume hasn’t picked up yet.

4. **Inconsistency Between English and Arabic Columns**: The Arabic (`building_name_ar`) and English (`building_name_en`) columns appear to be inconsistent: their missing-value counts and unique-value counts (3,302 vs. 3,305) differ, and some English building names appear in the Arabic column.
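The two kinds of inconsistency described in point 4 can be quantified rather than spotted by eye. The sketch below is an illustration only: the `buildings` frame is a hypothetical sample modelled on rows shown in this notebook, and detecting Arabic script through the Unicode range U+0600 to U+06FF is an assumed heuristic, not a method used by the notebook.

```python
import pandas as pd

# Hypothetical sample modelled on rows shown in this notebook.
buildings = pd.DataFrame({
    "building_name_en": ["BLOOM TOWERS B", "Azizi Riviera 27", None, "The Grand"],
    "building_name_ar": ["BLOOM TOWERS B", "عزيزي ريفيرا 27", None, None],
})

# Arabic-script characters fall in the Unicode block U+0600 to U+06FF.
has_arabic = buildings["building_name_ar"].str.contains(
    r"[\u0600-\u06FF]", regex=True, na=False
)

# Rows whose Arabic column is populated but contains no Arabic script.
latin_in_ar = buildings["building_name_ar"].notna() & ~has_arabic

# Rows where exactly one of the two columns is missing.
one_sided_missing = (
    buildings["building_name_en"].isna() ^ buildings["building_name_ar"].isna()
)

print("Latin-only values in building_name_ar:", int(latin_in_ar.sum()))
print("Rows missing in only one column:", int(one_sided_missing.sum()))
```

Applied to the full dataset, the second count would account for any gap between the missing-value totals of the two columns.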

In [66]:
# Let's inspect the observations where the building name is null
transactions_residential_sales_5y[transactions_residential_sales_5y['building_name_en'].isnull()].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
593649,1-102-2019-16595,2019-10-10,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,,,2102.0,بارك سايد,Parkside,Dubai World Central,دبي ورلد سنترال,مطار آل مكتوم الدولي,Al Makhtoum International Airport,,,,,ثلاث غرف,3 B/R,0,141.74,1068888.0,7541.19,,,1.0,1.0,0.0
405589,1-102-2022-9966,2022-04-13,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,,,2172.0,جرين فيو 2,Greenview 2,Dubai World Central,دبي ورلد سنترال,مطار آل مكتوم الدولي,Al Makhtoum International Airport,,,,,ثلاث غرف,3 B/R,0,157.48,1182888.0,7511.35,,,1.0,1.0,0.0
877925,1-45-2021-445,2021-06-09,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,458,مجمع دبي للاستثمار الاول,Dubai Investment Park First,,,1667.0,المجتمع الأخضر الغرب - التوسع - المرحلة الثالثة,GREEN COMMUNITY WEST- EXTENSION- PHASE III,Dubai Investment Park First,مجمع دبي للاستثمار الاول,موقع إكسبو 2020,Expo 2020 Site,محطة مترو الدانوب,DANUBE Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,ثلاث غرف,3 B/R,0,311.54,2992343.0,9605.0,,,1.0,1.0,0.0
165453,1-11-2023-5215,2023-02-23,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,,,,,,Jumeirah Islands,جزر جميرا,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,أبراج بحيرات جميرا,Jumeirah Lakes Towers,مارينا مول,Marina Mall,,,0,988.18,18500000.0,18721.29,,,1.0,1.0,0.0
291873,1-11-2022-6566,2022-03-30,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,351,الثنيه الثالثة,Al Thanyah Third,,,1062.0,البحيرات زلال,The Lakes Zulal,,,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,ثلاث غرف,3 B/R,0,364.13,4085000.0,11218.52,,,1.0,1.0,0.0


In [67]:
# Checking the property type where the building name is null
transactions_residential_sales_5y[transactions_residential_sales_5y['building_name_en'].isnull()]['property_type_en'].value_counts()

property_type_en
Villa    71204
Name: count, dtype: int64

In [68]:
# Checking the registration type where the building name is null
transactions_residential_sales_5y[transactions_residential_sales_5y['building_name_en'].isnull()]['reg_type_en'].value_counts()

reg_type_en
Off-Plan Properties    36973
Existing Properties    34231
Name: count, dtype: int64

In [69]:
# Checking the areas where the building name is null
transactions_residential_sales_5y[transactions_residential_sales_5y['building_name_en'].isnull()]['area_name_en'].value_counts()

area_name_en
Wadi Al Safa 5                       11717
Madinat Al Mataar                     7131
Al Yufrah 1                           5605
Wadi Al Safa 7                        4911
Hadaeq Sheikh Mohammed Bin Rashid     3479
Al Hebiah Fourth                      3459
Al Thanayah Fourth                    3197
Al Yelayiss 2                         3013
Al Hebiah Sixth                       2682
Wadi Al Safa 6                        2180
Wadi Al Safa 3                        2141
Al Yelayiss 1                         1956
Jabal Ali First                       1938
Al Thanyah Fifth                      1850
Al Barsha South Fourth                1555
Dubai Investment Park First           1505
Dubai Investment Park Second          1302
Al Hebiah Third                       1219
Nad Al Shiba First                    1202
Me'Aisem First                         975
Wadi Al Safa 2                         835
Palm Jumeirah                          787
Al Barsha South Fifth                  78

From the inspection of the missing building names, here’s what stands out:

1. **Property Type**: All missing building names correspond to properties that are classified as **villas**. This makes sense since standalone villas typically don’t have building names like apartment units. Therefore, the absence of building names in these cases is expected, and this insight confirms that it’s not a data quality issue but rather a characteristic of villa-type properties.

2. **Registration Type**: The missing building names are divided between **Off-Plan Properties** (**36,973** entries) and **Existing Properties** (**34,231** entries). This shows that missing building names affect both off-plan and existing villas, so the pattern is consistent across both registration types.

3. **Area**: Most of the missing building names are concentrated in areas dominated by villa developments, such as **Wadi Al Safa 5**, **Madinat Al Mataar**, and **Al Yufrah 1**, which are known for their villa projects. This further emphasizes that the missing building names are tied to the property type and location.

Given these insights, before removing the `building_name_en` and `building_name_ar` columns, I plan to use them to help impute missing values in the rooms columns. In cases where building names are available for villa properties, we can leverage known building characteristics, location, or developer patterns to infer the typical room configuration for similar villas. This step will enhance the dataset’s completeness, particularly in terms of property size and type, which are critical for our later analyses.

Once this imputation is complete, I will drop the building name columns to remove unnecessary complexity, allowing us to focus on more impactful area-level metrics for market trends and investment analysis. This streamlined approach ensures smoother data processing and model training without the need for continuous building name imputation or handling.
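One possible way to carry out the imputation described above is to fill each missing room value with the most frequent value observed in the same group. This is a sketch under stated assumptions, not the method used later in the notebook: the helper `impute_by_group_mode`, the toy `sample` frame, and the `rooms_en_imputed` column are all hypothetical, and grouping by building name is only one candidate key.

```python
import pandas as pd


def impute_by_group_mode(df, target, group_col):
    """Fill missing `target` values with the most frequent value in each group.

    Rows whose group has no observed value, or whose group key is missing,
    are left as NaN. Ties are resolved by taking the first sorted mode.
    """
    known = df.dropna(subset=[group_col, target])
    group_mode = known.groupby(group_col)[target].agg(lambda s: s.mode().iloc[0])
    return df[target].fillna(df[group_col].map(group_mode))


# Hypothetical sample; only the column names come from the notebook.
sample = pd.DataFrame({
    "building_name_en": ["Tower A", "Tower A", "Tower A", "Tower B", None],
    "rooms_en": ["1 B/R", "1 B/R", None, None, None],
})
sample["rooms_en_imputed"] = impute_by_group_mode(
    sample, "rooms_en", "building_name_en"
)
print(sample)
```

Because rows without a usable group key stay missing, a second pass with a different key (for example a project-level column) would be needed wherever the building name itself is absent.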

In [70]:
# Inspecting the values in rooms columns
print("Unique values count in 'rooms_en' column is:")
print(transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False))

print("\nUnique values count in 'rooms_ar' column is:")
print(transactions_residential_sales_5y['rooms_ar'].value_counts(dropna=False))

Unique values count in 'rooms_en' column is:
rooms_en
1 B/R          149522
2 B/R           99527
Studio          69030
3 B/R           57485
4 B/R           23743
NaN             17957
5 B/R            2299
PENTHOUSE         240
6 B/R             173
Single Room        58
7 B/R              31
Shop                6
Office              2
9 B/R               1
8 B/R               1
Name: count, dtype: int64

Unique values count in 'rooms_ar' column is:
rooms_ar
غرفة           149522
غرفتين          99527
استوديو         69030
ثلاث غرف        57485
أربع غرف        23743
NaN             17957
خمس غرف          2299
بنتهاوس           240
ست غرف            173
غرفة مستقلة        58
سبع غرف            31
محل                 6
مكتب                2
تسع غرف             1
ثمان غرف            1
Name: count, dtype: int64


In [71]:
# Dropping observations where room is "Shop" or "Office"
transactions_residential_sales_5y = transactions_residential_sales_5y[
    ~transactions_residential_sales_5y['rooms_en'].isin(['Shop', 'Office'])
]

# Inspecting the values in rooms columns
print("Unique values count in 'rooms_en' column is:")
print(transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False))

print("\nUnique values count in 'rooms_ar' column is:")
print(transactions_residential_sales_5y['rooms_ar'].value_counts(dropna=False))

Unique values count in 'rooms_en' column is:
rooms_en
1 B/R          149522
2 B/R           99527
Studio          69030
3 B/R           57485
4 B/R           23743
NaN             17957
5 B/R            2299
PENTHOUSE         240
6 B/R             173
Single Room        58
7 B/R              31
9 B/R               1
8 B/R               1
Name: count, dtype: int64

Unique values count in 'rooms_ar' column is:
rooms_ar
غرفة           149522
غرفتين          99527
استوديو         69030
ثلاث غرف        57485
أربع غرف        23743
NaN             17957
خمس غرف          2299
بنتهاوس           240
ست غرف            173
غرفة مستقلة        58
سبع غرف            31
تسع غرف             1
ثمان غرف            1
Name: count, dtype: int64


In [72]:
# Displaying random observations where rooms is "Single Room"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'Single Room'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
378178,1-11-2019-6303,2019-07-09,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,جاي اند جي مبنى 2,J & G BUILDING 2,1448.0,ج أند جي بليكس,J&G PLEXS,DMCC-EZ2,مركز دبي للسلع المتعددة - المرحلة الثانية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,برج هاربور,Harbour Tower,مارينا مول,Marina Mall,غرفة مستقلة,Single Room,1,11.33,146345.0,12916.59,,,1.0,1.0,0.0
1311008,1-11-2023-13588,2023-05-08,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,جاي اند جي مبنى 2,J & G BUILDING 2,1448.0,ج أند جي بليكس,J&G PLEXS,DMCC-EZ2,مركز دبي للسلع المتعددة - المرحلة الثانية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,برج هاربور,Harbour Tower,مارينا مول,Marina Mall,غرفة مستقلة,Single Room,1,11.33,96997.0,8561.08,,,1.0,1.0,0.0
131006,1-11-2022-185,2022-01-07,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,جاي اند جي مبنى 2,J & G BUILDING 2,1448.0,ج أند جي بليكس,J&G PLEXS,DMCC-EZ2,مركز دبي للسلع المتعددة - المرحلة الثانية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,برج هاربور,Harbour Tower,مارينا مول,Marina Mall,غرفة مستقلة,Single Room,1,11.33,124483.0,10987.03,,,1.0,1.0,0.0
1020314,1-11-2023-13870,2023-05-09,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,جاي اند جي مبنى 2,J & G BUILDING 2,1448.0,ج أند جي بليكس,J&G PLEXS,DMCC-EZ2,مركز دبي للسلع المتعددة - المرحلة الثانية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,برج هاربور,Harbour Tower,مارينا مول,Marina Mall,غرفة مستقلة,Single Room,1,11.33,112500.0,9929.39,,,1.0,1.0,0.0
357728,1-11-2019-5299,2019-06-10,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,جاي اند جي مبنى 2,J & G BUILDING 2,1448.0,ج أند جي بليكس,J&G PLEXS,DMCC-EZ2,مركز دبي للسلع المتعددة - المرحلة الثانية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,برج هاربور,Harbour Tower,مارينا مول,Marina Mall,غرفة مستقلة,Single Room,1,11.33,95000.0,8384.82,,,1.0,1.0,0.0


In [73]:
# Inspecting the procedure_area statistics where rooms is "Single Room"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'Single Room']['procedure_area'].describe()

count    58.000000
mean     11.326897
std       0.016565
min      11.240000
25%      11.330000
50%      11.330000
75%      11.330000
max      11.330000
Name: procedure_area, dtype: float64

In [74]:
# Inspecting the area where rooms is "Single Room"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'Single Room']['area_name_en'].value_counts()

area_name_en
Jabal Ali First    58
Name: count, dtype: int64

Upon further inspection, I identified that all 58 properties labeled as **“Single Room”** are concentrated in the **“Jabal Ali First”** area, with nearly uniform sizes (between **11.24** and **11.33** square meters, averaging around **11.33**). This pattern strongly suggests that these entries are part of a labor camp setup rather than traditional residential properties, as they follow a uniform room size structure typical of worker accommodations.

Given the focus on standard residential properties for this analysis, I will remove these **“Single Room”** entries. This decision will streamline the dataset, ensuring that these outliers do not distort the insights on residential property trends and characteristics.
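The reasoning above (one area, near-identical sizes) can be turned into a reusable check across all room labels. The following sketch is illustrative: the `df` frame is a hypothetical sample, and the 1% coefficient-of-variation threshold is an arbitrary choice made for this example.

```python
import pandas as pd

# Hypothetical sample; only the column names come from the notebook.
df = pd.DataFrame({
    "rooms_en": ["Single Room"] * 4 + ["Studio"] * 4,
    "area_name_en": ["Jabal Ali First"] * 4
                    + ["Marsa Dubai", "Business Bay", "Al Merkadh", "Marsa Dubai"],
    "procedure_area": [11.33, 11.33, 11.24, 11.33, 35.2, 41.0, 38.7, 47.9],
})

# Per-label summary: size statistics and number of distinct areas.
profile = df.groupby("rooms_en").agg(
    n=("procedure_area", "size"),
    mean_area=("procedure_area", "mean"),
    std_area=("procedure_area", "std"),
    n_areas=("area_name_en", "nunique"),
)

# Coefficient of variation: spread of sizes relative to their mean.
profile["cv"] = profile["std_area"] / profile["mean_area"]

# Flag labels confined to one area with nearly identical sizes.
# The 1% threshold is an arbitrary illustrative choice.
suspicious = profile[(profile["n_areas"] == 1) & (profile["cv"] < 0.01)]
print(suspicious.index.tolist())
```

A label flagged this way still warrants manual review, as done above, before any rows are dropped.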

In [75]:
# Removing observations where rooms is "Single Room"
transactions_residential_sales_5y = transactions_residential_sales_5y[
    transactions_residential_sales_5y['rooms_en'] != 'Single Room'
]

# Inspecting the values in rooms columns
print("Unique values count in 'rooms_en' column is:")
print(transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False))

print("\nUnique values count in 'rooms_ar' column is:")
print(transactions_residential_sales_5y['rooms_ar'].value_counts(dropna=False))

Unique values count in 'rooms_en' column is:
rooms_en
1 B/R        149522
2 B/R         99527
Studio        69030
3 B/R         57485
4 B/R         23743
NaN           17957
5 B/R          2299
PENTHOUSE       240
6 B/R           173
7 B/R            31
9 B/R             1
8 B/R             1
Name: count, dtype: int64

Unique values count in 'rooms_ar' column is:
rooms_ar
غرفة        149522
غرفتين       99527
استوديو      69030
ثلاث غرف     57485
أربع غرف     23743
NaN          17957
خمس غرف       2299
بنتهاوس        240
ست غرف         173
سبع غرف         31
تسع غرف          1
ثمان غرف         1
Name: count, dtype: int64


In [76]:
# Displaying random observations where room is "PENTHOUSE"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'PENTHOUSE'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1219296,1-11-2022-26255,2022-10-26,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,390,برج خليفة,Burj Khalifa,لوفتس,LOFTS,,,,Burj Khalifa,برج خليفة,برج خليفة,Burj Khalifa,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,بنتهاوس,PENTHOUSE,1,53.12,880000.0,16566.27,,,1.0,1.0,0.0
701304,1-11-2019-8518,2019-09-23,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,410,نخلة جميرا,Palm Jumeirah,جولدن ميل 3,GOLDEN MILE 3,388.0,جولدن ميل,GOLDEN MILE,Palm Jumeirah,نخلة جميرا,برج العرب,Burj Al Arab,نخلة جميرا,Palm Jumeirah,مارينا مول,Marina Mall,بنتهاوس,PENTHOUSE,1,405.85,3100000.0,7638.29,,,1.0,1.0,0.0
929102,1-11-2021-11246,2021-07-01,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,برج كيان,CAYAN TOWER,130.0,برج كيان,CAYAN TOWER,Dubai Marina,دبي مارينا,برج العرب,Burj Al Arab,ابراج مارينا,Marina Towers,مارينا مول,Marina Mall,بنتهاوس,PENTHOUSE,1,236.65,3150000.0,13310.8,,,1.0,1.0,0.0
392133,1-11-2024-12867,2024-04-15,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,سييف 3,SEEF 3,,,,Jumeirah Lakes Towers,ابراج بحيرات الجميرا,برج العرب,Burj Al Arab,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,بنتهاوس,PENTHOUSE,1,517.75,6500000.0,12554.32,,,1.0,1.0,0.0
1074453,1-11-2023-18509,2023-06-13,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,امواج 4,Amwaj 4,,,,Jumeriah Beach Residence - JBR,جميرا بيتش ريزيدنس - الجيه بي آر,برج العرب,Burj Al Arab,مساكن شاطئ جميرا,Jumeirah Beach Residency,مارينا مول,Marina Mall,بنتهاوس,PENTHOUSE,1,524.33,5492330.0,10474.95,,,1.0,1.0,0.0


In [77]:
# Inspecting the procedure_area statistics where rooms is "PENTHOUSE"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'PENTHOUSE']['procedure_area'].describe()

count     240.000000
mean      360.493292
std       255.446089
min        37.680000
25%       194.492500
50%       316.240000
75%       453.902500
max      1967.500000
Name: procedure_area, dtype: float64

In [78]:
# Find the index of the minimum 'procedure_area' where 'rooms_en' is 'PENTHOUSE'
min_idx = transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'PENTHOUSE']['procedure_area'].idxmin()

# Use this index to filter and display the row
transactions_residential_sales_5y.loc[[min_idx]]

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1014830,1-11-2021-20781,2021-11-22,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,مارينا دايموند 5 (B),MARINA DIAMOND 5 (B),,,,Dubai Marina,دبي مارينا,برج العرب,Burj Al Arab,مرسى دبي,Dubai Marina,مارينا مول,Marina Mall,بنتهاوس,PENTHOUSE,1,37.68,405000.0,10748.41,,,1.0,1.0,0.0


In [79]:
# Displaying random observations where room is "Studio"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'Studio'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1187614,1-41-2019-2635,2019-07-22,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,348,القوز الرابعه,Al Goze Fourth,الخـيـــل هايـتـــس 8A-8B,AL KHAIL HEIGHTS 8A-8B,1527.0,الخيل هايتس,AL KHAIL HEIGHTS,,,وسط مدينة دبي,Downtown Dubai,محطة مترو نور بنك,Noor Bank Metro Station,مول دبي,Dubai Mall,استوديو,Studio,1,47.93,436500.0,9107.03,,,1.0,1.0,0.0
119781,1-102-2023-67311,2023-12-21,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,442,البرشاء جنوب الخامسة,Al Barsha South Fifth,سيسيليا تاور,SESLIA TOWER,2528.0,سيسليا تاور,Seslia Tower,Jumeirah Village Triangle,قرية جميرا المثلثة,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,استوديو,Studio,1,35.18,434387.0,12347.56,,,1.0,1.0,0.0
610724,1-102-2024-1048,2024-01-09,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,عزيزي فينيس 8 - اي,AZIZI VENICE 8 -A,2830.0,عزيزى فينيس 8,AZIZI VENICE 8,Dubai World Central,دبي ورلد سنترال,,,,,,,استوديو,Studio,1,31.57,632000.0,20019.01,,,1.0,1.0,0.0
873365,1-41-2024-195,2024-01-04,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,BLOOM TOWERS B,BLOOM TOWERS B,1888.0,بلووم تاورز,Bloom Towers,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,مدينة دبي للإنترنت,Dubai Internet City,مول الإمارات,Mall of the Emirates,استوديو,Studio,1,37.6,525000.0,13962.77,,,1.0,1.0,0.0
11568,1-102-2024-63712,2024-10-02,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,485,معيصم الأول,Me'Aisem First,سامانا ليك فيوز,Samana Lake Views,3071.0,سامانا ليك فيوز,Samana Lake Views,International Media Production Zone,المنطقة العالمية للإنتاج الإعلامي,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,استوديو,Studio,1,38.7,678786.0,17539.7,,,1.0,1.0,0.0


In [80]:
# Inspecting the procedure_area statistics where rooms is "Studio"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == 'Studio']['procedure_area'].describe()

count    69030.000000
mean        41.307551
std         57.876080
min          0.290000
25%         35.970000
50%         39.020000
75%         44.000000
max       4061.930000
Name: procedure_area, dtype: float64

In [81]:
# Displaying random observations where room is "1 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '1 B/R'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
294871,1-102-2024-9779,2024-02-19,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,333,مدينة دبي الملاحية,Madinat Dubai Almelaheyah,نوتيكا اثنان,NAUTICA TWO,2783.0,نوتيكا تو,Nautica Two,Dubai Maritime City,مدينة دبي الملاحية,برج خليفة,Burj Khalifa,محطة مترو الغبيبة,Al Ghubaiba Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,62.74,1650000.0,26299.01,,,1.0,1.0,0.0
100101,1-11-2024-30713,2024-08-20,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,412,المركاض,Al Merkadh,شوبا هارتلاند ويفز,Sobha Hartland Waves,2239.0,شوبا هارتلاند ويفز,Sobha Hartland Waves,,,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,48.75,1290000.0,26461.54,,,2.0,1.0,0.0
86834,1-102-2023-40767,2023-08-21,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,409,البرشاء جنوب الثالثة,Al Barshaa South Third,سمانا ميكونوس سكنيجر,SAMANA MYKONOS SIGNATURE,2587.0,سامانا ميكونوس سيجنيتشر,SAMANA MYKONOS SIGNATURE,Arjan,أرجان,موتور سيتي,Motor City,محطة مترو شرف دي جي,Sharaf Dg Metro Station,مول الإمارات,Mall of the Emirates,غرفة,1 B/R,1,78.98,1142964.0,14471.56,,,1.0,1.0,0.0
577418,1-102-2023-34262,2023-07-19,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,شوبا هارتلاند ويفز اوبيولينس,Sobha Hartland Waves Opulence,2564.0,شوبا هارتلاند ويفز اوبيولينس,Sobha Hartland Waves Opulence,,,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,غرفة,1 B/R,1,83.7,1711786.0,20451.45,,,1.0,1.0,0.0
128072,1-102-2022-17908,2022-07-04,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,409,البرشاء جنوب الثالثة,Al Barshaa South Third,تورينو باي أورو24 - 4,TORINO BY ORO24 - 4,2379.0,تورينو من أورو24,TORINO BY ORO24,Arjan,أرجان,موتور سيتي,Motor City,محطة مترو شرف دي جي,Sharaf Dg Metro Station,مول الإمارات,Mall of the Emirates,غرفة,1 B/R,1,47.4,599000.0,12637.13,,,1.0,1.0,0.0


In [82]:
# Inspecting the procedure_area statistics where rooms is "1 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '1 B/R']['procedure_area'].describe()

count    149522.000000
mean         72.689695
std          17.512324
min           0.630000
25%          63.170000
50%          70.980000
75%          79.510000
max        1645.500000
Name: procedure_area, dtype: float64

In [83]:
# Displaying random observations where room is "2 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '2 B/R'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1102347,1-11-2020-12944,2020-12-28,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,ليك شور 1,Lakeshore Tower 1,559.0,ليك سور تاور,LAKE SHORE TOWER,Jumeirah Lakes Towers,ابراج بحيرات الجميرا,برج العرب,Burj Al Arab,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,غرفتين,2 B/R,1,138.95,940000.0,6765.02,,,2.0,1.0,0.0
174667,1-102-2021-15161,2021-09-06,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,482,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,إكزيكتيف رزيدنسز 2,EXECUTIVE RESIDENCES 2,1937.0,بارك ريدج,PARK RIDGE,DUBAI HILLS - PARK,دبي هيليز - بارك,برج العرب,Burj Al Arab,محطة مترو بنك أبوظبي الأول,First Abu Dhabi Bank Metro Station,مول الإمارات,Mall of the Emirates,غرفتين,2 B/R,1,92.62,1657888.0,17899.89,,,1.0,2.0,0.0
1258867,1-102-2023-31031,2023-06-26,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,447,الخيران الأولى,Al Khairan First,كريك ايدج تاور 1,CREEK EDGE Tower 1,2083.0,كريك ايدج,CREEK EDGE,The Lagoons,الخيران,مطار دبي الدولي,Dubai International Airport,محطة مترو الخور,Creek Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفتين,2 B/R,1,97.23,2450000.0,25197.98,,,1.0,1.0,0.0
735546,1-102-2024-72163,2024-09-16,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,465,وادي الصفا 3,Wadi Al Safa 3,سامانا براري فيوز,Samana Barari Views,2835.0,سمانا براري فيوز,Samana Barari Views,Majan,ماجان,,,,,,,غرفتين,2 B/R,1,129.61,1700000.0,13116.27,,,1.0,1.0,0.0
444968,1-102-2023-44170,2023-09-06,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,ذا في تاور,THE V TOWER,2199.0,ذا في تاور,THE V TOWER,Residential Complex,ريزيدينتشل كموبليكس,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,غرفتين,2 B/R,1,85.34,780802.0,9149.31,,,1.0,2.0,0.0


In [84]:
# Inspecting the procedure_area statistics where rooms is "2 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '2 B/R']['procedure_area'].describe()

count    99527.000000
mean       122.353789
std         37.597718
min          0.010000
25%        101.880000
50%        116.120000
75%        135.160000
max       4061.930000
Name: procedure_area, dtype: float64

In [85]:
# Displaying random observations where room is "3 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '3 B/R'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1093450,1-102-2022-35023,2022-11-01,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,,,2177.0,المرابع العربية ااا- ربى,Arabian Ranches lll - Ruba,,,مجمع حمدان الرياضي,Hamdan Sports Complex,,,,,ثلاث غرف,3 B/R,0,144.84,1720000.0,11875.17,,,1.0,2.0,0.0
1260088,1-11-2024-32358,2024-09-02,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,351,الثنيه الثالثة,Al Thanyah Third,,,1062.0,البحيرات زلال,The Lakes Zulal,,,,,,,,,ثلاث غرف,3 B/R,0,352.59,5850000.0,16591.51,,,2.0,1.0,0.0
33584,1-102-2022-24895,2022-08-29,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,469,اليفره 1,Al Yufrah 1,,,2303.0,ذا فالي - تاليا,The Valley - Talia,,,,,,,,,ثلاث غرف,3 B/R,0,174.0,1500888.0,8625.79,,,1.0,1.0,0.0
74123,1-102-2022-10028,2022-04-14,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,469,اليفره 1,Al Yufrah 1,,,2303.0,ذا فالي - تاليا,The Valley - Talia,,,,,,,,,ثلاث غرف,3 B/R,0,174.0,1500888.0,8625.79,,,1.0,1.0,0.0
802752,1-102-2024-12125,2024-03-05,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,531,الحبيه السادسة,Al Hebiah Sixth,,,2699.0,مدن الرنيم 7,Mudon Al Ranim 7,Mudon,مدن,,,,,,,ثلاث غرف,3 B/R,0,168.0,2545000.0,15148.81,,,1.0,1.0,0.0


In [86]:
# Inspecting the procedure_area statistics where rooms is "3 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '3 B/R']['procedure_area'].describe()

count    57485.000000
mean       194.147744
std        164.443815
min          1.820000
25%        149.860000
50%        174.000000
75%        206.770000
max      35703.000000
Name: procedure_area, dtype: float64

In [87]:
# Displaying random observations where room is "4 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '4 B/R'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
529332,1-102-2023-48558,2023-09-19,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,465,وادي الصفا 3,Wadi Al Safa 3,,,2503.0,مدينة محمد بن راشد آل مكتوم، ديستريكت 11 - أوب...,"Mohammed Bin Rashid Al Maktoum City, District ...",Mohammed Bin Rashid AL Maktoum District 11,مدينة محمد بن راشد آل مكتوم – ديستركت 11,,,,,,,أربع غرف,4 B/R,0,256.43,4273800.0,16666.54,,,1.0,1.0,0.0
1264227,1-11-2023-38575,2023-11-24,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,الصدف 8,Sadaf 8,,,,Jumeriah Beach Residence - JBR,جميرا بيتش ريزيدنس - الجيه بي آر,برج العرب,Burj Al Arab,مساكن شاطئ الجميرا,Jumeirah Beach Resdency,مارينا مول,Marina Mall,أربع غرف,4 B/R,1,380.38,4300000.0,11304.48,,,1.0,1.0,0.0
168639,1-102-2024-1335,2024-01-16,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,390,برج خليفة,Burj Khalifa,سيتي سنتر رزيدنسز,City Center Residences,2359.0,سيتي سنتر رزيدنسز,City Center Residences,Burj Khalifa,برج خليفة,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,أربع غرف,4 B/R,1,343.25,11417700.0,33263.51,,,1.0,1.0,0.0
504114,1-11-2024-30623,2024-08-20,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,,,,,,Al Furjan,الفرجان,موقع إكسبو 2020,Expo 2020 Site,محطة مترو ابن بطوطة,Ibn Battuta Metro Station,ابن بطوطة مول,Ibn-e-Battuta Mall,أربع غرف,4 B/R,0,177.54,3400000.0,19150.61,,,2.0,2.0,0.0
180424,1-102-2023-42859,2023-08-29,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,,,2609.0,المرابع العربية 3 - انيا 2,Arabian Ranches lll - Anya 2,,,,,,,,,أربع غرف,4 B/R,0,273.17,3218888.0,11783.46,,,1.0,2.0,0.0


In [88]:
# Inspecting the procedure_area statistics where rooms is "4 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '4 B/R']['procedure_area'].describe()

count    23743.000000
mean       292.998563
std        139.509758
min          2.800000
25%        225.580000
50%        261.210000
75%        323.205000
max       1842.470000
Name: procedure_area, dtype: float64

In [89]:
# Displaying random observations where room is "5 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '5 B/R'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
489179,1-102-2024-71392,2024-09-13,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,459,مجمع دبي للاستثمار الثاني,Dubai Investment Park Second,,,3169.0,داماك ريفرسايد - لاش,DAMAC RIVERSIDE - LUSH,,,,,,,,,خمس غرف,5 B/R,0,220.75,3691000.0,16720.27,,,1.0,1.0,0.0
919771,1-11-2023-18132,2023-06-12,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,482,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,,,1645.0,مابل,MAPLE,DUBAI HILLS - MAPLE 1,دبي هيليز - مابل 1,موتور سيتي,Motor City,محطة مترو بنك أبوظبي الأول,First Abu Dhabi Bank Metro Station,مول الإمارات,Mall of the Emirates,خمس غرف,5 B/R,0,345.0,4700000.0,13623.19,,,1.0,1.0,0.0
1001005,1-110-2024-309,2024-05-15,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,348,القوز الرابعه,Al Goze Fourth,,,1527.0,الخيل هايتس,AL KHAIL HEIGHTS,,,وسط مدينة دبي,Downtown Dubai,محطة مترو نور بنك,Noor Bank Metro Station,مول دبي,Dubai Mall,خمس غرف,5 B/R,0,809.01,3250000.0,4017.26,,,2.0,2.0,2.0
93986,1-102-2021-7622,2021-05-31,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,317,جميرا الاولى,Jumeirah First,,,2167.0,سور لامير,Sur La Mer,LA MER,لامير,برج خليفة,Burj Khalifa,محطة مترو أبراج الإمارات,Emirates Towers Metro Station,مول دبي,Dubai Mall,خمس غرف,5 B/R,0,755.87,13000000.0,17198.72,,,1.0,1.0,0.0
944987,1-102-2022-26357,2022-09-06,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,,,2367.0,النبض الشاطئ 3,The Pulse Beachfront 3,Dubai South Residential District,المدينة السكنية بدبي الجنوب,,,,,,,خمس غرف,5 B/R,0,369.21,3100000.0,8396.31,,,1.0,1.0,0.0


In [90]:
# Inspecting the procedure_area statistics where rooms is "5 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '5 B/R']['procedure_area'].describe()

count    2299.000000
mean      494.483223
std       369.374263
min        41.280000
25%       320.770000
50%       396.100000
75%       513.430000
max      6170.460000
Name: procedure_area, dtype: float64

In [91]:
# Displaying random observations where room is "6 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '6 B/R'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1291482,1-11-2023-36120,2023-11-08,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,348,القوز الرابعه,Al Goze Fourth,,,1527.0,الخيل هايتس,AL KHAIL HEIGHTS,,,وسط مدينة دبي,Downtown Dubai,محطة مترو نور بنك,Noor Bank Metro Station,مول دبي,Dubai Mall,ست غرف,6 B/R,0,716.97,2250000.0,3138.21,,,1.0,1.0,0.0
285250,1-102-2023-38316,2023-08-10,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,413,جزر العالم,World Islands,,,2588.0,جزيرة زايا زوها,ZAYA ZUHA ISLAND,The World,جزر العالم,,,,,,,ست غرف,6 B/R,0,3436.79,78785000.0,22924.01,,,1.0,1.0,0.0
612493,1-102-2024-87354,2024-10-17,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,,,3051.0,خليج الجنوب 6 بريميوم,South Bay 6 Premium,Dubai South Residential District,المدينة السكنية بدبي الجنوب,,,,,,,ست غرف,6 B/R,0,800.0,16900000.0,21125.0,,,1.0,1.0,0.0
439385,1-102-2023-48015,2023-09-18,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,410,نخلة جميرا,Palm Jumeirah,,,2580.0,ايومي,EOME,Palm Jumeirah,نخلة جميرا,برج العرب,Burj Al Arab,مساكن شاطئ الجميرا,Jumeirah Beach Resdency,مارينا مول,Marina Mall,ست غرف,6 B/R,0,1700.68,70000000.0,41160.01,,,1.0,1.0,0.0
58453,1-102-2024-40597,2024-06-07,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,485,معيصم الأول,Me'Aisem First,,,2877.0,تيرا غولف كوليكشن,Terra Golf Collection,Jumeirah Golf Estates,جميرا غولف للعقارات,,,,,,,ست غرف,6 B/R,0,296.14,7534000.0,25440.67,,,1.0,1.0,0.0


In [92]:
# Inspecting the procedure_area statistics where rooms is "6 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '6 B/R']['procedure_area'].describe()

count     173.00000
mean     1204.49422
std       966.43311
min       226.57000
25%       691.51000
50%       840.79000
75%      1612.74000
max      8073.00000
Name: procedure_area, dtype: float64

In [93]:
# Displaying random observations where room is "7 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '7 B/R'].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
569815,1-102-2024-54239,2024-07-25,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,,,2711.0,خليج الجنوب 3 بريميوم,South Bay 3 Premium,Dubai South Residential District,المدينة السكنية بدبي الجنوب,,,,,,,سبع غرف,7 B/R,0,1088.76,20000000.0,18369.52,,,1.0,1.0,0.0
1037251,1-102-2023-29855,2023-06-21,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,390,برج خليفة,Burj Khalifa,برج خليفة ريزدنس,Burj Khalifa Residences,2443.0,ذا ريزيدنس | برج خليفة,THE RESIDENCE | Burj Khalifa,Burj Khalifa,برج خليفة,برج خليفة,Burj Khalifa,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,سبع غرف,7 B/R,1,1477.88,56349888.0,38128.87,,,1.0,1.0,0.0
151520,1-11-2020-3721,2020-04-30,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,410,نخلة جميرا,Palm Jumeirah,,,1753.0,فيلات كلوب,CLUB VILLAS,Palm Jumeirah,نخلة جميرا,برج العرب,Burj Al Arab,مينا السياحي,Mina Seyahi,مارينا مول,Marina Mall,سبع غرف,7 B/R,0,1627.7,29360000.0,18037.72,,,1.0,1.0,0.0
1111507,1-102-2024-49333,2024-07-11,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,,,2711.0,خليج الجنوب 3 بريميوم,South Bay 3 Premium,Dubai South Residential District,المدينة السكنية بدبي الجنوب,,,,,,,سبع غرف,7 B/R,0,1121.8,20000000.0,17828.49,,,1.0,1.0,0.0
1289819,1-102-2019-3647,2019-03-07,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,410,نخلة جميرا,Palm Jumeirah,,,1753.0,فيلات كلوب,CLUB VILLAS,Palm Jumeirah,نخلة جميرا,برج العرب,Burj Al Arab,مينا السياحي,Mina Seyahi,مارينا مول,Marina Mall,سبع غرف,7 B/R,0,1485.85,20267500.0,13640.34,,,1.0,3.0,0.0


In [94]:
# Inspecting the procedure_area statistics where rooms is "7 B/R"
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '7 B/R']['procedure_area'].describe()

count      31.000000
mean     1619.013548
std       390.406385
min       969.960000
25%      1398.040000
50%      1696.240000
75%      1759.760000
max      2418.520000
Name: procedure_area, dtype: float64
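
The per-room-type summaries above could also be collected into a single table, which makes the categories easier to compare side by side. The snippet below is a minimal sketch on a toy frame: the `toy` data and the name `area_stats_by_rooms` are made up for illustration, and only the column names `rooms_en` and `procedure_area` come from the dataset.

```python
import pandas as pd

# Toy stand-in for transactions_residential_sales_5y (values are illustrative only)
toy = pd.DataFrame({
    "rooms_en": ["Studio", "Studio", "1 B/R", "1 B/R", "1 B/R", None],
    "procedure_area": [35.0, 45.0, 60.0, 70.0, 80.0, 500.0],
})

# One row of describe() statistics per room type.
# dropna=False keeps the rows with a missing room type as their own group.
area_stats_by_rooms = (
    toy.groupby("rooms_en", dropna=False)["procedure_area"]
       .describe()
       .sort_values("50%")  # order categories by median area
)
print(area_stats_by_rooms)
```

On the real data, the same call on `transactions_residential_sales_5y` should give one row per value of `rooms_en`, with the same columns that each individual `describe()` call prints.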

In [95]:
# Displaying all observations where room is "8 B/R" (no sampling)
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '8 B/R']

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
1159896,1-102-2020-4652,2020-06-04,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,330,مرسى دبي,Marsa Dubai,Jumeirah Gate Tower 1,Jumeirah Gate Tower 1,1790.0,جميرا غيت,Jumeirah Gate,Jumeriah Beach Residence - JBR,جميرا بيتش ريزيدنس - الجيه بي آر,برج العرب,Burj Al Arab,مساكن شاطئ جميرا,Jumeirah Beach Residency,مارينا مول,Marina Mall,ثمان غرف,8 B/R,1,399.46,39387064.0,98600.77,,,1.0,1.0,0.0


In [96]:
# Displaying all observations where room is "9 B/R" (no sampling)
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'] == '9 B/R']

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
137390,1-102-2023-5259,2023-02-14,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,527,جزيرة 2,Island 2,بولغاري لايتهاوس دبي,BULGARI Lighthouse Dubai,2507.0,بولغري لايتهاوس دبي,Bulgari Lighthouse Dubai,,,,,,,,,تسع غرف,9 B/R,1,3620.45,410000000.0,113245.59,,,1.0,1.0,0.0


Upon reviewing the distribution of property sizes by room configuration, I observed that the dataset contains exactly one 8-bedroom and one 9-bedroom unit. Their prices are extreme (`actual_worth` of about 39.4 million and 410 million, with `meter_sale_price` near 98,600 and 113,200), and a single record per category cannot represent a market segment, so I treat both as outliers.

To maintain a dataset that better reflects common configurations and provides reliable trends for investment analysis, I will remove these entries. This decision helps ensure that our analysis remains focused on prevalent property types that offer consistent patterns in pricing and market trends.

In [97]:
# Removing 8-bedroom and 9-bedroom units
transactions_residential_sales_5y = transactions_residential_sales_5y[
    ~transactions_residential_sales_5y['rooms_en'].isin(['8 B/R', '9 B/R'])
]

# Verifying removal
transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False)

rooms_en
1 B/R        149522
2 B/R         99527
Studio        69030
3 B/R         57485
4 B/R         23743
NaN           17957
5 B/R          2299
PENTHOUSE       240
6 B/R           173
7 B/R            31
Name: count, dtype: int64

To ensure consistency in property sizes across room types, I apply an outlier removal step based on `procedure_area` within each room type: rows outside the range from Q1 - 1.5 × IQR to Q3 + 1.5 × IQR for their room type are dropped, while rows with a missing room type are kept as they are. Filtering out areas that do not align with typical sizes for a room type leaves a dataset with more realistic property dimensions, which supports more reliable analysis and modeling.

In [98]:
# Define a function to remove outliers by room type while retaining rows with missing room types
def remove_outliers_by_room_type(df, room_col='rooms_en', area_col='procedure_area'):
    # Initialize an empty list to store data after filtering
    filtered_data = []
    
    # Get unique non-null room types
    room_types = df[room_col].dropna().unique()
    
    # Loop through each room type to filter outliers
    for room_type in room_types:
        # Subset for the current room type
        subset = df[(df[room_col] == room_type)]
        
        # Calculate IQR
        Q1 = subset[area_col].quantile(0.25)
        Q3 = subset[area_col].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        
        # Filter inliers for the current room type
        inliers = subset[(subset[area_col] >= lower_bound) & (subset[area_col] <= upper_bound)]
        filtered_data.append(inliers)
    
    # Concatenate all inliers and add rows with missing room types
    result = pd.concat(filtered_data)
    result = pd.concat([result, df[df[room_col].isna()]])
    
    return result
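
As an alternative sketch of the same per-room-type 1.5 × IQR rule, the bounds can be computed once per group and applied as a boolean mask. The function name `iqr_filter_by_group`, the parameter `k`, and the toy data are my own illustrative choices and are not part of the notebook's pipeline.

```python
import pandas as pd

def iqr_filter_by_group(df, group_col="rooms_en", value_col="procedure_area", k=1.5):
    """Keep rows inside [Q1 - k*IQR, Q3 + k*IQR] of their group.

    Rows whose group label is missing are kept unchanged.
    """
    grouped = df.groupby(group_col)[value_col]  # missing labels form no group
    # Per-group quartiles, mapped back onto every row via its group label
    q1 = df[group_col].map(grouped.quantile(0.25))
    q3 = df[group_col].map(grouped.quantile(0.75))
    iqr = q3 - q1
    in_range = df[value_col].between(q1 - k * iqr, q3 + k * iqr)
    return df[in_range | df[group_col].isna()]

# Toy data: one obvious outlier (400) and one row with a missing room type
toy = pd.DataFrame({
    "rooms_en": ["Studio"] * 7 + [None],
    "procedure_area": [30.0, 35.0, 38.0, 40.0, 42.0, 45.0, 400.0, 900.0],
})
filtered = iqr_filter_by_group(toy)
print(filtered)
```

Unlike concatenating per-group subsets, a boolean mask keeps the original row order and index, which can make before/after comparisons easier.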

In [99]:
# Show value counts before removing outliers
print("Value counts before removing outliers:")
print(transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False))

# Apply the per-room-type IQR filter to the dataset
transactions_residential_sales_5y = remove_outliers_by_room_type(transactions_residential_sales_5y)

# Display the cleaned dataset summary
print("Value counts after removing outliers:")
print(transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False))

Value counts before removing outliers:
rooms_en
1 B/R        149522
2 B/R         99527
Studio        69030
3 B/R         57485
4 B/R         23743
NaN           17957
5 B/R          2299
PENTHOUSE       240
6 B/R           173
7 B/R            31
Name: count, dtype: int64
Value counts after removing outliers:
rooms_en
1 B/R        142519
2 B/R         94875
Studio        66046
3 B/R         52397
4 B/R         21889
NaN           17957
5 B/R          2055
PENTHOUSE       231
6 B/R           164
7 B/R            29
Name: count, dtype: int64


After applying outlier removal to `procedure_area` within each room type, every category shrank while the bulk of each was retained. **Studio**, **1 B/R**, and **2 B/R** each lost about 4 to 5% of their rows. **3 B/R**, **4 B/R**, and **5 B/R** lost more (roughly 9%, 8%, and 11%), consistent with their wider spread in area. The rare categories **PENTHOUSE**, **6 B/R**, and **7 B/R** lost only 9, 9, and 2 rows respectively. The count of missing values in `rooms_en` is unchanged at 17,957 because the function passes those rows through untouched, so they remain available for imputation in the next steps.
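
To quantify the reduction per category, the share of rows removed can be computed directly from the two `value_counts()` outputs. The sketch below hard-codes the counts printed above; the variable names `before`, `after`, and `removed_pct` are illustrative.

```python
import pandas as pd

# Row counts per room type, copied from the value_counts() outputs above
before = pd.Series({"1 B/R": 149522, "2 B/R": 99527, "Studio": 69030,
                    "3 B/R": 57485, "4 B/R": 23743, "5 B/R": 2299,
                    "PENTHOUSE": 240, "6 B/R": 173, "7 B/R": 31})
after = pd.Series({"1 B/R": 142519, "2 B/R": 94875, "Studio": 66046,
                   "3 B/R": 52397, "4 B/R": 21889, "5 B/R": 2055,
                   "PENTHOUSE": 231, "6 B/R": 164, "7 B/R": 29})

# Percentage of rows removed by the IQR filter in each category
removed_pct = ((before - after) / before * 100).round(2)
print(removed_pct.sort_values(ascending=False))
```

In the notebook itself, `before` and `after` could instead be captured by calling `value_counts()` on `rooms_en` immediately before and after the filter is applied.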

Now, I'll investigate the rows where the room information is missing.

In [100]:
# Displaying random observations where the rooms information is null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].isnull()].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
612185,1-41-2020-3903,2020-07-14,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,523,الحبية الثالثة,Al Hebiah Third,,,1389.0,دماك هيلز - ذا فيليد,DAMAC HILLS - THE FIELD,DAMAC HILLS,داماك هيليز,موتور سيتي,Motor City,,,,,,,0,688.0,6019000.0,8748.55,,,1.0,1.0,0.0
614877,1-11-2019-5589,2019-06-18,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,442,البرشاء جنوب الخامسة,Al Barsha South Fifth,,,,,,Jumeirah Village Triangle,قرية جميرا المثلثة,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,,,0,180.0,1050000.0,5833.33,,,2.0,1.0,0.0
1108802,1-11-2023-24002,2023-07-31,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,523,الحبية الثالثة,Al Hebiah Third,,,1396.0,دماك هيلز - روك وود,DAMAC HILLS - ROCKWOOD,DAMAC HILLS,داماك هيليز,موتور سيتي,Motor City,,,,,,,0,253.0,2760000.0,10909.09,,,1.0,1.0,0.0
1213432,1-11-2022-11413,2022-05-26,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,463,وادي الصفا 7,Wadi Al Safa 7,,,1400.0,المرابع العربيه - ليلا,Arabian Ranches - Lila Community,Arabian Ranches II - LILA,المرابع العربية 2 - ليلا,موتور سيتي,Motor City,,,,,,,0,432.0,3700000.0,8564.81,,,1.0,1.0,0.0
493913,1-11-2024-16367,2024-05-14,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,300,الراشديه,Al Rashidiya,,,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو الراشدية,Rashidiya Metro Station,سيتي سنتر مردف,City Centre Mirdif,,,0,576.0,1800000.0,3125.0,,,11.0,1.0,0.0


In [101]:
# Inspecting the procedure area statistics where rooms information is null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].isnull()]['procedure_area'].describe()

count    17957.000000
mean       651.313812
std        656.531250
min          1.280000
25%        341.530000
50%        580.870000
75%        775.110000
max      40282.740000
Name: procedure_area, dtype: float64

In [156]:
# Inspecting the procedure area statistics where rooms information is not null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].notnull()]['procedure_area'].describe()

count    393755.000000
mean        122.417895
std         127.286050
min          12.920000
25%          62.000000
50%          84.810000
75%         140.190000
max        2839.110000
Name: procedure_area, dtype: float64

In [102]:
# Inspecting the procedure area statistics by property type where rooms information is null
print("Procedure Area Statistics by Unit:")
print(transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) &
                                  (transactions_residential_sales_5y['property_type_en'] == "Unit")]['procedure_area'].describe())

print("\nProcedure Area Statistics by Villa:")
print(transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) &
                                  (transactions_residential_sales_5y['property_type_en'] == "Villa")]['procedure_area'].describe())

Procedure Area Statistics by Unit:
count     319.000000
mean      108.870940
std        94.388486
min        34.700000
25%        68.780000
50%        81.470000
75%       119.220000
max      1553.460000
Name: procedure_area, dtype: float64

Procedure Area Statistics by Villa:
count    17638.000000
mean       661.124407
std        658.217664
min          1.280000
25%        358.400000
50%        593.990000
75%        783.710000
max      40282.740000
Name: procedure_area, dtype: float64


To ensure the dataset reflects typical property sizes in Dubai, I apply fixed size boundaries for each property type. Villas are kept only if their `procedure_area` is between 140 and 2,500 square meters, and units only if it is between 30 and 300 square meters. Unlike the IQR rule, whose thresholds shift with each group's distribution, these fixed bounds also remove implausibly small areas that an IQR lower bound can miss, which helps refine the dataset for further imputation and analysis.

In [157]:
# Define boundaries for villas and units
villa_min_size = 140
villa_max_size = 2500
unit_min_size = 30
unit_max_size = 300

# Apply the range filter based on property type
transactions_residential_sales_5y = transactions_residential_sales_5y[
    ~(
        ((transactions_residential_sales_5y['property_type_en'] == 'Villa') &
         ((transactions_residential_sales_5y['procedure_area'] < villa_min_size) |
          (transactions_residential_sales_5y['procedure_area'] > villa_max_size))) |
        
        ((transactions_residential_sales_5y['property_type_en'] == 'Unit') &
         ((transactions_residential_sales_5y['procedure_area'] < unit_min_size) |
          (transactions_residential_sales_5y['procedure_area'] > unit_max_size)))
    )
]

# Display result to verify the filtering
transactions_residential_sales_5y['property_type_en'].value_counts(), transactions_residential_sales_5y['procedure_area'].describe()

(property_type_en
 Unit     327077
 Villa     58923
 Name: count, dtype: int64,
 count    386000.000000
 mean        125.773924
 std         136.964578
 min          30.000000
 25%          62.550000
 50%          84.760000
 75%         142.230000
 max        2488.510000
 Name: procedure_area, dtype: float64)
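
The same range filter can be written more compactly by keeping the bounds in a dictionary, which makes it easier to add or adjust property types later. This is a minimal sketch: `SIZE_BOUNDS`, `filter_by_size_bounds`, and the toy frame are names and data I am introducing for illustration. Like the cell above, it leaves rows of other property types, and rows with a missing area, untouched.

```python
import pandas as pd

# (min, max) procedure_area per property type, same values as in the cell above
SIZE_BOUNDS = {"Villa": (140, 2500), "Unit": (30, 300)}

def filter_by_size_bounds(df, bounds=SIZE_BOUNDS,
                          type_col="property_type_en", area_col="procedure_area"):
    """Drop rows whose area falls outside the bounds for their property type."""
    keep = pd.Series(True, index=df.index)
    for prop_type, (lo, hi) in bounds.items():
        is_type = df[type_col] == prop_type
        out_of_range = (df[area_col] < lo) | (df[area_col] > hi)
        keep &= ~(is_type & out_of_range)
    return df[keep]

# Toy data (illustrative values only)
toy = pd.DataFrame({
    "property_type_en": ["Villa", "Villa", "Unit", "Unit", "Unit", "Building"],
    "procedure_area": [100.0, 500.0, 20.0, 80.0, 350.0, 5000.0],
})
bounded = filter_by_size_bounds(toy)
print(bounded)
```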

In [158]:
# Apply the function separately for 'Villa' and 'Unit'
transactions_residential_sales_5y = remove_outliers_by_property_type(transactions_residential_sales_5y, 'Villa')
transactions_residential_sales_5y = remove_outliers_by_property_type(transactions_residential_sales_5y, 'Unit')

# Verify the result by checking the count of missing values in `rooms_en`
transactions_residential_sales_5y['rooms_en'].isnull().sum()

3239
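
The definition of `remove_outliers_by_property_type` is not shown in this part of the notebook. As a purely hypothetical sketch, a helper of this kind might apply the 1.5 × IQR rule to `procedure_area` only for rows of one property type whose `rooms_en` is missing, leaving all other rows untouched. That behavior is my assumption, and the stand-in is deliberately named differently (`iqr_filter_missing_rooms_sketch`) so it is not confused with the actual helper.

```python
import pandas as pd

def iqr_filter_missing_rooms_sketch(df, property_type,
                                    type_col="property_type_en",
                                    room_col="rooms_en",
                                    area_col="procedure_area", k=1.5):
    """Hypothetical stand-in, NOT the notebook's own helper.

    Applies the k*IQR rule to rows of `property_type` with a missing room
    type; every other row is returned unchanged.
    """
    target = (df[type_col] == property_type) & df[room_col].isna()
    q1 = df.loc[target, area_col].quantile(0.25)
    q3 = df.loc[target, area_col].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outlier = target & ((df[area_col] < lower) | (df[area_col] > upper))
    return df[~outlier]

# Toy data: six villas without room info (one extreme), one villa with room
# info, and one unit without room info
toy = pd.DataFrame({
    "property_type_en": ["Villa"] * 7 + ["Unit"],
    "rooms_en": [None] * 6 + ["3 B/R", None],
    "procedure_area": [300.0, 350.0, 400.0, 450.0, 500.0, 5000.0, 9000.0, 80.0],
})
villas_cleaned = iqr_filter_missing_rooms_sketch(toy, "Villa")
print(villas_cleaned)
```

In this toy run only the 5,000 row is dropped: the 9,000 row has a known room type and the unit row belongs to a different property type, so neither is part of the filtered subgroup.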

In [105]:
# Inspecting the procedure area statistics by property type where rooms information is null
print("Procedure Area Statistics by Unit:")
print(transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) &
                                  (transactions_residential_sales_5y['property_type_en'] == "Unit")]['procedure_area'].describe())

print("\nProcedure Area Statistics by Villa:")
print(transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) &
                                  (transactions_residential_sales_5y['property_type_en'] == "Villa")]['procedure_area'].describe())

Procedure Area Statistics by Unit:
count    286.000000
mean      90.845699
std       26.589753
min       34.700000
25%       68.660000
50%       81.470000
75%      119.220000
max      154.610000
Name: procedure_area, dtype: float64

Procedure Area Statistics by Villa:
count    16983.000000
mean       592.489155
std        303.097299
min          1.280000
25%        345.235000
50%        570.600000
75%        747.400000
max       1421.420000
Name: procedure_area, dtype: float64
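
The two filtered `describe()` calls above can also be produced in one pass with a `groupby`. The following is a minimal sketch on a toy frame (the data values are made up for illustration; only the column names come from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for transactions_residential_sales_5y (values are illustrative only)
toy = pd.DataFrame({
    "property_type_en": ["Unit", "Unit", "Villa", "Villa", "Villa"],
    "rooms_en": [np.nan, "1 B/R", np.nan, np.nan, "4 B/R"],
    "procedure_area": [80.0, 60.0, 500.0, 700.0, 300.0],
})

# Keep only rows with missing room information, then summarise area per property type
stats = (
    toy[toy["rooms_en"].isnull()]
    .groupby("property_type_en")["procedure_area"]
    .describe()
)
print(stats)
```

This returns one row of summary statistics per property type, which makes the Unit and Villa distributions easier to compare side by side.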


In [106]:
# Inspecting villas with missing room information and a procedure area below 10
transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) &
                                  (transactions_residential_sales_5y['property_type_en'] == "Villa") &
                                  (transactions_residential_sales_5y['procedure_area'] < 10)]

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
37493,1-11-2019-8080,2019-09-09,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,,,1368.0,حدائق جميرا -7B&C,JUMEIRAH PARK 7B&C,Jumeirah Park,جميرا بارك,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,,,0,5.1,30000.0,5882.35,,,1.0,1.0,0.0
305926,1-11-2020-3828,2020-05-11,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,463,وادي الصفا 7,Wadi Al Safa 7,,,,,,Al Waha Villas,فلل الواحة,دورة دبي للدراجات,Dubai Cycling Course,,,,,,,0,1.28,6642.0,5189.06,,,1.0,1.0,0.0
558502,1-11-2021-20021,2021-11-09,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,351,الثنيه الثالثة,Al Thanyah Third,,,,,,Lakes - Hattan I,البحيرات - حتان 1,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,,,0,9.01,502598.0,55782.24,,,1.0,1.0,0.0
733906,1-11-2019-3332,2019-04-04,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,267,الرفاعه,Al Raffa,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو الفهيدي,Al Fahidi Metro Station,مول دبي,Dubai Mall,,,0,2.18,22056.0,10117.43,,,1.0,1.0,0.0
764449,1-11-2023-8980,2023-03-28,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,318,جميرا الثالثه,Jumeirah Third,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو نور بنك,Noor Bank Metro Station,مول دبي,Dubai Mall,,,0,7.91,50000.0,6321.11,,,1.0,1.0,0.0
785627,1-11-2023-42849,2024-01-23,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,233,هور العنز,Hor Al Anz,,,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو أبو هيل,Abu Hail Metro Station,سيتي سنتر مردف,City Centre Mirdif,,,0,8.55,58045.0,6788.89,,,1.0,1.0,0.0
862316,1-11-2023-21888,2023-07-12,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,398,العوير الأولى,Al Aweer First,,,,,,,,,,,,,,,,0,6.94,100000.0,14409.22,,,1.0,1.0,0.0
916731,1-11-2019-8078,2019-09-09,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,,,1368.0,حدائق جميرا -7B&C,JUMEIRAH PARK 7B&C,Jumeirah Park,جميرا بارك,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,,,0,5.1,30000.0,5882.35,,,1.0,1.0,0.0
1049756,1-11-2022-8971,2022-04-26,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,,,1383.0,تاون هاوسز في جزر الجميرا,TOWNHOUSES AT JUMEIRAH ISLANDS,Jumeirah Islands,جزر جميرا,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو مارينا مول,Marina Mall Metro Station,مارينا مول,Marina Mall,,,0,2.24,87360.0,39000.0,,,1.0,1.0,0.0
1137385,1-11-2024-24571,2024-07-05,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,230,ابو هيل,Abu Hail,,,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو أبو بكر الصديق,Abu Baker Al Siddique Metro Station,,,,,0,2.75,24864.0,9041.59,,,13.0,1.0,0.0


In [107]:
# Removing villa records with missing room information and a procedure area below 10
transactions_residential_sales_5y = transactions_residential_sales_5y[
    ~((transactions_residential_sales_5y['rooms_en'].isnull()) &
      (transactions_residential_sales_5y['property_type_en'] == "Villa") &
      (transactions_residential_sales_5y['procedure_area'] < 10))
]

# Inspecting the procedure area statistics by property type where rooms information is null
print("Procedure Area Statistics by Unit:")
print(transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) &
                                  (transactions_residential_sales_5y['property_type_en'] == "Unit")]['procedure_area'].describe())

print("\nProcedure Area Statistics by Villa:")
print(transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) &
                                  (transactions_residential_sales_5y['property_type_en'] == "Villa")]['procedure_area'].describe())

Procedure Area Statistics by Unit:
count    286.000000
mean      90.845699
std       26.589753
min       34.700000
25%       68.660000
50%       81.470000
75%      119.220000
max      154.610000
Name: procedure_area, dtype: float64

Procedure Area Statistics by Villa:
count    16971.000000
mean       592.904207
std        302.802103
min         10.450000
25%        346.630000
50%        570.660000
75%        747.400000
max       1421.420000
Name: procedure_area, dtype: float64


In [108]:
# Confirming that no observations remain with a procedure area of 10 or less
transactions_residential_sales_5y[
    transactions_residential_sales_5y['procedure_area'] <= 10].shape

(0, 35)

In [155]:
# Inspecting procedure area statistics where room information is not null & property type is villa
transactions_residential_sales_5y[
    (transactions_residential_sales_5y['rooms_en'].notnull()) &
    (transactions_residential_sales_5y['property_type_en'] == "Villa")]['procedure_area'].describe()

count    62248.000000
mean       299.635676
std        224.949826
min         12.920000
25%        165.627500
50%        224.400000
75%        324.980000
max       2839.110000
Name: procedure_area, dtype: float64

In [109]:
# Inspecting property type where the rooms information is null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].isnull()]['property_type_en'].value_counts()

property_type_en
Villa    16971
Unit       286
Name: count, dtype: int64

In [110]:
# Inspecting registration type where the rooms information is null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].isnull()]['reg_type_en'].value_counts()

reg_type_en
Existing Properties    16995
Off-Plan Properties      262
Name: count, dtype: int64

In [111]:
# Displaying random observations where room information is null & the property type is "Unit"
transactions_residential_sales_5y[
    (transactions_residential_sales_5y['rooms_en'].isnull()) & 
    (transactions_residential_sales_5y['property_type_en'] == 'Unit')
].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
728363,1-102-2022-15011,2022-06-07,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,بن غاطي جاسمين,Binghatti Jasmine,2386.0,بن غاطي جازمين,Binghatti Jasmine,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,,,1,89.78,682000.0,7596.35,,,1.0,1.0,0.0
603598,1-102-2022-18815,2022-07-12,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,بن غاطي جاسمين,Binghatti Jasmine,2386.0,بن غاطي جازمين,Binghatti Jasmine,Jumeirah Village Circle,قرية جميرا الدائرية,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,,,1,68.78,510000.0,7414.95,,,1.0,1.0,0.0
1048120,1-102-2024-458,2024-02-16,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,364,الوصل,Al Wasl,Aسنترال بارك بلازا برج,Central Park Plaza Tower A,2796.0,سنترال بارك بلازا,Central Park Plaza,City Walk,ستي ووك,,,,,,,,,1,81.47,2747000.0,33717.93,,,1.0,1.0,0.0
385168,1-102-2024-6781,2024-02-05,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,364,الوصل,Al Wasl,Aسنترال بارك بلازا برج,Central Park Plaza Tower A,2796.0,سنترال بارك بلازا,Central Park Plaza,City Walk,ستي ووك,,,,,,,,,1,119.22,3796000.0,31840.3,,,1.0,1.0,0.0
1151917,1-102-2023-67616,2024-01-04,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,364,الوصل,Al Wasl,Aسنترال بارك بلازا برج,Central Park Plaza Tower A,2796.0,سنترال بارك بلازا,Central Park Plaza,City Walk,ستي ووك,,,,,,,,,1,81.47,2781000.0,34135.26,,,1.0,1.0,0.0


In [112]:
# Inspecting area names where the rooms information is null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].isnull()]['area_name_en'].value_counts()

area_name_en
Al Thanyah Fifth                     1776
Jabal Ali First                      1229
Al Hebiah Third                      1218
Wadi Al Safa 6                       1174
Al Thanayah Fourth                    886
Al Hebiah Sixth                       843
Al Barsha South Fifth                 751
Hadaeq Sheikh Mohammed Bin Rashid     736
Wadi Al Safa 7                        724
Palm Jumeirah                         704
Wadi Al Safa 5                        668
Al Hebiah Fourth                      606
Me'Aisem First                        596
Al Barsha South Fourth                488
Al Merkadh                            454
Wadi Al Safa 3                        408
Nad Al Shiba First                    396
Al Warsan First                       385
Wadi Al Safa 2                        322
Al Thanyah Third                      219
Mirdif                                202
Al Rashidiya                          197
Al Wasl                               191
Al Barshaa South Seco

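For each area and property type, calculate the mode (most frequent value) of `rooms_en`. Because `rooms_en` holds categorical labels such as `Studio` or `3 B/R`, the mode is the natural choice here; a median would first require converting the labels to numbers. Villas and units may have different typical room counts in each area, so imputing with the mode by area and property type helps keep the imputed values aligned with realistic configurations for that specific area.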

In [113]:
# Calculate the mode of rooms_en for each combination of area and property type
area_property_mode = (transactions_residential_sales_5y.groupby(['area_name_en', 'property_type_en'])['rooms_en']
                      .agg(lambda x: x.mode().iloc[0] if not x.mode().empty else np.nan))

area_property_mode


area_name_en                       property_type_en
Abu Hail                           Villa                  NaN
Al Aweer First                     Villa                  NaN
Al Aweer Second                    Villa                  NaN
Al Bada                            Villa                  NaN
Al Baraha                          Villa                  NaN
Al Barsha First                    Unit                 1 B/R
                                   Villa                  NaN
Al Barsha Second                   Villa                  NaN
Al Barsha South Fifth              Unit                 1 B/R
                                   Villa                3 B/R
Al Barsha South Fourth             Unit                 1 B/R
                                   Villa                3 B/R
Al Barsha Third                    Villa                  NaN
Al Barshaa South First             Villa                  NaN
Al Barshaa South Second            Unit                 1 B/R
                  

In [114]:
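def impute_rooms(row):
    """Fill a missing `rooms_en` with the mode for the row's area and property type.

    Relies on the `area_property_mode` Series computed in the previous cell.
    Returns NaN when that group has no mode (all of its values are missing).
    """
    if pd.isna(row['rooms_en']):  # Only fill in missing values
        area = row['area_name_en']
        property_type = row['property_type_en']
        return area_property_mode.get((area, property_type), np.nan)
    return row['rooms_en']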

In [115]:
print("Unique values count in 'rooms_en' column before imputation:")
print(transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False))

# Apply the function to fill in the missing rooms_en values
transactions_residential_sales_5y['rooms_en'] = transactions_residential_sales_5y.apply(impute_rooms, axis=1)

print("\nUnique values count in 'rooms_en' column after imputation:")
print(transactions_residential_sales_5y['rooms_en'].value_counts(dropna=False))

Unique values count in 'rooms_en' column before imputation:
rooms_en
1 B/R        142519
2 B/R         94875
Studio        66046
3 B/R         52397
4 B/R         21889
NaN           17257
5 B/R          2055
PENTHOUSE       231
6 B/R           164
7 B/R            29
Name: count, dtype: int64

Unique values count in 'rooms_en' column after imputation:
rooms_en
1 B/R        142661
2 B/R         95346
Studio        66046
3 B/R         59930
4 B/R         26584
NaN            3707
5 B/R          2060
7 B/R           733
PENTHOUSE       231
6 B/R           164
Name: count, dtype: int64
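
Row-wise `apply` with a Python function can be slow on a frame of this size. As a sketch only, the same group-wise mode imputation can be expressed with `groupby(...).transform` and `fillna`. The toy frame and the helper name `first_mode` are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np
import pandas as pd

def first_mode(s: pd.Series):
    """Return the most frequent non-null value in s, or NaN if s has none."""
    m = s.mode()
    return m.iloc[0] if not m.empty else np.nan

# Toy stand-in for the transactions frame (values are illustrative only)
toy = pd.DataFrame({
    "area_name_en": ["A", "A", "A", "A", "B", "B"],
    "property_type_en": ["Villa", "Villa", "Villa", "Unit", "Villa", "Villa"],
    "rooms_en": ["3 B/R", "3 B/R", np.nan, "1 B/R", np.nan, np.nan],
})

# Broadcast each group's mode back onto its rows, aligned with the original index
group_mode = (
    toy.groupby(["area_name_en", "property_type_en"])["rooms_en"]
    .transform(first_mode)
)

# Only rows that are currently missing are filled; existing values are untouched
toy["rooms_en"] = toy["rooms_en"].fillna(group_mode)
print(toy)
```

Groups whose values are all missing (area `B` in the toy data) have no mode, so their rows stay NaN. This is consistent with the 3,707 NaN values that remain after the imputation above.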


In [116]:
# Calculate the mode of rooms_ar for each combination of area and property type
area_property_mode = (transactions_residential_sales_5y.groupby(['area_name_ar', 'property_type_ar'])['rooms_ar']
                      .agg(lambda x: x.mode().iloc[0] if not x.mode().empty else np.nan))

area_property_mode


area_name_ar               property_type_ar
ابو هيل                    فيلا                     NaN
البدع                      فيلا                     NaN
البراحه                    فيلا                     NaN
البرشاء الاولى             فيلا                     NaN
                           وحدة                    غرفة
البرشاء الثالثه            فيلا                     NaN
البرشاء الثانيه            فيلا                     NaN
البرشاء جنوب الاولى        فيلا                     NaN
البرشاء جنوب الثالثة       وحدة                    غرفة
البرشاء جنوب الثانية       فيلا                     NaN
                           وحدة                    غرفة
البرشاء جنوب الخامسة       فيلا                ثلاث غرف
                           وحدة                    غرفة
البرشاء جنوب الرابعة       فيلا                ثلاث غرف
                           وحدة                    غرفة
الثنيه الأولى              وحدة                  غرفتين
الثنيه الثالثة             فيلا                ثلاث غرف
    

In [117]:
# Procedure area statistics by property type
print("Procedure area statistics by Units:")
print(transactions_residential_sales_5y[transactions_residential_sales_5y['property_type_en'] == 'Unit']['procedure_area'].describe())

print("\nProcedure area statistics by Villas:")
print(transactions_residential_sales_5y[transactions_residential_sales_5y['property_type_en'] == 'Villa']['procedure_area'].describe())

Procedure area statistics by Units:
count    331507.000000
mean         89.141215
std          52.323804
min          23.990000
25%          56.710000
50%          75.450000
75%         111.210000
max        2822.760000
Name: procedure_area, dtype: float64

Procedure area statistics by Villas:
count    65955.000000
mean       311.616082
std        242.167964
min         10.450000
25%        167.950000
50%        230.010000
75%        341.490000
max       2839.110000
Name: procedure_area, dtype: float64


In [118]:
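def impute_rooms(row):
    """Fill a missing `rooms_ar` with the mode for the row's area and property type.

    Relies on the Arabic `area_property_mode` Series (grouped by `area_name_ar`
    and `property_type_ar`) computed above. Returns NaN when the group has no mode.
    """
    if pd.isna(row['rooms_ar']):  # Only fill in missing values
        area = row['area_name_ar']
        property_type = row['property_type_ar']
        return area_property_mode.get((area, property_type), np.nan)
    return row['rooms_ar']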

In [119]:
print("Unique values count in 'rooms_ar' column before imputation:")
print(transactions_residential_sales_5y['rooms_ar'].value_counts(dropna=False))

# Apply the function to fill in the missing rooms_ar values
transactions_residential_sales_5y['rooms_ar'] = transactions_residential_sales_5y.apply(impute_rooms, axis=1)

print("\nUnique values count in 'rooms_ar' column after imputation:")
print(transactions_residential_sales_5y['rooms_ar'].value_counts(dropna=False))

Unique values count in 'rooms_ar' column before imputation:
rooms_ar
غرفة        142519
غرفتين       94875
استوديو      66046
ثلاث غرف     52397
أربع غرف     21889
NaN          17257
خمس غرف       2055
بنتهاوس        231
ست غرف         164
سبع غرف         29
Name: count, dtype: int64

Unique values count in 'rooms_ar' column after imputation:
rooms_ar
غرفة        142661
غرفتين       95346
استوديو      66046
ثلاث غرف     59930
أربع غرف     26584
NaN           3707
خمس غرف       2060
سبع غرف        733
بنتهاوس        231
ست غرف         164
Name: count, dtype: int64


In [120]:
# Dropping the building name columns
transactions_residential_sales_5y = transactions_residential_sales_5y.drop(columns=['building_name_en', 'building_name_ar'])

# Checking the shape of the dataset to confirm the columns are dropped
print("Transactions dataset shape after dropping building name columns:", transactions_residential_sales_5y.shape)

Transactions dataset shape after dropping building name columns: (397462, 33)


In [121]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
749903,1-11-2022-15697,2022-07-04,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,343,ورسان الاولى,Al Warsan First,231.0,برايم ريزدنسي 1,PRIME RESIDENCY1,International City Phase 1,المدينة العالمية - المرحلة الاولى,,,محطة مترو الراشدية,Rashidiya Metro Station,سيتي سنتر مردف,City Centre Mirdif,غرفتين,2 B/R,1,91.32,510000.0,5584.76,,,1.0,1.0,0.0
710414,1-11-2022-23957,2022-10-03,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,485,معيصم الأول,Me'Aisem First,436.0,ليكسايد,LAKESIDE,International Media Production Zone,المنطقة العالمية للإنتاج الإعلامي,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,عقارات داماك,Damac Properties,مارينا مول,Marina Mall,استوديو,Studio,1,33.93,215000.0,6336.58,,,1.0,1.0,0.0
546696,1-41-2021-5282,2021-06-01,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,482,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,1751.0,فير واي فيزتاس,FAIRWAY VISTAS,DUBAI HILLS - FAIRWAY VISTAS,دبي هيليز - فايرويز فيستا,قرية عالمية,Global Village,محطة مترو بنك أبوظبي الأول,First Abu Dhabi Bank Metro Station,مول الإمارات,Mall of the Emirates,أربع غرف,4 B/R,0,1253.63,10000000.0,7976.84,,,1.0,1.0,0.0
695238,1-102-2023-68524,2023-12-26,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,523,الحبية الثالثة,Al Hebiah Third,2779.0,داماك هيلز - جولف جرينز 1,DAMAC HILLS - GOLF GREENS 1,DAMAC HILLS,داماك هيليز,موتور سيتي,Motor City,,,,,غرفتين,2 B/R,1,124.94,2092000.0,16744.04,,,1.0,2.0,0.0
426096,1-11-2024-30888,2024-08-21,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,465,وادي الصفا 3,Wadi Al Safa 3,1612.0,الحقول في D11 - إم بي أر إم سي,THE FIELDS AT D11 - MBRMC,Mohammed Bin Rashid AL Maktoum District 11,مدينة محمد بن راشد آل مكتوم – ديستركت 11,آي إم جي وورلد أدفينتشرز,IMG World Adventures,,,,,أربع غرف,4 B/R,0,306.6,4930000.0,16079.58,,,1.0,2.0,0.0


In [122]:
# Checking the missing values percentages
print("Transactions dataset missing data percentage:")
transactions_residential_sales_5y.isnull().sum() / transactions_residential_sales_5y.shape[0] * 100

Transactions dataset missing data percentage:


transaction_id            0.000000
instance_date             0.000000
property_type_id          0.000000
property_type_ar          0.000000
property_type_en          0.000000
reg_type_id               0.000000
reg_type_ar               0.000000
reg_type_en               0.000000
area_id                   0.000000
area_name_ar              0.000000
area_name_en              0.000000
project_number           10.130277
project_name_ar          10.130277
project_name_en          10.130277
master_project_en        21.317510
master_project_ar        21.317510
nearest_landmark_ar      21.813154
nearest_landmark_en      21.813154
nearest_metro_ar         33.194117
nearest_metro_en         33.194117
nearest_mall_ar          33.555157
nearest_mall_en          33.555157
rooms_ar                  0.932668
rooms_en                  0.932668
has_parking               0.000000
procedure_area            0.000000
actual_worth              0.000000
meter_sale_price          0.000000
rent_value          
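
As a side note, the percentage of missing values per column can also be computed with `isnull().mean()`, since the mean of a boolean mask is the share of `True` values. A minimal sketch on a toy frame (column names `a` and `b` are placeholders):

```python
import numpy as np
import pandas as pd

# Toy frame: column "b" has 2 missing values out of 4 rows
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1.0, np.nan, np.nan, 4.0],
})

# Share of missing values per column, as a percentage, largest first
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Sorting the result puts the most incomplete columns at the top, which is convenient when deciding which columns to drop or impute.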

To streamline the analysis and focus on key variables, I plan to remove several columns that are unlikely to add substantial value to our predictive models or investor insights. First, I’ll remove the **nearest amenities** columns, which name the nearest landmark, mall, and metro station for each property. While useful in some contexts, these values are inconsistently populated (roughly 22% to 34% missing in the table above), and the amenity named is often too far away to be a practical indicator of property value, so I don’t expect them to contribute meaningfully to the analysis.

Additionally, I plan to exclude the **rent value** and **meter rent price** columns. Since our primary goal is forecasting sales prices rather than rental values, these columns would add unnecessary complexity without directly enhancing our insights into market trends or property valuation.

Lastly, I’ll remove the three **number of parties** columns (`no_of_parties_role_1` to `no_of_parties_role_3`). While this detail may be relevant in specific contractual analyses, it has little bearing on forecasting property values or trends within the scope of this project. Removing these columns keeps the dataset focused on the factors most relevant to accurate and insightful results.

In [123]:
# Shape of the dataset before dropping
print("Transactions dataset shape before dropping columns:", transactions_residential_sales_5y.shape)

# Dropping Nearest Amenities columns
nearest_columns = [col for col in transactions_residential_sales_5y.columns if 'nearest' in col]
transactions_residential_sales_5y = transactions_residential_sales_5y.drop(columns=nearest_columns)

# Dropping rent columns
rent_columns = [col for col in transactions_residential_sales_5y.columns if 'rent' in col]
transactions_residential_sales_5y = transactions_residential_sales_5y.drop(columns=rent_columns)

# Dropping the Number of Parties columns
parties_columns = [col for col in transactions_residential_sales_5y.columns if 'parties' in col]
transactions_residential_sales_5y = transactions_residential_sales_5y.drop(columns=parties_columns)

# Checking the shape of the dataset after dropping columns
print("Transactions dataset shape after dropping columns:", transactions_residential_sales_5y.shape)


Transactions dataset shape before dropping columns: (397462, 33)


Transactions dataset shape after dropping columns: (397462, 22)
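
The three substring-based drops above could also be collapsed into a single step. The sketch below uses a toy frame whose column names are borrowed from the dataset; the variable names `drop_pattern`, `to_drop`, and `trimmed` are illustrative.

```python
import pandas as pd

# Empty toy frame that only carries column names
toy = pd.DataFrame(columns=[
    "transaction_id", "nearest_metro_en", "nearest_mall_ar",
    "rent_value", "meter_rent_price", "no_of_parties_role_1", "actual_worth",
])

# One regular expression covering all three column families
drop_pattern = r"nearest|rent|parties"
to_drop = toy.columns[toy.columns.str.contains(drop_pattern)]

# Review the matched names before dropping: substring matching is broad
print(list(to_drop))

trimmed = toy.drop(columns=to_drop)
print(list(trimmed.columns))
```

Substring matching is convenient but can be too greedy: a pattern like `rent` would also match a hypothetical column such as `current_status`. Printing the matched names first is a cheap safeguard.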


In [124]:
# Displaying random observations from the transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
941635,1-102-2019-6963,2019-05-13,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,317,جميرا الاولى,Jumeirah First,2093.0,بورت دو لامير - لا كوت,Port de La Mer - La Cote,LA MER,لامير,ثلاث غرف,3 B/R,1,201.89,4330000.0,21447.32
278877,1-102-2022-9978,2022-04-13,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,526,الخليج التجارى,Business Bay,2318.0,بينينسولا تو,Peninsula Two,Business Bay,الخليج التجاري,استوديو,Studio,1,37.92,751200.0,19810.13
54564,1-41-2024-18161,2024-09-11,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,469,اليفره 1,Al Yufrah 1,2196.0,ذا فالي - ايدن,The Valley - Eden,,,ثلاث غرف,3 B/R,0,174.0,2450000.0,14080.46
1218131,1-102-2024-78188,2024-09-27,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,447,الخيران الأولى,Al Khairan First,3229.0,خور دبي l العنوان رزيدنسز,Address Residences l Dubai Creek Harbour,The Lagoons,الخيران,غرفة,1 B/R,1,72.51,2074888.0,28615.2
719627,1-11-2022-25146,2022-10-13,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,1119.0,حديقة الإمارات 2 (القيقب - ماجنولا - التوت),EMIRATES GARDEN 2 (MAPLE- MAGNOLA - MULBERRY),Jumeirah Village Circle,قرية جميرا الدائرية,غرفة,1 B/R,1,93.73,500000.0,5334.47


In [125]:
# Setting NumPy print options so that arrays are printed in full, without truncation
np.set_printoptions(threshold=np.inf)

In [126]:
# Inspecting the Project Name columns in the transactions dataset
print("Unique values in 'project_number' column is:")
print(transactions_residential_sales_5y['project_number'].nunique())
print(transactions_residential_sales_5y['project_number'].unique())

print("\nUnique values in 'project_name_en' column is:")
print(transactions_residential_sales_5y['project_name_en'].nunique())
print(transactions_residential_sales_5y['project_name_en'].unique())

print("\nUnique values in 'project_name_ar' column is:")
print(transactions_residential_sales_5y['project_name_ar'].nunique())
print(transactions_residential_sales_5y['project_name_ar'].unique())

Unique values in 'project_number' column is:
1792
[  nan 2782. 1693.   16.  227.  150. 1260.  390. 1664.  257.  430.  431.
 2797.   40. 1267. 2425. 2099. 1790. 1034. 1458. 1551.  845.  267.  948.
  207. 1051. 2065. 1607. 1561. 1405. 1898.  336. 1142.  323.   82. 2746.
 2845. 1462. 1340. 1660. 1277. 1773. 1695. 1816. 2053. 1507. 2469.  250.
 2359. 2996.  413. 1073. 1605. 1831.  513.  259. 2352.  129. 2371. 1315.
  295. 2360. 2192. 2129. 2379. 2587. 2139. 2499. 2702. 2662. 2578. 2304.
 2862. 2204. 2673. 2655. 2659. 2693. 2321. 2216. 2233. 1888. 2100. 2825.
 2737. 2478. 2837. 3081. 1306. 2571. 2412. 2574. 2537.  815. 2675. 2197.
 2722. 3048. 1541. 1796. 2535. 2353. 2635. 2715. 2640. 1706. 2637.  210.
 2636.  231.  603.   48.  464.  463.  596.  172.  816. 1934.  732.  360.
 2149. 2505. 1643. 1537. 2647. 2527.  162.  388. 2465. 2707. 1583. 1297.
 1617. 2185.  422.  439. 2429. 2406. 2449. 1779.   47. 2112. 1811. 2475.
 1580. 1732. 1558. 2887. 1747. 2213. 2516. 1495. 2190. 2815. 2344. 2437.
 

In [127]:
# Let's check the missing values count in each project column to see if they are the same
print("Missing values count in 'project_number' column is:")
print(transactions_residential_sales_5y['project_number'].isnull().sum())

print("\nMissing values count in 'project_name_en' column is:")
print(transactions_residential_sales_5y['project_name_en'].isnull().sum())

print("\nMissing values count in 'project_name_ar' column is:")
print(transactions_residential_sales_5y['project_name_ar'].isnull().sum())

Missing values count in 'project_number' column is:
40264

Missing values count in 'project_name_en' column is:
40264

Missing values count in 'project_name_ar' column is:
40264
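
Equal missing counts suggest, but do not prove, that the same rows are missing in all three columns. A minimal sketch of a row-level check on a toy frame (the values are made up; only the column names come from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame in which the project columns are missing in the same rows
toy = pd.DataFrame({
    "project_number": [1.0, np.nan, np.nan],
    "project_name_en": ["A", np.nan, np.nan],
    "project_name_ar": ["أ", np.nan, np.nan],
})

project_cols = ["project_number", "project_name_en", "project_name_ar"]
null_mask = toy[project_cols].isnull()

# For every row, "any column missing" must equal "all columns missing"
missing_together = bool((null_mask.any(axis=1) == null_mask.all(axis=1)).all())
print(missing_together)
```

If this check returned `False` on the real data, some rows would have a project number without a name (or the reverse), and those rows could potentially be repaired from the other columns.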


In [128]:
# Inspecting the Master Project Name columns in the transactions dataset
print("Unique values in 'master_project_en' column is:")
print(transactions_residential_sales_5y['master_project_en'].nunique())
print(transactions_residential_sales_5y['master_project_en'].unique())

print("\nUnique values in 'master_project_ar' column is:")
print(transactions_residential_sales_5y['master_project_ar'].nunique())
print(transactions_residential_sales_5y['master_project_ar'].unique())


Unique values in 'master_project_en' column is:
145
[nan 'TECOM Site B' 'Jumeirah Lakes Towers' 'Dubai Marina' 'The Greens'
 'Palm Jumeirah' 'International City Phase 1' 'Dubiotech'
 'Jumeriah Beach Residence  - JBR' 'Springs - 5' 'Springs - 2'
 'Jumeirah Islands' 'Dubai Sports City' 'Burj Khalifa' 'Culture Village'
 'Arjan' 'Jumeirah Village Circle' 'Jumeirah Village Triangle'
 'Motor City' 'Dubai Health Care City Phase 2' 'Down Town Jabal Ali'
 'International Media Production Zone' 'Business Bay'
 'Dubai Maritime City' 'Meydan' 'DUBAI HILLS - PARK' 'Al Furjan'
 'SOBHA HARTLAND' 'Badra' 'City Walk' 'DUBAI HILLS'
 'Jumeirah Golf Estates' 'WARSAN FIRST DEVELOPMENT' 'The Lagoons' 'Wasl 1'
 'Wasl Gate' 'HADAEQ SHEIKH MOHAMMED BIN RASHID - DISRICT 7'
 'Business Park' 'LA MER' 'Mina Rashid' 'Dubai South Residential District'
 'Liwan' 'Discovery Gardens' 'Living Legends' 'DAMAC HILLS'
 'Silicon Oasis' 'TOWN SQUARE' 'Dubai Investment Park Second'
 'Dubai World Central' 'Rukan' 'Palm Deira' 'D

In [129]:
# Let's check the missing values count in each master project column to see if they are the same
print("Missing values count in 'master_project_en' column is:")
print(transactions_residential_sales_5y['master_project_en'].isnull().sum())

print("\nMissing values count in 'master_project_ar' column is:")
print(transactions_residential_sales_5y['master_project_ar'].isnull().sum())

Missing values count in 'master_project_en' column is:
84729

Missing values count in 'master_project_ar' column is:
84729


In [130]:
# Checking random observations where project name is null
transactions_residential_sales_5y[transactions_residential_sales_5y['project_name_en'].isnull()].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
26492,1-11-2023-19552,2023-06-20,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,442,البرشاء جنوب الخامسة,Al Barsha South Fifth,,,,Jumeirah Village Triangle,قرية جميرا المثلثة,ثلاث غرف,3 B/R,0,240.0,2275000.0,9479.17
600707,1-11-2023-2158,2023-01-24,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,,,,Dubai Marina,دبي مارينا,غرفة,1 B/R,1,72.93,1565000.0,21458.93
185362,1-11-2021-1283,2021-01-27,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,374,ام الشيف,Um Al Sheif,,,,,,,,0,1393.55,14000000.0,10046.28
601312,1-11-2022-21899,2022-09-12,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,,,,Jumeriah Beach Residence - JBR,جميرا بيتش ريزيدنس - الجيه بي آر,غرفتين,2 B/R,1,127.81,1850000.0,14474.61
627538,1-11-2019-5576,2019-06-18,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,,,,Discovery Gardens,الحدائق المكتشفة,استوديو,Studio,0,44.0,330000.0,7500.0


In [131]:
# Inspecting the property type where the project name is null
transactions_residential_sales_5y[transactions_residential_sales_5y['project_name_en'].isnull()]['property_type_en'].value_counts()

property_type_en
Unit     31202
Villa     9062
Name: count, dtype: int64

In [132]:
# Inspecting the registration type where the project name is null
transactions_residential_sales_5y[transactions_residential_sales_5y['project_name_en'].isnull()]['reg_type_en'].value_counts()

reg_type_en
Existing Properties    40238
Off-Plan Properties       26
Name: count, dtype: int64

In [133]:
# Displaying random observations where the project name is null & registration type is "Off-Plan Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Off-Plan Properties')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
300189,1-102-2019-19889,2019-11-12,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,465,وادي الصفا 3,Wadi Al Safa 3,,,,Living Legends,مجمع الأساطير الحية السكني,استوديو,Studio,1,45.42,366675.0,8072.99
694123,1-102-2019-9090,2022-07-25,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,465,وادي الصفا 3,Wadi Al Safa 3,,,,Living Legends,مجمع الأساطير الحية السكني,غرفتين,2 B/R,1,129.78,1124536.0,8664.95
382790,1-102-2021-1004,2021-02-01,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,465,وادي الصفا 3,Wadi Al Safa 3,,,,Living Legends,مجمع الأساطير الحية السكني,استوديو,Studio,1,46.31,300000.0,6478.08
1253042,1-102-2019-7506,2019-05-21,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,465,وادي الصفا 3,Wadi Al Safa 3,,,,Living Legends,مجمع الأساطير الحية السكني,غرفتين,2 B/R,1,145.41,1017367.0,6996.54
486287,1-102-2019-6310,2019-04-29,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,465,وادي الصفا 3,Wadi Al Safa 3,,,,Living Legends,مجمع الأساطير الحية السكني,غرفة,1 B/R,1,79.51,727463.0,9149.33


In [134]:
# Inspecting the area name where the project name is null & registration type is "Off-Plan Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Off-Plan Properties')]['area_name_en'].value_counts()

area_name_en
Wadi Al Safa 3    26
Name: count, dtype: int64

In [135]:
# Inspecting the master project where the project name is null & registration type is "Off-Plan Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Off-Plan Properties')]['master_project_en'].value_counts()

master_project_en
Living Legends    26
Name: count, dtype: int64

**Key Observations**:

1. **Missing Project Names by Property Type**:

	- Of the transactions with a missing project name, **31,202** are **unit** property types and **9,062** are **villa** property types.

2. **Missing Project Names by Registration Type**:

	- **40,238** of the missing project names correspond to **existing properties**.

	- Only **26** are associated with **off-plan properties**.

3. **Off-plan Insight**:

	- All 26 off-plan rows belong to the master project "Living Legends" in Wadi Al Safa 3. The project is listed as off-plan, but I found that it is actually complete, so these rows should be categorized as existing properties. After updating this, all missing project names and numbers will belong exclusively to existing properties.
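The reclassification described in the last observation could look roughly like the sketch below. It runs on a small toy frame rather than `transactions_residential_sales_5y`, and it assumes the registration codes seen in the sample rows above (`reg_type_id` 1 for "Existing Properties", 0 for "Off-Plan Properties"). The three registration columns are updated together so they stay consistent.

```python
import pandas as pd

# Toy stand-in for transactions_residential_sales_5y (illustrative rows only)
df = pd.DataFrame({
    'project_name_en': [None, None, 'Azizi Riviera 26'],
    'master_project_en': ['Living Legends', 'Business Bay', None],
    'reg_type_id': [0, 1, 0],
    'reg_type_ar': ['على الخارطة', 'العقارات القائمة', 'على الخارطة'],
    'reg_type_en': ['Off-Plan Properties', 'Existing Properties', 'Off-Plan Properties'],
})

# Rows to fix: missing project name, registered as off-plan, in Living Legends
mask = (
    df['project_name_en'].isnull()
    & (df['reg_type_en'] == 'Off-Plan Properties')
    & (df['master_project_en'] == 'Living Legends')
)

# Update the ID, Arabic label, and English label together
df.loc[mask, 'reg_type_id'] = 1
df.loc[mask, 'reg_type_ar'] = 'العقارات القائمة'
df.loc[mask, 'reg_type_en'] = 'Existing Properties'

# After the update, no off-plan row should have a missing project name
remaining_off_plan = (
    df['project_name_en'].isnull() & (df['reg_type_en'] == 'Off-Plan Properties')
).sum()
print(remaining_off_plan)
```

Off-plan rows that do have a project name (the third toy row) are left untouched, because the mask requires a missing project name.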

In [136]:
# Inspecting registration type where the project name is null & property type is "Villa"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['property_type_en'] == 'Villa')]['reg_type_en'].value_counts()

reg_type_en
Existing Properties    9062
Name: count, dtype: int64

In [137]:
# Displaying random observations where the project name is null & property type is "Unit"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['property_type_en'] == 'Unit')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
206501,1-11-2024-37658,2024-10-07,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,351,الثنيه الثالثة,Al Thanyah Third,,,,The Greens,الروضة,غرفتين,2 B/R,1,135.45,2700000.0,19933.55
193452,1-11-2023-22861,2023-07-19,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,526,الخليج التجارى,Business Bay,,,,Business Bay,الخليج التجاري,ثلاث غرف,3 B/R,1,246.1,3250000.0,13206.01
753034,1-11-2023-8451,2023-03-23,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,526,الخليج التجارى,Business Bay,,,,Business Bay,الخليج التجاري,غرفة,1 B/R,1,84.5,1165000.0,13786.98
49457,1-11-2024-2031,2024-01-18,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,526,الخليج التجارى,Business Bay,,,,Business Bay,الخليج التجاري,ثلاث غرف,3 B/R,1,267.56,3650000.0,13641.8
972159,1-11-2021-13109,2021-07-29,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,,,,Jumeirah Lakes Towers,ابراج بحيرات الجميرا,غرفتين,2 B/R,1,149.84,1625000.0,10844.9


In [138]:
# Inspecting registration type where the project name is null & property type is "Unit"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['property_type_en'] == 'Unit')]['reg_type_en'].value_counts()

reg_type_en
Existing Properties    31176
Off-Plan Properties       26
Name: count, dtype: int64

In [139]:
# Counting missing project names where registration type is "Existing Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Existing Properties')]['project_name_en'].isnull().sum()

40238

In [140]:
# Displaying random observations where the project name is null & registration type is "Existing Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Existing Properties')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
401769,1-41-2021-12703,2021-11-22,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,312,محيصنه الاولى,Muhaisanah First,,,,,,غرفتين,2 B/R,1,124.22,1671780.0,13458.22
1031105,1-11-2024-8276,2024-03-06,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,343,ورسان الاولى,Al Warsan First,,,,International City Phase 1,المدينة العالمية - المرحلة الاولى,استوديو,Studio,0,45.0,240000.0,5333.33
1236874,1-11-2023-23343,2023-07-24,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,,,,Jumeirah Village Circle,قرية جميرا الدائرية,ثلاث غرف,3 B/R,0,180.0,1350000.0,7500.0
1138522,1-11-2021-17271,2021-09-30,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,351,الثنيه الثالثة,Al Thanyah Third,,,,The Greens,الروضة,أربع غرف,4 B/R,0,224.03,2150000.0,9596.93
832947,1-11-2022-566,2022-01-13,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,350,الثنيه الخامسة,Al Thanyah Fifth,,,,Jumeirah Lakes Towers,ابراج بحيرات الجميرا,غرفة,1 B/R,1,64.32,580000.0,9017.41


In [141]:
# Counting missing project names where registration type is "Off-Plan Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Off-Plan Properties')]['project_name_en'].isnull().sum()

26

In [142]:
# Displaying random observations where the project name is available & registration type is "Off-Plan Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].notnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Off-Plan Properties')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
964181,1-102-2024-85533,2024-10-14,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,432,نخلة ديرة,Palm Deira,3167.0,أوشن برل باي اس دي,Ocean Pearl By SD,Palm Deira,نخلة ديرة,غرفة,1 B/R,1,81.16,1935200.0,23844.26
1174211,1-102-2024-15617,2024-03-13,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,2119.0,عزيزي ريفيرا 26,Azizi Riviera 26,,,استوديو,Studio,1,30.32,750400.0,24749.34
447825,1-102-2022-2009,2022-01-20,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,2177.0,المرابع العربية ااا- ربى,Arabian Ranches lll - Ruba,,,ثلاث غرف,3 B/R,0,146.08,1604888.0,10986.36
982754,1-102-2020-8282,2020-06-04,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,482,حدائق الشيخ محمد بن راشد,Hadaeq Sheikh Mohammed Bin Rashid,2033.0,ماج آي المرحلة 1,MAG EYE Phase 1,HADAEQ SHEIKH MOHAMMED BIN RASHID - DISRICT 7,حدائق الشيخ محمد بن راشد - ديستركت 7,استوديو,Studio,1,38.73,426000.0,10999.23
623998,1-102-2023-29414,2023-06-22,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,409,البرشاء جنوب الثالثة,Al Barshaa South Third,2620.0,إيلانو من أورو24,ELANO BY ORO24,Arjan,أرجان,استوديو,Studio,1,34.27,563825.0,16452.44


In [143]:
# Displaying random observations where the project name is available, registration type is "Off-Plan Properties" & property type is "Villa"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].notnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Off-Plan Properties') &
                                  (transactions_residential_sales_5y['property_type_en'] == 'Villa')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
278863,1-102-2021-53,2021-01-04,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,317,جميرا الاولى,Jumeirah First,2167.0,سور لامير,Sur La Mer,LA MER,لامير,أربع غرف,4 B/R,0,387.97,6100000.0,15722.87
613997,1-102-2023-16096,2023-04-03,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,2534.0,المرابع العربية ااا - انيا,Arabian Ranches lll - Anya,,,ثلاث غرف,3 B/R,0,144.84,2240888.0,15471.47
597577,1-102-2023-58830,2023-11-06,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,469,اليفره 1,Al Yufrah 1,2771.0,ذا فالي - نيما,The Valley - Nima,,,ثلاث غرف,3 B/R,0,174.08,2355888.0,13533.36
963718,1-102-2023-59966,2023-11-13,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,2258.0,المرابع العربية ااا - بليس,Arabain Ranches lll - Bliss,,,ثلاث غرف,3 B/R,0,117.97,2175000.0,18436.89
778548,1-102-2024-59233,2024-08-08,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,469,اليفره 1,Al Yufrah 1,2511.0,ذا فالي - ايلورا,The Valley - Elora,,,أربع غرف,4 B/R,0,231.0,2400000.0,10389.61


In [144]:
# Displaying random observations where the project name is available & registration type is "Existing Properties"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].notnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Existing Properties')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
488212,1-11-2023-15347,2023-05-22,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,453,سيح شعيب 2,Saih Shuaib 2,514.0,الصحراء ميدوس 1,SAHARA MEADOWS1,,,ثلاث غرف,3 B/R,0,136.96,785000.0,5731.6
1214973,1-41-2022-20566,2022-11-22,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,467,وادي الصفا 5,Wadi Al Safa 5,2057.0,فيلانوفا أمارانتا 3,Villanova Amaranta 3,,,أربع غرف,4 B/R,0,230.39,2050000.0,8897.96
128667,1-11-2024-30374,2024-08-16,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,1639.0,بارك فيو تاور,PARK VIEW TOWER,Jumeirah Village Circle,قرية جميرا الدائرية,غرفتين,2 B/R,1,124.22,1700000.0,13685.4
971082,1-41-2024-11751,2024-06-11,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,2094.0,مارينا فيستا,MARINA VISTA,,,ثلاث غرف,3 B/R,1,162.96,6600000.0,40500.74
588961,1-41-2024-18496,2024-09-17,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,366,الكفاف,Al Kifaf,1957.0,بارك جيت رازيدنس,Park Gate Residences,Wasl 1,وصل 1,ثلاث غرف,3 B/R,1,200.24,3980000.0,19876.15


In [145]:
# Displaying random observations where the project name is available, registration type is "Existing Properties" & property type is "Villa"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].notnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Existing Properties') &
                                  (transactions_residential_sales_5y['property_type_en'] == 'Villa')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
820748,1-110-2023-563,2023-11-21,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,531,الحبيه السادسة,Al Hebiah Sixth,1659.0,منطقة مدن 3 _ الجوار الأول- أرابيلا,MUDON PHASE 3 _ NEIGHBOURHOOD I- ARABELLA,Mudon,مدن,ثلاث غرف,3 B/R,0,270.64,2900000.0,10715.34
1317796,1-11-2023-3533,2023-02-07,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,506,اليلايس 1,Al Yelayiss 1,1342.0,مجتمع ريم ميرا PH5,REEM-MIRA COMMUNITY PH5,,,ثلاث غرف,3 B/R,0,201.58,1850000.0,9177.5
329056,1-11-2021-20450,2021-11-16,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,467,وادي الصفا 5,Wadi Al Safa 5,2076.0,فيلانوفا أمارانتا أ,Villanova Amaranta A,,,ثلاث غرف,3 B/R,0,160.91,1685000.0,10471.69
405345,1-11-2021-928,2021-01-20,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,463,وادي الصفا 7,Wadi Al Safa 7,1390.0,المدينة المستدامة,THE SUSTAINABLE CITY,THE SUSTAINABLE CITY,ذا ساستينبل ستي,أربع غرف,4 B/R,0,432.29,3800000.0,8790.4
819480,1-11-2022-14957,2022-06-28,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,463,وادي الصفا 7,Wadi Al Safa 7,1873.0,سيرينا - كاسا فيفا,Serena - Casa Viva,800 Villas,800 فيلا,أربع غرف,4 B/R,0,276.4,2400000.0,8683.07


In [146]:
# Displaying random observations where the project name is null, registration type is "Existing Properties" & property type is "Unit"
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].isnull()) & 
                                  (transactions_residential_sales_5y['reg_type_en'] == 'Existing Properties') &
                                  (transactions_residential_sales_5y['property_type_en'] == 'Unit')].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
534701,1-11-2024-21154,2024-06-10,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,445,جبل علي الأولى,Jabal Ali First,,,,Discovery Gardens,الحدائق المكتشفة,استوديو,Studio,0,49.0,470000.0,9591.84
16289,1-11-2024-12811,2024-04-08,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,526,الخليج التجارى,Business Bay,,,,Business Bay,الخليج التجاري,غرفة,1 B/R,1,102.93,2127000.0,20664.53
987779,1-11-2024-14763,2024-04-30,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,,,,Jumeriah Beach Residence - JBR,جميرا بيتش ريزيدنس - الجيه بي آر,غرفتين,2 B/R,1,134.13,2100000.0,15656.45
283097,1-11-2021-8490,2021-05-25,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,,,,Jumeriah Beach Residence - JBR,جميرا بيتش ريزيدنس - الجيه بي آر,غرفتين,2 B/R,1,130.57,1411111.0,10807.31
833315,1-11-2024-21930,2024-06-13,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,444,الحبيه الاولى,Al Hebiah First,,,,Motor City,موتور ستي,ثلاث غرف,3 B/R,1,272.3,1900000.0,6977.6


In [147]:
# Checking the missing values percentages in the transactions dataset
print("Transactions dataset missing data percentage:")
print(transactions_residential_sales_5y.isnull().sum() / transactions_residential_sales_5y.shape[0] * 100)

Transactions dataset missing data percentage:
transaction_id        0.000000
instance_date         0.000000
property_type_id      0.000000
property_type_ar      0.000000
property_type_en      0.000000
reg_type_id           0.000000
reg_type_ar           0.000000
reg_type_en           0.000000
area_id               0.000000
area_name_ar          0.000000
area_name_en          0.000000
project_number       10.130277
project_name_ar      10.130277
project_name_en      10.130277
master_project_en    21.317510
master_project_ar    21.317510
rooms_ar              0.932668
rooms_en              0.932668
has_parking           0.000000
procedure_area        0.000000
actual_worth          0.000000
meter_sale_price      0.000000
dtype: float64


In [148]:
# Checking random observations where project information is available but master project information is missing
transactions_residential_sales_5y[(transactions_residential_sales_5y['project_name_en'].notnull()) &
                                  (transactions_residential_sales_5y['master_project_en'].isnull())].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
114126,1-102-2023-46660,2023-09-11,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,412,المركاض,Al Merkadh,2046.0,عزيزي ريفييرا 17,Azizi Riviera 17,,,غرفة,1 B/R,1,90.29,1399950.0,15505.04
1273616,1-102-2024-43949,2024-06-20,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,405,بوكدرة,Bukadra,3079.0,ريفرسايد كريسنت 360,360 Riverside Crescent,,,غرفتين,2 B/R,1,84.67,2415157.0,28524.35
383422,1-102-2023-66631,2023-12-15,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,469,اليفره 1,Al Yufrah 1,2803.0,ذا فالي - ألانا,The Valley - Alana,,,أربع غرف,4 B/R,0,345.5,4248888.0,12297.79
280538,1-102-2022-9491,2022-04-07,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,467,وادي الصفا 5,Wadi Al Safa 5,2348.0,لا ?يوليتا 1,La Violeta 1,,,ثلاث غرف,3 B/R,0,161.0,1575000.0,9782.61
863317,1-11-2021-13724,2021-08-09,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,330,مرسى دبي,Marsa Dubai,1743.0,بوابة مارينا الحية في جميرا,JUMEIRAH LIVING MARINA GATE,,,غرفة,1 B/R,1,90.75,2447359.0,26968.14


In [149]:
# Counting transactions per master project, including missing values
transactions_residential_sales_5y['master_project_en'].value_counts(dropna=False)

master_project_en
NaN                                              84729
Jumeirah Village Circle                          37693
Business Bay                                     26578
The Lagoons                                      14952
Dubai Marina                                     13786
Arjan                                            12994
Burj Khalifa                                     12443
DUBAI HILLS                                      11303
TOWN SQUARE                                       9259
Jumeirah Lakes Towers                             8482
Dubai Sports City                                 8332
International City Phase 1                        8311
Dubai World Central                               8236
Al Furjan                                         6723
Residential Complex                               6554
Palm Jumeirah                                     6178
SOBHA HARTLAND                                    5929
DAMAC HILLS                                    

In [151]:
# Displaying random observations from the filtered transactions dataset
transactions_residential_sales_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
1274994,1-102-2022-42228,2022-12-19,4,فيلا,Villa,0,على الخارطة,Off-Plan Properties,462,مدينة المطار,Madinat Al Mataar,2452.0,1 خليج الجنوب,South Bay 1,Dubai South Residential District,المدينة السكنية بدبي الجنوب,أربع غرف,4 B/R,0,406.34,3000000.0,7382.98
385015,1-102-2022-28097,2022-09-20,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,445,جبل علي الأولى,Jabal Ali First,2395.0,جيمز من دانوب,GEMZ by Danube,Al Furjan,الفرجان,غرفتين,2 B/R,1,121.89,1422000.0,11666.26
930274,1-11-2022-4533,2022-03-14,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,315,المناره,Al Manara,,,,,,,,0,1393.55,9000000.0,6458.33
515124,1-41-2024-18254,2024-09-12,3,وحدة,Unit,1,العقارات القائمة,Existing Properties,364,الوصل,Al Wasl,2147.0,سنترال بارك في سيتي ووك - مبنى 1,Central Park at City Walk -Building 1,City Walk,ستي ووك,غرفتين,2 B/R,1,137.06,4285000.0,31263.68
898194,1-102-2024-17853,2024-03-21,3,وحدة,Unit,0,على الخارطة,Off-Plan Properties,441,البرشاء جنوب الرابعة,Al Barsha South Fourth,2757.0,Sapphire 32 by Dar Alkarama,Sapphire 32 by Dar Alkarama,Jumeirah Village Circle,قرية جميرا الدائرية,غرفة,1 B/R,1,68.39,904575.0,13226.73


In [152]:
# Displaying random observations where rooms are null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].isnull()].sample(5)

Unnamed: 0,transaction_id,instance_date,property_type_id,property_type_ar,property_type_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price
1126783,1-41-2023-5088,2023-03-09,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,343,ورسان الاولى,Al Warsan First,1384.0,ورسان فيلج A,WARSAN VILLAGE A,International City Phase 1,المدينة العالمية - المرحلة الاولى,,,0,154.0,1425000.0,9253.25
81754,1-110-2023-80,2023-02-28,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,414,البرشاء جنوب الثانية,Al Barshaa South Second,1525.0,فيلا لانتانا 2,Villa Lantana 2,Dubiotech,دبيوتك,,,0,354.0,4550000.0,12853.11
225757,1-11-2019-1526,2019-02-19,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,266,السطوه,Al Satwa,,,,,,,,0,232.26,2850000.0,12270.73
364078,1-41-2021-1813,2021-02-23,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,523,الحبية الثالثة,Al Hebiah Third,1407.0,دمام هيلز - تيرنتي,DAMAC HILLS - TRINITY,DAMAC HILLS,داماك هيليز,,,0,253.0,2100000.0,8300.4
144076,1-11-2020-7717,2020-09-17,4,فيلا,Villa,1,العقارات القائمة,Existing Properties,375,جميرا الثانيه,Jumeirah Second,,,,,,,,0,548.13,4000000.0,7297.54


In [153]:
# Inspecting property type where rooms are null
transactions_residential_sales_5y[transactions_residential_sales_5y['rooms_en'].isnull()]['property_type_en'].value_counts()

property_type_en
Villa    3707
Name: count, dtype: int64

In [154]:
# Checking for rows where property type is "Unit" and rooms are null (none exist: every null-room row is a villa)
transactions_residential_sales_5y[(transactions_residential_sales_5y['rooms_en'].isnull()) & 
                                  (transactions_residential_sales_5y['property_type_en'] == 'Unit')]

ValueError: a must be greater than 0 unless no samples are taken
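The `ValueError` above is the error pandas raises when `.sample(5)` is called on an empty selection, which fits the previous cell showing that all null-room rows are villas. One way to avoid it is a small guard around `sample`. The helper name `sample_up_to` below is hypothetical, and the frame is toy data.

```python
import pandas as pd

def sample_up_to(df: pd.DataFrame, n: int = 5, random_state=None) -> pd.DataFrame:
    """Return at most n random rows; an empty frame is returned unchanged."""
    if df.empty:
        return df
    # Cap n at the number of available rows so small selections do not raise
    return df.sample(n=min(n, len(df)), random_state=random_state)

# Toy stand-in for transactions_residential_sales_5y (illustrative rows only)
df = pd.DataFrame({
    'rooms_en': [None, '2 B/R', None],
    'property_type_en': ['Villa', 'Unit', 'Villa'],
})

unit_null = df[df['rooms_en'].isnull() & (df['property_type_en'] == 'Unit')]
villa_null = df[df['rooms_en'].isnull() & (df['property_type_en'] == 'Villa')]

print(len(sample_up_to(unit_null)))   # empty selection, no error
print(len(sample_up_to(villa_null)))  # fewer than 5 rows available
```

When the goal is only to confirm that a selection is empty, checking `len(...)` or `.empty` directly is simpler than sampling.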

In [292]:
# Checking the missing values percentages in the transactions_5y dataset
transactions_5y.isnull().sum() / transactions_5y.shape[0] * 100

transaction_id           0.000000
procedure_id             0.000000
trans_group_id           0.000000
trans_group_ar           0.000000
trans_group_en           0.000000
procedure_name_ar        0.000000
procedure_name_en        0.000000
instance_date            0.000000
property_type_id         0.000000
property_type_ar         0.000000
property_type_en         0.000000
property_sub_type_id    17.835738
property_sub_type_ar    17.835738
property_sub_type_en    17.835738
property_usage_ar        0.000000
property_usage_en        0.000000
reg_type_id              0.000000
reg_type_ar              0.000000
reg_type_en              0.000000
area_id                  0.000000
area_name_ar             0.000000
area_name_en             0.000000
building_name_ar        28.971157
building_name_en        28.905769
project_number          18.169554
project_name_ar         18.169554
project_name_en         18.169554
master_project_en       21.266945
master_project_ar       21.268778
nearest_landma

In [203]:
transactions_5y.to_csv('../data/processed/transactions_5y.csv', index=False)
rent_contracts_5y.to_csv('../data/processed/rent_contracts_5y.csv', index=False)

Before diving deeper into analyzing each dataset, I want to identify the columns shared by the **transactions**, **rent contracts**, and **projects** datasets. These shared columns show how the data structures align and which keys can link the datasets, which tells me how to merge or relate them for further analysis.

In [204]:
# Get the column names of each dataset
transactions_columns = set(transactions.columns)
rent_contracts_columns = set(rent_contracts.columns)
projects_columns = set(projects.columns)

# Find the common columns between all three datasets
common_columns = transactions_columns.intersection(rent_contracts_columns, projects_columns)

# Display the common columns
print("Common columns across transactions, rent contracts, and projects datasets:")
print(common_columns)

Common columns across transactions, rent contracts, and projects datasets:
{'project_number', 'master_project_en', 'area_id', 'area_name_ar', 'master_project_ar', 'area_name_en'}
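As a rough illustration of how one of these shared columns could link two datasets, the sketch below left-joins toy transactions to toy projects on `project_number`. The rows are made up, and treating `project_number` as unique in the projects table is an assumption that `validate='many_to_one'` checks at merge time.

```python
import pandas as pd

# Toy frames (illustrative rows only; real datasets have many more columns)
transactions = pd.DataFrame({
    'transaction_id': ['t1', 't2', 't3'],
    'project_number': [2119.0, 2177.0, None],  # None mimics a missing project
})
projects = pd.DataFrame({
    'project_number': [2119.0, 2177.0],
    'area_name_en': ['Al Merkadh', 'Wadi Al Safa 5'],
})

# Left join keeps every transaction; indicator=True adds a _merge column
# showing whether a matching project was found
merged = transactions.merge(
    projects,
    on='project_number',
    how='left',
    validate='many_to_one',  # raises if project_number repeats in projects
    indicator=True,
)

print(merged['_merge'].value_counts())
```

Transactions with a missing `project_number` come back as `left_only` here. Note that pandas matches null keys to each other, so rows with a null `project_number` in the projects table would need to be dropped before a merge like this.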
