# Rent Contracts Dataset Cleaning and Preparation

## Objective
In this notebook, we aim to clean, prepare, and explore the rent contracts dataset. Our goal is to align this dataset with the transactions data, facilitating rental yield analysis and further enhancing our real estate market insights.



## Data Collection and Cleaning for Rent Contracts

In this notebook, we’ll focus on the **Rent Contracts** dataset, loading and inspecting it to ensure it’s clean and well-structured for analysis. Our initial steps involve exploring the dataset’s structure, identifying key variables, and addressing any missing or inconsistent data. This foundational work on the rent contracts data will align it with the **Transactions** dataset, allowing us to seamlessly integrate both datasets in future steps. With a well-prepared dataset, we’ll be set up to derive insights and conduct rental yield analysis to support our project objectives.

In [234]:
# Importing libraries
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

In [235]:
# Loading dataset
rent_contracts = pd.read_csv("../data/raw/rent_contracts.csv")

In [236]:
# Setting pandas and numpy options to display all rows and columns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
np.set_printoptions(threshold=np.inf)

In [237]:
# Displaying random observations from rent contracts dataset
print(rent_contracts.sample(5))

           contract_id  contract_reg_type_id contract_reg_type_ar  \
6388037   CNT320155815                     1                 جديد   
3522354  CNT1798321308                     1                 جديد   
1911092   CNT104335509                     2                تجديد   
3245617  CNT1568751533                     2                تجديد   
7548975   CNT745772357                     1                 جديد   

        contract_reg_type_en contract_start_date contract_end_date  \
6388037                  New          10-05-2015        09-05-2016   
3522354                  New          16-12-2021        31-01-2023   
1911092                Renew          15-01-2013        14-01-2014   
3245617                Renew          15-09-2021        14-09-2022   
7548975                  New          20-01-2018        19-01-2019   

         contract_amount  annual_amount  no_of_prop  line_number  \
6388037           789600         789600          47           42   
3522354            70000    

In [238]:
# Displaying information about rent contracts dataset
print(rent_contracts.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8171211 entries, 0 to 8171210
Data columns (total 40 columns):
 #   Column                      Dtype  
---  ------                      -----  
 0   contract_id                 object 
 1   contract_reg_type_id        int64  
 2   contract_reg_type_ar        object 
 3   contract_reg_type_en        object 
 4   contract_start_date         object 
 5   contract_end_date           object 
 6   contract_amount             int64  
 7   annual_amount               int64  
 8   no_of_prop                  int64  
 9   line_number                 int64  
 10  is_free_hold                float64
 11  ejari_bus_property_type_id  float64
 12  ejari_bus_property_type_ar  object 
 13  ejari_bus_property_type_en  object 
 14  ejari_property_type_id      float64
 15  ejari_property_type_en      object 
 16  ejari_property_type_ar      object 
 17  ejari_property_sub_type_id  float64
 18  ejari_property_sub_type_en  object 
 19  ejari_property_sub_ty

In [239]:
# Displaying missing values in rent contracts dataset
rent_contracts.isnull().sum() / rent_contracts.shape[0] * 100

contract_id                    0.000000
contract_reg_type_id           0.000000
contract_reg_type_ar           0.000000
contract_reg_type_en           0.000000
contract_start_date            0.000000
contract_end_date              0.000000
contract_amount                0.000000
annual_amount                  0.000000
no_of_prop                     0.000000
line_number                    0.000000
is_free_hold                   0.036115
ejari_bus_property_type_id     0.036115
ejari_bus_property_type_ar     0.036115
ejari_bus_property_type_en     0.036115
ejari_property_type_id         0.708218
ejari_property_type_en         0.722378
ejari_property_type_ar         0.722378
ejari_property_sub_type_id     0.736672
ejari_property_sub_type_en     0.803235
ejari_property_sub_type_ar     0.803235
property_usage_en              0.175103
property_usage_ar              0.175103
project_number                85.635250
project_name_ar               85.635250
project_name_en               85.635250


**Key Observations**

1. **Contract and Tenant Information**:

    - **Essential Contract Data**: Each record has `contract_id`, `contract_reg_type_en` (**New** or **Renew**), and contract duration (`contract_start_date` and `contract_end_date`), which provide insights into the contract type and period. This information is crucial for analyzing trends over time, particularly in contract renewals versus new agreements.

    - **Tenant Information**: The `tenant_type_en` column specifies whether the tenant is a **Person** or **Authority**. However, it has a **9.6%** missing rate, indicating some records lack tenant type, which might impact tenant-based segmentation.

2. **Property and Usage Details**:

    - **Property Classification**:

        - **Broad Categories**: `ejari_bus_property_type_en` and `ejari_property_type_en` offer classifications like **Unit**, **Land**, **Office**, and **Shop**. However, the primary property usage (`property_usage_en`) includes **Residential**, **Commercial**, and **Industrial / Commercial** categories.

        - **Detailed Subtypes**: `ejari_property_sub_type_en` specifies unit configurations (e.g., **1-bedroom**, **Shop**) and is missing in only about **0.8%** of records, so subtype-level analysis remains feasible.

        - The `is_free_hold` column, indicating whether the property is freehold, is missing in about **0.04%** of records.

        - **Single Property Contracts**: `no_of_prop` indicates the number of properties per contract. A value of 1 suggests single-property contracts, which may simplify yield analysis by focusing on individual units.


3. **Location Information**:

    - **Area and Project Data**:

        - `area_name_en` and `area_id` provide general location information with minimal missing values.

        - The `project_number`, `project_name_en`, and `master_project_en` columns give insights into specific projects, but they have high missing rates (**85.6%** for `project_number` and `project_name_en`, **67.8%** for `master_project_en`), indicating limited project-level information in many records.

    - **Proximity to Key Amenities**:

        - Fields such as `nearest_landmark_en`, `nearest_metro_en`, and `nearest_mall_en` describe the proximity of properties to landmarks, metro stations, and malls. Missing values range from 6.7% to 11.3%, which may impact completeness in location-based analysis.


4. **Financial Details**:

    - **Rent Amounts and Area**:

        - `contract_amount` and `annual_amount` fields capture the rental amounts, and `actual_area` provides the area (m²) of the unit. However, `actual_area` is missing in **11.1%** of records, which may affect price-per-area calculations for rental yield analysis.
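
The missing-value rates quoted in these observations are easier to audit when they are tabulated and sorted. The following is a minimal sketch on an invented frame (`toy`); the helper name `missing_report` is hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the missing count and percentage per column, worst first."""
    report = pd.DataFrame({
        "missing": df.isna().sum(),
        "missing_pct": df.isna().mean() * 100,
    })
    return report.sort_values("missing_pct", ascending=False)


# Toy data standing in for rent_contracts (all values are invented)
toy = pd.DataFrame({
    "contract_id": ["A", "B", "C", "D"],
    "actual_area": [70.0, np.nan, 113.0, np.nan],
    "project_name_en": [np.nan, np.nan, np.nan, "X"],
})

report = missing_report(toy)
print(report)
```

Reading percentages off a sorted table like this reduces the risk of misplacing a decimal point when transcribing rates into prose.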

### Data Preparation and Initial Filtering of Rent Contracts Dataset

To ensure that our **Rent Contracts** dataset is relevant, manageable, and aligned with our project’s objectives, we begin by preparing and filtering the data in a few key ways:

1. **Convert Date Columns**: The `contract_start_date` and `contract_end_date` columns are first converted into datetime format. This standardization allows us to perform accurate date-based operations, enabling efficient filtering and future analysis based on timeframes.

2. **Focus on Recent Data (Last 3 Years)**: Given that recent trends are most relevant to our analysis, we filter the dataset to include only contracts from the past three years. This step substantially reduces the dataset size, helping us focus on the most pertinent and up-to-date information.

3. **Remove Arabic Columns**: To simplify the dataset, we remove all columns with Arabic text (`_ar` suffix). This leaves only the English attributes, streamlining the data and improving readability for subsequent processing steps.

4. **Filter for Residential Properties with Single Units**: 

   - We narrow down the dataset further to focus solely on **Residential** property contracts, discarding any non-residential entries.

   - Additionally, we filter for contracts involving a single property unit (`no_of_prop = 1`). This ensures that our data reflects individual rental units, aligning with our objectives for rental yield calculations and simplifying the modeling process.
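
The four steps can be sketched as one function on a small invented frame. This is an illustration under assumptions, not the notebook's implementation: the function name `prepare_rent_contracts` and the `first_year` parameter are hypothetical, and an explicit `format="%d-%m-%Y"` is used here where the notebook uses `dayfirst=True`.

```python
import pandas as pd


def prepare_rent_contracts(df: pd.DataFrame, first_year: int) -> pd.DataFrame:
    """Apply the four preparation steps to a copy of df."""
    out = df.copy()
    # 1. Parse day-first date strings; unparseable values become NaT
    for col in ["contract_start_date", "contract_end_date"]:
        out[col] = pd.to_datetime(out[col], format="%d-%m-%Y", errors="coerce")
    # 2. Keep contracts that start in or after first_year
    out = out[out["contract_start_date"].dt.year >= first_year]
    # 3. Drop Arabic-language columns (suffix "_ar")
    out = out.drop(columns=[c for c in out.columns if c.endswith("_ar")])
    # 4. Keep residential contracts covering a single property
    mask = (out["property_usage_en"] == "Residential") & (out["no_of_prop"] == 1)
    return out[mask].reset_index(drop=True)


# Toy rows (invented) that each exercise one of the filters
toy = pd.DataFrame({
    "contract_id": ["A", "B", "C", "D"],
    "contract_start_date": ["10-05-2015", "16-12-2021", "15-09-2022", "01-10-2024"],
    "contract_end_date": ["09-05-2016", "31-01-2023", "14-09-2023", "30-09-2025"],
    "no_of_prop": [1, 1, 47, 1],
    "property_usage_en": ["Residential", "Commercial", "Residential", "Residential"],
    "property_usage_ar": ["سكني", "تجاري", "سكني", "سكني"],
})

prepared = prepare_rent_contracts(toy, first_year=2021)
print(prepared)
```

Passing the cutoff year as an argument, rather than reading the system clock inside the function, keeps the result reproducible when the notebook is rerun later.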

In [240]:
# Displaying random observations from the date columns
print(rent_contracts['contract_start_date'].sample(5))
print(rent_contracts['contract_end_date'].sample(5))

6928902    25-09-2016
2898874    01-02-2021
7969782    01-12-2018
1343200    12-06-2016
6977470    15-12-2016
Name: contract_start_date, dtype: object
2945595    31-12-2021
4753655    19-07-2024
2702851    31-10-2021
4157722    16-09-2023
5373216    28-02-2025
Name: contract_end_date, dtype: object


In [241]:
# Converting date columns to datetime format
rent_contracts['contract_start_date_converted'] = pd.to_datetime(rent_contracts['contract_start_date'], dayfirst=True, errors='coerce')
rent_contracts['contract_end_date_converted'] = pd.to_datetime(rent_contracts['contract_end_date'], dayfirst=True, errors='coerce')

# Displaying random observations from the date columns
print(rent_contracts['contract_start_date_converted'].sample(5))
print(rent_contracts['contract_end_date_converted'].sample(5))

6381049   2015-04-01
551292    2022-05-29
806393    2023-09-15
4150182   2014-03-22
2812753   2021-01-10
Name: contract_start_date_converted, dtype: datetime64[ns]
4490449   2023-09-24
2828607   2021-12-31
90830     2020-10-08
2740043   2021-12-14
1319276   2017-03-24
Name: contract_end_date_converted, dtype: datetime64[ns]


In [242]:
# Comparing missing-value percentages of the original and converted date columns
rent_contracts[[col for col in rent_contracts.columns if 'date' in col]].isnull().sum() / rent_contracts.shape[0] * 100

contract_start_date              0.000000
contract_end_date                0.000000
contract_start_date_converted    0.000000
contract_end_date_converted      0.000024
dtype: float64

In [243]:
# Inspecting the missing values in the converted date columns
rent_contracts[rent_contracts['contract_end_date_converted'].isna()].shape 

(2, 42)

In [244]:
# Displaying the missing values in the converted date columns
display(rent_contracts[rent_contracts['contract_end_date_converted'].isna()])

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_ar,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_ar,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_type_ar,ejari_property_sub_type_id,ejari_property_sub_type_en,ejari_property_sub_type_ar,property_usage_en,property_usage_ar,project_number,project_name_ar,project_name_en,master_project_ar,master_project_en,area_id,area_name_ar,area_name_en,actual_area,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,tenant_type_id,tenant_type_ar,tenant_type_en,contract_start_date_converted,contract_end_date_converted
7142575,CNT58327,1,جديد,New,29-05-2012,28-05-5013,180060000,60000,1,1,0.0,2.0,وحدة,Unit,842.0,Flat,شقه,2.0,2 bed rooms+hall,غرفتين و صالة,Residential,سكني,,,,,,355.0,النهده الاولى,Al Nahda First,137.0,مطار دبي الدولي,Dubai International Airport,محطة مترو الاستاد,STADIUM Metro Station,سيتي سنتر مردف,City Centre Mirdif,1.0,شخص,Person,2012-05-29,NaT
7144099,CNT58873,1,جديد,New,01-04-2012,31-03-3013,18018000,18000,1,1,0.0,2.0,وحدة,Unit,842.0,Flat,شقه,11.0,Studio,أستوديو,Residential,سكني,,,,,,362.0,نايف,Naif,24.0,مطار دبي الدولي,Dubai International Airport,محطة مترو ميدان بني ياس,Baniyas Square Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person,2012-04-01,NaT


Both rows have end dates with implausible years (5013 and 3013), which could not be converted and became `NaT`. Because both contracts started in 2012, well outside our three-year window, we can convert the original columns in place with `errors='coerce'` instead of keeping separate converted columns. Any invalid dates become `NaT`, which will not affect the filtering step.
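
Before overwriting the original columns, it can help to list the raw strings that `errors='coerce'` turns into `NaT`, since that information is lost after the in-place conversion. A minimal sketch on invented values, using an explicit `format` as an assumption about the day-first layout:

```python
import pandas as pd

# Invented sample: one valid date, one impossible date, one non-date string
raw = pd.Series(["29-05-2012", "31-02-2021", "unknown"])
parsed = pd.to_datetime(raw, format="%d-%m-%Y", errors="coerce")

# Raw values that were present but could not be parsed
failed = raw[parsed.isna() & raw.notna()]
print(failed.tolist())
```

The same mask applied to `contract_end_date` would surface rows like the two shown above without creating extra columns.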

In [245]:
# Deleting the created date columns
rent_contracts.drop(columns=['contract_start_date_converted', 'contract_end_date_converted'], inplace=True)

# Converting the original date columns to datetime format
rent_contracts['contract_start_date'] = pd.to_datetime(rent_contracts['contract_start_date'], dayfirst=True, errors='coerce')
rent_contracts['contract_end_date'] = pd.to_datetime(rent_contracts['contract_end_date'], dayfirst=True, errors='coerce')

# Inspecting the data type for the date columns
rent_contracts[['contract_start_date', 'contract_end_date']].dtypes

contract_start_date    datetime64[ns]
contract_end_date      datetime64[ns]
dtype: object

In [246]:
# Filtering the dataset to only include observations from the last 3 years
from datetime import datetime 
rent_contracts_3y = rent_contracts[rent_contracts['contract_start_date'].dt.year >= datetime.now().year - 3]

# Comparing shapes before and after filtering
print("Rent Contracts Shape Before Filtering:", rent_contracts.shape)
print("Rent Contracts Shape After Filtering:", rent_contracts_3y.shape)

Rent Contracts Shape Before Filtering: (8171211, 40)
Rent Contracts Shape After Filtering: (3547373, 40)


In [247]:
# Removing Arabic columns from the dataset
arabic_cols = [col for col in rent_contracts_3y.columns if col.endswith("_ar")]
rent_contracts_3y = rent_contracts_3y.drop(columns=arabic_cols)

In [248]:
# Inspecting the shape of the dataset after removing Arabic columns
print(rent_contracts_3y.shape)

(3547373, 28)


In [249]:
# Displaying random observations from the dataset
display(rent_contracts_3y.sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,property_usage_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
4736299,CNT2083758136,2,Renew,2023-05-05,2024-05-04,200000,200000,1,1,1.0,4.0,Villa,841.0,Villa,3.0,3 bed rooms+hall,Residential,2039.0,Marbella Village,Dubai Sports City,435.0,Al Hebiah Fourth,251.0,Sports City Swimming Academy,Damac Properties,Marina Mall,1.0,Person
451560,CRT1774226126,1,New,2021-11-01,2022-10-31,1000000,1000000,50,40,0.0,2.0,Unit,24.0,Hotel,421.0,Shop,Commercial,,,,362.0,Naif,37.0,Dubai International Airport,Baniyas Square Metro Station,Dubai Mall,1.0,Person
3126438,CNT1496770058,1,New,2021-07-05,2022-08-04,42000,42000,1,1,0.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,Residential,,,,355.0,Al Nahda First,103.0,Dubai International Airport,STADIUM Metro Station,City Centre Mirdif,1.0,Person
371605,CRT1567681926,2,Renew,2021-09-01,2022-08-31,140000,140000,1,1,1.0,4.0,Villa,841.0,Villa,3.0,3 bed rooms+hall,Residential,1034.0,Emirates Living - Springs 4,Springs - 5,352.0,Al Thanayah Fourth,257.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
5995995,CNT2123861725,2,Renew,2024-10-01,2025-09-30,864000,864000,180,72,1.0,2.0,Unit,4.0,Labor Camps,12.0,Room in labor Camp,Residential,,,Dubai Investment Park Second,459.0,Dubai Investment Park Second,15.0,Expo 2020 Site,,Ibn-e-Battuta Mall,2.0,Authority


In [250]:
# Inspecting missing values percentages of the dataset
rent_contracts_3y.isnull().sum() / rent_contracts_3y.shape[0] * 100

contract_id                    0.000000
contract_reg_type_id           0.000000
contract_reg_type_en           0.000000
contract_start_date            0.000000
contract_end_date              0.000000
contract_amount                0.000000
annual_amount                  0.000000
no_of_prop                     0.000000
line_number                    0.000000
is_free_hold                   0.000226
ejari_bus_property_type_id     0.000226
ejari_bus_property_type_en     0.000226
ejari_property_type_id         0.493097
ejari_property_type_en         0.508320
ejari_property_sub_type_id     0.510124
ejari_property_sub_type_en     0.576511
property_usage_en              0.090405
project_number                82.176275
project_name_en               82.176275
master_project_en             64.823237
area_id                        0.000085
area_name_en                   0.000085
actual_area                   11.296359
nearest_landmark_en            7.228983
nearest_metro_en              12.564594


In [251]:
# Inspecting values of Property Usage column
rent_contracts_3y['property_usage_en'].value_counts()

property_usage_en
Residential                              2492719
Commercial                               1014027
Industrial                                 20305
Industrial / Commercial                     7223
Multi Usage                                 3652
Industrial / Commercial / Residential       1861
Storage                                     1675
Tourist origin                              1463
Educational facility                         543
Health Facility                              472
Residential / Commercial                     169
Agriculture                                   57
Name: count, dtype: int64

In [252]:
# Filtering the dataset to only include residential properties
rent_contracts_residential_3y = rent_contracts_3y[rent_contracts_3y['property_usage_en'] == 'Residential']

# Comparing shapes before and after filtering
print("Rent Contracts Before Filtering:", rent_contracts_3y.shape)
print("Rent Contracts After Filtering:", rent_contracts_residential_3y.shape)

Rent Contracts Before Filtering: (3547373, 28)
Rent Contracts After Filtering: (2492719, 28)


In [253]:
# Removing "property_usage_en" since the property usage column only holds a single value
rent_contracts_residential_3y = rent_contracts_residential_3y.drop(columns='property_usage_en')

# Confirming removal of column
rent_contracts_residential_3y.shape 

(2492719, 27)

In [254]:
# Displaying random observations from the dataset
display(rent_contracts_residential_3y.sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
4123982,CNT1997189483,2,Renew,2022-08-01,2023-09-30,2085600,1787657,158,56,0.0,5.0,Virtual Unit,4.0,Labor Camps,12.0,Room in labor Camp,,,,360.0,Muhaisanah Second,,Dubai International Airport,Etisalat Metro Station,City Centre Mirdif,2.0,Authority
3716673,CNT1864523987,1,New,2022-03-01,2023-02-28,21000000,21000000,168,115,0.0,2.0,Unit,842.0,Flat,3.0,3 bed rooms+hall,,,,341.0,Trade Center First,177.0,Burj Khalifa,Trade Centre Metro Station,Dubai Mall,2.0,Authority
5834588,CNT2121538409,2,Renew,2024-09-15,2025-09-14,40000,40000,1,1,0.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,,,,234.0,Hor Al Anz East,113.0,Dubai International Airport,Al Qiyadah Metro Station,City Centre Mirdif,1.0,Person
3015406,CNT1436205806,1,New,2021-05-01,2022-05-31,63000,63000,1,1,0.0,2.0,Unit,842.0,Flat,2.0,2 bed rooms+hall,,,,244.0,Al Murqabat,116.0,Dubai International Airport,Al Rigga Metro Station,Dubai Mall,1.0,Person
809151,CRT2095976466,2,Renew,2023-10-13,2024-10-12,40000,40000,1,1,1.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,1630.0,BINGHATTI APARTMENTS,Silicon Oasis,484.0,Nadd Hessa,70.0,IMG World Adventures,,,1.0,Person


In [255]:
# Inspecting missing values percentages of the dataset
rent_contracts_residential_3y.isnull().sum() / rent_contracts_residential_3y.shape[0] * 100

contract_id                    0.000000
contract_reg_type_id           0.000000
contract_reg_type_en           0.000000
contract_start_date            0.000000
contract_end_date              0.000000
contract_amount                0.000000
annual_amount                  0.000000
no_of_prop                     0.000000
line_number                    0.000000
is_free_hold                   0.000000
ejari_bus_property_type_id     0.000000
ejari_bus_property_type_en     0.000000
ejari_property_type_id         0.003169
ejari_property_type_en         0.009989
ejari_property_sub_type_id     0.007141
ejari_property_sub_type_en     0.034982
project_number                78.527062
project_name_en               78.527062
master_project_en             56.528794
area_id                        0.000000
area_name_en                   0.000000
actual_area                    5.859585
nearest_landmark_en            7.861696
nearest_metro_en              14.791278
nearest_mall_en               15.202075


The current state of the rent contracts dataset looks promising for further analysis, with the following key observations:

1. **Project and Area Information**:

   - **Project Details**: The `project_number`, `project_name_en`, and `master_project_en` columns still show a high level of missing values (approximately 78% for the first two and 57% for `master_project_en`). We might consider whether the remaining project-specific records provide enough value or decide to use more generalized area data for analysis.

   - **Area Details**: `area_id` and `area_name_en` are fully populated, ensuring geographical insights are intact for the dataset.

2. **Property and Usage Types**:

   - **Residential Properties**: By filtering for residential properties (`property_usage_en`), we have a focused dataset aligned with our objective to calculate rental yields and forecast residential market trends.
   - **Property Types**: Columns such as `ejari_property_type_id`, `ejari_property_type_en`, and `ejari_property_sub_type_en` show only negligible gaps (about 0.003% to 0.035%), which should be manageable for further analysis. The dataset retains strong coverage of key property details, like whether the unit is a flat, villa, or specific configuration (e.g., studio, 1-bedroom).

3. **Tenant and Contract Type Information**:
   - **Tenant Type**: The `tenant_type_id` and `tenant_type_en` columns have around 7% missing data, which may require imputation or filtering if necessary for tenant analysis.
   - **Contract Type**: `contract_reg_type_en` and `contract_amount` are complete, providing a clear view of contract types (e.g., New or Renew) and rental amounts for each unit.

4. **Proximity to Amenities**:
   - **Nearest Amenities**: Some columns indicating the nearest landmarks, metro stations, and malls contain moderate missing values (around 8% to 15%). These can potentially be filled or marked as unavailable, depending on the significance of proximity data in our analysis.

5. **Area and Size**:
   - **Actual Area**: With only about 5.9% missing values, `actual_area` is well-populated, giving us a solid foundation for size-based calculations, such as rent per square metre, which is essential for accurate yield analysis.

This filtered and cleaned dataset aligns well with our objectives and is structured to support deeper analysis of residential rental yields, tenant behavior, and other key metrics relevant to the real estate market. Next, we can proceed with imputation, if needed, and begin to engineer features for our model.
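
As an illustration of the size-based calculation mentioned above, the sketch below derives annual rent per square metre on invented rows. The column name `rent_per_sqm` is hypothetical, and the sketch assumes `actual_area` is recorded in m² as described earlier in this notebook.

```python
import numpy as np
import pandas as pd

# Toy rows (invented): two usable areas, one missing area, one zero area
toy = pd.DataFrame({
    "annual_amount": [40000, 63000, 28000, 50000],
    "actual_area": [113.0, 116.0, np.nan, 0.0],
})

# Treat non-positive areas as missing so the division cannot produce inf
area = toy["actual_area"].where(toy["actual_area"] > 0)
toy["rent_per_sqm"] = toy["annual_amount"] / area
print(toy)
```

Rows with a missing or zero area end up with `NaN` in the derived column, so they can be excluded or imputed explicitly rather than silently distorting averages.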

In [256]:
# Inspecting "contract_id" uniqueness
unique_contracts = rent_contracts_residential_3y['contract_id'].nunique()
total_contracts = rent_contracts_residential_3y.shape[0]
print(f"Unique contract IDs: {unique_contracts} out of {total_contracts} total entries")
if unique_contracts == total_contracts:
    print("Each contract_id is unique.")
else:
    print(f"There are {total_contracts - unique_contracts} duplicate contract IDs.")

Unique contract IDs: 2079161 out of 2492719 total entries
There are 413558 duplicate contract IDs.


In [257]:
# Displaying the first 10 observations with duplicated contract_id values
rent_contracts_residential_3y[rent_contracts_residential_3y.duplicated(subset='contract_id', keep=False)].head(10)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
247628,CRT1352509366,1,New,2021-01-01,2021-12-31,28000,28000,1,1,1.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,333.0,ELITE II SPORTS RESIDENCE,Dubai Sports City,435.0,Al Hebiah Fourth,76.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
247629,CRT1352509366,1,New,2021-01-01,2021-12-31,28000,28000,1,2,1.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,333.0,ELITE II SPORTS RESIDENCE,Dubai Sports City,435.0,Al Hebiah Fourth,76.0,Sports City Swimming Academy,Dubai Internet City,Marina Mall,1.0,Person
247939,CRT1353000496,2,Renew,2021-01-05,2022-01-04,19000,19000,1,1,1.0,2.0,Unit,842.0,Flat,11.0,Studio,333.0,ELITE II SPORTS RESIDENCE,Dubai Sports City,435.0,Al Hebiah Fourth,37.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
247940,CRT1353000496,2,Renew,2021-01-05,2022-01-04,19000,19000,1,2,1.0,2.0,Unit,842.0,Flat,11.0,Studio,333.0,ELITE II SPORTS RESIDENCE,Dubai Sports City,435.0,Al Hebiah Fourth,37.0,Sports City Swimming Academy,Dubai Internet City,Marina Mall,1.0,Person
250293,CRT1356565156,2,Renew,2021-01-01,2021-12-31,561600,561600,26,1,1.0,2.0,Unit,4.0,Labor Camps,12.0,Room in labor Camp,,,JABAL ALI INDUSTRIAL DEVELOPMENT,395.0,Jabal Ali Industrial First,18.0,Expo 2020 Site,DANUBE Metro Station,Ibn-e-Battuta Mall,1.0,Person
250294,CRT1356565156,2,Renew,2021-01-01,2021-12-31,561600,561600,26,2,1.0,2.0,Unit,4.0,Labor Camps,12.0,Room in labor Camp,,,JABAL ALI INDUSTRIAL DEVELOPMENT,395.0,Jabal Ali Industrial First,18.0,Expo 2020 Site,DANUBE Metro Station,Ibn-e-Battuta Mall,1.0,Person
250295,CRT1356565156,2,Renew,2021-01-01,2021-12-31,561600,561600,26,3,1.0,2.0,Unit,4.0,Labor Camps,12.0,Room in labor Camp,,,JABAL ALI INDUSTRIAL DEVELOPMENT,395.0,Jabal Ali Industrial First,18.0,Expo 2020 Site,DANUBE Metro Station,Ibn-e-Battuta Mall,1.0,Person
250296,CRT1356565156,2,Renew,2021-01-01,2021-12-31,561600,561600,26,4,1.0,2.0,Unit,4.0,Labor Camps,12.0,Room in labor Camp,,,JABAL ALI INDUSTRIAL DEVELOPMENT,395.0,Jabal Ali Industrial First,18.0,Expo 2020 Site,DANUBE Metro Station,Ibn-e-Battuta Mall,1.0,Person
250297,CRT1356565156,2,Renew,2021-01-01,2021-12-31,561600,561600,26,5,1.0,2.0,Unit,4.0,Labor Camps,12.0,Room in labor Camp,,,JABAL ALI INDUSTRIAL DEVELOPMENT,395.0,Jabal Ali Industrial First,19.0,Expo 2020 Site,DANUBE Metro Station,Ibn-e-Battuta Mall,1.0,Person
250298,CRT1356565156,2,Renew,2021-01-01,2021-12-31,561600,561600,26,6,1.0,2.0,Unit,4.0,Labor Camps,12.0,Room in labor Camp,,,JABAL ALI INDUSTRIAL DEVELOPMENT,395.0,Jabal Ali Industrial First,17.0,Expo 2020 Site,DANUBE Metro Station,Ibn-e-Battuta Mall,1.0,Person


In [258]:
# Identifying duplicate `contract_id` rows
duplicate_contract_ids = rent_contracts_residential_3y[rent_contracts_residential_3y.duplicated(subset='contract_id', keep=False)]

# Checking if each duplicate `contract_id` has unique `line_number` values
line_numbers = duplicate_contract_ids.groupby('contract_id')['line_number']
unique_line_number_check = line_numbers.nunique()

# Confirming all duplicates have unique line_number values per contract_id,
# i.e. the number of distinct line numbers equals the number of rows per contract
all_unique_line_numbers = unique_line_number_check.eq(line_numbers.count())

# Displaying a summary
print(all_unique_line_numbers.value_counts())

line_number
True    37822
Name: count, dtype: int64


The presence of duplicates in `contract_id` with different `line_number` values indicates that each `line_number` within a `contract_id` represents a distinct property or unit associated with that single contract. Essentially, a contract ID can cover multiple properties, each listed as a separate “line” or entry under the same contract ID. Here’s what this tells us:

1. **Multi-Property Contracts**: Each duplicate `contract_id` with a unique `line_number` shows that a single contract might include multiple properties or units (like multiple apartments or rooms within a building rented under one agreement).

2. **Granularity**: This `line_number` column allows us to differentiate between individual properties within the same contract, making the data more granular. Each line provides a breakdown of the contract at the property level.
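
To make the line-level structure concrete, the sketch below counts lines per contract on invented rows and places the count next to `no_of_prop`. The column names `n_lines` and `is_multi_line` are hypothetical, and no particular relationship between the line count and `no_of_prop` is assumed for the real data.

```python
import pandas as pd

# Toy rows (invented): C1 and C3 span several lines, C2 has a single line
toy = pd.DataFrame({
    "contract_id": ["C1", "C1", "C2", "C3", "C3", "C3"],
    "line_number": [1, 2, 1, 1, 2, 3],
    "no_of_prop": [1, 1, 1, 26, 26, 26],
})

per_contract = toy.groupby("contract_id").agg(
    n_lines=("line_number", "size"),     # rows recorded for the contract
    no_of_prop=("no_of_prop", "first"),  # value repeated on each line
)
per_contract["is_multi_line"] = per_contract["n_lines"] > 1
print(per_contract)
```

A table like this helps decide whether later steps should work per line (one row per unit) or per contract (one row per agreement).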


Let's inspect the contract registration type columns.

In [259]:
# Displaying the value counts of the registration types
print("Unique Value Count of 'contract_reg_type_id':")
print(rent_contracts_residential_3y['contract_reg_type_id'].value_counts())

print("\nUnique Value Count of 'contract_reg_type_en':")
print(rent_contracts_residential_3y['contract_reg_type_en'].value_counts())


Unique Value Count of 'contract_reg_type_id':
contract_reg_type_id
2    1258228
1    1234491
Name: count, dtype: int64

Unique Value Count of 'contract_reg_type_en':
contract_reg_type_en
Renew    1258228
New      1234491
Name: count, dtype: int64


The counts for both `contract_reg_type_id` and `contract_reg_type_en` indicate two distinct types of contracts:

- **Renew (Type 2)**: 1,258,228 contracts

- **New (Type 1)**: 1,234,491 contracts

The matching counts in `contract_reg_type_id` and `contract_reg_type_en` indicate that each contract type is categorized consistently across both columns.
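
Matching totals alone do not show that every row pairs the same id with the same label. A cross-tabulation makes the row-level check explicit; the sketch below runs it on invented rows and is only an illustration of the idea.

```python
import pandas as pd

# Toy rows (invented) where id 1 always pairs with "New" and id 2 with "Renew"
toy = pd.DataFrame({
    "contract_reg_type_id": [1, 2, 2, 1, 2],
    "contract_reg_type_en": ["New", "Renew", "Renew", "New", "Renew"],
})

pairs = pd.crosstab(toy["contract_reg_type_id"], toy["contract_reg_type_en"])

# A one-to-one mapping has exactly one non-zero cell in every row and column
nonzero = pairs > 0
one_to_one = bool((nonzero.sum(axis=0) == 1).all() and (nonzero.sum(axis=1) == 1).all())
print(pairs)
print(one_to_one)
```

If any id appeared with more than one label, the corresponding row of the table would have two non-zero cells and `one_to_one` would be `False`.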

Let's inspect the `contract_start_date` and `contract_end_date` columns.

In [260]:
# Statistical summary of the contract_start_date & contract_end_date columns
print(rent_contracts_residential_3y[['contract_start_date', 'contract_end_date']].describe())

                 contract_start_date              contract_end_date
count                        2492719                        2492719
mean   2022-12-30 04:58:34.128950016  2024-01-10 13:19:02.813290496
min              2021-01-01 00:00:00            2021-01-31 00:00:00
25%              2022-01-20 00:00:00            2023-01-31 00:00:00
50%              2023-01-10 00:00:00            2024-01-15 00:00:00
75%              2023-12-10 00:00:00            2024-12-15 00:00:00
max              2204-10-04 00:00:00            2205-10-03 00:00:00


Upon examining the `contract_start_date` and `contract_end_date` fields, we discovered some extreme future dates extending as far as **2204** and **2205**. These outliers likely result from data entry errors, as they significantly deviate from realistic contract durations and our focus on recent years.



In [261]:
# Importing datetime
from datetime import datetime, timedelta

# Shape of data before filtering
print("Rent Contracts Residential Shape Before Filtering:", rent_contracts_residential_3y.shape)

# Get the current year
current_year = datetime.now().year

# Define the target years dynamically: the previous three years, the current year, and the next year
target_years = [current_year - 3, current_year - 2, current_year - 1, current_year, current_year + 1]

# Apply the filter based on contract start date falling within these years
rent_contracts_residential_3y = rent_contracts_residential_3y[
    rent_contracts_residential_3y['contract_start_date'].dt.year.isin(target_years)
]

# Shape of data after filtering
print("Rent Contracts Residential Shape After Filtering:", rent_contracts_residential_3y.shape)

Rent Contracts Residential Shape Before Filtering: (2492719, 27)
Rent Contracts Residential Shape After Filtering: (2492673, 27)


In [262]:
# Statistical summary of the contract_start_date & contract_end_date columns
print(rent_contracts_residential_3y[['contract_start_date', 'contract_end_date']].describe())

                 contract_start_date              contract_end_date
count                        2492673                        2492673
mean   2022-12-30 02:51:56.674508032  2024-01-10 10:58:30.568776448
min              2021-01-01 00:00:00            2021-01-31 00:00:00
25%              2022-01-20 00:00:00            2023-01-31 00:00:00
50%              2023-01-10 00:00:00            2024-01-15 00:00:00
75%              2023-12-10 00:00:00            2024-12-15 00:00:00
max              2025-12-31 00:00:00            2034-10-09 00:00:00


Now, we see:

- **Contract Start Date**: Ranges from **2021-01-01** to **2025-12-31**, covering the filter window from three years before the current year through one year after it.

- **Contract End Date**: Has a max value extending up to **2034-10-09**, allowing for long-term contracts without excluding multi-year agreements that began within the last three years.
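
Because the window is derived from `datetime.now()`, re-running the notebook in a later year would silently shift the filter. One possible alternative, sketched here on toy data, is to pass the reference year explicitly; `filter_by_start_year` and its parameters are hypothetical names, not part of the original notebook:

```python
import pandas as pd

def filter_by_start_year(df, reference_year, years_back=3, years_forward=1):
    """Keep rows whose contract_start_date year lies in the inclusive window."""
    first_year = reference_year - years_back
    last_year = reference_year + years_forward
    start_years = df["contract_start_date"].dt.year
    # Series.between is inclusive on both ends by default
    return df[start_years.between(first_year, last_year)]

# Toy data: two rows inside the 2021-2025 window, three outside it
toy = pd.DataFrame({
    "contract_start_date": pd.to_datetime(
        ["2020-12-31", "2021-01-01", "2025-12-31", "2026-01-01", "2204-10-04"]
    )
})

filtered = filter_by_start_year(toy, reference_year=2024)
print(filtered)
```

Pinning `reference_year` keeps the output reproducible regardless of when the notebook is executed.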


In [263]:
# Displaying random observations from the dataset
rent_contracts_residential_3y.sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
5451524,CNT2110738708,2,Renew,2024-05-30,2025-05-29,66150,66150,1,1,1.0,2.0,Unit,842.0,Flat,2.0,2 bed rooms+hall,,,,232.0,Mirdif,146.0,Dubai International Airport,Rashidiya Metro Station,City Centre Mirdif,1.0,Person
4706238,CNT2082544926,2,Renew,2023-04-15,2024-04-14,39600,39600,1,1,1.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,,,Majan,465.0,Wadi Al Safa 3,88.0,IMG World Adventures,,,1.0,Person
2981292,CNT1423733790,1,New,2021-04-05,2022-04-04,40000,40000,1,1,1.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,805.0,UNIESTATE MILLENNIUM TOWER,Silicon Oasis,484.0,Nadd Hessa,102.0,IMG World Adventures,,,1.0,Person
3793164,CNT1889660265,1,New,2022-04-04,2023-05-03,48000,48000,1,1,1.0,2.0,Unit,842.0,Flat,2.0,2 bed rooms+hall,,,,372.0,Al Goze Industrial Second,112.0,Downtown Dubai,Noor Bank Metro Station,Mall of the Emirates,1.0,Person
5521641,CNT2112576451,2,Renew,2024-03-24,2025-03-23,71400,71400,1,1,0.0,2.0,Unit,842.0,Flat,3.0,3 bed rooms+hall,,,,234.0,Hor Al Anz East,164.0,Dubai International Airport,Al Qiyadah Metro Station,City Centre Mirdif,2.0,Authority


Let's inspect the `no_of_prop` column

In [264]:
# Unique values count of "no_of_prop" column
rent_contracts_residential_3y['no_of_prop'].value_counts()

no_of_prop
1      2043919
2        19833
3        15283
5        14599
10       13926
4        13856
6        11150
8         9207
20        9165
7         9115
15        8676
12        7032
30        6927
9         6692
40        6516
90        6390
25        6270
13        5771
14        5162
32        5025
11        4962
17        4454
22        4442
18        4393
50        4343
16        4282
23        4074
21        4065
60        3899
28        3824
24        3816
46        3355
100       3300
80        3281
36        3154
27        2913
35        2888
45        2880
44        2813
19        2809
33        2801
38        2774
70        2730
66        2706
29        2566
48        2550
26        2543
52        2496
42        2394
41        2377
88        2376
53        2226
72        2160
34        2151
64        2108
62        2107
31        2099
37        2035
65        2015
55        1923
120       1919
58        1914
47        1879
264       1848
56        1800
160       1760

- To ensure our analysis remains focused on residential rental trends, we have decided to filter the data to include only contracts where:

	- `no_of_prop` is equal to 1, indicating that each contract involves a single rented unit.

	- `line_number` is also equal to 1, which ensures we’re capturing only the primary unit details for each contract.

- This approach allows us to:

	- **Exclude multi-unit and labor camp rentals**: By filtering out contracts with multiple properties, we avoid the influence of large-scale rental agreements, such as labor camps or commercial leases, which could skew our insights.

	- **Focus on individual residential units**: Each remaining contract will represent a unique residential unit, helping us to directly observe rental patterns and pricing trends specific to individual residential properties.

	- **Improve data clarity for rental trend analysis**: With this filter applied, our analysis will better reflect the true residential rental market trends, providing meaningful insights aligned with our project’s objectives.
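
The two conditions can also be expressed as one boolean mask, which makes it easy to report how much data the filter keeps. This is a minimal sketch on a toy frame (the rows are invented for illustration):

```python
import pandas as pd

# Toy contracts: B has two lines, C is a 40-property contract, D covers two properties
toy = pd.DataFrame({
    "contract_id": ["A", "B", "B", "C", "D"],
    "no_of_prop": [1, 1, 1, 40, 2],
    "line_number": [1, 1, 2, 1, 1],
})

# Single-property contracts, primary line only
single_unit_mask = (toy["no_of_prop"] == 1) & (toy["line_number"] == 1)
single_units = toy.loc[single_unit_mask].copy()

# Report the retained share so the cost of the filter is explicit
kept_share = single_unit_mask.mean()
print(f"Kept {single_unit_mask.sum()} of {len(toy)} rows ({kept_share:.0%})")
```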

In [265]:
# Filtering rent contracts to include only a single property per contract
rent_contracts_residential_single_prop_3y = rent_contracts_residential_3y[rent_contracts_residential_3y['no_of_prop'] == 1]

# Comparing shapes before and after filtering
print("Rent Contracts Residential Before Filtering:", rent_contracts_residential_3y.shape)
print("Rent Contracts Residential After Filtering:", rent_contracts_residential_single_prop_3y.shape)

Rent Contracts Residential Before Filtering: (2492673, 27)
Rent Contracts Residential After Filtering: (2043919, 27)


Let's inspect the `line_number` column

In [266]:
# Unique value count of line_number column
rent_contracts_residential_single_prop_3y['line_number'].value_counts()

line_number
1    2042556
2       1363
Name: count, dtype: int64

Despite this filter, some contracts still contain multiple entries, indicated by `line_number` values greater than 1. Given that `no_of_prop` equals 1, these secondary lines likely represent administrative adjustments rather than separate units. To further refine our dataset, we’ll now filter for `line_number == 1`, ensuring that only primary contract lines are retained.
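
One thing worth verifying before dropping secondary lines is whether every such contract also has a primary line; if not, filtering on `line_number == 1` removes that contract entirely. A small sketch of this check on toy data (the rows and the name `orphans` are illustrative assumptions):

```python
import pandas as pd

# Toy data: contract B has lines 1 and 2, contract C has only a line 2
toy = pd.DataFrame({
    "contract_id": ["A", "B", "B", "C"],
    "line_number": [1, 1, 2, 2],
    "annual_amount": [50000, 60000, 60000, 45000],
})

# Contracts that appear with a secondary line
secondary_ids = toy.loc[toy["line_number"] > 1, "contract_id"].unique()

# Contracts that have a primary line
primary_ids = set(toy.loc[toy["line_number"] == 1, "contract_id"])

# Secondary-line contracts with no primary line would vanish after the filter
orphans = [cid for cid in secondary_ids if cid not in primary_ids]
print("Contracts with a secondary line but no primary line:", orphans)
```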

In [267]:
# Filtering rent contracts to include only the first line number per contract
rent_contracts_residential_single_prop_first_line_3y = rent_contracts_residential_single_prop_3y[
    rent_contracts_residential_single_prop_3y['line_number'] == 1
]

# Comparing shapes before and after filtering
print("Rent Contracts Residential Single Prop Before Filtering:", rent_contracts_residential_single_prop_3y.shape)
print("Rent Contracts Residential Single Prop After Filtering:", rent_contracts_residential_single_prop_first_line_3y.shape)

Rent Contracts Residential Single Prop Before Filtering: (2043919, 27)
Rent Contracts Residential Single Prop After Filtering: (2042556, 27)


In [268]:
# Removing no_of_prop & line_number columns from the dataset
rent_contracts_residential_single_prop_first_line_3y = rent_contracts_residential_single_prop_first_line_3y.drop(columns=['line_number', 'no_of_prop'])

# Inspecting shape of data
print(rent_contracts_residential_single_prop_first_line_3y.shape)

(2042556, 25)


In [269]:
# Displaying random observations from the dataset
print(rent_contracts_residential_single_prop_first_line_3y.sample(5))

           contract_id  contract_reg_type_id contract_reg_type_en  \
3933407  CNT1936583169                     2                Renew   
3223016  CNT1544640201                     1                  New   
5887533  CNT2121884522                     2                Renew   
5573421  CNT2113759961                     2                Renew   
939634   CRT2116222116                     2                Renew   

        contract_start_date contract_end_date  contract_amount  annual_amount  \
3933407          2022-07-01        2023-06-30            80000          80000   
3223016          2021-08-25        2022-08-24            16000          16000   
5887533          2024-08-26        2025-08-25            55500          55500   
5573421          2024-05-10        2025-05-09            47000          47000   
939634           2024-07-01        2025-06-30           170500         170500   

         is_free_hold  ejari_bus_property_type_id ejari_bus_property_type_en  \
3933407           

Let's inspect the `is_free_hold` column

In [270]:
# Unique value count of "is_free_hold" column
rent_contracts_residential_single_prop_first_line_3y['is_free_hold'].value_counts()

is_free_hold
1.0    1102834
0.0     939722
Name: count, dtype: int64

In [271]:
# Group by is_free_hold and calculate average contract and annual amounts
print(rent_contracts_residential_single_prop_first_line_3y.groupby('is_free_hold').agg(
    count_contracts=('contract_amount', 'count'),
    avg_contract_amount=('contract_amount', 'mean'),
    avg_annual_amount=('annual_amount', 'mean')
    
))

              count_contracts  avg_contract_amount  avg_annual_amount
is_free_hold                                                         
0.0                    939722         64046.785953       63620.473497
1.0                   1102834         83053.780991       82337.823907


Our analysis of UAE rental contracts reveals a notable distinction between freehold and non-freehold properties, both in count and rental pricing. In our dataset, there are 1,102,834 contracts for freehold properties and 939,722 for non-freehold properties. The average rental amounts are noticeably higher for freehold properties, with contract and annual amounts averaging approximately 83,100 AED and 82,300 AED, respectively, compared to around 64,000 AED and 63,600 AED for non-freehold rentals.

This difference aligns with the benefits of freehold properties in the UAE, where landlords retain full ownership and more flexibility in maintenance and modifications, potentially increasing desirability and rental value. This insight underscores a premium trend in the UAE rental market, where freehold properties offer both tenants and landlords enhanced long-term value and appeal.
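
Means can be pulled upward by a handful of very expensive contracts, so it may be worth comparing medians as well. The following sketch computes both and a relative premium on invented numbers; `median_premium` is a name introduced here for illustration:

```python
import pandas as pd

# Toy data with one extreme freehold contract to show how the mean gets distorted
toy = pd.DataFrame({
    "is_free_hold": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "annual_amount": [40000, 60000, 80000, 50000, 75000, 1000000],
})

# Mean and median annual rent per freehold flag
summary = toy.groupby("is_free_hold")["annual_amount"].agg(["mean", "median"])

# Relative premium of freehold over non-freehold, based on medians
median_premium = summary.loc[1.0, "median"] / summary.loc[0.0, "median"] - 1

print(summary)
print(f"Median freehold premium: {median_premium:.1%}")
```

If the median-based premium on the real data differs substantially from the mean-based one, outliers in `annual_amount` deserve a closer look before drawing conclusions.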

In [272]:
# Displaying random observations from the dataset
display(rent_contracts_residential_single_prop_first_line_3y.sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
5662543,CNT2115974917,2,Renew,2024-07-17,2025-07-16,77399,77399,0.0,2.0,Unit,842.0,Flat,3.0,3 bed rooms+hall,,,,278.0,Mankhool,172.0,Burj Khalifa,ADCB Metro Station,Dubai Mall,1.0,Person
915360,CRT2113256426,1,New,2023-11-15,2024-11-14,43200,43200,0.0,2.0,Unit,842.0,Flat,2.0,2 bed rooms+hall,,,,240.0,Port Saeed,50.0,Dubai International Airport,Al Rigga Metro Station,Dubai Mall,2.0,Authority
2866182,CNT1378639121,2,Renew,2021-01-27,2022-01-26,60419,60419,1.0,2.0,Unit,842.0,Flat,2.0,2 bed rooms+hall,,,,232.0,Mirdif,146.0,Dubai International Airport,Rashidiya Metro Station,City Centre Mirdif,1.0,Person
4111083,CNT1994516530,2,Renew,2022-09-13,2023-09-12,59000,59000,1.0,2.0,Unit,842.0,Flat,2.0,2 bed rooms+hall,,,,232.0,Mirdif,122.0,Dubai International Airport,Rashidiya Metro Station,City Centre Mirdif,1.0,Person
5848489,CNT2121608466,1,New,2024-08-30,2025-08-29,120000,120000,1.0,2.0,Unit,842.0,Flat,1.0,1bed room+Hall,,,Burj Khalifa,390.0,Burj Khalifa,95.0,Downtown Dubai,Buj Khalifa Dubai Mall Metro Station,Dubai Mall,1.0,Person


To streamline our analysis, we’ll examine four columns in the rent contracts dataset that outline the property type at different levels of hierarchy. These columns provide multiple layers of categorization, starting from broad classifications to more specific property descriptions.

- **Ejari Bus Property Type ID and Ejari Bus Property Type EN**: These columns represent the broadest classification, distinguishing between primary property types such as Unit, Villa, Land, and possibly other categories. This high-level classification provides an overview of the general property type.

- **Ejari Property Type ID and Ejari Property Type EN**: These columns break down the Ejari Bus Property Type into more detailed subcategories, specifying property types like Flat, Villa, and other specific property types. This further categorization allows for a more nuanced view within each broad property class.

By analyzing these columns together, we aim to identify the relationships between these property types, ensuring that each higher-level category logically includes its relevant subcategories. Additionally, understanding these hierarchies will aid in filtering and grouping property types, which will be essential as we calculate metrics and generate insights from the data.

Let’s start by examining the unique values and relationships within these columns to establish the hierarchy and check for consistency.
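
As a rough sketch of what a consistency check could look like, the snippet below tests two things on a toy frame: whether each type ID carries a single name, and whether any property type appears under more than one broad category. The sample rows are invented for illustration:

```python
import pandas as pd

# Toy hierarchy: "Flat" appears under both "Unit" and "Villa"
toy = pd.DataFrame({
    "ejari_bus_property_type_en": ["Unit", "Unit", "Villa", "Villa", "Unit"],
    "ejari_property_type_id": [842.0, 842.0, 841.0, 842.0, 903.0],
    "ejari_property_type_en": ["Flat", "Flat", "Villa", "Flat", "Studio"],
})

# Check 1: every ID should map to exactly one English name
names_per_id = toy.groupby("ejari_property_type_id")["ejari_property_type_en"].nunique()
ambiguous_ids = names_per_id[names_per_id > 1]

# Check 2: property types that sit under more than one broad category
parents_per_type = (
    toy.groupby("ejari_property_type_en")["ejari_bus_property_type_en"].nunique()
)
multi_parent_types = parents_per_type[parents_per_type > 1].index.tolist()

print("IDs with more than one name:", ambiguous_ids.index.tolist())
print("Types under several broad categories:", multi_parent_types)
```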

In [273]:
# Displaying unique value counts for the "ejari_bus_property_type" columns & "ejari_property_type" columns
print("Unique Value Count of 'ejari_bus_property_type_id':")
print(rent_contracts_residential_single_prop_first_line_3y['ejari_bus_property_type_id'].value_counts(dropna=False))

print("\nUnique Value Count of 'ejari_bus_property_type_en':")
print(rent_contracts_residential_single_prop_first_line_3y['ejari_bus_property_type_en'].value_counts(dropna=False))

print("\nUnique Value Count of 'ejari_property_type_id':")
print(rent_contracts_residential_single_prop_first_line_3y['ejari_property_type_id'].value_counts(dropna=False))

print("\nUnique Value Count of 'ejari_property_type_en':")
print(rent_contracts_residential_single_prop_first_line_3y['ejari_property_type_en'].value_counts(dropna=False))

Unique Value Count of 'ejari_bus_property_type_id':
ejari_bus_property_type_id
2.0    1814109
4.0     223314
5.0       4873
1.0        181
0.0         79
Name: count, dtype: int64

Unique Value Count of 'ejari_bus_property_type_en':
ejari_bus_property_type_en
Unit            1814109
Villa            223314
Virtual Unit       4873
Building            181
Land                 79
Name: count, dtype: int64

Unique Value Count of 'ejari_property_type_id':
ejari_property_type_id
842.0          1766639
841.0           205414
903.0            27724
4.0              22436
19.0             18075
985.0              612
844.0              505
923.0              386
10.0               174
0.0                168
24.0               145
352361946.0         90
NaN                 79
12.0                46
608333806.0         44
9.0                  8
1.0                  8
2.0                  3
Name: count, dtype: int64

Unique Value Count of 'ejari_property_type_en':
ejari_property_type_en
Flat      

Here’s a breakdown of the findings and suggested steps to enhance data quality and alignment with the analysis objectives:

1. **Ejari Bus Property Type Level**:
	
	- **Dominant Property Type**: **“Unit”** (**1,814,109** entries) is the most frequent type, followed by **“Villa”** (**223,314**), showing that standard residential units dominate the rental data in urban areas like Dubai.

	- **Less Common Types**: **“Virtual Unit”** (**4,873** entries), **“Building”** (**181**), and **“Land”** (**79**) are infrequent, aligning with expectations for urban rentals but possibly less relevant for standard residential rental analysis.

	- **Missing Data**: No missing values are detected here, but the low counts for **“Building”** and **“Land”** suggest minimal relevance for a residential-focused analysis.

2. **Ejari Property Type Level**:

	- **Main Categories**: **“Flat”** (**1,766,639** entries) and **“Villa”** (**205,414** entries) are predominant, which corresponds well with the general **“Unit”** and **“Villa”** categories in the **Ejari Bus Property Type** level. Other frequent entries, such as **“Studio”** and **“Labor Camps,”** provide additional granularity but may not align directly with residential-focused analysis.

	- **Niche Categories**: Some specialized property types, like **“Complex Villas,”** **“Staff Accommodation,”** **“Penthouse,”** and **“Portacabin,”** are less frequent. These categories may be relevant for specific use cases but could dilute insights focused on standard rentals.

	- **Erroneous and Missing Values**: Some `ejari_property_type_id` entries, such as **352361946.0** and **608333806.0**, lie far outside the range of the other IDs and may indicate data entry errors, so they should be checked against `ejari_property_type_en` before being discarded. Additionally, NaN values in both `ejari_property_type_id` and `ejari_property_type_en` may require handling to avoid gaps in analysis. These values may be grouped under an **“Unknown”** or **“Other”** category if imputation is not viable.

3. **Alignment and Cleaning Recommendations**:

	- **Data Standardization**: To streamline the data for residential analysis, it may be beneficial to combine less common or niche categories (e.g., **“Labor Camps,”** **“Staff Accommodation,”** and **“Portacabin”**) under an **“Other”** or **“Non-Residential”** label, ensuring data remains relevant and interpretable.

	- **Error Handling**: Erroneous entries in `ejari_property_type_id` should be filtered or corrected to ensure data quality. Missing values, particularly in `ejari_property_type_id` and `ejari_property_type_en`, could be either imputed or grouped to prevent loss of context in analysis.

4. **Hierarchy Consolidation**:

	- **Unified Property Type Feature**: Developing a single consolidated property type feature by mapping `ejari_bus_property_type_id` and `ejari_property_type_id` into simplified residential categories (e.g., **“Apartment,”** **“Villa,”** **“Commercial”**) can facilitate more cohesive analysis and provide clear distinctions for financial modeling.
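
A consolidated feature of this kind could be built with a simple lookup. The sketch below is only one possible mapping: `PROPERTY_TYPE_MAP`, the column name `property_type_unified`, and the category assignments are assumptions made for illustration, not decisions taken in this notebook:

```python
import pandas as pd

# Hypothetical mapping from detailed Ejari property types to simplified categories
PROPERTY_TYPE_MAP = {
    "Flat": "Apartment",
    "Studio": "Apartment",
    "Penthouse": "Apartment",
    "Villa": "Villa",
    "Complex Villas": "Villa",
}

# Toy column including an unmapped type and a missing value
toy = pd.DataFrame({
    "ejari_property_type_en": ["Flat", "Studio", "Villa", "Labor Camps", None],
})

# Unmapped or missing types fall back to "Other"
toy["property_type_unified"] = (
    toy["ejari_property_type_en"].map(PROPERTY_TYPE_MAP).fillna("Other")
)

print(toy)
```

Keeping the mapping in a single dictionary makes the categorization easy to review and adjust as the hierarchy is better understood.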


To focus our analysis on the relevant residential property types, we’ll filter on `ejari_bus_property_type_en` to include only **“Unit”** and **“Villa.”** This way, we align the rent contracts data with the transactions data, homing in on residential rentals.

In [274]:
# Filter the rent contracts data to include only residential property types
rent_contracts_unit_villa = rent_contracts_residential_single_prop_first_line_3y[
    rent_contracts_residential_single_prop_first_line_3y['ejari_bus_property_type_en'].isin(['Unit', 'Villa'])
]

# Comparing shapes before and after filtering
print("Rent Contracts Before Filtering:", rent_contracts_residential_single_prop_first_line_3y.shape)
print("Rent Contracts After Filtering:", rent_contracts_unit_villa.shape)

Rent Contracts Before Filtering: (2042556, 25)
Rent Contracts After Filtering: (2037423, 25)


In [275]:
# Displaying the unique values count in the "ejari_property_type_en"
rent_contracts_unit_villa.groupby("ejari_bus_property_type_en")['ejari_property_type_en'].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en
Unit                        Flat                      1765271
                            Studio                      27719
                            Labor Camps                 19135
                            Villa                         578
                            Building                      388
                            Staff Accommodation           244
                            Mezzanine                     174
                            Arabian House                 159
                            NaN                           145
                            Hotel                         124
                            Penthouse                      90
                            Portacabin                     44
                            Complex Villas                 16
                            Shop                            8
                            Store                           8
                   

1. **Dominant Property Types**:

	- **Units** are predominantly **“Flat”** (**1,765,271** entries), followed by **“Studio”** (**27,719** entries) and **“Labor Camps”** (**19,135** entries).

	- **Villas** primarily appear as **“Villa”** (**204,723** entries), with a smaller presence in **“Complex Villas”** (**18,045** entries) and a few unique entries in **“Villa addendum”** (**43** entries).

2. **Irrelevant Property Types for Residential Analysis**:

	- Entries such as **“Labor Camps”**, **“Portacabin”**, **“Staff Accommodation”**, and **“Office”** under the **“Unit”** category do not fit the residential scope and could be excluded from further analysis.

	- While these property types are interesting as outliers, their presence is minor and not aligned with standard residential rental analysis, so we might consider filtering them out.

3. **Missing Data**:

	- There is a small number of missing values in `ejari_property_type_en` (**145** entries under **“Unit”** and **23** under **“Villa”**).

	- This is manageable but worth addressing, possibly through imputation or removal, depending on the overall impact on analysis quality.
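
If removal is the chosen route, the exclusion of out-of-scope types and missing values could look roughly like the sketch below. The list `NON_RESIDENTIAL_TYPES` reflects the types named above, while the toy rows are invented for illustration:

```python
import pandas as pd

# Property types considered out of scope for residential analysis
NON_RESIDENTIAL_TYPES = ["Labor Camps", "Portacabin", "Staff Accommodation", "Office"]

# Toy data with two excluded types and one missing value
toy = pd.DataFrame({
    "contract_id": ["A", "B", "C", "D", "E"],
    "ejari_property_type_en": ["Flat", "Labor Camps", None, "Villa", "Portacabin"],
})

is_excluded = toy["ejari_property_type_en"].isin(NON_RESIDENTIAL_TYPES)
is_missing = toy["ejari_property_type_en"].isna()

# Keep rows that are neither out of scope nor missing a property type
residential = toy[~is_excluded & ~is_missing]

print("Dropped as non-residential:", int(is_excluded.sum()))
print("Dropped as missing:", int(is_missing.sum()))
print(residential)
```

Reporting the two drop counts separately keeps the impact of each decision visible.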

First, I'd like to investigate the values where the `ejari_bus_property_type_en` is "Villa" and the `ejari_property_type_en` is "Flat".

In [276]:
# Displaying the shape where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Flat"
rent_contracts_unit_villa[
    (rent_contracts_unit_villa['ejari_bus_property_type_en'] == 'Villa') & 
    (rent_contracts_unit_villa['ejari_property_type_en'] == 'Flat')
].shape 

(100, 25)

In [277]:
# Displaying random observations where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Flat"
rent_contracts_unit_villa[
    (rent_contracts_unit_villa['ejari_bus_property_type_en'] == 'Villa') & 
    (rent_contracts_unit_villa['ejari_property_type_en'] == 'Flat')
].sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
3177786,CNT1524489810,2,Renew,2021-09-06,2022-09-05,132528,132528,1.0,4.0,Villa,842.0,Flat,4.0,4 bed rooms+hall,,,,232.0,Mirdif,0.0,Dubai International Airport,Rashidiya Metro Station,City Centre Mirdif,1.0,Person
3772266,CNT1882740017,2,Renew,2022-01-01,2022-12-31,240000,240000,0.0,4.0,Villa,842.0,Flat,5.0,5 bed rooms+hall,,,,317.0,Jumeirah First,465.0,Burj Khalifa,Buj Khalifa Dubai Mall Metro Station,Dubai Mall,1.0,Person
4388217,CNT2055923067,2,Renew,2023-01-25,2023-07-24,60000,120000,0.0,4.0,Villa,842.0,Flat,6.0,6 bed rooms+hall,,,,348.0,Al Goze Fourth,1100.0,Downtown Dubai,Noor Bank Metro Station,Dubai Mall,1.0,Person
4754129,CNT2084335970,2,Renew,2023-05-15,2024-05-14,80000,80000,0.0,4.0,Villa,842.0,Flat,2.0,2 bed rooms+hall,,,,318.0,Jumeirah Third,103.0,Burj Khalifa,Noor Bank Metro Station,Dubai Mall,1.0,Person
3194219,CNT1532724502,2,Renew,2021-07-15,2022-07-14,220000,220000,0.0,4.0,Villa,842.0,Flat,5.0,5 bed rooms+hall,,,,317.0,Jumeirah First,465.0,Burj Khalifa,Buj Khalifa Dubai Mall Metro Station,Dubai Mall,1.0,Person


In [278]:
# Displaying contract_amount statistics where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Flat"
rent_contracts_unit_villa[
    (rent_contracts_unit_villa['ejari_bus_property_type_en'] == 'Villa') & 
    (rent_contracts_unit_villa['ejari_property_type_en'] == 'Flat')
]['contract_amount'].describe()

count       100.000000
mean     180779.660000
std       98334.962672
min        3000.000000
25%       84750.000000
50%      145914.000000
75%      272318.750000
max      480000.000000
Name: contract_amount, dtype: float64

In [279]:
# Displaying ejari_property_sub_type_en value count where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Flat"
rent_contracts_unit_villa[
    (rent_contracts_unit_villa['ejari_bus_property_type_en'] == 'Villa') & 
    (rent_contracts_unit_villa['ejari_property_type_en'] == 'Flat')
]['ejari_property_sub_type_en'].value_counts(dropna=False)

ejari_property_sub_type_en
5 bed rooms+hall    40
2 bed rooms+hall    29
4 bed rooms+hall    23
6 bed rooms+hall     4
3 bed rooms+hall     3
1bed room+Hall       1
Name: count, dtype: int64

In [280]:
# Displaying unique value counts for "ejari_property_type_en" column
rent_contracts_unit_villa['ejari_property_type_en'].value_counts(dropna=False)

ejari_property_type_en
Flat                   1765371
Villa                   205414
Studio                   27724
Labor Camps              19135
Complex Villas           18075
Building                   388
Arabian House              386
Staff Accommodation        244
Mezzanine                  174
NaN                        168
Hotel                      145
Penthouse                   90
Villa addendum              46
Portacabin                  44
Store                        8
Shop                         8
Office                       3
Name: count, dtype: int64

In [281]:
# Displaying random observations where bus property type is "Unit" & property type is null
display(rent_contracts_unit_villa[
    (rent_contracts_unit_villa['ejari_bus_property_type_en'] == 'Unit') & rent_contracts_unit_villa['ejari_property_type_en'].isna()
].sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
3703488,CNT1860089784,1,New,2022-02-28,2023-03-27,18000,16615,1.0,2.0,Unit,0.0,,0.0,,1673.0,MAG 5 BOULEVARD,Dubai South Residential District,462.0,Madinat Al Mataar,36.0,Expo 2020 Site,,,1.0,Person
3624532,CNT1834504207,1,New,2022-02-01,2023-01-31,50000,50000,1.0,2.0,Unit,0.0,,0.0,,1772.0,CREEK HORIZON,The Lagoons,447.0,Al Khairan First,70.0,Dubai International Airport,Creek Metro Station,City Centre Mirdif,1.0,Person
4964897,CNT2094346778,1,New,2023-09-27,2023-11-26,43000,258000,1.0,2.0,Unit,0.0,,0.0,,1722.0,TERHAB HOTEL & TOWERS AT JUMEIRAH VILLAGE TRIA...,Jumeirah Village Triangle,442.0,Al Barsha South Fifth,352.0,Sports City Swimming Academy,Damac Properties,Marina Mall,1.0,Person
4276149,CNT2034939346,2,Renew,2022-11-25,2023-11-24,38000,38000,1.0,2.0,Unit,0.0,,0.0,,593.0,GLOBAL GOLF RESIDENCE 2,Dubai Sports City,435.0,Al Hebiah Fourth,87.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
4215599,CNT2016444509,1,New,2022-10-24,2023-10-23,75000,75000,1.0,2.0,Unit,0.0,,0.0,,1662.0,ARABIAN GATE,Silicon Oasis,484.0,Nadd Hessa,121.0,IMG World Adventures,,,1.0,Person


In [282]:
# Displaying random observations where bus property type is "Villa" & property type is null
display(rent_contracts_unit_villa[
    (rent_contracts_unit_villa['ejari_bus_property_type_en'] == 'Villa') & rent_contracts_unit_villa['ejari_property_type_en'].isna()
].sample(10))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
5065477,CNT2099669311,1,New,2023-11-01,2024-10-31,110000,110000,1.0,4.0,Villa,0.0,,0.0,,,,Liwan,464.0,Wadi Al Safa 2,847.0,IMG World Adventures,,,1.0,Person
4789077,CNT2085890766,2,Renew,2023-06-15,2024-06-14,90000,90000,1.0,4.0,Villa,0.0,,3.0,3 bed rooms+hall,,,Liwan,464.0,Wadi Al Safa 2,848.0,IMG World Adventures,,,1.0,Person
4761364,CNT2084624145,2,Renew,2023-05-25,2024-05-24,120000,120000,1.0,4.0,Villa,0.0,,0.0,,401.0,LOTUS PARK,Jumeirah Village Circle,441.0,Al Barsha South Fourth,216.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
4192216,CNT2011687633,1,New,2022-10-26,2023-10-25,95000,95000,1.0,4.0,Villa,0.0,,0.0,,,,Liwan,464.0,Wadi Al Safa 2,847.0,IMG World Adventures,,,1.0,Person
3928771,CNT1934896336,2,Renew,2022-06-15,2023-06-14,90000,90000,1.0,4.0,Villa,0.0,,3.0,3 bed rooms+hall,,,Liwan,464.0,Wadi Al Safa 2,848.0,IMG World Adventures,,,1.0,Person
4433324,CNT2062633479,2,Renew,2023-01-01,2023-12-31,125000,125000,0.0,4.0,Villa,0.0,,3.0,3 bed rooms+hall,,,,378.0,Al Garhoud,464.0,Dubai International Airport,Airport Terminal 1 Metro Station,City Centre Mirdif,1.0,Person
3186260,CNT1529298125,1,New,2021-08-01,2022-07-31,130000,130000,1.0,4.0,Villa,0.0,,0.0,,1832.0,AR II - Reem Community,Arabian Ranches II - Reem Community,463.0,Wadi Al Safa 7,181.0,Motor City,,,1.0,Person
3381255,CNT1701333536,2,Renew,2021-09-10,2022-09-09,120000,120000,1.0,4.0,Villa,0.0,,0.0,,401.0,LOTUS PARK,Jumeirah Village Circle,441.0,Al Barsha South Fourth,216.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
3848609,CNT1907905038,2,Renew,2022-05-18,2023-02-17,116250,155000,1.0,4.0,Villa,0.0,,0.0,,1062.0,The Lakes Zulal,,351.0,Al Thanyah Third,355.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
5499590,CNT2112264915,2,Renew,2024-05-01,2025-04-30,194250,194250,1.0,4.0,Villa,0.0,,3.0,3 bed rooms+hall,1036.0,Emirates Living - Springs 6,Springs - 4,352.0,Al Thanayah Fourth,363.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person


In [283]:
rent_contracts_unit_villa[['ejari_bus_property_type_id', 'ejari_bus_property_type_en']].value_counts()

ejari_bus_property_type_id  ejari_bus_property_type_en
2.0                         Unit                          1814109
4.0                         Villa                          223314
Name: count, dtype: int64

In [284]:
rent_contracts_unit_villa[['ejari_property_type_id', 'ejari_property_type_en']].value_counts()

ejari_property_type_id  ejari_property_type_en
842.0                   Flat                      1765371
841.0                   Villa                      205414
903.0                   Studio                      27724
4.0                     Labor Camps                 19135
19.0                    Complex Villas              18075
844.0                   Building                      388
923.0                   Arabian House                 386
985.0                   Staff Accommodation           244
10.0                    Mezzanine                     174
24.0                    Hotel                         145
352361946.0             Penthouse                      90
12.0                    Villa addendum                 46
608333806.0             Portacabin                     44
1.0                     Shop                            8
9.0                     Store                           8
2.0                     Office                          3
Name: count, dtype: int64

In [285]:
rent_contracts_unit_villa[['ejari_property_sub_type_id', 'ejari_property_sub_type_en']].value_counts()

ejari_property_sub_type_id  ejari_property_sub_type_en 
1.0                         1bed room+Hall                 712095
2.0                         2 bed rooms+hall               627198
11.0                        Studio                         330175
3.0                         3 bed rooms+hall               229233
4.0                         4 bed rooms+hall                78137
5.0                         5 bed rooms+hall                30534
12.0                        Room in labor Camp              18080
6.0                         6 bed rooms+hall                 5844
7.0                         7 bed rooms+hall                 1437
8.0                         8 bed rooms+hall                  986
35.0                        Duplex                            550
10.0                        10 bed rooms+hall                 460
422.0                       Office                            452
38.0                        Labor Camp                        394
621.0               

In [286]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
rent_contracts_unit_villa.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709938
                                                    2 bed rooms+hall               609748
                                                    Studio                         302128
                                                    3 bed rooms+hall               132919
                            Studio                  Studio                          27462
                            Labor Camps             Room in labor Camp              17970
                            Flat                    4 bed rooms+hall                 8225
                                                    5 bed rooms+hall                  676
                                                    Duplex                            498
                            Labor Camps             Studio                            407
                    

I’ll filter the `ejari_property_type_en` column to include the following residential categories: **“Flat”**, **“Villa”**, **“Studio”**, **“Complex Villas”**, **“Arabian House”**, and **“Penthouse”**. This selection aligns with our focus on core residential property types and will streamline further analysis.

Only **168** rows have a missing (**NaN**) value in `ejari_property_type_en`, a negligible share of the roughly two million contracts, so I’ll drop them rather than try to infer a property type. Because `isin` evaluates to `False` for **NaN**, the filter below removes these rows automatically and keeps the dataset focused on clearly defined property categories.
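Before dropping rows with a missing property type, it can help to quantify them. The sketch below uses a small made-up frame (`toy` and its values are illustrative, not taken from the dataset) to count missing values and express them as a share of all rows. The same two calls should apply unchanged to `rent_contracts_unit_villa`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the property type column (values are illustrative only)
toy = pd.DataFrame({
    "ejari_property_type_en": [
        "Flat", "Villa", np.nan, "Flat", "Studio", "Flat", "Villa", "Flat",
    ]
})

# Number of missing values and their share of all rows
missing_count = int(toy["ejari_property_type_en"].isna().sum())
missing_share = float(toy["ejari_property_type_en"].isna().mean())

print(f"Missing: {missing_count} rows ({missing_share:.2%})")
```

Reporting the share alongside the raw count makes the decision to drop the rows easier to justify if the dataset is refreshed later.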

In [287]:
# Define the property types we want to keep
selected_property_types = ["Flat", "Villa", "Studio", "Complex Villas", "Arabian House", "Penthouse"]

# Filter the DataFrame to keep only the rows with the specified property types and exclude NaN values
filtered_rent_contracts = rent_contracts_unit_villa[
    rent_contracts_unit_villa['ejari_property_type_en'].isin(selected_property_types)
]

# Comparing shapes before and after filtering
print("Rent Contracts Before Filtering:", rent_contracts_unit_villa.shape)
print("Rent Contracts After Filtering:", filtered_rent_contracts.shape)


Rent Contracts Before Filtering: (2037423, 25)
Rent Contracts After Filtering: (1989336, 25)


In [288]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709938
                                                    2 bed rooms+hall               609748
                                                    Studio                         302128
                                                    3 bed rooms+hall               132919
                                                    4 bed rooms+hall                 8225
                                                    5 bed rooms+hall                  676
                                                    Duplex                            498
                                                    NaN                               323
                                                    Penthouse                         288
                                                    Office                            226
                    

Looking at the filtered table, here are a few observations and inconsistencies worth noting:

1. **Mixed Property Types in Hierarchies**:

	- Under the `ejari_bus_property_type_en` of **“Unit”**, there are entries classified as **“Villa”**, which should logically only appear under **“Villa”** in the broader hierarchy. This suggests some misclassification.

	- Similarly, in the **“Villa”** category within `ejari_bus_property_type_en`, we see **“Flat”** and **“Studio”** as `ejari_property_type_en` values. These should typically belong to **“Unit”**, as they usually represent apartment-like structures rather than standalone properties like villas.

2. **Anomalies in `ejari_property_sub_type_en`**:

	- There are entries like **“Office”** and **“Room in labor Camp”** under **“Unit”**, which do not align with the residential focus. Non-residential subtypes should be filtered out if our objective is residential analysis.

	- Unusual subtypes, such as **“Boardroom”** and **“Pharmacy”**, also appear, which likely stem from inconsistent data entry or incorrect mappings.

3. **NaN Values**:

	- A small number of **NaN** values remain under `ejari_property_sub_type_en`. It may be worth investigating whether they can be filled in from other available columns, or removing them if they turn out to be noise in the dataset.

4. **High Variability in Subtypes**:

	- Subtypes like **“1bed room+Hall”** or **“2 bed rooms+hall”** appear under both **“Flat”** and **“Villa”** categories. This inconsistency could be due to data entry errors or a lack of clear categorization in the original dataset. Such cases might benefit from further consolidation or standardized grouping.
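The hierarchy mismatches described in the first observation can also be surfaced with a cross-tabulation of the two type columns. This is a minimal sketch on made-up rows (`toy` is illustrative), not the method used in this notebook. Any non-zero cell that pairs **“Unit”** with a villa-like type, or **“Villa”** with a flat-like type, points to a possible misclassification.

```python
import pandas as pd

# Toy rows mixing consistent and inconsistent type combinations (illustrative only)
toy = pd.DataFrame({
    "ejari_bus_property_type_en": ["Unit", "Unit", "Villa", "Villa", "Unit"],
    "ejari_property_type_en": ["Flat", "Villa", "Villa", "Flat", "Flat"],
})

# Rows: business property type; columns: property type; cells: contract counts
combo_counts = pd.crosstab(
    toy["ejari_bus_property_type_en"],
    toy["ejari_property_type_en"],
)

print(combo_counts)
```

Compared with a long grouped `value_counts` listing, the two-way table puts every combination on one screen, which makes off-diagonal counts easier to spot.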

First, let’s filter out non-residential types by removing entries with subtypes that suggest commercial uses, such as **“Office”**, **“Boardroom”** or **“Pharmacy”**.

In [289]:
# Shape of the dataset before filtering
print("Rent Contracts Shape Before Filtering:", filtered_rent_contracts.shape)

# Filtering out non-residential sub property types
non_residential_sub_types = ['Office', 'Room in labor Camp', 'Hotel', 'Pharmacy', 'Boardroom', 'Shop', 'Commercial villa']

# Drop rows whose sub-type is in the non-residential list (rows with a NaN sub-type are kept)
filtered_rent_contracts = filtered_rent_contracts[
    ~filtered_rent_contracts['ejari_property_sub_type_en'].isin(non_residential_sub_types)
]

# Shape of the dataset after filtering
print("Rent Contracts Shape After Filtering:", filtered_rent_contracts.shape)

Rent Contracts Shape Before Filtering: (1989336, 25)
Rent Contracts Shape After Filtering: (1988962, 25)
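One side effect of excluding values with a negated `isin` mask is that rows with a missing sub-type survive the filter, because `isin` is `False` for **NaN** and the negation turns that into `True`. The sketch below shows this on a made-up series (`sub_type` and `non_residential` are illustrative names and values).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sub-type column (values are illustrative only)
sub_type = pd.Series(["Studio", "Office", np.nan, "Duplex", "Pharmacy"])
non_residential = ["Office", "Pharmacy"]

# isin is False for NaN, so the negated mask is True and the NaN row is kept
keep_mask = ~sub_type.isin(non_residential)
remaining = sub_type[keep_mask]

print(keep_mask.tolist())
print(int(remaining.isna().sum()))
```

If rows with a missing sub-type should be removed as well, an explicit `sub_type.notna()` condition can be combined with the mask.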


In [290]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709938
                                                    2 bed rooms+hall               609748
                                                    Studio                         302128
                                                    3 bed rooms+hall               132919
                                                    4 bed rooms+hall                 8225
                                                    5 bed rooms+hall                  676
                                                    Duplex                            498
                                                    NaN                               323
                                                    Penthouse                         288
                            Villa                   3 bed rooms+hall                  170
                    

In [291]:
# Inspecting the shape of the dataset where "ejari_bus_property_type_en" is "Unit" & "ejari_property_type_en" is "Villa"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa')
].shape 

(574, 25)

In [292]:
# Changing values in the "ejari_property_type" columns where "ejari_bus_property_type" is "Unit" & "ejari_property_type" is "Villa" to "Flat"
# Update both ejari_property_type_en to 'Flat' and ejari_property_type_id to 842 where necessary
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa'), 
    ['ejari_property_type_en', 'ejari_property_type_id']
] = ['Flat', 842]

# Confirming the shape of the dataset where "ejari_bus_property_type_en" is "Unit" & "ejari_property_type_en" is "Villa"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa')
].shape 

(0, 25)

In [293]:
# Inspecting the shape of the dataset where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Flat"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat')
].shape 

(100, 25)

In [294]:
# Update both ejari_property_type_en to 'Villa' and ejari_property_type_id to 841 where necessary
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat'), 
    ['ejari_property_type_en', 'ejari_property_type_id']
] = ['Villa', 841]

# Confirming the shape of the dataset where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Flat"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat')
].shape 

(0, 25)
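The inspect, update, confirm pattern used in these cells recurs for every correction, so it could be wrapped in a small helper. The sketch below is one possible shape for it: `relabel` is a hypothetical function, not part of pandas or of this notebook, and the frames `raw` and `toy` are made up. It also takes an explicit `.copy()` after filtering, which, depending on the pandas version, avoids a `SettingWithCopyWarning` when assigning through `.loc` on a filtered frame.

```python
import pandas as pd


def relabel(df: pd.DataFrame, mask: pd.Series, updates: dict) -> int:
    """Set each column in `updates` to its new value on rows where `mask` is True.

    Hypothetical helper for illustration. Returns the number of rows changed.
    """
    n_rows = int(mask.sum())
    for column, value in updates.items():
        df.loc[mask, column] = value
    return n_rows


# Toy source data (illustrative values only)
raw = pd.DataFrame({
    "ejari_bus_property_type_en": ["Unit", "Unit", "Villa", "Unit"],
    "ejari_property_type_en": ["Villa", "Flat", "Villa", "Shop"],
    "ejari_property_type_id": [841.0, 842.0, 841.0, 1.0],
})

# .copy() gives an independent frame, so later edits do not touch `raw`
toy = raw[raw["ejari_property_type_en"].isin(["Flat", "Villa"])].copy()

# Build the mask once, before any column is modified
unit_villa = (
    (toy["ejari_bus_property_type_en"] == "Unit")
    & (toy["ejari_property_type_en"] == "Villa")
)

changed = relabel(toy, unit_villa, {
    "ejari_property_type_en": "Flat",
    "ejari_property_type_id": 842.0,
})

print(changed)
```

Passing the updates as a dictionary ties each value to its column by name, so the result does not depend on keeping a list of columns and a list of values in the same order.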

In [295]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709948
                                                    2 bed rooms+hall               609857
                                                    Studio                         302128
                                                    3 bed rooms+hall               133089
                                                    4 bed rooms+hall                 8367
                                                    5 bed rooms+hall                  747
                                                    Duplex                            498
                                                    NaN                               323
                                                    Penthouse                         288
                            Penthouse               Penthouse                          90
                    

In [296]:
# Inspecting where the "ejari_property_type_en" is "Villa" and "ejari_property_sub_type_en" is Studio
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Studio')
].shape

(9, 25)

In [297]:
# Displaying random observations where the "ejari_property_type_en" is "Villa" and "ejari_property_sub_type_en" is Studio
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Studio')
].sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
4640558,CNT2080496748,2,Renew,2021-01-01,2021-12-31,15000,15000,0.0,4.0,Villa,841.0,Villa,11.0,Studio,,,,375.0,Jumeirah Second,33.0,Burj Khalifa,Business Bay Metro Station,Dubai Mall,2.0,Authority
4240979,CNT2024848402,1,New,2022-11-01,2025-11-30,675000,225000,0.0,4.0,Villa,841.0,Villa,11.0,Studio,,,,233.0,Hor Al Anz,343.0,Dubai International Airport,Abu Baker Al Siddique Metro Station,City Centre Mirdif,1.0,Person
354802,CRT1531944946,1,New,2021-08-01,2022-07-31,24000,24000,0.0,4.0,Villa,841.0,Villa,11.0,Studio,,,,267.0,Al Raffa,,Burj Khalifa,Al Fahidi Metro Station,Dubai Mall,1.0,Person
4640568,CNT2080496852,2,Renew,2022-01-01,2022-12-31,15000,15000,0.0,4.0,Villa,841.0,Villa,11.0,Studio,,,,375.0,Jumeirah Second,33.0,Burj Khalifa,Business Bay Metro Station,Dubai Mall,2.0,Authority
3026847,CNT1441084416,2,Renew,2021-05-01,2022-04-30,16800,16800,0.0,4.0,Villa,841.0,Villa,11.0,Studio,,,,232.0,Mirdif,56.0,Dubai International Airport,Rashidiya Metro Station,City Centre Mirdif,1.0,Person
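Because `sample` draws different rows on every run, the observations shown above will change when the notebook is re-executed. If reproducible spot checks are wanted, a fixed `random_state` can be passed. The sketch below uses a made-up frame (`toy` is illustrative), and the seed value 42 is an arbitrary choice.

```python
import pandas as pd

# Toy frame with ten illustrative contract IDs
toy = pd.DataFrame({"contract_id": [f"CNT{i}" for i in range(10)]})

# The same seed yields the same rows on every run
first_draw = toy.sample(5, random_state=42)
second_draw = toy.sample(5, random_state=42)

print(first_draw.equals(second_draw))
```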


In [298]:
# Update records where ejari_property_type_en is 'Villa' and ejari_property_sub_type_en is 'Studio'
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Studio'),
    ['ejari_bus_property_type_id', 'ejari_bus_property_type_en', 'ejari_property_type_id', 'ejari_property_type_en']
] = [2.0, 'Unit', 842, 'Flat']

# Confirming update where the "ejari_property_type_en" is "Villa" and "ejari_property_sub_type_en" is Studio
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Studio')
].shape

(0, 25)

In [299]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709948
                                                    2 bed rooms+hall               609857
                                                    Studio                         302137
                                                    3 bed rooms+hall               133089
                                                    4 bed rooms+hall                 8367
                                                    5 bed rooms+hall                  747
                                                    Duplex                            498
                                                    NaN                               323
                                                    Penthouse                         288
                            Penthouse               Penthouse                          90
                    

In [300]:
# Inspecting values where "ejari_property_type_en" is "Villa" & ejari_property_sub_type_en is "Duplex"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex')
].shape 

(27, 25)

In [301]:
# Displaying random observations where "ejari_property_type_en" is "Villa" & ejari_property_sub_type_en is "Duplex"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex')
].sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
5116247,CNT2101956502,2,Renew,2024-01-01,2024-12-31,250000,250000,0.0,4.0,Villa,841.0,Villa,35.0,Duplex,,,,318.0,Jumeirah Third,533.0,Burj Khalifa,Business Bay Metro Station,Dubai Mall,1.0,Person
4958133,CNT2093932702,2,Renew,2023-10-01,2024-09-30,220000,220000,0.0,4.0,Villa,841.0,Villa,35.0,Duplex,,,,318.0,Jumeirah Third,533.0,Burj Khalifa,Business Bay Metro Station,Dubai Mall,1.0,Person
3473182,CNT1780074085,2,Renew,2022-01-01,2022-12-31,220000,220000,0.0,4.0,Villa,841.0,Villa,35.0,Duplex,,,,318.0,Jumeirah Third,533.0,Burj Khalifa,Business Bay Metro Station,Dubai Mall,1.0,Person
4768855,CNT2084817266,2,Renew,2023-08-15,2024-08-14,250000,250000,0.0,4.0,Villa,841.0,Villa,35.0,Duplex,,,,318.0,Jumeirah Third,533.0,Burj Khalifa,Business Bay Metro Station,Dubai Mall,1.0,Person
5219672,CNT2105569314,2,Renew,2024-01-01,2024-12-31,260000,260000,0.0,4.0,Villa,841.0,Villa,35.0,Duplex,,,,318.0,Jumeirah Third,533.0,Burj Khalifa,Business Bay Metro Station,Dubai Mall,1.0,Person


In [302]:
# Update properties classified as 'Villa' with 'Duplex' in the sub-type to 'Unit' and 'Flat' with appropriate IDs
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex'),
    ['ejari_bus_property_type_en', 'ejari_bus_property_type_id', 'ejari_property_type_en', 'ejari_property_type_id']
] = ['Unit', 2.0, 'Flat', 842.0]

In [303]:
# Confirming change where "ejari_property_type_en" is "Villa" & ejari_property_sub_type_en is "Duplex"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex')
].shape 

(0, 25)

In [304]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709948
                                                    2 bed rooms+hall               609857
                                                    Studio                         302137
                                                    3 bed rooms+hall               133089
                                                    4 bed rooms+hall                 8367
                                                    5 bed rooms+hall                  747
                                                    Duplex                            525
                                                    NaN                               323
                                                    Penthouse                         288
                            Penthouse               Penthouse                          90
                    

In [305]:
# Inspecting values where "ejari_bus_property_type_en" is "Unit" & "ejari_property_type_en" is "Arabian House"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Arabian House')
].shape 

(159, 25)

In [306]:
# Move 'Arabian House' entries filed under 'Unit' to the 'Villa' business property type (name and ID)
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Arabian House'),
    ['ejari_bus_property_type_en', 'ejari_bus_property_type_id']
] = ['Villa', 4.0]

# Confirming change where "ejari_bus_property_type_en" is "Unit" & "ejari_property_type_en" is "Arabian House"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Arabian House')
].shape 

(0, 25)

In [307]:
# Inspecting values where "ejari_bus_property_type_en" is "Unit" & "ejari_property_type_en" is "Complex Villas"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Complex Villas')
].shape 

(16, 25)

In [308]:
# Move 'Complex Villas' entries filed under 'Unit' to the 'Villa' business property type (name and ID)
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Complex Villas'),
    ['ejari_bus_property_type_en', 'ejari_bus_property_type_id']
] = ['Villa', 4.0]

# Confirming change where "ejari_bus_property_type_en" is "Unit" & "ejari_property_type_en" is "Complex Villas"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Complex Villas')
].shape 

(0, 25)
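Rather than confirming each correction one pair at a time, the whole hierarchy can be checked in a single pass by mapping every property type to the business type it is expected to sit under. The sketch below assumes the mapping in `expected_bus_type`, which mirrors the corrections made so far and is not an official Ejari rule. The frame `toy` is made up for illustration.

```python
import pandas as pd

# Assumed mapping from property type to its expected business property type
# (illustrative, derived from the corrections in this notebook)
expected_bus_type = {
    "Flat": "Unit",
    "Studio": "Unit",
    "Penthouse": "Unit",
    "Villa": "Villa",
    "Complex Villas": "Villa",
    "Arabian House": "Villa",
}

# Toy rows: the third one is deliberately inconsistent
toy = pd.DataFrame({
    "ejari_bus_property_type_en": ["Unit", "Villa", "Unit", "Villa"],
    "ejari_property_type_en": ["Flat", "Villa", "Arabian House", "Complex Villas"],
})

# Look up the expected business type and keep rows where it disagrees
expected = toy["ejari_property_type_en"].map(expected_bus_type)
mismatches = toy[expected != toy["ejari_bus_property_type_en"]]

print(len(mismatches))
```

An empty `mismatches` frame would indicate that every row agrees with the assumed mapping. Any remaining rows are candidates for the kind of targeted update shown in the surrounding cells.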

In [309]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709948
                                                    2 bed rooms+hall               609857
                                                    Studio                         302137
                                                    3 bed rooms+hall               133089
                                                    4 bed rooms+hall                 8367
                                                    5 bed rooms+hall                  747
                                                    Duplex                            525
                                                    NaN                               323
                                                    Penthouse                         288
                            Penthouse               Penthouse                          90
                    

In [310]:
# Inspecting values where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Complex Villas" 
# & "ejari_property_sub_type_en" is "Duplex"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Complex Villas') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex')
].shape 

(25, 25)

In [311]:
# Displaying random observations where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Complex Villas" 
# & "ejari_property_sub_type_en" is "Duplex"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Complex Villas') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex')
].sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
3952531,CNT1943246902,2,Renew,2022-08-10,2023-08-09,145000,145000,0.0,4.0,Villa,19.0,Complex Villas,35.0,Duplex,,,,315.0,Al Manara,367.0,Burj Al Arab,First Abu Dhabi Bank Metro Station,Mall of the Emirates,2.0,Authority
5778261,CNT2118724304,2,Renew,2024-08-11,2025-08-10,25000,25000,0.0,4.0,Villa,19.0,Complex Villas,35.0,Duplex,,,,399.0,Al Aweer Second,1.0,,,,1.0,Person
3951031,CNT1942809185,2,Renew,2022-07-01,2023-06-30,20000,20000,0.0,4.0,Villa,19.0,Complex Villas,35.0,Duplex,,,,399.0,Al Aweer Second,1.0,,,,1.0,Person
2905398,CNT1393223356,1,New,2021-02-25,2022-02-24,135000,135000,0.0,4.0,Villa,19.0,Complex Villas,35.0,Duplex,,,,315.0,Al Manara,367.0,Burj Al Arab,First Abu Dhabi Bank Metro Station,Mall of the Emirates,1.0,Person
4467475,CNT2066569482,2,Renew,2023-02-25,2024-02-24,150000,150000,0.0,4.0,Villa,19.0,Complex Villas,35.0,Duplex,,,,315.0,Al Manara,367.0,Burj Al Arab,First Abu Dhabi Bank Metro Station,Mall of the Emirates,1.0,Person


In [312]:
# Update values for records where ejari_bus_property_type_en is 'Villa',
# ejari_property_type_en is 'Complex Villas', and ejari_property_sub_type_en is 'Duplex'
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Complex Villas') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex'), 
    ['ejari_bus_property_type_id', 'ejari_bus_property_type_en', 'ejari_property_type_id', 'ejari_property_type_en']
] = [2.0, 'Unit', 842, 'Flat']

# Confirming change where "ejari_bus_property_type_en" is "Villa" & "ejari_property_type_en" is "Complex Villas" 
# & "ejari_property_sub_type_en" is "Duplex"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
    (filtered_rent_contracts['ejari_property_type_en'] == 'Complex Villas') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Duplex')
].shape 

(0, 25)

In [313]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709948
                                                    2 bed rooms+hall               609857
                                                    Studio                         302137
                                                    3 bed rooms+hall               133089
                                                    4 bed rooms+hall                 8367
                                                    5 bed rooms+hall                  747
                                                    Duplex                            550
                                                    NaN                               323
                                                    Penthouse                         288
                            Penthouse               Penthouse                          90
                    

In [314]:
# Inspecting values where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "Penthouse"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Penthouse')
].shape 

(288, 25)

In [315]:
# Displaying random observations where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "Penthouse"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Penthouse')
].sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
3349832,CNT1668408504,1,New,2021-10-09,2022-10-08,125000,125000,1.0,2.0,Unit,842.0,Flat,621.0,Penthouse,1147.0,TWO TOWERS,TECOM Site C,443.0,Al Thanyah First,172.0,Burj Al Arab,Dubai Internet City,Mall of the Emirates,1.0,Person
3462439,CNT1775569038,2,Renew,2021-10-01,2022-09-30,210000,210000,0.0,2.0,Unit,842.0,Flat,621.0,Penthouse,,,,341.0,Trade Center First,400.0,Burj Khalifa,Buj Khalifa Dubai Mall Metro Station,Dubai Mall,1.0,Person
3223124,CNT1544682271,1,New,2021-09-01,2022-08-31,250000,250000,1.0,2.0,Unit,842.0,Flat,621.0,Penthouse,388.0,GOLDEN MILE,Palm Jumeirah,410.0,Palm Jumeirah,435.0,Burj Al Arab,Palm Jumeirah,Marina Mall,1.0,Person
3986388,CNT1954954814,2,Renew,2022-09-29,2023-09-28,276000,276000,1.0,2.0,Unit,842.0,Flat,621.0,Penthouse,,,Jumeriah Beach Residence - JBR,330.0,Marsa Dubai,635.0,Burj Al Arab,Jumeirah Beach Residency,Marina Mall,1.0,Person
4918336,CNT2091977421,2,Renew,2023-09-01,2024-08-31,155400,155400,1.0,2.0,Unit,842.0,Flat,621.0,Penthouse,,,Dubai Marina,330.0,Marsa Dubai,274.0,Burj Al Arab,Marina Mall Metro Station,Marina Mall,1.0,Person


In [316]:
# Update values for records where ejari_property_type_en is 'Flat' and ejari_property_sub_type_en is 'Penthouse'
filtered_rent_contracts.loc[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Penthouse'), 
    ['ejari_property_type_id', 'ejari_property_type_en', 'ejari_property_sub_type_id', 'ejari_property_sub_type_en']
] = [352361946, 'Penthouse', 621, 'Penthouse']

# Confirming changes where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "Penthouse"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Penthouse')
].shape 

(0, 25)

In [317]:
# Inspecting the "ejari_bus_property_type_en", "ejari_property_type_en" and "ejari_property_sub_type_en" columns grouped together
filtered_rent_contracts.groupby("ejari_bus_property_type_en")[['ejari_property_type_en', 'ejari_property_sub_type_en']].value_counts(dropna=False)

ejari_bus_property_type_en  ejari_property_type_en  ejari_property_sub_type_en 
Unit                        Flat                    1bed room+Hall                 709948
                                                    2 bed rooms+hall               609857
                                                    Studio                         302137
                                                    3 bed rooms+hall               133089
                                                    4 bed rooms+hall                 8367
                                                    5 bed rooms+hall                  747
                                                    Duplex                            550
                            Penthouse               Penthouse                         378
                            Flat                    NaN                               323
                                                    Room                               68
                    

In [318]:
# Inspecting values where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "Room"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Room')
].shape 

(68, 25)

In [319]:
# Displaying random observations where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "Room"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Room')
].sample(10)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
4460762,CNT2065793305,1,New,2023-02-10,2024-02-09,20000,20000,0.0,2.0,Unit,842.0,Flat,601.0,Room,,,,339.0,Al Mararr,22.0,Dubai International Airport,Baniyas Square Metro Station,Dubai Mall,1.0,Person
3582032,CNT1820124529,1,New,2022-01-17,2022-12-31,25875,27038,0.0,2.0,Unit,842.0,Flat,601.0,Room,,,,362.0,Naif,51.0,Dubai International Airport,Baniyas Square Metro Station,Dubai Mall,1.0,Person
2824536,CNT1363091197,2,Renew,2021-01-01,2021-12-31,4153804,4153804,0.0,2.0,Unit,842.0,Flat,601.0,Room,,,,372.0,Al Goze Industrial Second,5146.0,Burj Al Arab,Noor Bank Metro Station,Mall of the Emirates,2.0,Authority
3274634,CNT1600201321,1,New,2021-10-13,2022-10-12,35000,35000,1.0,2.0,Unit,842.0,Flat,601.0,Room,,,,232.0,Mirdif,71.0,Dubai International Airport,Rashidiya Metro Station,City Centre Mirdif,1.0,Person
5079895,CNT2100446000,2,Renew,2023-10-10,2024-10-09,26250,26250,1.0,2.0,Unit,842.0,Flat,601.0,Room,641.0,DUNES VILLAGE,Dubai Investment Park Second,459.0,Dubai Investment Park Second,44.0,Expo 2020 Site,,,1.0,Person
4212673,CNT2015737425,2,Renew,2022-10-15,2023-10-14,26000,26000,0.0,2.0,Unit,842.0,Flat,601.0,Room,,,,339.0,Al Mararr,32.0,Dubai International Airport,Palm Deira Metro Stations,Dubai Mall,1.0,Person
5233565,CNT2105865235,1,New,2024-01-12,2024-12-31,27225,28055,0.0,2.0,Unit,842.0,Flat,601.0,Room,,,,362.0,Naif,51.0,Dubai International Airport,Baniyas Square Metro Station,Dubai Mall,1.0,Person
5937725,CNT2122793391,2,Renew,2024-10-10,2025-10-09,30188,30188,1.0,2.0,Unit,842.0,Flat,601.0,Room,641.0,DUNES VILLAGE,Dubai Investment Park Second,459.0,Dubai Investment Park Second,44.0,Expo 2020 Site,,,1.0,Person
5341972,CNT2108232123,2,Renew,2024-02-10,2025-02-09,22000,22000,0.0,2.0,Unit,842.0,Flat,601.0,Room,,,,339.0,Al Mararr,22.0,Dubai International Airport,Baniyas Square Metro Station,Dubai Mall,1.0,Person
5859540,CNT2121668256,1,New,2024-09-10,2025-09-09,24000,24000,0.0,2.0,Unit,842.0,Flat,601.0,Room,,,,339.0,Al Mararr,22.0,Dubai International Airport,Baniyas Square Metro Station,Dubai Mall,1.0,Person


In [320]:
# Inspecting the area names where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "Room"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Room')
]['area_name_en'].value_counts(dropna=False)

area_name_en
Naif                            23
Al Mararr                       14
Al Goze Industrial Second        7
Al Baraha                        6
Port Saeed                       5
Mirdif                           3
Al Rega                          3
Dubai Investment Park Second     3
Al Suq Al Kabeer                 2
Al Muteena                       1
Trade Center First               1
Name: count, dtype: int64

In [321]:
# Inspecting values where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
].shape

(323, 25)

In [322]:
# Displaying random observations where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
].sample(10)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
5808089,CNT2121399438,2,Renew,2024-08-01,2025-07-31,101640,101640,1.0,2.0,Unit,842.0,Flat,0.0,,,,TECOM Site C,443.0,Al Thanyah First,196.0,Burj Al Arab,Dubai Internet City,Marina Mall,1.0,Person
5209082,CNT2105046284,2,Renew,2024-01-01,2024-12-31,52000,52000,1.0,2.0,Unit,842.0,Flat,0.0,,506.0,THE PEARL,Culture Village,334.0,Al Jadaf,58.0,Dubai International Airport,Al Jadaf Metro Station,Dubai Mall,1.0,Person
5279246,CNT2106784586,1,New,2024-01-30,2025-01-29,60000,60000,1.0,2.0,Unit,842.0,Flat,0.0,,2431.0,Westwood By Imtiaz,Al Furjan,445.0,Jabal Ali First,39.0,Sports City Swimming Academy,Ibn Battuta Metro Station,Ibn-e-Battuta Mall,1.0,Person
3716475,CNT1864499562,1,New,2022-03-07,2023-03-06,50000,50000,1.0,2.0,Unit,842.0,Flat,0.0,,1504.0,DAMAC HILLS - GOLF PROMENADE,DAMAC HILLS,523.0,Al Hebiah Third,76.0,Motor City,,,1.0,Person
3984061,CNT1954158278,2,Renew,2022-08-01,2023-07-31,84000,84000,1.0,2.0,Unit,842.0,Flat,0.0,,,,TECOM Site C,443.0,Al Thanyah First,196.0,Burj Al Arab,Dubai Internet City,Marina Mall,1.0,Person
5611841,CNT2114416019,1,New,2024-06-01,2025-05-31,55000,55000,1.0,2.0,Unit,842.0,Flat,0.0,,1929.0,SIGNATURE LIVINGS,Jumeirah Village Circle,441.0,Al Barsha South Fourth,40.0,Sports City Swimming Academy,Dubai Internet City,Mall of the Emirates,1.0,Person
5973801,CNT2123453061,2,Renew,2024-10-14,2025-10-13,185000,185000,1.0,2.0,Unit,842.0,Flat,0.0,,1641.0,SERENIA RESIDENCES THE PALM,Palm Jumeirah,410.0,Palm Jumeirah,103.0,Burj Al Arab,Al Sufouh,Mall of the Emirates,1.0,Person
5401910,CNT2109648755,2,Renew,2024-03-06,2025-03-05,43000,43000,1.0,2.0,Unit,842.0,Flat,0.0,,1929.0,SIGNATURE LIVINGS,Jumeirah Village Circle,441.0,Al Barsha South Fourth,37.0,Sports City Swimming Academy,Dubai Internet City,Mall of the Emirates,1.0,Person
5876470,CNT2121750992,2,Renew,2024-08-01,2025-07-31,99825,99825,1.0,2.0,Unit,842.0,Flat,0.0,,,,TECOM Site C,443.0,Al Thanyah First,283.0,Burj Al Arab,Dubai Internet City,Marina Mall,1.0,Person
4774920,CNT2084957569,1,New,2023-07-01,2024-06-30,150000,150000,1.0,2.0,Unit,842.0,Flat,0.0,,1813.0,Creek Gate,The Lagoons,447.0,Al Khairan First,137.0,Dubai International Airport,Creek Metro Station,City Centre Mirdif,1.0,Person


In [323]:
# Inspecting area names where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
]['area_name_en'].value_counts(dropna=False)

area_name_en
Al Barsha South Fourth               128
Jabal Ali First                       44
Al Hebiah Fifth                       23
Al Hebiah Fourth                      19
Al Thanyah First                      16
Al Hebiah Third                       10
Business Bay                           9
Al Warsan First                        8
Al Jadaf                               8
Palm Jumeirah                          6
Al Khairan First                       5
Burj Khalifa                           5
Hadaeq Sheikh Mohammed Bin Rashid      4
Al Safouh Second                       4
Al Hebiah Second                       4
Al Satwa                               4
Al Thanyah Third                       4
Al Merkadh                             3
Jumeirah First                         3
Al Yelayiss 2                          3
Al Thanyah Fifth                       3
Mirdif                                 2
Jabal Ali Industrial Second            2
Wadi Al Safa 3                         2
Ras

In [324]:
# Inspecting project names where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
]['project_name_en'].value_counts(dropna=False)

project_name_en
SIGNATURE LIVINGS                                                                119
NaN                                                                               49
Westwood By Imtiaz                                                                27
REMRAAM                                                                           23
OASIS TOWER ONE                                                                   10
SANDHURST HOUSE                                                                    8
DAMAC HILLS - GOLF VITA                                                            7
SERENIA RESIDENCES THE PALM                                                        6
HUB CANAL 2 TOWER                                                                  6
GLITZ RESIDENCE 2                                                                  4
CREEK HORIZON                                                                      3
Azizi Star Hotel Apartments                      

In [325]:
# Inspecting actual area statistics where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
]['actual_area'].describe()

count    323.000000
mean      68.600619
std       50.513312
min       34.000000
25%       39.000000
50%       45.000000
75%       78.000000
max      480.000000
Name: actual_area, dtype: float64

In [326]:
# Comparing actual area statistics where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "Studio"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == 'Studio')
]['actual_area'].describe()

count    3.008240e+05
mean     6.246679e+03
std      1.521118e+06
min      0.000000e+00
25%      3.500000e+01
50%      4.200000e+01
75%      4.600000e+01
max      3.731260e+08
Name: actual_area, dtype: float64

In [327]:
# Comparing actual area statistics where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is "1bed room+Hall"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == '1bed room+Hall')
]['actual_area'].describe()


count    7.024380e+05
mean     7.806390e+02
std      2.883211e+05
min      0.000000e+00
25%      6.300000e+01
50%      7.400000e+01
75%      8.500000e+01
max      1.220494e+08
Name: actual_area, dtype: float64

In [328]:
# Min and max actual_area for each (project name, property sub-type) pair
project_subtype_stats = filtered_rent_contracts.groupby(['project_name_en', 'ejari_property_sub_type_en'])['actual_area'].agg(['min', 'max']).reset_index()

# Impute a missing sub-type from the area ranges observed in the same project.
# Ranges can overlap, so the first match in the (sorted) stats table is returned.
def impute_property_subtype(row):
    project_name = row['project_name_en']
    subtype = row['ejari_property_sub_type_en']
    area = row['actual_area']
    
    if pd.isna(subtype):
        matching_stats = project_subtype_stats[(project_subtype_stats['project_name_en'] == project_name) & (project_subtype_stats['ejari_property_sub_type_en'].notna())]
        for _, stats_row in matching_stats.iterrows():
            min_area, max_area = stats_row['min'], stats_row['max']
            if area >= min_area and area <= max_area:
                return stats_row['ejari_property_sub_type_en']
    return subtype

# Apply the imputation function to the dataframe
filtered_rent_contracts['ejari_property_sub_type_en'] = filtered_rent_contracts.apply(impute_property_subtype, axis=1)
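
The row-wise `apply` above scans `project_subtype_stats` once for every row of a large frame, although only the rows with a missing sub-type can change. A possible alternative is sketched below on a toy frame: it restricts the lookup to the missing rows and joins them against the per-project ranges. The helper name `impute_from_ranges`, the `row_id` column and the toy data are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def impute_from_ranges(df, key="project_name_en",
                       subtype="ejari_property_sub_type_en",
                       area="actual_area"):
    """Fill missing sub-types with the first sub-type of the same `key`
    whose observed [min, max] area range contains the row's area."""
    # Per-(key, sub-type) area ranges, built from labeled rows only.
    stats = (df.dropna(subset=[subtype])
               .groupby([key, subtype])[area]
               .agg(["min", "max"])
               .reset_index())
    # Only rows with a missing sub-type are candidates for imputation.
    missing = (df.loc[df[subtype].isna(), [key, area]]
                 .rename_axis("row_id")
                 .reset_index())
    # Pair each candidate with every known range of the same key.
    cand = missing.merge(stats, on=key, how="inner")
    cand = cand[cand[area].between(cand["min"], cand["max"])]
    # Keep the first matching range per original row.
    first = cand.drop_duplicates(subset="row_id").set_index("row_id")[subtype]
    out = df.copy()
    out.loc[first.index, subtype] = first
    return out

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "project_name_en": ["A", "A", "A", "A", "B", None],
    "ejari_property_sub_type_en": ["Studio", "Studio", "1bed room+Hall", None, None, None],
    "actual_area": [35.0, 45.0, 70.0, 40.0, 40.0, 40.0],
})
result = impute_from_ranges(toy)
print(result)
```

Rows whose project has no labeled contracts, or no project name at all, stay missing, which matches the behavior of the loop-based function.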

In [329]:
# Inspecting project names where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
]['project_name_en'].value_counts(dropna=False)

project_name_en
NaN                49
SANDHURST HOUSE     8
Name: count, dtype: int64

In [330]:
# Inspecting values where "ejari_property_type_en" is "Flat" & "project_name_en" is "SANDHURST HOUSE"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['project_name_en'] == 'SANDHURST HOUSE')
]['ejari_property_sub_type_en'].value_counts(dropna=False)

ejari_property_sub_type_en
2 bed rooms+hall    145
1bed room+Hall       77
NaN                   8
Name: count, dtype: int64

In [331]:
# Inspecting actual area statistics per sub-type where "ejari_property_type_en" is "Flat" & "project_name_en" is "SANDHURST HOUSE"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['project_name_en'] == 'SANDHURST HOUSE')
].groupby('ejari_property_sub_type_en')['actual_area'].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
ejari_property_sub_type_en,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1bed room+Hall,77.0,79.402597,2.461598,77.0,77.0,80.0,80.0,86.0
2 bed rooms+hall,145.0,138.965517,20.870438,101.0,130.0,133.0,154.0,204.0


In [332]:
# Define filter for SANDHURST HOUSE flats with NaN in property sub type
sandhurst_flats = (
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') &
    (filtered_rent_contracts['project_name_en'] == 'SANDHURST HOUSE') &
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
)

# Label the unlabeled flats with an area of 47-48 as "Studio": they are well
# below the 77 minimum recorded for 1-bedroom flats in this project
filtered_rent_contracts.loc[sandhurst_flats & (filtered_rent_contracts['actual_area'].between(47, 48)),
                            'ejari_property_sub_type_en'] = 'Studio'

# Confirming changes where "ejari_property_type_en" is "Flat" & "project_name_en" is "SANDHURST HOUSE"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    (filtered_rent_contracts['project_name_en'] == 'SANDHURST HOUSE')
]['ejari_property_sub_type_en'].value_counts(dropna=False)

ejari_property_sub_type_en
2 bed rooms+hall    145
1bed room+Hall       77
Studio                8
Name: count, dtype: int64

In [333]:
# Inspecting project names where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
]['project_name_en'].value_counts(dropna=False)

project_name_en
NaN    49
Name: count, dtype: int64

In [336]:
# Inspecting observations where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
].sample(5) 


Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
4578756,CNT2078175691,2,Renew,2023-04-05,2024-04-04,31000,31000,1.0,2.0,Unit,842.0,Flat,0.0,,,,International City Phase 1,343.0,Al Warsan First,67.0,,Rashidiya Metro Station,City Centre Mirdif,1.0,Person
4485996,CNT2068521081,2,Renew,2023-02-01,2024-01-31,80496,80496,1.0,2.0,Unit,842.0,Flat,0.0,,,,TECOM Site C,443.0,Al Thanyah First,196.0,Burj Al Arab,Dubai Internet City,Marina Mall,1.0,Person
4896547,CNT2090762831,2,Renew,2023-08-06,2024-08-05,63000,63000,1.0,2.0,Unit,842.0,Flat,0.0,,,,Jumeriah Garden City,266.0,Al Satwa,100.0,Burj Khalifa,Trade Centre Metro Station,Dubai Mall,1.0,Person
4932731,CNT2092592118,2,Renew,2023-08-01,2024-07-31,92400,92400,1.0,2.0,Unit,842.0,Flat,0.0,,,,TECOM Site C,443.0,Al Thanyah First,196.0,Burj Al Arab,Dubai Internet City,Marina Mall,1.0,Person
3188016,CNT1530083882,2,Renew,2021-08-01,2022-07-31,75000,75000,1.0,2.0,Unit,842.0,Flat,0.0,,,,TECOM Site C,443.0,Al Thanyah First,196.0,Burj Al Arab,Dubai Internet City,Marina Mall,1.0,Person


In [337]:
# Inspecting actual area statistics where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
]['actual_area'].describe()

count     49.000000
mean     134.571429
std       89.202018
min       41.000000
25%       66.000000
50%      107.000000
75%      196.000000
max      480.000000
Name: actual_area, dtype: float64

In [338]:
# Inspecting area names where "ejari_property_type_en" is "Flat" & "ejari_property_sub_type_en" is null
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_type_en'] == 'Flat') & 
    filtered_rent_contracts['ejari_property_sub_type_en'].isna()
]['area_name_en'].value_counts(dropna=False)

area_name_en
Al Thanyah First                16
Jabal Ali First                 11
Al Warsan First                  6
Al Thanyah Third                 4
Al Satwa                         4
Al Safouh Second                 4
Ras Al Khor Industrial Third     2
Wadi Al Safa 3                   2
Name: count, dtype: int64

In [339]:
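# Min and max actual_area for each (area name, property sub-type) pair
area_subtype_stats = filtered_rent_contracts.groupby(['area_name_en', 'ejari_property_sub_type_en'])['actual_area'].agg(['min', 'max']).reset_index()

# Impute a missing sub-type from the ranges observed in the same area name,
# considering only Studio, 1-bedroom and 2-bedroom sub-types. Ranges can overlap,
# so the first match in the (sorted) stats table is returned.
# Note: this redefines the project-level function of the same name.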
def impute_property_subtype(row):
    area_name = row['area_name_en']
    subtype = row['ejari_property_sub_type_en']
    actual_area = row['actual_area']
    
    if pd.isna(subtype):
        matching_stats = area_subtype_stats[
            (area_subtype_stats['area_name_en'] == area_name) & 
            (area_subtype_stats['ejari_property_sub_type_en'].notna()) &
            (area_subtype_stats['ejari_property_sub_type_en'].isin(['Studio', '1bed room+Hall', '2 bed rooms+hall']))
        ]
        for _, stats_row in matching_stats.iterrows():
            min_area, max_area = stats_row['min'], stats_row['max']
            if actual_area >= min_area and actual_area <= max_area:
                return stats_row['ejari_property_sub_type_en']
    return subtype

# Apply the imputation function to the dataframe
filtered_rent_contracts['ejari_property_sub_type_en'] = filtered_rent_contracts.apply(impute_property_subtype, axis=1)

# Check the results
print("Distribution of property subtypes after imputation:")
print(filtered_rent_contracts['ejari_property_sub_type_en'].value_counts(dropna=False))

Distribution of property subtypes after imputation:
ejari_property_sub_type_en
1bed room+Hall                 711917
2 bed rooms+hall               626787
Studio                         302274
3 bed rooms+hall               229167
4 bed rooms+hall                78110
5 bed rooms+hall                30531
6 bed rooms+hall                 5844
7 bed rooms+hall                 1437
8 bed rooms+hall                  986
Duplex                            550
10 bed rooms+hall                 460
Penthouse                         378
9 bed rooms+hall                  360
Room                               68
11 bed rooms+hall                  51
15 bed room+hall                   22
2 bed rooms+hall+Maids Room        20
Name: count, dtype: int64


In [340]:
# Creating a map of the numbers to fix the "ejari_property_sub_type_id" column
sub_type_id_map = filtered_rent_contracts.groupby('ejari_property_sub_type_en')['ejari_property_sub_type_id'].first().to_dict()
sub_type_id_map

{'10 bed rooms+hall': 10.0,
 '11 bed rooms+hall': 875988406.0,
 '15 bed room+hall': 801092396.0,
 '1bed room+Hall': 1.0,
 '2 bed rooms+hall': 2.0,
 '2 bed rooms+hall+Maids Room': 170501486.0,
 '3 bed rooms+hall': 3.0,
 '4 bed rooms+hall': 4.0,
 '5 bed rooms+hall': 5.0,
 '6 bed rooms+hall': 6.0,
 '7 bed rooms+hall': 7.0,
 '8 bed rooms+hall': 8.0,
 '9 bed rooms+hall': 9.0,
 'Duplex': 35.0,
 'Penthouse': 621.0,
 'Room': 601.0,
 'Studio': 11.0}

In [341]:
# Update the "ejari_property_sub_type_id" column based on the mapping
filtered_rent_contracts['ejari_property_sub_type_id'] = filtered_rent_contracts['ejari_property_sub_type_en'].map(sub_type_id_map)
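
`groupby(...).first()` keeps the first non-null id for each label, so a label that carries several different ids (for example, imputed rows that still hold the id `0.0` seen on the unlabeled records) is silently collapsed to one of them. A quick check, sketched here on made-up rows, is to count the distinct ids per label before overwriting the column.

```python
import pandas as pd

# Hypothetical rows: "Room" carries two different ids, "Duplex" has none
toy = pd.DataFrame({
    "ejari_property_sub_type_en": ["Studio", "Studio", "Room", "Room", "Duplex"],
    "ejari_property_sub_type_id": [11.0, 11.0, 601.0, 0.0, None],
})

# Number of distinct non-null ids per label (NaN is not counted)
ids_per_label = (toy.groupby("ejari_property_sub_type_en")["ejari_property_sub_type_id"]
                    .nunique())

# Labels mapped to more than one id deserve a manual look
ambiguous = ids_per_label[ids_per_label > 1]
print(ambiguous)
```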

In [342]:
# Displaying random observations where "ejari_property_sub_type_en" is "2 bed rooms+hall+Maids Room"
filtered_rent_contracts[
    filtered_rent_contracts['ejari_property_sub_type_en'] == '2 bed rooms+hall+Maids Room'
].sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
5278084,CNT2106778585,2,Renew,2024-01-25,2025-01-24,95000,95000,0.0,2.0,Unit,842.0,Flat,170501486.0,2 bed rooms+hall+Maids Room,,,,341.0,Trade Center First,149.0,Burj Khalifa,Buj Khalifa Dubai Mall Metro Station,Dubai Mall,1.0,Person
2879369,CNT1383128799,1,New,2021-02-07,2022-02-06,45000,45000,1.0,2.0,Unit,842.0,Flat,170501486.0,2 bed rooms+hall+Maids Room,1306.0,SEASONS COMMUNITY- SUMMER,Jumeirah Village Circle,441.0,Al Barsha South Fourth,118.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
4733554,CNT2083685015,2,Renew,2021-10-01,2023-09-30,70000,35000,1.0,2.0,Unit,842.0,Flat,170501486.0,2 bed rooms+hall+Maids Room,,,Jumeirah Village Circle,441.0,Al Barsha South Fourth,149.0,Sports City Swimming Academy,Dubai Internet City,Mall of the Emirates,1.0,Person
5710589,CNT2117104441,2,Renew,2023-12-01,2024-11-30,73500,73500,1.0,2.0,Unit,842.0,Flat,170501486.0,2 bed rooms+hall+Maids Room,,,Jumeirah Village Circle,441.0,Al Barsha South Fourth,141.0,Sports City Swimming Academy,Dubai Internet City,Mall of the Emirates,1.0,Person
3551097,CNT1809386429,1,New,2022-02-05,2023-02-04,85000,85000,0.0,2.0,Unit,842.0,Flat,170501486.0,2 bed rooms+hall+Maids Room,,,,341.0,Trade Center First,149.0,Burj Khalifa,Buj Khalifa Dubai Mall Metro Station,Dubai Mall,1.0,Person


In [343]:
# Replace "2 bed rooms+hall+Maids Room" with "2 bed rooms+hall"
filtered_rent_contracts['ejari_property_sub_type_en'] = filtered_rent_contracts['ejari_property_sub_type_en'].replace(
    '2 bed rooms+hall+Maids Room', '2 bed rooms+hall'
)

# Creating a map of the numbers to fix the "ejari_property_sub_type_id" column
sub_type_id_map = filtered_rent_contracts.groupby('ejari_property_sub_type_en')['ejari_property_sub_type_id'].first().to_dict()

# Update the "ejari_property_sub_type_id" column based on the mapping
filtered_rent_contracts['ejari_property_sub_type_id'] = filtered_rent_contracts['ejari_property_sub_type_en'].map(sub_type_id_map)

# Verify the changes
print("Updated distribution of property subtypes:")
print(filtered_rent_contracts['ejari_property_sub_type_en'].value_counts(dropna=False))

Updated distribution of property subtypes:
ejari_property_sub_type_en
1bed room+Hall       711917
2 bed rooms+hall     626807
Studio               302274
3 bed rooms+hall     229167
4 bed rooms+hall      78110
5 bed rooms+hall      30531
6 bed rooms+hall       5844
7 bed rooms+hall       1437
8 bed rooms+hall        986
Duplex                  550
10 bed rooms+hall       460
Penthouse               378
9 bed rooms+hall        360
Room                     68
11 bed rooms+hall        51
15 bed room+hall         22
Name: count, dtype: int64


In [344]:
# Displaying random observations where "ejari_property_sub_type_en" is "15 bed room+hall"
filtered_rent_contracts[
    filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall'
].sample(5)

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
656084,CRT2037189697,2,Renew,2022-11-01,2023-12-31,100000,100000,0.0,4.0,Villa,841.0,Villa,801092396.0,15 bed room+hall,,,,382.0,Al Waheda,,Dubai International Airport,Abu Hail Metro Station,,1.0,Person
937572,CRT2116083516,2,Renew,2024-07-01,2025-09-30,250000,250000,0.0,4.0,Villa,841.0,Villa,801092396.0,15 bed room+hall,,,,382.0,Al Waheda,929.0,Dubai International Airport,Abu Hail Metro Station,,,
490302,CRT1832505186,1,New,2022-02-01,2024-04-30,540000,270000,0.0,2.0,Unit,842.0,Flat,801092396.0,15 bed room+hall,,,,249.0,Al Muteena,302.0,Dubai International Airport,Salah Al Din Metro Station,Dubai Mall,1.0,Person
598760,CRT1979128476,2,Renew,2022-06-08,2023-06-07,180000,180000,0.0,4.0,Villa,841.0,Villa,801092396.0,15 bed room+hall,,,,278.0,Mankhool,959.0,Burj Khalifa,ADCB Metro Station,Dubai Mall,,
4272301,CNT2034326907,1,New,2022-12-01,2024-01-31,330000,282857,0.0,2.0,Unit,842.0,Flat,801092396.0,15 bed room+hall,,,,266.0,Al Satwa,35.0,Burj Khalifa,Trade Centre Metro Station,Dubai Mall,1.0,Person


In [345]:
# Inspecting actual area statistics where "ejari_property_sub_type_en" is "15 bed room+hall"
filtered_rent_contracts[
    filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall'
]['actual_area'].describe()

count      17.000000
mean      658.588235
std       516.056690
min         0.000000
25%       151.000000
50%       929.000000
75%       959.000000
max      1747.000000
Name: actual_area, dtype: float64

In [346]:
# Inspecting property type distribution where "ejari_property_sub_type_en" is "15 bed room+hall"
filtered_rent_contracts[
    filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall'
]['ejari_property_type_en'].value_counts(dropna=False)

ejari_property_type_en
Villa    18
Flat      4
Name: count, dtype: int64

In [347]:
# Inspecting annual amount statistics where "ejari_property_sub_type_en" is "15 bed room+hall"
filtered_rent_contracts[
    filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall'
]['annual_amount'].describe()

count        22.000000
mean     260186.318182
std      141168.775263
min       24000.000000
25%      185000.000000
50%      250000.000000
75%      322500.000000
max      700000.000000
Name: annual_amount, dtype: float64

Looking at these statistics for the "15 bed room+hall" category, there are several red flags:

1. **Area statistics are concerning**:

    - The minimum area is 0 square meters, which is impossible.

    - The spread is very large (standard deviation of about 516 m²), with values ranging from 0 m² to 1,747 m².

    - Several of these areas, especially the lower ones, are too small for a 15-bedroom property.

    - Only 17 of the 22 records report an area at all.

2. **Property type distribution**:

    - 4 flats listed with 15 bedrooms is highly unlikely.

    - The 18 villas are more plausible, although a 15-bedroom villa is still unusually large.

3. **Annual amount statistics**:

    - The minimum rent of 24,000 is extremely low for a 15-bedroom property.

    - Even the median (250,000) seems low for properties of this size.

    - The large standard deviation (about 141,000) suggests inconsistent data.
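
One way to make these red flags measurable is to read the bedroom count from the label and relate it to the recorded area. The sketch below uses invented rows and an arbitrary threshold of 10 m² per bedroom; both are assumptions for illustration, not values taken from the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration
toy = pd.DataFrame({
    "ejari_property_sub_type_en": ["15 bed room+hall", "15 bed room+hall",
                                   "2 bed rooms+hall", "Studio"],
    "actual_area": [35.0, 959.0, 130.0, 42.0],
})

# Leading digits in the label are read as the bedroom count;
# labels without digits (e.g. "Studio") yield NaN.
bedrooms = pd.to_numeric(
    toy["ejari_property_sub_type_en"].str.extract(r"^(\d+)", expand=False)
)
toy["area_per_bedroom"] = toy["actual_area"] / bedrooms

MIN_AREA_PER_BEDROOM = 10  # hypothetical threshold
suspicious = toy[toy["area_per_bedroom"] < MIN_AREA_PER_BEDROOM]
print(suspicious)
```

Rows without a bedroom count compare as `False` and are therefore never flagged by this rule.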

In [349]:
# Inspecting observations where actual area is 0 & "ejari_property_sub_type_en" is "15 bed room+hall"
filtered_rent_contracts[
    (filtered_rent_contracts['actual_area'] == 0) & 
    (filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall')
]

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
357131,CRT1535724236,1,New,2021-07-01,2022-08-31,210000,210000,0.0,4.0,Villa,841.0,Villa,801092396.0,15 bed room+hall,,,,233.0,Hor Al Anz,0.0,Dubai International Airport,Abu Hail Metro Station,,,


In [350]:
# Create a boolean mask to identify the rows to delete
mask = (filtered_rent_contracts['actual_area'] == 0) & \
       (filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall')

# Use the mask to filter out the rows and create a new DataFrame
filtered_rent_contracts = filtered_rent_contracts[~mask] 

# Inspecting actual area statistics where "ejari_property_sub_type_en" is "15 bed room+hall"
filtered_rent_contracts[
    filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall'
]['actual_area'].describe()

count      16.000000
mean      699.750000
std       503.334547
min         1.000000
25%       264.250000
50%       929.000000
75%       959.000000
max      1747.000000
Name: actual_area, dtype: float64

In [352]:
# Inspecting area distribution across different "ejari_bus_property_type_en"
filtered_rent_contracts.groupby('ejari_bus_property_type_en')['actual_area'].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
ejari_bus_property_type_en,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Unit,1748125.0,3470.633236,1024440.0,0.0,57.0,83.0,116.0,373126042.0
Villa,212025.0,12496.505879,2856844.0,0.0,175.0,271.0,457.0,930091045.0


In [353]:
# Shape of the dataset before filtering
print("Rent Contracts Shape Before Filtering:", filtered_rent_contracts.shape)

# Define area limits (in square meters) for each property type
villa_limits = (57, 15200)
unit_limits = (9, 3201)

# Keep rows whose area lies within the limits for their property type.
# between() is inclusive on both ends and returns False for NaN, so rows
# with a missing actual_area are dropped here as well.
filtered_rent_contracts = filtered_rent_contracts[
    ((filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa') & 
     (filtered_rent_contracts['actual_area'].between(*villa_limits))) |
    ((filtered_rent_contracts['ejari_bus_property_type_en'] == 'Unit') & 
     (filtered_rent_contracts['actual_area'].between(*unit_limits)))
].copy()

# Shape of the dataset after filtering
print("Rent Contracts Shape After Filtering:", filtered_rent_contracts.shape)

Rent Contracts Shape Before Filtering: (1988961, 25)
Rent Contracts Shape After Filtering: (1866620, 25)
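
Because `Series.between` evaluates to `False` for missing values, the filter removes rows with no recorded area along with the out-of-range ones. The sketch below, on made-up rows that reuse the same limits, shows one way the dropped rows could be broken down by reason; the `limits` dictionary and the reason labels are illustrative names.

```python
import numpy as np
import pandas as pd

limits = {"Villa": (57, 15200), "Unit": (9, 3201)}  # same bounds as the filter

# Hypothetical rows for illustration
toy = pd.DataFrame({
    "ejari_bus_property_type_en": ["Unit", "Unit", "Unit", "Villa", "Villa", "Villa"],
    "actual_area": [0.0, 83.0, np.nan, 271.0, 930091045.0, 40.0],
})

# Per-row lower and upper bound, looked up from the property type
low = toy["ejari_bus_property_type_en"].map(lambda t: limits[t][0])
high = toy["ejari_bus_property_type_en"].map(lambda t: limits[t][1])

# First matching condition wins; everything else is kept
reason = np.select(
    [toy["actual_area"].isna(), toy["actual_area"] < low, toy["actual_area"] > high],
    ["missing area", "below limit", "above limit"],
    default="kept",
)
print(pd.Series(reason).value_counts())
```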


In [354]:
# Inspecting "ejari_property_sub_type_en" value counts after filtering
print(filtered_rent_contracts['ejari_property_sub_type_en'].value_counts(dropna=False))

ejari_property_sub_type_en
1bed room+Hall       675520
2 bed rooms+hall     588098
Studio               293386
3 bed rooms+hall     210943
4 bed rooms+hall      66933
5 bed rooms+hall      24431
6 bed rooms+hall       4348
7 bed rooms+hall        890
8 bed rooms+hall        569
Duplex                  527
Penthouse               374
10 bed rooms+hall       281
9 bed rooms+hall        202
Room                     65
11 bed rooms+hall        38
15 bed room+hall         15
Name: count, dtype: int64


In [356]:
# Inspecting actual area statistics for villas where "ejari_property_sub_type_en" is "15 bed room+hall"
filtered_rent_contracts[
    (filtered_rent_contracts['ejari_property_sub_type_en'] == '15 bed room+hall') &
    (filtered_rent_contracts['ejari_bus_property_type_en'] == 'Villa')
]['actual_area'].describe()

count      11.000000
mean      970.181818
std       341.225678
min       445.000000
25%       929.000000
50%       959.000000
75%       959.000000
max      1747.000000
Name: actual_area, dtype: float64

To enhance the forecasting accuracy of our model, we are refining the property categories by removing rare, potentially outlier configurations that may not contribute meaningfully to investor insights. Specifically, we plan to:

1. **Remove Irregular Categories**:

	- Exclude properties with 11-bedroom and 15-bedroom configurations. These categories are sparse and introduce substantial variability in actual area sizes, making them unhelpful for reliable predictive modeling.

	- Additionally, the “Room” category will be removed due to its low count and distinct nature from full units.

2. **Consolidate Large Bedroom Counts**:

	- Group properties with 8 to 10 bedrooms into a single category. This approach will capture meaningful insights from larger properties while mitigating inconsistencies due to the variability in area data.

3. **Refine Naming for Clarity**:

	- Standardize the naming convention for other categories (e.g., “1 bed + hall” instead of “1bed room+Hall”) to ensure consistency and improve readability in reports and visualizations.

In [362]:
# Standardize the 'ejari_property_sub_type_en' column by stripping whitespace and converting to lowercase
filtered_rent_contracts['ejari_property_sub_type_en'] = filtered_rent_contracts['ejari_property_sub_type_en'].str.strip().str.lower()

# Remove the 11-bedroom and 15-bedroom configurations and the "room" category
# (.copy() avoids a SettingWithCopyWarning on the assignments below)
filtered_rent_contracts = filtered_rent_contracts[
    ~filtered_rent_contracts['ejari_property_sub_type_en'].isin(['11 bed rooms+hall', '15 bed room+hall', 'room'])
].copy()

# Consolidate 8, 9, and 10-bedroom properties into a single category '8-10 bed + hall'
filtered_rent_contracts['ejari_property_sub_type_en'] = np.where(
    filtered_rent_contracts['ejari_property_sub_type_en'].isin(['8 bed rooms+hall', '9 bed rooms+hall', '10 bed rooms+hall']),
    '8-10 bed + hall',
    filtered_rent_contracts['ejari_property_sub_type_en']
)

# Standardize category names for improved clarity.
# Keys and values are lowercase because the column was lowercased above;
# 'studio', 'duplex', and 'penthouse' need no renaming.
name_replacements = {
    '1bed room+hall': '1 bed + hall',
    '2 bed rooms+hall': '2 beds + hall',
    '3 bed rooms+hall': '3 beds + hall',
    '4 bed rooms+hall': '4 beds + hall',
    '5 bed rooms+hall': '5 beds + hall',
    '6 bed rooms+hall': '6 beds + hall',
    '7 bed rooms+hall': '7 beds + hall'
}

filtered_rent_contracts['ejari_property_sub_type_en'] = filtered_rent_contracts['ejari_property_sub_type_en'].replace(name_replacements)

# Verifying the changes
print(filtered_rent_contracts['ejari_property_sub_type_en'].value_counts())

ejari_property_sub_type_en
1 bed + hall       675520
2 beds + hall      588098
studio             293386
3 beds + hall      210943
4 beds + hall       66933
5 beds + hall       24431
6 beds + hall        4348
8-10 bed + hall      1052
7 beds + hall         890
duplex                527
penthouse             374
Name: count, dtype: int64
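
The mapping above lists every label explicitly. As an alternative sketch, the bedroom count could be parsed from the raw label with a regular expression. The function name `normalize_sub_type` is hypothetical, and the output format is an assumption chosen to match the value counts shown above. The sketch does not drop the 11-bedroom, 15-bedroom, or "room" categories, which would still need to be filtered out separately.

```python
import re

def normalize_sub_type(label):
    """Map a raw sub-type label to a standardized lowercase label (hypothetical helper)."""
    text = label.strip().lower()
    match = re.match(r"^(\d+)\s*bed", text)
    if match is None:
        return text  # non-bedroom labels such as 'studio' pass through unchanged
    beds = int(match.group(1))
    if 8 <= beds <= 10:
        return "8-10 bed + hall"  # consolidated large-bedroom category
    noun = "bed" if beds == 1 else "beds"
    return f"{beds} {noun} + hall"

for raw in ["1bed room+Hall", "2 bed rooms+hall", "9 bed rooms+hall", "Studio"]:
    print(raw, "->", normalize_sub_type(raw))
```

It could be applied to the whole column with `Series.map(normalize_sub_type)`, at the cost of being slower than a vectorized `replace` on a dataset of this size.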


In [363]:
# Displaying missing values percentages for each column
filtered_rent_contracts.isnull().sum() / filtered_rent_contracts.shape[0] * 100

contract_id                    0.000000
contract_reg_type_id           0.000000
contract_reg_type_en           0.000000
contract_start_date            0.000000
contract_end_date              0.000000
contract_amount                0.000000
annual_amount                  0.000000
is_free_hold                   0.000000
ejari_bus_property_type_id     0.000000
ejari_bus_property_type_en     0.000000
ejari_property_type_id         0.000000
ejari_property_type_en         0.000000
ejari_property_sub_type_id     0.000000
ejari_property_sub_type_en     0.000000
project_number                72.104075
project_name_en               72.104075
master_project_en             49.113475
area_id                        0.000000
area_name_en                   0.000000
actual_area                    0.000000
nearest_landmark_en            8.662514
nearest_metro_en              15.150640
nearest_mall_en               15.754818
tenant_type_id                 8.798651
tenant_type_en                 8.798651


In [375]:
# Displaying random observations of the dataset
display(filtered_rent_contracts.sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en,tenant_type_id,tenant_type_en
348294,CRT1518978396,1,New,2021-08-01,2022-07-31,28000,28000,1.0,2.0,Unit,842.0,Flat,1.0,1 bed + hall,622.0,HAMZA TOWER,Dubai Sports City,435.0,Al Hebiah Fourth,80.0,Sports City Swimming Academy,Nakheel Metro Station,Marina Mall,1.0,Person
3087935,CNT1475699717,2,Renew,2021-05-25,2022-05-24,62500,62500,0.0,2.0,Unit,842.0,Flat,2.0,2 beds + hall,,,,355.0,Al Nahda First,146.0,Dubai International Airport,STADIUM Metro Station,City Centre Mirdif,1.0,Person
4183917,CNT2010168399,2,Renew,2022-11-05,2023-11-04,37000,37000,1.0,2.0,Unit,842.0,Flat,1.0,1 bed + hall,,,Arjan,409.0,Al Barshaa South Third,78.0,Motor City,Sharaf Dg Metro Station,Mall of the Emirates,1.0,Person
4170889,CNT2007229126,2,Renew,2022-10-27,2023-10-26,55650,55650,1.0,2.0,Unit,842.0,Flat,2.0,2 beds + hall,,,The Gardens,445.0,Jabal Ali First,105.0,Sports City Swimming Academy,Ibn Battuta Metro Station,Ibn-e-Battuta Mall,1.0,Person
4185432,CNT2010359609,1,New,2022-10-15,2023-10-14,75000,75000,1.0,2.0,Unit,842.0,Flat,1.0,1 bed + hall,,,Dubai Health Care City Phase 2,334.0,Al Jadaf,105.0,Dubai International Airport,Creek Metro Station,Dubai Mall,1.0,Person


In [374]:
# Displaying the contract_amount statistics for each tenant type
display(filtered_rent_contracts.groupby("tenant_type_en")['contract_amount'].describe())

| tenant_type_en | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Authority | 116371.0 | 64841.493018 | 268108.1 | 0.0 | 32000.0 | 47000.0 | 68000.0 | 80000000.0 |
| Person | 1585904.0 | 73473.06672 | 2662165.0 | 0.0 | 36000.0 | 50820.0 | 78000.0 | 3300038000.0 |


We're dropping the `tenant_type_id` and `tenant_type_en` columns because they add little to our analysis. There are only two categories, heavily skewed toward `Person`, and their contract amounts are distributed similarly (medians of 50,820 and 47,000), so tenant type is unlikely to be a strong predictor of rental prices or investment potential.
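
A rough way to put numbers on that skew is to compute each category's share of all rows. The sketch below uses the counts from the `describe()` output above and the row count at this stage (1,866,502). The `(missing)` label is just an illustrative name for rows with no tenant type.

```python
import pandas as pd

# Non-null counts per tenant type, copied from the describe() output above
counts = pd.Series({"Person": 1585904, "Authority": 116371})
total_rows = 1866502  # number of rows in filtered_rent_contracts at this stage

# Rows with no tenant type recorded
counts["(missing)"] = total_rows - counts.sum()

# Share of each category as a percentage of all rows
shares = (counts / total_rows * 100).round(2)
print(shares)
```

The missing share computed this way should agree with the 8.80% reported for `tenant_type_en` in the missing-values summary.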

In [376]:
# Dropping tenant type columns from the dataset
filtered_rent_contracts = filtered_rent_contracts.drop(columns=['tenant_type_id', 'tenant_type_en'])

# Confirming changes by displaying the dataset shape
print("Rent Contracts Shape After Dropping Tenant Type Columns:", filtered_rent_contracts.shape)

Rent Contracts Shape After Dropping Tenant Type Columns: (1866502, 23)


Let's start with cleaning and validating `contract_amount` and `annual_amount` to remove zeroes and filter outliers. Ensuring realistic values in these columns is essential for accurate financial analysis and reliable investment insights.

In [378]:
# Displaying floats in fixed-point notation instead of scientific notation
pd.set_option('display.float_format', lambda x: '%.6f' % x)

In [380]:
# Statistical summary of contract_amount & annual_amount columns
print(filtered_rent_contracts[['contract_amount', 'annual_amount']].describe())

        contract_amount     annual_amount
count    1866502.000000    1866502.000000
mean       74774.936507      74511.775176
std      2455389.326710    2449258.616803
min            0.000000          0.000000
25%        36000.000000      37000.000000
50%        51975.000000      52000.000000
75%        80000.000000      80000.000000
max   3300037950.000000 3300037950.000000


These statistics show that both `contract_amount` and `annual_amount` contain zero values and extreme outliers, which would distort the analysis. The minimum is zero, while realistic values only appear from the 25th percentile (36,000 to 37,000) onward, so we first remove the zeroes and then examine the extreme high values. The gap between the mean (about 74,775) and the median (51,975) of `contract_amount`, together with a standard deviation in the millions, points to strong right skew driven by those high-value outliers.
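
The same checks can be bundled into a small diagnostic. This is a minimal sketch: the helper name `amount_diagnostics` and the toy values are assumptions for illustration, not part of the original workflow.

```python
import pandas as pd

def amount_diagnostics(series):
    """Summarize zero counts and the mean/median gap for a numeric column (hypothetical helper)."""
    mean = float(series.mean())
    median = float(series.median())
    return {
        "zeros": int((series == 0).sum()),
        "mean": mean,
        "median": median,
        # A ratio well above 1 hints at right skew from large outliers
        "mean_to_median": mean / median,
    }

# Toy values echoing the min, quartiles, and max reported above
toy_amounts = pd.Series([0, 36000, 51975, 80000, 3300037950])
print(amount_diagnostics(toy_amounts))
```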

In [382]:
# Filter out rows with zero values in contract_amount or annual_amount
filtered_rent_contracts = filtered_rent_contracts[
    (filtered_rent_contracts['contract_amount'] > 0) &
    (filtered_rent_contracts['annual_amount'] > 0)
]

# Check new statistics to confirm the removal
print(filtered_rent_contracts[['contract_amount', 'annual_amount']].describe())

        contract_amount     annual_amount
count    1866488.000000    1866488.000000
mean       74775.497372      74512.334068
std      2455398.526749    2449267.793889
min            1.000000          1.000000
25%        36000.000000      37000.000000
50%        51975.000000      52000.000000
75%        80000.000000      80000.000000
max   3300037950.000000 3300037950.000000


In [384]:
# Inspecting the implausibly large maximum contract amount
filtered_rent_contracts.loc[
    filtered_rent_contracts['contract_amount'].idxmax()
]

contract_id                                  CRT2079944156
contract_reg_type_id                                     2
contract_reg_type_en                                 Renew
contract_start_date                    2023-04-01 00:00:00
contract_end_date                      2024-03-31 00:00:00
contract_amount                                 3300037950
annual_amount                                   3300037950
is_free_hold                                      1.000000
ejari_bus_property_type_id                        2.000000
ejari_bus_property_type_en                            Unit
ejari_property_type_id                          842.000000
ejari_property_type_en                                Flat
ejari_property_sub_type_id                        1.000000
ejari_property_sub_type_en                    1 bed + hall
project_number                                  331.000000
project_name_en                   ELITE 3 SPORTS RESIDENCE
master_project_en                        Dubai Sports Ci

In [387]:
# Calculate the 1st and 99th percentiles for both contract and annual amounts
lower_contract, upper_contract = filtered_rent_contracts['contract_amount'].quantile([0.01, 0.99])
lower_annual, upper_annual = filtered_rent_contracts['annual_amount'].quantile([0.01, 0.99])

# Filter the dataset to retain only values within the 1st and 99th percentiles for both columns
filtered_rent_contracts = filtered_rent_contracts[
    (filtered_rent_contracts['contract_amount'] >= lower_contract) & 
    (filtered_rent_contracts['contract_amount'] <= upper_contract) &
    (filtered_rent_contracts['annual_amount'] >= lower_annual) &
    (filtered_rent_contracts['annual_amount'] <= upper_annual)
]

# Statistical summary of contract_amount & annual_amount columns
print(filtered_rent_contracts[['contract_amount', 'annual_amount']].describe())

       contract_amount  annual_amount
count   1813711.000000 1813711.000000
mean      66676.892952   67296.590671
std       46605.625814   46502.085698
min       13915.000000   17500.000000
25%       37000.000000   37500.000000
50%       52000.000000   52174.000000
75%       80000.000000   80000.000000
max      336000.000000  320000.000000
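
The percentile filter above can also be written as a reusable helper. The sketch below assumes the same logic: quantiles are computed per column on the unfiltered data, and a row is kept only if every column falls inside its range. The name `filter_within_quantiles` and the toy frame are illustrative.

```python
import pandas as pd

def filter_within_quantiles(df, columns, lower=0.01, upper=0.99):
    """Keep rows where every listed column lies within its own quantile range (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        lo, hi = df[col].quantile([lower, upper])
        mask &= df[col].between(lo, hi)  # between() is inclusive on both ends
    return df[mask].copy()

toy = pd.DataFrame({
    "contract_amount": [1, 36000, 52000, 80000, 3300037950],
    "annual_amount":   [1, 37000, 52000, 80000, 3300037950],
})
# Wider cut-offs than 1%/99% so the effect is visible on five rows
trimmed = filter_within_quantiles(toy, ["contract_amount", "annual_amount"], 0.2, 0.8)
print(trimmed)
```

Percentile trimming always removes a fixed share of rows, whether or not they are erroneous, so the cut-offs are a judgment call rather than a property of the data.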


In [389]:
# Restoring the default float display format
pd.reset_option('display.float_format')

In [390]:
# Displaying random observations of the dataset
display(filtered_rent_contracts.sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area,nearest_landmark_en,nearest_metro_en,nearest_mall_en
5186813,CNT2104330158,2,Renew,2024-01-11,2025-01-10,55787,55787,1.0,2.0,Unit,842.0,Flat,2.0,2 beds + hall,,,The Gardens,445.0,Jabal Ali First,94.0,Sports City Swimming Academy,Ibn Battuta Metro Station,Ibn-e-Battuta Mall
832266,CRT2101160956,1,New,2023-11-18,2024-11-17,75000,75000,1.0,2.0,Unit,842.0,Flat,1.0,1 bed + hall,40.0,SULAFA TOWER,Dubai Marina,330.0,Marsa Dubai,90.0,Burj Al Arab,Mina Seyahi,Marina Mall
4584567,CNT2078372178,2,Renew,2023-04-15,2024-04-14,230000,230000,0.0,4.0,Villa,841.0,Villa,4.0,4 beds + hall,,,,313.0,Al Saffa First,356.0,Downtown Dubai,Noor Bank Metro Station,Dubai Mall
3492543,CNT1787018575,1,New,2021-12-05,2022-12-04,85000,85000,1.0,2.0,Unit,842.0,Flat,2.0,2 beds + hall,227.0,ELITE RESIDENCE,Dubai Marina,330.0,Marsa Dubai,124.0,Burj Al Arab,Mina Seyahi,Marina Mall
5115578,CNT2101952576,2,Renew,2023-12-01,2024-11-30,54000,54000,0.0,2.0,Unit,842.0,Flat,2.0,2 beds + hall,,,,355.0,Al Nahda First,126.0,Dubai International Airport,STADIUM Metro Station,City Centre Mirdif


In [391]:
# Displaying missing values percentages in the dataset
filtered_rent_contracts.isnull().sum() / filtered_rent_contracts.shape[0] * 100

contract_id                    0.000000
contract_reg_type_id           0.000000
contract_reg_type_en           0.000000
contract_start_date            0.000000
contract_end_date              0.000000
contract_amount                0.000000
annual_amount                  0.000000
is_free_hold                   0.000000
ejari_bus_property_type_id     0.000000
ejari_bus_property_type_en     0.000000
ejari_property_type_id         0.000000
ejari_property_type_en         0.000000
ejari_property_sub_type_id     0.000000
ejari_property_sub_type_en     0.000000
project_number                72.140545
project_name_en               72.140545
master_project_en             49.138038
area_id                        0.000000
area_name_en                   0.000000
actual_area                    0.000000
nearest_landmark_en            8.562831
nearest_metro_en              15.223043
nearest_mall_en               15.813048
dtype: float64

While reviewing the transactions dataset, we found that the nearest-amenities data was inaccurate and did not contribute meaningful insights. To maintain data quality and keep only relevant features, we remove the corresponding columns (`nearest_landmark_en`, `nearest_metro_en`, and `nearest_mall_en`) from this dataset as well.

In [401]:
# Selecting columns to drop
amenities_columns = [col for col in filtered_rent_contracts.columns if 'nearest' in col]

# Drop the nearest_* amenity columns due to data inaccuracy
filtered_rent_contracts = filtered_rent_contracts.drop(columns=amenities_columns)

# Confirm column removal
print("Rent Contracts Shape After Dropping Amenities Columns:", filtered_rent_contracts.shape)

Rent Contracts Shape After Dropping Amenities Columns: (1813711, 20)


In [402]:
# Keeping only contracts with a non-null project_number, to match the transactions dataset
master_project_rent_contracts = filtered_rent_contracts[
    filtered_rent_contracts['project_number'].notnull()
].copy()

# Comparing shapes before and after filtering
print("Rent Contracts Before Filtering:", filtered_rent_contracts.shape)
print("Rent Contracts After Filtering:", master_project_rent_contracts.shape)

Rent Contracts Before Filtering: (1813711, 20)
Rent Contracts After Filtering: (505290, 20)
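
As a quick consistency check, the share of rows retained can be compared with the missing-value percentage reported earlier for `project_number` (about 72.14%). The sketch below uses the two row counts printed above.

```python
rows_before = 1813711  # rows in filtered_rent_contracts
rows_after = 505290    # rows with a non-null project_number

retained_pct = rows_after / rows_before * 100
dropped_pct = 100 - retained_pct
print(f"Retained {retained_pct:.2f}% of contracts, dropped {dropped_pct:.2f}%")
```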


In [403]:
# Displaying random observations of the dataset
display(master_project_rent_contracts.sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area
466103,CRT1794925226,1,New,2021-12-17,2022-12-16,155000,155000,1.0,2.0,Unit,842.0,Flat,3.0,3 beds + hall,1772.0,CREEK HORIZON,The Lagoons,447.0,Al Khairan First,177.0
831883,CRT2101127646,2,Renew,2023-11-05,2024-11-04,125000,125000,1.0,4.0,Villa,841.0,Villa,3.0,3 beds + hall,1824.0,REEM-MIRA COMMUNITY PH 3,,506.0,Al Yelayiss 1,202.0
989221,CRT2122808316,2,Renew,2024-10-06,2025-10-05,95000,95000,1.0,2.0,Unit,842.0,Flat,2.0,2 beds + hall,2210.0,MIDTOWN - NOOR,International Media Production Zone,485.0,Me'Aisem First,106.0
516289,CRT1868142696,1,New,2022-03-15,2023-03-14,140000,140000,1.0,4.0,Villa,841.0,Villa,4.0,4 beds + hall,1900.0,GOLF LINKS,Dubai World Central,462.0,Madinat Al Mataar,455.0
461998,CRT1789734176,1,New,2021-12-09,2022-12-08,90000,90000,1.0,2.0,Unit,842.0,Flat,1.0,1 bed + hall,1747.0,5242,Dubai Marina,330.0,Marsa Dubai,64.0


In [404]:
# Missing data in the master_project_rent_contracts dataset
master_project_rent_contracts.isnull().sum() / master_project_rent_contracts.shape[0] * 100

contract_id                   0.000000
contract_reg_type_id          0.000000
contract_reg_type_en          0.000000
contract_start_date           0.000000
contract_end_date             0.000000
contract_amount               0.000000
annual_amount                 0.000000
is_free_hold                  0.000000
ejari_bus_property_type_id    0.000000
ejari_bus_property_type_en    0.000000
ejari_property_type_id        0.000000
ejari_property_type_en        0.000000
ejari_property_sub_type_id    0.000000
ejari_property_sub_type_en    0.000000
project_number                0.000000
project_name_en               0.000000
master_project_en             9.400146
area_id                       0.000000
area_name_en                  0.000000
actual_area                   0.000000
dtype: float64

In [414]:
# Displaying observations where "master_project_en" is null
display(master_project_rent_contracts[
    master_project_rent_contracts['master_project_en'].isnull()
].sample(5))

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_sub_type_id,ejari_property_sub_type_en,project_number,project_name_en,master_project_en,area_id,area_name_en,actual_area
949931,CRT2117597516,2,Renew,2024-07-07,2025-07-06,175000,175000,1.0,2.0,Unit,842.0,Flat,1.0,1 bed + hall,2015.0,BEACH VISTA,,330.0,Marsa Dubai,69.0
499410,CRT1844752576,2,Renew,2022-02-14,2022-08-13,49000,98000,1.0,4.0,Villa,841.0,Villa,3.0,3 beds + hall,1488.0,REEM - MIRA OASIS COMMUNITY,,506.0,Al Yelayiss 1,266.0
5669864,CNT2116093947,1,New,2024-07-01,2025-07-19,220000,220000,1.0,2.0,Unit,842.0,Flat,3.0,3 beds + hall,1665.0,PARAMOUNT TOWER HOTEL & RESIDENCES,,526.0,Business Bay,153.0
3538394,CNT1804458573,2,Renew,2022-01-15,2023-01-14,165000,165000,1.0,2.0,Unit,842.0,Flat,3.0,3 beds + hall,1670.0,AL HABTOOR CITY,,526.0,Business Bay,182.0
5580656,CNT2113852580,1,New,2024-05-20,2025-05-19,200000,200000,1.0,2.0,Unit,842.0,Flat,3.0,3 beds + hall,2015.0,BEACH VISTA,,330.0,Marsa Dubai,179.0
