# Jumeirah Garden City

## Objective

The objective of this analysis is to gain insights into the **Dubai real estate market** by analyzing key datasets such as **transactions**, **rent contracts**, **valuations**, **projects**, **units**, and **buildings**. The goal is to create a comprehensive model that predicts future property prices and rental yields, offering valuable insights for potential buyers, investors, and stakeholders.

## What I Will Provide

1. **Data Exploration and Cleaning**: I will thoroughly explore the datasets, clean the data by handling missing values, and remove irrelevant columns.
   
2. **Data Integration**: After filtering and cleaning the **transactions dataset**, I will integrate additional datasets like **rent contracts**, **valuations**, **projects**, **units**, and **buildings** to enrich the analysis.

3. **Feature Engineering**: I will create and engineer new features that contribute to improving the forecasting model, focusing on property type, location, project details, and other relevant variables.

4. **Model Development**: I will develop a model to forecast future property prices, leveraging the clean and enriched data from all relevant datasets.

5. **Actionable Insights**: The end result will provide actionable insights for stakeholders, offering forecasts of property prices and rental yields, and allowing better decision-making for real estate investments in Dubai.

### Step 1: Begin with the Transactions Dataset

In this first step, I will focus on the **transactions dataset**. The goal is to filter and clean the data to retain only relevant information that will contribute to our objective. Once the transactions dataset is prepared, I will gradually integrate other datasets to enrich the analysis.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

In [2]:
# Loading transactions dataset
transactions = pd.read_csv('../data/raw/transactions.csv')

In [5]:
# Enable displaying of all columns and rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [6]:
# Information about transactions dataset
transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1314126 entries, 0 to 1314125
Data columns (total 46 columns):
 #   Column                Non-Null Count    Dtype  
---  ------                --------------    -----  
 0   transaction_id        1314126 non-null  object 
 1   procedure_id          1314126 non-null  int64  
 2   trans_group_id        1314126 non-null  int64  
 3   trans_group_ar        1314126 non-null  object 
 4   trans_group_en        1314126 non-null  object 
 5   procedure_name_ar     1314126 non-null  object 
 6   procedure_name_en     1314126 non-null  object 
 7   instance_date         1314126 non-null  object 
 8   property_type_id      1314126 non-null  int64  
 9   property_type_ar      1314126 non-null  object 
 10  property_type_en      1314126 non-null  object 
 11  property_sub_type_id  1029207 non-null  float64
 12  property_sub_type_ar  1029207 non-null  object 
 13  property_sub_type_en  1029207 non-null  object 
 14  property_usage_ar     1314126 non-

In [7]:
# Displaying first 5 rows of transactions dataset
transactions.head()

Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_ar,trans_group_en,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,property_usage_ar,property_usage_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
0,1-11-2018-8205,11,1,مبايعات,Sales,بيع,Sell,13-08-2018,4,فيلا,Villa,,,,أخرى,Other,1,العقارات القائمة,Existing Properties,278,منخول,Mankhool,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو بنك أبوظبي التجاري,ADCB Metro Station,مول دبي,Dubai Mall,,,0,34.41,165000.0,4795.12,,,1.0,2.0,0.0
1,1-11-2016-12930,11,1,مبايعات,Sales,بيع,Sell,02-11-2016,4,فيلا,Villa,,,,سكني,Residential,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو أبراج الإمارات,Emirates Towers Metro Station,مول دبي,Dubai Mall,,,0,390.0,2089900.0,5358.72,,,1.0,1.0,0.0
2,1-11-2016-13524,11,1,مبايعات,Sales,بيع,Sell,15-11-2016,4,فيلا,Villa,,,,أخرى,Other,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو أبراج الإمارات,Emirates Towers Metro Station,مول دبي,Dubai Mall,,,0,278.71,2800000.0,10046.28,,,1.0,1.0,0.0
3,2-13-2014-4939,13,2,رهون,Mortgages,تسجيل رهن,Mortgage Registration,23-06-2014,4,فيلا,Villa,,,,تجاري,Commercial,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,,,0,16952.94,12000000.0,707.84,,,1.0,1.0,0.0
4,1-11-2002-81,11,1,مبايعات,Sales,بيع,Sell,14-01-2002,2,مبنى,Building,,,,تجاري,Commercial,1,العقارات القائمة,Existing Properties,271,الكرامه,Al Karama,,,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو بنك أبوظبي التجاري,ADCB Metro Station,مول دبي,Dubai Mall,,,0,232.26,1500000.0,6458.28,,,2.0,1.0,0.0


In [9]:
# Reordering columns in a more logical manner and dropping non-essential columns
transactions_df = transactions[['transaction_id', 'instance_date', 'trans_group_id', 'trans_group_en', 'trans_group_ar',
                                'procedure_id', 'procedure_name_en', 'procedure_name_ar', 'property_type_id', 'property_type_en',
                                'property_type_ar', 'property_sub_type_id', 'property_sub_type_en', 'property_usage_en', 
                                'property_usage_ar', 'reg_type_id', 'reg_type_en', 'reg_type_ar', 'area_id', 'area_name_en',
                                'area_name_ar', 'master_project_en', 'master_project_ar', 'project_number', 'project_name_en',
                                'project_name_ar', 'building_name_en', 'building_name_ar', 'rooms_en', 'rooms_ar', 'has_parking', 
                                'procedure_area', 'meter_sale_price', 'actual_worth']]

# Displaying new ordered dataset
transactions_df.head()

Unnamed: 0,transaction_id,instance_date,trans_group_id,trans_group_en,trans_group_ar,procedure_id,procedure_name_en,procedure_name_ar,property_type_id,property_type_en,property_type_ar,property_sub_type_id,property_sub_type_en,property_usage_en,property_usage_ar,reg_type_id,reg_type_en,reg_type_ar,area_id,area_name_en,area_name_ar,master_project_en,master_project_ar,project_number,project_name_en,project_name_ar,building_name_en,building_name_ar,rooms_en,rooms_ar,has_parking,procedure_area,meter_sale_price,actual_worth
0,1-11-2018-8205,13-08-2018,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Other,أخرى,1,Existing Properties,العقارات القائمة,278,Mankhool,منخول,,,,,,,,,,0,34.41,4795.12,165000.0
1,1-11-2016-12930,02-11-2016,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,276,Al Bada,البدع,,,,,,,,,,0,390.0,5358.72,2089900.0
2,1-11-2016-13524,15-11-2016,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Other,أخرى,1,Existing Properties,العقارات القائمة,276,Al Bada,البدع,,,,,,,,,,0,278.71,10046.28,2800000.0
3,2-13-2014-4939,23-06-2014,2,Mortgages,رهون,13,Mortgage Registration,تسجيل رهن,4,Villa,فيلا,,,Commercial,تجاري,1,Existing Properties,العقارات القائمة,276,Al Bada,البدع,,,,,,,,,,0,16952.94,707.84,12000000.0
4,1-11-2002-81,14-01-2002,1,Sales,مبايعات,11,Sell,بيع,2,Building,مبنى,,,Commercial,تجاري,1,Existing Properties,العقارات القائمة,271,Al Karama,الكرامه,,,,,,,,,,0,232.26,6458.28,1500000.0


In [11]:
# Looking at all the unique values count in master_project_en column
transactions_df['master_project_en'].value_counts()

master_project_en
Business Bay                                     85521
Dubai Marina                                     81884
Jumeirah Village Circle                          75720
Jumeirah Lakes Towers                            63838
Burj Khalifa                                     58352
International City Phase 1                       53040
Palm Jumeirah                                    39320
Dubai Sports City                                33783
Silicon Oasis                                    27696
Al Furjan                                        25544
The Greens                                       22530
The Lagoons                                      21414
Jumeriah Beach Residence  - JBR                  20550
Arjan                                            19451
TOWN SQUARE                                      19247
Residential Complex                              18913
DAMAC HILLS 2                                    18167
International Media Production Zone            

In [12]:
# Looking at all the unique values count in area_name_en column
transactions_df['area_name_en'].value_counts()

area_name_en
Marsa Dubai                          121524
Business Bay                          91328
Al Thanyah Fifth                      86427
Al Barsha South Fourth                76481
Burj Khalifa                          63830
Al Warsan First                       54013
Jabal Ali First                       46704
Palm Jumeirah                         39728
Wadi Al Safa 5                        39505
Al Hebiah Fourth                      38805
Al Merkadh                            33049
Hadaeq Sheikh Mohammed Bin Rashid     32800
Al Thanyah Third                      31527
Al Thanayah Fourth                    30038
Nadd Hessa                            27593
Me'Aisem First                        23963
Al Hebiah Fifth                       22908
Al Khairan First                      21410
Madinat Al Mataar                     20305
Al Barshaa South Third                19748
Wadi Al Safa 6                        19580
Al Yelayiss 2                         19259
Al Hebiah Third    

In [13]:
# Filtering the dataset by "Al Satwa" area
transactions_satwa = transactions_df[transactions_df['area_name_en'] == 'Al Satwa']

# Checking the information about the filtered dataset
transactions_satwa.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2133 entries, 21 to 1312825
Data columns (total 34 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   transaction_id        2133 non-null   object 
 1   instance_date         2133 non-null   object 
 2   trans_group_id        2133 non-null   int64  
 3   trans_group_en        2133 non-null   object 
 4   trans_group_ar        2133 non-null   object 
 5   procedure_id          2133 non-null   int64  
 6   procedure_name_en     2133 non-null   object 
 7   procedure_name_ar     2133 non-null   object 
 8   property_type_id      2133 non-null   int64  
 9   property_type_en      2133 non-null   object 
 10  property_type_ar      2133 non-null   object 
 11  property_sub_type_id  111 non-null    float64
 12  property_sub_type_en  111 non-null    object 
 13  property_usage_en     2133 non-null   object 
 14  property_usage_ar     2133 non-null   object 
 15  reg_type_id           

In [16]:
# Checking the missing percentages in the filtered dataset
transactions_satwa.isnull().sum() / transactions_satwa.shape[0] * 100

transaction_id           0.000000
instance_date            0.000000
trans_group_id           0.000000
trans_group_en           0.000000
trans_group_ar           0.000000
procedure_id             0.000000
procedure_name_en        0.000000
procedure_name_ar        0.000000
property_type_id         0.000000
property_type_en         0.000000
property_type_ar         0.000000
property_sub_type_id    94.796062
property_sub_type_en    94.796062
property_usage_en        0.000000
property_usage_ar        0.000000
reg_type_id              0.000000
reg_type_en              0.000000
reg_type_ar              0.000000
area_id                  0.000000
area_name_en             0.000000
area_name_ar             0.000000
master_project_en       75.246132
master_project_ar       75.246132
project_number          94.374121
project_name_en         94.374121
project_name_ar         94.374121
building_name_en        94.796062
building_name_ar        94.796062
rooms_en                94.796062
rooms_ar      

In [29]:
# Checking the first 5 rows of the filtered dataset
display(transactions_satwa.head())

Unnamed: 0,transaction_id,instance_date,trans_group_id,trans_group_en,trans_group_ar,procedure_id,procedure_name_en,procedure_name_ar,property_type_id,property_type_en,property_type_ar,property_sub_type_id,property_sub_type_en,property_usage_en,property_usage_ar,reg_type_id,reg_type_en,reg_type_ar,area_id,area_name_en,area_name_ar,master_project_en,master_project_ar,project_number,project_name_en,project_name_ar,building_name_en,building_name_ar,rooms_en,rooms_ar,has_parking,procedure_area,meter_sale_price,actual_worth
21,1-11-2015-5530,26-03-2015,1,Sales,مبايعات,11,Sell,بيع,2,Building,مبنى,,,Commercial,تجاري,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,40.65,49512.99,2012703.0
22,1-11-2006-1314,14-08-2006,1,Sales,مبايعات,11,Sell,بيع,1,Land,أرض,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,221.02,7012.94,1550000.0
23,1-11-2003-1074,21-07-2003,1,Sales,مبايعات,11,Sell,بيع,2,Building,مبنى,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,231.7,3366.42,780000.0
24,3-9-2006-300121,18-09-2006,3,Gifts,هبات,9,Grant,هبه,1,Land,أرض,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,19.35,6459.53,124992.0
25,1-11-2021-10599,23-06-2021,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,239.23,12540.23,3000000.0


In [15]:
# Checking the different types of transactions that occur in Al Satwa
transactions_satwa["trans_group_en"].value_counts()

trans_group_en
Sales        1580
Mortgages     402
Gifts         151
Name: count, dtype: int64

In [21]:
# Checking the different types of transaction procedures that occur in Al Satwa
transactions_satwa["procedure_name_en"].value_counts(dropna=False)

procedure_name_en
Sell                          1203
Mortgage Registration          327
Delayed Sell                   255
Grant                          150
Sell - Pre registration        106
Modify Mortgage                 61
Adding Land By Sell             12
Lease Finance Registration       9
Lease to Own Registration        6
Development Mortgage             1
Lease Finance Modification       1
Grant on Delayed Sell            1
Development Registration         1
Name: count, dtype: int64

In [22]:
# Checking the different property types that appear in Al Satwa transactions
transactions_satwa["property_type_en"].value_counts(dropna=False)

property_type_en
Land        1144
Villa        599
Building     279
Unit         111
Name: count, dtype: int64

In [23]:
# Checking the different types of property usage in Al Satwa
transactions_satwa["property_usage_en"].value_counts(dropna=False)

property_usage_en
Residential                 1637
Other                        271
Commercial                   182
Multi-Use                     28
Industrial                     8
Hospitality                    4
Residential / Commercial       3
Name: count, dtype: int64

In [24]:
# Checking the different types of property registration in Al Satwa
transactions_satwa["reg_type_en"].value_counts(dropna=False)

reg_type_en
Existing Properties    2027
Off-Plan Properties     106
Name: count, dtype: int64

In [25]:
# Checking the different master_project_en in Al Satwa
transactions_satwa["master_project_en"].value_counts(dropna=False)

master_project_en
NaN                     1605
Jumeriah Garden City     528
Name: count, dtype: int64

In [26]:
# Checking the different project_name_en in Al Satwa
transactions_satwa["project_name_en"].value_counts(dropna=False)

project_name_en
NaN                               2013
Hyde Walk Residence by Imtiaz       95
Jardin Astral                        9
The Grandala                         8
MAYFAIR GARDENS BY MAJID             3
ALBA TOWER                           2
161 Jumeirah Lane                    1
Trillium Heights                     1
171 Garden Heights                   1
Name: count, dtype: int64

In [27]:
# Checking the different building_name_en in Al Satwa
transactions_satwa["building_name_en"].value_counts(dropna=False)

building_name_en
NaN                              2022
HYDE WALK RESIDENCE BY IMTIAZ      93
Jardin Astral                       7
The Grandala                        6
Diamond Building                    5
Name: count, dtype: int64

In [30]:
# Checking the different rooms_en in Al Satwa
transactions_satwa["rooms_en"].value_counts(dropna=False)

rooms_en
NaN       2022
Studio      57
1 B/R       50
2 B/R        4
Name: count, dtype: int64

In [31]:
# Statistical summary of procedure_area column
transactions_satwa['procedure_area'].describe()

count     2133.000000
mean       773.377829
std       2243.637882
min          3.450000
25%        232.000000
50%        232.260000
75%       1068.380000
max      45545.970000
Name: procedure_area, dtype: float64

In [32]:
# Checking the statistical summary of meter_sale_price column
transactions_satwa['meter_sale_price'].describe()

count      2133.000000
mean      10314.636882
std       10816.164545
min         181.820000
25%        3014.120000
50%        7837.770000
75%       13992.880000
max      180831.830000
Name: meter_sale_price, dtype: float64

In [33]:
# Checking the statistical summary of actual_worth column
transactions_satwa['actual_worth'].describe()

count    2.133000e+03
mean     7.332873e+06
std      2.128794e+07
min      9.000000e+03
25%      8.000000e+05
50%      1.900000e+06
75%      6.060000e+06
max      4.200000e+08
Name: actual_worth, dtype: float64

**Key Insights and Observations**

1.	**Transaction Types**:

    - The majority of the transactions are **Sales (1,580)**, with a smaller proportion for **Mortgages (402)** and **Gifts (151)**. This aligns with the dataset’s focus on property transactions, so focusing on **Sales** for further modeling could be valuable since it dominates the data.

2.	**Procedure Name**:

    - The most frequent procedure is **Sell (1,203)**, followed by **Mortgage Registration (327)**. Procedures like **Delayed Sell (255)** also appear relatively often. Procedures such as **Development Mortgage** and **Grant on Delayed Sell** are rare, suggesting they can be dropped due to their low occurrence, which may not add value to the model.

3.	**Property Types**:

    - **Land** dominates **(1,144)**, followed by **Villa (599)** and **Building (279)**. **Unit** properties are relatively few **(111)**. 

4.	**Property Usage**:

    - **Residential** is the dominant usage **(1,637)**, with a few **Commercial (182)** and Other categories. Given the focus on residential properties, filtering down to this usage and dropping the others is a reasonable direction.

5.	**Registration Type**:

    - Most properties are **Existing Properties (2,027)** with a small proportion of **Off-Plan Properties (106)**. 

6.	**Master Projects**:

    - **Jumeirah Garden City** is the only named master project **(528 entries)**. The remaining entries have missing values **(1,605)**. This could indicate many of the properties are not tied to larger development projects, which may be a notable feature to investigate for further analysis or imputation.

7.	**Project and Building Names**:

    - Many entries have **missing Project (2,013)** and **Building (2,022)** names. Only a few specific projects like **Hyde Walk Residence by Imtiaz** and **Jardin Astral** are mentioned. Depending on their relevance, you may want to exclude missing project/building names or impute them where necessary.

8.	**Rooms**:

    - A large portion of entries have missing room information (**2,022 missing values**), while a small fraction specifies **Studio (57)**, **1 B/R (50)**, or **2 B/R (4)**. The high proportion of missing values here may need special attention, and filtering non-residential properties (like “Other” or “Commercial”) could reduce this issue.

9.	**Procedure Area**, **Sale Price**, and **Actual Worth**:

    - **Procedure Area** ranges significantly, with an average of **773 sq.m** and a maximum of about **45,546 sq.m**.

    - The **meter sale price** also has a wide range, with an average of about **10,315 AED** per sq.m and values from roughly **182 AED** to **180,832 AED**, indicating the diversity of property values.

    - Actual worth ranges from **9,000 AED** to as much as **420 million AED**, reflecting a highly varied dataset in terms of property value.
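
The filtering suggested in the observations above could be sketched as follows. This is a minimal illustration on a small made-up frame rather than the real `transactions_satwa` data. The helper name `filter_residential_sales` and the `min_procedure_count` threshold are assumptions introduced here, not part of the original notebook.

```python
import pandas as pd


def filter_residential_sales(df: pd.DataFrame, min_procedure_count: int = 2) -> pd.DataFrame:
    """Keep residential sales and drop procedures that occur too rarely.

    `min_procedure_count` is a hypothetical threshold chosen for illustration.
    """
    # Keep only the Sales transaction group with Residential usage
    mask = (df["trans_group_en"] == "Sales") & (df["property_usage_en"] == "Residential")
    subset = df.loc[mask].copy()

    # Count how often each procedure appears within the subset
    procedure_counts = subset["procedure_name_en"].value_counts()
    frequent = procedure_counts[procedure_counts >= min_procedure_count].index

    # Retain only rows whose procedure meets the threshold
    return subset[subset["procedure_name_en"].isin(frequent)].reset_index(drop=True)


# Toy data standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "trans_group_en": ["Sales", "Sales", "Sales", "Mortgages", "Sales", "Gifts"],
    "property_usage_en": ["Residential", "Residential", "Commercial",
                          "Residential", "Residential", "Residential"],
    "procedure_name_en": ["Sell", "Sell", "Sell",
                          "Mortgage Registration", "Adding Land By Sell", "Grant"],
})

filtered = filter_residential_sales(toy, min_procedure_count=2)
print(filtered)
```

On the real data, a suitable threshold would depend on how many rows remain after the Sales and Residential filters, so it is worth inspecting the procedure counts before dropping anything.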

### Property Type & Usage

In [57]:
import os 

# Create output folder for saving plots
output_dir = "plots"  # no trailing slash: file paths below are built as f"{output_dir}/..."
os.makedirs(output_dir, exist_ok=True)

# Group by property type
property_type_counts = transactions_satwa['property_type_en'].value_counts()
property_type_fig = px.bar(property_type_counts, title="Property Type Distribution",
                           # For a Series input, Plotly Express names the y-axis column 'value'
                           labels={'property_type_en': 'Property Type', 'value': 'Count'},
                           text=property_type_counts.values)
# Hide the legend
property_type_fig.update_layout(showlegend=False)

property_type_fig.write_image(f"{output_dir}/property_type_distribution.png")
property_type_fig.show()

# Group by property usage
property_usage_counts = transactions_satwa['property_usage_en'].value_counts()
property_usage_fig = px.bar(property_usage_counts, title="Property Usage Distribution",
                            # For a Series input, Plotly Express names the y-axis column 'value'
                            labels={'property_usage_en': 'Property Usage', 'value': 'Count'},
                            text=property_usage_counts.values)

# Hide the legend
property_usage_fig.update_layout(showlegend=False)

property_usage_fig.write_image(f"{output_dir}/property_usage_distribution.png")
property_usage_fig.show()

print("Property Type Distribution:")
print(property_type_counts)

print("Property Usage Distribution:")
print(property_usage_counts)

Property Type Distribution:
property_type_en
Land        1144
Villa        599
Building     279
Unit         111
Name: count, dtype: int64
Property Usage Distribution:
property_usage_en
Residential                 1637
Other                        271
Commercial                   182
Multi-Use                     28
Industrial                     8
Hospitality                    4
Residential / Commercial       3
Name: count, dtype: int64


**Property Type Distribution**:

- **Land (1144)**: Most records in this dataset involve land plots. Each row is a transaction rather than a distinct property, so this reflects transaction activity. It might indicate a high potential for new developments or investment opportunities in this area. Investors or developers looking to build may find this significant.

- **Villa (599)**: Villas represent a substantial portion of the properties, indicating that the area is residential in nature with a focus on high-value properties. Villas typically appeal to families or luxury buyers, so this could be a key selling point.

- **Building (279)**: There are a moderate number of buildings, possibly including both commercial and residential. This adds diversity to the property market and could attract investors looking for multi-unit buildings.

- **Unit (111)**: Apartments or smaller residential units make up a smaller segment, indicating that while there is availability for smaller units, the focus is more on land and villas.

**Property Usage Distribution**:

- **Residential (1637)**: The vast majority of properties are residential, which shows that this area is heavily geared towards housing rather than commercial or industrial use. This is valuable for agents targeting families, expatriates, or long-term residents.

- **Other (271) and Commercial (182)**: While significantly smaller, these categories suggest some diversity in property use, potentially offering opportunities for mixed-use developments or retail spaces.

- **Multi-Use (28)**: There is some potential for mixed-use projects, which could attract buyers looking for both commercial and residential opportunities in one place.

- **Industrial (8) and Hospitality (4)**: These are very minimal, suggesting that the area is not primarily industrial or focused on tourism, but it’s good to know there are some commercial opportunities.

**Insights**:

- The dominance of residential properties and land suggests this area is ripe for development, particularly in the residential and villa market. The significant portion of land and villas shows potential for luxury residential projects or investment in new developments.

- The relatively smaller presence of units and buildings indicates that there is room for growth in higher-density housing if developers are looking for areas with untapped potential.
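
Raw counts are easier to compare as shares. The sketch below takes the property type counts printed above, typed in by hand, and computes each category's percentage of the 2,133 records. The variable names are illustrative.

```python
import pandas as pd

# Counts copied from the value_counts() output above
property_type_counts = pd.Series(
    {"Land": 1144, "Villa": 599, "Building": 279, "Unit": 111},
    name="count",
)

# Share of each property type as a percentage of all records
property_type_share = (property_type_counts / property_type_counts.sum() * 100).round(1)
print(property_type_share)

# Equivalent one-liner on the real data (assumes `transactions_satwa` is loaded):
# transactions_satwa["property_type_en"].value_counts(normalize=True).mul(100).round(1)
```

On these numbers, Land accounts for roughly 54% of records and Villa for about 28%, while Unit makes up about 5%.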


### Property Size (Procedure Area)

In [58]:
# Summary statistics for property size
property_size_stats = transactions_satwa['procedure_area'].describe()
print("\nProperty Size (Procedure Area) Stats:")
print(property_size_stats)

# Histogram for property size distribution
property_size_fig = px.histogram(transactions_satwa, x='procedure_area', nbins=100, 
                                 title="Distribution of Property Sizes",
                                 labels={'procedure_area': 'Property Size (sqm)'})
property_size_fig.write_image(f"{output_dir}/property_size_distribution.png")
property_size_fig.show()


Property Size (Procedure Area) Stats:
count     2133.000000
mean       773.377829
std       2243.637882
min          3.450000
25%        232.000000
50%        232.260000
75%       1068.380000
max      45545.970000
Name: procedure_area, dtype: float64


**Property Size (Procedure Area)**:

- **Count**: 2,133 entries were analyzed, indicating a moderate sample size for property sizes in this area.

- **Mean (Average)**: The average property size is **773.38 square meters**, but with a high standard deviation, this suggests a wide variety of property sizes.

- **Standard Deviation (Std)**: At **2,243.64**, the high standard deviation shows that property sizes vary greatly. This could mean the area has both small and large properties, potentially catering to a broad market of buyers.

- **Minimum**: The smallest property size is **3.45 square meters**, which might be a very small unit or an anomaly. It could represent a storage unit or small shop space.

- **25th Percentile**: A quarter of the properties are at or below **232 square meters**, suggesting there are a good number of smaller plots or units.

- **Median (50th Percentile)**: The median property size is **232.26 square meters**, so half of the properties are this size or smaller. The 25th percentile and the median nearly coincide, which means a large share of records cluster at one size. Since 232.26 square meters is almost exactly 2,500 square feet, this may be a standard plot size rather than a spread of mid-sized homes.

- **75th Percentile**: The upper quarter of properties have sizes above **1,068 square meters**, indicating there are large villas or estates in the area.

- **Maximum**: The largest property size is a massive **45,545.97 square meters**, indicating some very large land plots or potentially commercial projects.

**Insights**:

- There is a significant variety in property sizes, with most falling into mid-sized residential plots or units (around the median of 232 square meters). However, the presence of large properties suggests opportunities for luxury developments or large estate sales.

- The mix of small, medium, and very large properties provides the agent with flexibility in targeting different buyer segments, from individuals looking for smaller units to high-end buyers or investors seeking larger plots for development.

This suggests that agents could market the diversity of property sizes, highlighting both the more accessible mid-sized properties and the luxury potential of larger estates or land plots.
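
Because the mean (773 sqm) sits far above the median (232 sqm), a few very large records dominate the summary. One common way to flag them is the interquartile range (IQR) rule, sketched below on made-up sizes. The helper `iqr_outlier_mask` and the multiplier `k=1.5` are assumptions for illustration, not something the original analysis applies.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up sizes in square meters: mostly standard plots plus one very large one
sizes = pd.Series([232.0, 232.26, 232.26, 240.0, 1068.38, 45545.97], name="procedure_area")

outliers = iqr_outlier_mask(sizes)
print(sizes[outliers])

# The median is robust to the large value, while the mean is pulled upward
print("median:", sizes.median(), "mean:", round(sizes.mean(), 2))
```

Flagged rows are candidates for inspection rather than automatic removal, since large land plots may be legitimate records.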

### Property Worth

In [44]:
# Scatter plot for actual worth vs size
property_worth_fig = px.scatter(transactions_satwa, x='procedure_area', y='actual_worth', 
                                title="Property Worth vs Size", 
                                labels={'procedure_area': 'Property Size (sqm)',
                                        'actual_worth': 'Actual Worth (AED)'}, 
                                opacity=0.6)
property_worth_fig.write_image(f"{output_dir}/property_worth_vs_size.png")
property_worth_fig.show()

# Checking the correlation between actual worth and procedure area
print("\nCorrelation between Actual Worth and Procedure Area:")
print(transactions_satwa['actual_worth'].corr(transactions_satwa['procedure_area']))


Correlation between Actual Worth and Procedure Area:
0.6857260289403518


**Property Worth**

1. **Moderate Positive Correlation**:

	- The correlation score of **0.69** indicates a moderately strong positive relationship between property size and actual worth. This suggests that, generally, larger properties tend to have higher worth. However, the correlation is not perfect, meaning other factors could be influencing the price beyond size alone.

2. **Outliers Affecting the Relationship**:

	- Several outliers, especially in the higher property worth range (100M AED and beyond), show that there are properties with high worth at moderate or even smaller sizes. These outliers may distort the overall correlation and should be further explored to determine why these properties deviate from the norm—likely due to premium locations, luxury features, or unique attributes.

3.	**Most Properties Cluster Below 10,000 sqm**:

	- The bulk of the data points are clustered below 10,000 square meters, with the property worth generally under 100 million AED. This reinforces the idea that most of the dataset consists of average-sized properties, possibly residential, which tend to have moderate values.

4. **Large, High-Value Properties**:

	- The few properties that exceed 20,000 sqm are outliers, and they significantly drive up the property worth to over 400 million AED. These are likely commercial, industrial, or highly exclusive residential properties, potentially land plots or estates.
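
Since a handful of extreme points can dominate a Pearson correlation, it can help to compare it with measures that are less sensitive to outliers. The sketch below, on made-up data, computes the Pearson correlation alongside a rank-based (Spearman) correlation and a Pearson correlation on log-transformed values. The function name `size_worth_correlations` is hypothetical, and the log variant assumes all sizes and worths are strictly positive.

```python
import numpy as np
import pandas as pd


def size_worth_correlations(df: pd.DataFrame) -> dict:
    """Compare Pearson, rank-based (Spearman) and log-log Pearson correlations."""
    area = df["procedure_area"]
    worth = df["actual_worth"]
    return {
        "pearson": area.corr(worth),
        # Spearman correlation equals Pearson correlation computed on ranks
        "spearman": area.rank().corr(worth.rank()),
        # Log transform compresses the long right tail of both variables
        "log_pearson": np.log(area).corr(np.log(worth)),
    }


# Made-up example: worth rises with size, plus one extreme plot
toy = pd.DataFrame({
    "procedure_area": [50.0, 120.0, 232.0, 400.0, 1000.0, 40000.0],
    "actual_worth": [6e5, 9e5, 2e6, 3e6, 7e6, 4e8],
})

for name, value in size_worth_correlations(toy).items():
    print(f"{name}: {value:.3f}")
```

If the three values differ substantially on the real data, that would suggest the 0.69 figure is driven by a small number of large, high-value records.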

### Market Demand (Sales Trends)

In [54]:
monthly_sales

instance_date
1978-04-01     1
1987-10-01     1
1990-01-01     2
1991-10-01     1
1992-03-01     1
1992-10-01     1
1993-02-01     1
1993-05-01     1
1993-10-01     1
1993-12-01     1
1994-09-01     1
1995-05-01     1
1995-06-01     1
1995-07-01     1
1995-09-01     1
1995-10-01     1
1995-11-01     1
1996-01-01     1
1997-11-01     1
1997-12-01     1
1998-01-01     1
1998-02-01     3
1998-03-01    11
1998-04-01     4
1998-05-01     7
1998-06-01     9
1998-07-01     5
1998-08-01     6
1998-09-01     1
1998-10-01     3
1998-11-01     7
1998-12-01     4
1999-01-01     1
1999-02-01     3
1999-03-01     7
1999-04-01     3
1999-05-01     6
1999-06-01    10
1999-07-01     6
1999-08-01    18
1999-09-01     4
1999-10-01     5
1999-11-01     5
1999-12-01     3
2000-01-01     4
2000-02-01     9
2000-03-01     4
2000-04-01     8
2000-05-01    11
2000-06-01     4
2000-07-01    10
2000-09-01     3
2000-10-01     3
2000-11-01     5
2000-12-01     5
2001-01-01     8
2001-02-01     8
2001-03-01     1


In [55]:
# Convert instance_date to datetime
transactions_satwa['instance_date'] = pd.to_datetime(transactions_satwa['instance_date'], format="%d-%m-%Y")

# Group by year and month to count sales transactions
monthly_sales = transactions_satwa.groupby(transactions_satwa['instance_date'].dt.to_period("M")).size()

# Convert the PeriodIndex to datetime (start of each month) so it can be plotted
monthly_sales.index = monthly_sales.index.to_timestamp()
# Alternative (use instead of the line above): convert the PeriodIndex to strings
# monthly_sales.index = monthly_sales.index.astype(str)

# Recreate the plot after converting the index
sales_trend_fig = px.line(monthly_sales, title="Sales Trend Over Time",
                          labels={'value': 'Number of Sales', 'instance_date': 'Date'})

# Hide the legend
sales_trend_fig.update_layout(showlegend=False)

sales_trend_fig.write_image(f"{output_dir}/sales_trend_over_time.png")
sales_trend_fig.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
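
The message above is pandas' `SettingWithCopyWarning`, which typically appears when a column is assigned on a DataFrame that was itself produced by filtering another one. A minimal sketch of one common way to avoid it, assuming `transactions_satwa` was built by filtering the full `transactions` table on `area_name_en` (that step is not shown in this section, and the toy rows below are made up):

```python
import pandas as pd

# Toy stand-in for the full transactions table (illustrative values only)
transactions = pd.DataFrame({
    "area_name_en": ["Al Satwa", "Al Satwa", "Other Area"],
    "instance_date": ["08-10-2003", "21-07-2016", "15-04-2013"],
})

# Take an explicit copy of the filtered rows so that later assignments
# modify an independent DataFrame rather than a slice of the original
transactions_satwa = transactions[transactions["area_name_en"] == "Al Satwa"].copy()

# Parsing the dates on the copy does not trigger the warning
transactions_satwa["instance_date"] = pd.to_datetime(
    transactions_satwa["instance_date"], format="%d-%m-%Y"
)

print(transactions_satwa.dtypes)
```

The original `transactions` frame is left untouched, which also makes it safe to re-run the cell.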



In [60]:
monthly_sales.sort_values(ascending=False)

instance_date
2024-02-01    74
2017-01-01    63
2017-03-01    57
2024-03-01    32
2024-09-01    23
2005-02-01    22
2004-04-01    20
2017-04-01    20
2017-02-01    20
1999-08-01    18
2024-01-01    18
2024-07-01    18
2017-05-01    18
2024-04-01    17
2007-03-01    17
2006-02-01    17
2004-06-01    16
2024-06-01    16
2004-03-01    16
2017-11-01    16
2003-07-01    16
2004-08-01    15
2003-12-01    15
2004-07-01    14
2024-08-01    14
2005-04-01    14
2005-06-01    14
2005-09-01    14
2005-10-01    14
2003-04-01    14
2023-03-01    14
2006-07-01    13
2024-05-01    13
2005-03-01    13
2005-01-01    13
2006-12-01    13
2004-05-01    13
2007-01-01    13
2017-10-01    13
2007-04-01    13
2003-05-01    12
2022-03-01    12
2005-08-01    12
2006-04-01    12
2006-11-01    12
2023-02-01    12
2004-09-01    11
2004-12-01    11
2005-05-01    11
2003-09-01    11
2001-07-01    11
2018-01-01    11
2005-12-01    11
2003-10-01    11
2000-05-01    11
1998-03-01    11
2023-12-01    11
2023-09-01    11


**Market Demand (Sales Trend)**:

1. **High Sales Peaks (Recent Years)**:

	- The highest sales activity occurred in **February 2024** with 74 sales, followed by **January 2017** (63) and **March 2017** (57); **March 2024** ranks fourth with 32. These peaks suggest high demand or favorable market conditions during these months.

	- The 2024 spike might indicate a booming real estate market, possibly driven by specific projects or policies that affected sales during this period.

2. **2017 Spike in Sales**:

	- The strong activity in **2017**, especially between **January** and **April**, is quite noticeable, as it ranks among the top months with the highest number of sales. This could have been driven by external factors such as significant real estate developments, promotional campaigns, or economic changes at that time.

3. **Steady Sales Growth (2000-2020)**:

	- From the year 2000 onward, there is a steady increase in sales, with occasional dips. This gradual growth indicates a healthy market expansion over the years, which might correlate with Dubai’s overall real estate development and economic growth.

	- **2005** and **2006** saw a relatively strong number of transactions compared to the surrounding years, suggesting some market growth or increased activity during this period.

4. **Early Trends and Lower Activity**:

	- The earlier periods (before 2000) show sporadic sales, indicating a much smaller market or a lower number of transactions being recorded at that time. This aligns with the idea that Dubai’s real estate market was not as large or as active in its earlier years compared to the recent two decades.

5. **Pandemic Impact**:

	- A noticeable dip is seen around **2020**, likely caused by the **COVID-19 pandemic**, which temporarily slowed down economic activities, including real estate transactions.

6. **Strong Activity in Specific Projects**:

	- Many of the top months with high sales correspond to ongoing or completed large-scale residential projects or new developments. Understanding the impact of specific projects (e.g., Dubai Hills, Jumeriah Garden City) on sales would provide more granular insights.
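
Because `groupby(...).size()` only returns months with at least one sale, months without transactions (for example August 2000 in the listing above) are missing from `monthly_sales` rather than shown as zero. A hedged sketch of how the series could be completed and rolled up to yearly totals before judging long-run growth; the six counts below are copied from the output above, and the sketch assumes the index is a `DatetimeIndex` (the `to_timestamp()` conversion):

```python
import pandas as pd

# Six monthly counts taken from the output above (August 2000 is absent)
monthly_sales = pd.Series(
    [10, 3, 3, 5, 5, 8],
    index=pd.to_datetime(
        ["2000-07-01", "2000-09-01", "2000-10-01",
         "2000-11-01", "2000-12-01", "2001-01-01"]
    ),
    name="sales",
)

# Insert the missing months with a count of zero ("MS" = month start)
monthly_full = monthly_sales.asfreq("MS", fill_value=0)

# Yearly totals are less noisy than single months when reading a trend
# (the totals here cover only the toy months, not full calendar years)
yearly_sales = monthly_full.groupby(monthly_full.index.year).sum()

print(monthly_full)
print(yearly_sales)
```

Plotting `monthly_full` instead of the raw counts would show quiet periods as zeros instead of connecting the surrounding months with a straight line.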

### Price per Sqm (Meter Sale Price)

In [71]:
# Average price per sqm for each property type
price_per_sqm = transactions_satwa.groupby('property_type_en')['meter_sale_price'].mean()

# Bar chart of the average price per sqm
price_per_sqm_fig = px.bar(price_per_sqm, title="Average Price per Sqm by Property Type",
                            labels={'property_type_en': 'Property Type', 'value': 'Price per Sqm (AED)'},
                            text=(price_per_sqm.values).round(2))
# Hide the legend
price_per_sqm_fig.update_layout(showlegend=False)
price_per_sqm_fig.write_image(f"{output_dir}/price_per_sqm.png")
price_per_sqm_fig.show()

print("Average price per sqm for each property type:")
print(price_per_sqm)

Average price per sqm for each property type:
property_type_en
Building    15996.036452
Land         9167.228260
Unit        24027.660811
Villa        7318.609048
Name: meter_sale_price, dtype: float64


**Average Price per Sqm by Property Type**:

1. **Units Command the Highest Price per Sqm**:

	- **Units** (apartments) have the highest average price per sqm, at **24,027.66 AED**. This indicates that units, often located in highly desirable areas or within luxury buildings, are in high demand, potentially driven by location, amenities, and smaller footprint compared to villas or land.

2. **Buildings Have a High Price per Sqm**:
	
	- **Buildings** rank second, with an average price of **15,996.04 AED** per sqm, roughly a third below units. This suggests that entire buildings, likely in central or premium locations, are also highly valued due to their large size and potential for long-term commercial or residential use.

3. **Land and Villas Have Lower Price per Sqm**:

	- **Land** comes in at **9,167.23 AED** per sqm, which is significantly lower than units or buildings. This may reflect that land prices are more variable and depend heavily on location, zoning, and future development potential.

	- **Villas**, priced at **7,318.61 AED** per sqm, have the lowest average price among the property types. Villas typically cover larger areas, so the price per square meter is often lower compared to smaller, more compact units or buildings. This also indicates that the market for villas may focus on spaciousness rather than premium cost per square meter.

**Insights for the Agent**:

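1. **Units Offer the Highest Sale Value per Sqm**:

	- Focusing on **units** can be beneficial for maximizing sale value per square meter, especially in prime areas or developments where demand for high-rise living and luxury apartments is strong. Note that a high price per sqm is not the same as a high investment return, which would also depend on rents and purchase costs.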

2. **Villas Offer Affordability in Spaciousness**:

	- **Villas**, though offering larger living spaces, come with a much lower price per sqm. This could be appealing to buyers or investors looking for value in spacious residential properties at a lower price per square meter, especially in residential or suburban areas.

3. **Land as a Long-term Investment**:

	- **Land** could be an attractive option for investors who seek long-term growth potential. The relatively lower price per sqm indicates opportunities for future development, especially in growing areas or new developments.
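
The averages above are simple means of `meter_sale_price`, which a handful of extreme transactions can pull up or down. As a sketch on made-up numbers (not the Al Satwa data), reporting the median and the number of transactions next to the mean makes it easier to judge how representative each average is:

```python
import pandas as pd

# Toy data: one villa sale at a very high price per sqm
df = pd.DataFrame({
    "property_type_en": ["Unit", "Unit", "Unit", "Villa", "Villa", "Villa"],
    "meter_sale_price": [20000.0, 22000.0, 30000.0, 5000.0, 6000.0, 40000.0],
})

# Count, mean, and median per property type in a single table
summary = (
    df.groupby("property_type_en")["meter_sale_price"]
      .agg(["count", "mean", "median"])
      .round(2)
)

print(summary)
```

In the toy data the villa mean (17,000) is far above the villa median (6,000), which is the kind of gap that signals a skewed group.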

## Focused Analysis on Recent Residential Trends

After conducting a broad analysis of the area, including various property types like land, buildings, villas, and units, I now want to shift the focus toward more recent and relevant trends in residential properties. This step is crucial to gaining deeper insights into current investment opportunities, especially within the residential sector, which often holds the highest interest for both investors and buyers.

By filtering the dataset to cover the last 5 years, I aim to provide a more specific and up-to-date analysis. This time-bound approach will help highlight recent sales patterns and trends in the market for units and villas. It will allow me to pinpoint more actionable insights related to property prices, sizes, and their corresponding values, all while focusing exclusively on residential properties.

This more focused analysis should offer valuable information for investors or agents looking to make informed decisions about current opportunities in the residential sector.

In [73]:
# Displaying a few random rows from the dataset
transactions_satwa.sample(5)

Unnamed: 0,transaction_id,instance_date,trans_group_id,trans_group_en,trans_group_ar,procedure_id,procedure_name_en,procedure_name_ar,property_type_id,property_type_en,property_type_ar,property_sub_type_id,property_sub_type_en,property_usage_en,property_usage_ar,reg_type_id,reg_type_en,reg_type_ar,area_id,area_name_en,area_name_ar,master_project_en,master_project_ar,project_number,project_name_en,project_name_ar,building_name_en,building_name_ar,rooms_en,rooms_ar,has_parking,procedure_area,meter_sale_price,actual_worth
714319,1-11-2003-1504,2003-10-08,1,Sales,مبايعات,11,Sell,بيع,1,Land,أرض,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,232.26,3099.97,720000.0
1277310,1-11-2016-8531,2016-07-21,1,Sales,مبايعات,11,Sell,بيع,2,Building,مبنى,,,Industrial,صناعي,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,13679.87,4020.51,55000000.0
388406,1-11-2012-20028,2012-10-22,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,1305.29,3830.57,5000000.0
253432,1-11-2013-11442,2013-04-15,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Commercial,تجاري,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,232.0,3448.28,800000.0
194995,1-11-2024-13244,2024-04-18,1,Sales,مبايعات,11,Sell,بيع,1,Land,أرض,,,Other,أخرى,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,,,,,,,,0,1315.38,18245.68,24000000.0


**Filter on Sales Transactions**

I will start by filtering the dataset to include only sales transactions. This helps narrow down the data to only those transactions that directly impact property sales trends, excluding mortgages, gifts, or other unrelated transactions.

In [74]:
# Filter on Sales Transactions
transactions_satwa_filtered = transactions_satwa[transactions_satwa['trans_group_en'] == 'Sales']

# Comparing the shapes of the dataset
print("Transactions Satwa Shape:", transactions_satwa.shape)
print("Filtered Transactions Shape:", transactions_satwa_filtered.shape)

Transactions Satwa Shape: (2133, 34)
Filtered Transactions Shape: (1580, 34)


**Focus on Residential Properties**

Next, I will filter the `property_usage_en` column to include only residential properties. This ensures that the data is clean and focused on properties that are used for residential purposes, aligning with the goal of providing insights for potential homebuyers or real estate investors.

In [75]:
# Filter on property_usage_en
transactions_satwa_filtered = transactions_satwa_filtered[transactions_satwa_filtered['property_usage_en'] == 'Residential']

# Comparing the shapes of the dataset
print("Transactions Satwa Shape:", transactions_satwa.shape)
print("Filtered Transactions Shape:", transactions_satwa_filtered.shape)

Transactions Satwa Shape: (2133, 34)
Filtered Transactions Shape: (1198, 34)


**Restrict to Villas and Units**

Finally, I will filter the `property_type_en` to include only villas and units, since these are the key residential property types that will provide the most meaningful insights for residential investors.

In [76]:
# Filter on property_type_en
transactions_satwa_filtered = transactions_satwa_filtered[transactions_satwa_filtered['property_type_en'].isin(['Unit', 'Villa'])]

# Comparing the shapes of the dataset
print("Transactions Satwa Shape:", transactions_satwa.shape)
print("Filtered Transactions Shape:", transactions_satwa_filtered.shape)

Transactions Satwa Shape: (2133, 34)
Filtered Transactions Shape: (494, 34)
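
The three filters above can also be written as a single boolean mask, which keeps the selection criteria in one place. A minimal sketch on made-up rows; the column names and the `Sales`, `Residential`, `Unit`, and `Villa` values match those used above, while `residential_sales_mask` and the other category values are introduced here for illustration only:

```python
import pandas as pd

# Toy rows covering the cases each filter is meant to exclude
toy = pd.DataFrame({
    "trans_group_en": ["Sales", "Sales", "Mortgages", "Sales", "Sales"],
    "property_usage_en": ["Residential", "Commercial", "Residential",
                          "Residential", "Residential"],
    "property_type_en": ["Unit", "Villa", "Unit", "Land", "Villa"],
})

# Combine the three conditions; each comparison needs its own parentheses
residential_sales_mask = (
    (toy["trans_group_en"] == "Sales")
    & (toy["property_usage_en"] == "Residential")
    & (toy["property_type_en"].isin(["Unit", "Villa"]))
)

# Copy the selection so later column assignments do not raise warnings
toy_filtered = toy[residential_sales_mask].copy()

print("Toy shape:", toy.shape)
print("Filtered toy shape:", toy_filtered.shape)
```

The step-by-step version used above has the advantage of showing how many rows each filter removes.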


In [77]:
# Checking information about the filtered dataset
transactions_satwa_filtered.info()

<class 'pandas.core.frame.DataFrame'>
Index: 494 entries, 25 to 1311990
Data columns (total 34 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   transaction_id        494 non-null    object        
 1   instance_date         494 non-null    datetime64[ns]
 2   trans_group_id        494 non-null    int64         
 3   trans_group_en        494 non-null    object        
 4   trans_group_ar        494 non-null    object        
 5   procedure_id          494 non-null    int64         
 6   procedure_name_en     494 non-null    object        
 7   procedure_name_ar     494 non-null    object        
 8   property_type_id      494 non-null    int64         
 9   property_type_en      494 non-null    object        
 10  property_type_ar      494 non-null    object        
 11  property_sub_type_id  110 non-null    float64       
 12  property_sub_type_en  110 non-null    object        
 13  property_usage_en   

**Focusing on the Last 5 Years**

To focus on more recent market trends and provide insights that reflect current property sales patterns, I'll filter the dataset to include only transactions from the last 5 years.

In [78]:
# Filter the data on last 5 years
transactions_satwa_filtered_5y = transactions_satwa_filtered[
    transactions_satwa_filtered['instance_date'] >= pd.Timestamp.now() - pd.DateOffset(years=5)]

# Comparing the shapes of the dataset
print("Filtered Transactions Satwa Shape:", transactions_satwa_filtered.shape)
print("Last 5 Years Transactions Shape:", transactions_satwa_filtered_5y.shape)

Filtered Transactions Satwa Shape: (494, 34)
Last 5 Years Transactions Shape: (207, 34)
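
Because the cutoff above depends on `pd.Timestamp.now()`, re-running the notebook at a later date will select a different set of rows. One possible way to make the window reproducible is to pin a reference date; the date `2024-10-01` below is an assumption chosen for illustration, not a value taken from this analysis:

```python
import pandas as pd

# Assumed reference date (hypothetical); replace with the actual data extraction date
reference_date = pd.Timestamp("2024-10-01")
cutoff = reference_date - pd.DateOffset(years=5)

# Toy dates: one before the cutoff, two inside the five-year window
toy = pd.DataFrame({
    "instance_date": pd.to_datetime(["2016-07-21", "2020-01-20", "2024-09-30"])
})

recent = toy[toy["instance_date"] >= cutoff]

print("Cutoff:", cutoff.date())
print("Rows kept:", len(recent))
```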


In [79]:
# Checking information about the filtered dataset
transactions_satwa_filtered_5y.info()

<class 'pandas.core.frame.DataFrame'>
Index: 207 entries, 25 to 1311413
Data columns (total 34 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   transaction_id        207 non-null    object        
 1   instance_date         207 non-null    datetime64[ns]
 2   trans_group_id        207 non-null    int64         
 3   trans_group_en        207 non-null    object        
 4   trans_group_ar        207 non-null    object        
 5   procedure_id          207 non-null    int64         
 6   procedure_name_en     207 non-null    object        
 7   procedure_name_ar     207 non-null    object        
 8   property_type_id      207 non-null    int64         
 9   property_type_en      207 non-null    object        
 10  property_type_ar      207 non-null    object        
 11  property_sub_type_id  110 non-null    float64       
 12  property_sub_type_en  110 non-null    object        
 13  property_usage_en   

In [82]:
# Checking percentages of missing values
transactions_satwa_filtered_5y.isnull().sum() / transactions_satwa_filtered_5y.shape[0] * 100

transaction_id           0.000000
instance_date            0.000000
trans_group_id           0.000000
trans_group_en           0.000000
trans_group_ar           0.000000
procedure_id             0.000000
procedure_name_en        0.000000
procedure_name_ar        0.000000
property_type_id         0.000000
property_type_en         0.000000
property_type_ar         0.000000
property_sub_type_id    46.859903
property_sub_type_en    46.859903
property_usage_en        0.000000
property_usage_ar        0.000000
reg_type_id              0.000000
reg_type_en              0.000000
reg_type_ar              0.000000
area_id                  0.000000
area_name_en             0.000000
area_name_ar             0.000000
master_project_en       46.859903
master_project_ar       46.859903
project_number          48.792271
project_name_en         48.792271
project_name_ar         48.792271
building_name_en        46.859903
building_name_ar        46.859903
rooms_en                46.859903
rooms_ar      

In [80]:
# Checking random rows from the filtered dataset
transactions_satwa_filtered_5y.sample(5)

Unnamed: 0,transaction_id,instance_date,trans_group_id,trans_group_en,trans_group_ar,procedure_id,procedure_name_en,procedure_name_ar,property_type_id,property_type_en,property_type_ar,property_sub_type_id,property_sub_type_en,property_usage_en,property_usage_ar,reg_type_id,reg_type_en,reg_type_ar,area_id,area_name_en,area_name_ar,master_project_en,master_project_ar,project_number,project_name_en,project_name_ar,building_name_en,building_name_ar,rooms_en,rooms_ar,has_parking,procedure_area,meter_sale_price,actual_worth
798198,1-102-2024-78538,2024-09-30,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,3061.0,Jardin Astral,جردن أسترل,Jardin Astral,جردن أسترل,2 B/R,غرفتين,1,135.21,18970.49,2565000.0
564006,1-102-2024-9087,2024-02-15,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,Studio,استوديو,1,35.6,29635.96,1055040.0
646718,1-102-2024-14993,2024-03-11,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,Studio,استوديو,1,35.94,29628.55,1064850.0
735486,1-11-2020-738,2020-01-20,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,232.26,23034.53,5350000.0
11466,1-102-2024-9057,2024-02-15,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,1 B/R,غرفة,1,89.64,18406.96,1650000.0


### Property Type

In [88]:
# Property Type Distribution
property_type_distribution = transactions_satwa_filtered_5y['property_type_en'].value_counts()

# Property Type Distribution Plot
property_type_fig = px.bar(property_type_distribution, 
                           title="Property Type Distribution (Last 5 Years)",
                           labels={'property_type_en': 'Property Type', 'value': 'Count'},
                           text=property_type_distribution)
property_type_fig.update_traces(texttemplate='%{text}', textposition='outside')

# Hide the legend
property_type_fig.update_layout(showlegend=False)
property_type_fig.write_image(f"{output_dir}/property_type_distribution_5_years.png")
property_type_fig.show()

# Printing the distribution of property types
print("Property Type Distribution (Last 5 Years):")
print(property_type_distribution)

Property Type Distribution (Last 5 Years):
property_type_en
Unit     110
Villa     97
Name: count, dtype: int64


**Property Type Distribution (Last 5 Years)**

- **Units** slightly outnumber **Villas**, with 110 units compared to 97 villas in the dataset.

- This distribution indicates a fairly balanced market between these two residential property types in the last five years.

- The near parity suggests that both property types are in demand. However, the slight dominance of units might indicate that apartments/units are a slightly more popular choice in this specific timeframe, possibly due to factors like affordability, location preferences, or development projects focusing on apartment buildings.

### Registration Type

In [90]:
# Registration Type Distribution
property_usage_distribution = transactions_satwa_filtered_5y['reg_type_en'].value_counts()
property_usage_fig = px.bar(property_usage_distribution, 
                            title="Registration Type Distribution (Last 5 Years)",
                            labels={'reg_type_en': 'Registration Type', 'value': 'Count'},
                            text=property_usage_distribution)
property_usage_fig.update_traces(texttemplate='%{text}', textposition='outside')
# Hide the legend
property_usage_fig.update_layout(showlegend=False)
property_usage_fig.write_image(f"{output_dir}/registration_type_distribution_5_years.png")
property_usage_fig.show()

# Printing the distribution of registration types
print("Registration Type Distribution (Last 5 Years):")
print(property_usage_distribution)

Registration Type Distribution (Last 5 Years):
reg_type_en
Off-Plan Properties    106
Existing Properties    101
Name: count, dtype: int64


**Registration Type Distribution (Last 5 Years)**

- **Off-Plan Properties** account for 106 transactions, while **Existing Properties** represent 101 transactions.

- This indicates a balanced interest in both new developments (off-plan) and existing residential properties, signaling opportunities for investors in both markets.

- The slight edge in off-plan properties might suggest recent developments or new projects gaining popularity in the last five years. These could be appealing to buyers looking for more flexible payment plans or wanting to invest in future-ready developments.
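
The unit/villa split (110 vs. 97) and the off-plan/existing split (106 vs. 101) are close to each other, so it may be worth checking how the two columns relate before interpreting them separately. A sketch using `pd.crosstab` on made-up rows (the real counts are not reproduced here):

```python
import pandas as pd

# Toy rows using the same category labels as the dataset
toy = pd.DataFrame({
    "property_type_en": ["Unit", "Unit", "Unit", "Villa", "Villa"],
    "reg_type_en": ["Off-Plan Properties", "Off-Plan Properties",
                    "Existing Properties", "Existing Properties",
                    "Existing Properties"],
})

# Rows: property type, columns: registration type, cells: transaction counts
type_by_registration = pd.crosstab(toy["property_type_en"], toy["reg_type_en"])

print(type_by_registration)
```

If most off-plan transactions turn out to be units, the two distributions describe largely the same split and should not be read as independent signals.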



### Property Size Distribution by Property Type

In [92]:
# Property Size Distribution by Property Type
size_fig = px.box(transactions_satwa_filtered_5y, 
                  x='property_type_en', 
                  y='procedure_area',
                  title="Property Size Distribution by Property Type (Last 5 Years)",
                  labels={'procedure_area': 'Size (sqm)', 'property_type_en': 'Property Type'})
size_fig.write_image(f"{output_dir}/property_size_distribution_5_years.png")
size_fig.show()

In [98]:
print(transactions_satwa_filtered_5y.groupby('property_type_en')['procedure_area'].describe())

                  count        mean         std   min     25%     50%     75%  \
property_type_en                                                                
Unit              110.0   60.405182   24.583114  35.6   35.88   54.71   82.77   
Villa              97.0  434.468041  991.966174  25.4  231.51  232.26  234.39   

                      max  
property_type_en           
Unit               135.21  
Villa             9606.80  


**Property Size Distribution by Property Type (Last 5 Years)**

- **Units**

    - Average size: 60.41 sqm.

    - Sizes are compact: all units fall between 35.6 sqm and 135.21 sqm, and the middle 50% lie between 35.88 sqm and 82.77 sqm, reflecting typical apartment dimensions.

    - The relatively small variation in sizes (standard deviation of 24.58 sqm) shows a more consistent property offering, with sizes mostly concentrated within a similar range.

- **Villas**

    - Average size: 434.47 sqm, indicating a much larger footprint compared to units.

    - The size distribution has a much broader range, with a standard deviation of 991.97 sqm and a maximum of 9606.80 sqm. A few very large properties, which could include luxury estates, therefore pull the average well above the median of 232.26 sqm.

    - The quartiles (231.51, 232.26, and 234.39 sqm) show that most villas are close to 232 sqm, so the wide range reflects a small number of unusually small or large properties rather than many distinct size segments.



### Price per Square Meter by Property Type

In [100]:
# Price per Square Meter by Property Type
price_fig = px.box(transactions_satwa_filtered_5y, 
                   x='property_type_en', 
                   y='meter_sale_price',
                   title="Price per Square Meter by Property Type (Last 5 Years)",
                   labels={'meter_sale_price': 'Price per Square Meter (AED)', 'property_type_en': 'Property Type'})
price_fig.write_image(f"{output_dir}/price_per_sqm_distribution_5_years.png")
price_fig.show()

In [103]:
(transactions_satwa_filtered_5y.groupby('property_type_en')['meter_sale_price'].describe())

property_type_en,count,mean,std,min,25%,50%,75%,max
Unit,110.0,24111.88,4729.211166,16666.67,20355.3775,22355.175,27684.3875,33529.08
Villa,97.0,15002.503918,12066.579849,1718.14,9902.7,12637.82,14638.77,70198.43


**Price per Square Meter by Property Type**:

1. **Overall Price Levels**:

	- **Units** have a significantly higher average price per square meter (24,111.88 AED) compared to **villas** (15,002.50 AED). This indicates that units are perceived as more valuable on a per-square-meter basis, likely because they are in prime locations or have more modern amenities.

	- Villas, though larger, are not commanding as high a price per square meter, which could be due to their location, market demand, or the fact that villa buyers prioritize space over price per square meter.

2. **Price Variability**:

	- **Villas** show much higher variability in price, with a standard deviation of 12,066.58 AED, compared to units (4,729.21 AED). This suggests that villas have a wider range of property characteristics, from lower-end to premium villas, depending on factors like location, luxury features, and exclusivity.

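	- **Units**, on the other hand, are more consistent overall: their prices span only 16,666.67 to 33,529.08 AED per square meter. Their interquartile range (about 7,329 AED) is nonetheless wider than that of villas (about 4,736 AED), so the higher villa variability comes from the tails of the distribution rather than from its middle 50%.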

3. **Quartile Analysis**:

	- The **median price** for **units** is 22,355.18 AED per square meter, while the **75th percentile** (upper quartile) is 27,684.39 AED, showing that the top 25% of unit sales fetch significantly higher prices.

	- **Villas** have a **median price** of 12,637.82 AED per square meter, with the **75th percentile** reaching 14,638.77 AED. This suggests that most villas fall in a narrower range of pricing but with a few outliers (likely luxury villas) pushing the max price up to 70,198.43 AED.

4. **Outliers in Villa Pricing**:

	- There are some villas priced at the extreme high end, with the maximum value reaching over 70,000 AED per square meter, signaling either ultra-luxury or highly desirable villa properties. These properties could be in exclusive areas or offer unique features that drive up the price.
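
To make the notion of an outlier more concrete, one common convention (Tukey's rule, used here as an assumption rather than as the method of this analysis) flags values more than 1.5 × IQR beyond the quartiles. The sketch below applies it to the quartiles reported in the table above; `tukey_fences` is a helper name introduced for illustration:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds outside which values count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of meter_sale_price taken from the describe() output above
unit_low, unit_high = tukey_fences(20355.3775, 27684.3875)
villa_low, villa_high = tukey_fences(9902.70, 14638.77)

print(f"Unit fences:  {unit_low:,.2f} to {unit_high:,.2f} AED per sqm")
print(f"Villa fences: {villa_low:,.2f} to {villa_high:,.2f} AED per sqm")

# Reported extremes for each property type
unit_min, unit_max = 16666.67, 33529.08
villa_min, villa_max = 1718.14, 70198.43

units_all_inside = unit_low <= unit_min and unit_max <= unit_high
villa_extremes_outside = villa_min < villa_low and villa_max > villa_high
print("All units inside fences:", units_all_inside)
print("Villa min and max outside fences:", villa_extremes_outside)
```

Under this rule every unit price falls inside its fences, while both the cheapest and the most expensive villa fall outside theirs, which is consistent with the villa variability being driven by a few extreme transactions.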

### Actual Worth Distribution

In [105]:
# Actual Worth Distribution
worth_fig = px.histogram(transactions_satwa_filtered_5y, 
                         x='actual_worth', 
                         nbins=50,
                         title="Actual Worth Distribution (Last 5 Years)",
                         labels={'actual_worth': 'Worth (AED)'})
worth_fig.write_image(f"{output_dir}/actual_worth_distribution_5_years.png")
worth_fig.show()

In [106]:
transactions_satwa_filtered_5y['actual_worth'].describe()

count    2.070000e+02
mean     2.891224e+06
std      4.915825e+06
min      3.060000e+05
25%      1.149000e+06
50%      1.865000e+06
75%      3.000000e+06
max      6.500000e+07
Name: actual_worth, dtype: float64

**Actual Worth Distribution**:

1. **Concentration of Properties in the Lower Range**:

	- The majority of properties are concentrated in the lower price range, specifically between 300,000 AED and 10 million AED, as shown by the tall bar near the lower end of the distribution.

	- This indicates that most properties in the last five years have an actual worth of **less than 10 million AED**, with a significant clustering below 5 million AED (75% are worth 3 million AED or less). This suggests that the market has a high volume of **mid-range residential properties**, such as average-sized villas and units.

2. **Outliers and High-Value Properties**:

	- There are a few **outliers** in the dataset, with property values reaching as high as **65 million AED**. These represent ultra-luxury properties, but they are relatively rare.

	- While the mean actual worth is approximately **2.89 million AED**, the presence of high-value outliers inflates this average. The **median** (1.865 million AED) is a better representation of typical properties in this dataset.

3. **Price Range Spread**:

	- The **standard deviation** is **4.92 million AED**, indicating a wide spread of property values, from lower-end to high-end properties.

	- The **upper quartile** (75th percentile) is **3 million AED**, showing that only the top 25% of properties exceed this value, while 75% of the properties are valued below it.

4. **Luxury Market Potential**:

	- The properties worth more than **20 million AED** are rare, signaling an exclusive high-end market. These properties could represent luxury villas or prime-location units, and targeting this niche could yield significant returns for agents who focus on premium property sales.
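
The shares described above (for example, how many properties fall below 5 or 10 million AED) can be computed directly rather than read off the histogram. A minimal sketch on made-up values; only the thresholds are taken from the text:

```python
import pandas as pd

# Toy property values in AED (illustrative only, not the real transactions)
actual_worth = pd.Series(
    [400_000, 1_200_000, 1_900_000, 2_800_000, 7_500_000, 60_000_000]
)

# Share of properties below each threshold, in percent
shares = {}
for threshold in (5_000_000, 10_000_000):
    shares[threshold] = (actual_worth < threshold).mean() * 100
    print(f"Below {threshold:,} AED: {shares[threshold]:.2f}%")
```

Applied to `transactions_satwa_filtered_5y['actual_worth']`, the same comparison would turn "most properties" into an exact percentage.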

### Removing Outliers

In [107]:
display(transactions_satwa_filtered_5y.sample(10))

Unnamed: 0,transaction_id,instance_date,trans_group_id,trans_group_en,trans_group_ar,procedure_id,procedure_name_en,procedure_name_ar,property_type_id,property_type_en,property_type_ar,property_sub_type_id,property_sub_type_en,property_usage_en,property_usage_ar,reg_type_id,reg_type_en,reg_type_ar,area_id,area_name_en,area_name_ar,master_project_en,master_project_ar,project_number,project_name_en,project_name_ar,building_name_en,building_name_ar,rooms_en,rooms_ar,has_parking,procedure_area,meter_sale_price,actual_worth
1311413,1-11-2020-2969,2020-03-12,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,594.58,3363.72,2000000.0
420212,1-102-2024-7536,2024-02-07,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,Studio,استوديو,1,39.59,26016.67,1030000.0
748949,1-102-2024-14041,2024-03-07,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,1 B/R,غرفة,1,87.56,21687.99,1899000.0
366743,1-102-2024-70498,2024-09-11,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,3061.0,Jardin Astral,جردن أسترل,Jardin Astral,جردن أسترل,2 B/R,غرفتين,1,135.21,20338.73,2750000.0
983835,1-11-2020-856,2020-01-23,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,234.39,8625.84,2021810.0
1272518,1-11-2023-2688,2023-02-06,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,227.61,18013.27,4100000.0
769453,1-102-2024-9444,2024-02-16,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,1 B/R,غرفة,1,87.31,17050.11,1488645.0
636973,1-11-2022-3859,2022-03-01,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,1393.55,6099.53,8500000.0
461197,1-102-2024-60563,2024-08-13,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,3046.0,The Grandala,جراندالا,The Grandala,ذا جراندالا,Studio,استوديو,1,46.42,18071.52,838880.0
1078232,1-102-2024-10047,2024-02-20,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,1 B/R,غرفة,1,87.31,21360.67,1865000.0


In [108]:
# Statistical summary on the procedure_area column
transactions_satwa_filtered_5y['procedure_area'].describe()

count     207.000000
mean      235.690676
std       702.775464
min        25.400000
25%        53.170000
50%        89.640000
75%       232.260000
max      9606.800000
Name: procedure_area, dtype: float64

In [112]:
# Locating the observation where the procedure_area is 9606 sqm
transactions_satwa_filtered_5y.loc[transactions_satwa_filtered_5y['procedure_area'].idxmax()]



transaction_id               1-11-2020-9639
instance_date           2020-10-27 00:00:00
trans_group_id                            1
trans_group_en                        Sales
trans_group_ar                      مبايعات
procedure_id                             11
procedure_name_en                      Sell
procedure_name_ar                       بيع
property_type_id                          4
property_type_en                      Villa
property_type_ar                       فيلا
property_sub_type_id                    NaN
property_sub_type_en                    NaN
property_usage_en               Residential
property_usage_ar                      سكني
reg_type_id                               1
reg_type_en             Existing Properties
reg_type_ar                العقارات القائمة
area_id                                 266
area_name_en                       Al Satwa
area_name_ar                         السطوه
master_project_en                       NaN
master_project_ar               

In [113]:
# Locate the row with the maximum procedure_area
outlier_index = transactions_satwa_filtered_5y['procedure_area'].idxmax()

# Remove the outlier from the dataset
transactions_satwa_filtered_5y_no_outliers = transactions_satwa_filtered_5y.drop(outlier_index)

# Confirm the removal by checking the updated statistics
print(transactions_satwa_filtered_5y_no_outliers['procedure_area'].describe())

count     206.000000
mean      190.199854
std       256.601058
min        25.400000
25%        52.400000
50%        89.640000
75%       232.260000
max      1393.550000
Name: procedure_area, dtype: float64
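
Dropping only the single largest value is a manual choice. A more systematic alternative, shown here as a minimal sketch rather than the method used in this notebook, is Tukey's IQR rule. The helper name `iqr_outlier_mask`, the multiplier `k=1.5`, and the toy values are assumptions made for illustration.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside the Tukey fences (hypothetical helper)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Toy areas in sqm, loosely modelled on values seen above (not the real dataset)
areas = pd.Series(
    [25.4, 53.17, 89.64, 232.26, 234.39, 1393.55, 9606.8],
    name="procedure_area",
)
mask = iqr_outlier_mask(areas)
flagged = areas[mask]
print(flagged)
```

With the quartiles reported above (52.40 and 232.26 sqm), the same rule would put the upper fence near 502 sqm, so the 1,393.55 sqm villa would be flagged as well. Whether to drop such rows remains a judgment call.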


In [114]:
transactions_satwa_filtered_5y_no_outliers.sample(10)

Unnamed: 0,transaction_id,instance_date,trans_group_id,trans_group_en,trans_group_ar,procedure_id,procedure_name_en,procedure_name_ar,property_type_id,property_type_en,property_type_ar,property_sub_type_id,property_sub_type_en,property_usage_en,property_usage_ar,reg_type_id,reg_type_en,reg_type_ar,area_id,area_name_en,area_name_ar,master_project_en,master_project_ar,project_number,project_name_en,project_name_ar,building_name_en,building_name_ar,rooms_en,rooms_ar,has_parking,procedure_area,meter_sale_price,actual_worth
1036559,1-102-2024-76418,2024-09-25,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,3046.0,The Grandala,جراندالا,The Grandala,ذا جراندالا,Studio,استوديو,1,44.81,20204.2,905350.0
1207580,1-11-2024-19480,2024-06-11,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,225.29,13538.11,3050000.0
358634,1-102-2024-51275,2024-07-16,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,Studio,استوديو,1,39.67,25207.97,1000000.0
549722,1-11-2024-36960,2024-10-02,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,231.05,12984.2,3000000.0
255992,1-102-2024-14602,2024-03-08,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,Studio,استوديو,1,54.71,20654.36,1130000.0
1128265,1-11-2022-12924,2022-06-10,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,232.26,12055.46,2800000.0
605173,1-102-2024-11422,2024-02-26,1,Sales,مبايعات,102,Sell - Pre registration,بيع - تسجيل مبدئى,3,Unit,وحدة,60.0,Flat,Residential,سكني,0,Off-Plan Properties,على الخارطة,266,Al Satwa,السطوه,Jumeriah Garden City,جميرا جاردن ستي,2860.0,Hyde Walk Residence by Imtiaz,هايد ووك ريزيدنس من امتياز,HYDE WALK RESIDENCE BY IMTIAZ,هايد ووك ريزيدنس من امتياز,1 B/R,غرفة,1,80.7,18339.53,1480000.0
778319,1-11-2024-33090,2024-09-09,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,232.26,12916.56,3000000.0
1004342,1-11-2021-18817,2021-10-26,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,234.58,9058.74,2125000.0
799444,1-11-2023-19455,2023-06-20,1,Sales,مبايعات,11,Sell,بيع,4,Villa,فيلا,,,Residential,سكني,1,Existing Properties,العقارات القائمة,266,Al Satwa,السطوه,,,,,,,,,,0,227.61,15816.53,3600000.0


In [116]:
transactions_satwa_filtered_5y_no_outliers = transactions_satwa_filtered_5y_no_outliers[['instance_date', 'property_type_en', 'reg_type_en',
                                                                                         'master_project_en', 'project_name_en', 'building_name_en',
                                                                                         'rooms_en', 'has_parking', 'procedure_area', 'meter_sale_price',
                                                                                         'actual_worth']]
transactions_satwa_filtered_5y_no_outliers.head()

Unnamed: 0,instance_date,property_type_en,reg_type_en,master_project_en,project_name_en,building_name_en,rooms_en,has_parking,procedure_area,meter_sale_price,actual_worth
25,2021-06-23,Villa,Existing Properties,,,,,0,239.23,12540.23,3000000.0
760,2024-05-02,Villa,Existing Properties,,,,,0,1393.55,5740.73,8000000.0
11465,2024-02-16,Unit,Off-Plan Properties,Jumeriah Garden City,Hyde Walk Residence by Imtiaz,HYDE WALK RESIDENCE BY IMTIAZ,Studio,1,35.6,27247.19,970000.0
11466,2024-02-15,Unit,Off-Plan Properties,Jumeriah Garden City,Hyde Walk Residence by Imtiaz,HYDE WALK RESIDENCE BY IMTIAZ,1 B/R,1,89.64,18406.96,1650000.0
18267,2022-01-26,Villa,Existing Properties,,,,,0,234.67,10333.66,2425000.0


In [119]:
display(transactions_satwa_filtered_5y_no_outliers.sample(5))

Unnamed: 0,instance_date,property_type_en,reg_type_en,master_project_en,project_name_en,building_name_en,rooms_en,has_parking,procedure_area,meter_sale_price,actual_worth
569545,2023-03-27,Villa,Existing Properties,,,,,0,278.71,8880.2,2475000.0
408412,2024-04-02,Villa,Existing Properties,,,,,0,233.37,9641.34,2250000.0
358633,2024-02-15,Unit,Off-Plan Properties,Jumeriah Garden City,Hyde Walk Residence by Imtiaz,HYDE WALK RESIDENCE BY IMTIAZ,1 B/R,1,71.62,20664.62,1480000.0
1078233,2024-02-15,Unit,Off-Plan Properties,Jumeriah Garden City,Hyde Walk Residence by Imtiaz,HYDE WALK RESIDENCE BY IMTIAZ,Studio,1,54.71,22299.4,1220000.0
778319,2024-09-09,Villa,Existing Properties,,,,,0,232.26,12916.56,3000000.0
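
The three numeric columns kept above appear to be related by `meter_sale_price = actual_worth / procedure_area`. The sketch below checks that relationship on the five rows printed by `head()`; the tolerance of 0.01 is an assumption that allows for two-decimal rounding, and the check on the full dataset is left to the reader.

```python
import numpy as np
import pandas as pd

# Rows copied from the head() output above
sample = pd.DataFrame({
    "procedure_area": [239.23, 1393.55, 35.6, 89.64, 234.67],
    "meter_sale_price": [12540.23, 5740.73, 27247.19, 18406.96, 10333.66],
    "actual_worth": [3000000.0, 8000000.0, 970000.0, 1650000.0, 2425000.0],
})

# Price per sqm implied by the total price and the area
implied = sample["actual_worth"] / sample["procedure_area"]

# Compare with the recorded value, allowing for rounding to two decimals
consistent = np.isclose(implied, sample["meter_sale_price"], rtol=0, atol=0.01)
print(consistent)
```

Rows where the check fails would be worth inspecting before using `meter_sale_price` as a modelling target.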


### Step 1: Property Type

1. **Transaction Volume by Property Type**:

    - We will first determine the number of transactions for each property type (Unit vs. Villa).
    
    - This will show which property type is more actively sold, providing insights into demand trends.

2. **Price per Square Meter by Property Type**:

    - We’ll calculate the average price per square meter for both Units and Villas.

    - This can help the real estate agent understand whether units or villas are priced at a premium and which property type provides a better return on investment.

3. **Size Distribution by Property Type**:

    - We’ll analyze the average property size (procedure area) for Units and Villas.

    - This can provide insights into whether larger properties are associated with higher prices or if there’s a sweet spot in size for a better price per square meter.

4. **Price Distribution for Units vs. Villas**:

    - We’ll visualize the distribution of actual worth for Units and Villas.
    
    - This will highlight any major differences in price ranges between these two property types.

**1.1 Transaction Volume by Property Type**

We will begin by analyzing how many transactions occurred for Units vs. Villas. This will give us a quick insight into which property type is more actively sold.

In [131]:
# Transaction volume by property type
property_type_volume = transactions_satwa_filtered_5y_no_outliers['property_type_en'].value_counts()

# Plotting the transaction volume
fig_property_type_volume = px.bar(property_type_volume, 
                                  title="Transaction Volume by Property Type (Last 5 Years)", 
                                  labels={'property_type_en': 'Property Type', 'value': 'Number of Transactions'},
                                  text=property_type_volume.values)
# Hide the legend
fig_property_type_volume.update_layout(showlegend=False)

fig_property_type_volume.write_image(f"{output_dir}/transaction_volume_by_property_type_5_years.png")
fig_property_type_volume.show()

# Displaying the transactions volume by property type
print("Transaction Volume by Property Type (Last 5 Years):")
print(property_type_volume)

Transaction Volume by Property Type (Last 5 Years):
property_type_en
Unit     110
Villa     96
Name: count, dtype: int64


**Transaction Volume by Property Type (Last 5 Years) Insights**

1. **Balanced Demand Between Units and Villas**:

    - Transaction volume is relatively balanced between **Units (110 transactions)** and **Villas (96 transactions)**, roughly 53% and 47% of the 206 sales.

    - This suggests that there is strong interest in both property types, making it crucial for the agent to market both types effectively.

2. **Slightly Higher Demand for Units**:

    - Units had a slightly higher number of transactions. This may indicate that buyers are more inclined towards smaller, more affordable properties compared to Villas.

    - For marketing purposes, it would be beneficial to target a broader range of buyers for Units, such as first-time homebuyers or investors looking for rental properties.

3. **Villas for High-End Market**:

    - Despite having slightly fewer transactions, the Villas market is still robust. This could indicate that Villas cater to a higher-end market segment where buyers prioritize space and luxury.

    - The agent can use this information to focus on marketing Villas to wealthier clients or families looking for larger properties with more space and privacy.

4. **Key Marketing Focus**:

    - For **Units**: Emphasize affordability, rental yield, and convenience (especially for younger buyers, small families, and investors).
    
    - For **Villas**: Highlight exclusivity, luxury, and spaciousness (targeted more at families, long-term residents, or high-net-worth individuals).
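
To put the balance between the two property types in proportion, the counts can be turned into percentage shares. This is a small sketch that rebuilds the counts by hand from the output above; the variable name `share_pct` is illustrative.

```python
import pandas as pd

# Counts taken from the value_counts() output above
property_type_volume = pd.Series({"Unit": 110, "Villa": 96}, name="count")

# Share of all transactions, in percent
share_pct = (property_type_volume / property_type_volume.sum() * 100).round(1)
print(share_pct)
```

On the notebook's DataFrame, `value_counts(normalize=True)` on `property_type_en` would give the same shares as fractions.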

**1.2 Price per Square Meter by Property Type**

Next, we’ll calculate and visualize the average price per square meter for Units and Villas.

In [132]:
# Average price per square meter by property type
price_per_sqm = transactions_satwa_filtered_5y_no_outliers.groupby('property_type_en')['meter_sale_price'].mean()

# Plotting the average price per square meter
fig_price_per_sqm = px.bar(price_per_sqm, 
                           title="Average Price per Square Meter by Property Type (Last 5 Years)", 
                           labels={'property_type_en': 'Property Type', 'value': 'Price per Square Meter (AED)'},
                           text=price_per_sqm.values.round(2))
# Hide the legend
fig_price_per_sqm.update_layout(showlegend=False)
fig_price_per_sqm.write_image(f"{output_dir}/price_per_sqm_by_property_type_5_years.png")
fig_price_per_sqm.show()

print("Statistical summary on Price per Square Meter by Property Type (Last 5 Years):")
print(transactions_satwa_filtered_5y_no_outliers.groupby('property_type_en')['meter_sale_price'].describe())

Statistical summary on Price per Square Meter by Property Type (Last 5 Years):
                  count          mean           std       min         25%  \
property_type_en                                                            
Unit              110.0  24111.880000   4729.211166  16666.67  20355.3775   
Villa              96.0  15088.300417  12100.143336   1718.14  10036.5075   

                        50%         75%       max  
property_type_en                                   
Unit              22355.175  27684.3875  33529.08  
Villa             12777.190  14746.4075  70198.43  


**Price per Square Meter by Property Type (Last 5 Years) Insights**

1. **Higher Price per Square Meter for Units**:

    - **Units** have an average price per square meter of **24,111.88 AED**, about 60% higher than Villas at **15,088.30 AED**.

    - This suggests that Units command a premium, possibly because of their location within the area, their smaller sizes (which keep the total price lower), or their appeal to investors looking for rental properties.

2. **Villas Represent Good Value for Space**:

    - Despite the higher total value of Villas, the **price per square meter is lower**. This may indicate that Villas offer better value for buyers looking for more space.

    - The agent can use this insight to highlight the spaciousness and overall value that Villas provide, especially for buyers or investors seeking larger properties at a comparatively lower price per square meter.

3. **Wide Range in Villa Prices**:

    - The price range for Villas is broad, with a minimum of 1,718.14 AED per sqm and a maximum of 70,198.43 AED per sqm, although the middle half of sales falls between 10,036.51 and 14,746.41 AED per sqm. The extremes may reflect a few atypical sales as well as a market where Villas cater to both middle-income and high-end buyers, depending on location and property features.

    - Agents can segment Villas by price range and target specific types of buyers, from luxury investors to families seeking larger homes with more value for money.

4. **Units as Prime Investment Properties**:

    - With a high price per square meter and a lower standard deviation than Villas (about 4,729 vs. 12,100 AED per sqm), Units may appeal more to investors who value predictable pricing.

    - The agent can highlight the **consistent high value** of Units and their strong investment potential, especially for short-term rentals or smaller properties in high-demand areas.
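
Because the two property types have different price levels, raw standard deviations are not directly comparable. As a rough sketch, the coefficient of variation (std divided by mean) and the mean-to-median ratio can be computed from the summary above; the column names `cv` and `mean_over_median` are illustrative choices.

```python
import pandas as pd

# Mean, std and median copied from the describe() output above (AED per sqm)
summary = pd.DataFrame(
    {
        "mean": [24111.88, 15088.30],
        "std": [4729.21, 12100.14],
        "median": [22355.175, 12777.19],
    },
    index=["Unit", "Villa"],
)

# Relative dispersion: standard deviation as a fraction of the mean
summary["cv"] = (summary["std"] / summary["mean"]).round(3)

# A ratio above 1 means the mean sits above the median (right-skewed prices)
summary["mean_over_median"] = (summary["mean"] / summary["median"]).round(3)
print(summary)
```

A noticeably larger `cv` for Villas would back the point that Villa pricing per square meter is less uniform than Unit pricing.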



**1.3 Size Distribution by Property Type**

Now, let’s analyze the distribution of property sizes for Units and Villas.

In [135]:
# Property size distribution by property type
size_distribution = transactions_satwa_filtered_5y_no_outliers.groupby('property_type_en')['procedure_area'].describe()

# Plotting the size distribution
fig_size_distribution = px.box(transactions_satwa_filtered_5y_no_outliers, 
                               x='property_type_en', 
                               y='procedure_area', 
                               title="Property Size Distribution by Property Type (Last 5 Years)", 
                               labels={'property_type_en': 'Property Type', 'procedure_area': 'Size (sqm)'})
fig_size_distribution.write_image(f"{output_dir}/size_distribution_by_property_type_5_years.png")
fig_size_distribution.show()

# Displaying the statistical summary of property size by property type
print("Statistical summary on Procedure Area by Property Type (Last 5 Years):")
print(size_distribution)

Statistical summary on Procedure Area by Property Type (Last 5 Years):
                  count        mean         std   min      25%     50%  \
property_type_en                                                         
Unit              110.0   60.405182   24.583114  35.6   35.880   54.71   
Villa              96.0  338.922917  315.488864  25.4  231.395  232.26   

                      75%      max  
property_type_en                    
Unit               82.770   135.21  
Villa             233.625  1393.55  


**Property Size Distribution by Property Type (Last 5 Years) Insights**

1. **Villas Offer Significantly Larger Spaces**:

    - **Villas** have a much larger average size at **338.92 sqm** compared to **Units**, which average **60.41 sqm** (medians: 232.26 sqm and 54.71 sqm). This is consistent with the general expectation that Villas provide more living space.

    - The agent can use this insight to target families or buyers seeking larger living spaces, emphasizing that Villas are a great option for those looking for spacious properties.

2. **Concentrated Distribution for Units**:

    - **Units** have a relatively tight range in size, with a 75th percentile of **82.77 sqm**, and a maximum size of **135.21 sqm**. This suggests that Units are typically small, compact properties, ideal for individual buyers, couples, or investors looking for rental properties.

    - This can be used to attract buyers looking for more affordable or manageable properties, such as young professionals or small families.

3. **Wide Range of Villa Sizes**:

    - The **Villa** market spans a broad range of sizes, from a minimum of **25.4 sqm** to a maximum of **1,393.55 sqm**, yet the middle half of Villas sits in a narrow band between roughly 231 and 234 sqm. Most Villas are therefore close to 232 sqm, with a small number of much smaller or much larger properties at the extremes.

    - The agent can highlight this range of options, catering to both mid-market buyers and those seeking high-end, large Villas.

4. **Appealing to Different Buyer Segments**:

    - The contrast in sizes between Villas and Units highlights the opportunity to target different buyer segments. Villas can be marketed to those seeking more space, privacy, and luxury, while Units can be promoted to individuals or investors seeking affordable, low-maintenance properties in high-demand locations.

    - This insight can help the agent design more effective marketing strategies for different property types, addressing the needs of diverse buyers in the real estate market.
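
The size comparison can also be read from the quartiles alone. The sketch below computes the interquartile range (IQR) for each property type from the summary above; the column names are illustrative.

```python
import pandas as pd

# Quartiles copied from the describe() output above (sqm)
quartiles = pd.DataFrame(
    {
        "q1": [35.880, 231.395],
        "median": [54.71, 232.26],
        "q3": [82.770, 233.625],
    },
    index=["Unit", "Villa"],
)

# Width of the middle 50% of sizes, in sqm and relative to the median
quartiles["iqr"] = (quartiles["q3"] - quartiles["q1"]).round(3)
quartiles["iqr_over_median"] = (quartiles["iqr"] / quartiles["median"]).round(3)
print(quartiles)
```

A small IQR next to a large overall range indicates that the spread comes from a few extreme properties rather than from typical ones.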

In [136]:
transactions_satwa_filtered_5y_no_outliers.to_csv("satwa_transactions_cleaned.csv", index=False)

**1.4 Price Distribution for Units vs. Villas**

Lastly, we’ll plot the distribution of actual property worth for Units and Villas.

In [137]:
# Price distribution by property type
fig_price_distribution = px.box(transactions_satwa_filtered_5y_no_outliers, 
                                x='property_type_en', 
                                y='actual_worth', 
                                title="Price Distribution by Property Type (Last 5 Years)", 
                                labels={'property_type_en': 'Property Type', 'actual_worth': 'Property Worth (AED)'})
fig_price_distribution.write_image(f"{output_dir}/price_distribution_by_property_type_5_years.png")
fig_price_distribution.show()

# Price statistical summary by property type
print("Price Statistical Summary by Property Type (Last 5 Years):")
print(transactions_satwa_filtered_5y_no_outliers.groupby('property_type_en')['actual_worth'].describe())

Price Statistical Summary by Property Type (Last 5 Years):
                  count          mean           std       min        25%  \
property_type_en                                                           
Unit              110.0  1.369903e+06  3.946362e+05  838880.0  1036260.0   
Villa              96.0  3.987438e+06  2.778937e+06  306000.0  2493750.0   

                        50%        75%         max  
property_type_en                                    
Unit              1192000.0  1671902.5   2750000.0  
Villa             3000000.0  5312500.0  15595000.0  


**Price Distribution by Property Type (Last 5 Years) Insights**


1. **Villas**:

- Average worth (mean): AED 3.99 million.

- Villas show a wide range of property values, with a minimum of AED 306,000 and a maximum of AED 15.6 million. The 25th percentile sits at AED 2.49 million, while the median (50th percentile) is AED 3 million, so half of the villas sold for AED 3 million or more.

- The standard deviation is quite high (AED 2.78 million), indicating significant variability in villa prices, with a few extremely high-value properties driving the variance.

2. **Units**:

- Average worth (mean): AED 1.37 million.

- The range for units is narrower compared to villas, with prices starting at AED 838,880 and reaching a maximum of AED 2.75 million. The median (50th percentile) price is AED 1.19 million.

- The standard deviation (AED 394,636) is much smaller than for villas, showing more consistency in unit pricing, and no extreme outliers like we see with villas.

**Strategic Insight for the Real Estate Agent**:

1. **Villa Market**:

- Villas cater to higher-value investors, with a substantial portion priced over AED 3 million. The higher variability and the presence of a few luxury properties (above AED 10 million) may indicate that luxury villas in prime locations represent a premium market segment. Targeting high-net-worth buyers would be key for this segment.

- The range of pricing suggests flexibility for both mid-tier buyers and those looking for high-end, exclusive properties.

2. **Unit Market**:

- Units offer a more stable and predictable investment opportunity, with the middle half of sales priced between about AED 1.04 million and AED 1.67 million. This price range is attractive for mid-tier investors, expatriates, and buyers looking for rental properties.

- The tighter price distribution means that buyers have less risk in terms of price volatility, making units a safer investment compared to villas.

3. **Marketing Strategy**:

- Villas should be marketed as luxury or premium investments, particularly highlighting the unique and upscale features of high-end properties.

- Units should focus on value for money, return on investment, and predictability, especially appealing to expatriates or investors seeking rental income.

This analysis would provide useful guidance for tailoring marketing and sales strategies to the needs of different client segments based on their budget and investment goals.
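
One way to act on the segmentation idea is to bucket sales into price bands. The following is a minimal sketch using `pd.cut`; the band edges and labels are hypothetical and chosen only for illustration, and the prices are a handful of Villa rows shown earlier in this notebook rather than the full dataset.

```python
import pandas as pd

# Villa prices (AED) copied from rows displayed earlier in this notebook
villa_worth = pd.Series(
    [2021810.0, 4100000.0, 8500000.0, 3050000.0,
     3000000.0, 2800000.0, 2125000.0, 3600000.0],
    name="actual_worth",
)

# Hypothetical band edges; each band includes its upper edge
bins = [0, 2_500_000, 5_000_000, float("inf")]
labels = ["under 2.5M", "2.5M to 5M", "over 5M"]
bands = pd.cut(villa_worth, bins=bins, labels=labels)

# Number of sales per band, in band order
band_counts = villa_worth.groupby(bands, observed=False).size()
print(band_counts)
```

The same bands could be applied to the cleaned transactions and cross-tabulated with `property_type_en` to size each client segment.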

In [138]:
rent_contracts = pd.read_csv("../data/raw/rent_contracts.csv")

In [139]:
rent_contracts.head()

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_ar,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_ar,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_type_ar,ejari_property_sub_type_id,ejari_property_sub_type_en,ejari_property_sub_type_ar,property_usage_en,property_usage_ar,project_number,project_name_ar,project_name_en,master_project_ar,master_project_en,area_id,area_name_ar,area_name_en,actual_area,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,tenant_type_id,tenant_type_ar,tenant_type_en
0,CRT1012981266,1,جديد,New,07-04-2019,06-04-2020,85000,85000,1,1,1.0,2.0,وحدة,Unit,2.0,Office,مكتب,422.0,Office,مكتب,Commercial,تجاري,467.0,إمباير هايتس,EMPIRE HEIGHTS,الخليج التجاري,Business Bay,526.0,الخليج التجارى,Business Bay,140.0,وسط مدينة دبي,Downtown Dubai,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person
1,CRT1012983196,1,جديد,New,20-04-2019,19-04-2020,110000,110000,1,1,1.0,4.0,فيلا,Villa,841.0,Villa,فيلا,2.0,2 bed rooms+hall,غرفتين و صالة,Residential,سكني,,,,قرية جميرا المثلثة,Jumeirah Village Triangle,442.0,البرشاء جنوب الخامسة,Al Barsha South Fifth,734.0,أكاديمية المدينة الرياضية للسباحة,Sports City Swimming Academy,محطة مترو النخيل,Nakheel Metro Station,مارينا مول,Marina Mall,1.0,شخص,Person
2,CRT1012984226,1,جديد,New,11-04-2019,10-04-2020,100000,100000,1,1,1.0,4.0,فيلا,Villa,841.0,Villa,فيلا,3.0,3 bed rooms+hall,ثلاثة غرفة و صالة,Residential,سكني,1488.0,ريم - ميرا أوسيس كوميونتي,REEM - MIRA OASIS COMMUNITY,,,506.0,اليلايس 1,Al Yelayiss 1,324.0,دورة دبي للدراجات,Dubai Cycling Course,,,,,1.0,شخص,Person
3,CRT1012984996,2,تجديد,Renew,18-03-2019,17-03-2020,150000,150000,1,1,1.0,4.0,فيلا,Villa,841.0,Villa,فيلا,3.0,3 bed rooms+hall,ثلاثة غرفة و صالة,Residential,سكني,1377.0,أرابيان رانشز - مجمع بالما,ARABIAN RANCHES - PALMA COMMUNITY,المرابع العربية 2 - بالما,Arabian Ranches II - PALMA,463.0,وادي الصفا 7,Wadi Al Safa 7,405.0,موتور سيتي,Motor City,,,,,1.0,شخص,Person
4,CRT1012986616,1,جديد,New,15-04-2019,14-04-2020,95000,95000,1,1,1.0,2.0,وحدة,Unit,842.0,Flat,شقه,1.0,1bed room+Hall,غرفة و صالة,Residential,سكني,,,,جميرا بيتش ريزيدنس - الجيه بي آر,Jumeriah Beach Residence - JBR,330.0,مرسى دبي,Marsa Dubai,103.0,برج العرب,Burj Al Arab,مساكن شاطئ جميرا,Jumeirah Beach Residency,مارينا مول,Marina Mall,,,


In [140]:
rent_contracts_satwa = rent_contracts[rent_contracts['area_name_en'] == 'Al Satwa']

print("Shape of Rent Contracts Dataset:", rent_contracts.shape)
print("Shape of Rent Contracts in Al Satwa:", rent_contracts_satwa.shape)

Shape of Rent Contracts Dataset: (7810167, 40)
Shape of Rent Contracts in Al Satwa: (35305, 40)


In [141]:
print("Shape of Rent Contracts in Al Satwa:", rent_contracts_satwa.shape)

rent_contracts_satwa = rent_contracts_satwa[rent_contracts_satwa['property_usage_en'] == 'Residential']

print("Shape of Rent Contracts in Al Satwa After filtering:", rent_contracts_satwa.shape)

Shape of Rent Contracts in Al Satwa: (35305, 40)
Shape of Rent Contracts in Al Satwa After filtering: (24663, 40)


In [142]:
rent_contracts_satwa = rent_contracts_satwa.drop(columns=['property_usage_en', 'property_usage_ar'])

In [143]:
rent_contracts_satwa = rent_contracts_satwa.drop(columns=['area_id', 'area_name_en', 'area_name_ar'])

In [144]:
rent_contracts_satwa.head()

Unnamed: 0,contract_id,contract_reg_type_id,contract_reg_type_ar,contract_reg_type_en,contract_start_date,contract_end_date,contract_amount,annual_amount,no_of_prop,line_number,is_free_hold,ejari_bus_property_type_id,ejari_bus_property_type_ar,ejari_bus_property_type_en,ejari_property_type_id,ejari_property_type_en,ejari_property_type_ar,ejari_property_sub_type_id,ejari_property_sub_type_en,ejari_property_sub_type_ar,project_number,project_name_ar,project_name_en,master_project_ar,master_project_en,actual_area,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,tenant_type_id,tenant_type_ar,tenant_type_en
1226,CRT1014406536,1,جديد,New,04-04-2019,03-04-2020,43000,43000,1,1,0.0,2.0,وحدة,Unit,842.0,Flat,شقه,11.0,Studio,أستوديو,,,,,,23.0,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person
1513,CRT1015217806,1,جديد,New,08-04-2019,07-05-2020,160000,160000,1,1,0.0,4.0,فيلا,Villa,841.0,Villa,فيلا,7.0,7 bed rooms+hall,سبعة غرف و صالة,,,,,,,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,,,
1689,CRT1015412266,2,تجديد,Renew,12-02-2019,11-02-2020,96000,96000,1,1,0.0,4.0,فيلا,Villa,841.0,Villa,فيلا,4.0,4 bed rooms+hall,اربعة غرف و صالة,,,,,,,برج خليفة,Burj Khalifa,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,,,
2390,CRT1017677636,2,تجديد,Renew,01-01-2019,31-12-2019,36000,36000,1,1,0.0,2.0,وحدة,Unit,842.0,Flat,شقه,1.0,1bed room+Hall,غرفة و صالة,,,,,,,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,,,
3527,CRT1019228486,1,جديد,New,01-05-2019,30-04-2020,200000,200000,1,1,0.0,4.0,فيلا,Villa,841.0,Villa,فيلا,4.0,4 bed rooms+hall,اربعة غرف و صالة,,,,,,0.0,برج خليفة,Burj Khalifa,محطة مترو بوج خليفة دبي مول,Buj Khalifa Dubai Mall Metro Station,مول دبي,Dubai Mall,1.0,شخص,Person


In [146]:
rent_contracts_satwa['contract_start_date'] = pd.to_datetime(rent_contracts_satwa['contract_start_date'], format="%d-%m-%Y")
rent_contracts_satwa['contract_end_date'] = pd.to_datetime(rent_contracts_satwa['contract_end_date'], format="%d-%m-%Y")

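print("Shape of Rent Contracts in Al Satwa:", rent_contracts_satwa.shape)
# Keep contracts starting on or after 1 January 2019; the filtered frame must be assigned back to take effect
rent_contracts_satwa = rent_contracts_satwa[rent_contracts_satwa['contract_start_date'] >= pd.Timestamp("2019-01-01")]
print("Shape of Rent Contracts in Al Satwa After filtering:", rent_contracts_satwa.shape)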

Shape of Rent Contracts in Al Satwa: (24663, 35)
Shape of Rent Contracts in Al Satwa After filtering: (24663, 35)
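
Boolean indexing in pandas returns a new DataFrame and leaves the original untouched, so a date filter only changes the row count if its result is bound to a name. The sketch below illustrates this on toy contracts; the values and the `cutoff` variable are assumptions for illustration.

```python
import pandas as pd

# Toy contracts in the same dd-mm-YYYY text format as the raw file (illustrative values)
contracts = pd.DataFrame({
    "contract_start_date": ["04-04-2019", "15-11-2018", "01-01-2019", "31-12-2018"],
    "annual_amount": [43000, 52000, 36000, 61000],
})

# Parse day-first dates explicitly so 04-04 vs 15-11 is never ambiguous
contracts["contract_start_date"] = pd.to_datetime(
    contracts["contract_start_date"], format="%d-%m-%Y"
)

# An unambiguous ISO cutoff avoids day/month confusion in the comparison
cutoff = pd.Timestamp("2019-01-01")
recent = contracts[contracts["contract_start_date"] >= cutoff]

print("Before filtering:", contracts.shape)
print("After filtering:", recent.shape)
```

Here `contracts` keeps all four rows while `recent` holds only those starting on or after the cutoff.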


In [147]:
transactions.head()

Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_ar,trans_group_en,procedure_name_ar,procedure_name_en,instance_date,property_type_id,property_type_ar,property_type_en,property_sub_type_id,property_sub_type_ar,property_sub_type_en,property_usage_ar,property_usage_en,reg_type_id,reg_type_ar,reg_type_en,area_id,area_name_ar,area_name_en,building_name_ar,building_name_en,project_number,project_name_ar,project_name_en,master_project_en,master_project_ar,nearest_landmark_ar,nearest_landmark_en,nearest_metro_ar,nearest_metro_en,nearest_mall_ar,nearest_mall_en,rooms_ar,rooms_en,has_parking,procedure_area,actual_worth,meter_sale_price,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3
0,1-11-2018-8205,11,1,مبايعات,Sales,بيع,Sell,13-08-2018,4,فيلا,Villa,,,,أخرى,Other,1,العقارات القائمة,Existing Properties,278,منخول,Mankhool,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو بنك أبوظبي التجاري,ADCB Metro Station,مول دبي,Dubai Mall,,,0,34.41,165000.0,4795.12,,,1.0,2.0,0.0
1,1-11-2016-12930,11,1,مبايعات,Sales,بيع,Sell,02-11-2016,4,فيلا,Villa,,,,سكني,Residential,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو أبراج الإمارات,Emirates Towers Metro Station,مول دبي,Dubai Mall,,,0,390.0,2089900.0,5358.72,,,1.0,1.0,0.0
2,1-11-2016-13524,11,1,مبايعات,Sales,بيع,Sell,15-11-2016,4,فيلا,Villa,,,,أخرى,Other,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو أبراج الإمارات,Emirates Towers Metro Station,مول دبي,Dubai Mall,,,0,278.71,2800000.0,10046.28,,,1.0,1.0,0.0
3,2-13-2014-4939,13,2,رهون,Mortgages,تسجيل رهن,Mortgage Registration,23-06-2014,4,فيلا,Villa,,,,تجاري,Commercial,1,العقارات القائمة,Existing Properties,276,البدع,Al Bada,,,,,,,,برج خليفة,Burj Khalifa,محطة مترو المركز التجاري,Trade Centre Metro Station,مول دبي,Dubai Mall,,,0,16952.94,12000000.0,707.84,,,1.0,1.0,0.0
4,1-11-2002-81,11,1,مبايعات,Sales,بيع,Sell,14-01-2002,2,مبنى,Building,,,,تجاري,Commercial,1,العقارات القائمة,Existing Properties,271,الكرامه,Al Karama,,,,,,,,مطار دبي الدولي,Dubai International Airport,محطة مترو بنك أبوظبي التجاري,ADCB Metro Station,مول دبي,Dubai Mall,,,0,232.26,1500000.0,6458.28,,,2.0,1.0,0.0
