## Phase 1: Problem Understanding & Data Exploration

### 1. Introduction
This notebook analyzes real estate rental prices and prepares the dataset for machine learning.

### 2. Goal of the Dataset
This dataset supports the analysis and prediction of real estate rental prices from key property attributes. It includes details such as:
- **Property Type (e.g., Studio, Apartment, Villa)**
- **City and Neighborhood**
- **Rental Price per Night**
- **Property Size (sqm)**
- **Ratings and Number of Reviews**

This dataset is useful for:
- **Real estate market analysis**
- **Rental price estimation**
- **Investment decision-making**
- **Understanding factors affecting rental costs**

### 3. Dataset Source
- **Source:** [Kaggle - Real Estate Rental Prices](https://www.kaggle.com/datasets/mouathalmansour/real-estate-rental-prices?resource=download)
- **Description:** This dataset was collected from online rental listings and provides detailed information about properties available for rent.

### 4. General Information
- **Number of Observations (Rows):** **16,913** listings
- **Number of Features (Columns):** **10**, including an unnamed index column (`Unnamed: 0`)
- **Types of Variables:**
  - **Categorical:** Property type, city, neighborhood
  - **Numerical:** Price per night (loaded as text, so it needs conversion), property size, rating, number of reviews
  - **Text:** Property name

### Dataset Columns & Data Types:
| Column Name        | Type         | Description |
|-------------------|-------------|-------------|
| Unnamed: 0 | Numerical | Row index stored in the CSV (0 to 16912) |
| التصنيف  | Categorical | Property type (e.g., "Studio", "Apartment") |
| المدينة | Categorical | City where the property is located |
| الحي | Categorical | Neighborhood within the city |
| اسم العقار | Text | Name of the rental property |
| سعر الليلة | Numerical (loaded as `object`) | Rental price per night (in local currency) |
| المساحة | Numerical | Property size in square meters (sqm) |
| عدد المقيمين | Numerical | Number of users who rated the property (number of reviews) |
| التقييم | Numerical | Average user rating of the property (0 to 10) |
| الرقم | Numerical | Listing number (ranges from 0 to 2387, so it is not unique across rows) |
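
Since the raw headers are in Arabic, it may be convenient to map them to English identifiers before modeling. The sketch below is one way to do that; the English names are hypothetical labels chosen for illustration and are not part of the dataset, and reading `عدد المقيمين` as a review count is an assumption.

```python
import pandas as pd

# Hypothetical English names (illustrative only, not defined by the dataset).
COLUMN_NAMES = {
    "التصنيف": "property_type",
    "المدينة": "city",
    "الحي": "neighborhood",
    "اسم العقار": "property_name",
    "سعر الليلة": "price_per_night",
    "المساحة": "area_sqm",
    "عدد المقيمين": "review_count",  # assumption: interpreted as number of reviewers
    "التقييم": "rating",
    "الرقم": "property_id",
}


def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the Arabic headers mapped to English names."""
    return df.rename(columns=COLUMN_NAMES)


# Tiny stand-in frame built from the first row of the dataset
sample = pd.DataFrame({"المدينة": ["العلا"], "سعر الليلة": ["250"], "المساحة": [40]})
renamed = rename_columns(sample)
print(list(renamed.columns))
```

Columns missing from the mapping are left unchanged by `DataFrame.rename`, so the same helper can be applied to the full frame.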


### 5. Dataset Summary

In [2]:
# Sample data preview

import pandas as pd

# Load dataset
df = pd.read_csv("/content/real_estate_rental_prices.csv")  # Adjust path if needed

# Show first 5 rows
df.head()


#### Check Dataset Structure

# Display dataset structure
df.info()

# Check for missing values
df.isnull().sum()


#### Summary Statistics

df.describe()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16913 entries, 0 to 16912
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    16913 non-null  int64  
 1   الرقم         16913 non-null  int64  
 2   التقييم       16913 non-null  float64
 3   عدد المقيمين  16913 non-null  int64  
 4   المساحة       16913 non-null  int64  
 5   اسم العقار    16913 non-null  object 
 6   الحي          16913 non-null  object 
 7   سعر الليلة    16913 non-null  object 
 8   المدينة       16913 non-null  object 
 9   التصنيف       16913 non-null  object 
dtypes: float64(1), int64(4), object(5)
memory usage: 1.3+ MB
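
The structure above reports `سعر الليلة` (price per night) as `object`, so it is excluded from numeric summaries until it is converted. A minimal sketch of such a conversion follows; the helper name `to_numeric_price` is hypothetical, the sample values `"1,200"` and `"n/a"` are invented to show the behavior, and the presence of thousands separators in the real file is an assumption.

```python
import pandas as pd


def to_numeric_price(prices: pd.Series) -> pd.Series:
    """Convert text prices to floats; entries that cannot be parsed become NaN."""
    # Assumption: commas, if present, are thousands separators.
    cleaned = prices.astype(str).str.replace(",", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")


# Illustrative values only; "1,200" and "n/a" are made up for this example
sample = pd.Series(["250", "1,200", " 799 ", "n/a"], name="سعر الليلة")
converted = to_numeric_price(sample)
print(converted.dtype)
print(converted.isna().sum())
```

Counting the `NaN` values after conversion is a quick way to see how many raw entries did not parse and may need manual inspection.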


|       | Unnamed: 0 | الرقم | التقييم | عدد المقيمين | المساحة |
|-------|------------|-------|---------|--------------|---------|
| count | 16913.0 | 16913.0 | 16913.0 | 16913.0 | 16913.0 |
| mean | 8456.0 | 732.85993 | 7.62601 | 21.408207 | 564.237805 |
| std | 4882.506887 | 647.153886 | 3.579511 | 35.302708 | 3880.00216 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| 25% | 4228.0 | 197.0 | 7.8 | 1.0 | 35.0 |
| 50% | 8456.0 | 540.0 | 9.3 | 8.0 | 60.0 |
| 75% | 12684.0 | 1090.0 | 9.9 | 25.0 | 300.0 |
| max | 16912.0 | 2387.0 | 10.0 | 443.0 | 110000.0 |
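
In the summary above, `المساحة` has a 75th percentile of 300 but a maximum of 110000, which suggests extreme values worth checking before modeling. One common way to flag them is the interquartile range (IQR) rule, sketched below. The helper name `iqr_outlier_mask` and the multiplier `k=1.5` are assumptions, and the stand-in series only combines the five areas from the first rows with the reported maximum.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Stand-in data: areas of the first five rows plus the maximum from describe()
areas = pd.Series([40, 3000, 1000, 400, 3000, 110000], name="المساحة")
mask = iqr_outlier_mask(areas)
print(areas[mask].tolist())
```

Whether flagged rows should be dropped, capped, or kept depends on whether they are data-entry errors or genuine large properties, which the summary alone cannot settle.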


In [3]:
# Display first 5 rows
df.head()


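|   | Unnamed: 0 | الرقم | التقييم | عدد المقيمين | المساحة | اسم العقار | الحي | سعر الليلة | المدينة | التصنيف |
|---|------------|-------|---------|--------------|---------|------------|------|------------|---------|---------|
| 0 | 0 | 0 | 10.0 | 7 | 40 | استديو بسرير ماستر وجلسة | حي العزيزية | 250 | العلا | استديو |
| 1 | 1 | 1 | 9.2 | 6 | 3000 | استديو بسريرين فردية وبأثاث بسيط | العذيب | 280 | العلا | استديو |
| 2 | 2 | 2 | 10.0 | 43 | 1000 | شقة بغرفة معيشة وغرفتين نوم | حي العزيزية | 400 | العلا | شقة |
| 3 | 3 | 3 | 9.4 | 4 | 400 | استراحة بصالة جلوس وغرفتين نوم | حي المعتدل | 799 | العلا | استراحة |
| 4 | 4 | 4 | 9.6 | 29 | 3000 | شقة بغرفة جلوس وغرفة نوم | جنوب المستشفى | 550 | العلا | شقة |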
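
The preview shows an `Unnamed: 0` column that simply repeats the row index, which is what pandas produces when a CSV's first column has no header. A minimal sketch of two ways to handle it follows; the in-memory CSV text is a stand-in built from two preview rows, since the layout of the real file is assumed rather than confirmed.

```python
import io

import pandas as pd

# Stand-in CSV text: the first column has no header, as assumed for the real file
csv_text = ",الرقم,التقييم\n0,0,10.0\n1,1,9.2\n"

# Option 1: treat the first column as the index while reading
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: read as-is, then drop the redundant column
df_raw = pd.read_csv(io.StringIO(csv_text))
df_dropped = df_raw.drop(columns=["Unnamed: 0"])

print(list(df_indexed.columns))
print(list(df_dropped.columns))
```

Either option leaves only the substantive columns, so the index copy is not mistaken for a feature during modeling.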
