## Delhi Indian Housing Exploratory Data Analysis by Dumisani Maxwell Mukuchura

#### Dataset Source: *https://www.kaggle.com/datasets/bhavyadhingra00020/india-rental-house-price*

##### Contact: dumisanimukuchura@gmail.com LinkedIn: https://www.linkedin.com/in/dumisani-maxwell-mukuchura-4859b7170/

##### This project explores the Delhi Indian Housing dataset to understand which factors influence rental price

In [1]:
#Import standard modules and libraries to use

import os #For file and directory operations

#Data Libraries

import pandas as pd
import numpy as np

#Visualization Libraries
import matplotlib.pyplot as plt
import plotly.express as px

### Defining the file path and importing 'Indian_housing_Delhi_data.csv'

In [15]:
#Define the file path and import 'Indian_housing_Delhi_data.csv'
# Get the current working directory
current_dir = os.getcwd()

# Go one level up to the parent directory
base_dir = os.path.dirname(current_dir)

# Construct the path to the data folder
data_dir = os.path.join(base_dir, "data")

# Construct the full path to the CSV file
csv_file_path = os.path.join(data_dir, "Indian_housing_Delhi_data.csv")

# Read the CSV file into a DataFrame
housing_data = pd.read_csv(csv_file_path)

#Make a Copy and Maintain the original dataset as is 
housing_df = housing_data.copy()

#Check if import was successful with a head() check
housing_data.head()

Unnamed: 0,house_type,house_size,location,city,latitude,longitude,price,currency,numBathrooms,numBalconies,isNegotiable,priceSqFt,verificationDate,description,SecurityDeposit,Status
0,1 RK Studio Apartment,400 sq ft,Kalkaji,Delhi,28.545561,77.254349,22000,INR,1.0,,,,Posted a day ago,"Fully furnished, loaded with amenities & gadge...",No Deposit,Furnished
1,1 RK Studio Apartment,400 sq ft,Mansarover Garden,Delhi,28.643259,77.132828,20000,INR,1.0,,,,Posted 9 days ago,Here is an excellent 1 BHK Independent Floor a...,No Deposit,Furnished
2,2 BHK Independent Floor,500 sq ft,Uttam Nagar,Delhi,28.618677,77.053352,8500,INR,1.0,,,,Posted 12 days ago,"Zero Brokerage.\n\n2 Room set, Govt bijali Met...",No Deposit,Semi-Furnished
3,3 BHK Independent House,"1,020 sq ft",Model Town,Delhi,28.712898,77.18,48000,INR,3.0,,,,Posted a year ago,It's a 3 bhk independent house situated in M...,No Deposit,Furnished
4,2 BHK Apartment,810 sq ft,Sector 13 Rohini,Delhi,28.723539,77.131424,20000,INR,2.0,,,,Posted a year ago,Well designed 2 bhk multistorey apartment is a...,No Deposit,Unfurnished
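
The preview shows `house_size` stored as text such as `400 sq ft` and `1,020 sq ft`, so it cannot be used directly in numeric comparisons against `price`. The sketch below is one possible way to extract a numeric area. It assumes every value follows the "number followed by sq ft" pattern visible in these five rows, which has not been verified for the full dataset. The helper name `parse_square_feet` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def parse_square_feet(size_text: pd.Series) -> pd.Series:
    """Convert strings like '1,020 sq ft' to numeric square feet."""
    # Capture the leading number, allowing thousands separators
    digits = size_text.str.extract(r"([\d,]+)", expand=False)
    # Remove commas, then coerce anything unparseable to NaN
    return pd.to_numeric(digits.str.replace(",", "", regex=False), errors="coerce")


# Values copied from the head() preview
demo_sizes = pd.Series(["400 sq ft", "500 sq ft", "1,020 sq ft", "810 sq ft"])
demo_sqft = parse_square_feet(demo_sizes)
print(demo_sqft.tolist())
```

Using `errors="coerce"` means any value that does not match the assumed pattern becomes NaN rather than raising an error, so unexpected formats can be counted afterwards with `isnull().sum()`.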


### Basic Exploration to understand the Dataset

In [16]:
#Data types and non-null values

housing_df.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   house_type        5000 non-null   object 
 1   house_size        5000 non-null   object 
 2   location          5000 non-null   object 
 3   city              5000 non-null   object 
 4   latitude          5000 non-null   float64
 5   longitude         5000 non-null   float64
 6   price             5000 non-null   int64  
 7   currency          5000 non-null   object 
 8   numBathrooms      4975 non-null   float64
 9   numBalconies      2737 non-null   float64
 10  isNegotiable      179 non-null    object 
 11  priceSqFt         0 non-null      float64
 12  verificationDate  5000 non-null   object 
 13  description       4715 non-null   object 
 14  SecurityDeposit   5000 non-null   object 
 15  Status            5000 non-null   object 
dtypes: float64(5), int64(1), object(10)
memory

In [10]:
#Dimensions of the DataFrame

housing_df.shape

(5000, 16)

In [11]:
#Statistical Summary
housing_df.describe()

Unnamed: 0,latitude,longitude,price,numBathrooms,numBalconies,priceSqFt
count,5000.0,5000.0,5000.0,4975.0,2737.0,0.0
mean,28.578012,77.174499,222173.8,2.918593,1.95433,
std,0.190186,0.115636,273984.3,1.087823,0.547219,
min,20.011379,72.771332,3000.0,1.0,1.0,
25%,28.544489,77.138248,29500.0,2.0,2.0,
50%,28.569295,77.196472,125000.0,3.0,2.0,
75%,28.618687,77.22895,301102.0,4.0,2.0,
max,28.805466,80.361313,3010101.0,10.0,8.0,


In [12]:
#Check for missing values

housing_df.isnull().sum()

house_type             0
house_size             0
location               0
city                   0
latitude               0
longitude              0
price                  0
currency               0
numBathrooms          25
numBalconies        2263
isNegotiable        4821
priceSqFt           5000
verificationDate       0
description          285
SecurityDeposit        0
Status                 0
dtype: int64

In [14]:
#Identifying unique value counts for each column 

housing_df.nunique()

house_type            28
house_size           339
location             288
city                   1
latitude             993
longitude            962
price                709
currency               1
numBathrooms          10
numBalconies           7
isNegotiable           1
priceSqFt              0
verificationDate      52
description         3820
SecurityDeposit      646
Status                 3
dtype: int64
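
The unique counts show that some columns have 0 or 1 distinct values (`priceSqFt`, `city`, `currency`, `isNegotiable`), so they cannot explain differences in price on their own. A minimal sketch for flagging such columns follows. The helper name `low_information_columns` and the demo frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def low_information_columns(df: pd.DataFrame) -> dict:
    """Group columns that carry little or no information for analysis."""
    # nunique() ignores NaN by default, so an all-NaN column reports 0
    unique_counts = df.nunique()
    return {
        "empty": unique_counts[unique_counts == 0].index.tolist(),
        "constant": unique_counts[unique_counts == 1].index.tolist(),
    }


# Small illustrative frame (made-up values, not the housing data)
demo_df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Delhi"],
    "price": [22000, 20000, 8500],
    "priceSqFt": [np.nan, np.nan, np.nan],
})
flagged = low_information_columns(demo_df)
print(flagged)
```

A column with a single non-null value plus many NaN entries, as `isNegotiable` appears to be, would also be listed as constant here, so it deserves a closer look before being dropped.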

### Thoughts from initial basic exploration of DataFrame 'housing_df'

- The DataFrame has 5000 rows (index 0 to 4999) and 16 columns
- The shape of the DataFrame is therefore (5000, 16)
- 5 columns have missing (NaN) values: 'numBathrooms' (25), 'numBalconies' (2263), 'isNegotiable' (4821), 'priceSqFt' (all 5000) and 'description' (285)
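
Raw missing counts are easier to compare as a share of all rows. The sketch below shows one way to build such a summary. The helper name `missing_summary` and the demo frame are hypothetical and use made-up values rather than the housing data.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary["missing_count"] > 0].sort_values(
        "missing_count", ascending=False
    )


# Small illustrative frame (made-up values, not the housing data)
demo_df = pd.DataFrame({
    "price": [22000, 20000, 8500, 48000],
    "numBalconies": [np.nan, 2.0, np.nan, 1.0],
    "priceSqFt": [np.nan, np.nan, np.nan, np.nan],
})
demo_summary = missing_summary(demo_df)
print(demo_summary)
```

In the notebook this could be called as `missing_summary(housing_df)`. Based on the counts reported by `isnull().sum()`, 'priceSqFt' would come out at 100% missing and 'isNegotiable' at 96.42% (4821 of 5000).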