## NPC Real Estate Data Exploration
### By Okechukwu Victory

### Introduction
> Nigeria Property Centre (NPC) is a Nigerian real estate website with property listings for sale, rent and lease. It offers Nigerian property seekers an easy way to find details of properties such as homes, houses, land, shops, office spaces and other commercial properties to buy or rent, and it provides a platform for organisations and private Nigerian property owners to advertise their property.

> The dataset consists of houses for sale, obtained by web scraping the Nigeria Property Centre website. It contains 5365 house listings and 4 variables: Title, Date, Price and Address.

In [1]:
# Import the necessary libraries
import numpy as np
import pandas as pd
import seaborn as sb
import datetime
import matplotlib.pyplot as plt

In [2]:
# Loading the dataset
df = pd.read_csv('NPC_data.csv')

In [3]:
# Creating a copy for data cleaning
df_clean = df.copy()

### Data cleaning
1. Dropping duplicate and null values
2. Removing unwanted characters from the Price column and converting it to an integer
3. Extracting the major districts of Abuja from the Address column
4. Cleaning up the Date column: replacing "Today" and "Yesterday" with actual dates and converting it to datetime
5. Extracting the number of bedrooms from the Title
6. Extracting the property type from the Title

In [4]:
df_clean.head()

|   | Title | Date | Price | Address |
|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21 Aug 2021 | ₦42,500,000 | Get Your Keys With Just 50% Initial Deposit/m... |
| 1 | 6 bedroom house for sale | 01 May 2022 | ₦400,000,000 | Apo, Abuja |
| 2 | Block of flats for sale | 12 Sep 2022 | ₦190,000,000 | Behind Sandralina Hotel, Jabi, Abuja |
| 3 | 3 bedroom terraced duplex for sale | Today | ₦56,000,000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... |
| 4 | 3 bedroom house for sale | Today | ₦75,000,000 | Asokoro District, Abuja |


#### Dropping duplicate and null values

In [5]:
# Checking the shape
df_clean.shape

(5365, 4)

In [6]:
# Dropping duplicate rows and renumbering the index from 0
df_clean = df_clean.drop_duplicates(ignore_index=True)
df_clean.shape

(5020, 4)

In [7]:
# Counting non-null values per column; every column has 5020, so nothing is missing
df_clean.notnull().sum()

Title      5020
Date       5020
Price      5020
Address    5020
dtype: int64
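`notnull().sum()` counts the non-missing values in each column, so a column is complete when its count equals the number of rows (5020 here). Counting missing values with `isna().sum()` is a slightly more direct check, and `dropna()` would remove incomplete rows if any existed. A minimal sketch on a small hypothetical DataFrame (the `toy` data is invented for illustration and is not from the NPC dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are exact duplicates, row 2 has no price
toy = pd.DataFrame({
    'Title': ['3 bedroom house for sale', '3 bedroom house for sale', 'Block of flats for sale'],
    'Price': ['₦75,000,000', '₦75,000,000', np.nan],
})

# Drop exact duplicate rows and renumber the index from 0
deduped = toy.drop_duplicates(ignore_index=True)

# Count missing values per column; 0 means the column is complete
missing = deduped.isna().sum()

# Remove rows that still contain a missing value
complete = deduped.dropna().reset_index(drop=True)

print(missing)
print(complete)
```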

#### Removing unwanted characters from the Price column and converting it to an integer

In [8]:
# Checking the dataframe
df_clean.head()

|   | Title | Date | Price | Address |
|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21 Aug 2021 | ₦42,500,000 | Get Your Keys With Just 50% Initial Deposit/m... |
| 1 | 6 bedroom house for sale | 01 May 2022 | ₦400,000,000 | Apo, Abuja |
| 2 | Block of flats for sale | 12 Sep 2022 | ₦190,000,000 | Behind Sandralina Hotel, Jabi, Abuja |
| 3 | 3 bedroom terraced duplex for sale | Today | ₦56,000,000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... |
| 4 | 3 bedroom house for sale | Today | ₦75,000,000 | Asokoro District, Abuja |


In [9]:
# Finding prices quoted per annum
df_clean[df_clean['Price'].str.contains('annum')]

|   | Title | Date | Price | Address |
|---|---|---|---|---|
| 1711 | 4 bedroom detached duplex for sale | 22 Sep 2022 | ₦53,000,000 per annum | Sahara Estate, Lokogoma District, Abuja |


In [10]:
# Removing the ' per annum' suffix, then confirming that no matches remain
df_clean['Price'] = df_clean['Price'].replace(' per annum','',regex=True)
df_clean[df_clean['Price'].str.contains('annum')]

|   | Title | Date | Price | Address |
|---|---|---|---|---|


In [11]:
# Finding prices given in US dollars with an approximate naira equivalent
df_clean[df_clean['Price'].str.contains('approx')].head()

|   | Title | Date | Price | Address |
|---|---|---|---|---|
| 572 | 4 bedroom detached duplex for sale | 12 Sep 2022 | \$1,100,000 approx. ₦789,508,308 | Asokoro District, Abuja |
| 591 | 5 bedroom detached duplex for sale | 20 Aug 2022 | \$1,000,000 approx. ₦717,734,826 | Wuye, Abuja |
| 697 | 8 bedroom detached duplex for sale | 12 Sep 2022 | \$3,700,000 approx. ₦2,655,618,856 | Guzape District, Abuja |
| 769 | 8 bedroom detached duplex for sale | 17 Jul 2022 | \$4,700,000 approx. ₦3,373,353,681 | Asokoro District, Abuja |
| 798 | 8 bedroom detached duplex for sale | 06 Sep 2022 | \$3,700,000 approx. ₦2,655,618,856 | Guzape District, Abuja |


In [12]:
# Keeping only the last whitespace-separated token, i.e. the price in naira
df_clean['Price'] = df_clean['Price'].str.split().str[-1]

In [13]:
# Confirming that no 'approx' prices remain
df_clean[df_clean['Price'].str.contains('approx')].head()

|   | Title | Date | Price | Address |
|---|---|---|---|---|


In [14]:
# Stripping every non-digit character (₦ and commas) from the Price column
df_clean['Price'] = df_clean['Price'].replace('[^0-9]','',regex=True)
df_clean.head()

|   | Title | Date | Price | Address |
|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21 Aug 2021 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... |
| 1 | 6 bedroom house for sale | 01 May 2022 | 400000000 | Apo, Abuja |
| 2 | Block of flats for sale | 12 Sep 2022 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja |
| 3 | 3 bedroom terraced duplex for sale | Today | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... |
| 4 | 3 bedroom house for sale | Today | 75000000 | Asokoro District, Abuja |


In [15]:
# Checking data type of price
df_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5020 entries, 0 to 5019
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Title    5020 non-null   object
 1   Date     5020 non-null   object
 2   Price    5020 non-null   object
 3   Address  5020 non-null   object
dtypes: object(4)
memory usage: 157.0+ KB


In [16]:
# Converting Price to a 64-bit integer
df_clean['Price'] = df_clean['Price'].astype('int64')
df_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5020 entries, 0 to 5019
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Title    5020 non-null   object
 1   Date     5020 non-null   object
 2   Price    5020 non-null   int64 
 3   Address  5020 non-null   object
dtypes: int64(1), object(3)
memory usage: 157.0+ KB
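The price steps above can be collapsed into one vectorized chain with pandas' `.str` accessor. This is a sketch that assumes every price ends in a naira figure, as in the samples shown above; `clean_price` is a hypothetical helper name. The ` per annum` suffix has to go first, otherwise the last token of that listing would be `annum`.

```python
import pandas as pd

def clean_price(raw: pd.Series) -> pd.Series:
    """Convert raw NPC price strings to integer naira amounts (hypothetical helper)."""
    return (
        raw.str.replace(' per annum', '', regex=False)   # drop the rental suffix
           .str.split().str[-1]                          # keep the last token, the naira figure
           .str.replace(r'[^0-9]', '', regex=True)       # strip ₦ and thousands separators
           .astype('int64')
    )

# Sample strings taken from the outputs above
sample = pd.Series([
    '₦42,500,000',
    '₦53,000,000 per annum',
    '$1,100,000 approx. ₦789,508,308',
])
print(clean_price(sample).tolist())
# [42500000, 53000000, 789508308]
```

If a listing ever had no digits, the resulting empty string would make `astype('int64')` raise; `pd.to_numeric(..., errors='coerce')` is a gentler alternative that turns such values into NaN instead.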


#### Extracting the major districts of Abuja from the Address column

In [17]:
# Checking data frame
df_clean.head()

|   | Title | Date | Price | Address |
|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21 Aug 2021 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... |
| 1 | 6 bedroom house for sale | 01 May 2022 | 400000000 | Apo, Abuja |
| 2 | Block of flats for sale | 12 Sep 2022 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja |
| 3 | 3 bedroom terraced duplex for sale | Today | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... |
| 4 | 3 bedroom house for sale | Today | 75000000 | Asokoro District, Abuja |


In [18]:
# Taking the second-to-last comma-separated part of the address (the area in Abuja)
# and stripping the space that follows each comma
df_clean['Location'] = df_clean['Address'].str.split(',').str[-2].str.strip()
df_clean.head()

|   | Title | Date | Price | Address | Location |
|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21 Aug 2021 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura |
| 1 | 6 bedroom house for sale | 01 May 2022 | 400000000 | Apo, Abuja | Apo |
| 2 | Block of flats for sale | 12 Sep 2022 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja | Jabi |
| 3 | 3 bedroom terraced duplex for sale | Today | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa |
| 4 | 3 bedroom house for sale | Today | 75000000 | Asokoro District, Abuja | Asokoro District |
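Splitting on `','` keeps the space that follows each comma, so a district that is not the first part of its address comes out with a leading space (for example `' Jabi'` next to `'Apo'`). Such values look identical in a table but count as different keys in `groupby` or `value_counts`. A minimal sketch using three addresses from the table above, comparing the raw and stripped results:

```python
import pandas as pd

# Addresses copied from the table above
addresses = pd.Series([
    'Apo, Abuja',
    'Behind Sandralina Hotel, Jabi, Abuja',
    'Asokoro District, Abuja',
])

# Second-to-last comma-separated part, without stripping
raw = addresses.str.split(',').str[-2]
# Same part with surrounding whitespace removed
clean = raw.str.strip()

print(raw.tolist())    # ['Apo', ' Jabi', 'Asokoro District']
print(clean.tolist())  # ['Apo', 'Jabi', 'Asokoro District']
```

For an address without any comma, `.str[-2]` returns NaN rather than raising, so such listings end up with a missing Location.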


#### Cleaning up the Date column, replacing Today and Yesterday and converting it to datetime

In [19]:
# Checking dataframe
df_clean.head()

|   | Title | Date | Price | Address | Location |
|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21 Aug 2021 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura |
| 1 | 6 bedroom house for sale | 01 May 2022 | 400000000 | Apo, Abuja | Apo |
| 2 | Block of flats for sale | 12 Sep 2022 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja | Jabi |
| 3 | 3 bedroom terraced duplex for sale | Today | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa |
| 4 | 3 bedroom house for sale | Today | 75000000 | Asokoro District, Abuja | Asokoro District |


In [20]:
# Removing the spaces inside each date string, e.g. '21 Aug 2021' becomes '21Aug2021'
df_clean['Date'] = df_clean['Date'].replace(' ','',regex=True)
df_clean.head()

|   | Title | Date | Price | Address | Location |
|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21Aug2021 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura |
| 1 | 6 bedroom house for sale | 01May2022 | 400000000 | Apo, Abuja | Apo |
| 2 | Block of flats for sale | 12Sep2022 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja | Jabi |
| 3 | 3 bedroom terraced duplex for sale | Today | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa |
| 4 | 3 bedroom house for sale | Today | 75000000 | Asokoro District, Abuja | Asokoro District |


In [21]:
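# Replacing 'Today' and 'Yesterday' in the Date column with actual dates
# Note: both are resolved relative to the day this cell runs (02 Oct 2022 in the output),
# so they are only correct if the notebook is run on the day the data was scraped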
TodayDate = datetime.date.today().strftime('%d%b%Y')
YesterdayDate = (datetime.date.today() - datetime.timedelta(days=1)).strftime('%d%b%Y')
df_clean['Date'] = df_clean['Date'].replace('Today',TodayDate,regex=True)
df_clean['Date'] = df_clean['Date'].replace('Yesterday',YesterdayDate,regex=True)
df_clean.head()

|   | Title | Date | Price | Address | Location |
|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 21Aug2021 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura |
| 1 | 6 bedroom house for sale | 01May2022 | 400000000 | Apo, Abuja | Apo |
| 2 | Block of flats for sale | 12Sep2022 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja | Jabi |
| 3 | 3 bedroom terraced duplex for sale | 02Oct2022 | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa |
| 4 | 3 bedroom house for sale | 02Oct2022 | 75000000 | Asokoro District, Abuja | Asokoro District |


In [22]:
# Parsing the date strings into pandas datetime values
df_clean['Date'] = pd.to_datetime(df_clean['Date'],format='%d%b%Y')
df_clean.head()

|   | Title | Date | Price | Address | Location |
|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 2021-08-21 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura |
| 1 | 6 bedroom house for sale | 2022-05-01 | 400000000 | Apo, Abuja | Apo |
| 2 | Block of flats for sale | 2022-09-12 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja | Jabi |
| 3 | 3 bedroom terraced duplex for sale | 2022-10-02 | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa |
| 4 | 3 bedroom house for sale | 2022-10-02 | 75000000 | Asokoro District, Abuja | Asokoro District |
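Because `datetime.date.today()` depends on when the cell runs, re-running the notebook later would shift every "Today" and "Yesterday" listing. A sketch that pins those labels to an explicit scrape date instead, and parses the original `'21 Aug 2021'` form directly with `format='%d %b %Y'` so the spaces need not be stripped first. `parse_listing_dates` and the `'2022-10-02'` scrape date (read off the output above) are assumptions for illustration; `%b` matches abbreviated month names in the current locale, so an English locale is assumed.

```python
import pandas as pd

def parse_listing_dates(dates: pd.Series, scrape_date: str) -> pd.Series:
    """Parse NPC date strings, resolving relative labels against scrape_date (hypothetical helper)."""
    day = pd.Timestamp(scrape_date).normalize()
    fmt = '%d %b %Y'
    # Map the relative labels to absolute dates in the same text format
    relative = {
        'Today': day.strftime(fmt),
        'Yesterday': (day - pd.Timedelta(days=1)).strftime(fmt),
    }
    # Series.replace with a dict swaps whole-value matches only
    return pd.to_datetime(dates.str.strip().replace(relative), format=fmt)

sample = pd.Series(['21 Aug 2021', 'Today', 'Yesterday'])
parsed = parse_listing_dates(sample, scrape_date='2022-10-02')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2021-08-21', '2022-10-02', '2022-10-01']
```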


#### Extracting the number of bedrooms from the Title

In [23]:
# Creating a Bedroom column from the first word of the title
df_clean['Bedroom'] = df_clean['Title'].str.split(' ', expand=True)[0]
df_clean.head()

|   | Title | Date | Price | Address | Location | Bedroom |
|---|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 2021-08-21 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura | 4 |
| 1 | 6 bedroom house for sale | 2022-05-01 | 400000000 | Apo, Abuja | Apo | 6 |
| 2 | Block of flats for sale | 2022-09-12 | 190000000 | Behind Sandralina Hotel, Jabi, Abuja | Jabi | Block |
| 3 | 3 bedroom terraced duplex for sale | 2022-10-02 | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa | 3 |
| 4 | 3 bedroom house for sale | 2022-10-02 | 75000000 | Asokoro District, Abuja | Asokoro District | 3 |


In [24]:
# Checking the value counts of the bedroom variable
df_clean['Bedroom'].value_counts()

4                2130
5                1181
3                 667
6                 368
2                 173
7                 163
8                 118
9                  66
Block              36
10                 32
12                 15
1                  14
24                  5
15                  5
16                  5
40                  4
11                  4
14                  4
20                  3
18                  3
17                  2
35                  2
36                  2
27                  2
60                  2
50                  1
13                  1
31                  1
Detached            1
62                  1
19                  1
300                 1
25                  1
Semi-detached       1
100                 1
30                  1
32                  1
83                  1
56                  1
Name: Bedroom, dtype: int64

In [25]:
# Keeping only houses with 1 to 10 bedrooms (Bedroom still holds strings, so the list holds strings too)
no_of_bedrooms = ['1','2','3','4','5','6','7','8','9','10']
df_clean = df_clean[df_clean['Bedroom'].isin(no_of_bedrooms)]
df_clean = df_clean.reset_index(drop=True)
df_clean.head()

|   | Title | Date | Price | Address | Location | Bedroom |
|---|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 2021-08-21 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura | 4 |
| 1 | 6 bedroom house for sale | 2022-05-01 | 400000000 | Apo, Abuja | Apo | 6 |
| 2 | 3 bedroom terraced duplex for sale | 2022-10-02 | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa | 3 |
| 3 | 3 bedroom house for sale | 2022-10-02 | 75000000 | Asokoro District, Abuja | Asokoro District | 3 |
| 4 | 4 bedroom semi-detached duplex for sale | 2022-08-08 | 75000000 | News Engineering, Dawaki, Gwarinpa, Abuja | Gwarinpa | 4 |


In [26]:
df_clean['Bedroom'].value_counts()

4     2130
5     1181
3      667
6      368
2      173
7      163
8      118
9       66
10      32
1       14
Name: Bedroom, dtype: int64
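Taking the first word of the title works here because the non-numeric first words (`Block`, `Detached`, `Semi-detached`) are filtered out afterwards, but the filter compares strings and `Bedroom` stays a text column. An alternative, sketched below and not the author's method, extracts the count with a regular expression, converts it to a number, and applies the 1 to 10 range as a numeric `between` test. The titles come from the tables above except the last one, which is hypothetical.

```python
import re
import pandas as pd

titles = pd.Series([
    '4 bedroom terraced duplex for sale',
    'Block of flats for sale',
    '4 bedroom semi-detached duplex for sale',
    '12 bedroom house for sale',  # hypothetical title with more than 10 bedrooms
])

# Capture a leading integer only when it is followed by the word "bedroom";
# titles without that pattern become NaN
bedrooms = pd.to_numeric(
    titles.str.extract(r'^(\d+)\s+bedroom', flags=re.IGNORECASE, expand=False)
)

# Keep listings with 1 to 10 bedrooms; NaN fails the comparison and is dropped
keep = bedrooms.between(1, 10)
print(titles[keep].tolist())
# ['4 bedroom terraced duplex for sale', '4 bedroom semi-detached duplex for sale']
```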

#### Extracting the property type from the Title

In [28]:
# Checking dataframe
df_clean.head()

|   | Title | Date | Price | Address | Location | Bedroom |
|---|---|---|---|---|---|---|
| 0 | 4 bedroom terraced duplex for sale | 2021-08-21 | 42500000 | Get Your Keys With Just 50% Initial Deposit/m... | Kaura | 4 |
| 1 | 6 bedroom house for sale | 2022-05-01 | 400000000 | Apo, Abuja | Apo | 6 |
| 2 | 3 bedroom terraced duplex for sale | 2022-10-02 | 56000000 | Dutse Junction Off Gwarinpa Express, Gwarinpa... | Gwarinpa | 3 |
| 3 | 3 bedroom house for sale | 2022-10-02 | 75000000 | Asokoro District, Abuja | Asokoro District | 3 |
| 4 | 4 bedroom semi-detached duplex for sale | 2022-08-08 | 75000000 | News Engineering, Dawaki, Gwarinpa, Abuja | Gwarinpa | 4 |
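Given the `N bedroom <type> for sale` pattern visible in the titles above, one way to isolate the property type is to strip the prefix and the suffix with regular expressions. This is a sketch under that assumption, not necessarily the approach the notebook takes; titles that deviate from the pattern would pass through with only the parts that match removed.

```python
import pandas as pd

# Titles copied from the table above
titles = pd.Series([
    '4 bedroom terraced duplex for sale',
    '6 bedroom house for sale',
    '3 bedroom house for sale',
    '4 bedroom semi-detached duplex for sale',
])

property_type = (
    titles.str.replace(r'^\d+\s+bedroom\s+', '', regex=True)  # drop the "N bedroom " prefix
          .str.replace(r'\s+for sale$', '', regex=True)        # drop the " for sale" suffix
          .str.strip()
)
print(property_type.tolist())
# ['terraced duplex', 'house', 'house', 'semi-detached duplex']
```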
