# Analyzing the Lagos State Real Estate Market I
In this project, we will be analyzing the Lagos real estate market in order to understand the relationship between certain important features and price. Our dataset has been collected from the property website `PropertyPro.ng`.

This notebook will focus on data cleaning. The cleaning required for this dataset is extensive, and it is conducted with a view to preserving data and making as much of it as possible available for analysis.

Let's start by importing the data.

In [1]:
import warnings
warnings.simplefilter(action='ignore')

import numpy as np
import pandas as pd
import re

In [2]:
df = pd.read_csv('lagos_listings.csv')
df.drop(columns=['Unnamed: 0'], inplace=True)
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet
0,Osapa London Lekki Lagos,"4,000,000/year","Updated 16 Feb 2023, Added 04 Jan 2023",FOR RENT: Luxury 4 Bedroom duplex available f...,Newly Built,4 beds4 baths5 Toilets
1,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets
2,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets
3,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets
4,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18338 entries, 0 to 18337
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Location            18338 non-null  object
 1   Price_Period        18338 non-null  object
 2   Date_Added_Updated  18338 non-null  object
 3   Description         18338 non-null  object
 4   Serviced            7308 non-null   object
 5   Bed_Bath_Toilet     16289 non-null  object
dtypes: object(6)
memory usage: 859.7+ KB


### Data Cleaning
There are over 18,000 records in this dataset. Two features contain null values: Serviced and Bed_Bath_Toilet. We will investigate why these values are missing and how to handle them.

Let's start by checking for duplicates.

In [4]:
df.duplicated().sum()

937

The dataset has 937 duplicates. Let's examine further.

In [5]:
df.loc[df.duplicated()].head(20)

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet
66,Thera Peace Zone Estate Sangotedo Ajah Lagos,"2,000,000/year",Added 17 Feb 2023,FOR RENT: Available Now! SARAH'S PARADISE APA...,FurnishedServicedNewly Built,2 beds2 baths3 Toilets
110,Thera Peace Zone Estate Sangotedo Ajah Lagos,"2,000,000/year",Added 17 Feb 2023,FOR RENT: Available Now! SARAH'S PARADISE APA...,FurnishedServicedNewly Built,2 beds2 baths3 Toilets
132,Ikota Lekki Lagos,3000000,"Updated 14 Feb 2023, Added 20 Dec 2022",FOR RENT: Newly built 3 bedroom terrace duple...,ServicedNewly Built,3 beds3 baths4 Toilets
154,Thera Peace Zone Estate Sangotedo Ajah Lagos,"2,000,000/year",Added 17 Feb 2023,FOR RENT: Available Now! SARAH'S PARADISE APA...,FurnishedServicedNewly Built,2 beds2 baths3 Toilets
176,Parkview Estate Ikoyi Lagos,"10,000,000/year","Updated 19 Feb 2023, Added 08 Feb 2023",FOR RENT: FOR RENT: Nicely Finished & Service...,Serviced,3 beds3 baths4 Toilets
198,Osapa London Lekki Lagos,"4,000,000/year","Updated 16 Feb 2023, Added 04 Jan 2023",FOR RENT: Luxury 4 Bedroom duplex available f...,Newly Built,4 beds4 baths5 Toilets
220,Osapa London Lekki Lagos,"4,000,000/year","Updated 16 Feb 2023, Added 04 Jan 2023",FOR RENT: Luxury 4 Bedroom duplex available f...,Newly Built,4 beds4 baths5 Toilets
242,Osapa London Lekki Lagos,"4,000,000/year","Updated 16 Feb 2023, Added 04 Jan 2023",FOR RENT: Luxury 4 Bedroom duplex available f...,Newly Built,4 beds4 baths5 Toilets
264,Osapa London Lekki Lagos,"4,000,000/year","Updated 16 Feb 2023, Added 04 Jan 2023",FOR RENT: Luxury 4 Bedroom duplex available f...,Newly Built,4 beds4 baths5 Toilets
274,Beechwood Estate Ajah Lagos,52000000,"Updated 18 Feb 2023, Added 31 Jan 2023",FOR SALE: ** Distressed Sale *** We are pleas...,ServicedNewly Built,4 beds4 baths5 Toilets


It looks like these are duplicates. Let's drop them.

In [6]:
df.drop_duplicates(keep='last', ignore_index=True, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17401 entries, 0 to 17400
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Location            17401 non-null  object
 1   Price_Period        17401 non-null  object
 2   Date_Added_Updated  17401 non-null  object
 3   Description         17401 non-null  object
 4   Serviced            6699 non-null   object
 5   Bed_Bath_Toilet     15352 non-null  object
dtypes: object(6)
memory usage: 815.8+ KB
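
The `keep` argument decides which copy of a duplicated row survives. Here is a minimal sketch on a toy frame (the values are illustrative, not taken from the dataset) showing how `duplicated()` and `drop_duplicates(keep='last')` behave:

```python
import pandas as pd

# Toy frame (hypothetical values): rows 0 and 2 are exact copies.
toy = pd.DataFrame({
    "Location": ["Lekki", "Ikeja", "Lekki"],
    "Price_Period": ["4,000,000/year", "1,500,000/year", "4,000,000/year"],
})

# By default, duplicated() flags every copy after the first one.
later_copies = toy.duplicated().tolist()
# keep=False flags all members of a duplicated group instead.
all_copies = toy.duplicated(keep=False).tolist()

# keep='last' retains the final copy; ignore_index renumbers rows from 0.
deduped = toy.drop_duplicates(keep="last", ignore_index=True)
print(later_copies, all_copies)
print(deduped)
```

Because `keep='last'` retains the final copy of each group, a row that was first in the raw data can move further down after deduplication, which is why later `df.head()` outputs start with a different listing than the first one did.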


#### Rent v. Sale?
Now that we've removed duplicate entries, we can move on to further cleaning.

If we review the Description feature, we'll note that some listings are for sale and some are for rent, and that some are residential and some are commercial. These categories are important in determining price, so we will create a column to identify which listing type a record belongs to.

In [7]:
df['Description'].head(12)

0      FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...
1      FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...
2      FOR RENT: A basic 2 bedroom apartment in a lo...
3      FOR RENT: Lovely 2 bedrooms flat in opebi, up...
4      FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...
5      FOR RENT: TO LET: Tastefully Finished 3 bedro...
6      FOR RENT: RENT: Well Finished & Fully Service...
7      FOR RENT: WAREHOUSE FOR LEASE. Direct lease/r...
8      FOR RENT: Urgent lease Urgent lease Standard ...
9      FOR RENT: FOR LEASE ADENIYI JONES Newly renov...
10     FOR RENT: FOR RENT: Well Furnished 4 Bedroom ...
11     FOR RENT: TO LET: Luxury Finished 2 bedroom f...
Name: Description, dtype: object

In [8]:
def listing_type(row):
    if 'FOR RENT' in row:
        return 'Rent'
    elif 'FOR SALE' in row:
        return 'Sale'
    else:
        return 'Unknown'

df['Listing_Type'] = df['Description'].apply(listing_type)
df['Listing_Type'].value_counts()

Rent       17245
Sale         143
Unknown       13
Name: Listing_Type, dtype: int64

From the above, we can see there are 13 `Unknown` Listing_Type records. Given their small number, we could simply drop them. However, let's first inspect them to find out why they were not classified.

In [9]:
df.loc[df['Listing_Type'] == 'Unknown']

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type
5304,Mijl Residence & Villas Ilasan Lekki Lagos,"90,000/day","Updated 19 Feb 2023, Added 01 Feb 2023",FOR SHORTLET: Brand new luxurious furnished 4...,Newly BuiltFurnishedServiced,,Unknown
5365,Mijl Residence & Villas Ilasan Lekki Lagos,"80,000/day","Updated 19 Feb 2023, Added 08 Feb 2023",FOR SHORTLET: Brand new luxurious furnished 3...,Newly BuiltFurnishedServiced,,Unknown
5553,Mijl Residence & Villas Ilasan Lekki Lagos,"70,000/day","Updated 19 Feb 2023, Added 08 Feb 2023",FOR SHORTLET: Brand new luxurious furnished 2...,Newly BuiltFurnishedServiced,,Unknown
5574,"First Unity Estate , Badore Badore Ajah Lagos","40,000/day","Updated 19 Feb 2023, Added 18 Jan 2022",FOR SHORTLET: This 3 bedroom apartment is ele...,FurnishedServiced,,Unknown
5700,Spar Road Ikate Lekki Lagos,"75,000/day","Updated 14 Feb 2023, Added 04 May 2022",FOR SHORTLET: 2-Beds | 24/7 Electricity | Ope...,,,Unknown
12378,Mijl Residence & Villas Ilasan Lekki Lagos,"90,000/day","Updated 19 Feb 2023, Added 01 Feb 2023",FOR SHORTLET: Brand new luxurious furnished 4...,Newly BuiltFurnishedServiced,4 beds4 baths5 Toilets,Unknown
13638,Life Camp Abuja,"70,000/day","Updated 19 Feb 2023, Added 14 Dec 2022",FOR SHORTLET: This Clean and secure haven loc...,FurnishedServiced,4 beds4 baths5 Toilets,Unknown
13720,First Unity Estate Badore Ajah Lagos,15000,"Updated 19 Feb 2023, Added 17 Feb 2023",FOR SHORTLET: Experience the ultimate in comf...,FurnishedServiced,1 beds1 baths1 Toilets,Unknown
14161,"First Unity Estate , Badore Badore Ajah Lagos","40,000/day","Updated 19 Feb 2023, Added 18 Jan 2022",FOR SHORTLET: This 3 bedroom apartment is ele...,FurnishedServiced,3 beds2 baths3 Toilets,Unknown
15422,Mijl Residence & Villas Ilasan Lekki Lagos,"80,000/day","Updated 19 Feb 2023, Added 08 Feb 2023",FOR SHORTLET: Brand new luxurious furnished 3...,Newly BuiltFurnishedServiced,3 beds3 baths4 Toilets,Unknown


We see a different type of rental, the daily rental or shortlet. We'll amend our function and attempt to classify listing types again.

In [10]:
def listing_type(row):
    if 'FOR RENT' in row:
        return 'Rent'
    elif 'FOR SALE' in row:
        return 'Sale'
    elif 'FOR SHORTLET' in row:
        return 'Shortlet'
    else:
        return 'Unknown'

df['Listing_Type'] = df['Description'].apply(listing_type)
df['Listing_Type'].value_counts()

Rent        17245
Sale          143
Shortlet       13
Name: Listing_Type, dtype: int64
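
The function above checks whether the phrase occurs anywhere in the description. As a hedged alternative, we could read the type from the leading `FOR <TYPE>:` prefix only, assuming every description starts with such a prefix (as the samples shown so far do). The helper name `listing_type_from_prefix` and the last sample string are hypothetical:

```python
import re

# Assumption: each description begins with "FOR <TYPE>:".
PREFIX = re.compile(r"^\s*FOR (RENT|SALE|SHORTLET):")

def listing_type_from_prefix(description):
    """Return the listing type named in the leading prefix, else 'Unknown'."""
    match = PREFIX.match(description)
    return match.group(1).capitalize() if match else "Unknown"

samples = [
    "FOR RENT: Luxury 4 Bedroom duplex available f...",
    "FOR SALE: ** Distressed Sale *** We are pleas...",
    "FOR SHORTLET: 2-Beds | 24/7 Electricity | Ope...",
    "FOR RENT: FOR SALE sign outside ...",  # hypothetical edge case
]
labels = [listing_type_from_prefix(s) for s in samples]
print(labels)
```

Anchoring on the prefix avoids labelling a listing by a phrase that merely appears later in its text, at the cost of returning `Unknown` for any description that does not follow the prefix convention.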

#### Price and Period
This feature includes price and the period for that price. Given that the listings are a mixture of rentals and sales, cleaning this column will require some care.

The price and period are combined in a single feature. We know that some properties are for sale and some for rent, and that rentals may be quoted for different periods. We want the annual rental price for all rental properties and the sale price for all sale properties.

We're going with annual rent because this is how rent is usually paid in Lagos, although we have seen some recent innovation toward monthly rent payments.

In [11]:
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent


In [12]:
df['Price'] = df['Price_Period'].str.split('/').str[0]
df['Price'] = df['Price'].apply(lambda x: ''.join(map(str, x.split(','))))
df['Price'] = df['Price'].astype(float)
df['Price'].head()

0    4000000.0
1    5500000.0
2    1200000.0
3    1500000.0
4    8000000.0
Name: Price, dtype: float64
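
The same parsing can be written with vectorized string methods. This is a sketch on a toy Series (the sample strings mirror formats seen in Price_Period), not a replacement for the cell above:

```python
import pandas as pd

raw = pd.Series(["4,000,000/year", "3000000", "550/sqm", "90,000/day"])

# Take the part before the slash, drop thousands separators, then convert.
# errors="coerce" turns anything unparseable into NaN instead of raising.
price = pd.to_numeric(
    raw.str.split("/").str[0].str.replace(",", "", regex=False),
    errors="coerce",
)
print(price.tolist())
```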

In [13]:
df['Price'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 17401 entries, 0 to 17400
Series name: Price
Non-Null Count  Dtype  
--------------  -----  
17401 non-null  float64
dtypes: float64(1)
memory usage: 136.1 KB


In [14]:
df['Period'] = df['Price_Period'].str.split('/').str[-1]
df['Period'].value_counts()

year             16239
sqm                310
month              101
day                 62
3,000,000           32
                 ...  
6,500,000,000        1
1,800                1
560,000,000          1
3,300,000,000        1
450,000,000          1
Name: Period, Length: 150, dtype: int64

The Period column presents a unique challenge. Because the listings differ in nature, a period is not always provided. We'll need to review further to determine how we will proceed.
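
The price-like values among the periods come from entries that contain no `/`: splitting such a string returns a single piece, so taking the last piece returns the price itself. A small sketch of this behaviour, with a hypothetical helper `split_price_period` that reports a missing period explicitly:

```python
def split_price_period(value):
    """Split 'price/period'; the period is None when there is no slash."""
    price, sep, period = value.partition("/")
    return price, (period if sep else None)

# What the split-and-take-last approach returns without a slash:
last_piece = "3,000,000".split("/")[-1]

with_period = split_price_period("4,000,000/year")
without_period = split_price_period("3,000,000")
print(last_piece, with_period, without_period)
```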

In [15]:
df.loc[df['Period'] == 'sqm'].head(10)

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period
187,Idowu Taylor Victoria Island Lagos,550/sqm,"Updated 15 Feb 2023, Added 01 Mar 2022",FOR RENT: Double Glazed Curtain Wall Systems ...,Serviced,0 beds0 baths0 Toilets,Rent,550.0,sqm
206,Akin Adesola Victoria Island Lagos,800/sqm,"Updated 15 Feb 2023, Added 01 Mar 2022",FOR RENT: Raised Floor for underground cablin...,Serviced,beds baths Toilets,Rent,800.0,sqm
320,Adeola Odeku Victoria Island Lagos,500/sqm,"Updated 15 Feb 2023, Added 19 Jan 2023",FOR RENT: Fully serviced premium office space...,,0 beds0 baths0 Toilets,Rent,500.0,sqm
374,Akin Adesola Victoria Island Lagos,"115,000/sqm","Updated 19 Feb 2023, Added 06 Apr 2022",FOR RENT: Exclusive office space is available...,Serviced,0 beds0 baths0 Toilets,Rent,115000.0,sqm
483,Ikeja Lagos,"3,000/sqm","Updated 19 Feb 2023, Added 01 Oct 2022",FOR RENT: Very clean warehouse measuring 6300...,,0 beds0 baths0 Toilets,Rent,3000.0,sqm
491,Acme Road Ogba Lagos,"3,000/sqm","Updated 19 Feb 2023, Added 21 Sep 2022","FOR RENT: This Warehouse is 90,000 sqft on La...",,0 beds0 baths0 Toilets,Rent,3000.0,sqm
714,Lekki Lagos,"185,000/sqm","Updated 18 Feb 2023, Added 26 Sep 2022",FOR RENT: Various Shop spaces are available w...,,0 beds0 baths0 Toilets,Rent,185000.0,sqm
717,Ikate Lekki Lagos,"70,000/sqm","Updated 18 Feb 2023, Added 20 Oct 2022",FOR RENT: A commercial property suitable for ...,,0 beds0 baths0 Toilets,Rent,70000.0,sqm
939,Alaka Iponri Surulere Lagos,"20,000/sqm","Updated 17 Feb 2023, Added 12 Nov 2022",FOR RENT: Nice and well maintained open plan ...,Serviced,0 beds0 baths0 Toilets,Rent,20000.0,sqm
1053,Lekki Phase 1 Lekki Lagos,"100,000/sqm","Updated 16 Feb 2023, Added 16 Feb 2023",FOR RENT: 21sqm and 23sqm shops (UPSTAIRS & D...,,0 beds0 baths0 Toilets,Rent,100000.0,sqm


The sqm listings are commercial listings. We need to check whether their sizes are contained in the Description column, so we can determine how to use them.

In [16]:
commercial = df.loc[df['Period'] == 'sqm']
sqm_index = list()
for index, row in commercial.iterrows():
    if 'sqm' in row['Description']:
        sqm_index.append(index)
    elif 'sqft' in row['Description']:
        sqm_index.append(index)
    elif 'square' in row['Description']:
        sqm_index.append(index)

len(set(sqm_index))

125

In [17]:
df_sqm_index = list()
for index, row in df.iterrows():
    if 'sqm' in row['Description']:
        df_sqm_index.append(index)
    elif 'sqft' in row['Description']:
        df_sqm_index.append(index)
    elif 'square' in row['Description']:
        df_sqm_index.append(index)

len(set(df_sqm_index))

475
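
Both counts can also be obtained without explicit loops. A minimal sketch on a toy frame (hypothetical rows), using `str.contains` with the same three substrings:

```python
import pandas as pd

toy = pd.DataFrame({
    "Description": [
        "Warehouse measuring 6300 sqft",
        "2 bedroom flat",
        "Shop of 21sqm",
        "75 square meters hall",
    ],
    "Period": ["sqm", "year", "sqm", "year"],
})

# True where the description mentions any of the size keywords.
mentions_size = toy["Description"].str.contains("sqm|sqft|square")

in_whole = int(mentions_size.sum())
in_sqm_subset = int((mentions_size & (toy["Period"] == "sqm")).sum())
print(in_whole, in_sqm_subset)
```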

We find an interesting issue. The number of listings matching our condition in the overall dataset (475) is significantly larger than the number in the subset where Period is `sqm` (125).

We'll review further to determine how to handle this discrepancy.

Because some listings in the overall dataset have no value for `Period` yet are clearly priced by the square foot or square meter, we'll clean the overall dataset instead of only the `sqm` subset.

This time, we'll be working in the description column. We'll use regex to extract sqm or sqft substrings in the dataset.

In [18]:
df['Size'] = df['Description'].apply(lambda x: re.findall(r'\d+ sqm|\d+sqm|\d+ sqft|\d+sqft', x) if 'sq' in x else 'Unknown')
df['Size'].value_counts().head()

Unknown     16910
[]             55
[300sqm]       14
[200sqm]       12
[500sqm]       10
Name: Size, dtype: int64

In [19]:
df['Size'] = df['Size'].apply(lambda x: ''.join(map(str, x)))
df['Size'].value_counts().head()

Unknown    16910
              55
300sqm        14
200sqm        12
500sqm        10
Name: Size, dtype: int64

In [20]:
df.loc[df['Size'] == ''].head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Size
21,"Along Agege Motor Way, Between Ladipo Bus Stop...","81,000,000/year","Updated 19 Feb 2023, Added 30 Sep 2022",FOR RENT: Warehouse for lease A shared compou...,,0 beds0 baths0 Toilets,Rent,81000000.0,year,
479,Lekki Scheme 2 Ajah Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR LEASE: Fully fenced with gate c...,,0 beds0 baths0 Toilets,Rent,4000000.0,year,
493,Lekki Scheme 2 Ajah Lagos,"7,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR LEASE: Fully fenced with gate c...,,0 beds0 baths0 Toilets,Rent,7000000.0,year,
937,Oshodi Apapa Express Way Lagos Oshodi Expressw...,"20,000,000/year","Updated 17 Feb 2023, Added 16 Feb 2023",FOR RENT: Nice and well located concrete pave...,,beds baths Toilets,Rent,20000000.0,year,
2029,Ff Millennium Towers Ligali Ayorinde Victoria ...,"110,000,000/year","Updated 14 Feb 2023, Added 30 Aug 2022",FOR RENT: Brokerfield presents this completel...,FurnishedServiced,beds baths10 Toilets,Rent,110000000.0,year,


We see that we need to expand our regex pattern to capture more variations of square meter!

In [21]:
df['Size'] = df['Description'].apply(
    lambda x: re.findall(r'\d+ sqm|\d+sqm|\d+ sqft|\d+sqft|\d+square|\d+ square|\d+sqr|\d+ sqr|\d+,\d+ sqm|\d+,\d+ square', x) if 'sq' in x else 'Unknown')
df['Size'] = df['Size'].apply(lambda x: ''.join(map(str, x)))
df['Size'].value_counts().head()

Unknown    16910
              31
300sqm        14
200sqm        12
500sqm        10
Name: Size, dtype: int64

There are still some empties!!

In [22]:
df.loc[df['Size'] == ''].head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Size
2029,Ff Millennium Towers Ligali Ayorinde Victoria ...,"110,000,000/year","Updated 14 Feb 2023, Added 30 Aug 2022",FOR RENT: Brokerfield presents this completel...,FurnishedServiced,beds baths10 Toilets,Rent,110000000.0,year,
4850,Ikoyi Lagos,"9,000,000/year","Updated 20 Feb 2023, Added 28 Jan 2023",FOR RENT: 3 bedroom Condominium with a bq fr ...,Newly Built,,Rent,9000000.0,year,
5227,Adeniyi Jones Ikeja Lagos,800000,Added 19 Feb 2023,FOR RENT: 45sq open space directly on Adeniyi...,ServicedNewly Built,,Rent,800000.0,800000,
5323,"Off Ayo Afolabi Bus Stop, Aboru Ipaja Lagos","300,000/year","Updated 19 Feb 2023, Added 04 May 2022",FOR RENT: Very spacious land. Good for church...,,,Rent,300000.0,year,
5503,"Budland Street, Akiode Berger Ojodu Lagos","1,500,000/year","Updated 19 Feb 2023, Added 06 Sep 2022",FOR RENT: 75sq meters hall with 4 rooms and a...,,,Rent,1500000.0,year,


In [23]:
df['Size'] = df['Description'].apply(
    lambda x: re.findall(r'\d+ \w+|\d+\w+|'
                         r'\d+,\d+ \w+|\d+,\d+\w+|'
                         r'\d+ per sq|\d+,\d+ per sq|'
                         r'\d+per sq|\d+,\d+per sq|'
                         r'\d+/\w+|\d+,\d+/\w+|'
                         r'\d+k per sq', x) if 'sq' in x else 'Unknown')
df['Size'] = df['Size'].apply(lambda x: ''.join(map(str, x)))
df['Size'].value_counts().head()

Unknown    16910
220sqm         6
1000sqm        6
250sqm         5
350sqm         5
Name: Size, dtype: int64
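
The broad alternatives such as `\d+ \w+` match any number followed by any word, so they also capture phrases that are not sizes. As a sketch of a narrower option, a single pattern can require a number (with optional thousands separators or decimals) immediately followed by a size unit. The pattern and the helper `find_sizes` are assumptions for illustration, not the pattern used in this notebook:

```python
import re

# A number with optional commas/decimals, optional whitespace, then a unit.
# "sq\b" only matches a bare "sq", so words like "squash" are not units.
SIZE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(sqm|sqft|square|sqr|sq\b)")

def find_sizes(description):
    """Return (number, unit) pairs for each size-like mention in the text."""
    return [(float(num.replace(",", "")), unit)
            for num, unit in SIZE.findall(description)]

print(find_sizes("21sqm and 23sqm shops"))
print(find_sizes("This Warehouse is 90,000 sqft on Land"))
print(find_sizes("3 bedroom Condominium with a bq, Tennis and squash court"))
```

Keeping the unit next to the number also makes it possible to tell square feet from square meters later, instead of recovering that from the description again.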

In [24]:
df.loc[df['Size'] == '']

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Size
5323,"Off Ayo Afolabi Bus Stop, Aboru Ipaja Lagos","300,000/year","Updated 19 Feb 2023, Added 04 May 2022",FOR RENT: Very spacious land. Good for church...,,,Rent,300000.0,year,
6905,Marshy Hill Estate Badore Ajah Lagos,"1,200,000/year",Added 15 Feb 2023,FOR RENT: Esquisitely finished apartment with...,,2 beds2 baths3 Toilets,Rent,1200000.0,year,
6949,"Walter Carrington Crescent, Victoria Island Vi...","14,000,000/year","Updated 15 Feb 2023, Added 26 Jul 2022",FOR RENT: Unfurnished three bed overlooking t...,Serviced,beds baths Toilets,Rent,14000000.0,year,
17227,Directly On Major Road Admiralty Lekki Phase 1...,"7,000,000/sqm","Updated 18 Jul 2022, Added 23 Jun 2022",FOR RENT: Shop's Space for Rent: we have diff...,,0 beds0 baths9 Toilets,Rent,7000000.0,sqm,


From our analysis, only 1 of these records (at index 17227) is priced per sqm yet has no size; the others already quote an annual price. It seems to be a range of commercial properties, listed at NGN7m per sqm. We'll use 0 for its size.

In [25]:
df.at[17227, 'Size'] = 0
df['Size'].iloc[17227]

0

#### Square Meter v. Square Feet?
Let's clean the Size column. We need to clean this column because we'll be using it to update our price column where applicable.

In [26]:
df['Size'].value_counts(dropna=False).head()

Unknown    16910
220sqm         6
1000sqm        6
250sqm         5
350sqm         5
Name: Size, dtype: int64

In [27]:
def clean_size(row):
    row = str(row)
    if re.findall(r'(\d+) sq', row):
        result = re.findall(r'(\d+) sq', row)[0]
    elif re.findall(r'(\d+)sq', row):
        result = re.findall(r'(\d+)sq', row)[0]
    else:
        result = row
    return result

df['Size_Num'] = df['Size'].apply(clean_size)

In [28]:
df['Size_Num'].value_counts().head()

Unknown    16910
200           18
500           18
300           15
250           11
Name: Size_Num, dtype: int64

We'll manually clean the remaining erroneous values. While cleaning, we'll check the Price_Period column to determine whether the record already provides annual rent. We're cleaning the size column so that we can multiply it by the per-sqm rent to get annual rent. If a record already has annual rent, its size is a non-issue.

Let's start with the malformed size values listed below.

In [29]:
size_discrepancies = ['50k30', '70k5k', '15000 per',
                      '110k800 sm', '60k10k', '150k20k', '50k20k10',
                      '8500 per', '75k190', '230 and130', '1 ocean60k',
                      '1 Rent70000 per', '2,325000']

In [30]:
for index, row in df.iterrows():
    for i in size_discrepancies:
        if i in str(row['Size_Num']):
            print(index, row['Price_Period'], row['Description'])

4889 55,000/sqm  FOR RENT: Well renovated and service office space measuring 230 and130 square meters on t.. Read more 
10441 2,352,000/year  FOR RENT: Rent N2,325,000 at 75,000/sqm Caution 300,000 Service charge 837,000 exclusive.. Read more 
10447 2,000,000/year  FOR RENT: Newly Built shop space) office space at Lekki phrase 1 Rent #70,000 per square .. Read more 
12101 60,000/sqm  FOR RENT: Open office space for rent admaralty way lekki 1 ocean view 60k per sqm.. Read more 
15151 50,000/sqm  FOR RENT: Rent Office space Amount 50k per sqm Sc 20k per sqm Agency 10%.. Read more 
15186 50,000/sqm  FOR RENT: For rent Office Space Rent 50k per square meter Service charge: 30% .. Read more 
15225 150,000/sqm  FOR RENT: Rent Shops space Amount 150k per square meter Sc 20k per square mete.. Read more 
15369 60,000/sqm  FOR RENT: For rent Office space Rent 60k per square meter Sc 10k per square me.. Read more 
15464 110,000/sqm  FOR RENT: For rent Office space Rent 110k per sqm Square meter: 

In [31]:
df.at[4889, 'Size_Num'] = 230
df.at[15464, 'Size_Num'] = 800
df.at[15899, 'Size_Num'] = 190

In [32]:
size_0s = ['50k30', '70k5k', '15000 per', '60k10k',
           '150k20k', '50k20k10', '8500 per',
           '1 ocean60k', '1 Rent70000 per', '2,325000']

In [33]:
for index, row in df.iterrows():
    for i in size_0s:
        if i in str(row['Size_Num']):
            df.at[index, 'Size_Num'] = 0

Further cleaning of the Size_Num column below. Let's consider this size value: `3 bedroom`.

In [34]:
for index, row in df.iterrows():
    if '3 bedroom' in str(row['Size_Num']):
        print(index, row['Price_Period'], row['Description'])

4850 9,000,000/year  FOR RENT: 3 bedroom Condominium with a bq fr rent Tennis and squash court Swimming poo.. Read more 
16091 9,000,000/year  FOR RENT: 3 bedroom Condominium with a bq fr rent Tennis and squash court Swimming poo.. Read more 
16262 9,000,000/year  FOR RENT: 3 bedroom Condominium with a bq fr rent Tennis and squash court Swimming poo.. Read more 


These seem to be duplicates. Let's investigate further.

In [35]:
df.loc[[4850, 16091, 16262]]

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Size,Size_Num
4850,Ikoyi Lagos,"9,000,000/year","Updated 20 Feb 2023, Added 28 Jan 2023",FOR RENT: 3 bedroom Condominium with a bq fr ...,Newly Built,,Rent,9000000.0,year,3 bedroom,3 bedroom
16091,Ikoyi Lagos,"9,000,000/year","Updated 30 Nov 2022, Added 23 Oct 2022",FOR RENT: 3 bedroom Condominium with a bq fr ...,,0 beds0 baths0 Toilets,Rent,9000000.0,year,3 bedroom,3 bedroom
16262,Ikoyi Lagos,"9,000,000/year","Updated 30 Nov 2022, Added 11 Nov 2022",FOR RENT: 3 bedroom Condominium with a bq fr ...,,0 beds0 baths0 Toilets,Rent,9000000.0,year,3 bedroom,3 bedroom


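These appear to be the same listing posted several times with different dates, so we'll keep the most recently updated record (index 4850) and drop the other 2.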

In [36]:
df.drop(labels=[16091,16262], axis=0, inplace=True)
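
Repeated listings like these escape an exact-duplicate check because one column (here the dates) differs. A hedged sketch of how such near-duplicates could be surfaced by comparing only a subset of columns; the toy rows and the choice of `key_cols` are assumptions:

```python
import pandas as pd

toy = pd.DataFrame({
    "Location": ["Ikoyi Lagos", "Ikoyi Lagos", "Opebi Ikeja Lagos"],
    "Price_Period": ["9,000,000/year", "9,000,000/year", "1,500,000/year"],
    "Date_Added_Updated": ["Updated 20 Feb 2023", "Updated 30 Nov 2022",
                           "Updated 19 Feb 2023"],
    "Description": ["3 bedroom Condominium", "3 bedroom Condominium",
                    "Lovely 2 bedrooms flat"],
})

# Comparing every column finds nothing, because the dates differ.
exact = int(toy.duplicated().sum())

# Ignoring the date column groups the repeated listing together.
key_cols = ["Location", "Price_Period", "Description"]
near = toy[toy.duplicated(subset=key_cols, keep=False)]
print(exact, near.index.tolist())
```

Whether rows that match on these columns are really the same property is a judgment call, so it is worth reviewing the flagged rows before dropping any.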

We'll update Size and Size_Num to 0 for the remaining record.

In [37]:
df.at[4850, 'Size'] = 0
df.at[4850, 'Size_Num'] = 0

In [38]:
df.at[4850, 'Size_Num']

0

In [39]:
for index, row in df.iterrows():
    if '340436' in str(row['Size_Num']):
        print(index, row['Price_Period'], row['Description'])

15410 30,000/sqm  FOR RENT: For rent.. 3404.36 sqm Office Building with Warehouse. Price: 30000/sqm .. Read more 


In [40]:
df.at[15410, 'Size_Num'] = 3404.36
df.at[15410, 'Size_Num']

3404.36

In [41]:
for index, row in df.iterrows():
    if '3032800' in str(row['Size_Num']):
        print(index, row['Price_Period'], row['Description'])

12571 1,000,000  FOR RENT: JV (Development Lease) - Ikoyi. We have a land measuring 3032.800 square mete.. Read more 


In [42]:
df.at[12571, 'Size_Num'] = 3032.800
df.at[12571, 'Size_Num']

3032.8

In [43]:
df.at[6245, 'Size_Num'] = 0
df.at[6245, 'Size_Num']

0

In [44]:
df['Size_Num'] = df['Size_Num'].str.strip()
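
One caveat about the cell above: by this point Size_Num holds a mix of strings and numbers (the manual corrections were assigned as ints and floats), and the `.str` accessor returns NaN for non-string elements. A minimal sketch on a toy Series (values are illustrative) showing the effect, together with a type-preserving alternative:

```python
import pandas as pd

# Mixed object column: strings from the regex step, numbers set by hand.
mixed = pd.Series(["200 ", 230, 3404.36, "Unknown"], dtype=object)

# .str.strip() only handles strings; numeric entries come back as NaN.
stripped_str = mixed.str.strip()

# Stripping only the string entries leaves the numeric ones untouched.
safe = mixed.apply(lambda v: v.strip() if isinstance(v, str) else v)

print(stripped_str.tolist())
print(safe.tolist())
```

If the manual corrections should survive this step, the type-preserving form (or assigning the corrections as strings) would be the safer choice.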

In [46]:
df['Size_Num'] = df['Size_Num'].apply(lambda x: np.nan if x == 'Unknown' or x == '' else float(x))
df['Size_Num'].value_counts().head()

200.0    18
500.0    18
300.0    15
250.0    11
100.0    10
Name: Size_Num, dtype: int64

In [47]:
for index, row in df.iterrows():
    if 'sqft' in row['Description']:
        print(index, row['Price_Period'], row['Description'])

483 3,000/sqm  FOR RENT: Very clean warehouse measuring 6300 sqft for lease in a ikeja for 3000 per sqm.. Read more 
491 3,000/sqm  FOR RENT: This Warehouse is 90,000 sqft on Land size of 6 Acres at ACME ROAD OGBA, PRICE #.. Read more 
4880 15,000,000/year  FOR RENT: Semi direct...Massive Warehouse measuring 9000 sqft good for storage at Ago pala.. Read more 
5288 2,000/sqm  FOR RENT: 60,000 sqft warehouse for lease off Oregun Ikeja road, N2K per sqft.. Read more 
6605 20,000,000/year  FOR RENT: A warehouse of 10000sqft for lease in amuwo.. Read more 
12903 2,000/sqm  FOR RENT: WAREHOUSE......Letting @Amuwo odofin. 18,600sqft 25,000sqft 21,000sqft 8000.. Read more 
13008 2,200,000/sqm  FOR RENT: 2,200 per sqft Warehouse Size ....33,000 sqft.. Read more 
13293 20,000,000/year  FOR RENT: A warehouse of 15000 sqft with a large compound both up and down for lease at am.. Read more 
13326 2,000  FOR RENT: Sizes of 19,000sqft, 8,,[email protected] N2,000 per sqft .. Read more 
15898 3,500/sq

In [48]:
df.at[483, 'Size_Num'] = 585.2892
df.at[491, 'Size_Num'] = 8361.2736
df.at[5288, 'Size_Num'] = 5574.1824
df.at[12903, 'Size_Num'] = 1727.9965
df.at[13008, 'Size_Num'] = 3065.8003
df.at[13326, 'Size_Num'] = 1765.1578
df.at[15898, 'Size_Num'] = 689.9909
df.at[17166, 'Size_Num'] = 241.5479
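
The values assigned above are consistent with converting square feet to square meters using 1 sqft = 0.09290304 sqm and rounding to four decimals. A small sketch of that conversion (the helper name `sqft_to_sqm` is hypothetical):

```python
# One square foot in square meters: exactly (0.3048 m) ** 2.
SQFT_TO_SQM = 0.09290304

def sqft_to_sqm(square_feet):
    """Convert an area in square feet to square meters, rounded to 4 places."""
    return round(square_feet * SQFT_TO_SQM, 4)

# Sizes taken from the sqft descriptions printed above.
converted = {sqft: sqft_to_sqm(sqft) for sqft in (6300, 90000, 60000, 33000)}
print(converted)
```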

In [50]:
df['Size_Num'].value_counts(dropna=False).head()

NaN      16930
200.0       18
500.0       18
300.0       15
250.0       11
Name: Size_Num, dtype: int64

In [51]:
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Size,Size_Num
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,4000000.0,year,Unknown,
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent,5500000.0,year,Unknown,
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent,1200000.0,year,Unknown,
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent,1500000.0,year,Unknown,
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent,8000000.0,year,Unknown,


In [52]:
df.Period.value_counts().head()

year         16237
sqm            310
month          101
day             62
3,000,000       32
Name: Period, dtype: int64

Let's get the annual value for listings in sqm, month and day.

In [53]:
for index, row in df.iterrows():
    if row['Period'] == 'sqm':
        df.at[index, 'Price_1'] = row['Price'] * row['Size_Num']

We'll use Price_1 where calculation has been done to replace the sqm values in the price column.

In [54]:
for index, row in df.iterrows():
    if row['Price'] < row['Price_1']:
        df.at[index, 'Price'] = row['Price_1']

Let's do the same for month and day.

In [55]:
for index, row in df.iterrows():
    if row['Period'] == 'month':
        df.at[index, 'Price'] = row['Price'] * 12

for index, row in df.iterrows():
    if row['Period'] == 'day':
        df.at[index, 'Price'] = row['Price'] * 365
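
The three loops above can be expressed as one multiplication by a per-row factor. This is a sketch on a toy frame (hypothetical values), with the result written to a separate `Annual_Price` column so the original `Price` stays untouched. It treats a missing or zero size as "leave the price as is", which is close to, but not identical with, the `Price < Price_1` check used above:

```python
import pandas as pd

toy = pd.DataFrame({
    "Price": [4_000_000.0, 250_000.0, 90_000.0, 550.0, 3_000_000.0],
    "Period": ["year", "month", "day", "sqm", "3,000,000"],
    "Size_Num": [float("nan"), float("nan"), float("nan"), 200.0, float("nan")],
})

# Factor per period; periods not in the mapping become NaN.
multiplier = toy["Period"].map({"year": 1, "month": 12, "day": 365})

# For sqm listings the factor is the size, ignoring zero or missing sizes.
is_sqm = toy["Period"] == "sqm"
usable_size = toy["Size_Num"].where(toy["Size_Num"] > 0)
multiplier = multiplier.mask(is_sqm, usable_size)

# Rows with no usable factor keep their price unchanged.
toy["Annual_Price"] = toy["Price"] * multiplier.fillna(1)
print(toy[["Period", "Price", "Annual_Price"]])
```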

In [56]:
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Size,Size_Num,Price_1
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,4000000.0,year,Unknown,,
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent,5500000.0,year,Unknown,,
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent,1200000.0,year,Unknown,,
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent,1500000.0,year,Unknown,,
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent,8000000.0,year,Unknown,,


We'll drop the helper columns used for cleaning: Size, Size_Num and Price_1.

In [57]:
df.drop(columns=['Size', 'Size_Num', 'Price_1'], inplace=True)
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,4000000.0,year
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent,5500000.0,year
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent,1200000.0,year
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent,1500000.0,year
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent,8000000.0,year


#### Cleaning Beds, Baths and Toilets
We can now move on to cleaning the Bed_Bath_Toilet feature, which has about 2,000 null values. Let's take a closer look.

In [58]:
df['Bed_Bath_Toilet'].info()

<class 'pandas.core.series.Series'>
Int64Index: 17399 entries, 0 to 17400
Series name: Bed_Bath_Toilet
Non-Null Count  Dtype 
--------------  ----- 
15350 non-null  object
dtypes: object(1)
memory usage: 787.9+ KB


In [59]:
df['Bed_Bath_Toilet'].value_counts(dropna=False).head()

0 beds0 baths0 Toilets    5140
NaN                       2049
3 beds3 baths4 Toilets    1467
4 beds4 baths5 Toilets    1398
2 beds2 baths3 Toilets    1228
Name: Bed_Bath_Toilet, dtype: int64

In [60]:
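for index, value in df['Bed_Bath_Toilet'].items():
    if pd.isna(value):
        # No room information: leave the count missing
        df.at[index, 'Beds'] = np.nan
    else:
        # e.g. '3 beds3 baths4 Toilets' -> ['3', 'beds3', 'baths4', 'Toilets']
        bbt_list = value.split(' ')
        df.at[index, 'Beds'] = bbt_list[0]
        # Strip the label and keep every digit, so counts of 10 or more survive
        df.at[index, 'Baths'] = re.sub(r'\D', '', bbt_list[1])
        df.at[index, 'Toilets'] = re.sub(r'\D', '', bbt_list[2])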

df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,4000000.0,year,3,3,4
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent,5500000.0,year,4,5,5
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent,1200000.0,year,2,2,2
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent,1500000.0,year,2,1,1
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent,8000000.0,year,5,6,7
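
As a cross-check on the loop above, the same three counts can be pulled out in one vectorized step with a regular expression. This is only a sketch on made-up sample values (the `sample` series and the exact pattern are assumptions, not part of the original notebook), and it assumes every populated value follows the `N bedsN bathsN Toilets` shape.

```python
import pandas as pd

# Toy stand-in for the Bed_Bath_Toilet column (values are illustrative)
sample = pd.Series([
    '3 beds3 baths4 Toilets',
    '10 beds12 baths12 Toilets',
    None,
])

# One named capture group per count; \d+ keeps multi-digit values intact
pattern = (r'(?P<Beds>\d+)\s*beds\s*'
           r'(?P<Baths>\d+)\s*baths\s*'
           r'(?P<Toilets>\d+)\s*Toilets')

# Rows that do not match the pattern (including nulls) come back as NaN
counts = sample.str.extract(pattern).astype(float)
print(counts)
```

Rows that do not match the pattern become NaN in all three columns, which may or may not be the desired behavior for partially filled listings.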


#### Location! Location!! Location!!!
Location is a major driver of price, especially in the Lagos real estate market. We'll create a City feature and then use it to determine whether each listing is on the Island or the Mainland, a segmentation that also matters for price analysis.

In [61]:
df['Location'].head()

0    Bakare Estate Chevron Lekki Lagos
1        Phase 2 Estate, Gbagada Lagos
2                   Oregun Ikeja Lagos
3                    Opebi Ikeja Lagos
4                    Ikota Lekki Lagos
Name: Location, dtype: object

In [62]:
df['City'] = df['Location'].str.split(' ').str[-2]
df['City'] = df['City'].str.strip()

df['City'].value_counts().head()

Lekki    5480
Ajah     1567
Ikoyi    1464
Ojodu    1075
Ikeja    1007
Name: City, dtype: int64

Let's clean the City column to make the locations clearer. Because we kept only the second-to-last word of Location, two-word city names were truncated. For example, the value Island in City actually stands for Victoria Island. We'll update this value along with Egba (Abule Egba) and Odofin (Amuwo Odofin).

In [63]:
df.loc[df['City'] == 'Island'].head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets,City
29,Victoria Island Lagos,"25,000,000/year","Updated 19 Feb 2023, Added 07 Feb 2023",FOR RENT: LETTING IN VI Brand new tastefully ...,,4 beds0 baths0 Toilets,Rent,25000000.0,year,4,0,0,Island
40,Heart Of Victoria Island Victoria Island Lagos,"15,000,000/year","Updated 18 Feb 2023, Added 16 Feb 2023",FOR RENT: 4 bedroom House for rent Heart Of V...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,15000000.0,year,3,3,4,Island
62,Victoria Island Lagos,"20,000,000/year","Updated 18 Feb 2023, Added 10 Feb 2023",FOR RENT: 4 BEDROOM TERRACE DUPLEX IN VICTORI...,FurnishedNewly Built,4 beds5 baths5 Toilets,Rent,20000000.0,year,4,5,5,Island
81,Victoria Island Lagos,"12,000,000/year","Updated 18 Feb 2023, Added 14 Feb 2023",FOR RENT: A 3 bedroom Apartment For Rent Pric...,ServicedNewly Built,3 beds4 baths4 Toilets,Rent,12000000.0,year,3,4,4,Island
89,Victoria Island Lagos,"15,000,000/year","Updated 18 Feb 2023, Added 07 Feb 2023",FOR RENT: 5 BEDROOM PENTHOUSE IN VICTORIA ISL...,FurnishedNewly Built,5 beds6 baths6 Toilets,Rent,15000000.0,year,5,6,6,Island


In [64]:
# Restore the two-word city names that were truncated by the split
df['City'] = df['City'].replace({
    'Island': 'Victoria Island',
    'Egba': 'Abule Egba',
    'Odofin': 'Amuwo Odofin',
})

df['City'].value_counts().head()

Lekki    5480
Ajah     1567
Ikoyi    1464
Ojodu    1075
Ikeja    1007
Name: City, dtype: int64
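
The split-then-patch steps above could also be folded into a single helper that checks for known two-word city names before falling back to the second-to-last word. The sketch below is hypothetical: `extract_city` and `TWO_WORD_CITIES` are names introduced here for illustration, and the set only contains the three cases identified above.

```python
import pandas as pd

# Two-word city names taken from the fixes above; extend as needed
TWO_WORD_CITIES = {'Victoria Island', 'Abule Egba', 'Amuwo Odofin'}

def extract_city(location):
    """Return the city from strings shaped like '<area> <city> <state>'."""
    tokens = location.split()
    if len(tokens) < 2:
        return None
    # Look at the two words just before the state name
    last_two = ' '.join(tokens[-3:-1])
    return last_two if last_two in TWO_WORD_CITIES else tokens[-2]

# Sample locations copied from the outputs above
locations = pd.Series([
    'Victoria Island Lagos',
    'Oregun Ikeja Lagos',
    'Phase 2 Estate, Gbagada Lagos',
    'Heart Of Victoria Island Victoria Island Lagos',
])
cities = locations.apply(extract_city)
print(cities.tolist())
```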

Some records fall outside our scope. Apo, Katampe Ext, and Life Camp are not locations in Lagos; as the Location values below show, they are in Abuja. We'll drop those records from the dataset.

In [65]:
df.loc[df['City'] == 'Camp']

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets,City
5532,Brains And Hammers Estate Life Camp Abuja,150000000,"Updated 19 Feb 2023, Added 15 Dec 2022",FOR SALE: Furnished 4 Bedroom Duplex - 24Hrs ...,ServicedFurnished,,Sale,150000000.0,150000000,,,,Camp
13638,Life Camp Abuja,"70,000/day","Updated 19 Feb 2023, Added 14 Dec 2022",FOR SHORTLET: This Clean and secure haven loc...,FurnishedServiced,4 beds4 baths5 Toilets,Shortlet,25550000.0,day,4.0,4.0,5.0,Camp
16053,Life Camp Abuja,22500000,"Updated 19 Feb 2023, Added 30 Jan 2023",FOR SALE: A fully automated 4 bedroom flat in...,,4 beds5 baths5 Toilets,Sale,22500000.0,22500000,4.0,5.0,5.0,Camp
17235,Brains And Hammers Estate Life Camp Abuja,150000000,"Updated 19 Feb 2023, Added 15 Dec 2022",FOR SALE: Furnished 4 Bedroom Duplex - 24Hrs ...,ServicedFurnished,4 beds4 baths5 Toilets,Sale,150000000.0,150000000,4.0,4.0,5.0,Camp


In [66]:
df.loc[df['City'] == 'Apo']

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets,City
4081,Apo Zone D Apo Abuja,280000000,"Updated 18 Feb 2023, Added 12 Jun 2022",FOR SALE: For Sale 6 Bedroom Detached Duplex ...,Newly Built,,Sale,280000000.0,280000000,,,,Apo
16398,Apo Zone D Apo Abuja,280000000,"Updated 18 Feb 2023, Added 12 Jun 2022",FOR SALE: For Sale 6 Bedroom Detached Duplex ...,Newly Built,6 beds6 baths7 Toilets,Sale,280000000.0,280000000,6.0,6.0,7.0,Apo


In [67]:
df.loc[df['City'] == 'Ext']

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets,City
15971,Katampe Ext Abuja,"9,000,000/year","Updated 18 Feb 2023, Added 01 Feb 2023",FOR RENT: Well built world-class smart standa...,FurnishedServicedNewly Built,4 beds5 baths5 Toilets,Rent,9000000.0,year,4,5,5,Ext


In [68]:
# Collect the row labels of the Abuja listings found above
camp_index = list(df.loc[df['City'] == 'Camp'].index)
ext_index = list(df.loc[df['City'] == 'Ext'].index)
apo_index = list(df.loc[df['City'] == 'Apo'].index)


In [69]:
df.drop(index=apo_index, inplace=True)
df.drop(index=ext_index, inplace=True)
df.drop(index=camp_index, inplace=True)
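
An alternative to dropping rows by their truncated city label is to filter on the state name at the end of Location. The sketch below uses a small made-up frame (`listings` is illustrative) and assumes that every Lagos listing ends with the word "Lagos", which has not been verified against the full dataset.

```python
import pandas as pd

# Illustrative rows: two Lagos listings and two Abuja listings
listings = pd.DataFrame({
    'Location': [
        'Ikota Lekki Lagos',
        'Life Camp Abuja',
        'Katampe Ext Abuja',
        'Opebi Ikeja Lagos ',
    ]
})

# Strip stray whitespace first, then keep rows whose location ends in 'Lagos'
in_lagos = listings['Location'].str.strip().str.endswith('Lagos')
lagos_only = listings[in_lagos]
print(lagos_only)
```

This would also catch out-of-state listings whose city labels were not spotted by eye.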

Now, let's segment the records into Island and Mainland based on their city.

In [70]:
df.City.value_counts().head()

Lekki    5480
Ajah     1567
Ikoyi    1464
Ojodu    1075
Ikeja    1007
Name: City, dtype: int64

Our island list comprises cities in the Lagos Island and Eti-Osa Local Government Areas of Lagos State.

In [71]:
island_list = ['Lekki', 'Ajah', 'Ikoyi', 'Victoria Island']

In [72]:
df['Location_Area'] = df['City'].apply(lambda x:'Island' if x in island_list else 'Mainland')
df[['City', 'Location_Area']].head()

Unnamed: 0,City,Location_Area
0,Lekki,Island
1,Gbagada,Mainland
2,Ikeja,Mainland
3,Ikeja,Mainland
4,Lekki,Island
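
The same labeling can be written without a lambda by combining `isin` with `np.where`. This is a minimal sketch on a toy series (`cities` is illustrative); it should give the same labels as the `apply` call above.

```python
import numpy as np
import pandas as pd

island_list = ['Lekki', 'Ajah', 'Ikoyi', 'Victoria Island']

# Toy stand-in for the City column
cities = pd.Series(['Lekki', 'Gbagada', 'Ikeja', 'Victoria Island'])

# isin builds a boolean mask; np.where maps True/False to the two labels
area = pd.Series(
    np.where(cities.isin(island_list), 'Island', 'Mainland'),
    index=cities.index,
)
print(area.tolist())
```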


In [73]:
df['Location_Area'].value_counts(normalize=True)

Island      0.540133
Mainland    0.459867
Name: Location_Area, dtype: float64

The dataset is fairly evenly split: about 54% of the listings are on the Island and 46% are on the Mainland. This is great.

In [74]:
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets,City,Location_Area
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,4000000.0,year,3,3,4,Lekki,Island
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent,5500000.0,year,4,5,5,Gbagada,Mainland
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent,1200000.0,year,2,2,2,Ikeja,Mainland
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent,1500000.0,year,2,1,1,Ikeja,Mainland
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent,8000000.0,year,5,6,7,Lekki,Island


Let's clean the Serviced column. The value counts below show that each entry combines up to three tags: Newly Built, Serviced, and Furnished. We'll create a separate 1/0 column for each tag.

In [75]:
df['Serviced'].value_counts(dropna=False)

NaN                             10699
Newly Built                      2367
Serviced                         1727
ServicedNewly Built              1256
Furnished                         550
FurnishedServicedNewly Built      280
FurnishedNewly Built              252
FurnishedServiced                 249
Newly BuiltFurnishedServiced        6
ServicedFurnished                   3
Newly BuiltFurnished                2
ServicedNewly BuiltFurnished        1
Name: Serviced, dtype: int64

In [76]:
df['Newly Built'] = df['Serviced'].apply(lambda x:1 if 'Newly Built' in str(x) else 0)
df['Furnished'] = df['Serviced'].apply(lambda x:1 if 'Furnished' in str(x) else 0)
df['Serviced_1'] = df['Serviced'].apply(lambda x:1 if 'Serviced' in str(x) else 0)

In [77]:
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets,City,Location_Area,Newly Built,Furnished,Serviced_1
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,4000000.0,year,3,3,4,Lekki,Island,1,0,1
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent,5500000.0,year,4,5,5,Gbagada,Mainland,0,0,0
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent,1200000.0,year,2,2,2,Ikeja,Mainland,0,0,0
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent,1500000.0,year,2,1,1,Ikeja,Mainland,1,0,0
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent,8000000.0,year,5,6,7,Lekki,Island,1,0,0
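
The three flag columns can also be built in one pass with `str.contains`. The sketch below runs on a toy series (`serviced` is illustrative) and uses `regex=False` for plain substring matching and `na=False` so that missing values become 0.

```python
import pandas as pd

# Toy stand-in for the raw Serviced column
serviced = pd.Series([
    'ServicedNewly Built',
    None,
    'FurnishedServiced',
    'Newly Built',
])

# One 1/0 column per tag; missing values count as "tag absent"
flags = pd.DataFrame({
    label: serviced.str.contains(label, regex=False, na=False).astype(int)
    for label in ['Newly Built', 'Furnished', 'Serviced']
})
print(flags)
```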


#### What's Up with these Dates?
We've come to the final column to clean: the dates!
We'll split this column into two, Date_Added and Date_Updated.

In [78]:
df['Date_Added'] = df['Date_Added_Updated'].str.split(',').str[-1]
df['Date_Updated'] = df['Date_Added_Updated'].str.split(',').str[0]

df[['Date_Added', 'Date_Updated']].head()

Unnamed: 0,Date_Added,Date_Updated
0,Added 14 Feb 2023,Updated 19 Feb 2023
1,Added 15 Feb 2023,Updated 19 Feb 2023
2,Added 04 Jan 2023,Updated 19 Feb 2023
3,Added 09 Dec 2022,Updated 19 Feb 2023
4,Added 09 Feb 2023,Updated 19 Feb 2023


In [79]:
df['Date_Added'] = df['Date_Added'].str.strip().str[6:]
df['Date_Added'].head()

0    14 Feb 2023
1    15 Feb 2023
2    04 Jan 2023
3    09 Dec 2022
4    09 Feb 2023
Name: Date_Added, dtype: object

In [80]:
# Slice from the end so that records without an explicit update date are not lost
df['Date_Updated'] = df['Date_Updated'].str.strip().str[-11:]
df['Date_Updated'].head()

0    19 Feb 2023
1    19 Feb 2023
2    19 Feb 2023
3    19 Feb 2023
4    19 Feb 2023
Name: Date_Updated, dtype: object

In [81]:
df['Date_Added'] = pd.to_datetime(df['Date_Added'])
df['Date_Updated'] = pd.to_datetime(df['Date_Updated'])

df[['Date_Added', 'Date_Updated']].head()

Unnamed: 0,Date_Added,Date_Updated
0,2023-02-14,2023-02-19
1,2023-02-15,2023-02-19
2,2023-01-04,2023-02-19
3,2022-12-09,2023-02-19
4,2023-02-09,2023-02-19
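
The split, slice, and parse steps above can be sketched more compactly with a regular expression and an explicit date format. The sample strings below are illustrative, and the second one (an entry with no "Updated" part) is an assumption about what such records look like. Falling back to the added date mirrors what the reverse slicing above does for those records.

```python
import pandas as pd

# Toy stand-in for the Date_Added_Updated column
raw = pd.Series([
    'Updated 19 Feb 2023, Added 14 Feb 2023',
    'Added 04 Jan 2023',  # assumed shape of a record with no update date
])

# Day, abbreviated month, four-digit year
date_pat = r'(\d{1,2} \w{3} \d{4})'
added = raw.str.extract(r'Added ' + date_pat, expand=False)
updated = raw.str.extract(r'Updated ' + date_pat, expand=False)

# An explicit format avoids ambiguous inference; bad values become NaT
added = pd.to_datetime(added, format='%d %b %Y', errors='coerce')
updated = pd.to_datetime(updated, format='%d %b %Y', errors='coerce')

# Fall back to the added date when no update date is present
updated = updated.fillna(added)
print(pd.DataFrame({'Date_Added': added, 'Date_Updated': updated}))
```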


#### Cleaning Complete!
Let's drop the columns we no longer need and export the cleaned dataframe to a local CSV file.

In [82]:
df.head()

Unnamed: 0,Location,Price_Period,Date_Added_Updated,Description,Serviced,Bed_Bath_Toilet,Listing_Type,Price,Period,Beds,Baths,Toilets,City,Location_Area,Newly Built,Furnished,Serviced_1,Date_Added,Date_Updated
0,Bakare Estate Chevron Lekki Lagos,"4,000,000/year","Updated 19 Feb 2023, Added 14 Feb 2023",FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,ServicedNewly Built,3 beds3 baths4 Toilets,Rent,4000000.0,year,3,3,4,Lekki,Island,1,0,1,2023-02-14,2023-02-19
1,"Phase 2 Estate, Gbagada Lagos","5,500,000/year","Updated 19 Feb 2023, Added 15 Feb 2023",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,,4 beds5 baths5 Toilets,Rent,5500000.0,year,4,5,5,Gbagada,Mainland,0,0,0,2023-02-15,2023-02-19
2,Oregun Ikeja Lagos,"1,200,000/year","Updated 19 Feb 2023, Added 04 Jan 2023",FOR RENT: A basic 2 bedroom apartment in a lo...,,2 beds2 baths2 Toilets,Rent,1200000.0,year,2,2,2,Ikeja,Mainland,0,0,0,2023-01-04,2023-02-19
3,Opebi Ikeja Lagos,"1,500,000/year","Updated 19 Feb 2023, Added 09 Dec 2022","FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Newly Built,2 beds1 baths1 Toilets,Rent,1500000.0,year,2,1,1,Ikeja,Mainland,1,0,0,2022-12-09,2023-02-19
4,Ikota Lekki Lagos,"8,000,000/year","Updated 19 Feb 2023, Added 09 Feb 2023",FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Newly Built,5 beds6 baths7 Toilets,Rent,8000000.0,year,5,6,7,Lekki,Island,1,0,0,2023-02-09,2023-02-19


In [83]:
df.drop(labels=['Price_Period', 'Date_Added_Updated', 'Serviced',
                'Bed_Bath_Toilet', 'Period'], axis=1, inplace=True)
df.head()

Unnamed: 0,Location,Description,Listing_Type,Price,Beds,Baths,Toilets,City,Location_Area,Newly Built,Furnished,Serviced_1,Date_Added,Date_Updated
0,Bakare Estate Chevron Lekki Lagos,FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,Rent,4000000.0,3,3,4,Lekki,Island,1,0,1,2023-02-14,2023-02-19
1,"Phase 2 Estate, Gbagada Lagos",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,Rent,5500000.0,4,5,5,Gbagada,Mainland,0,0,0,2023-02-15,2023-02-19
2,Oregun Ikeja Lagos,FOR RENT: A basic 2 bedroom apartment in a lo...,Rent,1200000.0,2,2,2,Ikeja,Mainland,0,0,0,2023-01-04,2023-02-19
3,Opebi Ikeja Lagos,"FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Rent,1500000.0,2,1,1,Ikeja,Mainland,1,0,0,2022-12-09,2023-02-19
4,Ikota Lekki Lagos,FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Rent,8000000.0,5,6,7,Lekki,Island,1,0,0,2023-02-09,2023-02-19


In [84]:
df.rename(columns={'Serviced_1': 'Serviced'}, inplace=True)
df.head()

Unnamed: 0,Location,Description,Listing_Type,Price,Beds,Baths,Toilets,City,Location_Area,Newly Built,Furnished,Serviced,Date_Added,Date_Updated
0,Bakare Estate Chevron Lekki Lagos,FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,Rent,4000000.0,3,3,4,Lekki,Island,1,0,1,2023-02-14,2023-02-19
1,"Phase 2 Estate, Gbagada Lagos",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,Rent,5500000.0,4,5,5,Gbagada,Mainland,0,0,0,2023-02-15,2023-02-19
2,Oregun Ikeja Lagos,FOR RENT: A basic 2 bedroom apartment in a lo...,Rent,1200000.0,2,2,2,Ikeja,Mainland,0,0,0,2023-01-04,2023-02-19
3,Opebi Ikeja Lagos,"FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Rent,1500000.0,2,1,1,Ikeja,Mainland,1,0,0,2022-12-09,2023-02-19
4,Ikota Lekki Lagos,FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Rent,8000000.0,5,6,7,Lekki,Island,1,0,0,2023-02-09,2023-02-19


In [85]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17392 entries, 0 to 17400
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Location       17392 non-null  object        
 1   Description    17392 non-null  object        
 2   Listing_Type   17392 non-null  object        
 3   Price          17392 non-null  float64       
 4   Beds           15345 non-null  object        
 5   Baths          15345 non-null  object        
 6   Toilets        15345 non-null  object        
 7   City           17392 non-null  object        
 8   Location_Area  17392 non-null  object        
 9   Newly Built    17392 non-null  int64         
 10  Furnished      17392 non-null  int64         
 11  Serviced       17392 non-null  int64         
 12  Date_Added     17392 non-null  datetime64[ns]
 13  Date_Updated   17392 non-null  datetime64[ns]
dtypes: datetime64[ns](2), float64(1), int64(3), object(8)
memory usage: 2.

We'll convert the columns that hold numeric values, some of which are still stored as objects, to the float datatype.

In [86]:
categorical_variables = ['Beds', 'Baths', 'Toilets', 'Newly Built', 'Furnished', 'Serviced']

In [87]:
# Empty strings and stray 's' characters are parsing leftovers; treat them as missing
df[categorical_variables] = df[categorical_variables].replace(['', 's'], np.nan)

In [88]:
df[categorical_variables] = df[categorical_variables].astype(float)

In [89]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17392 entries, 0 to 17400
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Location       17392 non-null  object        
 1   Description    17392 non-null  object        
 2   Listing_Type   17392 non-null  object        
 3   Price          17392 non-null  float64       
 4   Beds           14755 non-null  float64       
 5   Baths          14738 non-null  float64       
 6   Toilets        14793 non-null  float64       
 7   City           17392 non-null  object        
 8   Location_Area  17392 non-null  object        
 9   Newly Built    17392 non-null  float64       
 10  Furnished      17392 non-null  float64       
 11  Serviced       17392 non-null  float64       
 12  Date_Added     17392 non-null  datetime64[ns]
 13  Date_Updated   17392 non-null  datetime64[ns]
dtypes: datetime64[ns](2), float64(7), object(5)
memory usage: 2.5+ MB
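
The blank-out-then-cast steps above could also be done in one call with `pd.to_numeric` and `errors='coerce'`, which turns anything that cannot be parsed into NaN. This is a sketch on a toy frame (`counts` is illustrative), not a drop-in replacement: coercion would silently hide any other unexpected values as well.

```python
import pandas as pd

# Toy frame with the leftover values seen after parsing ('' and 's')
counts = pd.DataFrame({
    'Beds': ['3', '', '10'],
    'Baths': ['3', 's', '2'],
})

# Unparseable entries become NaN, so the result is float64
numeric = counts.apply(pd.to_numeric, errors='coerce')
print(numeric)
```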


In [90]:
df.head()

Unnamed: 0,Location,Description,Listing_Type,Price,Beds,Baths,Toilets,City,Location_Area,Newly Built,Furnished,Serviced,Date_Added,Date_Updated
0,Bakare Estate Chevron Lekki Lagos,FOR RENT: BRAND NEW WELL SPACED DUPLEX IN A S...,Rent,4000000.0,3.0,3.0,4.0,Lekki,Island,1.0,0.0,1.0,2023-02-14,2023-02-19
1,"Phase 2 Estate, Gbagada Lagos",FOR RENT: RELATIVELY NEW 4 BEDROOM DUPLEX!!! ...,Rent,5500000.0,4.0,5.0,5.0,Gbagada,Mainland,0.0,0.0,0.0,2023-02-15,2023-02-19
2,Oregun Ikeja Lagos,FOR RENT: A basic 2 bedroom apartment in a lo...,Rent,1200000.0,2.0,2.0,2.0,Ikeja,Mainland,0.0,0.0,0.0,2023-01-04,2023-02-19
3,Opebi Ikeja Lagos,"FOR RENT: Lovely 2 bedrooms flat in opebi, up...",Rent,1500000.0,2.0,1.0,1.0,Ikeja,Mainland,1.0,0.0,0.0,2022-12-09,2023-02-19
4,Ikota Lekki Lagos,FOR RENT: FOR RENT: Luxury 5 Bedroom Detached...,Rent,8000000.0,5.0,6.0,7.0,Lekki,Island,1.0,0.0,0.0,2023-02-09,2023-02-19


In [91]:
df.to_csv('lag_listings_clean.csv', index=False)