# Web Scraping Nigeria Housing Data

---


Data is not readily available in Nigeria, so data professionals in this part of the world often rely on web scraping to bridge the gap. In this project, we collect data for a Nigerian house-price project by scraping the Property Pro website, and use it to answer business questions such as:

- Average Price of Homes by Location
- Most Popular Type of Homes

- Property Pro: https://www.propertypro.ng/


For each listing we collect the description of the home or apartment, its location, and its price. Room size is not included because the website does not provide it; further research also suggests that size is determined by the property owners.

A popular way to scrape the web in Python is BeautifulSoup, a library for pulling data out of HTML and XML files. I will use this method and also introduce extracting data through APIs in this project.

Docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/


In [11]:
import pandas as pd 
from bs4 import BeautifulSoup
import re
import requests

## GET URL 

In [12]:
url = 'https://www.propertypro.ng/property-for-rent?'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.status_code)

200
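
The request above returned status 200. For a longer scraping run it can help to fail loudly on any other status and to set a timeout so a stalled connection does not hang the notebook. A minimal sketch, where `fetch_page`, `BASE_URL`, `HEADERS`, and the 30-second timeout are illustrative choices and not part of the original notebook:

```python
import requests

# Assumed constants for illustration; the URL and User-Agent mirror the cell above.
BASE_URL = "https://www.propertypro.ng/property-for-rent"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
    )
}


def fetch_page(page, session=None, timeout=30):
    """Fetch one results page and return its raw HTML bytes.

    `session` is optional; any object with a requests-style `get` method works.
    """
    http = session or requests
    response = http.get(
        BASE_URL, params={"page": page}, headers=HEADERS, timeout=timeout
    )
    # Raise an HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

Passing a `requests.Session()` as `session` would reuse one connection across pages, which is usually a little gentler on the server.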


In [13]:
# res = response.content
soup = BeautifulSoup(response.content, "html.parser")

In [17]:
print(soup.title)
print(soup.title.string)


<title>21308+ House, Flats &amp; Office For Rent in  Nigeria. | PropertyPro Nigeria</title>
21308+ House, Flats & Office For Rent in  Nigeria. | PropertyPro Nigeria


In [36]:
listings = []
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

# Scrape result pages 1 to 9 (range excludes the stop value)
for i in range(1, 10):
    url = f'https://www.propertypro.ng/property-for-rent?page={i}'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")

    # Each listing on the page sits in its own card <div>
    cards = soup.find_all('div', class_='single-room-sale listings-property')

    for card in cards:
        # The first <h4> holds the description, the second the location
        h4_tags = card.find_all('h4')
        description = h4_tags[0].get_text()
        location = h4_tags[1].get_text()

        # The first <h3> holds the house type, the second the price
        h3_tags = card.find_all('h3')
        house_type = h3_tags[0].get_text()
        price = h3_tags[1].get_text()

        listings.append([description, house_type, location, price])
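
The loop above indexes the `h4` and `h3` tags directly, so a card that lacks one of them (an advert, for example) would raise an `IndexError` and stop the run. A hedged sketch of a more defensive version follows. `parse_card` is a hypothetical helper, the sample HTML is made up to mimic the card layout assumed above, and `get_text(strip=True)` is used instead of plain `get_text()` to trim surrounding whitespace:

```python
from bs4 import BeautifulSoup


def parse_card(card):
    """Return [description, house_type, location, price], or None if incomplete."""
    h4_tags = card.find_all("h4")
    h3_tags = card.find_all("h3")
    # Skip cards that do not have both pairs of headings.
    if len(h4_tags) < 2 or len(h3_tags) < 2:
        return None
    description = h4_tags[0].get_text(strip=True)
    location = h4_tags[1].get_text(strip=True)
    house_type = h3_tags[0].get_text(strip=True)
    price = h3_tags[1].get_text(strip=True)
    return [description, house_type, location, price]


# Hypothetical markup for illustration only; the real page may differ.
sample_html = """
<div class="single-room-sale listings-property">
  <h4>2 BEDROOM HOUSE FOR RENT</h4>
  <h3>2 Bedroom Apartment</h3>
  <h4>Ikate Lekki Lagos</h4>
  <h3>₦ 5,000,000/year</h3>
</div>
<div class="single-room-sale listings-property">
  <h4>Advert</h4>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
cards = soup.find_all("div", class_="single-room-sale listings-property")
# Keep only the cards that parsed completely.
rows = [row for row in (parse_card(card) for card in cards) if row is not None]
print(rows)
```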
    

In [38]:
df = pd.DataFrame(listings, columns=['Description', 'House_Type', 'Location', 'Price'])
df.head()

|   | Description | House_Type | Location | Price |
|---|---|---|---|---|
| 0 | 5 BEDROOM HOUSE FOR RENT | 5 Bedroom Semi Detached Duplex | Banana Island Ikoyi Lagos | ₦ 30,000,000/year |
| 1 | 4 BEDROOM HOUSE FOR RENT | 4 Bedroom Terrace Duplex | Lekki Lagos | ₦ 9,500,000/year |
| 2 | 4 BEDROOM HOUSE FOR RENT | Serviced 4 Bedroom Terraced Duplex | Old Ikoyi Lagos | ₦ 20,000,000/year |
| 3 | 2 BEDROOM HOUSE FOR RENT | 2 Bedroom Apartment | Ikate Lekki Lagos | ₦ 5,000,000/year |
| 4 | COMMERCIAL PROPERTY FOR RENT | 1 Room Office Space | Liberty Road, Ibadan Oyo | ₦ 400,000/year |
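
The `Price` column holds strings such as `₦ 30,000,000/year`, which cannot be averaged directly. One possible way to split such a string into a numeric amount and a billing period is sketched below. `parse_price` and `PRICE_PATTERN` are hypothetical names, and the pattern assumes the `amount/period` shape seen in the rows above; other formats on the site may need extra handling.

```python
import re

# A number with optional thousands separators, then an optional "/period" suffix.
PRICE_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(?:/\s*(\w+))?")


def parse_price(text):
    """Return (amount, period) from a string like '₦ 30,000,000/year'.

    Returns (None, None) when the text contains no number.
    """
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None, None
    amount = float(match.group(1).replace(",", ""))
    period = match.group(2)  # None when no "/period" suffix is present
    return amount, period


print(parse_price("₦ 30,000,000/year"))
```

On the scraped frame this could be applied with something like `df['Price'].map(parse_price)`, then unpacked into two new columns.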


In [40]:
df.shape

(450, 4)

In [41]:
df.to_csv('House.csv', index=True)

In [42]:
df.to_csv('house_datas.csv', index=False)
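
With the data saved, the two business questions from the introduction can be approached with a `groupby` and a `value_counts`. The sketch below is illustrative only: it uses five sample rows copied from the preview above, assumes prices have already been converted to numbers (the `Price_NGN` column is a hypothetical name), and treats the last word of `Location` as the state, which is an assumption that may not hold for every listing.

```python
import pandas as pd

# Sample rows taken from the preview above, with prices as plain numbers.
sample = pd.DataFrame(
    [
        ["5 BEDROOM HOUSE FOR RENT", "Banana Island Ikoyi Lagos", 30_000_000],
        ["4 BEDROOM HOUSE FOR RENT", "Lekki Lagos", 9_500_000],
        ["4 BEDROOM HOUSE FOR RENT", "Old Ikoyi Lagos", 20_000_000],
        ["2 BEDROOM HOUSE FOR RENT", "Ikate Lekki Lagos", 5_000_000],
        ["COMMERCIAL PROPERTY FOR RENT", "Liberty Road, Ibadan Oyo", 400_000],
    ],
    columns=["Description", "Location", "Price_NGN"],
)

# Assumption: the final word of the location string is the state.
sample["State"] = sample["Location"].str.split().str[-1]

# Average price of homes by location (here, by state).
avg_price_by_state = sample.groupby("State")["Price_NGN"].mean()

# Most popular type of home, measured by listing count.
most_common_type = sample["Description"].value_counts().idxmax()

print(avg_price_by_state)
print(most_common_type)
```

The same pattern would apply to the full frame; grouping on `House_Type` instead of `Description` gives a finer-grained view of popularity.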

In [44]:
df['Description'].unique()

array(['5 BEDROOM HOUSE FOR RENT', '4 BEDROOM HOUSE FOR RENT',
       '2 BEDROOM HOUSE FOR RENT', 'COMMERCIAL PROPERTY FOR RENT',
       '10 BEDROOM HOUSE FOR RENT', '1 BEDROOM HOUSE FOR RENT',
       '3 BEDROOM HOUSE FOR RENT', '2 BEDROOM FLAT / APARTMENT FOR RENT',
       '3 BEDROOM FLAT / APARTMENT FOR RENT',
       '4 BEDROOM FLAT / APARTMENT FOR RENT',
       '1 BEDROOM FLAT / APARTMENT FOR RENT', 'FLAT / APARTMENT FOR RENT',
       'CO WORKING SPACE FOR RENT', '6 BEDROOM HOUSE FOR RENT',
       '7 BEDROOM HOUSE FOR RENT', '8 BEDROOM HOUSE FOR RENT',
       'HOUSE FOR RENT', 'LAND FOR RENT', '9 BEDROOM HOUSE FOR RENT',
       '3 BEDROOM COMMERCIAL PROPERTY FOR RENT',
       '4 BEDROOM COMMERCIAL PROPERTY FOR RENT'], dtype=object)
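
The unique values above follow a regular shape: an optional bedroom count, a property category, and the suffix `FOR RENT`. A small regular expression can split these into separate fields. This is a sketch under that assumption; `split_description` and `DESCRIPTION_PATTERN` are hypothetical names, and descriptions that do not fit the shape are returned unchanged.

```python
import re

# Optional "<n> BEDROOM " prefix, then the category, then the " FOR RENT" suffix.
DESCRIPTION_PATTERN = re.compile(r"^(?:(\d+) BEDROOM )?(.+?) FOR RENT$")


def split_description(description):
    """Return (bedrooms, category) from a listing description.

    `bedrooms` is None when the description carries no bedroom count.
    """
    text = description.strip()
    match = DESCRIPTION_PATTERN.match(text)
    if match is None:
        return None, text
    bedrooms = int(match.group(1)) if match.group(1) else None
    return bedrooms, match.group(2)


for value in ["5 BEDROOM HOUSE FOR RENT", "FLAT / APARTMENT FOR RENT", "LAND FOR RENT"]:
    print(split_description(value))
```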