### Web Scraping 101 

#### Ensure you have the legal right to scrape and use data from a site before doing so. Propertypro is more flexible about this, as its terms and conditions page shows; Nigeria Property Centre is not. See the links below for details:

#### https://www.propertypro.ng/terms
#### https://nigeriapropertycentre.com/terms-of-use 

#### Import requests-html for making requests to a website and scraping it, and re for regular expressions

In [29]:
import requests, re
from requests_html import HTMLSession

#### Make a request to the website and extract its content (page source)

In [30]:
r=requests.get("https://www.propertypro.ng/property-for-rent?search=gbagada")
c=r.content

#### Create a Session and make a request to the website and extract its content (page source)

In [31]:
session = HTMLSession()
r = session.get('https://www.propertypro.ng/property-for-rent/in/lagos/gbagada')

#### Find all properties on the page

In [32]:
properties = r.html.find('div.single-room-text')
properties

[<Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>]

#### To learn more about HTML tags, see https://www.w3schools.com/tags/default.asp


#### Collect property features on the page. To pick one feature, add its index at the end of the code, e.g. [0] for bed, [2] for bath.

#### This first method is not ideal because the position of a specific feature might change. For example, bed may not come first, and a missing number shifts every index after it.

In [33]:
properties[2].find('div.fur-areea')[0].text.split()

['3', 'beds', 'baths', 'Toilets']

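#### This second method uses regular expressions and is a better way to collect feature information because it checks that the feature is present before collecting it. If the feature does not exist, re.findall returns an empty list, so indexing it with [0] raises an IndexError that you need to handle. Note that the pattern "..bath" captures whichever two characters precede "bath", which is not always a number. For more on regular expressions, see https://www.w3schools.com/python/python_regex.asp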

In [34]:
re.findall("..bath",properties[2].find('div.fur-areea')[0].text)[0].strip()

's bath'
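#### One way to make the regular-expression approach more robust is to capture the number itself with a group, so that a missing count gives None instead of stray text. The helper name `feature_count` is hypothetical and not part of the original notebook; this is a minimal sketch that assumes each feature is written as a number followed by its name.

```python
import re

def feature_count(text, name):
    """Return the integer that directly precedes `name` in `text`, or None if absent."""
    match = re.search(r"(\d+)\s*" + re.escape(name), text, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None

sample = "3 beds baths Toilets"
beds = feature_count(sample, "bed")    # a number precedes "beds"
baths = feature_count(sample, "bath")  # no number precedes "baths"
print(beds, baths)
```

Because the group `(\d+)` matches one or more digits, a listing with ten or more rooms should also be read in full rather than cut to a single character.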

#### You can change the tag (div) and the class in the selector below to search for something else.

In [35]:
properties[3].find('div.fur-areea')[0].text

'0 beds 0 baths 0 Toilets'

#### Websites typically have a structure that allows for easy automation. For example, the location and page number can easily be changed in the URL and the website will respond accordingly. Try changing the location below to surulere and the page number to 2.

In [36]:
# https://www.propertypro.ng/property-for-rent?search=gbagada&page=1
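#### The URL pattern in the comment above can be generated rather than typed by hand. The sketch below assumes the site keeps accepting the `search` and `page` query parameters shown in that comment; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.propertypro.ng/property-for-rent"

def build_search_url(location, page=1):
    """Build a search URL for one location and one page number."""
    return BASE_URL + "?" + urlencode({"search": location, "page": page})

print(build_search_url("surulere", page=2))
```

Using `urlencode` also takes care of escaping locations that contain spaces or other special characters.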

#### The approach above leaves one challenge: you need the total number of pages. This can be calculated as the total number of items divided by the number of listings on each page, rounded up. The item count sits inside a block of text, so it has to be extracted using regular expressions.

In [37]:
r.html.find('div.property-sale-number')[0].text

'Result 1 - 20 of 6172\nSort By\nMost Recent\nLowest Price\nHighest Price\nBeds'

In [38]:
re.findall(r"\d+",r.html.find('div.property-sale-number')[0].text)

['1', '20', '6172']

In [39]:
re.findall(r"\d+",r.html.find('div.property-sale-number')[0].text)[2]

'6172'

In [40]:
import math

items = int(re.findall(r"\d+",r.html.find('div.property-sale-number')[0].text)[2])
listings = 20
page_nr = math.ceil(items/listings)  # round up so the last, partly filled page is counted
page_nr

309
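#### The page count can also be derived from the summary text alone, without hard-coding 20 listings per page. The sketch below assumes the text keeps the "Result A - B of N" shape shown above, has no thousands separators, and is read from the first results page; `total_pages` is a hypothetical helper name. Rounding up matters here: 6172 results at 20 per page span 309 pages, because the last page holds the remaining 12 listings.

```python
import math
import re

def total_pages(summary_text):
    """Return the number of result pages implied by a 'Result A - B of N' summary."""
    match = re.search(r"Result\s+(\d+)\s*-\s*(\d+)\s+of\s+(\d+)", summary_text)
    if match is None:
        raise ValueError("summary text does not contain a result count")
    first, last, total = (int(group) for group in match.groups())
    per_page = last - first + 1  # e.g. 1 - 20 means 20 listings per page
    return math.ceil(total / per_page)

summary = "Result 1 - 20 of 6172\nSort By\nMost Recent"
print(total_pages(summary))  # 309
```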

#### The code below extracts the details from just one page per location. You can update it to scrape multiple pages and locations, or write your own code. As indicated in the instructions, please do not scrape multiple locations or pages until off-peak hours (after 6pm), to avoid overloading the site.


In [46]:
# session = HTMLSession()

In [56]:
l = []
locations = ['gbagada']

for place in locations:
    #base_url="https://www.propertypro.ng/property-for-rent?search="+place+ "&auto=&type=&bedroom=&max_price="
    r = session.get('https://www.propertypro.ng/property-for-rent/in/lagos/' + place)

    properties = r.html.find('div.single-room-text')

    for prop in properties:
        d = {}
        d["location"] = place
        try:
            d["specific_location"] = prop.find('h4')[0].text
        except (IndexError, TypeError, AttributeError):
            d["specific_location"] = None
        try:
            features = prop.find('div.fur-areea')[0].text
        except (AttributeError, IndexError):
            features = None
        d["features"] = features
        # Capture the whole number in front of each feature name; fall back to 0 when it is missing.
        for key, word in [("bed", "bed"), ("bath", "bath"), ("toilet", "Toilet")]:
            try:
                d[key] = int(re.findall(r"(\d+)\s*" + word, features)[0])
            except (IndexError, TypeError, ValueError):
                d[key] = 0
        try:
            d["description"] = prop.find('h3.listings-property-title2')[0].text
        except (IndexError, TypeError, AttributeError):
            d["description"] = None
        try:
            d["price"] = prop.find('h3.listings-price')[0].text.replace("₦ ", "").replace(",", "")
        except (IndexError, TypeError, AttributeError):
            d["price"] = None
        l.append(d)
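#### If you extend the loop to several pages, a pause between requests helps avoid overloading the site. The sketch below is one possible shape, not the notebook's own method: `scrape_pages` and `fetch_page` are hypothetical names, and `fetch_page` stands for any function that takes a URL and returns a list of row dictionaries (in this notebook it would wrap session.get and the parsing code). A stand-in fetcher is used so the example runs without touching the network.

```python
import time

def scrape_pages(fetch_page, urls, delay_seconds=2.0):
    """Call fetch_page(url) for each URL in turn, pausing between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause before every request except the first
        results.extend(fetch_page(url))
    return results

def fake_fetch(url):
    """Stand-in fetcher: returns one row per URL instead of making a request."""
    return [{"source": url}]

urls = [
    f"https://www.propertypro.ng/property-for-rent?search=gbagada&page={n}"
    for n in (1, 2)
]
rows = scrape_pages(fake_fetch, urls, delay_seconds=0)
print(len(rows))  # 2
```

The two-second default is an arbitrary assumption; choose a delay that suits the site and the time of day.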

#### Convert output to dataframe

In [57]:
import pandas as pd
ld = pd.DataFrame(l)
ld

| | location | specific_location | features | bed | bath | toilet | description | price |
|---|---|---|---|---|---|---|---|---|
| 0 | gbagada | Off Jagumolu str. BARIGA Oworonshoki Gbagada L... | 1 beds 1 baths 1 Toilets | 1 | 1 | 1 | Tastefully finished executive service miniflat... | 800000/year |
| 1 | gbagada | Gbagada Lagos | 3 beds baths Toilets | 3 | 0 | 0 | Newly Built And Luxury 3 Bedroom Flats In An E... | 3000000/year |
| 2 | gbagada | Peace estate Gbagada Lagos | 0 beds 0 baths 0 Toilets | 0 | 0 | 0 | Room self contain | 500000/year |
| 3 | gbagada | Phase 2 Gbagada Lagos | 4 beds 4 baths 4 Toilets | 4 | 4 | 4 | 4BEDROOM TERRACED DUPLEX (SELF COMPOUND) | 4000000 |
| 4 | gbagada | Millenuim Ups Gbagada Lagos | 3 beds 3 baths 3 Toilets | 3 | 3 | 3 | DECENT 3BEDROOM FLAT | 2500000 |
| 5 | gbagada | Millenuim Ups Gbagada Lagos | 3 beds 3 baths 3 Toilets | 3 | 3 | 3 | A NEWLY BUILT & TASTEFULLY FINISHED MODERN 3BE... | 2500000 |
| 6 | gbagada | Medina Gbagada Lagos | 2 beds 2 baths 2 Toilets | 2 | 2 | 2 | DECENT & COMPACT 2BEDROOM FLAT FOR RENT!! | 1200000 |
| 7 | gbagada | Millenuim Ups Gbagada Lagos | 3 beds 3 baths 3 Toilets | 3 | 3 | 3 | RELATIVELY NEW & TASTEFULLY FINISHED 3BEDROOM ... | 3000000 |
| 8 | gbagada | The Venice Sangotedo Lagos | 3 beds 3 baths 4 Toilets | 3 | 3 | 4 | 3 Bedroom Terrace + BQ | |
| 9 | gbagada | Millenuim Ups Gbagada Lagos | 3 beds 3 baths 3 Toilets | 3 | 3 | 3 | A NEWLY BUILT & TASTEFULLY FINISHED 3BEDROOM FLAT | 3000000 |
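#### The price column mixes values such as 800000/year and 4000000, so it stays text until the amount is separated from the period. The sketch below shows one way to split them; `parse_price` is a hypothetical helper, and it leaves the period as None when the listing does not state one rather than guessing that the price is yearly.

```python
def parse_price(raw):
    """Split a scraped price such as '800000/year' into (amount, period)."""
    if not raw:
        return None, None
    amount_text, _, period = raw.partition("/")
    amount_text = amount_text.replace("₦", "").replace(",", "").strip()
    amount = int(amount_text) if amount_text.isdigit() else None
    return amount, (period.strip() or None)

print(parse_price("800000/year"))
print(parse_price("4000000"))
```

Applied to the rows collected by the scraping loop, this would give a numeric amount that can be sorted or averaged.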


In [44]:
ld.to_csv("House_prices_gbagada.csv")