### Web Scraping 101 

#### Make sure you have the legal right to scrape and use data from a site before doing so. PropertyPro is more flexible about this, as its terms and conditions page shows, but Nigeria Property Centre is not. See the links below for details:

#### https://www.propertypro.ng/terms
#### https://nigeriapropertycentre.com/terms-of-use 

#### Import requests-html for making requests to a website and scraping it, and re for regular expressions

In [1]:
import requests, re
from requests_html import HTMLSession



#### Make a request to the website and extract its content (page source)

In [2]:
r = requests.get("https://www.propertypro.ng/property-for-rent?search=gbagada")
c = r.content  # raw bytes of the page source

#### Alternatively, create an HTMLSession and make the request through it. The response then exposes the parsed page source as r.html

In [3]:
session = HTMLSession()
r = session.get('https://www.propertypro.ng/property-for-rent/in/lagos/gbagada')

#### Find all properties on the page

In [4]:
properties = r.html.find('div.single-room-text')
properties

[<Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>,
 <Element 'div' class=('single-room-text',)>]

#### To learn more about HTML tags, see https://www.w3schools.com/tags/default.asp

#### Collect property features on the page. To pick a feature, add its index at the end of the code, e.g. [0] for bed, [2] for bath.

#### This first method is not ideal because the position of a specific feature might change. For example, bed may not come first.

In [18]:
properties[2].find('div.fur-areea')[0].text.split()

['4', 'beds', '4', 'baths', '4', 'Toilets']

#### This second method uses regular expressions and is a better way to collect feature information, because it looks for the feature by name before collecting it. If the feature does not exist, re.findall returns an empty list, so indexing it with [0] raises an IndexError, which you can catch to fall back to a default. For more on regular expressions see https://www.w3schools.com/python/python_regex.asp

In [19]:
re.findall("..bath",properties[2].find('div.fur-areea')[0].text)[0].strip()

'4 bath'
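
#### As a minimal sketch of the regular-expression approach, the helper below collects all three features at once and returns None for any feature that has no number in front of it. The names parse_features and FEATURE_PATTERN are hypothetical and not part of the original notebook, and the sample strings are taken from the listing text shown on this page.

```python
import re

# Hypothetical helper, not part of the original notebook.
# Captures a number followed by a feature name, ignoring case ("Toilets" vs "toilets").
FEATURE_PATTERN = re.compile(r"(\d+)\s*(bed|bath|toilet)", re.IGNORECASE)


def parse_features(text):
    """Map each feature name to its count, or None when no number precedes it."""
    counts = {"bed": None, "bath": None, "toilet": None}
    for number, name in FEATURE_PATTERN.findall(text):
        counts[name.lower()] = int(number)
    return counts


print(parse_features("4 beds 4 baths 4 Toilets"))
print(parse_features("4 beds baths Toilets"))
```

#### Capturing the digits with (\d+) instead of slicing a fixed number of characters also keeps counts of 10 or more intact.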

#### You can change the tag (div) and the class in the selector below to search for something else.

In [20]:
properties[3].find('div.fur-areea')[0].text

'1 beds 1 baths 2 Toilets'

#### Websites typically have a structure that allows for easy automation. For example, the location and page number can easily be changed in the URL and the website will respond accordingly. Try changing the location and page number below to surulere and page 2 respectively.

In [None]:
https://www.propertypro.ng/property-for-rent?search=gbagada&page=1
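
#### One way to build such URLs in code is sketched below, assuming the search and page query parameters seen in the URL above keep working as shown. The function name build_search_url is hypothetical; urlencode comes from the Python standard library.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.propertypro.ng/property-for-rent"


def build_search_url(location, page=1):
    """Return the search URL for a location and page number (hypothetical helper)."""
    # urlencode escapes characters such as spaces that are not valid in a query string
    query = urlencode({"search": location, "page": page})
    return f"{BASE_URL}?{query}"


print(build_search_url("surulere", page=2))
```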

#### One challenge remains: you need the total number of pages. It can be calculated by dividing the total number of items by the number of listings on each page and rounding up. The total appears inside a line of text, so it has to be extracted using regular expressions.

In [10]:
r.html.find('div.property-sale-number')[0].text

'Result 1 - 20 of 5526\nSort By\nMost Recent\nLowest Price\nHighest Price\nBeds'

In [11]:
re.findall(r"\d+",r.html.find('div.property-sale-number')[0].text)

['1', '20', '5526']

In [12]:
re.findall(r"\d+",r.html.find('div.property-sale-number')[0].text)[2]

'5526'

In [13]:
import math

items = int(re.findall(r"\d+",r.html.find('div.property-sale-number')[0].text)[2])
listings = 20
# Round up so that the final, partially filled page is counted too
page_nr = math.ceil(items/listings)
page_nr

277
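
#### The number of listings per page does not have to be hard-coded: the same text already states the range shown on the current page. The sketch below assumes the summary always follows the 'Result A - B of N' shape seen above, and count_pages is a hypothetical name. Because it rounds up, the last, partially filled page is included.

```python
import math
import re


def count_pages(summary):
    """Return the number of result pages implied by a 'Result A - B of N' summary."""
    match = re.search(r"Result\s+(\d+)\s*-\s*(\d+)\s+of\s+(\d+)", summary)
    if match is None:
        raise ValueError(f"unrecognised summary: {summary!r}")
    first, last, total = (int(group) for group in match.groups())
    per_page = last - first + 1  # e.g. 1 - 20 means 20 listings on the page
    return math.ceil(total / per_page)


summary = 'Result 1 - 20 of 5526\nSort By\nMost Recent\nLowest Price\nHighest Price\nBeds'
print(count_pages(summary))  # 277
```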

#### The code below extracts the details from just one page per location. You can update it to scrape multiple pages and locations, or write your own code. As indicated in the instructions, please do not scrape multiple locations or pages until off-peak hours (after 6pm), to avoid overloading the site.


In [None]:
#session = HTMLSession()

In [127]:
l = []
locations = ['gbagada']


def first_text(element, selector):
    """Return the text of the first element matching selector, or None if there is none."""
    found = element.find(selector)
    return found[0].text if found else None


def feature_count(name, features):
    """Return the number written before name (e.g. 'bed') in features, or 0 if it is absent."""
    match = re.search(rf"(\d+)\s*{name}", features or "", re.IGNORECASE)
    return int(match.group(1)) if match else 0


for place in locations:
    #base_url="https://www.propertypro.ng/property-for-rent?search="+place+ "&auto=&type=&bedroom=&max_price="
    r = session.get(f'https://www.propertypro.ng/property-for-rent/in/lagos/{place}')

    properties = r.html.find('div.single-room-text')

    for prop in properties:
        features = first_text(prop, 'div.fur-areea')
        price = first_text(prop, 'h3.listings-price')
        d = {}
        d["location"] = place
        d["specific_location"] = first_text(prop, 'h4')
        d["features"] = features
        d["bed"] = feature_count("bed", features)
        d["bath"] = feature_count("bath", features)
        d["toilet"] = feature_count("toilet", features)
        d["description"] = first_text(prop, 'h3.listings-property-title2')
        # Strip the currency symbol and thousands separators, e.g. "₦ 3,000,000" -> "3000000"
        d["price"] = price.replace("₦ ", "").replace(",", "") if price is not None else None
        l.append(d)
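
#### If you do extend the code to several pages, pausing between requests is one way to keep the load on the site low. The sketch below is an illustration only: fetch_politely is a hypothetical helper, the 2 second default delay is an assumed value rather than anything the site specifies, and the fetch argument stands in for a function such as session.get.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0):
    """Call fetch(url) for each URL, sleeping between consecutive requests."""
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # wait before every request except the first
        responses.append(fetch(url))
    return responses


# Demonstration with a stand-in fetch function, so nothing is actually requested.
urls = [
    f"https://www.propertypro.ng/property-for-rent?search=gbagada&page={n}"
    for n in (1, 2)
]
print(fetch_politely(urls, fetch=lambda url: url, delay_seconds=0))
```

#### With the session created earlier, the call would look like fetch_politely(urls, session.get).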

#### Convert output to dataframe

In [128]:
import pandas as pd
ld = pd.DataFrame(l)
ld

| | bath | bed | description | features | location | price | specific_location | toilet |
|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 4 | Tastefully Finished 4Bedroom Terraced Duplex i... | 4 beds 4 baths 4 Toilets | gbagada | 65000000.0 | Phase 1Phase 1 Gbagada Lagos | 4 |
| 1 | 3 | 3 | Newly renovated 3bedroom flat | 3 beds 3 baths 4 Toilets | gbagada | 3000000.0 | Phase 1 Gbagada Lagos | 4 |
| 2 | 0 | 4 | Tastefully finished and serviced 4bedroom Terr... | 4 beds baths Toilets | gbagada | 2800000.0 | Gbagada Lagos | 0 |
| 3 | 0 | 0 | Spacious miniflat | beds baths Toilets | gbagada | 500000.0 | Ifako Gbagada Gbagada Lagos | 0 |
| 4 | 0 | 0 | Standard self-contained | 0 beds 0 baths 0 Toilets | gbagada | 400000.0 | ...Gbagada Lagos | 0 |
| 5 | 5 | 4 | Renovated 5 bedroom office space at Atunrase g... | 4 beds 5 baths 5 Toilets | gbagada | 3000000.0 | Atunrase Medina Gbagada Lagos | 5 |
| 6 | 0 | 0 | Lovely miniflat | beds baths Toilets | gbagada | 700000.0 | Gbagada Lagos | 0 |
| 7 | 4 | 4 | Executive and tastefully finished 4bedroom dup... | 4 beds 4 baths 5 Toilets | gbagada | 3000000.0 | Medina Gbagada Lagos | 5 |
| 8 | 0 | 4 | An executive 4bedroom Terrance duplex | 4 beds baths Toilets | gbagada | 3800000.0 | Atunrase Medina Gbagada Lagos | 0 |
| 9 | 5 | 5 | Luxury 5 bedroom detached duplex with bq | 5 beds 5 baths 6 Toilets | gbagada | | Lekki Lekki Lagos | 6 |


In [None]:
ld.to_csv("House_prices_gbagada.csv")