# Kenya Real Estate Prices Prediction

## Task: Web Scraping Real Estate Data

### Objectives

Scrape real estate data from [buyrentkenya.com](https://www.buyrentkenya.com):

* Collect house data
* Collect apartment data
* Collect land data

Save the data in CSV files.

Import useful libraries:

In [1]:
import pandas as pd
import numpy as np
from selenium import webdriver
import time

Create a function that extracts the data from every listing on a page and then moves on to the next page, until all the pages on the website have been scraped:

In [2]:
def scrape(webpage):
    # Create lists where we will store all the scraped data
    links = []
    bedroom_data = []
    bathroom_data = []
    prices = []
    size = []
    title = []
    locations = []
    
    # page number
    page = 1

    # Get the webpage where we will begin to scrape the data.
    driver = webdriver.Chrome(executable_path='/Selenium Drivers/chromedriver.exe')
    driver.get(webpage)
    time.sleep(20)

    while True:
        # Get all the properties on the current page
        properties = driver.find_elements_by_css_selector("div[class='mb-3 w-full']")

        # Iterate through the properties to get all the information
        for house in properties:
            # link to property
            url = house.find_element_by_css_selector("a[class='text-black no-underline block']").get_attribute('href')
            links.append(url)

            # property description
            description = house.find_element_by_css_selector("a[class='no-underline text-black']").text
            title.append(description)

            # number of bedrooms
            try:
                bedrooms = house.find_element_by_css_selector("span[data-cy='card-beds']").text
                bedroom_data.append(bedrooms)
            except Exception:
                bedroom_data.append(np.nan)

            # number of bathrooms
            try:
                bathrooms = house.find_element_by_css_selector("div[data-cy='card-baths']")
                bathroom_data.append(bathrooms.text)
            except Exception:
                bathroom_data.append(np.nan)

            # location
            location = house.find_element_by_css_selector("p[class='text-md md:text-sm font-normal text-grey-darker mt-1 md:mt-0']").text
            locations.append(location)

            # price
            price = house.find_element_by_css_selector("a[class='no-underline']").text
            prices.append(price)

            # size of the property (area in square metres)
            try:
                area = house.find_element_by_css_selector("span[data-cy='card-area']")
                size.append(area.text)
            except Exception:
                size.append(np.nan)

        print(f'Page {page} Done!')

        try:
            # go to next page
            next_page = driver.find_element_by_css_selector("li[class='page-item pagination-next-nav ']")
            next_page.click()
            page += 1
            time.sleep(25)

        except Exception as E:
            print(E)
            driver.quit()
            break
    
    # create dictionary that will be transformed into a pandas DataFrame
    data_dict = {'Title': title, 'URL': links, 'Bedrooms': bedroom_data, 'Bathrooms': bathroom_data,
                 'Price': prices, 'Size': size, 'Location': locations}
    # create dataframe
    df = pd.DataFrame(data_dict)
    
    return df
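
The optional fields in `scrape` (bedrooms, bathrooms, size) all follow the same pattern: look the element up, and record a missing value when the card does not have it. The pattern could be factored into one helper, sketched below. `text_or_nan` and its `find` argument are hypothetical names that do not appear in the notebook, and `math.nan` stands in for `np.nan` so the sketch has no third-party dependencies. It catches `Exception` rather than everything, so `KeyboardInterrupt` still stops a long run.

```python
import math


def text_or_nan(find, selector):
    """Return the text of the element matched by `selector`, or NaN if the lookup fails.

    `find` is any callable that takes a CSS selector and returns an object
    with a `.text` attribute (a hypothetical stand-in for a Selenium lookup
    such as `house.find_element_by_css_selector`).
    """
    try:
        return find(selector).text
    except Exception:
        # Element missing on this card: record a missing value instead
        return math.nan
```

Inside the loop, a call such as `bedroom_data.append(text_or_nan(house.find_element_by_css_selector, "span[data-cy='card-beds']"))` would then replace one `try`/`except` block.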

## Scrape house data:

Begin by passing in the URL of the page that lists houses for sale:

In [4]:
webpage = 'https://www.buyrentkenya.com/houses-for-sale'
df_houses = scrape(webpage)

Page 1 Done!
Page 2 Done!
Page 3 Done!
Page 4 Done!
Page 5 Done!
Page 6 Done!
Page 7 Done!
Page 8 Done!
Page 9 Done!
Page 10 Done!
Page 11 Done!
Page 12 Done!
Page 13 Done!
Page 14 Done!
Page 15 Done!
Page 16 Done!
Page 17 Done!
Page 18 Done!
Page 19 Done!
Page 20 Done!
Page 21 Done!
Page 22 Done!
Page 23 Done!
Page 24 Done!
Page 25 Done!
Page 26 Done!
Page 27 Done!
Page 28 Done!
Page 29 Done!
Page 30 Done!
Page 31 Done!
Page 32 Done!
Page 33 Done!
Page 34 Done!
Page 35 Done!
Page 36 Done!
Page 37 Done!
Page 38 Done!
Page 39 Done!
Page 40 Done!
Page 41 Done!
Page 42 Done!
Page 43 Done!
Page 44 Done!
Page 45 Done!
Page 46 Done!
Page 47 Done!
Page 48 Done!
Page 49 Done!
Page 50 Done!
Page 51 Done!
Page 52 Done!
Page 53 Done!
Page 54 Done!
Page 55 Done!
Page 56 Done!
Page 57 Done!
Page 58 Done!
Page 59 Done!
Page 60 Done!
Page 61 Done!
Page 62 Done!
Page 63 Done!
Page 64 Done!
Page 65 Done!
Page 66 Done!
Page 67 Done!
Page 68 Done!
Page 69 Done!
Page 70 Done!
Page 71 Done!
Page 72 Done!
P

In [5]:
df_houses.to_csv('kenya_house_data.csv', index=False)
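
One quick way to check a saved file is to count its rows and the empty cells in each column; by default `to_csv` writes missing values such as `np.nan` as empty fields. The following is a minimal sketch that uses the standard-library `csv` module instead of pandas. `summarize_csv` is a hypothetical helper, not part of the notebook.

```python
import csv


def summarize_csv(path):
    """Count the data rows and the empty cells per column in a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # One counter per column named in the header row
        missing = {name: 0 for name in (reader.fieldnames or [])}
        rows = 0
        for row in reader:
            rows += 1
            for name in missing:
                if not row[name]:  # empty string, or None for a short row
                    missing[name] += 1
    return rows, missing
```

For example, `rows, missing = summarize_csv('kenya_house_data.csv')` would report how many listings were saved and how many of them lack a value in each column.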

## Scrape apartment data:

Next, pass in the URL of the page that lists apartments for sale:

In [4]:
df_apartments = scrape('https://www.buyrentkenya.com/flats-apartments-for-sale')

Page 1 Done!
Page 2 Done!
Page 3 Done!
Page 4 Done!
Page 5 Done!
Page 6 Done!
Page 7 Done!
Page 8 Done!
Page 9 Done!
Page 10 Done!
Page 11 Done!
Page 12 Done!
Page 13 Done!
Page 14 Done!
Page 15 Done!
Page 16 Done!
Page 17 Done!
Page 18 Done!
Page 19 Done!
Page 20 Done!
Page 21 Done!
Page 22 Done!
Page 23 Done!
Page 24 Done!
Page 25 Done!
Page 26 Done!
Page 27 Done!
Page 28 Done!
Page 29 Done!
Page 30 Done!
Page 31 Done!
Page 32 Done!
Page 33 Done!
Page 34 Done!
Page 35 Done!
Page 36 Done!
Page 37 Done!
Page 38 Done!
Page 39 Done!
Page 40 Done!
Page 41 Done!
Page 42 Done!
Page 43 Done!
Page 44 Done!
Page 45 Done!
Page 46 Done!
Page 47 Done!
Page 48 Done!
Page 49 Done!
Page 50 Done!
Page 51 Done!
Page 52 Done!
Page 53 Done!
Page 54 Done!
Page 55 Done!
Page 56 Done!
Page 57 Done!
Page 58 Done!
Page 59 Done!
Page 60 Done!
Page 61 Done!
Page 62 Done!
Page 63 Done!
Page 64 Done!
Page 65 Done!
Page 66 Done!
Page 67 Done!
Page 68 Done!
Page 69 Done!
Page 70 Done!
Page 71 Done!
Page 72 Done!
P

Save the DataFrame to a CSV file:

In [5]:
df_apartments.to_csv('Apartment_data.csv', index=False)

## Scrape land data

Finally, scrape data on land that is for sale and save it in a CSV file:

In [3]:
df_land = scrape('https://www.buyrentkenya.com/land-for-sale')

Page 1 Done!
Page 2 Done!
Page 3 Done!
Page 4 Done!
Page 5 Done!
Page 6 Done!
Page 7 Done!
Page 8 Done!
Page 9 Done!
Page 10 Done!
Page 11 Done!
Page 12 Done!
Page 13 Done!
Page 14 Done!
Page 15 Done!
Page 16 Done!
Page 17 Done!
Page 18 Done!
Page 19 Done!
Page 20 Done!
Page 21 Done!
Page 22 Done!
Page 23 Done!
Page 24 Done!
Page 25 Done!
Page 26 Done!
Page 27 Done!
Page 28 Done!
Page 29 Done!
Page 30 Done!
Page 31 Done!
Page 32 Done!
Page 33 Done!
Page 34 Done!
Page 35 Done!
Page 36 Done!
Page 37 Done!
Page 38 Done!
Page 39 Done!
Page 40 Done!
Page 41 Done!
Page 42 Done!
Page 43 Done!
Page 44 Done!
Page 45 Done!
Page 46 Done!
Page 47 Done!
Page 48 Done!
Page 49 Done!
Page 50 Done!
Page 51 Done!
Page 52 Done!
Page 53 Done!
Page 54 Done!
Page 55 Done!
Page 56 Done!
Page 57 Done!
Page 58 Done!
Page 59 Done!
Page 60 Done!
Page 61 Done!
Page 62 Done!
Page 63 Done!
Page 64 Done!
Page 65 Done!
Page 66 Done!
Page 67 Done!
Page 68 Done!
Page 69 Done!
Page 70 Done!
Page 71 Done!
Page 72 Done!
P

In [4]:
df_land.to_csv('Land_data.csv', index=False)
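
The three runs differ only in the URL and the output file name, so they could also be driven from a single mapping. A minimal sketch of that idea follows; `TARGETS` and `scrape_all` are hypothetical names, while the URLs and file names are the ones used in this notebook.

```python
# URL and output file for each property type, as used in this notebook
TARGETS = {
    "houses": ("https://www.buyrentkenya.com/houses-for-sale", "kenya_house_data.csv"),
    "apartments": ("https://www.buyrentkenya.com/flats-apartments-for-sale", "Apartment_data.csv"),
    "land": ("https://www.buyrentkenya.com/land-for-sale", "Land_data.csv"),
}


def scrape_all(scrape_fn, targets=TARGETS):
    """Run `scrape_fn` on every URL and save each result to its CSV file.

    `scrape_fn` is expected to behave like `scrape`: take a URL and return
    an object with a pandas-style `to_csv(path, index=False)` method.
    """
    saved = []
    for name, (url, filename) in targets.items():
        df = scrape_fn(url)
        df.to_csv(filename, index=False)
        saved.append(filename)
        print(f"{name}: saved to {filename}")
    return saved
```

Calling `scrape_all(scrape)` would run the three scrapes one after another. Running them in separate cells, as this notebook does, makes it easier to rerun a single category if the browser session fails partway through.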

## Authors

<a href="https://www.linkedin.com/in/molomunyansanga/">Molo Munyansanga</a> is a Data Science enthusiast with certificates in Statistics, Data Science and Machine Learning. He is also enrolled in the Deep Learning Specialization by DeepLearning.AI.

## Change Log

| Date (YYYY-MM-DD) | Version | Changed By | Change Description                   |
| ----------------- | ------- | ---------- | ------------------------------------ |
| 2022-04-14        | 1.6     | Molo. M    | Created Notebook and Completed Tasks |