# Kenya Real Estate Prices Prediction

## Task: Collecting Demographic Data

### Objectives

Download demographic data from the Kenya 2019 Census and scrape county and constituency details from the web:

* Download dataset containing demographic data
* Using Selenium, extract County and Constituency information

Import useful libraries:

In [1]:
import pandas as pd
import numpy as np
from selenium import webdriver
import time

Silence warnings, such as Selenium's notices about deprecated function use.

In [2]:
import warnings
warnings.filterwarnings('ignore')
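Ignoring every warning can also hide unrelated problems. As a narrower sketch, assuming the noise consists of `DeprecationWarning`s (the helper name `visible_warnings` is made up for illustration), only that category could be filtered:

```python
import warnings


def visible_warnings():
    """Emit two warnings and return the messages that are not filtered out."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # show everything inside this block...
        # ...except deprecation notices (assumption: this is the category to hide)
        warnings.filterwarnings('ignore', category=DeprecationWarning)
        warnings.warn('old API', DeprecationWarning)
        warnings.warn('something else', UserWarning)
    return [str(w.message) for w in caught]


print(visible_warnings())
```

In the notebook itself, the single line `warnings.filterwarnings('ignore', category=DeprecationWarning)` would be the equivalent change.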

Acquire dataset containing demographic data from the [Kenya 2019 Census](https://data.humdata.org/dataset/fa58ed8d-1daa-48b6-bae1-19746c32c85f) report.

In [3]:
df = pd.read_csv('https://data.humdata.org/dataset/fa58ed8d-1daa-48b6-bae1-19746c32c85f/resource/82f909ce-7358-48da-9639-8fe9c3318251/download/2019-population_census-report-per-county.csv')
df.head()

| Unnamed: 0 | County | Total_Population19 | Male populatio 2019 | Female population 2019 | Households | Av_HH_Size | LandArea | Population Density | Population in 2009 | Pop_change | Intersex population 2019 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Baringo | 666763 | 336322 | 330428 | 142518 | 5 | 10976 | 61 | 555561 | 111202 | 13 |
| 1 | Bomet | 875689 | 434287 | 441379 | 187641 | 5 | 2531 | 346 | 724186 | 151503 | 23 |
| 2 | Bungoma | 1670570 | 812146 | 858389 | 358796 | 5 | 3024 | 552 | 1630934 | 39636 | 35 |
| 3 | Busia | 893681 | 426252 | 467401 | 198152 | 5 | 1696 | 527 | 488075 | 405606 | 28 |
| 4 | Elgeyo-Marakwet | 454480 | 227317 | 227151 | 99861 | 5 | 3032 | 150 | 369998 | 84482 | 12 |
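The headers in this output mix naming styles, and one is misspelled (`Male populatio 2019`). A minimal sketch of normalising them follows. The target names and the helper `tidy_columns` are illustrative choices, not part of the dataset, and the sample frame copies only the first two rows shown above.

```python
import pandas as pd

# Hypothetical mapping from the original headers to consistent snake_case names.
RENAMES = {
    'Total_Population19': 'total_population_2019',
    'Male populatio 2019': 'male_population_2019',  # header typo in the source file
    'Female population 2019': 'female_population_2019',
    'Intersex population 2019': 'intersex_population_2019',
}


def tidy_columns(frame):
    """Return a copy with known headers renamed and the rest lower-cased with underscores."""
    renamed = frame.rename(columns=RENAMES)
    renamed.columns = [c.strip().lower().replace(' ', '_') for c in renamed.columns]
    return renamed


sample = pd.DataFrame({
    'County': ['Baringo', 'Bomet'],
    'Total_Population19': [666763, 875689],
    'Male populatio 2019': [336322, 434287],
    'Female population 2019': [330428, 441379],
    'Intersex population 2019': [13, 23],
})
tidy = tidy_columns(sample)

# Sanity check: the three population groups should add up to the total.
parts = ['male_population_2019', 'female_population_2019', 'intersex_population_2019']
totals_match = (tidy[parts].sum(axis=1) == tidy['total_population_2019']).all()
print(list(tidy.columns), totals_match)
```

Renaming early keeps later joins and column selections free of quoting mistakes.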


In [4]:
df.to_csv('Population_by_county.csv')
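By default `to_csv` also writes the index as a header-less first column, and pandas labels such a column `Unnamed: 0` when the file is read back. The sketch below shows the difference `index=False` makes, using an in-memory buffer as a stand-in for a file and a two-row sample rather than the full dataset.

```python
import io

import pandas as pd

sample = pd.DataFrame({'County': ['Baringo', 'Bomet'], 'Households': [142518, 187641]})

# Default behaviour: the index is written as an extra, header-less column.
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out, so the round trip preserves the columns.
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```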

Next, we will scrape data about counties and their constituencies in order to match them with the real estate data locations.

We will use the following webpage: [Fortune of Africa](https://fortuneofafrica.com/kenya/kenya-counties-and-constituencies/)

In [5]:
webpage = 'https://fortuneofafrica.com/kenya/kenya-counties-and-constituencies/'

Get webpage using selenium webdriver

In [6]:
driver = webdriver.Chrome('/Selenium Drivers/chromedriver.exe')
driver.get(webpage)

Collect the page elements that hold each county and its constituencies

In [8]:
# Locate the main content block, then collect its paragraphs (one per county)
page = driver.find_element_by_css_selector("div[class='entry-content clearfix']")
counties = page.find_elements_by_css_selector("p")

# Show the number of counties
print(f'Number of counties: {len(counties)}')

Number of counties: 47
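When a page's content is already present in its static HTML, the same selection can be sketched without a browser using the standard library's `html.parser`. This is a swapped-in technique, not what the notebook does. The markup below is a made-up stand-in that mirrors the `entry-content clearfix` container, and whether the real page serves its content statically has not been checked here.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> inside a div with classes entry-content and clearfix."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while inside the target div (nested divs are counted)
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            classes = (dict(attrs).get('class') or '').split()
            if self.div_depth or {'entry-content', 'clearfix'} <= set(classes):
                self.div_depth += 1
        elif tag == 'p' and self.div_depth:
            self.in_paragraph = True
            self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1
        elif tag == 'p':
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


# Hypothetical sample markup, not copied from the real page.
SAMPLE_HTML = """
<div class="sidebar"><p>Not a county</p></div>
<div class="entry-content clearfix">
  <p><strong>1. Baringo County</strong> Constituencies: Baringo East, Mogotio.</p>
  <p><strong>2. Bomet County</strong> Constituencies: Sotik, Konoin</p>
</div>
"""

collector = ParagraphCollector()
collector.feed(SAMPLE_HTML)
print(f'Number of counties: {len(collector.paragraphs)}')
```

Unlike the CSS selector above, which matches the exact attribute value, this sketch accepts any div whose class list contains both names.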


Iterate through all the counties and save the information in a dictionary

In [9]:
print(f'These are the counties and their constituencies \n')

# dictionary to save county data
county_dict = dict()

for paragraph in counties:
    info = paragraph.text.split('Constituencies:')

    # county name: the text after the list number
    county = info[0].strip().split('.')[1]
    print(f'County: {county}')

    # constituencies
    const = info[1].strip().split(',')
    print(f'Constituencies: {const}, \n')

    # save the info in the dictionary
    county_dict[county] = const
    

These are the counties and their constituencies 

County: Baringo County
Constituencies: ['Baringo East', ' Baringo West', ' Baringo Central', ' Mochongoi', ' Mogotio', ' Eldama Ravine.'], 

County: Bomet County
Constituencies: ['Sotik', ' Chepalungu', ' Bomet East', ' Bomet Central', ' Konoin'], 

County: Bungoma County
Constituencies: ['Mt. Elgon', ' Sirisia', ' Kabuchia', ' Bumula', ' Kandunyi', ' Webuye', ' Bokoli', ' Kimilili', ' Tongaren'], 

County: Busia County
Constituencies: ['Teso North', ' Teso South', ' Nambale', ' Matayos', ' Butula', ' Funyula', ' Budalangi'], 

County: Elgeyo/Marakwet County
Constituencies: ['Marakwet East', ' Marakwet West', ' Keiyo East', ' Keiyo South'], 

County:  Embu County
Constituencies: ['Manyatta', ' Runyejes', ' Gachoka', ' Siakago'], 

County: Garissa County
Constituencies: ['TaveDujis', ' Balambala', ' Lagdera', ' Dadaad', ' Fafi', ' Ijara'], 

County: Homa Bay County
Constituencies: ['Kasipul', ' Kabondo', ' Karachuonyo', ' Rangwe', ' Homa
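The printed lists still carry leading spaces and trailing periods (for example `' Eldama Ravine.'`), and one county name keeps a leading space. A possible cleanup is sketched below. It assumes each paragraph reads like `"1. Baringo County Constituencies: A, B, C."`, which is inferred from the parsing code rather than confirmed against the page, and `parse_county_paragraph` is a hypothetical helper.

```python
import re


def parse_county_paragraph(text):
    """Split one paragraph into (county_name, [constituency, ...])."""
    name_part, _, constituency_part = text.partition('Constituencies:')
    # Drop the leading list number ("1. ") without splitting on every period.
    county_name = re.sub(r'^\s*\d+\.\s*', '', name_part).strip()
    constituencies = [
        item.strip().rstrip('.').strip()  # trim spaces and a trailing period
        for item in constituency_part.split(',')
        if item.strip()
    ]
    return county_name, constituencies


name, constituencies = parse_county_paragraph(
    '1. Baringo County Constituencies: Baringo East, Baringo West, Eldama Ravine.'
)
print(name, constituencies)
```

Because only a trailing period is removed, names with an internal period such as `Mt. Elgon` are left intact.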

Save the dictionary in a pandas DataFrame

In [10]:
df = pd.DataFrame.from_dict(county_dict, orient='index').T
df

| Unnamed: 0 | Baringo County | Bomet County | Bungoma County | Busia County | Elgeyo/Marakwet County | Embu County | Garissa County | Homa Bay County | Isiolo County | Kajiado County | ... | Siaya County | Taita Taveta County | Tana River County | Tharaka Nithi County | Trans Nzoia County | Turkana County | Uasin Gishu County | Vihiga County | Wajir County | West Pokot County |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Baringo East | Sotik | Mt. Elgon | Teso North | Marakwet East | Manyatta | TaveDujis | Kasipul | Isiolo North | Kajiado Central | ... | Ugenya | Taveta | Garsen | Nithi | Kwanza | Turkana North | Eldoret East | Vihiga | Wajir North | Kapenguri |
| 1 | Baringo West | Chepalungu | Sirisia | Teso South | Marakwet West | Runyejes | Balambala | Kabondo | Isiolo South | Kajiado North | ... | Ugunja | Wundanyi | Galole | Maara | Endebess | Turkana West | Eldoret North and Eldoret South. | Sabatia | Wajir East | Sigor |
| 2 | Baringo Central | Bomet East | Kabuchia | Nambale | Keiyo East | Gachoka | Lagdera | Karachuonyo |  | Kajiado South | ... | Alego Usonga | Mwatate | Bura | Tharaka | Saboti | Turkana Central |  | Hamisi | Tarbaj | Kacheliba |
| 3 | Mochongoi | Bomet Central | Bumula | Matayos | Keiyo South | Siakago | Dadaad | Rangwe |  |  | ... | Gem | Voi |  |  | Kiminini | Loima |  | Emuhaya | Wajir West | Poko South |
| 4 | Mogotio | Konoin | Kandunyi | Butula |  |  | Fafi | Homabay Town |  |  | ... | Bondo |  |  |  | Cherenganyi | Turkana South |  | Luanda | Eldas |  |
| 5 | Eldama Ravine. |  | Webuye | Funyula |  |  | Ijara | Ndhiwa |  |  | ... | Rarieda |  |  |  |  | Turkana East. |  |  | Wajir South |  |
| 6 |  |  | Bokoli | Budalangi |  |  |  | Mbita |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 7 |  |  | Kimilili |  |  |  |  | Gwassi |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 8 |  |  | Tongaren |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 9 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
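The wide layout above pads shorter counties with empty cells. A long layout with one row per county and constituency pair avoids the padding and is usually easier to join against other tables. The sketch below uses two entries copied from the scraped output as a stand-in for `county_dict`; the column names are illustrative.

```python
import pandas as pd

# Two entries taken from the output above, standing in for the full county_dict.
sample_dict = {
    'Bomet County': ['Sotik', 'Chepalungu', 'Bomet East', 'Bomet Central', 'Konoin'],
    'Elgeyo/Marakwet County': ['Marakwet East', 'Marakwet West', 'Keiyo East', 'Keiyo South'],
}

# One (county, constituency) row per pair, so no cell is left empty.
long_df = pd.DataFrame(
    [(county, constituency)
     for county, names in sample_dict.items()
     for constituency in names],
    columns=['county', 'constituency'],
)
print(long_df)
```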


Save the DataFrame to a CSV file

In [11]:
df.to_csv('county_constituencies.csv')
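Since the stated goal is to match these names against real estate locations, a reverse lookup from constituency to county may be a convenient next step. The sketch below assumes exact matches after trimming and case-folding, which is a simplification because the format of the real estate locations is not shown here. `build_lookup` and `county_for` are hypothetical helpers.

```python
def build_lookup(county_to_constituencies):
    """Map a normalised constituency name to its county."""
    lookup = {}
    for county, names in county_to_constituencies.items():
        for name in names:
            key = name.strip().rstrip('.').casefold()
            lookup[key] = county.strip()
    return lookup


def county_for(location, lookup):
    """Return the county for a location string, or None when nothing matches."""
    return lookup.get(location.strip().casefold())


# Sample entries in the same raw shape as the scraped dictionary.
sample = {
    'Baringo County': ['Baringo East', ' Mogotio', ' Eldama Ravine.'],
    ' Embu County': ['Manyatta', ' Siakago'],
}
lookup = build_lookup(sample)
print(county_for('mogotio ', lookup))
print(county_for('Westlands', lookup))
```

Misspelled or differently named locations would need fuzzy matching or a manual mapping, which this sketch does not attempt.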

In [12]:
# quit selenium driver
driver.quit()
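If a cell fails before `driver.quit()` runs, the browser process stays open. One way to guard against that is a small context manager, sketched here with a fake driver so it runs without a browser. `managed_driver` and `FakeDriver` are illustrative names, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a Selenium driver so the sketch runs without a browser."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        raise RuntimeError(f'could not load {url}')

    def quit(self):
        self.closed = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as driver:
        driver.get('https://example.com')  # fails on purpose
except RuntimeError:
    pass

print(fake.closed)
```

With a real browser, the factory would be whatever call creates the driver in the notebook.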

## Authors

<a href="https://www.linkedin.com/in/molomunyansanga/">Molo Munyansanga</a> is a Data Science enthusiast with certificates in Statistics, Data Science and Machine Learning. He is also enrolled in the Deep Learning Specialization by DeepLearning.AI.

## Change Log

| Date (YYYY-MM-DD) | Version | Changed By | Change Description                   |
| ----------------- | ------- | ---------- | ------------------------------------ |
| 2022-04-12        | 1.3     | Molo. M    | Created Notebook and Completed Tasks |