## Cleaning the house data scraped from buyrentkenya
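Procedure:
- Drop columns that are not needed
- Rename columns
- Strip the WhatsApp link prefix from the contact column (the column is later dropped)
- Remove the "KSh" prefix and thousands separators from the "price" column, then change its dtype from object to float
- Write the clean data to houses.csv, ready to be loaded into a Snowflake warehouse for business transformations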
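
The rename and price steps of this procedure can also be sketched as one reusable function. This is only an illustration, not the code the notebook runs: `clean_listings` and `RENAMES` are hypothetical names, and the sample row is copied from the listing output further down.

```python
import pandas as pd

# Hypothetical mapping from the scraped headers to readable column names.
RENAMES = {"font-semibold": "name", "flex": "price", "text-md": "description"}


def clean_listings(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `raw` with renamed columns and a float price column."""
    df = raw.rename(columns=RENAMES)
    # Strip the currency prefix and thousands separators, then convert to float.
    df["price"] = (
        df["price"]
        .str.replace("KSh", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
        .astype(float)
    )
    return df


sample = pd.DataFrame(
    {
        "font-semibold": ["4 Bed Townhouse with En Suite at Langata"],
        "flex": ["KSh 36,900,000"],
        "Location": ["Langata"],
        "text-md": ["4 bedroom apartment for sale langata"],
    }
)
cleaned = clean_listings(sample)
```

Because `rename` returns a new frame, the raw data stays untouched and the function can be re-run safely.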

In [15]:
import pandas as pd

df = pd.read_csv("docs/buyrentkenya (2).csv")
df.head()

Unnamed: 0,fill-current href,relative href,font-semibold,flex,Location,text-md,Contact
0,[object Object],https://www.buyrentkenya.com/listings/4-bedroo...,4 Bed House with En Suite at Manyanja Road,"KSh 9,500,000","4, Manyanja Road, Donholm",4 Bed Townhouse with Ensuite in Greenfields Es...,https://wa.me/+254703106112?text=Hi%2C+I+am+in...
1,[object Object],https://www.buyrentkenya.com/listings/7-bedroo...,7 Bed House with En Suite at Lavington,"KSh 130,000,000",Lavington,"House for sale in Lavington, owashika road. Wh...",https://wa.me/+254701054650?text=Hi%2C+I+am+in...
2,[object Object],https://www.buyrentkenya.com/listings/6-bedroo...,6 Bed House with En Suite in Garden Estate,"KSh 130,000,000","Garden Estate, Roysambu",6 BEDROOM HOUSE FOR SALE IN THOME- GARDEN ESTA...,https://wa.me/+254117331608?text=Hi%2C+I+am+in...
3,[object Object],https://www.buyrentkenya.com/listings/6-bedroo...,6 Bed Villa with En Suite in Kitisuru,"KSh 110,000,000","Kitisuru, Westlands",6 bedroom villa for sale in Kitisuru at Kshs ...,https://wa.me/+254722033335?text=Hi%2C+I+am+in...
4,[object Object],https://www.buyrentkenya.com/listings/4-bedroo...,4 Bed Townhouse with En Suite at Langata,"KSh 36,900,000",Langata,4 bedroom apartment for sale langata,https://wa.me/+254712605101?text=Hi%2C+I+am+in...


In [16]:
# Drop unnecessary columns
df.drop(columns=["fill-current href", "relative href"], inplace=True)
df.head()

Unnamed: 0,font-semibold,flex,Location,text-md,Contact
0,4 Bed House with En Suite at Manyanja Road,"KSh 9,500,000","4, Manyanja Road, Donholm",4 Bed Townhouse with Ensuite in Greenfields Es...,https://wa.me/+254703106112?text=Hi%2C+I+am+in...
1,7 Bed House with En Suite at Lavington,"KSh 130,000,000",Lavington,"House for sale in Lavington, owashika road. Wh...",https://wa.me/+254701054650?text=Hi%2C+I+am+in...
2,6 Bed House with En Suite in Garden Estate,"KSh 130,000,000","Garden Estate, Roysambu",6 BEDROOM HOUSE FOR SALE IN THOME- GARDEN ESTA...,https://wa.me/+254117331608?text=Hi%2C+I+am+in...
3,6 Bed Villa with En Suite in Kitisuru,"KSh 110,000,000","Kitisuru, Westlands",6 bedroom villa for sale in Kitisuru at Kshs ...,https://wa.me/+254722033335?text=Hi%2C+I+am+in...
4,4 Bed Townhouse with En Suite at Langata,"KSh 36,900,000",Langata,4 bedroom apartment for sale langata,https://wa.me/+254712605101?text=Hi%2C+I+am+in...


In [17]:
# Shift the index so that it starts at 1 instead of 0
df.index = df.index + 1

In [18]:
# Rename our columns
df.rename(columns={"font-semibold": "name", "flex": "price", "text-md": "description", "Contact": "contact"}, inplace=True)
df.head()

Unnamed: 0,name,price,Location,description,contact
1,4 Bed House with En Suite at Manyanja Road,"KSh 9,500,000","4, Manyanja Road, Donholm",4 Bed Townhouse with Ensuite in Greenfields Es...,https://wa.me/+254703106112?text=Hi%2C+I+am+in...
2,7 Bed House with En Suite at Lavington,"KSh 130,000,000",Lavington,"House for sale in Lavington, owashika road. Wh...",https://wa.me/+254701054650?text=Hi%2C+I+am+in...
3,6 Bed House with En Suite in Garden Estate,"KSh 130,000,000","Garden Estate, Roysambu",6 BEDROOM HOUSE FOR SALE IN THOME- GARDEN ESTA...,https://wa.me/+254117331608?text=Hi%2C+I+am+in...
4,6 Bed Villa with En Suite in Kitisuru,"KSh 110,000,000","Kitisuru, Westlands",6 bedroom villa for sale in Kitisuru at Kshs ...,https://wa.me/+254722033335?text=Hi%2C+I+am+in...
5,4 Bed Townhouse with En Suite at Langata,"KSh 36,900,000",Langata,4 bedroom apartment for sale langata,https://wa.me/+254712605101?text=Hi%2C+I+am+in...


In [19]:
# Remove the "KSh" prefix, thousands separators and leftover whitespace from the "price" column
df["price"] = df["price"].str.replace("KSh", "", regex=False).str.replace(",", "", regex=False).str.strip()
df.head()

Unnamed: 0,name,price,Location,description,contact
1,4 Bed House with En Suite at Manyanja Road,9500000,"4, Manyanja Road, Donholm",4 Bed Townhouse with Ensuite in Greenfields Es...,https://wa.me/+254703106112?text=Hi%2C+I+am+in...
2,7 Bed House with En Suite at Lavington,130000000,Lavington,"House for sale in Lavington, owashika road. Wh...",https://wa.me/+254701054650?text=Hi%2C+I+am+in...
3,6 Bed House with En Suite in Garden Estate,130000000,"Garden Estate, Roysambu",6 BEDROOM HOUSE FOR SALE IN THOME- GARDEN ESTA...,https://wa.me/+254117331608?text=Hi%2C+I+am+in...
4,6 Bed Villa with En Suite in Kitisuru,110000000,"Kitisuru, Westlands",6 bedroom villa for sale in Kitisuru at Kshs ...,https://wa.me/+254722033335?text=Hi%2C+I+am+in...
5,4 Bed Townhouse with En Suite at Langata,36900000,Langata,4 bedroom apartment for sale langata,https://wa.me/+254712605101?text=Hi%2C+I+am+in...


In [20]:
# Strip the WhatsApp link prefix from the contact column (literal match, not a regex)
df['contact'] = df['contact'].str.replace('https://wa.me/', '', regex=False)
df.head()

Unnamed: 0,name,price,Location,description,contact
1,4 Bed House with En Suite at Manyanja Road,9500000,"4, Manyanja Road, Donholm",4 Bed Townhouse with Ensuite in Greenfields Es...,+254703106112?text=Hi%2C+I+am+interested+in+yo...
2,7 Bed House with En Suite at Lavington,130000000,Lavington,"House for sale in Lavington, owashika road. Wh...",+254701054650?text=Hi%2C+I+am+interested+in+yo...
3,6 Bed House with En Suite in Garden Estate,130000000,"Garden Estate, Roysambu",6 BEDROOM HOUSE FOR SALE IN THOME- GARDEN ESTA...,+254117331608?text=Hi%2C+I+am+interested+in+yo...
4,6 Bed Villa with En Suite in Kitisuru,110000000,"Kitisuru, Westlands",6 bedroom villa for sale in Kitisuru at Kshs ...,+254722033335?text=Hi%2C+I+am+interested+in+yo...
5,4 Bed Townhouse with En Suite at Langata,36900000,Langata,4 bedroom apartment for sale langata,+254712605101?text=Hi%2C+I+am+interested+in+yo...


In [21]:
# Inspect the full contact value of the first row
first_contact = df.loc[1, 'contact']
first_contact

'+254703106112?text=Hi%2C+I+am+interested+in+your+property+4+bedroom+Houses+on+https://www.buyrentkenya.com%2Flistings%2F4-bedroom-house-for-sale-donholm-3800484'

Even with the link prefix removed, each value still carries the pre-filled WhatsApp message after the phone number. Rather than parse the number out, I'll just drop that column.
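
If the phone number is needed later, one possible approach (not part of this notebook's pipeline) is to capture the leading `+` and digits with `Series.str.extract`. The sketch below assumes every contact starts with a `+`-prefixed number, optionally preceded by the `https://wa.me/` prefix; the first sample value is the one printed above, and the other two are made up to show the prefix and missing-value cases.

```python
import pandas as pd

contact = pd.Series(
    [
        "+254703106112?text=Hi%2C+I+am+interested+in+your+property+4+bedroom+Houses"
        "+on+https://www.buyrentkenya.com%2Flistings%2F4-bedroom-house-for-sale-donholm-3800484",
        "https://wa.me/+254701054650?text=Hi",  # illustrative value with the prefix still attached
        None,  # illustrative missing contact
    ]
)

# Anchor at the start so a later "+4" inside the message text is never matched.
phone = contact.str.extract(r"^(?:https://wa\.me/)?(\+\d+)", expand=False)
```

Rows that do not match the pattern, including missing values, come back as missing rather than raising an error.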

In [22]:
df.drop(columns=['contact'], inplace=True)
df.head()

Unnamed: 0,name,price,Location,description
1,4 Bed House with En Suite at Manyanja Road,9500000,"4, Manyanja Road, Donholm",4 Bed Townhouse with Ensuite in Greenfields Es...
2,7 Bed House with En Suite at Lavington,130000000,Lavington,"House for sale in Lavington, owashika road. Wh..."
3,6 Bed House with En Suite in Garden Estate,130000000,"Garden Estate, Roysambu",6 BEDROOM HOUSE FOR SALE IN THOME- GARDEN ESTA...
4,6 Bed Villa with En Suite in Kitisuru,110000000,"Kitisuru, Westlands",6 bedroom villa for sale in Kitisuru at Kshs ...
5,4 Bed Townhouse with En Suite at Langata,36900000,Langata,4 bedroom apartment for sale langata


In [23]:
# Inspect the column dtypes before converting the 'price' column
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2497 entries, 1 to 2497
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   name         2497 non-null   object
 1   price        2471 non-null   object
 2   Location     2497 non-null   object
 3   description  2497 non-null   object
dtypes: object(4)
memory usage: 78.2+ KB


In [24]:
df['price'] = df['price'].astype(float)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2497 entries, 1 to 2497
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         2497 non-null   object 
 1   price        2471 non-null   float64
 2   Location     2497 non-null   object 
 3   description  2497 non-null   object 
dtypes: float64(1), object(3)
memory usage: 78.2+ KB
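
The `info()` output shows 2471 non-null prices out of 2497 rows, so 26 listings have no price. `astype(float)` works here because the remaining values are all numeric strings, but it would raise on any stray text. A more defensive sketch, assuming such text could show up in a future scrape, uses `pd.to_numeric` with `errors="coerce"`. The `"Price on request"` value is invented for illustration.

```python
import pandas as pd

price = pd.Series(["KSh 9,500,000", "KSh 130,000,000", None, "Price on request"])

# Same string cleanup as in the notebook, applied to a small sample.
digits = (
    price.str.replace("KSh", "", regex=False)
    .str.replace(",", "", regex=False)
    .str.strip()
)

# Values that cannot be parsed become NaN instead of raising an error.
as_float = pd.to_numeric(digits, errors="coerce")
n_missing = int(as_float.isna().sum())
```

Counting the missing values before and after the conversion is a quick way to see whether coercion silently discarded any rows.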


Cleaning is complete. The last step writes the result to a .csv file.

In [25]:
df.to_csv('docs/houses.csv')
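
By default `to_csv` writes the index as a first column with an empty header, which can be awkward when mapping the file to warehouse columns. A possible variant, sketched here on one sample row and a temporary directory, names that column through `index_label` and reads the file back to check it. The label `"id"` is an assumption, not something the notebook defines.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {
        "name": ["4 Bed Townhouse with En Suite at Langata"],
        "price": [36900000.0],
        "Location": ["Langata"],
        "description": ["4 bedroom apartment for sale langata"],
    },
    index=[5],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "houses.csv")
    # "id" is a hypothetical header for the index column.
    df.to_csv(path, index_label="id")
    # Read the file back to confirm the columns and values survive the round trip.
    reloaded = pd.read_csv(path, index_col="id")
```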