## Scraping Property Access PH through Selenium
This notebook was prepared by Adem Inovejas, Christopher Lim, Czarina Tiu, and Uriel Grace Magtibay (students of DATA102 S11 Y2022-2023).  

In your notebook, make sure you have the following details outlined:
- The website scraped
- Date and time when the data was collected
- What were the challenges encountered? You may narrate or illustrate this in the notebook.
- Do you think the collected data contains any personally identifiable information (PII)?
- Conclude with your key learnings and findings.

## Importing Libraries

In [73]:
from bs4 import BeautifulSoup
import pandas as pd
import os
import datetime

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


## Setup

In [47]:
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

In [48]:
url = "https://propertyaccess.ph/offer/sale"
driver.get(url)
print(driver.page_source)

<html lang="en-US"><head>
    <title>Access the best Philippine properties available in the market, from house and lot to condominiums · PropertyAccess Philippines</title><meta data-n-head="ssr" charset="utf-8"><meta data-n-head="ssr" name="viewport" content="width=device-width, initial-scale=1"><meta data-n-head="ssr" data-hid="description" name="description" content=""><meta data-n-head="ssr" name="robots" content="index, follow"><meta data-n-head="ssr" data-hid="og:title" name="og:title" content="Access the best Philippine properties available in the market, from house and lot to condominiums · PropertyAccess Philippines"><meta data-n-head="ssr" data-hid="og:description" name="og:description" content="PropertyAccess provides you with easy access to the best Philippine properties, from house and lot to condominiums, and to the life and style you truly deserve. Access your dream home - made simpler with PropertyAccess."><meta data-n-head="ssr" data-hid="og:image" name="og:image" conte

## Data Collection

In [74]:
from selenium.common.exceptions import NoSuchElementException

num_pages = 5
all_products = []
# The next-page link is at li[2] on the first page and at li[3] on succeeding pages.
link = '/html/body/div/div[2]/div/div/div/div[2]/div[3]/div/ul/li[2]/a'
for page in range(num_pages):
    products_container = driver.find_element(by="xpath", value='//div[@class="list-product page-content"]')
    products = products_container.find_elements(by="xpath", value='.//div[@class="product-wrapper"]')
    for product in products:
        name = product.find_element(by="xpath", value='.//div[@class="name"]').text
        author = product.find_element(by="xpath", value='.//span[@class="author-name"]').text

        # Not all properties list a building.
        try:
            building = product.find_element(by="xpath", value='.//span[@class="building"]').text
        except NoSuchElementException:
            building = "N/A"

        address = product.find_element(by="xpath", value='.//div[@class="address"]').text
        price = product.find_element(by="xpath", value='.//div[@class="price"]').text

        # Not all properties list facilities or amenities.
        try:
            facilities_box = product.find_element(by="xpath", value='.//div[@class="facilities line-clamp lc-2"]')
            facilities = [item.text for item in facilities_box.find_elements(by="tag name", value="div")]
        except NoSuchElementException:
            facilities = "N/A"

        date = product.find_element(by="xpath", value='.//div[@class="date"]').text
        product_url = product.find_element(by="xpath", value='.//div[@class="product-img"]').find_element(by="tag name", value="a").get_attribute('href')
        scrapetime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

        all_products.append((name, author, building, address, price, facilities, date, product_url, scrapetime))

    # Go to the next page, then switch to the li[3] link used from the second page onward.
    next_page = driver.find_element(by="xpath", value=link).get_attribute("href")
    driver.get(next_page)
    link = '/html/body/div/div[2]/div/div/div/div[2]/div[3]/div/ul/li[3]/a'

print('Number of products:', len(all_products))

Number of products: 100
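
The pagination rule used in the loop above (the next-page link sits at `li[2]` on the first page and at `li[3]` on every later page) can be written as a small helper. This is only a sketch: the names `PAGINATION_BASE` and `next_link_xpath` are hypothetical, and the XPath is copied from the collection cell, so it is assumed to match the site's layout at scraping time.

```python
# XPath prefix of the pagination list, copied from the collection cell.
PAGINATION_BASE = '/html/body/div/div[2]/div/div/div/div[2]/div[3]/div/ul'


def next_link_xpath(page_index: int) -> str:
    """Return the XPath of the next-page link for a zero-based page index.

    Assumption taken from the notebook: the link is at li[2] on the
    first page and at li[3] on all succeeding pages.
    """
    li_position = 2 if page_index == 0 else 3
    return f"{PAGINATION_BASE}/li[{li_position}]/a"


# Show which XPath would be used on each of the first three pages.
for page_index in range(3):
    print(page_index, next_link_xpath(page_index))
```

Inside the scraping loop, such a helper could replace the reassignment of `link` at the end of each iteration.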


In [76]:
#print(all_products)
df = pd.DataFrame(all_products)
df.columns = ['Name', 'Author', 'Building', 'Address', 'Price', 'Facilities', 'Date', 'Url', 'Scrapetime']
df

Unnamed: 0,Name,Author,Building,Address,Price,Facilities,Date,Url,Scrapetime
0,"3 Bedroom Condo in Aurelia Residences, Taguig",Shang Properties,Aurelia Residences,"McKinley Parkway, Taguig, Metro Manila","from ₱ 107,300,000","[24-Hour Security, CCTV, Entertainment Area, F...",Published on: 17/07/2022,https://propertyaccess.ph/property/3-bedroom-c...,2022-09-26 18:00:53
1,3 Bedroom Condo in Shang Residences at Wack Wa...,Shang Properties,Shang Residences at Wack Wack,"Wack Wack Road, Mandaluyong, Metro Manila","from ₱ 54,500,000","[24-Hour Security, CCTV, Club House, Entertain...",Published on: 17/07/2022,https://propertyaccess.ph/property/3-bedroom-c...,2022-09-26 18:00:53
2,"2BR Condo in Residences at The Galleon, Pasig",Ortigas Land,Residences at The Galleon,"ADB Avenue, Pasig, Metro Manila","from ₱ 41,500,000","[Fitness Center, Swimming Pool, Lounge, Entert...",Published on: 1/06/2022,https://propertyaccess.ph/property/2br-condo-i...,2022-09-26 18:00:53
3,"Penthouse in Residences at The Galleon, Pasig",Ortigas Land,Residences at The Galleon,"ADB Avenue, Pasig, Metro Manila","from ₱ 111,500,000","[Fitness Center, Swimming Pool, Lounge, Entert...",Published on: 1/06/2022,https://propertyaccess.ph/property/penthouse-i...,2022-09-26 18:00:54
4,"3 Bedroom Condo in Aurelia Residences, Taguig",Shang Properties,Aurelia Residences,"McKinley Parkway, Taguig, Metro Manila","from ₱ 181,302,240","[24-Hour Security, CCTV, Entertainment Area, F...",Published on: 11/05/2022,https://propertyaccess.ph/property/3-bedroom-c...,2022-09-26 18:00:54
...,...,...,...,...,...,...,...,...,...
95,"Studio Condo, Cagayan de Oro",Crissy Angeles,Vista Residences,"Limketkai Drive, Cagayan de Oro, Northern Mind...","₱ 3,500,000","[24-Hour Security, CCTV, Fitness Center, Funct...",Published on: 21/09/2022,https://propertyaccess.ph/property/studio-cond...,2022-09-26 18:01:36
96,"Residential Lot, Calabarzon",Alyssa Barroso,,Calabarzon,"₱ 7,700,000",,Published on: 21/09/2022,https://propertyaccess.ph/property/residential...,2022-09-26 18:01:37
97,"3 Bedroom Condo in Avida Cityflex Towers, Taguig",Christine Li,Avida Cityflex Towers,"Taguig, Metro Manila","₱ 15,000,000","[Grand Lobby, Fitness Center, Play Area, Garde...",Published on: 21/09/2022,https://propertyaccess.ph/property/3-bedroom-c...,2022-09-26 18:01:37
98,"Office Space, Makati",RBK Property Consultants Inc.,,"V.A. Rufino Street, Makati, Metro Manila","₱ 35,000,000","[24-Hour Security, CCTV, Elevators, Fiber read...",Published on: 20/09/2022,https://propertyaccess.ph/property/office-spac...,2022-09-26 18:01:37


In [46]:
driver.quit()  # End the WebDriver session and close all browser windows

## Data Cleaning
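
One possible cleaning step, sketched under assumptions drawn from the sample rows above: prices look like `from ₱ 107,300,000` or `₱ 3,500,000` (whole pesos, with an optional "from" prefix), and dates look like `Published on: 17/07/2022` in day/month/year order. The helper names `parse_price` and `parse_published` are hypothetical and not part of the original notebook.

```python
import re
from datetime import date, datetime


def parse_price(raw: str):
    """Return (amount_in_pesos, is_starting_price) for a scraped price string.

    Assumes whole-peso amounts; every non-digit character is dropped.
    """
    digits = re.sub(r"\D", "", raw)
    amount = int(digits) if digits else None
    is_starting_price = raw.strip().lower().startswith("from")
    return amount, is_starting_price


def parse_published(raw: str) -> date:
    """Convert 'Published on: 17/07/2022' to a date, assuming day/month/year."""
    text = raw.split(":", 1)[-1].strip()
    return datetime.strptime(text, "%d/%m/%Y").date()


print(parse_price("from ₱ 107,300,000"))
print(parse_published("Published on: 1/06/2022"))
```

Applied to the `Price` and `Date` columns, these would give numeric and date values that can be sorted and filtered, while the flag keeps track of which listings show only a starting price.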

## Exporting the Data
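
A minimal sketch of one way to export the collected rows, using only the standard-library `csv` module. It assumes rows are tuples in the same order as the DataFrame columns above and that `Facilities` is either a list of strings or the `"N/A"` placeholder. The function name `write_products_csv` and the sample row are hypothetical.

```python
import csv
import io

# Column names match the DataFrame columns used in this notebook.
COLUMNS = ['Name', 'Author', 'Building', 'Address', 'Price',
           'Facilities', 'Date', 'Url', 'Scrapetime']


def write_products_csv(rows, handle):
    """Write scraped rows (tuples in COLUMNS order) to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    for row in rows:
        row = list(row)
        # Flatten the facilities list into a single delimited string.
        if isinstance(row[5], list):
            row[5] = "; ".join(row[5])
        writer.writerow(row)


# Made-up sample row, for demonstration only.
sample = [("Studio Condo", "Some Agent", "N/A", "Some City", "₱ 1,000,000",
           ["CCTV", "Lobby"], "Published on: 21/09/2022",
           "https://example.com/listing", "2022-09-26 18:00:53")]
buffer = io.StringIO()
write_products_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the handle can come from `open(path, "w", newline="", encoding="utf-8")`, where `path` is whatever output filename is chosen.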

## Challenges

## Conclusion