> # Scraping domain.com.au

We will scrape domain.com.au to extract features of rental properties currently listed in the Melbourne region, including the weekly rental price, the number of bedrooms, bathrooms and parking spaces, the property type, the suburb and postcode, and any additional features. The URL of each property's listing is also recorded for reference.

> ### Import libraries and functions

In [29]:
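import pandas as pd

# Run the helper script, which defines the scraping functions used below
# (get_soup, get_price, get_suburb, get_features, get_additional_features, get_next_url)
%run ../scripts/'scrape domain.com.py'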
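The helper script itself is not reproduced in this notebook. To illustrate the kind of text parsing a helper such as `get_price` has to do, here is a hypothetical, standard-library-only sketch. It assumes the price appears as text like "$550 per week" or "$1,200.00 pw"; the function name `parse_weekly_price` and the text format are assumptions, not the script's actual code.

```python
import re
from typing import Optional

# Hypothetical helper, NOT the real get_price() from "scrape domain.com.py".
# Assumes price text such as "$550 per week" or "$1,200.00 pw".
_PRICE_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")


def parse_weekly_price(text: str) -> Optional[float]:
    """Return the first dollar amount found in `text`, or None if there is none."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None  # e.g. a listing showing a status message instead of a price
    return float(match.group(1).replace(",", ""))  # drop thousands separators


print(parse_weekly_price("$550 per week"))  # 550.0
print(parse_weekly_price("$1,200.00 pw"))   # 1200.0
print(parse_weekly_price("Deposit taken"))  # None
```

Returning `None` when no amount is found mirrors the `if not price: continue` check in the scraping loop. Note that for a price range such as "$500 - $550 per week" this sketch keeps only the first amount.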


> ### Start scraping website

In [30]:
# Initialise the first page of property listings
url = "https://www.domain.com.au/rent/melbourne-region-vic/?excludedeposittaken=1&page=1"
properties = []
page_count = 1
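The search filter (`excludedeposittaken=1`) and the page number are plain query parameters in this URL. The scraping loop follows the site's own next-page links through `get_next_url`, but if you need the URL of a specific results page, a sketch using only `urllib.parse` could look like this. `with_page` is a hypothetical helper, and it assumes Domain keeps honouring the `page` parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page`.

    Other query parameters (e.g. excludedeposittaken=1) are preserved.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)  # {"name": [values]}
    query["page"] = [str(page)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


start = "https://www.domain.com.au/rent/melbourne-region-vic/?excludedeposittaken=1&page=1"
print(with_page(start, 3))
# https://www.domain.com.au/rent/melbourne-region-vic/?excludedeposittaken=1&page=3
```

Building URLs this way does not tell you when the results run out, which is why following the next-page link (and stopping when there is none) remains the simpler stopping rule.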

In [31]:
while url:  # stop when there are no more pages to scrape
    print("Begin page", page_count)

    # Fetch and parse the HTML of the current results page
    soup = get_soup(url)  # defined in 'scrape domain.com.py'

    # Get a list of all property listings on the page
    listings = soup.find_all('div', {'class': 'css-qrqvvg'})

    # Extract information from each listing
    # (`listing` avoids shadowing the built-in `property`)
    for listing in listings:
        price = get_price(listing)
        if not price:  # skip listings without a rental price,
            continue   # which are usually properties under application

        suburb, postcode = get_suburb(listing)
        bedrooms, bathrooms, parkings = get_features(listing)

        # The property type span may be missing; avoid calling .text on None
        type_span = listing.find('span', class_='css-693528')
        property_type = type_span.text.strip() if type_span else None

        # Open the property's own page to find additional features
        link = listing.find('a', href=True)
        if link is None:  # no link to a detail page, so skip this listing
            continue
        property_url = link['href']
        property_soup = get_soup(property_url)

        additional_features = get_additional_features(property_soup)

        # Store the extracted data in a dictionary
        property_data = {
            'price (AUD per week)': price,
            'bedrooms': bedrooms,
            'bathrooms': bathrooms,
            'parkings': parkings,
            'property type': property_type,
            'suburb': suburb,
            'postcode': postcode,
            'additional features': additional_features,
            'property url': property_url
        }

        # Append the dictionary to the list of properties
        properties.append(property_data)

    # Move to the next results page; the loop ends when this is falsy
    url = get_next_url(soup)
    page_count += 1

Begin page 1
Begin page 2
...
Begin page 50
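The `while url:` loop relies on `get_next_url` eventually returning a falsy value, and it sends requests back to back: one per results page plus one per listing's detail page. The hypothetical generator `iterate_pages` below sketches the same next-link pagination with two safeguards added: a `max_pages` cap in case the next link never disappears, and a pause between page requests. The fetch and next-link functions are passed in, so it could be used as `for soup in iterate_pages(url, get_soup, get_next_url): ...`.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

Page = TypeVar("Page")  # e.g. the BeautifulSoup object returned by get_soup


def iterate_pages(
    start_url: str,
    fetch: Callable[[str], Page],
    next_url: Callable[[Page], Optional[str]],
    max_pages: int = 100,
    delay_seconds: float = 1.0,
) -> Iterator[Page]:
    """Yield fetched pages, following next-page links until there are none.

    max_pages stops the loop even if a next link keeps appearing;
    delay_seconds pauses before each follow-up page request.
    """
    url: Optional[str] = start_url
    for _ in range(max_pages):
        if not url:
            break  # no further page: same stopping rule as `while url:`
        page = fetch(url)
        yield page
        url = next_url(page)
        if url:
            time.sleep(delay_seconds)  # be gentle with the server
```

The pause here only covers results pages; the per-listing `get_soup(property_url)` calls would need their own delay to be equally gentle on the server.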

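`get_suburb` and `get_features` are also defined in the helper script. As a hypothetical illustration only, assuming the address text ends in "SUBURB STATE POSTCODE" (e.g. "12 Smith Street, RICHMOND VIC 3121") and each feature label reads like "3 Beds", the parsing could be done with regular expressions as follows. Neither the text formats nor the names `parse_suburb` and `parse_features` come from the original script.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical parsers, NOT the real get_suburb()/get_features().
# The text formats below are assumptions.

# "..., SUBURB STATE POSTCODE" at the end of an address string
_ADDRESS_RE = re.compile(
    r",\s*([A-Za-z' .-]+?)\s+(?:VIC|NSW|QLD|SA|WA|TAS|NT|ACT)\s+(\d{4})\s*$"
)
# Feature labels such as "3 Beds", "1 Bath" or "2 Parking"
_FEATURE_RE = re.compile(r"\s*(\d+)\s+(bed|bath|parking)", re.IGNORECASE)


def parse_suburb(address: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (suburb, postcode), or (None, None) if the address doesn't match."""
    match = _ADDRESS_RE.search(address)
    if match is None:
        return None, None
    return match.group(1).strip(), match.group(2)


def parse_features(labels: List[str]) -> Tuple[int, int, int]:
    """Return (bedrooms, bathrooms, parking spaces); unlisted features count as 0."""
    counts = {"bed": 0, "bath": 0, "parking": 0}
    for label in labels:
        match = _FEATURE_RE.match(label)
        if match:  # labels without a number (e.g. "- Parking") are ignored
            counts[match.group(2).lower()] = int(match.group(1))
    return counts["bed"], counts["bath"], counts["parking"]


print(parse_suburb("Unit 3, 12 Smith Street, BOX HILL VIC 3128"))  # ('BOX HILL', '3128')
print(parse_features(["3 Beds", "2 Baths", "- Parking"]))          # (3, 2, 0)
```

Counting a missing feature as 0 is a design choice of this sketch; `None` would be the safer default if you need to distinguish "no parking" from "not stated".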

In [33]:
# Convert list of dictionaries to a pandas DataFrame
df = pd.DataFrame(properties)

# Write the DataFrame to a CSV file
df.to_csv('../data/landing/properties.csv', index=False)

print('Data successfully written to properties.csv')

Data successfully written to properties.csv
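The `additional features` column holds whatever `get_additional_features` returns. If that is a Python list, `DataFrame.to_csv` writes its string representation (e.g. `"['Air conditioning', 'Dishwasher']"`), and `pd.read_csv` later returns that string rather than a list. A minimal sketch for turning such cells back into lists, assuming they contain list literals; `parse_feature_list` is a hypothetical helper:

```python
import ast
from typing import Any, List


def parse_feature_list(cell: Any) -> List[str]:
    """Convert a CSV cell like "['Air conditioning', 'Dishwasher']" into a list.

    Assumes the cell holds a Python list literal written by DataFrame.to_csv.
    Missing values (which read_csv returns as float NaN) become an empty list.
    """
    if not isinstance(cell, str) or not cell.strip():
        return []
    value = ast.literal_eval(cell)  # evaluates Python literals only, unlike eval()
    return list(value) if isinstance(value, (list, tuple)) else [value]


print(parse_feature_list("['Air conditioning', 'Dishwasher']"))
# ['Air conditioning', 'Dishwasher']
print(parse_feature_list(float("nan")))
# []
```

With pandas, the same function can be applied column-wise after loading, e.g. `df['additional features'] = df['additional features'].apply(parse_feature_list)`.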
