web-scraper-trademe-co-nz-property

Web scraper for https://trademe.co.nz/property

Built With

  • Python v3.9.13
  • Flask
  • Docker
  • Helm chart & Kubernetes

Getting Started

To get a local copy up and running, follow the steps below.

Prerequisites

Python v3.9.13
Docker (optional)
Helm chart and Kubernetes (optional)

Setup

# create a virtual environment
# assumes "python3 --version" reports "Python 3.9.13" in the current terminal session
python3 -m venv ./venv

# activate virtual environment
# for macos or linux
source ./venv/bin/activate
# for windows
.\venv\Scripts\activate

# upgrade pip
python -m pip install --upgrade pip

# install python dependencies
pip install -r requirements.txt

# lint python code
pylint ./search

Install

  1. Run as CLI:
# argument 1 = city
# argument 2 = total number of pages
# argument 3 = true or false (default) to do search 'with detail' or 'without detail'
# argument 4 = output file (default = result.psv)
# argument 5 = true or false (default) to enable debug mode logging
# e.g. python -m search.main <city> <total pages> <true or false> <file.psv> <true or false>
python -m search.main auckland 1 false result.psv
  2. Or, run as web API server:
FLASK_ENV=development python3 -m search.app
  3. Or, build and run in a Docker container:
# build docker image
docker build -t webscrappe-trademe-co-nz-property:mvp .

# run docker container
docker run -d -p 8080:8080 --name webscrappe webscrappe-trademe-co-nz-property:mvp

# stop and remove docker container
docker stop webscrappe
docker rm webscrappe
  4. Or, run in a Kubernetes cluster:
# upgrade the helm chart, or install it if not present
cd .deploy/helm
helm upgrade -i webscrappe-trademe-co-nz-property webscrappe-trademe-co-nz-property \
	-n webscrappe --create-namespace

# stop and remove helm chart and namespace
helm uninstall webscrappe-trademe-co-nz-property -n webscrappe
kubectl delete namespace webscrappe

Usage

When running as a command-line interface (CLI):

# argument 1 = city
# argument 2 = total number of pages
# argument 3 = true or false (default) to do search 'with detail' or 'without detail'
# argument 4 = output file (default = result.psv)
# argument 5 = true or false (default) to enable debug mode logging
# e.g. python -m search.main <city> <total pages> <true or false> <file.psv> <true or false>
python -m search.main auckland 1 false result.psv

NOTE: the output file result.psv is pipe-separated. When importing it through a CSV import wizard, choose a custom delimiter (separator) and enter | as the character.
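
The same pipe delimiter applies when loading the file programmatically. The following is a minimal sketch using Python's standard csv module. It assumes the PSV file starts with a header row, which this README does not confirm; the helper name read_psv and the two-column sample data are hypothetical.

```python
import csv
import io


def read_psv(stream):
    """Return the rows of a pipe-separated stream as dicts keyed by the header row."""
    # Assumption: the first line of the file is a header row.
    return list(csv.DictReader(stream, delimiter="|"))


# Hypothetical sample for illustration; a real result.psv has more columns.
sample = io.StringIO("title|address\nSunny villa|1 Example Street, Auckland\n")
rows = read_psv(sample)
print(rows[0]["address"])

# For a real file:
# with open("result.psv", newline="", encoding="utf-8") as f:
#     rows = read_psv(f)
```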

When running as an API server, use the following endpoints:

  1. 'http://{domain name}:{port}/search-without-detail/{city}/{total number of pages}' searches without property detail.
# example
curl http://localhost:8080/search-without-detail/auckland/1
  2. 'http://{domain name}:{port}/search-with-detail/{city}/{total number of pages}' searches with property detail.
# example
curl http://localhost:8080/search-with-detail/auckland/1
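
The endpoints above can also be called from Python. This is a sketch only: the helper names build_search_url and fetch are hypothetical, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_search_url(base_url, city, pages, with_detail=False):
    """Build an endpoint URL following the two patterns listed above."""
    mode = "search-with-detail" if with_detail else "search-without-detail"
    return f"{base_url.rstrip('/')}/{mode}/{quote(city, safe='')}/{int(pages)}"


def fetch(url, timeout=60):
    """Return the response body as text; the format is not assumed."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_search_url("http://localhost:8080", "auckland", 1)
print(url)
# With the server running locally: body = fetch(url)
```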

Mapping of columns

An example property search page lists a property as follows (screenshot: property-search). The fields in the PSV file are mapped as follows:

  • title
    property-search-title
  • address
    property-search-address
  • number_of_bedrooms
    property-search-bedrooms
  • number_of_bathrooms
    property-search-bathrooms
  • number_of_parking_lots
    property-search-parking-lots
  • number_of_living_areas
    property-search-living-areas
  • floor_area_sqm
    property-search-floor-area
  • land_area_sqm
    property-search-land-area
  • asking_price
    property-search-asking-price

An example property detail page shows a property as follows (screenshot: property-detail). The fields in the PSV file are mapped as follows:

  • property_type
    property-detail-property-type
  • parking_type
    property-detail-parking-type
  • in_the_area
    property-detail-in-the-area
  • property_id
    property-detail-property-id
  • broadband_options
    property-detail-broadband-options
  • description
    property-detail-description

Another example property detail page shows a property as follows (screenshot: property-detail-2). More fields in the PSV file are mapped as follows:

  • rateable_value
    property-detail-2-rateable-value
  • agency_reference
    property-detail-2-agency-reference
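
The field names listed above can be used to sanity-check an output file. The sketch below assumes that the PSV header row uses exactly these names and that the detail fields appear only in 'with detail' output; neither is confirmed by this README, and the helper missing_columns is hypothetical.

```python
# Field names taken from the mapping lists above.
SEARCH_COLUMNS = [
    "title", "address", "number_of_bedrooms", "number_of_bathrooms",
    "number_of_parking_lots", "number_of_living_areas",
    "floor_area_sqm", "land_area_sqm", "asking_price",
]
DETAIL_COLUMNS = [
    "property_type", "parking_type", "in_the_area", "property_id",
    "broadband_options", "description", "rateable_value", "agency_reference",
]


def missing_columns(header_line, with_detail=False):
    """Return the expected column names that are absent from a PSV header line."""
    present = set(header_line.strip().split("|"))
    expected = SEARCH_COLUMNS + (DETAIL_COLUMNS if with_detail else [])
    return [name for name in expected if name not in present]


# Hypothetical header containing only the search columns.
header = "|".join(SEARCH_COLUMNS)
print(missing_columns(header))
print(missing_columns(header, with_detail=True))
```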

Run tests

# run unit tests
pytest -v --cov=search

Authors

👤 Ankur Soni

Github

LinkedIn

Twitter

Show your support

Give a ⭐️ if you like this project!

📝 License

This project is MIT licensed.