About

** CURRENTLY BROKEN XPATHS (TO BE FIXED) ** Scraping Finn housing/work ads with Python and requests. Work in progress.

The scraper covers different sub-URLs within Finn (see parameters.yml), e.g. housing ads, project ads, and work ads. Each sub-URL requires its own set of XPaths, though many XPaths are shared between them (see src/xpaths.py).
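As a rough illustration of the shared-plus-specific idea, the sketch below merges a common set of XPaths with a per-section set. All names and XPath strings here (COMMON_XPATHS, SECTION_XPATHS, xpaths_for, and the expressions themselves) are hypothetical and are not taken from src/xpaths.py.

```python
# Hypothetical XPaths for illustration only; the real ones live in src/xpaths.py.
COMMON_XPATHS = {
    "title": "//h1/text()",
    "address": "//p[@class='address']/text()",
}

# Hypothetical section-specific XPaths, keyed by an assumed section name.
SECTION_XPATHS = {
    "housing": {"price": "//span[@class='price']/text()"},
    "work": {"employer": "//dd[@class='employer']/text()"},
}


def xpaths_for(section):
    """Return the shared XPaths merged with the section-specific ones."""
    merged = dict(COMMON_XPATHS)  # copy so the shared dict is never mutated
    merged.update(SECTION_XPATHS.get(section, {}))
    return merged
```

With this layout, a section-specific entry overrides a common one that has the same key, which may or may not match how the project resolves conflicts.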

Only tested on Python 3.11.

CSV example

Log example

Setup

mkdir scrapes
mkdir logs
pip install -r requirements.txt

Parameters

Adjust parameters in parameters.yml.
daily_scrape: If true, the scraper only scrapes the daily ads.
finn_sub_urls: Which parts of Finn to scrape. A separate CSV is created for each sub-URL.
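To make the "one CSV per sub-URL" behaviour concrete, here is a minimal sketch that maps each sub-URL to a file in the scrapes folder. The parameter values, the csv_path_for helper, and the file naming scheme are assumptions for illustration; the actual contents of parameters.yml and the naming used by the scraper may differ.

```python
from pathlib import Path

# Hypothetical parameters, shaped like what parameters.yml might contain once loaded.
params = {
    "daily_scrape": True,
    "finn_sub_urls": ["realestate/homes", "job/fulltime"],
}


def csv_path_for(sub_url, out_dir="scrapes"):
    """Build a CSV path for a sub-URL (the naming scheme is an assumption)."""
    name = sub_url.strip("/").replace("/", "_") + ".csv"
    return Path(out_dir) / name


# One CSV path per configured sub-URL.
csv_paths = {sub_url: csv_path_for(sub_url) for sub_url in params["finn_sub_urls"]}
```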

To run

python src/finn_scraper.py

Checklist

Add detail to headers.
Add sleep timer, output folder, etc. to parameters.yml.
Support custom queries instead of the binary daily/non-daily scrape.
Reduce line length across project.
Check that all requests return status code 200.
Add a data-processing function for HTML-to-text conversion.
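For the HTML-to-text item, one possible starting point is sketched below. It uses only the standard library's html.parser rather than whatever parser the project uses for XPaths, and the names _TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML fragment as a single line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This flattens everything to one line, so a real implementation would likely need to decide how to treat paragraph and line breaks.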
