** CURRENTLY BROKEN XPATHS (TO BE FIXED) ** Scraping Finn housing/work ads with Python and requests. Work in progress.
Scrapes different subdomains within Finn (see parameters.yml), e.g. housing ads, project ads, and work ads. Each subdomain requires its own set of XPaths, though many are shared between them (see src/xpaths.py).
Only tested on Python 3.11.
mkdir scrapes
mkdir logs
pip install -r requirements.txt
Adjust parameters in parameters.yml.
daily_scrape: If true, the scraper only scrapes the daily ads.
finn_sub_urls: Which parts of Finn to scrape. A separate CSV is created for
each sub URL.
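As a rough illustration of how these two settings might interact, the sketch below builds one output CSV path per sub URL. The helper name `csv_path_for`, the example sub URL values, and the filename pattern are all assumptions made for illustration; they are not taken from src/finn_scraper.py or parameters.yml.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical in-memory stand-in for parameters.yml.
# The sub URL values are illustrative only.
params = {
    "daily_scrape": True,
    "finn_sub_urls": ["realestate/homes", "job/fulltime"],
}


def csv_path_for(sub_url: str, daily: bool, folder: str = "scrapes",
                 today: Optional[date] = None) -> Path:
    """Return an output CSV path for one sub URL (assumed naming scheme)."""
    today = today or date.today()
    # Turn "realestate/homes" into "realestate_homes" so it is a valid filename.
    slug = sub_url.strip("/").replace("/", "_")
    # Daily scrapes get a date suffix so successive runs do not overwrite each other.
    suffix = f"_{today.isoformat()}" if daily else ""
    return Path(folder) / f"{slug}{suffix}.csv"


# One CSV path per configured sub URL.
paths = [csv_path_for(u, params["daily_scrape"]) for u in params["finn_sub_urls"]]
```

Deriving the filename from the sub URL keeps the "one CSV per sub URL" rule in a single place; the real project may name its files differently.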
python src/finn_scraper.py
- Add detail to headers.
- Add sleep timer and folder etc to parameters.yml.
- Custom queries instead of binary daily/not daily scrape.
- Reduce line length across project.
- Check that every request returns status code 200.
- Add a data-processing function for HTML -> text.
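
Two of the items above (the status-code check and the HTML -> text step) could be sketched roughly as follows, using only the standard library. The names `html_to_text` and `is_ok` are hypothetical and do not exist in the project; `is_ok` only assumes that the response object exposes a `status_code` attribute, as `requests.Response` does.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Strip tags from an HTML fragment and return its text content."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def is_ok(response) -> bool:
    """Return True if a response-like object reports HTTP status 200."""
    return getattr(response, "status_code", None) == 200
```

In the scraper, `is_ok` would be called on each response before its body is parsed, so failed requests can be logged instead of silently producing empty rows.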