About

Scrapes Finn housing/work ads with Python and requests. Work in progress.

The scraper covers different subdomains within Finn (see parameters.yml), e.g. housing ads, project ads, and work ads. Each subdomain requires its own set of XPaths, though many of them are shared between subdomains (see src/xpaths.py).
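One way to organise shared and subdomain-specific XPaths is sketched below. This is only an illustration: the names COMMON_XPATHS, SUBDOMAIN_XPATHS and xpaths_for, the subdomain keys, and every XPath expression are assumptions, not the actual contents of src/xpaths.py.

```python
# Hypothetical XPath tables; the real expressions live in src/xpaths.py.
COMMON_XPATHS = {
    "title": "//h1/text()",
    "description": "//section[@id='description']//text()",
}

# Per-subdomain additions or overrides (keys and expressions are illustrative).
SUBDOMAIN_XPATHS = {
    "realestate": {"price": "//span[@class='price']/text()"},
    "job": {
        "employer": "//dd[@class='employer']/text()",
        "title": "//h2/text()",  # overrides the common "title" XPath
    },
}


def xpaths_for(subdomain: str) -> dict[str, str]:
    """Return the common XPaths merged with the subdomain-specific ones."""
    merged = dict(COMMON_XPATHS)
    # Subdomain entries win over common ones when both define the same key.
    merged.update(SUBDOMAIN_XPATHS.get(subdomain, {}))
    return merged
```

With this layout, an unknown subdomain falls back to the common set, and a subdomain only has to list what differs from it.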

Only tested on Python 3.11.

CSV example

Log example

Setup

mkdir scrapes
mkdir logs
pip install -r requirements.txt

Parameters

Adjust parameters in parameters.yml.
daily_scrape: If true, the scraper only scrapes the daily ads.
finn_sub_urls: Which parts of Finn to scrape. A separate CSV is created for each sub-URL.
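As a rough sketch of how the two parameters might be used, the snippet below maps each sub-URL to its own CSV file in the scrapes folder. The params dict stands in for the parsed parameters.yml. The sub-URL values, the file naming scheme, and the helper csv_path_for are all assumptions made for illustration, not the project's actual behaviour.

```python
from pathlib import Path

# Stand-in for the parsed contents of parameters.yml; the values are assumed.
params = {
    "daily_scrape": True,
    "finn_sub_urls": ["realestate/homes", "job/fulltime"],
}


def csv_path_for(sub_url: str, folder: str = "scrapes") -> Path:
    """Map a sub-URL to its own CSV file inside the scrapes folder."""
    # Replace slashes so the sub-URL can be used as a single file name.
    return Path(folder) / (sub_url.strip("/").replace("/", "_") + ".csv")


# One CSV path per configured sub-URL.
csv_paths = {sub_url: csv_path_for(sub_url) for sub_url in params["finn_sub_urls"]}
```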

To run

python src/finn_scraper.py

Checklist

Add detail to headers.
Add sleep timer, output folder, etc. to parameters.yml.
Custom queries instead of binary daily/not daily scrape.
Reduce line length across project.
Check that all requests yield status code 200.
Process data function for html->text.
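For the html->text item, a possible starting point using only the standard library is sketched here. The helper name html_to_text is hypothetical, and the project may well prefer lxml or another parser that is already in use.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the text nodes and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Hypothetical helper: strip tags from an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Note that this sketch separates every text node with a space, so inline markup inside a word would split it. That is usually acceptable for ad descriptions but worth checking against real data.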
