Hemnet Crawler

A crawler for Hemnet, Sweden's largest housing platform, using Scrapy.

I have created a crawler that collects data from the website. First I start with the url of a page containing various house listings. From this page I extract the url for each listing's page, since that's where the useful data are located. As the starting page can have many listings, that can't fit in a single page, the program follows the next_page class in order to get each one of them.

Afterwards inside the parseInnerPage function the data are collected. First the streetName, then the price and then I create a dictionary for each listing containing all it's adsitional data and their labels. In this function I also "clean" the data from whitespaces and unnecessary characters.

All the data for each listing are stored in a dictionary and after the spider finishes collecting the data, the results are stored inside results.json.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
hemnet		hemnet
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Hemnet Crawler

About

Releases

Packages

Languages

GeorgeFarao/Hemnet-Crawler

Folders and files

Latest commit

History

Repository files navigation

Hemnet Crawler

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages