This repository has been archived by the owner on May 16, 2022. It is now read-only.

JulienAlardot/challenge-collecting-data

Collecting Data

Description

Small group project made at BeCode. The aim was to scrape real estate data from websites and build a database of more than 10,000 houses for sale. The database will be used later in the training. The group's own target was 80,000.

Installation

Main packages used:

  • Selenium
  • Pandas
  • json
  • Requests
  • BeautifulSoup

Usage

Scrape real estate data from websites and build a dataset.

Visuals

Data Columns

Data Distribution

Contributors

The group working on this project is composed of:

We split up the sites to scrape as follows:

Site       Scraped by
Immoweb    Alain
LogicImmo  Julien
ImmoVlan   Jeff

We raced day and night toward the record of 50,000, which we beat.

Wacky races

Difficulties and improvements

  • We did not manage to get past the ImmoVlan captcha in time, despite trying rotating headers and Selenium.
  • We had to reduce our data to the common trunk of fields found on all the websites, and lost information as a result.
  • We could try NLP techniques to extract more information and filter out typos in the code.
  • Most of the websites we used have since been updated, so the code as it stands no longer works.
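The "common trunk" point above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, site names, and field names below are hypothetical, and the records are made up. It keeps only the fields that every source provides, which is why site-specific details are lost.

```python
from typing import Dict, List


def common_fields(sources: Dict[str, List[dict]]) -> List[str]:
    """Return the sorted field names present in every record of every source."""
    shared = None
    for records in sources.values():
        for record in records:
            keys = set(record)
            shared = keys if shared is None else shared & keys
    return sorted(shared or [])


def to_common_trunk(sources: Dict[str, List[dict]]) -> List[dict]:
    """Merge all records, keeping only shared fields plus a 'source' tag."""
    fields = common_fields(sources)
    merged = []
    for site, records in sources.items():
        for record in records:
            row = {field: record[field] for field in fields}
            row["source"] = site  # remember which site the row came from
            merged.append(row)
    return merged


# Hypothetical example data: not taken from the real scrapers.
sources = {
    "site_a": [{"price": 250000, "rooms": 3, "garden": True}],
    "site_b": [{"price": 180000, "rooms": 2, "terrace": False}],
}
merged = to_common_trunk(sources)
print(merged)
```

In this sketch the `garden` and `terrace` fields are dropped because only one site provides each of them.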

Timeline

From Monday 3 May to 6 May 2021

How to use

Run Core.database_gen.load_database() to get the final database (the file is database.csv under Data).
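Once the file has been generated, it can be inspected with a few lines of Python. The sketch below does not reproduce load_database(). The helper name `read_database` is hypothetical, and it assumes the file is a comma-separated, UTF-8 encoded CSV with a header row, located at Data/database.csv relative to the working directory.

```python
import csv
from pathlib import Path


def read_database(path):
    """Read a CSV file with a header row into a list of dicts (one per row)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Assumed location, based on the note above; adjust if the layout differs.
DATABASE_PATH = Path("Data") / "database.csv"

if DATABASE_PATH.exists():
    rows = read_database(DATABASE_PATH)
    columns = list(rows[0]) if rows else []
    print(f"{len(rows)} rows, columns: {columns}")
```

All values come back as strings with this approach. Since the project already uses Pandas, `pandas.read_csv` is an alternative that also infers column types.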
