Before any machine learning task, data has to be processed and its features extracted. This project contains a Python script that scrapes tourism websites for all the trip offers they present and organizes them in a MongoDB NoSQL database, ready for use in a machine learning pipeline. The script is built with Python libraries such as BeautifulSoup for the scraping.
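
As a rough illustration of the scrape-then-store flow described above, here is a minimal sketch. It is not the project's actual script: the HTML markup, the CSS selectors, the field names (`title`, `price`, `duration`), the connection URI, and the `tourism` / `offers` database and collection names are all assumptions made for the example. It uses BeautifulSoup for parsing and assumes `pymongo` as the MongoDB client.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: real tourism sites use different tags and classes.
SAMPLE_HTML = """
<div class="offer">
  <h2 class="title">Weekend in Lisbon</h2>
  <span class="price">299</span>
  <span class="duration">3 days</span>
</div>
<div class="offer">
  <h2 class="title">Sahara Desert Tour</h2>
  <span class="price">450</span>
  <span class="duration">5 days</span>
</div>
"""


def parse_offers(html):
    """Turn an HTML page into a list of offer dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    offers = []
    for card in soup.select("div.offer"):
        offers.append({
            "title": card.select_one(".title").get_text(strip=True),
            "price": float(card.select_one(".price").get_text(strip=True)),
            "duration": card.select_one(".duration").get_text(strip=True),
        })
    return offers


def save_offers(offers, uri="mongodb://localhost:27017"):
    """Insert the offers into MongoDB (needs pymongo and a running server)."""
    # Imported here so that parsing alone works without pymongo installed.
    from pymongo import MongoClient

    if not offers:
        return 0  # insert_many rejects an empty list
    client = MongoClient(uri)
    # "tourism" and "offers" are placeholder database and collection names.
    result = client["tourism"]["offers"].insert_many(offers)
    return len(result.inserted_ids)


if __name__ == "__main__":
    for offer in parse_offers(SAMPLE_HTML):
        print(offer)
```

Parsing and storage are kept in separate functions so the extraction logic can be tried on saved HTML without a database; calling `save_offers(parse_offers(html))` would then write the documents to a local MongoDB instance, if one is running.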
Future features:
- JavaScript scripts for scraping
- Ready-made containers for scraping data on a regular schedule