debba/infoimprese-scraping-tool

InfoImprese Scraper Tool

Config

Before using this tool for the first time, create a config.json inside the conf folder. Take a look at conf/config.example.json, or simply copy it.

You can set up an Anti-Captcha API key to skip captcha checks. To learn how to generate a key, see https://anti-captcha.com

You can choose the fields you want to export. The complete list:

  • "Denominazione",
  • "Sede legale",
  • "Attività",
  • "Sede operativa",
  • "Indirizzo web",
  • "Posta elettronica",
  • "Commercio elettronico",
  • "Chi siamo",
  • "Cosa facciamo",
  • "Classe di fatturato",
  • "Canali di vendita",
  • "Marchi",
  • "Principali paesi di export",
  • "Certificazioni"
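
Since a misspelled field name is easy to miss, here is a minimal sketch of how a field selection could be checked against the list above before running a scrape. The config key name `fields` and the helper `check_fields` are assumptions made for illustration only; the actual key names are the ones found in conf/config.example.json.

```python
import json

# Field names copied from the complete list above.
EXPORTABLE_FIELDS = [
    "Denominazione",
    "Sede legale",
    "Attività",
    "Sede operativa",
    "Indirizzo web",
    "Posta elettronica",
    "Commercio elettronico",
    "Chi siamo",
    "Cosa facciamo",
    "Classe di fatturato",
    "Canali di vendita",
    "Marchi",
    "Principali paesi di export",
    "Certificazioni",
]


def check_fields(config_text, key="fields"):
    """Return the requested export fields, or raise ValueError on unknown names.

    The "fields" key is a hypothetical name, not taken from the tool itself.
    """
    config = json.loads(config_text)
    requested = config.get(key, [])
    unknown = [name for name in requested if name not in EXPORTABLE_FIELDS]
    if unknown:
        raise ValueError(f"unknown export fields: {unknown}")
    return requested


# Hypothetical config content, for illustration only.
example = '{"fields": ["Denominazione", "Sede legale", "Posta elettronica"]}'
print(check_fields(example))
```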

You can also set a mode. The available modes are described in the next section.

Modes

You can choose one of the following scraping modes:

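  • search_by_name ("Ricercando nel Nome" on the website: search in the company name)
  • search_by_desc ("Ricercando nella Descrizione attività" on the website: search in the activity description)
  • with_dash ("con la Vetrina su infoimprese.it" on the website: companies with a showcase page on infoimprese.it)
  • with_cert ("con certificazione di qualità" on the website: companies with a quality certification)
  • with_dash ("che praticano e-commerce" on the website: companies that do e-commerce)
  • with_email ("che possiedono l'e-mail" on the website: companies that have an e-mail address)
  • with_website ("che hanno il sito internet" on the website: companies that have a website)
  • with_export ("che svolgono attività di export" on the website: companies that export)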

Usage

usage: main.py [-h] -q QUERY [-m MODE] [-l LOCATION] [-o OUTPUT]

Arguments are:

  • query is your search keyword
  • location is where you want to search
  • mode is the scraping mode (see the Modes section)
  • output is the CSV file where the data is stored
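
To make the usage line concrete, here is a minimal sketch of an argparse parser that accepts the same short options. It is not the tool's actual main.py: the long option names (`--query`, `--mode`, `--location`, `--output`), the restriction of `-m` to the mode names from the Modes section, and the sample values are all assumptions for illustration.

```python
import argparse

# Mode names from the Modes section (each listed once).
MODES = [
    "search_by_name",
    "search_by_desc",
    "with_dash",
    "with_cert",
    "with_email",
    "with_website",
    "with_export",
]


def build_parser():
    """Build a parser mirroring the documented options (long names are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-q", "--query", required=True, help="search keyword")
    parser.add_argument("-m", "--mode", choices=MODES, metavar="MODE",
                        help="scraping mode (see the Modes section)")
    parser.add_argument("-l", "--location", help="where to search")
    parser.add_argument("-o", "--output", help="CSV file for storing data")
    return parser


# Hypothetical invocation: keyword, mode, location and output file are examples.
args = build_parser().parse_args(
    ["-q", "pizzeria", "-m", "with_email", "-l", "Milano", "-o", "results.csv"]
)
print(args.query, args.mode, args.location, args.output)
```

Only `-q` is required here, matching the usage line; the other options are left as `None` when omitted, since the tool's real defaults are not documented above.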

Enjoy :)

Windows User?

You can use exec.bat to get a very basic GUI.

Credits

Disclaimer: Please note that this is a research project. I am by no means responsible for any usage of this tool.

About

Scraping from InfoImprese using Anti-Captcha service
