chrishein/bora_crawler

BORA Crawler

BORA (Boletín Oficial de la República Argentina) crawler implemented using Scrapy.

BORA is the Official Gazette of Argentina, where the government publishes public and legal notices, including company incorporations and changes to company structure and shareholders.

More details can be found in this article.

This crawler saves the following information for each notice:

  • id: Notice ID in the BORA website
  • company: Name of the company
  • date: Date of publication
  • type: Type of publication, e.g. company constitution or company modification
  • content: Text of publication
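
As a rough illustration of the record shape listed above, one notice could be modelled like this when reading the JSON output back in. The names `BoraNotice`, `FIELDS` and `load_notices` are hypothetical and not part of the crawler, and treating every field as a string is an assumption.

```python
import json
from dataclasses import dataclass

# The five fields the crawler is documented to save for each notice.
FIELDS = ("id", "company", "date", "type", "content")


@dataclass
class BoraNotice:
    """Hypothetical container mirroring the fields listed above."""
    id: str
    company: str
    date: str
    type: str
    content: str


def load_notices(json_text):
    """Parse a JSON array of notice objects into BoraNotice records.

    Fields missing from an object become None; unknown keys are ignored.
    """
    return [
        BoraNotice(**{name: raw.get(name) for name in FIELDS})
        for raw in json.loads(json_text)
    ]
```

Calling `load_notices(open("items_bora.json", encoding="utf-8").read())` would then give a list of records to filter or group by `type`.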

The content field holds unstructured text and must be processed further to extract structured data.
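
One possible first step in that processing is pulling out fixed-format tokens with regular expressions. The sketch below looks for CUIT-style tax identifiers (two digits, eight digits and one check digit, separated by hyphens). Whether a given notice contains one is an assumption, the helper name `extract_cuits` is hypothetical, and the sample text is invented for illustration rather than taken from BORA.

```python
import re

# CUIT-like token: NN-NNNNNNNN-N, bounded so longer digit runs do not match.
CUIT_PATTERN = re.compile(r"\b\d{2}-\d{8}-\d\b")


def extract_cuits(content):
    """Return the unique CUIT-like tokens in order of first appearance."""
    seen = []
    for token in CUIT_PATTERN.findall(content):
        if token not in seen:
            seen.append(token)
    return seen


# Invented sample text, not a real notice.
sample = "EJEMPLO S.A. CUIT 30-12345678-9. Socio: Juan Perez, CUIT 20-87654321-0."
print(extract_cuits(sample))
```

Regular expressions only cover well-structured fragments; names, addresses and share distributions in free text would likely need more elaborate parsing.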

Running the Spider

To run the spider and save the crawled items in JSON use:

scrapy crawl bora -o items_bora.json -a start_date=YYYY-mm-dd -a end_date=YYYY-mm-dd

start_date and end_date are optional and default to 2011-01-01 and the current date, respectively.
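
The defaulting rule for the two arguments can be sketched as follows. This is not the spider's actual code: the function name `resolve_date_range` is hypothetical, and the check that the start does not fall after the end is an added assumption.

```python
from datetime import date, datetime

# Documented default for start_date.
DEFAULT_START = date(2011, 1, 1)


def resolve_date_range(start_date=None, end_date=None):
    """Turn optional YYYY-mm-dd strings into a (start, end) pair of dates."""
    if start_date:
        start = datetime.strptime(start_date, "%Y-%m-%d").date()
    else:
        start = DEFAULT_START
    if end_date:
        end = datetime.strptime(end_date, "%Y-%m-%d").date()
    else:
        end = date.today()
    # Assumption: an inverted range is treated as an error.
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return start, end
```

For example, `resolve_date_range("2015-03-01", "2015-03-31")` covers March 2015, while `resolve_date_range()` spans from 2011-01-01 to today.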

Deploying to Scrapinghub

When deploying to Scrapinghub, make sure you use the scrapy stack, as explained here, to avoid SSL errors.

License

Distributed under the MIT License. See LICENSE file for further details.
