CUTLER Data Crawlers

This is the official repository of the data crawlers and parsers developed for the CUTLER project. In this repository you will find the crawlers and their technical documentation. Please also refer to the User Manual for Data Crawling Software.

A detailed description of the data sources and crawlers is available in deliverables D3.2 and D3.3, accessible via the Deliverables page of the project website.

Project Structure

Crawlers

The crawlers are grouped into folders according to the type of data they crawl:

  • Economic contains crawlers and other software related to economic data, together with instructions for running them
  • Environmental contains crawlers and other software related to environmental data, together with instructions for running them
  • Social contains crawlers and other software related to social data, together with instructions for running them

The crawlers have been implemented in different programming languages (R, Python, JavaScript, Java). They ingest data into either a Hadoop Distributed File System (HDFS) or Elasticsearch; however, most of them can also be run standalone. More specific documentation can be found under the folders listed above.
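Each crawler has its own ingestion logic, but the Elasticsearch path generally means formatting crawled records for the `_bulk` REST endpoint. As an illustration only (not code from this repository; the index name and record fields are invented), a minimal Python sketch of preparing such a payload might look like:

```python
import json


def to_bulk_payload(index, records):
    """Serialise (doc_id, document) pairs into an Elasticsearch bulk-API body.

    The _bulk endpoint expects newline-delimited JSON: an action line
    naming the target index and document id, followed by the document
    itself, with a trailing newline at the end of the body.
    """
    lines = []
    for doc_id, doc in records:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical record resembling crawled environmental data.
payload = to_bulk_payload(
    "cutler-environmental",
    [("1", {"sensor": "air-quality", "pm10": 21.5})],
)
print(payload)
```

The resulting string can be POSTed to `http://<elasticsearch-host>:9200/_bulk` with the `application/x-ndjson` content type; see the per-folder documentation for how each crawler actually performs this step.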

Deployment in Hadoop

General information on deployment in Hadoop can be found in the following folder:

  • HadoopDeployment: scripts, configuration files and instructions related to data ingestion into/from Hadoop HDFS
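Ingestion into HDFS can be done through Hadoop's standard WebHDFS REST API, among other routes. As a hedged sketch (the hostname, port and HDFS path below are placeholders, not values from this repository; consult the HadoopDeployment folder for the actual configuration), the first step of a WebHDFS file upload is building the CREATE request URL, to which the NameNode replies with a redirect to a DataNode that receives the file body:

```python
from urllib.parse import urlencode


def webhdfs_create_url(host, port, hdfs_path, user, overwrite=True):
    """Build the WebHDFS REST URL that initiates a file CREATE.

    The NameNode answers this request with an HTTP 307 redirect to a
    DataNode; the file contents are then PUT to the redirected location.
    """
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": str(overwrite).lower(),
    })
    return f"http://{host}:{port}/webhdfs/v1{hdfs_path}?{query}"


# Placeholder host, port and path for illustration only.
url = webhdfs_create_url(
    "namenode.example.org", 9870, "/cutler/economic/data.csv", "hdfs"
)
print(url)
```

Port 9870 is the Hadoop 3 NameNode web default (older Hadoop 2 clusters use 50070); the deployment-specific values belong in the configuration files under HadoopDeployment.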