CUTLER Data Crawlers

This is the official repository of the data crawlers and parsers developed for the CUTLER project.

Project Structure

The crawlers are grouped according to the type of data crawled:

  • Economic contains crawlers and other software for economic data, along with instructions for running them
  • Environmental contains crawlers and other software for environmental data, along with instructions for running them
  • Social contains crawlers and other software for social data, along with instructions for running them

The crawlers are implemented in several programming languages (R, Python, JavaScript, Java). They ingest data into either a Hadoop Distributed File System (HDFS) or Elasticsearch, but most of them can also be run stand-alone. More specific documentation is available in each folder.
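To make the two ingestion targets and the stand-alone mode more concrete, here is a minimal Python sketch. It is not taken from any crawler in this repository: the function names, the record fields, and the index name are all hypothetical. It only shows how the same crawled records could be written to a local newline-delimited JSON file (stand-alone use, or a file later copied into HDFS) or turned into a request body in the Elasticsearch bulk format.

```python
import json
from pathlib import Path


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one document per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def to_bulk_payload(records, index_name):
    """Build a body in the Elasticsearch bulk format.

    Each document is preceded by an action line naming the target index.
    The index name is a hypothetical placeholder, not one used by CUTLER.
    """
    lines = []
    for r in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(r, sort_keys=True))
    return "".join(line + "\n" for line in lines)


def write_standalone(records, path):
    """Stand-alone mode: write the records to a local NDJSON file."""
    path = Path(path)
    path.write_text(to_ndjson(records), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical sample records; real crawlers define their own fields.
    sample = [{"source": "demo", "value": 1}, {"source": "demo", "value": 2}]
    print(to_bulk_payload(sample, "demo-index"), end="")
```

A file produced by `write_standalone` could then be copied into HDFS, for example with `hdfs dfs -put`, while the bulk payload would be sent to an Elasticsearch `_bulk` endpoint. The crawlers in the individual folders document what they actually do.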

General information on the deployment in Hadoop:

  • HadoopDeployment: scripts, configuration files, and instructions for moving data into and out of HDFS