Hypertext Corpus Initiative

Welcome to the Hypertext Corpus Initiative (HCI) project.

This project consists of the following components:

  • HCI core
  • HCI crawler

HCI core

TBD

HCI crawler

The HCI crawler is implemented as a Scrapy project. For more information, see: http://jiminy.medialab.sciences-po.fr/hci/index.php/Scrapy_implementation_proposal

The code is in the hcicrawler/ directory.

Requirements

  • Scrapy >= 0.14
  • pymongo >= 2.0
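
The minimum versions above can be checked before running the crawler. The following is a minimal sketch, not part of the project itself: the helper names `parse_version` and `check_requirements` are hypothetical, and it assumes Python 3.8 or later so that `importlib.metadata` is available.

```python
from importlib import metadata

# Minimum versions taken from the requirements list above.
MINIMUM_VERSIONS = {"Scrapy": (0, 14), "pymongo": (2, 0)}


def parse_version(text):
    """Turn a version string such as '2.11.0rc1' into a tuple of ints.

    Only the leading digits of each dot-separated piece are kept;
    parsing stops at the first piece that has no leading digits.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a list of human-readable problems (empty if all is fine)."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed_text = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        installed = parse_version(installed_text)
        # Pad short versions with zeros so that '2' compares equal to (2, 0).
        installed += (0,) * max(0, len(minimum) - len(installed))
        if installed < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {installed_text} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

If the script prints nothing, both packages satisfy the minimum versions listed above. Missing packages can usually be installed with pip, for example `pip install "Scrapy>=0.14" "pymongo>=2.0"`.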