TBFY/harvester

Basic Overview

Download articles and legal documents from public procurement sources:

  • Tender and contract data of European government bodies from OpenOpps, via its API or an Amazon S3 bucket (credentials are required).
  • Legislative texts via the JRC-Acquis dataset.
  • Public procurement notices via the TED dataset.

The harvested documents are then indexed into Solr, where they can be searched with complex queries and the results visualized through Banana.

Quick Start

  1. Install Docker and Docker Compose

  2. Clone this repo

    git clone https://github.com/TBFY/harvester.git
    
  3. Move into the src/test/docker directory.

  4. Start Solr and Banana with: docker-compose up -d

  5. Monitor the startup progress with: docker-compose logs -f

  6. A Solr Admin site should be available at: http://localhost:8983/solr

  7. Rename the configuration file src/test/resources/credentials.properties.sample to src/test/resources/credentials.properties. If you have credentials, update its content.

  8. Download and extract TED articles from ftp://guest:guest@ted.europa.eu/daily-packages/ and save them in: input/ted

  9. Move back to the base directory and run the harvester with: ./test TEDHarvester

  10. A dashboard with results should be available at: http://localhost:8983/solr/banana

Take a look at all our harvesters here: src/test/java/harvest/.
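
Once a harvester has finished, the indexed documents can also be queried directly over HTTP, without going through Banana. The following is only a minimal sketch of how a request URL for Solr's standard /select handler could be built from Java. The base URL comes from the Quick Start above; the core name `documents` and the field name `text` are assumptions made for illustration and may not match the schema this project actually creates, so check the Solr Admin site for the real names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrQuerySketch {

    // Base URL of the local Solr instance started in the Quick Start.
    static final String SOLR_BASE = "http://localhost:8983/solr";

    /**
     * Builds a URI for Solr's /select handler.
     *
     * @param core  name of the Solr core (hypothetical here; check the Solr Admin site)
     * @param query query string in Solr syntax, e.g. "field:value"
     * @param rows  maximum number of documents to return
     */
    static URI selectUri(String core, String query, int rows) {
        // URL-encode the query so that ':' and spaces are transmitted safely.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(SOLR_BASE + "/" + core + "/select?q=" + q
                + "&rows=" + rows + "&wt=json");
    }

    public static void main(String[] args) {
        // "documents" and "text" are assumed names, not taken from this repository.
        System.out.println(selectUri("documents", "text:procurement", 10));
    }
}
```

The printed URL can be opened in a browser or fetched with any HTTP client; `wt=json` asks Solr to return the response as JSON.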

Latest Stable Release

Step 1. Add the JitPack repository to your build file

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

Step 2. Add the dependency

    <dependency>
        <groupId>com.github.TBFY</groupId>
        <artifactId>harvester</artifactId>
        <version>last-stable-release-version</version>
    </dependency>

Contributing

Please take a look at our contributing guidelines if you're interested in helping!