GitHub - TBFY/harvester: Download articles and legal documents on public procurement.

Basic Overview

Download articles and legal documents from public procurement sources:

Tender and contract data of European government bodies from OpenOpps, via its API or an Amazon S3 bucket (credentials are required).
Legislative texts from the JRC-Acquis dataset.
Public procurement notices from the TED dataset.

The harvested documents are indexed into Solr so that you can run complex queries and visualize the results through Banana.
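As a rough illustration of what querying the index can look like, the sketch below builds a URL for Solr's standard select handler. The core name `documents` and the field name `text` are assumptions made for this example only; check the Solr Admin site for the names actually created by the harvester.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Minimal sketch: builds a URL for Solr's standard /select request handler.
// The core name and field name used in main() are hypothetical placeholders.
final class SolrSelectUrl {

    // Returns <baseUrl>/<core>/select?q=<url-encoded query>&rows=<rows>&wt=json
    static String build(String baseUrl, String core, String query, int rows) {
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + encodedQuery
                + "&rows=" + rows + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical core "documents" and field "text"
        System.out.println(
                build("http://localhost:8983/solr", "documents", "text:procurement", 5));
    }
}
```

The printed URL can be opened in a browser or passed to any HTTP client once Solr is running.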

Quick Start

Install Docker and Docker-Compose

Clone this repo

git clone https://github.com/TBFY/harvester.git

Move into the src/test/docker directory.
Start Solr and Banana with: docker-compose up -d
You can monitor the startup progress with: docker-compose logs -f
Once the containers are up, the Solr Admin site should be available at: http://localhost:8983/solr
Rename the configuration file src/test/resources/credentials.properties.sample to src/test/resources/credentials.properties (if you have credentials, update its content).
Download and extract TED articles from ftp://guest:guest@ted.europa.eu/daily-packages/ and save them in: input/ted
Move back to the base directory and run the harvester with: ./test TEDHarvester
A dashboard with results should be available at: http://localhost:8983/solr/banana
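Before running a harvester, it can help to confirm that the renamed credentials file is readable. The following is a minimal sketch using `java.util.Properties`. The class name and the `hasValue` helper are illustrative, not part of the harvester, and no real property keys are assumed because the sample file's keys are not listed here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Minimal sketch: loads a .properties file and checks whether a key has a value.
// This class is illustrative only and is not part of the harvester code base.
final class CredentialsCheck {

    // Parses properties from any Reader; I/O failures are rethrown unchecked.
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True when the key exists and its value is not blank.
    static boolean hasValue(Properties props, String key) {
        String value = props.getProperty(key);
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the Quick Start step above
        Path path = Path.of("src/test/resources/credentials.properties");
        try (Reader reader = Files.newBufferedReader(path)) {
            Properties props = load(reader);
            System.out.println("Loaded " + props.size() + " properties from " + path);
        }
    }
}
```

Run it from the base directory so that the relative path resolves.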

Take a look at all our harvesters here: src/test/java/harvest/.

Latest Stable Release

Step 1. Add the JitPack repository to your build file

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

Step 2. Add the dependency

    <dependency>
        <groupId>com.github.TBFY</groupId>
        <artifactId>harvester</artifactId>
        <version>last-stable-release-version</version>
    </dependency>

Contributing

Please take a look at our contributing guidelines if you're interested in helping!
