This is a framework to help you write an Agent within the Event Data system. You're free to write any agent any way you want, but this framework takes advantage of the Crossref Event Data services, such as Artifacts and the Percolator system. An Agent written with this framework doesn't actually do much heavy-lifting at all: it simply collects data from somewhere, packages it up into an Input Bundle, and passes it to the Percolator. The Percolator does the rest: extracting DOIs from various inputs, fetching webpages, verifying DOIs, submitting Evidence Records and Agents, etc.
If you want to write an Agent, you should familiarise yourself with the Percolator's Input Bundle format.
This Framework allows an Agent to describe itself in terms of 'schedule' and 'runner' activities. Schedule activities are run on a schedule, runners are run on agent start-up, and restarted if they stop. The Agent supplies a list of required Artifacts, and these are retrieved by the Framework and passed to the schedule or runner function.
This allows the implementation of several patterns:
Reddit Agent: Sheduled scan, getting a copy of the domain list Artifact each time, then iterating over every domain, making a query for each.
Newsfeed Agent: Scheduled scan, getting a copy of the RSS Feed Artifact each time, iterating over every newsfeed.
Twitter: Connect to the Twitter streaming API, transform each input into an Action in an input bundle, send that on.
Wikipedia: Connect to the Wikipedia Event Stream, transform ecah input into an Action in an input bundle, send that on.
Create a definition
hashmap and pass it to the org.crossref.event-data-agent-framework.core/run
function. Definition should be an object with the following fields:
:agent-name
:schedule
: A seq of{:fun :seconds :name :required-artifacts}
hashmaps. The function will be run on the schedule and reported via status logging. The function takes a channel onto which complete Input Packages should be sent.:runners
: A seq of{:fun :name :required-artifacts}
hashmaps. Each function will be run in a thread and the Health Service will be notified of a heartbeat as long as it keeps running. The function takes a channel onto which complete Input Packages should be sent.
The Agent automatically reports to the Health Service. Given the configuraion of «agent-name», it logs:
«agent-name»/service/running
- logs all the time that the service is running, once per minute.«agent-name»/artifact/«artifact-name»
- logs 1 every time an artifact of the given name is fetched.«agent-name»/«scheduled-f-name»/start
- logs 1 every time a scheduled function is run.«agent-name»/«scheduled-f-name»/stop
- logs 1 every time a scheduled function stops running.«agent-name»/«runner-f-name»/heartbeat
- logs once a minute for the duration that a run function is run until it dies (which it never should).
The following environment variables should be set:
KAFKA_BOOTSTRAP_SERVERS
- list of Apache Kafka endpoints, per Kafka Producer docs. e.g.localhost:9092,otherhost:9092
JWT_TOKEN
- JWT authorized to deposit events with the source name that this agent produces.
An Agent should start itself by calling org.crossref.event-data-agent-framework.core/run
with its agent definition.
## External Services
An Agent sends Evidence Records and status updates to an Apache Kafka instance. A Kafka endpoint is required.
To use use a local repository when developing new functionality against agents:
lein clean && lein test && lein uberjar && rm -rf ~/.m2/repository/org.crossref & lein localrepo install target/uberjar/org.crossref.event-data-agent-framework-0.1.18-SNAPSHOT-standalone.jar org.crossref.event-data-agent-framework "0.1.18-SNAPSHOT2"