Part of the DataCite Analytics Service for recording DOI usage

datacite/keeshond

Keeshond - DataCite Usage Analytics

This is part of the DataCite Usage Analytics service.

Event Tracking

This service exposes a public API that is the main endpoint for the DataCite Tracker. Events are stored in a Clickhouse database, from which statistics can then be calculated according to COUNTER.

Setup

  1. Install the DataCite Tracker tracking script.
  2. Configure it with the appropriate details.
  3. Results should be sent to the /api/metric endpoint.
  4. Use the check API endpoint /api/check/{repo_id} to see whether results are being recorded. If successful, it returns 200 and the timestamp of the last event.
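Step 4 can be scripted. The following is a minimal sketch, not part of this repository: the base URL http://localhost:8081 is an assumption taken from the Docker port mapping further down, and the helper names checkURL and lastEvent are hypothetical. The exact format of the response body is not documented here, so the sketch returns it as plain text.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// checkURL builds the /api/check/{repo_id} URL for a given server base URL.
// The base URL is an assumption; use wherever the web server is deployed.
func checkURL(baseURL, repoID string) (string, error) {
	if repoID == "" {
		return "", fmt.Errorf("repo ID must not be empty")
	}
	return strings.TrimRight(baseURL, "/") + "/api/check/" + url.PathEscape(repoID), nil
}

// lastEvent calls the check endpoint and returns the raw response body
// (expected to carry the timestamp of the last event) when the status is 200.
func lastEvent(client *http.Client, endpoint string) (string, error) {
	resp, err := client.Get(endpoint)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("check failed: status %d", resp.StatusCode)
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	endpoint, err := checkURL("http://localhost:8081", "datacite.demo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With a running server: ts, err := lastEvent(http.DefaultClient, endpoint)
	fmt.Println("check endpoint:", endpoint)
}
```

A non-200 status from lastEvent suggests that no events have been recorded yet for that repository ID, or that the ID is wrong.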

COUNTER Usage Report Generation

Usage reports in SUSHI JSON format can be generated from the data stored in Clickhouse and the statistics derived from it. These reports can then be sent to the DataCite Reports API for storage and processing into DataCite Event Data.

Development

Requirements:

  • Go 1.19

General config

Configuration is taken from the environment:

  • ANALYTICS_DATABASE_HOST - Clickhouse database URL
  • ANALYTICS_DATABASE_USER - Clickhouse user
  • ANALYTICS_DATABASE_PASSWORD - Clickhouse password
  • ANALYTICS_DATABASE_DBNAME - Clickhouse database name
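As an illustration of how these four variables might be read and checked before connecting, here is a minimal sketch. It is not the application's actual loading code: the DBConfig type and loadDBConfig function are hypothetical, and treating all four variables as required is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// DBConfig holds the Clickhouse connection settings (hypothetical type).
type DBConfig struct {
	Host     string
	User     string
	Password string
	DBName   string
}

// loadDBConfig reads the ANALYTICS_DATABASE_* variables through lookup,
// which has the same signature as os.LookupEnv so it can be swapped in tests.
// Assumption: all four variables must be set and non-empty.
func loadDBConfig(lookup func(string) (string, bool)) (DBConfig, error) {
	var missing []string
	get := func(key string) string {
		v, ok := lookup(key)
		if !ok || v == "" {
			missing = append(missing, key)
		}
		return v
	}

	cfg := DBConfig{
		Host:     get("ANALYTICS_DATABASE_HOST"),
		User:     get("ANALYTICS_DATABASE_USER"),
		Password: get("ANALYTICS_DATABASE_PASSWORD"),
		DBName:   get("ANALYTICS_DATABASE_DBNAME"),
	}
	if len(missing) > 0 {
		return DBConfig{}, fmt.Errorf("missing environment variables: %v", missing)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadDBConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// The password is deliberately not printed.
	fmt.Printf("using database %s on %s as user %s\n", cfg.DBName, cfg.Host, cfg.User)
}
```

Reporting every missing variable at once, rather than failing on the first, makes a misconfigured environment quicker to fix.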

Event Tracking Web Server

Web tracking Config

  • VALIDATE_DOI_EXISTENCE - Enables or disables DOI existence validation for event tracking. Defaults to true.
  • VALIDATE_DOI_URL - Enables or disables DOI URL validation for event tracking. Defaults to false.
  • DATACITE_API_URL - Used only when storing events, as part of DOI validation.
  • JWT_PUBLIC_KEY - Used on authenticated endpoints to validate DataCite JWTs.
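The two validation flags are booleans with different defaults. A minimal sketch of reading such flags is shown below; the boolEnv helper is hypothetical, and parsing with strconv.ParseBool (which accepts values such as true, false, 1 and 0) is an assumption about accepted values, not a description of how this application parses them.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// boolEnv returns the boolean value of the variable named key, or def when
// the variable is unset or empty. lookup has the signature of os.LookupEnv.
func boolEnv(lookup func(string) (string, bool), key string, def bool) (bool, error) {
	raw, ok := lookup(key)
	if !ok || raw == "" {
		return def, nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def, fmt.Errorf("%s: %w", key, err)
	}
	return v, nil
}

func main() {
	// Defaults follow the list above: existence validation on, URL validation off.
	existence, err := boolEnv(os.LookupEnv, "VALIDATE_DOI_EXISTENCE", true)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	doiURL, err := boolEnv(os.LookupEnv, "VALIDATE_DOI_URL", false)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("validate DOI existence: %t, validate DOI URL: %t\n", existence, doiURL)
}
```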

Running locally

# Start the http server
go run cmd/web/main.go

Docker

# Build the Docker image
docker build -f ./docker/web/Dockerfile -t keeshondweb .
# and you can run the image with the following command
docker run -p 8081:8081 --rm -ti keeshondweb

Usage Report Generation - Worker

This is triggered via a worker script. Note that the worker automatically submits the usage report to the Usage Reports API.

Report specific config

The variables needed for report generation are taken from environment variables:

  • REPO_ID - The unique tracking ID for a repository. It determines which stats are collected and is assigned by DataCite.
  • BEGIN_DATE - The reporting period start date. Typically this is the start of a month.
  • END_DATE - The reporting period end date. Typically this is the end of a month.
  • PLATFORM - The name or identifier of the platform that the usage is from.
  • PUBLISHER - The name of the publisher of the dataset.
  • PUBLISHER_ID - The identifier of the publisher of the dataset.

In addition, a valid DataCite JWT must be supplied for authentication and submission to the Usage Reports API.

  • DATACITE_JWT - Valid JWT with correct permissions. This is assigned by DataCite.
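Because the worker submits the report automatically, it can be worth checking the reporting period before triggering a run. The sketch below is illustrative only: the ReportPeriod type and parsePeriod function are hypothetical, and the YYYY-MM-DD date format is an assumption based on the example values used in this README.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// dateLayout is Go's reference layout for YYYY-MM-DD dates (assumed format).
const dateLayout = "2006-01-02"

// ReportPeriod is a validated reporting period (hypothetical type).
type ReportPeriod struct {
	Begin time.Time
	End   time.Time
}

// parsePeriod parses BEGIN_DATE and END_DATE values and rejects periods
// whose end date falls before the begin date.
func parsePeriod(begin, end string) (ReportPeriod, error) {
	b, err := time.Parse(dateLayout, begin)
	if err != nil {
		return ReportPeriod{}, fmt.Errorf("BEGIN_DATE: %w", err)
	}
	e, err := time.Parse(dateLayout, end)
	if err != nil {
		return ReportPeriod{}, fmt.Errorf("END_DATE: %w", err)
	}
	if e.Before(b) {
		return ReportPeriod{}, fmt.Errorf("END_DATE %s is before BEGIN_DATE %s", end, begin)
	}
	return ReportPeriod{Begin: b, End: e}, nil
}

func main() {
	period, err := parsePeriod(os.Getenv("BEGIN_DATE"), os.Getenv("END_DATE"))
	if err != nil {
		fmt.Println("invalid reporting period:", err)
		return
	}
	fmt.Printf("reporting period: %s to %s\n",
		period.Begin.Format(dateLayout), period.End.Format(dateLayout))
}
```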

Running Locally

A report can be triggered using the worker version of the application.

For example (this assumes the general config, i.e. the Clickhouse database connection, has been set up):

REPO_ID=datacite.demo BEGIN_DATE=2022-01-01 END_DATE=2022-12-31 PLATFORM=datacite PUBLISHER="datacite demo" PUBLISHER_ID=datacite.demo go run cmd/worker/main.go

Running via docker container

Note: Assumes the general config, i.e. the Clickhouse database connection, has been set up.
# Build worker image
docker build -f ./docker/worker/Dockerfile -t keeshondworker .

# Run docker with env vars
docker run --network="host" --env REPO_ID=datacite.demo --env BEGIN_DATE=2022-01-01 --env END_DATE=2022-12-31 --env PLATFORM=datacite --env PUBLISHER="datacite demo" --env PUBLISHER_ID=datacite.demo keeshondworker
# Connect to the local docker Clickhouse database container
clickhouse client --user=keeshond --password=keeshond