Recommender Metrics Framework

A framework for generating statistics, metrics, KPIs, and graphs for Recommender Systems

Preprocessor

RS metrics

Dependencies

Install Conda. Tested with conda v4.10.3.
Run from terminal: conda env create -f environment.yml
Run from terminal: conda activate rsmetrics
Run from terminal: chmod +x ./preprocessor.py ./preprocessor_common.py ./rsmetrics.py

Usage

Usage of the Batch System

Configure ./preprocessor_common.py, ./preprocessor.py and ./rsmetrics.py by editing config.yaml, or provide another configuration file with -c.
Run from terminal: ./preprocessor_common.py in order to gather users and resources and store them in the Datastore:

./preprocessor_common.py # ingests users and resources from scratch, retrieving the data from the 'marketplace_rs' provider (the default specified in the config file)
./preprocessor_common.py -p marketplace_rs # equivalent to the first one
./preprocessor_common.py -p marketplace_rs --use-cache # equivalent to the first one, but reads resources from the cache file instead of downloading them from the EOSC Marketplace
./preprocessor_common.py -p athena # currently not working, since the users collection only exists in 'marketplace_rs'

Run from terminal: ./preprocessor.py -p <provider> in order to gather user_actions and recommendations from the particular provider and store them in the Datastore:

./preprocessor.py # ingests user_actions and recommendations from scratch, retrieving the data from the 'marketplace_rs' provider (the default specified in the config file)
./preprocessor.py -p marketplace_rs # equivalent to the first one
./preprocessor.py -p athena # same procedure as the first one, but for the 'athena' provider

Run from terminal: ./rsmetrics.py -p <provider> in order to gather that provider's data (users, resources, user_actions and recommendations), calculate statistics and metrics, and store them in the Datastore:

./rsmetrics.py # calculates and stores statistics and metrics for the data (users, resources, user_actions and recommendations) of the default provider, 'marketplace_rs'
./rsmetrics.py -p marketplace_rs # equivalent to the first one
./rsmetrics.py -p athena # same procedure as the first one, but for the 'athena' provider
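
The three batch steps above run in a fixed order: common data first, then provider data, then metrics. As a minimal sketch, assuming the scripts are executable in the current directory and take only the -p flag shown above, a small Python helper can assemble that sequence for one provider. The names batch_commands, run_batch and common_provider are hypothetical and not part of the framework; the default for common_provider reflects the note that users and resources are currently only available from 'marketplace_rs'.

```python
import subprocess


def batch_commands(provider, common_provider="marketplace_rs"):
    """Return the batch pipeline commands for one provider, in run order."""
    return [
        # Users and resources (currently only served by 'marketplace_rs').
        ["./preprocessor_common.py", "-p", common_provider],
        # user_actions and recommendations for the chosen provider.
        ["./preprocessor.py", "-p", provider],
        # Statistics and metrics for the chosen provider.
        ["./rsmetrics.py", "-p", provider],
    ]


def run_batch(provider):
    """Run each step in order and stop at the first failing step."""
    for cmd in batch_commands(provider):
        subprocess.run(cmd, check=True)


# Show what would be executed for the 'athena' provider without running it.
for cmd in batch_commands("athena"):
    print(" ".join(cmd))
```

Calling run_batch("athena") would execute the three commands; check=True makes a failing step raise instead of silently continuing to the next one.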

A typical rsmetrics.py command for a monthly report would be:

./rsmetrics.py -p provider -s $(date +"%Y-%m-01") -e $(date +"%Y-%m-%d") -t "$(date +"%B %Y")"
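
For schedulers where shell substitution is not available, the same arguments can be computed in Python. This is a minimal sketch, assuming -s, -e and -t carry the start date, end date and title exactly as in the command above; the function name monthly_report_args is hypothetical.

```python
import datetime


def monthly_report_args(provider, today):
    """Build rsmetrics.py arguments covering the 1st of the month up to `today`."""
    start = today.replace(day=1)
    return [
        "./rsmetrics.py",
        "-p", provider,
        "-s", start.strftime("%Y-%m-%d"),
        "-e", today.strftime("%Y-%m-%d"),
        # Month name and year; the name follows the process locale.
        "-t", today.strftime("%B %Y"),
    ]


print(monthly_report_args("marketplace_rs", datetime.date.today()))
```

The returned list can be passed directly to subprocess.run, which avoids quoting issues with the title containing a space.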

Usage of the Streaming System

Run from terminal: ./rs-stream.py in order to listen to the stream for new data for a particular provider, process it, and store it in the Datastore:

./rs-stream.py -a username:password -q host:port -t user_actions -d "mongodb://localhost:27017/datastore" -p provider_name

Reporting

The reporting script generates an evaluation report in HTML format. The report is served automatically from a spawned local server (default: localhost:8080), and the default browser is opened automatically to present it.

To execute the script issue:

chmod u+x ./report.py
./report.py

The script will automatically look for evaluation result files in the default folder ./data and will output the report in the default folder ./report.

Additional script usage with parameters

The report.py script accepts an --input parameter: the path to the folder where the results of the evaluation process have been generated (default folder: ./data). It also accepts an --output parameter: the path to the output folder from which the generated report will be served automatically.

Note: the script copies all the necessary files to the output folder, such as pre_metrics.json and metrics.json, as well as report.html.prototype renamed to index.html.

usage: report.py [-h] [-i STRING] [-o STRING] [-a STRING] [-p STRING]

Generate report

optional arguments:
  -h, --help            show this help message and exit
  -i STRING, --input STRING
                        Input folder
  -o STRING, --output STRING
                        Output report folder
  -a STRING, --address STRING
                        Address to bind and serve the report
  -p STRING, --port STRING
                        Port to bind and serve the report
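
The help text above maps naturally onto an argparse definition. The following is a minimal sketch of a parser that accepts the same options, not the script's actual source; the defaults are taken from the values stated in this section (./data, ./report, localhost, 8080), and the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="report.py", description="Generate report")
    parser.add_argument("-i", "--input", metavar="STRING", default="./data",
                        help="Input folder")
    parser.add_argument("-o", "--output", metavar="STRING", default="./report",
                        help="Output report folder")
    parser.add_argument("-a", "--address", metavar="STRING", default="localhost",
                        help="Address to bind and serve the report")
    # The usage text lists the port as STRING, so it is kept as a string here.
    parser.add_argument("-p", "--port", metavar="STRING", default="8080",
                        help="Port to bind and serve the report")
    return parser


# Override only the output folder and the port; other options keep defaults.
args = build_parser().parse_args(["-o", "./my-report", "-p", "9090"])
print(args.input, args.output, args.address, args.port)
```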

Utilities

Get item catalog script (./get_catalog.py)

This script contacts the EOSC Marketplace remote service API and generates a CSV file listing all available items of a specific catalog (e.g. services, datasets, trainings, publications, data_sources), with their name, id and url.

To execute the script issue:

chmod u+x ./get_catalog.py
./get_catalog.py -u https://remote.example.foo -c service -b 100 -l 2000 -o my-catalog.csv

Arguments:

-u or --url: the endpoint url of the marketplace search service
-o or --output: this is the output csv file (e.g. ./service_catalog.csv or ./training_catalog.csv) - optional
-b or --batch: the search service returns paginated results, so this sets the batch size for each retrieval (number of items per request) - optional
-l or --limit: the maximum number of items to retrieve (handy for large catalogs when only a subset is needed) - optional
-c or --category: the category of list of items you want to retrieve
-d or --datastore: mongodb destination database uri to store the results into (e.g. mongodb://localhost:27017/rsmetrics) - optional
-p or --providers: a comma-separated list stating which providers (engines) handle the items of the specific category

Category types currently supported for the marketplace:
service
training
dataset (this is for items of the DATA catalog)
data_source (this is for items of the DATASOURCES catalog)
publication
guideline (this is for items of the INTEROPERABILITY GUIDELINES catalog)
software
bundle
other
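
The interplay of --batch and --limit can be illustrated with a generic pagination loop. This is a minimal sketch of the technique, not the code in get_catalog.py: fetch_page is a hypothetical callable standing in for one request to the search service, and the 250 items below are made-up placeholder data.

```python
def fetch_catalog(fetch_page, batch=100, limit=None):
    """Collect items page by page until the source is exhausted or `limit` is hit.

    `fetch_page(offset, size)` is a hypothetical callable that returns a list
    of at most `size` items starting at position `offset`.
    """
    items = []
    offset = 0
    while limit is None or len(items) < limit:
        # Never request more than what is still allowed by the limit.
        size = batch if limit is None else min(batch, limit - len(items))
        page = fetch_page(offset, size)
        items.extend(page)
        offset += len(page)
        if len(page) < size:  # a short page means nothing is left
            break
    return items


# Stand-in for the remote search service: 250 fake items.
catalog = [{"id": i, "name": f"item-{i}"} for i in range(250)]


def fake_fetch(offset, size):
    return catalog[offset:offset + size]


print(len(fetch_catalog(fake_fetch, batch=100)))             # 250
print(len(fetch_catalog(fake_fetch, batch=100, limit=120)))  # 120
```

With a limit of 120 and a batch of 100, the second request asks for only 20 items, so the limit is respected without discarding downloaded results.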

Serve Evaluation Reports as a Service

The webservice folder hosts a simple web service, implemented with the Flask framework, which can be used to host the report results.

Note: Please make sure you work in a virtual environment and you have already downloaded the required dependencies by issuing pip install -r requirements.txt

The webservice application serves two endpoints:

/ : This is the frontend webpage that displays the Report Results in a UI
/api : This api call returns the evaluation metrics in json format

To run the webservice issue:

cd ./webservice
flask run

By default the webservice runs on localhost:5000. You can override this by issuing, for example:

flask run -h 127.0.0.1 -p 8080

The environment variable RS_EVAL_METRIC_SOURCE directs the webservice to the metrics.json file produced by the evaluation process. By default it honors this repo's folder structure and points to the /data/metrics.json path under the repository root.

You can override this by editing the .env file inside the /webservice folder, or by setting the RS_EVAL_METRIC_SOURCE variable accordingly before executing the flask run command.
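
The lookup order described above (environment variable first, repository default otherwise) can be sketched in a few lines. This is an illustration under assumptions, not the webservice's own code: the function name load_metrics and the relative default path ./data/metrics.json are hypothetical, and the real application may load the .env file through a different mechanism.

```python
import json
import os


def load_metrics(default_path="./data/metrics.json"):
    """Load the metrics file named by RS_EVAL_METRIC_SOURCE, else the default."""
    # The environment variable wins; the default is only a fallback.
    source = os.environ.get("RS_EVAL_METRIC_SOURCE", default_path)
    with open(source, encoding="utf-8") as handle:
        return json.load(handle)
```

Setting RS_EVAL_METRIC_SOURCE to another path before calling load_metrics() makes it read that file instead of the default.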

Tested with python 3.9

Monitor for entries in the MongoDB collections

A typical example that counts the documents found in user_actions, recommendations, and resources for 1 day ago would be:

./monitor.py -d "mongodb://localhost:27017/rsmetrics" -s "$(date -u -d '1 day ago' '+%Y-%m-%d')" -e "$(date -u '+%Y-%m-%d')"

To send the results of the above example by e-mail over SMTP:

./monitor.py -d "mongodb://localhost:27017/rsmetrics" -s "$(date -u -d '1 day ago' '+%Y-%m-%d')" -e "$(date -u '+%Y-%m-%d')" --email "smtp://server:port" sender@domain recipient1@domain recipient2@domain

Export Capacity information for entries in the MongoDB collections

A typical example that counts the documents found in user_actions, recommendations, and resources since 1 year ago would be:

./monitor.py -d "mongodb://localhost:27017/rsmetrics" -s "$(date -u -d '1 year ago' '+%Y-%m-%d')" -e "$(date -u '+%Y-%m-%d')" --capacity

This returns the results in CSV format, with the columns year,month,user_actions,recommendations.

Additionally, capacity can be plotted:

./monitor.py -d "mongodb://localhost:27017/rsmetrics" -s "$(date -u -d '1 year ago' '+%Y-%m-%d')" -e "$(date -u '+%Y-%m-%d')" --capacity --plot
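
The CSV produced by --capacity can be post-processed with the standard library. This is a minimal sketch, assuming one row per month with the four columns year,month,user_actions,recommendations and an optional header line; the function name parse_capacity is hypothetical and the sample rows are made-up placeholders, not real output.

```python
import csv
import io

FIELDS = ("year", "month", "user_actions", "recommendations")


def parse_capacity(text):
    """Parse capacity CSV text into a list of dicts with integer values."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        # Skip blank lines and an optional header row.
        if not record or not record[0].strip().isdigit():
            continue
        rows.append(dict(zip(FIELDS, (int(value) for value in record))))
    return rows


# Made-up sample in the assumed format.
sample = "year,month,user_actions,recommendations\n2023,1,120,45\n2023,2,98,51\n"
rows = parse_capacity(sample)
print(sum(row["user_actions"] for row in rows))  # 218
```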

Deployment docs

Installation and configuration documents can be found here.
