This repository contains the DataSeer web application, which guides the authors of scientific articles and manuscripts toward best practices in research data sharing, i.e. ensuring that the datasets accompanying an article are associated with data availability statements, permanent identifiers and, more generally, the requirements of Open Science and reproducibility.

Machine learning techniques are used to extract and structure the information in the scientific article, to identify the contexts introducing datasets, and finally to classify these contexts into predicted data types and subtypes. The web application uses these ML predictions to help authors describe, in an efficient and assisted manner, the datasets used in the article and how these data are shared with the scientific community.

See the dataseer-ml repository for the machine learning services used by DataSeer web.

Supported article formats are PDF, docx, TEI, JATS/NLM, ScholarOne, and a large variety of additional publisher native XML formats: BMJ, Elsevier staging format, OUP, PNAS, RSC, Sage, Wiley, etc. (see Pub2TEI for the list of native publisher XML formats covered).

Contacts and licences

Main authors and contact: Nicolas Kieffer, Patrice Lopez (

The development of dataseer-ml is supported by a Sloan Foundation grant, see here.

dataseer-web is distributed under the Apache 2.0 license.


This application is composed of:

  • a REST API to interact with your data stored in MongoDB (localhost:3000/api)
  • a default front-end app that consumes the REST API

Documents, Organizations and Accounts data are stored in MongoDB. Files (PDF, XML and TEI) uploaded to dataseer-web are stored on the server file system.



npm i
# Node.js v16.0


npm run             # Display the list of available scripts
npm start           # Start headless process with forever (production)
npm run start-dev   # Start process (development)
npm stop            # Stop headless process


The application requires:

  • an instance of MongoDB (by default: running on port 27017 with an app database)
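
As a rough illustration, the defaults above correspond to a standard MongoDB connection string. The sketch below only assembles that string; the function name, its option names and the `localhost` host are assumptions made for the example, and only the port (27017) and the database name (app) come from this README.

```javascript
// Sketch: build a MongoDB connection string from the defaults named above.
// Assumptions: buildMongoUri, its option names and the 'localhost' host are
// hypothetical; only port 27017 and the "app" database come from the README.
function buildMongoUri({ host = 'localhost', port = 27017, db = 'app' } = {}) {
  return `mongodb://${host}:${port}/${db}`;
}

// Default instance described in this section.
console.log(buildMongoUri());

// A non-default instance (hypothetical host and port).
console.log(buildMongoUri({ host: 'db.internal', port: 27018 }));
```

The resulting string can be compared with the MongoDB settings in your own configuration when the application fails to connect.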


Web Application Configuration

You must create the following configuration files (based on the *.default files) and fill them in with your own data:

  • conf/conf.json : global app configuration
  • conf/crisp.json : crisp configuration
  • conf/recaptcha.json : recaptcha configuration
  • conf/smtp.json : smtp configuration
  • conf/userflow.json : userflow configuration
  • conf/services/dataseer-ml.json : dataseer-ml configuration
  • conf/services/dataseer-wiki.json : dataseer-wiki configuration
  • conf/services/repoRecommender.json : repoRecommender configuration
  • conf/services/softcite.json : softcite configuration
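
Creating these files by hand is repetitive, so a small helper script may be convenient. The sketch below walks a directory and copies each default file to its final name when that file does not exist yet. It assumes the default files are named by appending `.default` to the target name (for example `conf.json.default`); check the actual file names in the conf directory before relying on it. The function name `initConfigFiles` is hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Recursively copy every "<name>.default" file to "<name>" unless "<name>"
// already exists. Assumption: the ".default" suffix naming convention.
// Returns the list of files that were created.
function initConfigFiles(dir) {
  const created = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Descend into sub-directories such as conf/services.
      created.push(...initConfigFiles(fullPath));
    } else if (entry.name.endsWith('.default')) {
      const target = fullPath.slice(0, -'.default'.length);
      if (!fs.existsSync(target)) {
        fs.copyFileSync(fullPath, target);
        created.push(target);
      }
    }
  }
  return created;
}

// Usage, from the repository root:
// console.log(initConfigFiles('conf'));
```

Existing files are never overwritten, so the script can be re-run safely after new default files are added. The copied files still need to be filled in with your own values.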

JWT Configuration

This application requires a private key to create JSON Web Tokens (JWT). You must create the file conf/private.key and fill it with a random string (a long random string is strongly recommended).
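
One possible way to produce such a string is Node's built-in `crypto` module. The sketch below is not part of dataseer-web: the function names and the 64-byte length are arbitrary choices made for the example, and it assumes the application accepts any random string as the content of conf/private.key.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Return a random hex string. 64 random bytes give 128 hex characters;
// the length is an arbitrary choice, not a project requirement.
function generatePrivateKey(bytes = 64) {
  return crypto.randomBytes(bytes).toString('hex');
}

// Write a fresh key to filePath. The 'wx' flag makes the call fail if the
// file already exists, so an existing key is never silently replaced.
function writePrivateKey(filePath = 'conf/private.key') {
  fs.writeFileSync(filePath, generatePrivateKey(), { flag: 'wx', mode: 0o600 });
}

// Usage, from the repository root:
// writePrivateKey();
```

Replacing the key later would presumably invalidate tokens signed with the old one, which is why the sketch refuses to overwrite an existing file.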


All mail-related files are in the conf/mails directory.

Data Access

Your role defines which data you can access.

  • An unauthenticated user can only access a public URL.
  • A Standard User can only access their own data: document(s), organization(s) and account.
  • An Annotator (also called Moderator in source code) can access all the data of their organizations: document(s), organization(s) and account(s).
  • A Curator (also called Administrator in source code) can access all the data of all organizations: document(s), organization(s) and account(s).
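
The rules above can be summarized as a single access check. The sketch below is an illustration only and does not reproduce the dataseer-web source code: the role identifiers, the `canAccess` function and the object shapes (`user.id`, `user.organizations`, `resource.owner`, `resource.organizations`, `resource.isPublic`) are all hypothetical.

```javascript
// Hypothetical role identifiers mirroring the roles listed above.
const ROLES = {
  STANDARD_USER: 'standardUser',
  ANNOTATOR: 'annotator', // "Moderator" in the source code
  CURATOR: 'curator',     // "Administrator" in the source code
};

// Decide whether `user` may access `resource`.
// `user` is null or undefined for an unauthenticated visitor.
function canAccess(user, resource) {
  if (resource.isPublic) return true;              // public URLs are open to everyone
  if (!user) return false;                         // unauthenticated: public only
  if (user.role === ROLES.CURATOR) return true;    // all data of all organizations
  if (user.role === ROLES.ANNOTATOR) {
    // All data belonging to at least one of the annotator's organizations.
    return resource.organizations.some((org) => user.organizations.includes(org));
  }
  return resource.owner === user.id;               // standard user: own data only
}

// Example: an annotator reading a document owned by someone else
// in the same organization.
const annotator = { id: 'u2', role: ROLES.ANNOTATOR, organizations: ['orgA'] };
const doc = { owner: 'u1', organizations: ['orgA'], isPublic: false };
console.log(canAccess(annotator, doc));
```

Ordering the checks from the least to the most restrictive role keeps each rule to one line and makes the default (own data only) the fallback.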