
bento-platform/bento_demo_dataset


Bento Demo Dataset

Partially synthetic demo dataset for the Bento platform. Requires Python 3.10+.

Based partly on data from:

Requirements:

Optionally create a virtual environment, e.g.:

virtualenv -p python3 ./env
source env/bin/activate

To install dependencies, run:

pip install -r requirements.txt

Usage:

To run:

python generate_dataset.py

This will write phenopackets to synthetic_phenopackets.json and experiments to synthetic_experiments.json.
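To take a quick look at the generated files, a minimal sketch like the one below can help. It assumes each output file holds a top-level JSON array of objects carrying an "id" key. That structure is an assumption, not documented behaviour of generate_dataset.py, and the helper names `summarize_records` and `summarize_file` are hypothetical.

```python
import json
from pathlib import Path


def summarize_records(records):
    """Count records and collect the first few "id" values.

    Assumes (hypothetically) a list of dicts with an "id" key; check this
    against your own output before relying on it.
    """
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    ids = [r.get("id") for r in records if isinstance(r, dict)]
    return {"count": len(records), "first_ids": ids[:3]}


def summarize_file(path):
    """Load one generated JSON file and summarize it."""
    return summarize_records(json.loads(Path(path).read_text(encoding="utf-8")))


if __name__ == "__main__":
    # Only summarize the files that generate_dataset.py has already written.
    for name in ("synthetic_phenopackets.json", "synthetic_experiments.json"):
        if Path(name).exists():
            print(name, summarize_file(name))
```

Run it from the repository root after generate_dataset.py has finished; files that do not exist yet are skipped.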

Other useful files are available in the /dataset_files directory:

  • config.json: a Katsu config file matching the dataset
  • dats.json: an example DATS file
  • extra_properties_typing.json: to configure typed extra properties
  • mock experiment files in .csv, .jpg, .md, .mp4, .pdf, and .xlsx formats
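If you only need the mock experiment files, a small sketch can pick them out of the directory by extension. The directory location (`dataset_files` relative to the repository root) and the helper names `group_by_extension` and `mock_experiment_files` are assumptions for illustration.

```python
from collections import defaultdict
from pathlib import Path

# Extensions of the mock experiment files listed above.
MOCK_EXTENSIONS = {".csv", ".jpg", ".md", ".mp4", ".pdf", ".xlsx"}


def group_by_extension(paths):
    """Group file names by lower-cased extension, each group sorted."""
    groups = defaultdict(list)
    for p in paths:
        groups[Path(p).suffix.lower()].append(Path(p).name)
    return {ext: sorted(names) for ext, names in groups.items()}


def mock_experiment_files(directory="dataset_files"):
    """Return the mock experiment files in `directory`, grouped by extension.

    The default directory is an assumption; adjust it to your checkout.
    JSON files such as config.json are excluded.
    """
    paths = [p for p in Path(directory).iterdir() if p.is_file()]
    groups = group_by_extension(paths)
    return {ext: names for ext, names in groups.items() if ext in MOCK_EXTENSIONS}
```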

Optional Configuration:

The dataset is a mix of fixed and randomly generated values. The random values are the same across different runs of generate_dataset.py. To change the output, modify any of the values in config/constants.py.
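This README does not say how run-to-run reproducibility is achieved. One common approach, shown here as a hypothetical sketch, is to draw every random value from a generator seeded with a fixed constant. The names `RANDOM_SEED`, `AGE_RANGE`, and `pick_ages` are illustrative and are not taken from config/constants.py.

```python
import random

# Hypothetical constants; the real config/constants.py may use different names.
RANDOM_SEED = 42
AGE_RANGE = (18, 90)


def pick_ages(n, seed=RANDOM_SEED):
    """Draw n ages from AGE_RANGE using a dedicated, seeded generator.

    A fresh random.Random(seed) yields the same sequence on every run,
    so the output only changes when a constant is changed.
    """
    rng = random.Random(seed)
    low, high = AGE_RANGE
    return [rng.randint(low, high) for _ in range(n)]


if __name__ == "__main__":
    print(pick_ages(5))
```

Using a dedicated `random.Random` instance, rather than the module-level functions, keeps the sequence stable even if other code also draws random numbers.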

The dataset is generated from the input file config/individuals.json. Add or remove individuals there to change the output. Individuals with only "id" and "sex" fields get fully synthetic metadata, while any values in their "biosamples", "experiments", or "diseases" fields are copied over unmodified. This allows, for example, generating appropriate metadata for real data files (which may involve, e.g., a particular disease).
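The sketch below illustrates the distinction described above and shows one way to add a minimal individual. It assumes config/individuals.json holds a top-level JSON array of individual objects; that layout, the helper names `copied_fields` and `add_individual`, and the sex value "FEMALE" used in the example are assumptions.

```python
import json
from pathlib import Path

# Per the README, values in these fields are copied over unmodified.
COPIED_FIELDS = ("biosamples", "experiments", "diseases")


def copied_fields(individual):
    """Return the non-empty copied-over fields of an individual.

    An empty list means none of these fields is set, as for an
    individual with only "id" and "sex".
    """
    return [field for field in COPIED_FIELDS if individual.get(field)]


def add_individual(path, individual_id, sex):
    """Append a minimal individual (only "id" and "sex") to a JSON file.

    Assumes (hypothetically) a top-level JSON array of individuals.
    """
    p = Path(path)
    individuals = json.loads(p.read_text(encoding="utf-8"))
    individuals.append({"id": individual_id, "sex": sex})
    p.write_text(json.dumps(individuals, indent=2) + "\n", encoding="utf-8")
    return individuals
```

For example, `add_individual("config/individuals.json", "demo_01", "FEMALE")` followed by another run of generate_dataset.py would, under these assumptions, add one fully synthetic individual to the output.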

Optional Data Files:

The dataset is meant for use with genomic data from the 1000 Genomes Project, and transcriptomics data from the International Human Epigenome Consortium. See here for more details on data files.
