Biodata Generator for C++

Requires C++23 (e.g., -std=c++23 for GCC/Clang, /std:c++latest for MSVC).

API Reference · Usage Guide · Releases

Features

Demographically Plausible Characteristics. Generates random human physical traits using a multi-stage pipeline: height from country/sex-specific Gaussian distributions, BMI from log-normal distributions, and categorical sampling for phenotypic traits.
Eight Physical Traits. Every generated profile includes: height (cm), weight (kg), BMI, eye colour, hair colour, Fitzpatrick skin type, ABO/Rh blood type, and handedness.
Country-Specific Distributions. Height means, BMI values, eye/hair/skin distributions, blood type frequencies, and left-handedness rates all vary by country, reflecting real-world population statistics.
Data-Driven. Height and BMI from NCD-RisC via Our World in Data, blood types from published population studies, phenotypic traits from Katsara & Nothnagel 2019, VISAGE consortium, and Papadatou-Pastou 2020.
Deterministic Seeding. Per-call get_biodata(seed) for reproducible results, generator-level seed() / unseed() for deterministic sequences, and biodata::seed() for replaying a previous generation.
Multi-Instance Support. Construct independent bdg instances with their own data and random engine.
Typed Enumerations. eye_color, hair_color, skin_type, blood_type, and handedness enums with string conversion helpers.
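
As a rough illustration of the enum-plus-string-helper pattern mentioned in the last feature, here is a minimal sketch. The names sketch_eye_color and sketch_eye_color_str are hypothetical and are not the library's declarations; the three enumerators are assumed from the blue/intermediate/brown categories listed in the generation pipeline.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for a typed trait enumeration (not the library's type).
enum class sketch_eye_color { blue, intermediate, brown };

// Maps each enumerator to a printable label.
constexpr std::string_view sketch_eye_color_str(sketch_eye_color c) {
    switch (c) {
    case sketch_eye_color::blue:         return "blue";
    case sketch_eye_color::intermediate: return "intermediate";
    case sketch_eye_color::brown:        return "brown";
    }
    // Only reachable if an out-of-range value was cast into the enum.
    assert(false && "unhandled enumerator");
    return "unknown";
}

// The helper is usable at compile time as well.
static_assert(sketch_eye_color_str(sketch_eye_color::brown) == "brown");
```

Returning std::string_view avoids allocating for fixed labels; whether the library's helpers return a view or a std::string is not stated here.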

Integration

biodatagen.hpp is the only header released here. It depends on the bundled random.hpp, which must sit in the same directory. Add

#include <dasmig/biodatagen.hpp>

// For convenience.
using bdg = dasmig::bdg;

to every file that generates biodata, and set the switches needed to enable C++23 (e.g., -std=c++23 for GCC and Clang).

You must also supply the biodata generator with the resources folder, available in the release, which contains full/ and/or lite/ subdirectories with the TSV data file.

Usage

#include <dasmig/biodatagen.hpp>
#include <cstdint>
#include <iostream>

// For convenience.
using bdg = dasmig::bdg;

int main()
{
    // Manually load a specific dataset tier if necessary.
    bdg::instance().load(dasmig::dataset::lite);  // ~111 countries (best coverage)
    // OR
    bdg::instance().load(dasmig::dataset::full);  // ~197 countries (gap-filled)

    // Generate random biodata (uniform country selection).
    auto b = bdg::instance().get_biodata();
    std::cout << b << '\n';  // implicit string conversion

    // Generate biodata for a specific country.
    auto us = bdg::instance().get_biodata("US");
    std::cout << "Height: " << us.height_cm << " cm\n";
    std::cout << "Weight: " << us.weight_kg << " kg\n";
    std::cout << "BMI:    " << us.bmi << "\n";

    // Request a specific sex.
    auto m = bdg::instance().get_biodata("BR", dasmig::sex::male);

    // Access typed enum fields.
    std::cout << "Eyes:  " << dasmig::biodata::eye_color_str(b.eyes) << '\n';
    std::cout << "Hair:  " << dasmig::biodata::hair_color_str(b.hair) << '\n';
    std::cout << "Skin:  " << dasmig::biodata::skin_type_str(b.skin) << '\n';
    std::cout << "Blood: " << dasmig::biodata::blood_type_str(b.blood) << '\n';
    std::cout << "Hand:  " << dasmig::biodata::handedness_str(b.hand) << '\n';

    // Deterministic generation: the same seed always produces the same result.
    auto seeded = bdg::instance().get_biodata("US", std::uint64_t{42});

    // Replay a previous generation using its seed.
    auto replay = bdg::instance().get_biodata("US", seeded.seed());

    // Seed the engine for a deterministic sequence.
    bdg::instance().seed(100);
    // ... generate biodata ...
    bdg::instance().unseed();  // restore non-deterministic state

    // Independent instance with its own data and random engine.
    bdg my_gen;
    my_gen.load("path/to/resources/lite");
    auto c = my_gen.get_biodata("JP");
}
For the complete feature guide — fields, seeding, enums, and more — see the Usage Guide.
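
To picture how per-call seeds, generator-level seeding, and replay can fit together, here is a minimal sketch of one possible design. It is not the library's implementation: sketch_generator, sketch_result, and the placeholder distribution parameters are all hypothetical. The assumption is that every generation runs on a fresh engine built from a single 64-bit seed, and that the seed is stored in the result.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical result type: keeps the seed that produced the value.
struct sketch_result {
    double value;
    std::uint64_t seed;
};

class sketch_generator {
public:
    // Per-call seed: the result depends only on call_seed.
    static sketch_result generate(std::uint64_t call_seed) {
        std::mt19937_64 local{call_seed};
        std::normal_distribution<double> dist{170.0, 7.0};  // placeholder parameters
        return {dist(local), call_seed};
    }

    // No explicit seed: draw the per-call seed from the generator's own engine.
    sketch_result generate() { return generate(engine_()); }

    // Generator-level seeding makes the sequence of per-call seeds repeatable.
    void seed(std::uint64_t s) { engine_.seed(s); }

    // Return to a non-deterministic state.
    void unseed() { engine_.seed(std::random_device{}()); }

private:
    std::mt19937_64 engine_{std::random_device{}()};
};

// Usage: a seeded sequence repeats, and a stored seed replays a single result.
inline void demo_replay() {
    sketch_generator g;
    g.seed(100);
    const sketch_result first = g.generate();
    g.seed(100);
    const sketch_result again = g.generate();
    assert(first.seed == again.seed && first.value == again.value);

    const sketch_result replay = sketch_generator::generate(first.seed);
    assert(replay.value == first.value);
}
```

Standard library distributions are not guaranteed to produce identical values across different standard library implementations, so a design like this is reproducible within one build rather than across toolchains.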

Generation Pipeline

Each call to get_biodata() runs this pipeline:

1. Sex: 50/50, or forced via the sex parameter.
2. Height: Gaussian distribution using country/sex-specific mean and standard deviation from NCD-RisC anthropometric data.
3. BMI: Log-normal distribution from country/sex-specific mean, modelling the natural right-skew of BMI.
4. Weight: Derived as BMI × height_m².
5. Eye Colour: Categorical sampling from country-specific blue/intermediate/brown distribution.
6. Hair Colour: Categorical sampling from country-specific black/brown/blond/red distribution.
7. Skin Type: Categorical sampling from Fitzpatrick I–VI distribution.
8. Blood Type: Categorical sampling from ABO/Rh frequencies (O+, A+, B+, AB+, O−, A−, B−, AB−).
9. Handedness: Bernoulli sampling from country-specific left-handedness rate.
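
The numeric and categorical steps above can be sketched with the standard <random> distributions. This is an illustration under stated assumptions, not the library's code: country_params, sketch_profile, and sample_profile are hypothetical names, and the log-space spread bmi_sigma is an assumed parameter, since the pipeline description only mentions a mean for BMI. Only height, BMI, weight, eye colour, and handedness are covered.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical per-country, per-sex parameters (not the library's data layout).
struct country_params {
    double height_mean_cm;
    double height_sd_cm;
    double bmi_mean;
    double bmi_sigma;                   // assumed log-space standard deviation
    std::array<double, 3> eye_weights;  // blue, intermediate, brown
    double left_handed_rate;
};

struct sketch_profile {
    double height_cm;
    double bmi;
    double weight_kg;
    int eye_index;     // 0 = blue, 1 = intermediate, 2 = brown
    bool left_handed;
};

inline sketch_profile sample_profile(const country_params& p, std::uint64_t seed) {
    assert(p.height_sd_cm > 0.0 && p.bmi_mean > 0.0 && p.bmi_sigma > 0.0);
    std::mt19937_64 rng{seed};

    // Height: Gaussian around the country/sex-specific mean.
    std::normal_distribution<double> height_dist{p.height_mean_cm, p.height_sd_cm};
    const double height_cm = height_dist(rng);

    // BMI: log-normal. Choosing mu = ln(mean) - sigma^2 / 2 makes the
    // distribution's expected value equal bmi_mean.
    const double mu = std::log(p.bmi_mean) - 0.5 * p.bmi_sigma * p.bmi_sigma;
    std::lognormal_distribution<double> bmi_dist{mu, p.bmi_sigma};
    const double bmi = bmi_dist(rng);

    // Weight: derived from BMI and height in metres.
    const double height_m = height_cm / 100.0;
    const double weight_kg = bmi * height_m * height_m;

    // Eye colour: categorical sampling by weight.
    std::discrete_distribution<int> eye_dist(p.eye_weights.begin(), p.eye_weights.end());
    const int eye_index = eye_dist(rng);

    // Handedness: Bernoulli trial on the left-handedness rate.
    std::bernoulli_distribution left_dist{p.left_handed_rate};
    const bool left_handed = left_dist(rng);

    return {height_cm, bmi, weight_kg, eye_index, left_handed};
}
```

Hair colour, skin type, and blood type would follow the same std::discrete_distribution pattern as eye colour, with four, six, and eight weights respectively.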

Data Sources

Trait	Source	Coverage
Height (mean, SD)	NCD-RisC via OWID	202 countries
BMI (mean)	WHO GHO via OWID	197 countries
Blood type	Published population studies (Wikipedia compilation)	124 countries
Eye colour	Katsara & Nothnagel 2019 + regional estimates	70 countries
Hair colour	VISAGE consortium + regional estimates	62 countries
Skin tone	WHO UV guidance + ethnic composition estimates	81 countries
Handedness	Papadatou-Pastou et al. 2020 meta-analysis	75 countries

Dataset Tiers

Tier	Countries	Description
`lite`	~111	Countries with specific data for at least one phenotypic trait
`full`	~197	All countries with height data; phenotypic gaps filled with regional defaults
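
The gap-filling idea behind the full tier can be pictured as a two-level lookup: use the country-specific value when one exists, otherwise fall back to a regional default. The sketch below is an assumption about how such a fallback could work, not a description of how the shipped TSV files were produced; trait_tables, lookup_rate, the region labels, and the numeric values are all hypothetical placeholders.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical tables for a single trait, e.g. a left-handedness rate.
struct trait_tables {
    std::map<std::string, double> by_country;       // country code -> specific value
    std::map<std::string, std::string> region_of;   // country code -> region label
    std::map<std::string, double> by_region;        // region label -> default value
};

// Returns the country-specific value if present, else the regional default,
// else std::nullopt when neither is known.
inline std::optional<double> lookup_rate(const trait_tables& t,
                                         const std::string& country) {
    if (auto it = t.by_country.find(country); it != t.by_country.end()) {
        return it->second;
    }
    const auto region = t.region_of.find(country);
    if (region == t.region_of.end()) {
        return std::nullopt;
    }
    const auto fallback = t.by_region.find(region->second);
    if (fallback == t.by_region.end()) {
        return std::nullopt;
    }
    return fallback->second;
}

// Usage with placeholder values (not real statistics).
inline void demo_lookup() {
    trait_tables t;
    t.by_country["AA"] = 0.05;
    t.region_of["BB"] = "region-1";
    t.by_region["region-1"] = 0.10;
    assert(lookup_rate(t, "AA").value() == 0.05);   // specific data wins
    assert(lookup_rate(t, "BB").value() == 0.10);   // regional default fills the gap
    assert(!lookup_rate(t, "CC").has_value());      // nothing known
}
```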

Building

# Example
make

# Tests
make test

# Code coverage
make coverage

# API docs
make docs

Compiler Support

Tested with:

Clang 18+ (-std=c++23)
GCC 14+ (-std=c++23)
MSVC 19.38+ (/std:c++latest)

Dependencies

Dependency	Version	Bundled	Purpose
effolkronium/random	1.4.1	Yes (`random.hpp`)	Thread-safe RNG wrapper
Catch2	3.x	Yes (amalgamated)	Unit testing

Related Libraries

Library	Description
name-generator	Culturally appropriate full names
nickname-generator	Gamer-style nicknames
birth-generator	Demographically plausible birthdays
city-generator	Weighted city selection by population
country-generator	Weighted country selection by population
entity-generator	ECS-based entity generation

License

This library is released under the MIT License.

MIT License

Copyright (c) 2020-2026 Diego Dasso Migotto
