Biodata Generator for C++

Requires C++23 (e.g., -std=c++23 for GCC/Clang, /std:c++latest for MSVC).

API Reference · Usage Guide · Releases

Features

  • Demographically Plausible Characteristics. Generates random human physical traits using a multi-stage pipeline: height from country/sex-specific Gaussian distributions, BMI from log-normal distributions, and categorical sampling for phenotypic traits.

  • Eight Physical Traits. Every generated profile includes: height (cm), weight (kg), BMI, eye colour, hair colour, Fitzpatrick skin type, ABO/Rh blood type, and handedness.

  • Country-Specific Distributions. Height means, BMI values, eye/hair/skin distributions, blood type frequencies, and left-handedness rates all vary by country, reflecting real-world population statistics.

  • Data-Driven. Height and BMI from NCD-RisC via Our World in Data, blood types from published population studies, phenotypic traits from Katsara & Nothnagel 2019, VISAGE consortium, and Papadatou-Pastou 2020.

  • Deterministic Seeding. Per-call get_biodata(seed) for reproducible results, generator-level seed() / unseed() for deterministic sequences, and biodata::seed() for replaying a previous generation.

  • Multi-Instance Support. Construct independent bdg instances with their own data and random engine.

  • Typed Enumerations. eye_color, hair_color, skin_type, blood_type, and handedness enums with string conversion helpers.
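
As an illustration of the typed-enumeration idea, here is a minimal sketch of a scoped enum paired with a string conversion helper. The names sketch_eye_color and to_label are hypothetical and exist only for this example; the library's real enumerators and helper signatures may differ, so consult the API Reference for those.

```cpp
#include <string_view>

// Hypothetical enumeration mirroring the blue/intermediate/brown eye colour
// categories; the library's real enumerator names may differ.
enum class sketch_eye_color { blue, intermediate, brown };

// Map each enumerator to a printable label.
constexpr std::string_view to_label(sketch_eye_color c) {
    switch (c) {
        case sketch_eye_color::blue:         return "blue";
        case sketch_eye_color::intermediate: return "intermediate";
        case sketch_eye_color::brown:        return "brown";
    }
    return "unknown";  // reached only for out-of-range values
}
```

A scoped enum keeps the categories from converting silently to integers, and a constexpr helper lets the labels be checked at compile time.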

Integration

biodatagen.hpp is the library's single header and the only file you include directly. It depends on the bundled random.hpp, which must sit in the same directory. Add

#include <dasmig/biodatagen.hpp>

// For convenience.
using bdg = dasmig::bdg;

to each file that generates biodata, and set the compiler switches that enable C++23 (e.g., -std=c++23 for GCC and Clang).

You must also give the biodata generator access to the resources folder, which contains full/ and/or lite/ subdirectories with the TSV data file. The folder is also available in the release.

Usage

#include <dasmig/biodatagen.hpp>
#include <cstdint>   // std::uint64_t
#include <iostream>

// For convenience.
using bdg = dasmig::bdg;

// Manually load a specific dataset tier if necessary.
bdg::instance().load(dasmig::dataset::lite);  // ~111 countries (best coverage)
// OR
bdg::instance().load(dasmig::dataset::full);  // ~197 countries (gap-filled)

// Generate random biodata (uniform country selection).
auto b = bdg::instance().get_biodata();
std::cout << b << '\n';  // implicit string conversion

// Generate biodata for a specific country.
auto us = bdg::instance().get_biodata("US");
std::cout << "Height: " << us.height_cm << " cm\n";
std::cout << "Weight: " << us.weight_kg << " kg\n";
std::cout << "BMI:    " << us.bmi << "\n";

// Request a specific sex.
auto m = bdg::instance().get_biodata("BR", dasmig::sex::male);

// Access typed enum fields.
std::cout << "Eyes:  " << dasmig::biodata::eye_color_str(b.eyes) << '\n';
std::cout << "Hair:  " << dasmig::biodata::hair_color_str(b.hair) << '\n';
std::cout << "Skin:  " << dasmig::biodata::skin_type_str(b.skin) << '\n';
std::cout << "Blood: " << dasmig::biodata::blood_type_str(b.blood) << '\n';
std::cout << "Hand:  " << dasmig::biodata::handedness_str(b.hand) << '\n';

// Deterministic generation — same seed always produces the same result.
auto seeded = bdg::instance().get_biodata("US", std::uint64_t{42});

// Replay a previous generation using its seed.
auto replay = bdg::instance().get_biodata("US", seeded.seed());

// Seed the engine for a deterministic sequence.
bdg::instance().seed(100);
// ... generate biodata ...
bdg::instance().unseed();  // restore non-deterministic state

// Independent instance — separate data and random engine.
bdg my_gen;
my_gen.load("path/to/resources/lite");
auto c = my_gen.get_biodata("JP");

For the complete feature guide — fields, seeding, enums, and more — see the Usage Guide.
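
The three seeding modes used above (per-call seed, generator-level seed() / unseed(), and replay from a stored seed) can be pictured with the following sketch. It is one plausible arrangement built on the standard <random> header, not the library's actual implementation. The types sketch_record and sketch_generator and the method draw are hypothetical, and a single uniform value stands in for a full biodata record.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical stand-in for a generated record: one value plus the seed
// that produced it, so the same record can be regenerated later.
struct sketch_record {
    double value;
    std::uint64_t seed;
};

class sketch_generator {
public:
    // Per-call seeding: the result depends only on the seed argument.
    static sketch_record draw(std::uint64_t call_seed) {
        std::mt19937_64 local{call_seed};
        std::uniform_real_distribution<double> dist{0.0, 1.0};
        return {dist(local), call_seed};
    }

    // Unseeded call: take the next per-call seed from the generator-level engine.
    sketch_record draw() { return draw(engine_()); }

    // Generator-level seeding: fixes the sequence of per-call seeds.
    void seed(std::uint64_t s) { engine_.seed(s); }

    // Return to a non-deterministic state by reseeding from the OS entropy source.
    void unseed() { engine_.seed(std::random_device{}()); }

private:
    std::mt19937_64 engine_{std::random_device{}()};
};
```

Because every record carries the seed it was drawn with, passing that seed back to draw() reproduces the record, and seeding two generators identically makes them emit identical sequences.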

Generation Pipeline

Each call to get_biodata() runs this pipeline:

  1. Sex — 50/50 or forced via sex parameter.
  2. Height — Gaussian distribution using country/sex-specific mean and standard deviation from NCD-RisC anthropometric data.
  3. BMI — Log-normal distribution from country/sex-specific mean, modelling the natural right-skew of BMI.
  4. Weight — Derived: BMI × height_m².
  5. Eye Colour — Categorical sampling from country-specific blue/intermediate/brown distribution.
  6. Hair Colour — Categorical sampling from country-specific black/brown/blond/red distribution.
  7. Skin Type — Categorical sampling from Fitzpatrick I–VI distribution.
  8. Blood Type — Categorical sampling from ABO/Rh frequencies (O+, A+, B+, AB+, O−, A−, B−, AB−).
  9. Handedness — Bernoulli sampling from country-specific left-handedness rate.
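
The steps above can be sketched with the standard <random> header. This is a minimal illustration under stated assumptions, not the library's code: country_params, sketch_profile, generate_sketch, and the bmi_log_sd spread parameter are all hypothetical, the sketch uses one height mean instead of sex-specific ones, and only eye colour is shown as the categorical example. For the BMI step it assumes the log-normal is parameterised so that its arithmetic mean equals the country mean, using mean = exp(mu + sigma²/2), hence mu = ln(mean) − sigma²/2.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical per-country inputs; field names are invented for this sketch.
struct country_params {
    double height_mean_cm;
    double height_sd_cm;
    double bmi_mean;
    double bmi_log_sd;                  // assumed standard deviation of ln(BMI)
    std::array<double, 3> eye_weights;  // blue, intermediate, brown
    double left_handed_rate;            // probability in [0, 1]
};

struct sketch_profile {
    bool male;
    double height_cm;
    double bmi;
    double weight_kg;
    int eye_index;     // 0 = blue, 1 = intermediate, 2 = brown
    bool left_handed;
};

inline sketch_profile generate_sketch(const country_params& p, std::uint64_t seed) {
    assert(p.height_sd_cm > 0.0 && p.bmi_mean > 0.0 && p.bmi_log_sd > 0.0);
    std::mt19937_64 rng{seed};
    sketch_profile out{};

    // 1. Sex: 50/50 draw.
    out.male = std::bernoulli_distribution{0.5}(rng);

    // 2. Height: Gaussian around the country mean.
    out.height_cm =
        std::normal_distribution<double>{p.height_mean_cm, p.height_sd_cm}(rng);

    // 3. BMI: log-normal whose arithmetic mean equals bmi_mean.
    const double mu = std::log(p.bmi_mean) - 0.5 * p.bmi_log_sd * p.bmi_log_sd;
    out.bmi = std::lognormal_distribution<double>{mu, p.bmi_log_sd}(rng);

    // 4. Weight: BMI times height in metres squared.
    const double height_m = out.height_cm / 100.0;
    out.weight_kg = out.bmi * height_m * height_m;

    // 5. Eye colour: categorical draw from the country's weights.
    std::discrete_distribution<int> eyes(p.eye_weights.begin(), p.eye_weights.end());
    out.eye_index = eyes(rng);

    // 9. Handedness: Bernoulli draw with the left-handedness rate.
    out.left_handed = std::bernoulli_distribution{p.left_handed_rate}(rng);
    return out;
}
```

Hair colour, skin type, and blood type would follow the same std::discrete_distribution pattern as eye colour, each with its own weight table.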

Data Sources

| Trait | Source | Coverage |
| --- | --- | --- |
| Height (mean, SD) | NCD-RisC via OWID | 202 countries |
| BMI (mean) | WHO GHO via OWID | 197 countries |
| Blood type | Published population studies (Wikipedia compilation) | 124 countries |
| Eye colour | Katsara & Nothnagel 2019 + regional estimates | 70 countries |
| Hair colour | VISAGE consortium + regional estimates | 62 countries |
| Skin tone | WHO UV guidance + ethnic composition estimates | 81 countries |
| Handedness | Papadatou-Pastou et al. 2020 meta-analysis | 75 countries |

Dataset Tiers

| Tier | Countries | Description |
| --- | --- | --- |
| lite | ~111 | Countries with specific data for at least one phenotypic trait |
| full | ~197 | All countries with height data; phenotypic gaps filled with regional defaults |
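
To make the gap-filling idea in the full tier concrete, here is a small sketch of a lookup that prefers a country-specific value and otherwise falls back to a regional default. How the library actually stores and fills its data is not described here, so everything in this example is an assumption: the rate_table alias, both function names, and the placeholder keys and rates are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical table type: a key (country or region) mapped to a trait rate.
using rate_table = std::map<std::string, double>;

// Return the rate for a key if the table has one.
inline std::optional<double> find_rate(const rate_table& t, const std::string& key) {
    const auto it = t.find(key);
    if (it == t.end()) return std::nullopt;
    return it->second;
}

// Prefer the country-specific value; otherwise use the region's default.
inline double rate_or_regional_default(const rate_table& by_country,
                                       const rate_table& by_region,
                                       const std::string& country,
                                       const std::string& region) {
    if (const auto specific = find_rate(by_country, country)) return *specific;
    const auto fallback = find_rate(by_region, region);
    assert(fallback.has_value());  // the sketch assumes every region has a default
    return *fallback;
}
```

With a country table holding only a placeholder entry "AA", a lookup for "AA" returns its own rate, while a lookup for an absent country such as "BB" returns the default of the region passed alongside it.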

Building

# Example
make

# Tests
make test

# Code coverage
make coverage

# API docs
make docs

Compiler Support

Tested with:

  • Clang 18+ (-std=c++23)
  • GCC 14+ (-std=c++23)
  • MSVC 19.38+ (/std:c++latest)

Dependencies

| Dependency | Version | Bundled | Purpose |
| --- | --- | --- | --- |
| effolkronium/random | 1.4.1 | Yes (random.hpp) | Thread-safe RNG wrapper |
| Catch2 | 3.x | Yes (amalgamated) | Unit testing |

Related Libraries

| Library | Description |
| --- | --- |
| name-generator | Culturally appropriate full names |
| nickname-generator | Gamer-style nicknames |
| birth-generator | Demographically plausible birthdays |
| city-generator | Weighted city selection by population |
| country-generator | Weighted country selection by population |
| entity-generator | ECS-based entity generation |

License

This library is released under the MIT License.

MIT License

Copyright (c) 2020-2026 Diego Dasso Migotto