# Profiling with Different Studies (DINED Profiler functionality)
In this notebook, we determine where a given person falls within the distribution of a study. We do this for two studies, which also demonstrates how to convert different data structures to the common DINED format.

<img src="assets/Profiler.png" alt="DINED Profiler" style="width: 640px;"/>

The measurements come from two datasets of human body measurements:
* DINED delstu2016 - https://doi.org/10.4121/uuid:1411a8ac-5944-41d2-81c1-1edf32106d99
* DINED adults2004 - https://doi.org/10.4121/uuid:1b214a8f-f59c-460f-8eb9-3ef8db5e85ee

## Dataset
We first retrieve both datasets from the 4TU data repository:


In [1]:
!wget -nc -O data.zip https://data.4tu.nl/ndownloader/files/23994179
!unzip -o data.zip delstu2016.csv
!rm data.zip

!wget -nc -O data.zip https://data.4tu.nl/ndownloader/files/24015182
!unzip -o data.zip dined2004.csv
!rm data.zip


--2022-08-11 12:38:45--  https://data.4tu.nl/ndownloader/files/23994179
Resolving data.4tu.nl (data.4tu.nl)... 131.180.141.15
Connecting to data.4tu.nl (data.4tu.nl)|131.180.141.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘data.zip’

data.zip                [ <=>                ]  26,38K  --.-KB/s    in 0,02s   

2022-08-11 12:38:45 (1,44 MB/s) - ‘data.zip’ saved [27010]

Archive:  data.zip
  inflating: delstu2016.csv          
--2022-08-11 12:38:45--  https://data.4tu.nl/ndownloader/files/24015182
Resolving data.4tu.nl (data.4tu.nl)... 131.180.141.15
Connecting to data.4tu.nl (data.4tu.nl)|131.180.141.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘data.zip’

data.zip                [ <=>                ]  56,38K  --.-KB/s    in 0,05s   

2022-08-11 12:38:46 (1,22 MB/s) - ‘data.zip’ saved [57738]

Archive:  data.zip
  inflating: dined2004.csv

Now let's load the measurement descriptions from DINED:

In [48]:
import numpy as np
import pandas as pd
import json

def flatten(l):
    return [item for sublist in l for item in sublist]

class Measure():
    def __init__(self, id: int, name: str, description: str, unit: str):
        self.id = id
        self.name = name
        self.description = description
        self.unit = unit

    def __str__(self):
        return "Measure: " + str(self.id) + " " + str(self.name) + " (" + str(self.unit) + ")"


class MeasureGroup():
    def __init__(self, id: int, name: str, measures: list[Measure]):
        self.id = id
        self.name = name
        self.measures = measures

    def __str__(self):
        lines = ["Measure Group: " + str(self.id) + " " + str(self.name)]
        for m in self.measures:
            lines += [ "\t"+str(m)]
        return "\n\t".join(lines)

    def get_measure(self, id):
        return [ m for m in self.measures if m.id == id]

class MeasureCollection():
    def __init__(self, name: str, description: str, measure_groups: list[MeasureGroup]):
        self.name = name
        self.description = description
        self.measure_groups = measure_groups
    
    def __str__(self):
        lines = ["Measure Collection: " + str(self.name)]
        for m in self.measure_groups:
            lines += [  str(m)]
        return "\n\t".join(lines)

    def get_measure(self, id):
        ms = flatten( [ mg.get_measure(id) for mg in self.measure_groups ])
        if len(ms) == 0:
            raise IndexError(f"No measure with id {id}")
        
        return ms[0]

    def get_measure_name(self, id):
        return self.get_measure(id).name

def json_decoder(d):
    id = int(d.get("id"))
    name = d.get("name_en")
    unit = d.get("dimension")
    description = d.get("description_en")
    labels = d.get("labels")

    if labels:
        sorted_labels = sorted(labels, key=lambda x: x.id)
        return MeasureGroup(id,name,sorted_labels)
    else:
        return Measure(id,name,description,unit)

with open("assets/measurements.json") as json_file:
    measure_collection = MeasureCollection("DINED Measures","Measures from the DINED website",None)
    groups = json.load(json_file, object_hook=json_decoder)
    measure_collection.measure_groups = sorted(groups, key=lambda x: x.id)


Let's pretty print the set of DINED measures:

In [13]:
print(measure_collection)
print(measure_collection.measure_groups[-1])

Measure Collection: DINED Measures
	Measure Group: 1 Standing, length/depth
		Measure: 1 Reach height, standing (mm)
		Measure: 2 Stature (mm)
		Measure: 3 Eye height, standing (mm)
		Measure: 4 Shoulder height (mm)
		Measure: 5 Elbow height, standing (mm)
		Measure: 6 Fist height, standing (mm)
		Measure: 8 Hip height (mm)
		Measure: 19 Reach depth (mm)
		Measure: 20 Arm length (mm)
		Measure: 27 Chest depth (mm)
		Measure: 79 Palm height, standing (mm)
		Measure: 81 Crotch height (mm)
	Measure Group: 2 Standing, width/circumference
		Measure: 21 Breadth over the elbows (mm)
		Measure: 22 Shoulder breadth (bi-deltoid) (mm)
		Measure: 24 Hip breadth (mm)
		Measure: 26 Knee breadth (mm)
		Measure: 37 Chest circumference (mm)
		Measure: 38 Waist circumference (mm)
		Measure: 39 Hip circumference (mm)
		Measure: 40 Upperarm circumference (mm)
		Measure: 82 Calf circumference (mm)
		Measure: 83 Thigh circumference (mm)
	Measure Group: 3 Sitting, length/width/depth
		Measure: 9 Thigh cleara

Now we can load both studies into dataframes and inspect their original column names, which we will convert to the common DINED naming below.

In [14]:
df_adu = pd.read_csv("dined2004.csv")
df_stu = pd.read_csv("delstu2016.csv")

print(df_adu.columns)
print(df_stu.columns)

Index(['geslacht', 'knijpmax', 'trekken', 'duwen', 'torsie', 'gewicht',
       'lengte', 'vuisth', 'voetl', 'voetbr', 'bovarml', 'ellegrip',
       'elleving', 'schbr', 'ellebbr', 'heupbrzt', 'krzitvll', 'ooghzt',
       'ooghst', 'schhzt', 'schhst', 'ellebhst', 'ellebhzt', 'arml',
       'buikdpte', 'bilknieh', 'bilknie', 'dijbeenh', 'kniehh', 'handl',
       'handbr', 'duimbr', 'wijsvbr', 'oorl', 'grip', 'hakh', 'prohan',
       'suphan', 'exthan', 'flexhan', 'radhan', 'ulnhan', 'wijsfl', 'kinbor',
       'kinacht', 'hoschl', 'hoschr', 'hodrl', 'hodrr', 'leeftijd', 'opstapn'],
      dtype='object')
Index(['Age', 'Gender', 'kuitomtrek403', 'borstomtrek850', 'Chestdepth',
       'Crotchheight', 'Fistheight', 'Hipwithstanding', 'heupomtrek960',
       'lichaamslengte1780', 'dijbeenomtrek550', 'tailleomtrek850',
       'bilvoetlengte', 'bilknieschijflengte', 'bilknieholtelengte',
       'Elbowgriplength', 'Elbowheightsitting', 'breedteoverdeellebogen',
       'Eyeheightsitting', 'Horizon

Read the mapping files for these two datasets.

In [51]:
with open('assets/dined2004_mapping.json') as json_file:
    mapping_adu = json.load(json_file)

with open('assets/delstu2016_mapping.json') as json_file:
    mapping_stu = json.load(json_file)

Each mapping is a simple dictionary that maps a study's column names to DINED measure identifiers.

In [52]:
print(mapping_adu)
print('\n')
print(mapping_stu)

{'geslacht': 0, 'gewicht': 56, 'lengte': 2, 'vuisth': 6, 'voetl': 41, 'voetbr': 42, 'bovarml': 12, 'ellegrip': 31, 'elleving': 19, 'schbr': 22, 'ellebbr': 21, 'heupbrzt': 25, 'krzitvll': 17, 'ooghzt': 16, 'ooghst': 3, 'schhzt': 15, 'schhst': 4, 'ellebhst': 5, 'ellebhzt': 13, 'arml': 20, 'buikdpte': 30, 'bilknieh': 32, 'bilknie': 33, 'dijbeenh': 9, 'kniehh': 14, 'handl': 43, 'handbr': 44, 'duimbr': 45, 'wijsvbr': 46, 'oorl': 55, 'grip': 54, 'leeftijd': 80, 'kinbor': 64, 'kinacht': 63, 'hoschl': 66, 'hoschr': 65, 'hodrl': 61, 'hodrr': 62, 'flexhan': 70, 'exthan': 69, 'radhan': 71, 'ulnhan': 72, 'prohan': 67, 'suphan': 68, 'wijsfl': 73, 'opstapn': 74, 'knijpmax': 57, 'duwen': 58, 'trekken': 59, 'torsie': 60}


{'Age': 80, 'Gender': 0, 'kuitomtrek403': 82, 'borstomtrek850': 37, 'Chestdepth': 27, 'Crotchheight': 81, 'Fistheight': 6, 'Hipwithstanding': 24, 'heupomtrek960': 39, 'lichaamslengte1780': 2, 'dijbeenomtrek550': 83, 'tailleomtrek850': 38, 'bilvoetlengte': 34, 'bilknieschijflengte': 

Let's apply the mapping:

In [115]:
def map_names(df, mapping, measure_collection):
    # Translate each study column name to the DINED measure name for its id.
    column_map = {col: measure_collection.get_measure_name(id) for col, id in mapping.items()}
    # Columns without a DINED mapping are dropped.
    drop_columns = list(set(df.columns) - set(column_map.keys()))
    df_renamed = df.rename(columns=column_map).drop(columns=drop_columns)
    return df_renamed


df_adu_dined = map_names(df_adu, mapping_adu, measure_collection)
df_stu_dined = map_names(df_stu, mapping_stu, measure_collection)
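
To make the renaming step easier to follow, here is a small self-contained sketch of the same idea on a toy dataframe. The `measure_names` dictionary is a hypothetical stand-in for `measure_collection`, and the `unmapped` column is invented to show what happens to columns that have no DINED identifier.

```python
import pandas as pd

# Hypothetical stand-in for measure_collection: DINED id -> measure name.
measure_names = {0: "Sex", 2: "Stature", 56: "Body mass"}

# A small excerpt in the style of the dined2004 mapping.
toy_mapping = {"geslacht": 0, "lengte": 2, "gewicht": 56}

toy_df = pd.DataFrame({
    "geslacht": [1, 2],
    "lengte": [1728, 1712],
    "gewicht": [86.3, 65.0],
    "unmapped": [0, 0],  # invented column without a DINED id
})

# Study column name -> DINED measure name.
column_map = {col: measure_names[mid] for col, mid in toy_mapping.items()}

# Drop everything that has no mapping, then rename what is left.
unmapped = [col for col in toy_df.columns if col not in column_map]
toy_dined = toy_df.drop(columns=unmapped).rename(columns=column_map)

print(list(toy_dined.columns))
```

Dropping before renaming, or after as in `map_names`, gives the same result, because the dropped columns are exactly those that the rename leaves untouched.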

In [116]:
df_adu_dined

Unnamed: 0,Sex,Maximum gripping force,Pulling force 1 hand,Pushing force with 2 hands,Torque with two hands,Body mass,Stature,"Fist height, standing",Foot length,Foot breadth,...,Ulnar deviation,Forefinger flexion,Rotation of head toward chest,Rotation of head backwards,Bending of head towards left shoulder,Bending of head towards right shoulder,Rotation of head towards left,Rotation of head towards right,Age,Step-up height
0,1,422,475,628,9,86.3,1728,803,275,109,...,23,53,52,55,34,31,50,80,64,210
1,2,226,235,235,5,65.0,1712,741,258,91,...,22,40,40,50,38,35,60,56,62,170
2,2,245,324,314,5,79.3,1657,723,252,91,...,18,38,20,52,34,25,46,56,59,210
3,1,392,291,284,7,64.9,1641,685,249,87,...,28,60,44,65,36,36,71,62,70,120
4,1,402,383,353,6,85.7,1773,819,270,105,...,22,70,35,42,27,35,51,54,67,210
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
745,2,422,298,451,7,69.7,1638,738,239,90,...,23,54,22,55,37,31,75,68,26,170
746,2,412,249,446,7,85.1,1775,794,250,101,...,24,71,24,41,51,47,78,73,28,210
747,1,422,237,368,6,95.7,1773,783,261,101,...,25,52,15,41,37,33,60,56,54,120
748,2,324,210,260,5,60.6,1793,823,254,95,...,27,45,18,37,39,44,78,78,24,120


In [117]:
df_stu_dined

Unnamed: 0,Age,Sex,Calf circumference,Chest circumference,Chest depth,Crotch height,"Fist height, standing",Hip breadth,Hip circumference,Stature,...,Hand thickness,Length pointing finger,Forefinger breadth,Foot breadth,Foot length,Foot circumference,Body mass,"Elbow height, standing","Eye height, standing","Reach height, standing"
0,24,F,500,1179,347,765,799,451,1258,1740,...,28,69,16,98,250,337,109.2,1110.0,1623.0,2050.0
1,19,M,409,1103,309,834,792,401,1208,1873,...,31,78,16,104,275,365,105.0,1211.0,1776.0,2305.0
2,18,F,435,989,260,800,783,408,1175,1725,...,27,71,16,100,250,328,81.3,1077.0,1584.0,2011.0
3,18,F,398,960,265,778,794,383,1119,1694,...,21,68,14,90,247,314,77.8,1064.0,1570.0,2002.0
4,19,M,376,972,224,854,821,345,1000,1840,...,27,75,17,99,272,348,78.7,1163.0,1733.0,2236.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
342,19,F,366,901,232,767,756,344,978,1697,...,26,72,15,90,240,311,61.9,1056.0,1568.0,2046.0
343,18,F,360,901,246,806,769,350,990,1709,...,26,74,14,85,234,313,64.1,1059.0,1598.0,2006.0
344,18,M,339,860,202,812,781,349,969,1788,...,26,72,18,106,251,332,65.5,1101.0,1684.0,2154.0
345,19,M,382,944,233,865,803,356,1004,1845,...,25,79,17,96,282,360,76.8,1227.0,1745.0,2234.0


Now let's see if we can attach the metadata from the measures to the pandas columns.

In [124]:
df_stu_dined.Age.attrs["unit"]="year"
df_stu_dined.Age.attrs


{'unit': 'year'}
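
Attributes set on a column Series may not survive later operations on the dataframe, and pandas documents `attrs` as experimental. One possible alternative, sketched below, is to keep all units in a single dictionary on the dataframe's own `attrs`. The toy dataframe and the `"units"` key are illustrative assumptions, not part of the DINED format.

```python
import pandas as pd

# Toy dataframe using DINED measure names.
df = pd.DataFrame({"Age": [24, 19], "Stature": [1740, 1873]})

# Hypothetical convention: one "units" dictionary keyed by column name.
df.attrs["units"] = {"Age": "year", "Stature": "mm"}

# Look up the unit of a column through the dataframe-level metadata.
for column in df.columns:
    print(column, df.attrs["units"][column])
```

With the real data, the dictionary could be filled from the `unit` field of each `Measure` instead of being written by hand.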