## Metrics [[documentation]](https://childproject.readthedocs.io/en/latest/metrics.html)

Annotations are often used to derive measures of quantities of interest, such as speaker vocalization rates, speech duration, or average and peak adult word counts. The package lets researchers derive a wide array of such measures from a set of annotations, regardless of its source (LENA, the ACLEW tools VTC/VCM/ALICE, manual annotations, etc.).

Let's see how this can be done.
As always, we start by loading our project:

```python
from ChildProject.projects import ChildProject
from ChildProject.pipelines.metrics import LenaMetrics, AclewMetrics

# root directory of the dataset
project = ChildProject("/mnt/data/vandam-data")
```

That's it! We can now, for instance, derive the usual LENA metrics from the `its` annotation set:

```python
metrics = LenaMetrics(project, set="its")
metrics.extract()
```

| | recording_filename | child_id | duration_its | voc_fem_ph | lp_dur | lp_n | wc_adu_ph | wc_mal_ph | wc_fem_ph | avg_voc_dur_chi | ... | avg_voc_dur_mal | voc_dur_chi_ph | voc_dur_och_ph | voc_dur_mal_ph | voc_dur_fem_ph | voc_chi_ph | voc_och_ph | voc_mal_ph | avg_voc_dur_fem | lena_CTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BN32_010007.mp3 | 1 | 50464512 | 151.805689 | 0.447767 | 0.544608 | 1372.431145 | 624.116131 | 748.315014 | 1051.931961 | ... | 1569.945055 | 178674.867598 | 85687.462905 | 163065.561795 | 194414.007214 | 169.854015 | 77.900288 | 103.86705 | 1280.676692 | 515 |
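The per-hour columns appear to be normalized by the recording duration. Assuming `duration_its` is expressed in milliseconds (50,464,512 ms is about 14 hours), multiplying each `voc_*_ph` rate by the number of hours should give whole vocalization counts, and dividing `voc_dur_*_ph` by `voc_*_ph` should reproduce `avg_voc_dur_*`. The sketch below checks this in plain Python against the values copied from the table above; it is an interpretation of the output, not ChildProject's own code.

```python
# Values copied from the LenaMetrics output above (recording BN32_010007.mp3).
MS_PER_HOUR = 3_600_000
duration_its = 50464512  # assumption: duration in milliseconds

voc_ph = {"chi": 169.854015, "och": 77.900288, "fem": 151.805689, "mal": 103.86705}
voc_dur_ph = {"chi": 178674.867598, "och": 85687.462905,
              "fem": 194414.007214, "mal": 163065.561795}
avg_voc_dur = {"chi": 1051.931961, "och": 1099.96337,
               "fem": 1280.676692, "mal": 1569.945055}

hours = duration_its / MS_PER_HOUR

# Undo the per-hour normalization to recover raw vocalization counts.
voc_counts = {spk: rate * hours for spk, rate in voc_ph.items()}

# Average duration = total duration / number of vocalizations;
# the per-hour normalization cancels out in the ratio.
derived_avg = {spk: voc_dur_ph[spk] / voc_ph[spk] for spk in voc_ph}

for spk in voc_ph:
    print(f"{spk}: ~{voc_counts[spk]:.0f} vocalizations, "
          f"avg {derived_avg[spk]:.1f} ms (reported {avg_voc_dur[spk]:.1f} ms)")
```

For this recording, the recovered counts are about 2,381 (CHI), 1,092 (OCH), 2,128 (FEM), and 1,456 (MAL) vocalizations, all within rounding error of whole numbers.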


By default, metrics are computed for each recording, but they can also be aggregated by child, for instance. Here the two are equivalent, since the corpus contains a single recording from a single child:

```python
metrics = LenaMetrics(project, set="its", by="child_id")
metrics.extract()
```

| | child_id | duration_its | voc_fem_ph | lp_dur | lp_n | wc_adu_ph | wc_mal_ph | wc_fem_ph | avg_voc_dur_chi | avg_voc_dur_och | ... | avg_voc_dur_mal | voc_dur_chi_ph | voc_dur_och_ph | voc_dur_mal_ph | voc_dur_fem_ph | voc_chi_ph | voc_och_ph | voc_mal_ph | avg_voc_dur_fem | lena_CTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 50464512.0 | 151.805689 | 0.447767 | 0.544608 | 1372.431145 | 624.116131 | 748.315014 | 1051.931961 | 1099.96337 | ... | 1569.945055 | 178674.867598 | 85687.462905 | 163065.561795 | 194414.007214 | 169.854015 | 77.900288 | 103.86705 | 1280.676692 | 515.0 |
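Why grouping by child changes nothing here can be made concrete with one plausible way of combining rows: convert each row's per-hour rate back into a count, sum counts and hours within each group, and divide (a duration-weighted mean). This is an assumption about the aggregation, not ChildProject's documented behavior, and `pool_rate_by` is a hypothetical helper. With a single recording, the pooled rate equals the per-recording rate; the second usage takes the 06:00 to 10:00 rows of the hourly example below, where the zero-duration row is skipped.

```python
from collections import defaultdict

MS_PER_HOUR = 3_600_000


def pool_rate_by(rows, key, rate_col, duration_col="duration_its"):
    """Pool a per-hour rate over all rows sharing the same `key` value.

    Assumption: each rate is a count divided by the row's duration in hours,
    so pooling sums the implied counts and divides by the total hours.
    """
    counts = defaultdict(float)
    hours = defaultdict(float)
    for row in rows:
        h = row[duration_col] / MS_PER_HOUR
        if h == 0:
            continue  # no recorded time: the rate is undefined, skip the row
        counts[row[key]] += row[rate_col] * h
        hours[row[key]] += h
    return {k: counts[k] / hours[k] for k in counts}


# The single per-recording row from the first LenaMetrics output.
rows = [{"child_id": 1, "duration_its": 50464512, "voc_fem_ph": 151.805689}]
print(pool_rate_by(rows, "child_id", "voc_fem_ph"))

# Hourly rows 6 to 9 from the morning example (row 6 has zero duration).
hourly = [
    {"child_id": 1, "duration_its": 0.0, "voc_fem_ph": None},
    {"child_id": 1, "duration_its": 3600000.0, "voc_fem_ph": 204.0},
    {"child_id": 1, "duration_its": 3600000.0, "voc_fem_ph": 61.0},
    {"child_id": 1, "duration_its": 3600000.0, "voc_fem_ph": 0.0},
]
print(pool_rate_by(hourly, "child_id", "voc_fem_ph"))
```

Pooling the three recorded morning hours gives (204 + 61 + 0) / 3, or about 88.3 female vocalizations per hour.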


We can also derive metrics per time period (e.g., every hour), and restrict them to a specific part of the day (for instance, the morning):

```python
metrics = LenaMetrics(project, set="its", from_time="07:00:00", to_time="12:00:00", period="1H")
metrics.extract()
```

| | recording_filename | period_start | period_end | child_id | duration_its | voc_fem_ph | lp_dur | lp_n | wc_adu_ph | wc_mal_ph | ... | avg_voc_dur_mal | voc_dur_chi_ph | voc_dur_och_ph | voc_dur_mal_ph | voc_dur_fem_ph | voc_chi_ph | voc_och_ph | voc_mal_ph | avg_voc_dur_fem | lena_CTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BN32_010007.mp3 | 00:00:00 | 01:00:00 | 1 | 0.0 |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | BN32_010007.mp3 | 01:00:00 | 02:00:00 | 1 | 0.0 |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | BN32_010007.mp3 | 02:00:00 | 03:00:00 | 1 | 0.0 |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | BN32_010007.mp3 | 03:00:00 | 04:00:00 | 1 | 0.0 |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | BN32_010007.mp3 | 04:00:00 | 05:00:00 | 1 | 0.0 |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 5 | BN32_010007.mp3 | 05:00:00 | 06:00:00 | 1 | 0.0 |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 6 | BN32_010007.mp3 | 06:00:00 | 07:00:00 | 1 | 0.0 |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 7 | BN32_010007.mp3 | 07:00:00 | 08:00:00 | 1 | 3600000.0 | 204.0 | 0.560906 | 0.658824 | 1391.64 | 543.87 | ... | 1345.535714 | 251920.0 | 158080.0 | 150700.0 | 246170.0 | 222.0 | 135.0 | 112.0 | 1206.715686 | 53.0 |
| 8 | BN32_010007.mp3 | 08:00:00 | 09:00:00 | 1 | 3600000.0 | 61.0 | 0.609316 | 0.649007 | 1729.27 | 1417.5 | ... | 1715.891089 | 145960.0 | 155130.0 | 346610.0 | 73270.0 | 134.0 | 139.0 | 202.0 | 1201.147541 | 45.0 |
| 9 | BN32_010007.mp3 | 09:00:00 | 10:00:00 | 1 | 3600000.0 | 0.0 |  |  | 0.0 | 0.0 | ... |  | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  | 0.0 |
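In this output, hourly periods before 07:00 are still listed but have a `duration_its` of 0 and empty metrics. A minimal sketch of how recorded time could be allocated to clock-hour periods clipped to a time-of-day window is shown below. `hourly_coverage` is a hypothetical helper, not ChildProject's implementation; it assumes times in seconds since midnight and does not handle recordings that cross midnight.

```python
HOUR_S = 3600  # seconds per hour


def hourly_coverage(start_s, end_s, from_s=7 * HOUR_S, to_s=12 * HOUR_S):
    """Split [start_s, end_s) into clock-hour bins, keeping only [from_s, to_s).

    Times are seconds since midnight. Returns {hour: seconds covered}.
    """
    # Clip the recorded interval to the time-of-day window.
    lo, hi = max(start_s, from_s), min(end_s, to_s)
    coverage = {}
    t = lo
    while t < hi:
        hour = t // HOUR_S
        # Stop at the next hour boundary or at the end of the clipped interval.
        chunk_end = min((hour + 1) * HOUR_S, hi)
        coverage[hour] = coverage.get(hour, 0) + (chunk_end - t)
        t = chunk_end
    return coverage


# Hypothetical recording running from 06:30 to 09:15.
print(hourly_coverage(6 * HOUR_S + 1800, 9 * HOUR_S + 900))  # → {7: 3600, 8: 3600, 9: 900}
```

The half hour recorded before 07:00 is discarded, and the last period only receives the 15 minutes actually recorded.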


Let's now compare the metrics derived from the LENA annotations with those derived from the VTC annotations:

```python
# LENA metrics from the "its" annotation set
lena = LenaMetrics(project, set="its")
lena = lena.extract()

# ACLEW metrics from the VTC annotations only (ALICE and VCM disabled)
vtc = AclewMetrics(project, vtc="vtc", alice=None, vcm=None)
vtc = vtc.extract()

# remove metrics that are not available in both pipelines
common_metrics = sorted(set(lena.columns) & set(vtc.columns))
lena = lena[common_metrics]
vtc = vtc[common_metrics]
```

```
The ALICE set ('None') was not found in the index.
The vcm set ('None') was not found in the index.
```

These messages are expected, since `alice` and `vcm` were set to `None`.


```python
vtc
```

| | avg_voc_dur_chi | avg_voc_dur_fem | avg_voc_dur_mal | avg_voc_dur_och | child_id | recording_filename | voc_chi_ph | voc_dur_chi_ph | voc_dur_fem_ph | voc_dur_mal_ph | voc_dur_och_ph | voc_fem_ph | voc_mal_ph | voc_och_ph |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1063.07 | 958.133568 | 1258.424329 | 874.468354 | 1 | BN32_010007.mp3 | 363.820025 | 386766.153609 | 330065.159453 | 250914.258321 | 68994.686801 | 344.487627 | 199.387641 | 78.899009 |


```python
lena
```

| | avg_voc_dur_chi | avg_voc_dur_fem | avg_voc_dur_mal | avg_voc_dur_och | child_id | recording_filename | voc_chi_ph | voc_dur_chi_ph | voc_dur_fem_ph | voc_dur_mal_ph | voc_dur_och_ph | voc_fem_ph | voc_mal_ph | voc_och_ph |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1051.931961 | 1280.676692 | 1569.945055 | 1099.96337 | 1 | BN32_010007.mp3 | 169.854015 | 178674.867598 | 194414.007214 | 163065.561795 | 85687.462905 | 151.805689 | 103.86705 | 77.900288 |
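Since both pipelines now share the same columns, one simple way to compare them is the ratio of each VTC metric to its LENA counterpart. The plain-Python sketch below uses the two rows copied from the outputs above; `vtc_row` and `lena_row` are illustrative names for those copies.

```python
# Rows copied from the `vtc` and `lena` outputs above (numeric metrics only).
vtc_row = {
    "avg_voc_dur_chi": 1063.07, "avg_voc_dur_fem": 958.133568,
    "avg_voc_dur_mal": 1258.424329, "avg_voc_dur_och": 874.468354,
    "voc_chi_ph": 363.820025, "voc_dur_chi_ph": 386766.153609,
    "voc_dur_fem_ph": 330065.159453, "voc_dur_mal_ph": 250914.258321,
    "voc_dur_och_ph": 68994.686801, "voc_fem_ph": 344.487627,
    "voc_mal_ph": 199.387641, "voc_och_ph": 78.899009,
}
lena_row = {
    "avg_voc_dur_chi": 1051.931961, "avg_voc_dur_fem": 1280.676692,
    "avg_voc_dur_mal": 1569.945055, "avg_voc_dur_och": 1099.96337,
    "voc_chi_ph": 169.854015, "voc_dur_chi_ph": 178674.867598,
    "voc_dur_fem_ph": 194414.007214, "voc_dur_mal_ph": 163065.561795,
    "voc_dur_och_ph": 85687.462905, "voc_fem_ph": 151.805689,
    "voc_mal_ph": 103.86705, "voc_och_ph": 77.900288,
}

# Ratio > 1: VTC yields a larger value than LENA for that metric.
ratios = {m: vtc_row[m] / lena_row[m] for m in sorted(vtc_row)}
for metric, r in ratios.items():
    print(f"{metric:>16}  VTC/LENA = {r:.2f}")
```

With the `vtc` and `lena` DataFrames themselves, the same ratios could be obtained with `vtc[cols].div(lena[cols])`, where `cols` lists the numeric metric columns (excluding `child_id` and `recording_filename`).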


## Custom metrics

The package also supports custom metrics computed from custom annotation sets (beyond LENA, VTC, ALICE, and VCM); see the [custom metrics documentation](https://childproject.readthedocs.io/en/latest/metrics.html#custom-metrics).
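To give an idea of what a custom metric computes, here is a standalone sketch of one that is not in the lists above: the median duration of a speaker's vocalizations. It assumes segments stored as dictionaries with ChildProject-style `speaker_type`, `segment_onset`, and `segment_offset` fields (in milliseconds); it does not use the package's custom-metrics interface, which is described in the linked documentation, and the segments are hypothetical.

```python
from statistics import median


def median_voc_dur(segments, speaker_type):
    """Median duration (ms) of one speaker type's vocalizations, or None."""
    durations = [s["segment_offset"] - s["segment_onset"]
                 for s in segments if s["speaker_type"] == speaker_type]
    # No vocalizations for this speaker: the metric is undefined.
    return median(durations) if durations else None


# Hypothetical segments, for illustration only.
segments = [
    {"speaker_type": "CHI", "segment_onset": 0, "segment_offset": 800},
    {"speaker_type": "FEM", "segment_onset": 1000, "segment_offset": 2500},
    {"speaker_type": "CHI", "segment_onset": 3000, "segment_offset": 3600},
    {"speaker_type": "CHI", "segment_onset": 4000, "segment_offset": 5200},
]
print(median_voc_dur(segments, "CHI"))  # → 800
print(median_voc_dur(segments, "MAL"))  # → None
```

Unlike an average, the median is not pulled up by a few unusually long segments.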