In [1]:
# default_exp motivating_example

from nbdev.showdoc import show_doc

%load_ext autoreload
%autoreload 2

In [2]:
#hide
#export
import pandas as pd
import numpy as np
import numexpr as ne

from autoneuro import calculators
from autoneuro import operators
from autoneuro import domains

# Motivating Example

> Processing the Brief Visuospatial Memory Test (BVMT) from raw data to an age-adjusted score.

## Why do we need this?

The BVMT requires measuring the subject's responses (working diagram of BVMT).

These are recorded as raw values. Below are imagined test results for two individuals.

### Measured Values

In [3]:
#hide

import yaml

DATA = [{'trial1': 5, 'trial2': 6, 'trial3': 7,
        'delay': 8, 'hits': 6, 'false_pos': 2,
        'copy': 12, 'age': 32},
        {'trial1': 4, 'trial2': 6, 'trial3': 5,
        'delay': 6, 'hits': 4, 'false_pos': 1,
        'copy': 8, 'age': 71}]

P1_data, P2_data = DATA

DATA = pd.DataFrame(DATA, index = ['P1', 'P2']).T

In [4]:
DATA

| | P1 | P2 |
|---|---|---|
| trial1 | 5 | 4 |
| trial2 | 6 | 6 |
| trial3 | 7 | 5 |
| delay | 8 | 6 |
| hits | 6 | 4 |
| false_pos | 2 | 1 |
| copy | 12 | 8 |
| age | 32 | 71 |


Which patient is "impaired"?
Which patient is "more impaired"?
We could naively assume that because P2 scored lower than P1 on most measures, P2 is likely impaired while P1 is normal.

However, P2 is also much older than P1 (71 vs. 32), and that should be taken into account.
This is where age-matched control data comes in handy.

### Intermediate Values

First, we need to calculate some intermediate values from these test results.
  - `immediate`: the sum of the three trials
  - `recognition`: the number of hits minus the number of false-positive recognitions
  - `retention`: the ratio of the delayed-recall score to the larger of the trial 2 and trial 3 scores
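The three definitions above can be sketched in plain Python. This is only an illustration, not `autoneuro`'s implementation: the function name `derive_bvmt_values` is hypothetical, and the clipping of `retention` to [0, 1] is taken from the `explain` output shown in this notebook.

```python
def derive_bvmt_values(raw):
    """Compute the three intermediate BVMT values from one subject's raw scores.

    Hypothetical helper for illustration; assumes at least one correct
    response in trial 2 or trial 3 so the retention denominator is non-zero.
    """
    immediate = raw["trial1"] + raw["trial2"] + raw["trial3"]
    recognition = raw["hits"] - raw["false_pos"]

    # Retention compares delayed recall with the better of trials 2 and 3.
    retention_denom = max(raw["trial2"], raw["trial3"])
    retention = raw["delay"] / retention_denom

    # Clip to [0, 1] so recalling more after the delay is capped at 1.
    retention = min(max(retention, 0.0), 1.0)

    return {
        "immediate": immediate,
        "recognition": recognition,
        "retention": retention,
    }


p1 = {"trial1": 5, "trial2": 6, "trial3": 7, "delay": 8, "hits": 6, "false_pos": 2}
p2 = {"trial1": 4, "trial2": 6, "trial3": 5, "delay": 6, "hits": 4, "false_pos": 1}

print(derive_bvmt_values(p1))
print(derive_bvmt_values(p2))
```

For P1 the unclipped ratio is 8/7, which is why the clipping step matters.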


`autoneuro` can facilitate this _with explanations_!
We'll create a `TestCalculator` called `bvmt_derived_calculator`.

In [5]:
#hide

from autoneuro import calculators
from autoneuro import operators

# Use a context manager so the config file is closed after loading.
with open('data/test_calculators/BVMT.yaml') as handle:
    bvmt_config = yaml.full_load(handle)
bvmt_derived_calculator = calculators.TestCalculator.from_config(bvmt_config)

In [6]:
bvmt_derived_calculator.explain(P1_data)


Input: 
 delay:8
 false_pos:2
 hits:6
 trial1:5
 trial2:6
 trial3:7
Calculating: ['immediate']
Used Equation: trial1+trial2+trial3 = 18 = immediate 

Calculating: ['retention_denom']
Aggregation: max [trial2, trial3]  = 7 

Calculating: ['retention']
Used Equation: delay/retention_denom = 1.1428571428571428 = retention 

Calculating: ['retention']
Clipped retention to [0, 1] 

Calculating: ['recognition']
Used Equation: hits-false_pos = 4.0 = recognition 

Resulting in: 
 immediate:18.0
 recognition:4.0
 retention:1.0
 retention_denom:7.0


In [7]:
bvmt_derived_calculator.explain(P2_data)

Input: 
 delay:6
 false_pos:1
 hits:4
 trial1:4
 trial2:6
 trial3:5
Calculating: ['immediate']
Used Equation: trial1+trial2+trial3 = 15 = immediate 

Calculating: ['retention_denom']
Aggregation: max [trial2, trial3]  = 6 

Calculating: ['retention']
Used Equation: delay/retention_denom = 1.0 = retention 

Calculating: ['retention']
Clipped retention to [0, 1] 

Calculating: ['recognition']
Used Equation: hits-false_pos = 3.0 = recognition 

Resulting in: 
 immediate:15.0
 recognition:3.0
 retention:1.0
 retention_denom:6.0


### Age Based Normalization

These derived values are what Heaton et al. normalized across ages.
Let's take a look at one of the norm definitions.

In [8]:
!head -n 19 data/norms/from_kate/heaton_bvmt.yaml

short_name: 'heatonBVMT'
reference: 'Excel Sheet Lookup Conversion'

operations:
  - type: normative_lookup
    measure_col: immediate
    filter_cols: ['age']
    out_name: 'heaton_immediate'

    table:
      - filter: (18 <= age) & (age <= 21)
        mean: 28.74
        std: 4.32
      - filter: (20 <= age) & (age <= 23)
        mean: 28.44
        std: 4.38
      - filter: (24 <= age) & (age <= 25)
        mean: 27.83
        std: 4.48


The norm file is a collection of filters (in this case age ranges), each of which defines the mean and standard deviation of a control group.
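As a rough sketch of what such a lookup involves, the table can be modeled as a list of age ranges with their mean and standard deviation. The names `IMMEDIATE_NORMS` and `lookup_norm` are hypothetical, the numeric bounds stand in for the filter expressions in the YAML file, and returning the first matching row is an assumption (the notebook does not say how overlapping ranges are resolved). Only rows quoted in this notebook are included.

```python
# Each row: (min_age, max_age, mean, std) for the `immediate` measure.
# Values are copied from the norm rows quoted in this notebook.
IMMEDIATE_NORMS = [
    (18, 21, 28.74, 4.32),
    (30, 33, 26.92, 4.64),
    (71, 74, 20.83, 5.71),
]


def lookup_norm(age, table=IMMEDIATE_NORMS):
    """Return (mean, std) for the first row whose age range contains `age`.

    Returns None when no row matches (assumed behavior for this sketch).
    """
    for low, high, mean, std in table:
        if low <= age <= high:
            return mean, std
    return None


print(lookup_norm(32))
print(lookup_norm(71))
```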

For example:

In [9]:
#hide
# Use a context manager so the norm file is closed after loading.
with open('data/norms/from_kate/heaton_bvmt.yaml') as handle:
    heaton_config = yaml.full_load(handle)
heaton_calculator = calculators.TestCalculator.from_config(heaton_config)

heaton_bvmt_calculator = bvmt_derived_calculator + heaton_calculator

immed = heaton_calculator.operations[0]

NORM_INFO = pd.DataFrame([immed.lookup_norm(P1_data),
                          immed.lookup_norm(P2_data)],
                         columns = ['Filter', 'Mean', 'STD'],
                         index = ['P1', 'P2'])

In [10]:
NORM_INFO

| | Filter | Mean | STD |
|---|---|---|---|
| P1 | (30 <= age) & (age <= 33) | 26.92 | 4.64 |
| P2 | (71 <= age) & (age <= 74) | 20.83 | 5.71 |


We'd expect P1 to have an `immediate` score of ~27 +/- 4.6.
We actually saw 18, about 1.9 SDs below age-matched controls.

For P2 we'd expect to see ~21 +/- 5.7. We actually saw 15, about 1.0 SD below average.

So once age is taken into account the naive reading flips: P1 sits close to the impaired range, while P2 is only about 1 SD below their peers.
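The arithmetic behind those statements is the standard z-score, (observed - mean) / std. A minimal sketch, using a hypothetical helper `z_score` and the norm values from the table above:

```python
def z_score(observed, mean, std):
    """Number of standard deviations `observed` lies from the control mean."""
    return (observed - mean) / std


# Observed `immediate` scores and the matched age-band norms from this notebook.
p1_z = z_score(18, 26.92, 4.64)
p2_z = z_score(15, 20.83, 5.71)

print(round(p1_z, 2), round(p2_z, 2))  # -1.92 -1.02
```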

Let's see an explanation.

In [11]:
heaton_bvmt_calculator.explain(P1_data)

Input: 
 age:32
 delay:8
 false_pos:2
 hits:6
 trial1:5
 trial2:6
 trial3:7
Calculating: ['immediate']
Used Equation: trial1+trial2+trial3 = 18 = immediate 

Calculating: ['retention_denom']
Aggregation: max [trial2, trial3]  = 7 

Calculating: ['retention']
Used Equation: delay/retention_denom = 1.1428571428571428 = retention 

Calculating: ['retention']
Clipped retention to [0, 1] 

Calculating: ['recognition']
Used Equation: hits-false_pos = 4.0 = recognition 

Calculating: ['heaton_immediate']
heaton_immediate:
 Matched (30 <= age) & (age <= 33)
 Expecting 26.92 +- 4.64
 Observed: 18.0
 Z: -1.9224137931034488 

Calculating: ['heaton_retention']
heaton_retention:
 Matched (30 <= age) & (age <= 33)
 Expecting 0.962 +- 0.0474
 Observed: 1.0
 Z: 0.8016877637130809 

Calculating: ['heaton_delay']
heaton_delay:
 Matched (30 <= age) & (age <= 33)
 Expecting 10.13 +- 1.6
 Observed: 8.0
 Z: -1.3312500000000005 

Calculating: ['heaton_recognition']
heaton_recognition:
 Matched (30 <= age) & 

In [12]:
heaton_bvmt_calculator.explain(P2_data)

Input: 
 age:71
 delay:6
 false_pos:1
 hits:4
 trial1:4
 trial2:6
 trial3:5
Calculating: ['immediate']
Used Equation: trial1+trial2+trial3 = 15 = immediate 

Calculating: ['retention_denom']
Aggregation: max [trial2, trial3]  = 6 

Calculating: ['retention']
Used Equation: delay/retention_denom = 1.0 = retention 

Calculating: ['retention']
Clipped retention to [0, 1] 

Calculating: ['recognition']
Used Equation: hits-false_pos = 3.0 = recognition 

Calculating: ['heaton_immediate']
heaton_immediate:
 Matched (71 <= age) & (age <= 74)
 Expecting 20.83 +- 5.71
 Observed: 15.0
 Z: -1.0210157618213658 

Calculating: ['heaton_retention']
heaton_retention:
 Matched (71 <= age) & (age <= 74)
 Expecting 0.878 +- 0.143
 Observed: 1.0
 Z: 0.8531468531468532 

Calculating: ['heaton_delay']
heaton_delay:
 Matched (71 <= age) & (age <= 74)
 Expecting 7.93 +- 2.36
 Observed: 6.0
 Z: -0.8177966101694915 

Calculating: ['heaton_recognition']
heaton_recognition:
 Matched (71 <= age) & (age <= 74)
 Exp

### Conclusion

We can compare the two patients' normalized scores side by side.

In [13]:
#hide

process_df = heaton_bvmt_calculator.process_dataframe(DATA.T)
cols = ['heaton_immediate', 'heaton_delay', 'heaton_retention']

In [14]:
process_df[cols].T

| | P1 | P2 |
|---|---|---|
| heaton_immediate | -1.922414 | -1.021016 |
| heaton_delay | -1.33125 | -0.817797 |
| heaton_retention | 0.801688 | 0.853147 |


These numbers are z-scores, meaning:
 - An average "control" scores 0, and the standard deviation of control scores is 1.
 - A negative value means the individual scored below age-matched controls; the more negative, the more impaired.

While cutoffs can vary by application, here we treat anything below -2 as "impaired", anything between -2 and -1 as "borderline", and everything above -1 as "normal".

Therefore we'd say P1 is nearly impaired on their `immediate` score and borderline on delayed memory.
P2 is right at the 1 SD mark on `immediate` (z of about -1.02) and within 1 SD of age-matched controls on the other two measures.
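The cutoff rule can be written down as a small function. This is a sketch only: `classify` is a hypothetical helper, the default cutoffs are the ones quoted in this section, and how scores exactly equal to -2 or -1 are treated is an assumption.

```python
def classify(z, impaired_cutoff=-2.0, borderline_cutoff=-1.0):
    """Map a z-score to a label using the cutoffs described in this notebook.

    Assumption: a score exactly on a cutoff falls into the milder category.
    """
    if z < impaired_cutoff:
        return "impaired"
    if z < borderline_cutoff:
        return "borderline"
    return "normal"


# z-scores copied from the comparison table in this notebook.
scores = {
    "P1": {"heaton_immediate": -1.922414, "heaton_delay": -1.33125, "heaton_retention": 0.801688},
    "P2": {"heaton_immediate": -1.021016, "heaton_delay": -0.817797, "heaton_retention": 0.853147},
}

labels = {
    patient: {measure: classify(z) for measure, z in measures.items()}
    for patient, measures in scores.items()
}
print(labels)
```

Applied strictly, the rule places P2's `immediate` score just inside the borderline band, which shows how sensitive the labels are to where the cutoffs sit.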