Make sure `impact2_engine` is on the Python search path one way or another.

In [1]:
import sys
sys.path.append('../../')  # parent directory of the impact2_engine package
import yaml
import numpy as np
from impact2_engine.Profile import Profile
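If you would rather not append to `sys.path` unconditionally, a small helper can first check whether the package is already importable. This is a minimal sketch: `ensure_on_path` is a hypothetical helper, not part of `impact2_engine`, and it assumes that `../../` (relative to the notebook) is the directory containing the `impact2_engine` folder.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_on_path(package: str, root: Path) -> bool:
    """Hypothetical helper: append `root` to sys.path only if `package`
    cannot already be imported. Returns True if it is importable afterwards."""
    if importlib.util.find_spec(package) is None:
        root_str = str(Path(root).resolve())
        if root_str not in sys.path:
            sys.path.append(root_str)
        # Clear finder caches so the new sys.path entry is picked up
        importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


# Usage in this notebook (assumes ../../ contains the impact2_engine folder):
# ensure_on_path('impact2_engine', Path('../../'))
```

Resolving the path to an absolute one keeps the entry valid even if the working directory changes later in the session.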

The config specification is more detailed than this example needs. We assume the data has already been processed, with its columns renamed to the short `var` names listed below.

In [2]:
with open('../../impact2_engine/config/profile_config.yml', 'r', encoding='utf-8') as stream:
    config = yaml.safe_load(stream)

yaml.dump(config, sys.stdout)

contents:
  CAT:
  - name: SITE_ID
    var: site
  - name: DONOR_SITE_STATUS
    var: status
  - name: GROUP
    var: group
  - name: GENDER
    var: gender
  DEM:
  - var: age
  - var: bmi
  - var: weight
  DTS:
  - format: '%Y-%m-%d'
    name: DONATION_DATE
    var: col_date
  - format: '%Y-%m-%d %H:%M:%S'
    name: PROCEDURE_START
    var: proc_start
  - format: '%Y-%m-%d %H:%M:%S'
    name: PROCEDURE_END
    var: proc_end
  IDS:
  - name: DONOR_NUMBER
    var: don_id
  - name: COLLECTION_NUMBER
    plan: 60000
    var: col_id
  - name: DEVICE_ID
    var: dev_id
  POP:
  - name: ITT
    var: itt
  - name: MITT
    var: mitt
  - name: PP
    var: pp
  SEV:
  - aes:
    - '1.1'
    - '1.2'
    - '1.3'
    - '1.4'
    - '1.5'
    - '1.6'
    - '2.1'
    - '3.1'
    - '3.2'
    - '3.3'
    - '3.4'
    - '3.5'
    - '3.6'
    - '3.7'
    - '4.1'
    - '4.2'
    - '4.3'
    - '5.1'
    - '5.2'
    - '6.1'
    - '6.2'
    - '7.1'
    - '7.2'
    - '7.3'
    - '8.1'
    - '9.1'
    - '10.1'
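To make "renamed columns" concrete: entries in `CAT`, `DTS`, `IDS` and `POP` pair a raw column `name` with the short `var` used throughout this notebook, while `DEM` entries carry only a `var`. The sketch below assumes that reading of the config. `rename_map` and `parse_dates` are hypothetical helpers, not `impact2_engine` functions; the config is an excerpt written as a Python dict rather than loaded from YAML, and keys such as `plan` and `aes` are simply ignored.

```python
import pandas as pd

# Excerpt of the config above, as yaml.safe_load would return it
contents = {
    "IDS": [
        {"name": "DONOR_NUMBER", "var": "don_id"},
        {"name": "COLLECTION_NUMBER", "plan": 60000, "var": "col_id"},
    ],
    "DTS": [{"format": "%Y-%m-%d", "name": "DONATION_DATE", "var": "col_date"}],
    "DEM": [{"var": "age"}],
    "SEV": [{"aes": ["1.1", "1.2"]}],
}


def rename_map(contents: dict) -> dict:
    """Map raw column names (`name`) to short names (`var`).

    Entries without a `name` (DEM, SEV) are skipped: they are assumed
    to need no renaming.
    """
    return {
        entry["name"]: entry["var"]
        for entries in contents.values()
        for entry in entries
        if "name" in entry
    }


def parse_dates(df: pd.DataFrame, contents: dict) -> pd.DataFrame:
    """Parse each DTS column using the strftime format given in the config."""
    df = df.copy()
    for entry in contents.get("DTS", []):
        df[entry["var"]] = pd.to_datetime(df[entry["var"]], format=entry["format"])
    return df


# One toy raw row using the original column names
raw = pd.DataFrame({
    "DONOR_NUMBER": ["439443"],
    "COLLECTION_NUMBER": ["5161035631"],
    "DONATION_DATE": ["2020-02-12"],
    "age": [18.0],
})
processed = parse_dates(raw.rename(columns=rename_map(contents)), contents)
print(processed.columns.tolist())  # ['don_id', 'col_id', 'col_date', 'age']
```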

Instantiate `Profile` with the config, after prefixing `data_path` with the package's data directory. The resulting object holds both `.data` and `.contents`. This example has no missing data, so `.missing` is `None`.

In [3]:
config['data_path'] = '../../impact2_engine/data/' + config['data_path']  # make the path relative to this notebook
prof = Profile(**config)
prof.missing is None

True

All three methods share the same interface: they take `don_ids`, a list of `'don_id'` values. If `don_ids` is not specified, ALL donors are included by default.

In [4]:
np.random.seed(6)
don_ids = np.random.choice(prof.data['don_id'].unique(), size=2, replace=False).tolist()
don_ids

['439443', '383902']
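The "omit `don_ids` to get every donor" convention is a common default pattern. A minimal sketch of how it could be resolved, assuming the data is a pandas DataFrame with a `'don_id'` column; `select_donors` is hypothetical and not the actual `Profile` code, and the rows (including donor `111111`) are toy data.

```python
from typing import Optional

import pandas as pd


def select_donors(data: pd.DataFrame, don_ids: Optional[list] = None) -> pd.DataFrame:
    """Hypothetical helper: keep rows for the given donors, or for all
    donors when `don_ids` is None."""
    if don_ids is None:
        don_ids = data["don_id"].unique().tolist()
    return data[data["don_id"].isin(don_ids)]


# Toy data: IDs are strings, like the sampled don_ids above
data = pd.DataFrame({
    "don_id": ["439443", "439443", "383902", "111111"],
    "col_id": [1, 2, 3, 4],
})
print(len(select_donors(data)))                        # 4
print(len(select_donors(data, ["439443", "383902"])))  # 3
```

Since the sampled IDs come back as strings, passing integers such as `[439443]` would match nothing in this sketch; presumably the same care applies to `Profile`.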

`summary_dem` reports donor demographics, taking for each column the FIRST value encountered after sorting by `col_date`. The result has a single-level column index, so no MultiIndex is needed.

In [5]:
prof.summary_dem(don_ids)

   donor_id  site  group   status  gender   age  first_weight  first_bmi  first_hct
0    383902   448      B  donated    male  24.0         150.0  22.807169       44.0
1    439443   516      B    naive  female  18.0         228.0   40.38794       43.0
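The "first value after sorting by `col_date`" rule maps onto pandas' `sort_values` followed by `groupby(...).first()`. A sketch under that assumption, using three rows copied from donor 383902's chronology output (shuffled on purpose); the `first_` prefix mirrors the column names above, but whether `Profile` works this way is not confirmed.

```python
import pandas as pd

# Three collections of donor 383902, deliberately out of chronological order
rows = pd.DataFrame({
    "don_id": ["383902", "383902", "383902"],
    "col_date": pd.to_datetime(["2020-03-15", "2020-03-13", "2020-03-20"]),
    "weight": [154.0, 150.0, 150.0],
    "bmi": [23.41536, 22.807169, 22.807169],
    "hct": [49.0, 44.0, 51.0],
})

first = (
    rows.sort_values("col_date")
        .groupby("don_id")[["weight", "bmi", "hct"]]
        .first()  # first NON-NULL value per column within each donor
        .add_prefix("first_")
        .reset_index()
)
print(first)
```

One subtlety: `GroupBy.first()` skips missing values, so if a donor's earliest record has a blank `hct`, this picks the next recorded one rather than the blank.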


In the donor collection summary (`summary_col`), every variable has exactly ONE aggregation metric, so the two-level column MultiIndex (metric, variable) is essentially redundant.

In [6]:
prof.summary_col(don_ids)

            diff                           mean             nunique     sum
             bmi  hct  weight  duration_minutes      speed   col_id  all_ae  sig_hyp
don_id
383902  0.760239  7.0     5.0              48.0  14.678127        4       0        0
439443   0.35428  1.0     2.0         56.666667  15.453263        3       1        0
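Checking these numbers against the chronology suggests what each metric is: `diff` matches the maximum minus the minimum (for donor 383902, weight ranges from 149 to 154, giving 5.0), `mean` and `nunique` are the plain aggregates. Assuming that reading, pandas named aggregation reproduces the values, up to rounding, with a flat column index, which avoids the redundant MultiIndex. This is a sketch, not the `Profile` implementation; `all_ae` and `sig_hyp` are left out because their derivation is not shown here.

```python
import pandas as pd

# Chronology rows for donor 383902 (values copied from the chronology output)
chron = pd.DataFrame({
    "don_id": ["383902"] * 4,
    "col_id": ["4480244087", "4480244477", "4480245355", "4480245708"],
    "duration_minutes": [44.0, 50.0, 39.0, 59.0],
    "speed": [16.977273, 13.78, 16.717949, 11.237288],
    "weight": [150.0, 154.0, 150.0, 149.0],
    "bmi": [22.807169, 23.41536, 22.807169, 22.655122],
    "hct": [44.0, 49.0, 51.0, 50.0],
})


def value_range(s: pd.Series) -> float:
    """Assumed meaning of `diff`: max minus min over a donor's collections."""
    return s.max() - s.min()


# Named aggregation: one flat output column per (variable, metric) pair
summary = chron.groupby("don_id").agg(
    diff_bmi=("bmi", value_range),
    diff_hct=("hct", value_range),
    diff_weight=("weight", value_range),
    mean_duration_minutes=("duration_minutes", "mean"),
    mean_speed=("speed", "mean"),
    nunique_col_id=("col_id", "nunique"),
)
print(summary.round(6))
```

If you keep the output of `summary_col` as is, and assuming it is a DataFrame with the two column levels shown above, `result.columns = ['_'.join(col) for col in result.columns]` flattens them into names like `diff_bmi`.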


`chronology` lists every collection, grouped by `'don_id'` and sorted by `'col_date'` within each donor.

In [7]:
prof.chronology(don_ids)

   don_id      col_id            dev_id    col_date           proc_start             proc_end  duration_minutes      speed     yield  target_vol  actual_vol  weight        bmi   hct  days_total  days_lag   AE
0  439443  5161035631  PCS300-18G152SPG  2020-02-12  2020-02-13 00:48:00  2020-02-13 01:42:00              54.0  16.074074       1.0       868.0       868.0   228.0   40.38794  43.0           0         0
1  439443  5161036048  PCS300-18F672SPG  2020-02-14  2020-02-14 20:47:00  2020-02-14 21:43:00              56.0  15.535714  0.998852       871.0       870.0   230.0   40.74222  43.0           2         2  1.1
2  439443  5161039557  PCS300-18F672SPG  2020-03-03  2020-03-03 22:33:00  2020-03-03 23:33:00              60.0      14.75       1.0       885.0       885.0   229.0   40.56508  42.0          20        18
3  383902  4480244087  PCS300-18B299SPG  2020-03-13  2020-03-13 18:55:00  2020-03-13 19:39:00              44.0  16.977273   1.00134       746.0       747.0   150.0  22.807169  44.0           0         0
4  383902  4480244477  PCS300-18B299SPG  2020-03-15  2020-03-15 19:01:00  2020-03-15 19:51:00              50.0      13.78  1.001453       688.0       689.0   154.0   23.41536  49.0           2         2
5  383902  4480245355  PCS300-17K561SPG  2020-03-20  2020-03-20 21:03:00  2020-03-20 21:42:00              39.0  16.717949  0.998469       653.0       652.0   150.0  22.807169  51.0           7         5
6  383902  4480245708  PCS300-17K891SPG  2020-03-22  2020-03-22 21:49:00  2020-03-22 22:48:00              59.0  11.237288  0.998494       664.0       663.0   149.0  22.655122  50.0           9         2
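The derived columns can be checked against this table: `days_total` is the number of days since the donor's first `col_date`, `days_lag` the days since the previous one, and the values are consistent with `speed = actual_vol / duration_minutes` and `yield = actual_vol / target_vol` (e.g. 870 / 56 ≈ 15.535714 and 870 / 871 ≈ 0.998852 for collection 5161036048). The sketch below recomputes them with pandas under those inferred definitions; the formulas are read off the output, not taken from the `impact2_engine` source.

```python
import pandas as pd

# Donor 439443's three collections, copied from the chronology (shuffled)
chron = pd.DataFrame({
    "don_id": ["439443"] * 3,
    "col_date": pd.to_datetime(["2020-03-03", "2020-02-12", "2020-02-14"]),
    "duration_minutes": [60.0, 54.0, 56.0],
    "target_vol": [885.0, 868.0, 871.0],
    "actual_vol": [885.0, 868.0, 870.0],
})

# Group by donor and order each donor's collections chronologically
chron = chron.sort_values(["don_id", "col_date"]).reset_index(drop=True)
by_donor = chron.groupby("don_id")["col_date"]

# Days since the donor's first collection, and since the previous one
chron["days_total"] = (chron["col_date"] - by_donor.transform("min")).dt.days
chron["days_lag"] = by_donor.diff().dt.days.fillna(0).astype(int)

# Inferred definitions of the rate and yield columns
chron["speed"] = chron["actual_vol"] / chron["duration_minutes"]
chron["yield"] = chron["actual_vol"] / chron["target_vol"]
print(chron[["col_date", "days_total", "days_lag", "speed", "yield"]])
```

In the output above, donors appear in the order of `don_ids` (439443 before 383902) rather than sorted by ID, so a faithful multi-donor version would order donors by their position in that list instead of sorting on `don_id`.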
