### Merging data from WardWatcher (ICNARC) and Philips (ICCA).

Here we use the following files:

* 'encounter_summary (1).rpt' - a tab-separated file containing the output of a simple SQL query run on ICCA to extract basic information about patient encounters (ITU stays).

* 'ICNARC 2015-2018 encounterIds and Readmissions.TXT' - a file containing ICNARC patient IDs and the corresponding 'CIS Patient ID', which links to encounterId in Philips.

* 'Philips encounterId Issue List (New).xlsx' - a file documenting known issues with either encounterIds in Philips or CIS Patient IDs in WW. We clean up the IDs using this file before joining the two datasets.

* 'ICNARC_Dataset_2015-2018__clean_.xml' - an XML file containing an export of the ICNARC dataset.

* 'ICNARC CMP Dataset Properties.xlsx' - a description of the variables in the ICNARC dataset.
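The extracts above are plain exports, so pandas can read them directly. A minimal sketch of loading a tab-separated extract like the `.rpt` file, assuming it has a single header row; the column names and values below are made up for illustration and are not the real ICCA schema:

```python
import io

import pandas as pd

# Hypothetical two-row extract standing in for 'encounter_summary (1).rpt';
# the column names are illustrative assumptions, not the real schema.
raw = (
    "encounterId\tinTime\toutTime\tgender\n"
    "1001\t2016-01-01 08:00\t2016-01-03 10:30\tMale\n"
    "1002\t2016-02-10 14:00\t2016-02-11 09:15\tFemale\n"
)

encounters = pd.read_csv(
    io.StringIO(raw),
    sep="\t",                          # tab-separated, as described for the .rpt file
    dtype={"encounterId": str},        # keep IDs as text so they compare cleanly with other ID columns
    parse_dates=["inTime", "outTime"], # convert admission/discharge times to datetimes
)
print(encounters.dtypes)
```

Reading IDs as strings avoids silent mismatches later, when IDs from one source are numeric and from the other are text.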

In [1]:
VERBOSE = False ## For data protection reasons, we suppress printing of results and data summaries.

In [2]:
from clean_encounterids import *

In [3]:
icnarc_numbers = clean_icnarc_cis_ids('../ICNARC 2015-2018 encounterIds and Readmissions.TXT',
                                      '../Philips encounterId Issue List (New).xlsx',
                                      verbose=VERBOSE)
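`clean_icnarc_cis_ids` and `clean_philips_encounterids` come from the local `clean_encounterids` module, whose code is not shown here. As a minimal sketch of the general idea, known-bad IDs listed in an issue list can be replaced while the original value is kept alongside. The issue-list column names (`wrong_id`, `correct_id`), the helper `apply_id_corrections` and the data are all hypothetical; the `... Original` column mirrors the `CIS Patient ID Original` column that appears in the merged output.

```python
import pandas as pd

# Hypothetical issue list: each row maps a known-bad ID to its corrected value.
# 'wrong_id' and 'correct_id' are assumed names, not the real spreadsheet columns.
issues = pd.DataFrame({"wrong_id": ["A12", "B07"], "correct_id": ["A21", "B70"]})

icnarc = pd.DataFrame({
    "ICNARC number": [1, 2, 3],
    "CIS Patient ID": ["A12", "C33", "B07"],
})


def apply_id_corrections(df, column, issues):
    """Return a copy of df with known-bad IDs in `column` replaced.

    The uncorrected value is kept in '<column> Original' so every change is traceable.
    """
    corrections = dict(zip(issues["wrong_id"], issues["correct_id"]))
    out = df.copy()
    out[column + " Original"] = out[column]
    # Series.replace with a dict swaps exact matches only; other IDs pass through unchanged.
    out[column] = out[column].replace(corrections)
    return out


cleaned = apply_id_corrections(icnarc, "CIS Patient ID", issues)
print(cleaned)
```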

In [4]:
philips_data = clean_philips_encounterids('../encounter_summary (1).rpt',
                                          '../Philips encounterId Issue List (New).xlsx',
                                          verbose=VERBOSE)

In [5]:
merged_data = join_icnarc_to_philips(philips_data, icnarc_numbers, verbose=VERBOSE)
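`join_icnarc_to_philips` is defined in `clean_encounterids`, and its join strategy is not shown here. As a hedged sketch of one way to audit such a join, an outer `pd.merge` with `indicator=True` reports how many records matched and how many exist on only one side. The key names follow the file descriptions above; the data is invented:

```python
import pandas as pd

# Toy stand-ins for the cleaned tables; all values are invented for illustration.
philips = pd.DataFrame({
    "encounterId": ["A21", "C33", "D40"],
    "lengthOfStay (mins)": [300, 1440, 90],
})
icnarc = pd.DataFrame({
    "CIS Patient ID": ["A21", "C33", "E55"],
    "ICNARC number": [1, 2, 4],
})

merged = pd.merge(
    philips, icnarc,
    left_on="encounterId", right_on="CIS Patient ID",
    how="outer",     # keep rows from both sides so unmatched IDs stay visible
    indicator=True,  # adds '_merge' with values 'both', 'left_only' or 'right_only'
)
print(merged["_merge"].value_counts())
```

Rows marked `left_only` or `right_only` are candidates for further entries in the issue list.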

In [7]:
merged_data = combine_non_unique_encounters(merged_data)
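The column names printed by `combine_non_unique_encounters` (`inTime_min`, `outTime_max`, `ICNARC number_list`, ...) suggest a group-by on `CIS Patient ID` followed by one aggregation per column. A minimal sketch of that pattern, assuming this is roughly what the helper does; the rows and the choice of aggregations are illustrative:

```python
import pandas as pd

# Two ICU records share one CIS Patient ID and should collapse into one row.
stays = pd.DataFrame({
    "CIS Patient ID": ["A21", "A21", "C33"],
    "ICNARC number": [1, 5, 2],
    "inTime": pd.to_datetime(["2016-01-01 08:00", "2016-01-03 10:30", "2016-02-10 14:00"]),
    "outTime": pd.to_datetime(["2016-01-03 10:30", "2016-01-05 12:00", "2016-02-11 09:15"]),
    "lengthOfStay (mins)": [3030, 2970, 1155],
    "gender": ["Male", "Male", "Female"],
})

agg_spec = {
    "ICNARC number": ["count", list],  # how many records were combined, and which ones
    "inTime": ["min"],                 # earliest admission
    "outTime": ["max"],                # latest discharge
    "lengthOfStay (mins)": ["sum"],    # total time across the combined records
    "gender": ["first"],
}

combined = stays.groupby("CIS Patient ID").agg(agg_spec)
# Flatten the (column, aggregation) MultiIndex into names such as 'inTime_min'.
combined.columns = ["_".join(pair) for pair in combined.columns]
combined = combined.reset_index()
print(combined)
```

Flattening before `reset_index` keeps the key column named `CIS Patient ID`; flattening afterwards yields `CIS Patient ID_`, as in the output above.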

('\n', Index([u'CIS Patient ID_', u'Readmission during this hospital stay_first',
       u'ICNARC number_count', u'ICNARC number_list', u'tNumber_first',
       u'encounterId_original_count', u'encounterId_original_list',
       u'inTime_min', u'outTime_max', u'lengthOfStay (mins)_sum',
       u'gender_first', u'Unit ID_min', u'CIS Patient ID Original_count',
       u'CIS Patient ID Original_list', u'ptCensusId_count',
       u'ptCensusId_list', u'CIS Episode ID_count', u'CIS Episode ID_list',
       u'age_min'],
      dtype='object'))
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]


In [13]:
merged_data['age_min'].values.mean()

60.50744724865536
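`.values.mean()` averages the underlying NumPy array, which returns `nan` if any value is missing, so the finite result above implies `age_min` has no missing values in this extract. `Series.mean()` skips missing values by default (`skipna=True`), so the two can differ on other extracts. Illustrated with made-up ages:

```python
import numpy as np
import pandas as pd

ages = pd.Series([54.0, np.nan, 71.0])

numpy_mean = ages.values.mean()  # NumPy: a single NaN makes the result nan
pandas_mean = ages.mean()        # pandas: NaN is skipped, (54 + 71) / 2
print(numpy_mean, pandas_mean)   # nan 62.5
```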

In [15]:
print(sum(merged_data['gender_first'] == 'Male'))
print(sum(merged_data['gender_first'] == 'Female'))
print(len(merged_data['gender_first']))

0          Male
1          Male
2        Female
3        Female
4        Female
5          Male
6        Female
7        Female
8        Female
9          Male
10       Female
11          NaN
12       Female
13         Male
14         Male
15       Female
16         Male
17         Male
18         Male
19       Female
20         Male
21          NaN
22         Male
23       Female
24       Female
25       Female
26       Female
27       Female
28         Male
29         Male
         ...   
4804       Male
4805       Male
4806       Male
4807     Female
4808       Male
4809     Female
4810       Male
4811       Male
4812       Male
4813       Male
4814       Male
4815       Male
4816     Female
4817     Female
4818     Female
4819       Male
4820       Male
4821       Male
4822       Male
4823     Female
4824       Male
4825     Female
4826     Female
4827       Male
4828       Male
4829       Male
4830       Male
4831       Male
4832     Female
4833    Unknown
Name: gender_first, Leng
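The printed values include missing entries (`NaN`) and an `Unknown` category as well as `Male` and `Female`, so the two equality counts need not add up to the total length. A sketch using `value_counts(dropna=False)` to tabulate every category, including missing values, on made-up data:

```python
import numpy as np
import pandas as pd

gender = pd.Series(["Male", "Female", np.nan, "Female", "Unknown", "Male", "Male"])

# dropna=False keeps missing values as their own row instead of silently dropping them.
counts = gender.value_counts(dropna=False)
print(counts)

# Every record falls into exactly one category, so the counts add up to the length.
assert counts.sum() == len(gender)
```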