# Revised case normalization for Hirslanden Linde 2018

This Jupyter notebook normalizes the revised cases from DtoD.

Before running the notebook, add the `raw_data` folder to the root directory.

The raw data folder can be found here: https://aimedic.sharepoint.com/:f:/s/dev/Ejx_A1dg8gtPumFknOWOh0oBi6ofx9hctYiq3c-0gH9vYA?e=UmcgrS

Normalization:

-  Convert the column names to the names used in the database
-  Delete cases that are empty in the following columns (VALIDATION_COLS): 'case_id', 'patient_id', 'gender', 'age_years', 'duration_of_stay', 'pccl', 'drg'
-  Select the necessary columns (COLS_TO_SELECT): case_id, patient_id, gender, age_years, duration_of_stay, pccl, drg, pd, bfs_code, added_icds, removed_icds, added_chops, removed_chops
-  Still to do (TODO):
    -  Check CHOP upper/lowercase
    -  Check whether the PD changed. If it did, new and old PD are stored together with added and removed ICDs, respectively
    -  Pad case IDs with 0s
    -  Write function to validate cases


In [1]:
import pandas as pd
import os
from dataclasses import dataclass, field
import sys
sys.path.insert(0, '/home/jovyan/work')
sys.path.insert(1, '/home/jovyan/work/src')
sys.path.insert(2, '/home/jovyan/work/src/service')

from service import bfs_cases_db_service as bfs_db

from py.global_configs import *
from py.normalize import normalize



In [2]:
# List the keys of all files available for analysis

FILES_TO_ANALYZE.keys()


dict_keys(['Hirslanden Salem 2017', 'Hirslanden Beau Site 2017', 'Hirslanden Linde 2017', 'Hirslanden Linde 2018', 'Hirslanden Salem 2018', 'Hirslanden Beau Site 2018'])

In [3]:
file = FILES_TO_ANALYZE['Hirslanden Linde 2018']
file

FileInfo(path='/home/jovyan/work/src/revised_case_normalization/raw_data/HI-Bern_Salem_Beau Site_Linde.xlsx', hospital_name_db='Hirslanden Linde', year='2018', sheets=['Änderungen_LI_2018'])
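
The `FileInfo` value printed above suggests a small record type with four fields. A minimal sketch of how such a record and a lookup like `FILES_TO_ANALYZE` could be declared with `dataclasses` is shown here. The real definitions live in `py.global_configs` and may differ; `FILES_SKETCH` is a hypothetical name and the path is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class FileInfo:
    path: str
    hospital_name_db: str
    year: str
    # Each instance gets its own list of sheet names.
    sheets: list = field(default_factory=list)


# Hypothetical stand-in for FILES_TO_ANALYZE with a single entry.
FILES_SKETCH = {
    "Hirslanden Linde 2018": FileInfo(
        path="raw_data/example.xlsx",  # placeholder path
        hospital_name_db="Hirslanden Linde",
        year="2018",
        sheets=["Änderungen_LI_2018"],
    ),
}

info = FILES_SKETCH["Hirslanden Linde 2018"]
print(info.hospital_name_db, info.year, info.sheets)
```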

In [4]:

df_revised_case_d2d = normalize(file, 0)

Read 10 cases for Hirslanden Linde 2018
TYPES:
case_id             string
patient_id          string
gender              string
age_years            int64
duration_of_stay     int64
pccl                 int64
drg                 string
pd                  string
bfs_code            string
added_icds          string
removed_icds        string
added_chops         string
removed_chops       string
dtype: object
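
A type listing like the one above can be obtained on any frame with `DataFrame.astype`. The sketch below is an assumption about how such a cast might look, not the body of `normalize`; the `COLUMN_TYPES` mapping is hypothetical and the frame contains toy data.

```python
import pandas as pd

# Hypothetical target types for a few of the columns listed above.
COLUMN_TYPES = {
    "case_id": "string",
    "age_years": "int64",
    "pccl": "int64",
    "drg": "string",
}

# Toy frame where IDs arrive as integers and ages as text.
toy = pd.DataFrame({
    "case_id": [100, 101],
    "age_years": ["90", "72"],
    "pccl": [3, 3],
    "drg": ["G18B", "I09C"],
})

typed = toy.astype(COLUMN_TYPES)
print(typed.dtypes)
```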


In [5]:
df_revised_case_d2d.head()

| | case_id | patient_id | gender | age_years | duration_of_stay | pccl | drg | pd | bfs_code | added_icds | removed_icds | added_chops | removed_chops |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 715177 | 33100584B601FADF | M | 90 | 11 | 3 | G18B | C182 | M200 | D62 | | 990410::20180124 | |
| 1 | 716197 | DEE16D56D71DD96A | W | 72 | 14 | 3 | I09C | M4805 | M200 | N184 | N183 | | |
| 2 | 721128 | 34645F6C19043F5B | M | 71 | 14 | 3 | G21B | K565 | M200 | G819 | | | |
| 3 | 721977 | 0098EFC426FD8F26 | W | 84 | 11 | 3 | I46C | S7201 | M200 | T840 | | | |
| 4 | 725531 | EBF3D9B44B52E53F | M | 80 | 2 | 3 | F59E | I7022 | M100 | G2090 | | | |
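
The `added_chops` value `990410::20180124` in the first row appears to combine a CHOP code with a date, separated by colons. Assuming a `code:side:date` layout with an empty middle field (an interpretation of this one sample, not something the notebook states), an entry could be split as sketched below. `ChopEntry` and `parse_chop_entry` are hypothetical names; upper-casing the code is one way to address the CHOP upper/lowercase item on the TODO list.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional


class ChopEntry(NamedTuple):
    code: str
    side: str
    performed_on: Optional[date]


def parse_chop_entry(entry: str) -> ChopEntry:
    """Split an assumed 'code:side:date' string into its three parts."""
    parts = entry.strip().split(":")
    # Pad with empty strings so a bare code without side or date still parses.
    code, side, raw_date = (parts + ["", ""])[:3]
    performed_on = (
        datetime.strptime(raw_date, "%Y%m%d").date() if raw_date else None
    )
    # Normalize letter case so codes compare consistently.
    return ChopEntry(code.upper(), side, performed_on)


parsed = parse_chop_entry("990410::20180124")
print(parsed)
```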


# Match to the database


In [6]:
# get the case_id from revised_case

revised_case_id = df_revised_case_d2d['case_id'].values
revised_case_id

array(['715177', '716197', '721128', '721977', '725531', '727730',
       '727952', '728588', '731638', '737831'], dtype=object)

In [7]:
# match to the database
revised_case_db = bfs_db.get_bfs_cases_by_ids(revised_case_id)
revised_case_db.head()

| | drg_cost_weight | aimedic_id | hospital_id | case_id | patient_id | age_years | age_days | gender | duration_of_stay | clinic_id | ventilation_hours | admission_weight | gestation_age | admission_date | admission_type | discharge_date | discharge_type | drg | adrg | pccl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.781 | 137694 | 7 | 715177 | 33100584B601FADF | 90 | 0 | M | 11 | 4 | 0 | 0 | 0 | 2018-01-22 | 1 | 2018-02-02 | 0 | G18B | G18 | 3 |
| 1 | 2.967 | 135211 | 7 | 716197 | DEE16D56D71DD96A | 72 | 0 | W | 14 | 4 | 0 | 0 | 0 | 2018-02-14 | 1 | 2018-02-28 | 0 | I09C | I09 | 3 |
| 2 | 1.318 | 132057 | 7 | 721128 | 34645F6C19043F5B | 71 | 0 | M | 14 | 4 | 0 | 0 | 0 | 2018-02-19 | 1 | 2018-03-05 | 0 | G21B | G21 | 3 |
| 3 | 1.644 | 132931 | 7 | 721977 | 0098EFC426FD8F26 | 84 | 0 | W | 11 | 4 | 0 | 0 | 0 | 2018-02-24 | 1 | 2018-03-07 | 0 | I46C | I46 | 3 |
| 4 | 1.133 | 137652 | 7 | 725531 | EBF3D9B44B52E53F | 80 | 0 | M | 2 | 3 | 0 | 0 | 0 | 2018-04-09 | 1 | 2018-04-11 | 0 | F59E | F59 | 3 |


In [8]:
# Report how many revised cases were matched in the database
print(f'There are {len(revised_case_db)} out of {len(revised_case_id)} revised cases from DtoD that are matched with the database for {file.hospital_name_db} {file.year}')

There are 10 out of 10 revised cases from DtoD that are matched with the database for Hirslanden Linde 2018
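
When fewer cases are matched than were revised, it helps to know which IDs are missing. The sketch below uses a simple set lookup; `find_unmatched_ids` is a hypothetical helper and the IDs are toy values. In this notebook it would presumably be called with `revised_case_id` and `revised_case_db['case_id']`.

```python
def find_unmatched_ids(revised_ids, db_ids):
    """Return the revised case IDs that have no counterpart in the database."""
    # Compare as strings so '123' and 123 are treated as the same ID.
    db_set = {str(i) for i in db_ids}
    return sorted(str(i) for i in revised_ids if str(i) not in db_set)


# Toy example: the third ID is absent from the database result.
missing = find_unmatched_ids(["100", "101", "999"], [100, 101])
print(missing)
```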


In [9]:
# If matching cases are found, check that their fields (case_id, gender, age_years, ...) also agree

In [10]:
revised_case_db_subset = revised_case_db[['aimedic_id', 'case_id', 'gender', 'age_years']]
revised_case_db_subset.head()

| | aimedic_id | case_id | gender | age_years |
|---|---|---|---|---|
| 0 | 137694 | 715177 | M | 90 |
| 1 | 135211 | 716197 | W | 72 |
| 2 | 132057 | 721128 | M | 71 |
| 3 | 132931 | 721977 | W | 84 |
| 4 | 137652 | 725531 | M | 80 |
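
The check described in cell 9 (do gender and age agree between the revised cases and the database for the same case ID?) can be sketched as a merge followed by a column-wise comparison. This is one possible approach, assuming both frames use the column names shown above; `find_mismatches` is a hypothetical helper and the frames contain toy data.

```python
import pandas as pd


def find_mismatches(revised, db, key="case_id", cols=("gender", "age_years")):
    """Return merged rows where any compared column differs between sources."""
    merged = revised.merge(db, on=key, how="inner", suffixes=("_revised", "_db"))
    mismatch = pd.Series(False, index=merged.index)
    for col in cols:
        mismatch |= merged[f"{col}_revised"] != merged[f"{col}_db"]
    return merged[mismatch]


# Toy data: case "2" has a different gender in the database.
revised = pd.DataFrame({
    "case_id": ["1", "2"],
    "gender": ["M", "W"],
    "age_years": [90, 72],
})
db = pd.DataFrame({
    "aimedic_id": [10, 11],
    "case_id": ["1", "2"],
    "gender": ["M", "M"],
    "age_years": [90, 72],
})
mismatches = find_mismatches(revised, db)
print(mismatches)
```

Both key columns need the same dtype for the merge to find matches, so a string `case_id` on one side and an integer on the other should be cast first.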
