## Labevents table

#### Table Description
- `SUBJECT_ID`: identifies the patient.
- `HADM_ID`: identifies the hospital stay (admission).
- `ITEMID`: identifier for a single measurement type (lab test).
- `CHARTTIME`: the time at which the observation was charted.
- `VALUE`: the value measured for the concept identified by `ITEMID`.
- `VALUENUM`: the same value in numeric format; null when `VALUE` is not numeric.
- `VALUEUOM`: the unit of measurement for `VALUE`.
- `FLAG`: indicates whether the laboratory value is considered abnormal.
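According to the MIMIC-III documentation, `VALUENUM` repeats `VALUE` as a number and stays null when the result is text. The sketch below checks that on five rows copied from the `head()` output later in this notebook. It parses `VALUE` with `pd.to_numeric(..., errors='coerce')` and compares the result with `VALUENUM`. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Five rows copied from the head() output of LABEVENTS in this notebook
sample = pd.DataFrame({
    'ITEMID': [50820, 50800, 50802, 50804, 50808],
    'VALUE': ['7.39', 'ART', '-1', '22', '0.93'],
    'VALUENUM': [7.39, np.nan, -1.0, 22.0, 0.93],
    'VALUEUOM': ['units', np.nan, 'mEq/L', 'mEq/L', 'mmol/L'],
    'FLAG': [np.nan, np.nan, np.nan, np.nan, 'abnormal'],
})

# Parse VALUE as a number; text results such as 'ART' become NaN
parsed = pd.to_numeric(sample['VALUE'], errors='coerce')

# Rows whose VALUE is text rather than a number
text_rows = sample[parsed.isna()]
print(text_rows[['ITEMID', 'VALUE', 'VALUENUM']])
```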

### Website Links:
- https://physionet.org/content/mimiciii/1.4/
- https://mimic.mit.edu/docs/iii/tables/labevents/

```python
# Import libraries
import os
import gzip
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

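```python
# Path to the compressed LABEVENTS file from the MIMIC-III v1.4 download
file = 'C:/Users/zuoch/Downloads/Final semester/DAB304- Health analytics/mimic-iii-clinical-database-1.4/LABEVENTS.csv.gz'
# Load the table; pandas decompresses the gzip file while reading
labevents = pd.read_csv(file, compression='gzip')
labevents
```

|  | ROW_ID | SUBJECT_ID | HADM_ID | ITEMID | CHARTTIME | VALUE | VALUENUM | VALUEUOM | FLAG |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 281 | 3 | NaN | 50820 | 2101-10-12 16:07:00 | 7.39 | 7.39 | units | NaN |
| 1 | 282 | 3 | NaN | 50800 | 2101-10-12 18:17:00 | ART | NaN | NaN | NaN |
| 2 | 283 | 3 | NaN | 50802 | 2101-10-12 18:17:00 | -1 | -1.00 | mEq/L | NaN |
| 3 | 284 | 3 | NaN | 50804 | 2101-10-12 18:17:00 | 22 | 22.00 | mEq/L | NaN |
| 4 | 285 | 3 | NaN | 50808 | 2101-10-12 18:17:00 | 0.93 | 0.93 | mmol/L | abnormal |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 27854050 | 27428435 | 96443 | 103219.0 | 50882 | 2109-12-30 01:40:00 | 26 | 26.00 | mEq/L | NaN |
| 27854051 | 27428436 | 96443 | 103219.0 | 50885 | 2109-12-30 01:40:00 | 2.1 | 2.10 | mg/dL | abnormal |
| 27854052 | 27428437 | 96443 | 103219.0 | 50902 | 2109-12-30 01:40:00 | 97 | 97.00 | mEq/L | NaN |
| 27854053 | 27428438 | 96443 | 103219.0 | 50911 | 2109-12-30 01:40:00 | 2 | 2.00 | ng/mL | NaN |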
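The default load keeps `CHARTTIME` as plain strings (`object`) and stores `HADM_ID` as `float64` because it has gaps. Below is a minimal sketch of passing explicit types to `read_csv`. It runs on three rows copied from the table above, held in an in-memory CSV, so it does not need the real file. The dtype choices are assumptions, not requirements of MIMIC-III. `Int64` is pandas' nullable integer type, `category` suits columns with few distinct values, and `parse_dates` converts `CHARTTIME` to datetimes.

```python
import io
import pandas as pd

# Three LABEVENTS rows copied from the output above, in raw CSV form
raw = io.StringIO(
    "ROW_ID,SUBJECT_ID,HADM_ID,ITEMID,CHARTTIME,VALUE,VALUENUM,VALUEUOM,FLAG\n"
    "281,3,,50820,2101-10-12 16:07:00,7.39,7.39,units,\n"
    "282,3,,50800,2101-10-12 18:17:00,ART,,,\n"
    "27428436,96443,103219,50885,2109-12-30 01:40:00,2.1,2.10,mg/dL,abnormal\n"
)

labs = pd.read_csv(
    raw,
    parse_dates=['CHARTTIME'],      # datetime64 instead of object strings
    dtype={
        'HADM_ID': 'Int64',         # nullable integer: missing IDs become <NA>
        'VALUE': 'string',
        'VALUEUOM': 'category',     # assumption: few distinct units
        'FLAG': 'category',
    },
)
print(labs.dtypes)
```

Using the same arguments with the real file path would be the analogous call. If you adopt it, the later `fillna(0)` step will not work on the `category` columns unless `0` is first added as a category, so the cleaning step would need adjusting too.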


```python
# Display basic information about the dataset
labevents.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27854055 entries, 0 to 27854054
Data columns (total 9 columns):
 #   Column      Dtype  
---  ------      -----  
 0   ROW_ID      int64  
 1   SUBJECT_ID  int64  
 2   HADM_ID     float64
 3   ITEMID      int64  
 4   CHARTTIME   object 
 5   VALUE       object 
 6   VALUENUM    float64
 7   VALUEUOM    object 
 8   FLAG        object 
dtypes: float64(2), int64(3), object(4)
memory usage: 1.9+ GB
```
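The `+` in `1.9+ GB` means the real footprint may be higher. By default pandas does not count the strings held in `object` columns, and this table has four of them. The sketch below shows the difference on a small stand-in frame. `memory_usage(deep=True)` and `info(memory_usage='deep')` measure the string contents as well.

```python
import pandas as pd

# Small stand-in frame mixing numeric and string (object) columns
df = pd.DataFrame({
    'ITEMID': [50820, 50800, 50802],
    'VALUE': ['7.39', 'ART', '-1'],
    'VALUEUOM': ['units', None, 'mEq/L'],
})

shallow = df.memory_usage(deep=False).sum()  # object columns: pointer size only
deep = df.memory_usage(deep=True).sum()      # also counts the string objects
print(shallow, deep)

# info() can report the deep figure directly
df.info(memory_usage='deep')
```

On the full table, `labevents.info(memory_usage='deep')` has to inspect every string, so it may take a while on about 27.9 million rows.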


```python
# Check the number of rows and columns
labevents.shape
```

```
(27854055, 9)
```

```python
# Display the first few rows of the data
labevents.head()
```

|  | ROW_ID | SUBJECT_ID | HADM_ID | ITEMID | CHARTTIME | VALUE | VALUENUM | VALUEUOM | FLAG |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 281 | 3 | NaN | 50820 | 2101-10-12 16:07:00 | 7.39 | 7.39 | units | NaN |
| 1 | 282 | 3 | NaN | 50800 | 2101-10-12 18:17:00 | ART | NaN | NaN | NaN |
| 2 | 283 | 3 | NaN | 50802 | 2101-10-12 18:17:00 | -1 | -1.0 | mEq/L | NaN |
| 3 | 284 | 3 | NaN | 50804 | 2101-10-12 18:17:00 | 22 | 22.0 | mEq/L | NaN |
| 4 | 285 | 3 | NaN | 50808 | 2101-10-12 18:17:00 | 0.93 | 0.93 | mmol/L | abnormal |


```python
# Overview and summary statistics of the numeric columns
labevents.describe()
```

|  | ROW_ID | SUBJECT_ID | HADM_ID | ITEMID | VALUENUM |
|---|---|---|---|---|---|
| count | 27854060.0 | 27854060.0 | 22245030.0 | 27854060.0 | 24932840.0 |
| mean | 13958520.0 | 31463.91 | 149967.5 | 51062.14 | 78.5322 |
| std | 8057287.0 | 27141.77 | 28862.31 | 209.2624 | 5512.982 |
| min | 1.0 | 2.0 | 100001.0 | 50800.0 | -414.0 |
| 25% | 6980364.0 | 11205.0 | 125108.0 | 50882.0 | 4.4 |
| 50% | 13960990.0 | 22585.0 | 149849.0 | 50983.0 | 18.2 |
| 75% | 20938290.0 | 47906.0 | 175020.0 | 51250.0 | 60.0 |
| max | 27907650.0 | 99999.0 | 199999.0 | 51555.0 | 14272000.0 |
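`ROW_ID`, `SUBJECT_ID`, `HADM_ID` and `ITEMID` are identifiers, so their means and quartiles carry no clinical meaning. The `VALUENUM` column pools every lab test together, in units ranging from mEq/L to ng/mL, which is why its range runs from -414 to 14,272,000. Per-test statistics are likely more useful. The sketch below groups by `ITEMID` and unit, using rows copied from the outputs above. Note that `groupby` drops rows whose `VALUEUOM` is missing by default.

```python
import pandas as pd

# Rows copied from the LABEVENTS outputs above; each ITEMID has its own unit
labs = pd.DataFrame({
    'ITEMID': [50820, 50804, 50808, 50882, 50885, 50902, 50911],
    'VALUENUM': [7.39, 22.0, 0.93, 26.0, 2.1, 97.0, 2.0],
    'VALUEUOM': ['units', 'mEq/L', 'mmol/L', 'mEq/L', 'mg/dL', 'mEq/L', 'ng/mL'],
})

# Summary statistics per lab test (and unit) instead of across all tests
per_item = labs.groupby(['ITEMID', 'VALUEUOM'])['VALUENUM'].describe()
print(per_item[['count', 'mean', 'min', 'max']])
```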


```python
# Check for missing values
print("\nMissing Values:")
print(labevents.isnull().sum())
```

```
Missing Values:
ROW_ID               0
SUBJECT_ID           0
HADM_ID        5609021
ITEMID               0
CHARTTIME            0
VALUE             1245
VALUENUM       2921220
VALUEUOM       3100249
FLAG          18009889
dtype: int64
```
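Raw counts are easier to judge as percentages of the 27,854,055 rows. The sketch below uses the counts printed above. About 64.66% of rows have no `FLAG` and about 20.14% have no `HADM_ID`. On the loaded frame, `labevents.isnull().mean().mul(100).round(2)` would give the same figures directly.

```python
import pandas as pd

# Missing-value counts copied from the output above
missing = pd.Series({
    'HADM_ID': 5609021,
    'VALUE': 1245,
    'VALUENUM': 2921220,
    'VALUEUOM': 3100249,
    'FLAG': 18009889,
})
n_rows = 27854055  # labevents.shape[0]

# Share of rows missing each column, in percent
missing_pct = (missing / n_rows * 100).round(2)
print(missing_pct)
```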


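```python
# Replace missing values with 0 in every column that has gaps.
# Assigning the result back avoids calling fillna(..., inplace=True) on a
# selected column, a chained-assignment pattern pandas deprecates under Copy-on-Write.
# Note: a filled 0 in VALUENUM can no longer be told apart from a measured 0.
labevents = labevents.fillna({
    'HADM_ID': 0,
    'VALUE': 0,
    'VALUENUM': 0,
    'VALUEUOM': 0,
    'FLAG': 0,
})
```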
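Filling `VALUENUM` with 0 turns text results such as `ART` into a numeric 0, which pulls averages down. One alternative is sketched below on rows copied from the outputs above. It is an assumption-based option, not the method used in this notebook. `HADM_ID` becomes a nullable `Int64`, so missing admissions stay `<NA>`. `VALUENUM` stays `NaN`. `FLAG` gets a hypothetical label `'not flagged'`, which assumes an empty `FLAG` means the value was not marked abnormal.

```python
import numpy as np
import pandas as pd

# Rows copied from the outputs above (before filling)
labs = pd.DataFrame({
    'HADM_ID': [np.nan, np.nan, 103219.0],
    'VALUE': ['7.39', 'ART', '2.1'],
    'VALUENUM': [7.39, np.nan, 2.10],
    'VALUEUOM': ['units', np.nan, 'mg/dL'],
    'FLAG': [np.nan, np.nan, 'abnormal'],
})

cleaned = labs.assign(
    # Nullable integer: real admission IDs stay integers, missing ones stay <NA>
    HADM_ID=labs['HADM_ID'].astype('Int64'),
    # Hypothetical label; assumes an empty FLAG means "not flagged as abnormal"
    FLAG=labs['FLAG'].fillna('not flagged'),
)

# VALUENUM keeps NaN, so the mean ignores the text result instead of counting it as 0
print(cleaned['VALUENUM'].mean())        # NaN skipped
print(labs['VALUENUM'].fillna(0).mean())  # filled with 0 for comparison
```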

```python
# Check for missing values after filling
labevents.isnull().sum()
```

```
ROW_ID        0
SUBJECT_ID    0
HADM_ID       0
ITEMID        0
CHARTTIME     0
VALUE         0
VALUENUM      0
VALUEUOM      0
FLAG          0
dtype: int64
```

```python
# Display the first few rows of the DataFrame after filling
labevents.head()
```

|  | ROW_ID | SUBJECT_ID | HADM_ID | ITEMID | CHARTTIME | VALUE | VALUENUM | VALUEUOM | FLAG |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 281 | 3 | 0.0 | 50820 | 2101-10-12 16:07:00 | 7.39 | 7.39 | units | 0 |
| 1 | 282 | 3 | 0.0 | 50800 | 2101-10-12 18:17:00 | ART | 0.0 | 0 | 0 |
| 2 | 283 | 3 | 0.0 | 50802 | 2101-10-12 18:17:00 | -1 | -1.0 | mEq/L | 0 |
| 3 | 284 | 3 | 0.0 | 50804 | 2101-10-12 18:17:00 | 22 | 22.0 | mEq/L | 0 |
| 4 | 285 | 3 | 0.0 | 50808 | 2101-10-12 18:17:00 | 0.93 | 0.93 | mmol/L | abnormal |
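The summary statistics show that the smallest real `HADM_ID` is 100001, so the placeholder 0 cannot collide with an actual admission. `HADM_ID == 0` therefore picks out the lab rows not linked to a hospital stay. The sketch below demonstrates this on rows copied from the outputs above. On the full table the first count should match the 5,609,021 missing `HADM_ID` values reported earlier.

```python
import pandas as pd

# Rows copied from the outputs above, after fillna(0): HADM_ID 0 means "no admission"
labs = pd.DataFrame({
    'SUBJECT_ID': [3, 3, 96443, 96443],
    'HADM_ID': [0.0, 0.0, 103219.0, 103219.0],
    'ITEMID': [50820, 50800, 50882, 50885],
})

no_admission = labs[labs['HADM_ID'] == 0]    # labs without a linked hospital stay
with_admission = labs[labs['HADM_ID'] != 0]  # labs tied to an admission
print(len(no_admission), len(with_admission))
```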


```python
# Save the cleaned data to a CSV file (without the pandas index)
csv_path = 'labevents_table_clean.csv'
labevents.to_csv(csv_path, index=False)
```
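The source file was read as a `.csv.gz`, and an uncompressed copy of about 27.9 million rows will be much larger on disk. A minimal sketch of writing a gzip-compressed CSV instead is shown below on a small stand-in frame. The output filename is a hypothetical example. `compression='gzip'` is passed explicitly, and `read_csv` infers gzip from the `.gz` suffix when loading the file back.

```python
import os
import tempfile
import pandas as pd

# Small stand-in for the cleaned LABEVENTS frame
labs = pd.DataFrame({
    'ROW_ID': [281, 285],
    'ITEMID': [50820, 50808],
    'VALUENUM': [7.39, 0.93],
    'FLAG': ['not flagged', 'abnormal'],
})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output name; the file is written gzip-compressed
    gz_path = os.path.join(tmp, 'labevents_table_clean.csv.gz')
    labs.to_csv(gz_path, index=False, compression='gzip')
    # Compression is inferred from the '.gz' extension when reading back
    restored = pd.read_csv(gz_path)
    print(os.path.getsize(gz_path))
```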