## Description of shape and features of the table 

In [22]:
import os
import pandas as pd
import sys

In [23]:
def read_csv_files(folder_path):
    try:
        files = [file for file in os.listdir(folder_path) if file.endswith('.csv')]
        if not files:
            print("No CSV files found in the specified folder.")
            return
        for file in files:
            file_path = os.path.join(folder_path, file)
            print(f"\nReading file: {file}")
            try:
                df = pd.read_csv(file_path)
                print("------"*40)
                print(df.head())
                print(f"Columns: {list(df.columns)}")
                print(f"Shape: {df.shape}")
                print('------'*40,'\n')
            except Exception as e:
                print(f"Failed to read {file}: {e}")
    except Exception as e:
        print(f"An error occurred: {e}")

In [24]:
read_csv_files(r'D:\Final_year_project\final project dataset\final project')


Reading file: ADMISSIONS.csv
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   ROW_ID  SUBJECT_ID  HADM_ID         ADMITTIME         DISCHTIME DEATHTIME  \
0      21          22   165315  09-04-2196 12:26  10-04-2196 15:54       NaN   
1      22          23   152223  03-09-2153 07:15  08-09-2153 19:10       NaN   
2      23          23   124321  18-10-2157 19:34  25-10-2157 14:00       NaN   
3      24          24   161859  06-06-2139 16:14  09-06-2139 12:48       NaN   
4      25          25   129635  02-11-2160 02:06  05-11-2160 14:55       NaN   

  ADMISSION_TYPE         ADMISSION_LOCATION         DISCHARGE_LOCATION  \
0      EMERGENCY       EMERGENCY ROOM ADMIT  DISC-TRAN CANCER/CHLDRN H   
1       ELECTIVE  PHYS REFERRAL/NORMAL DELI           HOME HEALTH CARE   
2      EMERGENCY  TRANSFER


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   ROW_ID  SUBJECT_ID  HADM_ID COSTCENTER CHARTDATE CPT_CD  CPT_NUMBER  \
0     317       11743   129545        ICU       NaN  99232     99232.0   
1     318       11743   129545        ICU       NaN  99232     99232.0   
2     319       11743   129545        ICU       NaN  99232     99232.0   
3     320       11743   129545        ICU       NaN  99232     99232.0   
4     321        6185   183725        ICU       NaN  99223     99223.0   

  CPT_SUFFIX  TICKET_ID_SEQ              SECTIONHEADER  \
0        NaN            6.0  Evaluation and management   
1        NaN            7.0  Evaluation and management   
2        NaN            8.0  Evaluation and management   
3        NaN            9.0  Evaluation and management   
4        NaN            

# MIMIC DATASET COLUMN DESCRIPTION

## 1. ADMISSIONS.csv DATA INFORMATION

In [25]:
ADMISSIONS=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\ADMISSIONS.csv')
ADMISSIONS.head()

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,21,22,165315,09-04-2196 12:26,10-04-2196 15:54,,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,09-04-2196 10:06,09-04-2196 13:24,BENZODIAZEPINE OVERDOSE,0,1
1,22,23,152223,03-09-2153 07:15,08-09-2153 19:10,,ELECTIVE,PHYS REFERRAL/NORMAL DELI,HOME HEALTH CARE,Medicare,,CATHOLIC,MARRIED,WHITE,,,CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS...,0,1
2,23,23,124321,18-10-2157 19:34,25-10-2157 14:00,,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,HOME HEALTH CARE,Medicare,ENGL,CATHOLIC,MARRIED,WHITE,,,BRAIN MASS,0,1
3,24,24,161859,06-06-2139 16:14,09-06-2139 12:48,,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,HOME,Private,,PROTESTANT QUAKER,SINGLE,WHITE,,,INTERIOR MYOCARDIAL INFARCTION,0,1
4,25,25,129635,02-11-2160 02:06,05-11-2160 14:55,,EMERGENCY,EMERGENCY ROOM ADMIT,HOME,Private,,UNOBTAINABLE,MARRIED,WHITE,02-11-2160 01:01,02-11-2160 04:27,ACUTE CORONARY SYNDROME,0,1


In [26]:
ADMISSIONS.shape

(58976, 19)

Here’s the ADMISSIONS table description in tabular format:

|**Column Name**|	**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|	Unique identifier for each row in the table (used for internal tracking).|
|SUBJECT_ID|	Identifier for the patient associated with the admission (links to the PATIENTS table).|
|HADM_ID|	Unique identifier for the hospital admission (used to link to other tables like CHARTEVENTS).|
|ADMITTIME|	Date and time when the patient was admitted to the hospital.|
|DISCHTIME|	Date and time when the patient was discharged from the hospital.|
|DEATHTIME|	Date and time of death if the patient died during the hospital stay.|
|ADMISSION_TYPE|	Type of admission (e.g., ELECTIVE, EMERGENCY, URGENT, NEWBORN).|
|ADMISSION_LOCATION|	Location from which the patient was admitted (e.g., EMERGENCY ROOM ADMIT).|
|DISCHARGE_LOCATION|	Location to which the patient was discharged (e.g., HOME, SKILLED NURSING FACILITY).|
|INSURANCE|	Type of insurance used for the admission (e.g., Medicare, Private, Medicaid, Self Pay).|
|LANGUAGE|	Primary language of the patient (nullable).|
|RELIGION|	Stated religion of the patient (e.g., CATHOLIC, JEWISH).|
|MARITAL_STATUS|	Marital status of the patient (e.g., SINGLE, MARRIED, DIVORCED).|
|ETHNICITY|	Stated ethnicity of the patient (e.g., WHITE, BLACK/AFRICAN AMERICAN, HISPANIC/LATINO).|
|EDREGTIME|	Time the patient registered in the emergency department (if applicable).|
|EDOUTTIME|	Time the patient was discharged from the emergency department (if applicable).|
|DIAGNOSIS|	Free-text primary diagnosis recorded at admission.|
|HOSPITAL_EXPIRE_FLAG|	Indicator for whether the patient expired during the hospital stay (1 if expired, 0 otherwise).|
|HAS_CHARTEVENTS_DATA|	Indicator for whether the admission has associated data in the CHARTEVENTS table (1 if yes).|
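
The ADMITTIME and DISCHTIME columns are loaded as plain strings, so they have to be parsed before any duration can be computed. The sketch below uses two rows copied from the preview above and assumes the timestamps are day-first (`DD-MM-YYYY HH:MM`), which values such as `18-10-2157` suggest. The `LOS_HOURS` column name is introduced here for illustration only.

```python
import pandas as pd

# Two rows copied from the ADMISSIONS preview above.
admissions = pd.DataFrame({
    "HADM_ID": [165315, 152223],
    "ADMITTIME": ["09-04-2196 12:26", "03-09-2153 07:15"],
    "DISCHTIME": ["10-04-2196 15:54", "08-09-2153 19:10"],
})

# Assumption: day-first timestamps (DD-MM-YYYY HH:MM).
fmt = "%d-%m-%Y %H:%M"
for col in ["ADMITTIME", "DISCHTIME"]:
    admissions[col] = pd.to_datetime(admissions[col], format=fmt)

# Length of stay in hours (LOS_HOURS is an illustrative column name).
admissions["LOS_HOURS"] = (
    admissions["DISCHTIME"] - admissions["ADMITTIME"]
).dt.total_seconds() / 3600

print(admissions[["HADM_ID", "LOS_HOURS"]])
```

Passing an explicit `format` avoids pandas guessing month-first for ambiguous dates such as `09-04-2196`.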

## 2. CALLOUT.csv DATA INFORMATION

In [27]:
CALLOUT=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\CALLOUT.csv')
CALLOUT.head()

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,SUBMIT_WARDID,SUBMIT_CAREUNIT,CURR_WARDID,CURR_CAREUNIT,CALLOUT_WARDID,CALLOUT_SERVICE,REQUEST_TELE,...,CALLOUT_STATUS,CALLOUT_OUTCOME,DISCHARGE_WARDID,ACKNOWLEDGE_STATUS,CREATETIME,UPDATETIME,ACKNOWLEDGETIME,OUTCOMETIME,FIRSTRESERVATIONTIME,CURRENTRESERVATIONTIME
0,402,854,175684,52.0,,29.0,MICU,1,MED,0,...,Inactive,Discharged,29.0,Acknowledged,2146-10-05 13:16:55,2146-10-05 13:16:55,2146-10-05 13:24:00,2146-10-05 18:55:22,2146-10-05 15:27:44,
1,403,864,138624,15.0,,55.0,CSRU,55,CSURG,0,...,Inactive,Discharged,55.0,Acknowledged,2114-11-28 08:31:39,2114-11-28 09:42:08,2114-11-28 09:43:08,2114-11-28 12:10:02,,
2,404,864,138624,12.0,,55.0,CSRU,55,CSURG,1,...,Inactive,Discharged,55.0,Acknowledged,2114-11-30 10:24:25,2114-12-01 09:06:18,2114-12-01 12:26:05,2114-12-01 21:55:05,,
3,405,867,184298,7.0,,17.0,CCU,17,CCU,1,...,Inactive,Discharged,17.0,Acknowledged,2136-12-29 08:45:42,2136-12-29 10:17:16,2136-12-29 10:33:51,2136-12-29 18:10:02,,
4,157,306,167129,57.0,,3.0,SICU,44,NSURG,1,...,Inactive,Discharged,3.0,Acknowledged,2199-09-18 11:47:47,2199-09-18 11:47:47,2199-09-18 11:58:33,2199-09-18 15:10:02,,


In [28]:
CALLOUT.shape

(34499, 24)

Here’s the CALLOUT table description in tabular format (columns hidden by `...` in the preview above are not listed):

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|Unique identifier for each row in the table (used for internal tracking).|
|SUBJECT_ID|Identifier for the patient associated with the callout (links to the PATIENTS table).|
|HADM_ID|Identifier for the hospital admission associated with the callout (links to the ADMISSIONS table).|
|SUBMIT_WARDID|Identifier for the ward where the callout was submitted.|
|SUBMIT_CAREUNIT|Care unit type of the submitting ward, if it was an ICU (empty in the rows shown).|
|CURR_WARDID|Identifier for the ward where the patient was located at the time of callout submission.|
|CURR_CAREUNIT|Care unit where the patient was located at the time of callout submission (e.g., MICU, CSRU, CCU, SICU).|
|CALLOUT_WARDID|Identifier for the ward to which the patient is being transferred.|
|CALLOUT_SERVICE|Service under which the patient is called out (e.g., MED, CSURG, NSURG).|
|REQUEST_TELE|Flag indicating whether telemetry was requested for the patient (1 if yes, 0 otherwise).|
|CALLOUT_STATUS|Status of the callout request (e.g., Inactive).|
|CALLOUT_OUTCOME|Outcome of the callout (e.g., Discharged, Cancelled).|
|DISCHARGE_WARDID|Identifier for the ward to which the patient was discharged after the callout.|
|ACKNOWLEDGE_STATUS|Status indicating whether the callout was acknowledged (e.g., Acknowledged, Unacknowledged).|
|CREATETIME|Timestamp for when the callout was created.|
|UPDATETIME|Timestamp for when the callout was last updated.|
|ACKNOWLEDGETIME|Timestamp for when the callout was acknowledged (if applicable).|
|OUTCOMETIME|Timestamp for when the callout outcome was determined.|
|FIRSTRESERVATIONTIME|Timestamp for the first reservation time associated with the callout.|
|CURRENTRESERVATIONTIME|Timestamp for the most recent reservation time associated with the callout.|
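
As a rough illustration of how the CALLOUT timestamps can be used, the sketch below measures the delay between CREATETIME and ACKNOWLEDGETIME for the first two rows of the preview above. The `ACK_DELAY_MIN` column name is an assumption made for this example.

```python
import pandas as pd

# First two rows of the CALLOUT preview (timestamp columns only).
callout = pd.DataFrame({
    "ROW_ID": [402, 403],
    "CREATETIME": ["2146-10-05 13:16:55", "2114-11-28 08:31:39"],
    "ACKNOWLEDGETIME": ["2146-10-05 13:24:00", "2114-11-28 09:43:08"],
})

# The timestamps are ISO-formatted, so pandas can parse them directly.
for col in ["CREATETIME", "ACKNOWLEDGETIME"]:
    callout[col] = pd.to_datetime(callout[col])

# Minutes between creation and acknowledgement (illustrative column name).
callout["ACK_DELAY_MIN"] = (
    callout["ACKNOWLEDGETIME"] - callout["CREATETIME"]
).dt.total_seconds() / 60

print(callout[["ROW_ID", "ACK_DELAY_MIN"]])
```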


## 3. CAREGIVERS.csv DATA INFORMATION

In [29]:
CAREGIVERS=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\CAREGIVERS.csv')
CAREGIVERS.head()

Unnamed: 0,ROW_ID,CGID,LABEL,DESCRIPTION
0,2228,16174,RO,Read Only
1,2229,16175,RO,Read Only
2,2230,16176,Res,Resident/Fellow/PA/NP
3,2231,16177,RO,Read Only
4,2232,16178,RT,Respiratory


In [30]:
CAREGIVERS.shape

(7567, 4)

Here’s the CAREGIVERS table description in tabular format:

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|Unique identifier for each row in the table (used for internal tracking).|
|CGID|Unique identifier for each caregiver (used as a foreign key in other tables).|
|LABEL|Short label for the caregiver's role or designation (e.g., RO, Res, RT).|
|DESCRIPTION|Longer description of the caregiver's role (e.g., Read Only, Resident/Fellow/PA/NP, Respiratory).|

## 4. CPTEVENTS.csv DATA INFORMATION

In [31]:
CPTEVENTS=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\CPTEVENTS.csv')
CPTEVENTS.head()


Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,COSTCENTER,CHARTDATE,CPT_CD,CPT_NUMBER,CPT_SUFFIX,TICKET_ID_SEQ,SECTIONHEADER,SUBSECTIONHEADER,DESCRIPTION
0,317,11743,129545,ICU,,99232,99232.0,,6.0,Evaluation and management,Hospital inpatient services,
1,318,11743,129545,ICU,,99232,99232.0,,7.0,Evaluation and management,Hospital inpatient services,
2,319,11743,129545,ICU,,99232,99232.0,,8.0,Evaluation and management,Hospital inpatient services,
3,320,11743,129545,ICU,,99232,99232.0,,9.0,Evaluation and management,Hospital inpatient services,
4,321,6185,183725,ICU,,99223,99223.0,,1.0,Evaluation and management,Hospital inpatient services,


In [32]:
CPTEVENTS.shape

(573146, 12)

Here’s the CPTEVENTS table description in tabular format:

|**Column Name**	|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|	Unique identifier for each row in the table (used for internal tracking).|
|SUBJECT_ID|	Identifier for the patient associated with the event (links to the PATIENTS table).|
|HADM_ID|	Identifier for the hospital admission associated with the event (links to the ADMISSIONS table).|
|COSTCENTER|Department or cost center that billed for the procedure or event (e.g., ICU).|
|CHARTDATE|Date on which the procedure or event was recorded (empty in the rows shown).|
|CPT_CD|Current Procedural Terminology (CPT) code for the procedure or event.|
|CPT_NUMBER|Numeric representation of the CPT code (e.g., 99232.0 for CPT_CD 99232).|
|CPT_SUFFIX|Suffix associated with the CPT code, providing additional detail about the procedure (if applicable).|
|TICKET_ID_SEQ|Sequence number of the ticket or order associated with the procedure.|
|SECTIONHEADER|High-level category of the CPT code (e.g., Evaluation and management).|
|SUBSECTIONHEADER|Sub-category providing more specific context for the CPT code (e.g., Hospital inpatient services).|
|DESCRIPTION|Free-text description of the procedure or event (empty in the rows shown).|

## 5. D_CPT.csv DATA INFORMATION

In [33]:
D_CPT=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\D_CPT.csv')
D_CPT.head()

Unnamed: 0,ROW_ID,CATEGORY,SECTIONRANGE,SECTIONHEADER,SUBSECTIONRANGE,SUBSECTIONHEADER,CODESUFFIX,MINCODEINSUBSECTION,MAXCODEINSUBSECTION
0,1,1,99201-99499,Evaluation and management,99201-99216,Office/other outpatient services,,99201,99216
1,2,1,99201-99499,Evaluation and management,99217-99220,Hospital observation services,,99217,99220
2,3,1,99201-99499,Evaluation and management,99221-99239,Hospital inpatient services,,99221,99239
3,4,1,99201-99499,Evaluation and management,99241-99255,Consultations,,99241,99255
4,5,1,99201-99499,Evaluation and management,99261-99263,Follow-up inpatient consultations (deleted codes),,99261,99263


In [34]:
D_CPT.shape

(134, 9)

Here’s the D_CPT table description in tabular format:

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|Unique identifier for each row in the table (used for internal tracking).|
|CATEGORY|Numeric code for the broad category of the CPT section (e.g., 1).|
|SECTIONRANGE|Range of CPT codes that fall within the section (e.g., 99201-99499).|
|SECTIONHEADER|High-level category name of the CPT section (e.g., Evaluation and management).|
|SUBSECTIONRANGE|Range of CPT codes that fall within the subsection (e.g., 99221-99239).|
|SUBSECTIONHEADER|More specific sub-category name within the CPT section (e.g., Hospital inpatient services).|
|CODESUFFIX|Suffix for the CPT code, providing additional detail about the procedure (if applicable).|
|MINCODEINSUBSECTION|Smallest CPT code within the subsection.|
|MAXCODEINSUBSECTION|Largest CPT code within the subsection.|

## 6. D_ICD_DIAGNOSES.csv DATA INFORMATION

In [35]:
D_ICD_DIAGNOSES=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\D_ICD_DIAGNOSIS.csv')
D_ICD_DIAGNOSES.head()

Unnamed: 0,ROW_ID,ICD9_CODE,SHORT_TITLE,LONG_TITLE
0,174,1166,TB pneumonia-oth test,"Tuberculous pneumonia [any form], tubercle bac..."
1,175,1170,TB pneumothorax-unspec,"Tuberculous pneumothorax, unspecified"
2,176,1171,TB pneumothorax-no exam,"Tuberculous pneumothorax, bacteriological or h..."
3,177,1172,TB pneumothorx-exam unkn,"Tuberculous pneumothorax, bacteriological or h..."
4,178,1173,TB pneumothorax-micro dx,"Tuberculous pneumothorax, tubercle bacilli fou..."


In [36]:
D_ICD_DIAGNOSES.shape

(14567, 4)

Here’s the D_ICD_DIAGNOSES table description in tabular format:

|Column Name|	Description|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|	Unique identifier for each row in the table (used for internal tracking).|
|ICD9_CODE|	ICD-9 (International Classification of Diseases, 9th Revision) code for the diagnosis.|
|SHORT_TITLE|	Abbreviated description of the diagnosis associated with the ICD-9 code.|
|LONG_TITLE|	Full descriptive title of the diagnosis associated with the ICD-9 code.|

## 7. D_ICD_PROCEDURES.csv DATA INFORMATION

In [37]:
D_ICD_PROCEDURES=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\D_ICD_PROCEDURES.csv')
D_ICD_PROCEDURES.head()

Unnamed: 0,ROW_ID,ICD9_CODE,SHORT_TITLE,LONG_TITLE
0,264,851,Canthotomy,Canthotomy
1,265,852,Blepharorrhaphy,Blepharorrhaphy
2,266,859,Adjust lid position NEC,Other adjustment of lid position
3,267,861,Lid reconst w skin graft,Reconstruction of eyelid with skin flap or graft
4,268,862,Lid reconst w muc graft,Reconstruction of eyelid with mucous membrane ...


In [38]:
D_ICD_PROCEDURES.shape

(3882, 4)

Here’s the D_ICD_PROCEDURES table description in tabular format:

|Column Name|	Description|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|	Unique identifier for each row in the table (used for internal tracking).|
|ICD9_CODE|	ICD-9 (International Classification of Diseases, 9th Revision) code for the procedure.|
|SHORT_TITLE|	Abbreviated description of the procedure associated with the ICD-9 code.|
|LONG_TITLE|	Full descriptive title of the procedure associated with the ICD-9 code.|

## 8. D_ITEMS.csv DATA INFORMATION

In [39]:
D_ITEMS=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\D_ITEMS.csv')
D_ITEMS.head()

Unnamed: 0,ROW_ID,ITEMID,LABEL,ABBREVIATION,DBSOURCE,LINKSTO,CATEGORY,UNITNAME,PARAM_TYPE,CONCEPTID
0,457,497,Patient controlled analgesia (PCA) [Inject],,carevue,chartevents,,,,
1,458,498,PCA Lockout (Min),,carevue,chartevents,,,,
2,459,499,PCA Medication,,carevue,chartevents,,,,
3,460,500,PCA Total Dose,,carevue,chartevents,,,,
4,461,501,PCV Exh Vt (Obser),,carevue,chartevents,,,,


In [40]:
D_ITEMS.shape

(12487, 10)


Here’s the D_ITEMS table description in tabular format:

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|Unique identifier for each row in the table (used for internal tracking).|
|ITEMID|Unique identifier for each item (type of measurement or observation) in the dataset.|
|LABEL|Name or description of the item (e.g., "PCA Medication", "PCA Total Dose").|
|ABBREVIATION|Abbreviated form of the label (empty in the rows shown).|
|DBSOURCE|Source system the item comes from (e.g., carevue).|
|LINKSTO|Name of the events table that holds the data for this item (e.g., chartevents).|
|CATEGORY|Broad category to which the item belongs (empty in the rows shown).|
|UNITNAME|Unit of measurement associated with the item, if any.|
|PARAM_TYPE|Type of value recorded for the item, if specified.|
|CONCEPTID|Identifier intended for mapping the item to a shared concept (empty in the rows shown).|

## 9. DATETIMEEVENTS.csv DATA INFORMATION

In [41]:
DATETIMEEVENTS=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\DATETIMEEVENTS.csv')
DATETIMEEVENTS.head()

Unnamed: 0,row_id,subject_id,hadm_id,icustay_id,itemid,charttime,storetime,cgid,value,valueuom,warning,error,resultstatus,stopped
0,208474,10076,198503,201006.0,5684,2107-03-25 04:00:00,2107-03-25 04:34:00,20482,2107-03-24 00:00:00,Date,,,,NotStopd
1,208475,10076,198503,201006.0,5684,2107-03-25 07:00:00,2107-03-25 07:06:00,15004,2107-03-24 00:00:00,Date,,,,NotStopd
2,208836,10076,198503,201006.0,5684,2107-03-26 04:00:00,2107-03-26 05:31:00,20834,2107-03-24 00:00:00,Date,,,,NotStopd
3,208837,10076,198503,201006.0,5684,2107-03-26 08:00:00,2107-03-26 08:33:00,17480,2107-03-24 00:00:00,Date,,,,NotStopd
4,208838,10076,198503,201006.0,5684,2107-03-26 16:00:00,2107-03-26 16:08:00,17480,2107-03-24 00:00:00,Date,,,,NotStopd


In [42]:
DATETIMEEVENTS.shape

(15551, 14)

The DATETIMEEVENTS table in the MIMIC-III dataset contains charted ICU observations whose value is itself a date or time. In the rows above, for example, VALUE holds a timestamp and VALUEUOM is `Date`.

Column-Wise Description of DATETIMEEVENTS Table:

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|A unique identifier for each row in the table.|
|SUBJECT_ID|A unique identifier for the patient associated with the event.|
|HADM_ID|A unique identifier for the hospital admission during which the event occurred.|
|ICUSTAY_ID|A unique identifier for the ICU stay during which the event occurred.|
|ITEMID|A unique identifier for the type of event (linked to the D_ITEMS table for descriptions).|
|CHARTTIME|The time when the event was recorded or occurred (if available).|
|STORETIME|The time when the event was stored in the database.|
|CGID|The identifier for the caregiver (clinician) associated with the event.|
|VALUE|The date or time value recorded for the event (e.g., 2107-03-24 00:00:00).|
|VALUEUOM|The unit of the VALUE (e.g., "Date").|
|WARNING|A flag indicating if there was a warning associated with the event (1 for warning, NaN otherwise).|
|ERROR|A flag indicating if there was an error associated with the event (1 for error, NaN otherwise).|
|RESULTSTATUS|Status of the recorded result (empty in the rows shown).|
|STOPPED|Indicates whether the event was stopped (e.g., NotStopd).|

## 10. DIAGNOSES_ICD.csv DATA INFORMATION

In [43]:
DIAGNOSES_ICD=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\DIAGNOSES_ICD.csv')
DIAGNOSES_ICD.head()

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,SEQ_NUM,ICD9_CODE
0,1297,109,172335,1.0,40301
1,1298,109,172335,2.0,486
2,1299,109,172335,3.0,58281
3,1300,109,172335,4.0,5855
4,1301,109,172335,5.0,4254


In [44]:
DIAGNOSES_ICD.shape

(651047, 5)

Here’s the DIAGNOSES_ICD table description in tabular format:

|**Column Name**|	**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|	Unique identifier for each row in the table (used for internal tracking).|
|SUBJECT_ID|	Identifier for the patient associated with the diagnosis (foreign key to the PATIENTS table).|
|HADM_ID	|Identifier for the hospital admission associated with the diagnosis (foreign key to the ADMISSIONS table).|
|SEQ_NUM|	Sequential number of the diagnosis within a given admission (e.g., primary, secondary diagnoses).|
|ICD9_CODE|	ICD-9 (International Classification of Diseases, 9th Revision) code for the diagnosis.|

## 11. DRGCODES.csv DATA INFORMATION

In [45]:
DRGCODES=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\DRGCODES.csv')
DRGCODES.head()

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,DRG_TYPE,DRG_CODE,DESCRIPTION,DRG_SEVERITY,DRG_MORTALITY
0,342,2491,144486,HCFA,28,"TRAUMATIC STUPOR & COMA, COMA <1 HR AGE >17 WI...",,
1,343,24958,162910,HCFA,110,MAJOR CARDIOVASCULAR PROCEDURES WITH COMPLICAT...,,
2,344,18325,153751,HCFA,390,NEONATE WITH OTHER SIGNIFICANT PROBLEMS,,
3,345,17887,182692,HCFA,14,SPECIFIC CEREBROVASCULAR DISORDERS EXCEPT TRAN...,,
4,346,11113,157980,HCFA,390,NEONATE WITH OTHER SIGNIFICANT PROBLEMS,,


In [46]:
DRGCODES.shape

(125557, 8)

Here’s the DRGCODES table description in tabular format:

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|Unique identifier for each row in the table (used for internal tracking).|
|SUBJECT_ID|Identifier for the patient associated with the DRG code (foreign key to the PATIENTS table).|
|HADM_ID|Identifier for the hospital admission associated with the DRG code (foreign key to the ADMISSIONS table).|
|DRG_TYPE|Coding system to which the DRG code belongs (e.g., HCFA).|
|DRG_CODE|Diagnosis-Related Group (DRG) code assigned to the admission, which classifies the stay for payment purposes.|
|DESCRIPTION|Textual description of the DRG code (e.g., "NEONATE WITH OTHER SIGNIFICANT PROBLEMS").|
|DRG_SEVERITY|Severity subclass associated with the DRG code, where available (empty in the rows shown).|
|DRG_MORTALITY|Mortality subclass associated with the DRG code, where available (empty in the rows shown).|

## 12. medical_health.csv DATA INFORMATION

In [47]:
medical_health=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\medical_health.csv')
medical_health.head()

Unnamed: 0,hadm_id,avg_albumin,std_albumin,avg_bicarbonate,std_bicarbonate,avg_blood_glucose,std_blood_glucose,avg_blood_urea_nitrogen,std_blood_urea_nitrogen,avg_creatinine,...,avg_hematrocrit,std_hematrocrit,avg_platelet_count,std_platelet_count,avg_potasssium,std_potasssium,avg_sodium,std_sodium,avg_white_blood_cells,std_white_blood_cells
0,100003.0,2.4,0.173205,19.666667,3.983298,96.833333,26.798632,34.5,16.071714,1.083333,...,27.781818,2.644927,145.285714,27.219566,4.783333,0.556477,132.0,1.264911,13.328571,2.566265
1,100006.0,2.0,,29.384615,3.990373,105.0,33.578267,17.076923,3.226493,0.638462,...,31.853846,2.733318,228.615385,35.998932,4.076923,0.29764,131.384615,1.980676,9.7,2.840188
2,100009.0,4.3,,25.2,1.923538,152.235294,42.697379,17.0,3.63318,0.783333,...,35.914286,3.575112,142.571429,28.814679,4.242857,0.222539,137.714286,3.039424,12.385714,2.927131
3,100011.0,2.45,0.070711,27.1875,2.286737,132.458333,19.834049,11.764706,2.10741,0.817647,...,27.562963,5.553665,518.058824,419.045116,4.084615,0.473871,138.037037,1.911131,11.829412,3.451044
4,100012.0,3.85,0.353553,26.555556,3.778595,111.166667,14.288004,12.909091,1.868397,0.845455,...,28.486667,4.719695,198.384615,65.723586,4.185714,0.382013,132.928571,2.164905,8.930769,2.450301


In [48]:
medical_health.shape

(30816, 21)
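
The `hadm_id` column in medical_health is loaded as a float (`100003.0`) and is lower-case, while ADMISSIONS uses an integer `HADM_ID`. The sketch below shows one way to align the two keys before combining the tables. The medical_health rows are taken from the preview above, whereas the ADMISSIONS rows are hypothetical and only serve to make the example self-contained.

```python
import pandas as pd

# First two rows of the medical_health preview (two columns only).
medical_health = pd.DataFrame({
    "hadm_id": [100003.0, 100006.0],
    "avg_albumin": [2.4, 2.0],
})

# Hypothetical ADMISSIONS rows, for illustration only.
admissions = pd.DataFrame({
    "HADM_ID": [100003, 100006, 100007],
    "HOSPITAL_EXPIRE_FLAG": [0, 0, 1],
})

# Align the key: same name, no missing values, integer type.
labs = medical_health.rename(columns={"hadm_id": "HADM_ID"})
labs = labs.dropna(subset=["HADM_ID"])
labs["HADM_ID"] = labs["HADM_ID"].astype("int64")

# Keep every admission; admissions without lab aggregates get NaN.
combined = admissions.merge(labs, on="HADM_ID", how="left")

print(combined)
```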

## 13. MICROBIOLOGYEVENTS.csv DATA INFORMATION

In [49]:
MICROBIOLOGYEVENTS=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\MICROBIOLOGYEVENTS.csv')
MICROBIOLOGYEVENTS.head()

Unnamed: 0,row_id,subject_id,hadm_id,chartdate,charttime,spec_itemid,spec_type_desc,org_itemid,org_name,isolate_num,ab_itemid,ab_name,dilution_text,dilution_comparison,dilution_value,interpretation
0,134694,10006,142345,2164-10-23 00:00:00,2164-10-23 15:30:00,70012,BLOOD CULTURE,80155.0,"STAPHYLOCOCCUS, COAGULASE NEGATIVE",2.0,,,,,,
1,134695,10006,142345,2164-10-23 00:00:00,2164-10-23 15:30:00,70012,BLOOD CULTURE,80155.0,"STAPHYLOCOCCUS, COAGULASE NEGATIVE",1.0,90015.0,VANCOMYCIN,2,=,2.0,S
2,134696,10006,142345,2164-10-23 00:00:00,2164-10-23 15:30:00,70012,BLOOD CULTURE,80155.0,"STAPHYLOCOCCUS, COAGULASE NEGATIVE",1.0,90012.0,GENTAMICIN,<=0.5,<=,1.0,S
3,134697,10006,142345,2164-10-23 00:00:00,2164-10-23 15:30:00,70012,BLOOD CULTURE,80155.0,"STAPHYLOCOCCUS, COAGULASE NEGATIVE",1.0,90025.0,LEVOFLOXACIN,4,=,4.0,I
4,134698,10006,142345,2164-10-23 00:00:00,2164-10-23 15:30:00,70012,BLOOD CULTURE,80155.0,"STAPHYLOCOCCUS, COAGULASE NEGATIVE",1.0,90016.0,OXACILLIN,=>4,=>,4.0,R


In [50]:
MICROBIOLOGYEVENTS.shape

(2003, 16)

The MICROBIOLOGYEVENTS table in the MIMIC-III dataset contains microbiology test results for patients. This table provides detailed information about cultures and sensitivity tests performed during patient care, often to identify infections and determine effective treatments.

Column-Wise Description of MICROBIOLOGYEVENTS Table:

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|A unique identifier for each row in the table.|
|SUBJECT_ID|A unique identifier for the patient associated with the test.|
|HADM_ID|A unique identifier for the hospital admission during which the test was performed.|
|CHARTDATE|The date on which the microbiology sample was taken (YYYY-MM-DD format).|
|CHARTTIME|The specific time when the microbiology sample was taken (if available).|
|SPEC_ITEMID|The identifier for the type of specimen collected (e.g., blood, urine, sputum).|
|SPEC_TYPE_DESC|A description of the type of specimen (e.g., "BLOOD CULTURE").|
|ORG_ITEMID|The identifier for the organism identified in the culture (if applicable).|
|ORG_NAME|The name of the organism identified in the culture (e.g., "STAPHYLOCOCCUS, COAGULASE NEGATIVE").|
|ISOLATE_NUM|The isolate number, which distinguishes between multiple isolates from the same specimen.|
|AB_ITEMID|The identifier for the antibiotic tested (if a sensitivity test was performed).|
|AB_NAME|The name of the antibiotic tested (e.g., "VANCOMYCIN", "OXACILLIN").|
|DILUTION_TEXT|The text representation of the dilution tested for antibiotic sensitivity (e.g., "<=0.5").|
|DILUTION_COMPARISON|The comparison operator part of the dilution result (e.g., "=", "<=", "=>").|
|DILUTION_VALUE|The numeric part of the dilution result (e.g., 2.0).|
|INTERPRETATION|The interpretation of the sensitivity result (e.g., "S" for Sensitive, "I" for Intermediate, "R" for Resistant).|
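
To get a quick view of sensitivity results, the antibiotic names can be cross-tabulated against the interpretation codes. The sketch below uses the four antibiotic rows from the preview above. In that preview the codes are S, I, and R, and the cross-tabulation simply counts how often each appears per antibiotic.

```python
import pandas as pd

# Antibiotic rows copied from the MICROBIOLOGYEVENTS preview above.
micro = pd.DataFrame({
    "ab_name": ["VANCOMYCIN", "GENTAMICIN", "LEVOFLOXACIN", "OXACILLIN"],
    "interpretation": ["S", "S", "I", "R"],
})

# Rows: antibiotics, columns: interpretation codes, cells: counts.
counts = pd.crosstab(micro["ab_name"], micro["interpretation"])

print(counts)
```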


## 14. NOTEEVENTS.csv DATA INFORMATION

In [51]:
NOTEEVENTS=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\NOTEEVENTS.csv')

In [52]:
NOTEEVENTS.head()

Unnamed: 0,row_id,subject_id,hadm_id,chartdate,category,description,cgid,iserror,text
0,776,20007,188442.0,2183-10-29 00:00:00,Discharge summary,Report,,,Admission Date: [**2183-9-25**] Dischar...
1,777,20007,193793.0,2184-01-20 00:00:00,Discharge summary,Report,,,Admission Date: [**2184-1-16**] Dischar...
2,778,59883,118446.0,2103-04-18 00:00:00,Discharge summary,Report,,,Admission Date: [**2103-4-11**] ...
3,779,17043,157985.0,2103-10-11 00:00:00,Discharge summary,Report,,,Admission Date: [**2103-10-7**] Dischar...
4,785,7019,189488.0,2131-04-06 00:00:00,Discharge summary,Report,,,Admission Date: [**2131-4-2**] D...


In [53]:
NOTEEVENTS.shape

(2083180, 9)

The NOTEEVENTS table in the MIMIC-III dataset contains unstructured free-text clinical notes documented during patient care. These notes include a wide variety of information, such as physician observations, radiology reports, nursing progress notes, and discharge summaries.

Here is a column-wise description of the NOTEEVENTS table:

|**Column Name**|**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID|A unique identifier for each row in the table.|
|SUBJECT_ID|A unique identifier for the patient the note belongs to.|
|HADM_ID|A unique identifier for the hospital admission during which the note was recorded.|
|CHARTDATE|The calendar date on which the note was recorded (YYYY-MM-DD format).|
|CATEGORY|The category of the note (e.g., Discharge summary, Nursing/other, Radiology, ECG, etc.).|
|DESCRIPTION|A brief description of the note content or type (e.g., "Report").|
|CGID|The identifier for the caregiver (clinician) who wrote the note (if available).|
|ISERROR|Indicates if there was an error in the note (1 if erroneous, NaN otherwise).|
|TEXT|The full text of the clinical note. This field contains the unstructured free text.|
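
With more than two million rows of free text, NOTEEVENTS.csv can be expensive to load in one call. One option is to stream the file in chunks and read only the columns that are needed. The sketch below counts notes per category this way. The in-memory CSV is a hypothetical stand-in for the real file, and the chunk size of 2 is only for demonstration.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical stand-in for NOTEEVENTS.csv; with the real file, pass its path instead.
csv_text = io.StringIO(
    "row_id,category,text\n"
    "1,Discharge summary,note a\n"
    "2,Discharge summary,note b\n"
    "3,Radiology,note c\n"
)

category_counts = Counter()

# usecols skips the large text column; chunksize yields DataFrames piece by piece.
for chunk in pd.read_csv(csv_text, usecols=["row_id", "category"], chunksize=2):
    category_counts.update(chunk["category"].value_counts().to_dict())

print(dict(category_counts))
```

For the real file a much larger `chunksize` (tens or hundreds of thousands of rows) would be more sensible; the right value depends on available memory.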

## 15. PROCEDURES_ICD.csv DATA INFORMATION

In [54]:
PROCEDURES_ICD=pd.read_csv(r'D:\Final_year_project\final project dataset\final project\PROCEDURES_ICD.csv')
PROCEDURES_ICD.head()

Unnamed: 0,row_id,subject_id,hadm_id,seq_num,icd9_code
0,3994,10114,167957,1,3605
1,3995,10114,167957,2,3722
2,3996,10114,167957,3,8856
3,3997,10114,167957,4,9920
4,3998,10114,167957,5,9671


In [55]:
PROCEDURES_ICD.shape

(506, 5)

Here’s the PROCEDURES_ICD table description in tabular format:

|**Column Name**|	**Description**|
|------------------------|-------------------------------------------------------------------------------------------------------|
|ROW_ID	|Unique identifier for each row in the table (used for internal tracking).|
|SUBJECT_ID	|Identifier for the patient associated with the procedure (foreign key to the PATIENTS table).|
|HADM_ID	|Identifier for the hospital admission associated with the procedure (foreign key to the ADMISSIONS table).|
|SEQ_NUM	|Sequential number of the procedure within a given admission (e.g., primary, secondary procedures).|
|ICD9_CODE|	ICD-9 (International Classification of Diseases, 9th Revision) code for the procedure performed.|