## ICD-9 code 

Using the **MIMIC dataset**, several types of predictions can be made by leveraging the rich clinical data it provides. Below are common predictive tasks, examples of columns and attributes used, and approaches for these analyses.

---

### **1. Predicting ICU Mortality**
   - **Goal**: Predict whether a patient will survive their ICU stay.
   - **Key Columns and Attributes**:
     - From `CHARTEVENTS`:
       - Vital signs: Heart rate, blood pressure, respiratory rate, temperature.
       - Charted lab values: Blood glucose, creatinine, pH, lactate (the primary lab records are in `LABEVENTS`).
     - From `ICUSTAYS`:
       - ICU admission and discharge times.
       - Length of ICU stay.
     - From `ADMISSIONS`:
       - Admission type: Elective, emergency.
       - Diagnosis text or ICD codes.
     - Demographics:
       - Age, gender, ethnicity.
   - **Approach**:
     - Time-series models like LSTMs for sequential data.
     - Feature engineering for static features and gradient-boosted models like XGBoost.
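Sequence models such as LSTMs expect a regular grid of shape (stays, time steps, features), whereas charted measurements arrive at irregular times. The sketch below shows one way to build that grid with hourly bins and forward-filling. The rows, the `HOUR` column, and `N_HOURS` are invented for illustration and are not part of MIMIC.

```python
import numpy as np
import pandas as pd

# Toy heart-rate events; HOUR is the hour since ICU admission (hypothetical, precomputed).
events = pd.DataFrame({
    "ICUSTAY_ID": [1, 1, 1, 2, 2],
    "HOUR": [0, 1, 3, 0, 2],
    "HEART_RATE": [80.0, 84.0, 90.0, 110.0, 105.0],
})

N_HOURS = 4  # length of the observation window (an assumption for this sketch)
stay_ids = sorted(events["ICUSTAY_ID"].unique())

# One row per stay, one column per hour; hours without a measurement become NaN,
# then each gap is filled with the most recent earlier value.
grid = (
    events.pivot_table(index="ICUSTAY_ID", columns="HOUR",
                       values="HEART_RATE", aggfunc="mean")
    .reindex(index=stay_ids, columns=range(N_HOURS))
    .ffill(axis=1)
)

# Add a trailing feature axis: (stays, hours, features).
X = grid.to_numpy()[:, :, np.newaxis]
print(X.shape)
```

With more than one vital sign, the same grid can be built per signal and stacked along the last axis.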

---

### **2. Predicting Length of Stay (LOS)**
   - **Goal**: Estimate the number of days a patient will spend in the ICU or hospital.
   - **Key Columns and Attributes**:
     - From `ICUSTAYS`:
       - Admission and discharge times.
     - From `CHARTEVENTS`:
       - First 24 hours of vitals and lab results (e.g., heart rate, blood pressure, creatinine).
     - From `ADMISSIONS`:
       - Admission type and source.
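     - From `PATIENTS`:
       - Age (derived from `DOB` and the admission time).
     - From `DIAGNOSES_ICD`:
       - Chronic conditions, taken from diagnosis codes recorded on earlier admissions.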
   - **Approach**:
     - Regression models (e.g., Linear Regression, Random Forest Regressor).
     - Time-series analysis for sequential trends.
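The regression target for this task can be derived directly from the admission and discharge timestamps. A minimal sketch, assuming `INTIME` and `OUTTIME` columns as in `ICUSTAYS`; the rows are invented for illustration.

```python
import pandas as pd

# Toy rows shaped like ICUSTAYS; the values are invented.
icustays = pd.DataFrame({
    "ICUSTAY_ID": [1, 2, 3],
    "INTIME": ["2150-01-01 08:00", "2150-02-10 22:30", "2150-03-05 12:00"],
    "OUTTIME": ["2150-01-03 20:00", "2150-02-11 10:30", "2150-03-12 12:00"],
})

for col in ("INTIME", "OUTTIME"):
    icustays[col] = pd.to_datetime(icustays[col])

# Length of stay in fractional days, used as the regression target.
stay_duration = icustays["OUTTIME"] - icustays["INTIME"]
icustays["LOS_DAYS"] = stay_duration.dt.total_seconds() / 86400

print(icustays[["ICUSTAY_ID", "LOS_DAYS"]])
```

LOS distributions are usually right-skewed, so a log transform of the target is a common choice before fitting a linear model.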

---

### **3. Sepsis Prediction**
   - **Goal**: Predict the onset of sepsis based on patient data.
   - **Key Columns and Attributes**:
     - From `CHARTEVENTS`:
       - Heart rate, respiratory rate, temperature.
     - From `LABEVENTS`:
       - White blood cell count and lactate levels.
     - From `MICROBIOLOGYEVENTS`:
       - Blood culture results.
     - From `INPUTEVENTS_CV` / `INPUTEVENTS_MV` and `PRESCRIPTIONS`:
       - Fluid intake and drug administration (e.g., antibiotics).
   - **Approach**:
     - Feature engineering to identify trends over time (e.g., lactate rising).
     - Gradient-boosting models or recurrent neural networks (RNNs).
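A trend feature such as "lactate rising" can be expressed as the slope of a least-squares line fitted through each stay's measurements. The sketch below assumes a hypothetical `HOURS` column (hours since ICU admission) and uses invented values.

```python
import numpy as np
import pandas as pd

# Toy lactate series for two stays (mmol/L); the values are invented.
labs = pd.DataFrame({
    "ICUSTAY_ID": [1, 1, 1, 2, 2, 2],
    "HOURS": [0.0, 6.0, 12.0, 0.0, 6.0, 12.0],
    "LACTATE": [1.0, 2.0, 3.0, 2.5, 2.0, 1.5],
})

def slope_per_hour(group: pd.DataFrame) -> float:
    """Least-squares slope of lactate against time for one stay."""
    return float(np.polyfit(group["HOURS"], group["LACTATE"], deg=1)[0])

trend = (
    labs.groupby("ICUSTAY_ID")[["HOURS", "LACTATE"]]
    .apply(slope_per_hour)
    .rename("LACTATE_SLOPE")
    .reset_index()
)
trend["LACTATE_RISING"] = trend["LACTATE_SLOPE"] > 0

print(trend)
```

The slope and the boolean flag can then be fed to a gradient-boosting model alongside other summary features.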

---

### **4. Readmission Prediction**
   - **Goal**: Predict whether a patient will be readmitted to the hospital within 30 days of discharge.
   - **Key Columns and Attributes**:
     - From `ADMISSIONS`:
       - Discharge date and type.
     - From `CHARTEVENTS`:
       - Clinical stability indicators at discharge.
     - From `DIAGNOSES_ICD`:
       - Chronic conditions and comorbidities.
   - **Approach**:
     - Logistic regression or classification models.
     - Feature selection from discharge-related data.
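The 30-day readmission label is not stored in the data and has to be derived by comparing each discharge with the same patient's next admission. A minimal sketch on invented rows shaped like `ADMISSIONS`:

```python
import pandas as pd

# Toy admissions; patient 1 returns 15 days after the first discharge.
admissions = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 1, 2],
    "HADM_ID": [10, 11, 12, 20],
    "ADMITTIME": ["2150-01-01", "2150-01-20", "2150-06-01", "2150-03-01"],
    "DISCHTIME": ["2150-01-05", "2150-01-25", "2150-06-03", "2150-03-04"],
})
for col in ("ADMITTIME", "DISCHTIME"):
    admissions[col] = pd.to_datetime(admissions[col])

# Order each patient's admissions in time, then look one admission ahead.
admissions = admissions.sort_values(["SUBJECT_ID", "ADMITTIME"]).reset_index(drop=True)
next_admit = admissions.groupby("SUBJECT_ID")["ADMITTIME"].shift(-1)

# Gap between this discharge and the next admission; NaT when there is none.
gap = next_admit - admissions["DISCHTIME"]
admissions["READMIT_30D"] = (gap <= pd.Timedelta(days=30)).astype(int)

print(admissions[["HADM_ID", "READMIT_30D"]])
```

In a real analysis, admissions that ended in death would normally be excluded first, since those patients cannot be readmitted.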

---

### **5. Predicting Diagnoses (ICD Code Prediction)**
   - **Goal**: Predict ICD-9 codes based on patient clinical data.
   - **Key Columns and Attributes**:
     - From `NOTEEVENTS`:
       - Clinical notes and discharge summaries.
     - From `CHARTEVENTS`:
       - Vitals, interventions, and lab results.
     - From `LABEVENTS`:
       - Blood tests and other lab measurements.
   - **Approach**:
     - Natural Language Processing (NLP) for note text (e.g., embeddings using BERT or Word2Vec).
     - Multi-label classification using neural networks.
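As a lightweight baseline before moving to BERT or neural networks, multi-label code prediction can be sketched with TF-IDF features and one logistic regression per code. The notes below are invented, and the two codes (410 for acute myocardial infarction, 250 for diabetes mellitus) are used only as example labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Invented note snippets, each with one or more ICD-9 category codes.
notes = [
    "chest pain with st elevation, troponin elevated",
    "chest pain, troponin elevated, history of diabetes on insulin",
    "poorly controlled diabetes, on insulin, glucose elevated",
    "diabetes follow up, glucose elevated",
    "st elevation on ekg, troponin elevated, taken to cath lab",
    "insulin adjusted for diabetes, no chest pain",
]
codes = [["410"], ["410", "250"], ["250"], ["250"], ["410"], ["250"]]

# Turn the code lists into a binary indicator matrix: one column per code.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(codes)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(notes)

# One independent binary classifier per code.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

new_note = ["troponin elevated with st elevation"]
probs = clf.predict_proba(vectorizer.transform(new_note))
print(dict(zip(mlb.classes_, probs[0].round(2))))
```

Each column of `probs` is an independent probability, so a note can receive several codes or none, depending on the threshold chosen.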

---

### **6. Ventilator Use Prediction**
   - **Goal**: Predict whether a patient will require mechanical ventilation.
   - **Key Columns and Attributes**:
     - From `CHARTEVENTS`:
       - SpO2 (oxygen saturation), respiratory rate, blood gases (pCO2, pO2).
     - From `INPUTEVENTS_CV` / `INPUTEVENTS_MV`:
       - Drugs related to sedation or muscle relaxation.
   - **Approach**:
     - Binary classification using decision trees, random forests, or deep learning.

---

### **7. Predicting Outcomes for Specific Conditions**
   - **Example**: Predicting outcomes for patients with acute kidney injury (AKI).
   - **Key Columns and Attributes**:
     - From `LABEVENTS`:
       - Creatinine levels, electrolytes, pH levels.
     - From `OUTPUTEVENTS`:
       - Urine output.
     - From `CHARTEVENTS`:
       - Blood pressure.
   - **Approach**:
     - Combining static features (age, gender) and dynamic features (creatinine trends).
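Combining the two kinds of features usually means summarizing the time series per stay and joining the result to the static table. A sketch with invented values; the column names `CREAT_FIRST`, `CREAT_MAX`, `CREAT_LAST`, and `CREAT_RATIO` are hypothetical.

```python
import pandas as pd

# Static features: one row per stay (invented values).
static = pd.DataFrame({"ICUSTAY_ID": [1, 2], "AGE": [64, 71], "GENDER": ["M", "F"]})

# Dynamic features: repeated creatinine measurements in mg/dL (invented values).
creatinine = pd.DataFrame({
    "ICUSTAY_ID": [1, 1, 1, 2, 2, 2],
    "HOURS": [0, 24, 48, 0, 24, 48],
    "CREATININE": [1.0, 1.4, 2.1, 0.9, 0.9, 1.0],
})

# Summarize each stay's trend: first, peak, and last value, plus peak relative to first.
creatinine = creatinine.sort_values(["ICUSTAY_ID", "HOURS"])
dynamic = creatinine.groupby("ICUSTAY_ID")["CREATININE"].agg(
    CREAT_FIRST="first", CREAT_MAX="max", CREAT_LAST="last"
).reset_index()
dynamic["CREAT_RATIO"] = dynamic["CREAT_MAX"] / dynamic["CREAT_FIRST"]

# Join static and dynamic features into one modeling table.
features = static.merge(dynamic, on="ICUSTAY_ID", how="left")
features["GENDER_M"] = (features["GENDER"] == "M").astype(int)

print(features)
```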

---

### General Workflow for Predictions Using MIMIC Data
1. **Data Extraction**:
   - Identify relevant tables (e.g., `CHARTEVENTS`, `LABEVENTS`, `ADMISSIONS`).
   - Use SQL queries to extract data, joining on `SUBJECT_ID`, `HADM_ID`, or `ICUSTAY_ID`.

2. **Data Cleaning**:
   - Handle missing values, outliers, and erroneous data.
   - Standardize units of measurement (e.g., converting °F to °C).

3. **Feature Engineering**:
   - Aggregate time-series data into summary statistics (e.g., max, min, mean).
   - Extract sequential patterns from time-series data.

4. **Modeling**:
   - Select appropriate algorithms (e.g., logistic regression for classification, LSTM for time-series).
   - Train on a patient-level split so that no individual appears in both the training and test sets.

5. **Evaluation**:
   - Use metrics like accuracy, precision, recall, AUROC, and RMSE (for regression).
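Steps 1 to 3 of the workflow can be sketched end to end on a tiny in-memory SQLite database. The schema here is deliberately simplified and hypothetical: real `CHARTEVENTS` rows identify the measurement by `ITEMID` rather than a `label` column, and the 25 to 45 °C plausibility range is an assumption for this example.

```python
import sqlite3

import pandas as pd

# Build a toy database; table contents are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE icustays (subject_id INTEGER, hadm_id INTEGER, icustay_id INTEGER);
CREATE TABLE chartevents (icustay_id INTEGER, label TEXT, valuenum REAL, valueuom TEXT);
INSERT INTO icustays VALUES (22, 165315, 1), (23, 152223, 2);
INSERT INTO chartevents VALUES
    (1, 'temperature', 98.6, 'degF'), (1, 'temperature', 38.0, 'degC'),
    (1, 'temperature', 0.0, 'degC'),
    (2, 'temperature', 100.4, 'degF'), (2, 'temperature', 37.0, 'degC');
""")

# 1. Extraction: join events to stays on ICUSTAY_ID.
events = pd.read_sql_query(
    """
    SELECT i.subject_id, i.icustay_id, c.valuenum, c.valueuom
    FROM icustays AS i
    JOIN chartevents AS c ON c.icustay_id = i.icustay_id
    WHERE c.label = 'temperature'
    """,
    conn,
)
conn.close()

# 2. Cleaning: convert Fahrenheit readings to Celsius, then drop implausible values.
is_fahrenheit = events["valueuom"] == "degF"
events["temp_c"] = events["valuenum"].where(
    ~is_fahrenheit, (events["valuenum"] - 32) * 5 / 9
)
events = events[events["temp_c"].between(25, 45)]

# 3. Feature engineering: one row of summary statistics per stay.
summary = events.groupby("icustay_id")["temp_c"].agg(["min", "max", "mean"]).reset_index()
print(summary)
```

The erroneous 0.0 °C reading is removed by the range filter before aggregation, so it does not distort the minimum or the mean.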

---

### Python Example for ICU Mortality Prediction
```python
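import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Heart-rate ITEMIDs, assumed here to be 211 (CareVue) and 220045 (MetaVision);
# confirm them against D_ITEMS before relying on this filter.
HEART_RATE_ITEMIDS = [211, 220045]

# Load data. CHARTEVENTS is very large, so read it in chunks and keep only heart-rate rows.
# VALUENUM holds the numeric measurement; VALUE is free text.
# If your export stores dates as DD-MM-YYYY, also pass dayfirst=True to read_csv.
chunks = pd.read_csv(
    "CHARTEVENTS.csv",
    usecols=['HADM_ID', 'ITEMID', 'CHARTTIME', 'VALUENUM'],
    parse_dates=['CHARTTIME'],
    chunksize=1_000_000,
)
chartevents = pd.concat([c[c['ITEMID'].isin(HEART_RATE_ITEMIDS)] for c in chunks])
admissions = pd.read_csv(
    "ADMISSIONS.csv",
    usecols=['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME'],
    parse_dates=['ADMITTIME', 'DISCHTIME', 'DEATHTIME'],
)

# Merge on HADM_ID so each measurement is matched to its own admission
# (joining on SUBJECT_ID would pair it with every admission of that patient).
merged_data = pd.merge(chartevents, admissions, on='HADM_ID', how='inner')

# Feature engineering
# Example: mean heart rate over the first 24 hours after admission
in_first_24h = (
    (merged_data['CHARTTIME'] >= merged_data['ADMITTIME'])
    & (merged_data['CHARTTIME'] < merged_data['ADMITTIME'] + pd.Timedelta(hours=24))
)
features = merged_data[in_first_24h].groupby('HADM_ID')['VALUENUM'].mean().reset_index()
features.rename(columns={'VALUENUM': 'AVG_HEART_RATE'}, inplace=True)
features = features.dropna(subset=['AVG_HEART_RATE'])

# Add target variable: in-hospital death during the admission (DEATHTIME is set)
features = pd.merge(features, admissions[['HADM_ID', 'DEATHTIME']], on='HADM_ID', how='left')
features['MORTALITY'] = features['DEATHTIME'].notnull().astype(int)

# Train-test split, stratified because deaths are the minority class
X = features[['AVG_HEART_RATE']]
y = features['MORTALITY']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model training
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluation: accuracy alone is misleading on imbalanced labels, so report AUROC too
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
print("Accuracy:", accuracy_score(y_test, y_pred))
print("AUROC:", roc_auc_score(y_test, y_prob))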
```
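On imbalanced outcomes such as mortality, accuracy can look good even when a model finds few of the positive cases, which is why the workflow lists several metrics. The sketch below computes them with scikit-learn on invented labels and scores; a model that always predicted "survives" would also reach 0.8 accuracy on this data.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    mean_squared_error,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented classification labels: 2 deaths among 10 stays, with predicted probabilities.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.6, 0.4, 0.8])
y_pred = (y_prob >= 0.5).astype(int)  # 0.5 threshold is an arbitrary choice

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
auroc = roc_auc_score(y_true, y_prob)  # uses the scores, not the thresholded labels

# Invented regression example: true vs. predicted length of stay in days.
los_true = np.array([2.5, 0.5, 7.0])
los_pred = np.array([3.0, 1.0, 6.0])
rmse = float(np.sqrt(mean_squared_error(los_true, los_pred)))

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(f"AUROC={auroc:.3f} RMSE={rmse:.3f}")
```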

By combining domain knowledge, robust feature engineering, and advanced models, MIMIC data allows for impactful predictive insights in healthcare.

To predict ICD-9 codes (International Classification of Diseases, 9th Revision), the following types of information are typically needed. These details describe the patient's medical condition and associated healthcare data:

### 1. **Demographic Information**
   - **Age**: Helps in diagnosing age-specific conditions.
   - **Sex**: Some diseases are sex-specific or more common in one sex.
   - **Ethnicity**: Certain conditions have higher prevalence in specific ethnic groups.

### 2. **Clinical Information**
   - **Primary Diagnosis**: The main reason for the patient's visit or admission.
   - **Secondary Diagnoses**: Any coexisting conditions that may impact treatment or outcomes.
   - **Symptoms**: A detailed description of the patient's symptoms.
   - **Vital Signs**: Blood pressure, heart rate, temperature, etc.

### 3. **Medical History**
   - **Chronic Conditions**: Previous diagnoses such as diabetes, hypertension, or asthma.
   - **Past Surgeries or Treatments**: Helps understand complications or predisposing factors.
   - **Family History**: Genetic predispositions to certain diseases.

### 4. **Laboratory Results**
   - **Blood Tests**: Levels of glucose, hemoglobin, cholesterol, etc.
   - **Urinalysis**: Indicates infections, kidney conditions, or other disorders.
   - **Imaging Results**: X-rays, CT scans, MRIs for structural abnormalities.

### 5. **Medications**
   - **Current Medications**: Can give clues about ongoing treatment and conditions.
   - **Medication History**: Allergies, prior adverse reactions, and treatment patterns.

### 6. **Procedures**
   - **Diagnostic Procedures**: Biopsies, endoscopies, etc.
   - **Therapeutic Procedures**: Surgeries or interventions already performed.

### 7. **Social Determinants of Health**
   - **Lifestyle Choices**: Smoking, alcohol use, diet, exercise.
   - **Occupational Hazards**: Exposure to chemicals, repetitive stress injuries.
   - **Living Conditions**: Housing stability, access to healthcare, and socioeconomic status.

### 8. **Encounter Information**
   - **Reason for Visit**: Symptoms or issues prompting the encounter.
   - **Length of Stay**: For inpatient cases, this may hint at the severity.
   - **Specialty**: The type of healthcare provider (e.g., cardiologist, neurologist).

### 9. **Natural Language Data**
   - **Clinical Notes**: Free-text descriptions from physicians or nurses about patient conditions, examination findings, and differential diagnoses.

### 10. **Behavioral and Psychological Assessments**
   - **Mental Health Diagnoses**: Depression, anxiety, or other psychiatric conditions.
   - **Cognitive Testing Results**: When relevant, for conditions like dementia or developmental delays.

By collecting and preprocessing this information, machine learning models can be trained to assign the appropriate ICD-9 codes to a patient record. However, it's crucial to protect patient privacy and follow HIPAA (Health Insurance Portability and Accountability Act) regulations when working with such sensitive data.
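Part of that preprocessing is turning categorical fields into numeric columns a model can use. A minimal sketch with one-hot encoding, using a few `ADMISSION_TYPE` and `INSURANCE` values of the kind found in `ADMISSIONS`; which columns to encode is an assumption for this example.

```python
import pandas as pd

# A handful of admissions with two categorical columns.
admissions = pd.DataFrame({
    "HADM_ID": [165315, 152223, 124321, 161859, 129635],
    "ADMISSION_TYPE": ["EMERGENCY", "ELECTIVE", "EMERGENCY", "EMERGENCY", "EMERGENCY"],
    "INSURANCE": ["Private", "Medicare", "Medicare", "Private", "Private"],
})

# One-hot encode: each category becomes its own 0/1 column.
encoded = pd.get_dummies(admissions, columns=["ADMISSION_TYPE", "INSURANCE"], dtype=int)

print(encoded.columns.tolist())
```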

In [1]:
import pandas as pd
import numpy as np

In [2]:
Admissions=pd.read_csv(r'D:\FINALYEARPROJECTREC\data\ADMISSIONS.csv')

In [3]:
Admissions.head(2)

| | ROW_ID | SUBJECT_ID | HADM_ID | ADMITTIME | DISCHTIME | DEATHTIME | ADMISSION_TYPE | ADMISSION_LOCATION | DISCHARGE_LOCATION | INSURANCE | LANGUAGE | RELIGION | MARITAL_STATUS | ETHNICITY | EDREGTIME | EDOUTTIME | DIAGNOSIS | HOSPITAL_EXPIRE_FLAG | HAS_CHARTEVENTS_DATA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 21 | 22 | 165315 | 09-04-2196 12:26 | 10-04-2196 15:54 |  | EMERGENCY | EMERGENCY ROOM ADMIT | DISC-TRAN CANCER/CHLDRN H | Private |  | UNOBTAINABLE | MARRIED | WHITE | 09-04-2196 10:06 | 09-04-2196 13:24 | BENZODIAZEPINE OVERDOSE | 0 | 1 |
| 1 | 22 | 23 | 152223 | 03-09-2153 07:15 | 08-09-2153 19:10 |  | ELECTIVE | PHYS REFERRAL/NORMAL DELI | HOME HEALTH CARE | Medicare |  | CATHOLIC | MARRIED | WHITE |  |  | CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS... | 0 | 1 |


In [4]:
Admissions.head()

| | ROW_ID | SUBJECT_ID | HADM_ID | ADMITTIME | DISCHTIME | DEATHTIME | ADMISSION_TYPE | ADMISSION_LOCATION | DISCHARGE_LOCATION | INSURANCE | LANGUAGE | RELIGION | MARITAL_STATUS | ETHNICITY | EDREGTIME | EDOUTTIME | DIAGNOSIS | HOSPITAL_EXPIRE_FLAG | HAS_CHARTEVENTS_DATA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 21 | 22 | 165315 | 09-04-2196 12:26 | 10-04-2196 15:54 |  | EMERGENCY | EMERGENCY ROOM ADMIT | DISC-TRAN CANCER/CHLDRN H | Private |  | UNOBTAINABLE | MARRIED | WHITE | 09-04-2196 10:06 | 09-04-2196 13:24 | BENZODIAZEPINE OVERDOSE | 0 | 1 |
| 1 | 22 | 23 | 152223 | 03-09-2153 07:15 | 08-09-2153 19:10 |  | ELECTIVE | PHYS REFERRAL/NORMAL DELI | HOME HEALTH CARE | Medicare |  | CATHOLIC | MARRIED | WHITE |  |  | CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS... | 0 | 1 |
| 2 | 23 | 23 | 124321 | 18-10-2157 19:34 | 25-10-2157 14:00 |  | EMERGENCY | TRANSFER FROM HOSP/EXTRAM | HOME HEALTH CARE | Medicare | ENGL | CATHOLIC | MARRIED | WHITE |  |  | BRAIN MASS | 0 | 1 |
| 3 | 24 | 24 | 161859 | 06-06-2139 16:14 | 09-06-2139 12:48 |  | EMERGENCY | TRANSFER FROM HOSP/EXTRAM | HOME | Private |  | PROTESTANT QUAKER | SINGLE | WHITE |  |  | INTERIOR MYOCARDIAL INFARCTION | 0 | 1 |
| 4 | 25 | 25 | 129635 | 02-11-2160 02:06 | 05-11-2160 14:55 |  | EMERGENCY | EMERGENCY ROOM ADMIT | HOME | Private |  | UNOBTAINABLE | MARRIED | WHITE | 02-11-2160 01:01 | 02-11-2160 04:27 | ACUTE CORONARY SYNDROME | 0 | 1 |


In [5]:
Admissions[['ROW_ID', 'SUBJECT_ID', 'HADM_ID','DIAGNOSIS','ETHNICITY',]]

| | ROW_ID | SUBJECT_ID | HADM_ID | DIAGNOSIS | ETHNICITY |
| --- | --- | --- | --- | --- | --- |
| 0 | 21 | 22 | 165315 | BENZODIAZEPINE OVERDOSE | WHITE |
| 1 | 22 | 23 | 152223 | CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS... | WHITE |
| 2 | 23 | 23 | 124321 | BRAIN MASS | WHITE |
| 3 | 24 | 24 | 161859 | INTERIOR MYOCARDIAL INFARCTION | WHITE |
| 4 | 25 | 25 | 129635 | ACUTE CORONARY SYNDROME | WHITE |
| ... | ... | ... | ... | ... | ... |
| 58971 | 58594 | 98800 | 191113 | TRAUMA | WHITE |
| 58972 | 58595 | 98802 | 101071 | SAH | WHITE |
| 58973 | 58596 | 98805 | 122631 | RENAL CANCER/SDA | WHITE |
| 58974 | 58597 | 98813 | 170407 | S/P FALL | WHITE |


In [7]:
note=pd.read_csv(r'data\NOTEEVENTS.csv')

In [12]:
note['text'][20]

"Admission Date:  [**2149-7-30**]              Discharge Date:   [**2149-8-5**] Date of Birth:  [**2087-11-13**]             Sex:   M Service: MEDICINE Allergies: Penicillins Attending:[**First Name3 (LF) 3984**] Chief Complaint: Confusion & agitation Major Surgical or Invasive Procedure: Endotracheal intubation Central venous line & arterial line placement EEG History of Present Illness:   61yo male with HBV cirrhosis complicated by portal HTN, gastric varicies, and s/p TIPS who was transferred to [**Hospital1 18**] from outside facility for confusion & agitation.  He was also noted to be jaundiced with asterixis.  Pt had also sustained a fall with facial trauma several weeks ago.   In the ED, he desaturated to 80% on 6L/NC and was found to have EKG evidence of an acute anterior MI.  He was intubated for airway protection & seen by cardiology.  He had an emergent ECHO which demonstrated an EF of 40-50%.  He was felt to be too high risk for anticoagulation or catheterization, and was t