# Chapter 2: Health Data

This chapter provides an overview of health data, highlighting its diverse modalities, including structured data (like diagnosis and procedure codes) and unstructured data (such as clinical notes and medical images). It discusses the rapid adoption of Electronic Health Records (EHRs) by healthcare providers, illustrating the growth in their usage over the past decade. EHRs compile a comprehensive record of patient encounters, which includes various types of data managed by providers, payers, pharmacies, and researchers. The chapter also examines the life cycle of health data, the challenges in managing both structured and unstructured data, and the importance of established health data standards like ICD and CPT. These data types are crucial for deep learning applications in healthcare, facilitating improved patient care and research.

## Key Insights
- The transition from paper-based records to EHRs has significantly increased the volume and variety of health data available.
- EHR systems are primarily managed by healthcare providers, leading to fragmented patient information across different systems.
- Structured health data, such as medical codes, is essential for billing but poses challenges for secondary uses in research.
- Unstructured data, like clinical notes, presents additional complexities, including high dimensionality and privacy concerns.
- Established health data standards, such as ICD and SNOMED CT, play a critical role in ensuring data interoperability and quality for analytics and modeling.
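To make the fragmentation point concrete, here is a minimal Python sketch of how one patient's history might be split across two provider systems. The `Encounter` schema, the provider names, and the sample records are all hypothetical. The sketch also assumes both systems share a patient identifier, which real systems often do not.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Encounter:
    """One visit as a single provider's EHR might store it (hypothetical schema)."""
    patient_id: str
    provider: str
    visit_date: date
    diagnosis_codes: list[str] = field(default_factory=list)  # structured data
    note: str = ""  # unstructured free-text clinical note


def longitudinal_history(*systems: list[Encounter]) -> dict[str, list[Encounter]]:
    """Merge encounters from several provider systems into per-patient timelines.

    Assumes patient_id is shared across systems (an illustrative simplification).
    """
    history: dict[str, list[Encounter]] = {}
    for system in systems:
        for enc in system:
            history.setdefault(enc.patient_id, []).append(enc)
    # Order each patient's encounters chronologically.
    for encounters in history.values():
        encounters.sort(key=lambda e: e.visit_date)
    return history


# Made-up records: patient "p1" has visited both hospitals.
hospital_a = [
    Encounter("p1", "Hospital A", date(2021, 3, 2), ["I10"], "Pt reports headaches."),
]
hospital_b = [
    Encounter("p1", "Hospital B", date(2020, 11, 15), ["E11.9"], "Follow-up for diabetes."),
    Encounter("p2", "Hospital B", date(2021, 1, 8), ["I10"]),
]

single_view = longitudinal_history(hospital_a)
merged = longitudinal_history(hospital_a, hospital_b)
print(len(single_view["p1"]), len(merged["p1"]))
```

Hospital A alone sees only one of the two encounters for `p1`, which is why data from a single hospital cannot be assumed to hold a patient's complete clinical history.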

## Frequently Asked Questions
- What types of health data are considered structured?
  - Structured health data includes diagnosis codes, procedure codes, medication prescriptions, and patient demographics, often represented in standardized formats like ICD and CPT.

- Why is the adoption of Electronic Health Records (EHRs) important?
  - The adoption of EHRs enhances patient care by providing comprehensive, easily accessible medical histories, allowing for better data management, research opportunities, and improved healthcare delivery.

- What challenges do researchers face when analyzing unstructured clinical notes?
  - Researchers encounter challenges such as high dimensionality, the need for external medical knowledge, and privacy issues, which limit access to sensitive clinical text data.
  
- How do health data standards like ICD and CPT contribute to healthcare analytics?
  - Health data standards provide a consistent framework for coding diseases and procedures, which facilitates the aggregation, comparison, and analysis of health data across different providers and systems, improving data quality and interoperability.
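As a rough illustration of why shared coding standards make aggregation easier, the Python sketch below counts diagnoses from different providers by ICD-10 category, meaning the first three characters of the code. The provider names and records are made up, and the helper functions `icd10_category` and `category_counts` are hypothetical, not part of any library.

```python
from collections import Counter


def icd10_category(code: str) -> str:
    """Return the three-character ICD-10 category, e.g. 'E11.9' -> 'E11'."""
    return code.strip().upper()[:3]


def category_counts(records: list[tuple[str, str]]) -> Counter:
    """Count diagnoses per ICD-10 category across (provider, code) records."""
    return Counter(icd10_category(code) for _provider, code in records)


# Made-up records from two providers using the same coding standard.
records = [
    ("Clinic A", "E11.9"),   # type 2 diabetes without complications
    ("Clinic B", "e11.65"),  # type 2 diabetes with hyperglycemia (lowercase on purpose)
    ("Clinic B", "I10"),     # essential hypertension
]

counts = category_counts(records)
print(counts.most_common())
```

Because both clinics use the same code system, their diabetes diagnoses fall into the same `E11` category after light normalization, with no provider-specific mapping.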


## Exercises


1. **Most Useful Health Data for Predicting Patient Outcome (e.g., Mortality):**
   - Clinical data including vital signs (e.g., heart rate, blood pressure), lab results (e.g., blood glucose, electrolytes), demographic information (age, gender), medical history, comorbidities, medication records, and social determinants of health.

2. **Most Accessible Health Data:**
   - **Electronic Health Records (EHRs):** Widely available in healthcare settings, though access is often restricted by privacy laws.
   - **Claims Data:** Available from insurers, but because they are generated for billing they carry limited clinical detail.
   - **Public Health Data:** Often available from governmental health organizations and agencies.

3. **Most Difficult Health Data to Access and Model:**
   - **Unstructured Data:** Such as clinical notes and imaging reports, due to their non-standardized formats.
   - **Genomic Data:** Requires specialized knowledge and consent for access.
   - **Patient-reported Outcomes:** Can be difficult to standardize and quantify.

4. **Important Health Data Not Described in This Chapter:**
   - Patient-reported outcomes, social determinants of health, genomic data, and environmental exposure data.

5. **Which of the Following is NOT True About Electronic Health Records (EHR)?**
   - (a) **EHR data from a single hospital consists of complete clinical history from each patient.**
     - This is not true because EHR data may not capture all patient encounters, especially if a patient has visited multiple healthcare facilities.

6. **Which of the Following is NOT True About Clinical Notes?**
   - (d) **Because of its unstructured format, it is easy for computer algorithms to process the notes.**
     - This is not true; the unstructured format makes it challenging for algorithms to process clinical notes.

7. **Which of the Following are Limitations of Claims Data?**
   - (c) **Claims data are rare and difficult to find.**
     - This statement is false, so it is the option that does not describe a real limitation; claims data are commonly available because they are generated for billing purposes.

8. **Which of the Following is NOT True?**
   - (c) **Continuous signals are rarely collected in hospitals.**
     - This is not true; continuous signals, such as heart rate and blood pressure, are often collected in inpatient settings.

9. **Which of the Following are NOT Imaging Data?**
   - (c) **Electrocardiogram (ECG).**
     - An ECG records the heart's electrical activity as a signal over time rather than as a visual image, so it is not imaging data like the other modalities listed.

10. **What is True About Medical Literature Data?**
    - (a) **They are difficult to parse because of the natural language format.**
      - This is true; medical literature is written in natural language with complex phrasing and jargon, which makes automated parsing difficult.

11. **Which of the Following is a Medical Ontology for Medications?**
    - (b) **RxNorm.**
      - RxNorm is a standardized nomenclature for medications.

12. **Which of the Following is NOT Clinical Trial Data?**
    - (d) **Electronic Health Records.**
      - EHRs are not considered clinical trial data, though they may contain data relevant to clinical trials.

13. **Which of the Following is NOT True About Drug Data?**
    - (b) **Drug data are standard.**
      - This statement is not entirely true, as drug data can vary widely between different systems and sources.