# Healthcare happens over time

## Introduction

In healthcare data, the **patient timeline** is a crucial element, as it represents a chronological record of all interactions, treatments, and outcomes over time. This timeline spans various **timescales**, from individual visits to long-term chronic conditions. Different actors—such as healthcare providers, patients, pharmacies, and insurance companies—capture fragments of this timeline, leading to **fragmentation** and challenges in **data linkage**. Understanding how to effectively represent and utilize the patient timeline is key to making sense of healthcare data and answering research questions.

### Key Considerations for Patient Timelines and Timescales

1. **Timescales of Questions**:
   - **Short-Term Questions**: These might focus on acute events or immediate treatment outcomes, such as the impact of a particular intervention on recovery within a few days or weeks.
   - **Long-Term Questions**: These include chronic disease progression, long-term medication adherence, or risk factors for developing a condition over several years.

   The timescale of the research question determines what part of the patient timeline is relevant, and which data sources or entities might provide the needed information.

2. **Representation of Time**:
   - **Granularity**: The level of detail in time measurements (e.g., hours, days, weeks) depends on the research focus. More granular data are needed for precise timing, such as the effect of drugs administered within a narrow time window, while coarser data may suffice for long-term studies.
   - **Event-Based Representation**: Some representations focus on specific events like diagnoses, procedures, or medication refills.
   - **Continuous Representation**: In other cases, continuous measurements over time (e.g., blood pressure monitoring) are necessary.

3. **Challenges of Linking Data Across Sources**:
   - **Fragmentation**: Each healthcare entity captures only a portion of the patient timeline. For example, pharmacies capture medication refills, hospitals record inpatient events, and patients themselves might provide self-reported outcomes.
   - **Inconsistent Time Recording**: Different systems might log events at different resolutions or timestamps, making it difficult to align the data.
   - **Missing Data**: Gaps in the patient timeline may arise due to missed visits, unrecorded treatments (e.g., use of free samples), or delays in documentation.

4. **Dynamic Nature of Healthcare Data**:
   - **Time-Varying Data**: Patient data are not static—conditions evolve, medications change, and new treatments are introduced. This requires approaches that can handle **temporal dependencies**, where the past influences the present.
   - **Censoring and Delays**: Some data points may only become available after a delay (e.g., lab results), or certain events (e.g., death) may prevent further data collection, leading to **censored data**.
   
5. **Temporal Bias and Inaccuracy**:
   - **Recall Bias**: Patient-reported outcomes may suffer from inaccuracies due to memory limitations.
   - **Documentation Lag**: Healthcare providers may document events after they occur, leading to time lags that skew data interpretation.

### Implications for Research Questions

The timescale of the patient timeline directly impacts the type of research questions you can ask. For example:

- **Short-Term Questions**: "How does a specific intervention impact patient outcomes within 30 days?"
- **Long-Term Questions**: "What are the risk factors for developing a chronic disease over the next 10 years?"

### Effective Use of Temporal Healthcare Data

To work effectively with time-based healthcare data:
- **Align data sources temporally** by accounting for different timescales, resolutions, and event timings.
- **Use longitudinal data** to track patient changes over time and ensure that analyses reflect temporal dependencies.
- **Mitigate missing data** by using techniques like **imputation** to fill in gaps where data are unavailable, or by drawing on **longitudinal cohort studies** that follow patients on a planned schedule.

Understanding the patient timeline in relation to research timescales is key to producing accurate, meaningful insights from healthcare data.
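
As a minimal sketch of temporal alignment, the fragments captured by different actors can be merged into one chronologically ordered timeline. The sources, events, and the `build_timeline` helper below are hypothetical and only illustrate the idea; real linkage also requires matching patient identifiers across systems.

```python
from datetime import datetime
import heapq

# Hypothetical fragments of one patient's record.
# Each list is already sorted by timestamp, as heapq.merge requires.
pharmacy = [
    (datetime(2023, 1, 5), "pharmacy", "metformin refill"),
    (datetime(2023, 2, 4), "pharmacy", "metformin refill"),
]
hospital = [
    (datetime(2023, 1, 20, 14, 30), "hospital", "admission"),
    (datetime(2023, 1, 23, 10, 0), "hospital", "discharge"),
]
self_report = [
    (datetime(2023, 1, 22), "patient", "reported dizziness"),
]

def build_timeline(*sources):
    """Merge sorted per-source event lists into one chronological timeline."""
    return list(heapq.merge(*sources, key=lambda event: event[0]))

timeline = build_timeline(pharmacy, hospital, self_report)
for when, source, what in timeline:
    print(when.isoformat(), source, what)
```

Sources that record only a date are placed at midnight here, which is one concrete form of the inconsistent time resolution described above: an event logged by date can appear to precede a same-day event logged with a clock time even when it did not.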

## Time, timelines, timescales and representations of time

The timeline approach for integrating diverse patient data captures each event's timing, offering a holistic view of a patient's medical history. This framework provides a practical solution to understanding the temporal relationships between events, which is critical for analyzing causes and effects in healthcare. Here’s a breakdown of key points:

### Importance of Time in Healthcare:
1. **Age-Related Variability**:
   - Medical care varies significantly by age group. Pediatric, adult, and geriatric patients have different susceptibilities to diseases, different responses to medications, and unique healthcare needs. Age also affects insurance coverage and access to healthcare services.
   
2. **Causal Relationships**:
   - Temporal order is key in understanding causality. If one event (e.g., medication exposure) leads to another (e.g., an adverse reaction), we expect to see a clear sequence in the patient’s timeline, with the cause preceding the effect.
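
The ordering requirement can be sketched as a simple check on a timeline. The event names and the `precedes` helper are hypothetical; note that the cause preceding the effect is necessary for a causal reading but does not establish causality by itself.

```python
from datetime import date

def precedes(events, cause, effect):
    """Return True if the first `cause` event is strictly earlier than the
    first `effect` event. Returns False if either event is absent."""
    cause_dates = [d for d, name in events if name == cause]
    effect_dates = [d for d, name in events if name == effect]
    if not cause_dates or not effect_dates:
        return False
    return min(cause_dates) < min(effect_dates)

# Hypothetical timeline: (date, event name)
events = [
    (date(2023, 3, 1), "drug_exposure"),
    (date(2023, 3, 4), "rash"),
    (date(2023, 3, 10), "drug_exposure"),
]

print(precedes(events, "drug_exposure", "rash"))  # exposure came first
print(precedes(events, "rash", "drug_exposure"))  # reversed order fails
```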

### Timescales in Medical Questions:
Healthcare data span multiple timescales, from milliseconds (e.g., heart rhythm recordings) to decades (e.g., tracking lifetime medication exposure). This wide range of timescales affects how data are analyzed, depending on whether you’re focusing on short-term events or long-term health trends.

### Non-Stationarity in Healthcare Data:
1. **Changing Healthcare Environment**:
   - The healthcare system is dynamic. New medications, treatments, diseases, and even coding practices emerge over time, influencing the meaning of data. This evolution introduces **non-stationarity**, where the data distributions shift over time, unlike in stationary processes where data distributions remain consistent.

2. **Implications for Analysis**:
   - Non-stationarity complicates data analysis because models that assume consistent patterns over time might fail. Researchers must account for these evolving trends to ensure their analyses remain accurate over time.

Understanding the temporal structure of healthcare data allows for more nuanced insights, particularly in causal inference and adapting to the evolving nature of healthcare systems.

## Timescale: Choosing the relevant units of time

Examining timescales in healthcare is crucial because different medical conditions, treatments, and systems operate over vastly different time intervals. To build intuition, let's break down some examples across various timescales and how they relate to disease processes, technological measurements, and the healthcare system.

### Exercise: Identifying Appropriate Timescales

1. **Fractions of a Second (Milliseconds to Seconds)**:
   - **Disease Process**: Immediate physiological responses such as heartbeats or neural activity.
   - **Measurements**: Heart rate monitors, electrocardiograms (ECGs), or neural recordings measure at these fast intervals.
   - **Healthcare System**: Real-time monitoring in critical care settings, such as during surgery or in the ICU.

   **Example**: Monitoring a patient's heart rhythm to detect arrhythmias.

2. **Seconds to Minutes**:
   - **Disease Process**: Short-term responses such as reactions to anesthesia or immediate allergic responses.
   - **Measurements**: Blood pressure, oxygen levels, or pulse oximetry are measured at these intervals.
   - **Healthcare System**: Immediate post-operative care or during emergency interventions.

   **Example**: Administering an epinephrine shot for anaphylaxis and observing its rapid effect.

3. **Hours to Days**:
   - **Disease Process**: Onset and progression of acute conditions such as infections or post-surgical recovery.
   - **Measurements**: Temperature, lab tests, or daily medication doses.
   - **Healthcare System**: Hospital stays, outpatient visits, and monitoring of short-term recovery.

   **Example**: Monitoring fever patterns over days in a patient with an infection.

4. **Weeks to Months**:
   - **Disease Process**: Chronic disease management, such as diabetes or hypertension, where conditions evolve more slowly.
   - **Measurements**: Blood sugar levels, blood pressure, or cholesterol levels tracked over weeks or months.
   - **Healthcare System**: Routine checkups, prescription refills, and long-term treatment plans.

   **Example**: Adjusting medication for hypertension based on blood pressure readings over several months.

5. **Years to Decades**:
   - **Disease Process**: Long-term disease development such as cancer or cardiovascular diseases.
   - **Measurements**: Cancer screenings, cumulative exposure to risk factors (e.g., smoking, environmental toxins).
   - **Healthcare System**: Lifetime medical history, risk factor management, or tracking outcomes of long-term treatments.

   **Example**: Assessing the impact of a lifetime of smoking on lung cancer risk.

6. **Lifetimes (Decades to a Century)**:
   - **Disease Process**: Aging and its associated conditions (e.g., osteoporosis, Alzheimer's).
   - **Measurements**: Longitudinal tracking of functional decline, cognitive assessments, or bone density over a lifetime.
   - **Healthcare System**: Geriatric care, end-of-life planning, and comprehensive medical records.

   **Example**: Evaluating age-related decline in cognitive function in elderly patients.

### Reflection:
The timescale you focus on is shaped by the disease process you're studying, the available technology, and the structure of the healthcare system. Short-term events like heart rhythms require fine-grained data, while long-term outcomes, such as the progression of chronic diseases, need extended time horizons. Understanding these nuances helps to better interpret healthcare data and tailor medical interventions accordingly.

## What affects the timescale

The timescale and the type of data are both key factors in determining the approach to answering healthcare questions. Here's a summary of the main points:

1. **Timescale Influences**:
   - **Question Type**: The timescale varies depending on whether the condition is acute (e.g., influenza) or chronic (e.g., diabetes). Acute conditions require shorter timescales (days to weeks), while chronic conditions span months to years or even a lifetime.
   
2. **Data Type**:
   - **Laboratory Tests**: Typically relevant for **days to weeks**, with some tests remaining relevant for **months**.
   - **Diagnoses**: Can remain relevant for **days to an entire lifetime**, depending on whether the condition is acute or chronic.

3. **Strategy Development**:
   - The combination of the **question** and **data type** informs decisions about:
     - Which **features** to use.
     - How **accurate** they need to be.
     - How many different **kinds of features** to include.
     - How to **infer the patient’s condition** based on the data.

This integrated understanding helps refine the approach for analyzing healthcare data to ensure the right features and timescales are used for accurate conclusions.
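
One way to make this concrete is a per-data-type lookback window applied at a chosen index date. The window lengths, the event kinds, and the `relevant_events` helper below are illustrative assumptions, not recommended values; in practice they would be chosen for the specific question and test.

```python
from datetime import date, timedelta

# Hypothetical lookback windows per data type; None means "no limit".
LOOKBACK = {
    "lab": timedelta(days=30),
    "acute_diagnosis": timedelta(days=14),
    "chronic_diagnosis": None,
}

def relevant_events(events, index_date):
    """Keep events that fall on or before index_date and inside the
    lookback window configured for their data type."""
    kept = []
    for event_date, kind, label in events:
        if event_date > index_date:
            continue  # ignore anything recorded after the index date
        window = LOOKBACK[kind]
        if window is None or index_date - event_date <= window:
            kept.append((event_date, kind, label))
    return kept

events = [
    (date(2015, 3, 1), "chronic_diagnosis", "type 2 diabetes"),
    (date(2023, 1, 2), "lab", "potassium 5.3"),
    (date(2023, 2, 10), "acute_diagnosis", "influenza"),
    (date(2023, 5, 20), "lab", "potassium 4.1"),
]

for event in relevant_events(events, index_date=date(2023, 6, 1)):
    print(event)
```

With these assumed windows, the old chronic diagnosis and the recent lab value are kept, while the lab value from January and the influenza episode from February are treated as no longer informative.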

# Representation of time

## Representation of time

When representing time in healthcare data, especially in a patient timeline and the patient-feature matrix, different approaches have trade-offs depending on the precision and complexity required. Here's a look at common methods for encoding time:

### 1. **Timestamp Representation**:
   - **How it works**: Assigns a specific timestamp (e.g., date and time) to each event or measurement.
   - **Advantages**: Precise, preserves the exact moment of each event, useful for tracking sequences or detailed analysis.
   - **Disadvantages**: Difficult to handle in large datasets where events occur irregularly; timestamps add complexity when aggregating or comparing data across patients.
   - **Use case**: Tracking real-time events like medication administration or lab results.

### 2. **Interval Representation**:
   - **How it works**: Defines events or states as occurring within a specific time range (e.g., 1 month, 1 year).
   - **Advantages**: Easier to aggregate and analyze over defined periods, less precision needed than timestamps.
   - **Disadvantages**: Loss of granularity, which may be critical for understanding causal relationships.
   - **Use case**: Summarizing periods of disease progression or treatment effects.

### 3. **Age or Time Since Event**:
   - **How it works**: Measures time relative to a reference point, such as the patient's birthdate (age) or time since a specific event (e.g., diagnosis, surgery).
   - **Advantages**: Focuses on biologically meaningful intervals (e.g., age-based risk), useful for cross-patient comparisons.
   - **Disadvantages**: Handling multiple reference points can be difficult, and relative time may obscure finer temporal details.
   - **Use case**: Analyzing disease onset or medication effects as they relate to age or time since diagnosis.

### 4. **Cumulative Time Windows**:
   - **How it works**: Groups events or measurements into cumulative periods (e.g., last 7 days, last 6 months).
   - **Advantages**: Simplifies longitudinal data by aggregating events into relevant time windows for analysis.
   - **Disadvantages**: Potential loss of temporal sequence and trends within windows, depending on window size.
   - **Use case**: Predicting outcomes based on recent lab results or vital signs.

### 5. **Time Binning**:
   - **How it works**: Groups continuous time into bins (e.g., weekly, monthly, yearly).
   - **Advantages**: Reduces data complexity, easier to manage and compare across patients, good for long-term trends.
   - **Disadvantages**: Arbitrary bin sizes might obscure important details or event sequences.
   - **Use case**: Tracking long-term disease progression or treatment adherence.

### Summary:
- **Timestamps**: Best for fine-grained, precise event tracking.
- **Intervals**: Useful for summarizing events over time, such as chronic disease management.
- **Age/Time Since Event**: Effective for analyzing data relative to a biologically meaningful point.
- **Cumulative Time Windows**: Good for simplifying recent event trends.
- **Time Binning**: Ideal for long-term trend analysis, but may lose temporal detail.

Choosing how to represent time depends on the type of data and analysis. For some analyses, exact timestamps are critical, while for others, summarizing time into intervals or age can provide more actionable insights.
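
The relative and windowed representations can be sketched with a few small helpers. The function names and the example dates are hypothetical; the window is taken to include both endpoints, which is an assumption you may want to change.

```python
from datetime import date

def age_in_years(birth_date, on_date):
    """Age in completed years on a given date."""
    years = on_date.year - birth_date.year
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday has not occurred yet this year
    return years

def days_since(reference_date, event_date):
    """Time since a reference event (e.g., diagnosis), in days."""
    return (event_date - reference_date).days

def count_in_last(event_dates, index_date, days):
    """Number of events in the window [index_date - days, index_date]."""
    return sum(1 for d in event_dates if 0 <= (index_date - d).days <= days)

index_date = date(2023, 6, 1)
visits = [date(2022, 10, 1), date(2023, 5, 1), date(2023, 5, 28)]

print(age_in_years(date(1960, 7, 15), index_date))     # age at index date
print(days_since(date(2023, 1, 10), date(2023, 3, 1)))  # days since diagnosis
print(count_in_last(visits, index_date, 7))             # last 7 days
print(count_in_last(visits, index_date, 180))           # last 6 months
```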

## Time series and non-time series data

How time is represented in medical data shapes the structure of patient data, especially when distinguishing time series from asynchronous measurements:

### Key Points:

1. **Time Series in Healthcare**: Time series data, such as ECG (EKG) readings, are recorded continuously at regular intervals. In intensive care units (ICUs), time series data from sensors (e.g., heart rate, oxygen levels) are crucial for monitoring critically ill patients. Signal processing methods from electrical engineering are often applied to analyze these continuous streams of data.

2. **Asynchronous Sampling**: Most medical data are collected at irregular intervals, driven by clinical necessity. For example, blood pressure is measured only when needed. This creates challenges in representing time for analysis, as the data are not continuous like in time series.

3. **Bias in Data Collection**: Healthcare data are often biased towards times when patients are sick, as more tests and observations occur during illness. There are generally fewer data on healthy patients, which can limit understanding of overall health patterns.

4. **Two-Stage Data Collection**: Many medical measurements are acquired in two stages:
   - **Test Ordered**: The clinician orders a test.
   - **Test Result**: The test is performed, and the result is recorded.

   This distinction is important because some datasets, such as medical claims, may only record the **order** of the test, not the **result**. To address this, **indicator variables** can be used to record when a test was ordered, even if the result is not available.

5. **Inferring Stability**: Long periods without measurements might suggest that the patient is healthy and stable, though this is often an assumption rather than a certainty.

### Implications for Data Representation:
- **Time Series Analysis**: Methods used in ICUs and for continuous data streams apply well to regular time intervals but less so to irregular, event-driven medical data.
- **Indicator Variables**: Using separate variables to represent test orders (when the test was requested) and test results (when the actual value was recorded) allows for richer representation of the data.
- **Health Status Inference**: When data are sparse, there might be an implicit assumption that the patient is stable, but this can introduce uncertainty into analyses.

This approach to representing and analyzing time helps ensure that critical details in patient care are captured and understood, even when data are not collected continuously.
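
A minimal sketch of the indicator-variable idea follows. The record layout (a `test` name plus an optional `value`) and the `lab_indicator_features` helper are assumptions made for illustration; real claims and EHR data will look different.

```python
def lab_indicator_features(records, test_name):
    """Build separate features for 'test was ordered' and 'result is known'.

    records: list of dicts with keys "test" and "value" (None if the
    source recorded only the order), assumed to be in chronological order.
    """
    matching = [r for r in records if r["test"] == test_name]
    results = [r["value"] for r in matching if r["value"] is not None]
    return {
        f"{test_name}_ordered": int(bool(matching)),
        f"{test_name}_has_result": int(bool(results)),
        f"{test_name}_last_value": results[-1] if results else None,
    }

# Hypothetical patient seen only through claims: order known, result unknown.
claims_only = [{"test": "hba1c", "value": None}]
# Hypothetical patient with results available.
with_results = [{"test": "hba1c", "value": 8.1}, {"test": "hba1c", "value": 7.4}]

print(lab_indicator_features(claims_only, "hba1c"))
print(lab_indicator_features(with_results, "hba1c"))
print(lab_indicator_features([], "hba1c"))
```

Keeping the two indicators separate lets a model distinguish "never tested" from "tested, result not available", which carry different clinical information.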

## Order of events

Reasoning about the **order of events** in medical data is complex, especially when those events are not instantaneous but instead represent intervals of time (e.g., diseases like pneumonia or rheumatoid arthritis). The key challenge is that depending on how events are defined and interpreted, the relationship between them can vary significantly.

### Key Concepts:

1. **Instantaneous Events vs. Intervals**: If events are defined as **instantaneous points in time**, determining the order (e.g., A then B) is straightforward. For example, if a patient receives a diagnosis of pneumonia on one date and rheumatoid arthritis on another, we can easily say which came first.

2. **Interpreting Events as Intervals**: Many medical conditions, like pneumonia and rheumatoid arthritis, are **not instantaneous** but are **ongoing states** that persist over time. This leads to several possible interpretations when reasoning about the relationship between them:
   - **Sequential with No Overlap**: In one interpretation, event A (e.g., pneumonia) **finishes** before event B (e.g., arthritis) **starts**, such as a patient recovering from pneumonia at age 40 and later developing arthritis at age 60.
   - **Overlapping Conditions**: In another interpretation, event A (pneumonia) **overlaps** with event B (arthritis), such as a patient having both conditions at the same time.

3. **Challenges with Temporal Relationships**:
   - **"A and B"**: When considering patients with both conditions, it's unclear whether we mean that the conditions occurred **simultaneously** or at **different times** in their lives.
   - **"A before B"**: This could mean that condition A **ended before** condition B started (no overlap), or it could mean that condition A **started before** condition B, with the possibility of some overlap.

### Implications for Data Representation:

- **Timeline Representation**: A timeline is a natural way to address these complexities. It can show exactly when events start, end, and whether they overlap. This makes reasoning about the temporal relationships between events more intuitive and accurate.
  
- **General Databases**: Most general-purpose databases (e.g., relational databases) are not well-suited for capturing these distinctions, especially for interpreting and querying overlapping intervals of time.

### Practical Example:
If you want to find patients with pneumonia and arthritis, depending on how you define the relationship between these conditions, you might:
- Look for patients who had pneumonia and then developed arthritis (no overlap).
- Look for patients who had both conditions simultaneously (overlapping events).
  
By leveraging timelines, these distinctions can be captured and reasoned about more easily, improving the accuracy of medical analysis.
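
The different readings of "A before B" and "A and B" can be made explicit once each condition is stored as an interval. The `Episode` type, the three predicates, and the example dates are hypothetical; the end date is assumed to be inclusive.

```python
from datetime import date
from typing import NamedTuple

class Episode(NamedTuple):
    condition: str
    start: date
    end: date  # inclusive; assumed to be the last date the condition was active

def ends_before(a, b):
    """'A before B' in the strict sense: A finishes before B starts."""
    return a.end < b.start

def starts_before(a, b):
    """'A before B' in the loose sense: A starts first, overlap allowed."""
    return a.start < b.start

def overlaps(a, b):
    """'A and B' at the same time: the intervals share at least one day."""
    return a.start <= b.end and b.start <= a.end

arthritis = Episode("rheumatoid arthritis", date(2023, 5, 1), date(2024, 1, 1))
pneumonia_early = Episode("pneumonia", date(2003, 1, 10), date(2003, 2, 1))
pneumonia_late = Episode("pneumonia", date(2023, 4, 20), date(2023, 5, 15))

for pneumonia in (pneumonia_early, pneumonia_late):
    print(
        pneumonia.start,
        ends_before(pneumonia, arthritis),
        starts_before(pneumonia, arthritis),
        overlaps(pneumonia, arthritis),
    )
```

Both pneumonia episodes start before the arthritis episode, but only the earlier one also ends before it; the later one overlaps. A query for "pneumonia before arthritis" therefore returns different patients depending on which predicate is used.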

## Implicit representations of time

Binning is a practical approach for representing time-related data in the patient-feature matrix, especially when precise timestamps are not necessary. Here’s a detailed look at the key aspects of binning:

### Key Concepts in Binning:

1. **Defining Time Intervals (Bins)**:
   - **Bins** are predefined time intervals used to group events. For example, you might have monthly bins, quarterly bins, or yearly bins.
   - **Size of Bins**: The choice of bin size depends on the timescales relevant to your research question. For instance, if studying monthly changes, use monthly bins. For long-term trends, yearly bins might be more appropriate.

2. **Counting Events in Each Bin**:
   - **Event Count**: For each bin, count how many times a specific event or type of event occurs. These counts are then used as features in the patient-feature matrix.
   - **Example**: You might count the number of times a patient received a certain type of medication or had a specific diagnosis within each bin.

3. **Granularity of Binning**:
   - **Granularity** refers to how detailed or broad the bins are. More granular bins (e.g., weekly) provide finer detail but may be less useful if events are sparse or infrequent.
   - **Timescales**: The granularity should match the timescale of the research question. For example, if investigating seasonal patterns, monthly or quarterly bins might be appropriate.

4. **Aggregation/Summarization**:
   - **Aggregation**: Within each bin, you might need to summarize the data in various ways. Common methods include:
     - **Counting**: Total number of events.
     - **Averaging**: Average value of measurements (e.g., average blood pressure).
     - **Summing**: Total sum of continuous variables (e.g., total dosage of medication).
   - **Choosing the Method**: The choice of aggregation depends on the nature of the data and the research question. For example, if you're studying the impact of medication adherence, counting the number of doses within each bin might be most relevant.

### Design Considerations:

- **Number of Bins**: Too few bins may oversimplify the data, while too many bins can lead to sparsity and noise. Choose a number that balances detail with manageability.
- **Size of Each Bin**: The interval size should align with the timescales of interest. Larger intervals may smooth out short-term fluctuations, while smaller intervals capture more detailed changes.
- **Data Aggregation**: Ensure the aggregation method reflects the research goals. For instance, summing event counts might be useful for studying cumulative exposure, while averaging might be better for analyzing trends.

### Summary:
Binning transforms continuous time information into discrete intervals, making it easier to analyze and integrate into the patient-feature matrix. The effectiveness of binning depends on selecting appropriate bin sizes and aggregation methods tailored to the research question and the nature of the data.
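
As a small sketch of counting events per bin, the helper below produces one row of a patient-feature matrix with monthly bins. The `monthly_counts` name and the example dates are hypothetical.

```python
from collections import Counter
from datetime import date

def monthly_counts(event_dates, months):
    """Count events per (year, month) bin, returning one value per bin in
    the order given by `months`. Bins with no events get a count of 0."""
    counts = Counter((d.year, d.month) for d in event_dates)
    return [counts[m] for m in months]

# Hypothetical prescription-fill dates for one patient.
fills = [date(2023, 1, 3), date(2023, 1, 17), date(2023, 3, 9)]
bins = [(2023, 1), (2023, 2), (2023, 3)]

row = monthly_counts(fills, bins)
print(row)  # one feature per monthly bin
```

Passing the bins explicitly keeps every patient's row the same length, including empty bins, which is what makes the rows stackable into a matrix.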

## Different ways to put data in bins

When summarizing or aggregating time-related data within bins, the choice of method significantly impacts the analysis. Here’s a summary of the main approaches and considerations:

### Aggregation and Summarization Techniques:

1. **Count of Events**:
   - **Counts**: Simply counting the number of events that occur within each bin.
   - **Binary Indicator**: Marking presence or absence of events, such as a count of zero versus a positive count.

2. **Statistical Measures**:
   - **Average**: Calculating the mean of values in the bin. Useful for continuous variables where the average provides a central tendency.
   - **Maximum**: Recording the highest value in the bin. Useful for identifying extreme values or peak events.
   - **Most Recent Value**: Using the latest measurement in the bin. Important when recent values are more relevant, such as in tracking glucose control with HbA1c.

3. **Variance**:
   - **Variance**: Measuring the variability of values within the bin. For example, high variance in HbA1c measurements might indicate instability in glucose control.

4. **Rate of Change**:
   - **Rate of Change**: Calculating how a feature changes over time. For instance, tracking the rate of change in wound size can be crucial for predicting healing outcomes.

### Feature Engineering:

- **Creating New Features**: Adding derived features that capture more information from the temporal data. For example, adding a feature for the rate of change in wound size helps in predicting healing chances.
- **Medical Knowledge**: Applying domain expertise to design features that effectively capture relevant temporal patterns and relationships.

### Practical Example:

- **Diabetes Management**: For tracking type 2 diabetes, you might use the most recent HbA1c value to reflect current control and variance to capture historical stability. This provides a nuanced view of the patient’s condition over the year.

- **Wound Healing Prediction**: Include both the current size of the wound and its rate of change to better predict healing outcomes. Wounds that show slow healing rates are at higher risk of complications.

### Summary:

Selecting the right aggregation or summarization method depends on the research question and the clinical context. Feature engineering, such as calculating rates of change, adds valuable insights that improve model performance and relevance.
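
The summaries discussed above can be computed together for the measurements that fall in one bin. This is a sketch under assumptions: the `summarize` helper is hypothetical, variance is the population variance, and the rate of change is the simple slope between the first and last measurement rather than a fitted trend.

```python
from datetime import date
from statistics import mean, pvariance

def summarize(measurements):
    """Summarize a non-empty list of (date, value) pairs from one bin."""
    ordered = sorted(measurements)  # chronological order
    values = [value for _, value in ordered]
    first_date, first_value = ordered[0]
    last_date, last_value = ordered[-1]
    elapsed_days = (last_date - first_date).days
    return {
        "count": len(values),
        "mean": mean(values),
        "max": max(values),
        "most_recent": last_value,
        "variance": pvariance(values),
        # Endpoint slope; undefined when all measurements share one date.
        "change_per_day": (
            (last_value - first_value) / elapsed_days if elapsed_days else None
        ),
    }

# Hypothetical wound area measurements in cm^2.
wound_area = [
    (date(2023, 3, 1), 12.0),
    (date(2023, 3, 11), 10.0),
    (date(2023, 3, 31), 6.0),
]

summary = summarize(wound_area)
for name, value in summary.items():
    print(name, value)
```

A negative `change_per_day` here means the wound is shrinking; the current size and this rate together correspond to the two wound-healing features described above.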

## Timing of exposures and outcomes

When dealing with time in cohort studies and analyzing exposures and outcomes, several specific issues and considerations arise:

### **Defining Exposure and Outcome Times**

1. **Exposure Time**:
   - **Exposed Group**: The start time is straightforward, determined by when the exposure event (e.g., drinking coffee) appears on the patient timeline.
   - **Non-Exposed Group**: The start time is less straightforward. If no exposure event is recorded, determining when they should be considered as "not exposed" can be challenging.

2. **Proxy Events for Non-Exposure**:
   - **Proxy Event Example**: Drinking tea could be used as a proxy for non-exposure to coffee, giving the non-exposed group a start time.
   - **Selection Bias**: This approach excludes individuals who have neither the exposure nor the proxy event (e.g., those who drink neither tea nor coffee), which changes the composition of the control group.

3. **Outcome Time**:
   - **Observed Outcome**: The time of the outcome is straightforward if it is recorded on the patient timeline.
   - **Unobserved Outcome (Right Censoring)**: When the outcome has not yet occurred, you may use a special code (e.g., `>T`) to indicate the event hasn't happened by a certain time T.

### **Handling Right Censoring**

- **Special Codes**: Use codes like `>T` to represent that the outcome has not occurred by time T.
- **Last Observation Time**: Record the last observed time along with an indicator variable that marks whether the recorded time is the outcome time or only the time of last observation (censored).

### **Constructing the Patient-Feature Matrix**

- **Features**: Create features that capture both the time of exposure and the time of the outcome.
- **Composite Features**: Include derived features like time to event, and handle right-censored data appropriately to ensure accurate analysis.
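
A common way to encode this is a pair of values per patient: a duration and an indicator of whether the outcome was observed. The sketch below assumes hypothetical field names and a `time_to_event` helper; it is one possible encoding, not the only one.

```python
from datetime import date

def time_to_event(start, outcome_date, last_observed):
    """Return (days, event_observed).

    If the outcome was recorded, days runs from start to the outcome and
    event_observed is 1. Otherwise the patient is right-censored: days runs
    from start to the last observation and event_observed is 0, meaning
    the true time to event is greater than the value returned.
    """
    if outcome_date is not None:
        return (outcome_date - start).days, 1
    return (last_observed - start).days, 0

# Hypothetical patients, both starting exposure on the same date.
start = date(2020, 1, 1)
with_outcome = time_to_event(start, date(2020, 3, 1), date(2021, 1, 1))
censored = time_to_event(start, None, date(2021, 1, 1))

print(with_outcome)  # outcome observed
print(censored)      # outcome not observed by last follow-up
```

Keeping the indicator alongside the duration prevents a censored time from being mistaken for an event time in later analysis.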

### **Summary**

- **Exposure and Outcome Times**: Accurately determine the times for exposures and outcomes, taking into account any potential biases or missing data.
- **Right Censoring**: Use appropriate methods to handle cases where the outcome has not yet occurred, ensuring your analysis accounts for these complexities.
- **Patient-Feature Matrix**: Construct it carefully, incorporating both direct and derived features to support robust analysis.

# Data change over time

## Clinical processes are non-stationary

Non-stationarity is a crucial consideration when analyzing healthcare data over time. Here’s a concise summary:

### **Non-Stationarity in Healthcare Data**

1. **Definition**:
   - **Stationary Process**: A process where the distribution of data remains consistent over time.
   - **Non-Stationary Process**: A process where the distribution of data changes over time, leading to evolving associations and predictors.

2. **Impact of Non-Stationarity**:
   - **Changing Data Elements**: New medications, treatments, and changes in coding or billing practices can alter data characteristics.
   - **Predictive Modeling**: Features or predictors that were effective in the past may become less useful or obsolete due to changes in data patterns.
   - **Combining Data**: Merging older and newer data requires careful consideration to account for shifts in data distributions.

3. **Detecting Non-Stationarity**:
   - **Machine Learning Approach**: Remove time as a feature and attempt to predict time from other variables. High accuracy in prediction indicates strong non-stationarity.
   - **Example**: In wound healing prediction, distinguishing data from before and after 2013 revealed non-stationarity, indicating systematic differences between the two periods.

4. **Implications**:
   - **Long-Term Studies**: For studies spanning long time intervals, non-stationarity must be tested for and accounted for in analysis and model building.
   - **Feature Relevance**: Continually validate and update features to ensure they remain relevant and predictive in changing data contexts.
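
The detection idea can be illustrated with a deliberately small stand-in for a machine learning model: a nearest-centroid classifier on a single feature that tries to tell an earlier period from a later one. The `period_separability` helper and the feature values are hypothetical; a real analysis would use many features, a proper learner, and cross-validation.

```python
from statistics import mean

def period_separability(early_values, late_values):
    """Held-out accuracy of guessing the period from one feature value.

    Accuracy near 0.5 suggests the feature looks the same in both periods;
    accuracy near 1.0 suggests its distribution has shifted over time.
    """
    # Even-indexed values train the classifier, odd-indexed values test it.
    train = [(x, 0) for x in early_values[::2]] + [(x, 1) for x in late_values[::2]]
    held_out = [(x, 0) for x in early_values[1::2]] + [(x, 1) for x in late_values[1::2]]
    centroid = {
        label: mean(x for x, y in train if y == label) for label in (0, 1)
    }
    correct = 0
    for x, label in held_out:
        predicted = 1 if abs(x - centroid[1]) < abs(x - centroid[0]) else 0
        correct += predicted == label
    return correct / len(held_out)

# Hypothetical feature values recorded in two periods.
early = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
late_shifted = [2.0, 2.1, 1.9, 2.2, 2.0, 1.8]  # distribution has moved
late_stable = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]   # same as the early period

print(period_separability(early, late_shifted))
print(period_separability(early, late_stable))
```

When the period can be predicted well from the other variables, as with the shifted values, pooling the two periods without adjustment is risky.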

### **Summary**

Non-stationarity can significantly affect the analysis and interpretation of healthcare data. It's essential to recognize its presence and take appropriate measures to address it, especially in longitudinal studies or when combining data from different time periods.