In [7]:
!pip install pyhealth



In [12]:
!pip install mne pandarallel rdkit transformers accelerate polars



In [11]:
!rm -rf PyHealth
!git clone -b feat/accommodate_legacy_api https://github.com/sunlabuiuc/PyHealth.git

Cloning into 'PyHealth'...
remote: Enumerating objects: 8807, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 8807 (delta 1), reused 0 (delta 0), pack-reused 8804 (from 2)
Receiving objects: 100% (8807/8807), 123.55 MiB | 3.28 MiB/s, done.
Resolving deltas: 100% (5755/5755), done.


In [13]:
import sys
sys.path.append("./PyHealth")

### **Instructions on [pyhealth.data](https://pyhealth.readthedocs.io/en/latest/api/data.html)**
- **[README]**: This module defines the basic data structures in PyHealth.
- **[Structures]**:
  - `Event` is the info structure for a single clinical event
  - `Patient` is the info structure for a single patient

  All data can be thought of as a stream of events.

![Image description](https://drive.google.com/uc?id=1lQIIb4n5VUEn5tIVDOVCnE4dHgnYvnhn)

In [21]:
from IPython.display import Image
Image(url='https://drive.google.com/uc?id=1lQIIb4n5VUEn5tIVDOVCnE4dHgnYvnhn')

### **[pyhealth.data.Event](https://pyhealth.readthedocs.io/en/latest/api/data/pyhealth.data.Event.html)**
# Understanding the Event Class in PyHealth

## Event Class

The `Event` class in PyHealth represents a single clinical event with the following structure:

### Explicit Arguments

- `event_type`: str
  - Type of the clinical event (e.g., 'DIAGNOSES_ICD', 'PROCEDURES_ICD', 'medication')
  - This is **required** and typically matches the table name in the database

- `timestamp`: datetime, optional
  - When the event occurred
  - If not provided, defaults to the current time
  - Ideally every event has a well-defined occurrence time, but some datasets (e.g., imaging) may not record one.

### Optional Arguments (Key-Value Pairs)

In addition to `event_type` and `timestamp`, `Event` accepts the common attributes listed below; any other keyword arguments are collected in the `attr_dict` attribute. Common attributes include:

- `code`: str
  - Code of the event (e.g., "428.0" for heart failure)
- `table`: str
  - Name of the table where the event is recorded (e.g., "DIAGNOSES_ICD")
- `vocabulary`: str
  - Vocabulary of the code (e.g., 'ICD9CM', 'ICD10CM', 'NDC')
- `visit_id`: str
  - Unique identifier of the visit
- `patient_id`: str
  - Unique identifier of the patient
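To make the split between named attributes and extra keyword arguments concrete, here is a minimal sketch using a hypothetical stand-in class, `SketchEvent`. It is not PyHealth's implementation. It only mimics the pattern described above: `event_type` is required, the common attributes become named fields, and anything else lands in `attr_dict`. Following the description above, a missing `timestamp` is assumed to default to the current time. (In the real outputs further below, `attr_dict` also echoes `event_type`; this sketch does not model that.)

```python
from datetime import datetime
from typing import Any, Dict, Optional


class SketchEvent:
    """Hypothetical stand-in illustrating the Event attribute pattern (not PyHealth code)."""

    def __init__(
        self,
        event_type: str,
        timestamp: Optional[datetime] = None,
        code: Optional[str] = None,
        table: Optional[str] = None,
        vocabulary: Optional[str] = None,
        visit_id: Optional[str] = None,
        patient_id: Optional[str] = None,
        **attrs: Any,
    ) -> None:
        # Required: what kind of event this is (often the source table name).
        self.event_type = event_type
        # Assumed default, per the description above: the current time.
        self.timestamp = timestamp if timestamp is not None else datetime.now()
        # Common, named attributes.
        self.code = code
        self.table = table
        self.vocabulary = vocabulary
        self.visit_id = visit_id
        self.patient_id = patient_id
        # Every other keyword argument is kept as a free-form key-value pair.
        self.attr_dict: Dict[str, Any] = dict(attrs)


rx = SketchEvent(
    event_type="PRESCRIPTIONS",
    code="00069153041",
    vocabulary="NDC",
    visit_id="v001",
    patient_id="p001",
    active_on_discharge=True,
)
print(rx.code, rx.attr_dict)  # 00069153041 {'active_on_discharge': True}
```

The design choice here is ordinary Python `**kwargs` capture: well-known fields get explicit parameters, while source-specific details such as `active_on_discharge` need no schema change.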

## Key idea
Events are highly flexible: an event can be an image, a lab signal, or a piece of text. The only required field is `event_type`, which records what the event is or where it comes from (e.g., a diagnoses table).

In [32]:
from pyhealth.data import Event
from datetime import datetime

# create an event
event1 = Event(
    event_type="DIAGNOSES_ICD",
    code="428.0",
    vocabulary="ICD9CM",
    visit_id="v001",
    patient_id="p001",
    timestamp=datetime.now(),
)

print(event1)

Event from patient p001 visit v001:
	- Code: 428.0
	- Table: None
	- Vocabulary: ICD9CM
	- Timestamp: 2025-05-29 19:37:37.391321
	- event_type: DIAGNOSES_ICD


You can also attach any additional attributes as keyword arguments (`key=value` pairs); these are stored in `attr_dict`.

In [30]:
# event2 carries one additional, free-form attribute: active_on_discharge
event2 = Event(
    event_type="PRESCRIPTIONS",
    code="00069153041",
    vocabulary="NDC",
    visit_id="v001",
    patient_id="p001",
    timestamp=datetime.now(),
    active_on_discharge=True,
)

event2.attr_dict

{'event_type': 'PRESCRIPTIONS', 'active_on_discharge': True}

In [34]:
print(event2)

Event from patient p001 visit v001:
	- Code: 00069153041
	- Table: None
	- Vocabulary: NDC
	- Timestamp: 2025-05-29 19:36:38.633075
	- event_type: PRESCRIPTIONS
	- active_on_discharge: True


**TODO:** create the following event:

- code: 00069153041
- table: PRESCRIPTIONS
- vocabulary: NDC
- visit_id: 130744
- patient_id: 103
- timestamp: 2019-08-12 00:00:00
- dosage: 250 mg

Note: try storing the dose as two additional attributes, `dosage` and `dosage_unit`.

In [35]:
event3 = Event(
    event_type="PRESCRIPTIONS",
    code="00069153041",
    vocabulary="NDC",
    visit_id="130744",
    patient_id="103",
    timestamp=datetime.fromisoformat("2019-08-12 00:00:00"),
    dosage="250mg",
)

event3.attr_dict

{'event_type': 'PRESCRIPTIONS', 'dosage': '250mg'}

In [36]:
print(event3)

Event from patient 103 visit 130744:
	- Code: 00069153041
	- Table: None
	- Vocabulary: NDC
	- Timestamp: 2019-08-12 00:00:00
	- event_type: PRESCRIPTIONS
	- dosage: 250mg
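The solution above stores the dose as a single string. To follow the TODO's suggestion of two attributes, the raw value "250 mg" can first be split into a number and a unit. The helper `split_dosage` below is hypothetical (not part of PyHealth) and uses a regular expression. Its results could then be passed to `Event` as `dosage=...` and `dosage_unit=...` keyword arguments, which end up in `attr_dict` like any other extra attribute.

```python
import re
from typing import Tuple


def split_dosage(raw: str) -> Tuple[float, str]:
    """Hypothetical helper: split a dose string like '250 mg' into (250.0, 'mg')."""
    # A number (optionally with decimals), optional whitespace, then alphabetic unit.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized dosage: {raw!r}")
    return float(match.group(1)), match.group(2)


dosage, dosage_unit = split_dosage("250 mg")
print(dosage, dosage_unit)  # 250.0 mg
# e.g. Event(..., dosage=dosage, dosage_unit=dosage_unit)
```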


# Understanding the Patient Class in PyHealth

## Patient Class

The `Patient` class in PyHealth represents an individual patient with their associated clinical data. It serves as a container for organizing events and visits related to a single patient.

### Required Arguments

- `patient_id`: str
  - Unique identifier for the patient
  - This is **required** to instantiate a Patient object

### Data Source Argument

- `data_source`: polars.DataFrame
  - DataFrame containing all events related to this patient
  - Contains columns for "patient_id", "event_type", "timestamp", and event-specific attributes

### Methods

#### get_events()

Retrieves events with optional filtering by type and time range:
```
def get_events(
    self,
    event_type: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    filters: Optional[List[tuple]] = None,
    return_df: bool = False
) -> Union[pl.DataFrame, List[Event]]
```
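The filtering that `get_events()` performs can be approximated in plain Python. The sketch below is hypothetical. It operates on a list of dicts rather than PyHealth's polars-backed storage, treats `start` and `end` as inclusive bounds, and assumes `filters` is a list of `(attribute, operator, value)` tuples. Check the PyHealth API documentation for the exact semantics.

```python
import operator
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional, Tuple

# Comparison operators that a filter tuple may name (assumed set).
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def get_events_sketch(
    events: List[Dict[str, Any]],
    event_type: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    filters: Optional[List[Tuple[str, str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical re-implementation of get_events() filtering on plain dicts."""
    result = []
    for ev in events:
        # Keep only the requested event type, if one was given.
        if event_type is not None and ev["event_type"] != event_type:
            continue
        ts = ev.get("timestamp")
        # Inclusive time window; events without a timestamp fail any bound.
        if start is not None and (ts is None or ts < start):
            continue
        if end is not None and (ts is None or ts > end):
            continue
        # Every filter must hold; events missing the attribute are dropped.
        if filters and not all(
            ev.get(attr) is not None and OPS[op](ev[attr], value)
            for attr, op, value in filters
        ):
            continue
        result.append(ev)
    return result


events = [
    {"event_type": "DIAGNOSES_ICD", "timestamp": datetime(2019, 8, 12), "code": "428.0"},
    {"event_type": "DIAGNOSES_ICD", "timestamp": datetime(2019, 8, 13), "code": "401.9"},
    {"event_type": "PROCEDURES_ICD", "timestamp": datetime(2019, 8, 14), "code": "39.61"},
]

dx = get_events_sketch(events, event_type="DIAGNOSES_ICD", filters=[("code", "!=", "401.9")])
print([e["code"] for e in dx])  # ['428.0']
```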

In [45]:
from pyhealth.data import Patient
import polars as pl
from datetime import datetime, timedelta

# Create a Patient object
patient = Patient(
    patient_id="p001",
    data_source=pl.DataFrame({
        "patient_id": ["p001", "p001", "p001"],
        "event_type": ["DIAGNOSES_ICD", "DIAGNOSES_ICD", "PROCEDURES_ICD"],
        "timestamp": [datetime.now(), datetime.now(), datetime.now()],
        "DIAGNOSES_ICD/code": ["428.0", "401.9", None],
        "DIAGNOSES_ICD/vocabulary": ["ICD9CM", "ICD9CM", None],
        "PROCEDURES_ICD/code": [None, None, "39.61"],
        "PROCEDURES_ICD/vocabulary": [None, None, "ICD9CM"]
    })
)
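The `data_source` above stores each event as one row of a wide table, with event-specific columns named `<event_type>/<attribute>` (e.g. `DIAGNOSES_ICD/code`). One plausible way to recover an event's own attributes from such a row is to keep only the columns prefixed with that row's `event_type` and strip the prefix. The helper `row_to_event_attrs` below is hypothetical, not a PyHealth function, and works on a plain dict.

```python
from datetime import datetime
from typing import Any, Dict, Tuple


def row_to_event_attrs(row: Dict[str, Any]) -> Tuple[str, Any, Dict[str, Any]]:
    """Hypothetical helper: split a wide data_source row into (event_type, timestamp, attrs)."""
    event_type = row["event_type"]
    prefix = event_type + "/"
    # Keep only this event type's columns and drop the "<event_type>/" prefix.
    attrs = {
        key[len(prefix):]: value
        for key, value in row.items()
        if key.startswith(prefix)
    }
    return event_type, row["timestamp"], attrs


row = {
    "patient_id": "p001",
    "event_type": "PROCEDURES_ICD",
    "timestamp": datetime(2019, 8, 12),
    "DIAGNOSES_ICD/code": None,
    "DIAGNOSES_ICD/vocabulary": None,
    "PROCEDURES_ICD/code": "39.61",
    "PROCEDURES_ICD/vocabulary": "ICD9CM",
}
event_type, timestamp, attrs = row_to_event_attrs(row)
print(event_type, attrs)  # PROCEDURES_ICD {'code': '39.61', 'vocabulary': 'ICD9CM'}
```

With polars, the rows of a `DataFrame` can be iterated as dicts via `DataFrame.iter_rows(named=True)` and fed to a helper like this one.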

In [39]:
dir(patient)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'add_event',
 'add_visit',
 'attr_dict',
 'available_tables',
 'birth_datetime',
 'death_datetime',
 'ethnicity',
 'gender',
 'get_visit_by_id',
 'get_visit_by_index',
 'index_to_visit_id',
 'patient_id',
 'visits']

The core idea behind the PyHealth 2.0 rework is that all data is an event stream, and a patient is simply a way of grouping events within that stream.

In this view, a dataset can be a list of patients, and each patient is a list of events. Other datasets may have no notion of a patient at all and are simply a list of events.

In the current framework, an event is a row in a table, a single image in an imaging dataset, and so on.
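The idea that patients are a grouping over the event stream can be sketched with the standard library. Start from one flat list of events, order it by time, and group it by `patient_id`. The event dicts and the `group_by_patient` helper are illustrative assumptions, not PyHealth's data model.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List

# A flat event stream: each event is a row from some source table.
stream: List[Dict[str, Any]] = [
    {"patient_id": "p001", "event_type": "DIAGNOSES_ICD",
     "timestamp": datetime(2019, 8, 12), "code": "428.0"},
    {"patient_id": "p002", "event_type": "PRESCRIPTIONS",
     "timestamp": datetime(2019, 8, 12), "code": "00069153041"},
    {"patient_id": "p001", "event_type": "PROCEDURES_ICD",
     "timestamp": datetime(2019, 8, 13), "code": "39.61"},
]


def group_by_patient(events: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical helper: a 'patient' is just the events sharing a patient_id."""
    patients: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    # Stable sort by time, so each patient's events come out in chronological order.
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        patients[ev["patient_id"]].append(ev)
    return dict(patients)


patients = group_by_patient(stream)
print({pid: [e["event_type"] for e in evs] for pid, evs in patients.items()})
# {'p001': ['DIAGNOSES_ICD', 'PROCEDURES_ICD'], 'p002': ['PRESCRIPTIONS']}
```

A dataset without patients would simply skip the grouping step and work on `stream` directly.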