## Data Generation 

The heart attack dataset is a simulated electronic health record generated from a simple point process model. It records 10 variables for each of 1000 patients on every day of a 10-year period (3650 days).

# Data Dictionary

| **Variable**              | **Var name** | **Type**                                                                 |
|---------------------------|--------------|--------------------------------------------------------------------------|
| Age                       | age          | Continuous                                                               |
| Cardiovascular Risk       | cvr          | Continuous, range from 0 to 1, indicating the baseline risk of CV events. Fixed for each patient. |
| **Medication**     |              |                                                                          |
| Beta Blocker              | BetaB        | Binary; once started, the medication lasts for 1 year (365 days).        |
| Statins                   | Statins      | Binary; once started, the medication lasts for 1 year (365 days).        |
| Acid Reducer              | AcidR        | Binary; once started, the medication lasts for 1 year (365 days).        |
| Vioxx                     | Vioxx        | Binary; once started, the medication lasts for 1 year (365 days).        |
| **Disease**               |              |                                                                          |
| Hypertension              | HT           | Binary, the disease lasts for 1 month from onset (30 days).              |
| High Cholesterol          | HC           | Binary, the disease lasts for 1 month from onset (30 days).              |
| Arthritis                 | Arthritis    | Binary, the disease lasts for 1 month from onset (30 days).              |
| **Primary Outcome**          |              |                                                                          |
| Myocardial Infarction     | MI           | Binary; occurs on a single day and then returns to 0.                    |
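
As a minimal sketch, the dictionary above could be mirrored in code as a column order plus a duration per binary variable. The column order and the names `VARIABLES`, `DURATION_DAYS`, `COLUMN`, and `BINARY_EVENTS` are assumptions made for illustration; the source does not fix them.

```python
# Hypothetical column order: the source does not specify one.
VARIABLES = ["age", "cvr", "BetaB", "Statins", "AcidR", "Vioxx",
             "HT", "HC", "Arthritis", "MI"]

# Days a binary variable stays at 1 once it turns on (taken from the table).
DURATION_DAYS = {
    "BetaB": 365, "Statins": 365, "AcidR": 365, "Vioxx": 365,  # medications
    "HT": 30, "HC": 30, "Arthritis": 30,                       # diseases
    "MI": 1,                                                   # single-day outcome
}

# Lookup from variable name to column index in that assumed order.
COLUMN = {name: i for i, name in enumerate(VARIABLES)}

# The binary events are exactly the variables that have a duration.
BINARY_EVENTS = [name for name in VARIABLES if name in DURATION_DAYS]
```

Keeping the durations in one mapping makes it easy to try other medication lengths without touching the rest of the code.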


# Simulating Process

Patients are independent. For each patient, we follow the same process:

1. **Initialization:**
    - Determine **Cardiovascular Risk (CVR)**: Sample from a uniform distribution $U(0,1)$. This value is fixed over time.
    - Determine **Age**: Start from the patient's 40th birthday.
    - Initialize all other events as 0.

2. **Start Simulation:**
    - Set the starting day as $\text{day} = 0$ (the first day).
    - Initialize $t_0 = 0$.

3. **Daily Event Check:**
    - When $\text{day} = t_0$, check today’s status for the 8 events.
    - Only sample the events that are currently 0.

4. **Define Intensity Parameters:**
    - Using the relationships defined in the model, set the **intensity parameter** (the rate of an exponential distribution) for each event that is currently 0.
    - The exponential distribution models the waiting time until an event that occurs continuously and independently at a constant average rate.
    - The intensity parameters keep the timing of events random while reflecting how likely each event is for the patient.
    - Each of the 8 binary variables has an intensity parameter that is a function of its parents. We assume the time to event follows an exponential distribution whose rate is determined by the patient's current status.

5. **Sample Time to Next Occurrence:**
    - For each event that is 0, sample the **time to next occurrence** from the exponential distribution.

6. **Determine Next Event:**
    - Identify the next earliest event by finding the smallest sampled time to the event.
    - Let this event happen on $\text{day} = t$.

7. **Event Occurrence:**
    - When $\text{day} = t$, the selected event occurs (set it to 1):
      1. If it is a **drug**, it stays at 1 for 365 days (or another specified duration, e.g., 100/60/30 days).
      2. If it is a **disease**, it stays at 1 for 30 days.
      3. If it is **Myocardial Infarction (MI)**, it is 1 on that day only.

8. **Update Time:**
    - Set $t_0 = t$.

9. **Repeat:**
    - Repeat steps 3-8 until the end of the simulation.
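
The steps above can be sketched for a single patient as follows. This is an illustration under stated assumptions, not the generator actually used: the `intensity` function is a placeholder (the model's real parent relationships are not given here), waiting times are rounded up to whole days, and the names `simulate_patient`, `EVENTS`, and `DURATION` are hypothetical.

```python
import numpy as np

N_DAYS = 3650
EVENTS = ["BetaB", "Statins", "AcidR", "Vioxx", "HT", "HC", "Arthritis", "MI"]
DURATION = {"BetaB": 365, "Statins": 365, "AcidR": 365, "Vioxx": 365,
            "HT": 30, "HC": 30, "Arthritis": 30, "MI": 1}


def intensity(event, cvr, active):
    """Placeholder daily rate. ASSUMPTION: not the model's real parent structure.

    A real model would also use `active` (the current status of every event).
    """
    base = 1.0 / 500.0
    return base * (1.0 + cvr) if event == "MI" else base


def simulate_patient(rng, n_days=N_DAYS):
    cvr = rng.uniform(0.0, 1.0)                    # step 1: fixed baseline risk
    status = np.zeros((n_days, len(EVENTS)), dtype=np.int8)
    until = np.zeros(len(EVENTS), dtype=int)       # first day each event is 0 again
    t0 = 0                                         # step 2
    while t0 < n_days:
        # Step 3: an event is "on" today if its end day lies in the future.
        active = {e: until[i] > t0 for i, e in enumerate(EVENTS)}
        idle = [i for i, e in enumerate(EVENTS) if not active[e]]
        if not idle:                               # nothing to sample: wait for an expiry
            t0 = int(until.min())
            continue
        # Step 4: rates for the events that are currently 0.
        rates = np.array([intensity(EVENTS[i], cvr, active) for i in idle])
        waits = rng.exponential(1.0 / rates)       # step 5: scale = 1 / rate
        k = int(np.argmin(waits))                  # step 6: earliest event wins
        t = t0 + max(1, int(np.ceil(waits[k])))    # assumption: at least one day ahead
        if t >= n_days:
            break
        i = idle[k]
        until[i] = t + DURATION[EVENTS[i]]         # step 7: switch on for its duration
        status[t:until[i], i] = 1
        t0 = t                                     # step 8
    age = 40.0 + np.arange(n_days) / 365.0         # age in years from the 40th birthday
    return cvr, age, status


rng = np.random.default_rng(0)
cvr, age, status = simulate_patient(rng)
print(status.shape)  # (3650, 8)
```

Because patients are independent, the full dataset could be built by calling `simulate_patient` once per patient. Tracking an end day per event (`until`) avoids rewriting the status of every event on every day.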




# Expected Result

A NumPy array of shape 1000 × 3650 × 10, where:

- 1000 is the number of patients,
- 3650 is the number of days,
- 10 is the number of variables.
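
A short sketch of how such an array might be allocated and indexed. The `float32` dtype and the position of `MI` in the last axis are assumptions; the source states only the shape.

```python
import numpy as np

n_patients, n_days, n_vars = 1000, 3650, 10

# Axis 0: patient, axis 1: day, axis 2: variable.
# float32 is an assumption so that continuous age/cvr and 0/1 flags share one array.
data = np.zeros((n_patients, n_days, n_vars), dtype=np.float32)

MI = 9  # hypothetical column index for Myocardial Infarction

patient_0 = data[0]                        # all days and variables for one patient
mi_days = np.flatnonzero(data[0, :, MI])   # days on which patient 0 had an MI

print(data.shape)   # (1000, 3650, 10)
print(data.nbytes)  # 146000000
```

At this dtype the full array holds 36.5 million values, about 146 MB, so it fits comfortably in memory on most machines.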
