# Run `ClearPath_iter_2.ipynb` first.

# ==========
# = Iteration #3 =
# ==========

## Stage 1: Planning
The tasks for this stage are:
1. Compose the project team
2. Set the research question
3. Schedule all review meetings to ensure iterations are time-boxed.

### 3.1.1. Compose the project team
No change since previous iteration.

### 3.1.2. Set the research question
The research question is _How representative is a simulated process model of the discovered process model, in terms of control, engagement, and adherence statistics?_

### 3.1.3. Schedule all review meetings to ensure iterations are time-boxed
No change since previous iteration.

## Stage 2: Extraction
The tasks for this stage are:
1. Gather knowledge and insight into the processes under study and the data-generating mechanisms.
2. Obtain data for processing

### 3.2.1. Gather knowledge and insight into the processes under study and the data-generating mechanisms.
No change since previous iteration.

### 3.2.2. Obtain data for processing
No change since previous iteration.

## Stage 3: Data processing
The tasks for this stage are:
1. Assess data quality
2. Format data for study

### 3.3.1. Assess data quality
No change since previous iteration.

### 3.3.2. Format data for study
No change since previous iteration.

## Stage 4: Mining and analysis
The tasks for this stage are:
1. Discover / Mine process models
2. Build simulation models, if applicable
3. Design and test model evaluation rig
4. Set up and/or update the evidence template

### 3.4.1. Discover / Mine process models
No further process mining is required to answer the research question associated with this iteration. I will use the discovered processes from the previous iteration.

### 3.4.2. Build simulation models, if applicable
The process models "discovered" using process-mining methods lend themselves to discrete event simulation (DES), for which the `simmer` package in R was developed.

In [None]:
# Simulate.
...

### 3.4.3. Design and test model evaluation rig
All process models will be evaluated for the control, engagement, and adherence statistics listed below. All adherence statistics will be calculated by the `AdhereR` package in R. "CMA" stands for continuous multiple-interval measures of medication availability/gaps. "Duration of supply" is the quantity of medication prescribed divided by the directed daily usage (e.g. 28 tablets at 1 per day = 28-day duration of supply). These durations can overlap. The "duration of theoretical adherence" is arithmetically equivalent to the duration of supply, but it assumes the durations are stitched end-to-end rather than overlapped.
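
The difference between overlapped and stitched durations can be illustrated with a minimal plain-Python sketch. It does not use `AdhereR`; the prescription dates, the helper name `duration_of_supply`, and the whole-day rounding are assumptions made for illustration only.

```python
from datetime import date, timedelta

def duration_of_supply(quantity, daily_usage):
    """Days of supply: quantity prescribed divided by directed daily usage."""
    return quantity / daily_usage

# Hypothetical prescriptions: (start date, quantity, directed daily usage).
prescriptions = [
    (date(2020, 1, 1), 28, 1),
    (date(2020, 1, 22), 28, 1),  # starts 7 days before the first supply runs out
]

durations = [duration_of_supply(q, u) for _, q, u in prescriptions]

# Overlapped view: each calendar day counts once, even if two supplies cover it.
covered_days = set()
for (start, _, _), days in zip(prescriptions, durations):
    for offset in range(int(days)):
        covered_days.add(start + timedelta(days=offset))

# Stitched view: a new supply only starts once the previous one is used up.
supply_end = prescriptions[0][0]
for (start, _, _), days in zip(prescriptions, durations):
    supply_end = max(start, supply_end) + timedelta(days=int(days))

print("Total days supplied:", sum(durations))
print("Distinct days covered when overlapped:", len(covered_days))
print("Stitched supply runs out on:", supply_end)
```

In this example the total supplied is the same under both views, but the overlapped view covers fewer distinct calendar days because the seven shared days are counted once.
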
#### Control statistics
1. _Test status_: a binary indicator of whether the given test value is 'Of concern' or 'No concern'. RAYG {>70, 58<=x<=70, 48<=x<=58, <48}
2. _Test improvement_: a binary indicator of whether the given test value is an 'Improvement' or 'Disimprovement' relative to the previous test value.
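
A minimal sketch of the two control statistics follows, reading the RAYG thresholds as >70, 58 to 70, 48 to 58, and <48. Several choices here are assumptions and are not stated in this notebook: a value of exactly 58 is assigned to the higher band, the bands counted as 'Of concern' are passed in as a parameter with an arbitrary default, a lower test value is treated as an improvement, and an unchanged value is treated as a disimprovement. The function names are hypothetical.

```python
def rayg_band(value):
    """Map a test value to a RAYG band.

    Assumption: a value exactly on the shared boundary (58) goes to the
    higher band.
    """
    if value > 70:
        return "R"
    if value >= 58:
        return "A"
    if value >= 48:
        return "Y"
    return "G"

def status_of(value, concern_bands=("R", "A")):
    """Binary test status. Assumption: the default concern bands are a guess."""
    return "Of concern" if rayg_band(value) in concern_bands else "No concern"

def improvement_of(previous, current):
    """Binary test improvement. Assumption: a lower value is better."""
    return "Improvement" if current < previous else "Disimprovement"

print(rayg_band(65), status_of(65), improvement_of(previous=72, current=65))
```
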
#### Engagement statistics
3. _Count of DNAs_: the count of did-not-attend events recorded during the inter-test interval.
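
As a rough sketch, the count can be computed by filtering did-not-attend dates to the interval between two consecutive tests. The dates below are hypothetical, and treating the interval as open at the previous test and closed at the current test is an assumption.

```python
from datetime import date

def count_dnas(dna_dates, previous_test, current_test):
    """Count did-not-attend events after the previous test, up to and
    including the current test date (assumed interval convention)."""
    return sum(previous_test < d <= current_test for d in dna_dates)

# Hypothetical did-not-attend dates for one patient.
dna_dates = [date(2020, 1, 5), date(2020, 2, 3), date(2020, 3, 17), date(2020, 4, 20)]

n_dnas = count_dnas(dna_dates, date(2020, 1, 10), date(2020, 4, 10))
print(n_dnas)
```
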
#### Adherence statistics (see [Vollmer et al. 2012](https://bmchealthservres.biomedcentral.com/counter/pdf/10.1186/1472-6963-12-155) and [Dima and Dediu 2017](https://sci-hub.wf/10.1371/journal.pone.0174426))
4. _CMA1_ a.k.a. _Medication Possession Ratio_ a.k.a. _Compliance Rate_: the sum of every prescription's duration of supply from all but the last prescription started within the observation window, divided by the number of days between the first and last prescription.
5. _CMA2_ a.k.a. _Medication Possession Ratio_ a.k.a. _Continuous Measure of Medication Acquisition_: the sum of every prescription's duration of supply from all prescriptions started within the observation window, divided by the number of days between the first prescription and the end of the observation window.
6. _CMA3_ a.k.a. _Proportion of Days Covered_: CMA1 capped at 1.0.
7. _CMA4_ a.k.a. _Proportion of Days Covered_: CMA2 capped at 1.0.
8. _CMA5_ a.k.a. _Proportion of Days Covered_: the sum of every prescription's duration of theoretical adherence from all but the last prescription started within the observation window, divided by the number of days between the first and last prescription.
9. _CMA6_ a.k.a. _Proportion of Days Covered_: the sum of every prescription's duration of theoretical adherence from all prescriptions started within the observation window, divided by the number of days between the first prescription and the end of the observation window.
10. _CMA7_: the sum of every prescription's duration of theoretical adherence from all prescriptions started within the observation window, plus any portion of the duration of theoretical adherence from prescriptions started before the observation window, all divided by the width of the observation window.
11. _CMA8_: the sum of every prescription's duration of theoretical adherence from all prescriptions started within the observation window, divided by the width of the observation window, where the start of the observation window is lagged to the date on which the duration of theoretical adherence from prescriptions started before the observation window ends.
12. _CMA9_: the numerator is the sum of either a) all durations of theoretical adherence, if there is overlap, or b) each duration of supply as a proportion of inter-prescription days. The denominator is the width of the observation window. Essentially, if gaps exist between prescriptions, it assumes they don't by spreading the supply over the inter-prescription period, which results in fractional daily supply (rather than the true case of full supply on some days and no supply on others). Final prescriptions are included.
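
The first four definitions can be written out directly. The sketch below is plain Python following the wording of items 4 to 7 only; it is not the `AdhereR` implementation, which may handle edge cases differently. The function name, the input layout, and the use of a simple date difference for "number of days" are assumptions.

```python
from datetime import date

def cma1_to_cma4(prescriptions, window_end):
    """Compute CMA1-CMA4 from their verbal definitions.

    prescriptions: list of (start_date, duration_of_supply_days), sorted by
    start date, all started within the observation window (assumed layout).
    """
    if len(prescriptions) < 2:
        raise ValueError("CMA1 needs at least two prescriptions")
    first_start = prescriptions[0][0]
    last_start = prescriptions[-1][0]
    durations = [days for _, days in prescriptions]

    # CMA1: supply from all but the last prescription over first-to-last days.
    cma1 = sum(durations[:-1]) / (last_start - first_start).days
    # CMA2: supply from all prescriptions over first-to-window-end days.
    cma2 = sum(durations) / (window_end - first_start).days
    return {
        "CMA1": cma1,
        "CMA2": cma2,
        "CMA3": min(cma1, 1.0),  # CMA1 capped at 1.0
        "CMA4": min(cma2, 1.0),  # CMA2 capped at 1.0
    }

# Hypothetical history with a gap before the third prescription.
gappy = cma1_to_cma4(
    [(date(2020, 1, 1), 28), (date(2020, 1, 29), 28), (date(2020, 3, 11), 28)],
    window_end=date(2020, 4, 10),
)

# Hypothetical history with early refills (oversupply), where the caps apply.
oversupplied = cma1_to_cma4(
    [(date(2020, 1, 1), 28), (date(2020, 1, 15), 28), (date(2020, 1, 29), 28)],
    window_end=date(2020, 2, 26),
)

print(gappy)
print(oversupplied)
```

The second history shows why the capped variants exist: early refills push CMA1 and CMA2 above 1.0, while CMA3 and CMA4 stay at 1.0.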

Other key concepts related to the adherence statistics are:
- the follow-up window is the total period for which relevant prescriptions are included.
- the observation window is the period within the follow-up window for which adherence is calculated. This is always expected to occur after the last prescription of interest.
- the date of prescription is used as the date of dispense or issue when the date of dispense or issue is not available.
- surplus medication from earlier overlapping events within the observation window is carried over, for CMA5 and higher.
- surplus medication from before the observation window is carried over, for CMA7 and higher.
- CMA3 will always equal CMA5, and CMA4 will always equal CMA6, for a fixed observation window.
- it is possible to set threshold gaps between prescriptions that break the prescription history into episodes within the observation window. The CMAs can be calculated for these episodes instead of considering the entire observation window as one episode.
- if the period between prescriptions is greater than the N-per-day expectation, then one can assume that either a) patients use the medication sparingly and get a new prescription exactly when their previous prescription is exhausted, or b) the period between prescriptions is made up of a period of perfect adherence to the N-per-day expectation followed by a period with no medication available. This distinction separates CMAs 1-4 from CMAs 5-9.
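
The last two concepts can be sketched in plain Python. The first function contrasts assumptions a) and b) for a single inter-prescription interval; the second splits a prescription history into episodes wherever the gap after the end of supply exceeds a threshold. Function names, the threshold value, and the example dates are hypothetical, and this is not how `AdhereR` is implemented.

```python
from datetime import date, timedelta

def daily_supply(duration, interval_days, spread):
    """Fraction of the daily dose available on each day of the interval."""
    if spread:
        # a) medication used sparingly so it lasts until the next prescription
        return [min(duration / interval_days, 1.0)] * interval_days
    # b) perfect adherence until the supply runs out, then no medication
    full_days = min(duration, interval_days)
    return [1.0] * full_days + [0.0] * (interval_days - full_days)

def split_into_episodes(prescriptions, max_gap_days):
    """Group (start_date, duration_days) pairs, sorted by start date, into
    episodes; a new episode starts when the gap between the end of one
    supply and the next start exceeds max_gap_days."""
    episodes = [[prescriptions[0]]]
    for previous, current in zip(prescriptions, prescriptions[1:]):
        supply_end = previous[0] + timedelta(days=previous[1])
        gap = (current[0] - supply_end).days
        if gap > max_gap_days:
            episodes.append([current])
        else:
            episodes[-1].append(current)
    return episodes

# 28 days of supply, next prescription 40 days later.
sparing = daily_supply(28, 40, spread=True)
adhere_then_gap = daily_supply(28, 40, spread=False)

# Hypothetical history with one long gap before the final prescription.
history = [(date(2020, 1, 1), 28), (date(2020, 2, 1), 28), (date(2020, 5, 1), 28)]
episodes = split_into_episodes(history, max_gap_days=30)

print(sparing[0], adhere_then_gap.count(0.0), len(episodes))
```

Both supply patterns deliver the same total medication over the interval; they differ only in how it is distributed across days, which is the assumption that separates the two groups of CMAs.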

In [None]:
# Control statistics.
...


# Engagement statistics.
...


# Adherence statistics.
...

### 3.4.4. Set up and/or update the evidence template
No change since previous iteration.

## Stage 5: Evaluation
The tasks for this stage are:
1. Meet with Clinical Review Board to assess validity.
2. Set requirements for next iteration of stages 1-5.

### 3.5.1. Meet with Clinical Review Board to assess validity
My assessment is that the simulated model is fit for the purposes of this example.

### 3.5.2. Set requirements for next iteration of stages 1-5.
Requirements for the next iteration are:
1. Evaluate a perturbed simulation model that halves the inter-test duration.
2. Report on the differences in evaluation statistics between the simulated and perturbed model.