In [None]:
# Imports and environment setup
import pandas as pd
from IPython.display import display, HTML
from CPOETrackerAnalysis import SimulationAnalyzer, aggregate_simulation_data

The `SimulationAnalyzer` class is an analytical wrapper for the JSON files created by the tracking module of the CPOE recommender simulation. It parses the nested elements of the JSON and computes several useful statistics on the data.

First, we'll run through some of the existing functionality of the `SimulationAnalyzer` on a single case.

In [None]:
# Set data source for a single case
data_file = 'sim_data/34_156_data_v4.json'

The simulation analyzer is instantiated by providing the path to the JSON file containing the events recorded during the simulation of a case:

In [None]:
# Instantiate SimulationAnalyzer on source data file
sim_analyzer = SimulationAnalyzer(data_file)

## Using the simulation analyzer:

### Analyzer components:
When instantiated with a particular data source, the simulation analyzer object will parse the data into several attributes that can be operated on to report metrics of interest. Some of these attributes are defined below.

#### User, patient, start time, and end time
Included in the analyzer are some general attributes about the case from which the source data was derived. The `user` attribute gives the id of the user that completed the case. The `patient` attribute gives the id of the case. The `start_time` and `end_time` attributes give the absolute times when the case began (when the simulation window was opened) and when the case ended (when the simulation was saved), respectively.

In [None]:
user = sim_analyzer.user
patient = sim_analyzer.patient
start_time = sim_analyzer.start_time
end_time = sim_analyzer.end_time
print("user: {}\npatient: {}\nstart_time: {}\nend_time: {}".format(user, patient, start_time, end_time))

#### Event tracker
The `event_tracker_data` attribute is an object that keeps track of interactions with the simulation interface. Specifically, for each element that has been interacted with in the interface (such as the search bar and the various search modes, any results on the page, or the notes selection), there is a list of items containing the interaction timestamp as well as information about the event.

In [None]:
# Show event tracker data for current data source
event_tracker_data = sim_analyzer.event_tracker_data
event_tracker_data

#### Results tracker
The `results_tracker_data` attribute is an object that contains the results that appeared throughout the simulation, grouped by search mode (results from the recommender correspond to the `''` key). Within each search mode, the results are further grouped by the section in which they appear. Along with each section of results, there is also a state object that contains information about the search.

In [None]:
# Show results tracker data for current data source
results_tracker_data = sim_analyzer.results_tracker_data
results_tracker_data

#### Signed item tracker
The `signed_item_tracker_data` attribute is an object that contains information for clinical orders signed during the simulation. These signed items are grouped by the timestamp of when they were signed. The information is in the format `clinical_item_id|source|search_query|search_mode|list_index`. These attributes are defined below:

|Attribute|Description|
|---------|:-----------|
|clinical_item_id|The database id associated with the signed item|
|source|The result source from which the item was selected - _'resultSpace1' or 'resultSpace2' for recommender items, 'non-recommender' otherwise_|
|search_query|The query used that generated the signed item - _empty for recommender items_|
|search_mode|What type of search was used with the query (Find Orders, Order Sets, Diagnoses) - _empty for recommender items_|
|list_index|The index of the list that contained the signed item|

In [None]:
# Show signed item tracker data for current data source
signed_item_tracker_data = sim_analyzer.signed_item_tracker_data
signed_item_tracker_data
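
The pipe-delimited format described above can be split into named fields. The following is a minimal sketch, not the analyzer's own parsing code: `SignedItem` and `parse_signed_item` are hypothetical names, and the example records are made up for illustration.

```python
from collections import namedtuple

# Field names follow the format documented in the table above.
SignedItem = namedtuple(
    "SignedItem",
    ["clinical_item_id", "source", "search_query", "search_mode", "list_index"],
)


def parse_signed_item(record):
    """Split one pipe-delimited signed-item string into named fields."""
    fields = record.split("|")
    if len(fields) != 5:
        raise ValueError(
            "expected 5 pipe-separated fields, got {}".format(len(fields))
        )
    return SignedItem(*fields)


# Hypothetical records, made up for illustration (not taken from the data file)
recommender_item = parse_signed_item("45771|resultSpace1|||0")
manual_item = parse_signed_item("45838|non-recommender|cbc|Find Orders|2")

print(recommender_item)
print(manual_item)
```

Note that this sketch keeps every field as a string and assumes a search query never contains a `|` character.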

#### Results and signed orders collections
To simplify certain operations on the data collected, the hierarchical `results_tracker_data` and `signed_item_tracker_data` have been flattened into the collections `results_collection` and `signed_orders_collection`, respectively. These collections represent each item as an item dictionary containing the item's information.

In [None]:
# Show results collection for current data source
results_collection = sim_analyzer.results_collection
results_collection

In [None]:
# Show signed orders collection for current data source
signed_orders_collection = sim_analyzer.signed_orders_collection
signed_orders_collection

### Summary Metrics:
Below are a few implemented summary metrics that make use of the attributes discussed above. For details on how these are implemented, please refer to the source code.

#### Elapsed Time
One of the first things we might be interested in about a user's behavior is the amount of time they spend on a particular case. This metric is retrieved as follows and is presented in the form __hours:minutes:seconds__. Note that the elapsed time includes any idle time, since idle periods are not filtered out.

In [None]:
# Retrieve elapsed time from the base data
elapsed_time = sim_analyzer.elapsed_time()
elapsed_time
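
To illustrate the __hours:minutes:seconds__ form, here is a small sketch of how a duration could be formatted. It assumes the start and end times are millisecond timestamps, which may not match what the tracker actually stores; `format_elapsed` is a hypothetical helper, not part of `SimulationAnalyzer`.

```python
def format_elapsed(start_ms, end_ms):
    """Format the gap between two millisecond timestamps as H:MM:SS.

    Assumption: both timestamps are integers in milliseconds.
    """
    total_seconds = (end_ms - start_ms) // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)


# Hypothetical timestamps: 12 minutes and 34 seconds apart
example_elapsed = format_elapsed(0, 754000)
print(example_elapsed)
```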

#### Mouse clicks
There are several ways to retrieve information about mouse clicks during the simulation. The first is to retrieve the total number of clicks made during the simulation through `number_mouse_clicks_all()`. The second is to retrieve the number of mouse clicks over certain buttons and inputs on the simulation page through `number_mouse_clicks()`. The third is to retrieve a dict summary of clicks through `click_summary()`, where each key is an event and each value is the number of clicks for that event. Note that the latter two methods have an option (`perc`) to return click counts as a percentage of all clicks.

In [None]:
# Retrieve number of total mouse clicks made:
clicks_all = sim_analyzer.number_mouse_clicks_all()

# Retrieve number of mouse clicks made over the Results and Results Review sections:
clicks_notes_results = sim_analyzer.number_mouse_clicks(filters=['ResultInteraction', 'ResultsReview'], perc=True)

# Retrieve dict of mouse click summary
click_summary = sim_analyzer.click_summary(perc=False)

print("Total number of clicks: {}".format(clicks_all))
print("Percent of clicks over Results and Results Review sections: {}".format(clicks_notes_results))
print("Mouse click summary:\n {}".format(click_summary))
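
As a rough illustration of the percentage option, the sketch below converts a dict of click counts into percentages of the total. This is not the analyzer's implementation: `to_percentages` and the example counts are hypothetical, and the real methods may report fractions between 0 and 1 rather than values between 0 and 100.

```python
def to_percentages(click_counts):
    """Convert {event: count} into {event: percent of all clicks}."""
    total = sum(click_counts.values())
    if total == 0:
        # Avoid dividing by zero when no clicks were recorded
        return {event: 0.0 for event in click_counts}
    return {
        event: 100.0 * count / total
        for event, count in click_counts.items()
    }


# Hypothetical click counts, made up for illustration
example_counts = {"FindOrders": 6, "ResultInteraction": 3, "ResultsReview": 1}
example_percentages = to_percentages(example_counts)
print(example_percentages)
```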

### Item Metrics:
There are also several implemented metrics for result items and signed items that use the attributes discussed above and other helpful utilities. Again, for more details about the implementations, please refer to the source code.

#### Results
There are utility functions for retrieving the item options that appeared from manual searches and those that appeared from the recommender. These functions can be used to retrieve the number of items from each source, as well as to perform other operations on the collections.

In [None]:
# Get item options that were results from a manual search
manually_search_options = sim_analyzer.get_manually_searched_options()
print("Number of manually searched options: {}\nManually searched options: {}".format(len(manually_search_options), manually_search_options))

In [None]:
# Get item options that were results from recommender
recommended_options = sim_analyzer.get_recommended_options()
print("Number of recommended options: {}\nRecommended options: {}".format(len(recommended_options), recommended_options))

In [None]:
# Since identical items may appear multiple times throughout searches, or by the recommender,
# it may be useful to only retrieve unique options. This can be done by defining a uniqueness
# key function and applying it with the `get_unique` helper method.
def clinical_item_id_key_fn(item):
    """Returns clinical item value from item object"""
    return item['clinicalItemId']

unique_results = sim_analyzer.get_unique(sim_analyzer.results_collection, key_fn=clinical_item_id_key_fn)
print("Number of unique results: {}\nUnique results: {}".format(len(unique_results), unique_results))
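
For intuition about what a key-based uniqueness helper does, here is a minimal sketch that keeps the first item seen for each key. It is an illustration under assumptions, not the source of `get_unique`: `unique_by_key` is a hypothetical name, and the real helper may keep a different occurrence or return a different container type.

```python
def unique_by_key(items, key_fn):
    """Return items in their original order, keeping the first one per key."""
    seen_keys = set()
    unique_items = []
    for item in items:
        key = key_fn(item)
        if key not in seen_keys:
            seen_keys.add(key)
            unique_items.append(item)
    return unique_items


# Hypothetical result items, made up for illustration
example_items = [
    {"clinicalItemId": "45771", "listIndex": 0},
    {"clinicalItemId": "45838", "listIndex": 1},
    {"clinicalItemId": "45771", "listIndex": 4},
]
example_unique = unique_by_key(example_items, lambda item: item["clinicalItemId"])
print(example_unique)
```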

#### Signed orders
As with results, there are utility functions for retrieving signed items that appeared from manual searches or from the recommender. These functions are useful for metrics that evaluate how the recommender influences the ordering behavior of users. They can be used to retrieve the number of items from each source, as well as to perform other operations on the collections.

In [None]:
# Get signed items that appeared from manual searches
signed_from_manual_search = sim_analyzer.get_signed_from_manual_search()
print("Number signed from manual searches: {}\nItems signed from manual searches: {}".format(len(signed_from_manual_search), signed_from_manual_search))

In [None]:
# Get signed items that appeared from the recommender
signed_from_recommender = sim_analyzer.get_signed_from_recommended()
print("Number signed from recommender: {}\nItems signed from recommender: {}".format(len(signed_from_recommender), signed_from_recommender))

In [None]:
# One metric that is interesting to see is the orders that appeared from the recommender
# but were ultimately signed during a manual search. These signed items are essentially those
# that were 'missed' by the user.
signed_missed = sim_analyzer.get_signed_missed_recommended()
print("Number signed that were originally missed: {}\nSigned items that were originally missed: {}".format(len(signed_missed), signed_missed))
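
One way to think about 'missed' items is as the manually signed items whose ids also appear among the recommender's options. The sketch below expresses that idea on plain lists of item dictionaries. It is an assumption about the logic, not the implementation of `get_signed_missed_recommended()`: `find_missed` is a hypothetical name, and the sketch ignores timing (whether the recommendation was shown before the item was signed).

```python
def find_missed(recommended_options, signed_from_manual_search):
    """Return manually signed items that the recommender had also offered."""
    recommended_ids = {item["clinicalItemId"] for item in recommended_options}
    return [
        item
        for item in signed_from_manual_search
        if item["clinicalItemId"] in recommended_ids
    ]


# Hypothetical collections, made up for illustration
example_recommended = [{"clinicalItemId": "45771"}, {"clinicalItemId": "45866"}]
example_signed_manual = [{"clinicalItemId": "45866"}, {"clinicalItemId": "45818"}]
example_missed = find_missed(example_recommended, example_signed_manual)
print(example_missed)
```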

### Aggregation
There are several ways of aggregating the data captured by the simulation analyzer object, whether it be for visualization purposes or for constructing data reports on multiple data sources.

#### Timeline
One useful visualization is a timeline of events. It allows us to spot any clusters (or lack of clusters) of behavior during the simulation. To visualize the timeline, we first construct the collection of sorted events using `construct_timeline()`. We can then pass this collection of events to the `visualize_timeline()` method as shown below.

In [None]:
# Sort events by their timestamp
sorted_events = sim_analyzer.construct_timeline()

# Visualize timeline
sim_analyzer.visualize_timeline(sorted_events)
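
Conceptually, building a timeline means flattening the per-element event lists into one list and sorting it by timestamp. The sketch below shows that idea on a simplified structure. The layout (a dict mapping an element name to a list of `(timestamp, info)` tuples) and the name `build_timeline` are assumptions for illustration; the actual `event_tracker_data` layout and `construct_timeline()` may differ.

```python
def build_timeline(events_by_element):
    """Flatten {element: [(timestamp, info), ...]} into one time-sorted list."""
    timeline = [
        (timestamp, element, info)
        for element, events in events_by_element.items()
        for timestamp, info in events
    ]
    # Sort by timestamp only, so ties keep their original relative order
    timeline.sort(key=lambda event: event[0])
    return timeline


# Hypothetical events, made up for illustration
example_events = {
    "searchBar": [(3000, "query typed"), (1000, "focused")],
    "ResultInteraction": [(2000, "item clicked")],
}
example_timeline = build_timeline(example_events)
for event in example_timeline:
    print(event)
```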

#### Data dump
Although it is not part of the `SimulationAnalyzer` class, `CPOETrackerAnalysis.py` also provides functionality for running an analyzer instance on a set of JSON data sources and writing the resulting metrics to a CSV file (creating the file or appending to an existing one).

In [None]:
# Create flat CSV from data
out_file = 'sim_data/out.csv'
data_home = 'sim_data'
aggregate_simulation_data(data_home, output_path=out_file)

In [None]:
# Show the CSV written in the previous cell
metrics_df = pd.read_csv(out_file)
display(metrics_df)
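
The create-or-append behavior described above can be sketched with the standard `csv` module: write the header only when the file is new or empty, then append one row per case. This is an illustration under assumptions, not how `aggregate_simulation_data` is implemented; `append_metrics_row`, the column names, and the row values are all hypothetical.

```python
import csv
import os
import tempfile


def append_metrics_row(csv_path, row, fieldnames):
    """Append one row of metrics, writing the header if the file is new."""
    is_new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)


# Hypothetical columns and rows, written to a temporary directory
example_fields = ["user", "patient", "elapsed_time"]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "out.csv")
    append_metrics_row(
        demo_path,
        {"user": "1", "patient": "10", "elapsed_time": "0:12:34"},
        example_fields,
    )
    append_metrics_row(
        demo_path,
        {"user": "2", "patient": "10", "elapsed_time": "0:08:05"},
        example_fields,
    )
    with open(demo_path, newline="") as csv_file:
        demo_rows = list(csv.DictReader(csv_file))

print(demo_rows)
```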

## Exercises:
Below are a few implementation exercises to test your understanding of the different components of the `SimulationAnalyzer` class.

In [None]:
# Instantiate SimulationAnalyzer on source data file
sim_analyzer = SimulationAnalyzer('sim_data/48_244_data_v4.json')

#### Exercise 1
One of the things we want to track for a simulation case is whether a user ordered the appropriate items. These items must also be ordered in a particular sequence (e.g., ordering antibiotics before a particular lab test could interfere with the results of that test). For this exercise, we will write a function that validates the 'correctness' of a case according to a reference sequence of orders.

In [None]:
# Fill in the function below according to the docstring
def case_orders_valid(sim_analyzer, expected_orders):
    """Validate that the case represented by sim_analyzer adheres to the expected sequence of orders.

    Args:
        sim_analyzer (SimulationAnalyzer): The SimulationAnalyzer instance representing the case being validated.
        expected_orders (list): List of clinical item id strings representing the expected sequence of orders.
            The clinical item expected_orders[i] should have been ordered before the clinical item
            expected_orders[j] for all j > i.

    Returns:
        bool: True if the case follows the expected sequence of orders, False otherwise.
    """
    is_valid = False
    # ****** START YOUR IMPLEMENTATION HERE ******
    
    # ****** END YOUR IMPLEMENTATION HERE ******
    return is_valid

expected_orders = ['45771', '45838', '45866', '45818']
case_orders_valid(sim_analyzer, expected_orders)
    