
fiber2xes

This project contains a Python utility that uses data coming from fiber to create .xes event logs. To use this tool you need access to the Mount Sinai Data Warehouse.

Installation

Follow these steps to install fiber2xes:

  1. Install fiber according to their installation guide.
  2. Download and install Spark 3.1.2 according to its installation guide. This website provides a concise overview of how the Spark environment can be set up. Make sure that both the SPARK_HOME and JAVA_HOME environment variables are correctly set and exported. Should the available Spark version change, the pyspark version of this package, as well as that of the Docker image, needs to be changed accordingly.
  3. Run the pip installation to install fiber2xes:
pip install git+https://gitlab.hpi.de/pm1920/fiber2xes.git

For development and testing, all dev dependencies can be installed using

pip install -e .[dev]

If you're using zsh, escape the square brackets: pip install -e .\[dev\]. In case you encounter version or dependency issues related to fiber, it is advisable to run

sed -i 's/==/>=/' requirements.txt

in the fiber directory, so that the fiber2xes installation can override the pinned dependency versions.

Example

After following all installation steps, example.py, a demo file containing a short overview of how fiber2xes can be used, can be run by calling

python3 ./example.py

This example creates a sample cohort for an MRN-based event log, which will be extracted and saved to the repository's root directory as a file called ./log_<timestamp>_mrn_5.xes. This file can then be used for process mining.

Interface

The package offers two methods for event log creation and filters for trace and event filtering. The following sections contain more details about these methods.

Log creation

To create a log from a fiber cohort, just call the cohort_to_event_log method:

import multiprocessing

from fiber2xes import cohort_to_event_log

cohort_to_event_log(
  cohort,
  trace_type,
  verbose=False,
  remove_unlisted=True,
  remove_duplicates=True,
  event_filter=None,
  trace_filter=None,
  cores=multiprocessing.cpu_count(),
  window_size=500,
  abstraction_path=None,
  abstraction_exact_match=False,
  abstraction_delimiter=";",
  include_anamnesis_events=True,
  duplicate_event_identifier="BACK PAIN",
  event_identifier_to_merge="CHRONIC LOW BACK PAIN",
  perform_complex_duplicate_detection=False
)

Parameters:

  • cohort: The fiber cohort containing the patients
  • trace_type: The type of a trace (mrn or visit)
  • verbose=False: Flag indicating whether events should keep their original, non-abstracted values (default False)
  • remove_unlisted=True: Flag indicating whether a trace should only contain listed events (default True)
  • remove_duplicates=True: Flag indicating whether duplicate events should be removed (default True)
  • event_filter=None: A custom filter to filter events (default None)
  • trace_filter=None: A custom filter to filter traces (default None)
  • cores=multiprocessing.cpu_count(): The number of cores used to process the cohort (default: number of CPUs)
  • window_size=500: The number of patients per window (default 500)
  • abstraction_path=None: The path to the abstraction file (default None)
  • abstraction_exact_match=False: Flag indicating whether the abstraction algorithm should only abstract exact matches (default False)
  • abstraction_delimiter=";": The delimiter of the abstraction file (default ;)
  • include_anamnesis_events=True: Flag indicating whether anamnesis events should be included in the log (default True)
  • duplicate_event_identifier="BACK PAIN": Event identifier to be analysed separately for duplications (default "BACK PAIN")
  • event_identifier_to_merge="CHRONIC LOW BACK PAIN": Event identifier used to replace separately identified duplicates (default "CHRONIC LOW BACK PAIN")
  • perform_complex_duplicate_detection=False: Flag indicating whether complex time- and lifecycle-based duplicate detection should be performed (default False)
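To illustrate the window_size parameter, the following stand-in sketch (not the library's actual code) shows how a cohort of patient MRNs can be split into consecutive windows of a fixed size for batch processing:

```python
def split_into_windows(mrns, window_size=500):
    """Split a list of patient MRNs into consecutive windows of at most window_size."""
    return [mrns[i:i + window_size] for i in range(0, len(mrns), window_size)]

# Hypothetical example with a small window size:
mrns = [f"MRN-{n}" for n in range(12)]
windows = split_into_windows(mrns, window_size=5)
print([len(w) for w in windows])  # → [5, 5, 2]
```

Processing in windows keeps the memory footprint bounded regardless of cohort size.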

Log serialisation

The method save_event_log_to_file serialises a created log to a file.

from fiber2xes import save_event_log_to_file

save_event_log_to_file(log, file_path)

Parameters:

  • log: The log generated by the cohort_to_event_log method
  • file_path: The file path / name

Trace and event filtering

With a trace or event filter it is possible to filter the traces or events during the creation process. The following filter conditions are available.

These can be combined by And, Or and Not operations.
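As an illustration of how such conditions compose, here is a minimal stand-in sketch of the And/Not combinator pattern. The class and method names (Condition, is_relevant) are assumptions for illustration, not the actual fiber2xes implementation:

```python
class Condition:
    """Base class: a condition decides whether a trace or event is relevant."""
    def is_relevant(self, item):
        raise NotImplementedError

class Generic(Condition):
    """Wraps a lambda expression that returns True for relevant items."""
    def __init__(self, predicate):
        self.predicate = predicate
    def is_relevant(self, item):
        return self.predicate(item)

class And(Condition):
    """Relevant only if both sub-conditions are relevant."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def is_relevant(self, item):
        return self.left.is_relevant(item) and self.right.is_relevant(item)

class Not(Condition):
    """Inverts the result of another condition."""
    def __init__(self, inner):
        self.inner = inner
    def is_relevant(self, item):
        return not self.inner.is_relevant(item)

# Hypothetical event: relevant if it is a diagnosis but not back pain
event = {"type": "diagnosis", "name": "FLU"}
condition = And(Generic(lambda e: e["type"] == "diagnosis"),
                Not(Generic(lambda e: e["name"] == "BACK PAIN")))
print(condition.is_relevant(event))  # → True
```

Because every combinator is itself a condition, arbitrarily nested filter expressions can be built from these building blocks.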

Diagnosis

A filter for a specific diagnosis given by the code.

from fiber2xes.filter.condition import Diagnosis

filter = Diagnosis(diagnosis_code)

Parameter:

  • diagnosis_code: The diagnosis code

Material

A filter for a specific material given by the code.

from fiber2xes.filter.condition import Material

filter = Material(material_code)

Parameter:

  • material_code: The material code

Procedure

A filter for a specific procedure given by the code.

from fiber2xes.filter.condition import Procedure

filter = Procedure(procedure_code)

Parameter:

  • procedure_code: The procedure code

Time

A filter for traces based on timing conditions (see parameters).

from fiber2xes.filter.condition import Time

filter = Time(one_event_after=None, one_event_before=None, all_events_after=None, all_events_before=None)

Parameters:

  • one_event_after: The trace is relevant if at least one event of the trace occurred after the given date
  • one_event_before: The trace is relevant if at least one event of the trace occurred before the given date
  • all_events_after: The trace is relevant if all events of the trace occurred after the given date
  • all_events_before: The trace is relevant if all events of the trace occurred before the given date
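The four timing conditions can be sketched with plain datetime comparisons. This is a simplified stand-in that treats a trace as a list of event timestamps, not the actual fiber2xes logic:

```python
from datetime import datetime

def one_event_after(trace, date):
    # Relevant if at least one event occurred after the cutoff date
    return any(ts > date for ts in trace)

def one_event_before(trace, date):
    # Relevant if at least one event occurred before the cutoff date
    return any(ts < date for ts in trace)

def all_events_after(trace, date):
    # Relevant only if every event occurred after the cutoff date
    return all(ts > date for ts in trace)

def all_events_before(trace, date):
    # Relevant only if every event occurred before the cutoff date
    return all(ts < date for ts in trace)

trace = [datetime(2019, 3, 1), datetime(2020, 6, 15)]
cutoff = datetime(2020, 1, 1)
print(one_event_after(trace, cutoff))   # → True
print(all_events_after(trace, cutoff))  # → False
```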

Generic

A filter for traces or events based on a given lambda expression. The lambda expression receives the trace or event as a parameter and should return true or false. If it returns true, the trace or event is considered relevant; otherwise it is filtered out.

from fiber2xes.filter.condition import Generic

filter = Generic(lambda_expression)

Parameter:

  • lambda_expression: The lambda expression which will be applied to all traces and events
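For example, a Generic filter could keep only traces with a minimum number of events. The predicate below is hypothetical; the exact structure of the trace object passed to the lambda depends on fiber2xes:

```python
# Hypothetical predicate: keep only traces with more than five events
has_enough_events = lambda trace: len(trace) > 5

print(has_enough_events(["event"] * 6))  # → True
print(has_enough_events(["event"] * 3))  # → False
```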

And

An aggregation of two other filters with a logical and as aggregation function.

from fiber2xes.filter.operator import And

filter = And(filter1, filter2)

Parameter:

  • filter1 and filter2: Two other trace or event filters which will be aggregated by a logical and.

Or

An aggregation of two other filters with a logical or as aggregation function.

from fiber2xes.filter.operator import Or

filter = Or(filter1, filter2)

Parameter:

  • filter1 and filter2: Two other trace or event filters which will be aggregated by a logical or.

Not

An inverter of the result of another filter.

from fiber2xes.filter.operator import Not

filter = Not(filter)

Parameter:

  • filter: The result of the given filter will be negated.

Spark Configuration

This pipeline tool utilises Spark for transforming large event data sets. For local development, or for using the tool on differently equipped hardware, it can be sensible to change memory requirements and other Spark configuration options. For this, the .env file in the project's root directory can be used to override the default options passed to the Spark calls.
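A .env file for a memory-constrained development machine could look like the following. The key names here are assumptions for illustration; adapt them to the options fiber2xes actually reads, which map onto standard Spark settings such as spark.driver.memory and spark.executor.memory:

```
# Hypothetical .env overrides for local development
SPARK_DRIVER_MEMORY=4g
SPARK_EXECUTOR_MEMORY=4g
```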

Contribution

To contribute, please fork this repository and create a merge request. Assign one of the developers of this project for a review. Please always add a short description of your submission, including its motivation.
