Episode Bitwise Encoding for temporal medical data


EBEncoding - efficient bitwise encoding for temporal (medical) data

Episode Bitwise Encoding is an encoding method designed for abstracting a patient's multiple medication episodes in relation to a certain event, e.g., an adverse drug event. That said, the encoding is domain independent and can be applied to any temporal events.

The motivation for the encoding is to make time series data easy to consume. One time series (e.g., the usage of a drug over the past months) can be encoded as one numeric value. Multiple time series (e.g., polypharmacy) can be encoded as a vector. Once encoded, the data can be analysed easily (e.g., fed into off-the-shelf machine learning algorithms). In our first use case, the encoding is used for predicting adverse drug events for patients with mental health disorders.
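
As a rough illustration of the idea (not the API of EBEncoding.py), a time series of on/off observations can be packed into one integer with one bit per time unit, and several drugs then form a vector of such integers. The function names `encode_episodes` and `encode_patient`, the `(start, end)` episode format, and the convention that bit 0 is the time unit closest to the event are all assumptions made for this sketch.

```python
def encode_episodes(episodes, num_units):
    """Pack medication episodes into one integer, one bit per time unit.

    episodes: iterable of (start, end) pairs, inclusive, in time units
              counted back from the event (assumed: 0 = closest to the event).
    num_units: length of the observation window in time units.
    """
    value = 0
    for start, end in episodes:
        # Clip each episode to the observation window, then set its bits.
        for unit in range(max(start, 0), min(end, num_units - 1) + 1):
            value |= 1 << unit
    return value


def encode_patient(drug_episodes, drugs, num_units):
    """Encode several drugs as a vector: one integer per drug, in a fixed order."""
    return [encode_episodes(drug_episodes.get(d, []), num_units) for d in drugs]


# Hypothetical patient: drug A taken in units 0-2, drug B in unit 1 and units 4-5.
patient = {"A": [(0, 2)], "B": [(1, 1), (4, 5)]}
vector = encode_patient(patient, ["A", "B", "C"], num_units=6)
print([format(v, "06b") for v in vector])
```

Each element of `vector` is an ordinary integer, so the result can be passed to standard numeric tooling as a fixed-length feature vector.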


  • (7 December 2016) MIMIC-III data encoding functions added, including a Postgres DAO and MIMIC events encoding. Play with test_sepsis_encoding() in MEEncoder.py to get a feel for it.
  • (11 November 2016) Cross correlation (similar to correlation in signal processing) between two encodings is implemented. The correlation result is a list of 2-element tuples, in which the first element is the time shift and the second is the correlation value at that time shift. This enables the calculation of various time correlations between two encodings, e.g., which one occurs earlier than the other and by how many time units, or time delay analysis: at what time shift the correlation value reaches its maximum. A negative time delay analysis is also on its way, which will be very useful for analysing the effectiveness of treatment episodes for certain symptoms/disorders. A usage example can be found in the function test_correlation() here.
  • (23 September 2016) Defined the classes EBEncoding and EBVector with operators. The update was put on a new branch, which was set as the default branch. A sample usage file was added, and the application of the encoding/vectors/matrix in Adverse Drug Event analytics was implemented in EBUtils.py.
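
The cross correlation described in the 11 November update can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: encodings are taken to be plain integers with one bit per time unit, the correlation value at a shift is assumed to be the number of time units in which both encodings are set, and the names `cross_correlation` and `best_shift` are hypothetical.

```python
def cross_correlation(a, b, num_units):
    """Return [(shift, value), ...] for shifts -(num_units-1)..(num_units-1).

    a, b: non-negative integers, one bit per time unit.
    value: number of time units where a and the shifted b are both set
           (an assumed correlation measure for this sketch).
    """
    mask = (1 << num_units) - 1  # keep only bits inside the window
    results = []
    for shift in range(-(num_units - 1), num_units):
        # A positive shift moves b's bits to higher positions before comparing.
        shifted = (b << shift) if shift >= 0 else (b >> -shift)
        overlap = a & shifted & mask
        results.append((shift, bin(overlap).count("1")))
    return results


def best_shift(a, b, num_units):
    """Time delay analysis: the (shift, value) pair with the highest value."""
    return max(cross_correlation(a, b, num_units), key=lambda pair: pair[1])


# Hypothetical encodings: a covers units 2-4, b covers units 0-2.
print(best_shift(0b011100, 0b000111, 6))
```

Whether a positive shift means "earlier" or "later" depends on the bit ordering chosen for the encoding, so the sign should be interpreted against that convention.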


EBEncoding.py contains the encoding class and vector class definitions. Usage examples:

  • the general usage example is available here
  • the application of the encoding in Adverse Drug Event Analytics is here
  • prefer encoding real EHR data? Check test_sepsis_encoding() in the MIMIC events encoding if you have access to MIMIC-III.

Analytics using the encoding

  • Association Analysis of Adverse Drug Events and Polypharmacy
  • Drug-drug interaction analysis: applying SVD (Singular Value Decomposition) to the matrix of drug-drug interaction Episode Encodings over 47k Adverse Events has revealed some potential new knowledge. The top 5 singular vectors after removing known causes of the ADE are visualised here. The absolute y values represent the significance of each drug pair in terms of its correlation to the adverse event. (This study is ongoing work and more details will be added soon.)

Questions?

This is my ongoing work (2016) at King's College London. For any questions, please email honghan.wu@kcl.ac.uk.

Citation

If you find this useful, please cite the following publication.

  • Wu, Honghan, Zina M. Ibrahim, Ehtesham Iqbal, and Richard JB Dobson. “Encoding Medication Episodes for Adverse Drug Event Prediction.” In Research and Development in Intelligent Systems XXXIII: Incorporating Applications and Innovations in Intelligent Systems XXIV, pp. 245-250. Springer International Publishing, 2016.