Releases: USM-CHU-FGuyon/BlendedICU


03 Jul 08:26

🐛 Bugfix:
Fixed an error in the medication processing of eICU, see #32.


26 Jun 14:57


  • Support for time resampling, min-max clipping, and pivoting data to wide format is temporarily dropped. These steps will be moved later in the pipeline; removing them from this stage provided major speedups.
  • Exploiting polars laziness to provide fast harmonization and low memory pressure.
  • No more processing by patient chunk and individual patient files.

The full OMOP-ization pipeline can be run for all 5 databases in a single day :)


  • ICU stays with missing length-of-stay data were previously dropped from the database. All patients are now preserved.
  • Drug exposures that were not omop-ized are now kept with drug_concept_id = 0. Previously they were dropped.
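
The drug_concept_id = 0 behaviour described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the mapping table, the drug labels, and the function name are hypothetical placeholders, and only the use of concept id 0 for unmapped records comes from the release note.

```python
# Hypothetical mapping from source drug labels to OMOP concept ids.
# The labels and ids are placeholders, not values taken from BlendedICU.
DRUG_CONCEPTS = {"drug_a": 1001, "drug_b": 1002}

# Concept id 0 marks a record that has no matching standard concept.
UNMAPPED_CONCEPT_ID = 0


def to_drug_exposures(records):
    """Attach a drug_concept_id to every record, keeping unmapped ones."""
    exposures = []
    for record in records:
        concept_id = DRUG_CONCEPTS.get(
            record["drug_source_value"], UNMAPPED_CONCEPT_ID
        )
        exposures.append({**record, "drug_concept_id": concept_id})
    return exposures


rows = [
    {"person_id": 1, "drug_source_value": "drug_a"},
    {"person_id": 2, "drug_source_value": "unknown_drug"},
]
exposures = to_drug_exposures(rows)
```

Keeping the unmapped rows means downstream users can still count or inspect them by filtering on drug_concept_id = 0.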

🏃 Getting further:

  • Drug dosages were partially omop-ized: dosages and routes were extracted. Some units were omop-ized, but routes are not harmonized yet.
  • Observation period table was added.
  • The drug strength table is still a work in progress. Contributions are welcome, especially for eICU.


28 May 06:12

Changes :
⚡ Speedup : Converting MIMIC-III, MIMIC-IV and Amsterdam's csv.gz files to parquet in step 1. This conversion is only done once and speeds up the following step.


03 May 12:37

Changes :

  • 🐛 Bugfix : visit_occurrence_id is no longer missing from condition_occurrence table.
  • ⚡ Speedup : Converting eICU's csv.gz files to parquet in step 1. This makes re-running 3 times faster.


24 Apr 13:28

Started to speed up some operations using polars.


12 Mar 06:58

Corrections on variables and dtypes in final OMOP tables.


  • Removed visit_start_date from the measurement table, and removed string values from the care_site table's place_of_service_concept_id.
  • Save all OMOP tables to parquet + corrected wrong dtypes on some tables.
  • Rounding times to the second. This avoids an error due to high precision in time when writing some records to parquet OMOP tables.
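
Rounding timestamps to the second, as mentioned in the last item above, can be sketched with the standard library. This is only an illustration under assumptions: the function name is hypothetical, and the release note does not say whether the project rounds to the nearest second or truncates. The sketch rounds to the nearest second.

```python
from datetime import datetime, timedelta


def round_to_second(ts: datetime) -> datetime:
    """Round a timestamp to the nearest second (half a second rounds up)."""
    # Shift by half a second, then drop the sub-second part.
    shifted = ts + timedelta(microseconds=500_000)
    return shifted.replace(microsecond=0)


# Placeholder timestamps, not data from any of the source databases.
below_half = round_to_second(datetime(2024, 3, 12, 6, 58, 1, 499_999))
at_half = round_to_second(datetime(2024, 3, 12, 6, 58, 1, 500_000))
```

Here below_half stays at second 1 while at_half moves to second 2, and both results carry no sub-second component.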

Minor changes:

  • Refactored timeseriespreprocessing to timeseriesprocessor
  • Option to skip reset_dir() when starting 2_{dataset}.py


07 Mar 08:24

Major changes:

  • Generated a numeric patient id for OMOP-standardization. (Issue #15 )
  • Added some insight into the running time of each script. (as suggested in Issue #24 )
  • Simplified the structure of paths.json
  • Fixed inconsistency in datetimes of OMOP tables : some datetime columns contained only the date, others contained only the time of day. Now they all contain the full datetime. Issue #26
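
The datetime fix in the last item can be illustrated with a small sketch. It assumes, hypothetically, that the date and the time of day are available as separate values. The function name and the sample values are placeholders, not taken from the project.

```python
from datetime import date, datetime, time


def full_datetime(day: date, time_of_day: time) -> datetime:
    """Combine a calendar date and a time of day into one full datetime."""
    return datetime.combine(day, time_of_day)


# Placeholder values for illustration only.
start = full_datetime(date(2024, 3, 7), time(8, 24, 0))
```

Storing the combined value in every datetime column means no table holds a bare date or a bare time of day.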

Minor changes:

  • Added unit_concept_id to auxillary_files/user_input/timeseries_variables.csv
  • Fixed harmless SettingWithCopy warning happening in database_processing/

Thanks to @mostafaalishahi and @xinyuejohn for their contributions to the project.


19 Feb 07:55

What's changed

  • BlendedICU now handles the latest version of MIMIC-IV. (Issue #1 )
  • Timeseries files are saved in a way that reduces indexing times.
  • Updated package versions.
  • Other minor corrections. (Issue #21 )


29 Jan 03:43

Fixed issues in the OMOP tables: a patient_id column had the wrong values, and some columns from the OMOP standard were missing. See Issue #17


19 Jan 05:02

Quickfix for a bug introduced in the latest commit. See Issue #15