
Postgres CHARTEVENTS partition is no longer efficient #136

Open
alistairewj opened this Issue Sep 30, 2016 · 8 comments

Owner

alistairewj commented Sep 30, 2016

With the addition of text data (a fix in MIMIC-III v1.4), the partitioning of CHARTEVENTS is no longer effectively distributing the data.

with drb_stats as (
select 1 as bucket, 1 as itemid_min , 210 as itemid_max UNION
select 2 as bucket, 210 as itemid_min     , 250 as itemid_max UNION
select 3 as bucket, 250 as itemid_min     , 614 as itemid_max UNION
select 4 as bucket, 614 as itemid_min     , 640 as itemid_max UNION
select 5 as bucket, 640 as itemid_min     , 742 as itemid_max UNION
select 6 as bucket, 742 as itemid_min     , 1800 as itemid_max UNION
select 7 as bucket, 1800 as itemid_min    , 2700 as itemid_max UNION
select 8 as bucket, 2700 as itemid_min    , 3700 as itemid_max UNION
select 9 as bucket, 3700 as itemid_min    , 4700 as itemid_max UNION
select 10 as bucket, 4700 as itemid_min    , 6000 as itemid_max UNION
select 11 as bucket, 6000 as itemid_min    , 7000 as itemid_max UNION
select 12 as bucket, 7000 as itemid_min    , 8000 as itemid_max UNION
select 13 as bucket, 8000 as itemid_min    , 220074 as itemid_max UNION
select 14 as bucket, 220074 as itemid_min  , 323769 as itemid_max
),
histogram as (
select bucket, itemid_min, itemid_max, count(*) as freq
from drb_stats drb
left join chartevents ce
on ce.itemid >= drb.itemid_min and ce.itemid < drb.itemid_max
group by bucket, itemid_min, itemid_max
order by bucket, itemid_min, itemid_max
)
select bucket, itemid_min, itemid_max,
repeat('*', (freq::float / max(freq) over() * 50)::int) as bar
from histogram;
 bucket | itemid_min | itemid_max |                        bar                         
--------+------------+------------+----------------------------------------------------
      1 |          1 |        210 | *******************
      2 |        210 |        250 | *******
      3 |        250 |        614 | *******************
      4 |        614 |        640 | *****
      5 |        640 |        742 | *********
      6 |        742 |       1800 | **************
      7 |       1800 |       2700 | 
      8 |       2700 |       3700 | *****************
      9 |       3700 |       4700 | *
     10 |       4700 |       6000 | *****
     11 |       6000 |       7000 | 
     12 |       7000 |       8000 | 
     13 |       8000 |     220074 | ********************
     14 |     220074 |     323769 | **************************************************

Essentially, the addition of the text data added ~100 million rows at ITEMIDs > 220000. I've noticed queries are going slower and I'm sure users will start to notice this too. Perhaps we should take this opportunity to move to a smarter partitioning strategy - one based on the type of data stored with the ITEMID (e.g. the Metavision text data goes into a few partitions of its own, vital sign data goes into its own partition, etc). I'd welcome thoughts from anyone who has any - @parisni @tompollard
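To make the idea concrete, a type-based scheme could reuse the existing inheritance-style partitioning (Postgres < 10). This is only a sketch: the partition names, itemid ranges, and routing trigger below are illustrative, not a tested layout.

```sql
-- Sketch only: type-based child tables under the existing chartevents parent.
-- Ranges are illustrative; a real scheme would be derived from D_ITEMS.
CREATE TABLE chartevents_mv_text (
    CHECK (itemid >= 220000 AND itemid < 230000)
) INHERITS (chartevents);

CREATE TABLE chartevents_vitals (
    CHECK (itemid >= 211 AND itemid < 250)  -- hypothetical vital-sign range
) INHERITS (chartevents);

-- Rows are routed by a trigger function on the parent:
CREATE OR REPLACE FUNCTION chartevents_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.itemid >= 220000 AND NEW.itemid < 230000 THEN
        INSERT INTO chartevents_mv_text VALUES (NEW.*);
    ELSIF NEW.itemid >= 211 AND NEW.itemid < 250 THEN
        INSERT INTO chartevents_vitals VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'itemid % not covered by any partition', NEW.itemid;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER chartevents_insert_trigger
    BEFORE INSERT ON chartevents
    FOR EACH ROW EXECUTE PROCEDURE chartevents_insert();
```

With constraint_exclusion enabled, queries filtered on itemid would then touch only the matching child tables.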

Contributor

parisni commented Sep 30, 2016

First, let me inform you that Postgres 9.6 was released today: https://www.postgresql.org/docs/9.6/static/release-9-6.html
That is pretty funny, because its major feature is parallel execution of sequential scans, joins, and aggregates, and this feature should be very useful for the chartevents table.
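For example, on 9.6 parallelism is controlled per session; an aggregate over chartevents like the one below should show a Gather node over a Parallel Seq Scan in the plan, if the planner chooses a parallel plan:

```sql
-- Postgres 9.6: allow up to 4 parallel workers per Gather node.
SET max_parallel_workers_per_gather = 4;

EXPLAIN ANALYZE
SELECT itemid, count(*)
FROM chartevents
GROUP BY itemid;
```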

That being said, sorry, I am not aware of that new MIMIC feature. Questions:

  • Text data = the chartevents.value column?
  • Is it mostly used when d_items.param_type = text?
  • Can it contain free text when d_items.param_type = text?
  • What kinds of queries are slow? Those on text fields? Those on ITEMIDs contained in chartevent_14?
Owner

alistairewj commented Oct 3, 2016

Great news! Looking forward to trying some of the parallel processing features.

No problem - I will update you, as it's also relevant to the issue at hand. Essentially, we added a large amount of data to CHARTEVENTS. This data is from TEXTSIGNALS in Metavision. It is structured drop-down box data. For example, for ITEMID = 223901 (GCS - Motor Response) there are six options: No response, Abnormal extension, Abnormal Flexion, Flex-withdraws, Localizes Pain, Obeys Commands. Practically, it means CHARTEVENTS increased from 250 million rows to 350 million rows. It is indeed all the ITEMIDs in D_ITEMS where param_type = 'Text'. It is not "free" text, but it is text.

Personally, I am finding that the ventilation-durations.sql query is very slow now, but that is a very inefficient query. I think in general queries will be slowed because almost all of the Metavision data is in partition 14 (ITEMID 220074 and above).

Contributor

parisni commented Oct 3, 2016

First, have you got a set of chartevents queries that are typically run? I am not sure I get the big picture of the variety of queries run on it. The approach I describe above is fine for the queries I make, but maybe not for all use cases. The way I use chartevents always begins with getting a subset of it based on an ITEMID. If this does not represent the way all users query it, then I would think again about that.

My first reaction to storing those drop-down box values as text in chartevents was surprise. In general, such predefined values are stored as smallint IDs referencing a dimension table around the fact table. But then, adding a new column (say drop_down_box) to chartevents would be very costly and would mostly contain NULL values. So storing those values as text is maybe the best option.
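For illustration, the dimension-table alternative would look something like the sketch below; the table and column names here are hypothetical, not part of MIMIC-III:

```sql
-- Hypothetical dimension table for the predefined drop-down values.
CREATE TABLE d_dropdown_values (
    value_id smallint      PRIMARY KEY,
    itemid   int           NOT NULL,
    value    varchar(200)  NOT NULL
);

-- The fact table would then carry a small id instead of repeated text:
-- ALTER TABLE chartevents ADD COLUMN dropdown_value_id smallint
--     REFERENCES d_dropdown_values (value_id);
```

The cost parisni mentions is exactly that ALTER: a new column that is NULL for the ~250 million non-text rows.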

In the general case, speeding up chartevents on PostgreSQL means reducing sequential scans. It is not possible to avoid them entirely, which is why partitioning is a good help: it limits sequential scan times in most cases.

Partitioning data by datatype can be a good option. It allows indexing different partitions in specialized ways. Separating "text rows" from "num rows" would allow adapting the indexes (say, on value/valuenum) respectively. By the way, I am not sure it is relevant to index value/valuenum for now, because itemid is already very disperse.

So maybe a better option would be to work on dispersion rather than data type. For example, put all the ITEMIDs that are very disperse into a single huge partition: querying on itemid there would always result in an index scan (a small % of the table). The ITEMIDs that represent a huge portion of chartevents would go into other, smaller partitions. The table below lists the top 100 ITEMIDs by row count. They represent 140M rows (~half of the chartevents table?). Separating them into 14 tables of 10M rows each would mean those tables are used for sequential scans only, and 10M rows is a decent size for a sequential scan. The remaining ~200M very disparate rows would stay in one table.
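A minimal sketch of this dispersion-based layout, again with inheritance partitioning (partition names and itemid lists are illustrative; a real scheme would cover the ~100 heaviest ITEMIDs listed below):

```sql
-- One small "hot" partition per group of high-frequency ITEMIDs (~10M rows each).
CREATE TABLE chartevents_hot_01 (
    CHECK (itemid IN (211, 742, 646, 618, 212, 161, 128))
) INHERITS (chartevents);

-- ... further hot partitions for the remaining heavy ITEMIDs ...

-- One large partition for all the disperse, low-frequency ITEMIDs.
CREATE TABLE chartevents_sparse (
    CHECK (itemid NOT IN (211, 742, 646, 618, 212, 161, 128))
) INHERITS (chartevents);

-- An itemid index keeps lookups on the sparse partition as index scans.
CREATE INDEX chartevents_sparse_itemid_idx ON chartevents_sparse (itemid);
```

Queries on a heavy ITEMID then sequentially scan only its small hot partition, while queries on rare ITEMIDs hit the index on the sparse partition.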

To be fair, on PostgreSQL I would go for a chartevents table implemented with an FDW: cstore_fdw in the case of a standalone server (take a look at this well-explained post: http://blog.dbi-services.com/optimized-row-columnar-orc-format-in-postgresql/), or hive_fdw in case you have a Hadoop cluster (http://www1.bigsql.org/hadoopfdw/).

count itemid
5180809 211
3464326 742
3418917 646
3386719 618
3303151 212
3236350 161
3216866 128
3205052 550
2955851 1125
2762225 220045
2737105 220210
2671816 220277
2544519 159
2261065 1484
2096678 51
2085994 8368
2072743 52
2023672 220048
1862707 227969
1849287 224650
1807316 5815
1804988 8549
1802684 5820
1796313 8554
1772387 5819
1769925 8553
1716561 834
1687886 3450
1687615 8518
1676872 3603
1641889 581
1621393 3609
1619782 8532
1579681 455
1573583 8441
1553537 456
1442406 31
1378959 5817
1375295 8551
1292916 220181
1290488 220179
1289885 220180
1198681 113
1174868 1703
1156173 220052
1155571 467
1149788 220050
1149537 220051
1136214 80
1083809 1337
1042512 674
1032728 432
982518 5813
976252 8547
962191 617
955452 210
954354 637
954139 184
952177 723
950038 454
945638 198
937064 707
933238 704
917780 479
892239 54
878442 32
852968 547
818852 154
803881 3656
803816 1046
803619 224084
787137 599
786544 428
784361 593
783762 425
783321 224080
775873 224082
774213 663
774157 478
773891 678
772277 677
765274 223792
743354 224086
740176 680
737224 27
727719 706
725866 8537
719203 8480
717326 224093
716961 227958
690295 3645
679113 3327
673719 3420
666496 621
655454 5814
649745 8548
636690 49
618565 224642
601818 8367
592344 1087
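For reference, the cstore_fdw route mentioned above could look roughly like this; the column list is an illustrative subset of CHARTEVENTS, not the full schema:

```sql
-- Requires the cstore_fdw extension to be built and installed.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Columnar copy of (a subset of) CHARTEVENTS.
CREATE FOREIGN TABLE chartevents_cstore (
    subject_id int,
    itemid     int,
    charttime  timestamp,
    value      varchar(255),
    valuenum   double precision
) SERVER cstore_server
  OPTIONS (compression 'pglz');

-- Populate from the existing row-store table.
INSERT INTO chartevents_cstore
SELECT subject_id, itemid, charttime, value, valuenum
FROM chartevents;
```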
Owner

tompollard commented Nov 16, 2016

@theonesp this is the issue that we discussed yesterday, in case you are interested in taking it on.

theonesp commented

@tompollard Great! I'll have a look. Thanks

theonesp commented

@tompollard Why did you decide to create 14 tables?
Is the aim of dividing the big one into several others just to improve performance, or does it have to be 14?

Owner

tompollard commented Nov 19, 2016

@theonesp 14 is just an arbitrary number and the aim was to improve performance. Not a great deal of thought went into the partitioning, so there is plenty of room for improvement :)

Contributor

parisni commented Nov 20, 2016

Hi,
What about first building a benchmark by listing a set of queries and writing a little script that runs and times them? This could be a good basis for testing the improvement methods.
Here are two example benchmarks:
https://github.com/MIT-LCP/mimic-code/tree/master/benchmark
https://github.com/parisni/mimic/tree/master/mimic3/benchmark
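As a starting point, the timing harness could be as simple as a psql script; the two queries below are placeholders standing in for a real benchmark suite:

```sql
-- benchmark.sql: run with `psql mimic -f benchmark.sql`
\timing on

-- Q1: a single high-frequency ITEMID (211 = heart rate in CareVue).
SELECT count(*) FROM chartevents WHERE itemid = 211;

-- Q2: the crowded Metavision range (partition 14).
SELECT count(*) FROM chartevents WHERE itemid >= 220074;
```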
