Commit 8802a81

Merge branch 'master' of https://github.com/Evolveum/midpoint

katkav committed Dec 8, 2020
2 parents 7ec00fc + 7dd1f4d

Showing 5 changed files with 121 additions and 144 deletions.
69 changes: 0 additions & 69 deletions repo/repo-sqale/README.adoc
@@ -277,32 +277,6 @@ https://stackoverflow.com/questions/722221/how-to-log-postgresql-queries
We may need regular `ANALYZE` and/or `VACUUM`.
This should run regularly - can the DB schedule it itself (autovacuum), should midPoint call it, or will something else trigger it?
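
A hedged sketch of what such maintenance could look like if midPoint (or cron) triggers it -
the thresholds below are illustrative, and autovacuum may already cover this:

[source,sql]
----
-- refresh planner statistics and reclaim dead tuples for a hot table
VACUUM (ANALYZE) m_user;

-- or tune autovacuum per table instead of calling VACUUM manually
ALTER TABLE m_user SET (
    autovacuum_vacuum_scale_factor = 0.05, -- vacuum after ~5% of rows are dead
    autovacuum_analyze_scale_factor = 0.02 -- re-analyze after ~2% of rows change
);
----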

== Pagination

Various types of pagination are summed up in https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/[this article].

For *pagination in the GUI*, `OFFSET`/`LIMIT` seems acceptable despite the performance
degradation with big values of `OFFSET`.
The reason is that the GUI requires random access, and the first pages are accessed more often than later/last pages.
Also, any inconsistency (a suddenly added entry) is easy to explain, and the user probably knows what is happening.
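
A minimal example of this style, assuming the `m_user` table used in the experiment scripts below:

[source,sql]
----
-- page 21 with page size 50; cost grows with the OFFSET value
SELECT oid, name_norm
FROM m_user
ORDER BY name_norm
OFFSET 1000 LIMIT 50;
----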

For long-term processes that need to process many items we use *keyset pagination*, e.g. using the last
ID from the current page to define the next page without any `OFFSET`, only using `LIMIT`.
This is very efficient *even when no additional index besides the PK is used* - and it's natural, too.
This is called a "top-N query", and typically the best approach is to order and filter by the same index.
Even when ordered by ID with filters not using the index, it tends to be very fast unless
the required number of rows satisfying the condition lies far from the beginning of the scan.
While this may skip some items that are added after we processed their page (and, at the same time,
process other items added later that appear on later pages), it is more or less deterministic.
We can also avoid processing "future" items with a `WHERE` clause using the creation timestamp
(or the current maximal ID, if sequential) captured at the start of processing, as sketched below.
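
A sketch of one keyset iteration, assuming ordering by the `oid` PK; `:last_oid` and `:max_oid`
are placeholder parameters, not part of any existing script:

[source,sql]
----
-- :last_oid = highest oid of the previous page (zero UUID for the first page)
-- :max_oid  = maximal oid captured when the processing started, keeps "future" items out
SELECT oid, name_norm
FROM m_user
WHERE oid > :last_oid
  AND oid <= :max_oid
ORDER BY oid
LIMIT 500;
----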

The following techniques are generally not usable for us:

* *TID Scan* and *Keyset with Estimated Bookmarks* do not support `WHERE` clauses.
* *Cursor* pagination causes high client-server coupling and is stateful.
We don't want to hold a cursor for operations that can take long and need transactions.

== Performance drop with volume

TL;DR:
@@ -428,46 +402,3 @@ JIT:
Functions: 18
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
----

== JSONB vs EAV performance

[NOTE]
Performance was measured on a default PG installation in a 2GB RAM VirtualBox VM.
The results can be better (or worse, depending on the host), but not by an order of magnitude.

See link:sql/pgnew-eav-vs-json.sql[] for the whole source.

The structure for JSON is simple: a single table `tjson` with `(oid UUID PK, name VARCHAR, ext JSONB)`.
Column `name` has a unique index, and `ext` has only a `GIN` index by default.
Other indexes are added as needed and are mentioned during the tests.

The structure for the EAV model has a master table `teav` with `(oid UUID PK, name VARCHAR)`
and a detail table `teav_ext_string` with `(owner_oid FK, key VARCHAR, value VARCHAR)`.
The extension table has `PK(owner_oid, key, value)`; `value` could be omitted from the PK if uniqueness is not necessary.
Because the PK covers all columns, it would be an index-organized table in Oracle (PG doesn't have this feature).
In PG the table can be https://www.postgresql.org/docs/13/sql-cluster.html[clustered] by an index,
but this is potentially costly (12 min for 15M rows), and the table is unusable for the duration
because of the `ACCESS EXCLUSIVE` lock.
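
A DDL sketch matching this description - the authoritative version is in the linked script,
and the column lengths here are illustrative:

[source,sql]
----
CREATE TABLE tjson (
    oid UUID NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    ext JSONB
);
CREATE INDEX tjson_ext_idx ON tjson USING gin (ext);

CREATE TABLE teav (
    oid UUID NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE teav_ext_string (
    owner_oid UUID NOT NULL REFERENCES teav (oid),
    key VARCHAR(255) NOT NULL,
    value VARCHAR(255) NOT NULL,
    CONSTRAINT teav_ext_string_pk PRIMARY KEY (owner_oid, key, value)
);

-- optional physical reordering by the PK; slow, and the table is locked meanwhile
-- CLUSTER teav_ext_string USING teav_ext_string_pk;
----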

=== 15 million rows

|===
| Object | Size | Note

| teav_ext_string | 2763 MB | attribute value table (detail for EAV)
| teav_ext_string_pk | 2312 MB | PK of AV table, nearly the same size
| tjson | 1909 MB | JSON table (smaller than just AV table alone)
| tjson_ext_idx | 1620 MB | GIN index for `ext` column
| teav_ext_string_key_value_idx | 867 MB | index for searching by key+value
| teav | 862 MB | master table for EAV model
| teav_oid_pk | 583 MB | primary key index of that table
| tjson_oid_pk | 582 MB | PK of JSON table, virtually the same as for EAV
| teav_name_key, tjson_name_key | 451 MB | name indexes are the same
|===


|===
| Operation | Time | Note

// TODO
|===
98 changes: 40 additions & 58 deletions repo/repo-sqale/sql/pgnew-eav-vs-json.sql
@@ -108,7 +108,6 @@ BEGIN
END LOOP;
END IF;


IF r % 1000 = 0 THEN
COMMIT;
END IF;
@@ -128,71 +127,54 @@ select * from teav_ext_string;
analyze teav;
analyze teav_ext_string;
analyze tjson;
cluster teav_ext_string using teav_ext_string_pk; -- 10-13min


-- matches a big portion of the table (>20%), doesn't use the index
-- 2399882
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*)
-- select *
from tjson where ext @> '{"hobbies":["video"]}';

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*)
-- select *
from teav t
where exists (
select from teav_ext_string es
where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video');

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*)
-- select *
from teav t
where exists (
select from teav_ext_string es
where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video')
and name > 'user0009000000';


EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
-- select count(*)
select *
FROM teav t
JOIN teav_ext_string es ON es.owner_oid = t.oid
WHERE es.key = 'hobbies' and es.value = 'video';
-- cluster teav_ext_string using teav_ext_string_pk; -- 10-13min, not sure how useful this is

-- selecting rare values, uses index and is very fast (under 50ms)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*)
-- select *
from tjson
where ext @> '{"hobbies":["sleeping"]}';
-- SHOW enable_seqscan;
-- SET enable_seqscan = on;

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*)
-- select *
from teav t
where exists (
select from teav_ext_string es
where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping');

-- selecting emails between two values (comparison)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*)
-- select *
from tjson
-- not numerical, matches unexpected values too, but that's not the point here
where ext->>'email' BETWEEN 'user200000@mycompany.com' AND 'user250000@mycompany.com';
where ext->>'email' LIKE 'user2%'
and ext ? 'email';

-- index experiments
analyze;
CREATE INDEX tjson_exttmp_idx ON tjson ((ext->>'email'));
DROP INDEX tjson_exttmp_idx;

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*)
-- select *
from teav t
where exists (
select from teav_ext_string es
where es.owner_oid = t.oid and es.key = 'email' and es.value BETWEEN 'user200000@mycompany.com' AND 'user250000@mycompany.com');
-- BENCHMARK
-- counts
select count(*) from tjson;
-- low selectivity count
select count(*) from tjson where ext @> '{"hobbies":["video"]}';
-- high selectivity count (result is low)
select count(*) from tjson where ext @> '{"hobbies":["sleeping"]}';
select count(*) from tjson where ext->>'email' LIKE 'user2%';
select count(*) from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and es.value LIKE 'user2%');

select count(*) from teav;
select count(*) from teav_ext_string;
select count(*) from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video');
select count(*) from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping');

-- selects
select * from tjson limit 500;
select * from tjson where ext @> '{"hobbies":["video"]}' limit 500;
select * from tjson where ext @> '{"hobbies":["video"]}' order by oid limit 500;
select * from tjson where ext @> '{"hobbies":["video"]}' and oid>'fffe0000-0000-0000-0000-000000000000' order by oid limit 500;
select * from tjson where ext @> '{"hobbies":["sleeping"]}' limit 500;
select * from tjson where ext @> '{"hobbies":["sleeping"]}' order by oid limit 500;
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video') limit 500;
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video') order by t.oid limit 500;
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video') and t.oid>'fffe0000-0000-0000-0000-000000000000' order by t.oid limit 500;
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping') limit 500;
-- EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping') order by t.oid limit 500;

-- TODO
select * from tjson where ext->>'email' LIKE 'user2%' limit 500;
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and es.value LIKE 'user2%') limit 500;

30 changes: 19 additions & 11 deletions repo/repo-sqale/sql/pgnew-experiments.sql
@@ -73,7 +73,6 @@ SHOW session_replication_role;
select count(*) from m_object_oid;
explain
select count(*) from m_user;
select * from m_user order by name_norm offset 990 limit 50;
-- vacuum full analyze; -- this requires exclusive lock on processed table and can be very slow, with 1M rows it takes 10s
vacuum analyze; -- this is normal operation version (can run in parallel, ~25s/25m rows)

@@ -101,12 +100,13 @@ SELECT 'user-' || LPAD(r::text, 10, '0'),
(select '{' || string_agg('"key-' || i || '": "value"', ',') || '}' from generate_series(1, 1000) i)::jsonb
WHEN r % 10 = 7 THEN
-- let's add some numbers and wannabe "dates"
('{' || '"hired": "' || current_date - width_bucket(random(), 0, 1, 1000) || '",' || '"rating": ' || width_bucket(random(), 0, 1, 10) || '}')::jsonb
('{"hired": "' || current_date - width_bucket(random(), 0, 1, 1000) || '", "rating": ' || width_bucket(random(), 0, 1, 10) || '}')::jsonb
ELSE
('{"hobbies": ' || array_to_json(random_pick(ARRAY['eating', 'books', 'music', 'dancing', 'walking', 'jokes', 'video', 'photo'], 0.5))::text || '}')::jsonb
('{"hired": "' || current_date - width_bucket(random(), 0, 1, 1000) || '",' ||
'"hobbies": ' || array_to_json(random_pick(ARRAY['eating', 'books', 'music', 'dancing', 'walking', 'jokes', 'video', 'photo'], 0.5))::text || '}')::jsonb
END,
1
from generate_series(50001, 100000) as r;
from generate_series(2000001, 3000000) as r;

select * from m_user;

@@ -140,9 +140,13 @@ CREATE INDEX m_user_ext_hired2_idx ON m_user ((ext ->> 'hired')) WHERE ext ? 'hi
-- set jit=on; -- JIT can sometimes be slower when planner guesses wrong

-- see also https://www.postgresql.org/docs/13/functions-json.html some stuff is only for JSONB
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
select count(*) from m_user
where ext?'hobbies' -- faster, uses GIN index
EXPLAIN --(ANALYZE, BUFFERS, FORMAT TEXT)
-- select count(*)
select oid, name_norm, ext
from m_user
where
-- ext?'hobbies' and -- faster, uses GIN index
ext @> '{"hobbies":["video"]}'
-- where ext->>'hobbies' is not null -- seq-scan
;

@@ -181,17 +185,18 @@ analyse;

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
-- select count(*)
select oid -- "select *" with big bytea stuff can dominate the total cost
-- select * -- "select *" with big bytea stuff can dominate the total cost
select oid, length(fullobject) -- much faster than *
from m_user
where
ext?'hobbies' and
-- without array test jsonb_array_elements_text function can fail on scalar value (if allowed)
exists (select from jsonb_array_elements_text(ext->'hobbies') v
-- where jsonb_typeof(ext->'hobbies') = 'array'
-- and upper(v::text) LIKE '%ING')
where upper(v::text) LIKE 'VID%')
where upper(v::text) LIKE '%ING')
-- and oid > 14000000
-- order by oid
order by oid
;

-- MANAGEMENT queries
@@ -214,7 +219,7 @@ FROM pg_class t
LEFT JOIN pg_namespace tftns ON tftns.oid = tft.relnamespace
WHERE 'public' IN (ns.nspname, tftns.nspname)
ORDER BY pg_relation_size(t.oid) DESC
LIMIT 20;
LIMIT 50;

vacuum full analyze;
-- database size
Expand All @@ -230,6 +235,9 @@ where ut.relkind = 'r' and tt.relkind = 't'
and ns.nspname = 'public'
order by relpages desc;

-- find sequence name for serial column (e.g. to alter its value later)
select pg_get_serial_sequence('m_qname', 'id');

-- PRACTICAL UTILITY FUNCTIONS

-- based on https://dba.stackexchange.com/a/22571
55 changes: 55 additions & 0 deletions repo/repo-sqale/sql/pgnew-repo.sql
@@ -416,6 +416,61 @@ CREATE TABLE m_acc_cert_wi_reference (

PRIMARY KEY (owner_owner_owner_oid, owner_owner_id, owner_id, relation, targetOid)
);

-- TODO: catalog unused at the moment
CREATE TABLE m_ext_item (
id SERIAL NOT NULL,
kind INTEGER, -- see RItemKind, is this necessary? does it contain cardinality?
name VARCHAR(157), -- no path for nested props needed?
type VARCHAR(32), -- data type
storageType VARCHAR(32) NOT NULL default 'EXT_JSON', -- type of storage (JSON, column, table separate/common, etc.)
storageInfo VARCHAR(32), -- optional storage detail, name of column or table if necessary

PRIMARY KEY (id)
);

-- EXPERIMENTAL EAV (first without catalog, so string keys are used)
CREATE TABLE m_object_ext_boolean (
owner_oid UUID NOT NULL REFERENCES m_object_oid(oid),
ext_item_id VARCHAR(32) NOT NULL,
value BOOLEAN NOT NULL,
PRIMARY KEY (owner_oid, ext_item_id, value)
);
CREATE TABLE m_object_ext_date (
owner_oid UUID NOT NULL REFERENCES m_object_oid(oid),
ext_item_id VARCHAR(32) NOT NULL,
value TIMESTAMP NOT NULL,
PRIMARY KEY (owner_oid, ext_item_id, value)
);
CREATE TABLE m_object_ext_long (
owner_oid UUID NOT NULL REFERENCES m_object_oid(oid),
ext_item_id VARCHAR(32) NOT NULL,
value INTEGER NOT NULL,
PRIMARY KEY (owner_oid, ext_item_id, value)
);
CREATE TABLE m_object_ext_poly (
owner_oid UUID NOT NULL REFERENCES m_object_oid(oid),
ext_item_id VARCHAR(32) NOT NULL,
orig VARCHAR(255) NOT NULL,
norm VARCHAR(255),
PRIMARY KEY (owner_oid, ext_item_id, orig)
);
CREATE TABLE m_object_ext_reference (
owner_oid UUID NOT NULL REFERENCES m_object_oid(oid),
ext_item_id VARCHAR(32) NOT NULL,
targetoid UUID NOT NULL,
relation VARCHAR(157),
targetType INTEGER,
PRIMARY KEY (owner_oid, ext_item_id, targetoid)
);
CREATE TABLE m_object_ext_string (
owner_oid UUID NOT NULL REFERENCES m_object_oid(oid),
ext_item_id VARCHAR(32) NOT NULL,
value VARCHAR(255) NOT NULL,
PRIMARY KEY (owner_oid, ext_item_id, value)
);
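
-- Hypothetical usage sketch (illustrative only, mirrors the teav queries in the
-- experiments script); 'hobbies'/'video' are example data, not part of the schema:
-- select o.oid
-- from m_object_oid o
-- where exists (
--     select from m_object_ext_string es
--     where es.owner_oid = o.oid and es.ext_item_id = 'hobbies' and es.value = 'video');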


-- TODO what of assignment extensions? Can they change for various types of assignments?
-- Then what? Inheritance is impractical, legacy extension tables are unwieldy.

@@ -239,7 +239,7 @@ public ListTasksPage listTasks() {

public ListTasksPage listTasks(String objectListMenuItemKey) {
if (StringUtils.isEmpty(objectListMenuItemKey)) {
clickAdministrationMenu("PageAdmin.menu.top.serverTasks", "PageAdmin.menu.top.serverTasks.list");
clickAdministrationMenu("PageAdmin.menu.top.serverTasks", "PageAdmin.menu.top.tasks.list");
} else {
clickAdministrationMenu("PageAdmin.menu.top.serverTasks", objectListMenuItemKey);
}
@@ -447,11 +447,12 @@ private SelenideElement getMainMenuItemElement(String topLevelMenuKey, String ma
SelenideElement topLevelMenu = $(Schrodinger.byDataResourceKey(topLevelMenuKey));
topLevelMenu.shouldBe(Condition.visible);

SelenideElement topLevelMenuChevron = topLevelMenu.parent().$(By.tagName("i"));
if (!topLevelMenuChevron.has(Condition.cssClass("fa-chevron-down"))) {
topLevelMenu.click();
topLevelMenuChevron.shouldHave(Condition.cssClass("fa-chevron-down")).waitUntil(Condition.cssClass("fa-chevron-down"), MidPoint.TIMEOUT_DEFAULT_2_S);
}
// TODO: temporarily commented out
// SelenideElement topLevelMenuChevron = topLevelMenu.parent().$(By.tagName("i"));
// if (!topLevelMenuChevron.has(Condition.cssClass("fa-chevron-down"))) {
// topLevelMenu.click();
// topLevelMenuChevron.shouldHave(Condition.cssClass("fa-chevron-down")).waitUntil(Condition.cssClass("fa-chevron-down"), MidPoint.TIMEOUT_DEFAULT_2_S);
// }

SelenideElement mainMenu = topLevelMenu.$(Schrodinger.byDataResourceKey("span", mainMenuKey), index);
mainMenu.shouldBe(Condition.visible);