Commit

Merge branch 'master' of github.com:Evolveum/docs
dejavix committed Mar 18, 2022
2 parents d98de6a + 350a00b commit efe9602
Showing 4 changed files with 74 additions and 58 deletions.
6 changes: 5 additions & 1 deletion docs/repository/configuration.adoc
@@ -213,6 +213,11 @@ It can be raised if the iterative search overhead (executing the select)
is too high compared to the time used for processing the page results.
| `100`

| `sqlDurationWarningMs`
| Duration in milliseconds after which a query is logged by `com.evolveum.midpoint.repo.sqlbase.querydsl.SqlLogger`
at the `WARN` level, including the provided parameters.
| `0` (disabled)

|===

There are no options for compression as this is left to PostgreSQL.
@@ -224,5 +229,4 @@ This also makes the inspection of the values in the columns easier.
* xref:../generic/[Old Generic Repository]
* xref:/midpoint/reference/deployment/clustering-ha/[Clustering / high availability setup]
* xref:/midpoint/reference/repository/native-postgresql/migration/[Migration to Native PostgreSQL Repository]
// TODO separate audit repository link
* xref:/midpoint/reference/tasks/task-manager/configuration/[Task Manager Configuration]
5 changes: 5 additions & 0 deletions docs/repository/native-audit.adoc
@@ -161,6 +161,11 @@ is too high compared to the time used for processing the page results.
| Specifies whether midPoint should create missing columns for link:#custom-column[custom properties] during the startup.
| `false`

| `sqlDurationWarningMs`
| Duration in milliseconds after which a query is logged by `com.evolveum.midpoint.repo.sqlbase.querydsl.SqlLogger`
at the `WARN` level, including the provided parameters.
| `0` (disabled)

|===

There are no options for compression as this is left to PostgreSQL.
101 changes: 50 additions & 51 deletions docs/repository/native-postgresql/db-maintenance.adoc
@@ -4,7 +4,56 @@
:page-since: "4.4"

[WARNING]
This page is work-in-progress with stuff useful to DB admin but also to programmer or engineer.
This page is work-in-progress with stuff useful to a DB admin but also to a programmer or an engineer.

== Index tuning

Anything that is externalized into columns and related tables (like extensions, references, etc.) is effectively searchable using the xref:/midpoint/reference/concepts/query/query-api/[Query API].
It is not possible to search for information stored only in the serialized form of the object.
But _searchable_ and _efficiently searchable_ are two different things.

For some tables (object types) and some columns no indexes are needed, but for others they typically are.
MidPoint is provided with all essential indexes out of the box.
Despite that, it is possible to come up with real-life queries that perform badly.
While nearly all needs can be covered by adding more and more indexes, it is not necessarily
a good idea to create them all by default.
Indexes take space and, when not needed, only add cost to insert/update operations without really helping.

For any non-trivial installation it is recommended to check the database performance logs/statistics
regularly to identify sluggish queries (see the link:#db-monitoring[DB monitoring] section).
When you identify one, check whether the existing (predefined) indexes should have covered the case and investigate why they didn't.
If no existing index covers the case, don't be afraid to add an index according to your specific needs.
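
One way to investigate why an index was not used is to inspect the execution plan of the slow query.
The following is only a sketch: the `emailAddress` column and the searched value are assumptions chosen for illustration, so substitute the actual query taken from the logs.
Note that `EXPLAIN` with the `ANALYZE` option really executes the statement.

[source,sql]
----
-- Show the actual plan, timing and buffer usage of a suspicious query.
-- The column name and the value are illustrative assumptions.
EXPLAIN (ANALYZE, BUFFERS)
SELECT oid FROM m_user WHERE emailAddress = 'jack@example.com';
----

A `Seq Scan` node on a large table in the output usually suggests that no suitable index was available or that the planner decided not to use it.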

The following notes and tips can be helpful:

* Don't index each column separately if the critical query uses multiple `WHERE` conditions;
use a https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys[multi-column index] instead.
The order of the columns is important.
* Searching using `like` (especially with `%` at the start of the value) or case-insensitive
search often requires specific indexes, e.g. a https://www.postgresql.org/docs/current/gin-intro.html[GIN index] with https://www.postgresql.org/docs/current/pgtrgm.html[trigram options].
// TODO reference example lower, when added
* In general, don't index low-cardinality columns (e.g. a boolean, or an integer representing an enum
with just a few distinct values) alone.
Leave the column unindexed and let other indexes do the job first.
Searching only by such a column is not recommended.
It is, however, possible to use a low-cardinality column in a multi-column index, and it may be beneficial
when it is the first column (of course, only selects using the column in `WHERE` use such an index).
* It is possible to use a `WHERE` clause in an index when only specific values are selected often, e.g. the value indicating an active user.
This is called a https://use-the-index-luke.com/sql/where-clause/partial-and-filtered-indexes[partial index].
It is typically used for low-variability columns (booleans, enums); using them in the `WHERE` part of the
index works well and also makes the index smaller.
If unsure, don't use `WHERE` in the index definition.
* Be sure to add the index on the concrete table like `m_user`, not on the inheritance parent like `m_object`.
Indexes are not inherited.
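
The tips above can be sketched in SQL as follows.
This is only an illustration, not a recommended set of indexes: the column names (`familyNameNorm`, `givenNameNorm`, `fullNameOrig`, `emailAddress`, `effectiveStatus`), the index names and the `'ENABLED'` value are assumptions and must be checked against the actual schema and the actual slow queries.

[source,sql]
----
-- Multi-column index for queries filtering on both columns; the column order matters.
-- Column and index names are illustrative assumptions.
CREATE INDEX m_user_familyGivenName_idx ON m_user (familyNameNorm, givenNameNorm);

-- Trigram GIN index that can support LIKE/ILIKE even with a leading %.
-- It requires the pg_trgm extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX m_user_fullNameOrig_trgm_idx ON m_user USING gin (fullNameOrig gin_trgm_ops);

-- Partial index covering only the rows with the frequently selected value.
-- The status value is an assumption; queries must repeat the same condition to use the index.
CREATE INDEX m_user_enabledEmail_idx ON m_user (emailAddress) WHERE effectiveStatus = 'ENABLED';
----

All three indexes are created on the concrete `m_user` table, in line with the last tip.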

// TODO extension indexing

Here are some examples of indexes or queries for indexes:
[source,sql]
----
-- indexes for a table
select * from pg_indexes where tablename = 'm_user';
----
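
Because unnecessary indexes only add cost, it may also help to check how often each index is actually used.
The following sketch reads the standard PostgreSQL statistics view `pg_stat_user_indexes`; the table name `m_user` is used only as an example.

[source,sql]
----
-- Number of scans and on-disk size of each index on a table, least used first.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'm_user'
ORDER BY idx_scan;
----

An index with `idx_scan` close to zero over a long period is a candidate for review, but the counters are cumulative since the last statistics reset, so interpret them with care.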

== DB monitoring

@@ -236,56 +285,6 @@ ORDER BY count(*) DESC
LIMIT 10;
----

== Index tuning

[NOTE]
This section is not updated and cleaned-up for the new repository.

Anything that is externalized into columns and related tables (like extensions, references, etc.) is effectively searchable using xref:../concepts/query/query-api/[].
It is not possible to search for information stored only in the serialized form of the object.
But _searchable_ and _efficiently searchable_ are two different things.

For some tables (object types) and some columns no indexes are needed, but for others they typically are.
MidPoint is provided with all essential indexes out of the box.
Despite that it is possible to come up with real-life queries that will perform badly.
While it is possible to cover nearly all needs by more and more indexes, it is not necessarily
a good idea to have them all created by default.
Indexes also take space and if not necessary only add cost to insert/update operation without really helping.

For any non-trivial installation it is recommended to check the database performance logs/statistics regularly to identify sluggish queries.
When identified check existing (predefined) indexes whether they should have covered the case and investigate why they didn't.
If the existing index does not cover the case, don't be afraid to add the index according to your specific needs.

Following notes and tips can be helpful:

* Don't index each column separately if the critical query uses multiple where conditions,
use https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys[multi-column index] instead.
Order of columns is important.
* Searching using `like` (especially with `%` at the start of the value) or case-insensitive
search often require specific indexes.
Consult your database resources; some databases don't offer function-based index and indexing the column using lower/upper (depending on the used query) may not be possible.
Some databases offer specialized indexes, e.g. PostgreSQL trigram indexes that can significantly boost performance.
* In general, don't index columns with low-cardinality (e.g. boolean or integer representing enum
with just a few distinct values) alone.
Leave the column unindexed and let other indexes do the job first.
Searching only by such a column is not recommended.
It is however possible to use low-cardinality column in multi-column index, and it may be beneficial
when it is the first column (of course, only selects using the column in `WHERE` use such an index).
* It is possible to use `where` clause in an index when only specific values are selected often
, e.g. value indicating active user.
This is called https://use-the-index-luke.com/sql/where-clause/partial-and-filtered-indexes[partial index].
This is typical for low variability columns (booleans, enums), using them in where part of the
index is good, and it also makes the index size smaller.

// TODO extension indexing

Here are some examples of indexes or queries for indexes:
[source,sql]
----
-- indexes for a table
select * from pg_indexes where tablename = 'm_user';
----

== System troubleshooting commands and queries

When troubleshooting Postgres performance we need to check output of the following commands.
20 changes: 14 additions & 6 deletions docs/repository/native-postgresql/usage.adoc
@@ -273,12 +273,20 @@ xref:/midpoint/reference/upgrade/database-schema-upgrade/#upgrading-native-postg

You can find further details in the source code documentation for `apply_change` procedure at the end of the `postgres-new.sql` script.

////
TODO: If different upgrade is needed for LTS version I'd start with apply_change using forced=true for LTS branch.
Each change used in LTS must have some "if not applied yet" check in the main upgrade script.
Alternatively m_global_metadata could be used to note what LTS changes were made.
Generally, minimal (if any) DB changes are expected on the LTS DB.
////
== Troubleshooting

If you find a bug or encounter a performance problem with the Native repository,
it is important to gather more information before reporting the issue.

* In case of an error or exception, always include the relevant portion of the xref:/midpoint/reference/diag/logging/[midpoint.log] in the report.
* For performance problems, review the xref:/midpoint/reference/repository/native-postgresql/db-maintenance/#index-tuning[Index tuning]
tips, especially for extension items or shadow attributes.
* If the performance problem is indeed DB related, identify the slow query, preferably using
the `pg_stat_statements` extension as xref:/midpoint/reference/repository/native-postgresql/db-maintenance/#monitoring-queries[described here].
* To log issued SQL queries in xref:/midpoint/reference/diag/logging/[midpoint.log],
configure the system loggers (*System* in the main menu, then *Logging*) to include
`com.evolveum.midpoint.repo.sqlbase.querydsl.SqlLogger` with level `DEBUG` (shows SQL)
or `TRACE` (includes parameter values).
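
As a rough sketch of the `pg_stat_statements` step, the following query lists the statements with the highest total execution time.
It assumes that the extension is already installed and enabled, and that PostgreSQL 13 or newer is used; older versions name the columns `total_time` and `mean_time` instead.

[source,sql]
----
-- Top 10 statements by total execution time (times are in milliseconds).
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1) AS mean_ms,
       left(query, 120) AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
----

Including the output of such a query in the report makes it easier to tell a single slow query from a fast query that is simply executed very often.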

== See also
