Skip to content

Low cardinality auto indexing #142

@EricSoroos

Description

@EricSoroos

Describe the bug

Auto index generation on cardinality is backwards.

https://github.com/dathere/datapusher-plus/blob/master/datapusher/jobs.py#L1728

    # if a column's cardinality <= AUTO_INDEX_THRESHOLD, create an index for that column

Low cardinality indexes on postgresql aren't useful, because the planner will almost never choose them.

The planner will choose a table scan instead of an index scan if it is likely that many/most pages will be read. When you have a low cardinality index, eg a bool, the planner is going to assume that 1/2 of rows will be hit, and therefore essentially all pages. Even if you've got a distribution that's really skewed, the planner will generally not choose that index. (if you do have a known skewed distribution, you can do a partial/multicolumn index where the condition is the rare case, but that's not really applicable for a general purpose tool. )

This should be reversed, and set the default threshold to ~10 at least.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions