## Road Map

- **Kinds of patterns**
    - set, sequential, structural
- **Completeness**
    - all, closed, maximal, constrained, approximate, near-match, top-k
- **Levels of abstraction**
    - e.g., computer $\Rightarrow$ printer; laptop $\Rightarrow$ HP_printer
- **Number of data dimensions**
    - computer $\Rightarrow$ printer; (age: 30-39, income: 42K-48K) $\Rightarrow$ HDTV
- **Types of value**
    - Boolean: presence or absence of an item; quantitative: e.g., age, income
- **Types of rules**
    - association, correlation, gradient

## Various Association Rules

- Single-level, single-dimensional, Boolean value
- **Multi-level** association rules
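    - minimum support across levels: uniform, reduced, group-based
    - redundancy filtering: a descendant rule is redundant if its support and confidence are close to those expected from its ancestor, e.g., milk $\Rightarrow$ wheat bread [8%, 70%]; 2% milk $\Rightarrow$ wheat bread [2%, 72%]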
- **Multi-dimensional** association rules
- **Quantitative** association rules
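The redundancy-filtering bullet can be made concrete with a small check. This is a minimal sketch, not a prescribed method: it assumes (hypothetically) that 2% milk accounts for one quarter of all milk purchases, and `is_redundant`, `share`, and the relative tolerance `tol` are illustrative names and values. A descendant rule counts as redundant here when both its support and its confidence land near what the ancestor rule predicts.

```python
def expected_support(ancestor_support: float, share: float) -> float:
    """Support a descendant rule would have if the specialized item
    (e.g., 2% milk) made up `share` of the ancestor item's (milk's) occurrences."""
    return ancestor_support * share


def is_redundant(ancestor, descendant, share, tol=0.25):
    """True if the descendant rule's support and confidence are both within a
    relative tolerance `tol` of the values predicted from its ancestor rule.

    ancestor, descendant: (support, confidence) pairs given as fractions.
    share: assumed fraction of ancestor-item occurrences that are the
           descendant item (a hypothetical input, not given on the slide).
    """
    anc_sup, anc_conf = ancestor
    desc_sup, desc_conf = descendant
    exp_sup = expected_support(anc_sup, share)
    # The specialization is expected to keep roughly the ancestor's confidence.
    sup_close = abs(desc_sup - exp_sup) <= tol * exp_sup
    conf_close = abs(desc_conf - anc_conf) <= tol * anc_conf
    return sup_close and conf_close


# milk => wheat bread [8%, 70%];  2% milk => wheat bread [2%, 72%]
print(is_redundant((0.08, 0.70), (0.02, 0.72), share=0.25))  # True
```

Under the assumed 25% share, the expected support is 8% × 0.25 = 2% and the confidence stays near 70%, so the 2% milk rule adds no new information.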

## Multi-dimensional Association

- Single-dimensional (**intra**-dimensional) rules:
    - buys(X, "milk") $\Rightarrow$ buys(X, "bread")
- Multi-dimensional rules: $\geq$ 2 predicates
    - **inter**-dimensional (no repeated predicates)
        - age(X, "19-25") and occupation(X, "student") $\Rightarrow$ buys(X, "coke")
    - **hybrid**-dimensional (repeated predicates)
        - age(X, "19-25") and buys(X, "popcorn") $\Rightarrow$ buys(X, "coke")
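One way to see the inter- vs. hybrid-dimensional distinction is to store each customer as a set of (predicate, value) pairs, so a predicate such as buys can occur several times in one row. The sketch below computes support and confidence for both example rules; the four customer rows are hypothetical and exist only to exercise the functions.

```python
from typing import FrozenSet, List, Tuple

# A predicate instance such as age(X, "19-25") is stored as ("age", "19-25").
Predicate = Tuple[str, str]
Row = FrozenSet[Predicate]


def support(rows: List[Row], preds: FrozenSet[Predicate]) -> float:
    """Fraction of rows (one per customer X) satisfying every predicate in preds."""
    return sum(1 for row in rows if preds <= row) / len(rows)


def confidence(rows: List[Row], lhs: FrozenSet[Predicate], rhs: FrozenSet[Predicate]) -> float:
    """conf(lhs => rhs) = support(lhs | rhs) / support(lhs); lhs must occur at least once."""
    return support(rows, lhs | rhs) / support(rows, lhs)


# Hypothetical customers; buys may appear several times in a row.
rows = [
    frozenset({("age", "19-25"), ("occupation", "student"), ("buys", "coke"), ("buys", "popcorn")}),
    frozenset({("age", "19-25"), ("occupation", "student"), ("buys", "coke")}),
    frozenset({("age", "19-25"), ("occupation", "student"), ("buys", "milk")}),
    frozenset({("age", "30-39"), ("occupation", "engineer"), ("buys", "coke")}),
]

rhs = frozenset({("buys", "coke")})
inter_lhs = frozenset({("age", "19-25"), ("occupation", "student")})  # no repeated predicate
hybrid_lhs = frozenset({("age", "19-25"), ("buys", "popcorn")})       # buys is repeated

print(support(rows, inter_lhs | rhs), confidence(rows, inter_lhs, rhs))
print(support(rows, hybrid_lhs | rhs), confidence(rows, hybrid_lhs, rhs))
```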

## Categorical vs. Quantitative

- **Categorical** attributes
    - nominal, finite number of possible values, no ordering among values
    - e.g., occupation, brand, color
- **Quantitative** attributes
    - numeric, implicit ordering among values
    - e.g., age, income, price
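Quantitative attributes are often discretized into ranges before mining so that they can be treated like categorical predicates, which is how labels such as age: 30-39 arise. The sketch below shows one simple option, equal-width binning; the bin width of 10 and the helper names `age_bin` and `to_predicates` are assumptions for illustration, and other discretization schemes exist.

```python
def age_bin(age: int, width: int = 10) -> str:
    """Map a numeric age onto an equal-width range label, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def to_predicates(record: dict) -> frozenset:
    """Turn a record into (attribute, value) predicates: categorical values
    are kept as-is, the quantitative 'age' is replaced by its range label."""
    preds = set()
    for attr, value in record.items():
        if attr == "age":
            preds.add((attr, age_bin(value)))
        else:
            preds.add((attr, str(value)))
    return frozenset(preds)


print(age_bin(34))  # 30-39
print(sorted(to_predicates({"age": 34, "occupation": "student"})))
# [('age', '30-39'), ('occupation', 'student')]
```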

## Constraint-Based Mining

- **Automatically** find **all** patterns in a data set
    - Unrealistic! Too many patterns, not focused
- Data mining should be an **interactive** process
    - the user directs what is to be mined
- **Constraint-based mining**
    - user flexibility: the user provides constraints on what is to be mined
    - system optimization: constraints can be pushed into the search for more efficient mining

## Constraints in Data Mining

- Knowledge type constraint
- Data constraint
- Dimension/level constraint
- Interestingness constraint
- **Rule (or pattern)** constraint
    - metarules (rule templates)
    - number of attributes, attribute values, etc.

## Metarule-Guided Mining

- Metarule: $P_1(X, Y) \land P_2(X, W) \Rightarrow buys(X, \text{"office sw"})$
- Matching rule: $age(X, \text{"30-39"}) \land income(X, \text{"41K-60K"}) \Rightarrow buys(X, \text{"office sw"})$
- $P_1 \land P_2 \land \ldots \land P_a \Rightarrow Q_1 \land Q_2 \land \ldots \land Q_b$
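    - with $n = a + b$, find all frequent $n$-predicate sets $L_n$
    - compute the support of all $a$-predicate subsets of $L_n$
    - compute the confidence of each rule: the support of the $n$-predicate set divided by the support of its $a$-predicate antecedent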
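The three steps above can be sketched as a brute-force instantiation of the template $P_1(X, Y) \land P_2(X, W) \Rightarrow buys(X, \text{"office sw"})$, so here $a = 2$, $b = 1$, $n = 3$. This is an illustrative sketch rather than an optimized algorithm; it assumes each customer is a dict with a single value per attribute (including buys), and the rows, thresholds, and the name `metarule_rules` are hypothetical.

```python
from itertools import combinations


def metarule_rules(rows, rhs, min_sup, min_conf):
    """Instantiate P1(X, Y) ∧ P2(X, W) => rhs over the rows.

    rows: list of dicts mapping attribute -> value (one dict per customer).
    rhs:  (attribute, value) pair fixed by the metarule.
    Returns (lhs, support, confidence) for rules meeting both thresholds.
    """
    n = len(rows)
    rhs_attr, rhs_val = rhs
    # Attributes that may instantiate P1 and P2 (any except the consequent's).
    attrs = sorted({a for r in rows for a in r} - {rhs_attr})
    results = []
    for p1, p2 in combinations(attrs, 2):
        lhs_values = {(r[p1], r[p2]) for r in rows if p1 in r and p2 in r}
        for v1, v2 in sorted(lhs_values):
            lhs_rows = [r for r in rows if r.get(p1) == v1 and r.get(p2) == v2]
            full = [r for r in lhs_rows if r.get(rhs_attr) == rhs_val]
            sup = len(full) / n  # support of the 3-predicate set
            if sup < min_sup:
                continue
            conf = len(full) / len(lhs_rows)  # divided by support of the 2-predicate antecedent
            if conf >= min_conf:
                results.append((((p1, v1), (p2, v2)), sup, conf))
    return results


# Hypothetical customer table.
rows = [
    {"age": "30-39", "income": "41K-60K", "student": "no", "buys": "office sw"},
    {"age": "30-39", "income": "41K-60K", "student": "no", "buys": "office sw"},
    {"age": "30-39", "income": "41K-60K", "student": "no", "buys": "games"},
    {"age": "19-25", "income": "0-20K", "student": "yes", "buys": "games"},
]

rules = metarule_rules(rows, ("buys", "office sw"), min_sup=0.4, min_conf=0.6)
for lhs, sup, conf in rules:
    print(lhs, sup, round(conf, 2))
```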

## Anti-Monotonicity

- **Anti-monotonicity**
    - if itemset S **violates** the constraint, so does every superset of S
- Examples (prices assumed non-negative)
    - sum(S.price) $\leq$ 100: yes
    - sum(S.price) $\geq$ 100: no
    - range(S.profit) $\leq$ 15: yes
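Anti-monotonicity lets a level-wise (Apriori-style) search drop a violating itemset together with all of its supersets. The sketch below enumerates itemsets satisfying sum(S.price) $\leq$ 100 this way; the item prices are hypothetical and assumed non-negative, which the anti-monotonicity of a sum bound depends on.

```python
from itertools import combinations


def levelwise_sum_at_most(prices, limit):
    """Enumerate all itemsets whose total price is <= limit, level by level.

    Because sum(S.price) <= limit is anti-monotonic (for non-negative prices),
    a (k+1)-itemset is only generated if all of its k-subsets satisfied it;
    a violating itemset is never extended, so its supersets are never built.
    """
    items = sorted(prices)
    level = [frozenset([i]) for i in items if prices[i] <= limit]
    result = list(level)
    while level:
        current = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == len(a) + 1:
                # Prune: every k-subset of the candidate must have survived.
                if all(union - {x} in current for x in union):
                    candidates.add(union)
        level = [c for c in candidates if sum(prices[i] for i in c) <= limit]
        result.extend(level)
    return result


prices = {"a": 30, "b": 50, "c": 80}  # hypothetical prices
print(sorted("".join(sorted(s)) for s in levelwise_sum_at_most(prices, 100)))
# ['a', 'ab', 'b', 'c']
```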

## Monotonicity

- **Monotonicity**
    - if itemset S **satisfies** the constraint, so does every superset of S
- Examples (prices assumed non-negative)
    - sum(S.price) $\geq$ 100: yes
    - min(S.price) $\leq$ 100: yes
    - range(S.profit) $\geq$ 15: yes
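Monotonicity works in the opposite direction: once an itemset satisfies the constraint, its extensions need no further check. The depth-first sketch below tags itemsets with sum(S.price) $\geq$ threshold and skips the comparison for extensions of a satisfying itemset, counting how many comparisons were actually made. Prices and threshold are hypothetical; the running sum is still maintained, so only the constraint test is saved here.

```python
def enumerate_with_monotone(items, prices, threshold):
    """Depth-first enumeration of all non-empty itemsets over `items`,
    collecting those with sum(S.price) >= threshold.

    The constraint is monotonic for non-negative prices: once an itemset
    satisfies it, every extension is accepted without re-testing it.
    Returns (satisfying itemsets, number of constraint tests performed).
    """
    satisfying = []
    tests = 0

    def extend(prefix, start, total, already_ok):
        nonlocal tests
        for i in range(start, len(items)):
            itemset = prefix + [items[i]]
            new_total = total + prices[items[i]]
            if already_ok:
                ok = True  # inherited from a satisfying subset; test skipped
            else:
                tests += 1
                ok = new_total >= threshold
            if ok:
                satisfying.append(frozenset(itemset))
            extend(itemset, i + 1, new_total, ok)

    extend([], 0, 0, False)
    return satisfying, tests


prices = {"a": 30, "b": 50, "c": 80}  # hypothetical prices
sets, tests = enumerate_with_monotone(["a", "b", "c"], prices, 80)
print(sorted("".join(sorted(s)) for s in sets), tests)
# ['ab', 'abc', 'ac', 'bc', 'c'] 6
```

Only six of the seven non-empty itemsets are tested: abc is accepted because its subset ab already satisfied the constraint.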

## Succinctness

- **Succinctness**
    - enumerate all and only those sets that are guaranteed to satisfy the constraint
- Example
    - min(S.price) $\leq$ v: yes
    - sum(S.price) $\geq$ v: no
- Pre-counting prunable: the constraint can be enforced before any support counting
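For min(S.price) $\leq$ v, succinctness means the satisfying itemsets can be generated directly: let A1 be the items priced at most v; a set satisfies the constraint exactly when it contains at least one item of A1. The sketch below builds every such set as a non-empty subset of A1 joined with any subset of the remaining items, so no candidate is ever tested against the constraint. Prices, v, and the function names are hypothetical.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of `items`, including the empty set."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def succinct_min_price_sets(prices, v):
    """Generate exactly the itemsets with min(S.price) <= v, each once,
    without evaluating the constraint on any candidate."""
    a1 = sorted(i for i in prices if prices[i] <= v)
    rest = sorted(i for i in prices if prices[i] > v)
    for s1 in powerset(a1):
        if not s1:
            continue  # at least one item from A1 is required
        for s2 in powerset(rest):
            yield frozenset(s1) | frozenset(s2)


prices = {"a": 30, "b": 50, "c": 80}  # hypothetical prices
print(sorted("".join(sorted(s)) for s in succinct_min_price_sets(prices, 40)))
# ['a', 'ab', 'abc', 'ac']
```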

## Convertible Constraints

- Convert tough constraints into anti-monotonic or monotonic ones **by properly ordering items**
- Example
    - C: avg(S.profit) $\geq$ 25
    - order items in descending order of profit: < a, f, g, d, b, h, c, e >
    - if afb violates C, so does every itemset with afb as a prefix (afb*)
    - C thus becomes anti-monotonic with respect to this order
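A prefix-growth search shows how the descending order makes avg(S.profit) $\geq$ 25 prunable: each appended item has profit no greater than any item already in the prefix, so once a prefix's average falls below 25 no extension can raise it again. The profit values below are hypothetical, chosen only so that sorting by descending profit reproduces the ordering < a, f, g, d, b, h, c, e >.

```python
# Hypothetical profits consistent with the order a, f, g, d, b, h, c, e.
PROFIT = {"a": 40, "f": 30, "g": 20, "d": 10, "b": 0, "h": -10, "c": -20, "e": -30}


def mine_avg_at_least(profit, threshold):
    """Enumerate itemsets with avg(S.profit) >= threshold by growing prefixes
    in descending profit order. In this order the constraint is anti-monotonic
    with respect to prefixes: appended items have profit no greater than any
    item in the prefix, so a violating prefix cannot recover and its whole
    branch is pruned."""
    order = sorted(profit, key=lambda i: profit[i], reverse=True)
    results = []

    def grow(prefix, start, total):
        for k in range(start, len(order)):
            item = order[k]
            new_prefix = prefix + [item]
            new_total = total + profit[item]
            if new_total / len(new_prefix) < threshold:
                continue  # prune: every itemset with this prefix also violates
            results.append("".join(new_prefix))
            grow(new_prefix, k + 1, new_total)

    grow([], 0, 0)
    return results


print(mine_avg_at_least(PROFIT, 25))
# ['a', 'af', 'afg', 'afgd', 'afd', 'ag', 'ad', 'f', 'fg']
```

With these values, afb has average 70 / 3 < 25, so the search never extends it.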

## Strongly Convertible

- avg(S.profit) $\geq$ 25 is convertible anti-monotonic w.r.t. descending order
- avg(S.profit) $\geq$ 25 is convertible monotonic w.r.t. ascending order
- Hence avg(S.profit) $\geq$ 25 is **strongly convertible**: it is both convertible anti-monotonic and convertible monotonic
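Both directions of strong convertibility can be checked exhaustively on a small example. The sketch lists every itemset in a given item order and compares each prefix against the full itemset: in descending order a violating prefix must mean a violating itemset, and in ascending order a satisfying prefix must mean a satisfying itemset. The profit values are the same hypothetical ones used for the < a, f, g, d, b, h, c, e > ordering.

```python
from itertools import combinations

# Hypothetical profits consistent with the order a, f, g, d, b, h, c, e.
PROFIT = {"a": 40, "f": 30, "g": 20, "d": 10, "b": 0, "h": -10, "c": -20, "e": -30}


def avg(itemset, profit):
    """Average profit of a non-empty itemset."""
    return sum(profit[i] for i in itemset) / len(itemset)


def check_prefix_property(profit, threshold, descending):
    """Exhaustively check avg(S.profit) >= threshold against every prefix.

    descending=True : a violating prefix must imply a violating itemset
                      (convertible anti-monotonic).
    descending=False: a satisfying prefix must imply a satisfying itemset
                      (convertible monotonic).
    """
    order = sorted(profit, key=lambda i: profit[i], reverse=descending)
    for r in range(1, len(order) + 1):
        for itemset in combinations(order, r):  # tuples keep the item order
            ok = avg(itemset, profit) >= threshold
            for k in range(1, r):
                prefix_ok = avg(itemset[:k], profit) >= threshold
                if descending and not prefix_ok and ok:
                    return False
                if not descending and prefix_ok and not ok:
                    return False
    return True


print(check_prefix_property(PROFIT, 25, descending=True),
      check_prefix_property(PROFIT, 25, descending=False))  # True True
```

The order matters: in ascending order the prefix d violates the constraint while d, g, f, a (average 25) satisfies it, so that order gives monotonicity but not anti-monotonicity.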

## Classification of Constraints

![Classification of Constraints](./img/7.1.png)