<center>
<img src="../../img/ml_theme.png">
# Continuing Professional <br> Education, NRU HSE
#### Program "Machine Learning and Data Mining"
<img src="../../img/faculty_logo.jpg" height="240" width="240">
## Author: Yury Kashnitsky, lecturer at the Faculty of Computer Science, NRU HSE
</center>
This material is distributed under the <a href="https://opensource.org/licenses/MS-RL">Ms-RL</a> license. It may be used for any purpose except commercial ones, provided that the author is credited.

## Session 7. Mining association rules and frequent itemsets

In [1]:
import Orange

**Load the data from the market-basket dataset.**

In [2]:
basket_data = Orange.data.Table("market-basket.basket")
basket_data[:]

[[], {"Bread":1.000, "Milk":1.000},
 [], {"Bread":1.000, "Diapers":1.000, "Beer":1.000, "Eggs":1.000},
 [], {"Milk":1.000, "Diapers":1.000, "Beer":1.000, "Cola":1.000},
 [], {"Bread":1.000, "Milk":1.000, "Diapers":1.000, "Beer":1.000},
 [], {"Bread":1.000, "Milk":1.000, "Diapers":1.000, "Cola":1.000}]

**Find all association rules with a support of at least 0.3.**

In [3]:
rules = Orange.associate.AssociationRulesSparseInducer(basket_data, support=0.3)

**Print 5 rules together with their support and confidence values.**

In [4]:
print "%4s %4s  %s" % ("Supp", "Conf", "Rule")
for r in rules[:5]:
    print "%4.1f %4.1f  %s" % (r.support, r.confidence, r)

Supp Conf  Rule
 0.4  1.0  Cola -> Diapers
 0.4  0.5  Diapers -> Cola
 0.4  1.0  Cola -> Diapers Milk
 0.4  1.0  Cola Diapers -> Milk
 0.4  1.0  Cola Milk -> Diapers
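
The support and confidence values above can be checked by hand. Below is a minimal pure-Python sketch (it does not use Orange) that assumes the standard definitions: the support of a rule is the fraction of baskets containing all of its items, and the confidence is that support divided by the support of the left-hand side. The `transactions` list is transcribed from the `basket_data[:]` output; the helper names `support` and `confidence` are illustrative, not part of the Orange API.

```python
# The five baskets shown in the output of basket_data[:] above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def support(itemset):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / float(len(transactions))


def confidence(lhs, rhs):
    """Support of lhs and rhs taken together, divided by the support of lhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)


for lhs, rhs in [(["Cola"], ["Diapers"]), (["Diapers"], ["Cola"])]:
    print("%4.1f %4.1f  %s -> %s" % (support(lhs + rhs), confidence(lhs, rhs),
                                     " ".join(lhs), " ".join(rhs)))
```

Cola occurs in 2 of the 5 baskets and always together with Diapers, which gives a confidence of 1.0 for `Cola -> Diapers`. Diapers occur in 4 baskets, only 2 of which contain Cola, hence 0.5 for the reverse rule.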


**Find the frequent itemsets that occur in at least 40% of the purchases.**

In [5]:
rules = Orange.associate.AssociationRulesSparseInducer(support=0.4, storeExamples = True)
itemsets = rules.get_itemsets(basket_data)
# relative support and frequent itemsets
for itemset, tids in itemsets[:5]:
    print "(%4.2f) %s" % (len(tids) / float(len(basket_data)),
                          " ".join(basket_data.domain[item].name for item in itemset))

(0.40) Cola
(0.40) Cola Diapers
(0.40) Cola Diapers Milk
(0.40) Cola Milk
(0.60) Beer
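
To see what a frequent itemset search computes, here is a brute-force sketch in pure Python. It is an illustration only and is not the algorithm Orange uses internally: it simply tries every combination of items and keeps those whose relative support reaches the threshold. The function name `frequent_itemsets` and the variable `frequent` are hypothetical; the baskets are transcribed from the `basket_data[:]` output.

```python
from itertools import combinations

# The five baskets shown in the output of basket_data[:] above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def frequent_itemsets(baskets, min_support):
    """Map each frequent itemset (a sorted tuple) to its relative support."""
    n = float(len(baskets))
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for basket in baskets if set(candidate) <= basket)
            if count / n >= min_support:
                result[candidate] = count / n
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return result


frequent = frequent_itemsets(transactions, 0.4)
for itemset in sorted(frequent)[:5]:
    print("(%4.2f) %s" % (frequent[itemset], " ".join(itemset)))
```

The listing order differs from Orange's, but the supports agree with the cell above, for example 0.40 for `Cola Diapers Milk` and 0.60 for `Beer`. Exhaustive enumeration is only feasible for a handful of items; on real data it grows exponentially with the number of items.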


**Now a real-life example.**

Weka data on supermarket transactions. See the unofficial [description](http://weka.8497.n7.nabble.com/question-of-using-supermarket-arff-for-academic-research-td2573.html) and a [tutorial](http://machinelearningmastery.com/market-basket-analysis-with-association-rule-learning/) on association rule mining with Weka.

In [6]:
supermarket_data = Orange.data.Table("../../data/supermarket.arff")

In [7]:
supermarket_data.domain

[department1, department2, department3, department4, department5, department6, department7, department8, department9, grocery misc, department11, baby needs, bread and cake, baking needs, coupons, juice-sat-cord-ms, tea, biscuits, canned fish-meat, canned fruit, canned vegetables, breakfast food, cigs-tobacco pkts, cigarette cartons, cleaners-polishers, coffee, sauces-gravy-pkle, confectionary, puddings-deserts, dishcloths-scour, deod-disinfectant, frozen foods, razor blades, fuels-garden aids, spices, jams-spreads, insecticides, pet foods, laundry needs, party snack foods, tissues-paper prd, wrapping, dried vegetables, pkt-canned soup, soft drinks, health food other, beverages hot, health&beauty misc, deodorants-soap, mens toiletries, medicines, haircare, dental needs, lotions-creams, sanitary pads, cough-cold-pain, department57, meat misc, cheese, chickens, milk-cream, cold-meats, deli gourmet, margarine, salads, small goods, dairy foods, fruit drinks, delicatessen misc, department70

In [8]:
rules = Orange.associate.AssociationRulesSparseInducer(support=0.5, storeExamples=True)
itemsets = rules.get_itemsets(supermarket_data)
# relative support and frequent itemsets
for itemset, tids in itemsets:
    print "(%4.2f) %s" % (len(tids) / float(len(supermarket_data)),
                          " ".join(supermarket_data.domain[item].name for item in itemset))

(0.72) bread and cake
(0.51) bread and cake milk-cream
(0.51) bread and cake milk-cream total
(0.50) bread and cake fruit
(0.50) bread and cake fruit total
(0.72) bread and cake total
(0.60) baking needs
(0.60) baking needs total
(0.53) juice-sat-cord-ms
(0.53) juice-sat-cord-ms total
(0.56) biscuits
(0.56) biscuits total
(0.59) frozen foods
(0.59) frozen foods total
(0.50) party snack foods
(0.50) party snack foods total
(0.64) milk-cream
(0.64) milk-cream total
(0.64) fruit
(0.64) fruit total
(0.64) vegetables
(0.64) vegetables total
(1.00) total
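
In the output above, `total` has support 1.00, that is, it occurs in every transaction. Adding it to any itemset therefore leaves the support unchanged, which is why every itemset is listed twice. One possible way to tidy up such a listing is sketched below in pure Python. The `printed` list is a hand transcription of a few output lines, and the split into item names is my reading of them, since some names (such as `bread and cake`) contain spaces.

```python
# (relative support, items) pairs transcribed from a few lines of the output above.
printed = [
    (0.72, ("bread and cake",)),
    (0.51, ("bread and cake", "milk-cream")),
    (0.51, ("bread and cake", "milk-cream", "total")),
    (0.72, ("bread and cake", "total")),
    (0.64, ("milk-cream",)),
    (0.64, ("milk-cream", "total")),
    (1.00, ("total",)),
]

# An item whose single-item set has support 1.0 is present in every transaction.
universal = set(items[0] for supp, items in printed
                if len(items) == 1 and supp == 1.0)

# Keep only the itemsets that contain no such item.
informative = [(supp, items) for supp, items in printed
               if not universal & set(items)]

for supp, items in informative:
    print("(%4.2f) %s" % (supp, " ".join(items)))
```

An alternative would be to remove such an attribute from the data before mining, so that it does not inflate the number of itemsets in the first place.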


### Classification association rules

In [9]:
lenses_data = Orange.data.Table("lenses")
print "Association rules:"
rules = Orange.associate.AssociationRulesInducer(lenses_data, support=0.3)
for r in rules:
    print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r)
    
print "\nClassification rules"
rules = Orange.associate.AssociationRulesInducer(lenses_data, support = 0.3, classificationRules = 1)
for r in rules:
    print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r)

Association rules:
0.333  0.533  lenses=none -> prescription=hypermetrope
0.333  0.667  prescription=hypermetrope -> lenses=none
0.333  0.533  lenses=none -> astigmatic=yes
0.333  0.667  astigmatic=yes -> lenses=none
0.500  0.800  lenses=none -> tear_rate=reduced
0.500  1.000  tear_rate=reduced -> lenses=none

Classification rules
0.333  0.667  prescription=hypermetrope -> lenses=none
0.333  0.667  astigmatic=yes -> lenses=none
0.500  1.000  tear_rate=reduced -> lenses=none
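
In this output, the classification rules are exactly the association rules whose right-hand side is a value of the `lenses` attribute. The pure-Python sketch below reproduces that selection from the printed rules. It assumes that `lenses` is the class variable of the dataset and that this filter is what the `classificationRules` option amounts to here. It is not a description of how Orange implements the option.

```python
# (support, confidence, left-hand side, right-hand side),
# transcribed from the "Association rules" output above.
association_rules = [
    (0.333, 0.533, "lenses=none", "prescription=hypermetrope"),
    (0.333, 0.667, "prescription=hypermetrope", "lenses=none"),
    (0.333, 0.533, "lenses=none", "astigmatic=yes"),
    (0.333, 0.667, "astigmatic=yes", "lenses=none"),
    (0.500, 0.800, "lenses=none", "tear_rate=reduced"),
    (0.500, 1.000, "tear_rate=reduced", "lenses=none"),
]

CLASS_ATTRIBUTE = "lenses"  # assumption: the class variable of the lenses data

# Keep the rules that predict a value of the class attribute.
classification_rules = [rule for rule in association_rules
                        if rule[3].split("=")[0] == CLASS_ATTRIBUTE]

for supp, conf, lhs, rhs in classification_rules:
    print("%5.3f  %5.3f  %s -> %s" % (supp, conf, lhs, rhs))
```

The three rules that remain are the same as those listed under "Classification rules" above.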
