## Using Orange for regression

In [1]:
import Orange
import random

Regression in Orange is very similar to classification. Both require labeled data, and, just like classification, regression is implemented with learners and models (regressors). Regression learners are objects that accept data and return regressors. A regressor is then given data instances and predicts the value of the continuous class for each of them:

In [2]:
data = Orange.data.Table("housing")
learner = Orange.regression.LinearRegressionLearner()
model = learner(data)

print("predicted, observed:")
for d in data[:3]:
    print("%.1f, %.1f" % (model(d)[0], d.get_class()))

predicted, observed:
30.0, 24.0
25.0, 21.6
30.6, 34.7
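
The learner/regressor split described above can be illustrated without Orange. The sketch below is a plain-Python analogue, not Orange's implementation: ToyLineLearner and LineModel are hypothetical names, the data are made up, and the learner fits an ordinary least-squares line to a single feature.

```python
# Hypothetical stand-ins for illustration only; these are not Orange classes.

class LineModel:
    """A regressor: maps a feature value to a predicted number."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def __call__(self, x):
        return self.intercept + self.slope * x


class ToyLineLearner:
    """A learner: accepts training data and returns a fitted regressor."""

    def __call__(self, xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        # Closed-form least-squares estimates for one feature.
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        return LineModel(slope, mean_y - slope * mean_x)


# Made-up training data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

toy_learner = ToyLineLearner()
toy_model = toy_learner(xs, ys)   # learner(data) returns a model
print("%.1f" % toy_model(5.0))    # model(instance) returns a prediction
```

The same two-step shape, learner(data) followed by model(instance), is what the Orange cells in this section rely on.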


Scatter plot predicted-observed

Error as a function of the true value, for several regressors

Let us continue with regression trees. The script below builds a tree of depth two from the housing price data and prints it in textual form:

In [3]:
tree_learner = Orange.regression.SimpleTreeLearner(max_depth=2)
tree = tree_learner(data)
print(tree.to_string())


RM (22.5: 506.0)
: <=6.941
   LSTAT (19.9: 430.0)
   : <=14.4 --> (23.3: 255.0)
   : >14.4 --> (15.0: 175.0)
: >6.941
   RM (37.2: 76.0)
   : <=7.437 --> (32.1: 46.0)
   : >7.437 --> (45.1: 30.0)
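
One way to read the printed tree is as nested if/else rules. In each pair in parentheses, the first number appears to be the mean class value in that node and the second the number of instances (the counts of the children add up to the count of the parent). The sketch below is hand-transcribed from the output above; the function name is hypothetical, and this is not how Orange evaluates the tree internally.

```python
def predict_from_printed_tree(rm, lstat):
    """Follow the splits of the printed depth-2 tree by hand.

    rm and lstat are the values of the RM and LSTAT attributes;
    the returned number is the mean class value of the leaf reached.
    """
    if rm <= 6.941:
        # Left branch: split further on LSTAT.
        return 23.3 if lstat <= 14.4 else 15.0
    # Right branch: split again on RM.
    return 32.1 if rm <= 7.437 else 45.1


# Illustrative attribute values (made up, not taken from the dataset).
print(predict_from_printed_tree(rm=6.5, lstat=5.0))
print(predict_from_printed_tree(rm=7.8, lstat=5.0))
```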


The following cell initializes a few other regressors, trains them on all but five randomly sampled instances of the housing dataset, and prints their predictions for those five held-out instances:

In [4]:
random.seed(42)
test = Orange.data.Table(data.domain, random.sample(data, 5))
train = Orange.data.Table(data.domain, [d for d in data if d not in test])

lin = Orange.regression.linear.LinearRegressionLearner()
rf = Orange.regression.random_forest.RandomForestRegressionLearner()
rf.name = "rf"
ridge = Orange.regression.RidgeRegressionLearner()

learners = [lin, rf, ridge]
regressors = [learner(train) for learner in learners]

print("y   ", " ".join("%5s" % l.name for l in regressors))

for d in test:
    print(("{:<5}" + " {:5.1f}"*len(regressors)).format(
        d.get_class(),
        *(r(d)[0] for r in regressors)))

y    linear regression    rf ridge regression
22.2   19.3  19.1  19.5
31.6   33.2  32.7  33.2
21.7   20.9  20.4  21.0
10.2   16.9  12.9  16.8
14.0   13.6  15.0  13.5


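It looks like housing prices are not that hard to predict: most predictions fall within about three units of the observed value, although linear and ridge regression overshoot the fourth instance (observed 10.2) by more than six.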

### Cross-Validation

Evaluation and scoring methods are available in Orange.evaluation. The cell below runs 5-fold cross-validation and reports RMSE and R2 for each learner:

In [5]:
lin = Orange.regression.linear.LinearRegressionLearner()
rf = Orange.regression.random_forest.RandomForestRegressionLearner()
rf.name = "rf"
ridge = Orange.regression.RidgeRegressionLearner()
mean = Orange.regression.MeanLearner()

learners = [lin, rf, ridge, mean]

res = Orange.evaluation.CrossValidation(data, learners, k=5)
rmse = Orange.evaluation.RMSE(res)
r2 = Orange.evaluation.R2(res)

print("Learner  RMSE  R2")
for i in range(len(learners)):
    print("{:8s} {:.2f} {:5.2f}".format(learners[i].name, rmse[i], r2[i]))

Learner  RMSE  R2
linear regression 4.88  0.72
rf       3.92  0.82
ridge regression 4.91  0.71
mean     9.20 -0.00
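
The idea behind k-fold cross-validation is that every instance is used for testing exactly once and for training in the remaining folds. The sketch below shows only the index bookkeeping in plain Python; k_fold_indices is a hypothetical helper, and Orange's own CrossValidation may shuffle or split differently.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


# 506 is the number of instances reported at the root of the tree above.
for train_idx, test_idx in k_fold_indices(506, 5):
    print(len(train_idx), len(test_idx))
```

Each learner would be trained on train_idx and scored on test_idx, and the per-fold predictions are then combined into the final scores.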


Linear and ridge regression perform almost identically, while the random forest scores somewhat better. Each regression method has a set of parameters. We ran all of them with default values, and tuning the parameters could help. We also included MeanLearner in the list of learners; this regressor simply predicts the mean value from the training set and serves as a baseline.
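
To make the two scores concrete, here is a minimal sketch of the usual definitions of RMSE and R2 in plain Python, applied to the five observed values and the random forest predictions from the held-out table earlier in this section. The helper names are hypothetical and the code does not call Orange. It also shows why a mean predictor is a natural baseline: when the mean is computed on the same values it is scored on, R2 is exactly zero.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Values copied from the five held-out instances shown earlier.
observed = [22.2, 31.6, 21.7, 10.2, 14.0]
rf_pred = [19.1, 32.7, 20.4, 12.9, 15.0]

# Baseline that always predicts the mean of the observed values.
mean_pred = [sum(observed) / len(observed)] * len(observed)

print("rf   RMSE %.2f  R2 %.2f" % (rmse(observed, rf_pred), r2(observed, rf_pred)))
print("mean RMSE %.2f  R2 %.2f" % (rmse(observed, mean_pred), r2(observed, mean_pred)))
```

In cross-validation the mean comes from the training folds rather than from the test fold, which is consistent with the baseline's R2 being reported as a very small negative number.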

## Association rules
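
Association rule mining rests on two measures. The support of an itemset is the share of transactions that contain it, and the confidence of a rule left -> right is the support of left and right together divided by the support of left. The sketch below computes both in plain Python on made-up baskets; the helper names are hypothetical and it is independent of the orangecontrib.associate calls used in the cells that follow.

```python
# Made-up transactions, for illustration only.
baskets = [
    {"Milk", "Soda", "Bologna"},
    {"Milk", "Soda"},
    {"Milk", "Bologna"},
    {"Soda", "Bologna", "Milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(left, right, baskets):
    """Estimated probability of the right side given the left side."""
    return support(left | right, baskets) / support(left, baskets)


left, right = {"Soda", "Bologna"}, {"Milk"}
print("support    %.2f" % support(left | right, baskets))
print("confidence %.2f" % confidence(left, right, baskets))
```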

In [6]:
from orangecontrib.associate.fpgrowth import frequent_itemsets, association_rules, OneHot
from scipy.sparse import issparse

In [7]:
data = Orange.data.Table("podatki/foodmart.basket")

In [8]:
X, mapping = OneHot.encode(data)

In [9]:
ITEM_FMT = '{}' if issparse(data.X) else '{}={}'

names = {item: ('{}={}' if var is data.domain.class_var else ITEM_FMT).format(var.name, val)
                 for item, var, val in OneHot.decode(mapping, data, mapping)}

itemsets = {}
# Minimum support: 0.01 % of all transactions.
for itemset, support in frequent_itemsets(X, 0.01 / 100):
    itemsets[itemset] = support
    # Rules derived from this itemset with confidence of at least 0.99.
    for rule in association_rules(itemsets, 0.99, itemset):
        left, right, support, confidence = rule
        # Translate item indices back to readable names.
        left_str =  ', '.join(names[i] for i in sorted(left))
        right_str = ', '.join(names[i] for i in sorted(right))
        print(left_str+" -> "+right_str)

Personal Hygiene, Cooking Oil, Fresh Fruit->Fresh Vegetables
Cheese, Muffins, TV Dinner->Dried Fruit
Flavored Drinks, Peanut Butter, Chocolate Candy->Fresh Vegetables
Flavored Drinks, Sliced Bread, Nuts->Fresh Fruit
Milk, Bologna, Chocolate Candy, Soda->Fresh Vegetables
Fresh Vegetables, Milk, Bologna, Soda->Chocolate Candy
Bologna, Dried Fruit, Ice Cream->Fresh Vegetables
Fresh Vegetables, Plastic Utensils, Deli Meats, Wine->Dried Fruit
Fresh Vegetables, Plastic Utensils, Bologna, Soda->Chocolate Candy
Milk, Plastic Utensils, Bologna, Chocolate Candy->Fresh Vegetables
Bologna, Shampoo, Canned Fruit->Fresh Vegetables
Hard Candy, Bologna, Batteries->Fresh Vegetables
Hard Candy, Cleaners, Waffles->Fresh Vegetables
Personal Hygiene, Paper Dishes, Yogurt->Fresh Vegetables
Cereal, Beer, Soda->Frozen Chicken
Fresh Fruit, Tuna, Crackers->Fresh Vegetables
Coffee, Tools, Sports Magazines->Fresh Fruit
Plastic Utensils, Jam, Anchovies->Cooking Oil
Chocolate Candy, Frozen Vegetables, Home Magazine