# Practice 3: Association Rules
Duration: 2 sessions

Objective: This practice introduces the basic concepts of mining association rules from a list of transactions or other kinds of data files.

Complete the following exercises using Python's scikit-learn module and any additional modules you see fit:

1. Using the “store_data.csv” dataset, run the “assoc.py” program to check that it works. Try to interpret the rules obtained and indicate which of them are important.
2. Using the “titanic.csv” and “bank-data-final.arff” datasets, run the rule-generation program again, sorting the results by their lift value. Interpret the resulting rules, giving an objective assessment of how interesting each one is.
3. Select at least one new dataset from those provided in the usual repositories. For the Apriori method to be applicable, the datasets must not contain numeric variables.
4. Select the 5 best rules using the confidence and lift measures. Compare the rules obtained and comment on what information can be drawn from some of them.
5. For one of the previous datasets of your choice, study how rule generation evolves when the minimum support and confidence values are modified.
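The three measures the exercises rely on (support, confidence and lift) can be checked by hand on a small example. The sketch below is not part of assoc.py: the toy transactions and the helper names `support`, `confidence` and `lift` are made up for illustration.

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule = ({"bread"}, {"milk"})
print("support   :", support(rule[0] | rule[1], transactions))
print("confidence:", confidence(*rule, transactions))
print("lift      :", lift(*rule, transactions))
```

Here the rule bread -> milk has support 2/4, confidence 2/3 and lift 8/9. A lift below 1 means that bread buyers are slightly less likely than the average customer to buy milk, whereas a lift above 1 points to a positive association.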

In [10]:
pip install apyori



# assoc.py program with the store_data.csv dataset

In [11]:
#!  /bin/python3

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = pd.read_csv('/content/sample_data/store_data.csv')
print(dataset.shape)

# Transforming the list into a list of lists, so that each transaction can be indexed easier
transactions = []
for i in range(0, dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(0, dataset.shape[1])])


from apyori import apriori
# Please download this as a custom package --> type "apyori"
# To load custom packages, do not refresh the page. Instead, click on the reset button on the Console.

rules = apriori(transactions, min_support = 0.005, min_confidence = 0.25, min_lift = 3, min_length = 2)
# Support: number of transactions containing the set of items / total number of transactions
#   e.g. products bought at least 3 times a day for a week --> 21 / 7501 ≈ 0.0028
# Confidence: should not be too high, as that will lead to obvious rules

#Try many combinations of values to experiment with the model

results = []
for item in rules:

    # first index of the inner list
    # Contains base item and add item
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    #second index of the inner list
    print("Support: " + str(item[1]))

    #third index of the list located at 0th
    #of the third index of the inner list

    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

    results.append(item)


# viewing the rules
print(len(results))

# Transferring the list to a table
results = pd.DataFrame(results)

print(results.head(5))

(7500, 20)
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
Rule: escalope -> pasta
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
Rule: ground beef -> herb & pepper
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
Rule: ground beef -> tomato sauce
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
Rule: pasta -> shrimp
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
Rule: frozen vegetables -> shrimp
Support: 0.005333333333333333
Confidence: 0.29629629629629634
Lift: 3.1080031080031083
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
Rule: escalope -> nan
Support: 0.005866666666666667
Confidence: 0.
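The output above includes a rule "escalope -> nan". Baskets shorter than the widest row are padded with missing values, and `str()` turns those into the item "nan". The loop also prints the first two elements of each itemset, which need not match the antecedent and consequent stored in the record. The sketch below is one possible way to handle both issues. The helper names are hypothetical, and the record layout is assumed to be the one indexed by the loop above (`item[1]` for the support, `item[2]` for a list of base, add, confidence and lift entries).

```python
import math

def clean_transactions(rows):
    """Drop missing cells (None, NaN or empty strings) from each row."""
    cleaned = []
    for row in rows:
        items = []
        for cell in row:
            if cell is None:
                continue
            if isinstance(cell, float) and math.isnan(cell):
                continue
            text = str(cell).strip()
            if text:
                items.append(text)
        if items:
            cleaned.append(items)
    return cleaned

def describe_rules(record):
    """Return one line per rule stored in a record.

    Assumes record[1] is the support and record[2] is a list of
    (base, add, confidence, lift) entries, as indexed in the loop above.
    """
    lines = []
    for base, add, conf, lift in record[2]:
        if not base:  # skip entries with an empty antecedent, if any
            continue
        lines.append(
            f"{', '.join(sorted(base))} -> {', '.join(sorted(add))} "
            f"(support={record[1]:.4f}, confidence={conf:.4f}, lift={lift:.4f})"
        )
    return lines

# Stand-in data so the sketch runs without the CSV file or apyori.
rows = [["escalope", "pasta", float("nan")], ["shrimp", None, ""]]
print(clean_transactions(rows))

record = (frozenset({"escalope", "pasta"}), 0.01,
          [(frozenset({"pasta"}), frozenset({"escalope"}), 0.5, 4.0)])
print("\n".join(describe_rules(record)))
```

In the notebook, `clean_transactions(dataset.values.tolist())` could replace the transaction-building loop. If store_data.csv has no header row, as the 7501 in the code comment against the printed shape of (7500, 20) suggests, reading it with `header=None` would also keep the first basket.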

# assoc.py program with titanic.csv

In [12]:
#!  /bin/python3

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = pd.read_csv('/content/sample_data/titanic.csv')
print(dataset.shape)

# Transforming the list into a list of lists, so that each transaction can be indexed easier
transactions = []
for i in range(0, dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(0, dataset.shape[1])])


from apyori import apriori
# Please download this as a custom package --> type "apyori"
# To load custom packages, do not refresh the page. Instead, click on the reset button on the Console.

rules = apriori(transactions, min_support = 0.005, min_confidence = 0.25, min_lift = 3, min_length = 2)
# Support: number of transactions containing the set of items / total number of transactions
#   e.g. products bought at least 3 times a day for a week --> 21 / 7501 ≈ 0.0028
# Confidence: should not be too high, as that will lead to obvious rules

#Try many combinations of values to experiment with the model

results = []
for item in rules:

    # first index of the inner list
    # Contains base item and add item
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    #second index of the inner list
    print("Support: " + str(item[1]))

    #third index of the list located at 0th
    #of the third index of the inner list

    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

    results.append(item)

# Sort the results by their lift value
results = sorted(results, key=lambda x: x[2][0][3], reverse=True)

# viewing the rules
print(len(results))

# Transferring the list to a table
results = pd.DataFrame(results)

print(results.head(5))

Streaming output truncated to the last 5000 lines.
Rule: 4 -> 3
Support: 0.010101010101010102
Confidence: 0.391304347826087
Lift: 6.836317135549872
Rule: 4 -> 3
Support: 0.006734006734006734
Confidence: 0.2608695652173913
Lift: 6.640993788819875
Rule: 4 -> 3
Support: 0.010101010101010102
Confidence: 0.391304347826087
Lift: 6.011244377811095
Rule: 3 -> 46.9
Support: 0.005611672278338945
Confidence: 0.8333333333333333
Lift: 148.5
Rule: 3 -> 46.9
Support: 0.005611672278338945
Confidence: 0.8333333333333333
Lift: 148.5
Rule: 3 -> 46.9
Support: 0.005611672278338945
Confidence: 0.8333333333333333
Lift: 14.558823529411763
Rule: 3 -> 46.9
Support: 0.005611672278338945
Confidence: 0.8333333333333333
Lift: 12.801724137931034
Rule: 3 -> 5
Support: 0.005611672278338945
Confidence: 0.45454545454545453
Lift: 81.0
Rule: 3 -> 5
Support: 0.005611672278338945
Confidence: 0.45454545454545453
Lift: 7.941176470588235
Rule: 3 -> nan
Support: 0.005611672278338945
Confidence: 0.4545454

In [13]:
print(results.head(5))


                     items   support  \
0         (382652, 29.125)  0.005612   
1     (73.5, S.O.C. 14879)  0.005612   
2      (382652, 29.125, 0)  0.005612   
3  (73.5, S.O.C. 14879, 0)  0.005612   
4  (73.5, S.O.C. 14879, 2)  0.005612   

                                  ordered_statistics  
0  [((29.125), (382652), 1.0, 178.20000000000002)...  
1  [((73.5), (S.O.C. 14879), 1.0, 178.20000000000...  
2  [((29.125), (382652, 0), 1.0, 178.200000000000...  
3  [((73.5), (S.O.C. 14879, 0), 1.0, 178.20000000...  
4  [((73.5), (2, S.O.C. 14879), 1.0, 178.20000000...  
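Rules such as "4 -> 3" or itemsets such as (382652, 29.125) are hard to interpret. Each value has lost the name of its column, and numeric or identifier-like columns produce items that occur in only a handful of rows. A minimal sketch of one remedy is to prefix every value with its column name and bin the numeric columns before mining. The column names, cut points and helper names below are hypothetical and not taken from titanic.csv.

```python
from bisect import bisect_right

def bin_label(value, edges, labels):
    """Map a number to the label of the interval it falls in.

    `edges` are the sorted inner cut points, so len(labels) == len(edges) + 1.
    """
    return labels[bisect_right(edges, value)]

def row_to_items(row, numeric_bins, skip=()):
    """Turn one record (a dict) into 'column=value' items."""
    items = []
    for column, value in row.items():
        if column in skip or value is None:
            continue
        if column in numeric_bins:
            edges, labels = numeric_bins[column]
            value = bin_label(value, edges, labels)
        items.append(f"{column}={value}")
    return items

# Hypothetical column names and cut points, chosen only for the example.
numeric_bins = {
    "Age": ([18, 60], ["child", "adult", "senior"]),
    "Fare": ([10, 50], ["low", "medium", "high"]),
}
row = {"Pclass": 3, "Sex": "male", "Age": 22.0, "Fare": 7.25, "Ticket": "A/5 21171"}
print(row_to_items(row, numeric_bins, skip={"Ticket"}))
```

With a DataFrame, `dataset.to_dict("records")` yields dictionaries of this shape. Missing numeric values would still need to be dropped or given their own label before binning.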


# assoc.py program with bank-data-final.arff

In [14]:
from scipy.io import arff
import pandas as pd
data = arff.loadarff('/content/sample_data/bank-data-final.arff')
df = pd.DataFrame(data[0])
df.to_csv('bank-data-final.csv', index=False)
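`scipy.io.arff.loadarff` returns nominal attributes as byte strings, so the CSV written above stores text such as `b'FEMALE'` rather than `FEMALE`. A minimal sketch of decoding the values first follows. The helper name `decode_cell` is hypothetical, and the stand-in rows only mimic what the loader yields.

```python
def decode_cell(value, encoding="utf-8"):
    """Return text for byte strings and leave every other value untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Stand-in rows mimicking what the ARFF loader yields for nominal fields.
rows = [(b"FEMALE", b"52_max"), (b"MALE", b"43759_max")]
decoded = [tuple(decode_cell(cell) for cell in row) for row in rows]
print(decoded)
```

On the notebook's DataFrame, the helper could be applied to each object column, for example `df[col] = df[col].map(decode_cell)`, before calling `df.to_csv(...)`.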


In [15]:
#!  /bin/python3

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = pd.read_csv('bank-data-final.csv')
print(dataset.shape)

# Transforming the list into a list of lists, so that each transaction can be indexed easier
transactions = []
for i in range(0, dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(0, dataset.shape[1])])


from apyori import apriori
# Please download this as a custom package --> type "apyori"
# To load custom packages, do not refresh the page. Instead, click on the reset button on the Console.

rules = apriori(transactions, min_support = 0.005, min_confidence = 0.25, min_lift = 3, min_length = 2)
# Support: number of transactions containing the set of items / total number of transactions
#   e.g. products bought at least 3 times a day for a week --> 21 / 7501 ≈ 0.0028
# Confidence: should not be too high, as that will lead to obvious rules

#Try many combinations of values to experiment with the model

results = []
for item in rules:

    # first index of the inner list
    # Contains base item and add item
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    #second index of the inner list
    print("Support: " + str(item[1]))

    #third index of the list located at 0th
    #of the third index of the inner list

    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

    results.append(item)

# Sort the results by their lift value
results = sorted(results, key=lambda x: x[2][0][3], reverse=True)

# viewing the rules
print(len(results))

# Transferring the list to a table
results = pd.DataFrame(results)

print(results.head(5))

(600, 11)
Rule: b'52_max' -> b'43759_max'
Support: 0.02666666666666667
Confidence: 1.0
Lift: 3.1413612565445024
Rule: b'52_max' -> b'43759_max'
Support: 0.03666666666666667
Confidence: 0.275
Lift: 3.5106382978723403
Rule: b'3' -> b'43759_max'
Support: 0.013333333333333334
Confidence: 1.0
Lift: 6.25
Rule: b'52_max' -> b'FEMALE'
Support: 0.06833333333333333
Confidence: 1.0
Lift: 3.1413612565445024
Rule: b'52_max' -> b'43759_max'
Support: 0.058333333333333334
Confidence: 0.4375
Lift: 3.28125
Rule: b'52_max' -> b'43759_max'
Support: 0.03
Confidence: 0.5454545454545454
Lift: 4.090909090909091
Rule: b'52_max' -> b'43759_max'
Support: 0.018333333333333333
Confidence: 1.0
Lift: 3.1413612565445024
Rule: b'52_max' -> b'43759_max'
Support: 0.02666666666666667
Confidence: 1.0
Lift: 3.1413612565445024
Rule: b'52_max' -> b'FEMALE'
Support: 0.028333333333333332
Confidence: 0.4146341463414634
Lift: 3.189493433395872
Rule: b'52_max' -> b'43759_max'
Support: 0.03
Confidence: 0.5454545454545454
Lift: 3.5
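Exercise 4 asks for the five best rules by confidence and by lift. Each record can hold several rules, so sorting on `x[2][0][3]` only considers the first rule of every record. The sketch below flattens all the rules before ranking them. The helper names are hypothetical, the sample records are invented, and the indices follow the record layout used in the loops above.

```python
def flatten_rules(records):
    """Yield (antecedent, consequent, support, confidence, lift) for every rule."""
    for record in records:
        support = record[1]
        for base, add, conf, lift in record[2]:
            yield (tuple(sorted(base)), tuple(sorted(add)), support, conf, lift)

def top_rules(records, metric, k=5):
    """Return the k rules with the highest value of 'confidence' or 'lift'."""
    column = {"confidence": 3, "lift": 4}[metric]
    return sorted(flatten_rules(records), key=lambda rule: rule[column], reverse=True)[:k]

# Invented records with the same shape as the results used above.
records = [
    (frozenset({"a", "b"}), 0.10,
     [(frozenset({"a"}), frozenset({"b"}), 0.50, 3.2),
      (frozenset({"b"}), frozenset({"a"}), 0.90, 3.2)]),
    (frozenset({"c", "d"}), 0.02,
     [(frozenset({"c"}), frozenset({"d"}), 0.30, 6.0)]),
]
for metric in ("confidence", "lift"):
    print(metric, top_rules(records, metric, k=2))
```

In this toy case the two rankings disagree: b -> a leads on confidence, while c -> d leads on lift despite its low confidence. This is the kind of contrast the exercise asks you to comment on.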

# assoc.py program with adult-stretch.data

In [16]:
#!  /bin/python3

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = pd.read_csv('/content/sample_data/adult-stretch.data')
print(dataset.shape)

# Transforming the list into a list of lists, so that each transaction can be indexed easier
transactions = []
for i in range(0, dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(0, dataset.shape[1])])


from apyori import apriori
# Please download this as a custom package --> type "apyori"
# To load custom packages, do not refresh the page. Instead, click on the reset button on the Console.

rules = apriori(transactions, min_support = 0.005, min_confidence = 0.25, min_lift = 3, min_length = 2)
# Support: number of transactions containing the set of items / total number of transactions
#   e.g. products bought at least 3 times a day for a week --> 21 / 7501 ≈ 0.0028
# Confidence: should not be too high, as that will lead to obvious rules

#Try many combinations of values to experiment with the model

results = []
for item in rules:

    # first index of the inner list
    # Contains base item and add item
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    #second index of the inner list
    print("Support: " + str(item[1]))

    #third index of the list located at 0th
    #of the third index of the inner list

    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

    results.append(item)

# Sort the results by their lift value
results = sorted(results, key=lambda x: x[2][0][3], reverse=True)

# viewing the rules
print(len(results))

# Transferring the list to a table
results = pd.DataFrame(results)

print(results.head(5))

(19, 5)
Rule: SMALL -> DIP
Support: 0.10526315789473684
Confidence: 0.6666666666666666
Lift: 3.1666666666666665
Rule: DIP -> T
Support: 0.10526315789473684
Confidence: 0.6666666666666666
Lift: 3.1666666666666665
Rule: SMALL -> STRETCH
Support: 0.10526315789473684
Confidence: 0.5
Lift: 3.166666666666667
Rule: STRETCH -> T
Support: 0.10526315789473684
Confidence: 0.5
Lift: 3.166666666666667
Rule: DIP -> YELLOW
Support: 0.05263157894736842
Confidence: 0.3333333333333333
Lift: 3.1666666666666665
Rule: DIP -> ADULT
Support: 0.05263157894736842
Confidence: 0.3333333333333333
Lift: 3.1666666666666665
Rule: DIP -> YELLOW
Support: 0.05263157894736842
Confidence: 0.3333333333333333
Lift: 3.1666666666666665
Rule: YELLOW -> ADULT
Support: 0.05263157894736842
Confidence: 1.0
Lift: 3.166666666666667
Rule: ADULT -> SMALL
Support: 0.05263157894736842
Confidence: 1.0
Lift: 3.166666666666667
Rule: YELLOW -> CHILD
Support: 0.05263157894736842
Confidence: 0.3333333333333333
Lift: 3.1666666666666665
Rule: 
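For exercise 5, one way to study how rule generation evolves is to count the rules produced over a grid of minimum support and confidence values. The sketch below assumes a hypothetical `sweep` helper that accepts any mining function. To stay runnable without extra packages it uses `pair_rules`, a brute-force enumeration of single-item rules, which is not the Apriori algorithm. The toy transactions are invented.

```python
from itertools import permutations

def pair_rules(transactions, min_support, min_confidence):
    """Naively enumerate single-item rules A -> B that pass both thresholds.

    This brute-force stand-in is not Apriori; it only keeps the sketch
    runnable without extra packages.
    """
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    count = {i: sum(i in s for s in sets) for i in items}
    rules = []
    for a, b in permutations(items, 2):
        both = sum(a in s and b in s for s in sets)
        if both / n >= min_support and both / count[a] >= min_confidence:
            rules.append((a, b, both / n, both / count[a]))
    return rules

def sweep(transactions, supports, confidences, miner=pair_rules):
    """Count the rules produced for every (min_support, min_confidence) pair."""
    return {
        (s, c): len(miner(transactions, s, c))
        for s in supports
        for c in confidences
    }

# Toy transactions, invented for illustration.
transactions = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]
for thresholds, n_rules in sweep(transactions, [0.25, 0.5], [0.5, 0.9]).items():
    print(thresholds, n_rules)
```

To reuse the notebook's miner, a wrapper such as `lambda t, s, c: list(apriori(t, min_support=s, min_confidence=c))` could be passed as `miner`. Raising either threshold can only keep or reduce the number of rules, so the counts form a non-increasing grid.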