# Apriori Algorithm

Apriori is a data mining algorithm for transactional databases that efficiently finds frequent itemsets, which serve as the basis for generating association rules. It first identifies the individual items that are frequent in the database and then extends them to larger itemsets for as long as those itemsets appear sufficiently often in the database. The algorithm has been widely applied to the analysis of commercial transactions and to prediction problems.
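
The idea of "appearing sufficiently often" can be stated in symbols. The notation here ($D$ for the database, $t$ for a transaction, $X$ and $Y$ for itemsets) is introduced for illustration and is not part of the original notebook. The support of an itemset counts the transactions that contain it, and support can only shrink as an itemset grows:

$$
\operatorname{supp}(X) = \left|\{\, t \in D : X \subseteq t \,\}\right|, \qquad X \subseteq Y \;\Rightarrow\; \operatorname{supp}(X) \ge \operatorname{supp}(Y)
$$

So every subset of a frequent itemset is itself frequent, and any itemset with an infrequent subset can be skipped without counting it.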


> * _Agrawal_
> * _Srikant_

In [1]:
import pandas as pd
from itertools import combinations
import numpy as np

In [2]:
dta = pd.DataFrame({
    "TID" :["T{}".format(i+1) for i in range(5)],
    #"List of Items":[[1,2,5],[1,4],[2,3],[1,2,4],[1,3],[2,3],[1,3],[1,3],[1,2,3,5],[1,2,3]]    
    "List of Items":[['A','C','D'],['B','C','E'],['A','B','C','E'],['B','E'],['A','B','C','D']]        
})

#### Database TDB

In [3]:
dta

   TID  List of Items
0   T1      [A, C, D]
1   T2      [B, C, E]
2   T3   [A, B, C, E]
3   T4         [B, E]
4   T5   [A, B, C, D]


#### Minimum support

In [4]:
minsup = 2
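
To make the threshold concrete, here is a minimal sketch of how the support of a single itemset could be counted against the TDB. The helper name `support_count` and the plain-list copy of the transactions are assumptions made for this illustration; they do not appear elsewhere in the notebook.

```python
# Transactions copied from the TDB table above.
transactions = [
    ["A", "C", "D"],
    ["B", "C", "E"],
    ["A", "B", "C", "E"],
    ["B", "E"],
    ["A", "B", "C", "D"],
]

def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    target = set(itemset)
    return sum(1 for t in transactions if target.issubset(t))

minsup = 2
print(support_count({"B", "E"}, transactions))  # 3, so {B, E} is frequent
print(support_count({"A", "E"}, transactions))  # 1, so {A, E} is not
```

An itemset is kept only when its count is at least `minsup`.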

#### Candidates

In [5]:
# Collect the initial candidates (every distinct item) from the data frame
candidates = set()
for items in dta["List of Items"]:
    candidates = candidates.union(set(items))
print("Candidates = {}".format(candidates))

Candidates = {'B', 'C', 'A', 'E', 'D'}


#### Apriori algorithm

The following function finds the frequent patterns using the Apriori algorithm, where:

1. `candidates`: the initial set of candidate items.
2. `minsup`: the minimum support to work with.
3. `dataFrame`: the data frame holding the transactions of the TDB.

For each set of candidates, the function stores the generated data frame in a list, and it returns that list.

In [12]:
def Apriori(candidates,minsup,dataFrame):
    it = 1  # Size of the itemsets tested at the current level
    can = set(candidates)
    dtaFrs = list()
    while True:
        set_can_aux = set(combinations(can,it)) # Set of tuples; each tuple is one candidate itemset
        if set_can_aux: # Check that the candidate set is not empty
            L = list()
            x = set()
            for e in set_can_aux: # For each tuple in the candidate set
                sup = 0
                A = set(e)
                for t in dataFrame["List of Items"]: # For each transaction in the data frame
                    B = set(t)
                    if A.issubset(B): # If A is a subset of B
                        sup +=1
                if sup >= minsup:
                    x = x.union(set(e)) # Keep the items of every frequent itemset
                    L.append(["{} --- {}".format(A,sup)])
            df = pd.DataFrame(data=np.array(L),columns=["Support"])
            dtaFrs.append(df)
            can = x # The next level combines only the items that survived this one
            it += 1
        else:
            break
    return dtaFrs
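
The function above builds each level by combining every item that survived the previous one. The textbook formulation of Apriori is stricter: it joins frequent (k-1)-itemsets and prunes any candidate that has an infrequent (k-1)-subset. The following is a minimal sketch of that join and prune step, not the method used in this notebook; the name `generate_candidates` and the frozenset representation are assumptions made for the example.

```python
from itertools import combinations

def generate_candidates(frequent_prev, k):
    """Build size-k candidates from the frequent (k-1)-itemsets.

    `frequent_prev` is a collection of frozensets, each of size k-1.
    """
    frequent_prev = set(frequent_prev)
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            if len(union) != k:
                continue  # Join step: keep only unions with exactly k items
            # Prune step: every (k-1)-subset must itself be frequent
            if all(frozenset(s) in frequent_prev
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Frequent 2-itemsets taken from the results of this notebook.
L2 = {frozenset(p) for p in ["AC", "AD", "AB", "BE", "CD", "BC", "CE"]}
C3 = generate_candidates(L2, 3)
print(sorted("".join(sorted(c)) for c in C3))  # ['ABC', 'ACD', 'BCE']
```

On this data the sketch proposes three candidates of size 3, whereas combining the five surviving items three at a time yields ten tuples to count. Both approaches find the same frequent itemsets; they differ only in how many candidates are checked against the database.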

#### Results

In [13]:
dfs = Apriori(candidates,minsup,dta)
for i in range(len(dfs)):
    print("-"*40)
    print (dfs[i], "\n")

----------------------------------------
       Support
0  {'A'} --- 3
1  {'D'} --- 2
2  {'B'} --- 4
3  {'E'} --- 3
4  {'C'} --- 4 

----------------------------------------
            Support
0  {'A', 'C'} --- 3
1  {'D', 'A'} --- 2
2  {'B', 'A'} --- 2
3  {'E', 'B'} --- 3
4  {'D', 'C'} --- 2
5  {'B', 'C'} --- 3
6  {'E', 'C'} --- 2 

----------------------------------------
                 Support
0  {'A', 'D', 'C'} --- 2
1  {'E', 'B', 'C'} --- 2
2  {'B', 'A', 'C'} --- 2 

----------------------------------------
Empty DataFrame
Columns: [Support]
Index: [] 
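
Since frequent itemsets serve as the basis for association rules, the supports printed above can be turned into rules. The following is a minimal sketch, assuming a rule is kept when its confidence, supp(itemset) / supp(antecedent), reaches a threshold. The `min_conf` parameter, the `association_rules` helper, and the dictionary layout are hypothetical and not part of the original notebook.

```python
from itertools import combinations

# Support counts copied from the output above (the dict layout is an assumption).
support = {
    frozenset("A"): 3, frozenset("B"): 4, frozenset("C"): 4,
    frozenset("D"): 2, frozenset("E"): 3,
    frozenset("AC"): 3, frozenset("AD"): 2, frozenset("AB"): 2,
    frozenset("BE"): 3, frozenset("CD"): 2, frozenset("BC"): 3,
    frozenset("CE"): 2,
    frozenset("ACD"): 2, frozenset("BCE"): 2, frozenset("ABC"): 2,
}

def association_rules(support, min_conf):
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, count in support.items():
        # Try every non-empty proper subset as the antecedent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = count / support[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

for lhs, rhs, conf in association_rules(support, min_conf=1.0):
    print(sorted(lhs), "->", sorted(rhs), conf)
```

With `min_conf=1.0` only rules that hold in every matching transaction are kept, for example A -> C (3 of the 3 transactions containing A also contain C), while C -> A is dropped (3 of 4).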

