# Apriori - Example - Movies

**Context**  
This is a fictitious dataset of movie lists in which the movies were associated at random.  

**Content**  
It contains 7501 rows and 20 columns. Each row is one list (transaction) and each column holds one movie title, so a row lists at most 20 movies.

**Problem statement**  
The goal is to find associations between movies that could be recommended together.

In [1]:
# Import libraries
import pandas as pd

from apyori import apriori

## Load Data

In [2]:
# Load the data (the file has no header row)
df = pd.read_csv('movie_dataset.csv', header=None)
df.head()

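| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Revenant | 13 Hours | Allied | Zootopia | Jigsaw | Achorman | Grinch | Fast and Furious | Ghostbusters | Wolverine | Mad Max | John Wick | La La Land | The Good Dunosaur | Ninja Turtles | The Good Dunosaur Bad Moms | 2 Guns | Inside Out | Valerian | Spiderman 3 |
| 1 | Beirut | Martian | Get Out | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Deadpool | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | X-Men | Allied | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Ninja Turtles | Moana | Ghost in the Shell | Ralph Breaks the Internet | John Wick | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |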


## EDA

In [3]:
# Inspect the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 20 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       7501 non-null   object
 1   1       5747 non-null   object
 2   2       4389 non-null   object
 3   3       3345 non-null   object
 4   4       2529 non-null   object
 5   5       1864 non-null   object
 6   6       1369 non-null   object
 7   7       981 non-null    object
 8   8       654 non-null    object
 9   9       395 non-null    object
 10  10      256 non-null    object
 11  11      154 non-null    object
 12  12      87 non-null     object
 13  13      47 non-null     object
 14  14      25 non-null     object
 15  15      8 non-null      object
 16  16      4 non-null      object
 17  17      4 non-null      object
 18  18      3 non-null      object
 19  19      1 non-null      object
dtypes: object(20)
memory usage: 1.1+ MB
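
The non-null counts above also describe how long the lists are: column k is filled only in rows that contain at least k + 1 movies. A minimal sketch of that arithmetic follows, with the counts copied from the `df.info()` output. The variable names are illustrative and not part of the original notebook.

```python
# Non-null counts per column, copied from the df.info() output above.
non_null = [7501, 5747, 4389, 3345, 2529, 1864, 1369, 981, 654, 395,
            256, 154, 87, 47, 25, 8, 4, 4, 3, 1]

# Rows with exactly k movies = rows with at least k, minus rows with at least k + 1.
at_least = non_null + [0]
exactly = {k: at_least[k - 1] - at_least[k] for k in range(1, len(non_null) + 1)}

total_rows = non_null[0]
mean_size = sum(non_null) / total_rows

print(exactly[1])            # rows that list a single movie
print(round(mean_size, 2))   # average number of movies per row
```

Rows that list a single movie cannot support a rule between two movies, so only the remaining rows carry information for the association search.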


## Modeling

In [4]:
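# Turn the DataFrame into a list of transactions (one list of titles per row).
# Note: str() turns empty cells (NaN) into the literal string 'nan',
# which apriori then treats as just another item.
n_rows, n_cols = df.shape
transactions = []
for i in range(n_rows):
    transactions.append([str(df.values[i, j]) for j in range(n_cols)])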
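
Because `str()` maps a missing cell to the text `'nan'`, that text ends up as an item in most transactions. One possible way to avoid this is to keep only the cells that actually hold a title. The sketch below works on plain lists so that it runs on its own. `build_transactions` and the toy rows are hypothetical, and with the real data the input could come from `df.values.tolist()`.

```python
import math

def build_transactions(rows):
    """Return one list of titles per row, skipping missing cells (None or NaN)."""
    transactions = []
    for row in rows:
        items = [
            str(cell)
            for cell in row
            if cell is not None
            and not (isinstance(cell, float) and math.isnan(cell))
        ]
        transactions.append(items)
    return transactions

# Toy rows shaped like df.values.tolist(): titles followed by NaN padding.
rows = [
    ["Beirut", "Martian", "Get Out", math.nan],
    ["Deadpool", math.nan, math.nan, math.nan],
    ["X-Men", "Allied", math.nan, math.nan],
]
clean_transactions = build_transactions(rows)
print(clean_transactions)
```

Rules mined from transactions built this way would differ from the results shown in this notebook, since `'nan'` would no longer count as an item.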

In [5]:
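# Training
# Note: apyori reads min_support, min_confidence, min_lift and max_length;
# min_length does not appear to be among them, so it likely has no effect.
model = apriori(transactions, min_support=0.003, min_confidence=0.3,
                min_lift=3, min_length=2)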
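
The thresholds in this call filter rules by support, confidence and lift. As a reminder of what those numbers mean, here is a small self-contained sketch that computes them by hand for one rule on toy transactions. The helper names and the toy data are illustrative assumptions and come neither from the notebook nor from apyori.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return both, confidence, lift

toy = [
    ["Jumanji", "Kung Fu Panda"],
    ["Jumanji", "Kung Fu Panda", "Intern"],
    ["Jumanji"],
    ["Intern"],
    ["Moana"],
]
rule_support, rule_confidence, rule_lift = rule_metrics(
    toy, ["Kung Fu Panda"], ["Jumanji"])
print(rule_support, rule_confidence, rule_lift)
```

With the thresholds used in the notebook, a rule is kept only if its items appear together in at least 0.3% of the transactions, the consequent appears in at least 30% of the transactions that contain the antecedent, and the lift is at least 3.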

## Visualization

In [6]:
# Results (apriori returns a generator, so materialize it as a list)
results = list(model)

In [7]:
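def inspect(results):
    # Each result is (items, support, ordered_statistics); only the first
    # ordered statistic of each result is used here.
    # In apyori, an ordered statistic is (items_base, items_add, confidence,
    # lift): index 0 is the antecedent and index 1 the consequent, so 'rh'
    # below holds the antecedent and 'lh' the consequent.
    rh          = [tuple(result[2][0][0]) for result in results]
    lh          = [tuple(result[2][0][1]) for result in results]
    support     = [result[1] for result in results]
    confidence  = [result[2][0][2] for result in results]
    lift        = [result[2][0][3] for result in results]
    return list(zip(rh, lh, support, confidence, lift))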

In [8]:
# Convert to a DataFrame to view the results
resultdf = pd.DataFrame(inspect(results),
                        columns=['rhs','lhs','support','confidence','lift'])
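
`inspect` reads only the first ordered statistic of each result, and its `rhs`/`lhs` names are the reverse of apyori's field order, where the base (antecedent) comes first. A possible alternative is sketched below: it emits one row per rule and names the two sides `antecedent` and `consequent`. The two namedtuples are stand-ins that mimic the shape of apyori's records so the sketch runs without the library. `rules_to_rows` is a hypothetical helper, and the numbers in the sample record are invented for illustration.

```python
from collections import namedtuple

# Stand-ins with the same field order as apyori's result records (assumption:
# RelationRecord(items, support, ordered_statistics) and
# OrderedStatistic(items_base, items_add, confidence, lift)).
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

def rules_to_rows(results):
    """Return one dict per rule, covering every ordered statistic."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append({
                "antecedent": tuple(sorted(stat.items_base)),
                "consequent": tuple(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return rows

# Invented sample record with two rules derived from the same itemset.
sample = [RelationRecord(
    items=frozenset({"A", "B"}),
    support=0.01,
    ordered_statistics=[
        OrderedStatistic(frozenset({"A"}), frozenset({"B"}), 0.4, 4.0),
        OrderedStatistic(frozenset({"B"}), frozenset({"A"}), 0.1, 4.0),
    ],
)]
rule_rows = rules_to_rows(sample)
print(rule_rows)
```

Passing the list of dicts to `pd.DataFrame` would give a table with explicitly named sides, assuming the real records expose the same field names as the stand-ins.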

In [9]:
resultdf

| | rhs | lhs | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (Red Sparrow,) | (Green Lantern,) | 0.005733 | 0.300699 | 3.790833 |
| 1 | (Star Wars,) | (Green Lantern,) | 0.005866 | 0.372881 | 4.700812 |
| 2 | (Kung Fu Panda,) | (Jumanji,) | 0.015998 | 0.323450 | 3.291994 |
| 3 | (Wonder Woman,) | (Jumanji,) | 0.005333 | 0.377358 | 3.840659 |
| 4 | (Star Wars,) | (The Revenant,) | 0.005066 | 0.322034 | 4.506672 |
| ... | ... | ... | ... | ... | ... |
| 97 | (Intern, Ninja Turtles, Moana) | (nan, Spiderman 3) | 0.003333 | 0.301205 | 4.582834 |
| 98 | (Intern, Thor) | (nan, Ninja Turtles, Moana) | 0.003066 | 0.383333 | 7.987176 |
| 99 | (Ninja Turtles, The Revenant, Tomb Rider) | (Intern, nan) | 0.003333 | 0.390625 | 4.098011 |
| 100 | (Intern, Ninja Turtles, World War Z) | (nan, Tomb Rider) | 0.003066 | 0.522727 | 3.002280 |


In [10]:
resultdf.sort_values(by = ['support'],axis=0,ascending=False).head(30)

| | rhs | lhs | support | confidence | lift |
|---|---|---|---|---|---|
| 2 | (Kung Fu Panda,) | (Jumanji,) | 0.015998 | 0.32345 | 3.291994 |
| 28 | (Kung Fu Panda,) | (nan, Jumanji) | 0.015998 | 0.32345 | 3.291994 |
| 62 | (Intern, Tomb Rider) | (nan, Jumanji) | 0.008666 | 0.311005 | 3.165328 |
| 18 | (Intern, Tomb Rider) | (Jumanji,) | 0.008666 | 0.311005 | 3.165328 |
| 68 | (Ninja Turtles, The Revenant) | (Intern, nan) | 0.007199 | 0.305085 | 3.200616 |
| 21 | (Ninja Turtles, The Revenant) | (Intern,) | 0.007199 | 0.305085 | 3.200616 |
| 71 | (Tomb Rider, World War Z) | (Intern, nan) | 0.006666 | 0.318471 | 3.341054 |
| 26 | (Ninja Turtles, Kung Fu Panda) | (Jumanji,) | 0.006666 | 0.390625 | 3.975683 |
| 23 | (Tomb Rider, World War Z) | (Intern,) | 0.006666 | 0.318471 | 3.341054 |
| 74 | (Ninja Turtles, Kung Fu Panda) | (nan, Jumanji) | 0.006666 | 0.390625 | 3.975683 |
