# Joke recommendation system

In [1]:
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd 
from sklearn.neighbors import NearestNeighbors

The goal is to apply everything we have learned so far, especially gradient descent. To do this, we will build a recommendation system based on matrix factorization.

This assignment provides all the elements needed to build a joke recommendation system. The goal is for you to understand how it works and to develop a solution.

To learn more about the dataset we will use, see: https://goldberg.berkeley.edu/jester-data/

1- Load the ratings, users, and jokes datasets. You can use the pandas library for this.

In [2]:
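calificaciones = pd.read_csv("Datasets/rating.csv", sep=";", usecols=range(1,101))  # keep columns 1..100 (the 100 jokes)
usuarios       = pd.read_csv("Datasets/users.csv",  sep=";")
# Recent pandas versions reject sep="\n"; read one joke per line, first line as header (as sep="\n" did)
with open("Datasets/jokes.csv", encoding="utf-8") as archivo:
    lineas = [linea for linea in archivo.read().splitlines() if linea.strip()]
chistes = pd.DataFrame(lineas[1:], columns=lineas[:1])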

In [3]:
def imprimir_usuario(indice):
    columnas = usuarios.columns
    usuario = usuarios.to_numpy()[indice]
    
    print("\x1b[1;33m" + "Usuario #{}".format(usuario[0]))
    
    for i in range(1, len(columnas)):
        print("\x1b[1;35m" + columnas[i].upper() + "\x1b[0m" +": {}".format(usuario[i]), end="\t")
        if i == 1:
            print(end="\t")
    
    print("\n")
    
def imprimir_chistes(chistes):
    if len(chistes):
        print("\x1b[1;32m" + "Chistes recomendados:\n" + "\x1b[0m")
    
        for i, chiste in enumerate(chistes):
            print("\x1b[1;36m" + str(1+i) + ". " + "\x1b[0m" + chiste)
            
        print("\x1b[1;34m" + "\nRecomendaciones totales: " + "\x1b[0m" + str(len(chistes)))
    else:
        print("\x1b[1;31m" + "No hay recomendaciones para este usuario" + "\x1b[0m")
        
    print("\n")

2- The ratings matrix must be preprocessed before it can be used in the matrix factorization algorithm. To do this:
    - Round the ratings and work only with integer values.
    - Replace the 99 ratings (jokes the user has not rated) with 0.
    - Rescale the ratings from the range -10 to 10 to the range 0 to 10.

Remember that the ratings matrix must contain only integers.

In [4]:
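# Round the raw ratings (-10..10, and 99 for "not rated") to integers
calificaciones = np.round(calificaciones)
# Map -10..10 to 0..10; a 99 becomes about 54, which is out of range
calificaciones = np.round((10 + calificaciones)/2)
# Everything above 10 came from a 99, i.e. a joke the user did not rate -> 0
calificaciones[calificaciones > 10] = 0
# Guarantee an integer matrix, as the statement requires
calificaciones = calificaciones.astype(int)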
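A quick way to check the transformation above is to run the same steps on a few hand-picked raw ratings. This is a sketch: `preprocesar` and the toy `demo` table are hypothetical and the numbers are made up, not taken from the Jester data. It also exposes a side effect of the mapping: a raw rating of -10 ends at 0, the same value that marks "not rated".

```python
import numpy as np
import pandas as pd

def preprocesar(calif):
    """Hypothetical helper: the cell's steps applied to any DataFrame of raw ratings."""
    calif = np.round(calif)               # integers in [-10, 10]; 99 stays 99
    calif = np.round((10 + calif) / 2)    # [-10, 10] -> [0, 10]; 99 -> about 54
    calif[calif > 10] = 0                 # former 99 ("not rated") -> 0
    return calif.astype(int)

# Made-up raw ratings for two jokes and three users
demo = pd.DataFrame({"joke_1": [-10, -3.2, 0.4], "joke_2": [7.6, 10, 99]})
print(preprocesar(demo).to_numpy().tolist())  # [[0, 9], [4, 10], [5, 0]]
```

Note how the first user's -10 and the third user's 99 both end up as 0.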

3- Using scikit-learn's nearest neighbors algorithm, write a function that receives a user's ratings and the ratings matrix as parameters. Using 20 neighbors, return the indices of the user's nearest neighbors.

scikit-learn documentation: https://scikit-learn.org/stable/modules/neighbors.html

In [5]:
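def indices_cercanos(matriz, fila, n=20):
    # Fit k-NN on all rows and return the indices of the n rows closest to `fila`
    # (`fila` must be 2-D, e.g. [matriz[i]]); if `fila` is itself a row of `matriz`,
    # that row normally comes first, at distance 0
    nbrs = NearestNeighbors(n_neighbors=n, algorithm="auto").fit(matriz)
    distancias, indices = nbrs.kneighbors(fila)
    return indices[0]

def interpretar_indices(matriz, indices):
    # Gather the rows of `matriz` at the given indices (same as np.asarray(matriz)[indices])
    n = len(indices)
    R = list()
    
    for i in range(n):
        R.append(matriz[indices[i]])
    
    return np.array(R)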
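The neighbor search can be tried on a tiny made-up matrix. When the query row is itself part of the fitted data, it is returned first at distance 0, which is why later cells can treat `R[0]` as the target user; if another row is identical, the tie may put that other row first instead. `X` and `R_demo` are illustrative names only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy "ratings" matrix: four users, two items (made-up values)
X = np.array([[0, 0],
              [1, 0],
              [5, 5],
              [6, 5]])

nbrs = NearestNeighbors(n_neighbors=2).fit(X)
# kneighbors expects a 2-D query, hence X[[2]] (shape (1, 2)) rather than X[2]
distancias, indices = nbrs.kneighbors(X[[2]])
print(indices[0])          # [2 3]: row 2 itself (distance 0), then row 3
R_demo = X[indices[0]]     # same rows that interpretar_indices(X, indices[0]) gathers
```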

4- Add the function that runs the matrix factorization algorithm. You can base it on the algorithm shared in the slides.
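For reference, the loop in the cell that implements this step performs, for every observed rating ($r_{ij} > 0$), one stochastic gradient step on $e_{ij}^2 + \frac{\beta}{2}\left(\lVert p_i \rVert^2 + \lVert q_j \rVert^2\right)$. Here $P$ is $N \times K$, $Q$ is $M \times K$ and $\hat{R} = P Q^\top$, matching `r_prima`; the notation is ours, not taken from the slides.

$$
e_{ij} = r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{jk}
$$

$$
p_{ik} \leftarrow p_{ik} + \alpha \left( 2\, e_{ij}\, q_{jk} - \beta\, p_{ik} \right), \qquad
q_{jk} \leftarrow q_{jk} + \alpha \left( 2\, e_{ij}\, p_{ik} - \beta\, q_{jk} \right)
$$

After each sweep over the matrix, the code computes

$$
E_t = \sum_{(i,j)\,:\, r_{ij} > 0} \left[ \left( r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{jk} \right)^2 + \frac{\beta}{2} \sum_{k=1}^{K} \left( p_{ik}^2 + q_{jk}^2 \right) \right]
$$

and stops when $\lvert E_t - E_{t-1} \rvert / E_t < \varepsilon$ (the `error` parameter, 0.005 by default), starting from $E_0 = 0$, so the first sweep never stops the loop. In the loop, the $q_{jk}$ update already uses the freshly updated $p_{ik}$.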

In [6]:
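def factorizacion(R, P, Q, K, steps=5000, alpha=0.002, beta=0.02, error=0.005):
    # R: N x M ratings (0 = not rated); P: N x K; Q: M x K. P and Q are updated in place.
    # Stops after `steps` sweeps or when the relative change of the error drops below `error`.
    Q = Q.T                                              # work with Q as K x M
    err_anterior = 0
    for step in range(steps):
        # One stochastic gradient sweep over the observed ratings
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - P[i,:] @ Q[:,j]      # prediction error
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        
        # Regularized squared error over the observed ratings
        err_actual = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    err_actual = err_actual + pow(R[i][j] - P[i,:] @ Q[:,j], 2)
                    for k in range(K):
                        err_actual = err_actual + (beta/2) * (pow(P[i][k],2) + pow(Q[k][j],2))
        
        # Early stop once the error stabilizes (the first sweep always gives 1)
        err_relativo = error_relativo(err_actual, err_anterior)
        err_anterior = err_actual
        
        if err_relativo < error:
            break
    return P, Q.T

def error_relativo(actual, anterior):
    # |actual - anterior| / |actual|
    return abs((actual - anterior) / actual)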
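The triple loop above runs in pure Python, so every sweep visits each observed rating and each of the K factors one at a time. A hedged sketch of a vectorized alternative follows. It is *full-batch* gradient descent on the masked error, not the per-entry (stochastic) update of `factorizacion`, and its regularizer is $\frac{\beta}{2}(\lVert P \rVert_F^2 + \lVert Q \rVert_F^2)$ rather than one term per observed rating, so its results will differ. The function name `factorizar_vectorizado`, its parameters and the toy `R_demo` matrix are hypothetical.

```python
import numpy as np

def factorizar_vectorizado(R, k, pasos=5000, alpha=0.002, beta=0.02, tol=0.005, semilla=0):
    """Hypothetical full-batch variant (not the per-entry SGD loop of factorizacion).

    Minimizes sum over observed r_ij of (r_ij - p_i . q_j)^2
    + (beta / 2) * (||P||_F^2 + ||Q||_F^2) by plain gradient descent.
    Returns P (N x k), Q (M x k) and the objective value at each iteration.
    """
    rng = np.random.default_rng(semilla)
    n, m = R.shape
    P = rng.random((n, k))
    Q = rng.random((m, k))
    observado = R > 0                      # 0 means "not rated"
    historial = []
    for _ in range(pasos):
        # Residuals on observed entries only; unrated entries contribute nothing
        E = np.where(observado, R - P @ Q.T, 0.0)
        objetivo = np.sum(E ** 2) + (beta / 2) * (np.sum(P ** 2) + np.sum(Q ** 2))
        historial.append(objetivo)
        # Same relative-change stopping rule as error_relativo
        if len(historial) > 1 and abs(historial[-1] - historial[-2]) / historial[-1] < tol:
            break
        # Gradients of the objective above with respect to P and Q
        grad_P = -2 * E @ Q + beta * P
        grad_Q = -2 * E.T @ P + beta * Q
        P, Q = P - alpha * grad_P, Q - alpha * grad_Q
    return P, Q, historial

# Made-up 5 x 4 ratings matrix (0 = not rated)
R_demo = np.array([[5, 3, 0, 1],
                   [4, 0, 0, 1],
                   [1, 1, 0, 5],
                   [1, 0, 0, 4],
                   [0, 1, 5, 4]])
P, Q, historial = factorizar_vectorizado(R_demo, k=2)
R_hat = np.clip(P @ Q.T, 0, 10)
```

Moving the work into NumPy matrix products avoids the interpreted inner loops; the trade-off is that the trajectory, and therefore the final P and Q, differ from the stochastic version.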

5- Write a function that generates $\hat{R}$. Remember that to generate $\hat{R}$ you must:

- Build R using the nearest neighbors algorithm.
- Choose K.
- Define P and Q.

In [7]:
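def r(matriz, indice, n=20):
    # R: ratings of the n users most similar to `indice` (normally the user itself in row 0)
    fila = np.array([matriz[indice]])
    
    indices = indices_cercanos(matriz, fila, n)
    R = interpretar_indices(matriz, indices)
    
    return R

def r_prima(R, k):
    # Factorize R with k latent factors and return R-hat = P Q^T,
    # clipped to [0, 10] and rounded to integer ratings
    N = R.shape[0]
    M = R.shape[1]

    np.random.seed(0)                 # reproducible initialization of P and Q
    P = np.random.rand(N,k)
    Q = np.random.rand(M,k)
    
    nP, nQ = factorizacion(R, P, Q, k)
    nR = nP @ nQ.T
    
    nR[nR > 10] = 10
    nR[nR <  0] = 0
    
    return np.round(nR)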
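The post-processing at the end of `r_prima` clips the predictions to [0, 10] and rounds them. A minimal sketch of the same step with `np.clip`, on made-up factor matrices (`reconstruir`, `P_demo` and `Q_demo` are hypothetical names):

```python
import numpy as np

def reconstruir(P, Q, minimo=0, maximo=10):
    """Hypothetical helper: R-hat = P Q^T, clipped to the rating range and rounded."""
    return np.round(np.clip(P @ Q.T, minimo, maximo))

# Made-up factors: 2 users, 4 jokes, K = 2
P_demo = np.array([[1.0, 2.2],
                   [3.0, 4.0]])
Q_demo = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 2.0],
                   [-1.0, 0.0]])

R_hat = reconstruir(P_demo, Q_demo)
```

Here P Q^T is [[1, 2.2, 6.4, -1], [3, 4, 14, -3]], so 2.2 and 6.4 round down to 2 and 6, 14 is clipped to 10 and the negative predictions become 0.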

6- Make recommendations for users 1, 470, 1241, 3044, 5758, 8105, 8899, 10597, 17391 and 19821. Return as the result all jokes with a rating of 7 or higher.

In [8]:
def sugerir(usuario, prediccion, cond):
    # Indices of the jokes the user has not rated (0) whose prediction satisfies `cond`
    recomendaciones = list()
    
    for i in range(len(usuario)):
        if usuario[i] == 0 and cond(prediccion[i]):
            recomendaciones.append(i)
                    
    return recomendaciones

def recomendar_chistes(R, condicion, k=10):
    # Row 0 of R is taken to be the target user
    nR = r_prima(R, k)
    
    recomendaciones = sugerir(R[0], nR[0], condicion)
    
    # Debug output: recommended joke indices and how many there are
    print(str(recomendaciones) + "\n")
    print(str(len(recomendaciones)) + "\n")
    
    # Map joke indices to joke texts (the single column of `chistes`)
    chistes_recomendados = interpretar_indices(chistes.to_numpy().T[0], recomendaciones)
    
    return chistes_recomendados

def recomendar_chistes_por_calificacion(usuario, k=10):
    # Neighbors are found by rating similarity
    R = r(calificaciones.to_numpy(), usuario)
    
    condicion = lambda calificacion: calificacion >= 7
    
    recomendacion = recomendar_chistes(R, condicion, k)
    return recomendacion
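`sugerir` keeps only the jokes the target user has not rated (0 after preprocessing) whose predicted rating passes the condition. A hedged vectorized sketch, using a fixed threshold instead of a callable; `sugerir_vectorizado` and the demo lists are hypothetical:

```python
import numpy as np

def sugerir_vectorizado(usuario, prediccion, umbral=7):
    """Hypothetical variant: indices of unrated items (0) predicted at >= umbral."""
    usuario = np.asarray(usuario)
    prediccion = np.asarray(prediccion)
    return np.flatnonzero((usuario == 0) & (prediccion >= umbral)).tolist()

usuario_demo = [0, 8, 0, 3, 0]        # jokes 1 and 3 are already rated
prediccion_demo = [7, 9, 6, 3, 10]
print(sugerir_vectorizado(usuario_demo, prediccion_demo))  # [0, 4]
```

Joke 1 has a high prediction but is skipped because the user already rated it, and joke 2 is unrated but predicted below the threshold.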

In [9]:
k = 50
lista_usuarios = [1, 470, 1241, 3044, 5758, 8105, 8899, 10597, 17391, 19821]

for usuario in lista_usuarios:
    imprimir_usuario(usuario)
    recomendaciones = recomendar_chistes_por_calificacion(usuario, k)
    imprimir_chistes(recomendaciones)

Usuario #2
AGE: 73		GENDER: female	COUNTRY: U.S	LANGUAJE: English	CATEGORY: Adventure

[]

0

No hay recomendaciones para este usuario


Usuario #471
AGE: 52		GENDER: other	COUNTRY: Colombia	LANGUAJE: French	CATEGORY: Comedy

[3, 23, 29, 32, 36, 57, 70, 71, 72, 74, 77, 78, 79, 81, 82, 85, 86, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

27

Chistes recomendados:

1. Q. What's the difference between a man and a toilet?   A. A toilet doesn't follow you around after you use it.
2. What do you get when you run over a parakeet with a lawnmower?  Shredded tweet.
3. Q: What's the difference between a Lawyer and a Plumber?  A: A Plumber works to unclog the system.
4. What do you call an American in the finals of the world cup?  "Hey Beer Man!"
5. A Jewish young man was seeing a psychiatrist for an 

[]

0

No hay recomendaciones para este usuario


Usuario #3045
AGE: 31		GENDER: female	COUNTRY: Spain	LANGUAJE: Spanish	CATEGORY: Comedy

[73, 74, 77, 78, 79, 81, 86, 87, 88, 91, 92, 93, 95, 97]

14

Chistes recomendados:

1. Q: How many stalkers does it take to change a light bulb?  A: Two. One to replace the bulb, and the other to watch it day and night.
2. Q: Do you know the difference between an intelligent male and the Sasquatch?  A: There have been actual reported sightings of the Sasquatch.
3. Q: What's the difference between the government  and  the Mafia?  A: One of them is organized.
4. Q: Ever wonder why the IRS calls it Form 1040?  A: Because for every $50 that you earn, you get 10 and they get 40.
5. Hillary, Bill Clinton and the Pope are sitting together on an airplane.  Bill says "I could throw one thousand dollar bill out of this p

[]

0

No hay recomendaciones para este usuario


Usuario #8900
AGE: 16		GENDER: male	COUNTRY: Canada	LANGUAJE: English	CATEGORY: Comedy

[]

0

No hay recomendaciones para este usuario


Usuario #10598
AGE: 76		GENDER: male	COUNTRY: Russia	LANGUAJE: English	CATEGORY: Animation

[]

0

No hay recomendaciones para este usuario


Usuario #17392
AGE: 14		GENDER: male	COUNTRY: Argentina	LANGUAJE: Spanish	CATEGORY: Animation

[]

0

No hay recomendaciones para este usuario


Usuario #19822
AGE: 47		GENDER: male	COUNTRY: Argentina	LANGUAJE: Italian	CATEGORY: Comedy

[3, 9, 32, 42, 70, 72, 77, 81, 86, 91, 95, 97, 98]

13

Chistes recomendados:

1. Q. What's the difference between 

7- Write an algorithm that makes recommendations to users based on the basic information in the users dataset. Use the nearest neighbors algorithm and the matrix factorization algorithm for this.

In [10]:
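def procesar_tabla(tabla):
    # Encode each string column as integer codes (in order of first appearance) so the
    # table can be fed to NearestNeighbors; works column by column on the transpose
    matriz = tabla.to_numpy().T
    
    for columna in range(len(matriz)):
        tipos = list()                      # distinct values seen so far in this column
        
        for fila in range(len(matriz[columna])):
            dato = matriz[columna][fila]
            
            if isinstance(dato, str):
                if dato not in tipos:
                    tipos.append(dato)
                matriz[columna][fila] = tipos.index(dato)
            else:
                break                       # first value is not a string: numeric column, leave it
    
    return matriz.T

def recomendar_chistes_por_usuario(usuario, k=10):
    # Demographic features only (the first column, the user id, is dropped)
    matriz_usuarios = procesar_tabla(usuarios)[:,1:]
    similares = indices_cercanos(matriz_usuarios, [matriz_usuarios[usuario]])
    # Users with an identical profile may tie at distance 0, so `usuario` is not guaranteed
    # to come first; put it in row 0, which recomendar_chistes treats as the target user
    vecinos = [i for i in similares if i != usuario]
    similares = [usuario] + vecinos[:len(similares) - 1]
    R = interpretar_indices(calificaciones.to_numpy(), similares)
    
    condicion = lambda calificacion: calificacion >= 7
    
    recomendacion = recomendar_chistes(R, condicion, k)
    return recomendacion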
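`procesar_tabla` turns every distinct string in a column into an integer code, numbered in order of first appearance. pandas provides `pd.factorize`, which produces the same kind of codes; the sketch below (hypothetical helper `codificar_categoricas`, made-up demo rows) shows the encoding on three users. Keep in mind that such codes are arbitrary labels: with these rows, `NearestNeighbors` would treat "Colombia" (1) as closer to "U.S" (0) than "Spain" (2), which says nothing about the countries themselves.

```python
import pandas as pd

def codificar_categoricas(tabla):
    """Hypothetical helper: replace each string column with integer codes
    numbered in order of first appearance (like procesar_tabla)."""
    codificada = tabla.copy()
    for columna in codificada.columns:
        if pd.api.types.is_string_dtype(codificada[columna]):
            codigos, _ = pd.factorize(codificada[columna])
            codificada[columna] = codigos
    return codificada

# Made-up demographic rows
usuarios_demo = pd.DataFrame({
    "AGE": [73, 52, 31],
    "GENDER": ["female", "other", "female"],
    "COUNTRY": ["U.S", "Colombia", "Spain"],
})
print(codificar_categoricas(usuarios_demo).to_numpy().tolist())
# [[73, 0, 0], [52, 1, 1], [31, 0, 2]]
```

One-hot encoding (for example with `pd.get_dummies`) is a common alternative when the codes should not imply any order.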

In [11]:
k = 50
lista_usuarios = [1, 470, 1241, 3044, 5758, 8105, 8899, 10597, 17391, 19821]

for usuario in lista_usuarios:
    imprimir_usuario(usuario)
    recomendaciones = recomendar_chistes_por_usuario(usuario, k)
    imprimir_chistes(recomendaciones)

Usuario #2
AGE: 73		GENDER: female	COUNTRY: U.S	LANGUAJE: English	CATEGORY: Adventure

[5]

1

Chistes recomendados:

1. Bill & Hillary are on a trip back to Arkansas. They're almost out of gas, so Bill pulls into a service station on the outskirts of town. The attendant runs out of the station to serve them when Hillary realizes it's an old boyfriend from high school. She and the attendant chat as he gases up their car and cleans the windows. Then they all say good-bye.   As Bill pulls the car onto the road, he turns to Hillary and says, 'Now aren't you glad you married me and not him ? You could've been the wife of a grease monkey !'   To which Hillary replied, 'No, Bill. If I would have married him, you'd be pumping gas and he would be the President !'

Recomendaciones totales: 1


Usuario #471
AGE: 52		GENDER: other	COUNTRY: Colombia	LANGUAJ

[14, 70, 72, 75, 77, 79, 82, 87, 89, 91, 92, 95, 96, 97, 98, 99]

16

Chistes recomendados:

1. Q:  What did the blind person say when given some matzah?  A:  Who the hell wrote this?
2. At a recent Sacramento PC Users Group meeting, a company was demonstrating its latest speech- recognition software.   A representative from the company was just about ready to start the demonstration and asked everyone in the room to quiet down.  Just then someone in the back of the room yelled, "Format C: Return."  Someone else chimed in: "Yes, Return"  Unfortunately, the software worked.
3. Q: What is the difference between George  Washington, Richard Nixon, and Bill Clinton?  A: Washington couldn't tell a lie, Nixon couldn't   tell the truth, and Clinton doesn't know the difference.
4. There once was a man and a woman that both  got in  a terrible car wreck. Both of their vehicles   were completely destroyed, buy fortunately, no one  was   hurt.

[70, 72, 79, 93, 96, 99]

6

Chistes recomendados:

1. At a recent Sacramento PC Users Group meeting, a company was demonstrating its latest speech- recognition software.   A representative from the company was just about ready to start the demonstration and asked everyone in the room to quiet down.  Just then someone in the back of the room yelled, "Format C: Return."  Someone else chimed in: "Yes, Return"  Unfortunately, the software worked.
2. Q: What is the difference between George  Washington, Richard Nixon, and Bill Clinton?  A: Washington couldn't tell a lie, Nixon couldn't   tell the truth, and Clinton doesn't know the difference.
3. Hillary, Bill Clinton and the Pope are sitting together on an airplane.  Bill says "I could throw one thousand dollar bill out of this plane and make one person very happy."  Hillary says "I could throw 10 hundred dollar bills out of the plane and make 10 people very happy."  The Pope chips in and says "