# Final Project

***Team 07***

- Aide Jazmín González Cruz
- Elena Villalobos Nolasco
- Carolina Acosta Tovany

#### Instructions

The final project/exam consists of:

Implementing the collaborative filtering algorithm using the methodology covered in class (work based on any other methodology will not be graded).

All machine learning algorithms used must have been written by you. You may only use Transformers and helper functions from scikit-learn (to split the data into training and test sets, to run cross-validation, etc.), but no estimators (logistic regression, support vector machines, k-means, etc.).

You must explain how you obtained the k used to generate the final result.

The files with the small set of ratings and movies will be used, available at https://www.kaggle.com/rounakbanik/the-movies-dataset:

- **links_small.csv**: Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.

- **ratings_small.csv**: The subset of 100,000 ratings from 700 users on 9,000 movies.

To improve the grade (optional, extra credit), you may use the algorithms developed in the course assignments and the relevant data (the records that match the data above) contained in the files:

- **movies_metadata.csv**: The main Movies Metadata file. Contains information on 45,000 movies featured in the Full MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production countries and companies.

- **keywords.csv**: Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified JSON Object.

- **credits.csv**: Consists of Cast and Crew Information for all our movies. Available in the form of a stringified JSON Object.

The algorithm's performance will be measured with NDCG

(https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG)
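
As a rough illustration of the metric, the following is a minimal sketch of NDCG at a cutoff k, assuming the common `rel / log2(position + 1)` form of DCG. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical and not part of the assignment.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevances, in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    # Position i (1-based) is discounted by log2(i + 1)
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # An all-zero relevance list has no ideal gain; return 0 by convention
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# True ratings of the recommended movies, listed in the order they were recommended
score = ndcg_at_k([1, 5], k=2)
```

Here `relevances` would be the true ratings of the recommended movies, so a list that is already sorted from best to worst scores 1.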

Once the ratings matrix has been obtained, the program must be able to return the top 5 recommendations for the user or users queried.

The project will be submitted as a Jupyter notebook. The readme file must contain the instructions for running the code. Make sure that the program runs without errors when those instructions are followed.

It must be uploaded to the proyecto_final/equipo_xx folder in the GitHub repository before 7:00 am on the day of the final exam (December 14, 2020).

In [83]:
# Import the required packages
import pandas as pd
import numpy as np
import random

The objective function:
    
$$J(X) = \frac{1}{2} \displaystyle\sum_{(a,i)\in\mathbb{D}} \left(Y_{ai}-\left [ UV^T \right ]_{ai} \right)^2 + \frac{\lambda}{2} \displaystyle\sum_{a=1}^n \displaystyle\sum_{j=1}^k U_{aj}^2 + \frac{\lambda}{2} \displaystyle\sum_{i=1}^m \displaystyle\sum_{j=1}^k V_{ij}^2$$

In [1]:
def load_data():
    """
    Load the data.
    Returns the user x movie ratings matrix as a DataFrame
    (NaN where a user has not rated a movie).
    """
    
    # Load the data
    links_small = pd.read_csv('links_small.csv')
    ratings_small = pd.read_csv('ratings_small.csv')
    
    # Movies in the catalog that no user has rated
    #df_mov_u = pd.DataFrame(ratings_small['movieId'])
    #df_mov = pd.DataFrame(links_small['movieId'])

    #common = df_mov.merge(df_mov_u, on=["movieId"])
    #result = df_mov[~df_mov.movieId.isin(common.movieId)]
    
    # Build the matrix Y_ai
    y_ia = links_small.set_index('movieId').join(ratings_small.set_index('movieId'))
    y_ia = y_ia.reset_index()
    #y_ia.pivot(index="userId", columns="movieId", values="rating") 
    y_ia = pd.DataFrame(y_ia.pivot(index='userId', columns='movieId', values='rating'))
    y_ia = pd.DataFrame(y_ia.to_records())
    # Drop the NaN user (row produced by movies without any rating)
    y_ia = y_ia[pd.notnull(y_ia['userId'])]
    # Drop the first column, which holds userId
    y_ia = y_ia.drop(['userId'], axis=1)
    # Replace NaN with zeros
    #y_ia[np.isnan(y_ia)] = 0
    
    return y_ia


In [9]:
Y = load_data()
Y.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,161830,161918,161944,162376,162542,162672,163056,163949,164977,164979
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,4.0,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,4.0,...,,,,,,,,,,
5,,,4.0,,,,,,,,...,,,,,,,,,,
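
To make the shape of `Y` concrete, here is a small sketch of the same long-to-wide pivot on made-up data. The toy values and the name `Y_toy` are assumptions for illustration only; the column names follow `ratings_small.csv`.

```python
import pandas as pd

# Toy ratings in the same long format as ratings_small.csv (values are made up)
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0],
})

# One row per user, one column per movie; unrated (user, movie) pairs become NaN
Y_toy = ratings.pivot(index="userId", columns="movieId", values="rating")
```

Each NaN in the result marks a pair outside the observed set, which is what the factorization is later meant to predict.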


In [10]:
maxValues = Y.max() 
maxValues = pd.DataFrame(maxValues)
maxValues = maxValues.max() 
maxValues

0    5.0
dtype: float64

In [11]:
minValues = Y.min() 
minValues = pd.DataFrame(minValues)
minValues = minValues.min() 
minValues

0    0.5
dtype: float64
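
The same global bounds can also be read in one step from the underlying array. This is a minimal sketch on a made-up frame (`Y_demo` is a hypothetical stand-in for `Y`), using NumPy's NaN-aware reductions.

```python
import numpy as np
import pandas as pd

# Stand-in for the ratings matrix: NaN marks a missing rating (values are made up)
Y_demo = pd.DataFrame([[np.nan, 4.0], [0.5, np.nan], [5.0, 3.0]])

values = Y_demo.to_numpy(dtype=float)
max_rating = np.nanmax(values)  # largest observed rating, ignoring NaN
min_rating = np.nanmin(values)  # smallest observed rating, ignoring NaN
```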

In [12]:
# Replace NaN with zeros
Y_0 = Y.copy()
Y_0[np.isnan(Y_0)] = 0
Y_0

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,161830,161918,161944,162376,162542,162672,163056,163949,164977,164979
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
667,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
668,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
669,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
670,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [13]:
# Boolean mask of missing ratings (True where NaN)
Y.isna()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,161830,161918,161944,162376,162542,162672,163056,163949,164977,164979
1,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
2,True,True,True,True,True,True,True,True,True,False,...,True,True,True,True,True,True,True,True,True,True
3,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
4,True,True,True,True,True,True,True,True,True,False,...,True,True,True,True,True,True,True,True,True,True
5,True,True,False,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
667,True,True,True,True,True,False,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
668,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
669,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
670,False,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True


In [14]:
# Small test slice (first 5 users, first 10 movies); copy so later edits do not touch Y
Prueba = Y.iloc[:5, :10].copy()
Prueba

Unnamed: 0,1,2,3,4,5,6,7,8,9,10
1,,,,,,,,,,
2,,,,,,,,,,4.0
3,,,,,,,,,,
4,,,,,,,,,,4.0
5,,,4.0,,,,,,,


In [15]:
# Set two extra ratings for the second user (single-step .iloc avoids chained assignment)
Prueba.iloc[1, 1] = 1
Prueba.iloc[1, 4] = 5
Prueba

Unnamed: 0,1,2,3,4,5,6,7,8,9,10
1,,,,,,,,,,
2,,1.0,,,5.0,,,,,4.0
3,,,,,,,,,,
4,,,,,,,,,,4.0
5,,,4.0,,,,,,,


In [16]:
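# Seeding attempt: random.seed() seeds Python's `random` module, not NumPy,
# so it does not make the np.random.randint draw below reproducible
random.seed(10)
random_matrix = np.random.randint(1,5,(5,10))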
random_matrix = pd.DataFrame(random_matrix)
random_matrix

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,4,1,1,3,3,3,3,3,1,1
1,2,3,1,1,2,4,1,2,2,3
2,2,1,3,2,1,2,3,2,4,4
3,4,2,4,1,1,1,4,2,2,1
4,2,1,4,2,4,4,2,4,2,2
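
Since `random.seed` only affects Python's built-in `random` module, a reproducible NumPy draw needs NumPy's own seeding. The following is a minimal sketch using `np.random.default_rng`; the names `rng_a`, `rng_b`, `m_a` and `m_b` are illustrative, and the values it produces will differ from the matrix shown above.

```python
import numpy as np

# Two generators built from the same seed produce identical draws
rng_a = np.random.default_rng(10)
rng_b = np.random.default_rng(10)

# Integers in [1, 5), i.e. 1..4, the same range as np.random.randint(1, 5, ...)
m_a = rng_a.integers(1, 5, size=(5, 10))
m_b = rng_b.integers(1, 5, size=(5, 10))
```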


In [18]:
Prueba_bool = Prueba.isna()
Prueba_bool

Unnamed: 0,1,2,3,4,5,6,7,8,9,10
1,True,True,True,True,True,True,True,True,True,True
2,True,False,True,True,False,True,True,True,True,False
3,True,True,True,True,True,True,True,True,True,True
4,True,True,True,True,True,True,True,True,True,False
5,True,True,False,True,True,True,True,True,True,True


In [19]:
random_matrix.iloc[1][1]*Prueba_bool.iloc[1][1]
random_matrix.iloc[1]

0    2
1    3
2    1
3    1
4    2
5    4
6    1
7    2
8    2
9    3
Name: 1, dtype: int64

In [20]:
nvos = []
for i in range(len(random_matrix.iloc[1])):
    # Keep the predicted rating only where the user has not rated the movie (mask is True)
    nvos.append([i+1, random_matrix.iloc[1, i]*Prueba_bool.iloc[1, i]])
    
nvos = pd.DataFrame(nvos)
nvos.columns = ['id_movie','rating_recom']
nvos = nvos.sort_values(by=['rating_recom'], ascending=False)
nvos

Unnamed: 0,id_movie,rating_recom
5,6,4
0,1,2
7,8,2
8,9,2
2,3,1
3,4,1
6,7,1
1,2,0
4,5,0
9,10,0


In [21]:
nvos = nvos[(nvos[['rating_recom']] != 0).all(axis=1)]
nvos

Unnamed: 0,id_movie,rating_recom
5,6,4
0,1,2
7,8,2
8,9,2
2,3,1
3,4,1
6,7,1


In [22]:
nvos['id_movie'].head(5).to_numpy()

array([6, 1, 8, 9, 3])

In [23]:
recomendaciones = nvos.iloc[[0,1,2,3,4], : 1].to_numpy()
recomendaciones

array([[6],
       [1],
       [8],
       [9],
       [3]])

In [24]:
np.concatenate(recomendaciones, axis=0 )

array([6, 1, 8, 9, 3])
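
The steps above (mask out rated movies, sort, keep five ids) can be gathered into one helper. This is a sketch under the assumption that predictions and the rated mask come as one row per user; `top_n_unseen` is a hypothetical name, and the inputs reuse the second row of `random_matrix` and the ratings of the second user of `Prueba`.

```python
import numpy as np

def top_n_unseen(predicted, rated_mask, movie_ids, n=5):
    """Return the ids of the n highest predicted ratings among unrated movies."""
    predicted = np.asarray(predicted, dtype=float)
    rated_mask = np.asarray(rated_mask, dtype=bool)
    movie_ids = np.asarray(movie_ids)
    # Already-rated movies get -inf so they sort last
    scores = np.where(rated_mask, -np.inf, predicted)
    # Stable sort on negated scores: ties keep their original column order
    order = np.argsort(-scores, kind="stable")
    # Never return more ids than there are unrated movies
    n_unseen = int((~rated_mask).sum())
    return movie_ids[order[:min(n, n_unseen)]]

predicted = [2, 3, 1, 1, 2, 4, 1, 2, 2, 3]
rated = [False, True, False, False, True, False, False, False, False, True]
top5 = top_n_unseen(predicted, rated, np.arange(1, 11))
```

With these inputs `top5` holds the same ids as the cell above, `[6, 1, 8, 9, 3]`.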

In [28]:
users, movies = Y_0.shape

We use $k = 1$

In [29]:
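# Seeding attempt: random.seed() seeds Python's `random` module, not NumPy,
# so the np.random.randint draws below are not made reproducible by it
random.seed(0)
# Create the vector V (movies) at random
V = np.random.randint(1,9,size = (1,movies))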
V.shape

(1, 9125)

In [30]:
# Create the vector U (users) at random
U = np.random.randint(1,9,size = (1,users))
U.shape

(1, 671)

In [66]:
import sympy
from sympy import solve

The objective function:
    
$$J(X) = \frac{1}{2} \displaystyle\sum_{(a,i)\in\mathbb{D}} \left(Y_{ai}-\left [ UV^T \right ]_{ai} \right)^2 + \frac{\lambda}{2} \displaystyle\sum_{a=1}^n \displaystyle\sum_{j=1}^k U_{aj}^2 + \frac{\lambda}{2} \displaystyle\sum_{i=1}^m \displaystyle\sum_{j=1}^k V_{ij}^2$$

#### Formulas to apply

$$x_u^T = r_uY\left(Y^TY+\lambda_xI \right)^{-1}$$

$$y_i^T = r_iX\left(X^TX+\lambda_yI \right)^{-1}$$
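
A short derivation sketch of where the first update comes from. It assumes, since the notebook does not state it explicitly, that $x_u$ is row $a$ of $U$ (written $U_a$, a $1 \times k$ row vector), that the $Y$ in the update formulas is the item-factor matrix $V$ restricted to the movies rated by that user (written $V_{\mathbb{D}_a}$, and not the ratings matrix $Y_{ai}$ of the objective), that $r_u$ is the row vector of that user's observed ratings, and that $\lambda_x = \lambda$. Holding $V$ fixed, the terms of $J$ that depend on $U_a$ are

$$J_a(U_a) = \frac{1}{2} \displaystyle\sum_{i:(a,i)\in\mathbb{D}} \left(Y_{ai}-U_aV_i^T \right)^2 + \frac{\lambda}{2} \displaystyle\sum_{j=1}^k U_{aj}^2$$

Setting the gradient with respect to $U_a$ to zero gives

$$-\displaystyle\sum_{i:(a,i)\in\mathbb{D}} \left(Y_{ai}-U_aV_i^T \right)V_i + \lambda U_a = 0 \quad\Longrightarrow\quad U_a\left(V_{\mathbb{D}_a}^TV_{\mathbb{D}_a}+\lambda I \right) = r_uV_{\mathbb{D}_a}$$

$$U_a = r_uV_{\mathbb{D}_a}\left(V_{\mathbb{D}_a}^TV_{\mathbb{D}_a}+\lambda I \right)^{-1}$$

which has the same form as the formula for $x_u^T$. The formula for $y_i^T$ follows the same way with the roles of $U$ and $V$ exchanged.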

In [115]:
Y = pd.DataFrame([[5,np.NaN,7],[1,2,np.NaN]])
Y

Unnamed: 0,0,1,2
0,5,,7.0
1,1,2.0,


In [116]:
r,c = Y.shape
print("rows: ", r, " cols: ", c)

rows:  2  cols:  3


In [117]:
# Boolean mask to handle NaN values (True where a rating is observed)
Y_0 = Y.notna()
Y_0

Unnamed: 0,0,1,2
0,True,False,True
1,True,True,False


In [118]:
# Seeding attempt
# random.seed(0) -> does not work (it seeds Python's `random`, not NumPy)
# UNCOMMENT WHEN V SHOULD BE RANDOM
# Create the matrix V (movies) at random
k = 2
#V = np.random.randint(1, 9, size=(k, c))
#pd.DataFrame(V)

In [120]:
# Fixed values for this test
Vi = pd.DataFrame([[2,7,8]])
Vi

Unnamed: 0,0,1,2
0,2,7,8


In [123]:
# Stack Vi twice so V has one row per user (DataFrame.append was removed in pandas 2.0)
V = pd.concat([Vi, Vi], ignore_index=True)
V

Unnamed: 0,0,1,2
0,2,7,8
1,2,7,8


In [124]:
# Set to 0 the entries that must not enter the products (unobserved ratings)
Vaux = Y_0*V
Vaux

Unnamed: 0,0,1,2
0,2,0,8
1,2,7,0


In [125]:
def merge_list_f(lists):
    merged_list = []
    for l in lists:
        merged_list += l
    return merged_list

In [126]:
# Quick check of merge_list_f: flatten a list of lists
Aux = [[1],[2],[3]]
Aux = merge_list_f(Aux)
Aux

[1, 2, 3]

In [127]:
Vaux

Unnamed: 0,0,1,2
0,2,0,8
1,2,7,0


$$x_u^T = r_uY\left(Y^TY+\lambda_xI\right)^{-1}$$

In [114]:
# Build the vectors without NaN, v1 for k = 1, test 1 (incomplete; the complete version is below)
R = []
Yu = []
Aux = []
Aux_u = []
# Holding V (Y) fixed
for row in range(r):
    for col in range(c):
        #print("V: ",Vaux.iloc[row, col])
        if ((Vaux.iloc[row, col]) != 0):
            #print("Y: ",Y.iloc[row, col])
            Aux.append(Y.iloc[row, col])   
            Aux_u.append(Vaux.iloc[row, col])          
    
    #print("Aux: ",Aux)
    R.append(Aux)
    Yu.append(Aux_u)
    #print("r: ",R)
    Aux = []
    Aux_u = []

print("r: ",R," Yu: ",Yu)

r:  [[5, 7.0], [1, 2.0]]  Yu:  [[2, 8], [2, 7]]


### Extending the functionality to general k

In [95]:
# For k = 1
# Fixed values for this test
Vi = pd.DataFrame([[2,7,8]])
Vi

Unnamed: 0,0,1,2
0,2,7,8


In [96]:
# Build the vectors without NaN for k = 1
R = []
Yu = []
Aux = []
Aux_v = [] 
k, c = Vi.shape
Xu = []

In [97]:
def matrix_fac(LamI,R,Yu):
    RY = R*Yu.T
    RY = sum(RY)
    print("RY: ", RY)
    YTY = Yu.T*Yu
    YTY = sum(YTY)
    print("YTY: ", YTY)
    print("YTY+LamI: ", YTY+LamI)
    Xu.append(RY*(1/(YTY+LamI)))
    print("Xu: ",Xu)
    
    return Xu

In [106]:
print("k:",k)
# Holding V (Y) fixed
for row in range(r):
    for col in range(c):
        #print("V: ",Vaux.iloc[row, col])
        if ((Vaux.iloc[row, col]) != 0):
            #print("Y: ",Y.iloc[row, col])
            Aux.append(Y.iloc[row, col]) 
            print("col: ", col)
            print("V[col]: ", Vi[col].tolist())
            Aux_v.append(Vi[col].tolist())
            print("Aux_v: ", Aux_v)
    
    print("Aux: ",Aux)
    R = Aux
    if(k==1):
        Aux_v = merge_list_f(Aux_v)
    print("Aux_v: ", Aux_v)   
    Yu = np.array(Aux_v) 
    print("Yu: ",Yu)
    print("------")
    Aux = []
    Aux_v = []
    
    # Apply the update formula
    RY = R*Yu.T
    RY = sum(RY)
    print("RY: ", RY)
    YTY = Yu.T*Yu
    YTY = sum(YTY)
    print("YTY: ", YTY)
    LamI = 1
    print("YTY+LamI: ", YTY+LamI)
    Xu.append(RY*(1/(YTY+LamI)))
    print("Xu: ",Xu)

k: 2
col:  0
V[col]:  [1, 4]
Aux_v:  [[1, 4]]
col:  2
V[col]:  [3, 6]
Aux_v:  [[1, 4], [3, 6]]
Aux:  [5, 7.0]
Aux_v:  [[1, 4], [3, 6]]
Yu:  [[1 4]
 [3 6]]
------
RY:  [25. 63.]
YTY:  [13 48]
YTY+LamI:  [14 49]
Xu:  [array([1.78571429, 1.28571429]), array([1.78571429, 1.28571429]), array([0.5       , 0.41176471]), array([1.78571429, 1.28571429]), array([0.5       , 0.41176471]), array([1.78571429, 1.28571429]), array([0.5       , 0.41176471]), array([1.78571429, 1.28571429]), array([0.5       , 0.41176471]), array([1.78571429, 1.28571429]), array([0.5       , 0.41176471]), array([1.78571429, 1.28571429])]
col:  0
V[col]:  [1, 4]
Aux_v:  [[1, 4]]
col:  1
V[col]:  [2, 5]
Aux_v:  [[1, 4], [2, 5]]
Aux:  [1, 2.0]
Aux_v:  [[1, 4], [2, 5]]
Yu:  [[1 4]
 [2 5]]
------
RY:  [ 5. 14.]
YTY:  [ 9 33]
YTY+LamI:  [10 34]
Xu:  [array([1.78571429, 1.28571429]), array([1.78571429, 1.28571429]), array([0.5       , 0.41176471]), array([1.78571429, 1.28571429]), array([0.5       , 0.41176471]), array([1.785

In [99]:
# For k = 2
# Fixed values for this test
Vi = pd.DataFrame([[1,2,3],[4,5,6]])
Vi.shape

(2, 3)

In [140]:
# Build the vectors without NaN for k = 1
R = []
Yu = []
Aux = []
Aux_v = [] 
k, c = Vi.shape
Xu = []

print("k:",k)
# Holding V (Y) fixed
for row in range(r):
    for col in range(c):
        #print("V: ",Vaux.iloc[row, col])
        if ((Vaux.iloc[row, col]) != 0):
            #print("Y: ",Y.iloc[row, col])
            Aux.append(Y.iloc[row, col]) 
            print("col: ", col)
            print("V[col]: ", Vi[col].tolist())
            Aux_v.append(Vi[col].tolist())
            print("Aux_v: ", Aux_v)
    
    print("Aux: ",Aux)
    R = Aux
    if(k==1):
        Aux_v = merge_list_f(Aux_v)
    print("Aux_v: ", Aux_v)   
    Yu = np.array(Aux_v) 
    print("Yu: ",Yu)
    print("------")
    Aux = []
    Aux_v = []
    
    # Apply the update formula
    RY = R*Yu.T
    RY = sum(RY)
    print("RY: ", RY)
    YTY = Yu.T*Yu
    YTY = sum(YTY)
    print("YTY: ", YTY)
    LamI = 1
    print("YTY+LamI: ", YTY+LamI)
    Xu.append(RY*(1/(YTY+LamI)))
    print("Xu: ",Xu)

k: 1
col:  0
V[col]:  [2]
Aux_v:  [[2]]
col:  2
V[col]:  [8]
Aux_v:  [[2], [8]]
Aux:  [5, 7.0]
Aux_v:  [2, 8]
Yu:  [2 8]
------
RY:  66.0
YTY:  68
YTY+LamI:  69
Xu:  [0.9565217391304348]
col:  0
V[col]:  [2]
Aux_v:  [[2]]
col:  1
V[col]:  [7]
Aux_v:  [[2], [7]]
Aux:  [1, 2.0]
Aux_v:  [2, 7]
Yu:  [2 7]
------
RY:  16.0
YTY:  53
YTY+LamI:  54
Xu:  [0.9565217391304348, 0.2962962962962963]


In [141]:
Xu

[0.9565217391304348, 0.2962962962962963]
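
For $k > 1$ the term $Y^TY+\lambda_xI$ is a $k \times k$ matrix, so the update needs a matrix product and a linear solve rather than element-wise products. The following is a minimal sketch of that matrix form; `solve_user_factors` and its parameter names are hypothetical, and `lam=1.0` mirrors the `LamI = 1` used above. For $k = 1$ it reproduces the two values in `Xu`.

```python
import numpy as np

def solve_user_factors(ratings, item_factors, lam=1.0):
    """Ridge solution x = (F^T F + lam I)^{-1} F^T r for one user.

    ratings: observed ratings of the user, shape (n_rated,)
    item_factors: factors of those rated items, shape (n_rated, k)
    """
    r = np.asarray(ratings, dtype=float)
    F = np.asarray(item_factors, dtype=float).reshape(r.size, -1)
    k = F.shape[1]
    A = F.T @ F + lam * np.eye(k)  # (k, k), symmetric, invertible for lam > 0
    b = F.T @ r                    # (k,)
    # Solving A x = b avoids forming the inverse explicitly
    return np.linalg.solve(A, b)

x0 = solve_user_factors([5, 7], [[2], [8]])  # first user, movies 0 and 2
x1 = solve_user_factors([1, 2], [[2], [7]])  # second user, movies 0 and 1
x_k2 = solve_user_factors([5, 7], [[1, 4], [3, 6]])  # same user with k = 2
```

Because $A$ is symmetric, solving $Ax = F^Tr$ is equivalent to the row-vector form $x^T = rF\,A^{-1}$ used in the formula.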

In [142]:
R

[1, 2.0]

In [46]:
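# Completed from the objective J above; `lam` (default 1.0) is an assumed parameter name
def costo(Yai, U, V, lam=1.0):
    """Regularized squared-error cost J; U is n x k, V is m x k, NaN in Yai = unobserved."""
    Yai = np.asarray(Yai, dtype=float)
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    UV = U @ V.T
    
    # Only the observed ratings contribute to the squared-error term
    observed = ~np.isnan(Yai)
    error = 0.5 * np.sum((Yai[observed] - UV[observed]) ** 2)
    
    # Regularization terms: sums of U^2 and V^2 over all entries
    reg = (lam / 2) * (np.sum(U ** 2) + np.sum(V ** 2))
    return error + reg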

In [239]:
import numpy

def matrix_factorization(R, X_u, Y_i, K, steps=5000, alpha=0.0002, beta=0.02):
    '''
    R: rating matrix
    X_u: |U| * K (User features matrix)
    Y_i: |D| * K (Item features matrix)
    K: latent features
    steps: iterations
    alpha: learning rate
    beta: regularization parameter'''
    Y_i = Y_i.T

    for step in range(steps):
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    # calculate error
                    eij = R[i][j] - numpy.dot(X_u[i,:],Y_i[:,j])

                    for k in range(K):
                        # gradient step with learning rate alpha and regularization beta
                        X_u[i][k] = X_u[i][k] + alpha * (2 * eij * Y_i[k][j] - beta * X_u[i][k])
                        Y_i[k][j] = Y_i[k][j] + alpha * (2 * eij * X_u[i][k] - beta * Y_i[k][j])

        eR = numpy.dot(X_u,Y_i)

        # total regularized squared error over the observed (non-zero) ratings
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(X_u[i,:],Y_i[:,j]), 2)
                    for k in range(K):
                        e = e + (beta/2) * (pow(X_u[i][k],2) + pow(Y_i[k][j],2))
        # stop early once the total error falls below 0.001
        if e < 0.001:
            break

    return X_u, Y_i.T

In [240]:

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    '''
    R: rating matrix
    P: |U| * K (User features matrix)
    Q: |D| * K (Item features matrix)
    K: latent features
    steps: iterations
    alpha: learning rate
    beta: regularization parameter'''
    Q = Q.T

    for step in range(steps):
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    # calculate error
                    eij = R[i][j] - numpy.dot(P[i,:],Q[:,j])

                    for k in range(K):
                        # gradient step with learning rate alpha and regularization beta
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])

        eR = numpy.dot(P,Q)

        # total regularized squared error over the observed (non-zero) ratings
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:],Q[:,j]), 2)
                    for k in range(K):
                        e = e + (beta/2) * (pow(P[i][k],2) + pow(Q[k][j],2))
        # stop early once the total error falls below 0.001
        if e < 0.001:
            break

    return P, Q.T

In [256]:
R = [
     [5,3,0,1],
     [4,0,0,1],
     [1,1,0,5],
     [1,0,0,4],
     [0,1,5,4],
     [2,1,3,0],
    ]

In [257]:
R = numpy.array(Vaux)
R

array([[2, 0, 8],
       [2, 7, 0]])

In [258]:
# N: num of User
N = len(R)
# M: num of Movie
M = len(R[0])
# Num of Features
K = 3

In [259]:
R

array([[2, 0, 8],
       [2, 7, 0]])

In [260]:
N

2

In [261]:
M

3

In [262]:
K

3

In [263]:
P = numpy.random.rand(N,K)
Q = numpy.random.rand(M,K)

In [264]:
P

array([[0.86516585, 0.77237763, 0.00971461],
       [0.57301982, 0.66846116, 0.63967451]])

In [265]:
Q

array([[0.26845103, 0.94326522, 0.55720278],
       [0.85181374, 0.84351898, 0.46394568],
       [0.07900452, 0.82508552, 0.13731353]])

In [266]:
nP, nQ = matrix_factorization(R, P, Q, K)

In [267]:
nR = numpy.dot(nP, nQ.T)

In [268]:
nR

array([[2.00023742, 7.00336383, 7.98503721],
       [1.99252525, 6.9862109 , 6.42507528]])
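
One way to judge the reconstruction is the error on the entries that were actually observed. This is a small sketch that copies `R` and `nR` from the cells above and, like `matrix_factorization`, treats 0 as "not rated"; the names `observed` and `rmse` are illustrative.

```python
import numpy as np

# Values copied from the notebook output above
R = np.array([[2, 0, 8],
              [2, 7, 0]], dtype=float)
nR = np.array([[2.00023742, 7.00336383, 7.98503721],
               [1.99252525, 6.9862109, 6.42507528]])

# Only non-zero entries of R were used for fitting
observed = R > 0
rmse = np.sqrt(np.mean((R[observed] - nR[observed]) ** 2))
```

The entries of `nR` at the zero positions of `R` are the model's predictions for unrated movies, so they are left out of the error.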

In [254]:
nP

array([[2.04046265, 1.37290485, 1.13810573],
       [1.08204177, 1.82129759, 1.66677923]])

In [255]:
nQ

array([[0.49556667, 0.36048201, 0.46142083],
       [1.2942345 , 1.70805964, 1.487554  ],
       [2.28661161, 1.35692954, 1.27566496]])
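
For comparison with the gradient-descent routine, the closed-form updates for $x_u$ and $y_i$ stated earlier can be alternated until the factors settle. The following is a minimal sketch of that alternating scheme, not the team's final implementation; the names `cost` and `als`, the defaults `k=2`, `lam=1.0`, `n_iters=20`, and the random initialization are all assumptions.

```python
import numpy as np

def cost(Y, U, V, lam):
    """Objective J: squared error on observed entries plus L2 penalties."""
    mask = ~np.isnan(Y)
    err = (Y - U @ V.T)[mask]
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

def als(Y, k=2, lam=1.0, n_iters=20, seed=0):
    """Alternate exact ridge solves for the rows of U and V (NaN = unobserved)."""
    Y = np.asarray(Y, dtype=float)
    mask = ~np.isnan(Y)
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    U = rng.random((n, k))
    V = rng.random((m, k))
    reg = lam * np.eye(k)
    for _ in range(n_iters):
        for a in range(n):  # update each user with V held fixed
            Va = V[mask[a]]
            U[a] = np.linalg.solve(Va.T @ Va + reg, Va.T @ Y[a, mask[a]])
        for i in range(m):  # update each movie with U held fixed
            Ui = U[mask[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + reg, Ui.T @ Y[mask[:, i], i])
    return U, V

# Same 2 x 3 toy matrix used in the tests above
Y_toy = np.array([[5, np.nan, 7],
                  [1, 2, np.nan]])
U, V = als(Y_toy, k=2, lam=1.0)
Y_hat = U @ V.T  # predicted ratings, including the unobserved entries
```

Each row update minimizes $J$ exactly with the other factor fixed, so the cost cannot increase from one sweep to the next.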