# Movie Recommendation System

#### Create the session

In [0]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('recPel').getOrCreate()

#### Collaborative Filtering in Spark

Collaborative filtering is commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. spark.ml currently supports model-based collaborative filtering, in which users and products are described by a small set of latent factors that can be used to predict missing entries. spark.ml uses the alternating least squares (ALS) algorithm to learn these latent factors. The implementation in spark.ml has the following parameters:

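- numBlocks is the number of blocks the users and items will be partitioned into in order to parallelize computation (defaults to 10). In spark.ml it is set through numUserBlocks and numItemBlocks; in the older RDD-based spark.mllib API, a value of -1 requests an auto-configured number of blocks.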
- rank is the number of latent factors in the model (defaults to 10).
- maxIter is the maximum number of iterations to run (defaults to 10).
- regParam specifies the regularization parameter in ALS (defaults to 0.1 in the spark.ml implementation).
- implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data (defaults to false which means using explicit feedback).
- alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations (defaults to 1.0).
- nonnegative specifies whether or not to use nonnegative constraints for least squares (defaults to false).
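
To make the alternating updates concrete, here is a minimal pure-Python sketch of ALS restricted to a single latent factor (rank 1). It is not Spark's implementation: the function name `als_rank1`, the toy ratings, and the plain (unscaled) regularization term are assumptions chosen for illustration. The arguments `reg` and `max_iter` play the roles of regParam and maxIter above.

```python
# Rank-1 ALS on a tiny explicit-feedback matrix (illustrative sketch only).

def als_rank1(ratings, n_users, n_items, reg=0.01, max_iter=20):
    """Fit one latent factor per user and per item by alternating least squares.

    ratings maps (user, item) -> observed rating; missing pairs are simply absent.
    """
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(max_iter):
        # Hold item factors fixed; each user's factor has a closed-form solution.
        for u in range(n_users):
            seen = [(i, r) for (uu, i), r in ratings.items() if uu == u]
            num = sum(r * item_f[i] for i, r in seen)
            den = sum(item_f[i] ** 2 for i, _r in seen) + reg
            user_f[u] = num / den
        # Hold user factors fixed; solve for each item's factor the same way.
        for i in range(n_items):
            seen = [(u, r) for (u, ii), r in ratings.items() if ii == i]
            num = sum(r * user_f[u] for u, r in seen)
            den = sum(user_f[u] ** 2 for u, _r in seen) + reg
            item_f[i] = num / den
    return user_f, item_f


# Hypothetical ratings generated from user factors [1, 1.5, 0.5] and
# item factors [1, 2, 3]; the entry (user 2, item 2) is left out on purpose.
ratings = {
    (0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0,
    (1, 0): 1.5, (1, 1): 3.0, (1, 2): 4.5,
    (2, 0): 0.5, (2, 1): 1.0,
}
user_f, item_f = als_rank1(ratings, n_users=3, n_items=3)

# A predicted rating is the product (dot product, for rank > 1) of the factors.
predicted = user_f[2] * item_f[2]
print(predicted)
```

The missing entry would be 1.5 in the underlying rank-1 matrix, so the prediction should come out close to that value, shrunk slightly by the regularization term.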

Note: The DataFrame-based API for ALS currently only supports integers for user and item ids. Other numeric types are supported for the user and item id columns, but the ids must be within the integer value range.
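
When raw ids are strings or exceed the 32-bit integer range, one common workaround is to map them to dense integers before fitting. The sketch below shows the idea in plain Python; the helper names `fits_int32` and `build_id_index` and the sample ids are hypothetical and not part of Spark.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def fits_int32(value):
    """Return True if value lies within the signed 32-bit integer range."""
    return INT_MIN <= value <= INT_MAX


def build_id_index(raw_ids):
    """Assign each distinct raw id a dense integer, starting at 0, in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


raw_user_ids = ["u-9000000000", "u-17", "u-9000000000", "u-42"]  # hypothetical ids
user_index = build_id_index(raw_user_ids)
encoded = [user_index[raw] for raw in raw_user_ids]
print(encoded)
```

Keeping `user_index` around lets you translate the integer ids in the model output back to the original identifiers.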

In [0]:
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

#### Load the data

In [0]:
ruta = 'dbfs:/FileStore/shared_uploads/jgamarramoreno@gmail.com/movielens_ratings.csv'

In [0]:
datos = spark.read.csv(ruta,inferSchema=True,header=True)

In [0]:
datos.head()

Out[5]: Row(movieId=2, rating=3.0, userId=0)

In [0]:
datos.describe().show()

+-------+------------------+------------------+------------------+
|summary|           movieId|            rating|            userId|
+-------+------------------+------------------+------------------+
|  count|              1501|              1501|              1501|
|   mean| 49.40572951365756|1.7741505662891406|14.383744170552964|
| stddev|28.937034065088994| 1.187276166124803| 8.591040424293272|
|    min|                 0|               1.0|                 0|
|    max|                99|               5.0|                29|
+-------+------------------+------------------+------------------+



#### Split into training and test sets

In [0]:
(datos_ent,datos_pru) = datos.randomSplit([0.8,0.2])
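
The split above is random, so the rows in each set (and every metric computed later) change from run to run. `randomSplit` accepts an optional `seed` argument, for example `datos.randomSplit([0.8, 0.2], seed=42)`, to make the split repeatable. The plain-Python sketch below illustrates the idea of a seeded, per-row random split; the function `random_split` is a hypothetical stand-in, not Spark's implementation.

```python
import random


def random_split(rows, train_fraction=0.8, seed=None):
    """Send each row to train with probability train_fraction, otherwise to test."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return train, test


rows = list(range(1501))  # stand-in for the 1501 rating rows
train_a, test_a = random_split(rows, seed=42)
train_b, test_b = random_split(rows, seed=42)

# Same seed, same split; sizes are only approximately 80/20.
print(train_a == train_b, len(train_a), len(test_a))
```

Because each row is assigned independently, the resulting sizes are approximate rather than exactly 80% and 20%.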

#### Build the recommendation model using ALS

In [0]:
als = ALS(userCol='userId',itemCol='movieId',ratingCol='rating')

#### Train the model

In [0]:
modelo = als.fit(datos_ent)

#### Evaluate the model

In [0]:
predicciones = modelo.transform(datos_pru)

In [0]:
predicciones.show()

+-------+------+------+----------+
|movieId|rating|userId|prediction|
+-------+------+------+----------+
|      3|   1.0|     0| 0.8710466|
|      4|   3.0|     2| 1.6662902|
|      4|   1.0|     5|  1.306968|
|      5|   1.0|     5| 1.1921194|
|      0|   1.0|     6|  1.388368|
|      2|   2.0|     7| 2.6569526|
|      3|   2.0|     8| 1.8246938|
|      4|   2.0|     8|0.66218996|
|      5|   1.0|     9|0.31576347|
|      0|   1.0|    13|0.71733046|
|      4|   2.0|    13|0.72928184|
|      1|   1.0|    14| 0.9383959|
|      4|   1.0|    14|0.32446116|
|      5|   1.0|    14| 1.1420451|
|      2|   1.0|    15| 1.3487983|
|      1|   1.0|    18| 2.2683802|
|      4|   3.0|    18| 1.2624813|
|      2|   1.0|    19| 1.0199541|
|      1|   1.0|    20| 1.5190728|
|      2|   1.0|    23| 1.3084271|
+-------+------+------+----------+
only showing top 20 rows



In [0]:
evaluador01 = RegressionEvaluator(metricName='rmse',labelCol='rating',
                                  predictionCol='prediction')

In [0]:
evaluador02 = RegressionEvaluator(metricName='r2',labelCol='rating',
                                  predictionCol='prediction')

In [0]:
rmse = evaluador01.evaluate(predicciones)
r2 = evaluador02.evaluate(predicciones)

In [0]:
print("Root mean squared error (RMSE) = ",rmse)
print("R squared = ",r2)

Root mean squared error (RMSE) =  0.9725317073465098
R squared =  0.2781094094290224
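
For readers who want to see what the two metrics measure, here is a minimal sketch that computes them by hand, assuming the usual definitions: RMSE as the square root of the mean squared residual, and R squared as one minus the ratio of residual to total sum of squares. The sample values are the first five rows of the predictions table above, so the results will not match the full test-set figures.

```python
import math


def rmse(labels, preds):
    """Root mean squared error between true ratings and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels))


def r2(labels, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(labels) / len(labels)
    ss_res = sum((y - p) ** 2 for y, p in zip(labels, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in labels)
    return 1.0 - ss_res / ss_tot


# First five (rating, prediction) pairs from the table above.
labels = [1.0, 3.0, 1.0, 1.0, 1.0]
preds = [0.8710466, 1.6662902, 1.306968, 1.1921194, 1.388368]

print("RMSE on sample:", rmse(labels, preds))
print("R squared on sample:", r2(labels, preds))
```

An RMSE near 1 on a 1 to 5 rating scale, together with an R squared well below 1, suggests the default hyperparameters leave room for tuning.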


#### Example of a recommendation for one user

In [0]:
usuario_11 = datos_pru.filter(datos_pru['userId']==11).select(['movieId','userId'])

In [0]:
usuario_11.show()

+-------+------+
|movieId|userId|
+-------+------+
|     10|    11|
|     11|    11|
|     19|    11|
|     38|    11|
|     40|    11|
|     50|    11|
+-------+------+



In [0]:
recomendaciones = modelo.transform(usuario_11)

In [0]:
recomendaciones.orderBy('prediction',ascending=False).show()

+-------+------+----------+
|movieId|userId|prediction|
+-------+------+----------+
|     10|    11| 2.7570848|
|     50|    11| 2.4578009|
|     19|    11|  2.411734|
|     38|    11| 2.2508454|
|     40|    11|   1.16025|
|     11|    11| 1.0556972|
+-------+------+----------+
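
Turning the sorted predictions into an actual recommendation amounts to keeping the top N rows. The sketch below does this in plain Python on the values from the table above; the helper `top_n` is hypothetical and only illustrates the ranking step.

```python
import heapq

# (movieId, prediction) pairs for user 11, copied from the table above.
predictions_user_11 = [
    (10, 2.7570848),
    (50, 2.4578009),
    (19, 2.411734),
    (38, 2.2508454),
    (40, 1.16025),
    (11, 1.0556972),
]


def top_n(predictions, n):
    """Return the n pairs with the highest predicted rating, best first."""
    return heapq.nlargest(n, predictions, key=lambda pair: pair[1])


top3 = top_n(predictions_user_11, 3)
print([movie_id for movie_id, _score in top3])
```

In Spark itself, `limit(3)` after the `orderBy` gives the same result on the DataFrame, and recent versions of `ALSModel` also provide `recommendForAllUsers` and `recommendForUserSubset`, which score every item rather than only the movies that appear in the test set.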

