# **Recommendation system**

A Python project built with scikit-learn and pandas on the MovieLens 100k movie-ratings dataset. It implements recommenders based on user-user similarity, item-item (movie-to-movie) similarity, and content, as well as hybrids that combine these signals with movie popularity.

In [1]:
!pip install pandas scikit-learn scipy numpy



# **Recommendations based on user similarity**

In [2]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
# Build recommendations for an existing user (user-based collaborative filtering)
def recommend_movies(user_id, ratings_matrix, similarity_matrix, top_n=5):
    # Movies the user has not rated yet (NaN in the utility matrix)
    user_ratings = ratings_matrix.loc[user_id]
    unseen_movies = user_ratings[user_ratings.isna()].index

    # Similarity of every user to the target user, used as weights
    sim_scores = similarity_matrix.loc[user_id]

    # Estimate a score for each unseen movie
    scores = {}
    for movie in unseen_movies:
        # Ratings given to this movie by the other users
        movie_ratings = ratings_matrix[movie]

        # Keep only the users who rated it
        mask = movie_ratings.notna()
        relevant_ratings = movie_ratings[mask]
        relevant_sim = sim_scores[mask]

        # Similarity-weighted average of their ratings
        if relevant_sim.sum() > 0:
            score = (relevant_ratings * relevant_sim).sum() / relevant_sim.sum()
            scores[movie] = score

    # Sort and keep the top N
    recommended = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_n]

    # Note: movie_dict is read from the notebook's global scope (defined in a later cell)
    print(f"Recomendaciones para el usuario {user_id}:")
    for movie_id, score in recommended:
        print(f"{movie_dict.get(movie_id, 'Desconocido')} - Estimación: {score:.2f}")
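The similarity-weighted average used above can be checked by hand on a tiny matrix. The following is a minimal sketch on made-up data (the users `u1`-`u3`, the movies `A`-`C`, and all ratings are hypothetical); it estimates one missing rating the same way the function does.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: 3 users x 3 movies; NaN means "not rated"
toy_ratings = pd.DataFrame(
    {"A": [5.0, 4.0, 1.0], "B": [4.0, 5.0, 2.0], "C": [np.nan, 5.0, 1.0]},
    index=["u1", "u2", "u3"],
)

# Cosine similarity on the zero-filled matrix, as in the notebook
toy_sim = pd.DataFrame(
    cosine_similarity(toy_ratings.fillna(0)),
    index=toy_ratings.index,
    columns=toy_ratings.index,
)

# Estimate u1's rating for movie C from the users who did rate it
rated = toy_ratings["C"].notna()
weights = toy_sim.loc["u1", rated]
estimate = (toy_ratings.loc[rated, "C"] * weights).sum() / weights.sum()
print(f"Estimated rating of u1 for C: {estimate:.2f}")
```

Because the estimate is a weighted average of observed ratings with non-negative weights, it always falls between the lowest and highest rating the movie received.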

In [4]:
# Build recommendations for new users (no rating history)
def recommend_for_new_user(ratings_df, movie_dict, top_n=5, min_ratings=50):
    # Mean rating and number of ratings per movie
    movie_stats = ratings_df.groupby('movie_id')['rating'].agg(['mean', 'count'])

    # Keep only movies with enough ratings
    popular_movies = movie_stats[movie_stats['count'] >= min_ratings]

    # Sort by mean rating
    top_movies = popular_movies.sort_values(by='mean', ascending=False).head(top_n)

    # Collect (title, mean rating) pairs
    recommendations = []
    for movie_id, row in top_movies.iterrows():
        recommendations.append((movie_dict.get(movie_id, "Desconocido"), row['mean']))

    print("Recomendaciones para usuario nuevo:")
    for title, avg_rating in recommendations:
        print(f"{title} - Promedio: {avg_rating:.2f}")

In [17]:
# Build hybrid recommendations that blend user similarity with movie popularity
def hybrid_recommendation(user_id, ratings_matrix, similarity_matrix, movie_stats, movie_dict, top_n=5, min_votes=20, alpha=0.7):
    user_ratings = ratings_matrix.loc[user_id]
    unseen_movies = user_ratings[user_ratings.isna()].index

    sim_scores = similarity_matrix.loc[user_id]

    scores = {}
    for movie in unseen_movies:
        movie_ratings = ratings_matrix[movie]
        mask = movie_ratings.notna()

        # Skip movies nobody has rated
        if mask.sum() == 0:
            continue

        relevant_ratings = movie_ratings[mask]
        relevant_sim = sim_scores[mask]

        # Collaborative score: similarity-weighted average rating
        if relevant_sim.sum() > 0:
            collab_score = (relevant_ratings * relevant_sim).sum() / relevant_sim.sum()
        else:
            collab_score = 0

        # Popularity: mean rating, counted only when the movie has at least min_votes ratings
        if movie in movie_stats.index and movie_stats.loc[movie, 'count'] >= min_votes:
            pop_score = movie_stats.loc[movie, 'mean']
        else:
            pop_score = 0

        # Blend both scores; alpha is the weight of the collaborative part
        final_score = alpha * collab_score + (1 - alpha) * pop_score
        scores[movie] = final_score

    # Sort and show the top N
    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_n]
    print(f"Recomendaciones híbridas para el usuario {user_id}:")
    for movie_id, score in ranked:
        print(f"{movie_dict.get(movie_id, 'Desconocido')} - Puntuación combinada: {score:.2f}")

In [5]:
# Load ratings
url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.data"
columns = ['user_id', 'movie_id', 'rating', 'timestamp']
df = pd.read_csv(url, sep='\t', names=columns)
df.head()

,user_id,movie_id,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [6]:
# Build the user-movie (utility) matrix
ratings_matrix = df.pivot_table(index='user_id', columns='movie_id', values='rating')

# Fill NaNs with 0
ratings_matrix_filled = ratings_matrix.fillna(0)

In [7]:
ratings_matrix_filled.head()

movie_id,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
user_id
1,5.0,3.0,4.0,3.0,3.0,5.0,4.0,1.0,5.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,4.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
# Compute cosine similarity between users
user_similarity = cosine_similarity(ratings_matrix_filled)

# Wrap the result in a DataFrame indexed by user_id
user_similarity_df = pd.DataFrame(user_similarity,
                                   index=ratings_matrix.index,
                                   columns=ratings_matrix.index)
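Filling missing ratings with 0 makes "not rated" look like a very low rating to the cosine measure. One common alternative, which this notebook does not use, is to subtract each user's mean rating before filling the gaps. The sketch below compares both variants on hypothetical toy data; the names `toy`, `sim_raw`, and `sim_centered` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: 3 users x 3 movies; NaN means "not rated"
toy = pd.DataFrame(
    {"A": [5.0, 4.0, 1.0], "B": [4.0, np.nan, 2.0], "C": [np.nan, 5.0, 5.0]},
    index=[1, 2, 3],
)

# Variant used in the notebook: zero-fill, then cosine similarity
sim_raw = pd.DataFrame(
    cosine_similarity(toy.fillna(0)), index=toy.index, columns=toy.index
)

# Alternative (an assumption, not the notebook's method): subtract each user's
# own mean rating (NaNs are skipped), then fill the gaps with 0
centered = toy.sub(toy.mean(axis=1), axis=0).fillna(0)
sim_centered = pd.DataFrame(
    cosine_similarity(centered), index=toy.index, columns=toy.index
)

print(sim_raw.round(2))
print(sim_centered.round(2))
```

With non-negative ratings the zero-filled similarity can never be negative, whereas the centered variant can express that two users disagree.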

In [9]:
user_similarity_df.head()

user_id,1,2,3,4,5,6,7,8,9,10,...,934,935,936,937,938,939,940,941,942,943
user_id
1,1.0,0.166931,0.04746,0.064358,0.378475,0.430239,0.440367,0.319072,0.078138,0.376544,...,0.369527,0.119482,0.274876,0.189705,0.197326,0.118095,0.314072,0.148617,0.179508,0.398175
2,0.166931,1.0,0.110591,0.178121,0.072979,0.245843,0.107328,0.103344,0.161048,0.159862,...,0.156986,0.307942,0.358789,0.424046,0.319889,0.228583,0.22679,0.161485,0.172268,0.105798
3,0.04746,0.110591,1.0,0.344151,0.021245,0.072415,0.066137,0.08306,0.06104,0.065151,...,0.031875,0.042753,0.163829,0.069038,0.124245,0.026271,0.16189,0.101243,0.133416,0.026556
4,0.064358,0.178121,0.344151,1.0,0.031804,0.068044,0.09123,0.18806,0.101284,0.060859,...,0.052107,0.036784,0.133115,0.193471,0.146058,0.030138,0.196858,0.152041,0.170086,0.058752
5,0.378475,0.072979,0.021245,0.031804,1.0,0.237286,0.3736,0.24893,0.056847,0.201427,...,0.338794,0.08058,0.094924,0.079779,0.148607,0.071459,0.239955,0.139595,0.152497,0.313941


In [10]:
# Load movie titles
movies = pd.read_csv(
    "http://files.grouplens.org/datasets/movielens/ml-100k/u.item",
    sep='|', encoding='latin-1', header=None, usecols=[0, 1], names=['movie_id', 'title']
)
movie_dict = dict(zip(movies.movie_id, movies.title))

In [11]:
# Recommendations for user 50
recommend_movies(user_id=50, ratings_matrix=ratings_matrix, similarity_matrix=user_similarity_df)

Recomendaciones para el usuario 50:
Great Day in Harlem, A (1994) - Estimación: 5.00
They Made Me a Criminal (1939) - Estimación: 5.00
Prefontaine (1997) - Estimación: 5.00
Marlene Dietrich: Shadow and Light (1996)  - Estimación: 5.00
Star Kid (1997) - Estimación: 5.00


In [12]:
# Recommendations for user 5
recommend_movies(user_id=5, ratings_matrix=ratings_matrix, similarity_matrix=user_similarity_df)

Recomendaciones para el usuario 5:
Saint of Fort Washington, The (1993) - Estimación: 5.00
Great Day in Harlem, A (1994) - Estimación: 5.00
They Made Me a Criminal (1939) - Estimación: 5.00
Marlene Dietrich: Shadow and Light (1996)  - Estimación: 5.00
Star Kid (1997) - Estimación: 5.00


In [13]:
# Recommendations for a new user (no history)
recommend_for_new_user(df, movie_dict, top_n=5)

Recomendaciones para usuario nuevo:
Close Shave, A (1995) - Promedio: 4.49
Schindler's List (1993) - Promedio: 4.47
Wrong Trousers, The (1993) - Promedio: 4.47
Casablanca (1942) - Promedio: 4.46
Wallace & Gromit: The Best of Aardman Animation (1996) - Promedio: 4.45


In [14]:
# Mean rating per movie (no minimum number of ratings is required here)
movie_avg = df.groupby('movie_id')['rating'].mean().sort_values(ascending=False).head(10)

# Show the top movies by mean rating
print("Recomendaciones por popularidad (top promedio):")
for item_id, avg_rating in movie_avg.items():
    movie_name = movie_dict.get(item_id, "Desconocido")
    print(f"{movie_name} - Promedio: {avg_rating:.2f}")

Recomendaciones por popularidad (top promedio):
Prefontaine (1997) - Promedio: 5.00
Santa with Muscles (1996) - Promedio: 5.00
Marlene Dietrich: Shadow and Light (1996)  - Promedio: 5.00
Star Kid (1997) - Promedio: 5.00
Entertaining Angels: The Dorothy Day Story (1996) - Promedio: 5.00
They Made Me a Criminal (1939) - Promedio: 5.00
Great Day in Harlem, A (1994) - Promedio: 5.00
Saint of Fort Washington, The (1993) - Promedio: 5.00
Aiqing wansui (1994) - Promedio: 5.00
Someone Else's America (1995) - Promedio: 5.00
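A list of perfect 5.00 averages like the one above typically appears when titles have only a handful of ratings. Besides the `min_ratings` cutoff used in `recommend_for_new_user`, another option is a damped mean that pulls each movie's average toward the global mean. This is a sketch of that swapped-in technique on hypothetical data; the prior weight `m` is an illustrative assumption, not a tuned value.

```python
import pandas as pd

# Hypothetical ratings: movie 1 has a single 5-star vote,
# movie 2 has several good votes, movie 3 has several poor votes
toy = pd.DataFrame({
    "movie_id": [1] + [2] * 6 + [3] * 4,
    "rating": [5] + [5, 4, 5, 4, 4, 5] + [2, 3, 2, 3],
})

stats = toy.groupby("movie_id")["rating"].agg(["mean", "count"])

# Damped mean: behaves as if every movie started with m votes at the global mean
m = 5
global_mean = toy["rating"].mean()
stats["damped_mean"] = (
    stats["count"] * stats["mean"] + m * global_mean
) / (stats["count"] + m)

print(stats.sort_values("damped_mean", ascending=False))
```

The single-vote movie leads on the raw mean but falls behind the well-rated, frequently rated movie once the average is damped.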


In [15]:
# Movie popularity (mean rating and number of ratings)
movie_stats = df.groupby('movie_id')['rating'].agg(['mean', 'count'])

In [21]:
# Recommendations for user 50
hybrid_recommendation(
    user_id=50,
    ratings_matrix=ratings_matrix,
    similarity_matrix=user_similarity_df,
    movie_stats=movie_stats,
    movie_dict=movie_dict,
    top_n=10,
    min_votes=30,
    alpha=0.7  # 70% collaborative, 30% popularity
)

Recomendaciones híbridas para el usuario 50:
Wallace & Gromit: The Best of Aardman Animation (1996) - Puntuación combinada: 4.49
Shawshank Redemption, The (1994) - Puntuación combinada: 4.46
Schindler's List (1993) - Puntuación combinada: 4.46
Wrong Trousers, The (1993) - Puntuación combinada: 4.45
Casablanca (1942) - Puntuación combinada: 4.45
Rear Window (1954) - Puntuación combinada: 4.40
Usual Suspects, The (1995) - Puntuación combinada: 4.39
12 Angry Men (1957) - Puntuación combinada: 4.37
Close Shave, A (1995) - Puntuación combinada: 4.36
Godfather, The (1972) - Puntuación combinada: 4.36


In [22]:
# Recommendations for user 100
hybrid_recommendation(
    user_id=100,
    ratings_matrix=ratings_matrix,
    similarity_matrix=user_similarity_df,
    movie_stats=movie_stats,
    movie_dict=movie_dict,
    top_n=10,
    min_votes=30,
    alpha=0.7  # 70% collaborative, 30% popularity
)

Recomendaciones híbridas para el usuario 100:
Schindler's List (1993) - Puntuación combinada: 4.48
Wallace & Gromit: The Best of Aardman Animation (1996) - Puntuación combinada: 4.46
Shawshank Redemption, The (1994) - Puntuación combinada: 4.45
Wrong Trousers, The (1993) - Puntuación combinada: 4.45
Casablanca (1942) - Puntuación combinada: 4.44
Close Shave, A (1995) - Puntuación combinada: 4.43
Usual Suspects, The (1995) - Puntuación combinada: 4.42
Star Wars (1977) - Puntuación combinada: 4.39
Rear Window (1954) - Puntuación combinada: 4.39
Third Man, The (1949) - Puntuación combinada: 4.37


# **Recommendations based on item-item similarity**

In [23]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

In [24]:
# Build recommendations from movies similar to a given movie
def recommend_similar_items(movie_id, similarity_matrix, movie_dict, top_n=5):
    # The movie must be one of the columns of the similarity matrix
    if movie_id not in similarity_matrix:
        print("Película no encontrada.")
        return

    # Similarity scores against every other movie (the movie itself is dropped)
    sim_scores = similarity_matrix[movie_id].drop(movie_id)

    # Keep the top N most similar movies
    top_similar = sim_scores.sort_values(ascending=False).head(top_n)

    print(f"Películas similares a '{movie_dict.get(movie_id, 'Desconocida')}'")
    for sim_id, score in top_similar.items():
        print(f"{movie_dict.get(sim_id, 'Desconocida')} - Similitud: {score:.2f}")

In [38]:
# Build hybrid recommendations (item-item similarity blended with popularity)
def recommend_for_user(user_id, ratings_matrix, item_similarity_df, popular_movies, top_n=10, alpha=0.7):
    user_ratings = ratings_matrix.loc[user_id]
    rated_items = user_ratings[user_ratings > 0].index.tolist()

    # Similarity: sum of item-similarity columns weighted by the user's ratings
    sim_scores = pd.Series(0, index=ratings_matrix.columns, dtype=float)

    for item in rated_items:
        sim_scores += item_similarity_df[item] * user_ratings[item]

    # Zero out items the user has already rated
    sim_scores[rated_items] = 0

    # Normalize similarity to [0, 1]
    sim_scores = sim_scores / sim_scores.max()

    # Popularity, aligned to the movie columns and normalized to [0, 1]
    pop_scores = popular_movies.set_index('movie_id')['popularity_score']
    pop_scores = pop_scores.reindex(ratings_matrix.columns).fillna(0)
    pop_scores = pop_scores / pop_scores.max()

    # Blend both scores on a 0-10 scale
    final_scores = (alpha * sim_scores + (1 - alpha) * pop_scores) * 10

    # Drop rated items so their popularity share alone cannot get them recommended
    final_scores = final_scores.drop(index=rated_items)

    # Top N recommendations with titles (movies_df is read from the notebook's global scope)
    recommended_items = final_scores.sort_values(ascending=False).head(top_n).index
    return movies_df[movies_df['movie_id'].isin(recommended_items)].merge(
        final_scores.rename("score"), left_on='movie_id', right_index=True
    ).sort_values(by='score', ascending=False)
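The item-item part of the score can be followed on a three-movie example. This is a minimal sketch with a hand-written similarity matrix and a hypothetical user (all values are made up); it accumulates rating-weighted similarity columns and then keeps only unseen items.

```python
import numpy as np
import pandas as pd

items = ["A", "B", "C"]

# Hypothetical item-item similarity matrix (symmetric, ones on the diagonal)
toy_item_sim = pd.DataFrame(
    [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]],
    index=items,
    columns=items,
)

# Hypothetical user: rated A and C, has not seen B
toy_user = pd.Series({"A": 5.0, "B": np.nan, "C": 2.0})
rated = toy_user[toy_user > 0].index

# Sum of similarity columns, each weighted by the user's rating of that item
scores = sum(toy_item_sim[item] * toy_user[item] for item in rated)

# Keep only items the user has not rated
scores = scores.drop(index=rated)
print(scores)
```

Movie B scores 0.8 * 5 + 0.3 * 2 = 4.6: it is close to A, which the user liked, so A's high rating carries most of the weight.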

In [25]:
# Load ratings
ratings_url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.data"
columns = ['user_id', 'movie_id', 'rating', 'timestamp']
df = pd.read_csv(ratings_url, sep='\t', names=columns)

In [26]:
# Load movie titles
movies_url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.item"
movies_df = pd.read_csv(
    movies_url,
    sep='|',
    encoding='latin-1',
    header=None,
    usecols=[0, 1],
    names=['movie_id', 'title']
)
movie_dict = dict(zip(movies_df.movie_id, movies_df.title))

In [27]:
# Build the user-movie matrix
ratings_matrix = df.pivot_table(index='user_id', columns='movie_id', values='rating')
ratings_matrix_filled = ratings_matrix.fillna(0)

In [28]:
# Transpose so that movies become rows
item_matrix = ratings_matrix_filled.T  # Each row is a movie
item_matrix.head()

user_id,1,2,3,4,5,6,7,8,9,10,...,934,935,936,937,938,939,940,941,942,943
movie_id
1,5.0,4.0,0.0,0.0,4.0,4.0,0.0,0.0,0.0,4.0,...,2.0,3.0,4.0,0.0,4.0,0.0,0.0,5.0,0.0,0.0
2,3.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,...,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
3,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,3.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,4.0,...,5.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0
5,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [29]:
# Compute cosine similarity between items
item_similarity = cosine_similarity(item_matrix)
item_similarity_df = pd.DataFrame(item_similarity, index=item_matrix.index, columns=item_matrix.index)

In [30]:
item_similarity_df.head()

movie_id,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
movie_id
1,1.0,0.402382,0.330245,0.454938,0.286714,0.116344,0.620979,0.481114,0.496288,0.273935,...,0.035387,0.0,0.0,0.0,0.035387,0.0,0.0,0.0,0.047183,0.047183
2,0.402382,1.0,0.273069,0.502571,0.318836,0.083563,0.383403,0.337002,0.255252,0.171082,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078299,0.078299
3,0.330245,0.273069,1.0,0.324866,0.212957,0.106722,0.372921,0.200794,0.273669,0.158104,...,0.0,0.0,0.0,0.0,0.032292,0.0,0.0,0.0,0.0,0.096875
4,0.454938,0.502571,0.324866,1.0,0.334239,0.090308,0.489283,0.490236,0.419044,0.252561,...,0.0,0.0,0.094022,0.094022,0.037609,0.0,0.0,0.0,0.056413,0.075218
5,0.286714,0.318836,0.212957,0.334239,1.0,0.037299,0.334769,0.259161,0.272448,0.055453,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.094211


In [31]:
# Recommend movies similar to "Star Wars (1977)" (ID 50 in MovieLens 100k)
recommend_similar_items(movie_id=50, similarity_matrix=item_similarity_df, movie_dict=movie_dict, top_n=5)

Películas similares a 'Star Wars (1977)'
Return of the Jedi (1983) - Similitud: 0.88
Raiders of the Lost Ark (1981) - Similitud: 0.76
Empire Strikes Back, The (1980) - Similitud: 0.75
Toy Story (1995) - Similitud: 0.73
Godfather, The (1972) - Similitud: 0.70


In [32]:
# Recommend movies similar to "Indiana Jones and the Last Crusade (1989)" (ID 210 in MovieLens 100k)
recommend_similar_items(movie_id=210, similarity_matrix=item_similarity_df, movie_dict=movie_dict, top_n=5)

Películas similares a 'Indiana Jones and the Last Crusade (1989)'
Empire Strikes Back, The (1980) - Similitud: 0.78
Raiders of the Lost Ark (1981) - Similitud: 0.77
Back to the Future (1985) - Similitud: 0.77
Terminator, The (1984) - Similitud: 0.71
Jurassic Park (1993) - Similitud: 0.70


In [33]:
# Per-movie popularity statistics (rating count and mean rating)
popular_movies = (
    df.groupby('movie_id')
    .agg({'rating': ['count', 'mean']})
    .reset_index()
)
popular_movies.columns = ['movie_id', 'rating_count', 'rating_mean']

In [34]:
# Initial recommendations for a new user (sorted by rating count, then mean rating)
recommended_items = popular_movies.sort_values(
    by=['rating_count', 'rating_mean'], ascending=False
).head(10)

In [35]:
# Show the recommendations
print("Recomendaciones para un usuario nuevo:")
for index, row in recommended_items.iterrows():
    movie_name = movie_dict.get(row['movie_id'], "Desconocido")
    print(f"{movie_name} - Promedio: {row['rating_mean']:.2f}")

Recomendaciones para un usuario nuevo:
Star Wars (1977) - Promedio: 4.36
Contact (1997) - Promedio: 3.80
Fargo (1996) - Promedio: 4.16
Return of the Jedi (1983) - Promedio: 4.01
Liar Liar (1997) - Promedio: 3.16
English Patient, The (1996) - Promedio: 3.66
Scream (1996) - Promedio: 3.44
Toy Story (1995) - Promedio: 3.88
Air Force One (1997) - Promedio: 3.63
Independence Day (ID4) (1996) - Promedio: 3.44


In [36]:
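# Popularity score: equal weights on mean rating and rating count.
# Note: rating_mean is on a 1-5 scale while the count term is scaled to 0-1, so the mean dominates.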
popular_movies['popularity_score'] = (
    0.5 * popular_movies['rating_mean'] + 0.5 * (popular_movies['rating_count'] / popular_movies['rating_count'].max())
)
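The score above adds a 1-5 mean to a 0-1 count share, so the mean carries most of the weight. If a balanced blend is wanted, both terms can first be mapped to the same 0-1 range. The sketch below shows that variant on hypothetical numbers; the column `popularity_score_01` and the toy frame are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical per-movie statistics
toy_pop = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "rating_count": [500, 50, 5],
    "rating_mean": [3.8, 4.2, 5.0],
})

count_01 = toy_pop["rating_count"] / toy_pop["rating_count"].max()

# Formula as written in the notebook: the mean stays on its 1-5 scale
toy_pop["popularity_score"] = 0.5 * toy_pop["rating_mean"] + 0.5 * count_01

# Variant (an assumption): map the 1-5 mean onto 0-1 before blending
mean_01 = (toy_pop["rating_mean"] - 1) / 4
toy_pop["popularity_score_01"] = 0.5 * mean_01 + 0.5 * count_01

print(toy_pop)
```

On this toy data the original formula ranks the five-vote movie first, while the rescaled variant ranks the 500-vote movie first.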

In [39]:
# Generate recommendations for user 42
recommendations = recommend_for_user(
    user_id=42,
    ratings_matrix=ratings_matrix,
    item_similarity_df=item_similarity_df,
    popular_movies=popular_movies,
    top_n=10,
    alpha=0.7
)
for index, row in recommendations.iterrows():
    movie_name = movie_dict.get(row['movie_id'], "Desconocido")
    print(f"{movie_name} - Score: {row['score']:.2f}")

Pulp Fiction (1994) - Score: 9.65
Braveheart (1995) - Score: 9.60
Amadeus (1984) - Score: 9.23
Blade Runner (1982) - Score: 9.19
Blues Brothers, The (1980) - Score: 9.16
Mrs. Doubtfire (1993) - Score: 9.04
Die Hard (1988) - Score: 8.98
Fargo (1996) - Score: 8.95
Raising Arizona (1987) - Score: 8.82
Fish Called Wanda, A (1988) - Score: 8.80


In [40]:
# Generate recommendations for user 5
recommendations = recommend_for_user(
    user_id=5,
    ratings_matrix=ratings_matrix,
    item_similarity_df=item_similarity_df,
    popular_movies=popular_movies,
    top_n=10,
    alpha=0.7
)
for index, row in recommendations.iterrows():
    movie_name = movie_dict.get(row['movie_id'], "Desconocido")
    print(f"{movie_name} - Score: {row['score']:.2f}")

Terminator, The (1984) - Score: 9.49
Pulp Fiction (1994) - Score: 9.33
Terminator 2: Judgment Day (1991) - Score: 9.33
Jurassic Park (1993) - Score: 9.28
Groundhog Day (1993) - Score: 9.20
Apollo 13 (1995) - Score: 9.03
Braveheart (1995) - Score: 8.97
Speed (1994) - Score: 8.84
Dead Poets Society (1989) - Score: 8.80
Shawshank Redemption, The (1994) - Score: 8.77


# **Matrix Factorization with TruncatedSVD**

In [42]:
from sklearn.decomposition import TruncatedSVD
import numpy as np

In [47]:
# Build recommendations from the SVD-reconstructed ratings
def recommend_svd(user_id, predicted_ratings, movie_dict, ratings_matrix, top_n=5):
    user_row = predicted_ratings.loc[user_id]
    already_seen = ratings_matrix.loc[user_id][ratings_matrix.loc[user_id].notna()].index
    recommendations = user_row.drop(index=already_seen).sort_values(ascending=False).head(top_n)

    print(f"Recomendaciones SVD para el usuario {user_id}:")
    for movie_id, score in recommendations.items():
        print(f"{movie_dict.get(movie_id, 'Desconocido')} - Predicción: {score:.2f}")

In [43]:
# Prepare the matrix (unrated entries become 0)
ratings_filled = ratings_matrix.fillna(0)

In [44]:
# Factorize with SVD (reduce to 20 latent dimensions)
svd = TruncatedSVD(n_components=20, random_state=42)
latent_matrix = svd.fit_transform(ratings_filled)

In [45]:
# Rebuild the approximate ratings matrix
approx_ratings = np.dot(latent_matrix, svd.components_)

In [46]:
# Wrap in a DataFrame for easy access
predicted_ratings = pd.DataFrame(approx_ratings, index=ratings_filled.index, columns=ratings_filled.columns)
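The reconstruction step is a low-rank approximation: multiplying the latent user factors by `svd.components_` gives the closest matrix of rank `n_components` to the zero-filled input. A small sketch on random toy data (hypothetical, not MovieLens) shows how the reconstruction error shrinks as more components are kept; the helper `reconstruction_error` is introduced here for illustration only.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy matrix: 8 users x 6 movies, 0 stands for "not rated"
rng = np.random.default_rng(0)
toy = rng.integers(0, 6, size=(8, 6)).astype(float)

def reconstruction_error(n_components):
    """Frobenius norm of (original - rank-k reconstruction)."""
    svd = TruncatedSVD(n_components=n_components, random_state=42)
    latent = svd.fit_transform(toy)      # user factors, shape (8, k)
    approx = latent @ svd.components_    # back to shape (8, 6)
    return np.linalg.norm(toy - approx)

errors = {k: reconstruction_error(k) for k in (1, 2, 4)}
for k, err in errors.items():
    print(f"n_components={k}: error={err:.3f}")
```

Because the zeros are treated as real values, the model is also fitted to reproduce them, which is one reason the predicted scores for unseen movies stay well below the 1-5 rating scale.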

In [85]:
predicted_ratings.head()

movie_id,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
user_id
1,4.228012,2.096937,1.276615,3.139637,0.551456,0.568562,4.419147,2.79892,3.134902,2.193979,...,-0.024103,0.008582,0.01456,0.009707,0.022546,-0.001981,-0.005944,-0.003963,0.031395,0.074819
2,2.024309,-0.008309,0.033837,0.277159,-0.008177,0.341411,1.627962,0.440471,2.541393,0.619027,...,0.000615,-0.021051,-0.008494,-0.005663,-0.00052,0.004875,0.014625,0.00975,-0.004853,-0.02822
3,-0.122395,-0.063842,0.169859,-0.205504,-0.097037,0.016595,-0.293257,-0.073747,-0.435728,0.096075,...,0.004596,-0.010258,0.023943,0.015962,-0.002083,0.011123,0.03337,0.022247,0.003056,0.002247
4,0.44912,-0.178459,0.092678,-0.07323,0.041396,-0.005179,0.338694,-0.103576,-0.099695,-0.169013,...,0.001569,-0.008776,-0.007678,-0.005119,-0.002431,0.005236,0.015708,0.010472,-0.002409,0.000591
5,3.697199,1.322204,0.353221,1.524847,0.507998,-0.143364,2.799689,1.310824,-0.428948,0.384931,...,-0.012855,0.003692,-0.032325,-0.02155,-0.014456,-0.001497,-0.00449,-0.002993,-0.000667,-0.015408


In [49]:
# Recommendations for user 50
recommend_svd(user_id=50, predicted_ratings=predicted_ratings, movie_dict=movie_dict, ratings_matrix=ratings_matrix, top_n=10)

Recomendaciones SVD para el usuario 50:
Godfather, The (1972) - Predicción: 1.31
Swingers (1996) - Predicción: 1.02
Welcome to the Dollhouse (1995) - Predicción: 1.01
Bound (1996) - Predicción: 1.00
Twelve Monkeys (1995) - Predicción: 0.96
Secrets & Lies (1996) - Predicción: 0.94
Full Monty, The (1997) - Predicción: 0.91
Boogie Nights (1997) - Predicción: 0.88
Heat (1995) - Predicción: 0.87
Postino, Il (1994) - Predicción: 0.81


In [50]:
# Recommendations for user 100
recommend_svd(user_id=100, predicted_ratings=predicted_ratings, movie_dict=movie_dict, ratings_matrix=ratings_matrix, top_n=10)

Recomendaciones SVD para el usuario 100:
Devil's Advocate, The (1997) - Predicción: 2.66
In & Out (1997) - Predicción: 2.43
Saint, The (1997) - Predicción: 2.00
Kiss the Girls (1997) - Predicción: 1.92
Devil's Own, The (1997) - Predicción: 1.84
Edge, The (1997) - Predicción: 1.73
Cop Land (1997) - Predicción: 1.72
Ulee's Gold (1997) - Predicción: 1.72
Midnight in the Garden of Good and Evil (1997) - Predicción: 1.52
Murder at 1600 (1997) - Predicción: 1.46


# **Content-based system**

In [51]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [57]:
# Build content-based recommendations from a list of favorite titles
def content_based_multi_favorite(favorite_titles, cosine_sim, movies_df, indices, top_n=5):
    # Map each favorite title to its row position, reporting titles that are not found
    valid_indices = []
    for title in favorite_titles:
        idx = indices.get(title)
        if idx is not None:
            valid_indices.append(idx)
        else:
            print(f"No se encontró la película: {title}")

    if not valid_indices:
        print("No se encontraron películas válidas.")
        return

    # Average similarity to all favorites
    sim_scores = cosine_sim[valid_indices].mean(axis=0)

    # Exclude the favorites themselves
    for idx in valid_indices:
        sim_scores[idx] = 0

    # Top N recommendations
    top_indices = sim_scores.argsort()[::-1][:top_n]

    print("Recomendaciones basadas en tus favoritas:")
    for idx in top_indices:
        print(f"{movies_df.iloc[idx]['title']} - Similitud: {sim_scores[idx]:.2f}")

In [52]:
# Load movie metadata (titles, dates, and one-hot genre columns)
movie_cols = [
    'movie_id', 'title', 'release_date', 'video_release_date', 'IMDb_URL',
    'unknown', 'Action', 'Adventure', 'Animation', "Children's", 'Comedy', 'Crime',
    'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical',
    'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western'
]

movies_url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.item"
movies_df = pd.read_csv(
    movies_url, sep='|', encoding='latin-1',
    header=None, names=movie_cols, usecols=range(24)
)

In [53]:
# Build a genre string per movie and a combined content column (title + genres)
genre_cols = movie_cols[5:]
movies_df['genres'] = movies_df[genre_cols].apply(lambda row: ' '.join(row.index[row == 1]), axis=1)
movies_df['content'] = movies_df['title'] + ' ' + movies_df['genres']

In [54]:
# Vectorize the content with TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies_df['content'])

In [55]:
# Compute cosine similarity between movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

In [56]:
# Map each title to its row position
indices = pd.Series(movies_df.index, index=movies_df['title'])
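To see what the TF-IDF pipeline compares, it helps to run it on a few invented titles. The sketch below uses three hypothetical movies (none of them from MovieLens) and the same vectorizer settings; with the default tokenizer a genre such as `Sci-Fi` is split into the tokens `sci` and `fi`, and the release year becomes a token as well.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue: two related titles and one unrelated title
toy_movies = pd.DataFrame({
    "title": ["Space Battle (1980)", "Space Battle II (1983)", "Quiet Garden (1990)"],
    "genres": ["Action Sci-Fi", "Action Sci-Fi", "Drama Romance"],
})
toy_movies["content"] = toy_movies["title"] + " " + toy_movies["genres"]

# Same settings as the notebook: English stop words removed, default tokenizer
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_toy = vectorizer.fit_transform(toy_movies["content"])
sim_toy = cosine_similarity(tfidf_toy)

print(sorted(vectorizer.vocabulary_))
print(sim_toy.round(2))
```

The two "Space Battle" entries share title words and genres, so they are similar to each other; the third title shares no token with them and gets a similarity of zero.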

In [58]:
# Recommendations based on favorite movies
favorite_movies = ["Star Wars (1977)", "Alien (1979)", "Blade Runner (1982)"]
content_based_multi_favorite(favorite_movies, cosine_sim, movies_df, indices, top_n=10)

Recomendaciones basadas en tus favoritas:
Alien 3 (1992) - Similitud: 0.39
Alien: Resurrection (1997) - Similitud: 0.33
Escape from L.A. (1996) - Similitud: 0.29
Star Trek: The Motion Picture (1979) - Similitud: 0.28
Face/Off (1997) - Similitud: 0.27
Spawn (1997) - Similitud: 0.27
No Escape (1994) - Similitud: 0.26
Star Trek: First Contact (1996) - Similitud: 0.26
Stalker (1979) - Similitud: 0.26
Star Trek: The Wrath of Khan (1982) - Similitud: 0.26


In [59]:
# Recommendations based on favorite movies
favorite_movies = ["Terminator 2: Judgment Day (1991)", "Indiana Jones and the Last Crusade (1989)"]
content_based_multi_favorite(favorite_movies, cosine_sim, movies_df, indices, top_n=10)

Recomendaciones basadas en tus favoritas:
Terminator, The (1984) - Similitud: 0.31
Abyss, The (1989) - Similitud: 0.27
Judgment Night (1993) - Similitud: 0.23
Escape from L.A. (1996) - Similitud: 0.22
Spawn (1997) - Similitud: 0.20
Batman (1989) - Similitud: 0.19
Star Trek V: The Final Frontier (1989) - Similitud: 0.19
Face/Off (1997) - Similitud: 0.18
Independence Day (ID4) (1996) - Similitud: 0.18
Stargate (1994) - Similitud: 0.18


# **Time filter**

In [60]:
import pandas as pd
from datetime import datetime

In [66]:
# Build recommendations restricted to a range of release years
def recommend_by_time_popularity(popularity_df, year_from=1995, year_to=1997, min_votes=50, top_n=10):
    # Filter by release year and minimum number of votes
    filtered = popularity_df[
        (popularity_df['year'] >= year_from) &
        (popularity_df['year'] <= year_to) &
        (popularity_df['count'] >= min_votes)
    ]

    # Sort by mean rating
    top_movies = filtered.sort_values(by='mean', ascending=False).head(top_n)

    print(f"Recomendaciones populares entre {year_from} y {year_to}:")
    for _, row in top_movies.iterrows():
        print(f"{row['title']} ({int(row['year'])}) - Promedio: {row['mean']:.2f} - Votos: {row['count']}")

In [61]:
# Load ratings
ratings_url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.data"
columns = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings_df = pd.read_csv(ratings_url, sep='\t', names=columns)

In [62]:
# Load movies (ID, title, release date)
movies_url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.item"
movie_cols = ['movie_id', 'title', 'release_date']
movies_df = pd.read_csv(
    movies_url, sep='|', encoding='latin-1',
    header=None, usecols=[0, 1, 2], names=movie_cols
)

In [63]:
# Convert release dates to datetime (unparseable values become NaT)
movies_df['release_date'] = pd.to_datetime(movies_df['release_date'], errors='coerce')

In [64]:
# Join ratings with movies
merged_df = ratings_df.merge(movies_df, on='movie_id')

# Add the release year
merged_df['year'] = merged_df['release_date'].dt.year
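The year column comes out as a float (for example `1997.0`) because missing dates turn into `NaN`. The sketch below illustrates this on a few hypothetical strings, assuming the `dd-Mon-yyyy` layout used by `u.item` and an English locale for month abbreviations; passing an explicit `format` and casting to the nullable `Int64` type are optional refinements, not what the notebook does.

```python
import pandas as pd

# Hypothetical release-date strings, including a missing and an invalid value
raw_dates = pd.Series(["01-Jan-1995", "24-Jan-1997", None, "not a date"])

# errors="coerce" turns anything that does not match the format into NaT
parsed = pd.to_datetime(raw_dates, format="%d-%b-%Y", errors="coerce")

# .dt.year yields floats when NaT is present; Int64 keeps whole years plus <NA>
years_float = parsed.dt.year
years_int = years_float.astype("Int64")

print(years_float.tolist())
print(years_int.tolist())
```

Rows without a year compare as false in filters such as `year >= year_from`, so movies with an unknown release date drop out of the time-based recommendations.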

In [83]:
merged_df.head()

,user_id,movie_id,rating,timestamp,title,release_date,video_release_date,IMDb_URL,unknown,Action,...,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,year
0,196,242,3,881250949,Kolya (1996),1997-01-24,,http://us.imdb.com/M/title-exact?Kolya%20(1996),0,0,...,0,0,0,0,0,0,0,0,0,1997.0
1,186,302,3,891717742,L.A. Confidential (1997),1997-01-01,,http://us.imdb.com/M/title-exact?L%2EA%2E+Conf...,0,0,...,1,0,0,1,0,0,1,0,0,1997.0
2,22,377,1,878887116,Heavyweights (1994),1994-01-01,,http://us.imdb.com/M/title-exact?Heavyweights%...,0,0,...,0,0,0,0,0,0,0,0,0,1994.0
3,244,51,2,880606923,Legends of the Fall (1994),1994-01-01,,http://us.imdb.com/M/title-exact?Legends%20of%...,0,0,...,0,0,0,0,1,0,0,1,1,1994.0
4,166,346,1,886397596,Jackie Brown (1997),1997-01-01,,http://us.imdb.com/M/title-exact?imdb-title-11...,0,0,...,0,0,0,0,0,0,0,0,0,1997.0


In [65]:
# Aggregate rating statistics per movie
popularity_df = merged_df.groupby(['movie_id', 'title', 'year'])['rating'].agg(['mean', 'count']).reset_index()

In [67]:
# Recommendations between 1995 and 1997
recommend_by_time_popularity(popularity_df, year_from=1995, year_to=1997, min_votes=50, top_n=10)

Recomendaciones populares entre 1995 y 1997:
Close Shave, A (1995) (1996) - Promedio: 4.49 - Votos: 112
Wallace & Gromit: The Best of Aardman Animation (1996) (1996) - Promedio: 4.45 - Votos: 67
Usual Suspects, The (1995) (1995) - Promedio: 4.39 - Votos: 267
Secrets & Lies (1996) (1996) - Promedio: 4.27 - Votos: 162
Good Will Hunting (1997) (1997) - Promedio: 4.26 - Votos: 198
Titanic (1997) (1997) - Promedio: 4.25 - Votos: 350
Boot, Das (1981) (1997) - Promedio: 4.20 - Votos: 201
Sling Blade (1996) (1996) - Promedio: 4.20 - Votos: 136
As Good As It Gets (1997) (1997) - Promedio: 4.20 - Votos: 112
L.A. Confidential (1997) (1997) - Promedio: 4.16 - Votos: 297


In [70]:
# Recommendations for movies released between 1993 and 1995
recommend_by_time_popularity(popularity_df, year_from=1993, year_to=1995, min_votes=50, top_n=10)

Recomendaciones populares entre 1993 y 1995:
Schindler's List (1993) (1993) - Promedio: 4.47 - Votos: 298
Wrong Trousers, The (1993) (1993) - Promedio: 4.47 - Votos: 118
Shawshank Redemption, The (1994) (1994) - Promedio: 4.45 - Votos: 283
Usual Suspects, The (1995) (1995) - Promedio: 4.39 - Votos: 267
Hoop Dreams (1994) (1994) - Promedio: 4.09 - Votos: 117
Much Ado About Nothing (1993) (1993) - Promedio: 4.06 - Votos: 176
Pulp Fiction (1994) (1994) - Promedio: 4.06 - Votos: 394
Three Colors: Red (1994) (1994) - Promedio: 4.06 - Votos: 83
Eat Drink Man Woman (1994) (1994) - Promedio: 4.05 - Votos: 80
Fugitive, The (1993) (1993) - Promedio: 4.04 - Votos: 336


# **Time and genre filter**

In [71]:
import pandas as pd

In [77]:
# Recommend popular movies for a given genre within a range of release years.
# Expects movies_df to have a 'year' column and popularity_df to have
# 'rating_mean' and 'rating_count' columns.
def recommend_by_time_genre_popularity(movies_df, popularity_df, genre, year_from=1990, year_to=1997, min_votes=50, top_n=10):
    # Filter by genre flag and release year (both bounds inclusive)
    genre_filtered = movies_df[
        (movies_df[genre] == 1) &
        (movies_df['year'] >= year_from) &
        (movies_df['year'] <= year_to)
    ][['movie_id', 'title', 'year']]

    # Join with the popularity metrics
    merged = genre_filtered.merge(popularity_df, on=['movie_id', 'title', 'year'])

    # Keep only movies with enough votes
    filtered = merged[merged['rating_count'] >= min_votes]

    # Sort by mean rating and keep the top N
    top = filtered.sort_values(by='rating_mean', ascending=False).head(top_n)

    print(f"Recomendaciones populares de género '{genre}' entre {year_from}-{year_to}:")
    for _, row in top.iterrows():
        # The title already carries a year; the one in parentheses comes from
        # release_date and can differ from it (e.g. re-releases)
        print(f"{row['title']} ({int(row['year'])}) - Promedio: {row['rating_mean']:.2f}, Votos: {row['rating_count']}")

In [72]:
# Ratings
ratings_url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.data"
ratings_df = pd.read_csv(ratings_url, sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])

In [73]:
# Movies + genres
movies_url = "http://files.grouplens.org/datasets/movielens/ml-100k/u.item"
movie_cols = [
    'movie_id', 'title', 'release_date', 'video_release_date', 'IMDb_URL',
    'unknown', 'Action', 'Adventure', 'Animation', "Children's", 'Comedy', 'Crime',
    'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical',
    'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western'
]
movies_df = pd.read_csv(
    movies_url, sep='|', encoding='latin-1',
    header=None, names=movie_cols, usecols=range(24)
)

In [74]:
# Convert dates and extract the release year
movies_df['release_date'] = pd.to_datetime(movies_df['release_date'], errors='coerce')
movies_df['year'] = movies_df['release_date'].dt.year

In [75]:
# Merge movies and ratings
merged_df = ratings_df.merge(movies_df, on='movie_id')

In [82]:
merged_df.head()

| | user_id | movie_id | rating | timestamp | title | release_date | video_release_date | IMDb_URL | unknown | Action | ... | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 | Kolya (1996) | 1997-01-24 | | http://us.imdb.com/M/title-exact?Kolya%20(1996) | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1997.0 |
| 1 | 186 | 302 | 3 | 891717742 | L.A. Confidential (1997) | 1997-01-01 | | http://us.imdb.com/M/title-exact?L%2EA%2E+Conf... | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1997.0 |
| 2 | 22 | 377 | 1 | 878887116 | Heavyweights (1994) | 1994-01-01 | | http://us.imdb.com/M/title-exact?Heavyweights%... | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1994.0 |
| 3 | 244 | 51 | 2 | 880606923 | Legends of the Fall (1994) | 1994-01-01 | | http://us.imdb.com/M/title-exact?Legends%20of%... | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1994.0 |
| 4 | 166 | 346 | 1 | 886397596 | Jackie Brown (1997) | 1997-01-01 | | http://us.imdb.com/M/title-exact?imdb-title-11... | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1997.0 |


In [76]:
# Group by movie and compute rating statistics
popularity_df = merged_df.groupby(['movie_id', 'title', 'year']).agg(
    rating_mean=('rating', 'mean'),
    rating_count=('rating', 'count')
).reset_index()
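
This cell uses named aggregation, whereas the earlier time-filter section passed a list (`agg(['mean', 'count'])`). The difference matters because `recommend_by_time_genre_popularity` reads the columns `rating_mean` and `rating_count`. The sketch below compares both styles on a small made-up frame (the rows are illustrative only) and also shows that `groupby` drops rows whose key is `NaN` by default, so movies without a parseable release date never reach the popularity table.

```python
import numpy as np
import pandas as pd

# Toy ratings; movie 3 has no release year.
toy = pd.DataFrame({
    "movie_id": [1, 1, 2, 3],
    "title": ["A", "A", "B", "C"],
    "year": [1995.0, 1995.0, 1996.0, np.nan],
    "rating": [4, 5, 3, 2],
})
keys = ["movie_id", "title", "year"]

# List-style aggregation: result columns are named after the functions.
list_style = toy.groupby(keys)["rating"].agg(["mean", "count"]).reset_index()
print(list(list_style.columns))

# Named aggregation: result columns take the keyword names.
named_style = toy.groupby(keys).agg(
    rating_mean=("rating", "mean"),
    rating_count=("rating", "count"),
).reset_index()
print(list(named_style.columns))

# Rows with a NaN key are dropped unless dropna=False is passed.
with_missing = toy.groupby(keys, dropna=False).agg(
    rating_mean=("rating", "mean"),
    rating_count=("rating", "count"),
).reset_index()
print(len(named_style), len(with_missing))
```

Whether to keep movies with a missing year is a design choice; for a filter that is defined by release year, dropping them is a reasonable default.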

In [78]:
# Recommendations between 1995 and 1997 for the Comedy genre
recommend_by_time_genre_popularity(movies_df, popularity_df, genre='Comedy', year_from=1995, year_to=1997, min_votes=50, top_n=10)

Recomendaciones populares de género 'Comedy' entre 1995-1997:
Close Shave, A (1995) (1996) - Promedio: 4.49, Votos: 112
As Good As It Gets (1997) (1997) - Promedio: 4.20, Votos: 112
Babe (1995) (1995) - Promedio: 4.00, Votos: 219
Kolya (1996) (1997) - Promedio: 3.99, Votos: 117
Full Monty, The (1997) (1997) - Promedio: 3.93, Votos: 315
Toy Story (1995) (1995) - Promedio: 3.88, Votos: 452
Swingers (1996) (1996) - Promedio: 3.83, Votos: 157
Cold Comfort Farm (1995) (1996) - Promedio: 3.82, Votos: 125
Men in Black (1997) (1997) - Promedio: 3.75, Votos: 303
Welcome to the Dollhouse (1995) (1996) - Promedio: 3.72, Votos: 112


In [80]:
# Recommendations between 1995 and 1997 for the Adventure genre
recommend_by_time_genre_popularity(movies_df, popularity_df, genre='Adventure', year_from=1995, year_to=1997, min_votes=50, top_n=10)

Recomendaciones populares de género 'Adventure' entre 1995-1997:
Return of the Jedi (1983) (1997) - Promedio: 4.01, Votos: 507
City of Lost Children, The (1995) (1995) - Promedio: 3.79, Votos: 96
Men in Black (1997) (1997) - Promedio: 3.75, Votos: 303
Rock, The (1996) (1996) - Promedio: 3.69, Votos: 378
Star Trek: First Contact (1996) (1996) - Promedio: 3.66, Votos: 365
Edge, The (1997) (1997) - Promedio: 3.54, Votos: 113
Fly Away Home (1996) (1996) - Promedio: 3.54, Votos: 149
Hercules (1997) (1997) - Promedio: 3.52, Votos: 66
Con Air (1997) (1997) - Promedio: 3.46, Votos: 137
Rumble in the Bronx (1995) (1996) - Promedio: 3.45, Votos: 174


In [81]:
# Recommendations between 1993 and 1995 for the Fantasy genre
recommend_by_time_genre_popularity(movies_df, popularity_df, genre='Fantasy', year_from=1993, year_to=1995, min_votes=50, top_n=10)

Recomendaciones populares de género 'Fantasy' entre 1993-1995:
Heavenly Creatures (1994) (1994) - Promedio: 3.67, Votos: 70
Jumanji (1995) (1995) - Promedio: 3.31, Votos: 96
Mask, The (1994) (1994) - Promedio: 3.19, Votos: 129
