Load the movie and rating data from the files movies.csv and ratings.csv. The dataset contains 100,000 ratings of about 9,000 movies and comes from the MovieLens project. Treat a rating of 4 or higher as positive and a rating of 2 or lower as negative. Build two recommendation models, one that generates positive recommendations and one that generates negative ones. Try to answer the following questions:

I have already watched "Pulp Fiction" and "Reservoir Dogs", and I liked both films very much. Which film should be recommended to me?

I really disliked the film "The Mask". Which films should I avoid?

In [114]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [115]:
movies = pd.read_csv('data/movies.csv')
ratings = pd.read_csv('data/ratings.csv')

In [116]:
# Attach movie titles to the ratings, then drop the columns that are not used below.
ratings = ratings.merge(movies, on='movieId')
ratings = ratings.drop(['timestamp', 'genres'], axis=1)

## Movies to recommend


In [117]:
ratings_positive = ratings.copy()
ratings_positive['liked'] = ratings_positive['rating'] >= 4


In [118]:
liked_transactions = ratings_positive[ratings_positive['liked']].groupby('userId')['title'].apply(list)
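
The grouping step above turns the table of positive ratings into one "transaction" per user, that is, the list of titles that user rated 4 or higher. A minimal plain-Python sketch of the same idea follows. The rows, user IDs, and ratings are made up for illustration, and `build_transactions` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict

# Hypothetical (userId, title, rating) rows, for illustration only.
rows = [
    (1, "Pulp Fiction (1994)", 5.0),
    (1, "Reservoir Dogs (1992)", 4.0),
    (1, "Mask, The (1994)", 2.0),
    (2, "Pulp Fiction (1994)", 4.5),
    (2, "Mask, The (1994)", 3.0),
]


def build_transactions(rows, keep):
    """Group titles by user, keeping only rows for which keep(rating) is true."""
    transactions = defaultdict(list)
    for user_id, title, rating in rows:
        if keep(rating):
            transactions[user_id].append(title)
    return dict(transactions)


liked = build_transactions(rows, lambda r: r >= 4)     # positive: 4 or higher
disliked = build_transactions(rows, lambda r: r <= 2)  # negative: 2 or lower
print(liked)
print(disliked)
```

In this toy data, user 2's rating of 3.0 counts as neither positive nor negative, and a user with no qualifying rating does not appear in the result at all.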

In [119]:
def process_transactions(transactions):
    """One-hot encode per-user title lists into a boolean DataFrame (one column per title)."""
    te = TransactionEncoder()
    te_ary = te.fit(transactions).transform(transactions)
    df = pd.DataFrame(te_ary, columns=te.columns_)
    return df
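
To make the encoding step concrete, here is a small plain-Python sketch of one-hot encoding a list of transactions. It only illustrates the shape of the result that `process_transactions` is used for here (one row per transaction, one boolean column per item). The `one_hot` helper and the item names are assumptions made for this example.

```python
def one_hot(transactions):
    """Return (items, rows): the distinct items and one boolean row per transaction."""
    items = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in items] for t in transactions]
    return items, rows


# Toy transactions with placeholder item names.
items, rows = one_hot([["B", "A"], ["A"], ["C", "A"]])
print(items)
for row in rows:
    print(row)
```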

In [120]:
liked_df = process_transactions(liked_transactions)


In [121]:
liked_itemsets = apriori(liked_df, min_support=0.1, use_colnames=True)


In [122]:
liked_rules = association_rules(liked_itemsets, metric="confidence", min_threshold=0.7)
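
The two thresholds used above are support and confidence. The support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). The sketch below computes both by hand on toy transactions. The helper names and the data are illustrative assumptions and are independent of mlxtend.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Toy transactions with placeholder item names.
transactions = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A"],
    ["B", "C"],
]

print(support({"A", "B"}, transactions))
print(confidence({"A"}, {"B"}, transactions))
```

Here {A, B} appears in 2 of 4 transactions (support 0.5) and A appears in 3 of 4, so the rule A → B has confidence 0.5 / 0.75 ≈ 0.67 and would not pass a confidence threshold of 0.7.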


In [123]:
# Keep rules whose antecedent contains at least one of the two watched films.
recommended_movies = liked_rules[
    liked_rules['antecedents'].apply(
        lambda x: "Pulp Fiction (1994)" in x or "Reservoir Dogs (1992)" in x
    )
    & (liked_rules['confidence'] > 0.7)
]

In [124]:
recommended_titles = set()
for _, row in recommended_movies.iterrows():
    recommended_titles.update(row['consequents'])

print("Recommendations:")
for title in recommended_titles:
    print(title)

Recommendations:
Lord of the Rings: The Fellowship of the Ring, The (2001)
Star Wars: Episode IV - A New Hope (1977)
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
Lord of the Rings: The Two Towers, The (2002)
Lord of the Rings: The Return of the King, The (2003)
Star Wars: Episode V - The Empire Strikes Back (1980)
Pulp Fiction (1994)
Star Wars: Episode VI - Return of the Jedi (1983)
Silence of the Lambs, The (1991)
Shawshank Redemption, The (1994)
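
The printed list includes "Pulp Fiction (1994)", which has already been watched, and the filter accepts any rule whose antecedent contains just one of the two films, even if it also contains films that were not watched. One possible refinement, sketched below under stated assumptions, keeps only rules whose whole antecedent lies within the watched set and drops watched titles from the result. The `recommend` function is hypothetical, and the rules and confidence values in the example are invented for illustration, not taken from the notebook's output.

```python
def recommend(rules, seen, min_confidence=0.7):
    """Rank unseen titles from rules whose antecedent is fully covered by `seen`.

    rules: iterable of (antecedents, consequents, confidence) triples.
    """
    seen = frozenset(seen)
    best = {}  # title -> highest confidence of any rule recommending it
    for antecedents, consequents, conf in rules:
        if conf < min_confidence or not frozenset(antecedents) <= seen:
            continue
        for title in frozenset(consequents) - seen:
            best[title] = max(conf, best.get(title, 0.0))
    # Highest confidence first, ties broken alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))


# Invented rules for illustration only.
rules = [
    (frozenset({"Reservoir Dogs (1992)"}), frozenset({"Pulp Fiction (1994)"}), 0.90),
    (frozenset({"Pulp Fiction (1994)"}), frozenset({"Shawshank Redemption, The (1994)"}), 0.75),
    (frozenset({"Pulp Fiction (1994)", "Reservoir Dogs (1992)"}),
     frozenset({"Silence of the Lambs, The (1991)"}), 0.80),
    (frozenset({"Pulp Fiction (1994)", "Fight Club (1999)"}),
     frozenset({"Matrix, The (1999)"}), 0.85),
]
seen = {"Pulp Fiction (1994)", "Reservoir Dogs (1992)"}
result = recommend(rules, seen)
print(result)
```

With the rules DataFrame from the notebook, the triples could be supplied with something like `liked_rules[['antecedents', 'consequents', 'confidence']].itertuples(index=False)`. Requiring the full antecedent to be watched is a stricter choice than the notebook's filter and will generally return fewer titles.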


## Movies to avoid

In [125]:
ratings_negative = ratings.copy()
ratings_negative['disliked'] = ratings_negative['rating'] <= 2

In [126]:
disliked_transactions = ratings_negative[ratings_negative['disliked']].groupby('userId')['title'].apply(list)

In [127]:
disliked_df = process_transactions(disliked_transactions)

In [128]:
disliked_itemsets = apriori(disliked_df, min_support=0.01, use_colnames=True)
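
The negative model uses `min_support=0.01`, ten times lower than the 0.1 used for the positive model. A quick way to read such a threshold is to convert it into a minimum number of transactions. The sketch below does that for a hypothetical number of users; the figure of 1000 is an assumption chosen for round numbers, not the size of this dataset.

```python
import math


def min_transaction_count(min_support, n_transactions):
    """Smallest number of transactions an itemset must appear in to reach min_support."""
    # Round first to guard against floating-point noise such as 100.00000000000001.
    return math.ceil(round(min_support * n_transactions, 9))


n_users = 1000  # hypothetical number of transactions (users), not the real dataset size
for threshold in (0.1, 0.01):
    print(threshold, min_transaction_count(threshold, n_users))
```

Under this assumption, a support of 0.1 requires an itemset to appear in the lists of at least 100 users, while 0.01 requires only 10. The lower value presumably reflects that low ratings are rarer than high ones, although the notebook does not state the reason.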

In [129]:
disliked_rules = association_rules(disliked_itemsets, metric="confidence", min_threshold=0.7)

In [130]:
# Keep rules whose antecedent contains the disliked film.
movies_to_avoid = disliked_rules[
    disliked_rules['antecedents'].apply(lambda x: "Mask, The (1994)" in x)
    & (disliked_rules['confidence'] > 0.7)
]

In [131]:
titles_to_avoid = set()
for _, row in movies_to_avoid.iterrows():
    titles_to_avoid.update(row['consequents'])

print("Movies to avoid:")
for title in titles_to_avoid:
    print(title)

Movies to avoid:
Ace Ventura: Pet Detective (1994)
Ace Ventura: When Nature Calls (1995)
