# KNN

We built our model with the k-nearest neighbors (KNN) algorithm because it is simple and because of how it works. <br/>
It finds items similar to a given one, which makes it a good fit for suggesting movies and TV series.

# Import Libraries

In [1]:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer, StandardScaler, OneHotEncoder
from sklearn.neighbors import NearestNeighbors
import numpy as np
from collections import defaultdict

pd.options.display.width = 1000

# Read data

In [2]:
df = pd.read_csv('dataset/cleaned_movies.csv')

# Data transforming

Because `genres` is a list and the CSV format has no native list representation, we have to map every `genres` value back to a list.

In [3]:
df['genres'] = df['genres'].fillna('').apply(lambda x: x.split(', ') if x else [])
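
As a quick illustration of this step, here is a minimal sketch on a hypothetical three-row sample (the `sample` frame and its values are made up for the example and are not taken from `cleaned_movies.csv`). It assumes genres are stored as a single string separated by `', '`, as the transformation above implies.

```python
import pandas as pd

# Hypothetical sample standing in for the real dataset
sample = pd.DataFrame({'genres': ['Animation, Comedy, Family', 'Drama', None]})

# Same transformation as above: missing values become '', then split on ', '
sample['genres'] = sample['genres'].fillna('').apply(lambda x: x.split(', ') if x else [])

print(sample['genres'].tolist())
# [['Animation', 'Comedy', 'Family'], ['Drama'], []]
```

A missing value ends up as an empty list, so the later encoding step receives a list for every row.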

# Data encoding

After transforming the data into the form the algorithm requires, we have to create and group the features on which the movies will be compared.

The library we used (scikit-learn) can vectorize lists, enumerations (categorical values) and, of course, scalars.

In [4]:
mlb = MultiLabelBinarizer()
genres_encoded = mlb.fit_transform(df['genres'])

# One-hot encoding for the language
ohe = OneHotEncoder(sparse_output=False)
language_encoded = ohe.fit_transform(df[['original_language']])

# Scale runtime, vote_average, vote_count
scaler = StandardScaler()
numeric_features = scaler.fit_transform(df[['runtime', 'vote_average', 'vote_count']])

# Vectors

In [5]:
X = np.hstack([genres_encoded, language_encoded, numeric_features])

# Model

After stacking the input features, we create the structure responsible for performing the KNN computations.

In [6]:
knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(X)
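
The sketch below shows how a query against such a structure behaves, using four hypothetical 2-D vectors (`toy_X`, `toy_knn` and `query` are illustrative names). It assumes the same settings as above, `metric='cosine'` and `algorithm='brute'`. With the cosine metric only the direction of a vector matters, not its length.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical feature vectors
toy_X = np.array([
    [1.0, 0.0],   # item 0
    [2.0, 0.0],   # item 1: same direction as item 0, twice as long
    [0.0, 1.0],   # item 2: orthogonal to item 0
    [1.0, 1.0],   # item 3: 45 degrees from both axes
])

toy_knn = NearestNeighbors(metric='cosine', algorithm='brute')
toy_knn.fit(toy_X)

# The query points in the same direction as item 2 but is three times longer
query = np.array([[0.0, 3.0]])
distances, indices = toy_knn.kneighbors(query, n_neighbors=2)

print(indices)     # item 2 first, then item 3
print(distances)   # about 0.0 and 1 - 1/sqrt(2)
```

Because the features built earlier mix binary columns with standardized numeric ones, the cosine metric compares the overall direction of the combined vector rather than absolute differences in any single column.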

# Functions

In addition, for convenience we wrote a few helper functions that make it easier to query our model.

Because of how the algorithm works (it takes a single query point) and because we wanted to accept a list of liked movies and series,<br>
we had to consider a few ways around this problem:
- averaging the feature vectors of the liked movies and running a single query against the model.<br>
- running a query against the model for each liked movie and averaging the results.<br>
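
The two approaches can give different answers. The sketch below illustrates this on five hypothetical 2-D vectors (`toy_X`, `liked`, `from_mean` and `per_item` are illustrative names; the real functions follow in the next cells). It is a simplified version under these assumptions, not the notebook's implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical feature vectors
toy_X = np.array([
    [1.0, 0.0],   # 0: liked
    [0.0, 1.0],   # 1: liked
    [1.0, 1.0],   # 2: halfway between the two liked items
    [1.0, 0.1],   # 3: very close to item 0
    [0.1, 1.0],   # 4: very close to item 1
])
liked = [0, 1]

toy_knn = NearestNeighbors(metric='cosine', algorithm='brute').fit(toy_X)

# Approach 1: average the liked vectors and run a single query
mean_vec = toy_X[liked].mean(axis=0).reshape(1, -1)
_, idx = toy_knn.kneighbors(mean_vec, n_neighbors=len(liked) + 1)
from_mean = next(int(j) for j in idx[0] if j not in liked)

# Approach 2: run one query per liked item and collect the neighbors
per_item = []
for i in liked:
    _, idx = toy_knn.kneighbors(toy_X[[i]], n_neighbors=2)
    per_item.extend(int(j) for j in idx[0] if j not in liked)

print(from_mean)   # 2: the item "in between" the liked ones
print(per_item)    # [3, 4]: the closest item to each liked one
```

In this toy setup the mean vector favors a compromise between the liked items, while per-item queries favor items that closely match at least one of them.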

In [7]:
def get_movie_index_by_title(title):
    matches = df[df['original_title'].str.lower() == title.lower()]
    if not matches.empty:
        return matches.index[0]
    else:
        raise ValueError(f"Movie '{title}' not found.")

In [8]:
def recommend_mean_vector(movie_indices, n_recommendations=5):
    # Mean of the feature vectors of the selected movies
    mean_features = np.mean(X[movie_indices], axis=0).reshape(1, -1)
    distances, indices = knn.kneighbors(mean_features, n_neighbors=n_recommendations + len(movie_indices))
    
    # Drop movies the user already knows from the results
    recommendations = []
    for idx in indices.flatten():
        if idx not in movie_indices:
            recommendations.append(idx)
        if len(recommendations) == n_recommendations:
            break
    return df.iloc[recommendations][['original_title', 'genres', 'vote_average']]

In [9]:
def recommend_mean_result(movie_indices, n_recommendations=5):
    recommendation_scores = defaultdict(lambda: {'count': 0, 'distances': []})

    for idx in movie_indices:
        distances, indices = knn.kneighbors([X[idx]], n_neighbors=n_recommendations + 1)
        for dist, i in zip(distances[0][1:], indices[0][1:]):  # skip the movie itself
            if i not in movie_indices:  # skip the input movies
                recommendation_scores[i]['count'] += 1
                recommendation_scores[i]['distances'].append(dist)

    # Convert to a DataFrame
    results = []
    for idx, score in recommendation_scores.items():
        avg_distance = np.mean(score['distances'])
        results.append({
            'index': idx,
            'original_title': df.iloc[idx]['original_title'],
            'genres': df.iloc[idx]['genres'],
            'vote_average': df.iloc[idx]['vote_average'],
            'times_recommended': score['count'],
            'avg_distance': avg_distance
        })

    results_df = pd.DataFrame(results)
    results_df = results_df.sort_values(by=['times_recommended', 'avg_distance'], ascending=[False, True])
    return results_df[['original_title', 'genres', 'vote_average']].head(n_recommendations)
    # return results_df.head(n_recommendations)
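
The ranking rule at the end of `recommend_mean_result` sorts first by how many liked movies produced a candidate and then by its average distance. A minimal sketch of that rule on hypothetical scores (the titles `A`, `B`, `C` and their numbers are made up):

```python
import pandas as pd

# Hypothetical aggregated scores for three candidates
toy_results = pd.DataFrame([
    {'original_title': 'A', 'times_recommended': 1, 'avg_distance': 0.01},
    {'original_title': 'B', 'times_recommended': 2, 'avg_distance': 0.20},
    {'original_title': 'C', 'times_recommended': 2, 'avg_distance': 0.05},
])

# More recommendations first; ties broken by the smaller average distance
ranked = toy_results.sort_values(
    by=['times_recommended', 'avg_distance'],
    ascending=[False, True],
)

print(ranked['original_title'].tolist())
# ['C', 'B', 'A']
```

Candidate `A` has the smallest distance but lands last, because the recommendation count takes priority over the distance.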

# Test

In [10]:
tests = [
    ["Toy Story"],
    ["Toy Story 2", "Toy Story"],
    ["Toy Story", "Jumanji"],
    ["Toy Story", "Jumanji", "Nadja"],
    ["Toy Story", "Jumanji", "Nadja", "Thinner"],
    ["Toy Story", "Jumanji", "Nadja", "Thinner", "The Lion King"],
    ["Toy Story", "Jumanji", "Nadja", "Thinner", "The Lion King", "The Nightmare Before Christmas"],
    ["Toy Story", "Jumanji", "Nadja", "Thinner", "The Lion King", "The Nightmare Before Christmas", "The Matrix"],
    ["Toy Story", "Jumanji", "Nadja", "Thinner", "The Lion King", "The Nightmare Before Christmas", "The Matrix", "Pulp Fiction"],
    ["Toy Story", "Jumanji", "Nadja", "Thinner", "The Lion King", "The Nightmare Before Christmas", "The Matrix", "Pulp Fiction", "Forrest Gump"],
    ["Toy Story", "Jumanji", "Nadja", "Thinner", "The Lion King", "The Nightmare Before Christmas", "The Matrix", "Pulp Fiction", "Forrest Gump", "Fight Club"],
]

In [11]:
for test_movies in tests:
    try:
        print(f"=========================================================================================")
        movie_indices = [get_movie_index_by_title(title) for title in test_movies]
        recommendations = recommend_mean_vector(movie_indices, n_recommendations=5)
        print("Recommendations based on movies:", test_movies)
        print(recommendations)
        print(f"=========================================================================================\n")
    except ValueError as e:
        print(e)


Recommendations based on movies: ['Toy Story']
        original_title                              genres  vote_average
4746    Monsters, Inc.         [Animation, Comedy, Family]           7.5
21108  Despicable Me 2         [Animation, Comedy, Family]           7.0
15328      Toy Story 3         [Animation, Family, Comedy]           7.6
2989       Toy Story 2         [Animation, Comedy, Family]           7.3
30205       Inside Out  [Drama, Comedy, Animation, Family]           7.9

Recommendations based on movies: ['Toy Story 2', 'Toy Story']
        original_title                                  genres  vote_average
21108  Despicable Me 2             [Animation, Comedy, Family]           7.0
15328      Toy Story 3             [Animation, Family, Comedy]           7.6
4746    Monsters, Inc.             [Animation, Comedy, Family]           7.5
30205       Inside Out      [Drama, Comedy, Animation, Family]           7.9
13710               Up  [Animation, Comedy, Family, Adventure]   

In [12]:
for test_movies in tests:
    try:
        print(f"=========================================================================================")
        movie_indices = [get_movie_index_by_title(title) for title in test_movies]
        recommendations = recommend_mean_result(movie_indices, n_recommendations=5)
        print("Recommendations based on movies:", test_movies)
        print(recommendations)
        print(f"=========================================================================================\n")
    except ValueError as e:
        print(e)


Recommendations based on movies: ['Toy Story']
    original_title                              genres  vote_average
0   Monsters, Inc.         [Animation, Comedy, Family]           7.5
1  Despicable Me 2         [Animation, Comedy, Family]           7.0
2      Toy Story 3         [Animation, Family, Comedy]           7.6
3      Toy Story 2         [Animation, Comedy, Family]           7.3
4       Inside Out  [Drama, Comedy, Animation, Family]           7.9

Recommendations based on movies: ['Toy Story 2', 'Toy Story']
    original_title                                  genres  vote_average
1  Despicable Me 2             [Animation, Comedy, Family]           7.0
0      Toy Story 3             [Animation, Family, Comedy]           7.6
2   Monsters, Inc.             [Animation, Comedy, Family]           7.5
4       Inside Out      [Drama, Comedy, Animation, Family]           7.9
3         Zootopia  [Animation, Adventure, Family, Comedy]           7.7

Recommendations based on movies: [

# App

To make the model above useful, we decided to build a simple HTTP server for it that serves users a list of recommended movies to watch.

In [None]:
from flask import Flask, jsonify, send_file, request
import threading
import os

app = Flask(__name__)

# Define routes
@app.route('/')
def home():
    try:
        # Check whether the file exists in the current directory
        if os.path.exists('app.html'):
            return send_file('app.html')
        else:
            return f"""
            <html>
                <body>
                    <h1>Error: app.html was not found</h1>
                    <p>Make sure app.html is located in the directory: {os.getcwd()}</p>
                </body>
            </html>
            """, 404
    except Exception as e:
        return f"Error while loading the file: {str(e)}", 500

@app.route('/api/data', methods=['GET'])
def get_data():
    try:
        movies = [{'id': idx, 'title': row['original_title'] } for idx, row in df.iterrows()]
        return jsonify(movies)
    except ValueError as e:
        return jsonify({"error": str(e)}), 400

@app.route('/api/recomend', methods=['POST'])
def get_recommendations():
    try:
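        # Expect a JSON object such as {"movies": ["Toy Story", ...]};
        # silent=True returns None instead of raising on a malformed body
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict) or not payload.get('movies'):
            return jsonify({"error": "No movies provided"}), 400

        movie_indices = [get_movie_index_by_title(title) for title in payload['movies']]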
        recommendations = recommend_mean_result(movie_indices, n_recommendations=5)
        result = []
        for _, row in recommendations.iterrows():
            result.append(row['original_title'])

        print(result)
        return jsonify(result)
    except ValueError as e:
        return jsonify({"error": str(e)}), 400

# Function to run Flask app
def run_flask():
    app.run(host='127.0.0.1', port=5000, debug=False, use_reloader=False)

run_flask()

# # Start Flask in a separate thread
# flask_thread = threading.Thread(target=run_flask, daemon=True)
# flask_thread.start()

# print("Flask server started!")
# print("Visit: http://localhost:5000")


 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [27/May/2025 16:44:06] "POST /api/recomend HTTP/1.1" 200 -


['Maximum Risk', 'Drop Zone', 'Arabesque', 'Blood Alley', 'The Fourth Angel']


127.0.0.1 - - [27/May/2025 16:44:19] "POST /api/recomend HTTP/1.1" 200 -


['Maximum Risk', 'Drop Zone', 'Arabesque', 'Blood Alley', 'The Fourth Angel']


127.0.0.1 - - [27/May/2025 16:44:36] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/May/2025 16:44:36] "GET /.well-known/appspecific/com.chrome.devtools.json HTTP/1.1" 404 -
127.0.0.1 - - [27/May/2025 16:44:37] "GET /api/data HTTP/1.1" 200 -
127.0.0.1 - - [27/May/2025 16:44:40] "POST /api/recomend HTTP/1.1" 200 -


['Maximum Risk', 'Drop Zone', 'Arabesque', 'Blood Alley', 'The Fourth Angel']


127.0.0.1 - - [27/May/2025 16:44:55] "POST /api/recomend HTTP/1.1" 200 -


['Monsters, Inc.', 'Despicable Me 2', 'Toy Story 3', 'Toy Story 2', 'Inside Out']


127.0.0.1 - - [27/May/2025 16:45:18] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [27/May/2025 16:45:18] "GET /.well-known/appspecific/com.chrome.devtools.json HTTP/1.1" 404 -
127.0.0.1 - - [27/May/2025 16:45:19] "GET /api/data HTTP/1.1" 200 -
127.0.0.1 - - [27/May/2025 16:45:41] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/May/2025 16:45:41] "GET /.well-known/appspecific/com.chrome.devtools.json HTTP/1.1" 404 -
127.0.0.1 - - [27/May/2025 16:45:42] "GET /api/data HTTP/1.1" 200 -
127.0.0.1 - - [27/May/2025 16:46:29] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/May/2025 16:46:30] "GET /.well-known/appspecific/com.chrome.devtools.json HTTP/1.1" 404 -
127.0.0.1 - - [27/May/2025 16:46:31] "GET /api/data HTTP/1.1" 200 -
127.0.0.1 - - [27/May/2025 16:46:40] "POST /api/recomend HTTP/1.1" 200 -


['Monsters, Inc.', 'Despicable Me 2', 'Toy Story 3', 'Toy Story 2', 'Inside Out']


127.0.0.1 - - [27/May/2025 16:47:11] "POST /api/recomend HTTP/1.1" 200 -


['Despicable Me 2', 'Toy Story 3', 'Monsters, Inc.', 'Inside Out', 'Zootopia']


127.0.0.1 - - [27/May/2025 16:47:26] "POST /api/recomend HTTP/1.1" 200 -


['Monsters, Inc.', 'Despicable Me 2', 'Toy Story 3', 'Finding Nemo', 'WALL·E']


127.0.0.1 - - [27/May/2025 16:47:37] "POST /api/recomend HTTP/1.1" 200 -


['Monsters, Inc.', 'Despicable Me 2', 'Toy Story 3', 'Finding Nemo', 'WALL·E']
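
For reference, a client request to the recommendation endpoint could be built as in the sketch below. It uses only the standard library; `build_recommend_request` and `base_url` are hypothetical names introduced here, and the body shape `{"movies": [...]}` is inferred from how the handler above reads the payload. Sending the request requires the server from the previous cell to be running, so that part is left commented out.

```python
import json
import urllib.request


def build_recommend_request(titles, base_url='http://127.0.0.1:5000'):
    """Build a POST request for /api/recomend (hypothetical helper)."""
    body = json.dumps({'movies': titles}).encode('utf-8')
    return urllib.request.Request(
        f'{base_url}/api/recomend',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_recommend_request(['Toy Story', 'Jumanji'])
print(req.get_method(), req.full_url)

# To actually send it while the Flask server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```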
