In [24]:
import os
print(os.getcwd())  # Prints the current working directory


c:\Users\arsen\Desktop\new_ML


You might need to change the working directory first:
```python
import os
os.chdir('c:/Users/arsen/Desktop/new_ML')
```

In [25]:
import os
os.chdir('c:/Users/arsen/Desktop/new_ML')



#### Overview
This code defines a function `recommend_movies` that recommends movies based on movie titles given by the user.<br> 
It uses 
1) `cosine similarity` - compares movies with each other and measures how similar they are
2) `TF-IDF vectorization` - converts text data into numbers that the computer can understand <br>

and adjustable weights to compute similarity scores across different movie features (e.g., title, keywords, genres). The function then ranks and returns the top recommendations.

All logic is inside the `recommend_movies` function so that it can be reused in the `evaluation` part.<br>
`recommend_movies` takes these inputs:
  - `input_titles`: List of movie titles to base recommendations on. 
  - `data_path`: Path to the CSV file containing movie data.
  - `top_n`: Number of recommendations to return.
  - `weights`: Dictionary of weights for different features <br>

Every parameter except `input_titles` has a default value, so only `input_titles` is mandatory.<br>
The output is a list of recommended movie titles.


<br>

### How are the features used?
1) Cosine similarity is computed for `title`, `keywords`, and `director`, and *separately* for the one-hot encoded `genres`
<br>

2) `user_rating` is used to favor higher-rated movies. Its weight is applied as an exponent, so even a slight change can alter the recommendations entirely <br>

3) Movies with the `adult` flag are penalized if none of the input movies is an adult movie<br>

<br>
<br>

## Boring code part explanation (you may skip)

```python
if weights is None:
    weights = {
        'title': 1.0,
        'user_rating': 0.1,
        'keywords': 1.5,
        'director': 0.8,
        'adult': 1.0,
        'genres': 0.4
    }
```
- **Default Weights**: Assigns default weights to features if none are provided. 
  - `title`, `keywords`, `director`, etc. are given specific weights for their influence on recommendations.
  <br>
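Because the function looks up each weight by key, a custom `weights` dictionary has to contain every key the function reads. A minimal sketch of one way to override only some of the weights, assuming a hypothetical helper `resolve_weights` and constant `DEFAULT_WEIGHTS` that are not part of the notebook:

```python
# Hypothetical helper (not in the notebook): fill in missing keys from the defaults.
DEFAULT_WEIGHTS = {
    'title': 1.0,
    'user_rating': 0.1,
    'keywords': 1.5,
    'director': 0.8,
    'adult': 1.0,
    'genres': 0.4,
}

def resolve_weights(weights=None):
    """Return a full weights dict; user-supplied values override the defaults."""
    merged = dict(DEFAULT_WEIGHTS)  # copy, so the defaults are never mutated
    if weights:
        merged.update(weights)
    return merged

custom = resolve_weights({'keywords': 2.0})
print(custom['keywords'], custom['title'])  # 2.0 1.0
```

With such a helper, a caller could pass `{'keywords': 2.0}` alone and still get the default values for the remaining features.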




### Computing TF-IDF and Similarities

```python
tfidf_title = TfidfVectorizer(stop_words='english')
title_matrix = tfidf_title.fit_transform(movies_df['title'].fillna(''))
```
- **TF-IDF**: Converts a text feature into numeric vectors so that similarities can be computed.
- **Stop Words**: The `stop_words` parameter removes uninformative words such as `a`, `the`, `an`, etc.<br>
The same vectorization is applied to the `title`, `keywords`, and `director` features. <br>
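To see what the vectorizer does with stop words, here is a small sketch on three toy titles. The titles are made up for illustration and are not read from `cleaned_movies.csv`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy titles for illustration only.
titles = ["The Dark Knight", "The Dark Knight Rises", "A Star Is Born"]

vectorizer = TfidfVectorizer(stop_words='english')
matrix = vectorizer.fit_transform(titles)  # sparse matrix, one row per title

vocabulary = sorted(vectorizer.vocabulary_)  # terms the vectorizer kept
print(vocabulary)    # 'the' does not appear: it was removed as a stop word
print(matrix.shape)  # (number of titles, number of kept terms)
```

Each row of `matrix` is the TF-IDF vector of one title, and this is the kind of input that `cosine_similarity` receives in the function.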

```python
title_similarity = cosine_similarity(title_matrix, title_matrix) * weights['title']
keywords_similarity = cosine_similarity(keywords_matrix, keywords_matrix) * weights['keywords']
director_similarity = cosine_similarity(director_matrix, director_matrix) * weights['director']
```
- **Cosine Similarity**: Computes pairwise similarity scores for titles, keywords, and directors. Scales these scores using the respective weights.
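Passing the same matrix twice yields a square matrix with one row and one column per movie. A minimal sketch with a made-up feature matrix (the numbers are assumptions for illustration, not TF-IDF values from the dataset):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy feature matrix: three "movies" described by two made-up features.
features = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
])

weight = 1.5  # plays the same role as weights['keywords']
similarity = cosine_similarity(features, features) * weight

print(similarity.shape)  # (3, 3)
print(similarity[0, 2])  # 0.0, because movies 0 and 2 share no features
```

The diagonal holds each movie's similarity with itself, which after scaling equals the weight.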


### Genre Similarity

```python
genre_columns = ['Action', 'Adventure', 'Animation', ...]
genre_matrix = movies_df[genre_columns].values
genre_similarity = cosine_similarity(genre_matrix, genre_matrix)
```
- **Genre Features**: Uses binary-encoded genre columns for similarity computation.
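For binary genre vectors, cosine similarity reduces to the number of shared genres divided by the geometric mean of the two genre counts. The sketch below computes it by hand on a made-up three-genre subset. The helper `cosine` is hypothetical, and returning 0.0 for an all-zero vector is a choice made here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Columns: Action, Adventure, Comedy (a made-up subset of the genre columns)
movie_a = [1, 1, 0]  # Action + Adventure
movie_b = [1, 0, 1]  # Action + Comedy
movie_c = [0, 0, 1]  # Comedy only

print(round(cosine(movie_a, movie_b), 3))  # 0.5
print(cosine(movie_a, movie_c))            # 0.0
```

Two movies that share one of their two genres score 0.5, and movies with no genre in common score 0.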

```python
total_similarity = (
    textual_similarity * (1 - weights['genres']) +
    genre_similarity * weights['genres']
)
```
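- **Combining Similarities**: Blends the two parts. `textual_similarity` (the sum of the weighted title, keyword, and director similarities) is scaled by `1 - weights['genres']`, and `genre_similarity` is scaled by `weights['genres']`.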
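A worked example may help here. The textual part is a sum of three weighted cosine similarities, so with the default weights it can reach 1.0 + 1.5 + 0.8 = 3.3, while the genre part stays between 0 and 1. The per-feature similarities below are made-up numbers for a single pair of movies:

```python
# Made-up per-feature cosine similarities for one pair of movies.
title_sim, keywords_sim, director_sim, genre_sim = 0.2, 0.5, 1.0, 0.5

weights = {'title': 1.0, 'keywords': 1.5, 'director': 0.8, 'genres': 0.4}

# Sum of the weighted textual similarities.
textual = (title_sim * weights['title']
           + keywords_sim * weights['keywords']
           + director_sim * weights['director'])

# Blend textual and genre similarity using the 'genres' weight.
total = textual * (1 - weights['genres']) + genre_sim * weights['genres']

print(round(textual, 2))  # 1.75
print(round(total, 2))    # 1.25
```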

### Input Movie Indices

```python
movie_indices = []
for title in input_titles:
    if title in movies_df['title'].values:
        idx = movies_df[movies_df['title'] == title].index
        if len(idx) > 0:
            movie_indices.append(idx[0])
```
- **Find Input Indices**: Locates indices of the user-specified movie titles in the dataset.
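The lookup is an exact, case-sensitive match that keeps the first occurrence of a title, and titles that are not found are skipped silently. The following sketch mirrors that behaviour on a plain list and also reports the titles it could not find. The helper `find_title_indices` is hypothetical and not part of the notebook.

```python
def find_title_indices(all_titles, input_titles):
    """Return (indices, missing): first exact-match index per input title,
    plus the input titles that were not found."""
    first_index = {}
    for i, title in enumerate(all_titles):
        first_index.setdefault(title, i)  # keep the first occurrence, like idx[0]
    indices = [first_index[t] for t in input_titles if t in first_index]
    missing = [t for t in input_titles if t not in first_index]
    return indices, missing

catalog = ["Moana", "Inception", "Moana", "Tenet"]  # toy data with a duplicate title
indices, missing = find_title_indices(catalog, ["Moana", "moana", "Tenet"])
print(indices)  # [0, 3]
print(missing)  # ['moana']
```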

### Adjusting and Ranking Recommendations

```python
sim_scores = total_similarity[movie_indices].sum(axis=0)
sim_scores = sim_scores * (movies_df['user_rating'] ** weights['user_rating'])
```
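- **Aggregate Scores**: Sums the similarity rows of the input movies, giving one cumulative score per movie in the dataset.
- **Rating Adjustment**: Multiplies each score by `user_rating` raised to the power `weights['user_rating']`, so higher-rated movies rank higher.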
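Because the weight is an exponent, small changes to it shift the balance noticeably. The sketch below compares how strongly a movie rated 8.0 is favoured over one rated 5.0 for a few exponents. The ratings and the helper `rating_boost` are assumptions for illustration.

```python
def rating_boost(low_rating, high_rating, exponent):
    """Factor by which the higher-rated movie's score is favoured over the lower-rated one."""
    return (high_rating ** exponent) / (low_rating ** exponent)

for exponent in (0.1, 0.5, 1.0):
    print(exponent, round(rating_boost(5.0, 8.0, exponent), 3))
```

With an exponent of 0.1 the higher-rated movie gains roughly 5%, with 0.5 roughly 26%, and with 1.0 a full 60%. Note also that a rating of 0 raised to a positive exponent is 0, which zeroes the movie's score.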

```python
if not any(movies_df.iloc[idx]['adult'] for idx in movie_indices):
    sim_scores[movies_df['adult'] == True] *= 0.5
```
- **Adult Movies Penalty**: Halves the scores of adult movies if none of the input movies is marked as adult.
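The same rule can be sketched on plain lists with toy scores and flags. Note that the factor 0.5 is hard-coded: the function as written does not read `weights['adult']`.

```python
scores = [1.0, 0.8, 0.6, 0.4]          # toy similarity scores
is_adult = [False, True, False, True]  # toy 'adult' flags, aligned with scores
input_indices = [0, 2]                 # indices of the input movies

# Penalize only when no input movie is flagged as adult.
if not any(is_adult[i] for i in input_indices):
    scores = [s * 0.5 if adult else s for s, adult in zip(scores, is_adult)]

print(scores)  # [1.0, 0.4, 0.6, 0.2]
```

If index 1 were among the inputs, the condition would be false and all scores would stay unchanged.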

```python
sim_scores = [(i, score) for i, score in enumerate(sim_scores)]
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
```
- **Sort Results**: Sorts similarity scores in descending order.

### Filter and Return Results

```python
input_indices = set(movie_indices)
recommendations = [(i, score) for i, score in sim_scores if i not in input_indices and score > 0]
```
- **Exclude Input Movies**: Removes the input movies from the recommendation list, along with any movie whose score is not positive.

```python
top_recommendations = recommendations[:top_n]
recommended_titles = [movies_df.iloc[i]['title'] for i, _ in top_recommendations]
```
- **Return Top N**: Retrieves titles for the top `n` recommendations.
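The sorting, filtering, and top-`n` steps can be traced end to end on toy data. The titles and scores below are made up for illustration.

```python
titles = ["A", "B", "C", "D", "E"]  # toy titles
scores = [0.9, 0.0, 0.7, 0.8, 0.3]  # toy scores, aligned with titles
input_indices = {0}                 # movie "A" was the input
top_n = 2

# Pair each score with its index, highest score first.
ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Drop the input movie and anything with a non-positive score.
kept = [(i, s) for i, s in ranked if i not in input_indices and s > 0]

recommended = [titles[i] for i, _ in kept[:top_n]]
print(recommended)  # ['D', 'C']
```

"A" is excluded as an input, "B" is dropped because its score is 0, and the two best remaining movies are returned.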


In [26]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def recommend_movies(input_titles, data_path='cleaned_movies.csv', top_n=20, weights=None):
    """
    Recommend movies based on input movie titles using multiple features with adjustable weights.
    """
    
    # loading data
    movies_df = pd.read_csv(data_path)

    # Default weights, applied when none are provided
    if weights is None:
        weights = {
            'title': 1.0,                                    
            'user_rating': 0.1,
            'keywords': 1.5,
            'director': 0.8,
            'adult': 1.0,
            'genres': 0.4
        }

    # separate computation of TF-IDF for each feature
    tfidf_title = TfidfVectorizer(stop_words='english')
    title_matrix = tfidf_title.fit_transform(movies_df['title'].fillna(''))

    tfidf_keywords = TfidfVectorizer(stop_words='english')
    keywords_matrix = tfidf_keywords.fit_transform(movies_df['keywords'].fillna(''))

    tfidf_director = TfidfVectorizer(stop_words='english')
    director_matrix = tfidf_director.fit_transform(movies_df['director'].fillna(''))

    # Compute cosine similarities for each feature.
    # Passing the same matrix twice (title_matrix, title_matrix) computes the similarity of every movie with every other movie
    title_similarity = cosine_similarity(title_matrix, title_matrix) * weights['title']
    keywords_similarity = cosine_similarity(keywords_matrix, keywords_matrix) * weights['keywords']
    director_similarity = cosine_similarity(director_matrix, director_matrix) * weights['director']

    # Combining similarities
    textual_similarity = title_similarity + keywords_similarity + director_similarity


    # Compute similarity on the one-hot encoded genre columns
    genre_columns = [
        'Action', 'Adventure', 'Animation', 'Comedy', 'Crime', 'Documentary',
        'Drama', 'Family', 'Fantasy', 'History', 'Horror', 'Music', 'Mystery',
        'Romance', 'Science Fiction', 'TV Movie', 'Thriller', 'Unknown', 'War', 'Western'
    ]
    genre_matrix = movies_df[genre_columns].values
    genre_similarity = cosine_similarity(genre_matrix, genre_matrix)

    # Genre + (title+keywords+director) similarity
    total_similarity = (
        textual_similarity * (1 - weights['genres']) +
        genre_similarity * weights['genres']
    )

    # finding movie indices of input titles
    movie_indices = []
    for title in input_titles:
        if title in movies_df['title'].values:
            idx = movies_df[movies_df['title'] == title].index
            if len(idx) > 0:
                movie_indices.append(idx[0])

    # Handle empty input case
    if not movie_indices:
        print("No valid movie indices found. Check input titles.")
        return []

    # total_similarity is the matrix of similarity scores between all movies;
    # movie_indices holds the rows of the input movies.
    # Summing those rows gives one cumulative score per movie.
    sim_scores = total_similarity[movie_indices].sum(axis=0)

    # Adjust similarity scores based on user ratings
    # so it will recommend movies with higher user rating
    sim_scores = sim_scores * (movies_df['user_rating'] ** weights['user_rating'])
    

    # Penalize adult movies if none of the input movies is marked as adult
    if not any(movies_df.iloc[idx]['adult'] for idx in movie_indices):
        sim_scores[movies_df['adult'] == True] *= 0.5  # halve the score

    # Sort and filter results
    # most similar movies will be on top
    sim_scores = [(i, score) for i, score in enumerate(sim_scores)]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)


    # Filter out the input movies (and non-positive scores) so they won't appear in the output
    input_indices = set(movie_indices)
    recommendations = [(i, score) for i, score in sim_scores if i not in input_indices and score > 0]

    # Get the top N recommendations
    top_recommendations = recommendations[:top_n]
    recommended_titles = [movies_df.iloc[i]['title'] for i, _ in top_recommendations]

    return recommended_titles


# Evaluation part

Because we don't have any ground truth (a list of recommendations that the user is 100% certain to like), and no one could possibly get one, <br>
we will evaluate not with numeric metrics but by testing the function and judging the results by eye and by logic.
<br>
First, let's change the weights a little (you may change them too for your own testing);<br>
the test cases follow.

In [None]:
# I think these are the best settings
weights = {
    'title': 0.5,
    'user_rating': 0.1,
    'keywords': 1.5,
    'director': 0.4,
    'adult': 1.0,
    'genres': 0.4
}


Recommended Movies: ['No Manches Frida', 'Odio el verano', 'The Woman in Black', 'The Hidden Face', 'Exorcist: The Beginning', "Mothers' Instinct", 'Thir13en Ghosts', 'Chloe', 'Girl', 'Usogui', 'The Wolfman', 'Bad Genius', 'The Medium', 'Hereditary', 'Inheritance', 'Speak No Evil', 'The Ritual', 'Cargo', 'Hidden', 'Crawl']



### **Test Case 1: Family-Friendly Animated Movies**

**Objective**: Validate that the system recommends movies suitable for all ages in the animated genre.  
**Input Movies**:  
*"Moana"*, *"Atlantis: The Lost Empire"*, *"Ruby Gillman, Teenage Kraken"*  
**Expected Outcome**: Recommendations include similar family-friendly animated movies like *"Frozen"*, *"The Incredibles"*, or *"Zootopia"*.


In [36]:
print("Test Case 1: Family-Friendly Animated Movies")
print("Recommended Movies:", recommend_movies(["Moana", "Atlantis: The Lost Empire", "Ruby Gillman, Teenage Kraken"], weights=weights))

Test Case 1: Family-Friendly Animated Movies
Recommended Movies: ['Moana 2', 'The Croods', 'The SpongeBob Movie: Sponge Out of Water', 'Spellbound', 'The Lego Movie', 'Christopher Robin', 'Monsters, Inc.', "The Emperor's New Groove", 'The Secret Life of Pets 2', 'Luca', 'Once Upon a Studio', 'The Loud House Movie', 'Tom & Jerry', 'Vivo', 'The Smurfs', 'Playmobil: The Movie', 'The Pirates! In an Adventure with Scientists!', 'The Garfield Movie', 'Bolt', 'The Emoji Movie']



---

### **Test Case 2: Movies by a Specific Director**
  
**Objective**: Test the system's ability to recommend movies directed by a specific filmmaker.  
**Input Movies**:  
*"Inception"*, *"Interstellar"*, *"The Dark Knight"*  
**Expected Outcome**: Recommendations should include other Christopher Nolan movies, such as *"Tenet"*, *"Dunkirk"*, or *"The Prestige"*.


In [29]:
print("Test Case 2: Movies by a Specific Director")
print("Recommended Movies:", recommend_movies(["Inception", "Interstellar", "The Dark Knight"], weights=weights))


Test Case 2: Movies by a Specific Director
Recommended Movies: ['The Dark Knight Rises', 'Batman Begins', 'Tenet', 'The Prestige', 'Dunkirk', 'Insomnia', 'Oppenheimer', 'Jack Reacher', 'Planet of the Apes', 'Spaceman', 'Stargate: The Ark of Truth', 'Elysium', 'Rebel Moon - Part One: A Child of Fire', 'The Midnight Sky', 'Sunshine', 'Memento', 'Avengers: Endgame', 'Law Abiding Citizen', 'Space Battleship Yamato', 'Arrival']
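The by-eye comparison against the expected titles can also be scripted. A minimal sketch, assuming a hypothetical helper `find_hits` that is not part of the notebook. It uses the expected titles of Test Case 2 and the first seven titles of the output above:

```python
def find_hits(expected, recommended):
    """Return the expected titles that appear in the recommendation list."""
    recommended_set = set(recommended)
    return [title for title in expected if title in recommended_set]

# Expected titles from Test Case 2 and the first seven titles of its output.
expected = ["Tenet", "Dunkirk", "The Prestige"]
recommended = ["The Dark Knight Rises", "Batman Begins", "Tenet", "The Prestige",
               "Dunkirk", "Insomnia", "Oppenheimer"]

hits = find_hits(expected, recommended)
print(f"{len(hits)}/{len(expected)} expected titles found: {hits}")
```

This is only a convenience for checking the hand-written expectations. It does not replace a real ground truth.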



---

### **Test Case 3: Post-Apocalyptic or Dystopian Movies**

**Objective**: Test recommendations for movies with themes of apocalypse or dystopia.  
**Input Movies**:  
*"Mad Max: Fury Road"*, *"The Hunger Games"*, *"Children of Men"*  
**Expected Outcome**: Recommendations should include similar movies, such as *"The Road"*, *"Snowpiercer"*, or *"Edge of Tomorrow"*.


In [30]:
print("Test Case 3: Post-Apocalyptic or Dystopian Movies")
print("Recommended Movies:", recommend_movies(["Mad Max: Fury Road", "The Hunger Games", "Children of Men"], weights=weights))

Test Case 3: Post-Apocalyptic or Dystopian Movies
Recommended Movies: ['The Hunger Games: Mockingjay - Part 1', 'The Hunger Games: Mockingjay - Part 2', 'Chaos Walking', 'Furiosa: A Mad Max Saga', 'Mad Max Beyond Thunderdome', 'Mad Max', 'The Hunger Games: Catching Fire', 'Battle for the Planet of the Apes', 'Mortal Engines', 'Divergent', 'Allegiant', 'The Day', 'Maze Runner: The Scorch Trials', 'The Postman', 'After Earth', 'The Time Machine', 'Vampire Hunter D: Bloodlust', 'The 5th Wave', 'The Running Man', 'The Maze Runner']


---

### **Test Case 4: Holiday-Themed Movies**

**Objective**: Test recommendations for movies centered around Christmas or similar holidays.  
**Input Movies**:  
*"Home Alone"*, *"Elf"*, *"The Polar Express"*  
**Expected Outcome**: Recommendations could include other holiday-themed films, such as *"The Grinch"*, *"The Santa Clause"*, or *"Love Actually"*.


In [31]:

print("Test Case 4: Holiday-Themed Movies")
print("Recommended Movies:", recommend_movies(["Home Alone", "Elf", "The Polar Express"], weights=weights))


Test Case 4: Holiday-Themed Movies
Recommended Movies: ['The Boss Baby: Christmas Bonus', 'A Christmas Carol', 'Prep & Landing', 'The Search for Santa Paws', 'The Flight Before Christmas', 'The Nightmare Before Christmas', 'The Santa Clause 3: The Escape Clause', 'Rudolph the Red-Nosed Reindeer', 'A Boy Called Christmas', 'Joseph: King of Dreams', 'Noelle', 'Prep & Landing Stocking Stuffer: Operation: Secret Santa', 'Anastasia', 'The Naughty Nine', 'Rudolph the Red-Nosed Reindeer & the Island of Misfit Toys', 'Arthur Christmas', 'An Almost Christmas Story', "Roald Dahl's The Witches", 'Beowulf', "Olaf's Frozen Adventure"]


---

### **Test Case 5: Movies Featuring Monsters**
 
**Objective**: Test recommendations for films where monsters play a central role.  
**Input Movies**:  
*"Monsters, Inc."*, *"Jurassic Park"*, *"King Kong"*  
**Expected Outcome**: Recommendations could include other monster-related films like *"Godzilla"*, *"Pacific Rim"*, or *"Cloverfield"*.


In [32]:

print("Test Case 5: Movies Featuring Monsters")
print("Recommended Movies:", recommend_movies(["Monsters, Inc.", "Jurassic Park", "King Kong"], weights=weights))


Test Case 5: Movies Featuring Monsters
Recommended Movies: ['Godzilla vs. Kong', 'Jurassic World', 'Godzilla x Kong: The New Empire', 'Kong: Skull Island', 'Monsters vs Aliens', 'Rampage', 'The Son of Kong', 'The Good Dinosaur', 'Love and Monsters', 'Inside Out', 'Up', 'Wonder Park', 'Pacific Rim', 'Spy Kids 2: The Island of Lost Dreams', 'Indiana Jones and the Temple of Doom', 'Cast Away', 'The Angry Birds Movie', 'Ant-Man and the Wasp', 'The Nut Job', 'Godzilla: King of the Monsters']




---

### **Test Case 6: Biographical Movies about Musicians**

**Objective**: Validate recommendations for biographical movies that focus on the lives of musicians.  
**Input Movies**:  
*"Bohemian Rhapsody"*, *"Rocketman"*, *"Walk the Line"*  
**Expected Outcome**: Recommendations could include *"Ray"*, *"Straight Outta Compton"*, or *"La Bamba"*.



In [33]:

print("Test Case 6: Biographical Movies about Musicians")
print("Recommended Movies:", recommend_movies(["Bohemian Rhapsody", "Rocketman", "Walk the Line"], weights=weights))


Test Case 6: Biographical Movies about Musicians
Recommended Movies: ['Maria', 'TÁR', 'A Star Is Born', 'Maestro', 'Ray', 'Inside Llewyn Davis', 'Back to Black', 'Almost Famous', 'TAYLOR SWIFT | THE ERAS TOUR', 'The Bee Gees: How Can You Mend a Broken Heart', 'I Still Believe', '9 Songs', 'My Policeman', 'CODA', 'Jesus Christ Superstar', 'Vox Lux', 'Unsung Hero', 'Straight Outta Compton', '8 Mile', 'Bob Marley: One Love']




---

### **Test Case 7: Superhero Movies with Team Dynamics**
  
**Objective**: Test recommendations for superhero movies where team dynamics are a significant theme.  
**Input Movies**:  
*"The Avengers"*, *"Guardians of the Galaxy"*, *"X-Men: Days of Future Past"*  
**Expected Outcome**: Recommendations could include *"Justice League"*, *"Fantastic Four"*, or *"Watchmen"*.



In [34]:

print("Test Case 7: Superhero Movies with Team Dynamics")
print("Recommended Movies:", recommend_movies(["The Avengers", "Guardians of the Galaxy", "X-Men: Days of Future Past"], weights=weights))


Test Case 7: Superhero Movies with Team Dynamics
Recommended Movies: ['Guardians of the Galaxy Vol. 2', 'Guardians of the Galaxy Vol. 3', 'The Marvels', 'Ant-Man and the Wasp: Quantumania', 'Iron Man 2', 'The Guardians of the Galaxy Holiday Special', 'Captain America: The Winter Soldier', 'X-Men: Apocalypse', 'Spider-Man: Far From Home', 'Avengers: Infinity War', 'Iron Man', 'Spider-Man: Homecoming', 'Iron Man 3', 'Avengers: Endgame', 'Thor: Ragnarok', 'Doctor Strange in the Multiverse of Madness', 'Superman Returns', 'Black Widow', 'Ant-Man', 'Spider-Man: Into the Spider-Verse']




---

### **Test Case 8: Sci-Fi with AI and Robotics**
 
**Objective**: Test recommendations for sci-fi movies that feature AI and robotics as central elements.  
**Input Movies**:  
*"Ex Machina"*, *"I, Robot"*, *"Blade Runner 2049"*  
**Expected Outcome**: Recommendations include *"The Matrix"*, *"Ghost in the Shell"*, or *"Westworld"*.



In [35]:

print("Test Case 8: Sci-Fi with AI and Robotics")
print("Recommended Movies:", recommend_movies(["Ex Machina", "I, Robot", "Blade Runner 2049"], weights=weights))


Test Case 8: Sci-Fi with AI and Robotics
Recommended Movies: ['Blade Runner', 'Cloud Atlas', 'Bicentennial Man', 'Moon', 'Arrival', 'A.I. Rising', 'The Man Who Fell to Earth', 'The Giver', 'Melancholia', 'JUNG_E', 'The Midnight Sky', 'The Circle', 'The Beast', 'Interstellar', 'Logan', 'The Man from Earth', 'WarGames', 'Proximity', 'Aire: Just Breathe', 'Space Sweepers']
