# 1. Introduction

We have the following data (the examples are taken from the problem description on the Coursera forum):
1.	List of movies (their names);  
Example: `movies = ["Parasite", "1917", "Ford v Ferrari", "Jojo Rabbit", "Joker"]`;  
The number of films will be denoted as $N$.


2.	List of similarities between movies (pairs of movies that are similar);  
Example: `similarities = [["Parasite", "1917"], ["Parasite", "Jojo Rabbit"], ["Joker", "Ford v Ferrari"]]`;  
The number of similar pairs of movies will be denoted as $E$ (the number of graph edges).  


3.	List of the user's friends and, for each friend, the list of movies they have already seen.  
Example: `friends = [["Joker"], ["Joker", "1917"], ["Joker"], ["Parasite"], ["1917"], ["Jojo Rabbit", "Joker"]]`;  
The number of friends will be denoted as $M$.  


We can represent the films and their similarities as a graph: the films (movie names) are the vertices, and each pair of similar films is connected by an edge. Since the problem statement treats all similarities as equivalent, all edges have the same weight, i.e. the graph is unweighted. A group of similar films can be thought of as a genre: action, horror, fantasy, drama, melodrama (not from my friends list), comedy, etc.

The problem statement does not require the connected components of the graph to be fully connected, i.e. one film can be similar to several films that are not similar to each other. In other words, one film can belong to several genres (for example, an action movie with elements of melodrama).

We also assume the following about the input:
- the movie list contains no duplicates;
- the similarity list contains no symmetric duplicates, so each pair of similar films appears only once and in one order;
- no friend's list contains duplicate films.
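These assumptions can be checked before running the algorithm. The sketch below uses a hypothetical helper, `check_assumptions`, which is not part of the program in section 3. It compares pairs as `frozenset`s, so a pair and its reversed copy count as the same edge.

```python
def check_assumptions(movies, similarities, friends):
    """Hypothetical helper: True if the inputs meet the stated assumptions."""
    # No duplicate films in the movie list.
    if len(set(movies)) != len(movies):
        return False
    # Each similar pair appears once; ["A", "B"] and ["B", "A"] are the same edge.
    pairs = [frozenset(pair) for pair in similarities]
    if len(set(pairs)) != len(pairs):
        return False
    # No friend's list contains the same film twice.
    return all(len(set(seen)) == len(seen) for seen in friends)


movies = ["Parasite", "1917", "Ford v Ferrari", "Jojo Rabbit", "Joker"]
similarities = [["Parasite", "1917"], ["Parasite", "Jojo Rabbit"], ["Joker", "Ford v Ferrari"]]
friends = [["Joker"], ["Joker", "1917"], ["Joker"], ["Parasite"], ["1917"], ["Jojo Rabbit", "Joker"]]

print(check_assumptions(movies, similarities, friends))                            # True
print(check_assumptions(movies, similarities + [["1917", "Parasite"]], friends))  # False
```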

In order to recommend one film with the highest discussability and uniqueness, we will use the algorithm described below.


# 2. Algorithm description

### Dictionary  
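To store the intermediate results, we create a dictionary `movie_info` with the following structure:
- the key is the name of the movie;
- the value is a dictionary with two fields. `similar_movies` holds the set of films similar to this one, and `views` holds the number of friends who have watched it.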
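The record for one movie can be created by a small factory function. `new_movie_info` is a hypothetical name; the field names `views` and `similar_movies` match the program in section 3. The program gets the same effect with `copy.deepcopy` of a template.

Each movie needs its own fresh set. If all records shared one set object, every movie would appear similar to every other one.

```python
def new_movie_info():
    """Hypothetical factory: a fresh record (new set) for one movie."""
    return {'views': 0, 'similar_movies': set()}


movie_info = {}
# Register "Parasite" and "1917" as similar, then count one view of "1917".
movie_info.setdefault("Parasite", new_movie_info())['similar_movies'].add("1917")
movie_info.setdefault("1917", new_movie_info())['similar_movies'].add("Parasite")
movie_info["1917"]['views'] += 1

print(movie_info)
# {'Parasite': {'views': 0, 'similar_movies': {'1917'}}, '1917': {'views': 1, 'similar_movies': {'Parasite'}}}
```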


### Step 1  
In the first step, the algorithm traverses the similarity list and builds an adjacency list of the film graph. Each movie's neighbors are saved in the `similar_movies` field of the dictionary.  
Time complexity of this step: $O(E)$  
Space complexity: $O(N+E)$

Another option is the adjacency matrix of the graph, which needs $O(N^2)$ space. Since the graph is not fully connected, we can assume it is sparse ($E \ll N^2$), so the adjacency list is preferable in terms of space complexity.
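Here is a minimal sketch of step 1. It assumes the graph is kept as a plain `collections.defaultdict(set)` rather than the `movie_info` dictionary used by the program; the name `build_adjacency` is hypothetical.

Each pair is stored in both directions, so the lists hold $2E$ entries in total. Movies without similar films do not appear as keys at all.

```python
from collections import defaultdict


def build_adjacency(similarities):
    """Step 1 sketch: adjacency list of the undirected movie graph."""
    graph = defaultdict(set)
    for movie1, movie2 in similarities:
        # Similarity is symmetric, so store the edge in both directions.
        graph[movie1].add(movie2)
        graph[movie2].add(movie1)
    return graph


similarities = [["Parasite", "1917"], ["Parasite", "Jojo Rabbit"], ["Joker", "Ford v Ferrari"]]
graph = build_adjacency(similarities)
print(sorted(graph["Parasite"]))  # ['1917', 'Jojo Rabbit']
print(sorted(graph["Joker"]))     # ['Ford v Ferrari']
```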

### Step 2  
In the second step, the algorithm traverses the movie graph with depth-first search (DFS) and builds a list of its connected components.
Each connected component is a set of movies linked to each other, directly or through a chain of similar movies. The algorithm treats all movies in a component as similar to one another. To store the results, we use a list of sets, in which each set is a connected component.  
Time complexity of this step: $O(N+E)$  
Space complexity: $O(N)$  
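The program in section 3 finds components with a recursive DFS. CPython's default recursion limit is 1000, so a very long chain of similar movies could raise `RecursionError`.

The sketch below is an alternative, not the author's code: the same DFS driven by an explicit stack. The function name `connected_components` and the plain-dictionary graph are assumptions for illustration.

```python
def connected_components(movies, graph):
    """Step 2 sketch: connected components via iterative DFS.

    `graph` maps a movie to the set of its similar movies; movies without
    similar films may be missing from it.
    """
    visited = set()
    components = []
    for start in movies:
        if start in visited:
            continue
        component = set()
        stack = [start]
        visited.add(start)
        while stack:
            movie = stack.pop()
            component.add(movie)
            for similar in graph.get(movie, ()):
                if similar not in visited:
                    visited.add(similar)  # mark on push so nothing is stacked twice
                    stack.append(similar)
        components.append(component)
    return components


movies = ["Parasite", "1917", "Ford v Ferrari", "Jojo Rabbit", "Joker"]
graph = {
    "Parasite": {"1917", "Jojo Rabbit"},
    "1917": {"Parasite"},
    "Jojo Rabbit": {"Parasite"},
    "Joker": {"Ford v Ferrari"},
    "Ford v Ferrari": {"Joker"},
}
components = connected_components(movies, graph)
print(components == [{"Parasite", "1917", "Jojo Rabbit"}, {"Ford v Ferrari", "Joker"}])  # True
```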


### Step 3  
In the third step, the algorithm traverses the list of friends and, for each movie, counts the number of friends who have watched it. The counts are stored in the `views` field of our dictionary.  
Time complexity of this step: proportional to the total length of all friends' lists, which is at most $O(M \cdot N)$ (if every friend has watched every movie).  
Space complexity: $O(N)$
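Here is a minimal sketch of step 3. It swaps in `collections.Counter` for the `views` field of the dictionary; `count_views` is a hypothetical name. Each entry of each friend's list is touched once, so the work equals the total number of entries $L \le M \cdot N$. Because no friend's list contains duplicates, each count is a number of distinct friends.

The $M \cdot N$ bound can be very loose. In the stress test of section 4, $M = 600\,000$ and $N = 100\,005$, yet the lists contain only $800\,000$ entries.

```python
from collections import Counter


def count_views(friends):
    """Step 3 sketch: number of friends who watched each movie."""
    # One pass over every entry of every friend's list.
    return Counter(movie for seen in friends for movie in seen)


friends = [["Joker"], ["Joker", "1917"], ["Joker"], ["Parasite"], ["1917"], ["Jojo Rabbit", "Joker"]]
views = count_views(friends)
print(views["Joker"], views["1917"], views["Parasite"], views["Jojo Rabbit"])  # 4 2 1 1
print(views["Ford v Ferrari"])  # 0 (a Counter returns 0 for missing keys)
```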


### Step 4  
In the fourth step, the algorithm traverses each connected component of the graph and performs the following operations:


1)	Sums the views of all movies in the component, i.e. how many times friends watched movies from it.  
Time complexity: $O(k)$ for a component of $k$ movies, which is $O(N)$ in the worst case when all movies fall into the same connected component.  


2)	Calculates the size of the component (the number of movies in it).  
Time complexity: $O(1)$, since the size of a Python set is stored with the set.  


3)	For each movie in the component, it calculates:
- the number of friends who watched this movie;
- the average number of views of the other movies in the component (its similar movies);
- the final score of the movie: its number of views divided by that average.  

The calculations handle two special cases:
- If computing the average would mean dividing by zero (the movie has no similar movies, or none of them has been watched), the movie gets a score of zero.
- If the score of the current movie is greater than the maximum score found so far, the maximum is updated and the movie becomes the current recommendation.

Each score is computed in $O(1)$ time, because the average for the similar movies comes from the precomputed total. It equals the total views in the component minus the views of the evaluated movie, divided by the component size minus one.
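The score can be written as a formula. The symbols here are introduced for illustration:
- $v_m$ is the number of friends who watched movie $m$;
- $C$ is the connected component of $m$, containing $k = |C|$ movies;
- $S = \sum_{j \in C} v_j$ is the total number of views in $C$.

The score computed by the program is then:

$$
\operatorname{score}(m) =
\begin{cases}
\dfrac{v_m}{(S - v_m)/(k - 1)} = \dfrac{v_m\,(k - 1)}{S - v_m}, & S - v_m > 0,\\[2ex]
0, & S - v_m = 0.
\end{cases}
$$

For example, "1917" lies in the component {"Parasite", "1917", "Jojo Rabbit"}, so $k = 3$, $S = 1 + 2 + 1 = 4$ and $v = 2$. Its score is $2 \cdot 2 / (4 - 2) = 2$, which matches the test output in section 4.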


After completing this step, the algorithm returns the movie with the highest score together with that score.  
Time complexity of the step: $O(N)$, since every movie is visited exactly once.  
Space complexity: $O(1)$.
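Step 4 can be sketched as a standalone function. `best_movie` is a hypothetical name, and the view counts are passed as a plain mapping (a `collections.Counter` would also work). The example uses the components and view counts of the introductory data.

```python
def best_movie(components, views):
    """Step 4 sketch: return (movie, score) with the highest score.

    `components` is a list of sets of movies; `views` maps a movie to the
    number of friends who watched it (missing movies count as 0).
    """
    best, best_score = '', -1
    for component in components:
        k = len(component)
        total = sum(views.get(m, 0) for m in component)  # O(k) per component
        for movie in component:
            v = views.get(movie, 0)
            others = total - v  # views of the similar movies
            # Score is v / (others / (k - 1)); zero when others == 0.
            score = v * (k - 1) / others if others else 0
            if score > best_score:
                best, best_score = movie, score
    return best, best_score


components = [{"Parasite", "1917", "Jojo Rabbit"}, {"Joker", "Ford v Ferrari"}]
views = {"Joker": 4, "1917": 2, "Parasite": 1, "Jojo Rabbit": 1}
print(best_movie(components, views))  # ('1917', 2.0)
```

"Joker" has the most views, but its only similar movie, "Ford v Ferrari", has none. The division-by-zero rule therefore gives it a score of zero.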



## Time complexity

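Step 1: $O(E)$;  
Step 2: $O(N+E)$;  
Step 3: $O(M \cdot N)$;  
Step 4: $O(N)$.  


Thus, the overall time complexity of the algorithm is $O(N + E + M \cdot N)$, which simplifies to $O(M \cdot N + E)$ when the user has at least one friend.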



## Space complexity

Step 1: $O(N+E)$;  
Step 2: $O(N)$;  
Step 3: $O(N)$;  
Step 4: $O(1)$.  

Thus, the overall space complexity of the algorithm is $O(N+E)$.


# 3. Program Code

```python
from copy import deepcopy

#************************** DFS Algorithm *******************************

def dfs_movies(movie, visited, connected_component, movie_info, start_info):
    """Build the connected component of the graph that contains `movie`.

    The component is collected with a recursive depth-first search (DFS);
    the function returns the set of movies in the component.
    """
    visited.add(movie)
    connected_component.add(movie)

    # Movies without similar films are absent from movie_info, so fall back
    # to start_info, whose set of similar movies is empty.
    for similar_movie in movie_info.get(movie, start_info)['similar_movies']:
        if similar_movie not in visited:
            dfs_movies(similar_movie, visited, connected_component, movie_info, start_info)

    return connected_component


#******************* Film Recommendation Algorithm **********************

def Movie_Recommender(movies, similarities, friends):
    """Recommend the most relevant movie to the user.

    Arguments:
    1) 'movies' is a list of all movies;
    2) 'similarities' is a list of pairs of similar movies;
    3) 'friends' is a list of lists: for each friend, the movies they have already seen.

    Returns a tuple: the recommended movie and its (maximum) score.
    """
    highest_relevant = ''
    max_score = -1

    # Template record; deepcopy gives every movie its own set of similar movies.
    start_info = {'views': 0, 'similar_movies': set()}
    movie_info = {}

    visited = set()
    similar_sets = []

    # Step 1: build the adjacency list of the movie graph.
    # Note: setdefault evaluates deepcopy(start_info) on every call,
    # even when the movie is already in the dictionary.
    for movie1, movie2 in similarities:
        movie_info.setdefault(movie1, deepcopy(start_info))['similar_movies'].add(movie2)
        movie_info.setdefault(movie2, deepcopy(start_info))['similar_movies'].add(movie1)

    # Step 2: find the connected components of the graph.
    for movie in movies:
        if movie not in visited:
            connected_component = set()
            similar_sets.append(dfs_movies(movie, visited, connected_component, movie_info, start_info))

    # Step 3: count how many friends watched each movie.
    for views in friends:
        for movie in views:
            movie_info.setdefault(movie, deepcopy(start_info))['views'] += 1

    # Step 4: score every movie within its component and keep the best one.
    for similar_movies in similar_sets:
        k = len(similar_movies)
        all_views = sum(map(lambda i: movie_info.get(i, start_info)['views'], similar_movies))
        for movie in similar_movies:
            score = 0
            views = movie_info.get(movie, start_info)['views']
            similar_views = all_views - views
            # Divide only if the similar movies have views; otherwise the score stays 0.
            if similar_views:
                score = views * (k - 1) / similar_views
            if score > max_score:
                highest_relevant = movie
                max_score = score

    return highest_relevant, max_score
```


# 4. Tests

```python
if __name__ == '__main__':

    movies = ["1917", "Joker", "Parasite", "Ford v Ferrari", "Jojo Rabbit"]
    similarities = [["Parasite", "1917"],
                    ["Parasite", "Jojo Rabbit"],
                    ["Joker", "Ford v Ferrari"]]
    friends = [['Joker'], ['Joker', '1917'],
               ['Joker'], ['Parasite'],
               ['1917'], ['Jojo Rabbit', 'Joker']]

    recom_movie = Movie_Recommender(movies, similarities, friends)
    print("The recommended movie:\t", recom_movie[0])
    print("Max score of the movie:\t", recom_movie[1])
```

```text
The recommended movie:	 1917
Max score of the movie:	 2.0
```


```python
if __name__ == "__main__":
    import random
    import string
    import time

    movies = ["Parasite", "1917", "Ford v Ferrari", "Jojo Rabbit", "Joker"]
    t = 100000

    # Stress test: add t random movie titles that have no similar movies.
    for _ in range(t):
        movie = "".join(random.choice(string.ascii_letters) for j in range(random.randint(1, 10))
                    for i in range(random.randint(1, 10)))
        movies.append(movie)

    similarities = [["Parasite", "1917"],
                    ["Parasite", "Jojo Rabbit"],
                    ["Joker", "Ford v Ferrari"]]
    # Repeat the six friends' lists t times (6 * t friends in total).
    friends = [['Joker'], ['Joker', '1917'],
               ['Joker'], ['Parasite'],
               ['1917'], ['Jojo Rabbit', 'Joker']] * t

    start = time.time()
    recom_movie = Movie_Recommender(movies, similarities, friends)
    finish = time.time()
    print("The recommended movie:\t", recom_movie[0])
    print("Max score of the movie:\t", recom_movie[1])
    print(f"Seconds for {t} times:", round(finish - start, 2))
```

```text
The recommended movie:	 1917
Max score of the movie:	 2.0
Seconds for 100000 times: 16.77
```
