### Suppose we have a small dataset of movies and want to recommend movies similar to a given movie based on genre. We will use cosine similarity to measure how similar two movies are.

## Cosine Similarity:
Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space. It's often used in text analysis to measure the similarity between documents or paragraphs. The formula for the cosine similarity between two vectors $A$ and $B$ is: <br>
$\text{Cosine Similarity} = \frac{A \cdot B}{||A|| \, ||B||}$ <br>
- $A \cdot B$ is the dot product of $A$ and $B$ <br>
- $||A||$ and $||B||$ are the magnitudes (lengths) of vectors $A$ and $B$ <br>
Cosine similarity ranges from -1 to 1. In text analysis, term-weight vectors such as TF-IDF have no negative components, so the score falls between 0 and 1; the closer it is to 1, the more similar the two texts.
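
As a minimal sketch of the formula above, the snippet below computes cosine similarity by hand with NumPy. The helper name `cosine_similarity_manual` and the sample vectors are illustrative assumptions, not part of the recommender that follows.

```python
import numpy as np

def cosine_similarity_manual(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vectors pointing the same way: similarity close to 1, regardless of length.
same_direction = cosine_similarity_manual([1, 2], [2, 4])
# Perpendicular vectors: similarity close to 0.
orthogonal = cosine_similarity_manual([1, 0], [0, 1])
# Vectors pointing in opposite directions: similarity close to -1.
opposite = cosine_similarity_manual([1, 2], [-1, -2])

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Because only the direction of the vectors matters, `[1, 2]` and `[2, 4]` count as identical even though one is twice as long.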

## Imports:

In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import pandas as pd # DataFrame structure and analysis tools

## Generate Data:

In [3]:
movies = pd.DataFrame({ # A DataFrame is pandas' tabular data structure.
    'title': ['Star Wars', 'The Matrix', 'Avatar', 'Inception', 'The Avengers'],
    'genres': ['Sci-Fi', 'Sci-Fi', 'Sci-Fi', 'Sci-Fi', 'Action']
})

## Text Vectorization:

In [7]:
# TF-IDF (Term Frequency-Inverse Document Frequency) represents text by
# assigning each term/word a weight that reflects how important it is in a
# specific document relative to all other documents.
vectorizer = TfidfVectorizer()

tfidf_matrix = vectorizer.fit_transform(movies['genres'])
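
To see what the vectorizer produced, it can help to inspect the learned vocabulary and the resulting matrix. The sketch below rebuilds the same genre list so it runs on its own; the variable names `genres`, `feature_names`, `dense`, and `row_norms` are illustrative. It relies on the default `TfidfVectorizer` settings, which lowercase the text, keep only tokens of two or more word characters (so the hyphen splits `Sci-Fi` into two terms), and L2-normalize each row.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Same genre strings as the movies DataFrame, repeated here so the
# example is self-contained.
genres = ['Sci-Fi', 'Sci-Fi', 'Sci-Fi', 'Sci-Fi', 'Action']

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(genres)

# vocabulary_ maps each term to its column; sorting the terms gives column order.
feature_names = sorted(vectorizer.vocabulary_)
print(feature_names)

# One row per movie, one column per term.
dense = tfidf_matrix.toarray()
print(dense.shape)

# With the default norm='l2', every row has length 1.
row_norms = np.linalg.norm(dense, axis=1)
print(row_norms)
```

Since every row already has unit length, the dot product of two rows is the same as their cosine similarity.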

## Compute Similarities:

In [5]:
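# linear_kernel computes pairwise dot products. TfidfVectorizer L2-normalizes each
# row by default, so these dot products equal the cosine similarities between the TF-IDF vectors.
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)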
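
As a quick sanity check, the sketch below compares `linear_kernel` with scikit-learn's `cosine_similarity` on the same TF-IDF matrix. It rebuilds the genre data so it can run independently; the names `via_dot`, `via_cosine`, and `kernels_agree` are illustrative, and the agreement depends on the vectorizer's default L2 normalization.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Same genre strings as the movies DataFrame.
genres = ['Sci-Fi', 'Sci-Fi', 'Sci-Fi', 'Sci-Fi', 'Action']
tfidf_matrix = TfidfVectorizer().fit_transform(genres)

# Plain dot products between every pair of rows.
via_dot = linear_kernel(tfidf_matrix, tfidf_matrix)
# Dot products divided by the row lengths.
via_cosine = cosine_similarity(tfidf_matrix, tfidf_matrix)

# The rows are already unit length, so both matrices should match.
kernels_agree = np.allclose(via_dot, via_cosine)
print(kernels_agree)
print(np.round(via_dot, 2))
```

With this dataset the four Sci-Fi movies are indistinguishable from one another and share no terms with the Action movie, so the matrix contains only values near 1 and 0.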

## Make Recommendations:

In [6]:
def get_recommendations(title):
    # find the index of the movie in the DataFrame that matches the given title.
    idx = movies.index[movies['title'] == title].tolist()[0]
    # pair every other movie's index with its cosine similarity to the movie at idx,
    # stored in the format (index, score). The movie itself is skipped explicitly:
    # when scores are tied, it is not guaranteed to sort into first place.
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx]) if i != idx]
    # sort the list by similarity score in descending order (most similar at top).
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # slice the list to keep only the top 2 results.
    sim_scores = sim_scores[:2]
    # extract indices.
    movie_indices = [i[0] for i in sim_scores]
    # return the titles of the most similar movies.
    return movies['title'].iloc[movie_indices]

# Test the function
print(get_recommendations('Star Wars'))

1    The Matrix
2        Avatar
Name: title, dtype: object
