<h1 align="center">Electronics Recommender System</h1>

### Introduction to Recommender Systems: Addressing Information Overload
We live in an era saturated with content, where the sheer volume of movies, news articles, shopping products, and websites overwhelms individual attention spans. The average Google search yields over a million results, yet how often do we venture beyond the first page of links? This phenomenon, known as the "long tail problem," highlights how a small fraction of content receives disproportionate attention, while the majority remains undiscovered.

In the face of this challenge, service providers must ask: "How do I curate a manageable selection of content for users that is both relevant and desired?" Thankfully, decades of research have produced a solution: recommender systems.

### Understanding Recommender Systems
Recommender systems predict a user's preference for an item, allowing service providers to offer a tailored selection of content, thereby enhancing user engagement and broadening content exploration.

### Fundamental Concepts
#### Terminology: Users, Items, and Ratings
In the realm of recommender systems, two primary entities exist: Users and Items.

- Items are the content being consumed: movies, articles, products, etc. They remain passive, with fixed properties.
- Users interact with these items, providing ratings based on their preferences. Ratings can be explicit (e.g., giving a movie a star rating) or implicit (e.g., watching a movie without rating it directly).

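
To make the terminology concrete, here is a minimal sketch of how users, items, and the two kinds of ratings might be stored. The user names, product names, and values are all hypothetical and exist only for illustration; they are not taken from any dataset used in this project.

```python
# Hypothetical explicit ratings on a 1-5 star scale: user -> {item: rating}.
# Items a user has not rated are simply absent.
explicit_ratings = {
    "user_a": {"headphones": 5, "smartwatch": 3},
    "user_b": {"laptop": 4, "smartwatch": 1},
    "user_c": {"headphones": 2, "laptop": 5},
}

# Hypothetical implicit feedback: items each user viewed without rating them.
implicit_views = {
    "user_a": {"laptop"},
    "user_b": {"headphones"},
    "user_c": set(),
}

# The two entity sets can be recovered from the interactions themselves.
users = sorted(explicit_ratings)
items = sorted({item for rated in explicit_ratings.values() for item in rated})

print(users)
print(items)
print(explicit_ratings["user_a"].get("laptop"))  # no explicit rating was given
```

Storing only the observed interactions keeps the structure sparse, which matters because most users rate only a small fraction of the available items.
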
### Implementing Content-Based Filtering: An Example
Let's delve into one of the primary methods employed in recommender systems: content-based filtering. In this context, we'll focus on building an "Electronics Recommender System."






## Measuring Similarity 

<br></br>

<div align="center" style="width: 600px; font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Cosine_similarity.jpg"
     alt="Cosine Similarity "
     style="float: center; padding-bottom=0.5em"
     width=600px/>
Measuring the similarity between the ratings of two users (A) and (B) for the books 'Harry Potter and the Philosopher's Stone' and 'The Diary of a Young Girl', using the Cosine similarity metric.  
</div>


Having learnt about the entities which exist within recommender systems, we may wonder how these systems function. While this is something that we'll learn throughout this entire train, one fundamental principle that we need to understand is that recommender systems are built by utilising the _relations_ which exist between items and users. As such, these systems always need a mechanism to measure how related or _similar_ a user is to another user, or an item is to another item.

We accomplish this measurement of similarity through, you guessed it, a _similarity metric_.  

Generally speaking, a similarity metric can be thought of as being the inverse of a distance measure: if two things are considered to be very similar they should be assigned a high similarity value (close to 1), while dissimilar items should receive a low similarity value (close to zero). Other [important properties](https://online.stat.psu.edu/stat508/lesson/1b/1b.2/1b.2.1) include:
 - (Symmetry) $Sim(A,B) = Sim(B,A)$ 
 - (Identity) $Sim(A,A) = 1$
 - (Uniqueness) $Sim(A,B) = 1 \leftrightarrow A = B$
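
As a rough illustration of "similarity as the inverse of a distance", the sketch below turns the Euclidean distance into a similarity score and checks the three properties listed above. The function `distance_based_similarity` and the formula $1 / (1 + d)$ are assumptions chosen purely for illustration; they are not the metric used in the rest of this document.

```python
import math

def distance_based_similarity(a, b):
    """Illustrative similarity: 1 / (1 + Euclidean distance between a and b)."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)

a = [3.0, 4.0]
b = [5.0, 2.0]
far = [50.0, -20.0]  # a point far away from a

# Symmetry: the order of the arguments does not matter.
print(distance_based_similarity(a, b) == distance_based_similarity(b, a))
# Identity: a vector compared with itself has distance 0, hence similarity 1.
print(distance_based_similarity(a, a) == 1.0)
# Uniqueness: any non-zero distance pushes the score below 1.
print(distance_based_similarity(a, b) < 1.0)
# Larger distance, lower similarity.
print(distance_based_similarity(a, far) < distance_based_similarity(a, b))
```
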
 
While there are many similarity metrics to choose from when building a recommender system (and more than one can certainly be used simultaneously), a popular choice is the **Cosine similarity**. We won't go into the fundamental trigonometry here (we hope that you remember this from high school), but recall that as an angle becomes smaller (approaching $0^\circ$) the value of its cosine increases towards 1. Conversely, as the angle increases towards $180^\circ$ the cosine value decreases. It turns out that this behavior makes the cosine of the angle between two p-dimensional vectors desirable as a [similarity metric](https://en.wikipedia.org/wiki/Cosine_similarity) which can easily be computed.
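
For readers who would like a quick refresher, the following small sketch prints the cosine for a few angles, using only Python's standard `math` module. The chosen angles are arbitrary.

```python
import math

# Cosine shrinks as the angle between two directions grows from 0 to 90 degrees.
angles_deg = [0, 30, 60, 90]
cosines = [math.cos(math.radians(angle)) for angle in angles_deg]

for angle, value in zip(angles_deg, cosines):
    print(f"{angle:>2} degrees -> cosine {value:.3f}")
```
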

Using the figure above to help guide our understanding, the Cosine similarity between two p-dimensional vectors $A$ and $B$ can be given as:

$$ \begin{align}
Sim(A,B)  &= \frac{A \cdot B}{||A|| \times ||B||} \\ \\
& = \frac{\sum_{i=1}^{p}A_{i}B_{i}}{\sqrt{\sum_{i=1}^{p}A_{i}^2} \sqrt{\sum_{i=1}^{p}B_{i}^2}}
\end{align} $$
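
The formula translates almost term by term into code. The sketch below is one possible plain-Python rendering; the name `cosine_similarity_manual` is hypothetical, and the zero-vector check is an added assumption, since the formula is undefined when either norm is 0.

```python
import math

def cosine_similarity_manual(a, b):
    """Cosine similarity of two equal-length vectors, following the formula above."""
    if len(a) != len(b):
        raise ValueError("Both vectors must have the same number of dimensions.")
    numerator = sum(a_i * b_i for a_i, b_i in zip(a, b))
    norm_a = math.sqrt(sum(a_i ** 2 for a_i in a))
    norm_b = math.sqrt(sum(b_i ** 2 for b_i in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector.")
    return numerator / (norm_a * norm_b)

# Vectors pointing in the same direction, and vectors at right angles.
same_direction = cosine_similarity_manual([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity_manual([1, 0], [0, 1])

print(round(same_direction, 3))
print(round(orthogonal, 3))
```

Note that only the direction of each vector matters: scaling a vector changes its norm and its dot product by the same factor, so the ratio is unchanged.
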
  

To make things a little more concrete, let's work out the cosine similarity using our provided example above. Here, each vector represents the ratings given by one of two *users*, $A$ and $B$, who have each rated two books (rating #1 $\rightarrow r1$, and rating #2 $\rightarrow r2$), so that $A = (3, 4)$ and $B = (5, 2)$. To work out how similar these two users are based on their supplied ratings, we can use the Cosine similarity definition as follows:


$$ \begin{align}
Sim(A,B)  & = \frac{(A_{r1} \times B_{r1})+(A_{r2} \times B_{r2})}{\sqrt{A_{r1}^2 + A_{r2}^2} \times \sqrt{B_{r1}^2 + B_{r2}^2}} \\ \\
& = \frac{(3 \times 5) + (4 \times 2)}{\sqrt{9 + 16} \times \sqrt{25 + 4}} \\ \\
& \approx \frac{23}{26.93} \\ \\
& \approx 0.854
\end{align} $$

It would be a pain to work this out manually each time! Thankfully, we can obtain this same result using the `cosine_similarity` function provided to us in `sklearn`. 

As usual, before we can go ahead and use this function, we need to import the libraries that we will need.

```python
## Importing Libraries

# Import our regular old heroes
import numpy as np
import pandas as pd
import scipy as sp  # <-- The sister of NumPy, used in our code for numerical efficiency.
import matplotlib.pyplot as plt
import seaborn as sns

# Entity featurization and similarity computation
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from surprise import SVD, Reader, Dataset

# Libraries used during sorting procedures.
import operator  # <-- Convenient item retrieval during iteration
import heapq  # <-- Efficient sorting of large lists

# Imported for our sanity
import warnings
warnings.filterwarnings('ignore')
```
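
With `cosine_similarity` available, the worked example can be reproduced in a few lines. This is a minimal sketch rather than part of the recommender itself: the rating values $(3, 4)$ and $(5, 2)$ come from the hand calculation above, while the variable names `book_ratings` and `similarity_matrix` are chosen here for illustration. The imports are repeated so that the snippet runs on its own.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Each row is one user; each column is that user's rating for one book.
book_ratings = np.array([
    [3, 4],  # user A
    [5, 2],  # user B
])

# cosine_similarity compares every row with every other row,
# returning a square matrix of pairwise similarities.
similarity_matrix = cosine_similarity(book_ratings)

print(np.round(similarity_matrix, 3))
print(round(float(similarity_matrix[0, 1]), 3))  # similarity between A and B
```

The off-diagonal entries hold the similarity between users A and B, while the diagonal holds each user's similarity with themselves.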