<a href="https://colab.research.google.com/github/SaurabhSRP/02-Recommendation-System-Projects/blob/main/Marvel_Comics_Recommendation_System(using_TF_IDF_and_Sigmoid_Kernel).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**We will build a Recommendation System for Marvel Comics** 

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import pandas as pd
import numpy as np


In [3]:
comics_df=pd.read_csv('/content/drive/MyDrive/datasets/Recommendation system  dataset/Marvel Comics Recom/Marvel_Comics.csv')
comics_df.head()

Unnamed: 0,comic_name,active_years,issue_title,publish_date,issue_description,penciler,writer,cover_artist,Imprint,Format,Rating,Price
0,A Year of Marvels: April Infinite Comic (2016),(2016),A Year of Marvels: April Infinite Comic (2016) #1,"April 01, 2016",The Infinite Comic that will have everyone tal...,Yves Bigerel,Yves Bigerel,Jamal Campbell,Marvel Universe,Infinite Comic,Rated T+,Free
1,A Year of Marvels: August Infinite Comic (2016),(2016),A Year of Marvels: August Infinite Comic (2016...,"August 10, 2016","It’s August, and Nick Fury is just in time to ...",Jamal Campbell,"Chris Sims, Chad Bowers",,Marvel Universe,Infinite Comic,,Free
2,A Year of Marvels: February Infinite Comic (2016),(2016),A Year of Marvels: February Infinite Comic (20...,"February 10, 2016",Join us in a brand new Marvel comics adventure...,"Danilo S. Beyruth, M Mast",Ryan North,,Marvel Universe,Infinite Comic,Rated T+,Free
3,A Year of Marvels: July Infinite Comic (2016),(2016),A Year of Marvels: July Infinite Comic (2016) #1,"June 29, 2016",Celebrating the Fourth of July is complicated ...,Juanan Ramirez,Chuck Wendig,Jamal Campbell,Marvel Universe,Infinite Comic,,Free
4,A Year of Marvels: June Infinite Comic (2016),(2016),A Year of Marvels: June Infinite Comic (2016) #1,"June 15, 2016",Sam Alexander’s finding it hard to cope with t...,Diego Olortegui,Paul Allor,Jamal Campbell,Marvel Universe,Infinite Comic,,Free


##Because the dataset is large, we drop the last 15,000 records so that Google Colab does not crash from running out of RAM

In [4]:
comics_df.drop(comics_df.tail(15000).index,inplace=True) ##Drop the last 15,000 records so that Google Colab does not run out of RAM

In [5]:
comics_df.shape

(19992, 12)
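
To see why trimming the rows matters, here is a rough back-of-the-envelope sketch. It assumes the pairwise similarity matrix is stored densely as 8-byte floats; the helper name `dense_matrix_gib` is made up for this illustration and is not part of the notebook.

```python
# Rough memory estimate for a dense n x n similarity matrix.
# Assumption: every entry is an 8-byte float64 and nothing is stored sparsely.
def dense_matrix_gib(n_rows: int, bytes_per_value: int = 8) -> float:
    """Return the size in GiB of an n_rows x n_rows dense matrix."""
    return n_rows * n_rows * bytes_per_value / 1024**3

kept_rows = 19992              # rows left after the drop (see comics_df.shape)
full_rows = kept_rows + 15000  # rows before the drop

print(f"before drop: {dense_matrix_gib(full_rows):.2f} GiB")
print(f"after drop : {dense_matrix_gib(kept_rows):.2f} GiB")
```

The matrix grows with the square of the row count, so removing under half of the rows cuts the estimate to roughly a third.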

In [6]:
comics_df.head()['issue_description']

0    The Infinite Comic that will have everyone tal...
1    It’s August, and Nick Fury is just in time to ...
2    Join us in a brand new Marvel comics adventure...
3    Celebrating the Fourth of July is complicated ...
4    Sam Alexander’s finding it hard to cope with t...
Name: issue_description, dtype: object

#**Content Based Recommendation system**

####We will use the TF-IDF method to convert each issue description into a vector of term weights, which gives us a matrix built from the comic descriptions
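
As a small illustration of what TF-IDF produces, the sketch below vectorizes three made-up descriptions. The sentences and the variable names `docs`, `vectorizer` and `matrix` are assumptions for this example only; the notebook's real input is the `issue_description` column.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three invented descriptions, used only for illustration.
docs = [
    "Spider-Man fights the Green Goblin",
    "Spider-Man teams up with Iron Man",
    "Thor returns to Asgard",
]

# Default settings apart from English stop-word removal.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

print(matrix.shape)                    # (number of documents, number of terms)
print(sorted(vectorizer.vocabulary_))  # the terms that became columns

# With the default norm='l2', every row has unit Euclidean length.
row_norms = matrix.multiply(matrix).sum(axis=1)
print(row_norms)
```

Each row is one description and each column is one term; words shared by many descriptions get lower weights than words that are distinctive for one description.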

In [7]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfv=TfidfVectorizer(min_df=3,max_features=None,strip_accents='unicode',analyzer='word',token_pattern=r'\w{1,}',ngram_range=(1,3),stop_words='english')


In [8]:
##Fit the TF-IDF on the issue_description feature
tfv_matrix=tfv.fit_transform(comics_df['issue_description'])

In [9]:
tfv_matrix.shape

(19992, 46923)

##We will compute pairwise similarity scores by applying the sigmoid kernel to our TF-IDF matrix

In [10]:
from sklearn.metrics.pairwise import sigmoid_kernel

In [11]:
sig=sigmoid_kernel(tfv_matrix,tfv_matrix)

In [12]:
sig[0]

array([0.76160311, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
       0.76159416])
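
The scores in `sig[0]` all sit very close to 0.7616, which follows from how scikit-learn defines the kernel: `sigmoid_kernel` computes `tanh(gamma * <x, y> + coef0)`, with defaults `gamma = 1 / n_features` and `coef0 = 1`. The sketch below checks that formula on a small random matrix (the matrix `X` is invented for illustration) and then evaluates the two extreme values for 46,923 features.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

# Small random stand-in for a TF-IDF matrix (illustrative data only).
rng = np.random.default_rng(0)
X = rng.random((4, 6))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows, like TF-IDF with norm='l2'

K = sigmoid_kernel(X, X)                      # default gamma and coef0
manual = np.tanh(X @ X.T / X.shape[1] + 1.0)  # tanh(gamma * <x, y> + coef0)
print(np.allclose(K, manual))

# With unit-length rows the dot product lies between 0 and 1, so with
# 46,923 features every score falls between these two values.
n_features = 46923
print(np.tanh(1.0))                     # no shared terms
print(np.tanh(1.0 + 1.0 / n_features))  # a document compared with itself
```

Because `tanh` is monotonically increasing, ranking by these scores gives the same order as ranking by the raw dot product of the TF-IDF rows; the differences are just compressed into a very narrow numeric range.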

In [13]:
##Reverse mapping from issue titles to row indices
indices=pd.Series(comics_df.index,index=comics_df['issue_title']).drop_duplicates()

In [14]:
indices

issue_title
A Year of Marvels: April Infinite Comic (2016) #1              0
A Year of Marvels: August Infinite Comic (2016) #1             1
A Year of Marvels: February Infinite Comic (2016) #1           2
A Year of Marvels: July Infinite Comic (2016) #1               3
A Year of Marvels: June Infinite Comic (2016) #1               4
                                                           ...  
Peter Porker, the Spectacular Spider-Ham (1985) #2         19987
Peter Porker, the Spectacular Spider-Ham (1985) #1         19988
Phoenix Resurrection: The Return of Jean Grey (2017) #5    19989
Phoenix Resurrection: The Return of Jean Grey (2017) #4    19990
Phoenix Resurrection: The Return of Jean Grey (2017) #3    19991
Length: 19992, dtype: int64

The reason for building this mapping is that each comic title needs a unique index value, so that when we pass that index to our sigmoid matrix it returns that comic's row of similarity scores.
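
One detail worth knowing about this mapping: `Series.drop_duplicates()` compares the values (here the row indices), not the title labels, so it does not remove repeated titles. The sketch below uses a tiny invented frame (`toy_df` and the other names are hypothetical) to show the difference and one possible way to keep the first occurrence of each title.

```python
import pandas as pd

# Tiny invented frame with one repeated title.
toy_df = pd.DataFrame({"issue_title": ["Alpha #1", "Beta #1", "Alpha #1", "Gamma #1"]})

# Same construction as the notebook: title -> row index.
indices_toy = pd.Series(toy_df.index, index=toy_df["issue_title"])

# drop_duplicates() looks at the values 0, 1, 2, 3, which are all distinct,
# so the repeated "Alpha #1" label survives.
after_drop = indices_toy.drop_duplicates()
print(len(after_drop))

# Filtering on duplicated labels keeps only the first row for each title.
unique_titles = indices_toy[~indices_toy.index.duplicated(keep="first")]
print(unique_titles["Alpha #1"])
```

With unique labels, a lookup such as `unique_titles["Alpha #1"]` returns a single integer rather than a Series of several positions.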

In [15]:
print(indices['Peter Porker, the Spectacular Spider-Ham (1985) #2']) 
print(sig[indices['Peter Porker, the Spectacular Spider-Ham (1985) #2']])

19987
[0.76159416 0.76159416 0.76159416 ... 0.76159416 0.76159416 0.76159416]


####We will enumerate the sigmoid similarity scores so that each score is paired with its position in the row, which is the index of the comic it refers to
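
The enumerate-then-sort idea can be sketched on a short list of invented scores. The names `scores`, `query_idx` and `top` are assumptions for this example. It also filters out the query by index instead of slicing off the first element, which is a slightly safer choice when another item ties with the query's own score.

```python
# Invented similarity scores for five items; item 1 is the query itself.
scores = [0.10, 0.90, 0.35, 0.90, 0.60]
query_idx = 1

# Pair each score with its position so the position survives sorting.
pairs = list(enumerate(scores))

# Sort by score, highest first. Python's sort is stable, so tied scores
# keep their original relative order.
ranked = sorted(pairs, key=lambda p: p[1], reverse=True)

# Drop the query by index and keep the three best remaining positions.
top = [i for i, _ in ranked if i != query_idx][:3]
print(top)  # [3, 4, 2]
```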

In [16]:
list(enumerate(sig[indices['Peter Porker, the Spectacular Spider-Ham (1985) #2']]))

[(0, 0.7615941559557649),
 (1, 0.7615941559557649),
 (2, 0.7615941559557649),
 (3, 0.7615941559557649),
 (4, 0.7615941559557649),
 (5, 0.761594402205152),
 (6, 0.7615941559557649),
 (7, 0.7615941559557649),
 (8, 0.7615941559557649),
 (9, 0.7615941559557649),
 (10, 0.7615943032473373),
 (11, 0.7615941559557649),
 (12, 0.7615941559557649),
 (13, 0.7615941559557649),
 (14, 0.7615941559557649),
 (15, 0.7615941559557649),
 (16, 0.7615941559557649),
 (17, 0.7615941559557649),
 (18, 0.7615941559557649),
 (19, 0.7615941559557649),
 (20, 0.7615941559557649),
 (21, 0.7615941559557649),
 (22, 0.7615941559557649),
 (23, 0.7615941559557649),
 (24, 0.7615941559557649),
 (25, 0.7615941559557649),
 (26, 0.7615941559557649),
 (27, 0.7615941559557649),
 (28, 0.7615941559557649),
 (29, 0.7615941559557649),
 (30, 0.7615941559557649),
 (31, 0.7615941559557649),
 (32, 0.7615941559557649),
 (33, 0.7615941559557649),
 (34, 0.7615941559557649),
 (35, 0.7615941559557649),
 (36, 0.7615941559557649),
 (37, 0.7615

###Let's create our own function based on the above understanding

In [17]:
def comic_name(title,sig=sig):
  #Get the row index corresponding to the issue title
  idx=indices[title]

  #Pair each similarity score in that row with its column index
  sig_scores=list(enumerate(sig[idx]))

  #Sort comics in descending order of similarity score
  sig_scores=sorted(sig_scores,key=lambda x:x[1],reverse=True)

  #Keep the 10 most similar comics, skipping position 0, which is normally the queried comic itself
  sig_scores=sig_scores[1:11]

  #Comic indices
  comic_indices=[i[0] for i in sig_scores]

  #Titles of the 10 most similar comics
  return comics_df['issue_title'].iloc[comic_indices]
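
For comparison, here is a sketch of a more defensive variant that takes its data as arguments, reports unknown titles, and uses `np.argsort` instead of sorting a Python list. The function name `recommend`, its parameters, and the toy data are all hypothetical; this is one possible shape, not the notebook's method.

```python
import numpy as np
import pandas as pd

def recommend(title, titles, sim, top_n=10):
    """Return the top_n titles most similar to `title`.

    titles: pd.Series of titles in the same row order as `sim`.
    sim:    square array of pairwise similarity scores.
    """
    matches = np.flatnonzero(titles.to_numpy() == title)
    if matches.size == 0:
        raise KeyError(f"Unknown title: {title!r}")
    idx = matches[0]  # use the first row if the title is repeated

    # Positions sorted by descending score; a stable sort keeps ties in row order.
    order = np.argsort(-sim[idx], kind="stable")
    order = order[order != idx][:top_n]  # exclude the query itself
    return titles.iloc[order]

# Toy data, invented for illustration.
titles = pd.Series(["A", "B", "C", "D"])
sim = np.array([
    [1.0, 0.2, 0.8, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.8, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
print(list(recommend("A", titles, sim, top_n=2)))  # ['C', 'D']
```

Passing the titles and the matrix explicitly avoids relying on global variables, which makes the function easier to test on small inputs like the one above.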

In [18]:
title=input('Provide the name of the comic\n')
print('\n')
comic_name(title)

Provide the name of the comic
Peter Porker, the Spectacular Spider-Ham (1985) #2




951                         Amazing Spider-Man (1999) #670
19967    Peter Parker: The Spectacular Spider-Man (2017...
5761                                 Daily Bugle (1996) #2
19668    Peter Parker, the Spectacular Spider-Man (1976...
19971    Peter Parker: The Spectacular Spider-Man Annua...
19454                              Original Sins (2014) #3
10075          Friendly Neighborhood Spider-Man (2005) #23
75               Absolute Carnage: Miles Morales (2019) #3
5182                Civil War II: Choosing Sides (2016) #4
19817    Peter Parker, the Spectacular Spider-Man (1976...
Name: issue_title, dtype: object

#The algorithm looks good: when we give a Spider-Man related comic as input, it returns mostly relevant Spider-Man comics