# Resources

Sources:

1. [AnalyticsVidhya](https://www.analyticsvidhya.com/blog/2021/07/recommendation-system-understanding-the-basic-concepts/)
2. [Medium](https://medium.com/@prateekgaurav/step-by-step-content-based-recommendation-system-823bbfd0541c) 

# Imports

In [1]:
import os

import pandas as pd
import numpy as np

import torch
import torch.nn as nn

# Dataset

Source: [Kaggle](https://www.kaggle.com/datasets/crxxom/manhwa-dataset)

In [2]:
path_to_dataset = os.path.join(os.getcwd(), "Dataset", "manhwa_mal.csv", "manhwa_mal.csv")
path_to_dataset

'd:\\Projects\\Recommender\\Dataset\\manhwa_mal.csv\\manhwa_mal.csv'

In [3]:
data = pd.read_csv(path_to_dataset)
data.head()

Unnamed: 0.1,Unnamed: 0,type,title,chapters,status,genres,favorites,popularity,rank,score,members,synopsis,volumns,authors,publish_time
0,0,manhwa,Solo Leveling,201,Finished,"Action,Adventure,Fantasy",40014,#7,#56,8.68,431289,"Ten years ago, ""the Gate"" appeared and connect...",Unknown,"Chugong (Story), Jang, Sung-rak (Art), Discipl...","Mar 4, 2018 to May 31, 2023"
1,1,manhwa,The Horizon,21,Finished,"Adventure,Drama",4047,#187,#58,8.67,75806,"In a world ravaged by war, a young boy walks d...",3,"Jeong, Ji-Hoon (Story & Art)","Mar 30, 2016 to Jul 21, 2016"
2,2,manhwa,Wind Breaker,Unknown,Publishing,"Action,Drama,Sports",2688,#368,#94,8.58,42434,"Burdened with expectations since childhood, se...",Unknown,"Jo, Yongseok (Story & Art)","Dec 15, 2013 to ?"
3,3,manhwa,Bastard,94,Finished,"Drama,Horror,Mystery,Romance",6455,#84,#140,8.5,126088,There is nowhere that Seon Jin can find solace...,5,"Kim, Carnby (Story), Hwang, Young-chan (Art)","Jul 4, 2014 to May 6, 2016"
4,4,manhwa,Who Made Me a Princess,125,Finished,"Comedy,Fantasy,Romance",2648,#349,#175,8.44,44428,"In the novel The Lovely Princess, the secondar...",9,"Plutus (Story), Spoon (Art)","Dec 20, 2017 to Apr 30, 2022"


In [8]:
data.columns

Index(['Unnamed: 0', 'type', 'title', 'chapters', 'status', 'genres',
       'favorites', 'popularity', 'rank', 'score', 'members', 'synopsis',
       'volumns', 'authors', 'publish_time'],
      dtype='object')

In [9]:
titles = data['title']
genres = data['genres'].apply(lambda x: x.split(','))

titles

0                Solo Leveling
1                  The Horizon
2                 Wind Breaker
3                      Bastard
4       Who Made Me a Princess
                 ...          
2938           Lessons in Lust
2939            Cradle of Imae
2940        The Desire to Kill
2941                   Bad Boy
2942             Core Scramble
Name: title, Length: 2943, dtype: object

In [10]:
genres

0                         [Action, Adventure, Fantasy]
1                                   [Adventure, Drama]
2                              [Action, Drama, Sports]
3                    [Drama, Horror, Mystery, Romance]
4                           [Comedy, Fantasy, Romance]
                             ...                      
2938                                   [Love, Erotica]
2939                          [Love, Fantasy, Erotica]
2940                                         [Erotica]
2941                            [Love, Drama, Erotica]
2942    [Action, Love, Comedy, Drama, Sci-Fi, Erotica]
Name: genres, Length: 2943, dtype: object

In [15]:
from collections import Counter

# Count how many times each title occurs (first-seen order is preserved)
title_map = dict(Counter(titles))
title_map

{'Solo Leveling': 1,
 'The Horizon': 1,
 'Wind Breaker': 1,
 'Bastard': 1,
 'Who Made Me a Princess': 1,
 'The Boxer': 1,
 'The Breaker': 1,
 'Eleceed': 1,
 'Tower of God': 1,
 'Omniscient Reader': 1,
 'The Legend of the Northern Blade': 1,
 'The Breaker: New Waves': 1,
 'Seasons of Blossom': 1,
 'After School Lessons for Unripe Apples': 1,
 'Annarasumanara': 1,
 'Sweet Home': 1,
 'Villains Are Destined to Die': 1,
 'Daytime Star': 1,
 'Spirit Fingers': 1,
 'The Greatest Estate Developer': 1,
 'Your Throne': 1,
 'Return of the Blossoming Blade': 1,
 'I Shall Master This Family': 1,
 'Something About Us': 1,
 "King's Maker": 1,
 "Can't See Can't Hear But Love": 1,
 'Cheese in the Trap Season 4': 1,
 "Why Raeliana Ended Up at the Duke's Mansion": 1,
 'Cheese in the Trap Season 3': 1,
 'Noblesse': 1,
 'See You in My 19th Life': 1,
 'Her Tale of Shim Chong': 1,
 'Vagrant Soldier Ares': 1,
 'Your Letter': 1,
 "A Stepmother's Märchen": 1,
 'Cheese in the Trap Season 2': 1,
 'Let Dai': 1,
 'T

In [17]:
from collections import Counter

# Count how many titles carry each genre (first-seen order is preserved)
genre_map = dict(Counter(g for genre_list in genres for g in genre_list))
genre_map

{'Action': 667,
 'Adventure': 240,
 'Fantasy': 1093,
 'Drama': 757,
 'Sports': 21,
 'Horror': 70,
 'Mystery': 94,
 'Romance': 1187,
 'Comedy': 527,
 'Suspense': 77,
 'Life': 147,
 'Love': 354,
 'Supernatural': 346,
 'Sci-Fi': 91,
 'Ecchi': 42,
 'Unknown': 59,
 'Gourmet': 16,
 'Erotica': 332}

In [34]:
# Pair each title with its list of genres
data_c = data['genres'].apply(lambda x: x.split(','))
data_c = data[['title']].join(data_c)
data_c

Unnamed: 0,title,genres
0,Solo Leveling,"[Action, Adventure, Fantasy]"
1,The Horizon,"[Adventure, Drama]"
2,Wind Breaker,"[Action, Drama, Sports]"
3,Bastard,"[Drama, Horror, Mystery, Romance]"
4,Who Made Me a Princess,"[Comedy, Fantasy, Romance]"
...,...,...
2938,Lessons in Lust,"[Love, Erotica]"
2939,Cradle of Imae,"[Love, Fantasy, Erotica]"
2940,The Desire to Kill,[Erotica]
2941,Bad Boy,"[Love, Drama, Erotica]"


In [35]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.base import BaseEstimator, TransformerMixin

class MultiLabelBinarizerTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.mlb = MultiLabelBinarizer()
    
    def fit(self, X, y=None):
        self.mlb.fit(X)
        return self
    
    def transform(self, X):
        return self.mlb.transform(X)
    
    def get_feature_names_out(self, input_features=None):
        return self.mlb.classes_

In [37]:
pipeline = Pipeline(steps=[
    ('one_hot_encode_genres', MultiLabelBinarizerTransformer())
])

encoded_genres = pipeline.fit_transform(data_c['genres'])
encoded_genres_df = pd.DataFrame(encoded_genres, columns=pipeline.named_steps['one_hot_encode_genres'].get_feature_names_out())
encoded_genres_df

Unnamed: 0,Action,Adventure,Comedy,Drama,Ecchi,Erotica,Fantasy,Gourmet,Horror,Life,Love,Mystery,Romance,Sci-Fi,Sports,Supernatural,Suspense,Unknown
0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0
4,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2938,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
2939,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0
2940,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2941,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0
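
The same one-hot layout can probably be obtained without a custom transformer. A minimal sketch using pandas' `Series.str.get_dummies` on a three-row stand-in for the `genres` column (the `sample` frame is an assumption, built from the first rows shown earlier, not the full dataset):

```python
import pandas as pd

# Hypothetical three-row sample mirroring the first rows of the dataset.
sample = pd.DataFrame({
    "title": ["Solo Leveling", "The Horizon", "Wind Breaker"],
    "genres": ["Action,Adventure,Fantasy", "Adventure,Drama", "Action,Drama,Sports"],
})

# Split each comma-separated string and build one 0/1 column per genre.
encoded = sample["genres"].str.get_dummies(sep=",")

print(encoded.columns.tolist())  # ['Action', 'Adventure', 'Drama', 'Fantasy', 'Sports']
print(encoded.loc[0].tolist())   # [1, 1, 0, 1, 0]
```

This works directly on the raw comma-separated strings, so the intermediate `split` step is not needed for this variant.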


In [38]:
result_df = pd.concat([data_c['title'], encoded_genres_df], axis=1)
result_df

Unnamed: 0,title,Action,Adventure,Comedy,Drama,Ecchi,Erotica,Fantasy,Gourmet,Horror,Life,Love,Mystery,Romance,Sci-Fi,Sports,Supernatural,Suspense,Unknown
0,Solo Leveling,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,The Horizon,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Wind Breaker,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,Bastard,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0
4,Who Made Me a Princess,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2938,Lessons in Lust,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
2939,Cradle of Imae,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0
2940,The Desire to Kill,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2941,Bad Boy,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0


In [63]:
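# Remove rows that are exact duplicates (same title and same genre vector);
# titles that repeat with different genres are kept
result_df_no_duplicates = result_df.drop_duplicates()

# Use the manhwa title as the index (reassign instead of inplace to avoid
# modifying a possible copy)
result_df_no_duplicates = result_df_no_duplicates.set_index('title')

# Display the final DataFrame
result_df_no_duplicates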


title,Action,Adventure,Comedy,Drama,Ecchi,Erotica,Fantasy,Gourmet,Horror,Life,Love,Mystery,Romance,Sci-Fi,Sports,Supernatural,Suspense,Unknown
Solo Leveling,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
The Horizon,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Wind Breaker,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0
Bastard,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0
Who Made Me a Princess,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Lessons in Lust,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
Cradle of Imae,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0
The Desire to Kill,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
Bad Boy,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0


In [64]:
from sklearn.metrics.pairwise import cosine_similarity
cs = cosine_similarity(result_df_no_duplicates)
cs

array([[1.        , 0.40824829, 0.33333333, ..., 0.        , 0.        ,
        0.23570226],
       [0.40824829, 1.        , 0.40824829, ..., 0.        , 0.40824829,
        0.28867513],
       [0.33333333, 0.40824829, 1.        , ..., 0.        , 0.33333333,
        0.47140452],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.57735027,
        0.40824829],
       [0.        , 0.40824829, 0.33333333, ..., 0.57735027, 1.        ,
        0.70710678],
       [0.23570226, 0.28867513, 0.47140452, ..., 0.40824829, 0.70710678,
        1.        ]])
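
As a sanity check, the first off-diagonal entry of this matrix can be reproduced by hand. A minimal sketch, assuming the one-hot rows of Solo Leveling (Action, Adventure, Fantasy) and The Horizon (Adventure, Drama) as they appear in the encoded table; the helper name `cosine` is illustrative and not part of the notebook:

```python
import numpy as np

# Column order from the encoded table: Action, Adventure, Comedy, Drama, Ecchi, Erotica, Fantasy, ...
# Only the first seven columns are listed; the remaining ones are zero for both titles.
solo_leveling = np.array([1, 1, 0, 0, 0, 0, 1])
the_horizon = np.array([0, 1, 0, 1, 0, 0, 0])

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

score = cosine(solo_leveling, the_horizon)
print(round(score, 6))  # one shared genre / (sqrt(3) * sqrt(2)) -> 0.408248
```

With binary genre vectors the score reduces to the number of shared genres divided by the square root of the product of the two genre counts, which is why titles with identical genre sets score 1.0.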

In [65]:
f = pd.DataFrame(cs, index=result_df_no_duplicates.index, columns=result_df_no_duplicates.index)
f

title,Solo Leveling,The Horizon,Wind Breaker,Bastard,Who Made Me a Princess,The Boxer,The Breaker,Eleceed,Tower of God,Omniscient Reader,...,How Sweet Is a Sugar Daddy?,Dispar,Behind the Scenes,Red Fox,The Rabbit Hole,Lessons in Lust,Cradle of Imae,The Desire to Kill,Bad Boy,Core Scramble
Solo Leveling,1.000000,0.408248,0.333333,0.000000,0.333333,0.000000,0.333333,0.577350,0.774597,1.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.333333,0.000000,0.000000,0.235702
The Horizon,0.408248,1.000000,0.408248,0.353553,0.000000,0.500000,0.408248,0.000000,0.632456,0.408248,...,0.500000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.408248,0.288675
Wind Breaker,0.333333,0.408248,1.000000,0.288675,0.000000,0.816497,0.666667,0.577350,0.516398,0.333333,...,0.408248,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.333333,0.471405
Bastard,0.000000,0.353553,0.288675,1.000000,0.288675,0.353553,0.288675,0.000000,0.447214,0.000000,...,0.353553,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.288675,0.204124
Who Made Me a Princess,0.333333,0.000000,0.000000,0.288675,1.000000,0.000000,0.333333,0.000000,0.258199,0.333333,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.333333,0.000000,0.000000,0.235702
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Lessons in Lust,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.500000,1.000000,1.000000,0.500000,0.707107,1.000000,0.816497,0.707107,0.816497,0.577350
Cradle of Imae,0.333333,0.000000,0.000000,0.000000,0.333333,0.000000,0.000000,0.000000,0.258199,0.333333,...,0.408248,0.816497,0.816497,0.408248,0.577350,0.816497,1.000000,0.577350,0.666667,0.471405
The Desire to Kill,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.707107,0.707107,0.707107,0.707107,1.000000,0.707107,0.577350,1.000000,0.577350,0.408248
Bad Boy,0.000000,0.408248,0.333333,0.288675,0.000000,0.408248,0.333333,0.000000,0.258199,0.000000,...,0.816497,0.816497,0.816497,0.408248,0.577350,0.816497,0.666667,0.577350,1.000000,0.707107


In [77]:
idx = f.index.get_loc('Lessons in Lust')

# Skip position 0, which is expected to be the title itself
# (not guaranteed when several titles tie at a similarity of 1.0)
recommends = f.iloc[idx].sort_values(ascending=False)[1:11]
print("Recommendations are:")
for title in recommends.index:
    print(title)

Recommendations are:
BL Motel
Quit Writing, Dear Author!
Reversal
On or Off
Penthouse XXX
Forging Rock'n Roll
Paid
Keep a Dog
Pheromone Phobia
Do You Still Like Me?


In [78]:
print(recommends)

title
BL Motel                      1.0
Quit Writing, Dear Author!    1.0
Reversal                      1.0
On or Off                     1.0
Penthouse XXX                 1.0
Forging Rock'n Roll           1.0
Paid                          1.0
Keep a Dog                    1.0
Pheromone Phobia              1.0
Do You Still Like Me?         1.0
Name: Lessons in Lust, dtype: float64
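
The lookup can be wrapped in a small helper. This is a sketch under stated assumptions: `recommend` and the toy matrix are hypothetical names, the index is assumed to hold unique titles, and the helper drops the queried title explicitly instead of slicing off the first row, so ties at 1.0 cannot leave the title itself in the results.

```python
import pandas as pd

def recommend(similarity: pd.DataFrame, title: str, n: int = 10) -> pd.Series:
    """Return the n titles most similar to `title`, excluding the title itself."""
    if title not in similarity.index:
        raise KeyError(f"Unknown title: {title!r}")
    scores = similarity.loc[title]
    return scores.drop(labels=title).sort_values(ascending=False).head(n)

# Toy 3x3 similarity matrix standing in for `f`.
names = ["A", "B", "C"]
toy = pd.DataFrame(
    [[1.0, 0.2, 0.9],
     [0.2, 1.0, 0.5],
     [0.9, 0.5, 1.0]],
    index=names, columns=names,
)

top = recommend(toy, "A", n=2)
print(top.index.tolist())  # ['C', 'B']
```

Because genre-only vectors produce many exact ties, the order among equally scored titles is arbitrary; a secondary sort key such as `score` or `popularity` from the dataset could be added if a stable ranking matters.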


In [83]:
name = input("Enter manhwa name:")
idx = f.index.get_loc(name)
idx

2723

In [84]:
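# Look the synopsis up by title: positions in `f` (built after dropping
# duplicates) need not match row positions in `data`
data.loc[data['title'] == name, 'synopsis'].iloc[0]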

"After a major screw-up at work, Hyunho is pretty much ready to give up on life. What he receives instead of punishment, however, is a delectable strawberry cream custard tart. Thinking it's his last meal alive, Hyunho gobbles it up, licking every last dollop of cream off of Seung-yun's fingers. This lights a certain spark within Seung-yun, which gets him thinking about all the other wild things he could make Hyunho do with his tongue...\n\r\n(Source: Lezhin Entertainment)"

In [85]:
save_path = os.path.join(os.getcwd(), "Similarity_scores.csv")
f.to_csv(save_path)
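
To reuse the saved matrix in a later session, it can be read back with the first column as the index. A minimal sketch using a tiny stand-in frame and a temporary directory (both are assumptions for illustration; the real file is the one written to `save_path`):

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the similarity matrix `f`.
names = ["A", "B"]
small = pd.DataFrame([[1.0, 0.5], [0.5, 1.0]], index=names, columns=names)
small.index.name = "title"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Similarity_scores.csv")
    small.to_csv(path)
    # index_col=0 restores the title column as the row index.
    loaded = pd.read_csv(path, index_col=0)

print(loaded.loc["A", "B"])  # 0.5
```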