🔹 1. User-based Collaborative Filtering (UBCF)

$$
\hat{r}_{u,i} = \frac{\sum_{v \in N(u)} sim(u,v) \cdot r_{v,i}}
{\sum_{v \in N(u)} |sim(u,v)|}
$$

- $\hat{r}_{u,i}$ = Predicted rating of user *u* for movie *i*  
- $N(u)$ = Set of users similar to *u*  
- $sim(u,v)$ = Similarity between user *u* and user *v*  
- $r_{v,i}$ = Rating given by user *v* for movie *i*  

👉 Meaning: look at how users whose taste is similar to yours rated movie *i*. The similarity-weighted average of their ratings becomes the prediction for you.
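
As a small worked example of the formula above, the sketch below predicts one rating from three neighbors. The similarity values and ratings are made-up numbers for illustration, and `predict_user_based` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
import numpy as np

def predict_user_based(similarities, neighbor_ratings):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    similarities = np.asarray(similarities, dtype=float)
    neighbor_ratings = np.asarray(neighbor_ratings, dtype=float)
    denominator = np.abs(similarities).sum()
    if denominator == 0:
        return 0.0  # no usable neighbors; this sketch falls back to 0
    return float(np.dot(similarities, neighbor_ratings) / denominator)

# Hypothetical neighbors of user u: sim(u, v) and each neighbor's rating r_{v,i}
sims = [0.9, 0.5, 0.1]
ratings = [8, 6, 2]

prediction = predict_user_based(sims, ratings)
print(round(prediction, 3))  # (0.9*8 + 0.5*6 + 0.1*2) / (0.9 + 0.5 + 0.1)
```

The most similar neighbor dominates the result, which is the point of weighting by similarity instead of taking a plain mean.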

---

🔹 2. Item-based Collaborative Filtering (IBCF)

$$
\hat{r}_{u,i} = \frac{\sum_{j \in N(i)} sim(i,j) \cdot r_{u,j}}
{\sum_{j \in N(i)} |sim(i,j)|}
$$

- $\hat{r}_{u,i}$ = Predicted rating of user *u* for movie *i*  
- $N(i)$ = Set of movies similar to *i*  
- $sim(i,j)$ = Similarity between item *i* and item *j*  
- $r_{u,j}$ = Rating given by user *u* for movie *j*  

👉 Meaning: if you have watched movie *j* and it is similar to movie *i*, we use that similarity to estimate how you would rate movie *i*.
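
The same idea in code, as a minimal sketch: the prediction for a target item is built from the user's own ratings, weighted by how similar each rated item is to the target. The item labels, ratings, and similarities are invented for illustration, and `predict_item_based` is a hypothetical helper.

```python
def predict_item_based(user_ratings, item_sims):
    """Predict a rating for a target item i from the user's own ratings.

    user_ratings: {item j: r_{u,j}} for items the user has rated
    item_sims:    {item j: sim(i, j)} for the target item i
    """
    shared = [j for j in user_ratings if j in item_sims]
    numerator = sum(item_sims[j] * user_ratings[j] for j in shared)
    denominator = sum(abs(item_sims[j]) for j in shared)
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical data: the user rated items A and B; C is unrated and is ignored
user_ratings = {"A": 9, "B": 4}
sims_to_target = {"A": 0.8, "B": 0.2, "C": 0.6}

item_prediction = predict_item_based(user_ratings, sims_to_target)
print(item_prediction)  # (0.8*9 + 0.2*4) / (0.8 + 0.2)
```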

---

🔹 3. Hybrid Recommendation (User-based + Item-based + Popularity)

$$
FinalScore(u,i) = \alpha \cdot UserBased(u,i) +
                  \beta \cdot ItemBased(u,i) +
                  \gamma \cdot Popularity(i)
$$

- $FinalScore(u,i)$ = Final recommendation score of movie *i* for user *u*  
- $UserBased(u,i)$ = Predicted score from User-based CF  
- $ItemBased(u,i)$ = Predicted score from Item-based CF  
- $Popularity(i)$ = Popularity score of movie *i* (from global stats)  
- $\alpha, \beta, \gamma$ = Weights (decide importance of each part)  
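
A minimal sketch of the blend, assuming all three component scores are already on the same scale (here 0 to 10). The weights 0.5 / 0.3 / 0.2 and the candidate scores are arbitrary illustration values, not tuned results.

```python
def final_score(user_based, item_based, popularity, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of the three component scores."""
    return alpha * user_based + beta * item_based + gamma * popularity

# Hypothetical candidates: (user-based, item-based, popularity) scores
candidates = {
    "A": (7.0, 8.0, 5.0),
    "B": (9.0, 2.0, 3.0),
    "C": (4.0, 6.0, 9.0),
}

scores = {item: final_score(*parts) for item, parts in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

If the popularity score lives on a different scale (for example a raw rating count), it would need rescaling first, otherwise the weights stop being comparable.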

---

⚡ In simple words:
- User-based CF → "People whose taste is like yours liked this movie."
- Item-based CF → "This movie is similar to one you have already watched."
- Hybrid + Popularity → "And these are the movies everyone is watching; you might like them too."



In [None]:
import pandas as pd
import numpy as np

In [None]:
books=pd.read_csv('/content/Books.csv')
books.head()



Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [None]:
ratings=pd.read_csv('/content/Ratings.csv')
ratings.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,155061224,5
2,276727,446520802,0
3,276729,052165615X,3
4,276729,521795028,6


In [None]:
users=pd.read_csv('/content/Users.csv')
users.head()

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [None]:
df=pd.merge(books,ratings,on='ISBN').merge(users,on='User-ID')
df.head()


Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L,User-ID,Book-Rating,Location,Age
0,074322678X,Where You'll Find Me: And Other Stories,Ann Beattie,2002,Scribner,http://images.amazon.com/images/P/074322678X.0...,http://images.amazon.com/images/P/074322678X.0...,http://images.amazon.com/images/P/074322678X.0...,8,5,"timmins, ontario, canada",
1,080652121X,Hitler's Secret Bankers: The Myth of Swiss Neu...,Adam Lebor,2000,Citadel Press,http://images.amazon.com/images/P/080652121X.0...,http://images.amazon.com/images/P/080652121X.0...,http://images.amazon.com/images/P/080652121X.0...,8,0,"timmins, ontario, canada",
2,1552041778,Jane Doe,R. J. Kaiser,1999,Mira Books,http://images.amazon.com/images/P/1552041778.0...,http://images.amazon.com/images/P/1552041778.0...,http://images.amazon.com/images/P/1552041778.0...,8,5,"timmins, ontario, canada",
3,1558746218,A Second Chicken Soup for the Woman's Soul (Ch...,Jack Canfield,1998,Health Communications,http://images.amazon.com/images/P/1558746218.0...,http://images.amazon.com/images/P/1558746218.0...,http://images.amazon.com/images/P/1558746218.0...,8,0,"timmins, ontario, canada",
4,1558746218,A Second Chicken Soup for the Woman's Soul (Ch...,Jack Canfield,1998,Health Communications,http://images.amazon.com/images/P/1558746218.0...,http://images.amazon.com/images/P/1558746218.0...,http://images.amazon.com/images/P/1558746218.0...,3363,0,"knoxville, tennessee, usa",29.0


In [None]:
df.shape

(179091, 12)

In [None]:
df.drop(columns=['Image-URL-S','Image-URL-L','Location',"Age"],inplace=True)

In [None]:
df.dropna(inplace=True)

In [None]:
df.shape

(179088, 8)

In [None]:
df.drop_duplicates(inplace=True,keep='first')

In [None]:
df.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,User-ID,Book-Rating
0,074322678X,Where You'll Find Me: And Other Stories,Ann Beattie,2002,Scribner,http://images.amazon.com/images/P/074322678X.0...,8,5
1,080652121X,Hitler's Secret Bankers: The Myth of Swiss Neu...,Adam Lebor,2000,Citadel Press,http://images.amazon.com/images/P/080652121X.0...,8,0
2,1552041778,Jane Doe,R. J. Kaiser,1999,Mira Books,http://images.amazon.com/images/P/1552041778.0...,8,5
3,1558746218,A Second Chicken Soup for the Woman's Soul (Ch...,Jack Canfield,1998,Health Communications,http://images.amazon.com/images/P/1558746218.0...,8,0
4,1558746218,A Second Chicken Soup for the Woman's Soul (Ch...,Jack Canfield,1998,Health Communications,http://images.amazon.com/images/P/1558746218.0...,3363,0


# 50 Popular books based on ratings

In [None]:
popular_books=df.groupby('ISBN')['Book-Rating'].count().reset_index()
popular_books

Unnamed: 0,ISBN,Book-Rating
0,000104687X,1
1,000104799X,2
2,000123207X,1
3,000160418X,2
4,000184251X,1
...,...,...
67366,B000234N3A,1
67367,B000234NC6,1
67368,B00029DGGO,1
67369,B0002JV9PY,1


In [None]:
popular_books=popular_books.rename(columns={'Book-Rating':'num_of_ratings'})
popular_books

Unnamed: 0,ISBN,num_of_ratings
0,000104687X,1
1,000104799X,2
2,000123207X,1
3,000160418X,2
4,000184251X,1
...,...,...
67366,B000234N3A,1
67367,B000234NC6,1
67368,B00029DGGO,1
67369,B0002JV9PY,1


In [None]:
popular_books['avg_rating']=df.groupby('ISBN')['Book-Rating'].mean().values
popular_books

Unnamed: 0,ISBN,num_of_ratings,avg_rating
0,000104687X,1,6.0
1,000104799X,2,7.5
2,000123207X,1,0.0
3,000160418X,2,3.5
4,000184251X,1,0.0
...,...,...,...
67366,B000234N3A,1,9.0
67367,B000234NC6,1,0.0
67368,B00029DGGO,1,0.0
67369,B0002JV9PY,1,0.0
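
The cell above attaches the mean with `.values`, which works because both `groupby` results list the ISBNs in the same sorted order. A sketch of an alternative that computes both statistics in one call, so nothing depends on positional alignment. The toy frame is invented; only the column names follow the notebook.

```python
import pandas as pd

# Toy ratings table; column names follow the notebook, values are made up
toy = pd.DataFrame({
    "ISBN": ["b2", "b1", "b2", "b1", "b3"],
    "Book-Rating": [8, 0, 6, 10, 7],
})

# Named aggregation: one pass produces both the count and the mean per ISBN
stats = (
    toy.groupby("ISBN")["Book-Rating"]
       .agg(num_of_ratings="count", avg_rating="mean")
       .reset_index()
)
print(stats)
```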


In [None]:
popular_books=popular_books[popular_books['num_of_ratings'] > 250]

In [None]:
popular_books=popular_books.sort_values('avg_rating',ascending=False).head(50)
popular_books

Unnamed: 0,ISBN,num_of_ratings,avg_rating
6813,043935806X,305,5.586885
9497,059035342X,515,4.943689
16751,1400034779,390,3.348718
11208,067976402X,552,3.278986
7117,044023722X,595,3.166387
6996,044021145X,487,3.162218
7415,044651652X,330,2.99697
11454,068484477X,304,2.947368
7046,044022165X,359,2.699164
769,006101351X,327,2.406728


In [None]:
popular_books_df=popular_books.merge(df,on='ISBN').drop(columns=['User-ID','Book-Rating'],axis=1)
popular_books_df

Unnamed: 0,ISBN,num_of_ratings,avg_rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M
0,043935806X,305,5.586885,Harry Potter and the Order of the Phoenix (Boo...,J. K. Rowling,2003,Scholastic,http://images.amazon.com/images/P/043935806X.0...
1,043935806X,305,5.586885,Harry Potter and the Order of the Phoenix (Boo...,J. K. Rowling,2003,Scholastic,http://images.amazon.com/images/P/043935806X.0...
2,043935806X,305,5.586885,Harry Potter and the Order of the Phoenix (Boo...,J. K. Rowling,2003,Scholastic,http://images.amazon.com/images/P/043935806X.0...
3,043935806X,305,5.586885,Harry Potter and the Order of the Phoenix (Boo...,J. K. Rowling,2003,Scholastic,http://images.amazon.com/images/P/043935806X.0...
4,043935806X,305,5.586885,Harry Potter and the Order of the Phoenix (Boo...,J. K. Rowling,2003,Scholastic,http://images.amazon.com/images/P/043935806X.0...
...,...,...,...,...,...,...,...,...
4159,006101351X,327,2.406728,The Perfect Storm : A True Story of Men Agains...,Sebastian Junger,1998,HarperTorch,http://images.amazon.com/images/P/006101351X.0...
4160,006101351X,327,2.406728,The Perfect Storm : A True Story of Men Agains...,Sebastian Junger,1998,HarperTorch,http://images.amazon.com/images/P/006101351X.0...
4161,006101351X,327,2.406728,The Perfect Storm : A True Story of Men Agains...,Sebastian Junger,1998,HarperTorch,http://images.amazon.com/images/P/006101351X.0...
4162,006101351X,327,2.406728,The Perfect Storm : A True Story of Men Agains...,Sebastian Junger,1998,HarperTorch,http://images.amazon.com/images/P/006101351X.0...
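
The merge above yields one row per rating, so each popular book repeats once for every rating it received. If one row per book is wanted (for display, say), a `drop_duplicates` on `ISBN` after the merge is one option. A sketch on invented data:

```python
import pandas as pd

# Toy stand-ins for popular_books and df; values are made up
popular = pd.DataFrame({
    "ISBN": ["b1", "b2"],
    "num_of_ratings": [3, 2],
    "avg_rating": [6.0, 4.5],
})
ratings = pd.DataFrame({
    "ISBN": ["b1", "b1", "b1", "b2", "b2"],
    "Book-Title": ["First", "First", "First", "Second", "Second"],
    "User-ID": [1, 2, 3, 1, 4],
    "Book-Rating": [5, 6, 7, 4, 5],
})

merged = popular.merge(ratings, on="ISBN").drop(columns=["User-ID", "Book-Rating"])
# Keep a single metadata row per book
deduped = merged.drop_duplicates(subset="ISBN").reset_index(drop=True)
print(len(merged), len(deduped))
```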


# User-based similarity

In [None]:
df

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,User-ID,Book-Rating
0,074322678X,Where You'll Find Me: And Other Stories,Ann Beattie,2002,Scribner,http://images.amazon.com/images/P/074322678X.0...,8,5
1,080652121X,Hitler's Secret Bankers: The Myth of Swiss Neu...,Adam Lebor,2000,Citadel Press,http://images.amazon.com/images/P/080652121X.0...,8,0
2,1552041778,Jane Doe,R. J. Kaiser,1999,Mira Books,http://images.amazon.com/images/P/1552041778.0...,8,5
3,1558746218,A Second Chicken Soup for the Woman's Soul (Ch...,Jack Canfield,1998,Health Communications,http://images.amazon.com/images/P/1558746218.0...,8,0
4,1558746218,A Second Chicken Soup for the Woman's Soul (Ch...,Jack Canfield,1998,Health Communications,http://images.amazon.com/images/P/1558746218.0...,3363,0
...,...,...,...,...,...,...,...,...
179086,8401424828,"Hija de Homero, La",Robert Graves,1996,"Plaza &amp; Janes Editores, S.A.",http://images.amazon.com/images/P/8401424828.0...,249320,0
179087,1565120752,Winter Birds,Jim Grimsley,1994,Algonquin Books of Chapel Hill,http://images.amazon.com/images/P/1565120752.0...,249536,0
179088,067167806X,From Winchester to Cedar Creek: The Shenandoah...,Jeffry D. Wert,1989,Touchstone,http://images.amazon.com/images/P/067167806X.0...,249791,7
179089,1561709085,Inner Peace for Busy People: Simple Strategies...,"Joan, Ph.D. Borysenko",2001,Hay House Audio Books,http://images.amazon.com/images/P/1561709085.0...,250405,9


In [None]:
# Despite its name, famous_books holds active users: those with more than 200 ratings
famous_books=df.groupby('User-ID')['Book-Rating'].count().reset_index()
famous_books=famous_books[famous_books['Book-Rating'] > 200]
famous_books

Unnamed: 0,User-ID,Book-Rating
1194,8890,242
1535,11601,227
1545,11676,2899
2230,16795,413
3191,23768,284
...,...,...
32785,238781,247
33184,241198,204
33419,242824,220
33858,245963,279


In [None]:
base_table=df.copy()

In [None]:
base_table=base_table[base_table['User-ID'].isin(famous_books['User-ID'])]
base_table

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,User-ID,Book-Rating
19,1841721522,New Vegetarian: Bold and Beautiful Recipes for...,Celia Brooks Brown,2001,Ryland Peters &amp; Small Ltd,http://images.amazon.com/images/P/1841721522.0...,11676,10
42,1841721522,New Vegetarian: Bold and Beautiful Recipes for...,Celia Brooks Brown,2001,Ryland Peters &amp; Small Ltd,http://images.amazon.com/images/P/1841721522.0...,182987,0
46,1841721522,New Vegetarian: Bold and Beautiful Recipes for...,Celia Brooks Brown,2001,Ryland Peters &amp; Small Ltd,http://images.amazon.com/images/P/1841721522.0...,217740,0
68,038078243X,Miss Zukas and the Raven's Dance,Jo Dereske,1996,Avon,http://images.amazon.com/images/P/038078243X.0...,190925,0
69,038078243X,Miss Zukas and the Raven's Dance,Jo Dereske,1996,Avon,http://images.amazon.com/images/P/038078243X.0...,242824,0
...,...,...,...,...,...,...,...,...
178865,082177560X,Can't Cay Goodbye,Janet Dailey,2002,Zebra,http://images.amazon.com/images/P/082177560X.0...,242824,0
178866,1555472451,Swansdowne,Daniel Farson,1988,Critics Choice Paperbacks/Lorevan Publishing,http://images.amazon.com/images/P/1555472451.0...,242824,0
178867,1557736189,A Captain's Lady (Regency Romance),Helen Argers,1991,Diamond Books (NY),http://images.amazon.com/images/P/1557736189.0...,242824,0
178868,1557736758,Run Wild My Heart,Maureen Child,1992,Berkley Pub Group (Mm),http://images.amazon.com/images/P/1557736758.0...,242824,0


In [None]:
# Despite its name, reader_user holds frequently rated books: ISBNs with more than 50 ratings
reader_user=df.groupby('ISBN')['Book-Rating'].count().reset_index()
reader_user=reader_user[reader_user['Book-Rating'] > 50]
reader_user

Unnamed: 0,ISBN,Book-Rating
69,000649840X,80
111,002026478X,62
153,002542730X,153
271,006000438X,58
287,006001203X,51
...,...,...
44049,1931561648,67
50399,3257229534,52
51930,3423202327,53
53763,3442541751,56


In [None]:
base_table=base_table[base_table['ISBN'].isin(reader_user['ISBN'])]
# drop_duplicates returns a new frame, so assign the result back
base_table=base_table.drop_duplicates(keep='first')
base_table.shape

(2052, 8)
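
The two filters above (users with more than 200 ratings, books with more than 50 ratings) follow the same pattern, so they could be expressed with one small helper. This is a sketch on toy data with a threshold of 1; `keep_frequent` is a hypothetical name, and as in the notebook both counts are taken over the full table.

```python
import pandas as pd

def keep_frequent(frame, column, min_count):
    """Keep rows whose value in `column` occurs more than `min_count` times."""
    counts = frame[column].value_counts()
    frequent = counts[counts > min_count].index
    return frame[frame[column].isin(frequent)]

toy = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 2, 3],
    "ISBN": ["a", "b", "c", "a", "b", "a"],
    "Book-Rating": [5, 0, 7, 8, 0, 9],
})

active_users = keep_frequent(toy, "User-ID", 1)          # users 1 and 2
popular_isbns = keep_frequent(toy, "ISBN", 1)["ISBN"]    # books a and b
dense = active_users[active_users["ISBN"].isin(popular_isbns)]
print(dense)
```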

In [None]:
base_table

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,User-ID,Book-Rating
76,042518630X,Purity in Death,J.D. Robb,2002,Berkley Publishing Group,http://images.amazon.com/images/P/042518630X.0...,278418,7
79,042518630X,Purity in Death,J.D. Robb,2002,Berkley Publishing Group,http://images.amazon.com/images/P/042518630X.0...,11676,10
86,042518630X,Purity in Death,J.D. Robb,2002,Berkley Publishing Group,http://images.amazon.com/images/P/042518630X.0...,23768,0
93,042518630X,Purity in Death,J.D. Robb,2002,Berkley Publishing Group,http://images.amazon.com/images/P/042518630X.0...,35859,0
98,042518630X,Purity in Death,J.D. Robb,2002,Berkley Publishing Group,http://images.amazon.com/images/P/042518630X.0...,52584,0
...,...,...,...,...,...,...,...,...
80383,1551669390,Dark Water (Mira Romantic Suspense),Sharon Sala,2002,Mira,http://images.amazon.com/images/P/1551669390.0...,175003,0
80384,1551669390,Dark Water (Mira Romantic Suspense),Sharon Sala,2002,Mira,http://images.amazon.com/images/P/1551669390.0...,182085,0
80387,1551669390,Dark Water (Mira Romantic Suspense),Sharon Sala,2002,Mira,http://images.amazon.com/images/P/1551669390.0...,190925,0
80391,1551669390,Dark Water (Mira Romantic Suspense),Sharon Sala,2002,Mira,http://images.amazon.com/images/P/1551669390.0...,212898,0


In [None]:
user_vs_books=base_table.pivot_table(index='User-ID',columns='ISBN',values='Book-Rating')
user_vs_books

ISBN,000649840X,002026478X,002542730X,006000438X,006001203X,006016848X,006019491X,006092988X,006098824X,006099486X,...,1576737330,1592400876,1857022424,1878424319,1885171080,1931561648,3257229534,3423202327,3442541751,3492045170
User-ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11601,,,,,,,,0.0,,,...,,0.0,,,,,,,,
11676,8.0,,6.0,6.0,0.0,9.0,10.0,9.0,5.0,10.0,...,,9.0,8.0,,,10.0,0.0,8.0,7.0,9.0
16795,,,0.0,,8.0,,,,,,...,,,,,,,,,,
23768,,,,,0.0,0.0,0.0,,,,...,0.0,,,,0.0,,,,,
23902,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238781,,,,,,,0.0,,,,...,0.0,,,,,,,,,
241198,,,,,,,,,,,...,,,,,,,0.0,,,
242824,,0.0,,,,,,,,,...,,,,,,,,,,
245963,,0.0,,,,0.0,,,,,...,,,,,,,,,,


In [None]:
user_vs_books_filled=user_vs_books.fillna(0)
user_vs_books_filled

ISBN,000649840X,002026478X,002542730X,006000438X,006001203X,006016848X,006019491X,006092988X,006098824X,006099486X,...,1576737330,1592400876,1857022424,1878424319,1885171080,1931561648,3257229534,3423202327,3442541751,3492045170
User-ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11601,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
11676,8.0,0.0,6.0,6.0,0.0,9.0,10.0,9.0,5.0,10.0,...,0.0,9.0,8.0,0.0,0.0,10.0,0.0,8.0,7.0,9.0
16795,0.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
23768,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
23902,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238781,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
241198,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
242824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
245963,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Calculate user similarity

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
user_similarity=cosine_similarity(user_vs_books_filled)
user_similarity

array([[1.        , 0.11692108, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.11692108, 1.        , 0.23329212, ..., 0.        , 0.        ,
        0.1495098 ],
       [0.        , 0.23329212, 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.1495098 , 0.        , ..., 0.        , 0.        ,
        1.        ]])

In [None]:
user_similarity.shape

(62, 62)
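
For intuition, here is a sketch of what the cosine similarity computes, written with NumPy only on a toy matrix. `cosine_similarity_matrix` is a hypothetical helper meant to mirror the behavior of `sklearn.metrics.pairwise.cosine_similarity` on dense input. After `fillna(0)`, an unrated book and a rating of 0 look identical to this measure.

```python
import numpy as np

def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero; all-zero rows stay all-zero
    unit = matrix / norms
    return unit @ unit.T

toy = np.array([
    [8.0, 0.0, 6.0],   # user 0
    [4.0, 0.0, 3.0],   # user 1: same direction as user 0, half the magnitude
    [0.0, 5.0, 0.0],   # user 2: no overlap with users 0 and 1
    [0.0, 0.0, 0.0],   # user 3: only zeros
])
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 3))
```

Users 0 and 1 get similarity 1 because cosine ignores magnitude. A row of only zeros gets similarity 0 with every row, itself included, which is consistent with the all-zero rows in the notebook's similarity output.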

In [None]:
user_similarity_df=pd.DataFrame(user_similarity,index=user_vs_books.index,columns=user_vs_books.index)
user_similarity_df

User-ID,11601,11676,16795,23768,23902,26544,35859,36606,36836,52584,...,230522,231210,232131,234623,235105,238781,241198,242824,245963,278418
User-ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11601,1.000000,0.116921,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.646162,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000
11676,0.116921,1.000000,0.233292,0.111996,0.178214,0.086725,0.211210,0.184504,0.0,0.210782,...,0.098003,0.135990,0.206883,0.0,0.130756,0.093537,0.0,0.0,0.0,0.14951
16795,0.000000,0.233292,1.000000,0.000000,0.071355,0.000000,0.069568,0.079858,0.0,0.066761,...,0.000000,0.000000,0.167408,0.0,0.000000,0.238165,0.0,0.0,0.0,0.00000
23768,0.000000,0.111996,0.000000,1.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000
23902,0.000000,0.178214,0.071355,0.000000,1.000000,0.000000,0.070324,0.000000,0.0,0.000000,...,0.154743,0.000000,0.112224,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238781,0.000000,0.093537,0.238165,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.237826,0.0,0.000000,1.000000,0.0,0.0,0.0,0.00000
241198,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000
242824,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000
245963,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000


In [None]:
book_similarity_df=pd.DataFrame(cosine_similarity(user_vs_books_filled.T),index=user_vs_books.columns,columns=user_vs_books.columns)
book_similarity_df

ISBN,000649840X,002026478X,002542730X,006000438X,006001203X,006016848X,006019491X,006092988X,006098824X,006099486X,...,1576737330,1592400876,1857022424,1878424319,1885171080,1931561648,3257229534,3423202327,3442541751,3492045170
ISBN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
000649840X,1.000000,0.0,0.514496,1.000000,0.0,1.000000,1.000000,1.000000,1.000000,0.743294,...,0.0,0.511992,1.000000,0.0,0.0,0.633724,0.0,1.000000,1.000000,1.000000
002026478X,0.000000,1.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000
002542730X,0.514496,0.0,1.000000,0.514496,0.0,0.514496,0.514496,0.514496,0.514496,0.382422,...,0.0,0.263418,0.514496,0.0,0.0,0.326048,0.0,0.514496,0.514496,0.514496
006000438X,1.000000,0.0,0.514496,1.000000,0.0,1.000000,1.000000,1.000000,1.000000,0.743294,...,0.0,0.511992,1.000000,0.0,0.0,0.633724,0.0,1.000000,1.000000,1.000000
006001203X,0.000000,0.0,0.000000,0.000000,1.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1931561648,0.633724,0.0,0.326048,0.633724,0.0,0.633724,0.633724,0.633724,0.633724,0.471044,...,0.0,0.324462,0.633724,0.0,0.0,1.000000,0.0,0.633724,0.633724,0.633724
3257229534,0.000000,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,1.0,0.000000,0.000000,0.000000
3423202327,1.000000,0.0,0.514496,1.000000,0.0,1.000000,1.000000,1.000000,1.000000,0.743294,...,0.0,0.511992,1.000000,0.0,0.0,0.633724,0.0,1.000000,1.000000,1.000000
3442541751,1.000000,0.0,0.514496,1.000000,0.0,1.000000,1.000000,1.000000,1.000000,0.743294,...,0.0,0.511992,1.000000,0.0,0.0,0.633724,0.0,1.000000,1.000000,1.000000


## Approach of the recommend function

1. Handling the input cases
   - Case 1 (user-based hybrid): if `user_id` is given, recommend for that user.
   - Case 2 (item-based): if `book_name` is given, recommend similar books.
   - Case 3 (popularity-based): if neither is given, return the top popular books.

---

2. User-based part
   - Identify the books the target user has not rated.
   - For each unrated book:
     - Find the users who have rated that book.
     - Take the dot product of the similarities (target user ↔ other users) and those users' ratings.
     - Normalize by the sum of the similarities.
   - This gives the user-based score.

---

3. Item-based part
   - Take the books the target user has rated.
   - For each unrated book:
     - Get the unrated book's similarity vector against the rated books.
     - Take the dot product of the similarities and the user's ratings.
     - Normalize by the sum of the similarities.
   - This gives the item-based score.

---

4. Hybrid combination
   - Final score = α × user_based + β × item_based
   - Baseline: α = β = 0.5 (equal weight). The function takes α as a parameter and sets β = 1 − α.

---

5. Ranking
   - Sort the scores in descending order.
   - Recommend the top-N books.
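
Steps 2 to 5 can be sketched end to end on a toy ratings table. Everything here is illustrative: the users, books, and ratings are invented, `cosine_df`, `weighted_average`, and `score_book` are hypothetical helpers, and cosine similarity is computed with NumPy so the block has no scikit-learn dependency.

```python
import numpy as np
import pandas as pd

# Toy user x book ratings; NaN marks a book the user has not rated
ratings = pd.DataFrame(
    {
        "b1": [8.0, 6.0, np.nan],
        "b2": [np.nan, 9.0, 7.0],
        "b3": [5.0, np.nan, 4.0],
        "b4": [np.nan, 2.0, 3.0],
    },
    index=["u1", "u2", "u3"],
)

def cosine_df(frame):
    """Pairwise cosine similarity between the rows of a DataFrame."""
    values = frame.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=frame.index, columns=frame.index)

filled = ratings.fillna(0)
user_sim = cosine_df(filled)      # user x user
book_sim = cosine_df(filled.T)    # book x book

def weighted_average(weights, values):
    denominator = np.abs(weights).sum()
    return float(np.dot(weights, values) / denominator) if denominator > 0 else 0.0

def score_book(user, book, alpha=0.5):
    # Step 2: users who rated this book, weighted by their similarity to `user`
    rated_by = ratings[book].dropna()
    user_based = weighted_average(
        user_sim.loc[user, rated_by.index].to_numpy(), rated_by.to_numpy()
    )
    # Step 3: books this user rated, weighted by their similarity to `book`
    rated_books = ratings.loc[user].dropna()
    item_based = weighted_average(
        book_sim.loc[book, rated_books.index].to_numpy(), rated_books.to_numpy()
    )
    # Step 4: blend the two scores
    return alpha * user_based + (1 - alpha) * item_based

# Step 5: rank the target user's unrated books by blended score
target = "u1"
unrated = ratings.loc[target][ratings.loc[target].isna()].index
ranked = sorted(
    ((book, score_book(target, book)) for book in unrated),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Here `b2` ranks above `b4` for `u1` because the users who rated `b2` gave it higher ratings, while the item-based part is an average of the same two ratings by `u1` in both cases.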

In [None]:
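def recommend(user_id=None, book_name=None, alpha=0.6572, top_n=5):
    """
    Hybrid recommender; returns a list of up to top_n ISBNs.

    user_id   : a row label of user_vs_books; selects the user + item blend
    book_name : an ISBN from book_similarity_df.index (not a title); selects item-based lookup
    alpha     : user-based weight; (1 - alpha) is the item-based weight
    Falls back to the most popular books when neither argument matches.
    """
    beta = 1 - alpha  # item-based weight, derived from alpha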

    # Case 1: User-based Hybrid
    if user_id is not None and user_id in user_vs_books.index:
        user_ratings = user_vs_books.loc[user_id]

        # Unrated books (NaN)
        unrated_books = user_ratings[user_ratings.isna()].index

        # Similar users vector
        sim_users = user_similarity_df[user_id]

        scores = {}

        for book in unrated_books:
            # ----- User-based part -----
            rated_by = user_vs_books[book].dropna()
            if not rated_by.empty:
                numerator = np.dot(sim_users[rated_by.index], rated_by.values)
                denominator = sim_users[rated_by.index].sum()
                user_based_score = numerator / denominator if denominator > 0 else 0
            else:
                user_based_score = 0

            # ----- Item-based part -----
            rated_books = user_ratings.dropna()
            if not rated_books.empty:
                sim_books = book_similarity_df.loc[book, rated_books.index]
                numerator = np.dot(sim_books.values, rated_books.values)
                denominator = sim_books.sum()
                item_based_score = numerator / denominator if denominator > 0 else 0
            else:
                item_based_score = 0

            # ----- Hybrid score -----
            final_score = alpha * user_based_score + beta * item_based_score
            scores[book] = final_score

        ranked_books = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_n]
        return [b for b, s in ranked_books]

    # Case 2: Item-based (if book given)
    elif book_name is not None and book_name in book_similarity_df.index:
        return book_similarity_df.loc[book_name].nlargest(top_n+1).iloc[1:].index.tolist()

    # Case 3: Popularity-based
    else:
        return popular_books['ISBN'].head(top_n).tolist()

In [None]:
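# '000649840' lacks the trailing 'X' of ISBN '000649840X', so it is not in
# book_similarity_df.index and the call falls through to the popularity-based case
recommend(book_name='000649840')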

['043935806X', '059035342X', '1400034779', '067976402X', '044023722X']

In [None]:
!pip install optuna



In [None]:
import numpy as np
import optuna
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# --------------------------------
# Dummy dataset (replace with your ratings/features data)
# Suppose X = features (user+item), y = ratings
X = np.random.rand(1000, 10)   # 1000 samples, 10 features (replace with actual data)
y = np.random.rand(1000) * 5   # ratings between 0-5

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --------------------------------
# User-based score (dummy function for demo; returns random values)
def compute_user_based(X_train, y_train, X_test, alpha):
    # Plug in your own user-based similarity logic here
    return np.random.rand(len(X_test))

# Content-based score (dummy function for demo; returns random values)
def compute_content_based(X_train, y_train, X_test, alpha):
    # Plug in your own content-based similarity logic here
    return np.random.rand(len(X_test))

# RMSE function
def rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

# --------------------------------
# Optuna objective
def objective(trial):
    alpha = trial.suggest_float("alpha", 0.0, 1.0)

    user_based_scores = compute_user_based(X_train, y_train, X_test, alpha)
    content_based_scores = compute_content_based(X_train, y_train, X_test, alpha)

    # Blend
    final_scores = alpha * user_based_scores + (1 - alpha) * content_based_scores

    # RMSE
    error = rmse(y_test, final_scores)
    return error

# --------------------------------
# Run Optuna
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)

print("Best Params:", study.best_params)
print("Best RMSE:", study.best_value)

[I 2025-08-26 22:54:14,441] A new study created in memory with name: no-name-12c6bdf2-c46f-40b0-ba6d-a31fe742bdb5
[I 2025-08-26 22:54:14,446] Trial 0 finished with value: 2.509014786321677 and parameters: {'alpha': 0.3926202320969939}. Best is trial 0 with value: 2.509014786321677.
[I 2025-08-26 22:54:14,450] Trial 1 finished with value: 2.475942567219347 and parameters: {'alpha': 0.831411986877625}. Best is trial 1 with value: 2.475942567219347.
[I 2025-08-26 22:54:14,453] Trial 2 finished with value: 2.460242058974679 and parameters: {'alpha': 0.9418148906166721}. Best is trial 2 with value: 2.460242058974679.
[I 2025-08-26 22:54:14,456] Trial 3 finished with value: 2.451256260261807 and parameters: {'alpha': 0.7654065883911912}. Best is trial 3 with value: 2.451256260261807.
[I 2025-08-26 22:54:14,459] Trial 4 finished with value: 2.466981244092202 and parameters: {'alpha': 0.21761380665676944}. Best is trial 3 with value: 2.451256260261807.
[I 2025-08-26 22:54:14,463] Trial 5 finis

Best Params: {'alpha': 0.6572048288931307}
Best RMSE: 2.435899685891245
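
Because the study above blends randomly generated scores, the reported alpha reflects noise, not the book data. Once real user-based and item-based predictions exist for a set of held-out ratings, alpha could be chosen against them. The sketch below does this with a plain grid search in NumPy; the three arrays are hypothetical placeholders and `best_alpha` is an illustrative helper.

```python
import numpy as np

def best_alpha(user_based_pred, item_based_pred, true_ratings, grid=None):
    """Return (alpha, rmse) minimizing the RMSE of the blended prediction."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    user_based_pred = np.asarray(user_based_pred, dtype=float)
    item_based_pred = np.asarray(item_based_pred, dtype=float)
    true_ratings = np.asarray(true_ratings, dtype=float)

    errors = []
    for alpha in grid:
        blended = alpha * user_based_pred + (1 - alpha) * item_based_pred
        errors.append(np.sqrt(np.mean((true_ratings - blended) ** 2)))
    best = int(np.argmin(errors))
    return float(grid[best]), float(errors[best])

# Hypothetical held-out ratings and the two models' predictions for them
true = np.array([8.0, 6.0, 9.0, 4.0])
user_pred = true + 1.0   # user-based model overshoots by 1
item_pred = true - 3.0   # item-based model undershoots by 3

alpha, error = best_alpha(user_pred, item_pred, true)
print(alpha, error)
```

In this constructed case the two biases cancel at alpha = 0.75, so the search should land there with an error close to zero. With real predictions the same function could serve as the objective for an Optuna trial.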


In [None]:
from joblib import dump

# User similarity
dump(user_similarity_df, 'user_similarity.joblib')

# Book/Item similarity
dump(book_similarity_df, 'book_similarity.joblib')

# User vs Books pivot table
dump(user_vs_books, 'user_vs_books.joblib')

# Popular books table (popular_books: ISBN, num_of_ratings, avg_rating)
dump(popular_books, 'popular_books.joblib')

['popular_books.joblib']
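
To use these artifacts later (for example in a serving script), they can be read back with `joblib.load`. A minimal round-trip sketch, writing to a temporary directory and using a small dictionary as a stand-in for the real tables:

```python
import os
import tempfile

from joblib import dump, load

# Stand-in for one of the saved objects; the real notebook saves DataFrames
artifact = {
    "ISBN": ["043935806X", "059035342X"],
    "avg_rating": [5.586885, 4.943689],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "popular_books.joblib")
    dump(artifact, path)      # same call as in the cell above
    restored = load(path)     # returns the original Python object

print(restored == artifact)
```

Loading pickled DataFrames generally expects library versions compatible with those used when saving, so pinning pandas and joblib versions alongside the files may be worth considering.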