<a href="https://colab.research.google.com/github/aditisinghq/book-talk/blob/main/colab_filter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Using KNN in item-based collaborative filtering**
Finding similarities between items (i.e. books) based on the ratings they've been given.
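
As a rough illustration of the idea (toy ratings, not the BookCrossing data), each book can be treated as a vector of user ratings, and two books compared by the cosine of the angle between their vectors. The array `toy_ratings` and the helper `cosine_similarity` are made up for this sketch.

```python
import numpy as np

# Toy item-by-user rating matrix: rows are books, columns are users, 0 = not rated.
toy_ratings = np.array([
    [8, 0, 9, 0],   # book A
    [7, 0, 10, 0],  # book B: rated by the same users as A
    [0, 6, 0, 9],   # book C: rated by different users
], dtype=float)

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (1 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_ab = cosine_similarity(toy_ratings[0], toy_ratings[1])
sim_ac = cosine_similarity(toy_ratings[0], toy_ratings[2])
print(sim_ab, sim_ac)
```

Books A and B share raters and score close to 1, while A and C share none and score 0. The `cosine` metric used with `NearestNeighbors` later in the notebook is the corresponding distance, 1 minus this similarity.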


In [None]:
!pip install fuzzywuzzy

Collecting fuzzywuzzy
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0


In [None]:
!pip install python-Levenshtein

Collecting python-Levenshtein
  Downloading python-Levenshtein-0.12.2.tar.gz (50 kB)
[?25l[K     |██████▌                         | 10 kB 21.2 MB/s eta 0:00:01[K     |█████████████                   | 20 kB 14.7 MB/s eta 0:00:01[K     |███████████████████▌            | 30 kB 10.2 MB/s eta 0:00:01[K     |██████████████████████████      | 40 kB 9.4 MB/s eta 0:00:01[K     |████████████████████████████████| 50 kB 2.7 MB/s 
Building wheels for collected packages: python-Levenshtein
  Building wheel for python-Levenshtein (setup.py) ... done
  Created wheel for python-Levenshtein: filename=python_Levenshtein-0.12.2-cp37-cp37m-linux_x86_64.whl size=149858 sha256=d9d670cba15808352c447f957a59525c4d328a91a9322bdbb7cdbe8e70353bee
  Stored in directory: /root/.cache/pip/wheels/05/5f/ca/7c4367734892581bb5ff896f15027a932c551080b2abd3e00d
Successfully built python-Levenshtein
Installing collected packages: python-Levenshtein
Successfully installed python-Levenshtein-0.12.2


In [None]:
import numpy as np 
import pandas as pd
import scipy.sparse
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from fuzzywuzzy import process
pd.set_option('mode.chained_assignment', None)
pd.set_option('display.max_colwidth',None)

In [None]:
#BookCrossing dataset: has ratings data, but the books have no descriptions
Books = pd.read_csv('/content/drive/MyDrive/Books.csv',low_memory=False)
Ratings = pd.read_csv('/content/drive/MyDrive/Ratings.csv',low_memory=False)
Users = pd.read_csv('/content/drive/MyDrive/Users.csv',low_memory=False)

In [None]:
books7k = pd.read_csv('/content/drive/MyDrive/books7k.csv',low_memory=False)
books7k=books7k[['isbn10','title','description','categories','thumbnail','authors']]
print("before dropping null: ",books7k.shape)
books7k.dropna(inplace=True)
print("after dropping null: ",books7k.shape)

before dropping null:  (6810, 6)
after dropping null:  (6247, 6)


In [None]:
#remove books that are not in 7k(since there's no description available)
Books=Books[Books['ISBN'].isin(list(books7k['isbn10'].unique()))]
Books.shape

(2265, 8)

In [None]:
print("shape before cleaning:",Ratings.shape)
Ratings = Ratings[Ratings['User-ID'].isin(list(Users['User-ID'].unique()))]
Ratings = Ratings[Ratings['ISBN'].isin(list(Books['ISBN'].unique()))]
print("shape after cleaning:",Ratings.shape)

shape before cleaning: (1149780, 3)
shape after cleaning: (39554, 3)


In [None]:
Books.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
18,0440234743,The Testament,John Grisham,1999,Dell,http://images.amazon.com/images/P/0440234743.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0440234743.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0440234743.01.LZZZZZZZ.jpg
44,0553582909,Icebound,Dean R. Koontz,2000,Bantam Books,http://images.amazon.com/images/P/0553582909.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0553582909.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0553582909.01.LZZZZZZZ.jpg
51,0842342702,Left Behind: A Novel of the Earth's Last Days (Left Behind #1),Tim Lahaye,2000,Tyndale House Publishers,http://images.amazon.com/images/P/0842342702.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0842342702.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0842342702.01.LZZZZZZZ.jpg
90,0316769487,The Catcher in the Rye,J.D. Salinger,1991,"Little, Brown",http://images.amazon.com/images/P/0316769487.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0316769487.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0316769487.01.LZZZZZZZ.jpg
105,067976397X,Corelli's Mandolin : A Novel,LOUIS DE BERNIERES,1995,Vintage,http://images.amazon.com/images/P/067976397X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/067976397X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/067976397X.01.LZZZZZZZ.jpg


In [None]:
Ratings.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
20,276747,679776818,8
109,276804,440498058,8
112,276808,395547032,10
134,276822,141310340,9
329,276872,425188221,7


The data has already been preprocessed and cleaned up a fair bit. (There are some anomalies in the publication years, but since we're not using those values, we ignore them for now.)

In [None]:
#keep users who have rated at least 5 books
counts1 = Ratings['User-ID'].value_counts()
Ratings = Ratings[Ratings['User-ID'].isin(counts1[counts1 >=5].index)]
#NOTE: the intent is to also drop books rated fewer than 3 times, but this step counts
#rating values ('Book-Rating'), so it only drops rating values that occur fewer than 3 times;
#counting Ratings['ISBN'] instead would filter by book. The outputs below come from the filter as written.
counts = Ratings['Book-Rating'].value_counts()
Ratings = Ratings[Ratings['Book-Rating'].isin(counts[counts >= 3].index)]
Ratings.shape
#keep only the books that still have ratings in the updated Ratings dataframe
Books=Books[Books['ISBN'].isin(Ratings['ISBN'].unique())]
Books

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
18,0440234743,The Testament,John Grisham,1999,Dell,http://images.amazon.com/images/P/0440234743.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0440234743.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0440234743.01.LZZZZZZZ.jpg
44,0553582909,Icebound,Dean R. Koontz,2000,Bantam Books,http://images.amazon.com/images/P/0553582909.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0553582909.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0553582909.01.LZZZZZZZ.jpg
51,0842342702,Left Behind: A Novel of the Earth's Last Days (Left Behind #1),Tim Lahaye,2000,Tyndale House Publishers,http://images.amazon.com/images/P/0842342702.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0842342702.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0842342702.01.LZZZZZZZ.jpg
90,0316769487,The Catcher in the Rye,J.D. Salinger,1991,"Little, Brown",http://images.amazon.com/images/P/0316769487.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0316769487.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0316769487.01.LZZZZZZZ.jpg
105,067976397X,Corelli's Mandolin : A Novel,LOUIS DE BERNIERES,1995,Vintage,http://images.amazon.com/images/P/067976397X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/067976397X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/067976397X.01.LZZZZZZZ.jpg
...,...,...,...,...,...,...,...,...
267428,0872205541,The Trial and Death of Socrates (3rd Edition),Plato,2001,Hackett Pub Co Inc,http://images.amazon.com/images/P/0872205541.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0872205541.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0872205541.01.LZZZZZZZ.jpg
268067,0312155328,From Bondage: Mercy of a Rude Stream (Mercy of a Rude Stream),Henry Roth,1997,Picador USA,http://images.amazon.com/images/P/0312155328.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0312155328.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0312155328.01.LZZZZZZZ.jpg
268736,0312421974,This Side of Brightness: A Novel,Colum McCann,2003,St Martins Pr Special,http://images.amazon.com/images/P/0312421974.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0312421974.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0312421974.01.LZZZZZZZ.jpg
270914,0751506451,The Maiden (The Morland Dynasty Series),Cynthia Harrod-Eagles,2001,Little Brown UK Ltd,http://images.amazon.com/images/P/0751506451.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/0751506451.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0751506451.01.LZZZZZZZ.jpg
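
A minimal sketch of filtering by both user activity and book popularity, assuming a ratings table with the same column names as `Ratings`. The toy rows and the two thresholds are invented for illustration; the point is that the second count is taken over `ISBN`, so it measures how often each book was rated.

```python
import pandas as pd

# Hypothetical ratings table with the same column names as the notebook.
toy = pd.DataFrame({
    'User-ID':     [1, 1, 1, 2, 2, 3],
    'ISBN':        ['a', 'b', 'c', 'a', 'b', 'a'],
    'Book-Rating': [8, 7, 9, 6, 10, 5],
})

MIN_USER_RATINGS = 2   # assumed threshold for this toy example
MIN_BOOK_RATINGS = 2   # assumed threshold for this toy example

# Keep users with enough ratings.
user_counts = toy['User-ID'].value_counts()
filtered = toy[toy['User-ID'].isin(user_counts[user_counts >= MIN_USER_RATINGS].index)]

# Keep books with enough ratings, counting rows per ISBN (not per rating value).
book_counts = filtered['ISBN'].value_counts()
filtered = filtered[filtered['ISBN'].isin(book_counts[book_counts >= MIN_BOOK_RATINGS].index)]

print(filtered)
```

Here user 3 is dropped first (one rating), then book `c` (one remaining rating). Removing books can push some users back under the user threshold, so the two steps may need repeating if both conditions must hold at the end.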


In [None]:
Ratings.loc[Ratings['ISBN'] == '0451528557']

Unnamed: 0,User-ID,ISBN,Book-Rating
422606,100906,451528557,9


In [None]:
#creating 2d matrix with user ids and isbns 

rating_pivot = Ratings.pivot(index = 'ISBN', columns = 'User-ID', values = 'Book-Rating').fillna(0)
rating_matrix = csr_matrix(rating_pivot.values)
rating_pivot2=rating_pivot.reset_index()

rating_pivot2.index[rating_pivot2['ISBN'] == '0020442602'].tolist()[0]

4
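
To make the relationship between the pivot and the sparse matrix concrete, here is a small sketch on invented data (`toy`, `toy_pivot` and `toy_matrix` are hypothetical names). It shows that the position of an ISBN in the pivot's index is also its row number in the sparse matrix, which is what the lookup above relies on.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Invented ratings; ISBNs are kept as strings so leading zeros survive.
toy = pd.DataFrame({
    'User-ID':     [1, 2, 1, 3],
    'ISBN':        ['0002', '0002', '0001', '0003'],
    'Book-Rating': [8, 6, 9, 7],
})

# Rows are ISBNs (sorted), columns are users; missing ratings become 0.
toy_pivot = toy.pivot(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
toy_matrix = csr_matrix(toy_pivot.values)

# Position of an ISBN in the pivot is also its row number in the sparse matrix.
row = toy_pivot.index.get_loc('0002')
print(row, toy_matrix[row].toarray())
```

Because `csr_matrix` only stores non-zero entries, unrated cells (and any ratings that are literally 0) take no space and contribute nothing to the similarity computation.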

In [None]:
#resetting the index because the original Books index did not match, and exceeded, the
#matrix dimensions
Books=Books.reset_index()
Books

In [None]:
#books7k also cleaned up to only have books that are in final Ratings dataframe
books7k=books7k[books7k['isbn10'].isin(list(Ratings['ISBN'].unique()))]
books7k.head()

Unnamed: 0,isbn10,title,description,categories,thumbnail,authors
2,6163831,The One Tree,Volume Two of Stephen Donaldson's acclaimed second trilogy featuing the compelling anti-hero Thomas Covenant.,American fiction,http://books.google.com/books/content?id=OmQawwEACAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api,Stephen R. Donaldson
27,6512674,Spares,"Spares - human clones, the ultimate health insurance. An eye for an eye - but some people are doing all the taking. The story of Jack Randall: burnt-out, dropped out, and way overdrawn at the luck bank. But as caretaker on a Spares Farm, he still has a choice, and it might make a difference.",Human cloning,http://books.google.com/books/content?id=83RrAdP9y5UC&printsec=frontcover&img=1&zoom=1&source=gbs_api,Michael Marshall Smith
95,20199856,The Love of the Last Tycoon,Depicts the inner-workings of the Hollywood movie industry and its impact on the fabric of American life.,Fiction,http://books.google.com/books/content?id=3EDbEHca_k8C&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api,F. Scott Fitzgerald
97,20360754,Heart Songs and Other Stories,"Before she wrote her Pulitzer Prize-winning bestseller The Shipping News, E. Annie Proulx was already producing some of the finest short fiction in the country. Here are her collected stories, including two new works never before anthologized. These stories reverberate with rural tradition, the rites of nature, and the rituals of small-town life. The country is blue-collar New England; the characters are native families and the dispossessed working class, whose heritage is challenged by the neorural bourgeoisie from the city; and the themes are as elemental as the landscape: revenge, malice, greed, passion. Told with skill and profundity and crafted by a master storyteller, these are lean, tough tales of an extraordinary place and its people.",Fiction,http://books.google.com/books/content?id=_K2fswEACAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api,Annie Proulx
98,20442602,The voyage of the Dawn Treader,"The ""Dawn Treader"" is the first ship Narnia has seen in centuries. King Caspian has built it for his voyage to find the seven lords, good men whom his evil uncle Mizaz banished when he usurped the throne. The journey takes Edmund, Lucy, and their cousin Eustace to the Eastern Islands, beyond the Silver Sea, toward Aslan's country at the End of the World. Illustrations.",Juvenile Fiction,http://books.google.com/books/content?id=fDD3CfYb70cC&printsec=frontcover&img=1&zoom=1&source=gbs_api,Clive Staples Lewis


In [None]:
model_knn=NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=9)
model_knn.fit(rating_matrix)

rating_matrix
#scipy.sparse.save_npz('rat_matrix.npz', rating_matrix)
#rating_matrix_back = scipy.sparse.load_npz('rat_matrix.npz')

<2008x1646 sparse matrix of type '<class 'numpy.float64'>'
	with 6876 stored elements in Compressed Sparse Row format>

In [None]:
def recommend(book_title, data, model, n_recommendations):
    #Books and the rating matrix are ordered differently, so the query row is looked up
    #by ISBN in rating_pivot2, whose row order matches rating_matrix
    rec_isbn=[]
    model.fit(data)
    #fuzzy-match the typed title against Books; [2] is the matching row's index label
    idx=process.extractOne(book_title, Books['Book-Title'])[2]
    isbn=Books['ISBN'][idx]
    print(isbn)
    isbn_ind=rating_pivot2.index[rating_pivot2['ISBN'] == isbn].tolist()[0]
    distances, indices=model.kneighbors(data[isbn_ind], n_neighbors=n_recommendations)
    ind=indices.flatten()
    print(idx)
    #NOTE: ind holds row positions in rating_matrix, but they are looked up in Books and
    #compared with idx (a Books index); rating_pivot2['ISBN'][i] and isbn_ind would keep
    #both sides in matrix order. The output below comes from the lookup as written.
    for i in ind:
        if i!=idx:
            rec_isbn.append(Books['ISBN'][i])
        else:
            display(books7k.loc[books7k['isbn10'] == Books['ISBN'][i]])
    rec_df=books7k[books7k['isbn10'].isin(rec_isbn)]
    print("similar books are:")
    return rec_df
book_name=input("enter the title of a book you liked: ")
recommend(book_name, rating_matrix, model_knn,9)


enter the title of a book you liked: time machine
0451528557
1413
similar books are:


Unnamed: 0,isbn10,title,description,categories,thumbnail,authors
1582,312876629,Songmaster,"An SF classic from the author of Ender's Game. Kidnapped at an early age, the young singer Ansset has been raised in isolation at the mystical retreat called the Songhouse. His life has been filled with music, and having only songs for companions, he develops a voice that is unlike any heard before. Ansset's voice is both a blessing and a curse, for the young Songbird can reflect all the hopes and fears his auidence feels and, by magnifying their emotions, use his voice to heal--or to destroy. When it is discovered that his is the voice that the Emperor has waited decades for, Ansset is summoned to the Imperial Palace on Old Earth. Many fates rest in Ansset's hands, and his songs will soon be put to the test: either to salve the troubled conscience of a conqueror, or drive him, and the universe, into mad chaos. Songmaster is a haunting story of power and love--the tale of the man who would destroy everything he loves to preserve humanity's peace, and the boy who might just sing the world away.",Fiction,http://books.google.com/books/content?id=xkfZlwEACAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api,Orson Scott Card
1803,345407865,The Children of Henry VIII,Recounts the lives of Henry VIII's heirs and the intrigues that arose from their struggle to ascend their father's throne,Biography & Autobiography,http://books.google.com/books/content?id=rKiCmOGUcbgC&printsec=frontcover&img=1&zoom=1&source=gbs_api,Alison Weir
2127,375727132,The Dive from Clausen's Pier,"When her fiancâe Mike is left paralyzed following a tragic accident, Carrie Bell begins to question her familiar world, from her everyday life in Wisconsin to her relationships, as she sets out to rediscover her own identity.",Fiction,http://books.google.com/books/content?id=x_RlwZo8LMEC&printsec=frontcover&img=1&zoom=1&source=gbs_api,Ann Packer
2278,385335970,Dragonfly in Amber,"In eighteenth-century Scotland, Claire Randall and her raven-haired daughter, Brianna, return to the majestic hills where Claire recalls the love of her life--gallant warrior James Fraser.",Fiction,http://books.google.com/books/content?id=yYc_PgAACAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api,Diana Gabaldon
3224,500203474,Graphic Design,"From its roots in the development of printing, graphic design has evolved as a means of identification, information, and promotion to become a profession and discipline in its own right. This authoritative documentary history begins with the poster and goes on to chart the development of word and image in brochures and magazines, advertising, corporate identity, television, and electronic media, and the impact of technical innovations such as photography and the computer. For the revised edition, a new final chapter covers all the recent international developments in graphic design, including the role of the computer and the Internet in design innovation and globalization. In the last years of the twentieth century, at a time when ""designer products"" and the use of logos grew in importance, the role of graphic designers became more complex, subversive, and sometimes more politicalwitness Oliviero Toscani's notorious advertisements for Benetton. Digital technology cleared the way for an astonishing proliferation of new typefaces, and words began to take second place to typography in a whole range of magazines and books as designers asserted the primacy of their medium. Designers and companies discussed here include Neville Brody, David Carson, Design Writing Research, Edward Fella, Tibor Kalman, Jeffery Keedy, LettError, Pierre di Sciullo, Tomato, Gerard Unger, Cornel Windlin, and a host of others. Over 800 illustrations, 30 in color.",Art,http://books.google.com/books/content?id=GI9tngEACAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api,Richard Hollis
3462,553346113,Seven Plays,"Henry is generally well-behaved, but he is occasionally arrogant and vain. Henry is at heart a hard worker, but his frequent bouts of illness hinder his work.",Drama,http://books.google.com/books/content?id=-a0szDa6MksC&printsec=frontcover&img=1&zoom=1&source=gbs_api,Sam Shepard;Joseph Chaikin
3861,671662341,Anne Frank Remembered,"The reminiscences of Miep Gies, the woman who hid the Frank family in Amsterdam during the Second World War, presents a vivid story of life under Nazi occupation.",History,http://books.google.com/books/content?id=7wO8QDEanr8C&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api,Miep Gies;Alison Leslie Gold
5349,836246217,Mary Engelbreit's Home Companion,"A guide to decorating with style covers kitchens, windows, walls, children's rooms, and workspaces",Crafts & Hobbies,http://books.google.com/books/content?id=VzIGAAAACAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api,Mary Engelbreit
6568,1853260738,Moll Flanders,"With an Introduction and Notes by R.T.Jones, Honorary Fellow of the University of York. The novel follows the life of its eponymous heroine, Moll Flanders, through its many vicissitudes, which include her early seduction, careers in crime and prostitution, conviction for theft and transportation to the plantations of Virginia, and her ultimate redemption and prosperity in the New World. 'Moll Flanders' was one of the first social novels to be published in English and draws heavily on Defoe's experience of the topography and social conditions prevailing in the London of the late 17th century. AUTHOR: Born Daniel Foe in London in 1660, Defoe was a prodigious writer on many subjects, producing over 500 books, pamphlets and articles. He is now remembered for his novels, primarily 'The Life and Strange Suprizing Adventures of Robinson Crusoe, of York' and 'The Fortunes and Misfortunes of the Famous Moll Flanders' and is considered to be one of the key figures in establishing the format of the English novel.",Fiction,http://books.google.com/books/content?id=kzuSuZj03P8C&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api,Daniel Defoe


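Four of the nine recommendations are fiction, but several (for example Graphic Design and Mary Engelbreit's Home Companion) have little to do with the query. They may simply have been rated highly by the same users. Another possible cause is that the neighbour indices returned by the model are row positions in rating_matrix but are looked up in Books, which is ordered differently.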
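
One way to keep matrix rows and ISBNs aligned when reading back neighbours is to map the returned indices through the pivot's own index. The following is a minimal sketch on invented data, not the notebook's actual pipeline: the titles, ISBNs and the helper `similar_isbns` are hypothetical, and `toy_books` is deliberately stored in a different order from the pivot.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy data; titles and ISBNs are invented for this sketch.
toy_books = pd.DataFrame({
    'ISBN': ['0003', '0001', '0002', '0004'],   # deliberately not in pivot order
    'Book-Title': ['Gamma', 'Alpha', 'Beta', 'Delta'],
})
toy_ratings = pd.DataFrame({
    'User-ID':     [1, 2, 1, 2, 3, 4, 3, 4],
    'ISBN':        ['0001', '0001', '0002', '0002', '0003', '0003', '0004', '0004'],
    'Book-Rating': [8, 9, 7, 10, 6, 9, 5, 8],
})

toy_pivot = toy_ratings.pivot(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
toy_matrix = csr_matrix(toy_pivot.values)
toy_knn = NearestNeighbors(metric='cosine', algorithm='brute').fit(toy_matrix)

def similar_isbns(isbn, n_neighbors=2):
    """Return the ISBNs of the nearest rows, excluding the query book itself."""
    row = toy_pivot.index.get_loc(isbn)
    _, indices = toy_knn.kneighbors(toy_matrix[row], n_neighbors=n_neighbors)
    # Map matrix rows back to ISBNs through the pivot index, not through toy_books.
    return [toy_pivot.index[i] for i in indices.flatten() if i != row]

neighbours = similar_isbns('0001')
print(toy_books[toy_books['ISBN'].isin(neighbours)])
```

The query row is normally its own nearest neighbour at distance 0, so asking for `n_neighbors` rows yields at most `n_neighbors - 1` recommendations once the query is filtered out.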