## Imports

In [14]:
import pandas as pd
import numpy as np

import faiss
import requests

dim = 4096  # Dimensionality of the embeddings returned by Llama 2

index = faiss.IndexFlatL2(dim)


## Gather Data

In [5]:
df = pd.read_csv('books.csv')

df.head()

Unnamed: 0,isbn13,isbn10,title,subtitle,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count
0,9780002005883,2005883,Gilead,,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,361.0
1,9780002261982,2261987,Spider's Web,A Novel,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,5164.0
2,9780006163831,6163831,The One Tree,,Stephen R. Donaldson,American fiction,http://books.google.com/books/content?id=OmQaw...,Volume Two of Stephen Donaldson's acclaimed se...,1982.0,3.97,479.0,172.0
3,9780006178736,6178731,Rage of angels,,Sidney Sheldon,Fiction,http://books.google.com/books/content?id=FKo2T...,"A memorable, mesmerizing heroine Jennifer -- b...",1993.0,3.93,512.0,29532.0
4,9780006280897,6280897,The Four Loves,,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=XhQ5X...,Lewis' work on the nature of love divides love...,2002.0,4.15,170.0,33684.0


Create a function that takes a row and turns it into a textual representation.

In [6]:
def textual_rep(row):
    # Build one block of text per book; this string is what gets embedded later
    text = f"""Title: {row['title']}
Authors: {row['authors']}
Description: {row['description']}
Categories: {row['categories']}
Publishing Year: {row['published_year']}
Average Rating: {row['average_rating']}
Number of Pages: {row['num_pages']} """
    return text
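
Some cells in the dataset are missing (for example, `subtitle` is `NaN` for several books), and an f-string renders a missing value as the literal text `nan`. The following is a minimal sketch of a variant that leaves such fields out. The names `is_missing` and `textual_rep_skip_missing` are hypothetical and not part of the original notebook, and whether dropping fields actually improves the embeddings is an assumption that has not been tested here.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (how pandas marks missing cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def textual_rep_skip_missing(row):
    """Hypothetical variant of textual_rep that omits missing fields."""
    fields = [
        ("Title", "title"),
        ("Authors", "authors"),
        ("Description", "description"),
        ("Categories", "categories"),
        ("Publishing Year", "published_year"),
        ("Average Rating", "average_rating"),
        ("Number of Pages", "num_pages"),
    ]
    # Keep only the fields that actually have a value in this row
    lines = [
        f"{label}: {row[key]}"
        for label, key in fields
        if not is_missing(row[key])
    ]
    return "\n".join(lines)
```

It accepts anything that supports `row[key]`, so it could be used with `df.apply(textual_rep_skip_missing, axis=1)` in the same way as the original function.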

In [8]:
print(df.iloc[:5].apply(textual_rep,axis=1).values[0])

Title: Gilead
Authors: Marilynne Robinson
Description: A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, Gilead is a song of celebration

Apply the function to every row of the DataFrame.

In [9]:
df['textual_representation'] = df.apply(textual_rep,axis=1)

In [11]:
df.head(2)

Unnamed: 0,isbn13,isbn10,title,subtitle,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count,textual_representation
0,9780002005883,2005883,Gilead,,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,361.0,Title: Gilead\nAuthors: Marilynne Robinson\nDe...
1,9780002261982,2261987,Spider's Web,A Novel,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,5164.0,Title: Spider's Web\nAuthors: Charles Osborne;...


In [15]:
# Initialise the embedding matrix with zeros (one row per book)

X = np.zeros((len(df['textual_representation']),dim), dtype='float32')

We need to get an embedding from Llama 2 for each textual representation.

In [17]:
for i, representation in enumerate(df['textual_representation']):
    if i % 100 == 0:
        print(i)  # Progress indicator
    res = requests.post('http://localhost:11434/api/embeddings',
                        json={
                            'model': 'llama2',
                            'prompt': representation
                        })
    res.raise_for_status()  # Stop early if the request failed
    embedding = res.json()['embedding']

    X[i] = np.array(embedding)

# Add all embeddings to the index in one call
index.add(X)

0
100
200
...
6700
6800
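
The loop above relies on every response containing exactly `dim` values. As a rough sketch, the same fill step can be written as a function that makes this check explicit. The names `embed_all` and `embed_fn` are hypothetical: `embed_fn` stands for any callable that maps a text to a list of floats, such as a thin wrapper around the `requests.post` call used in this notebook.

```python
import numpy as np


def embed_all(texts, embed_fn, dim):
    """Fill a (len(texts), dim) float32 matrix with one embedding per text.

    embed_fn is a hypothetical callable: text -> sequence of floats.
    """
    X = np.zeros((len(texts), dim), dtype="float32")
    for i, text in enumerate(texts):
        vec = np.asarray(embed_fn(text), dtype="float32")
        # Fail loudly if the model returns an unexpected number of values
        if vec.shape != (dim,):
            raise ValueError(
                f"row {i}: expected {dim} values, got shape {vec.shape}"
            )
        X[i] = vec
    return X
```

Passing the embedding call in as a parameter also makes it possible to try the pipeline on a few rows with a dummy function before running all of the requests.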


In [18]:
faiss.write_index(index, 'index')

In [19]:
index = faiss.read_index('index')

In [20]:
df[df.title.str.contains('Friends')]

Unnamed: 0,isbn13,isbn10,title,subtitle,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count,textual_representation
502,9780064420808,64420809,Little House Friends,,Heather Henson;Laura Ingalls Wilder,Juvenile Fiction,http://books.google.com/books/content?id=gMZbA...,Laura Ingalls shares adventures and good times...,1998.0,3.99,80.0,70.0,Title: Little House Friends\nAuthors: Heather ...
612,9780099498599,99498596,Circle of Friends,,Maeve Binchy,College attendance,http://books.google.com/books/content?id=P4-i_...,"It began with Benny Hogan and Eve Malone, grow...",2006.0,4.02,722.0,50668.0,Title: Circle of Friends\nAuthors: Maeve Binch...
986,9780142300848,142300845,"Oliver and Albert, Friends Forever",,Jean Van Leeuwen,Juvenile Fiction,http://books.google.com/books/content?id=-EBFs...,"Oliver makes friends with Albert, the new boy ...",2002.0,3.62,48.0,22.0,"Title: Oliver and Albert, Friends Forever\nAut..."
1776,9780345323903,345323904,With Friends Like These,,Alan Dean Foster,Fiction,http://books.google.com/books/content?id=IKUcA...,Willie Whitehorse could have been just another...,1984.0,3.94,236.0,2233.0,Title: With Friends Like These\nAuthors: Alan ...
2490,9780394895833,394895835,Baby's Animal Friends,,Phoebe Dunn,Juvenile Fiction,http://books.google.com/books/content?id=ytIAZ...,Photographs capture the special relationship b...,1988.0,3.4,28.0,15.0,Title: Baby's Animal Friends\nAuthors: Phoebe ...
4373,9780743272773,743272773,How to Win Friends and Influence People for Te...,,Donna Dale Carnegie,Self-Help,http://books.google.com/books/content?id=eIc2l...,"Donna Dale Carnegie, daughter of the late moti...",2005.0,3.9,208.0,426.0,Title: How to Win Friends and Influence People...
4533,9780749307844,749307846,How to Win Friends and Influence People,,Dale Carnegie,Conduct of life,http://books.google.com/books/content?id=aO7CQ...,Dale Carnegie aims to show how to makes friend...,1990.0,4.18,256.0,199.0,Title: How to Win Friends and Influence People...
5115,9780810958623,810958627,Ruby Gloom's Guide to Friendship,,Mighty Fine Inc.,Juvenile Nonfiction,http://books.google.com/books/content?id=FYrcu...,"If Ruby Gloom's friends seem somewhat unusual,...",2005.0,4.34,72.0,37.0,Title: Ruby Gloom's Guide to Friendship\nAutho...


In [23]:
favorite_book = df.iloc[4533]
favorite_book

isbn13                                                        9780749307844
isbn10                                                           0749307846
title                               How to Win Friends and Influence People
subtitle                                                                NaN
authors                                                       Dale Carnegie
categories                                                  Conduct of life
thumbnail                 http://books.google.com/books/content?id=aO7CQ...
description               Dale Carnegie aims to show how to makes friend...
published_year                                                       1990.0
average_rating                                                         4.18
num_pages                                                             256.0
ratings_count                                                         199.0
textual_representation    Title: How to Win Friends and Influence People...
Name: 4533, 

Let's say we want books like this one. Assume it is not part of the DataFrame (you can substitute any input of your own). Let's create an embedding for the given book and perform a similarity search.

In [24]:
res = requests.post('http://localhost:11434/api/embeddings',
                        json={
                            'model':'llama2',
                            'prompt': favorite_book['textual_representation']
                        })

In [25]:
embedding = np.array([res.json()['embedding']], dtype='float32')

Pass this embedding to the index and search for the most similar items.

In [26]:
D, I = index.search(embedding, 5)  # 5 most similar items: D holds distances, I holds row positions
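
To make the search step less of a black box: `IndexFlatL2` performs an exact, exhaustive comparison and reports squared Euclidean distances. The sketch below reproduces that idea in plain NumPy. The function name `l2_search` is hypothetical, and the code is meant only as an illustration of the technique, not as a description of how Faiss implements it internally.

```python
import numpy as np


def l2_search(X, queries, k):
    """Brute-force k-nearest-neighbour search by squared L2 distance.

    X has shape (n_vectors, dim); queries has shape (n_queries, dim).
    Returns (D, I), both of shape (n_queries, k), nearest first.
    """
    # Pairwise squared distances, shape (n_queries, n_vectors)
    diff = queries[:, None, :] - X[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # Positions of the k smallest distances for each query
    I = np.argsort(dist, axis=1)[:, :k]
    D = np.take_along_axis(dist, I, axis=1)
    return D, I
```

As with `index.search`, the query must be a 2-D array, which is why the embedding above is wrapped in an extra pair of brackets before being converted to `float32`.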

In [27]:
best_matches = np.array(df['textual_representation'])[I.flatten()]  # Map row positions back to text

In [29]:
for match in best_matches:
    print(match)
    print()

Title: How to Win Friends and Influence People
Authors: Dale Carnegie
Description: Dale Carnegie aims to show how to makes friends, increase your prestige, break out of the vicious circle of worry and generally get the better of life, using a series of simple and practical rules, techniques and attitudes.
Categories: Conduct of life
Publishing Year: 1990.0
Average Rating: 4.18
Number of Pages: 256.0 

Title: The Easy Way to Stop Smoking
Authors: Allen Carr
Description: Presents the Easyway method for quitting smoking, based on a factual understanding of the harm of cigarette addiction and practical advice on how to successfully break the habit.
Categories: Self-Help
Publishing Year: 2004.0
Average Rating: 4.29
Number of Pages: 224.0 

Title: The Denial of Death
Authors: Ernest Becker
Description: Drawing from religion and the human sciences, particularly psychology after Freud, the author attempts to demonstrate that the fear of death is man's central concern
Categories: Philosophy
Pub

All of these books are about self-improvement and relationships, so they are quite similar to the reference book.
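
The first match printed above is the reference book itself, because its row was already in the index. One possible way to get only other books is to request one extra neighbour and drop the query's own position. The helper `drop_query` below is a hypothetical sketch of that idea and assumes the query's row position in the DataFrame is known (4533 in this notebook).

```python
def drop_query(positions, distances, query_pos, k):
    """Remove the query's own row from a search result and keep k matches.

    positions and distances are one row of I and D, e.g. I[0] and D[0].
    """
    kept = [
        (pos, dist)
        for pos, dist in zip(positions, distances)
        if pos != query_pos
    ]
    return kept[:k]
```

For example, after `D, I = index.search(embedding, 6)`, calling `drop_query(I[0].tolist(), D[0].tolist(), 4533, 5)` would return up to five `(position, distance)` pairs that exclude the reference book.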