# Foxlink's clustering algorithm evaluation
Evaluating Foxlink's clustering algorithm on bookdepository.com pages. The aim is to calculate precision and recall for the "book details" cluster of bookdepository.com.

In [1]:
%matplotlib inline
# Importing libraries
import matplotlib.pyplot as plt
import pandas as pd

FILEPATH = '../../../datasets/bookdepository.csv'
FILEPATH

'../../../datasets/bookdepository.csv'

In [2]:
df = pd.read_csv(FILEPATH)

## Data analysis
Some preliminary analysis of the dataset.

In [3]:
print("First 5 rows")
print("------------")
df.head()

First 5 rows
------------


Unnamed: 0,url,referer_url,src,label,shingle_vector
0,https://www.bookdepository.com/,https://www.bookdepository.com/,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",,"(0, 3, 2, 0, 5, 1, 1, 1)"
1,https://www.bookdepository.com/author/J-K-Rowling,https://www.bookdepository.com/,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",,"(0, 3, 2, 0, 7, 1, 2, 1)"
2,https://www.bookdepository.com/category/3098/T...,https://www.bookdepository.com/,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",,"(0, 3, 1, 0, 7, 1, 1, 1)"
3,https://www.bookdepository.com/category/3392/B...,https://www.bookdepository.com/,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",,"(0, 3, 1, 0, 5, 1, 1, 1)"
4,https://www.bookdepository.com/category/2967/T...,https://www.bookdepository.com/,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",,"(0, 3, 2, 0, 7, 1, 1, 1)"


In [4]:
print("No. of rows and columns")
print("-----------------------")
df.shape

No. of rows and columns
-----------------------


(25549, 5)

In [5]:
print("Check null values")
print("-----------------")
df.isnull().any().any()

Check null values
-----------------


True
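
The single boolean above only says that a value is missing somewhere in the frame. As a minimal sketch, a per-column count would show where the gaps are. The frame `toy` below is a made-up stand-in, not the real dataset.

```python
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical values).
toy = pd.DataFrame({
    'url': ['a', 'b', 'c'],
    'label': ['product', None, None],
})

# Number of missing values in each column.
null_counts = toy.isnull().sum()
print(null_counts)
```

On the real data the equivalent call would be `df.isnull().sum()`.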

In [6]:
print("Check duplicate values")
print("----------------------")
len(df['url'].unique()) != df.shape[0]

Check duplicate values
----------------------


False
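
The same duplicate check can be written with `Series.is_unique`, and `Series.duplicated` would list the offending rows if there were any. This is a small sketch on a made-up frame (`toy` is hypothetical); on the real data the column would be `df['url']`.

```python
import pandas as pd

# Toy frame with one repeated URL (hypothetical values).
toy = pd.DataFrame({'url': ['a', 'b', 'a', 'c']})

# True only when every URL appears exactly once.
urls_unique = toy['url'].is_unique

# keep=False marks every copy of a repeated URL, not just the later ones.
duplicated_rows = toy[toy['url'].duplicated(keep=False)]

print(urls_unique)
print(duplicated_rows)
```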

In [7]:
print("DataFrame column types")
print("----------------------")
df.info()

DataFrame column types
----------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25549 entries, 0 to 25548
Data columns (total 5 columns):
url               25549 non-null object
referer_url       25549 non-null object
src               25549 non-null object
label             15807 non-null object
shingle_vector    25549 non-null object
dtypes: object(5)
memory usage: 998.1+ KB


In [8]:
print("Some stats")
print("----------------")
df.describe()

Some stats
----------------


Unnamed: 0,url,referer_url,src,label,shingle_vector
count,25549,25549,25549,15807,25549
unique,25549,10496,25549,1,23
top,https://www.bookdepository.com/Making-West-Con...,https://www.bookdepository.com/,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",product,"(0, 1, 2, 3, 0, 1, 2, 1)"
freq,1,27,1,15807,6941


In [9]:
fmt_string = 'There are {} row with {} label'
print(fmt_string.format(len(df[df['label'].isnull()]),'no'))
print(fmt_string.format(len(df[df['label']=='product']), 'product'))

There are 9742 row with no label
There are 15807 row with product label


## Run Foxlink's clustering algorithm

In [10]:
#add top level folder to sys.path
import sys
sys.path.append('../../../')

In [11]:
from foxlink_clustering.clustering.structural_clustering import structural_clustering

clusters = structural_clustering(df)

In [12]:
len(clusters)

6

So Foxlink's clustering algorithm discovered 6 clusters. Let's see how many pages each cluster contains.

In [18]:
cluster_fmt = 'cluster n. {} has {} pages'
for index, cluster in enumerate(clusters):
    print(cluster_fmt.format(index +1 , len(cluster[1])))
    

cluster n. 1 has 521 pages
cluster n. 2 has 9077 pages
cluster n. 3 has 649 pages
cluster n. 4 has 4023 pages
cluster n. 5 has 11201 pages
cluster n. 6 has 56 pages


Looking at each cluster, it seems that cluster n. 5 is the one that groups the pages showing book details. Let's check that.

In [15]:
clusters[4]

([0, 1, -1, 3, 0, 1, -1, 1],
 ['https://www.bookdepository.com/Look-Inside-Our-World-Emily-Bone/9781409563945?ref=grid-view',
  'https://www.bookdepository.com/Look-Inside-Things-That-Go-Rob-Lloyd-Jones/9781409550259?ref=grid-view',
  'https://www.bookdepository.com/Look-Inside-Space-Rob-Lloyd-Jones/9781409523383?ref=grid-view',
  'https://www.bookdepository.com/Angel-Time-Professor-Anne-Rice/9781400078950',
  'https://www.bookdepository.com/Art-Mass-Effect-Universe-Casey-Hudson/9781595827685?ref=grid-view',
  'https://www.bookdepository.com/Talking-Heads-Fear-Music-Jonathan-Lethem/9781441121004?ref=grid-view',
  'https://www.bookdepository.com/Time-Death-Susan-Ericksen/9781469205793?ref=grid-view',
  'https://www.bookdepository.com/Holiday-Death-J-D-Robb/9781469233758',
  'https://www.bookdepository.com/Holiday-Death-J-D-Robb/9781417711772',
  'https://www.bookdepository.com/Meditations-Marcus-Aurelius/9780141018829?ref=grid-view',
  'https://www.bookdepository.com/Communist-Manifesto

However, as noted earlier, 15807 pages in the dataset are labelled as showing book details, while cluster n. 5 holds only 11201 pages. At least 4606 book-detail pages (15807 - 11201) must therefore lie outside it, presumably spread across the remaining 5 clusters. So Foxlink's clustering algorithm can be expected to show high precision (possibly 1) and a lower recall.
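
The arithmetic behind this expectation can be checked directly. The sketch below uses only the two counts reported earlier in this notebook; the variable names are illustrative.

```python
# Counts reported earlier in this notebook.
product_pages = 15807    # rows labelled 'product'
cluster_5_pages = 11201  # pages in cluster n. 5

# Product pages that must lie outside cluster n. 5,
# even if every page in that cluster is a product page.
min_missed = product_pages - cluster_5_pages

# Recall cannot exceed cluster size / number of product pages.
max_recall = cluster_5_pages / product_pages

print(min_missed)            # 4606
print(round(max_recall, 4))  # 0.7086
```

So even with a precision of 1, recall for cluster n. 5 alone is bounded by roughly 0.71.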

## Calculate precision and recall
Calculate precision and recall considering the entire dataset.

In [32]:
bookCluster = clusters[4]
urlsFromBookCluster = bookCluster[1]

In [39]:
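# Number of pages the algorithm placed in the book cluster
pages_retrieved_for_book_query = len(urlsFromBookCluster)

true_positive = 0

# Number of pages labelled 'product' in the whole dataset
all_positives = len(df[df['label']=='product'])

# Count the clustered pages whose label is 'product'
for url in urlsFromBookCluster:
    matchingRow  = df[df['url'] == url][['url','label']].iloc[0]
    if matchingRow['label'] == 'product':
        true_positive += 1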

In [50]:
recall = true_positive/all_positives
precision = true_positive/pages_retrieved_for_book_query
eval_fmt = '{}: {}'
print(eval_fmt.format('Recall', recall))
print(eval_fmt.format('Precision', precision))

Recall: 0.7048143227683937
Precision: 0.9946433354164806
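
The loop above scans the whole frame once per URL. As a sketch of an alternative, the same counts can be obtained with boolean masks. The helper name `precision_recall` and the toy frame are illustrative, and the sketch assumes that URLs are unique in the frame and that every retrieved URL appears in it.

```python
import pandas as pd

def precision_recall(df, retrieved_urls, label='product'):
    """Precision and recall of a set of retrieved URLs against df['label'].

    Assumes URLs are unique in df and all retrieved URLs occur in df.
    """
    retrieved = df['url'].isin(set(retrieved_urls))  # pages in the cluster
    relevant = df['label'] == label                  # pages with the label
    true_positive = int((retrieved & relevant).sum())
    n_retrieved = int(retrieved.sum())
    n_relevant = int(relevant.sum())
    precision = true_positive / n_retrieved if n_retrieved else 0.0
    recall = true_positive / n_relevant if n_relevant else 0.0
    return precision, recall

# Toy data (hypothetical): three product pages, one unlabelled page.
toy = pd.DataFrame({
    'url': ['a', 'b', 'c', 'd'],
    'label': ['product', 'product', None, 'product'],
})
precision, recall = precision_recall(toy, ['a', 'b', 'c'])
print(precision, recall)
```

On the real data the call would look like `precision_recall(df, urlsFromBookCluster)`.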


Let's find out how much precision and recall depend on the dataset size. Let's consider datasets of 100, 1000 and 5000 pages.
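
Note that `df.sample(n)` draws a different random sample on every run, so the figures in the following sections will vary between executions. A minimal sketch of reproducible sampling, using the `random_state` parameter of `DataFrame.sample`, is shown here. The helper `draw_samples`, the seed value and the toy frame are illustrative assumptions.

```python
import pandas as pd

def draw_samples(df, sizes, seed=0):
    """Return {size: sample}, each drawn without replacement with a fixed seed."""
    return {n: df.sample(n=n, random_state=seed) for n in sizes}

# Toy frame (hypothetical) to show that the draw is repeatable.
toy = pd.DataFrame({'url': ['u{}'.format(i) for i in range(50)]})
first = draw_samples(toy, [5, 10])
second = draw_samples(toy, [5, 10])
print(first[5]['url'].tolist() == second[5]['url'].tolist())
```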

### Evaluate recall and precision using a dataset of 100 pages

In [42]:
sample100 = df.sample(100)
sample100.describe()

Unnamed: 0,url,referer_url,src,label,shingle_vector,predicted_label
count,100,100,100,67,100,0.0
unique,100,98,100,1,12,0.0
top,https://www.bookdepository.com/Line-Fire/97814...,https://www.bookdepository.com/author/Louise-R...,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",product,"(0, 1, 2, 3, 0, 1, 2, 1)",
freq,1,2,1,67,29,


In [44]:
print(fmt_string.format(len(sample100[sample100['label'].isnull()]),'no'))
print(fmt_string.format(len(sample100[sample100['label']=='product']), 'product'))

There are 33 row with no label
There are 67 row with product label


In [45]:
clusters = structural_clustering(sample100)
print("There are {} clusters".format(len(clusters)))

There are 2 clusters


In [46]:
for index, cluster in enumerate(clusters):
    print(cluster_fmt.format(index +1 , len(cluster[1])))

cluster n. 1 has 48 pages
cluster n. 2 has 31 pages


Note that Foxlink's clustering algorithm discards some pages: the two clusters hold 48 + 31 = 79 of the 100 sampled pages.
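
The pages left out can be listed by comparing the sampled URLs with the URLs found in the clusters. This is a sketch under the assumption that each cluster is a `(vector, urls)` pair, as the `clusters[4]` output earlier suggests. The helper name `unclustered_urls` and the toy data are illustrative; on the real data the call would be `unclustered_urls(sample100['url'], clusters)`.

```python
def unclustered_urls(all_urls, clusters):
    """URLs that do not appear in any cluster.

    Assumes each cluster is a (vector, urls) pair.
    """
    clustered = set()
    for _vector, urls in clusters:
        clustered.update(urls)
    # Keep the original order of all_urls.
    return [url for url in all_urls if url not in clustered]

# Toy clusters (hypothetical): 'c' and 'e' are in neither cluster.
toy_clusters = [([0, 1], ['a', 'b']), ([1, -1], ['d'])]
missing = unclustered_urls(['a', 'b', 'c', 'd', 'e'], toy_clusters)
print(missing)  # ['c', 'e']
```
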
Printing out the first 10 data points of each cluster

In [59]:
for i in range(10):
    print(clusters[0][1][i])

https://www.bookdepository.com/Line-Fire/9781435915374
https://www.bookdepository.com/The-Collected-Works-of-Newton-P-Stallknecht-Vol-7-David--White-Marilyn-C-Bisch-Donald-L-Jennermann/9780773482258
https://www.bookdepository.com/London-Library-City-Chris-Burton/9781873667026
https://www.bookdepository.com/Kant-en-het-rode-jurkje-Lamia-Berrada-Berca/9789044538267
https://www.bookdepository.com/Second-Language-Acquisition-Research/9781847180513
https://www.bookdepository.com/Low-Bridge-Lionel-D-Wyld/9781258191238
https://www.bookdepository.com/BRAIN-AT-WORK/9783928383295
https://www.bookdepository.com/Complete-Guide-Growing-Berries-Grapes-Louise-Riotte/9780878338252
https://www.bookdepository.com/Collected-Works-Newton-P-Stallknecht-Studies-Philosophy-Creation-With-Especial-Reference-Bergson-Whitehead-v-3-Newton-P-Stallknecht/9780773482203
https://www.bookdepository.com/Making-Most-Shade-Larry-Hodgson/9781579549664


In [60]:
for i in range(10):
    print(clusters[1][1][i])

https://www.bookdepository.com/author/Joan-Laporta-I-Estruch
https://www.bookdepository.com/author/Stanley-Richard-Challand
https://www.bookdepository.com/author/Marvelene-C-Moore
https://www.bookdepository.com/author/Ingrid-Whitton
https://www.bookdepository.com/author/Diana-Sharon-Saunders-Curtin
https://www.bookdepository.com/publishers/Xing-Ren-Chu-Ban-She
https://www.bookdepository.com/author/Kathleen-J-Hanna
https://www.bookdepository.com/author/Michael-Zarnock
https://www.bookdepository.com/author/Francesca-Segal
https://www.bookdepository.com/publishers/Jessica-Kingsley-Publishers-Ltd


So it seems that the first cluster contains pages from the 'product' template. Let's now calculate precision and recall for that cluster.

In [55]:
bookCluster = clusters[0]
urlsFromBookCluster = bookCluster[1]

In [56]:
pages_retrieved_for_book_query = len(urlsFromBookCluster)

true_positive = 0

all_positives = len(sample100[sample100['label']=='product'])

for url in urlsFromBookCluster:
    matchingRow  = sample100[sample100['url'] == url][['url','label']].iloc[0]
    if matchingRow['label'] == 'product':
        true_positive += 1

In [57]:
recall = true_positive/all_positives
precision = true_positive/pages_retrieved_for_book_query
eval_fmt = '{}: {}'
print(eval_fmt.format('Recall', recall))
print(eval_fmt.format('Precision', precision))

Recall: 0.7164179104477612
Precision: 1.0


Finally, let's calculate precision and recall again for 1000 and 5000 pages.

### Evaluate recall and precision using a dataset of 1000 pages

In [115]:
sample1000 = df.sample(1000)
sample1000.describe()

Unnamed: 0,url,referer_url,src,label,shingle_vector,predicted_label
count,1000,1000,1000,601,1000,0.0
unique,1000,942,1000,1,15,0.0
top,https://www.bookdepository.com/author/Jeremey-...,https://www.bookdepository.com/author/Frank-Fr...,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",product,"(0, 3, 2, 0, 7, 1, 2, 1)",
freq,1,3,1,601,249,


In [116]:
print(fmt_string.format(len(sample1000[sample1000['label'].isnull()]),'no'))
print(fmt_string.format(len(sample1000[sample1000['label']=='product']), 'product'))

There are 399 row with no label
There are 601 row with product label


In [117]:
clusters = structural_clustering(sample1000)
print("There are {} clusters".format(len(clusters)))

There are 5 clusters


In [118]:
for index, cluster in enumerate(clusters):
    print(cluster_fmt.format(index +1 , len(cluster[1])))

cluster n. 1 has 424 pages
cluster n. 2 has 371 pages
cluster n. 3 has 158 pages
cluster n. 4 has 23 pages
cluster n. 5 has 22 pages


Printing each cluster's data points

In [119]:
for i in range(10):
    print(clusters[0][1][i])

https://www.bookdepository.com/het-bos-Thereza-Rowe/9789045321370
https://www.bookdepository.com/Chemistry-Hat-Manufacturing-Watson-Smith/9780656123988
https://www.bookdepository.com/Ramsays-beste-menus-druk-1-Gordon-Ramsay/9789021539904
https://www.bookdepository.com/American-Literature/9781245891127
https://www.bookdepository.com/My-Pregnancy-2011-Kate-Street/9781905410675
https://www.bookdepository.com/Introduction-Usability-Patrick-W-Jordan/9780748407620?ref=grid-view
https://www.bookdepository.com/Female-Thing-Laura-Kipnis/9781852429812
https://www.bookdepository.com/Mein-Lernposter-Erste-W%C3%B6rter/9783845505473
https://www.bookdepository.com/Alphabet-Alliteration-Bilingual-Navajo-English-Adele-Marie-Crouch/9781484825631
https://www.bookdepository.com/Biopsia-endomiocardica-Testo-atlante-Giorgio-Baroldi/9788829912490


In [120]:
for i in range(10):
    print(clusters[1][1][i])

https://www.bookdepository.com/author/Sergey-Kapitza
https://www.bookdepository.com/author/David-Marquez
https://www.bookdepository.com/author/Anna-Maria-Horner
https://www.bookdepository.com/author/Tamdin-Sither-Bradley
https://www.bookdepository.com/author/Dr-April-Pudsey
https://www.bookdepository.com/author/Ingrid-Whitton
https://www.bookdepository.com/author/Rudolf-Feik
https://www.bookdepository.com/author/Ana-Maria-B-Vazquez
https://www.bookdepository.com/author/Stephen-K-Barton
https://www.bookdepository.com/author/Angelika-Keller-Und-Hans-Ulrich-Keller


In [121]:
for i in range(10):
    print(clusters[2][1][i])

https://www.bookdepository.com/Dem-Winde-versprochen-Florencia-Bonelli/9783596182114
https://www.bookdepository.com/Brief-Gestalt-Therapy-Gaie-Houston/9780761973485?ref=bd_ser_1_1
https://www.bookdepository.com/Patient-Violence-Clinician/9780880484541
https://www.bookdepository.com/Psychology-Self-Regulation-Joseph-P-Forgas/9781848728424?ref=bd_ser_1_1
https://www.bookdepository.com/Old-New-Terrorism-Peter-Neumann/9780745643755?ref=bd_ser_1_1
https://www.bookdepository.com/Matters-Spirit-F-Scott-Scribner/9780271036212
https://www.bookdepository.com/Alices-Adventures-Wonderland-Lewis-Carroll/9780147515872
https://www.bookdepository.com/Ruby-Redfort-Take-Your-Last-Breath-Lauren-Child/9781536200485?ref=bd_ser_1_1
https://www.bookdepository.com/Top-10-Moscow-Dk-Travel/9781409326694?ref=bd_ser_1_1
https://www.bookdepository.com/Becoming-Your-Own-Emotional-Support-System-Linda-L-Simmons/9780789032225


In [122]:
for i in range(10):
    print(clusters[3][1][i])

https://www.bookdepository.com/Faire-LAventure-Fabienne-Kanor/9782709643634
https://www.bookdepository.com/Das-entschwundene-Land-1-Cassette-Astrid-Lindgren/9783829106917
https://www.bookdepository.com/Oxford-Guide-Metaphors-CBT-Richard-Stott/9780191015656
https://www.bookdepository.com/Gewerbestatistik-Preussens-VOR-1850/9783922661566
https://www.bookdepository.com/Besser-essen-Rudi-Anschober/9783990247631
https://www.bookdepository.com/Foreign-Language-Teaching-U-S-Schools-Nancy-C-Rhodes/9780872810068
https://www.bookdepository.com/Quand-on-Aime-Il-Ne-Fait-Jamais-Nuit-Mgr-Jacques-Gaillot/9782918414490
https://www.bookdepository.com/Grenzg%C3%A4ngerinnen/9783590180420
https://www.bookdepository.com/Pochoir/9783896023278
https://www.bookdepository.com/Rousseaus-Critique-Inequality-Professor-Philosophy-Viola-Manderfeld-Professor-German-Frederick-Neuhouser/9781316004845


In [123]:
for i in range(10):
    print(clusters[4][1][i])

https://www.bookdepository.com/category/351/vid/3389/Classic-Horror-Audio-Books
https://www.bookdepository.com/category/2921/Home-Renovation-Extension
https://www.bookdepository.com/category/3294/vid/3389/Sacred-Religious-Music-Audio-Books
https://www.bookdepository.com/category/232/vid/3389/Diaries-Letters-Journals-Audio-Books
https://www.bookdepository.com/category/273/vid/3389/Language-Teaching-Learning-Material-Coursework-Audio-Books
https://www.bookdepository.com/category/1785/Power-Generation-Distribution
https://www.bookdepository.com/category/1100/Systems-Law
https://www.bookdepository.com/category/3314/Musical-Scores-Lyrics-Libretti
https://www.bookdepository.com/category/359/Adult-Contemporary-Romance?page=4
https://www.bookdepository.com/category/2842/Psychic-Powers-Psychic-Phenomena


Although the split is not clear-cut, three clusters (the first, third and fourth) contain pages that match the book (or product) template. Since the product pages are spread over several clusters, precision and recall are not calculated here: the values for any single cluster are expected to be really low.
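
Rather than judging from the first 10 URLs of each cluster, a per-cluster breakdown would show how the product pages are distributed. The sketch below assumes that each cluster is a `(vector, urls)` pair and that URLs are unique in the frame. The helper name `cluster_label_summary` and the toy data are illustrative.

```python
import pandas as pd

def cluster_label_summary(clusters, df, label='product'):
    """One row per cluster: size, pages with `label`, precision and recall.

    Assumes each cluster is a (vector, urls) pair and URLs are unique in df.
    """
    relevant = set(df.loc[df['label'] == label, 'url'])
    rows = []
    for index, (_vector, urls) in enumerate(clusters, start=1):
        hits = sum(1 for url in urls if url in relevant)
        rows.append({
            'cluster': index,
            'pages': len(urls),
            'with_label': hits,
            'precision': hits / len(urls) if urls else 0.0,
            'recall': hits / len(relevant) if relevant else 0.0,
        })
    return pd.DataFrame(rows)

# Toy data (hypothetical): product pages split over clusters 1 and 3.
toy = pd.DataFrame({
    'url': ['a', 'b', 'c', 'd', 'e'],
    'label': ['product', 'product', None, 'product', None],
})
toy_clusters = [([0], ['a', 'b']), ([1], ['c', 'e']), ([2], ['d'])]
summary = cluster_label_summary(toy_clusters, toy)
print(summary)
```

On the real data the call would be `cluster_label_summary(clusters, sample1000)`.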

### Evaluate recall and precision using a dataset of 5000 pages

In [124]:
sample5000 = df.sample(5000)
sample5000.describe()

Unnamed: 0,url,referer_url,src,label,shingle_vector,predicted_label
count,5000,5000,5000,3057,5000,0.0
unique,5000,3878,5000,1,18,0.0
top,https://www.bookdepository.com/author/Carlos-A...,https://www.bookdepository.com/publishers/Aver...,"<!DOCTYPE html>\n<html lang=""en"">\n<head>\n\n ...",product,"(0, 1, 2, 3, 0, 1, 2, 1)",
freq,1,8,1,3057,1339,


In [125]:
print(fmt_string.format(len(sample5000[sample5000['label'].isnull()]),'no'))
print(fmt_string.format(len(sample5000[sample5000['label']=='product']), 'product'))

There are 1943 row with no label
There are 3057 row with product label


In [126]:
clusters = structural_clustering(sample5000)
print("There are {} clusters".format(len(clusters)))

There are 5 clusters


In [127]:
for index, cluster in enumerate(clusters):
    print(cluster_fmt.format(index +1 , len(cluster[1])))

cluster n. 1 has 2215 pages
cluster n. 2 has 1806 pages
cluster n. 3 has 144 pages
cluster n. 4 has 714 pages
cluster n. 5 has 104 pages


Printing each cluster's data points

In [129]:
for i in range(10):
    print(clusters[0][1][i])

https://www.bookdepository.com/When-God-Speaks-David-Shultz/9781593177102
https://www.bookdepository.com/Schwimmen-lernen-03-Pool-Nudel-Co-Laminiert-Veronika-Aretz/9783944824161
https://www.bookdepository.com/Round-Horne-Complete-Series-3-Barry-Took/9781785292101?ref=grid-view
https://www.bookdepository.com/search/Hacia-una-poetica-del-humanismo-Cristobal-Gabarron-Towards-Poetry-of-Humanism-Cristobal-Gabarron-Mari-Carmen-Sanchez-Rojas-Antonio-Parra/9788483717387
https://www.bookdepository.com/Runaway-Loaf-Penny-Dolan/9780008185749?ref=grid-view&qid=1557066978493&sr=1-29
https://www.bookdepository.com/Pattern-Language-Christopher-Alexander/9780195019193
https://www.bookdepository.com/Red-Sonja-She-Devil-With-Sword-4-Mike-Avon-Oeming/9781933305646
https://www.bookdepository.com/Das-Kopfkissenbuch-der-Dame-Sei-Shonagon-Sei-Shonagon/9783458089988?ref=bd_ser_1_1
https://www.bookdepository.com/Complete-Icelandic-Beginner-Intermediate-Book-Audio-Course-Hildur-Jonsdottir/9781444105377?ref=bd_s

In [130]:
for i in range(10):
    print(clusters[1][1][i])

https://www.bookdepository.com/author/Christopher-Pierce
https://www.bookdepository.com/author/Mosley-Michael
https://www.bookdepository.com/author/A-Mantovani
https://www.bookdepository.com/author/Anna-Mari
https://www.bookdepository.com/author/Benjamin-Dittrich
https://www.bookdepository.com/author/Mark-Owen
https://www.bookdepository.com/publishers/Klever-Verlag
https://www.bookdepository.com/publishers/Oxbow-Books
https://www.bookdepository.com/author/Annie-Auerbach
https://www.bookdepository.com/author/Bridget-D-Samuels


In [131]:
for i in range(10):
    print(clusters[2][1][i])

https://www.bookdepository.com/Man-on-Donkey-H-F-M-Prescott/9780140029635
https://www.bookdepository.com/Nouveaux-Secrets-Sur-La-Relation-Homme-Cheval-Christelle-Perrin/9791090213531
https://www.bookdepository.com/Evolution-Social-Mind/9780203837788
https://www.bookdepository.com/Handle-selbst-und-lebe-jetzt-Alexander-S-Kaufmann/9783869018720
https://www.bookdepository.com/de-La-Emergencia-la-Estrategia-Jose-Luis-Coraggio/9789508021809
https://www.bookdepository.com/Ce-Livre-Cache-Un-Tr%C3%AF%C2%BF%C2%BDs-Grand-Secret/9782745943842
https://www.bookdepository.com/Atlas-Obscura-Joshua-Foer/9782501117340
https://www.bookdepository.com/%CE%9C%CE%B5-%CF%84%CE%BF%CF%85-%CF%86%CE%B1%CE%BD%CE%B1%CF%81%CE%B9%CE%BF%CF%8D-%CF%84%CE%BF-%CF%86%CF%89%CF%82-Weatherill-Cat/9789604129133
https://www.bookdepository.com/Karfreitag-predigen-Steffen-Bauer/9783927718906
https://www.bookdepository.com/Chewing-on-Tinfoil-Joe-Ollmann/9781897415924


In [132]:
for i in range(10):
    print(clusters[3][1][i])

https://www.bookdepository.com/How-Pass-Higher-Computing-Science-Second-Edition-Greg-Reid/9781510452435?ref=bd_ser_1_1
https://www.bookdepository.com/Just-Above-My-Head-James-Baldwin/9780140187991
https://www.bookdepository.com/Allein-unter-Studenten-Juliet-Hastings/9783404154999
https://www.bookdepository.com/Varieties-Capitalism-History-Transition-Emergence-Martha-Prevezer/9780415735407
https://www.bookdepository.com/Folk-Voiceworks-Peter-Hunt/9780193355736?ref=grid-view&qid=1557069509277&sr=1-25
https://www.bookdepository.com/White-Collar-Crime-Michael-L-Seigel/9780735596511
https://www.bookdepository.com/Keeper-Lost-Cities-Caitlin-Kelly/9781721378746?ref=grid-view&qid=1557069131591&sr=1-1
https://www.bookdepository.com/Colloquial-Breton-Ian-Press/9780415224536
https://www.bookdepository.com/Color-Atlas-Pathophysiology-Stefan-Silbernagl/9783131165534
https://www.bookdepository.com/Beginning-Arduino-Programming-Brian-Evans/9781430237778


In [133]:
for i in range(10):
    print(clusters[4][1][i])

https://www.bookdepository.com/category/241/vid/3389/Language-Reference-General-Audio-Books
https://www.bookdepository.com/category/2657/Early-Modern-History-C-1450-1500-C-1700
https://www.bookdepository.com/category/1865/Ordnance-Weapons-Technology
https://www.bookdepository.com/category/1124/Law-Sea
https://www.bookdepository.com/category/273/vid/3389/Language-Teaching-Learning-Material-Coursework-Audio-Books
https://www.bookdepository.com/category/3217/Theology
https://www.bookdepository.com/category/2075/Human-computer-Interaction
https://www.bookdepository.com/category/1677/Economic-Geography
https://www.bookdepository.com/category/2625/vid/3389/Classic-Horror-Audio-Books
https://www.bookdepository.com/category/358/vid/3389/Romance-Audio-Books


So it seems that the first cluster contains the pages matching the book template. Let's calculate precision and recall again.

In [134]:
bookCluster = clusters[0]
urlsFromBookCluster = bookCluster[1]

In [135]:
pages_retrieved_for_book_query = len(urlsFromBookCluster)

true_positive = 0

all_positives = len(sample5000[sample5000['label']=='product'])

for url in urlsFromBookCluster:
    matchingRow  = sample5000[sample5000['url'] == url][['url','label']].iloc[0]
    if matchingRow['label'] == 'product':
        true_positive += 1

In [136]:
recall = true_positive/all_positives
precision = true_positive/pages_retrieved_for_book_query
eval_fmt = '{}: {}'
print(eval_fmt.format('Recall', recall))
print(eval_fmt.format('Precision', precision))

Recall: 0.7196597971867844
Precision: 0.9932279909706546
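
To compare the runs side by side, the figures printed in this notebook can be collected in one place. The sketch below copies the reported values as they are; the 1000-page run is left out because no precision or recall was computed for it.

```python
# (dataset size, pages in the product cluster, recall, precision),
# copied from the outputs printed in this notebook.
results = [
    (100, 48, 0.7164179104477612, 1.0),
    (5000, 2215, 0.7196597971867844, 0.9932279909706546),
    (25549, 11201, 0.7048143227683937, 0.9946433354164806),
]

header = '{:>8} {:>8} {:>8} {:>10}'.format('pages', 'cluster', 'recall', 'precision')
print(header)
for size, cluster_pages, recall, precision in results:
    print('{:>8} {:>8} {:>8.3f} {:>10.3f}'.format(size, cluster_pages, recall, precision))

# Spread of recall across the three runs.
recall_spread = max(r[2] for r in results) - min(r[2] for r in results)
print(round(recall_spread, 3))
```

In these three runs recall stays between roughly 0.70 and 0.72 and precision stays above 0.99. Because the samples are random, other draws may give different values.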
