## Evaluation of our clustering algorithm
This notebook evaluates our clustering algorithm on pages crawled from blackwells.co.uk. The aim is to calculate precision and recall for the "product" (book details) cluster and the "list" (catalog) cluster.

In [1]:
# Importing libraries
import numpy as np
import pandas as pd
import ast
FILEPATH = '../../../datasets/blackwells.csv'
FILEPATH

'../../../datasets/blackwells.csv'

In [3]:
df = pd.read_csv(FILEPATH, converters={'bitset': ast.literal_eval, 'tag_count': ast.literal_eval})

## Data analysis
Some preliminary analysis of the dataset.

In [4]:
print("First 5 rows")
print("------------")
df.head()

First 5 rows
------------


Unnamed: 0,url,referer_url,src,shingle_vector,label,tag_count,bitset
0,https://blackwells.co.uk/bookshop/basket,https://blackwells.co.uk/bookshop/home,"\n\n\n \n<!DOCTYPE html>\n<html lang=""e...","(0, 1, 5, 1, 1, 6, 3, 1)",,"[0.0019569471624266144, 0.0019569471624266144,...","[1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, ..."
1,https://blackwells.co.uk/bookshop/search/,https://blackwells.co.uk/bookshop/home,"\n\n\n \n<!DOCTYPE html>\n<html lang=""e...","(0, 1, 5, 1, 1, 0, 3, 0)",list,"[0.0012970168612191958, 0.0012970168612191958,...","[0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, ..."
2,https://blackwells.co.uk/bookshop/home,https://blackwells.co.uk/bookshop/home,"\n\n\n \n<!DOCTYPE html>\n<html lang=""e...","(0, 1, 0, 1, 0, 0, 3, 1)",,"[0.0011655011655011655, 0.0011655011655011655,...","[1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, ..."
3,https://blackwells.co.uk/bookshop/product/9781...,https://blackwells.co.uk/bookshop/home,"\n\n\n \n<!DOCTYPE html>\n<html lang=""e...","(0, 1, 1, 1, 1, 0, 0, 1)",product,"[0.0008116883116883117, 0.0008116883116883117,...","[1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, ..."
4,https://blackwells.co.uk/bookshop/mapping,https://blackwells.co.uk/bookshop/basket,"\n\n\n\n\n\n<!DOCTYPE html>\n<html lang=""en"" c...","(2, 22, 1, 1, 7, 15, 7, 5)",,"[0.008333333333333333, 0.008333333333333333, 0...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ..."


In [5]:
print("No. of rows and columns")
print("-----------------------")
df.shape

No. of rows and columns
-----------------------


(10919, 7)

In [6]:
print("Check null values")
print("-----------------")
df.isnull().any().any()

Check null values
-----------------


True
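The check above only says that at least one null exists somewhere. A minimal sketch of how the affected columns could be located, using a small hypothetical DataFrame (`toy`) in place of `df`:

```python
import pandas as pd

# Hypothetical stand-in for df: one row has no label.
toy = pd.DataFrame({
    'url': ['a', 'b', 'c'],
    'label': ['product', None, 'list'],
})

# Count nulls per column, then keep only the columns that have any.
null_counts = toy.isnull().sum()
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```

On the real dataset the same two lines would be applied to `df` instead of `toy`.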

In [7]:
print("Check duplicate values")
print("----------------------")
len(df['url'].unique()) != df.shape[0]

Check duplicate values
----------------------


False

In [8]:
print("DataFrame column types")
print("----------------------")
df.info()

DataFrame column types
----------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10919 entries, 0 to 10918
Data columns (total 7 columns):
url               10919 non-null object
referer_url       10919 non-null object
src               10919 non-null object
shingle_vector    10919 non-null object
label             10899 non-null object
tag_count         10919 non-null object
bitset            10919 non-null object
dtypes: object(7)
memory usage: 597.2+ KB


In [11]:
print("Some stats")
print("----------------")
df[['url','referer_url','src','shingle_vector','label']].describe()

Some stats
----------------


Unnamed: 0,url,referer_url,src,shingle_vector,label
count,10919,10919,10919,10919,10899
unique,10919,6375,10525,73,2
top,https://blackwells.co.uk/bookshop/product/101-...,https://blackwells.co.uk/bookshop/home,"\n\n\n \n<!DOCTYPE html>\n<html lang=""e...","(0, 1, 5, 0, 1, 0, 3, 0)",product
freq,1,12,7,2197,10405


In [12]:
fmt_string = 'There are {} rows with {} label'
print(fmt_string.format(len(df[df['label'].isnull()]),'no'))
print(fmt_string.format(len(df[df['label']=='product']), 'product'))
print(fmt_string.format(len(df[df['label']=='list']), 'list'))

There are 20 rows with no label
There are 10405 rows with product label
There are 494 rows with list label


## Run MeanShift clustering algorithm

In [13]:
# Add the top-level folder to sys.path so the project package can be imported
import sys
sys.path.append('../../../')

In [14]:
from astarwars_clustering.clustering import clusteringevaluation
from astarwars_clustering.utils import utility
from astarwars_clustering.clustering.structural_clustering import dbscanclustering, meanshiftclustering

In [19]:
clustering = meanshiftclustering(0.1, df['bitset'].tolist())

Elapsed time to calculate MeanShift clustering:00:05:05.02
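The source of `meanshiftclustering` is not shown here. If its first argument (0.1) is used as a Euclidean bandwidth on the raw bitsets, which is an assumption, then it is worth noting that two distinct 0/1 vectors are always at least 1.0 apart, so only identical bitsets could fall within the same window. A small sketch on hypothetical toy bitsets illustrates the distance bound:

```python
import numpy as np

# Toy bitsets standing in for df['bitset'] (hypothetical data).
bitsets = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],   # exact duplicate of the first row
    [1, 0, 0, 0],   # differs from the first row in one position
    [0, 1, 1, 1],
])

# Pairwise Euclidean distances between all rows.
diff = bitsets[:, None, :] - bitsets[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Smallest distance between rows that are not identical.
min_nonzero = dist[dist > 0].min()

# Number of distinct bitsets in the toy data.
n_distinct = len(np.unique(bitsets, axis=0))

print(min_nonzero, n_distinct)
```

Under that assumption, a bandwidth well below 1.0 would tend to produce roughly one cluster per distinct bitset, so comparing the cluster count with the number of distinct bitsets in `df` may be a useful sanity check.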


In [21]:
predictedLabels = clustering.labels_
noOfClusters = np.unique(predictedLabels)  # array of distinct cluster labels
df['predicted_label'] = predictedLabels
print('There are ' + str(len(noOfClusters)) + ' clusters')
print()
print()
print('Cluster labels:')
noOfClusters

There are 8041 clusters


Cluster labels:


array([   0,    1,    2, ..., 8038, 8039, 8040])

In [46]:
cluster_fmt = 'cluster n. {} has {} pages'

# Print the number of pages assigned to each cluster
for index, el in enumerate(noOfClusters):
    print(cluster_fmt.format(index, utility.count_occurrences(predictedLabels, el)))

cluster n. 0 has 951 pages
cluster n. 1 has 163 pages
cluster n. 2 has 43 pages
cluster n. 3 has 36 pages
cluster n. 4 has 35 pages
cluster n. 5 has 27 pages
cluster n. 6 has 26 pages
cluster n. 7 has 24 pages
cluster n. 8 has 23 pages
cluster n. 9 has 19 pages
cluster n. 10 has 18 pages
cluster n. 11 has 18 pages
cluster n. 12 has 17 pages
cluster n. 13 has 16 pages
cluster n. 14 has 16 pages
cluster n. 15 has 15 pages
cluster n. 16 has 15 pages
cluster n. 17 has 14 pages
cluster n. 18 has 14 pages
cluster n. 19 has 12 pages
cluster n. 20 has 11 pages
cluster n. 21 has 11 pages
cluster n. 22 has 11 pages
cluster n. 23 has 10 pages
cluster n. 24 has 10 pages
cluster n. 25 has 10 pages
cluster n. 26 has 10 pages
cluster n. 27 has 10 pages
cluster n. 28 has 9 pages
cluster n. 29 has 9 pages
cluster n. 30 has 9 pages
cluster n. 31 has 9 pages
cluster n. 32 has 9 pages
cluster n. 33 has 8 pages
cluster n. 34 has 8 pages
cluster n. 35 has 8 pages
cluster n. 36 has 8 pages
...

cluster n. 338 has 2 pages
cluster n. 339 has 2 pages
cluster n. 340 has 2 pages
cluster n. 341 has 2 pages
cluster n. 342 has 2 pages
cluster n. 343 has 2 pages
cluster n. 344 has 2 pages
cluster n. 345 has 2 pages
cluster n. 346 has 2 pages
cluster n. 347 has 2 pages
cluster n. 348 has 2 pages
cluster n. 349 has 2 pages
cluster n. 350 has 2 pages
cluster n. 351 has 2 pages
cluster n. 352 has 2 pages
cluster n. 353 has 2 pages
cluster n. 354 has 2 pages
cluster n. 355 has 2 pages
cluster n. 356 has 2 pages
cluster n. 357 has 2 pages
cluster n. 358 has 2 pages
cluster n. 359 has 2 pages
cluster n. 360 has 2 pages
cluster n. 361 has 2 pages
cluster n. 362 has 2 pages
cluster n. 363 has 2 pages
cluster n. 364 has 2 pages
cluster n. 365 has 2 pages
cluster n. 366 has 2 pages
cluster n. 367 has 2 pages
cluster n. 368 has 2 pages
cluster n. 369 has 2 pages
cluster n. 370 has 2 pages
cluster n. 371 has 2 pages
cluster n. 372 has 2 pages
cluster n. 373 has 2 pages
cluster n. 374 has 2 pages
...

cluster n. 671 has 2 pages
cluster n. 672 has 2 pages
cluster n. 673 has 2 pages
cluster n. 674 has 2 pages
cluster n. 675 has 2 pages
cluster n. 676 has 2 pages
cluster n. 677 has 2 pages
cluster n. 678 has 2 pages
cluster n. 679 has 2 pages
cluster n. 680 has 2 pages
cluster n. 681 has 2 pages
cluster n. 682 has 2 pages
cluster n. 683 has 2 pages
cluster n. 684 has 2 pages
cluster n. 685 has 2 pages
cluster n. 686 has 2 pages
cluster n. 687 has 2 pages
cluster n. 688 has 2 pages
cluster n. 689 has 2 pages
cluster n. 690 has 2 pages
cluster n. 691 has 2 pages
cluster n. 692 has 2 pages
cluster n. 693 has 2 pages
cluster n. 694 has 2 pages
cluster n. 695 has 2 pages
cluster n. 696 has 2 pages
cluster n. 697 has 2 pages
cluster n. 698 has 2 pages
cluster n. 699 has 2 pages
cluster n. 700 has 2 pages
cluster n. 701 has 2 pages
cluster n. 702 has 2 pages
cluster n. 703 has 2 pages
cluster n. 704 has 2 pages
cluster n. 705 has 2 pages
cluster n. 706 has 2 pages
cluster n. 707 has 2 pages
...

cluster n. 1081 has 1 pages
cluster n. 1082 has 1 pages
cluster n. 1083 has 1 pages
cluster n. 1084 has 1 pages
cluster n. 1085 has 1 pages
cluster n. 1086 has 1 pages
cluster n. 1087 has 1 pages
cluster n. 1088 has 1 pages
cluster n. 1089 has 1 pages
cluster n. 1090 has 1 pages
cluster n. 1091 has 1 pages
cluster n. 1092 has 1 pages
cluster n. 1093 has 1 pages
cluster n. 1094 has 1 pages
cluster n. 1095 has 1 pages
cluster n. 1096 has 1 pages
cluster n. 1097 has 1 pages
cluster n. 1098 has 1 pages
cluster n. 1099 has 1 pages
cluster n. 1100 has 1 pages
cluster n. 1101 has 1 pages
cluster n. 1102 has 1 pages
cluster n. 1103 has 1 pages
cluster n. 1104 has 1 pages
cluster n. 1105 has 1 pages
cluster n. 1106 has 1 pages
cluster n. 1107 has 1 pages
cluster n. 1108 has 1 pages
cluster n. 1109 has 1 pages
cluster n. 1110 has 1 pages
cluster n. 1111 has 1 pages
cluster n. 1112 has 1 pages
cluster n. 1113 has 1 pages
cluster n. 1114 has 1 pages
cluster n. 1115 has 1 pages
...

cluster n. 1385 has 1 pages
cluster n. 1386 has 1 pages
cluster n. 1387 has 1 pages
cluster n. 1388 has 1 pages
cluster n. 1389 has 1 pages
cluster n. 1390 has 1 pages
cluster n. 1391 has 1 pages
cluster n. 1392 has 1 pages
cluster n. 1393 has 1 pages
cluster n. 1394 has 1 pages
cluster n. 1395 has 1 pages
cluster n. 1396 has 1 pages
cluster n. 1397 has 1 pages
cluster n. 1398 has 1 pages
cluster n. 1399 has 1 pages
cluster n. 1400 has 1 pages
cluster n. 1401 has 1 pages
cluster n. 1402 has 1 pages
cluster n. 1403 has 1 pages
cluster n. 1404 has 1 pages
cluster n. 1405 has 1 pages
cluster n. 1406 has 1 pages
cluster n. 1407 has 1 pages
cluster n. 1408 has 1 pages
cluster n. 1409 has 1 pages
cluster n. 1410 has 1 pages
cluster n. 1411 has 1 pages
cluster n. 1412 has 1 pages
cluster n. 1413 has 1 pages
cluster n. 1414 has 1 pages
cluster n. 1415 has 1 pages
cluster n. 1416 has 1 pages
cluster n. 1417 has 1 pages
cluster n. 1418 has 1 pages
cluster n. 1419 has 1 pages
...

cluster n. 1684 has 1 pages
cluster n. 1685 has 1 pages
cluster n. 1686 has 1 pages
cluster n. 1687 has 1 pages
cluster n. 1688 has 1 pages
cluster n. 1689 has 1 pages
cluster n. 1690 has 1 pages
cluster n. 1691 has 1 pages
cluster n. 1692 has 1 pages
cluster n. 1693 has 1 pages
cluster n. 1694 has 1 pages
cluster n. 1695 has 1 pages
cluster n. 1696 has 1 pages
cluster n. 1697 has 1 pages
cluster n. 1698 has 1 pages
cluster n. 1699 has 1 pages
cluster n. 1700 has 1 pages
cluster n. 1701 has 1 pages
cluster n. 1702 has 1 pages
cluster n. 1703 has 1 pages
cluster n. 1704 has 1 pages
cluster n. 1705 has 1 pages
cluster n. 1706 has 1 pages
cluster n. 1707 has 1 pages
cluster n. 1708 has 1 pages
cluster n. 1709 has 1 pages
cluster n. 1710 has 1 pages
cluster n. 1711 has 1 pages
cluster n. 1712 has 1 pages
cluster n. 1713 has 1 pages
cluster n. 1714 has 1 pages
cluster n. 1715 has 1 pages
cluster n. 1716 has 1 pages
cluster n. 1717 has 1 pages
cluster n. 1718 has 1 pages
...

cluster n. 2029 has 1 pages
cluster n. 2030 has 1 pages
cluster n. 2031 has 1 pages
cluster n. 2032 has 1 pages
cluster n. 2033 has 1 pages
cluster n. 2034 has 1 pages
cluster n. 2035 has 1 pages
cluster n. 2036 has 1 pages
cluster n. 2037 has 1 pages
cluster n. 2038 has 1 pages
cluster n. 2039 has 1 pages
cluster n. 2040 has 1 pages
cluster n. 2041 has 1 pages
cluster n. 2042 has 1 pages
cluster n. 2043 has 1 pages
cluster n. 2044 has 1 pages
cluster n. 2045 has 1 pages
cluster n. 2046 has 1 pages
cluster n. 2047 has 1 pages
cluster n. 2048 has 1 pages
cluster n. 2049 has 1 pages
cluster n. 2050 has 1 pages
cluster n. 2051 has 1 pages
cluster n. 2052 has 1 pages
cluster n. 2053 has 1 pages
cluster n. 2054 has 1 pages
cluster n. 2055 has 1 pages
cluster n. 2056 has 1 pages
cluster n. 2057 has 1 pages
cluster n. 2058 has 1 pages
cluster n. 2059 has 1 pages
cluster n. 2060 has 1 pages
cluster n. 2061 has 1 pages
cluster n. 2062 has 1 pages
cluster n. 2063 has 1 pages
...

cluster n. 2371 has 1 pages
cluster n. 2372 has 1 pages
cluster n. 2373 has 1 pages
cluster n. 2374 has 1 pages
cluster n. 2375 has 1 pages
cluster n. 2376 has 1 pages
cluster n. 2377 has 1 pages
cluster n. 2378 has 1 pages
cluster n. 2379 has 1 pages
cluster n. 2380 has 1 pages
cluster n. 2381 has 1 pages
cluster n. 2382 has 1 pages
cluster n. 2383 has 1 pages
cluster n. 2384 has 1 pages
cluster n. 2385 has 1 pages
cluster n. 2386 has 1 pages
cluster n. 2387 has 1 pages
cluster n. 2388 has 1 pages
cluster n. 2389 has 1 pages
cluster n. 2390 has 1 pages
cluster n. 2391 has 1 pages
cluster n. 2392 has 1 pages
cluster n. 2393 has 1 pages
cluster n. 2394 has 1 pages
cluster n. 2395 has 1 pages
cluster n. 2396 has 1 pages
cluster n. 2397 has 1 pages
cluster n. 2398 has 1 pages
cluster n. 2399 has 1 pages
cluster n. 2400 has 1 pages
cluster n. 2401 has 1 pages
cluster n. 2402 has 1 pages
cluster n. 2403 has 1 pages
cluster n. 2404 has 1 pages
cluster n. 2405 has 1 pages
...

cluster n. 2715 has 1 pages
cluster n. 2716 has 1 pages
cluster n. 2717 has 1 pages
cluster n. 2718 has 1 pages
cluster n. 2719 has 1 pages
cluster n. 2720 has 1 pages
cluster n. 2721 has 1 pages
cluster n. 2722 has 1 pages
cluster n. 2723 has 1 pages
cluster n. 2724 has 1 pages
cluster n. 2725 has 1 pages
cluster n. 2726 has 1 pages
cluster n. 2727 has 1 pages
cluster n. 2728 has 1 pages
cluster n. 2729 has 1 pages
cluster n. 2730 has 1 pages
cluster n. 2731 has 1 pages
cluster n. 2732 has 1 pages
cluster n. 2733 has 1 pages
cluster n. 2734 has 1 pages
cluster n. 2735 has 1 pages
cluster n. 2736 has 1 pages
cluster n. 2737 has 1 pages
cluster n. 2738 has 1 pages
cluster n. 2739 has 1 pages
cluster n. 2740 has 1 pages
cluster n. 2741 has 1 pages
cluster n. 2742 has 1 pages
cluster n. 2743 has 1 pages
cluster n. 2744 has 1 pages
cluster n. 2745 has 1 pages
cluster n. 2746 has 1 pages
cluster n. 2747 has 1 pages
cluster n. 2748 has 1 pages
cluster n. 2749 has 1 pages
...

cluster n. 3074 has 1 pages
cluster n. 3075 has 1 pages
cluster n. 3076 has 1 pages
cluster n. 3077 has 1 pages
cluster n. 3078 has 1 pages
cluster n. 3079 has 1 pages
cluster n. 3080 has 1 pages
cluster n. 3081 has 1 pages
cluster n. 3082 has 1 pages
cluster n. 3083 has 1 pages
cluster n. 3084 has 1 pages
cluster n. 3085 has 1 pages
cluster n. 3086 has 1 pages
cluster n. 3087 has 1 pages
cluster n. 3088 has 1 pages
cluster n. 3089 has 1 pages
cluster n. 3090 has 1 pages
cluster n. 3091 has 1 pages
cluster n. 3092 has 1 pages
cluster n. 3093 has 1 pages
cluster n. 3094 has 1 pages
cluster n. 3095 has 1 pages
cluster n. 3096 has 1 pages
cluster n. 3097 has 1 pages
cluster n. 3098 has 1 pages
cluster n. 3099 has 1 pages
cluster n. 3100 has 1 pages
cluster n. 3101 has 1 pages
cluster n. 3102 has 1 pages
cluster n. 3103 has 1 pages
cluster n. 3104 has 1 pages
cluster n. 3105 has 1 pages
cluster n. 3106 has 1 pages
cluster n. 3107 has 1 pages
cluster n. 3108 has 1 pages
...

cluster n. 3417 has 1 pages
cluster n. 3418 has 1 pages
cluster n. 3419 has 1 pages
cluster n. 3420 has 1 pages
cluster n. 3421 has 1 pages
cluster n. 3422 has 1 pages
cluster n. 3423 has 1 pages
cluster n. 3424 has 1 pages
cluster n. 3425 has 1 pages
cluster n. 3426 has 1 pages
cluster n. 3427 has 1 pages
cluster n. 3428 has 1 pages
cluster n. 3429 has 1 pages
cluster n. 3430 has 1 pages
cluster n. 3431 has 1 pages
cluster n. 3432 has 1 pages
cluster n. 3433 has 1 pages
cluster n. 3434 has 1 pages
cluster n. 3435 has 1 pages
cluster n. 3436 has 1 pages
cluster n. 3437 has 1 pages
cluster n. 3438 has 1 pages
cluster n. 3439 has 1 pages
cluster n. 3440 has 1 pages
cluster n. 3441 has 1 pages
cluster n. 3442 has 1 pages
cluster n. 3443 has 1 pages
cluster n. 3444 has 1 pages
cluster n. 3445 has 1 pages
cluster n. 3446 has 1 pages
cluster n. 3447 has 1 pages
cluster n. 3448 has 1 pages
cluster n. 3449 has 1 pages
cluster n. 3450 has 1 pages
cluster n. 3451 has 1 pages
...

cluster n. 3774 has 1 pages
cluster n. 3775 has 1 pages
cluster n. 3776 has 1 pages
cluster n. 3777 has 1 pages
cluster n. 3778 has 1 pages
cluster n. 3779 has 1 pages
cluster n. 3780 has 1 pages
cluster n. 3781 has 1 pages
cluster n. 3782 has 1 pages
cluster n. 3783 has 1 pages
cluster n. 3784 has 1 pages
cluster n. 3785 has 1 pages
cluster n. 3786 has 1 pages
cluster n. 3787 has 1 pages
cluster n. 3788 has 1 pages
cluster n. 3789 has 1 pages
cluster n. 3790 has 1 pages
cluster n. 3791 has 1 pages
cluster n. 3792 has 1 pages
cluster n. 3793 has 1 pages
cluster n. 3794 has 1 pages
cluster n. 3795 has 1 pages
cluster n. 3796 has 1 pages
cluster n. 3797 has 1 pages
cluster n. 3798 has 1 pages
cluster n. 3799 has 1 pages
cluster n. 3800 has 1 pages
cluster n. 3801 has 1 pages
cluster n. 3802 has 1 pages
cluster n. 3803 has 1 pages
cluster n. 3804 has 1 pages
cluster n. 3805 has 1 pages
cluster n. 3806 has 1 pages
cluster n. 3807 has 1 pages
cluster n. 3808 has 1 pages
...

cluster n. 4166 has 1 pages
cluster n. 4167 has 1 pages
cluster n. 4168 has 1 pages
cluster n. 4169 has 1 pages
cluster n. 4170 has 1 pages
cluster n. 4171 has 1 pages
cluster n. 4172 has 1 pages
cluster n. 4173 has 1 pages
cluster n. 4174 has 1 pages
cluster n. 4175 has 1 pages
cluster n. 4176 has 1 pages
cluster n. 4177 has 1 pages
cluster n. 4178 has 1 pages
cluster n. 4179 has 1 pages
cluster n. 4180 has 1 pages
cluster n. 4181 has 1 pages
cluster n. 4182 has 1 pages
cluster n. 4183 has 1 pages
cluster n. 4184 has 1 pages
cluster n. 4185 has 1 pages
cluster n. 4186 has 1 pages
cluster n. 4187 has 1 pages
cluster n. 4188 has 1 pages
cluster n. 4189 has 1 pages
cluster n. 4190 has 1 pages
cluster n. 4191 has 1 pages
cluster n. 4192 has 1 pages
cluster n. 4193 has 1 pages
cluster n. 4194 has 1 pages
cluster n. 4195 has 1 pages
cluster n. 4196 has 1 pages
cluster n. 4197 has 1 pages
cluster n. 4198 has 1 pages
cluster n. 4199 has 1 pages
cluster n. 4200 has 1 pages
...

cluster n. 4464 has 1 pages
cluster n. 4465 has 1 pages
cluster n. 4466 has 1 pages
cluster n. 4467 has 1 pages
cluster n. 4468 has 1 pages
cluster n. 4469 has 1 pages
cluster n. 4470 has 1 pages
cluster n. 4471 has 1 pages
cluster n. 4472 has 1 pages
cluster n. 4473 has 1 pages
cluster n. 4474 has 1 pages
cluster n. 4475 has 1 pages
cluster n. 4476 has 1 pages
cluster n. 4477 has 1 pages
cluster n. 4478 has 1 pages
cluster n. 4479 has 1 pages
cluster n. 4480 has 1 pages
cluster n. 4481 has 1 pages
cluster n. 4482 has 1 pages
cluster n. 4483 has 1 pages
cluster n. 4484 has 1 pages
cluster n. 4485 has 1 pages
cluster n. 4486 has 1 pages
cluster n. 4487 has 1 pages
cluster n. 4488 has 1 pages
cluster n. 4489 has 1 pages
cluster n. 4490 has 1 pages
cluster n. 4491 has 1 pages
cluster n. 4492 has 1 pages
cluster n. 4493 has 1 pages
cluster n. 4494 has 1 pages
cluster n. 4495 has 1 pages
cluster n. 4496 has 1 pages
cluster n. 4497 has 1 pages
cluster n. 4498 has 1 pages
...

cluster n. 4761 has 1 pages
cluster n. 4762 has 1 pages
cluster n. 4763 has 1 pages
cluster n. 4764 has 1 pages
cluster n. 4765 has 1 pages
cluster n. 4766 has 1 pages
cluster n. 4767 has 1 pages
cluster n. 4768 has 1 pages
cluster n. 4769 has 1 pages
cluster n. 4770 has 1 pages
cluster n. 4771 has 1 pages
cluster n. 4772 has 1 pages
cluster n. 4773 has 1 pages
cluster n. 4774 has 1 pages
cluster n. 4775 has 1 pages
cluster n. 4776 has 1 pages
cluster n. 4777 has 1 pages
cluster n. 4778 has 1 pages
cluster n. 4779 has 1 pages
cluster n. 4780 has 1 pages
cluster n. 4781 has 1 pages
cluster n. 4782 has 1 pages
cluster n. 4783 has 1 pages
cluster n. 4784 has 1 pages
cluster n. 4785 has 1 pages
cluster n. 4786 has 1 pages
cluster n. 4787 has 1 pages
cluster n. 4788 has 1 pages
cluster n. 4789 has 1 pages
cluster n. 4790 has 1 pages
cluster n. 4791 has 1 pages
cluster n. 4792 has 1 pages
cluster n. 4793 has 1 pages
cluster n. 4794 has 1 pages
cluster n. 4795 has 1 pages
...

cluster n. 5067 has 1 pages
cluster n. 5068 has 1 pages
cluster n. 5069 has 1 pages
cluster n. 5070 has 1 pages
cluster n. 5071 has 1 pages
cluster n. 5072 has 1 pages
cluster n. 5073 has 1 pages
cluster n. 5074 has 1 pages
cluster n. 5075 has 1 pages
cluster n. 5076 has 1 pages
cluster n. 5077 has 1 pages
cluster n. 5078 has 1 pages
cluster n. 5079 has 1 pages
cluster n. 5080 has 1 pages
cluster n. 5081 has 1 pages
cluster n. 5082 has 1 pages
cluster n. 5083 has 1 pages
cluster n. 5084 has 1 pages
cluster n. 5085 has 1 pages
cluster n. 5086 has 1 pages
cluster n. 5087 has 1 pages
cluster n. 5088 has 1 pages
cluster n. 5089 has 1 pages
cluster n. 5090 has 1 pages
cluster n. 5091 has 1 pages
cluster n. 5092 has 1 pages
cluster n. 5093 has 1 pages
cluster n. 5094 has 1 pages
cluster n. 5095 has 1 pages
cluster n. 5096 has 1 pages
cluster n. 5097 has 1 pages
cluster n. 5098 has 1 pages
cluster n. 5099 has 1 pages
cluster n. 5100 has 1 pages
cluster n. 5101 has 1 pages
...

cluster n. 5580 has 1 pages
cluster n. 5581 has 1 pages
cluster n. 5582 has 1 pages
cluster n. 5583 has 1 pages
cluster n. 5584 has 1 pages
cluster n. 5585 has 1 pages
cluster n. 5586 has 1 pages
cluster n. 5587 has 1 pages
cluster n. 5588 has 1 pages
cluster n. 5589 has 1 pages
cluster n. 5590 has 1 pages
cluster n. 5591 has 1 pages
cluster n. 5592 has 1 pages
cluster n. 5593 has 1 pages
cluster n. 5594 has 1 pages
cluster n. 5595 has 1 pages
cluster n. 5596 has 1 pages
cluster n. 5597 has 1 pages
cluster n. 5598 has 1 pages
cluster n. 5599 has 1 pages
cluster n. 5600 has 1 pages
cluster n. 5601 has 1 pages
cluster n. 5602 has 1 pages
cluster n. 5603 has 1 pages
cluster n. 5604 has 1 pages
cluster n. 5605 has 1 pages
cluster n. 5606 has 1 pages
cluster n. 5607 has 1 pages
cluster n. 5608 has 1 pages
cluster n. 5609 has 1 pages
cluster n. 5610 has 1 pages
cluster n. 5611 has 1 pages
cluster n. 5612 has 1 pages
cluster n. 5613 has 1 pages
cluster n. 5614 has 1 pages
...

cluster n. 6265 has 1 pages
cluster n. 6266 has 1 pages
cluster n. 6267 has 1 pages
cluster n. 6268 has 1 pages
cluster n. 6269 has 1 pages
cluster n. 6270 has 1 pages
cluster n. 6271 has 1 pages
cluster n. 6272 has 1 pages
cluster n. 6273 has 1 pages
cluster n. 6274 has 1 pages
cluster n. 6275 has 1 pages
cluster n. 6276 has 1 pages
cluster n. 6277 has 1 pages
cluster n. 6278 has 1 pages
cluster n. 6279 has 1 pages
cluster n. 6280 has 1 pages
cluster n. 6281 has 1 pages
cluster n. 6282 has 1 pages
cluster n. 6283 has 1 pages
cluster n. 6284 has 1 pages
cluster n. 6285 has 1 pages
cluster n. 6286 has 1 pages
cluster n. 6287 has 1 pages
cluster n. 6288 has 1 pages
cluster n. 6289 has 1 pages
cluster n. 6290 has 1 pages
cluster n. 6291 has 1 pages
cluster n. 6292 has 1 pages
cluster n. 6293 has 1 pages
cluster n. 6294 has 1 pages
cluster n. 6295 has 1 pages
cluster n. 6296 has 1 pages
cluster n. 6297 has 1 pages
cluster n. 6298 has 1 pages
cluster n. 6299 has 1 pages
...

cluster n. 6602 has 1 pages
cluster n. 6603 has 1 pages
cluster n. 6604 has 1 pages
cluster n. 6605 has 1 pages
cluster n. 6606 has 1 pages
cluster n. 6607 has 1 pages
cluster n. 6608 has 1 pages
cluster n. 6609 has 1 pages
cluster n. 6610 has 1 pages
cluster n. 6611 has 1 pages
cluster n. 6612 has 1 pages
cluster n. 6613 has 1 pages
cluster n. 6614 has 1 pages
cluster n. 6615 has 1 pages
cluster n. 6616 has 1 pages
cluster n. 6617 has 1 pages
cluster n. 6618 has 1 pages
cluster n. 6619 has 1 pages
cluster n. 6620 has 1 pages
cluster n. 6621 has 1 pages
cluster n. 6622 has 1 pages
cluster n. 6623 has 1 pages
cluster n. 6624 has 1 pages
cluster n. 6625 has 1 pages
cluster n. 6626 has 1 pages
cluster n. 6627 has 1 pages
cluster n. 6628 has 1 pages
cluster n. 6629 has 1 pages
cluster n. 6630 has 1 pages
cluster n. 6631 has 1 pages
cluster n. 6632 has 1 pages
cluster n. 6633 has 1 pages
cluster n. 6634 has 1 pages
cluster n. 6635 has 1 pages
cluster n. 6636 has 1 pages
...

cluster n. 7005 has 1 pages
cluster n. 7006 has 1 pages
cluster n. 7007 has 1 pages
cluster n. 7008 has 1 pages
cluster n. 7009 has 1 pages
cluster n. 7010 has 1 pages
cluster n. 7011 has 1 pages
cluster n. 7012 has 1 pages
cluster n. 7013 has 1 pages
cluster n. 7014 has 1 pages
cluster n. 7015 has 1 pages
cluster n. 7016 has 1 pages
cluster n. 7017 has 1 pages
cluster n. 7018 has 1 pages
cluster n. 7019 has 1 pages
cluster n. 7020 has 1 pages
cluster n. 7021 has 1 pages
cluster n. 7022 has 1 pages
cluster n. 7023 has 1 pages
cluster n. 7024 has 1 pages
cluster n. 7025 has 1 pages
cluster n. 7026 has 1 pages
cluster n. 7027 has 1 pages
cluster n. 7028 has 1 pages
cluster n. 7029 has 1 pages
cluster n. 7030 has 1 pages
cluster n. 7031 has 1 pages
cluster n. 7032 has 1 pages
cluster n. 7033 has 1 pages
cluster n. 7034 has 1 pages
cluster n. 7035 has 1 pages
cluster n. 7036 has 1 pages
cluster n. 7037 has 1 pages
cluster n. 7038 has 1 pages
cluster n. 7039 has 1 pages
...

cluster n. 7492 has 1 pages
cluster n. 7493 has 1 pages
cluster n. 7494 has 1 pages
cluster n. 7495 has 1 pages
cluster n. 7496 has 1 pages
cluster n. 7497 has 1 pages
cluster n. 7498 has 1 pages
cluster n. 7499 has 1 pages
cluster n. 7500 has 1 pages
cluster n. 7501 has 1 pages
cluster n. 7502 has 1 pages
cluster n. 7503 has 1 pages
cluster n. 7504 has 1 pages
cluster n. 7505 has 1 pages
cluster n. 7506 has 1 pages
cluster n. 7507 has 1 pages
cluster n. 7508 has 1 pages
cluster n. 7509 has 1 pages
cluster n. 7510 has 1 pages
cluster n. 7511 has 1 pages
cluster n. 7512 has 1 pages
cluster n. 7513 has 1 pages
cluster n. 7514 has 1 pages
cluster n. 7515 has 1 pages
cluster n. 7516 has 1 pages
cluster n. 7517 has 1 pages
cluster n. 7518 has 1 pages
cluster n. 7519 has 1 pages
cluster n. 7520 has 1 pages
cluster n. 7521 has 1 pages
cluster n. 7522 has 1 pages
cluster n. 7523 has 1 pages
cluster n. 7524 has 1 pages
cluster n. 7525 has 1 pages
cluster n. 7526 has 1 pages
...

cluster n. 7829 has 1 pages
cluster n. 7830 has 1 pages
cluster n. 7831 has 1 pages
cluster n. 7832 has 1 pages
cluster n. 7833 has 1 pages
cluster n. 7834 has 1 pages
cluster n. 7835 has 1 pages
cluster n. 7836 has 1 pages
cluster n. 7837 has 1 pages
cluster n. 7838 has 1 pages
cluster n. 7839 has 1 pages
cluster n. 7840 has 1 pages
cluster n. 7841 has 1 pages
cluster n. 7842 has 1 pages
cluster n. 7843 has 1 pages
cluster n. 7844 has 1 pages
cluster n. 7845 has 1 pages
cluster n. 7846 has 1 pages
cluster n. 7847 has 1 pages
cluster n. 7848 has 1 pages
cluster n. 7849 has 1 pages
cluster n. 7850 has 1 pages
cluster n. 7851 has 1 pages
cluster n. 7852 has 1 pages
cluster n. 7853 has 1 pages
cluster n. 7854 has 1 pages
cluster n. 7855 has 1 pages
cluster n. 7856 has 1 pages
cluster n. 7857 has 1 pages
cluster n. 7858 has 1 pages
cluster n. 7859 has 1 pages
cluster n. 7860 has 1 pages
cluster n. 7861 has 1 pages
cluster n. 7862 has 1 pages
cluster n. 7863 has 1 pages
...
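Printing one line per cluster is hard to read when there are thousands of clusters. As a rough sketch, the same information can be summarised as a histogram of cluster sizes with `np.unique(..., return_counts=True)`. The `labels` array below is hypothetical toy data standing in for `clustering.labels_`:

```python
import numpy as np

# Hypothetical label array standing in for clustering.labels_.
labels = np.array([0, 0, 0, 1, 1, 2, 3, 4])

# One call returns every cluster id together with its size.
cluster_ids, sizes = np.unique(labels, return_counts=True)

# Count how many clusters have each size.
size_values, n_clusters_with_size = np.unique(sizes, return_counts=True)
size_histogram = {int(s): int(n) for s, n in zip(size_values, n_clusters_with_size)}

# Clusters that contain a single page.
n_singletons = int((sizes == 1).sum())

print(size_histogram)
print(n_singletons)
```

Applied to the real labels, this would show at a glance how many clusters contain a single page and how many pages sit in the few large clusters.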

In [49]:
df[df['predicted_label'] == 0]['url'].head(10)

209     https://blackwells.co.uk/bookshop/product/The-...
288     https://blackwells.co.uk/bookshop/product/Unde...
289     https://blackwells.co.uk/bookshop/product/Art-...
337     https://blackwells.co.uk/bookshop/product/Harr...
610     https://blackwells.co.uk/bookshop/product/The-...
1957    https://blackwells.co.uk/bookshop/product/Hero...
1966    https://blackwells.co.uk/bookshop/product/Stud...
1992    https://blackwells.co.uk/bookshop/product/Chin...
1997    https://blackwells.co.uk/bookshop/product/Jim-...
2003    https://blackwells.co.uk/bookshop/product/Port...
Name: url, dtype: object

In [50]:
df[df['predicted_label'] == 1]['url'].head(10)

60     https://blackwells.co.uk/bookshop/product/Roof...
80     https://blackwells.co.uk/bookshop/product/The-...
114    https://blackwells.co.uk/bookshop/product/The-...
148    https://blackwells.co.uk/bookshop/product/A-Ma...
161    https://blackwells.co.uk/bookshop/product/Hey-...
169    https://blackwells.co.uk/bookshop/product/The-...
171    https://blackwells.co.uk/bookshop/product/Swan...
174    https://blackwells.co.uk/bookshop/product/Isad...
179    https://blackwells.co.uk/bookshop/product/Best...
195    https://blackwells.co.uk/bookshop/product/A-La...
Name: url, dtype: object

In [51]:
df[df['predicted_label'] == 2]['url'].head(10)

563     https://blackwells.co.uk/bookshop/product/A-Bo...
1757    https://blackwells.co.uk/bookshop/product/Tour...
3704    https://blackwells.co.uk/bookshop/product/The-...
3717    https://blackwells.co.uk/bookshop/product/Phot...
4354    https://blackwells.co.uk/bookshop/product/Whos...
4403    https://blackwells.co.uk/bookshop/product/Nucl...
4434    https://blackwells.co.uk/bookshop/product/Chan...
4459    https://blackwells.co.uk/bookshop/product/Hamm...
4476    https://blackwells.co.uk/bookshop/product/The-...
4478    https://blackwells.co.uk/bookshop/product/Pigg...
Name: url, dtype: object

## Evaluate recall and precision

In [52]:
p1, r1 = clusteringevaluation.calculate_precision_and_recall(df, clustering, 'product', 0)

NameError: name 'clustering' is not defined

In [53]:
# Store the 'list' results separately so they do not overwrite the 'product' ones
p2, r2 = clusteringevaluation.calculate_precision_and_recall(df, clustering, 'list', 0)

NameError: name 'clustering' is not defined
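The internals of `clusteringevaluation.calculate_precision_and_recall` are not shown in this notebook. A minimal sketch of the usual definition, assuming precision is the fraction of pages in a predicted cluster that carry the target label and recall is the fraction of pages with that label that ended up in the cluster, could look like this. The function name `precision_recall` and the `toy` DataFrame are hypothetical:

```python
import pandas as pd

def precision_recall(df, true_label, cluster_id):
    """Precision and recall of one predicted cluster against one ground-truth label."""
    in_cluster = df['predicted_label'] == cluster_id
    has_label = df['label'] == true_label  # rows with a missing label compare as False
    true_positives = (in_cluster & has_label).sum()
    # Guard against empty clusters or labels that never occur.
    precision = true_positives / in_cluster.sum() if in_cluster.sum() else 0.0
    recall = true_positives / has_label.sum() if has_label.sum() else 0.0
    return float(precision), float(recall)

# Hypothetical toy data using the same column names as df.
toy = pd.DataFrame({
    'label': ['product', 'product', 'product', 'list', None],
    'predicted_label': [0, 0, 1, 0, 1],
})

p_product, r_product = precision_recall(toy, 'product', 0)
p_list, r_list = precision_recall(toy, 'list', 0)
print(p_product, r_product, p_list, r_list)
```

This version needs only the `label` and `predicted_label` columns of the DataFrame, so it does not depend on the `clustering` object still being defined in the session.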

## Clustering with the DBSCAN algorithm

In [None]:
dbscanclustering=dbscanclustering