In [1]:
import os
import sys
sys.path.append("../")

import pandas as pd
import numpy as np


import matplotlib.pyplot as plt
from IPython.display import HTML
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets



# Examples of Change Object Groups

This notebook presents a few examples of `Change Object` groups created by our algorithm on the article `John Logie Baird`.


`Change Object`: a gap of string tokens that were inserted and/or deleted at a contiguous position between two revisions of an article.
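To make the definition concrete, here is a minimal sketch that extracts change objects from two tokenised revisions. It assumes Python's standard `difflib.SequenceMatcher` as a stand-in for whatever diff the actual pipeline uses, and the helper name `change_objects` is hypothetical; the dictionary keys simply mirror the column names shown in the output tables of this notebook.

```python
from difflib import SequenceMatcher


def change_objects(old_tokens, new_tokens, context_length=2):
    """Hypothetical helper: one dict per contiguous gap of deleted/inserted
    tokens, plus up to `context_length` unchanged tokens on each side."""
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    objects = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged run: not part of any gap
        objects.append({
            "left_context": tuple(old_tokens[max(0, i1 - context_length):i1]),
            "del_string_tokens": tuple(old_tokens[i1:i2]),
            "ins_string_tokens": tuple(new_tokens[j1:j2]),
            "right_context": tuple(old_tokens[i2:i2 + context_length]),
        })
    return objects


old = "| residence = scotland | nationality = scottish | other _ names =".split()
new = "| residence = scotland | nationality = british | other _ names =".split()
for obj in change_objects(old, new):
    print(obj)
# One change object: ('scottish',) -> ('british',)
# between ('nationality', '=') and ('|', 'other')
```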


These `Change Object` groups were created by clustering `Change Vectors` built from the neighbouring tokens of each gap; the number of neighbouring tokens is given by `context_length`. Clustering is done with DBSCAN, which has two parameters, `eps` and `min_samples`. We evaluate the clusters with V-measure analysis on human-annotated data that marks the tokens about the nationality of John Logie Baird.
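The sketch below illustrates the clustering step under loud assumptions: each change vector is a plain bag-of-words count vector over the context tokens (the notebook's actual change vectors are not shown here), and DBSCAN is re-implemented in a few lines of standard-library Python rather than taken from a library. It shows how `eps` (the neighbourhood radius) and `min_samples` (the number of points, including the point itself, needed for a core point) decide which change objects are grouped and which are labelled noise (`-1`).

```python
import math
from collections import Counter


def change_vector(left, right, vocab):
    """Hypothetical change vector: token counts of the left and right context."""
    counts = Counter(left) + Counter(right)
    return [counts[token] for token in vocab]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one cluster label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # Every point within distance eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels


# Toy change objects: (left context tokens, right context tokens), context_length = 2.
contexts = [
    (("nationality", "="), ("|", "other")),
    (("nationality", "="), ("|", "other")),
    (("nationality", "="), ("|", "other")),
    (("citizenship", "="), ("|", "education")),
]
vocab = sorted({token for left, right in contexts for token in left + right})
X = [change_vector(left, right, vocab) for left, right in contexts]
print(dbscan(X, eps=1.75, min_samples=2))  # [0, 0, 0, -1]
```

With `eps = 1.75` the three change objects sharing the `nationality =` / `| other` neighbourhood form cluster `0`, while the `citizenship =` change object lies at distance 2 from them and is left as noise. In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would normally be used instead of this toy version.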

To identify the change object clusters that capture the nationality of *John Logie Baird*, we use V-measure analysis to select example groups. Since the clusters are built from left and right context token vectors, we also show the left and right context tokens along with the inserted and deleted tokens of each gap.

**Examples of good and bad clusters**

A good cluster is one whose members have nearly identical tokens in their left and right contexts. As the context tokens become more varied, the members of a group are no longer about the same gap.
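One way to put a number on this notion of a good cluster is to measure how many members share the most common neighbourhood. The function below is a hypothetical diagnostic, not part of the original analysis: it takes `(left_context, right_context)` string pairs like those in the output tables of this notebook, keeps the last `k` left tokens and the first `k` right tokens, and returns the share of rows whose neighbourhood matches the most frequent one.

```python
from collections import Counter


def context_agreement(rows, k=2):
    """Hypothetical diagnostic: share of change objects whose neighbourhood
    (last k left tokens, first k right tokens) equals the most common one."""
    if not rows:
        raise ValueError("cluster is empty")
    neighbourhoods = Counter(
        (tuple(left.split()[-k:]), tuple(right.split()[:k])) for left, right in rows
    )
    return neighbourhoods.most_common(1)[0][1] / len(rows)


# Shortened contexts taken from the example clusters in this notebook.
good = [
    ("residence = scotland , england | nationality =", "| other _ names = | citizenship"),
    ("residence = scotland , uk | nationality =", "| other _ names = | citizenship"),
]
noisy = good + [("nationality = british | other _ names =", "| citizenship = united kingdom")]
print(context_agreement(good))   # 1.0
print(context_agreement(noisy))  # 0.6666666666666666
```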

### Reading the change objects and human annotations

In [2]:
article_name = "John_Logie_Baird"
evaluation_dir =  "../data/evaluation/"

cluster_file_name = f"{article_name}_clusters.h5"
cluster_file_path = os.path.join(evaluation_dir, cluster_file_name)

evaluation_file_name = f"{article_name}.csv"
evaluation_file_path = os.path.join(evaluation_dir, evaluation_file_name)
   
dbscan_results = pd.read_hdf(cluster_file_path, "clusters")
evaluation_df = pd.read_csv(evaluation_file_path,
                            index_col=["context", "eps","min_samples"])


file_name = article_name + "_FULL.csv"
annotation_dir = "../data/annotation/"
full_file_path = os.path.join(annotation_dir, file_name)
annotation_df = pd.read_csv(full_file_path)
annotation_df = annotation_df[["revid_ctxt", "token_id",
                               "rev_id", "nationality", "birth_place", "Bulk" ]]

## Examples of best and worst clusters

In [4]:
pd.set_option('display.max_rows', 860)

pd.set_option("display.max_colwidth",1200)

#### Evaluating homogeneity of clusters
Smaller values of `context_length` give more homogeneous clusters. In other words, as we increase the context length, more noise starts getting clustered.
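Homogeneity is 1 when every cluster contains change objects of a single annotated class; formally h = 1 - H(C|K)/H(C), where C are the human labels and K the cluster labels (the log base cancels in the ratio). The standard-library sketch below uses hypothetical helper names; scikit-learn's `homogeneity_score` computes the same quantity. It shows that splitting one class across two clusters keeps homogeneity at 1, while letting a single noisy change object into a cluster lowers it.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels), natural log."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(a | b) for two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(c / n * math.log(c / b_counts[y]) for (_, y), c in joint.items())


def homogeneity(classes, clusters):
    """h = 1 - H(C|K) / H(C); defined as 1 when there is only one class."""
    h_c = entropy(classes)
    return 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c


classes = ["nationality"] * 3 + ["other"] * 2  # human annotations
split = [0, 0, 1, 2, 2]  # nationality spread over clusters 0 and 1, no mixing
mixed = [0, 0, 0, 0, 1]  # one 'other' change object lands in cluster 0
print(homogeneity(classes, split))            # 1.0
print(round(homogeneity(classes, mixed), 3))  # 0.332
```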

In [5]:
evaluation_df.reset_index().set_index(["context", "eps", "min_samples"])\
    ["change_object_homegenity"].sort_values().iloc[-10:]

context  eps   min_samples
4        1.50  2              0.755881
2        2.25  2              0.758501
         2.00  2              0.758574
         0.50  2              0.759280
         0.25  2              0.759280
         0.75  2              0.759321
         1.00  2              0.759341
         1.25  2              0.759383
         1.50  2              0.759435
         1.75  2              0.759571
Name: change_object_homegenity, dtype: float64

### Best homogeneity

This cluster shows the same neighbourhood being changed multiple times.
A `context_length` of 2 is one reason why the left and right contexts contain the same tokens.
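The effect of a short context can be checked directly. The hypothetical helper below builds a neighbourhood from the last `k` tokens of the left context and the first `k` tokens of the right context (an assumption about how `context_length` is applied). Two rows from the first group shown below, which differ only in the residence field (`england` vs `uk`), become identical at `k = 2` but not at `k = 5`.

```python
def neighbourhood(left, right, k):
    """Hypothetical: last k tokens of the left context + first k of the right."""
    return tuple(left.split()[-k:]) + tuple(right.split()[:k])


# Shortened contexts of two rows from the first group; only the residence differs.
right = "| other _ names = | citizenship = united kingdom"
left_england = "| monuments = | residence = scotland , england | nationality ="
left_uk = "| monuments = | residence = scotland , uk | nationality ="

for k in (2, 5):
    same = neighbourhood(left_england, right, k) == neighbourhood(left_uk, right, k)
    print(k, same)
# 2 True
# 5 False
```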



Parameters of the algorithm
`Context_length = 2, eps = 1.75 and min_samples = 2`

Below are *four* clusters created with the above parameters, each sharing exactly the same left and right context tokens.

Neighbouring context tokens of the *first* group:
* Left context: \[ nationality = \]
* Right context: \[ other_names = \]

In [6]:
groups = dbscan_results.groupby((2, 1.75, 2))  # context=2, eps=1.75, min_samples=2
cluster_ids = groups.size().sort_values(ascending=False).index  # largest cluster first
groups.get_group(cluster_ids[8]).reset_index()[
    ["left_context", "del_string_tokens", "ins_string_tokens", "right_context"]]

Unnamed: 0,left_context,del_string_tokens,ins_string_tokens,right_context
0,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(scottish,)","(brittish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
1,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(brittish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
2,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
3,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(scottish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
4,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
5,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , uk | nationality =","(scottish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
6,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , uk | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
7,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(scottish,)","(american,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
8,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = america , york | nationality =","(american,)","(scottish,)","| other _ names = | citizenship = america | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow <"
9,]] | resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland | nationality =,"(scottish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"


Neighbouring context tokens of the *second* group:

* Left context: \[ citizenship = \]
* Right context: \[ education = \]


In [7]:
groups = dbscan_results.groupby((2, 1.75, 2))  # context=2, eps=1.75, min_samples=2
cluster_ids = groups.size().sort_values(ascending=False).index  # largest cluster first
groups.get_group(cluster_ids[10]).reset_index()[
    ["left_context", "del_string_tokens", "ins_string_tokens", "right_context"]]



Unnamed: 0,left_context,del_string_tokens,ins_string_tokens,right_context
0,"| lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality = scottish | other _ names = | citizenship =","(united, kingdom)","(america,)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow university ]] | occupation ="
1,"| lat | long | display = inline }} --> | monuments = | residence = america , york | nationality = american | other _ names = | citizenship =","(america,)","(united, kingdom)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow university ]] | occupation ="
2,resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland | nationality = scottish | other _ names = | citizenship =,"(united, kingdom)","(argentina,)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow university ]] | occupation ="
3,_ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland | nationality = peru | other _ names = juan | citizenship =,"(argentina,)","(united, kingdom)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow university ]] | occupation ="
4,_ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland | nationality = scottish | other _ names = johnny | citizenship =,"(united, kingdom)","([[, united, kingdom, |, british, ]])","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow university ]] | occupation ="
5,in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] [[ united kingdom ]] | nationality = scottish | other _ names = johnny | citizenship =,"(british,)","(united, kingdom)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ university of strathclyde ]] | occupation"
6,"place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland , england | nationality = scottish | other _ names = | citizenship =","(united, kingdom)","(british,)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] ( now [[ university of strathclyde ]] ) , glasgow | occupation"
7,"place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland , england | nationality = british | other _ names = | citizenship =","(british,)","(cool, place)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] ( now [[ university of strathclyde ]] ) , glasgow | occupation"
8,"place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland , england | nationality = british | other _ names = | citizenship =","(cool, place)","(british,)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] ( now [[ university of strathclyde ]] ) , glasgow | occupation"
9,"place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland , england | nationality = scottish | other _ names = | citizenship =","(british,)","(scottish,)","| education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] ( now [[ university of strathclyde ]] ) , glasgow | occupation"


Neighbouring context tokens of the *third* group:

* Left context: \[ = scotland \]
* Right context: \[ nationality = \]


In [8]:
groups = dbscan_results.groupby((2, 1.75, 2))  # context=2, eps=1.75, min_samples=2
cluster_ids = groups.size().sort_values(ascending=False).index  # largest cluster first
groups.get_group(cluster_ids[12]).reset_index()[
    ["left_context", "del_string_tokens", "ins_string_tokens", "right_context"]]



Unnamed: 0,left_context,del_string_tokens,ins_string_tokens,right_context
0,"helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland ,","(england,)","(uk,)","| nationality = scottish | other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
1,"helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland ,","(uk,)","(england,)","| nationality = scottish | other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
2,"helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland ,","(england,)","(york,)","| nationality = scottish | other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
3,[[ helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland,"(,, england)",(),"| nationality = scottish | other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
4,"- on - sea | bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland",(),"(&, trinidad, &, tobago)","| nationality = scottish | other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
5,"- on - sea | bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland","(&, trinidad, &, tobago)",(),"| nationality = scottish & trinidadian | other _ names = | citizenship = united kingdom & trinidad & tobago | education = [[ larchfield academy ]] , helensburgh | alma"
6,"- on - sea | bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland",(),"(]], [[, united, kingdom, ]])","| nationality = scottish | other _ names = johnny | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal"
7,"on - sea | bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland","(]], ,, [[, england, ]])",(),"| nationality = scottish | other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
8,"on - sea | bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland ,","(england,)","(united, kingdom)","| nationality = scottish | other _ names = | citizenship = british | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college"
9,"on - sea | bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland ,","(united, kingdom)","(england,)","| nationality = scottish | other _ names = | citizenship = british | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college"


Neighbouring context tokens of the *fourth* group:

* Left context: \[ = scottish \]
* Right context: \[ | other_names \]



In [9]:
groups = dbscan_results.groupby((2, 1.75, 2))  # context=2, eps=1.75, min_samples=2
cluster_ids = groups.size().sort_values(ascending=False).index  # largest cluster first
groups.get_group(cluster_ids[13]).reset_index()[
    ["left_context", "del_string_tokens", "ins_string_tokens", "right_context"]]



Unnamed: 0,left_context,del_string_tokens,ins_string_tokens,right_context
0,"| bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland | nationality = scottish",(),"(&, trinidadian)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
1,"sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland & trinidad & tobago | nationality = scottish","(&, trinidadian)",(),"| other _ names = | citizenship = united kingdom & trinidad & tobago | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
2,"| bexhill ]] , sussex , england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = scotland | nationality = scottish",(),"(]], [[, british, ]])","| other _ names = johnny | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] ,"
3,_ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] [[ united kingdom ]] | nationality = [[ scotland | scottish,"(]], [[, united, kingdom, |, british, ]])",(),"| other _ names = johnny | citizenship = [[ united kingdom | british ]] | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal"
4,"england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] , [[ england ]] | nationality = scottish",(),"((, british, ))","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
5,"england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] , [[ england ]] | nationality = scottish","((, british, ))",(),"| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
6,"england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] , [[ england ]] | nationality = scottish",(),"((, british, ))","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
7,"england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] , [[ england ]] | nationality = scottish","((, british, ))",(),"| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
8,"england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] , [[ england ]] | nationality = scottish",(),"((, british, ))","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
9,"england | resting _ place = baird family grave in [[ helensburgh cemetery ]] | monuments = | residence = [[ scotland ]] , [[ england ]] | nationality = scottish","((, british, ))",(),"| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"


#### A low context length finds clusters with less noise but produces more clusters with similar neighbourhoods

We obtained clusters that share a similar neighbourhood but were clustered separately because we considered a context length of only 2. Since this yields many clusters with similar neighbourhoods, we next use completeness and V-measure to find settings whose clusters contain little noise but use a larger `context_length`, so that change objects with similar neighbourhoods are grouped into one cluster, giving fewer clusters.

### Completeness

In [10]:
evaluation_df.reset_index().set_index(["context", "eps", "min_samples"])\
    ["change_object_completness"].sort_values().iloc[-10:]

context  eps   min_samples
8        1.50  50             0.138596
30       1.00  50             0.139274
8        1.75  50             0.140918
15       1.75  50             0.165547
20       0.50  50             1.000000
30       0.25  50             1.000000
15       0.25  50             1.000000
25       0.25  50             1.000000
20       0.25  50             1.000000
15       0.50  50             1.000000
Name: change_object_completness, dtype: float64

In [11]:
dbscan_results.groupby((15,0.5,50)).ngroups

1

In [12]:
dbscan_results.groupby((20,0.25,50)).ngroups

1

#### Trivial case: a single cluster
The highest completeness value is 1, which here means everything was grouped into one huge cluster, so all instances of the same class trivially end up in the same cluster. Because this is a trivial case, we turn to V-measure, which balances homogeneity and completeness, to find clusterings in which the change objects about the same attribute fall in the same cluster.
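The trivial case can be reproduced on a toy labelling. The standard-library sketch below uses hypothetical helper names (scikit-learn's `homogeneity_completeness_v_measure` computes the same three scores) and implements V-measure as the weighted harmonic mean (1 + β)·h·c / (β·h + c). Putting every change object into one cluster gives completeness 1 but homogeneity 0, so the V-measure is 0.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(a | b) for two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(c / n * math.log(c / b_counts[y]) for (_, y), c in joint.items())


def homogeneity_completeness_v(classes, clusters, beta=1.0):
    h_c, h_k = entropy(classes), entropy(clusters)
    h = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    c = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    v = 0.0 if h + c == 0 else (1 + beta) * h * c / (beta * h + c)
    return h, c, v


classes = ["nationality", "nationality", "other", "other"]
print(homogeneity_completeness_v(classes, [0, 0, 0, 0]))  # (0.0, 1.0, 0.0): one huge cluster
print(homogeneity_completeness_v(classes, [0, 0, 1, 1]))  # (1.0, 1.0, 1.0): perfect clustering
```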

### V-measure

In [13]:
evaluation_df.reset_index().set_index(["context", "eps", "min_samples"])\
    ["change_object_vmeasure"].sort_values().iloc[-20:]

context  eps   min_samples
20       0.75  10             0.138329
15       1.00  20             0.139846
25       0.50  20             0.140066
8        2.00  20             0.142369
10       2.25  50             0.145335
15       1.50  30             0.146098
30       1.25  50             0.146502
25       1.25  50             0.149093
         0.50  10             0.151193
20       1.25  30             0.151276
10       1.75  30             0.158588
20       1.50  50             0.161853
         1.00  20             0.162006
30       0.75  20             0.162820
10       2.00  30             0.168044
         1.50  20             0.178537
30       1.00  50             0.181075
8        1.75  20             0.184929
10       1.75  20             0.190063
15       1.75  50             0.211642
Name: change_object_vmeasure, dtype: float64

#### Clusters with some errors

We show clusters for the parameter setting with the highest V-measure and analyse their errors. Parameters:

`Context_length = 15, eps = 1.75 and min_samples = 50`

##### The following cluster still has similar left and right contexts but contains some noise.
First, clusters that were separate groups above have started to merge into one, leading to larger clusters. Second, a few noisy change objects are being clustered because of the increased `context_length`.

---

#### Kinds of noise added to both clusters
1. A few of the right contexts are about education.
2. Other names of John Logie Baird.

In [14]:

groups = dbscan_results.groupby((15, 1.75, 50))  # context=15, eps=1.75, min_samples=50
cluster_ids = groups.size().sort_values(ascending=False).index  # largest cluster first
groups.get_group(cluster_ids[2]).reset_index()[
    ["left_context", "del_string_tokens", "ins_string_tokens", "right_context"]]

Unnamed: 0,left_context,del_string_tokens,ins_string_tokens,right_context
0,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(scottish,)","(brittish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
1,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(brittish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
2,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
3,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(scottish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
4,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
5,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , uk | nationality =","(scottish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
6,"<!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , uk | nationality = british | other _ names =",(),"(mr, fat)","| citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow"
7,"<!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , uk | nationality = british | other _ names =","(mr, fat)",(),"| citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow"
8,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , uk | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
9,"<!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality = scottish | other _ names =",(),"(hello,)","| citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow < br > [[ glasgow"


*In the following cluster, the first three entries have a different neighbourhood and concern the name of John Logie Baird.*

In [15]:

groups = dbscan_results.groupby((15, 1.75, 50))  # context=15, eps=1.75, min_samples=50
cluster_ids = groups.size().sort_values(ascending=False).index  # largest cluster first
groups.get_group(cluster_ids[4]).reset_index()[
    ["left_context", "del_string_tokens", "ins_string_tokens", "right_context"]]


Unnamed: 0,left_context,del_string_tokens,ins_string_tokens,right_context
0,"| image = john logie baird , bust . jpg | image _ width = 250px | caption = bust of john logie baird in [[ helensburgh ]] | name =",(),"(john, logie, baird)","}} | nationality = [[ scotland | scottish ]] | birth _ date = [[ august 13 ]] [[ 1888 ]] | birth _ place = [[ helensburgh ]] , [["
1,"image = john logie baird , bust . jpg | image _ width = 250px | caption = bust of john logie baird in [[ helensburgh ]] | name = {{",(),"(reflist,)","| nationality = [[ scotland | scottish ]] | birth _ date = [[ august 13 ]] [[ 1888 ]] | birth _ place = [[ helensburgh ]] , [[ argyll"
2,"image = john logie baird , bust . jpg | image _ width = 250px | caption = bust of john logie baird in [[ helensburgh ]] | name = {{","(pagename,)",(),"}} | nationality = [[ scotland | scottish ]] | birth _ date = [[ august 13 ]] [[ 1888 ]] | birth _ place = [[ helensburgh ]] , [["
3,{{ infobox engineer | image = jl - baird - scottish - inventor - of - tv . jpg | caption = | name = john logie baird | nationality =,"([[, scotland, |)",(),"scottish ]] | birth _ date = [[ august 13 ]] [[ 1888 ]] | birth _ place = [[ helensburgh ]] , [[ dunbartonshire ]] , [[ scotland ]] |"
4,image = jl - baird - scottish - inventor - of - tv . jpg | caption = | name = john logie baird | nationality = [[ scotland | scottish,"(]],)",(),"| birth _ date = [[ august 13 ]] [[ 1888 ]] | birth _ place = [[ helensburgh ]] , [[ dunbartonshire ]] , [[ scotland ]] | death _"
5,{{ infobox engineer | image = jl - baird - scottish - inventor - of - tv . jpg | caption = | name = john logie baird | nationality =,"(scottish,)","(british,)","| birth _ date = 13 august 1888 | birth _ place = [[ helensburgh ]] , [[ dunbartonshire ]] , scotland | death _ date = 14 june 1946 |"
6,{{ infobox engineer | image = jl - baird - scottish - inventor - of - tv . jpg | caption = | name = john logie baird | nationality =,"(british,)","(scottish,)","| birth _ date = 13 august 1888 | birth _ place = [[ helensburgh ]] , [[ dunbartonshire ]] , scotland | death _ date = 14 june 1946 |"
7,{{ infobox engineer | image = jl - baird - scottish - inventor - of - tv . jpg | caption = | name = john logie baird | nationality =,"(scottish,)","(british,)","| birth _ date = 13 august 1888 | birth _ place = [[ helensburgh ]] , [[ dunbartonshire ]] , scotland | death _ date = 14 june 1946 |"
8,{{ infobox engineer | image = jl - baird - scottish - inventor - of - tv . jpg | caption = | name = john logie baird | nationality =,"(british,)","(scottish,)","| birth _ date = 13 august 1888 | birth _ place = [[ helensburgh ]] , [[ dunbartonshire ]] , scotland | death _ date = 14 june 1946 |"
9,baird }} {{ infobox engineer | image = jl - baird - scottish - inventor - of - tv . jpg | caption = | name = john logie baird |,"(nationality,)","(scottish,)","= scottish | birth _ date = 13 august 1888 | birth _ place = [[ helensburgh ]] , [[ dunbartonshire ]] , scotland | death _ date = 14 june"


### Identifying the worst clusters

**We look at the setting with the median V-measure to see how the errors in the clusters increase.**
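The next cell hard-codes position 384 of the sorted scores. As a sketch, the middle position can instead be derived from the number of evaluated settings. The helper `median_setting` is hypothetical, and whether it returns the same row as `iloc[384:385]` depends on how many settings were evaluated, which is not shown here. The toy input reuses three scores printed in this notebook.

```python
def median_setting(scores):
    """Hypothetical helper: return the (parameters, score) pair in the middle
    of the settings ranked by ascending score (lower median for an even count)."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[(len(ranked) - 1) // 2]


# Toy subset: (context, eps, min_samples) -> V-measure scores printed in this notebook.
scores = {
    (25, 1.75, 20): 0.047735,
    (10, 1.75, 20): 0.190063,
    (15, 1.75, 50): 0.211642,
}
print(median_setting(scores))  # ((10, 1.75, 20), 0.190063)
```

With the sorted pandas Series used in the notebook, the equivalent position would be `(len(series) - 1) // 2`.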

In [16]:
evaluation_df.reset_index().set_index(["context", "eps", "min_samples"])\
    ["change_object_vmeasure"].sort_values().iloc[384:385]

context  eps   min_samples
25       1.75  20             0.047735
Name: change_object_vmeasure, dtype: float64

Parameters: `context_length = 25, eps = 1.75 and min_samples = 20`

Noise has increased considerably compared with the setting that has the best V-measure.

In [17]:
groups = dbscan_results.groupby((25, 1.75, 20))  # context=25, eps=1.75, min_samples=20
cluster_ids = groups.size().sort_values(ascending=False).index  # largest cluster first
groups.get_group(cluster_ids[2]).reset_index()[
    ["left_context", "del_string_tokens", "ins_string_tokens", "right_context"]]

Unnamed: 0,left_context,del_string_tokens,ins_string_tokens,right_context
0,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(scottish,)","(brittish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
1,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(brittish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
2,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
3,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(scottish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
4,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , england | nationality =","(british,)","(scottish,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
5,"8 | 13 | df = y }} | death _ place = [[ bexhill - on - sea | bexhill ]] , sussex , england | death _ cause =",(),"(stroke,)",| body _ discovered = | resting _ place = baird family grave in [[ helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat |
6,"helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland ,","(england,)","(uk,)","| nationality = scottish | other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical"
7,"resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline }} --> | monuments = | residence = scotland , uk | nationality =","(scottish,)","(british,)","| other _ names = | citizenship = united kingdom | education = [[ larchfield academy ]] , helensburgh | alma _ mater = [[ royal technical college ]] , glasgow"
8,"8 | 13 | df = y }} | death _ place = [[ bexhill - on - sea | bexhill ]] , sussex , england | death _ cause =","(stroke,)","(lachlan, baker, ', s, nose)",| body _ discovered = | resting _ place = baird family grave in [[ helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat |
9,"y }} | death _ place = [[ bexhill - on - sea | bexhill ]] , sussex , england | death _ cause = stroke | body _ discovered =",(),"(in, a, hole)",| resting _ place = baird family grave in [[ helensburgh cemetery ]] | resting _ place _ coordinates = <!-- {{ coord | lat | long | display = inline


In [18]:
pd.set_option('expand_frame_repr', False)
pd.reset_option('display.max_colwidth')