# Notebook 4.3 - Actors curation: duplicates

This notebook gathers several checks that can be run together or independently of each other. Taken together, these checks help moderators curate duplicated actors in the SSH Open Marketplace.

This notebook is composed of the following sections:

0. Requirements to run this notebook
1. Get actors
2. Duplicated actors
    2.1 Get duplicates for actors
    2.2 Compare duplicated actors
    2.3 Merge duplicated actors


## 0 Requirements to run this notebook

This section gives all the relevant information to "interact" with the MP data.

### 0.1 libraries
*Several external libraries are needed to run this notebook.*

*In addition, a dedicated SSH Open Marketplace library, sshmarketplacelib, provides customised functions and can be imported with the usual Python import statements.*

*The imports needed to run this notebook are listed below.*

In [1]:
import pandas as pd #to manage dataframes
import matplotlib.pyplot as plt #to create histograms and images
import seaborn as sns #to create histograms and images
import numpy as np #to manage arrays and numerical data
#import the MarketPlace Library 
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import  eval as eva, helper as hel

In [2]:
mpdata = mpd()
utils=hel.Util()
check=eva.URLCheck()

In [3]:
df_tool_flat =mpdata.getMPItems ("toolsandservices", True)
df_publication_flat =mpdata.getMPItems ("publications", True)
df_trainingmaterials_flat =mpdata.getMPItems ("trainingmaterials", True)
df_workflows_flat =mpdata.getMPItems ("workflows", True)
df_datasets_flat =mpdata.getMPItems ("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


## 1. Get actors

In [4]:
df_actors_flat =mpdata.getMPItems ("actors", True)

getting data from local repository...


In [5]:
df_actors_flat.tail()

Unnamed: 0,id,name,externalIds,affiliations,website
8977,8062,Zong Peng,"[{'identifierService': {'code': 'DBLP', 'label...",[],
8978,2029,Zoomify Inc.,[{'identifierService': {'code': 'SourceActorId...,[],
8979,218,Zoomify Inc.,[],[],
8980,1590,Zoppi Angela,[],[],
8981,7819,Zsófia Fellegi,"[{'identifierService': {'code': 'DBLP', 'label...",[],


## 2. Duplicated actors
    2.1 Get duplicates for actors using *actor.name* and *actor.website* as filters
    2.2 Compare duplicated actors (optional)
    2.3 Merge duplicated actors

### 2.1 Get duplicates for actors using *actor.name* and *actor.website* as filters

In [6]:
utils=hel.Util()
filter_attribute='name, website'
df_actor_duplicates=utils.getDuplicates(df_actors_flat, filter_attribute)
dupl_actor_website=df_actor_duplicates[df_actor_duplicates['website'].notnull()].sort_values('name')

In [7]:
print (f'Using the attributes "{filter_attribute}" as filter, there are: {dupl_actor_website.shape[0]} duplicated actors')

Using the attributes "name, website" as filter, there are: 711 duplicated actors
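For readers without access to `sshmarketplacelib`, the idea behind this step can be approximated in plain pandas. The sketch below is not the implementation of `utils.getDuplicates`; it assumes that "duplicate" means two or more rows sharing the same `name` and `website`. The helper name `find_duplicates` and the toy data are invented for illustration.

```python
import pandas as pd

def find_duplicates(df, attributes):
    """Return every row whose values in `attributes` occur more than once.

    Hypothetical helper: an approximation of the duplicate check,
    not the sshmarketplacelib code.
    """
    columns = [a.strip() for a in attributes.split(",")]
    # keep=False marks all members of a duplicated group, not only the repeats
    mask = df.duplicated(subset=columns, keep=False)
    return df[mask]

# Toy data (invented for this example)
toy_actors = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Alice", "Bob", "Bob"],
    "website": ["http://a.example", "http://a.example", None, "http://b.example"],
})

toy_duplicates = find_duplicates(toy_actors, "name, website")
# Same post-filter as in the cell above: keep rows with a website, sort by name
toy_with_website = toy_duplicates[toy_duplicates["website"].notnull()].sort_values("name")
print(toy_with_website["id"].tolist())
```

In this toy example only the two "Alice" rows share both attributes, so the two "Bob" rows are not reported.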


In [9]:
actorwebsite_tomerge=dupl_actor_website.groupby(['name','website'])['id'].apply(list).reset_index(name='idtomerge')

In [10]:
actorwebsite_tomerge.head()

Unnamed: 0,name,website,idtomerge
0,ARTFL Project and Digital Library Development ...,http://artfl-project.uchicago.edu/,"[842, 2720]"
1,AT&T Research,http://www.research.att.com/,"[2566, 701]"
2,ATLAS.ti Scientific Software Development GmbH,http://www.atlasti.com/copyright.html,"[25, 1828]"
3,Alan Liu,http://liu.english.ucsb.edu/,"[1954, 149]"
4,Alan Reed,http://www.textworld.com/,"[493, 2332]"


In [11]:
#Number of name/website groups with more than two ids (i.e. actors with more than one duplicate)
actorwebsite_tomerge[actorwebsite_tomerge.idtomerge.map(len)>2].count()

name         23
website      23
idtomerge    23
dtype: int64
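The grouping and counting steps above can be tried on a small example. This is a minimal sketch on invented toy data; only standard pandas calls are used, and the variable names `toy_duplicates`, `toy_tomerge` and `n_large_groups` are hypothetical.

```python
import pandas as pd

# Toy duplicates (invented for this example)
toy_duplicates = pd.DataFrame({
    "id": [10, 11, 12, 13, 14],
    "name": ["Alice", "Alice", "Bob", "Bob", "Bob"],
    "website": ["http://a.example"] * 2 + ["http://b.example"] * 3,
})

# One row per (name, website) pair, with the ids of all its members in a list
toy_tomerge = (
    toy_duplicates.groupby(["name", "website"])["id"]
    .apply(list)
    .reset_index(name="idtomerge")
)
print(toy_tomerge)

# Number of groups holding more than two ids
n_large_groups = int((toy_tomerge["idtomerge"].map(len) > 2).sum())
print(n_large_groups)
```

Summing the boolean mask gives a single number, whereas calling `.count()` on the filtered dataframe, as in the cell above, reports the same count once per column.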

### 2.2 Compare duplicated actors

In [12]:
#id of duplicated actors
ids=[2720, 842]
compareitems=df_actor_duplicates[df_actor_duplicates.id.isin(ids)]

In [13]:
css_equal="font-size:1.5rem; border: 2px solid silver;background-color: white; padding: 10px 20px"
css_diff="background-color: lightyellow;  font-size:1.5rem; border: 2px solid silver; padding: 10px 20px"

In [14]:
#view items
showdiff = compareitems.T.style.apply(lambda x: [css_equal if ((len(utils.lists_to_list(x.values))==1) ) else css_diff for i in x],
                    axis=1)
showdiff

Unnamed: 0,139,140
MPUrl,actors/2720,actors/842
id,2720,842
name,"ARTFL Project and Digital Library Development Centre, University of Chicago","ARTFL Project and Digital Library Development Centre, University of Chicago"
externalIds,"[{'identifierService': {'code': 'SourceActorId', 'label': 'Source ActorId', 'ord': 7, 'urlTemplate': ''}, 'identifier': '1-fbf7d044b032ca104a0b1d4b0ab312e1629158169a81d2ce1869f28d61da5c9e'}]",[]
affiliations,[],[]
website,http://artfl-project.uchicago.edu/,http://artfl-project.uchicago.edu/
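The styled table highlights the attributes whose values differ between the compared actors. The same information can be obtained as a plain list, which may be convenient when checking many pairs. The sketch below assumes toy data and a hypothetical helper `differing_attributes`; it does not use `utils.lists_to_list`.

```python
import pandas as pd

# Two toy records describing the same actor (invented for this example)
toy_pair = pd.DataFrame({
    "id": [1, 2],
    "name": ["Example Centre", "Example Centre"],
    "externalIds": [[{"identifier": "abc"}], []],
    "website": ["http://example.org/", "http://example.org/"],
})

def differing_attributes(df):
    """Return the names of the columns whose values are not all equal."""
    # Convert to strings first: list values are unhashable and break nunique()
    as_text = df.astype(str)
    return [col for col in as_text.columns if as_text[col].nunique() > 1]

print(differing_attributes(toy_pair))
```

Here `id` and `externalIds` differ, while `name` and `website` are identical.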


### 2.3 Merge duplicated actors

POST /api/actors/{id}/merge


In [16]:
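#Example of a single merge: mpdata.postMergedActors('2505', '2266')
#For each group, the first id is used as {id} in the endpoint and the remaining ids are passed as the actors to merge with it
for item in actorwebsite_tomerge.itertuples():
    #Note: only the first two ids of each group are printed here
    print(item.idtomerge[0], item.idtomerge[1])
    mpdata.postMergedActors(str(item.idtomerge[0]), str(item.idtomerge[1:]))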

842 2720
Merging actor 842 with actor(s) [2720]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/842/merge?with=[2720]
...not executed, running in DEBUG mode.
2566 701
Merging actor 2566 with actor(s) [701]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2566/merge?with=[701]
...not executed, running in DEBUG mode.
25 1828
Merging actor 25 with actor(s) [1828]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/25/merge?with=[1828]
...not executed, running in DEBUG mode.
1954 149
Merging actor 1954 with actor(s) [149]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1954/merge?with=[149]
...not executed, running in DEBUG mode.
493 2332
Merging actor 493 with actor(s) [2332]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/493/merge?with=[2332]
...not executed, running in DEBUG mode.
3056 3096
Merging actor 3056 with actor(s) [3096, 1129]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/3056/merge?with=[3096,1129]
...not executed, running in DEBUG mode.

Merging actor 966 with actor(s) [2086]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/966/merge?with=[2086]
...not executed, running in DEBUG mode.
1974 170
Merging actor 1974 with actor(s) [170]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1974/merge?with=[170]
...not executed, running in DEBUG mode.
839 2717
Merging actor 839 with actor(s) [2717]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/839/merge?with=[2717]
...not executed, running in DEBUG mode.
626 2481
Merging actor 626 with actor(s) [2481]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/626/merge?with=[2481]
...not executed, running in DEBUG mode.
597 2450
Merging actor 597 with actor(s) [2450]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/597/merge?with=[2450]
...not executed, running in DEBUG mode.
2653 780
Merging actor 2653 with actor(s) [780]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2653/merge?with=[780]
...not executed, running in DEBUG mode.

Merging actor 744 with actor(s) [2612]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/744/merge?with=[2612]
...not executed, running in DEBUG mode.
2119 301
Merging actor 2119 with actor(s) [301]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2119/merge?with=[301]
...not executed, running in DEBUG mode.
692 2556
Merging actor 692 with actor(s) [2556]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/692/merge?with=[2556]
...not executed, running in DEBUG mode.
901 2784
Merging actor 901 with actor(s) [2784]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/901/merge?with=[2784]
...not executed, running in DEBUG mode.
752 2621
Merging actor 752 with actor(s) [2621]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/752/merge?with=[2621]
...not executed, running in DEBUG mode.
476 2314
Merging actor 476 with actor(s) [2314]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/476/merge?with=[2314]
...not executed, running in DEBUG mode.

Merging actor 3025 with actor(s) [1034, 3077]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/3025/merge?with=[1034,3077]
...not executed, running in DEBUG mode.
971 2171
Merging actor 971 with actor(s) [2171]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/971/merge?with=[2171]
...not executed, running in DEBUG mode.
735 2603
Merging actor 735 with actor(s) [2603]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/735/merge?with=[2603]
...not executed, running in DEBUG mode.
459 2295
Merging actor 459 with actor(s) [2295]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/459/merge?with=[2295]
...not executed, running in DEBUG mode.
719 2586
Merging actor 719 with actor(s) [2586]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/719/merge?with=[2586]
...not executed, running in DEBUG mode.
318 2137
Merging actor 318 with actor(s) [2137]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/318/merge?with=[2137]
...not executed, running in DEBUG mode.

Merging actor 766 with actor(s) [2637]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/766/merge?with=[2637]
...not executed, running in DEBUG mode.
1962 158
Merging actor 1962 with actor(s) [158]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1962/merge?with=[158]
...not executed, running in DEBUG mode.
705 2570
Merging actor 705 with actor(s) [2570]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/705/merge?with=[2570]
...not executed, running in DEBUG mode.
3086 1096
Merging actor 3086 with actor(s) [1096, 3036]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/3086/merge?with=[1096,3036]
...not executed, running in DEBUG mode.
2430 580
Merging actor 2430 with actor(s) [580]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2430/merge?with=[580]
...not executed, running in DEBUG mode.
915 2802
Merging actor 915 with actor(s) [2802]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/915/merge?with=[2802]
...not executed, running in DEBUG mode.

Merging actor 2048 with actor(s) [237]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2048/merge?with=[237]
...not executed, running in DEBUG mode.
1865 61
Merging actor 1865 with actor(s) [61]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1865/merge?with=[61]
...not executed, running in DEBUG mode.
2819 930
Merging actor 2819 with actor(s) [930]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2819/merge?with=[930]
...not executed, running in DEBUG mode.
2795 908
Merging actor 2795 with actor(s) [908]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2795/merge?with=[908]
...not executed, running in DEBUG mode.
275 2091
Merging actor 275 with actor(s) [2091]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/275/merge?with=[2091]
...not executed, running in DEBUG mode.
2071 258
Merging actor 2071 with actor(s) [258]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2071/merge?with=[258]
...not executed, running in DEBUG mode.

Merging actor 1979 with actor(s) [1871]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1979/merge?with=[1871]
...not executed, running in DEBUG mode.
2659 786
Merging actor 2659 with actor(s) [786]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2659/merge?with=[786]
...not executed, running in DEBUG mode.
2231 398
Merging actor 2231 with actor(s) [398]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2231/merge?with=[398]
...not executed, running in DEBUG mode.
822 2699
Merging actor 822 with actor(s) [2699]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/822/merge?with=[2699]
...not executed, running in DEBUG mode.
3059 3099
Merging actor 3059 with actor(s) [3099, 1132]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/3059/merge?with=[3099,1132]
...not executed, running in DEBUG mode.
791 2664
Merging actor 791 with actor(s) [2664]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/791/merge?with=[2664]
...not executed, running in DEBUG mode.

Merging actor 2579 with actor(s) [713]...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2579/merge?with=[713]
...not executed, running in DEBUG mode.
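
Before running the merge outside DEBUG mode, it can help to review which actor will be kept and which ones will be merged into it. The sketch below builds such a plan without sending any request. It assumes, as the log above suggests, that the first id of each group is the target of the merge; `plan_merges` is a hypothetical helper and is not part of `sshmarketplacelib`.

```python
def plan_merges(id_groups):
    """Split each group of duplicate ids into (target, others).

    Hypothetical helper: the first id of a group is taken as the target,
    mirroring the merge loop in this section. No request is sent.
    """
    plan = []
    for ids in id_groups:
        if len(ids) < 2:
            continue  # a single id has nothing to merge with
        plan.append((ids[0], list(ids[1:])))
    return plan

# Toy groups (invented for this example)
toy_groups = [[1, 2], [3, 4, 5], [6]]
merge_plan = plan_merges(toy_groups)
for target, others in merge_plan:
    print(f"actor {target} <- {others}")
```

With the notebook's data, the groups could be taken from `actorwebsite_tomerge['idtomerge']`. Unlike the `print` call in the merge loop, this listing shows every id of a group, including groups with more than two members.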
