# Notebook 4.3 - Actors curation: duplicates

This notebook gathers several checks that can be run together or independently of each other. Together, these checks help moderators curate duplicated actors in the SSH Open Marketplace.

This notebook is composed of 3 sections:

0. Requirements to run this notebook
1. Get actors
2. Duplicated actors
    2.1 Get duplicates for actors
    2.2 Compare duplicated actors
    2.3 Merge duplicated actors


## 0 Requirements to run this notebook

This section lists everything needed to interact with the Marketplace (MP) data.

### 0.1 Libraries
*A number of external libraries are needed to run the notebook.*

*In addition, a dedicated SSH Open Marketplace library, sshmarketplacelib, provides customised functions and can be imported with the usual Python import statements.*

*The cell below imports the libraries needed to run this notebook.*

In [2]:
import pandas as pd #to manage dataframes
import matplotlib.pyplot as plt #to create histograms and images
import seaborn as sns #to create histograms and images
import numpy as np #to manage numerical arrays
#import the MarketPlace Library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel

In [3]:
mpdata = mpd()
utils=hel.Util()
check=eva.URLCheck()

In [4]:
df_tool_flat = mpdata.getMPItems("toolsandservices", True)
df_publication_flat = mpdata.getMPItems("publications", True)
df_trainingmaterials_flat = mpdata.getMPItems("trainingmaterials", True)
df_workflows_flat = mpdata.getMPItems("workflows", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


## 1. Get actors

In [5]:
df_actors_flat =mpdata.getMPItems ("actors", True)

getting data from local repository...


In [6]:
df_actors_flat.tail()

Unnamed: 0,id,name,externalIds,affiliations,website
8965,8062,Zong Peng,"[{'identifierService': {'code': 'DBLP', 'label...",[],
8966,218,Zoomify Inc.,[],[],
8967,2029,Zoomify Inc.,[{'identifierService': {'code': 'SourceActorId...,[],
8968,1590,Zoppi Angela,[],[],
8969,7819,Zsófia Fellegi,"[{'identifierService': {'code': 'DBLP', 'label...",[],


## 2. Duplicated actors
    2.1 Get duplicates for actors using *actor.name* and *actor.website* as filters
    2.2 Compare duplicated actors (optional)
    2.3 Merge duplicated actors

### 2.1 Get duplicates for actors using *actor.name* and *actor.website* as filters

In [7]:
utils=hel.Util()
filter_attribute='name, website'
df_actor_duplicates=utils.getDuplicates(df_actors_flat, filter_attribute)
dupl_actor_website=df_actor_duplicates[df_actor_duplicates['website'].notnull()].sort_values('name')
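
How `utils.getDuplicates` works internally is not shown in this notebook. As a rough illustration only, a comparable check can be sketched in plain pandas, assuming that two actors count as duplicates when both `name` and `website` are identical. The toy data below is invented for the example.

```python
import pandas as pd

# Toy data standing in for df_actors_flat (values invented for illustration).
actors = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "name": ["Ada", "Ada", "Bob", "Bob", "Cy"],
        "website": ["http://ada.org", "http://ada.org", None, None, "http://cy.org"],
    }
)

# keep=False flags every member of a duplicated group, not only the repeats.
is_dup = actors.duplicated(subset=["name", "website"], keep=False)
duplicates = actors[is_dup]

# pandas treats missing values as equal here, so both "Bob" rows are flagged;
# the notnull() filter then drops the groups that have no website.
with_website = duplicates[duplicates["website"].notnull()].sort_values("name")

dup_ids = duplicates["id"].tolist()
with_website_ids = with_website["id"].tolist()
print(dup_ids, with_website_ids)
```

This may explain why the cell above filters on `website` being non-null: without that filter, actors that share a name but have no website at all would also be reported.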

In [9]:
print (f'Using the attributes "{filter_attribute}" as filter, there are: {dupl_actor_website.shape[0]} duplicated actors')

Using the attributes "name, website" as filter, there are: 716 duplicated actors


In [10]:
actorwebsite_tomerge=dupl_actor_website.groupby(['name','website'])['id'].apply(list).reset_index(name='idtomerge')

In [12]:
actorwebsite_tomerge.head()

Unnamed: 0,name,website,idtomerge
0,ARTFL Project and Digital Library Development ...,http://artfl-project.uchicago.edu/,"[2720, 842]"
1,AT&T Research,http://www.research.att.com/,"[2566, 701]"
2,ATLAS.ti Scientific Software Development GmbH,http://www.atlasti.com/copyright.html,"[25, 1828]"
3,Adam Crymble,http://adamcrymble.org,"[3020, 3020]"
4,Alan Liu,http://liu.english.ucsb.edu/,"[1954, 149]"
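
The row for "Adam Crymble" lists the same id twice (`[3020, 3020]`), so it does not describe two distinct actors. A minimal sketch of one way to drop repeated ids while grouping is given below; it assumes rows shaped like `dupl_actor_website`, and the shortened names and `example` URLs are placeholders, not Marketplace data.

```python
import pandas as pd

# Toy rows imitating dupl_actor_website; the second actor appears twice with one id.
rows = pd.DataFrame(
    {
        "id": [2720, 842, 3020, 3020],
        "name": ["ARTFL", "ARTFL", "Adam Crymble", "Adam Crymble"],
        "website": [
            "http://a.example", "http://a.example",
            "http://b.example", "http://b.example",
        ],
    }
)

grouped = (
    rows.groupby(["name", "website"])["id"]
    # dict.fromkeys removes repeated ids and keeps their original order
    .apply(lambda ids: list(dict.fromkeys(ids)))
    .reset_index(name="idtomerge")
)

# Keep only the groups that still contain at least two distinct actors.
mergeable = grouped[grouped["idtomerge"].map(len) > 1]
print(mergeable)
```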


In [13]:
#Number of name/website groups with more than two ids, i.e. actors with more than one duplicate
actorwebsite_tomerge[actorwebsite_tomerge.idtomerge.map(len)>2].count()

name         23
website      23
idtomerge    23
dtype: int64
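
The merge loop in section 2.3 skips these groups with more than two ids. One possible way to handle them, sketched here under assumptions, is to pick one id per group as the merge target and list the remaining ids as the actors to merge into it. `plan_merges` is a hypothetical helper, not part of sshmarketplacelib, and taking the first id as target is an arbitrary choice; a moderator may prefer the record with the richest metadata. The group `[11, 12, 13]` is invented for the example.

```python
def plan_merges(id_groups):
    """Return (target_id, [other_ids]) pairs, one per group with 2+ distinct ids."""
    plans = []
    for ids in id_groups:
        unique_ids = list(dict.fromkeys(ids))  # drop repeats, keep order
        if len(unique_ids) < 2:
            continue  # nothing to merge, e.g. the same id listed twice
        target, *others = unique_ids
        plans.append((target, others))
    return plans


# The first two groups come from the table above; the third is invented.
example_groups = [[2720, 842], [3020, 3020], [11, 12, 13]]
plans = plan_merges(example_groups)
print(plans)
```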

### 2.2 Compare duplicated actors

In [14]:
#id of duplicated actors
ids=[2720, 842]
compareitems=df_actor_duplicates[df_actor_duplicates.id.isin(ids)]

In [15]:
css_equal="font-size:1.5rem; border: 2px solid silver;background-color: white; padding: 10px 20px"
css_diff="background-color: lightyellow;  font-size:1.5rem; border: 2px solid silver; padding: 10px 20px"

In [16]:
#view items
showdiff = compareitems.T.style.apply(lambda x: [css_equal if ((len(utils.lists_to_list(x.values))==1) ) else css_diff for i in x],
                    axis=1)
showdiff

Unnamed: 0,669,670
MPUrl,actors/2720,actors/842
id,2720,842
name,"ARTFL Project and Digital Library Development Centre, University of Chicago","ARTFL Project and Digital Library Development Centre, University of Chicago"
externalIds,"[{'identifierService': {'code': 'SourceActorId', 'label': 'Source ActorId', 'ord': 7, 'urlTemplate': ''}, 'identifier': '1-fbf7d044b032ca104a0b1d4b0ab312e1629158169a81d2ce1869f28d61da5c9e'}]",[]
affiliations,[],[]
website,http://artfl-project.uchicago.edu/,http://artfl-project.uchicago.edu/
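
The styled table highlights the rows in which the two records differ. The same comparison can be sketched without styling, which may be handy when checking many pairs in a loop. `differing_fields` is a hypothetical helper written for this example, and the `externalIds` value is abbreviated from the table above.

```python
def differing_fields(record_a, record_b, ignore=("id", "MPUrl")):
    """Return {field: (value_a, value_b)} for the fields whose values differ."""
    diffs = {}
    for field in sorted(set(record_a) | set(record_b)):
        if field in ignore:
            continue  # ids and URLs always differ between two actors
        value_a, value_b = record_a.get(field), record_b.get(field)
        if value_a != value_b:
            diffs[field] = (value_a, value_b)
    return diffs


name = "ARTFL Project and Digital Library Development Centre, University of Chicago"
actor_2720 = {
    "MPUrl": "actors/2720", "id": 2720, "name": name,
    # abbreviated: the full record also carries the identifier string
    "externalIds": [{"identifierService": {"code": "SourceActorId"}}],
    "affiliations": [], "website": "http://artfl-project.uchicago.edu/",
}
actor_842 = {
    "MPUrl": "actors/842", "id": 842, "name": name,
    "externalIds": [],
    "affiliations": [], "website": "http://artfl-project.uchicago.edu/",
}

diffs = differing_fields(actor_2720, actor_842)
print(list(diffs))
```

For this pair only `externalIds` differs, which matches the single highlighted row in the table.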


### 2.3 Merge duplicated actors

POST /api/actors/{id}/merge
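
In the merge log of this section, the target id sits in the URL path and the other id in the `with` query parameter. A minimal sketch of how such a URL could be assembled is shown below. `build_merge_url` is a hypothetical helper, not part of sshmarketplacelib; no request is sent, and joining several ids with commas is an assumption that the log does not confirm.

```python
from urllib.parse import urlencode

# Base URL as it appears in the merge log of this notebook.
API_BASE = "https://marketplace-api.sshopencloud.eu/api"


def build_merge_url(target_id, other_ids):
    """Build the merge URL for target_id; only builds a string, sends nothing."""
    # Assumption: several ids are passed as one comma-separated value.
    ids = ",".join(str(i) for i in other_ids)
    query = urlencode({"with": ids}, safe=",")
    return f"{API_BASE}/actors/{target_id}/merge?{query}"


url = build_merge_url(2720, [842])
print(url)
```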


In [24]:
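#Example of a single merge: mpdata.postMergedActors('2505', '2266')
#Merge each pair of duplicates; groups with more than two ids are skipped
for item in actorwebsite_tomerge.itertuples():
    if len(item.idtomerge)>2:
        continue
    print(item.idtomerge[0], item.idtomerge[1])
    #the first id goes in the URL path, the second in the "with" parameter
    mpdata.postMergedActors(str(item.idtomerge[0]), str(item.idtomerge[1]))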

2720 842
Merging actor 2720 with actor(s) 842...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2720/merge?with=842
...not executed, running in DEBUG mode.
2566 701
Merging actor 2566 with actor(s) 701...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2566/merge?with=701
...not executed, running in DEBUG mode.
25 1828
Merging actor 25 with actor(s) 1828...
URL: https://marketplace-api.sshopencloud.eu/api/actors/25/merge?with=1828
...not executed, running in DEBUG mode.
3020 3020
Merging actor 3020 with actor(s) 3020...
URL: https://marketplace-api.sshopencloud.eu/api/actors/3020/merge?with=3020
...not executed, running in DEBUG mode.
1954 149
Merging actor 1954 with actor(s) 149...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1954/merge?with=149
...not executed, running in DEBUG mode.
2332 493
Merging actor 2332 with actor(s) 493...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2332/merge?with=493
...not executed, running in DEBUG mode.
192 2000
[...]

Merging actor 780 with actor(s) 2653...
URL: https://marketplace-api.sshopencloud.eu/api/actors/780/merge?with=2653
...not executed, running in DEBUG mode.
484 2323
Merging actor 484 with actor(s) 2323...
URL: https://marketplace-api.sshopencloud.eu/api/actors/484/merge?with=2323
...not executed, running in DEBUG mode.
554 2401
Merging actor 554 with actor(s) 2401...
URL: https://marketplace-api.sshopencloud.eu/api/actors/554/merge?with=2401
...not executed, running in DEBUG mode.
1957 152
Merging actor 1957 with actor(s) 152...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1957/merge?with=152
...not executed, running in DEBUG mode.
1507 3801
Merging actor 1507 with actor(s) 3801...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1507/merge?with=3801
...not executed, running in DEBUG mode.
2711 834
Merging actor 2711 with actor(s) 834...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2711/merge?with=834
...not executed, running in DEBUG mode.
286 2102
[...]

Merging actor 562 with actor(s) 2409...
URL: https://marketplace-api.sshopencloud.eu/api/actors/562/merge?with=2409
...not executed, running in DEBUG mode.
1838 35
Merging actor 1838 with actor(s) 35...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1838/merge?with=35
...not executed, running in DEBUG mode.
2084 269
Merging actor 2084 with actor(s) 269...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2084/merge?with=269
...not executed, running in DEBUG mode.
3812 56
Merging actor 3812 with actor(s) 56...
URL: https://marketplace-api.sshopencloud.eu/api/actors/3812/merge?with=56
...not executed, running in DEBUG mode.
396 2229
Merging actor 396 with actor(s) 2229...
URL: https://marketplace-api.sshopencloud.eu/api/actors/396/merge?with=2229
...not executed, running in DEBUG mode.
425 2257
Merging actor 425 with actor(s) 2257...
URL: https://marketplace-api.sshopencloud.eu/api/actors/425/merge?with=2257
...not executed, running in DEBUG mode.
796 2671
[...]

Merging actor 981 with actor(s) 2484...
URL: https://marketplace-api.sshopencloud.eu/api/actors/981/merge?with=2484
...not executed, running in DEBUG mode.
737 2605
Merging actor 737 with actor(s) 2605...
URL: https://marketplace-api.sshopencloud.eu/api/actors/737/merge?with=2605
...not executed, running in DEBUG mode.
1892 87
Merging actor 1892 with actor(s) 87...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1892/merge?with=87
...not executed, running in DEBUG mode.
511 2351
Merging actor 511 with actor(s) 2351...
URL: https://marketplace-api.sshopencloud.eu/api/actors/511/merge?with=2351
...not executed, running in DEBUG mode.
862 2741
Merging actor 862 with actor(s) 2741...
URL: https://marketplace-api.sshopencloud.eu/api/actors/862/merge?with=2741
...not executed, running in DEBUG mode.
2644 197
Merging actor 2644 with actor(s) 197...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2644/merge?with=197
...not executed, running in DEBUG mode.
2006 962
[...]

Merging actor 2574 with actor(s) 709...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2574/merge?with=709
...not executed, running in DEBUG mode.
870 2751
Merging actor 870 with actor(s) 2751...
URL: https://marketplace-api.sshopencloud.eu/api/actors/870/merge?with=2751
...not executed, running in DEBUG mode.
688 2550
Merging actor 688 with actor(s) 2550...
URL: https://marketplace-api.sshopencloud.eu/api/actors/688/merge?with=2550
...not executed, running in DEBUG mode.
711 2577
Merging actor 711 with actor(s) 2577...
URL: https://marketplace-api.sshopencloud.eu/api/actors/711/merge?with=2577
...not executed, running in DEBUG mode.
2853 2830
Merging actor 2853 with actor(s) 2830...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2853/merge?with=2830
...not executed, running in DEBUG mode.
2822 2845
Merging actor 2822 with actor(s) 2845...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2822/merge?with=2845
...not executed, running in DEBUG mode.
253 2073
[...]

Merging actor 2419 with actor(s) 571...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2419/merge?with=571
...not executed, running in DEBUG mode.
310 2129
Merging actor 310 with actor(s) 2129...
URL: https://marketplace-api.sshopencloud.eu/api/actors/310/merge?with=2129
...not executed, running in DEBUG mode.
878 3802
Merging actor 878 with actor(s) 3802...
URL: https://marketplace-api.sshopencloud.eu/api/actors/878/merge?with=3802
...not executed, running in DEBUG mode.
622 2477
Merging actor 622 with actor(s) 2477...
URL: https://marketplace-api.sshopencloud.eu/api/actors/622/merge?with=2477
...not executed, running in DEBUG mode.
1811 9
Merging actor 1811 with actor(s) 9...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1811/merge?with=9
...not executed, running in DEBUG mode.
1845 42
Merging actor 1845 with actor(s) 42...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1845/merge?with=42
...not executed, running in DEBUG mode.
1949 144
[...]

Merging actor 1879 with actor(s) 230...
URL: https://marketplace-api.sshopencloud.eu/api/actors/1879/merge?with=230
...not executed, running in DEBUG mode.
2384 539
Merging actor 2384 with actor(s) 539...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2384/merge?with=539
...not executed, running in DEBUG mode.
191 1999
Merging actor 191 with actor(s) 1999...
URL: https://marketplace-api.sshopencloud.eu/api/actors/191/merge?with=1999
...not executed, running in DEBUG mode.
446 2280
Merging actor 446 with actor(s) 2280...
URL: https://marketplace-api.sshopencloud.eu/api/actors/446/merge?with=2280
...not executed, running in DEBUG mode.
2684 808
Merging actor 2684 with actor(s) 808...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2684/merge?with=808
...not executed, running in DEBUG mode.
2317 479
Merging actor 2317 with actor(s) 479...
URL: https://marketplace-api.sshopencloud.eu/api/actors/2317/merge?with=479
...not executed, running in DEBUG mode.
7 1809
Merging acto