# Notebook 3.3 - Curation-flag-relation

This notebook analyses the relations between items of the SSH Open Marketplace and writes back to the system via two dedicated curation properties: `curation-flag-relation` and `curation-detail` properties.

This notebook flags Marketplace items for which contextualisation quality is low (based on interlinked items), helping Moderators improve the relations between items. 

This notebook is part of a series of 4 notebooks that inform the curation properties used in the SSH Open Marketplace Editorial Dashboard.

It is composed of 3 sections:

0. Requirements to run the notebook
1. Check the number of relations per item
2. Flag items with not enough relations

## 0 Requirements to run this notebook

This section gives all the relevant information to "interact" with the MP data.

### 0.1 libraries
*There are a number of external libraries needed to run the notebook* 

*Furthermore, a dedicated SSH Open Marketplace library - sshmarketplacelib - with customised functions has been created and can be imported using the python import commands.* 

*Below the libraries import needed to run this notebook*

In [1]:
import numpy as np
import pandas as pd
import requests
#import the MarketPlace Library 
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import  eval as eva, helper as hel
utils=hel.Util()

### 0.2 Get the data

In [2]:
mpdata = mpd()
df_tool_flat =mpdata.getMPItems ("toolsandservices", True)
df_publication_flat =mpdata.getMPItems ("publications", True)
df_trainingmaterials_flat =mpdata.getMPItems ("trainingmaterials", True)
df_workflows_flat =mpdata.getMPItems ("workflows", True)
df_datasets_flat =mpdata.getMPItems ("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


### 0.3 A look at the data

In [3]:
df_all_items=pd.concat([df_tool_flat, df_publication_flat, df_trainingmaterials_flat, df_workflows_flat, df_datasets_flat])
df_all_items.head()

Unnamed: 0,id,category,label,persistentId,lastInfoUpdate,status,description,contributors,properties,externalIds,...,thumbnail.info.filename,thumbnail.info.mimeType,thumbnail.info.hasThumbnail,thumbnail.caption,version,thumbnail.info.location.sourceUrl,informationContributor.email,dateCreated,dateLastUpdated,composedOf
0,45953,tool-or-service,140kit,3IAyEp,2021-07-30T16:03:01+0000,approved,140kit provides a management layer for tweet c...,"[{'actor': {'id': 483, 'name': 'Ian Pearce, De...","[{'type': {'code': 'activity', 'label': 'Activ...",[],...,acdh-ch-logo96.png,image/png,True,test thumbnail of uploaded media image,,,,,,
1,49576,tool-or-service,3DF Zephyr - photogrammetry software - 3d mode...,U3gQrh,2021-09-22T15:51:38+0000,approved,No description provided.,[],[{'type': {'code': 'curation-flag-description'...,[],...,,,,,,,,,,
2,49577,tool-or-service,3DHOP,MnpOWX,2021-09-22T15:51:39+0000,approved,No description provided.,[],[{'type': {'code': 'curation-flag-description'...,[],...,,,,,,,,,,
3,49578,tool-or-service,3DHOP: 3D Heritage Online Presenter,gA7zFN,2021-09-22T15:51:39+0000,approved,No description provided.,[],[{'type': {'code': 'curation-flag-description'...,[],...,,,,,,,,,,
4,49579,tool-or-service,3DReshaper \| 3DReshaper,Q49CiV,2021-09-22T15:51:40+0000,approved,No description provided.,[],[{'type': {'code': 'curation-flag-description'...,[],...,,,,,,,,,,


In [4]:
df_all_items_work=df_all_items[['id', 'persistentId', 'category', 'label', 'description', 'contributors', 'accessibleAt', 'source.label']]
df_all_items_work.tail()

Unnamed: 0,id,persistentId,category,label,description,contributors,accessibleAt,source.label
303,12634,l8gLBb,dataset,Yelp Academic Challenge Dataset,The Yelp dataset is a subset of our businesses...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[https://www.yelp.com/dataset],Humanities Data
304,12631,xvYQQ4,dataset,YelpCHI,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpchi-dataset/],Humanities Data
305,12632,IdZGtV,dataset,YelpNYC,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpnyc-dataset/],Humanities Data
306,12633,OMny6U,dataset,YelpZIP,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpzip-dataset/],Humanities Data
307,12589,YnEaU0,dataset,"""You Are Where You Tweet: A Content-Based Appr...",This dataset is a collection of scraped public...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[https://archive.org/details/twitter_cikm_2010],Humanities Data


## 1. Check the number of relations per item

By default, and according to rules set up in the Editorial Guidelines, items in the SSH Open Marketplace should be interlinked.

In [5]:
df_rel_it=utils.getAllRelatedItems()
df_rel_it.sort_values('label').head()

Unnamed: 0,item_persistentId,item_category,item_label,relation.label,persistentId,category,label,workflowId,description,relation.code
235,h8UTUS,tool-or-service,Ethnic and Migrant Minorities (EMM) Survey Reg...,Is related to,4SVucl,tool-or-service,Automated Verification Tool,,The Automatic Verification Tool (AVT) enables ...,is-related-to
313,iQdKk2,tool-or-service,Gephi,Is related to,4SVucl,tool-or-service,Automated Verification Tool,,The Automatic Verification Tool (AVT) enables ...,is-related-to
171,ip7fwu,tool-or-service,DH Press,Is mentioned in,t6q6FW,publication,"""A Pale Reflection of the Violent Truth? Pract...",,No description provided.,is-mentioned-in
1042,CduoOq,tool-or-service,Skype,Is mentioned in,BNGWfl,publication,"""A picture is worth a thousand words""? - From ...",,No description provided.,is-mentioned-in
465,dvJSDD,tool-or-service,Google Books,Is mentioned in,GIb2N9,publication,"""Inventing the Map - "" from 19th-century Pedag...",,No description provided,is-mentioned-in


In [6]:
df_rel_it.shape

(1766, 10)

The following cell code returns a dataframe with all the items having 3 related items, it uses the function __getRelateditems (itemcategories, operator, nrelitems)__  returns a dataframe with items in a specific category having related items. 
The parameters are:

<ul>
    <li>categories: the list of categories. This parameter is mandatory, if the search should be done in all categories the keyword *all* can be used</li>
    <li> the operator.  Optional, accepted values are =, &lt; and &gt; (the default) </li> 
    <li> nrelitems: an optional parameter, if not present all items are returned.</li>
</ul>

In [7]:
df_test_ri=utils.getRelatedItems('all', '<', 3)
df_test_ri.head(7)

Unnamed: 0,MPUrl,persistentId,category,label,relation.label,relitem_persistentId,relItem_category,relItem_label,relItem_description,relation.code,value,workflowId
0,tool-or-service/4SVucl,4SVucl,tool-or-service,Automated Verification Tool,Relates to,h8UTUS,tool-or-service,Ethnic and Migrant Minorities (EMM) Survey Reg...,The EMM Survey Registry—which has been co-crea...,relates-to,2,
1,tool-or-service/4SVucl,4SVucl,tool-or-service,Automated Verification Tool,Relates to,iQdKk2,tool-or-service,Gephi,Gephi is the leading visualization and explora...,relates-to,2,
2,publication/rSRIZx,rSRIZx,publication,"""My Name is Lizzie Bennet - "" Reading, Partici...",Mentions,gNfOzz,tool-or-service,YouTube,http://www.wikidata.org/entity/Q16971117,mentions,1,
3,publication/UK7Sij,UK7Sij,publication,"""On the record"" - transcribing and valorizing ...",Is related to,h2MFPS,tool-or-service,TEI Boilerplate,TEI Boilerplate is a lightweight solution for ...,is-related-to,1,
4,training-material/baDTwh,baDTwh,training-material,2.1 Error rates and ground truth - Text Digiti...,Is mentioned in,KEugTm,step,Test on a subset and assess quality,The first step to take is the definition of a ...,is-mentioned-in,1,qYzKVU
5,tool-or-service/U3gQrh,U3gQrh,tool-or-service,3DF Zephyr - photogrammetry software - 3d mode...,Is mentioned in,pXppX6,step,Acquisition of a 3D object (by photogrammetry),The photogrammetry method relies on a\n ...,is-mentioned-in,1,MHUaGI
6,training-material/T3JbA2,T3JbA2,training-material,3DHOP - How To,Is mentioned in,mSWgiF,step,Visualisation,Visualize 3D models requires specific (web-bas...,is-mentioned-in,1,B3V0jJ


In [8]:
df_test_ri['property']='relations'
df_test_ri_no_duplicates=df_test_ri[df_test_ri.duplicated(subset=['label'], keep='first')]


## 2. Flag items with with not enough relations

In [9]:
curation_flag_property={"code": "curation-flag-relations"}
curation_detail_property={"code": "curation-detail"}

In [10]:

res_des=mpdata.setPropertyFlags(df_test_ri_no_duplicates, curation_flag_property, curation_detail_property)

Creating log file...
The property: relations, has value 1, in item with pid: tools/m9q6vc, (current version: 43477)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "1"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/n0cSzL, (current version: 63339)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/d2HMjC, (current version: 61434)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/5v0M24, (current version: 63438)
append curation_property_value
append curation_detail_value {'type': {'

<Response [200]>
The property: relations, has value 2, in item with pid: tools/YmnOEu, (current version: 63345)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/N2DBHn, (current version: 55478)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/5yweb6, (current version: 49696)
The property: relations, has value 2, in item with pid: tools/5yweb6, (current version: 49696)
{"description": {"length": "0"}, "relations": "{\"length\": \"2\"}"}
append curation_property_value
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/aZ99Rw, (current version: 62381)
append curation_prope

<Response [200]>
The property: relations, has value 2, in item with pid: tools/iy9UL6, (current version: 62852)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/onidM5, (current version: 54710)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: tools/sGLPyd, (current version: 61665)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [200]>
The property: relations, has value 2, in item with pid: publications/Ja3t2z, (current version: 40185)
append curation_property_value
append curation_detail_value {'type':

<Response [404]>
The property: relations, has value 2, in item with pid: trainingMaterials/dEKR6G, (current version: 36942)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [404]>
The property: relations, has value 2, in item with pid: trainingMaterials/21muUk, (current version: 36456)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [404]>
The property: relations, has value 2, in item with pid: trainingMaterials/ZV6GEg, (current version: 36468)
append curation_property_value
append curation_detail_value {'type': {'code': 'curation-detail'}, 'value': '{"relations": {"length": "2"}}'}
updating item... 
<Response [404]>
The property: relations, has value 2, in item with pid: trainingMaterials/dro7ts, (current version: 36936)
append curation_property_va