# Notebook 3.3 - Curation-flag-relation

This notebook analyses the relations between items of the SSH Open Marketplace and writes the results back to the system via two dedicated curation properties: `curation-flag-relation` and `curation-detail`.

This notebook flags Marketplace items for which contextualisation quality is low (based on interlinked items), helping Moderators improve the relations between items. 

This notebook is part of a series of 4 notebooks that inform the curation properties used in the SSH Open Marketplace Editorial Dashboard.

It is composed of 3 sections:

0. Requirements to run the notebook
1. Check the number of relations per item
2. Flag items with not enough relations

## 0. Requirements to run this notebook

This section covers everything needed to interact with the Marketplace (MP) data.

### 0.1 Libraries
*The notebook needs a number of external libraries.*

*In addition, a dedicated SSH Open Marketplace library, sshmarketplacelib, provides customised functions and can be imported with the usual Python import statements.*

*The imports needed to run this notebook are listed below.*

In [1]:
import numpy as np
import pandas as pd
import requests
# Import the SSH Open Marketplace library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel
utils = hel.Util()

### 0.2 Get the data

In [2]:
mpdata = mpd()
df_tool_flat = mpdata.getMPItems("toolsandservices", True)
df_publication_flat = mpdata.getMPItems("publications", True)
df_trainingmaterials_flat = mpdata.getMPItems("trainingmaterials", True)
df_workflows_flat = mpdata.getMPItems("workflows", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


### 0.3 A look at the data

In [3]:
df_all_items=pd.concat([df_tool_flat, df_publication_flat, df_trainingmaterials_flat, df_workflows_flat, df_datasets_flat])
df_all_items.head()

Unnamed: 0,id,category,label,persistentId,lastInfoUpdate,status,description,contributors,properties,externalIds,...,thumbnail.info.mediaId,thumbnail.info.category,thumbnail.info.filename,thumbnail.info.mimeType,thumbnail.info.hasThumbnail,thumbnail.info.location.sourceUrl,thumbnail.caption,dateCreated,dateLastUpdated,composedOf
0,28230,tool-or-service,140kit,SIU1nO,2021-11-23T17:24:25+0000,approved,140kit provides a management layer for tweet c...,"[{'actor': {'id': 2224, 'name': 'Ian Pearce, D...","[{'type': {'code': 'mode-of-use', 'label': 'Mo...",[],...,,,,,,,,,,
1,36324,tool-or-service,3DF Zephyr - photogrammetry software - 3d mode...,4gDAHv,2022-01-13T11:49:02+0000,approved,3DF Zephyr\[1\]\[2\] is a commercial photogram...,[],"[{'type': {'code': 'language', 'label': 'Langu...",[],...,,,,,,,,,,
2,36552,tool-or-service,3DHOP,UcxOmD,2022-01-13T11:50:31+0000,approved,3DHOP (3D Heritage Online Presenter) is an ope...,[],"[{'type': {'code': 'language', 'label': 'Langu...",[],...,,,,,,,,,,
3,36555,tool-or-service,3DHOP: 3D Heritage Online Presenter,uFIMPQ,2022-01-13T11:50:32+0000,approved,No description provided.,[],[],[],...,,,,,,,,,,
4,36189,tool-or-service,3DReshaper \| 3DReshaper,kAkzuz,2022-01-13T11:47:44+0000,approved,No description provided.,[],"[{'type': {'code': 'language', 'label': 'Langu...",[],...,,,,,,,,,,


In [4]:
df_all_items_work=df_all_items[['id', 'persistentId', 'category', 'label', 'description', 'contributors', 'accessibleAt', 'source.label']]
df_all_items_work.tail()

Unnamed: 0,id,persistentId,category,label,description,contributors,accessibleAt,source.label
303,12634,l8gLBb,dataset,Yelp Academic Challenge Dataset,The Yelp dataset is a subset of our businesses...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[https://www.yelp.com/dataset],Humanities Data
304,12631,xvYQQ4,dataset,YelpCHI,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpchi-dataset/],Humanities Data
305,12632,IdZGtV,dataset,YelpNYC,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpnyc-dataset/],Humanities Data
306,12633,OMny6U,dataset,YelpZIP,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpzip-dataset/],Humanities Data
307,12589,YnEaU0,dataset,"""You Are Where You Tweet: A Content-Based Appr...",This dataset is a collection of scraped public...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[https://archive.org/details/twitter_cikm_2010],Humanities Data


## 1. Check the number of relations per item

By default, and according to rules set up in the Editorial Guidelines, items in the SSH Open Marketplace should be interlinked.

In [5]:
df_rel_it=utils.getAllRelatedItems()
df_rel_it.sort_values('label').head()

Unnamed: 0,item_persistentId,item_category,item_label,relation.label,persistentId,category,label,workflowId,description,relation.code
155,jW7Juy,tool-or-service,MicMac documentation,Is mentioned in,GwNSio,step,Preservation,0BLi83,To convert and represent the information in a ...,is-mentioned-in
156,jW7Juy,tool-or-service,MicMac documentation,Is mentioned in,7iPdD9,step,Visualisation,0BLi83,Decide what the final user will be able to see...,is-mentioned-in
293,gNfOzz,tool-or-service,YouTube,Is mentioned in,rSRIZx,publication,"""My Name is Lizzie Bennet - "" Reading, Partici...",,No description provided,is-mentioned-in
238,h2MFPS,tool-or-service,TEI Boilerplate,Relates to,UK7Sij,publication,"""On the record"" - transcribing and valorizing ...",,Qualitative interviews constitute an important...,relates-to
134,gHCDwq,tool-or-service,Kindle,Is mentioned in,q0d7jU,publication,A Scholarly Edition for Mobile Devices,,No description provided,is-mentioned-in


In [6]:
df_rel_it.shape

(607, 10)
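The relation table holds one row per pair of item and related item, so the number of relations per item can be obtained by grouping on the source item. The following is a minimal sketch on a toy table: it reuses the `item_persistentId` and `persistentId` column names shown in the output above, but the rows are a small hand-picked sample and the `n_relations` column name is an assumption made for illustration.

```python
import pandas as pd

# Toy stand-in for df_rel_it: one row per (item, related item) pair.
toy_relations = pd.DataFrame({
    "item_persistentId": ["jW7Juy", "jW7Juy", "gNfOzz", "h2MFPS"],
    "persistentId": ["GwNSio", "7iPdD9", "rSRIZx", "UK7Sij"],
    "relation.code": ["is-mentioned-in", "is-mentioned-in",
                      "is-mentioned-in", "relates-to"],
})

# Count the distinct related items attached to each source item.
relation_counts = (
    toy_relations.groupby("item_persistentId")["persistentId"]
    .nunique()
    .rename("n_relations")  # illustrative column name
    .reset_index()
)
print(relation_counts)
```

Items without any relation would presumably not appear in such a table at all, so a count built this way only covers items that have at least one relation.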

The following cell returns a dataframe with the items, across all categories, that have fewer than 3 related items. It uses the function __getRelatedItems(itemcategories, operator, nrelitems)__, which returns a dataframe of the items in the given categories together with their related items.
The parameters are:

<ul>
    <li>itemcategories: the list of categories. This parameter is mandatory; to search in all categories, use the keyword *all*.</li>
    <li>operator: optional; accepted values are =, &lt; and &gt; (the default).</li>
    <li>nrelitems: optional; if it is not given, all items are returned.</li>
</ul>
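To make the meaning of the operator and threshold parameters concrete, here is a minimal sketch of the same kind of filter in plain pandas. It is not the implementation used by sshmarketplacelib: `filter_by_relation_count`, the `n_relations` column and the toy identifiers are all hypothetical, and only the parameter semantics described above are mirrored.

```python
import operator

import pandas as pd

# Map the accepted operator strings to comparison functions.
COMPARATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def filter_by_relation_count(counts, op=">", n=None):
    """Hypothetical helper, not part of sshmarketplacelib.

    Keeps the rows whose `n_relations` compares to `n` using `op`.
    When `n` is omitted, every row is returned.
    """
    if n is None:
        return counts
    return counts[COMPARATORS[op](counts["n_relations"], n)]


# Toy table: one row per item with its number of relations (made-up values).
toy_counts = pd.DataFrame({
    "persistentId": ["aaa111", "bbb222", "ccc333"],
    "n_relations": [1, 3, 5],
})

few = filter_by_relation_count(toy_counts, "<", 3)
print(few)
```

With `"<"` and `3`, only the item with a single relation is kept; an item with exactly 3 relations is excluded because the comparison is strict.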

In [7]:
df_test_ri=utils.getRelatedItems('all', '<', 3)
df_test_ri.head(7)

Unnamed: 0,MPUrl,persistentId,category,label,relation.label,relitem_persistentId,relItem_category,relItem_label,relItem_description,relation.code,value,workflowId
0,publication/rSRIZx,rSRIZx,publication,"""My Name is Lizzie Bennet - "" Reading, Partici...",Mentions,gNfOzz,tool-or-service,YouTube,http://www.wikidata.org/entity/Q16971117,mentions,1,
1,publication/UK7Sij,UK7Sij,publication,"""On the record"" - transcribing and valorizing ...",Is related to,h2MFPS,tool-or-service,TEI Boilerplate,TEI Boilerplate is a lightweight solution for ...,is-related-to,1,
2,training-material/baDTwh,baDTwh,training-material,2.1 Error rates and ground truth - Text Digiti...,Is mentioned in,KEugTm,step,Test on a subset and assess quality,The first step to take is the definition of a ...,is-mentioned-in,1,qYzKVU
3,tool-or-service/4gDAHv,4gDAHv,tool-or-service,3DF Zephyr - photogrammetry software - 3d mode...,Is mentioned in,Vb1dhh,step,Georeferencing the surveyed data: Using a GNSS...,Using well materialized Ground Control Points ...,is-mentioned-in,1,wqeZGu
4,tool-or-service/UcxOmD,UcxOmD,tool-or-service,3DHOP,Is mentioned in,mSWgiF,step,Visualisation,Visualize 3D models requires specific (web-bas...,is-mentioned-in,1,B3V0jJ
5,training-material/T3JbA2,T3JbA2,training-material,3DHOP - How To,Is mentioned in,mSWgiF,step,Visualisation,Visualize 3D models requires specific (web-bas...,is-mentioned-in,1,B3V0jJ
6,tool-or-service/uFIMPQ,uFIMPQ,tool-or-service,3DHOP: 3D Heritage Online Presenter,Is mentioned in,mSWgiF,step,Visualisation,Visualize 3D models requires specific (web-bas...,is-mentioned-in,1,B3V0jJ


In [8]:
df_test_ri.shape

(423, 12)

## 2. Flag items with not enough relations

In [None]:
curation_flag_property={"code": "curation-flag-relations"}
curation_detail_property={"code": "curation-detail"}
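As a rough illustration of how these two property codes could be attached to an item, the sketch below builds the property entries for items that fall below a threshold. Several things here are assumptions rather than the documented behaviour: the `build_curation_properties` helper, the threshold of 3 (borrowed from the `<` 3 query in section 1), the `"value"` key and the wording of the detail message. The `{"type": {"code": ...}}` shape follows the `properties` column shown in section 0.3. The step that writes the result back to the Marketplace is deliberately left out.

```python
# Property codes as defined in the cell above.
curation_flag_property = {"code": "curation-flag-relations"}
curation_detail_property = {"code": "curation-detail"}

MIN_RELATIONS = 3  # assumption: mirrors the `< 3` query used in section 1


def build_curation_properties(n_relations, min_relations=MIN_RELATIONS):
    """Hypothetical helper: return the curation property entries for one item.

    The "value" key and its contents are assumptions, not a documented payload.
    """
    if n_relations >= min_relations:
        return []  # enough relations, nothing to flag
    return [
        {"type": curation_flag_property, "value": "TRUE"},
        {
            "type": curation_detail_property,
            "value": f"{n_relations} related item(s), fewer than {min_relations}",
        },
    ]


# Made-up mapping: persistentId -> number of relations.
toy_counts = {"4gDAHv": 1, "UcxOmD": 4}
flags = {pid: build_curation_properties(n) for pid, n in toy_counts.items()}
print(flags)
```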