# Notebook 3.3 - Curation-flag-relation

This notebook analyses the relations between items of the SSH Open Marketplace and writes the results back to the system via two dedicated curation properties: `curation-flag-relation` and `curation-detail`.

It flags Marketplace items whose contextualisation quality is low (measured by the number of interlinked items), helping Moderators improve the relations between items.

This notebook is part of a series of 4 notebooks that inform the curation properties used in the SSH Open Marketplace Editorial Dashboard.

It is composed of 3 sections:

0. Requirements to run the notebook
1. Check the number of relations per item
2. Flag items with not enough relations

## 0. Requirements to run this notebook

This section covers everything needed to interact with the Marketplace (MP) data.

### 0.1 Libraries
*A number of external libraries are needed to run the notebook.*

*In addition, a dedicated SSH Open Marketplace library, sshmarketplacelib, provides customised functions and can be imported with the usual Python import statements.*

*The imports needed to run this notebook are listed below.*

In [1]:
import numpy as np
import pandas as pd
import requests
# Import the SSH Open Marketplace library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel
utils = hel.Util()

### 0.2 Get the data

In [2]:
mpdata = mpd()
df_tool_flat = mpdata.getMPItems("toolsandservices", True)
df_publication_flat = mpdata.getMPItems("publications", True)
df_trainingmaterials_flat = mpdata.getMPItems("trainingmaterials", True)
df_workflows_flat = mpdata.getMPItems("workflows", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


### 0.3 A look at the data

In [3]:
df_all_items=pd.concat([df_tool_flat, df_publication_flat, df_trainingmaterials_flat, df_workflows_flat, df_datasets_flat])
#df_all_items.iloc[1].relatedItems
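
Each per-category dataframe appears to carry its own row index, so the concatenated frame may contain repeated index values. A minimal sketch on toy data (not Marketplace data; the frames and values below are invented for illustration) of how `ignore_index=True` produces a unique index instead:

```python
import pandas as pd

# Toy stand-ins for two per-category dataframes (values are invented).
df_a = pd.DataFrame({"persistentId": ["aaa111", "bbb222"],
                     "category": ["tool-or-service"] * 2})
df_b = pd.DataFrame({"persistentId": ["ccc333", "ddd444"],
                     "category": ["dataset"] * 2})

# Plain concatenation keeps each frame's own index, so labels repeat.
df_plain = pd.concat([df_a, df_b])
print(df_plain.index.tolist())

# ignore_index=True rebuilds a unique 0..n-1 index.
df_unique = pd.concat([df_a, df_b], ignore_index=True)
print(df_unique.index.tolist())
```

Whether a unique index matters depends on the later steps: label-based lookups with `.loc` are ambiguous when index values repeat.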

In [4]:
df_all_items_work=df_all_items[['id', 'persistentId', 'category', 'label', 'description', 'contributors', 'accessibleAt', 'source.label']]
df_all_items_work.tail()

Unnamed: 0,id,persistentId,category,label,description,contributors,accessibleAt,source.label
305,12634,l8gLBb,dataset,Yelp Academic Challenge Dataset,The Yelp dataset is a subset of our businesses...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[https://www.yelp.com/dataset],Humanities Data
306,12631,xvYQQ4,dataset,YelpCHI,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpchi-dataset/],Humanities Data
307,12632,IdZGtV,dataset,YelpNYC,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpnyc-dataset/],Humanities Data
308,12633,OMny6U,dataset,YelpZIP,This dataset is collected from Yelp.com and fi...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[http://odds.cs.stonybrook.edu/yelpzip-dataset/],Humanities Data
309,12589,YnEaU0,dataset,"""You Are Where You Tweet: A Content-Based Appr...",This dataset is a collection of scraped public...,"[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[https://archive.org/details/twitter_cikm_2010],Humanities Data


## 1. Check the number of relations per item

By default, and according to rules set up in the Editorial Guidelines, items in the SSH Open Marketplace should be interlinked.

In [9]:
df_rel_it=utils.getAllRelatedItems()
df_rel_it.sort_values('label').head()

Unnamed: 0,item_persistentId,item_category,item_label,relation.label,persistentId,category,label,workflowId,description,relation.code
336,gNfOzz,tool-or-service,YouTube,Is mentioned in,rSRIZx,publication,"""My Name is Lizzie Bennet - "" Reading, Partici...",,No description provided,is-mentioned-in
273,h2MFPS,tool-or-service,TEI Boilerplate,Relates to,UK7Sij,publication,"""On the record"" - transcribing and valorizing ...",,Qualitative interviews constitute an important...,relates-to
149,gHCDwq,tool-or-service,Kindle,Is mentioned in,q0d7jU,publication,A Scholarly Edition for Mobile Devices,,No description provided,is-mentioned-in
208,Ov8Pwn,tool-or-service,Omeka,Is mentioned in,rhl8Z2,publication,"A Seat At ""La Tawola""",,No description provided,is-mentioned-in
136,Px3QPk,tool-or-service,InDesign,Is mentioned in,MEJQbE,publication,A World of Difference - Myths and misconceptio...,,No description provided,is-mentioned-in
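
To check the number of relations per item, one option is to count rows per source item in a long-format relations table like the one above. A minimal sketch on toy data, assuming one row per relation and the `item_persistentId` column shown in the output; the toy rows and the `n_relations` column name are illustrative and not part of `sshmarketplacelib`:

```python
import pandas as pd

# Toy relations table: one row per (item, related item) pair.
# Column names mirror the output above; the values are invented.
df_rel = pd.DataFrame({
    "item_persistentId": ["A", "A", "B", "C", "C", "C"],
    "relation.code": ["mentions", "relates-to", "is-mentioned-in",
                      "mentions", "mentions", "relates-to"],
    "persistentId": ["X", "Y", "X", "X", "Y", "Z"],
})

# Count relations per source item (n_relations is a hypothetical column name).
rel_counts = (
    df_rel.groupby("item_persistentId")
    .size()
    .rename("n_relations")
    .reset_index()
)
print(rel_counts)
```

Items with no relations at all never appear in such a table, so they have to be found by comparing the counts against the full item list.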


In [6]:
# df_rel_it.shape

The following cells use the function __getRelatedItems(itemcategories, operator, nrelitems)__, which returns a dataframe of the items in the given categories whose number of related items satisfies the given comparison. The first call returns items with fewer than 1 related item (that is, none); the second returns items with exactly 2 related items.
The parameters are:

<ul>
    <li>itemcategories: the list of categories. This parameter is mandatory; to search across all categories, use the keyword *all*.</li>
    <li>operator: optional; accepted values are =, &lt; and &gt; (the default).</li>
    <li>nrelitems: optional; if omitted, all items are returned.</li>
</ul>

*Warning: the development of the function getRelatedItems is in progress*
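
The behaviour described above can be approximated in plain pandas. The sketch below is not the library's implementation: `filter_by_relation_count` is a hypothetical helper that assumes each item carries a `relatedItems` list column, and it mirrors the three documented operators and the documented defaults.

```python
import operator
import pandas as pd

# Map the documented operator strings to comparison functions.
OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def filter_by_relation_count(df, categories="all", op=">", nrelitems=None):
    """Hypothetical helper: keep items whose relation count satisfies `op`."""
    out = df.copy()
    if categories != "all":
        out = out[out["category"].isin(categories)]
    # Number of related items per row.
    out["value"] = out["relatedItems"].apply(len)
    if nrelitems is None:
        return out  # no threshold given: return every item
    return out[OPERATORS[op](out["value"], nrelitems)]

# Toy items (invented values).
items = pd.DataFrame({
    "persistentId": ["A", "B", "C"],
    "category": ["dataset", "publication", "tool-or-service"],
    "relatedItems": [[], ["X", "Y"], ["X"]],
})

unlinked = filter_by_relation_count(items, "all", "<", 1)
print(unlinked["persistentId"].tolist())
```

Keeping the operators in a dictionary makes an unsupported operator string fail loudly with a `KeyError` rather than silently returning everything.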

In [7]:
df_test_ri=utils.getRelatedItems('all', '<',1)
df_test_ri.head()

Unnamed: 0,MPUrl,persistentId,category,label,relatedItems,value
0,dataset/soZi7a,soZi7a,dataset,"""A BLAST-based, Language-agnostic Text Reuse A...",[],0
1,publication/kqQFjG,kqQFjG,publication,"""A Model for International Cooperation - Emble...",[],0
2,publication/PTeiK8,PTeiK8,publication,"""A Pale Reflection of the Violent Truth? Pract...",[],0
3,publication/lVUU0N,lVUU0N,publication,"""A Trace of this Journey"" - Citations of Digit...",[],0
4,dataset/PODPq9,PODPq9,dataset,"""A Visual Style in Two Network Sitcoms"" data",[],0


In [8]:
df_test_ri=utils.getRelatedItems('all', '=',2)
df_test_ri.head()

Unnamed: 0,MPUrl,persistentId,category,label,relation.label,relitem_persistentId,relItem_category,relItem_label,relItem_description,relation.code,value,workflowId
0,tool-or-service/isedTr,isedTr,tool-or-service,Agisoft Metashape,Is mentioned in,R6XaVm,step,Data processing: Create a 3D model of the site...,This step is related to the surveyed object an...,is-mentioned-in,2,wqeZGu
1,tool-or-service/isedTr,isedTr,tool-or-service,Agisoft Metashape,Is mentioned in,Vb1dhh,step,Georeferencing the surveyed data: Using a GNSS...,Using well materialized Ground Control Points ...,is-mentioned-in,2,wqeZGu
2,publication/Ja3t2z,Ja3t2z,publication,An update on automatic transcription vs. manua...,Mentions,cIP0cd,tool-or-service,Mozilla Firefox,http://www.wikidata.org/entity/Q3866728,mentions,2,
3,publication/Ja3t2z,Ja3t2z,publication,An update on automatic transcription vs. manua...,Mentions,gNfOzz,tool-or-service,YouTube,http://www.wikidata.org/entity/Q16971117,mentions,2,
4,tool-or-service/r43yWg,r43yWg,tool-or-service,Apache OpenNLP,Is mentioned in,s89flK,step,Using an existing NLP pipeline,"In practice, the above steps are combined to p...",is-mentioned-in,2,x1hL0m


In [None]:
df_test_ri['property'] = 'relations'
# The result holds one row per relation; keep a single row per item label
df_test_ri_no_duplicates = df_test_ri.drop_duplicates(subset=['label'], keep='first')
df_test_ri_no_duplicates.head()
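
Since the result holds one row per relation, an item with two relations appears twice, and deduplication collapses it to one row per item. A small sketch on invented data showing that effect. It also illustrates a point worth checking against the real data: deduplicating on `persistentId` keeps two distinct items that happen to share a label, whereas deduplicating on `label` merges them.

```python
import pandas as pd

# Toy result: one row per relation; items P1 and P2 share the same label.
df = pd.DataFrame({
    "persistentId": ["P1", "P1", "P2", "P2"],
    "label": ["Corpus Tool", "Corpus Tool", "Corpus Tool", "Corpus Tool"],
    "relitem_persistentId": ["X", "Y", "X", "Z"],
})

by_label = df.drop_duplicates(subset=["label"], keep="first")
by_id = df.drop_duplicates(subset=["persistentId"], keep="first")

print(len(by_label))  # rows left when deduplicating on label
print(len(by_id))     # rows left when deduplicating on persistentId
```

If labels are known to be unique across the Marketplace, both approaches give the same result.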

## 2. Flag items with not enough relations

In [None]:
curation_flag_property={"code": "curation-flag-relations"}
curation_detail_property={"code": "curation-detail"}

In [None]:

res_des=mpdata.setPropertyFlags(df_test_ri_no_duplicates, curation_flag_property, curation_detail_property)