# Notebook 5.1 - Curation-Keywords (alpha release)

The final release of this notebook will implement the workflow defined in:

[Curating keywords](https://gitlab.gwdg.de/sshoc/marketplace-curation/-/issues/1#note_71056) GitLab issue

The notebook is composed of 4 sections:

0. Import the external libraries and load the MP dataset and the Google Sheet
1. Look for vocabulary terms with the value from the column *Keyword to map*
2. Look for the term in the column *Map to*
3. Go through all MP items to look for the *keywords-to-map* in the keyword-dynamic-property


## 0 Requirements to run this notebook

This section gives all the relevant information to "interact" with the MP data.

### 0.1 Libraries
*A number of external libraries are needed to run the notebook.*

*In addition, a dedicated SSH Open Marketplace library, sshmarketplacelib, provides customised functions and can be imported with the usual Python import commands.*

*The cell below imports the libraries needed to run this notebook.*

In [1]:
import numpy as np
import pandas as pd
import requests
# Import the MarketPlace library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel

### 0.2 Get the data



Get the MarketPlace dataset

In [2]:
mpdata = mpd()
df_tool_flat = mpdata.getMPItems("toolsandservices", True)
df_publication_flat = mpdata.getMPItems("publications", True)
df_trainingmaterials_flat = mpdata.getMPItems("trainingmaterials", True)
df_workflows_flat = mpdata.getMPItems("workflows", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


Get the list of keywords from the [gsheet](https://docs.google.com/spreadsheets/d/1-Oh9_SxIhfMAT6KNJrMf4LetCpy5s1fHZEyTL__TUVA/edit#gid=0)

In [3]:
sheet_id = '1-Oh9_SxIhfMAT6KNJrMf4LetCpy5s1fHZEyTL__TUVA'
sheet_name = 'Mappings'
url = f'https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}'
df_keywords=pd.read_csv(url)
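
The sheet name is interpolated into the URL unchanged, which works for `Mappings` but would probably fail for a tab name containing spaces or other reserved characters. Below is a minimal sketch of a URL builder that percent-encodes the sheet name. `build_gviz_csv_url` is a hypothetical helper introduced here for illustration; it is not part of sshmarketplacelib.

```python
from urllib.parse import quote


def build_gviz_csv_url(sheet_id: str, sheet_name: str) -> str:
    """Return the CSV export URL for one tab of a Google Sheet."""
    # safe="" makes quote() percent-encode spaces, '&', '/' and similar characters
    encoded_name = quote(sheet_name, safe="")
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/gviz/tq?tqx=out:csv&sheet={encoded_name}"
    )


url = build_gviz_csv_url("1-Oh9_SxIhfMAT6KNJrMf4LetCpy5s1fHZEyTL__TUVA", "Mappings")
```

The resulting `url` can be passed to `pd.read_csv` in the same way as in the cell above.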

### 0.3 A look at the data

A few lines of the gsheet

In [4]:
df_keywords.head()

Unnamed: 0,Keyword to map,Map to,Comment
0,Linguistics,https://vocabs.acdh.oeaw.ac.at/oefosdiscipline...,Sprach- und Literaturwissenschaften
1,History,https://vocabs.acdh.oeaw.ac.at/oefosdiscipline...,"Geschichte, Archäologie"
2,Literature,https://vocabs.acdh.oeaw.ac.at/oefosdiscipline...,Sprach- und Literaturwissenschaften
3,Video,https://vocabs.sshopencloud.eu/vocabularies/me...,video
4,Text,https://vocabs.sshopencloud.eu/vocabularies/me...,text


## 1 Look for vocabulary terms with the value from the column *Keyword to map*  

The function *getMPKeywordProperties(mKey)* is a custom function that uses the API endpoint

    GET https://marketplace-api.sshopencloud.eu/api/concept-search?types=keyword&q=VALUE

and returns the vocabulary terms for *mKey*.  

In the cell below, the returned dataset is filtered to keep only the values that come from the *sshoc-keyword* vocabulary (vocabulary[code]=sshoc-keyword); the resulting dataset is then shown.
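
As an illustration of the two steps described above, the sketch below builds the request URL and applies the vocabulary filter to plain Python records. Both helper functions and the sample records are hypothetical; the internals of *getMPKeywordProperties* are not shown in this notebook and may differ.

```python
from urllib.parse import urlencode

API_BASE = "https://marketplace-api.sshopencloud.eu/api"


def concept_search_url(query: str) -> str:
    """Build the concept-search URL restricted to keyword-type concepts."""
    return f"{API_BASE}/concept-search?" + urlencode({"types": "keyword", "q": query})


def from_vocabulary(records, vocabulary_code):
    """Keep the records whose vocabulary code equals vocabulary_code."""
    return [r for r in records if r.get("vocabulary", {}).get("code") == vocabulary_code]


# Hypothetical records, shaped like the rows displayed in this notebook
sample = [
    {"code": "history", "vocabulary": {"code": "sshoc-keyword"}},
    {"code": "6010", "vocabulary": {"code": "discipline"}},
]
kept = from_vocabulary(sample, "sshoc-keyword")
```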


In [5]:
# In this example the keyword searched for is the one in row number rown of the Google Sheet
rown=1
myKey=df_keywords.iloc[rown]['Keyword to map']
df_vocterms=mpdata.getMPKeywordProperties(myKey)
df_vocterms=df_vocterms[df_vocterms.vocabulary=={'code': 'sshoc-keyword'}]
df_vocterms

Unnamed: 0,code,vocabulary,label,notation,uri,types,candidate
0,history,{'code': 'sshoc-keyword'},History,,https://vocabs.dariah.eu/sshoc-keyword/history,[{'code': 'keyword'}],False
1,african-history,{'code': 'sshoc-keyword'},african-history,,https://vocabs.dariah.eu/sshoc-keyword/african...,[{'code': 'keyword'}],False
2,ancient-history,{'code': 'sshoc-keyword'},ancient-history,,https://vocabs.dariah.eu/sshoc-keyword/ancient...,[{'code': 'keyword'}],False
3,book-history,{'code': 'sshoc-keyword'},book-history,,https://vocabs.dariah.eu/sshoc-keyword/book-hi...,[{'code': 'keyword'}],False
4,bulgarian-history,{'code': 'sshoc-keyword'},bulgarian-history,,https://vocabs.dariah.eu/sshoc-keyword/bulgari...,[{'code': 'keyword'}],False
5,canadian-history,{'code': 'sshoc-keyword'},canadian-history,,https://vocabs.dariah.eu/sshoc-keyword/canadia...,[{'code': 'keyword'}],False
6,european-history,{'code': 'sshoc-keyword'},european-history,,https://vocabs.dariah.eu/sshoc-keyword/europea...,[{'code': 'keyword'}],False
7,film-history,{'code': 'sshoc-keyword'},film-history,,https://vocabs.dariah.eu/sshoc-keyword/film-hi...,[{'code': 'keyword'}],False
8,global-history,{'code': 'sshoc-keyword'},global-history,,https://vocabs.dariah.eu/sshoc-keyword/global-...,[{'code': 'keyword'}],False
9,history-of-science,{'code': 'sshoc-keyword'},history-of-science,,https://vocabs.dariah.eu/sshoc-keyword/history...,[{'code': 'keyword'}],False


The set of vocabulary terms is then filtered to keep only those that match the keyword exactly (case-insensitive).

In [6]:
df_vocterms=df_vocterms[(df_vocterms.code).str.lower()==myKey.lower()]
df_vocterms

Unnamed: 0,code,vocabulary,label,notation,uri,types,candidate
0,history,{'code': 'sshoc-keyword'},History,,https://vocabs.dariah.eu/sshoc-keyword/history,[{'code': 'keyword'}],False
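
The search returns every term that contains the keyword (for example *african-history*), which is why a second, exact comparison is needed. A minimal sketch of that comparison on a plain list of codes follows. `exact_matches` is a hypothetical helper, and it uses `casefold()` where the cell above uses `lower()`; the two behave the same for ASCII keywords.

```python
def exact_matches(codes, key):
    """Return the codes equal to key, ignoring case."""
    wanted = key.casefold()
    return [c for c in codes if c.casefold() == wanted]


codes = ["history", "african-history", "history-of-science"]
result = exact_matches(codes, "History")
```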


## 2 Look for the term in the column *Map to*  

The function *getMPConcepts()* is a custom function that uses the API endpoint:

    GET https://marketplace-api.sshopencloud.eu/api/concept-search?perpage=100&q=URI

to get all the *concepts* from the MarketPlace dataset.

**Note that executing this function may take some time: the run shown below returned 14921 records.**
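
With 100 records per page, retrieving roughly 15,000 concepts takes about 150 requests, which explains the running time. The sketch below shows one way such a paginated download could be structured. It is not the implementation of *getMPConcepts()*: the `concepts` and `pages` field names are assumptions about the response shape, and `fake_fetch_page` stands in for the HTTP call so that the example runs offline.

```python
def fetch_all(fetch_page):
    """Collect the records of every page returned by fetch_page(page_number)."""
    records = []
    page = 1
    while True:
        payload = fetch_page(page)
        records.extend(payload["concepts"])
        # Stop once the last page reported by the payload has been read
        if page >= payload["pages"]:
            break
        page += 1
    return records


def fake_fetch_page(page):
    """Stand-in for the HTTP request: three small pages of dummy records."""
    data = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
    return {"concepts": data[page], "pages": 3}


all_records = fetch_all(fake_fetch_page)
```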



In [7]:
df_concepts=mpdata.getMPConcepts()
df_concepts.count()

code          14921
vocabulary    14921
label         14921
notation      14921
uri           14921
types         14921
candidate     14921
definition      209
dtype: int64

A look at a few of the returned records (uncomment the line to display them):

In [8]:
#df_concepts.tail()

Search for the concept whose *uri* column equals the *Map to* value. There should be exactly one such concept.

In [9]:
test3=df_concepts[df_concepts.uri==df_keywords.iloc[rown]['Map to']]
test3.head()

Unnamed: 0,code,vocabulary,label,notation,uri,types,candidate,definition
6279,6010,{'code': 'discipline'},"History, Archaeology",6010,https://vocabs.acdh.oeaw.ac.at/oefosdiscipline...,[{'code': 'discipline'}],False,
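
Because the later steps read the first row of the result, it can be useful to check explicitly that exactly one concept matched. A minimal sketch on plain Python records, with a hypothetical helper `single_match` and placeholder URIs:

```python
def single_match(records, uri):
    """Return the only record with the given uri, or raise ValueError."""
    matches = [r for r in records if r.get("uri") == uri]
    if len(matches) != 1:
        raise ValueError(f"expected exactly 1 concept for {uri!r}, found {len(matches)}")
    return matches[0]


# Hypothetical records with placeholder URIs
concepts = [
    {"code": "A", "uri": "https://example.org/vocab/A"},
    {"code": "B", "uri": "https://example.org/vocab/B"},
]
concept = single_match(concepts, "https://example.org/vocab/B")
```

In the notebook the same idea could be expressed as a check on `len(test3)` before `test3.iloc[0]` is used.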


The *types* attribute contains the type code. Together with the concept, it forms a property of the following shape (the example uses the concept with code 6020):

    {
      "type": {
        "code": "discipline"
      },
      "concept": {
        "code": "6020",
        "vocabulary": {
          "code": "discipline"
        },
        "uri": "https://vocabs.acdh.oeaw.ac.at/oefosdisciplines/6020"
      }
    }

In [10]:
jsonType={}
jsonConceptVal={}
jsonConcept={}
jsonType["type"]=test3.iloc[0].vocabulary
jsonConceptVal["code"]=test3.iloc[0].code
jsonConceptVal["vocabulary"]=test3.iloc[0].vocabulary
jsonConceptVal["uri"]=test3.iloc[0].uri
jsonConcept["concept"]=jsonConceptVal
jsonConcept

{'concept': {'code': '6010',
  'vocabulary': {'code': 'discipline'},
  'uri': 'https://vocabs.acdh.oeaw.ac.at/oefosdisciplines/6010'}}
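
The same structure can be assembled in one step from a concept record. The sketch below is an illustration only: `build_property` is a hypothetical helper, and taking the type from the first entry of `types` is an assumption (the cell above reuses the vocabulary code, which has the same value for this concept).

```python
import json


def build_property(concept_row):
    """Build a {type, concept} property from one concept record."""
    return {
        # Assumption: the first entry of `types` is the type to assign
        "type": {"code": concept_row["types"][0]["code"]},
        "concept": {
            "code": concept_row["code"],
            "vocabulary": {"code": concept_row["vocabulary"]["code"]},
            "uri": concept_row["uri"],
        },
    }


# Record shaped like the row found above
row = {
    "code": "6010",
    "vocabulary": {"code": "discipline"},
    "uri": "https://vocabs.acdh.oeaw.ac.at/oefosdisciplines/6010",
    "types": [{"code": "discipline"}],
}
prop = build_property(row)
print(json.dumps(prop, indent=2))
```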

## 3 Go through all MP items to look for the *keywords-to-map* in the keyword-dynamic-property

For this we use the custom function *getAllPropertiesBySources()*, which returns all dynamic properties; for every property it also reports the main attributes and the PID of the item to which it belongs.

In [11]:
utils = hel.Util()
resultfields = ['persistentId', 'MPUrl', 'category', 'label', 'type.code', 'type.label',
                'concept.code', 'concept.label', 'concept.uri', 'concept.vocabulary.scheme']
udf_alprop = utils.getAllPropertiesBySources()
udf_alprop = udf_alprop.loc[:, resultfields]

In [12]:
udf_alprop.head()

Unnamed: 0,persistentId,MPUrl,category,label,type.code,type.label,concept.code,concept.label,concept.uri,concept.vocabulary.scheme
0,3IAyEp,tool-or-service/3IAyEp,tool-or-service,140kit,activity,Activity,https://vocabs.dariah.eu/tadirah/capturing,Capturing,https://vocabs.dariah.eu/tadirah/capturing,https://vocabs.dariah.eu/tadirah/
1,3IAyEp,tool-or-service/3IAyEp,tool-or-service,140kit,activity,Activity,https://vocabs.dariah.eu/tadirah/gathering,Gathering,https://vocabs.dariah.eu/tadirah/gathering,https://vocabs.dariah.eu/tadirah/
2,3IAyEp,tool-or-service/3IAyEp,tool-or-service,140kit,activity,Activity,https://vocabs.dariah.eu/tadirah/analyzing,Analyzing,https://vocabs.dariah.eu/tadirah/analyzing,https://vocabs.dariah.eu/tadirah/
3,3IAyEp,tool-or-service/3IAyEp,tool-or-service,140kit,activity,Activity,https://vocabs.dariah.eu/tadirah/visualAnalysis,Visual Analysis,https://vocabs.dariah.eu/tadirah/visualAnalysis,https://vocabs.dariah.eu/tadirah/
4,3IAyEp,tool-or-service/3IAyEp,tool-or-service,140kit,year,Year,,,,


Filter the dataset using the *Keyword to map* value and show the result (or part of it).

In [13]:
udf_alprop.reset_index(inplace=True)
test=udf_alprop[(udf_alprop['concept.label'].str.lower()==myKey.lower()) & (udf_alprop['type.code'].str.lower()=='keyword')]


In [18]:
test.iloc[0:3]

Unnamed: 0,index,persistentId,MPUrl,category,label,type.code,type.label,concept.code,concept.label,concept.uri,concept.vocabulary.scheme
467,467,9oXLps,tool-or-service/9oXLps,tool-or-service,Bibliography of the History of the Czech Lands,keyword,Keyword,history,History,https://vocabs.dariah.eu/sshoc-keyword/history,https://vocabs.dariah.eu/sshoc-keyword/Schema
2310,2310,KtkVLQ,tool-or-service/KtkVLQ,tool-or-service,Digital repository of the Institute of Ethnolo...,keyword,Keyword,history,History,https://vocabs.dariah.eu/sshoc-keyword/history,https://vocabs.dariah.eu/sshoc-keyword/Schema
3337,3337,rwapG0,tool-or-service/rwapG0,tool-or-service,"FORTH_05_""TheMaS"": An open source system for t...",keyword,Keyword,history,History,https://vocabs.dariah.eu/sshoc-keyword/history,https://vocabs.dariah.eu/sshoc-keyword/Schema
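
The filter combines two conditions: the property must be of type *keyword*, and its concept label must equal the keyword, ignoring case. A plain-Python sketch of the same logic is shown below. `keyword_properties` is a hypothetical helper and the rows are modelled on the outputs above, reduced to the columns the filter needs.

```python
def keyword_properties(rows, key):
    """Keep rows of type 'keyword' whose concept label equals key, ignoring case."""
    wanted = key.lower()
    return [
        row
        for row in rows
        # `or ""` guards against properties without a concept (label is None)
        if (row.get("concept.label") or "").lower() == wanted
        and (row.get("type.code") or "").lower() == "keyword"
    ]


rows = [
    {"persistentId": "9oXLps", "type.code": "keyword", "concept.label": "History"},
    {"persistentId": "3IAyEp", "type.code": "activity", "concept.label": "Capturing"},
    {"persistentId": "3IAyEp", "type.code": "year", "concept.label": None},
]
hits = keyword_properties(rows, "History")
```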


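Build the new property values (type and concept) and apply them to the first three matching items with *updateItemsProperties()*:

In [19]: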

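# Target concept, taken from the single row found in test3
jsonConceptVal={}
jsonConceptVal["code"]=test3.iloc[0].code
jsonConceptVal["vocabulary"]=test3.iloc[0].vocabulary
jsonConceptVal["uri"]=test3.iloc[0].uri
# New values for the property: its type and its concept
attrList={}
attrList["type"]=test3.iloc[0].vocabulary
attrList["concept"]=jsonConceptVal
# Filter passed to the update: the keyword, lower-cased
filterList={}
filterList["concept"]=myKey.lower()
# Apply the change to the first three matching items
mpdata.updateItemsProperties(test.iloc[0:3], attrList, filterList)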

Changing the property:  "type", from  "type": {'code': 'keyword', 'label': 'Keyword', 'type': 'concept', 'groupName': 'Categorisation', 'hidden': False, 'ord': 18, 'allowedVocabularies': [{'code': 'sshoc-keyword', 'scheme': 'https://vocabs.dariah.eu/sshoc-keyword/Schema', 'namespace': 'https://vocabs.dariah.eu/sshoc-keyword/', 'label': 'Keywords from SSHOC MP', 'closed': False}]} 
to
 "type": {'code': 'discipline'}", in item with pid: "tools/9oXLps"
(Log info: current version is: 63814)

Changing the property:  "concept", from  "concept": {'code': 'history', 'vocabulary': {'code': 'sshoc-keyword', 'scheme': 'https://vocabs.dariah.eu/sshoc-keyword/Schema', 'namespace': 'https://vocabs.dariah.eu/sshoc-keyword/', 'label': 'Keywords from SSHOC MP', 'closed': False}, 'label': 'History', 'notation': '', 'uri': 'https://vocabs.dariah.eu/sshoc-keyword/history', 'candidate': False} 
to
 "concept": {'code': '6010', 'vocabulary': {'code': 'discipline'}, 'uri': 'https://vocabs.acdh.oeaw.ac.at/oefo

The update log records, for each changed item, the version that can be restored:

In [21]:
testlog=mpdata._getLog()
testlog.tail(3)

Unnamed: 0,date,persistentId,category,restore_version,operation
0,"Fri, 21 Oct 2022 08:44:50 GMT",9oXLps,tools,63814,update
0,"Fri, 21 Oct 2022 08:46:58 GMT",KtkVLQ,tools,63813,update
0,"Fri, 21 Oct 2022 08:47:00 GMT",rwapG0,tools,63822,update


The last three updates are then reverted with *restoreItems()*:

In [22]:
mpdata.restoreItems(testlog.tail(3))

Restoring: https://sshoc-marketplace-api.acdh-dev.oeaw.ac.at/api/tools-services/9oXLps/versions/63814/revert
item 9oXLps, restored to the version 63814
result <Response [200]>
Restoring: https://sshoc-marketplace-api.acdh-dev.oeaw.ac.at/api/tools-services/KtkVLQ/versions/63813/revert
item KtkVLQ, restored to the version 63813
result <Response [200]>
Restoring: https://sshoc-marketplace-api.acdh-dev.oeaw.ac.at/api/tools-services/rwapG0/versions/63822/revert
item rwapG0, restored to the version 63822
result <Response [200]>
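
The restore URLs printed above follow a regular pattern built from the log columns. The sketch below reconstructs that pattern for one log row. It is an illustration, not the code of *restoreItems()*: `revert_url` and `CATEGORY_PATHS` are hypothetical names, only the `tools` mapping is visible in the output, and the HTTP method used for the request is not shown in this notebook.

```python
# Assumption: only the "tools" -> "tools-services" mapping is visible in the output above
CATEGORY_PATHS = {"tools": "tools-services"}


def revert_url(api_base, log_row):
    """Build the revert URL for one row of the update log."""
    path = CATEGORY_PATHS[log_row["category"]]
    return (
        f"{api_base}/{path}/{log_row['persistentId']}"
        f"/versions/{log_row['restore_version']}/revert"
    )


log_row = {"persistentId": "9oXLps", "category": "tools", "restore_version": 63814}
url = revert_url("https://sshoc-marketplace-api.acdh-dev.oeaw.ac.at/api", log_row)
```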


To be completed