# Notebook 5.1 - Curation-Keywords: delete

This notebook implements the workflow defined in the [Curating keywords](https://gitlab.gwdg.de/sshoc/marketplace-curation/-/issues/1#note_71056) GitLab issue for deleting wrong keywords.

The notebook works as follows:

0. Imports the external libraries and loads the MP dataset and the Google Sheet
1. Deletes the keywords marked as 'delete' in the Google Sheet


## 0 Requirements to run this notebook

This section covers what is needed to interact with the MP data: the libraries to import and the datasets to load.

### 0.1 Libraries
*Several external libraries are needed to run the notebook.*

*In addition, a dedicated SSH Open Marketplace library, sshmarketplacelib, provides customised functions and can be imported with the usual Python import statements.*

*The cell below imports the libraries needed to run this notebook.*

In [1]:
import numpy as np
import pandas as pd
import requests
# Import the MarketPlace library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel

### 0.2 Get the data



Get the MarketPlace dataset

In [2]:
mpdata = mpd()
df_tool_flat = mpdata.getMPItems("toolsandservices", True)
df_publication_flat = mpdata.getMPItems("publications", True)
df_trainingmaterials_flat = mpdata.getMPItems("trainingmaterials", True)
df_workflows_flat = mpdata.getMPItems("workflows", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


The custom function *getMPConcepts()* calls the API endpoint

GET https://marketplace-api.sshopencloud.eu/api/concept-search?perpage=100&q=URI

to retrieve all the *concepts* from the MarketPlace dataset.
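
As a rough illustration of the request behind this call, the sketch below only builds the query URL for the endpoint quoted above; it does not contact the API. The helper name `build_concept_search_url` is hypothetical and not part of sshmarketplacelib. How *getMPConcepts()* pages through results or parses the response is not shown in this notebook, so none of that is modelled here.

```python
from urllib.parse import urlencode

# Endpoint quoted in the text above.
CONCEPT_SEARCH_URL = "https://marketplace-api.sshopencloud.eu/api/concept-search"


def build_concept_search_url(query, perpage=100):
    """Return the concept-search URL for a query string (hypothetical helper)."""
    # urlencode escapes characters that are not safe in a query string.
    params = urlencode({"perpage": perpage, "q": query})
    return f"{CONCEPT_SEARCH_URL}?{params}"


print(build_concept_search_url("URI"))
```

The resulting string could be passed to `requests.get`, which the notebook already imports.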




In [3]:
df_concepts=mpdata.getMPConcepts()

In [None]:
df_concepts.head()

In [5]:
utils = hel.Util()
# Columns kept from the properties table for the keyword checks below
resultfields = ['persistentId', 'MPUrl', 'category', 'label', 'type.code', 'type.label',
                'concept.code', 'concept.label', 'concept.uri', 'concept.vocabulary.scheme']
udf_alprop = utils.getAllPropertiesBySources()
udf_alprop = udf_alprop.loc[:, resultfields]

## 1 Delete keywords

Get the list of keywords from the [Google Sheet](https://docs.google.com/spreadsheets/d/1-Oh9_SxIhfMAT6KNJrMf4LetCpy5s1fHZEyTL__TUVA/edit#gid=0)

In [None]:
sheet_id = '1-Oh9_SxIhfMAT6KNJrMf4LetCpy5s1fHZEyTL__TUVA'
sheet_name = 'Mappings'
rejurl = f'https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}'
df_rej_keywords=pd.read_csv(rejurl)
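
The URL in the cell above interpolates the sheet name directly, which works for a name like 'Mappings'. A sheet name containing spaces or other special characters would need percent-encoding first. A minimal sketch, assuming the same URL pattern as in the cell; the helper name `gsheet_csv_url` and the placeholder id `SHEET_ID` are hypothetical:

```python
from urllib.parse import quote


def gsheet_csv_url(sheet_id, sheet_name):
    """Build the CSV export URL pattern used above, escaping the sheet name."""
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/gviz/tq?tqx=out:csv&sheet={quote(sheet_name)}"
    )


print(gsheet_csv_url("SHEET_ID", "Mappings"))
print(gsheet_csv_url("SHEET_ID", "My Mappings"))
```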

In [None]:
# Keep only the rows whose 'Map to' value is exactly 'delete', then drop duplicate rows
df_rej_keywords = df_rej_keywords[df_rej_keywords['Map to'] == 'delete'].drop_duplicates(keep='first')
df_rej_keywords.head(10)
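
To make the effect of this filter concrete, here is a small sketch on invented rows. Only the column names 'Keyword to map' and 'Map to' come from the sheet; the values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the 'Mappings' sheet; the rows are invented.
toy = pd.DataFrame({
    "Keyword to map": ["foo", "foo", "bar", "baz"],
    "Map to": ["delete", "delete", "some-concept", "delete"],
})

# Keep rows flagged for deletion, then drop exact duplicate rows.
to_delete = toy[toy["Map to"] == "delete"].drop_duplicates(keep="first")
print(to_delete["Keyword to map"].tolist())
```

The comparison is exact, so a cell containing 'Delete' or ' delete' would not be selected.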

In [None]:
# Silence pandas' SettingWithCopyWarning (the matches are also copied explicitly below).
pd.options.mode.chained_assignment = None
rejected_frames = []
for _, row in df_rej_keywords.iterrows():
    rk = row['Keyword to map']
    if not isinstance(rk, str):
        # Skip rows with an empty keyword cell (read by pandas as NaN).
        continue
    # Case-insensitive match of the keyword against the concept labels.
    df_items_wrk = udf_alprop.loc[udf_alprop['concept.label'].str.lower() == rk.lower()].copy()

    if df_items_wrk.empty:
        print(f"\n%%%%%%%%  No items found for {rk}")
        continue
    print(f'\n Keyword {rk} found as {df_items_wrk.iloc[0]["type.code"]}\n')

    # Every matching row gets the same filter (the keyword, lower-cased)
    # and an empty update list.
    filterList = {"concept": rk.lower()}
    jsonmapto = []
    df_items_wrk['filterList'] = [filterList] * len(df_items_wrk)
    df_items_wrk['updateList'] = [jsonmapto] * len(df_items_wrk)

    # Drop duplicate rows; astype(str) makes the dict and list columns comparable.
    unique_idx = df_items_wrk.astype(str).drop_duplicates(keep='first').index
    rejected_frames.append(df_items_wrk.loc[unique_idx])

# Concatenate once at the end instead of growing a DataFrame inside the loop.
rejectedItems = pd.concat(rejected_frames) if rejected_frames else pd.DataFrame()

attrList = {}
filterList = {}
# rejectedItems.head()
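
The loop above matches one keyword at a time. As an alternative sketch, and not the method this notebook uses, the same case-insensitive matching can be expressed as a single pandas merge. The two frames below are invented stand-ins for `udf_alprop` and `df_rej_keywords`, and the helper column `_key` is hypothetical.

```python
import pandas as pd

# Invented stand-ins for udf_alprop and df_rej_keywords.
items = pd.DataFrame({
    "persistentId": ["a1", "b2", "c3"],
    "concept.label": ["Foo", "bar", "FOO"],
})
keywords = pd.DataFrame({"Keyword to map": ["foo", "qux"]})

# Normalise both sides to lower case so the comparison ignores case.
items["_key"] = items["concept.label"].str.lower()
keywords["_key"] = keywords["Keyword to map"].str.lower()

# Left merge from the keywords; the indicator column marks keywords
# that have no matching item.
merged = keywords.merge(items, on="_key", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Keyword to map"].tolist()
matched = merged.loc[merged["_merge"] == "both", ["persistentId", "concept.label"]]

print(unmatched)
print(matched["persistentId"].tolist())
```

Here `unmatched` plays the role of the "No items found" messages, and `matched` holds the items that carry a keyword marked for deletion.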
