# Notebook 4.4 - Actors curation: delete actors not associated to any item

This notebook helps moderators to curate Actors with no items in the SSH Open Marketplace. 

This notebook is composed of four sections:

0. Requirements to run this notebook
1. Get all Actors from the MP dataset 
2. Find Actors not associated to any item
3. Delete Actors not associated to any item

## 0 Requirements to run this notebook

This section gives all the relevant information to "interact" with the MP data.

### 0.1 Libraries
*A number of external libraries are needed to run the notebook.*


In [None]:
import pandas as pd  # to manage dataframes
import numpy as np  # numerical arrays (NumPy)

# import the SSH Open Marketplace library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel

In [None]:
mpdata = mpd()
utils=hel.Util()
check=eva.URLCheck()

## 1 Get actors

In [None]:
df_tool_flat = mpdata.getMPItems("toolsandservices", True)
df_publication_flat = mpdata.getMPItems("publications", True)
df_trainingmaterials_flat = mpdata.getMPItems("trainingmaterials", True)
df_workflows_flat = mpdata.getMPItems("workflows", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)

In [None]:
df_actors_flat = mpdata.getMPItems("actors", False)

Collapse repeated whitespace in Actor names and strip leading and trailing spaces.

In [None]:
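# collapse runs of whitespace and trim the ends; non-string values are left unchanged
df_actors_flat['norm_name'] = df_actors_flat['name'].apply(lambda y: ' '.join(y.split()) if isinstance(y, str) else y)
# keep one row per Actor id
df_actors_flat_no_duplicates = df_actors_flat.sort_values('name').drop_duplicates(subset=['id'], keep='first', ignore_index=True)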

print(f'Number of actors: {df_actors_flat.shape[0]}')
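
The whitespace normalization above can be tried in isolation. This is a minimal sketch on a toy dataframe (the names in `toy` are made up for illustration and are not Marketplace data):

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"name": ["  Jane   Doe ", "John\tSmith"]})

# str.split() with no argument splits on any run of whitespace and drops
# leading/trailing whitespace; joining with one space rebuilds the name.
toy["norm_name"] = toy["name"].apply(
    lambda y: " ".join(y.split()) if isinstance(y, str) else y
)

print(toy["norm_name"].tolist())  # ['Jane Doe', 'John Smith']
```

The `isinstance` guard matters because missing names are not strings and would otherwise raise an `AttributeError` on `.split()`.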

Reduce the number of Actors to inspect by keeping only those with no associated items in the current dataset.

In [None]:
df_contrib=utils.getContributors()
# anti-join: keep only Actors whose id does not appear among the contributors
df_no = df_actors_flat.merge(
    df_contrib['actor.id'], left_on='id', right_on='actor.id', how='outer', indicator=True
).loc[lambda x: x['_merge'] == 'left_only']
df_no.shape[0]
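
The merge with `indicator=True` acts as an anti-join: rows flagged `left_only` exist in the Actors table but have no match among the contributors. The sketch below shows the idea on toy data; the `actors` and `contributors` frames are invented, and it uses `how="left"` instead of the notebook's `how="outer"`, which selects the same `left_only` rows.

```python
import pandas as pd

# Toy tables, invented for illustration only
actors = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["A", "B", "C", "D"]})
contributors = pd.DataFrame({"actor.id": [2, 2, 4]})

merged = actors.merge(
    contributors["actor.id"].drop_duplicates(),  # one row per contributing actor
    left_on="id",
    right_on="actor.id",
    how="left",
    indicator=True,  # adds a '_merge' column: 'left_only', 'right_only' or 'both'
)

# Actors that never appear as contributors
without_items = merged.loc[merged["_merge"] == "left_only", ["id", "name"]]
print(without_items["id"].tolist())  # [1, 3]
```

A left merge keeps the `id` column free of the missing values that an outer merge introduces for unmatched right-hand rows, which is one way to avoid ids being converted to floats.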

## 2 Find actors not associated to any item
The code below searches for Actors not associated with any item using the API endpoint:

    /api/actors/{id}?items=true
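
For reference, the path above can be expanded for a given Actor id as sketched here. The helper `actor_items_url` is hypothetical, and the base URL is a placeholder assumption; in the notebook the request itself is issued by `mpdata.getItemsforActor`.

```python
def actor_items_url(actor_id, base_url="https://example.org"):
    """Build the URL of the Actor endpoint with items included.

    `base_url` is a placeholder; replace it with the API host in use.
    """
    return f"{base_url.rstrip('/')}/api/actors/{actor_id}?items=true"


print(actor_items_url(42))  # https://example.org/api/actors/42?items=true
```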


In [None]:
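rows_without_items = []  # rows of df_no for which the API returns no items
print('Start...')
for mytindex, mytrow in df_no.iterrows():
    # ids may come through as floats (e.g. 12.0), so drop the trailing '.0'
    actitems = mpdata.getItemsforActor(str(mytrow.id).replace('.0', ''))
    if actitems.empty:
        rows_without_items.append(mytrow)
        if len(rows_without_items) % 100 == 0:
            print(f'found {len(rows_without_items)} actors,...')
        #print (f'{mytrow.id}, {mytrow["norm_name"]}')

# build the dataframe once instead of concatenating inside the loop
df_actors_ei = pd.DataFrame(rows_without_items)
tot = len(rows_without_items)
print(f'Number of actors not associated to items: {tot}')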
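
The loop above turns each id into a string by removing `'.0'` with `replace`. That works for whole-number ids, but `replace` removes `'.0'` wherever it occurs in the string. A stricter conversion could look like the sketch below; `actor_id_to_str` is a hypothetical helper, not part of `sshmarketplacelib`.

```python
import math


def actor_id_to_str(value):
    """Return an id as a plain string, turning whole floats like 12.0 into '12'."""
    if isinstance(value, float):
        if math.isnan(value):
            raise ValueError("missing actor id")
        if value.is_integer():
            return str(int(value))
    return str(value)


print(actor_id_to_str(12.0))  # 12
print(actor_id_to_str(12))    # 12
print(actor_id_to_str("12"))  # 12
```

Raising on a missing id makes an unexpected `NaN` visible instead of sending the string `'nan'` to the API.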

In [None]:
#df_actors_ei.to_pickle('data/actors_no_items.pickle')
#df_actors_ei = pd.read_pickle('data/actors_no_items.pickle')

In [None]:
df_actors_ei.sort_values('id').iloc[-6:]

## 3 Delete actors never associated to any item

_WARNING: some Actors may not be deleted because they are associated with deleted items or affiliated with other Actors; in those cases a '500' status is returned. This is a known issue and will be fixed._

In [None]:
# delete each Actor found above; ids are passed to the API as plain strings
for niindex, nirow in df_actors_ei.sort_values('id').iterrows():
    mpdata.deleteItem('actors', str(nirow.id).replace('.0', ''))
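
Because some deletions are expected to fail with a '500' status, it can help to record which ids were deleted and which were not. The sketch below assumes a hypothetical `delete_fn` that returns an HTTP status code; what `mpdata.deleteItem` actually returns is not documented here, so `fake_delete` stands in for it.

```python
def delete_actors(actor_ids, delete_fn):
    """Call delete_fn for each id and split the ids by outcome.

    delete_fn is assumed (hypothetically) to return an HTTP status code.
    """
    deleted, failed = [], []
    for actor_id in actor_ids:
        status = delete_fn(actor_id)
        if 200 <= status < 300:
            deleted.append(actor_id)
        else:
            failed.append((actor_id, status))
    return deleted, failed


def fake_delete(actor_id):
    """Stand-in for a real delete call: Actor '7' cannot be deleted."""
    return 500 if actor_id == "7" else 200


deleted, failed = delete_actors(["5", "7", "9"], fake_delete)
print(deleted)  # ['5', '9']
print(failed)   # [('7', 500)]
```

The `failed` list can then be reviewed manually, for example to check whether those Actors are affiliated with other Actors.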