# Notebook 3.1 - Curation-flag-URL

This notebook analyses the URL-based fields of the SSH Open Marketplace and writes the results back to the system via two dedicated curation properties: `curation-flag-url` and `curation-detail`.

This notebook flags Marketplace items that have errors in their URL-based fields, helping Moderators identify curation priorities to improve data quality. 

This notebook is part of a series of 4 notebooks that inform the curation properties used in the SSH Open Marketplace Editorial Dashboard.

It is composed of 4 sections:

0. Requirements to run the notebook
1. Check & flag error values in `accessibleAt`
2. Check & flag error values in URL-based properties
3. Check & flag error values in URL-based properties for a given source - example


## 0. Requirements to run this notebook

This section covers everything needed to interact with the Marketplace (MP) data.

### 0.1 Libraries
*A number of external libraries are needed to run the notebook.*

*In addition, a dedicated SSH Open Marketplace library, `sshmarketplacelib`, provides customised functions and can be imported with the usual Python import statements.*

*The cell below imports the libraries needed to run this notebook.*

In [1]:
import numpy as np
import pandas as pd
import requests
# Import the SSH Open Marketplace library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel

### 0.2 Get the data

The cell below loads the items of each Marketplace category into its own dataframe.

In [2]:
mpdata = mpd()
df_tool_flat = mpdata.getMPItems("toolsandservices", True)
df_publication_flat = mpdata.getMPItems("publications", True)
df_trainingmaterials_flat = mpdata.getMPItems("trainingmaterials", True)
df_workflows_flat = mpdata.getMPItems("workflows", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)

getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...
getting data from local repository...


### 0.3 A look at the data

`df_all_items.head()` shows the first 5 rows of the dataframe.

`df_all_items.tail()` shows the last 5 rows of the dataframe.

`df_all_items.shape` gives the dataframe shape (number of rows and columns).


In [3]:
df_all_items=pd.concat([df_tool_flat, df_publication_flat, df_trainingmaterials_flat, df_workflows_flat, df_datasets_flat])
df_all_items.head()

Unnamed: 0,id,category,label,persistentId,lastInfoUpdate,status,description,contributors,properties,externalIds,...,thumbnail.info.mediaId,thumbnail.info.category,thumbnail.info.filename,thumbnail.info.mimeType,thumbnail.info.hasThumbnail,thumbnail.info.location.sourceUrl,thumbnail.caption,dateCreated,dateLastUpdated,composedOf
0,45163,tool-or-service,140kit,SIU1nO,2022-10-27T16:29:58+0000,approved,140kit provides a management layer for tweet c...,"[{'actor': {'id': 2224, 'name': 'Ian Pearce, D...","[{'type': {'code': 'mode-of-use', 'label': 'Mo...",[],...,,,,,,,,,,
1,36324,tool-or-service,3DF Zephyr - photogrammetry software - 3d mode...,4gDAHv,2022-01-13T11:49:02+0000,approved,3DF Zephyr\[1\]\[2\] is a commercial photogram...,[],"[{'type': {'code': 'language', 'label': 'Langu...",[],...,,,,,,,,,,
2,48611,tool-or-service,3DHOP,UcxOmD,2022-11-04T09:34:17+0000,approved,3DHOP (3D Heritage Online Presenter) is an ope...,[],"[{'type': {'code': 'language', 'label': 'Langu...",[],...,,,,,,,,,,
3,42055,tool-or-service,3DHOP: 3D Heritage Online Presenter,uFIMPQ,2022-09-12T16:59:36+0000,approved,No description provided.,[],[{'type': {'code': 'curation-flag-description'...,[],...,,,,,,,,,,
4,42056,tool-or-service,3DReshaper \| 3DReshaper,kAkzuz,2022-09-12T16:59:36+0000,approved,No description provided.,[],"[{'type': {'code': 'language', 'label': 'Langu...",[],...,,,,,,,,,,


`df_all_items_work` selects the columns/attributes of interest.

In [4]:
df_all_items_work=df_all_items[['id', 'persistentId', 'category', 'label', 'contributors', 'accessibleAt', 'source.label']]
df_all_items_work.tail()

Unnamed: 0,id,persistentId,category,label,contributors,accessibleAt,source.label
1139,49109,gKJuvx,dataset,York-Helsinki Parsed Corpus of Old English Poetry,[],[http://hdl.handle.net/20.500.12024/2425],CLARIN Resource Families
1140,12589,YnEaU0,dataset,"""You Are Where You Tweet: A Content-Based Appr...","[{'actor': {'id': 1752, 'name': 'Eva Bacas', '...",[https://archive.org/details/twitter_cikm_2010],Humanities Data
1141,49297,x2xkU7,dataset,Zurich English Newspaper Corpus,[],[http://www.helsinki.fi/varieng/CoRD/corpora/Z...,CLARIN Resource Families
1142,49595,fT0AeP,dataset,Zweite Generation deutschsprachiger Migranten ...,[],[http://hdl.handle.net/10932/00-0332-C453-CEDC...,CLARIN Resource Families
1143,49408,ekoDXk,dataset,μtopia,[],[http://www.cs.cmu.edu/~lingwang/microtopia/#o...,CLARIN Resource Families


## 1. Check & flag error values in `accessibleAt`

`accessibleAt` is the main URL field of MP items.

The following cell counts the items whose `accessibleAt` field is empty.

In [5]:
# Items whose accessibleAt list is empty
df_all_items_work_emptyurls = df_all_items_work[df_all_items_work['accessibleAt'].str.len() == 0]

# Number of rows in the filtered frame
emptyurldescriptionsn = len(df_all_items_work_emptyurls)

print(f'\n There are {emptyurldescriptionsn} items without accessibleAt URLs\n')


 There are 564 items without accessibleAt URLs
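
The filter above relies on `accessibleAt` holding a list of URLs per item, so `.str.len()` returns the number of URLs in each cell rather than a string length. A minimal sketch on a made-up three-row frame (the identifiers and URLs are illustrative, not Marketplace data):

```python
import pandas as pd

# Toy frame: each accessibleAt cell is a list of URLs, as in the flattened MP data
items = pd.DataFrame({
    "persistentId": ["aaa111", "bbb222", "ccc333"],
    "accessibleAt": [
        ["https://example.org/tool"],
        [],
        ["https://example.org/a", "https://example.org/b"],
    ],
})

# On list-valued cells, .str.len() gives the number of elements in each list
url_counts = items["accessibleAt"].str.len()

# Keep only the rows whose list is empty
empty_items = items[url_counts == 0]
n_empty = len(empty_items)

print(f"{n_empty} item(s) without accessibleAt URLs")
```

Note that a cell containing a missing value instead of an empty list would yield `NaN` here and would not be counted, so this check only covers explicitly empty lists.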



### 1.1 Check the validity of URLs in the accessibleAt property using the HTTP Result Status code

The code below executes an HTTP call for every URL, waits for the [status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) of the response and records it.
Depending on connection speed and server response times, processing all URLs may take several minutes.


In [6]:
#The list of categories is defined in the following statement

categories="toolsandservices, publications, trainingmaterials, workflows, datasets"

check=eva.URLCheck()
df_urls=check.checkURLValues(categories, 'accessibleAt')
df_urls.head()

inspecting accessibleAt


Unnamed: 0,MPUrl,persistentId,category,label,property,url,status
0,tool-or-service/SIU1nO,SIU1nO,tool-or-service,140kit,accessibleAt,https://github.com/WebEcologyProject/140kit,200
1,tool-or-service/4gDAHv,4gDAHv,tool-or-service,3DF Zephyr - photogrammetry software - 3d mode...,accessibleAt,https://www.3dflow.net/3df-zephyr-pro-3d-model...,200
2,tool-or-service/UcxOmD,UcxOmD,tool-or-service,3DHOP,accessibleAt,http://vcg.isti.cnr.it/3dhop/,200
3,tool-or-service/uFIMPQ,uFIMPQ,tool-or-service,3DHOP: 3D Heritage Online Presenter,accessibleAt,https://github.com/cnr-isti-vclab/3DHOP,200
4,tool-or-service/kAkzuz,kAkzuz,tool-or-service,3DReshaper \| 3DReshaper,accessibleAt,https://www.3dreshaper.com/en/,200
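
As a rough illustration of what a per-URL status check involves, the sketch below uses only the Python standard library. It is not the implementation of `checkURLValues`; the function names `http_status` and `check_urls` are hypothetical, and the demo injects a fake lookup so that it runs without network access.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers are raised as exceptions but still carry a status code
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL
        return None


def check_urls(urls, get_status=http_status):
    """Map each distinct URL to its status, calling get_status once per URL."""
    return {url: get_status(url) for url in dict.fromkeys(urls)}


# Demo with a fake lookup table instead of real HTTP calls
fake_statuses = {"https://example.org/ok": 200, "https://example.org/gone": 404}
sample_urls = ["https://example.org/ok", "https://example.org/gone", "https://example.org/ok"]
statuses = check_urls(sample_urls, get_status=fake_statuses.get)
print(statuses)
```

Checking each distinct URL only once keeps the number of requests down when several items share a link. Some servers answer `HEAD` requests differently from `GET`, so a real checker may need a fallback.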


In [7]:
df_urls.drop_duplicates(keep='first', inplace=True)
#df_urls.head()

In [8]:
utils=hel.Util()
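# URLs that returned 404 (Not Found)
df_http_status_nf_err=df_urls[df_urls['status'] == 404].sort_values('persistentId').drop_duplicates(keep='first', inplace=False)
# NOTE: despite the "serv_err" name, this selects URLs that returned 200 (OK), not server errors
df_http_status_serv_err=df_urls[df_urls['status'] == 200].sort_values('persistentId').drop_duplicates(keep='first', inplace=False)
# Combined frame, used later in the flagging step
df_http_status_err=pd.concat([df_http_status_nf_err, df_http_status_serv_err])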
#df_http_status_err=df_urls.sort_values('persistentId').drop_duplicates(keep='first', inplace=False)
#df_http_status_err.to_pickle('data/urlstatus.pickle')
#df_http_status_nf_err.to_pickle('data/urlstatus404.pickle')
myclickable_table = df_http_status_nf_err.style.format({'MPUrl': utils.make_clickable})
myclickable_table

Unnamed: 0,MPUrl,persistentId,category,label,property,url,status
5058,training-material/0y7yIY,0y7yIY,training-material,Techniques d'anonymisation,accessibleAt,http://journal-sfds.fr/index.php/stat_soc/article/view/398,404
1310,tool-or-service/1vuuAw,1vuuAw,tool-or-service,test merge url,accessibleAt,https://acdh.oeaw.ac.at/isl/testurl,404
1520,tool-or-service/25grr1,25grr1,tool-or-service,VLE - Viennese Lexicographic Editor,accessibleAt,https://www.oeaw.ac.at/en/acdh/tools/vle/,404
278,tool-or-service/2YMYaD,2YMYaD,tool-or-service,DocuBurst,accessibleAt,http://vialab.science.uoit.ca/portfolio/docuburst,404
6497,dataset/3W5E6J,3W5E6J,dataset,Tübingen Treebank of Written German / Newspaper Corpus,accessibleAt,http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html,404
338,tool-or-service/3ZpHvD,3ZpHvD,tool-or-service,Finding People or Characters from A Text (Named-Entity Recognition),accessibleAt,https://github.com/TAPoR-3-Tools/Tapor-Coding-Tools/tree/master/tapor_coding_tools/natural%20language%20processing/Finging%20people%20or%20characters%20with%20NER,404
5779,dataset/44i9Jt,44i9Jt,dataset,KVIS Thai OCR Dataset,accessibleAt,https://figshare.com/articles/KVIS_Thai_OCR_Dataset/8963987,404
1502,tool-or-service/6ArGjA,6ArGjA,tool-or-service,VINCI,accessibleAt,http://research.cs.queensu.ca/CompLing/,404
1279,tool-or-service/6xmMnK,6xmMnK,tool-or-service,Tagger - Other (TAPoRware),accessibleAt,http://taporware.ualberta.ca/~taporware/otherTools/tagger.shtml,404
915,tool-or-service/7KpXCE,7KpXCE,tool-or-service,Meaki,accessibleAt,http://www.meaki.com/,404
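
`utils.make_clickable` is used above as a formatter for the `MPUrl` column. Its implementation is not shown in this notebook; a formatter of this kind could look like the following sketch, where the function name and the base URL are assumptions.

```python
from html import escape

# Assumed public base URL of the Marketplace; adjust for other instances
MP_BASE_URL = "https://marketplace.sshopencloud.eu/"


def make_clickable_sketch(mp_path, base_url=MP_BASE_URL):
    """Turn a path such as 'tool-or-service/SIU1nO' into an HTML anchor."""
    href = escape(base_url + mp_path, quote=True)
    return f'<a href="{href}" target="_blank">{escape(mp_path)}</a>'


link = make_clickable_sketch("tool-or-service/SIU1nO")
print(link)
```

A function like this can be passed to `DataFrame.style.format` for a single column, in the same way as `utils.make_clickable` in the cell above.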


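## 2. Check & flag error values in URL-based properties

Besides `accessibleAt`, items can carry URLs in properties of type `url`. The cells below list these properties and check the HTTP status of their values.
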
In [9]:
df_properties=utils.getProperties()
df_properties.head()

Unnamed: 0,type.code,type.label,type.type,type.groupName,type.hidden,type.ord,type.allowedVocabularies,concept.code,concept.vocabulary.code,concept.vocabulary.scheme,...,concept.vocabulary.closed,concept.label,concept.notation,concept.uri,concept.candidate,concept.definition,value,label,persistentId,category
0,mode-of-use,Mode of use,concept,Categorisation,False,22,"[{'code': 'invocation-type', 'scheme': 'https:...",webApplication,invocation-type,https://vocabs.sshopencloud.eu/vocabularies/in...,...,True,Web application,,https://vocabs.sshopencloud.eu/vocabularies/in...,False,,,140kit,SIU1nO,tool-or-service
1,activity,Activity,concept,Categorisation,False,17,"[{'code': 'tadirah2', 'scheme': 'https://vocab...",capturing,tadirah2,https://vocabs.dariah.eu/tadirah/,...,True,Capturing,,https://vocabs.dariah.eu/tadirah/capturing,False,,,140kit,SIU1nO,tool-or-service
2,activity,Activity,concept,Categorisation,False,17,"[{'code': 'tadirah2', 'scheme': 'https://vocab...",dataVisualization,tadirah2,https://vocabs.dariah.eu/tadirah/,...,True,Data Visualization,,https://vocabs.dariah.eu/tadirah/dataVisualiza...,False,creation and study of the visual representatio...,,140kit,SIU1nO,tool-or-service
3,activity,Activity,concept,Categorisation,False,17,"[{'code': 'tadirah2', 'scheme': 'https://vocab...",analyzing,tadirah2,https://vocabs.dariah.eu/tadirah/,...,True,Analyzing,,https://vocabs.dariah.eu/tadirah/analyzing,False,,,140kit,SIU1nO,tool-or-service
4,activity,Activity,concept,Categorisation,False,17,"[{'code': 'tadirah2', 'scheme': 'https://vocab...",analyzing,tadirah2,https://vocabs.dariah.eu/tadirah/,...,True,Analyzing,,https://vocabs.dariah.eu/tadirah/analyzing,False,,,140kit,SIU1nO,tool-or-service


In [10]:
df_properties["type.type"].unique()

array(['concept', 'boolean', 'string', 'int', 'url', 'date', 'float'],
      dtype=object)

In [11]:
df_properties_url=df_properties[df_properties["type.type"]=="url"]
df_properties_url

Unnamed: 0,type.code,type.label,type.type,type.groupName,type.hidden,type.ord,type.allowedVocabularies,concept.code,concept.vocabulary.code,concept.vocabulary.scheme,...,concept.vocabulary.closed,concept.label,concept.notation,concept.uri,concept.candidate,concept.definition,value,label,persistentId,category
379,see-also,See also,url,Context,False,28,,,,,...,,,,,,,https://www.dariah.eu/tools-services/tools-and...,Bibliography Doing Digital Humanities,qg91Bj,tool-or-service
396,see-also,See also,url,Context,False,28,,,,,...,,,,,,,http://www.hiu.cas.cz/cs/,Bibliography of the History of the Czech Lands,yC3sFv,tool-or-service
925,user-manual-url,User Manual URL,url,Context,False,29,,,,,...,,,,,,,https://du.cesnet.cz/en/navody/object_storage/...,CESNET DataCare - Object Based Storage,nCk2Fz,tool-or-service
926,terms-of-use-url,Terms of Use URL,url,Access,False,2,,,,,...,,,,,,,https://du.cesnet.cz/en/provozni_pravidla/start,CESNET DataCare - Object Based Storage,nCk2Fz,tool-or-service
927,privacy-policy-url,Privacy policy URL,url,Access,False,6,,,,,...,,,,,,,https://www.cesnet.cz/cesnet/personal-data-pro...,CESNET DataCare - Object Based Storage,nCk2Fz,tool-or-service
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40514,see-also,See also,url,Context,False,28,,,,,...,,,,,,,http://stnt.ijp.pan.pl/idxlac/index,XV century New Testament translations (Piętnas...,RB8myI,dataset
40529,see-also,See also,url,Context,False,28,,,,,...,,,,,,,http://hdl.handle.net/20.500.12024/2425,York-Helsinki Parsed Corpus of Old English Poetry,gKJuvx,dataset
40545,see-also,See also,url,Context,False,28,,,,,...,,,,,,,http://hdl.handle.net/10932/00-0332-C453-CEDC-...,Zweite Generation deutschsprachiger Migranten ...,fT0AeP,dataset
40546,see-also,See also,url,Context,False,28,,,,,...,,,,,,,http://hdl.handle.net/10932/00-0332-C453-CEDC-...,Zweite Generation deutschsprachiger Migranten ...,fT0AeP,dataset


In [12]:
df_properties_url["type.code"].unique()

array(['see-also', 'user-manual-url', 'terms-of-use-url',
       'privacy-policy-url', 'access-policy-url', 'helpdesk-url',
       'service-level-url'], dtype=object)
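
The property codes listed above are the ones passed to `checkURLValuesInDataset` as a comma-separated string in the following cells. A small sketch, on toy data, of deriving that string from a properties frame instead of typing it by hand (the frame below is a made-up stand-in for the output of `utils.getProperties()`):

```python
import pandas as pd

# Toy stand-in for the properties frame, reduced to three columns
properties = pd.DataFrame({
    "type.code": ["activity", "see-also", "user-manual-url", "see-also", "language"],
    "type.type": ["concept", "url", "url", "url", "concept"],
    "value": [None, "https://example.org/a", "https://example.org/manual",
              "https://example.org/b", None],
})

# Keep only the URL-typed properties and collect their distinct codes
url_properties = properties[properties["type.type"] == "url"]
url_codes = url_properties["type.code"].unique()

# Comma-separated form, matching the way the property list is written in this notebook
props_argument = ", ".join(url_codes)
print(props_argument)
```

The codes come out in order of first appearance, which may differ from a hand-written list.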

In [13]:
df_all_items.count()

id                                         6050
category                                   6050
label                                      6050
persistentId                               6050
lastInfoUpdate                             6050
status                                     6050
description                                6050
contributors                               6050
properties                                 6050
externalIds                                6050
accessibleAt                               6050
sourceItemId                               5874
relatedItems                               6050
media                                      6050
informationContributor.id                  6050
informationContributor.username            6050
informationContributor.displayName         6050
informationContributor.status              6050
informationContributor.registrationDate    6050
informationContributor.role                6050
informationContributor.config           

In [14]:
urls_df_properties=check.checkURLValuesInDataset(df_all_items.iloc[0:3000], 'terms-of-use-url, user-manual-url, privacy-policy-url, access-policy-url, service-level-url, see-also, helpdesk-url')
urls_df_properties.tail()

inspecting terms-of-use-url
inspecting user-manual-url
inspecting privacy-policy-url
inspecting access-policy-url
inspecting service-level-url
inspecting see-also
inspecting helpdesk-url


Unnamed: 0,MPUrl,persistentId,category,label,property,url,status
850,tool-or-service/SBcPlO,SBcPlO,tool-or-service,Virtual Language Observatory,see-also,https://vlo.clarin.eu/#tour,200
851,tool-or-service/BDR3JZ,BDR3JZ,tool-or-service,Visual Media Service Virtual Research Environment,see-also,https://services.d4science.org/web/visualmedia,200
852,tool-or-service/BDR3JZ,BDR3JZ,tool-or-service,Visual Media Service Virtual Research Environment,see-also,https://services.d4science.org/web/visualmedia,200
853,tool-or-service/BDR3JZ,BDR3JZ,tool-or-service,Visual Media Service Virtual Research Environment,see-also,https://services.d4science.org/web/visualmedia,200
854,tool-or-service/BDR3JZ,BDR3JZ,tool-or-service,Visual Media Service Virtual Research Environment,see-also,https://services.d4science.org/web/visualmedia,200


In [15]:
# Property URLs that returned 404 (Not Found)
df_prop_http_status_nf_err=urls_df_properties[urls_df_properties['status'] == 404].sort_values('persistentId').drop_duplicates(keep='first', inplace=False)
# NOTE: despite the "serv_err" name, this selects URLs that returned 200 (OK), not server errors
df_prop_http_status_serv_err=urls_df_properties[urls_df_properties['status'] == 200].sort_values('persistentId').drop_duplicates(keep='first', inplace=False)
# Combined frame: 404 rows first, then 200 rows
df_prop_http_status_err=pd.concat([df_prop_http_status_nf_err, df_prop_http_status_serv_err])

myclickable_table = df_prop_http_status_err.style.format({'MPUrl': utils.make_clickable})
myclickable_table

Unnamed: 0,MPUrl,persistentId,category,label,property,url,status
391,tool-or-service/8jJUMW,8jJUMW,tool-or-service,DARIAH-DE Data Federation Architecture (DFA),see-also,https://de.dariah.eu/en/data-federation-architecture,404
563,tool-or-service/EBuVud,EBuVud,tool-or-service,ELDAH Consent Form Wizard,see-also,https://consent.dariah.eu/startpage,404
699,tool-or-service/aqevCc,aqevCc,tool-or-service,Fran- Dictionaries of the Fran Ramovš Institute of the Slovenian Language ZRC SAZU,see-also,http://www.zrc-sazu.si/en/node,404
371,tool-or-service/yC3sFv,yC3sFv,tool-or-service,Bibliography of the History of the Czech Lands,see-also,http://www.hiu.cas.cz/cs/,404
266,tool-or-service/1wrJdz,1wrJdz,tool-or-service,Web Panel Sample Service (WPSS),user-manual-url,https://cdsp-scpo.github.io/wpss-doc/,200
729,tool-or-service/3DtWVy,3DtWVy,tool-or-service,GAMS repository,see-also,https://zim.uni-graz.at,200
562,tool-or-service/7j2LI8,7j2LI8,tool-or-service,Digital repository of the Institute of Ethnology and Folklore Research,see-also,http://www.ief.hr,200
743,tool-or-service/87wJWo,87wJWo,tool-or-service,Gephi,see-also,https://www.youtube.com/watch?v=2FqM4gKeNO4,200
705,tool-or-service/9TYf0i,9TYf0i,tool-or-service,Fulltext search interface for Fortunoff Video Archive for Holocaust Testimonies,see-also,https://ufal.mff.cuni.cz/malach/en,200
749,tool-or-service/AGuzS9,AGuzS9,tool-or-service,History of Slovenia - SIstory portal,see-also,https://www.inz.si/,200


### Export of the 404 errors for all URL-based fields

In [16]:
df_URL_404=pd.concat([df_http_status_nf_err, df_prop_http_status_nf_err])
export_df_URL_404=df_URL_404[['MPUrl', 'label', 'property', 'status', 'url']]
export_df_URL_404.to_csv(path_or_buf='data/brokenURLs.csv', sep=',', index=False)


### Flag items with wrong URLs in the Dataset

In [17]:
df_URL_flags=pd.concat([df_http_status_err, urls_df_properties])

In [18]:
curation_flag_property={"code": "curation-flag-url"}
curation_detail_property={"code": "curation-detail"}

In [19]:
res=mpdata.setHTTPStatusFlags(df_URL_flags, curation_flag_property, curation_detail_property)

dv {'description': {'length': 'nan'}}
dv {'description': {'length': 'nan'}}
dv {'description': {'length': '1710.0'}}
dv {'description': {'length': 'nan'}}
dv {'description': {'length': 'nan'}}
The item with PID: tools/yC3sFv has a *404* HTTP status for the property see-also, (False)
Appending curation_detail_flag {'url': [ {"see-also": "404"}]}



*** Running in DEBUG mode, Marketplace dataset not updated. ***

The item with PID: tools/FOhmdk has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: tools/I2egYT has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

dv {'url': [{'actor.website': '404'}]}
dv {'description': {'length': 'nan'}}
The item with PID: tools/i3v5q0 has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: 

dv {'description': {'length': 'nan'}}
dv {'url': [{'actor.website': '404'}]}
dv {'description': {'length': 'nan'}}
The item with PID: tools/aQK2rq has a *503* HTTP status for the property terms-of-use-url, (False)
Appending curation_detail_flag  {'url': [ {'terms-of-use-url': '503'},  {"accessibleAt": "503"}]}
dv {'url': [{'terms-of-use-url': '503'}, {'accessibleAt': '503'}]}
Dropping flag {'accessibleAt': '503'} ...
done.
Dropping flag {'terms-of-use-url': '503'} ...
done.
The item with PID: tools/aQK2rq has a *no longer valid flag* for the property user-manual-url, will be removed. (True)
The item with PID: tools/aQK2rq has a *503* HTTP status for the property privacy-policy-url, (True)
Appending curation_detail_flag  {'url': [ {'privacy-policy-url': '503'}], "url": []}
The item with PID: tools/aQK2rq has a *503* HTTP status for the property access-policy-url, (True)
Appending curation_detail_flag  {'url': [ {'access-policy-url': '503'},  {'privacy-policy-url': '503'}], "url": []}
dv

dv {'description': {'length': '23.0'}}
dv {'description': {'length': '23.0'}}
The item with PID: publications/HoFRia has a *404* HTTP status for the property accessibleAt, (False)
Appending curation_detail_flag  {'url': [ {'accessibleAt': '404'}], "description": {'url': [ {'accessibleAt': '404'}], "length": "23.0"}}

*** Running in DEBUG mode, Marketplace dataset not updated. ***

dv {'description': {'length': '23.0'}}
The item with PID: publications

dv {'description': {'length': '23.0'}}
The item with PID: publications/iKH71H has a *404* HTTP status for the property accessibleAt, (False)
Appending curation_detail_flag  {'url': [ {'accessibleAt': '404'}], "description": {'url': [ {'accessibleAt': '404'}], "length": "23.0"}}

*** Running in DEBUG mode, Marketplace dataset not updated. ***

dv {'description': {'length': '23.0'}}
dv {'description': {'length': '

The item with PID: datasets/D4WBU2 has a *404* HTTP status for the property accessibleAt, (False)
Appending curation_detail_flag {'url': [ {"accessibleAt": "404"}]}



*** Running in DEBUG mode, Marketplace dataset not updated. ***

The item with PID: datasets/Qe5gex has a *404* HTTP status for the property accessibleAt, (False)
Appending curation_detail_flag {'url': [ {"accessibleAt": "404"}]}



*** Running in DEBUG mode, Marketplace dataset not updated. ***

The item with PID: datasets/tKZRRu has a *404* HTTP status for the property accessibleAt, (False)
Appending curation_detail_flag {'url': [ {"accessibleAt": "404"}]}



*** Running in DEBUG mode, Marketplace dataset not updated. ***

The item with PID: datasets/oTBwOX has a *404* HTTP status for the property accessibleAt, (False)
Appending curation_detail_flag {'url': [ {"accessibleAt": "404"}]}



*** Running in DEBUG mode, Marketplace dataset not updated. ***

The item with PID: datasets/pRSAhb has a *404* HTTP status for the p

The item with PID: datasets/3W5E6J has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/Zrq9lF has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/x2xkU7 has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/D4WBU2 has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/Qe5gex has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/tKZRRu has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The 

The item with PID: datasets/oTBwOX has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/pRSAhb has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/JkfwMn has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/VlzZY8 has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/NAqO5J has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The item with PID: datasets/o2E85y has a *404* HTTP status for the property accessibleAt, (False)
flag property exists, value:  {'url': [ {"accessibleAt": "404"}]} 

The 
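
Judging from the log above, the value written to `curation-detail` groups the failing properties of an item under a `url` key, with each status stored as a string. The sketch below rebuilds that shape from plain tuples. It is inferred from the printed output only and is not the code of `setHTTPStatusFlags`; the helper name `build_url_details` and the choice of treating only 200 as healthy are assumptions.

```python
import json
from collections import defaultdict

# Rows shaped like the status table: (persistentId, property, status)
rows = [
    ("yC3sFv", "see-also", 404),
    ("aQK2rq", "terms-of-use-url", 503),
    ("aQK2rq", "accessibleAt", 503),
    ("SIU1nO", "accessibleAt", 200),
]


def build_url_details(rows, ok_statuses=(200,)):
    """Group failing (property, status) pairs per item into a {'url': [...]} value."""
    details = defaultdict(list)
    for pid, prop, status in rows:
        if status in ok_statuses:
            continue  # healthy URLs produce no flag
        details[pid].append({prop: str(status)})
    return {pid: {"url": flags} for pid, flags in details.items()}


url_details = build_url_details(rows)
print(json.dumps(url_details["aQK2rq"]))
```

Items whose URLs all respond normally do not appear in the result, which corresponds to the case where no flag is needed.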

## 3. Check & flag error values in URL-based properties for a given source - example

Create a dataframe with all the items whose source is the EOSC Catalogue.

In [None]:
df_ec_items=df_all_items[df_all_items['source.label']=='EOSC Catalogue']

In [None]:
df_ec_items.head(3)

Check the URL properties by invoking the function **checkURLValuesInDataset(dataset, props)**

In [None]:
urls_df_hd=check.checkURLValuesInDataset(df_ec_items, 'terms-of-use-url, user-manual-url, privacy-policy-url, access-policy-url, service-level-url, see-also, helpdesk-url')
urls_df_hd.head()

In [None]:
urls_df_hd_status_nf_err=urls_df_hd[urls_df_hd['status'] == 404].sort_values('persistentId').drop_duplicates(keep='first', inplace=False)

In [None]:
myclickable_table = urls_df_hd_status_nf_err.style.format({'MPUrl': utils.make_clickable})
myclickable_table
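
When a status table already exists, the same per-source view can be obtained by filtering it rather than re-running the HTTP checks. The following is a sketch on toy data; the helper name `broken_urls_for_source` and the sample rows are hypothetical, and it assumes that the status table covers the items of the chosen source.

```python
import pandas as pd

# Toy item table with a source label per item
items = pd.DataFrame({
    "persistentId": ["aaa111", "bbb222", "ccc333"],
    "source.label": ["EOSC Catalogue", "Humanities Data", "EOSC Catalogue"],
})

# Toy status table, shaped like the output of the URL checks
url_status = pd.DataFrame({
    "persistentId": ["aaa111", "bbb222", "ccc333", "ccc333"],
    "property": ["see-also", "see-also", "helpdesk-url", "see-also"],
    "status": [404, 404, 200, 404],
})


def broken_urls_for_source(items, url_status, source_label, status=404):
    """Return the rows of url_status with the given status for items of one source."""
    ids = items.loc[items["source.label"] == source_label, "persistentId"]
    mask = url_status["persistentId"].isin(ids) & (url_status["status"] == status)
    return url_status[mask].sort_values("persistentId").drop_duplicates()


eosc_broken = broken_urls_for_source(items, url_status, "EOSC Catalogue")
print(eosc_broken)
```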