# Publications & Datasets: checking licenses/terms of use with the Data Citation Service

This notebook uses the Data Citation Service to check the **licenses** and **terms of use** of the publications and datasets stored in the SSH Open Marketplace (SSHOMP) that have a _DOI_ or a _Handle_ as PID.




## 0 Requirements to run this notebook

This section lists what is needed to work with the Marketplace (MP) data.

### 0.1 Libraries
*Several external libraries are needed to run this notebook.*

*In addition, a dedicated SSH Open Marketplace library, sshmarketplacelib, provides customised functions and can be imported with the standard Python `import` statements.*

*The imports needed to run this notebook are listed below.*

In [1]:
import numpy as np
import pandas as pd
import requests
# Import the SSH Open Marketplace library
from sshmarketplacelib import MPData as mpd
from sshmarketplacelib import eval as eva, helper as hel

### 0.2 Get the data



Get the Marketplace publications and datasets and combine them into a single dataframe.

In [2]:
mpdata = mpd()

df_publication_flat = mpdata.getMPItems("publications", True)
df_datasets_flat = mpdata.getMPItems("datasets", True)
df_data = pd.concat([df_publication_flat, df_datasets_flat])

getting data from local repository...
getting data from local repository...


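Select all publications/datasets that have a _Handle_ or a _DOI_ as URL: each item's `accessibleAt` URLs are filtered with a substring match on `doi` or `handle`, and items left without any such URL are dropped.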

In [47]:
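# Keep only the accessibleAt URLs containing 'doi' or 'handle' (items with no URL list get [])
df_data['accessibleAt'] = [
    [url for url in urls if ('doi' in url or 'handle' in url)] if isinstance(urls, list) else []
    for urls in df_data['accessibleAt']
]
# Drop the items left without any DOI/Handle URL and renumber the rows
df_data = df_data[df_data['accessibleAt'].str.len().gt(0)].reset_index(drop=True)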
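The filter above is a plain substring test, so it also keeps publisher landing pages whose path happens to contain `doi` (for example `onlinelibrary.wiley.com/doi/abs/...`). The sketch below isolates that test and, for comparison, a stricter host/path-based variant; both function names are hypothetical and the stricter rule is an assumption, not the notebook's method.

```python
from urllib.parse import urlparse


def has_pid_substring(url: str) -> bool:
    """Same test as the notebook cell: plain substring match."""
    return 'doi' in url or 'handle' in url


def has_pid_host(url: str) -> bool:
    """Hypothetical stricter variant: PID resolver hosts or a /handle/ path segment."""
    parsed = urlparse(url)
    if parsed.netloc.lower() in ('doi.org', 'dx.doi.org', 'hdl.handle.net'):
        return True
    return '/handle/' in parsed.path


def filter_pid_urls(urls, predicate=has_pid_substring):
    """Keep only the URLs accepted by the predicate (mirrors the list comprehension)."""
    return [u for u in urls if predicate(u)]


urls = [
    'https://doi.org/10.5281/zenodo.10478399',
    'https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119961284.ch2',
    'https://run.unl.pt/handle/10362/155506',
    'https://example.org/some-tool',
]
print(filter_pid_urls(urls))                # the first three URLs
print(filter_pid_urls(urls, has_pid_host))  # the doi.org and run.unl.pt URLs only
```

The stricter rule would drop URLs such as `https://zenodo.org/doi/10.5281/zenodo.7576675`, so the looser substring match is a reasonable default when the service itself decides what it can resolve.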

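For each selected publication/dataset (the first 50 in the cell below):

1. collect the license/terms-of-use properties already recorded in SSHOMP and query the Data Citation Service for the citation metadata of each DOI/Handle URL;
2. inspect the retrieved metadata for licensing information;
3. collect the results in a dataframe.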
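Each request passes the URL-encoded PID as the `pid` query parameter, together with the `test` token used in this notebook. A minimal sketch of that request, assuming the service answers with a JSON object; `citation_request_url` and `fetch_citation_metadata` are hypothetical helper names, and the 30 s timeout is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and 'test' token as used in this notebook; helper names are hypothetical
DCS_ENDPOINT = (
    'https://v4e-lab.isti.cnr.it/citationservice/citharvester/getcitationmetadata'
)


def citation_request_url(pid_url: str, token: str = 'test') -> str:
    """Build the request URL; urlencode applies quote_plus to each value."""
    return f"{DCS_ENDPOINT}?{urlencode({'pid': pid_url, 'token': token})}"


def fetch_citation_metadata(pid_url: str, timeout: float = 30.0) -> dict:
    """Return the service's JSON answer as a dict, or {} on network/HTTP/JSON errors."""
    try:
        with urlopen(citation_request_url(pid_url), timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError) as err:  # URLError/HTTPError are OSErrors
        print(f"request failed for {pid_url}: {err}")
        return {}
    return data if isinstance(data, dict) else {}


print(citation_request_url('https://doi.org/10.5281/zenodo.10478399'))
# prints .../getcitationmetadata?pid=https%3A%2F%2Fdoi.org%2F10.5281%2Fzenodo.10478399&token=test
```

Returning `{}` on failure lets a long loop over many PIDs continue past a single unreachable record instead of stopping.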

In [52]:
import json
import urllib.parse
import urllib.request

# Data Citation Service endpoint; the 'test' token is the one used in this notebook
DCS_URL = 'https://v4e-lab.isti.cnr.it/citationservice/citharvester/getcitationmetadata'

d = []

# Only the first 50 publications/datasets are processed
for inde, row in df_data[0:50].iterrows():
    for aait in row.accessibleAt:
        print(f"{aait}, {inde}")

        # License and terms-of-use properties already recorded in SSHOMP
        inshomp = []
        properties = row.properties if isinstance(row.properties, list) else []
        for prope in properties:
            code = prope['type']['code']
            if 'concept' in prope and (code == 'license' or 'terms-of-use' in code):
                inshomp.append({code: prope['concept']['code']})
            if 'value' in prope and 'terms-of-use' in code:
                inshomp.append({code: prope['value']})

        # Connect to the Data Citation Service to search for citation metadata
        turl = f"{DCS_URL}?pid={urllib.parse.quote_plus(aait)}&token=test"
        try:
            with urllib.request.urlopen(turl, timeout=60) as url:
                res_j = json.load(url)
        except (OSError, ValueError) as err:  # network, HTTP or JSON errors
            print(f"  skipped: {err}")
            continue
        if not isinstance(res_j, dict):
            res_j = {}

        # Search the returned metadata for license info
        indcsrepo = []  # from the repository metadata (JSON-LD 'license', 'dc.rights')
        indcsra = []    # from the registration agency (RA) metadata
        jsonld = res_j.get('jsonld_properties') or {}
        if 'license' in jsonld:
            indcsrepo.append(jsonld['license'])
        ra = res_j.get('ra_properties') or {}
        for key in ('copyright', 'license'):
            if key in ra:
                indcsra.append(ra[key])
        props = {k.lower(): v for k, v in (res_j.get('properties') or {}).items()}
        if 'dc.rights' in props:
            indcsrepo.append(props['dc.rights'])

        # Keep the item only if the service returned some license info
        if indcsrepo or indcsra:
            d.append({
                'item': f"{mpdata.MPserver}{row.category}/{row.persistentId}",
                'accessibleAt': aait,
                'license_in_sshomp': inshomp,
                'license_from_repository': indcsrepo,
                'license_from_ra': indcsra,
            })

res = pd.DataFrame(d)

https://doi.org/10.48550/arXiv.2304.02603, 0
https://doi.org/10.1007/978-1-4419-6886-9_2, 1
https://run.unl.pt/handle/10362/155506, 2
https://doi.org/10.1007/978-3-031-06555-2_34, 3
https://doi.org/10.1080/01639374.2018.1543747, 4
https://run.unl.pt/handle/10362/144292, 5
https://doi.org/10.7551/mitpress/14046.001.0001, 6
https://doi.org/10.5281/zenodo.10478399, 7
https://doi.org/10.5944/rhd.vol.6.2021.27371, 8
https://onlinelibrary.wiley.com/doi/abs/10.1002/9780470106310.fmatter, 9
http://hdl.handle.net/10362/135784, 10
https://dl.acm.org/doi/10.1145/3594724, 11
https://doi.org/10.1186/s40494-018-0197-y, 12
https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119961284.ch2, 13
https://doi.org/10.5281/zenodo.5700008, 14
https://iris.polito.it/handle/11583/2675486#.XOLD7KbgpAY, 15
https://doi.org/10.1007/978-3-319-52804-5_3, 16
https://zenodo.org/doi/10.5281/zenodo.7576675, 17
https://doi.org/10.5281/zenodo.7051342, 18
https://doi.org/10.1177/0165551520950246, 19
https://run.unl.pt/handl

In [54]:
res.head()

| | item | accessibleAt | license_in_sshomp | license_from_repository | license_from_ra |
|---|---|---|---|---|---|
| 0 | https://marketplace.sshopencloud.eu/publicatio... | https://doi.org/10.48550/arXiv.2304.02603 | [{'license': 'CC-BY-4.0'}] | [] | [Creative Commons Attribution 4.0 International] |
| 1 | https://marketplace.sshopencloud.eu/publicatio... | https://run.unl.pt/handle/10362/155506 | [] | [openAccess] | [] |
| 2 | https://marketplace.sshopencloud.eu/publicatio... | https://doi.org/10.1007/978-3-031-06555-2_34 | [] | [] | [[{'content-version': 'tdm', 'start': {'date-t... |
| 3 | https://marketplace.sshopencloud.eu/publicatio... | https://doi.org/10.1080/01639374.2018.1543747 | [] | [] | [[{'content-version': 'vor', 'start': {'date-t... |
| 4 | https://marketplace.sshopencloud.eu/publicatio... | https://run.unl.pt/handle/10362/144292 | [] | [openAccess] | [] |
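The `license_from_ra` values are heterogeneous: some are plain strings, others are nested lists of dictionaries carrying a `content-version` (`tdm`, `vor`) and a `start` date. A hypothetical helper, `license_labels`, can flatten them into comparable strings; it assumes the dictionary entries follow the Crossref license layout and carry the license link under `URL`, which the truncated output does not confirm.

```python
def license_labels(value):
    """Flatten a license value into a list of strings.

    Assumption: dict entries follow the Crossref license layout, with the link
    under 'URL' and an optional 'content-version' (e.g. 'vor', 'tdm').
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        url = value.get('URL')
        if url is None:
            return []  # no recognisable license link in this entry
        version = value.get('content-version')
        return [f"{url} ({version})" if version else url]
    if isinstance(value, (list, tuple)):
        labels = []
        for item in value:
            labels.extend(license_labels(item))  # recurse into nested lists
        return labels
    return [str(value)]


# Hypothetical input shaped like the nested entries in license_from_ra
print(license_labels([[{'URL': 'https://example.org/tdm-license', 'content-version': 'tdm'}]]))
print(license_labels(['Creative Commons Attribution 4.0 International']))
```

On the real dataframe it could be applied column-wise, e.g. `res['license_from_ra'].map(license_labels)`.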


In [None]:
# Save the results (the data/ folder must already exist)
res.to_csv('data/licenses.csv', index=False)
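A natural follow-up to the license check is listing the items for which SSHOMP records no license or terms of use while the Data Citation Service found some. A minimal sketch over records shaped like the rows collected in `d`; `missing_in_sshomp` is a hypothetical helper and the sample records use placeholder item identifiers.

```python
def missing_in_sshomp(records):
    """Return (item, accessibleAt) pairs with no SSHOMP license but some license from the service."""
    return [
        (r['item'], r['accessibleAt'])
        for r in records
        if not r['license_in_sshomp']
        and (r['license_from_repository'] or r['license_from_ra'])
    ]


# Hypothetical records; 'item-0'/'item-1' stand in for Marketplace item URLs
records = [
    {'item': 'item-0', 'accessibleAt': 'https://doi.org/10.48550/arXiv.2304.02603',
     'license_in_sshomp': [{'license': 'CC-BY-4.0'}], 'license_from_repository': [],
     'license_from_ra': ['Creative Commons Attribution 4.0 International']},
    {'item': 'item-1', 'accessibleAt': 'https://run.unl.pt/handle/10362/155506',
     'license_in_sshomp': [], 'license_from_repository': ['openAccess'],
     'license_from_ra': []},
]
print(missing_in_sshomp(records))  # [('item-1', 'https://run.unl.pt/handle/10362/155506')]
```

With the dataframe built above, the same check runs as `missing_in_sshomp(res.to_dict('records'))`.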