# Organisation/Funder/Repository Data Management Plans statistics

Data management plans (DMPs) are documents accompanying research proposals and project outputs. DMPs are created as free-form text and describe the data and tools employed in scientific investigations. They are often seen as an administrative exercise and not as an integral part of research practice. Machine-actionable DMPs (maDMPs) take the DMP concept further by using PIDs and PID services to connect all resources associated with a DMP.


This notebook displays, in a human-friendly format, DMP statistics for an Organization, a Funder and/or a Repository. By the end of this notebook, you will be able to succinctly display the DMP statistics of an organization, a funder and a repository. To demonstrate this, we use the **European Commission**, which can be treated as either an Organization (https://ror.org/00k4n6c32) or a Funder (https://doi.org/10.13039/501100000780). In the summary statistics you will find a row for each DMP of the EC. Each row includes the title of the DMP, its PID, and the number of related datasets, people involved, organizations and funders.
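The two identifiers above differ only in their form. As a minimal sketch, assuming a hypothetical helper `pid_kind` (it is not part of this notebook), the role implied by a PID could be derived from its URL prefix:

```python
def pid_kind(pid):
    """Classify a PID by its URL prefix (hypothetical helper, for illustration only)."""
    if pid.startswith("https://ror.org/"):
        return "organization"  # ROR identifier
    if pid.startswith("https://doi.org/10.13039/"):
        return "funder"  # Crossref Funder ID (DOI prefix 10.13039)
    return "unknown"


print(pid_kind("https://ror.org/00k4n6c32"))
print(pid_kind("https://doi.org/10.13039/501100000780"))
```

A prefix check is deliberately simple; it does not validate that the identifier actually resolves.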


The process of displaying the DMP statistics is simple. First, after an initial setup, we fetch all the data we need from the DataCite GraphQL API. Then, we transform this data into a data structure that can be used for computation. Finally, we supply the transformed data to a table.
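The transform and table steps can be sketched offline with plain pandas. The sample below is hand-written and only mimics the shape of the `dataManagementPlans.nodes` list requested later in this notebook; the DOI, names and the `summarise` helper are illustrative assumptions, not real records or part of the notebook.

```python
import pandas as pd

# Hand-written sample shaped like the nodes returned by the queries
# in this notebook (all values are illustrative only).
sample_nodes = [
    {
        "id": "https://doi.org/10.1234/example-dmp",
        "title": [{"title": "Example DMP"}],
        "datasets": {"totalCount": 1, "nodes": [{"id": "10.1234/example-data"}]},
        "organizations": [],
        "funders": [{"id": "https://doi.org/10.13039/501100000780"}],
        "people": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
    }
]

COLUMNS = ["dmp", "doi", "datasets", "organizations", "funders", "people"]


def summarise(nodes):
    """Turn a list of DMP nodes into one summary row per DMP."""
    rows = [
        {
            "dmp": node["title"][0]["title"],
            "doi": node["id"],
            "datasets": len(node["datasets"]["nodes"]),
            "organizations": len(node["organizations"]),
            "funders": len(node["funders"]),
            "people": len(node["people"]),
        }
        for node in nodes
    ]
    # Passing the column list keeps the layout stable even when there are no rows
    return pd.DataFrame(rows, columns=COLUMNS)


summary = summarise(sample_nodes)
print(summary)
```

Building the rows as plain dictionaries keeps the counting logic in one place and yields an empty table with the expected columns when no DMPs are found.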




In [41]:
%%capture
# Install required Python packages
!pip install dfply "gql[requests]"

In [42]:
import json
import pandas as pd
import numpy as np
from dfply import *


In [118]:
# Prepare the GraphQL client
import requests
from IPython.display import display, Markdown
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

_transport = RequestsHTTPTransport(
    url='https://api.datacite.org/graphql',
    use_json=True,
)

client = Client(
    transport=_transport,
    fetch_schema_from_transport=True,
)

## Fetching Data

We obtain all the data from the DataCite GraphQL API.


In [119]:
# Define the identifiers used by the three GraphQL queries below: the European Commission
# as organization (ROR ID) and as funder (Crossref Funder ID), and a repository ID.
query_params = {
    "rorId" : "https://ror.org/00k4n6c32",
    "funderId" : "https://doi.org/10.13039/501100000780",
    "repositoryId" : "cern.zenodo"
}

organizationQuery = gql("""query getOutputs($rorId: ID!)
{
  organization(id: $rorId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organizations: contributors(contributorType: "HostingInstitution") {
          id
          name
          affiliation {
            id
          }
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")

funderQuery = gql("""query getOutputs($funderId: ID!)
{
  funder(id: $funderId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organizations: contributors(contributorType: "HostingInstitution") {
          id
          name
          affiliation {
            id
          }
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")

repositoryQuery = gql("""query getOutputs($repositoryId: ID!)
{
  repository(id: $repositoryId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organizations: contributors(contributorType: "HostingInstitution") {
          id
          name
          affiliation {
            id
          }
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")
 
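The three queries above differ only in the root field and the variable name. As a minimal sketch, assuming a hypothetical `build_query` helper and a shortened selection set (neither is part of the notebook), the query text could be generated from one template:

```python
# Template with a shortened selection set; %-formatting is used so that
# the GraphQL braces do not need escaping.
QUERY_TEMPLATE = """query getOutputs($%(variable)s: ID!)
{
  %(field)s(id: $%(variable)s) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
      }
    }
  }
}
"""


def build_query(field, variable):
    """Fill in the root field and variable name (hypothetical helper)."""
    return QUERY_TEMPLATE % {"field": field, "variable": variable}


queries = {
    "organization": build_query("organization", "rorId"),
    "funder": build_query("funder", "funderId"),
    "repository": build_query("repository", "repositoryId"),
}
print(queries["funder"])
```

Each generated string could then be wrapped with `gql(...)` in the same way as the hand-written queries.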

In [120]:
# name=select()


In [121]:
# import ipywidgets as widgets
# from ipywidgets import interact, interact_manual


In [122]:
# @interact
# def select_dmp(column=['https://ror.org/00k4n6c32', 'views', 'fans', 'reads']):
#     return column

In [123]:
def get_data(type):
    # Run the query for the requested entity type, passing only the variable that query declares
    if type == "organization":
        return client.execute(organizationQuery, variable_values={"rorId": query_params["rorId"]})["organization"]
    elif type == "funder":
        return client.execute(funderQuery, variable_values={"funderId": query_params["funderId"]})["funder"]
    else:
        return client.execute(repositoryQuery, variable_values={"repositoryId": query_params["repositoryId"]})["repository"]


## Data Transformation

Simple transformations are performed to convert the GraphQL response into a pandas DataFrame that can be used to build the table.

In [124]:
def get_series_size(series_element):
    return len(series_element)

In [125]:
def get_dataset_nodes(series_element):
    return series_element['nodes']

In [126]:
def get_title(series_element):
    return series_element[0]['title']

In [127]:
def transform_dmps(dataframe):
    """Replaces the nested attributes of each DMP with the values shown in the table

    Parameters:
    dataframe (dataframe): A dataframe with one row per DMP, or None

    Returns:
    dataframe: The same dataframe with the DMP title, its DOI and the counts of
    related datasets, organizations, funders and people; an empty dataframe if
    the input is None

    """
    if dataframe is None:
        return pd.DataFrame()
    else: 
        dataframe = (dataframe >>
        mutate(
            datasets = X.datasets.apply(get_dataset_nodes)
        ))

        return (dataframe >>
        mutate(
            dmp = X.title.apply(get_title),
            doi = X.id,
            datasets = X.datasets.apply(get_series_size),
            organizations = X.organizations.apply(get_series_size),
            funders = X.funders.apply(get_series_size),
            people = X.people.apply(get_series_size)
        ))
  

In [128]:
def processTable(type):
    data = get_data(type)
    nodes = data["dataManagementPlans"]['nodes']
    columns = ['dmp', 'doi', 'datasets', 'organizations', 'funders', 'people']
    if len(nodes) == 0:
        # No DMPs found: return an empty table with the expected columns
        table = pd.DataFrame(columns=columns)
    else:
        table = transform_dmps(pd.DataFrame(nodes, columns=nodes[0].keys()))[columns]
    return table.style.set_caption(data['name'])

In [114]:
organization = processTable("organization")
funder = processTable("funder")
repository = processTable("repository")

{'name': 'European Commission', 'dataManagementPlans': {'totalCount': 47, 'nodes': [{'id': 'https://doi.org/10.5281/zenodo.3245354', 'title': [{'title': 'EURHISFIRM D1.2: Data Management Plan (first version)'}], 'datasets': {'totalCount': 0, 'nodes': []}, 'organizations': [], 'funders': [{'id': 'https://doi.org/10.13039/501100000780', 'funderIdentifierType': 'Crossref Funder ID', 'name': 'European Commission'}], 'people': [{'id': None, 'name': 'GRANDI, Elisa', 'affiliation': [{'id': None}]}, {'id': None, 'name': 'RIVA, Angelo', 'affiliation': [{'id': None}]}, {'id': None, 'name': 'YOO, Lana', 'affiliation': [{'id': None}]}]}, {'id': 'https://doi.org/10.5281/zenodo.3245353', 'title': [{'title': 'EURHISFIRM D1.2: Data Management Plan (first version)'}], 'datasets': {'totalCount': 0, 'nodes': []}, 'organizations': [], 'funders': [{'id': 'https://doi.org/10.13039/501100000780', 'funderIdentifierType': 'Crossref Funder ID', 'name': 'European Commission'}], 'people': [{'id': None, 'name': 'G

In [115]:
organization

| | dmp | doi | datasets | organizations | funders | people |
|---|---|---|---|---|---|---|
| 0 | EURHISFIRM D1.2: Data Management Plan (first version) | https://doi.org/10.5281/zenodo.3245354 | 0 | 0 | 1 | 3 |
| 1 | EURHISFIRM D1.2: Data Management Plan (first version) | https://doi.org/10.5281/zenodo.3245353 | 0 | 0 | 1 | 3 |
| 2 | EURHISFIRM D1.7: Second Data Management Plan | https://doi.org/10.5281/zenodo.3246339 | 0 | 0 | 1 | 5 |
| 3 | EURHISFIRM D1.7: Second Data Management Plan | https://doi.org/10.5281/zenodo.3246338 | 0 | 0 | 1 | 5 |
| 4 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | https://doi.org/10.5281/zenodo.3253683 | 0 | 0 | 1 | 8 |
| 5 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | https://doi.org/10.5281/zenodo.3253684 | 0 | 0 | 1 | 8 |
| 6 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | https://doi.org/10.5281/zenodo.3368558 | 0 | 0 | 1 | 1 |
| 7 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | https://doi.org/10.5281/zenodo.3368557 | 0 | 0 | 1 | 1 |
| 8 | D6.5 Data Management Plan | https://doi.org/10.5281/zenodo.3372460 | 0 | 0 | 1 | 1 |
| 9 | D6.5 Data Management Plan | https://doi.org/10.5281/zenodo.3372459 | 0 | 0 | 1 | 1 |


In [116]:
funder

| | dmp | doi | datasets | organizations | funders | people |
|---|---|---|---|---|---|---|
| 0 | EURHISFIRM D1.2: Data Management Plan (first version) | https://doi.org/10.5281/zenodo.3245354 | 0 | 0 | 1 | 3 |
| 1 | EURHISFIRM D1.2: Data Management Plan (first version) | https://doi.org/10.5281/zenodo.3245353 | 0 | 0 | 1 | 3 |
| 2 | EURHISFIRM D1.7: Second Data Management Plan | https://doi.org/10.5281/zenodo.3246339 | 0 | 0 | 1 | 5 |
| 3 | EURHISFIRM D1.7: Second Data Management Plan | https://doi.org/10.5281/zenodo.3246338 | 0 | 0 | 1 | 5 |
| 4 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | https://doi.org/10.5281/zenodo.3253683 | 0 | 0 | 1 | 8 |
| 5 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | https://doi.org/10.5281/zenodo.3253684 | 0 | 0 | 1 | 8 |
| 6 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | https://doi.org/10.5281/zenodo.3368558 | 0 | 0 | 1 | 1 |
| 7 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | https://doi.org/10.5281/zenodo.3368557 | 0 | 0 | 1 | 1 |
| 8 | D6.5 Data Management Plan | https://doi.org/10.5281/zenodo.3372460 | 0 | 0 | 1 | 1 |
| 9 | D6.5 Data Management Plan | https://doi.org/10.5281/zenodo.3372459 | 0 | 0 | 1 | 1 |


In [117]:
repository

NameError: name 'repository' is not defined
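
The `NameError` above indicates that `repository` was never assigned. One possible cause is that a call in the cell building the three tables raised an exception before reaching that assignment. A minimal sketch, assuming a hypothetical `build_tables` helper and stand-in builders (neither is part of the notebook), that runs each builder independently and records failures instead of stopping:

```python
def build_tables(builders):
    """Run each table builder on its own; map name -> (result, error)."""
    results = {}
    for name, builder in builders.items():
        try:
            results[name] = (builder(), None)
        except Exception as error:  # keep going so the other tables are still built
            results[name] = (None, error)
    return results


# Stand-in builders; in the notebook these could be e.g. lambda: processTable("funder")
def failing_builder():
    raise KeyError("dmp")


tables = build_tables({
    "funder": lambda: "funder table",
    "repository": failing_builder,
})

for name, (table, error) in tables.items():
    status = "ok" if error is None else "failed: %r" % (error,)
    print(name, status)
```

With this pattern every name is always present in `tables`, so a failed query shows up as a recorded error rather than as an undefined variable in a later cell.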