# Organisation/Funder/Repository Data Management Plans statistics

Data management plans (DMPs) are documents accompanying research proposals and project outputs. DMPs are created as free-form text and describe the data and tools employed in scientific investigations. They are often seen as an administrative exercise rather than an integral part of research practice. Machine-actionable DMPs (maDMPs) take the DMP concept further by using persistent identifiers (PIDs) and PID services to connect all resources associated with a DMP.


This notebook displays, in a human-friendly way, DMP statistics for a research organization, funder and/or repository. By the end of this notebook, you will be able to succinctly display DMP statistics for an organization, a funder and a repository. To demonstrate this we use the funder **European Commission (EC)**. In the summary statistics you will find a row for each DMP of the EC. Each row includes the title of the DMP, its persistent identifier (PID), the number of related datasets, and the number of people, research organizations and funders involved.


The process of displaying the DMP statistics is simple. First, after an initial setup, we fetch all the data we need from the DataCite GraphQL API. Then, we transform this data into a data structure that can be used for computation. Finally, we take the transformed data and supply it to a table.




In [None]:
%%capture
# Install required Python packages
!pip install dfply "gql[requests]"

In [None]:
import json
import pandas as pd
import numpy as np
from dfply import *


In [None]:
# Prepare the GraphQL client
import requests
from IPython.display import display, Markdown
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

_transport = RequestsHTTPTransport(
    url='https://api.datacite.org/graphql',
    use_json=True,
)

client = Client(
    transport=_transport,
    fetch_schema_from_transport=True,
)

## Fetching Data

We obtain all the data from the DataCite GraphQL API.
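
The three queries in this section differ only in their root field and in the name of the ID variable. As a minimal sketch of that observation, the block below builds a query from one template, together with the JSON body that a GraphQL endpoint conventionally accepts over HTTP POST. `ROOTS`, `DMP_FIELDS`, `build_query` and `build_payload` are hypothetical names introduced for illustration, not part of gql or the DataCite API, and the field selection is shortened.

```python
import json

# Root field -> name of its ID variable, as used by the notebook's three queries.
ROOTS = {
    "organization": "rorId",
    "funder": "funderId",
    "repository": "repositoryId",
}

# Shortened selection for illustration; the notebook requests more fields.
DMP_FIELDS = """
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) { title }
      }
    }
"""


def build_query(root):
    """Return the query text for one root field (hypothetical helper)."""
    var = ROOTS[root]
    return (
        "query getOutputs($%s: ID!) {\n" % var
        + "  %s(id: $%s) {%s  }\n}" % (root, var, DMP_FIELDS)
    )


def build_payload(root, identifier):
    """Return the JSON-serialisable body for a GraphQL-over-HTTP POST."""
    return {"query": build_query(root), "variables": {ROOTS[root]: identifier}}


payload = build_payload("funder", "https://doi.org/10.13039/501100000780")
print(json.dumps(payload["variables"]))
```

Passing only the variable that a query declares keeps each request self-describing, which can make a failing query easier to reproduce outside the notebook.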


In [None]:
# Define the query parameters and GraphQL queries for the organization (European Commission),
# the funder (European Commission) and the repository (Zenodo).
query_params = {
    "rorId" : "https://ror.org/00k4n6c32",
    "funderId" : "https://doi.org/10.13039/501100000780",
    "repositoryId" : "cern.zenodo"
}

organizationQuery = gql("""query getOutputs($rorId: ID!)
{
  organization(id: $rorId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organizations: contributors(contributorType: "HostingInstitution") {
          id
          name
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")

funderQuery = gql("""query getOutputs($funderId: ID!)
{
  funder(id: $funderId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organizations: contributors(contributorType: "HostingInstitution") {
          id
          name
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")

repositoryQuery = gql("""query getOutputs($repositoryId: ID!)
{
  repository(id: $repositoryId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organizations: contributors(contributorType: "HostingInstitution") {
          id
          name
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")
 

In [None]:
# name=select()


In [None]:
# import ipywidgets as widgets
# from ipywidgets import interact, interact_manual


In [None]:
# @interact
# def select_dmp(column=['https://ror.org/00k4n6c32', 'views', 'fans', 'reads']):
#     return column

In [None]:
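def get_data(type):
    """Run the query for the given entity type and return its top-level object."""
    # gql takes variable_values as a dict, so pass only the variable each query declares
    if type == "organization":
        variables = {"rorId": query_params["rorId"]}
        return client.execute(organizationQuery, variable_values=variables)["organization"]
    elif type == "funder":
        variables = {"funderId": query_params["funderId"]}
        return client.execute(funderQuery, variable_values=variables)["funder"]
    else:
        variables = {"repositoryId": query_params["repositoryId"]}
        return client.execute(repositoryQuery, variable_values=variables)["repository"]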


## Data Transformation

Simple transformations convert the GraphQL response into a data frame that can be displayed as a table.
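
As an illustration of what these transformations produce, here is a minimal sketch in plain pandas (without dfply), run on a hand-written sample node in the shape the queries request. All identifiers and names in the sample are hypothetical, and `summarise` is an illustrative helper rather than part of this notebook.

```python
import pandas as pd

# Hand-written sample in the shape of dataManagementPlans.nodes; values are hypothetical.
sample_nodes = [
    {
        "id": "https://doi.org/10.1234/example-dmp",
        "title": [{"title": "Example DMP"}],
        "datasets": {
            "totalCount": 2,
            "nodes": [
                {"id": "10.1234/dataset-a", "name": [{"title": "Dataset A"}]},
                {"id": "10.1234/dataset-b", "name": [{"title": "Dataset B"}]},
            ],
        },
        "organizations": [{"id": "https://ror.org/00k4n6c32", "name": "Example Org"}],
        "funders": [],
        "people": [
            {"id": None, "name": "Doe, Jane", "affiliation": []},
            {"id": None, "name": "Roe, Richard", "affiliation": []},
        ],
    }
]


def summarise(nodes):
    """Build one summary row per DMP: title, DOI and counts of related items."""
    df = pd.DataFrame(nodes)
    return pd.DataFrame({
        "dmp": df["title"].apply(lambda titles: titles[0]["title"]),
        "doi": df["id"],
        # Count the dataset nodes actually returned in the response
        "datasets": df["datasets"].apply(lambda d: len(d["nodes"])),
        "organizations": df["organizations"].apply(len),
        "funders": df["funders"].apply(len),
        "people": df["people"].apply(len),
    })


summary = summarise(sample_nodes)
print(summary)
```

The dataset count here is the number of nodes present in the response. If the API returns related datasets in pages, the `totalCount` field requested by the queries may be the more reliable figure; that is an assumption to verify against the API documentation.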

In [None]:
def get_series_size(series_element):
    return len(series_element)

In [None]:
def get_dataset_nodes(series_element):
    return series_element['nodes']

In [None]:
def get_title(series_element):
    return series_element[0]['title']

In [None]:
def transform_dmps(dataframe):
    """Replace the nested columns of each DMP with the values needed for the summary table.

    Parameters:
    dataframe (pandas.DataFrame or None): One row per DMP node from the GraphQL response

    Returns:
    pandas.DataFrame: The same rows with `dmp` and `doi` columns added and the nested
    columns replaced by counts; an empty DataFrame if the input is None

    """
    if (dataframe) is None:
        return pd.DataFrame() 
    else: 
        dataframe = (dataframe >>
        mutate(
            datasets = X.datasets.apply(get_dataset_nodes)
        ))

        return (dataframe >>
        mutate(
            dmp = X.title.apply(get_title),
            doi = X.id,
            datasets = X.datasets.apply(get_series_size),
            organizations = X.organizations.apply(get_series_size),
            funders = X.funders.apply(get_series_size),
            people = X.people.apply(get_series_size)
        ))
  

In [None]:
def processTable(type):
    columns = ['dmp', 'doi', 'datasets', 'organizations', 'funders', 'people']
    data = get_data(type)
    nodes = data["dataManagementPlans"]['nodes']
    if len(nodes) == 0:
        # No DMPs found: show an empty table instead of failing on missing columns
        table = pd.DataFrame(columns=columns)
    else:
        table = transform_dmps(pd.DataFrame(nodes, columns=nodes[0].keys()))[columns]
    return table.style.set_caption(data['name'])

In [None]:
organization = processTable("organization")
funder = processTable("funder")
repository = processTable("repository")

In [None]:
organization

In [None]:
funder

In [None]:
repository