# Member's Machine Actionable Data Management Plans

Data management plans (DMPs) are documents accompanying research proposals and project outputs. DMPs are created as free-form text and describe the data and tools employed in scientific investigations. They are often seen as an administrative exercise and not as an integral part of research practice. Machine Actionable DMPs (maDMPs) take this concept further by making the content of the plan readable and actionable by machines.

This notebook displays, in a human-friendly table, all the DMPs related to a DataCite Member. By the end of this notebook, you will be able to succinctly display all the DMPs related to a Member, whether it is identified as an organization, a funder, or a repository.


In [None]:
%%capture
# Install required Python packages
!pip install dfply

In [182]:
import json
import pandas as pd
import numpy as np
from dfply import *


In [183]:
# Prepare the GraphQL client
import requests
from IPython.display import display, Markdown
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

_transport = RequestsHTTPTransport(
    url='https://api.datacite.org/graphql',
    use_json=True,
)

client = Client(
    transport=_transport,
    fetch_schema_from_transport=True,
)

## Fetching Data

We obtain all the data from the DataCite GraphQL API.
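
For orientation, a GraphQL call over HTTP is typically a POST request whose JSON body carries a `query` string and a `variables` object. The sketch below only builds such a body with the standard library and does not contact the API. `build_payload` is a hypothetical helper, and the shortened query is illustrative rather than the one used in this notebook.

```python
import json


def build_payload(query, variables):
    """Return the JSON body of a GraphQL POST request as a string."""
    return json.dumps({"query": query, "variables": variables})


# Shortened, illustrative query: only the organization name and the DMP count.
example_query = """query getOutputs($rorId: ID!) {
  organization(id: $rorId) {
    name
    dataManagementPlans { totalCount }
  }
}"""

payload = build_payload(example_query, {"rorId": "https://ror.org/00k4n6c32"})
print(payload)
```

The variables travel as a JSON object next to the query text, which is why a GraphQL client is normally handed a plain dictionary of variable values.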


In [195]:
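# Identifiers used to look up a member as an organization (ROR ID), a funder (funder DOI), or a repository (DataCite repository ID).
# Each of the three GraphQL queries below retrieves the DMPs connected to one of these identifiers.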
query_params = {
    "rorId" : "https://ror.org/00k4n6c32",
    "funderId" : "https://doi.org/10.13039/501100000780",
    "repositoryId" : "bl.oxdb"
}

organizationQuery = gql("""query getOutputs($rorId: ID!)
{
  organization(id: $rorId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organisations: contributors(contributorType: "HostingInstitution") {
          id
          name
          affiliation {
            id
          }
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")

funderQuery = gql("""query getOutputs($funderId: ID!)
{
  funder(id: $funderId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organisations: contributors(contributorType: "HostingInstitution") {
          id
          name
          affiliation {
            id
          }
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")

repositoryQuery = gql("""query getOutputs($repositoryId: ID!)
{
  repository(id: $repositoryId) {
    name
    dataManagementPlans {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: references(resourceTypeId: "dataset") {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
        organisations: contributors(contributorType: "HostingInstitution") {
          id
          name
          affiliation {
            id
          }
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          name: funderName
        }
        people: creators {
          id
          name
          affiliation {
            id
          }
        }
      }
    }
  }
}
""")
 

In [196]:
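def get_data(member_type):
    """Return the DMP nodes for an "organization", a "funder", or (by default) a "repository"."""
    # Variables are passed as a dict holding only the variable that each query declares.
    if member_type == "organization":
        variables = {"rorId": query_params["rorId"]}
        return client.execute(organizationQuery, variable_values=variables)["organization"]["dataManagementPlans"]["nodes"]
    elif member_type == "funder":
        variables = {"funderId": query_params["funderId"]}
        return client.execute(funderQuery, variable_values=variables)["funder"]["dataManagementPlans"]["nodes"]
    else:
        variables = {"repositoryId": query_params["repositoryId"]}
        return client.execute(repositoryQuery, variable_values=variables)["repository"]["dataManagementPlans"]["nodes"]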
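
The three queries above differ only in the root field and the variable name. As a sketch of one way to avoid repeating the selection set, the query text can be generated from a single template. `DMP_FIELDS`, `QUERY_TEMPLATE`, `MEMBER_TYPES`, and `build_query` are hypothetical names, the selection set is abbreviated, and each resulting string would still need to be wrapped in `gql(...)` before it could be executed.

```python
from string import Template

# Abbreviated selection set shared by all three member types (illustrative only).
DMP_FIELDS = """
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
      }"""

# "$$" is Template's escape for a literal "$", so "$$${variable}" renders as
# "$rorId", "$funderId", or "$repositoryId".
QUERY_TEMPLATE = Template("""query getOutputs($$${variable}: ID!)
{
  ${root}(id: $$${variable}) {
    name
    dataManagementPlans {${fields}
    }
  }
}
""")

# Root field of each query mapped to the variable name it declares.
MEMBER_TYPES = {
    "organization": "rorId",
    "funder": "funderId",
    "repository": "repositoryId",
}


def build_query(member_type):
    """Return the query text for one member type."""
    return QUERY_TEMPLATE.substitute(
        root=member_type,
        variable=MEMBER_TYPES[member_type],
        fields=DMP_FIELDS,
    )


queries = {name: build_query(name) for name in MEMBER_TYPES}
print(queries["funder"])
```

Keeping the selection set in one place means a field added for one member type is automatically requested for the other two.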


## Data Transformation

Simple transformations convert the GraphQL response into a table with one row per DMP and, for each row, a count of the related datasets, organisations, funders, and people.

In [197]:
def get_series_size(series_element):
    return len(series_element)

In [198]:
def get_dataset_nodes(series_element):
    return series_element['nodes']

In [199]:
def get_title(series_element):
    return series_element[0]['title']

In [200]:
def transform_dmps(dataframe):
    """Replaces the nested attributes of each DMP with the values shown in the table

    Parameters:
    dataframe (dataframe): A dataframe with all the DMP nodes, or None if there are none

    Returns:
    dataframe: The same dataframe with the title flattened and the datasets,
    organisations, funders, and people replaced by their counts

    """
    if dataframe is None:
        return pd.DataFrame()
    else:
        # Unwrap the datasets connection so it can be counted like the other lists
        dataframe = (dataframe >>
        mutate(
            datasets = X.datasets.apply(get_dataset_nodes)
        ))

        return (dataframe >>
        mutate(
            title = X.title.apply(get_title),
            id = X.id,
            datasets = X.datasets.apply(get_series_size),
            organisations = X.organisations.apply(get_series_size),
            funders = X.funders.apply(get_series_size),
            people = X.people.apply(get_series_size)
        ))
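
To make the effect of `transform_dmps` concrete, the sketch below applies the same reduction to a single node using plain Python, without pandas or dfply. `summarize_dmp` is a hypothetical helper, and `sample_node` is made-up data shaped like the query response, not a real record.

```python
def summarize_dmp(node):
    """Reduce one DMP node to its id, first title, and the size of each related list."""
    return {
        "id": node["id"],
        "title": node["title"][0]["title"],
        # Datasets arrive as a connection, so the list sits under "nodes".
        "datasets": len(node["datasets"]["nodes"]),
        "organisations": len(node["organisations"]),
        "funders": len(node["funders"]),
        "people": len(node["people"]),
    }


# Hypothetical node, invented for illustration only.
sample_node = {
    "id": "https://doi.org/10.1234/example-dmp",
    "title": [{"title": "Example Data Management Plan"}],
    "datasets": {
        "totalCount": 1,
        "nodes": [{"id": "10.1234/example-dataset", "name": [{"title": "Example dataset"}]}],
    },
    "organisations": [],
    "funders": [{"id": "https://doi.org/10.13039/501100000780", "name": "Example funder"}],
    "people": [
        {"id": None, "name": "Person A", "affiliation": []},
        {"id": None, "name": "Person B", "affiliation": []},
    ],
}

summary = summarize_dmp(sample_node)
print(summary)
```

Each row of the tables shown later has this shape: an identifier, a title, and four counts.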
  

In [201]:
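def processTable(member_type):
    """Fetch the DMPs for the given member type and return them as a styled table."""
    data = get_data(member_type)
    if len(data) == 0:
        # No DMPs found: transform_dmps returns an empty dataframe for None
        table = None
    else:
        table = pd.DataFrame(data, columns=data[0].keys())
    return transform_dmps(table).style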

In [202]:
organization = processTable("organization")
funder = processTable("funder")
repository = processTable("repository")

In [203]:
organization

|   | id | title | datasets | organisations | funders | people |
|---|----|-------|----------|---------------|---------|--------|
| 0 | https://doi.org/10.5281/zenodo.3245354 | EURHISFIRM D1.2: Data Management Plan (first version) | 0 | 0 | 1 | 3 |
| 1 | https://doi.org/10.5281/zenodo.3245353 | EURHISFIRM D1.2: Data Management Plan (first version) | 0 | 0 | 1 | 3 |
| 2 | https://doi.org/10.5281/zenodo.3246339 | EURHISFIRM D1.7: Second Data Management Plan | 0 | 0 | 1 | 5 |
| 3 | https://doi.org/10.5281/zenodo.3246338 | EURHISFIRM D1.7: Second Data Management Plan | 0 | 0 | 1 | 5 |
| 4 | https://doi.org/10.5281/zenodo.3253683 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | 0 | 0 | 1 | 8 |
| 5 | https://doi.org/10.5281/zenodo.3253684 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | 0 | 0 | 1 | 8 |
| 6 | https://doi.org/10.5281/zenodo.3368558 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | 0 | 0 | 1 | 1 |
| 7 | https://doi.org/10.5281/zenodo.3368557 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | 0 | 0 | 1 | 1 |
| 8 | https://doi.org/10.5281/zenodo.3372460 | D6.5 Data Management Plan | 0 | 0 | 1 | 1 |
| 9 | https://doi.org/10.5281/zenodo.3372459 | D6.5 Data Management Plan | 0 | 0 | 1 | 1 |


In [204]:
funder

|   | id | title | datasets | organisations | funders | people |
|---|----|-------|----------|---------------|---------|--------|
| 0 | https://doi.org/10.5281/zenodo.3245354 | EURHISFIRM D1.2: Data Management Plan (first version) | 0 | 0 | 1 | 3 |
| 1 | https://doi.org/10.5281/zenodo.3245353 | EURHISFIRM D1.2: Data Management Plan (first version) | 0 | 0 | 1 | 3 |
| 2 | https://doi.org/10.5281/zenodo.3246339 | EURHISFIRM D1.7: Second Data Management Plan | 0 | 0 | 1 | 5 |
| 3 | https://doi.org/10.5281/zenodo.3246338 | EURHISFIRM D1.7: Second Data Management Plan | 0 | 0 | 1 | 5 |
| 4 | https://doi.org/10.5281/zenodo.3253683 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | 0 | 0 | 1 | 8 |
| 5 | https://doi.org/10.5281/zenodo.3253684 | European Collaboration for Healthcare Optimisation (ECHO) Data Model Specification | 0 | 0 | 1 | 8 |
| 6 | https://doi.org/10.5281/zenodo.3368558 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | 0 | 0 | 1 | 1 |
| 7 | https://doi.org/10.5281/zenodo.3368557 | REEEM-D6.6_Data Management Plan (DMP) - Collection, processing and dissemination of data | 0 | 0 | 1 | 1 |
| 8 | https://doi.org/10.5281/zenodo.3372460 | D6.5 Data Management Plan | 0 | 0 | 1 | 1 |
| 9 | https://doi.org/10.5281/zenodo.3372459 | D6.5 Data Management Plan | 0 | 0 | 1 | 1 |


In [205]:
repository