# MHKDR Development

This development notebook captures the workflow that produced the table transformations applied to the MHKDR API response; those transformations are deployed and further developed in the software package. The notebook will go out of date if the API changes, or if the codebase moves away from the table decomposition arrived at here. It may still be useful as a detailed look at what can be drawn from the API: checking out this commit in git shows the development cycle behind the table composition. It may also be a helpful starting place for additional feature development using the API, since a branch created from this commit can build on the same workflow.

### Setup

In [1]:
import requests
import pandas as pd
import re

mhkdr_api = 'https://mhkdr.openei.org/api?action=getSubmissionsForPRIMRE'
mhkdr_response = requests.get(mhkdr_api)
mhkdr_response_json = mhkdr_response.json()
mhkdr_dataframe = pd.DataFrame(mhkdr_response_json)
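
Because this notebook goes stale if the API changes, a quick check that each record still carries the fields the table functions in this notebook read can flag a schema change early. The following is a minimal sketch, not part of the package: `REQUIRED_KEYS` and `find_missing_keys` are hypothetical names, and the sample records are illustrative stand-ins rather than real API output.

```python
# Hypothetical helper: these are the keys the table-building functions in this notebook read.
REQUIRED_KEYS = frozenset({'URI', 'author', 'organization', 'tags'})

def find_missing_keys(records, required=REQUIRED_KEYS):
    '''
    Return a dict mapping the position of each record to the sorted list of
    required keys it lacks. An empty dict means every record has all keys.
    '''
    problems = {}
    for position, record in enumerate(records):
        absent = sorted(set(required).difference(record.keys()))
        if absent:
            problems[position] = absent
    return problems

# Illustrative stand-in records (not real API output).
sample_records = [
    {'URI': 'https://mhkdr.openei.org/submissions/548', 'author': [], 'organization': [], 'tags': []},
    {'URI': 'https://mhkdr.openei.org/submissions/547', 'author': []},
]
print(find_missing_keys(sample_records))
```

With the live response, the equivalent call would be `find_missing_keys(mhkdr_response_json)`.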

### Dev

In [2]:
mhkdr_response_json[0]

{'URI': 'https://mhkdr.openei.org/submissions/548',
 'type': ['Dataset', 'Document/Report'],
 'landingPage': 'https://mhkdr.openei.org/submissions/548',
 'sourceURL': 'https://mhkdr.openei.org/submissions/548',
 'title': 'CalWave - xWave Device, Non-Commercially Sensitive Project Report',
 'description': "CalWave has developed a submerged pressure differential type Wave Energy Converter (WEC) architecture called xWave. The single body device oscillates submerged, is positively buoyant, and taut moored to the sea floor and integrates novel features such as absorber submergence depth control. Since participation in the US Wave Energy Prize, CalWave has evolved the design and successfully concluded a scaled 10-month open ocean pilot. CalWave recently concluded the final design phase of a scaled up WEC version for PacWave and started component order/build of the WEC towards the grid-connected demonstration at PacWave.\n\nDocumentation here includes a Non-Commercially Sensitive Project Repo

In [3]:
mhkdr_dataframe

| | URI | type | landingPage | sourceURL | title | description | author | organization | originationDate | spatial | technologyType | tags | signatureProject | modifiedDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://mhkdr.openei.org/submissions/548 | [Dataset, Document/Report] | https://mhkdr.openei.org/submissions/548 | https://mhkdr.openei.org/submissions/548 | CalWave - xWave Device, Non-Commercially Sensi... | CalWave has developed a submerged pressure dif... | [Marcus Lehmann, Ryan Davidson] | [CalWave Power Technologies Inc.] | 2024-02-29 07:00:00 | {'boundingCoordinatesNE': [44.63067800397145, ... | [Wave] | [MHK, Marine, Hydrokinetic, energy, power, wav... | [] | 2024-04-25 20:40:00 |
| 1 | https://mhkdr.openei.org/submissions/547 | [Dataset, Document/Report, Dataset/Data] | https://mhkdr.openei.org/submissions/547 | https://mhkdr.openei.org/submissions/547 | CalWave - Reports and Plans for xWave Device D... | CalWave has developed a submerged pressure dif... | [Thomas Boerner, Nigel Kojimoto, Marcus Lehman... | [CalWave Power Technologies Inc.] | 2024-02-29 07:00:00 | {'boundingCoordinatesNE': [44.69319166311689, ... | [Wave] | [MHK, Marine, Hydrokinetic, energy, power, Wav... | [] | 2024-04-25 20:48:08 |
| 2 | https://mhkdr.openei.org/submissions/545 | [Dataset, Dataset/Archive] | https://mhkdr.openei.org/submissions/545 | https://mhkdr.openei.org/submissions/545 | Wave Measurements taken NW of Culebra Is., PR,... | Wave and sea surface temperature measurements ... | [James McVey, Molly Grear, Mikaela Freeman, Ly... | [Pacific Northwest National Laboratory] | 2023-07-27 06:00:00 | {'extent': 'point', 'coordinates': [18.3878, -... | [Wave] | [wave, puerto rico, sea surface temperature, w... | [] | 2024-04-01 14:59:21 |
| 3 | https://mhkdr.openei.org/submissions/543 | [Dataset, Document/Report] | https://mhkdr.openei.org/submissions/543 | https://mhkdr.openei.org/submissions/543 | TidGen: Permits for Installation of Single Tur... | This is a summary of permits required and obta... | [Katie Sellers-Reynolds] | [Ocean Renewable Power Company] | 2024-02-27 07:00:00 | {'boundingCoordinatesNE': [44.921183434206455,... | [] | [MHK, Marine, Hydrokinetic, energy, power, Cob... | [] | 2024-04-26 18:46:13 |
| 4 | https://mhkdr.openei.org/submissions/542 | [Dataset, Document/Report] | https://mhkdr.openei.org/submissions/542 | https://mhkdr.openei.org/submissions/542 | TidGen: Single Turbine System (STS) Deployment... | This document provides a summary for the perfo... | [Liam Pillsbury] | [Ocean Renewable Power Company] | 2024-01-05 07:00:00 | {'boundingCoordinatesNE': [44.92021102770398, ... | [] | [MHK, Marine, Hydrokinetic, energy, power, Tid... | [] | 2024-04-26 18:25:35 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 394 | https://mhkdr.openei.org/submissions/14 | [Dataset, Document/Report, Dataset/Data] | https://mhkdr.openei.org/submissions/14 | https://mhkdr.openei.org/submissions/14 | Aquantis 2.5 MW Ocean Current Generation Devic... | Dataset contains MHK Hydrofoils Design and Opt... | [Henry Shiu, Henry Swales, Case Van Damn] | [Dehlsen Associates, LLC] | 2015-06-03 06:00:00 | {'boundingCoordinatesNE': [38.095016089622, -1... | [Current] | [MHK, Marine, Hydrokinetic, energy, power, geo... | [] | 2021-05-17 16:22:16 |
| 395 | https://mhkdr.openei.org/submissions/5 | [Dataset, Dataset/Data, Document/Presentation,... | https://mhkdr.openei.org/submissions/5 | https://mhkdr.openei.org/submissions/5 | Aquantis 2.5 MW Ocean Current Generation Devic... | Items in this submission provide the detailed ... | [Rich Banko, David Coakley, Dana Colegrove, Al... | [Dehlsen Associates, LLC] | 2015-06-03 06:00:00 | {'boundingCoordinatesNE': [38.129592370005, -1... | [Current] | [MHK, Marine, Hydrokinetic, energy, power, des... | [] | 2020-07-17 20:59:17 |
| 396 | https://mhkdr.openei.org/submissions/3 | [Dataset, Document/Report, Dataset/Data, Docum... | https://mhkdr.openei.org/submissions/3 | https://mhkdr.openei.org/submissions/3 | Aquantis 2.5 MW Ocean Current Generation Devic... | Aquantis 2.5 MW Ocean Current Generation Devic... | [Henry Swales, Richard Banko, David Coakley] | [Dehlsen Associates, LLC] | 2015-06-03 06:00:00 | {'boundingCoordinatesNE': [38.060423443594, -1... | [Current] | [MHK, Marine, Hydrokinetic, energy, power, Aqu... | [] | 2020-07-17 21:02:51 |
| 397 | https://mhkdr.openei.org/submissions/2 | [Dataset, Dataset/Data, Document/Report, Docum... | https://mhkdr.openei.org/submissions/2 | https://mhkdr.openei.org/submissions/2 | Aquantis 2.5 MW Ocean Current Generation Devic... | Aquantis 2.5 MW Ocean Current Generation Devic... | [Henry Swales, Ole Kils, David B. Coakley, Eri... | [Dehlsen Associates, LLC] | 2015-06-03 06:00:00 | {'boundingCoordinatesNE': [38.129592370005, -1... | [Current] | [MHK, Marine, Hydrokinetic, energy, power, Aqu... | [] | 2020-07-17 20:51:33 |


### Cleaned 

In [4]:
def find_entry_id(entry_uri):
    '''
    This function takes in the url of a MHKDR entry, and returns the entry_id of that page. 
    The 'entry_id' is the integer at the end of the url, which is unique to each MHKDR entry.
    The regex used in this function relies on the fact that the only number in the url is the id.
    '''
    rule = re.compile(r'\d+')
    matches_rule = rule.findall(entry_uri)
    entry_id = int(matches_rule[0])

    return entry_id
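
The docstring's assumption (the id is the only number in the URL) holds for the URIs shown in this notebook. If that assumption ever breaks, one alternative is to anchor the pattern to the end of the path. This is a minimal sketch, not the package's implementation: `find_entry_id_anchored` and `ENTRY_ID_PATTERN` are hypothetical names, and the pattern assumes URIs of the form `.../submissions/<id>`.

```python
import re

# Assumes the URI ends with "/submissions/<digits>", optionally followed by a slash.
ENTRY_ID_PATTERN = re.compile(r'/submissions/(\d+)/?$')

def find_entry_id_anchored(entry_uri):
    '''
    Return the integer id at the end of a MHKDR submission URI.
    Raises ValueError when the URI does not match the expected form.
    '''
    match = ENTRY_ID_PATTERN.search(entry_uri)
    if match is None:
        raise ValueError(f'No entry id found in {entry_uri!r}')
    return int(match.group(1))

print(find_entry_id_anchored('https://mhkdr.openei.org/submissions/548'))
```

Raising on a mismatch, rather than returning the first number found, makes a change in URI format visible immediately instead of silently producing wrong ids.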

In [5]:
def construct_authors_table(mhkdr_dataframe):
    '''
    Create a normalized table for the JSON element "author", with one row per (entry_id, author) pair.
    The "entry_id" column is the key used to join this table to the others. Flattening the nested
    list structure present in the JSON enables reporting on, e.g., associations among researchers
    and the number of documents attributed to each author.
    '''
    authors_of_entries = list(mhkdr_dataframe['author'])
    landing_page_uris = list(mhkdr_dataframe['URI'])

    entry_ids = list()  # One entry id per author row, so an id repeats when an entry has several authors
    authors = list()    # An author repeats when they contribute to multiple entries
    
    for i in range(0, len(mhkdr_dataframe)):
        # Construct "entry_id" - This will be a primary key for all future merge operations.
        entry_id = find_entry_id(landing_page_uris[i])
        
        # Construct "author" column
        num_authors = len(authors_of_entries[i])
        for j in range(0, num_authors):
            entry_ids.append(entry_id)
            authors.append(authors_of_entries[i][j])
    
    final_df = pd.DataFrame({'entry_id':entry_ids, 'author':authors})
    
    return final_df

In [6]:
def construct_organizations_table(mhkdr_dataframe):
    '''
    Create a normalized table for the JSON element "organization", with one row per entry.
    The "entry_id" column is the key used to join this table to the others. Only the first
    organization listed for each entry is kept. This enables reporting on, e.g., the number
    of documents attributed to each organization.
    '''
    orgs_of_entries = list(mhkdr_dataframe['organization'])
    landing_page_uris = list(mhkdr_dataframe['URI'])

    entry_ids = list()  # One entry id per row, since only one organization is kept per entry
    orgs = list()       # An organization repeats when it contributes to multiple entries

    for i in range(0, len(mhkdr_dataframe)):
        # Construct "entry_id" - This will be a primary key for all future merge operations.
        entry_id = find_entry_id(landing_page_uris[i])

        # Construct "organization" column from the first listed organization.
        # This assumes every entry lists at least one organization; an empty list raises IndexError.
        org = orgs_of_entries[i][0]

        entry_ids.append(entry_id)
        orgs.append(org)
    
    final_df = pd.DataFrame({'entry_id':entry_ids, 'organization':orgs})
    
    return final_df

In [7]:
def construct_tags_table(mhkdr_dataframe):
    '''
    Create a normalized table for the JSON element "tags", with one row per (entry_id, tag) pair.
    The "entry_id" column is the key used to join this table to the others. Flattening the nested
    list structure present in the JSON enables reporting on, e.g., the number of documents
    associated with each tag.
    '''
    tags_of_entries = list(mhkdr_dataframe['tags'])
    landing_page_uris = list(mhkdr_dataframe['URI'])

    entry_ids = list()  # One entry id per tag row, so an id repeats when an entry has several tags
    tags = list()       # A tag repeats when it is applied to multiple entries
    
    for i in range(0, len(mhkdr_dataframe)):
        # Construct "entry_id" - This will be a primary key for all future merge operations.
        entry_id = find_entry_id(landing_page_uris[i])
        
        # Construct "tag" column
        num_tags = len(tags_of_entries[i])
        for j in range(0, num_tags):
            entry_ids.append(entry_id)
            tags.append(tags_of_entries[i][j])
    
    final_df = pd.DataFrame({'entry_id':entry_ids, 'tag':tags})
    
    return final_df
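
The author and tag functions share one shape: pair each entry id with every element of a list-valued column. pandas provides `DataFrame.explode` for this, so the loops could be collapsed into a single helper. This is a sketch under assumptions, not the package's code: `explode_list_column` is a hypothetical name, the sample frame is made up for illustration, and rows with empty lists are dropped to match the loops (which emit nothing for an empty list).

```python
import re
import pandas as pd

def explode_list_column(dataframe, column, new_name):
    '''
    Return a two-column table (entry_id, new_name) with one row per element
    of the list-valued `column`. Entries whose list is empty are dropped.
    '''
    # Same id rule as find_entry_id: the first run of digits in the URI.
    entry_ids = dataframe['URI'].map(lambda uri: int(re.findall(r'\d+', uri)[0]))
    table = pd.DataFrame({'entry_id': entry_ids, new_name: dataframe[column]})
    # explode() gives each list element its own row; an empty list becomes a NaN row.
    table = table.explode(new_name).dropna(subset=[new_name])
    return table.reset_index(drop=True)

# Illustrative sample (not real API output): one entry with two tags, one with none.
sample = pd.DataFrame({
    'URI': ['https://mhkdr.openei.org/submissions/548',
            'https://mhkdr.openei.org/submissions/547'],
    'tags': [['MHK', 'Marine'], []],
})
tags_table = explode_list_column(sample, 'tags', 'tag')
print(tags_table)
```

One difference from the loops: `dropna` would also discard a `None` element inside a non-empty list, so the two approaches agree only when list elements are never null.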

### Cleaned - Tests

In [8]:
construct_authors_table(mhkdr_dataframe)

| | entry_id | author |
|---|---|---|
| 0 | 548 | Marcus Lehmann |
| 1 | 548 | Ryan Davidson |
| 2 | 547 | Thomas Boerner |
| 3 | 547 | Nigel Kojimoto |
| 4 | 547 | Marcus Lehmann |
| ... | ... | ... |
| 1011 | 2 | Tyler Mayer |
| 1012 | 1 | Jon Weers |
| 1013 | 1 | Nicole Taverna |
| 1014 | 1 | Jay Huggins |
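
The authors table turns the report mentioned in the docstring, the number of documents attributed to each author, into a short group-by. The sketch below uses only the first five rows of the output above as sample data, so the counts describe the sample and not the full dataset; `authors_sample` and `documents_per_author` are illustrative names.

```python
import pandas as pd

# First five rows of the authors table output.
authors_sample = pd.DataFrame({
    'entry_id': [548, 548, 547, 547, 547],
    'author': ['Marcus Lehmann', 'Ryan Davidson', 'Thomas Boerner',
               'Nigel Kojimoto', 'Marcus Lehmann'],
})

# Count distinct entries per author, largest count first.
documents_per_author = (
    authors_sample.groupby('author')['entry_id']
    .nunique()
    .sort_values(ascending=False)
)
print(documents_per_author)
```

On the full table the same expression would be applied to the result of `construct_authors_table(mhkdr_dataframe)`.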


In [9]:
construct_organizations_table(mhkdr_dataframe)

| | entry_id | organization |
|---|---|---|
| 0 | 548 | CalWave Power Technologies Inc. |
| 1 | 547 | CalWave Power Technologies Inc. |
| 2 | 545 | Pacific Northwest National Laboratory |
| 3 | 543 | Ocean Renewable Power Company |
| 4 | 542 | Ocean Renewable Power Company |
| ... | ... | ... |
| 394 | 14 | Dehlsen Associates, LLC |
| 395 | 5 | Dehlsen Associates, LLC |
| 396 | 3 | Dehlsen Associates, LLC |
| 397 | 2 | Dehlsen Associates, LLC |


In [10]:
construct_tags_table(mhkdr_dataframe)

| | entry_id | tag |
|---|---|---|
| 0 | 548 | MHK |
| 1 | 548 | Marine |
| 2 | 548 | Hydrokinetic |
| 3 | 548 | energy |
| 4 | 548 | power |
| ... | ... | ... |
| 11144 | 1 | best practices |
| 11145 | 1 | guide |
| 11146 | 1 | API |
| 11147 | 1 | management |
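
Since all three tables carry `entry_id`, they can be joined with `DataFrame.merge`. The sketch below joins sample rows copied from the authors and organizations outputs in this notebook; the variable names are illustrative, and a left join is one reasonable choice (it keeps every author row even if an entry had no organization row).

```python
import pandas as pd

# Sample rows copied from the authors and organizations outputs in this notebook.
authors_sample = pd.DataFrame({
    'entry_id': [548, 548, 547, 547, 547],
    'author': ['Marcus Lehmann', 'Ryan Davidson', 'Thomas Boerner',
               'Nigel Kojimoto', 'Marcus Lehmann'],
})
organizations_sample = pd.DataFrame({
    'entry_id': [548, 547],
    'organization': ['CalWave Power Technologies Inc.',
                     'CalWave Power Technologies Inc.'],
})

# Attach each entry's organization to every author row for that entry.
authors_with_orgs = authors_sample.merge(organizations_sample, on='entry_id', how='left')
print(authors_with_orgs)
```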
