# MHKDR APIs Comparison

The objective of this notebook is to enumerate the fields contained in the first and second MHKDR APIs, in order to decide how to include MHKDR data in the package without losing or duplicating data.

The concern is that each API contains data the other lacks. In that case, using only one of the two APIs would cause data loss.

This notebook is complete when it has answered the following questions:
1. Is there data contained in API 1 that is NOT present in API 2?
2. Is there data contained in API 2 that is NOT present in API 1? (We already know the answer is "yes".)
3. What causes the mismatch in the number of records between the two APIs?
4. Is there any reason to prefer one data source over the other when their fields are similar?

### Setup

In [1]:
import requests
import json
import pandas as pd

mhkdr_api_1 = 'https://mhkdr.openei.org/api?action=getSubmissionsForPRIMRE'
mhkdr_api_2 = 'https://mhkdr.openei.org/data.json'

In [2]:
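# The 60-second timeout is an arbitrary choice; without one, requests can hang indefinitely
mhkdr_1_response = requests.get(mhkdr_api_1, timeout=60)
mhkdr_2_response = requests.get(mhkdr_api_2, timeout=60)

# Fail loudly on HTTP errors instead of trying to parse an error page as JSON
mhkdr_1_response.raise_for_status()
mhkdr_2_response.raise_for_status()

mhkdr_1_response_json = mhkdr_1_response.json()
mhkdr_2_response_json = mhkdr_2_response.json()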

### Dev

|Only in API 1|In both|Only in API 2|
|:---:|:---:|:---:|
||||

In [15]:
mhkdr_1_df = pd.DataFrame(mhkdr_1_response_json)
mhkdr_2_df = pd.DataFrame(mhkdr_2_response_json['dataset'])

In [10]:
len(mhkdr_1_df)

400

In [16]:
len(mhkdr_2_df)

335
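The record counts differ (400 vs 335). A minimal sketch for narrowing down the cause (question 3), assuming API 1's `URI` and API 2's `identifier` both hold the submission landing URL, as they do for submission 1 in the outputs below. It counts IDs duplicated within each API and IDs that appear in only one of them. The helper name `compare_ids` is hypothetical.

```python
from collections import Counter


def compare_ids(records_1, records_2, key_1="URI", key_2="identifier"):
    """Summarize how the submission IDs of two record lists overlap.

    Assumes each record is a dict and that key_1 / key_2 hold comparable
    submission URLs (e.g. 'https://mhkdr.openei.org/submissions/1').
    """
    ids_1 = [r.get(key_1) for r in records_1]
    ids_2 = [r.get(key_2) for r in records_2]
    counts_1, counts_2 = Counter(ids_1), Counter(ids_2)
    return {
        # IDs that occur more than once within the same API
        "duplicates_1": {i: n for i, n in counts_1.items() if n > 1},
        "duplicates_2": {i: n for i, n in counts_2.items() if n > 1},
        # IDs that occur in one API but not the other (key=str tolerates None)
        "only_in_1": sorted(set(ids_1) - set(ids_2), key=str),
        "only_in_2": sorted(set(ids_2) - set(ids_1), key=str),
    }


# Usage in this notebook (requires the responses fetched above):
# summary = compare_ids(mhkdr_1_response_json, mhkdr_2_response_json['dataset'])
# {k: len(v) for k, v in summary.items()}
```

If `duplicates_1` is non-empty, part of the gap is repeated rows in API 1 rather than extra submissions.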

In [17]:
mhkdr_2_df

| | @type | identifier | accessLevel | bureauCode | license | issued | dataQuality | title | description | keyword | ... | projectTitle | projectNumber | modified | publisher | contactPoint | programCode | landingPage | distribution | spatial | DOI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | dcat:Dataset | https://mhkdr.openei.org/submissions/1 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2021-12-15T07:00:00Z | True | MHKDR Data Management and Best Practices for S... | Resources for MHKDR data submitters and curato... | [MHK, Marine, Hydrokinetic, energy, power, dat... | ... | Marine and Hydrokinetic Data Repository (MHKDR) | 35007 | 2022-05-26T18:08:40Z | {'@type': 'org:Organization', 'name': 'RJ Scavo'} | {'@type': 'vcard:Contact', 'fn': 'MHKDR Help',... | [019:009] | https://mhkdr.openei.org/submissions/1 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-180,-83],[... | |
| 1 | dcat:Dataset | https://mhkdr.openei.org/submissions/2 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2015-06-03T06:00:00Z | True | Aquantis 2.5 MW Ocean Current Generation Devic... | Aquantis 2.5 MW Ocean Current Generation Devic... | [MHK, Marine, Hydrokinetic, energy, power, Aqu... | ... | Aquantis 2.5 MW Ocean Current Generation Device | EE0003643 | 2020-07-17T20:51:33Z | {'@type': 'org:Organization', 'name': 'Tyler M... | {'@type': 'vcard:Contact', 'fn': 'David Arthur... | [019:009] | https://mhkdr.openei.org/submissions/2 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-121.276567... | 10.15473/1413995 |
| 2 | dcat:Dataset | https://mhkdr.openei.org/submissions/3 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2015-06-03T06:00:00Z | True | Aquantis 2.5 MW Ocean Current Generation Devic... | Aquantis 2.5 MW Ocean Current Generation Devic... | [MHK, Marine, Hydrokinetic, energy, power, Aqu... | ... | Aquantis 2.5 MW Ocean Current Generation Device | EE0003643 | 2020-07-17T21:02:51Z | {'@type': 'org:Organization', 'name': 'David C... | {'@type': 'vcard:Contact', 'fn': 'David Arthur... | [019:009] | https://mhkdr.openei.org/submissions/3 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-121.012895... | 10.15473/1415193 |
| 3 | dcat:Dataset | https://mhkdr.openei.org/submissions/5 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2015-06-03T06:00:00Z | True | Aquantis 2.5 MW Ocean Current Generation Devic... | Items in this submission provide the detailed ... | [MHK, Marine, Hydrokinetic, energy, power, des... | ... | Aquantis 2.5 MW Ocean Current Generation Device | EE0003643 | 2020-07-17T20:59:17Z | {'@type': 'org:Organization', 'name': 'Stephen... | {'@type': 'vcard:Contact', 'fn': 'David Arthur... | [019:009] | https://mhkdr.openei.org/submissions/5 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-121.100785... | 10.15473/1417292 |
| 4 | dcat:Dataset | https://mhkdr.openei.org/submissions/14 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2015-06-03T06:00:00Z | True | Aquantis 2.5 MW Ocean Current Generation Devic... | Dataset contains MHK Hydrofoils Design and Opt... | [MHK, Marine, Hydrokinetic, energy, power, geo... | ... | Aquantis 2.5 MW Ocean Current Generation Device | EE0003643 | 2021-05-17T16:22:16Z | {'@type': 'org:Organization', 'name': 'Case Va... | {'@type': 'vcard:Contact', 'fn': 'David Arthur... | [019:009] | https://mhkdr.openei.org/submissions/14 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-121.584184... | 10.15473/1417297 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 330 | dcat:Dataset | https://mhkdr.openei.org/submissions/534 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2021-11-01T06:00:00Z | True | TEAMER: Biofouling Analysis for Wave Energy Pi... | Biofouling and corrosion are a major concern f... | [MHK, Marine, Hydrokinetic, energy, power, wav... | ... | Biofouling Analysis for Wave Energy Piston Design | EE0008895 | 2024-02-27T23:09:36Z | {'@type': 'org:Organization', 'name': 'Linnea ... | {'@type': 'vcard:Contact', 'fn': 'Tyler Robert... | [019:009] | https://mhkdr.openei.org/submissions/534 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-126.3495,4... | 10.15473/2315037 |
| 331 | dcat:Dataset | https://mhkdr.openei.org/submissions/535 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2024-02-26T07:00:00Z | True | TEAMER: Drifting Hydrophone System - Block Dia... | This data release is part of TEAMER RFTS 2, wh... | [marine, energy, TEAMER, hydrophone, calibrati... | ... | Testing Expertise and Access for Marine Energy... | EE0008895 | 2024-04-24T00:01:20Z | {'@type': 'org:Organization', 'name': 'Joseph ... | {'@type': 'vcard:Contact', 'fn': 'James Turnbu... | [019:009] | https://mhkdr.openei.org/submissions/535 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-124.044016... | 10.15473/2339895 |
| 332 | dcat:Dataset | https://mhkdr.openei.org/submissions/543 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2024-02-27T07:00:00Z | True | TidGen: Permits for Installation of Single Tur... | This is a summary of permits required and obta... | [MHK, Marine, Hydrokinetic, energy, power, Cob... | ... | Advanced TidGen Power System | EE0007820 | 2024-04-26T18:46:13Z | {'@type': 'org:Organization', 'name': 'Katie S... | {'@type': 'vcard:Contact', 'fn': 'Jarlath McEn... | [019:009] | https://mhkdr.openei.org/submissions/543 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-67.0391209... | |
| 333 | dcat:Dataset | https://mhkdr.openei.org/submissions/545 | public | [019:20] | https://creativecommons.org/licenses/by/4.0/ | 2023-07-27T06:00:00Z | True | Wave Measurements taken NW of Culebra Is., PR,... | Wave and sea surface temperature measurements ... | [wave, puerto rico, sea surface temperature, w... | ... | Co-locating Wave Energy with an Integrated Mul... | FY24 AOP 2.2.5.602 | 2024-04-01T14:59:21Z | {'@type': 'org:Organization', 'name': 'Lysel G... | {'@type': 'vcard:Contact', 'fn': 'James McVey'... | [019:009] | https://mhkdr.openei.org/submissions/545 | [{'@type': 'dcat:Distribution', 'description':... | {"type":"Polygon","coordinates":[[[-65.3899,18... | 10.15473/2331255 |
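Several API 2 columns (`publisher`, `contactPoint`) hold nested dicts, as the table shows. One way to flatten them before merging with API 1 is `pandas.json_normalize`, which expands dict-valued fields into dotted column names and leaves list-valued fields (such as `bureauCode` or `distribution`) as lists. The sample record below is an abbreviated, illustrative copy of submission 1, not the full record.

```python
import pandas as pd

# Abbreviated API 2 record (submission 1); real records carry more fields.
records = [
    {
        "identifier": "https://mhkdr.openei.org/submissions/1",
        "publisher": {"@type": "org:Organization", "name": "RJ Scavo"},
        "contactPoint": {"@type": "vcard:Contact", "fn": "MHKDR Help"},
        "bureauCode": ["019:20"],
    }
]

# Dict-valued fields become 'parent.child' columns; lists are kept as-is.
flat = pd.json_normalize(records)
print(sorted(flat.columns))

# In this notebook the same call would apply to the full list:
# mhkdr_2_flat = pd.json_normalize(mhkdr_2_response_json['dataset'])
```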


In [3]:
len(mhkdr_1_response_json[0])

14

In [4]:
len(mhkdr_2_response_json['dataset'][0])

20

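The first API 1 record has 14 fields; the first API 2 record has 20. These counts cover only the first record of each response: the API 2 DataFrame above also has a `DOI` column, which is empty for submission 1, so some fields appear in only some records.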
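Because record 0 understates API 2's schema (`DOI` is absent from submission 1 but present in later records), it helps to count, for every key, how many records carry it. A stdlib sketch; the helper `key_coverage` is hypothetical. Applied to both responses, it separates fields present in every record from optional ones.

```python
from collections import Counter


def key_coverage(records):
    """Return {key: number of records containing that key}, most common first."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return dict(counts.most_common())


# Usage in this notebook:
# key_coverage(mhkdr_1_response_json)
# key_coverage(mhkdr_2_response_json['dataset'])
```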

In [5]:
mhkdr_1_response_json[0].keys()

dict_keys(['URI', 'type', 'landingPage', 'sourceURL', 'title', 'description', 'author', 'organization', 'originationDate', 'spatial', 'technologyType', 'tags', 'signatureProject', 'modifiedDate'])

In [6]:
print(mhkdr_2_response_json['dataset'][0].keys())

dict_keys(['@type', 'identifier', 'accessLevel', 'bureauCode', 'license', 'issued', 'dataQuality', 'title', 'description', 'keyword', 'projectLead', 'projectTitle', 'projectNumber', 'modified', 'publisher', 'contactPoint', 'programCode', 'landingPage', 'distribution', 'spatial'])
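The comparison table in the Dev section can be filled from these key listings. A sketch that takes the union of keys over all records in each API and splits it into keys only in API 1, keys in both, and keys only in API 2. It matches exact key names only, so fields that look equivalent under different names (for example `tags` vs `keyword`, or `originationDate` vs `issued`) land in the outer columns and still need a manual mapping. The helper `split_keys` is hypothetical.

```python
def split_keys(records_1, records_2):
    """Partition the union of record keys into (only_1, both, only_2), each sorted."""
    keys_1 = set().union(*(r.keys() for r in records_1))
    keys_2 = set().union(*(r.keys() for r in records_2))
    return sorted(keys_1 - keys_2), sorted(keys_1 & keys_2), sorted(keys_2 - keys_1)


# First-record keys as printed above (other records may add keys, e.g. DOI).
api_1_keys = ['URI', 'type', 'landingPage', 'sourceURL', 'title', 'description',
              'author', 'organization', 'originationDate', 'spatial',
              'technologyType', 'tags', 'signatureProject', 'modifiedDate']
api_2_keys = ['@type', 'identifier', 'accessLevel', 'bureauCode', 'license',
              'issued', 'dataQuality', 'title', 'description', 'keyword',
              'projectLead', 'projectTitle', 'projectNumber', 'modified',
              'publisher', 'contactPoint', 'programCode', 'landingPage',
              'distribution', 'spatial']

only_1, both, only_2 = split_keys([dict.fromkeys(api_1_keys)], [dict.fromkeys(api_2_keys)])
print("both:", both)

# With the live data, pass every record instead of one:
# split_keys(mhkdr_1_response_json, mhkdr_2_response_json['dataset'])
```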


In [7]:
mhkdr_1_response_json[399]

{'URI': 'https://mhkdr.openei.org/submissions/1',
 'type': ['Dataset', 'Dataset/OnlineTool', 'Document/Report'],
 'landingPage': 'https://mhkdr.openei.org/submissions/1',
 'sourceURL': 'https://mhkdr.openei.org/submissions/1',
 'title': 'MHKDR Data Management and Best Practices for Submitters and Curators',
 'description': 'Resources for MHKDR data submitters and curators, including training videos, step-by-step guides on data submission, and detailed documentation of the MHKDR. The Data Management and Submission Best Practices document also contains API access and metadata schema information for developers interested in harvesting MHKDR metadata for federation or inclusion in their local catalogs.\n',
 'author': ['Jon Weers', 'Nicole Taverna', 'Jay Huggins', 'RJ Scavo'],
 'organization': ['National Renewable Energy Laboratory'],
 'originationDate': '2021-12-15 07:00:00',
 'spatial': {'boundingCoordinatesNE': [83, 180],
  'boundingCoordinatesSW': [-83, -180],
  'extent': 'boundingBox'}
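The two APIs encode location differently: API 1 gives a bounding box as `boundingCoordinatesNE`/`boundingCoordinatesSW` pairs, while API 2's `spatial` is a GeoJSON `Polygon` serialized as a string. For submission 1 the API 1 pairs are `[83, 180]` and `[-83, -180]`; since 180 is not a valid latitude they appear to be `[lat, lon]`, whereas GeoJSON positions are `[lon, lat]`. A sketch that reduces both to a common `(west, south, east, north)` tuple so they can be compared. The `[lat, lon]` ordering is inferred from this one record, and the closed ring in the example is hypothetical because the notebook output truncates API 2's coordinates after the first vertex.

```python
import json


def bbox_from_api_1(spatial):
    """(west, south, east, north) from an API 1 'spatial' dict.

    Assumes boundingCoordinatesNE/SW are [lat, lon] pairs (inferred, not documented here).
    """
    if spatial.get("extent") != "boundingBox":
        raise ValueError(f"unsupported extent: {spatial.get('extent')!r}")
    north, east = spatial["boundingCoordinatesNE"]
    south, west = spatial["boundingCoordinatesSW"]
    return (west, south, east, north)


def bbox_from_api_2(spatial):
    """(west, south, east, north) from an API 2 'spatial' GeoJSON Polygon string."""
    geometry = json.loads(spatial)
    if geometry.get("type") != "Polygon":
        raise ValueError(f"unsupported geometry: {geometry.get('type')!r}")
    # GeoJSON positions are [lon, lat]; scan every ring of the polygon.
    lons = [pt[0] for ring in geometry["coordinates"] for pt in ring]
    lats = [pt[1] for ring in geometry["coordinates"] for pt in ring]
    return (min(lons), min(lats), max(lons), max(lats))


api_1_spatial = {"boundingCoordinatesNE": [83, 180],
                 "boundingCoordinatesSW": [-83, -180],
                 "extent": "boundingBox"}
# Hypothetical full ring; the notebook output only shows its first vertex.
api_2_spatial = '{"type":"Polygon","coordinates":[[[-180,-83],[180,-83],[180,83],[-180,83],[-180,-83]]]}'

print(bbox_from_api_1(api_1_spatial) == bbox_from_api_2(api_2_spatial))
```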

In [8]:
mhkdr_2_response_json['dataset'][0]

{'@type': 'dcat:Dataset',
 'identifier': 'https://mhkdr.openei.org/submissions/1',
 'accessLevel': 'public',
 'bureauCode': ['019:20'],
 'license': 'https://creativecommons.org/licenses/by/4.0/',
 'issued': '2021-12-15T07:00:00Z',
 'dataQuality': True,
 'title': 'MHKDR Data Management and Best Practices for Submitters and Curators',
 'description': 'Resources for MHKDR data submitters and curators, including training videos, step-by-step guides on data submission, and detailed documentation of the MHKDR. The Data Management and Submission Best Practices document also contains API access and metadata schema information for developers interested in harvesting MHKDR metadata for federation or inclusion in their local catalogs.\n',
 'keyword': ['MHK',
  'Marine',
  'Hydrokinetic',
  'energy',
  'power',
  'data',
  'MHKDR',
  'federation',
  'metadata',
  'standards',
  'submission',
  'training',
  'best practices',
  'guide',
  'API',
  'management',
  'storage'],
 'projectLead': 'Bill M
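For submission 1, `title`, `description`, and `landingPage` are identical across the two APIs, and API 1's `originationDate` (`2021-12-15 07:00:00`) matches API 2's `issued` (`2021-12-15T07:00:00Z`) up to formatting. A sketch for checking this over all matched records (question 4), joining on `URI`/`identifier`. The mapping in `FIELD_MAP` is an assumption based on this one pair, as is treating API 1's timezone-less timestamps as UTC; the helpers are hypothetical.

```python
from datetime import datetime, timezone

# Assumed correspondence between API 1 and API 2 field names.
FIELD_MAP = {
    "title": "title",
    "description": "description",
    "landingPage": "landingPage",
    "originationDate": "issued",
}


def parse_timestamp(value):
    """Parse either API's timestamp format into an aware UTC datetime."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S"):
        try:
            # API 1 timestamps carry no zone; treating them as UTC is an assumption.
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {value!r}")


def mismatched_fields(records_1, records_2):
    """Yield (submission URL, API 1 field, value 1, value 2) where matched records disagree."""
    by_id_2 = {r.get("identifier"): r for r in records_2}
    for r1 in records_1:
        r2 = by_id_2.get(r1.get("URI"))
        if r2 is None:
            continue  # submission not present in API 2
        for f1, f2 in FIELD_MAP.items():
            v1, v2 = r1.get(f1), r2.get(f2)
            if f1 == "originationDate" and v1 and v2:
                same = parse_timestamp(v1) == parse_timestamp(v2)
            else:
                same = v1 == v2
            if not same:
                yield r1.get("URI"), f1, v1, v2


# Usage in this notebook:
# list(mismatched_fields(mhkdr_1_response_json, mhkdr_2_response_json['dataset']))
```

An empty result would suggest the shared fields carry the same information, leaving the choice between APIs to the fields each one adds.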

In [14]:
mhkdr_2_response_json

{'@context': 'https://openei.org/data.json',
 '@id': 'https://openei.org/data.json',
 '@type': 'dcat:Catalog',
 'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',
 'describedBy': 'https://project-open-data.cio.gov/v1.1/schema/catalog.json',
 'dataset': [{'@type': 'dcat:Dataset',
   'identifier': 'https://mhkdr.openei.org/submissions/1',
   'accessLevel': 'public',
   'bureauCode': ['019:20'],
   'license': 'https://creativecommons.org/licenses/by/4.0/',
   'issued': '2021-12-15T07:00:00Z',
   'dataQuality': True,
   'title': 'MHKDR Data Management and Best Practices for Submitters and Curators',
   'description': 'Resources for MHKDR data submitters and curators, including training videos, step-by-step guides on data submission, and detailed documentation of the MHKDR. The Data Management and Submission Best Practices document also contains API access and metadata schema information for developers interested in harvesting MHKDR metadata for federation or inclusion in their 