# MHKDR APIs Comparison

The objective of this notebook is to enumerate the variables contained in the first and second MHKDR APIs, in order to decide how to include MHKDR data in the package without data loss or duplication.

The concern is that each API contains only a subset of the other's data. In that case, using one API but not both would cause data loss.

This notebook is considered complete when it has answered the following questions:
1. Is there data contained in API 1 that is NOT present in API 2?
2. Is there data contained in API 2 that is NOT present in API 1? (Already known: yes.)
3. What causes the mismatch in the number of records between the two APIs?
4. Is there any reason to prefer one data source over the other when the fields are similar?

### Setup

In [1]:
import requests
import json
import pandas as pd

mhkdr_api_1 = 'https://mhkdr.openei.org/api?action=getSubmissionsForPRIMRE'
mhkdr_api_2 = 'https://mhkdr.openei.org/data.json'

In [2]:
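# Fetch both endpoints and fail early on HTTP errors rather than parsing an error page.
mhkdr_1_response = requests.get(mhkdr_api_1, timeout=60)
mhkdr_2_response = requests.get(mhkdr_api_2, timeout=60)
mhkdr_1_response.raise_for_status()
mhkdr_2_response.raise_for_status()

mhkdr_1_response_json = mhkdr_1_response.json()
mhkdr_2_response_json = mhkdr_2_response.json()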

### Dev

|API 1 only|Both|API 2 only|
|:---:|:---:|:---:|
||||

In [3]:
len(mhkdr_1_response_json[0])

14

In [4]:
len(mhkdr_2_response_json['dataset'][0])

20

Each API 1 record has 14 fields, while each API 2 record has 20.

In [5]:
mhkdr_1_response_json[0].keys()

dict_keys(['URI', 'type', 'landingPage', 'sourceURL', 'title', 'description', 'author', 'organization', 'originationDate', 'spatial', 'technologyType', 'tags', 'signatureProject', 'modifiedDate'])

In [6]:
print(mhkdr_2_response_json['dataset'][0].keys())

dict_keys(['@type', 'identifier', 'accessLevel', 'bureauCode', 'license', 'issued', 'dataQuality', 'title', 'description', 'keyword', 'projectLead', 'projectTitle', 'projectNumber', 'modified', 'publisher', 'contactPoint', 'programCode', 'landingPage', 'distribution', 'spatial'])
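The two key listings above can be compared with plain set operations. The following is a minimal sketch that copies the field names from the outputs above into two sets (the variable names are illustrative, not part of the notebook) and reports which names are shared and which are unique to each API.

```python
# Field names copied verbatim from the two dict_keys outputs above.
api_1_keys = {
    'URI', 'type', 'landingPage', 'sourceURL', 'title', 'description',
    'author', 'organization', 'originationDate', 'spatial',
    'technologyType', 'tags', 'signatureProject', 'modifiedDate',
}
api_2_keys = {
    '@type', 'identifier', 'accessLevel', 'bureauCode', 'license', 'issued',
    'dataQuality', 'title', 'description', 'keyword', 'projectLead',
    'projectTitle', 'projectNumber', 'modified', 'publisher', 'contactPoint',
    'programCode', 'landingPage', 'distribution', 'spatial',
}

# Exact-name matches only; renamed fields are NOT detected here.
shared_keys = sorted(api_1_keys & api_2_keys)
only_api_1 = sorted(api_1_keys - api_2_keys)
only_api_2 = sorted(api_2_keys - api_1_keys)

print('shared:', shared_keys)
print('API 1 only:', only_api_1)
print('API 2 only:', only_api_2)
```

This comparison only catches fields whose names match exactly. Fields that may carry the same information under different names (for example `URI` and `identifier`, or `tags` and `keyword`) still need to be checked by inspecting record values.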


In [7]:
mhkdr_1_response_json[399]

{'URI': 'https://mhkdr.openei.org/submissions/1',
 'type': ['Dataset', 'Dataset/OnlineTool', 'Document/Report'],
 'landingPage': 'https://mhkdr.openei.org/submissions/1',
 'sourceURL': 'https://mhkdr.openei.org/submissions/1',
 'title': 'MHKDR Data Management and Best Practices for Submitters and Curators',
 'description': 'Resources for MHKDR data submitters and curators, including training videos, step-by-step guides on data submission, and detailed documentation of the MHKDR. The Data Management and Submission Best Practices document also contains API access and metadata schema information for developers interested in harvesting MHKDR metadata for federation or inclusion in their local catalogs.\n',
 'author': ['Jon Weers', 'Nicole Taverna', 'Jay Huggins', 'RJ Scavo'],
 'organization': ['National Renewable Energy Laboratory'],
 'originationDate': '2021-12-15 07:00:00',
 'spatial': {'boundingCoordinatesNE': [83, 180],
  'boundingCoordinatesSW': [-83, -180],
  'extent': 'boundingBox'}
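The API 1 record above is identified by its `URI`, and the API 2 key listing includes an `identifier` field. Assuming both hold the same submission URL for the same submission (an assumption to verify against the data), a rough sketch for finding records present in only one API could look like the following. The helper name `compare_record_ids` and the sample records are hypothetical stand-ins so the snippet runs without network access.

```python
def compare_record_ids(api_1_records, api_2_records):
    """Split submission IDs into API-1-only, API-2-only, and shared groups.

    Assumes API 1 records carry the ID under 'URI' and API 2 records
    under 'identifier', as seen in the key listings of this notebook.
    """
    ids_1 = {rec['URI'] for rec in api_1_records}
    ids_2 = {rec['identifier'] for rec in api_2_records}
    return {
        'only_api_1': sorted(ids_1 - ids_2),
        'only_api_2': sorted(ids_2 - ids_1),
        'both': sorted(ids_1 & ids_2),
    }


# Hypothetical sample records: only submissions/1 is taken from the notebook,
# the other two URLs are made up to exercise each branch.
sample_api_1 = [
    {'URI': 'https://mhkdr.openei.org/submissions/1'},
    {'URI': 'https://mhkdr.openei.org/submissions/2'},
]
sample_api_2 = [
    {'identifier': 'https://mhkdr.openei.org/submissions/1'},
    {'identifier': 'https://mhkdr.openei.org/submissions/3'},
]

result = compare_record_ids(sample_api_1, sample_api_2)
print(result)
```

With the live responses, the call would be `compare_record_ids(mhkdr_1_response_json, mhkdr_2_response_json['dataset'])`. The lengths of the three lists would then speak directly to the record-count mismatch question.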

In [8]:
mhkdr_2_response_json['dataset'][0]

{'@type': 'dcat:Dataset',
 'identifier': 'https://mhkdr.openei.org/submissions/1',
 'accessLevel': 'public',
 'bureauCode': ['019:20'],
 'license': 'https://creativecommons.org/licenses/by/4.0/',
 'issued': '2021-12-15T07:00:00Z',
 'dataQuality': True,
 'title': 'MHKDR Data Management and Best Practices for Submitters and Curators',
 'description': 'Resources for MHKDR data submitters and curators, including training videos, step-by-step guides on data submission, and detailed documentation of the MHKDR. The Data Management and Submission Best Practices document also contains API access and metadata schema information for developers interested in harvesting MHKDR metadata for federation or inclusion in their local catalogs.\n',
 'keyword': ['MHK',
  'Marine',
  'Hydrokinetic',
  'energy',
  'power',
  'data',
  'MHKDR',
  'federation',
  'metadata',
  'standards',
  'submission',
  'training',
  'best practices',
  'guide',
  'API',
  'management',
  'storage'],
 'projectLead': 'Bill M