# MHKDR Integration

This notebook integrates MHKDR API results into the primrea package, following the plan outlined in the most recent iteration of the database diagram.

### Setup

In [1]:
import pandas as pd
import requests
import primrea.core
from primrea import *

In [2]:
mhkdr_1_api = 'https://mhkdr.openei.org/api?action=getSubmissionsForPRIMRE'
mhkdr_2_api = 'https://mhkdr.openei.org/data.json'

In [3]:
mhkdr_1_df = primrea.core.api_to_df(mhkdr_1_api)
#mhkdr_2_df = primrea.core.api_to_df(mhkdr_2_api)

In [4]:
mhkdr_2_response = requests.get(mhkdr_2_api)
mhkdr_2_response_json = mhkdr_2_response.json()
mhkdr_2_df = pd.DataFrame(mhkdr_2_response_json['dataset'])

### Dev

In [5]:
mhkdr_1_df.head(1)

Unnamed: 0,URI,type,landingPage,sourceURL,title,description,author,organization,originationDate,spatial,technologyType,tags,signatureProject,modifiedDate
0,https://mhkdr.openei.org/submissions/553,"[Dataset, Dataset/Archive, Dataset/OnlineTool]",https://mhkdr.openei.org/submissions/553,https://mhkdr.openei.org/submissions/553,HERO WEC V1.0 - WEC-Sim Model (July 2024),**This submission supersedes submission MHKDR-...,"[Justin Panzarella, Toan Tran, Scott Jenne]",[National Renewable Energy Laboratory],2024-07-01 06:00:00,"{'extent': 'point', 'coordinates': [39.9146828...",[Wave/Point Absorber],"[MHK, Marine, Hydrokinetic, energy, power, HER...",[WEC-Sim],2024-07-17 16:50:33


In [6]:
mhkdr_2_df.head(1)

Unnamed: 0,@type,identifier,accessLevel,bureauCode,license,issued,dataQuality,title,description,keyword,...,projectTitle,projectNumber,modified,publisher,contactPoint,programCode,landingPage,distribution,spatial,DOI
0,dcat:Dataset,https://mhkdr.openei.org/submissions/1,public,[019:20],https://creativecommons.org/licenses/by/4.0/,2021-12-15T07:00:00Z,True,MHKDR Data Management and Best Practices for S...,Resources for MHKDR data submitters and curato...,"[MHK, Marine, Hydrokinetic, energy, power, dat...",...,Marine and Hydrokinetic Data Repository (MHKDR),35007,2022-05-26T18:08:40Z,"{'@type': 'org:Organization', 'name': 'RJ Scavo'}","{'@type': 'vcard:Contact', 'fn': 'MHKDR Help',...",[019:009],https://mhkdr.openei.org/submissions/1,"[{'@type': 'dcat:Distribution', 'description':...","{""type"":""Polygon"",""coordinates"":[[[-180,-83],[...",


We only want 5 variables from `mhkdr_2_df`:
 - license
 - issued
 - projectTitle
 - projectNumber
 - contactPoint
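A minimal sketch of pulling out just these columns, using a toy stand-in for `mhkdr_2_df` (the values are illustrative, not real API output). The `contact_name` column and the schema check are my own assumptions for illustration; the `fn` key is taken from the `contactPoint` preview shown above.

```python
import pandas as pd

# Toy stand-in for mhkdr_2_df; values are illustrative only.
toy_df = pd.DataFrame({
    'identifier': ['https://mhkdr.openei.org/submissions/1'],
    'license': ['https://creativecommons.org/licenses/by/4.0/'],
    'issued': ['2021-12-15T07:00:00Z'],
    'projectTitle': ['Example project'],
    'projectNumber': ['35007'],
    'contactPoint': [{'@type': 'vcard:Contact', 'fn': 'MHKDR Help'}],
    'accessLevel': ['public'],
})

wanted = ['license', 'issued', 'projectTitle', 'projectNumber', 'contactPoint']

# Fail early if the API schema changed and a wanted column is missing.
missing = [col for col in wanted if col not in toy_df.columns]
if missing:
    raise KeyError(f'Missing expected columns: {missing}')

subset = toy_df[wanted].copy()

# contactPoint holds one dict per row; pull the contact name ('fn')
# into a flat, hypothetical 'contact_name' column.
subset['contact_name'] = subset['contactPoint'].apply(
    lambda cp: cp.get('fn') if isinstance(cp, dict) else None
)
```

Checking for missing columns up front gives a clearer error than a bare `KeyError` from the column selection if the API response ever changes shape.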

In [11]:
mhkdr_df = primrea.kh_table_gen.entry_based.construct_ts_core_table(mhkdr_1_df)
mhkdr_df.head(1)

Unnamed: 0,entry_id,originationDate,modifiedDate,URI,landingPage,sourceURL,title,description,signatureProject
0,553,2024-07-01 06:00:00,2024-07-17 16:50:33,https://mhkdr.openei.org/submissions/553,https://mhkdr.openei.org/submissions/553,https://mhkdr.openei.org/submissions/553,HERO WEC V1.0 - WEC-Sim Model (July 2024),**This submission supersedes submission MHKDR-...,[WEC-Sim]


In [12]:
mhkdr_2_df.keys()

Index(['@type', 'identifier', 'accessLevel', 'bureauCode', 'license', 'issued',
       'dataQuality', 'title', 'description', 'keyword', 'projectLead',
       'projectTitle', 'projectNumber', 'modified', 'publisher',
       'contactPoint', 'programCode', 'landingPage', 'distribution', 'spatial',
       'DOI'],
      dtype='object')

In [13]:
mhkdr_2_df.head(1)

Unnamed: 0,@type,identifier,accessLevel,bureauCode,license,issued,dataQuality,title,description,keyword,...,projectTitle,projectNumber,modified,publisher,contactPoint,programCode,landingPage,distribution,spatial,DOI
0,dcat:Dataset,https://mhkdr.openei.org/submissions/1,public,[019:20],https://creativecommons.org/licenses/by/4.0/,2021-12-15T07:00:00Z,True,MHKDR Data Management and Best Practices for S...,Resources for MHKDR data submitters and curato...,"[MHK, Marine, Hydrokinetic, energy, power, dat...",...,Marine and Hydrokinetic Data Repository (MHKDR),35007,2022-05-26T18:08:40Z,"{'@type': 'org:Organization', 'name': 'RJ Scavo'}","{'@type': 'vcard:Contact', 'fn': 'MHKDR Help',...",[019:009],https://mhkdr.openei.org/submissions/1,"[{'@type': 'dcat:Distribution', 'description':...","{""type"":""Polygon"",""coordinates"":[[[-180,-83],[...",


In [14]:
# Constructing the entry_id data
mhkdr_2_df_len = len(mhkdr_2_df)

entry_ids = list()
for i in range(0, mhkdr_2_df_len):
    entry_id = primrea.kh_table_gen.entry_based.find_entry_id(mhkdr_2_df['identifier'][i])
    entry_ids.append(entry_id)
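As a possible alternative to the loop, the IDs could be extracted in one vectorized step. This is only a sketch: it assumes the entry ID is the trailing run of digits in the `identifier` URL (as the outputs above suggest, e.g. `submissions/1` maps to `1`), and it is not a description of what `find_entry_id` does internally. The name `entry_ids_vectorized` is hypothetical.

```python
import pandas as pd

# Toy identifiers in the same shape as mhkdr_2_df['identifier'].
identifiers = pd.Series([
    'https://mhkdr.openei.org/submissions/1',
    'https://mhkdr.openei.org/submissions/529',
])

# Assumption: the entry ID is the trailing run of digits in the URL.
extracted = identifiers.str.extract(r'/(\d+)/?$', expand=False)

# Convert to a nullable integer dtype so unparseable URLs become <NA>
# instead of raising.
entry_ids_vectorized = pd.to_numeric(extracted, errors='coerce').astype('Int64')
```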

In [15]:
mhkdr_2_df['issued'] = pd.to_datetime(mhkdr_2_df['issued'])
mhkdr_2_df['entry_id'] = entry_ids

In [16]:
mhkdr_2_df_matured = mhkdr_2_df[['entry_id', 'license', 'issued', 'projectTitle', 'projectNumber']]
mhkdr_2_df_matured.head(1)

Unnamed: 0,entry_id,license,issued,projectTitle,projectNumber
0,1,https://creativecommons.org/licenses/by/4.0/,2021-12-15 07:00:00+00:00,Marine and Hydrokinetic Data Repository (MHKDR),35007


In [33]:
len(mhkdr_df)

404

In [34]:
len(mhkdr_2_df)

340

## Error Identification and Investigation

In [17]:
len(mhkdr_2_df_matured)

340

In [18]:
len(mhkdr_df)

404

In [25]:
#c = mhkdr_df.merge(mhkdr_2_df_matured, on='entry_id', how='outer')
c = mhkdr_df.merge(mhkdr_2_df_matured, on='entry_id')

In [26]:
len(c)

340

In [21]:
c.head(1)

Unnamed: 0,entry_id,originationDate,modifiedDate,URI,landingPage,sourceURL,title,description,signatureProject,license,issued,projectTitle,projectNumber
0,553,2024-07-01 06:00:00,2024-07-17 16:50:33,https://mhkdr.openei.org/submissions/553,https://mhkdr.openei.org/submissions/553,https://mhkdr.openei.org/submissions/553,HERO WEC V1.0 - WEC-Sim Model (July 2024),**This submission supersedes submission MHKDR-...,[WEC-Sim],https://creativecommons.org/licenses/by/4.0/,2024-07-01 06:00:00+00:00,Wave-Powered Desalination Deployment & Analysis,FY23 AOP 2.2.6.404


#### Trying to understand how we gained one extra row

In [31]:
# Rows of mhkdr_df that have no match in the merged table c
(c.merge(mhkdr_df, on='entry_id', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop(columns='_merge')).head()


Unnamed: 0,entry_id,originationDate_x,modifiedDate_x,URI_x,landingPage_x,sourceURL_x,title_x,description_x,signatureProject_x,license,...,projectTitle,projectNumber,originationDate_y,modifiedDate_y,URI_y,landingPage_y,sourceURL_y,title_y,description_y,signatureProject_y
340,547,NaT,NaT,,,,,,,,...,,,2024-02-29 07:00:00,2024-04-25 20:48:08,https://mhkdr.openei.org/submissions/547,https://mhkdr.openei.org/submissions/547,https://mhkdr.openei.org/submissions/547,CalWave - Reports and Plans for xWave Device D...,CalWave has developed a submerged pressure dif...,[]
341,542,NaT,NaT,,,,,,,,...,,,2024-01-05 07:00:00,2024-05-21 21:06:46,https://mhkdr.openei.org/submissions/542,https://mhkdr.openei.org/submissions/542,https://mhkdr.openei.org/submissions/542,TidGen: Single Turbine System (STS) Deployment...,This document provides a summary for the perfo...,[]
342,541,NaT,NaT,,,,,,,,...,,,2023-02-20 07:00:00,2024-05-21 21:05:40,https://mhkdr.openei.org/submissions/541,https://mhkdr.openei.org/submissions/541,https://mhkdr.openei.org/submissions/541,TidGen: Single Turbine Subsystem (STS) Device ...,This document provides the process details for...,[]
343,540,NaT,NaT,,,,,,,,...,,,2022-10-19 06:00:00,2024-04-26 15:15:37,https://mhkdr.openei.org/submissions/540,https://mhkdr.openei.org/submissions/540,https://mhkdr.openei.org/submissions/540,TidGen: Single Turbine System Mooring Design O...,This document summarizes the design of the Sin...,[]
344,539,NaT,NaT,,,,,,,,...,,,2024-01-11 07:00:00,2024-05-21 20:57:50,https://mhkdr.openei.org/submissions/539,https://mhkdr.openei.org/submissions/539,https://mhkdr.openei.org/submissions/539,TidGen: Single Turbine System Performance Anal...,This document offers a detailed performance an...,[]


In [28]:
# Rows of mhkdr_2_df that have no match in the merged table c (none expected)
(c.merge(mhkdr_2_df, on='entry_id', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop(columns='_merge'))


Unnamed: 0,entry_id,originationDate,modifiedDate,URI,landingPage_x,sourceURL,title_x,description_x,signatureProject,license_x,...,projectTitle_y,projectNumber_y,modified,publisher,contactPoint,programCode,landingPage_y,distribution,spatial,DOI


In [29]:
mhkdr_2_df_matured[mhkdr_2_df_matured['entry_id']==529]

Unnamed: 0,entry_id,license,issued,projectTitle,projectNumber
328,529,https://creativecommons.org/licenses/by/4.0/,2023-09-07 06:00:00+00:00,Applied Research and Development to Support Op...,EE0009969


In [30]:
c[c['entry_id']==529]

Unnamed: 0,entry_id,originationDate,modifiedDate,URI,landingPage,sourceURL,title,description,signatureProject,license,issued,projectTitle,projectNumber
11,529,2023-09-07 06:00:00,2024-07-18 15:30:01,https://mhkdr.openei.org/submissions/529,https://mhkdr.openei.org/submissions/529,https://mhkdr.openei.org/submissions/529,Cone Penetration Tests at the PacWave South Te...,This ZIP archive contains cone penetration tes...,[],https://creativecommons.org/licenses/by/4.0/,2023-09-07 06:00:00+00:00,Applied Research and Development to Support Op...,EE0009969


In the earlier run of the cells above, one observation (entry_id 529) was present in the **second** API but not the first. Based on my testing, this did not seem possible. Bookmarking this for reference and for discussion with Jonathan at the next check-in.
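One quick way to pin down which entries differ between the two APIs is to compare the sets of entry IDs in both directions. The sketch below uses toy ID series as stand-ins for the `entry_id` columns of the two tables; the variable names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the entry_id columns from each API.
primre_ids = pd.Series([553, 547, 542], name='entry_id')
resources_ids = pd.Series([553, 529], name='entry_id')

# IDs returned by the second (Resources) API but missing from the first.
only_in_resources = sorted(set(resources_ids) - set(primre_ids))

# IDs returned by the first (PRIMRE) API but missing from the second.
only_in_primre = sorted(set(primre_ids) - set(resources_ids))

print('Only in Resources API:', only_in_resources)
print('Only in PRIMRE API:', only_in_primre)
```

Set differences report only the IDs themselves, which makes them easier to scan than a wide merged table when the goal is just to list the mismatches.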

### The code above shows that the issue was resolved
After one or two days, the issue with entry_id 529 resolved itself. The problem was that this entry had a response from the Resources API but not from the PRIMRE API. This raised the concern that a mistake in the PRIMRE API code for MHKDR could let an entry be added to the Resources API without ever being added to the PRIMRE API. If that were the case, I would need either to alter the table designs to accommodate it or (more likely) to email Jon to get it fixed while keeping my structure, possibly putting this work on hold until the issue was better understood.

Because the issue resolved on its own, we have reason to believe that there is no oversight in the process that adds entries to the MHKDR APIs. I believe this is reason enough to continue as I had intended, possibly adding error handling for entries that appear in the Resources API but not the PRIMRE API and excluding such entries from the results of the primrea package. Once such an entry is correctly added to the PRIMRE API, it will automatically be included in the primrea output.
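A rough sketch of what that error handling might look like, under my own assumptions: the function name `merge_excluding_resources_only` is hypothetical and not part of primrea, the toy frames stand in for the PRIMRE and Resources tables, and the inner join mirrors the merge used earlier in this notebook.

```python
import warnings

import pandas as pd


def merge_excluding_resources_only(primre_df, resources_df, key='entry_id'):
    """Merge the two tables, warning about entries found only in Resources.

    Hypothetical helper: returns the merged frame and the list of excluded IDs.
    """
    orphan_ids = sorted(set(resources_df[key]) - set(primre_df[key]))
    if orphan_ids:
        warnings.warn(
            f'Excluding {len(orphan_ids)} entries present in the Resources API '
            f'but not the PRIMRE API: {orphan_ids}'
        )
    # An inner join drops the Resources-only rows; they reappear automatically
    # once the PRIMRE API starts returning them.
    merged = primre_df.merge(resources_df, on=key, how='inner')
    return merged, orphan_ids


# Toy data for illustration only.
toy_primre = pd.DataFrame({'entry_id': [553, 547], 'title': ['a', 'b']})
toy_resources = pd.DataFrame({'entry_id': [553, 529], 'license': ['x', 'y']})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    merged, excluded = merge_excluding_resources_only(toy_primre, toy_resources)
```

Note that the inner join also drops PRIMRE entries that have no Resources record; switching to `how='left'` would keep them with empty Resources fields, which is a design choice to settle separately.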