# Package Cleaning

The purpose of this notebook is to work through the current codebase of the primrea package and remove duplicate code. The codebase still includes the raw contents of the development notebooks used to create the first functionality.

1. This notebook will be complete when it has been used to test the package and confirm that the codebase no longer contains redundant code and still functions properly.
2. This notebook will inform two other notebooks:
   1. package_testing

      This notebook will continue the theme of testing. It will build on the checks made in the package_cleaning notebook and implement them as tests in the package to ensure code reliability.
   2. package_integration

      This notebook will continue the main development trajectory of package_cleaning by composing the cleaned codebase into an integrated system. Here I will introduce the base class, which creates the cleaned tables on initialization.
   

### Setup

In [1]:
import requests
import pandas as pd
import primrea.kh_table_gen.entry_based as entry_based_table_gen

In [2]:
mhkdr_api = 'https://mhkdr.openei.org/api?action=getSubmissionsForPRIMRE'
tethys_api = 'https://tethys.pnnl.gov/api/primre_export'
tethys_e_api = 'https://tethys-engineering.pnnl.gov/api/primre_export'

In [3]:
mhkdr_response = requests.get(mhkdr_api)
# Note: the Tethys API returns content specifically related to marine energy; wind energy content is served by a separate API.
tethys_response = requests.get(tethys_api)
tethys_e_response = requests.get(tethys_e_api)

mhkdr_response_json = mhkdr_response.json()
tethys_response_json = tethys_response.json()
tethys_e_response_json = tethys_e_response.json()

print(f'Number of MHKDR Entries: {len(mhkdr_response_json)}\nNumber of Tethys Entries: {len(tethys_response_json)}\nNumber of Tethys Engineering Entries: {len(tethys_e_response_json)}')

Number of MHKDR Entries: 400
Number of Tethys Entries: 4241
Number of Tethys Engineering Entries: 8228


In [4]:
mhkdr_dataframe = pd.DataFrame(mhkdr_response_json)
tethys_dataframe = pd.DataFrame(tethys_response_json)
tethys_e_dataframe = pd.DataFrame(tethys_e_response_json)

### Dev

In [5]:
help(entry_based_table_gen.find_entry_id)

Help on function find_entry_id in module primrea.kh_table_gen.entry_based:

find_entry_id(entry_uri)
    This function takes in the url of a MHKDR entry, and returns the entry_id of that page. 
    The 'entry_id' is the integer at the end of the url, which is unique to each MHKDR entry.
    The regex used in this function relies on the fact that the only number in the url is the id.
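The docstring describes the approach only in words, so here is a minimal sketch of what such a lookup could look like. It is not the package's implementation: the name `find_entry_id_sketch`, the choice to return an `int`, and the example URL are all assumptions made for illustration.

```python
import re


def find_entry_id_sketch(entry_uri):
    """Hypothetical stand-in for find_entry_id: return the integer found in the URL.

    Like the docstring above, this assumes the id is the only number in the URL.
    """
    match = re.search(r"\d+", entry_uri)
    if match is None:
        raise ValueError(f"No numeric id found in {entry_uri!r}")
    return int(match.group())


# Illustrative URL only; the real URL layout is an assumption here.
example_id = find_entry_id_sketch("https://mhkdr.openei.org/submissions/548")
```

Because `re.search` returns the first run of digits, a URL with a number in its host or path before the id would return the wrong value. That is the same limitation the docstring already notes.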



In [6]:
entry_based_table_gen.construct_tags_table(mhkdr_dataframe)

Unnamed: 0,entry_id,tag
0,548,MHK
1,548,Marine
2,548,Hydrokinetic
3,548,energy
4,548,power
...,...,...
11166,1,best practices
11167,1,guide
11168,1,API
11169,1,management
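The output above has one row per (entry_id, tag) pair. As a rough sketch of how a table of that shape can be built with pandas, the example below explodes a list-valued tag column. It is not the body of `construct_tags_table`: the function name, the column names `URI` and `tags`, and the toy data are hypothetical.

```python
import re

import pandas as pd


def construct_tags_table_sketch(df, uri_col="URI", tags_col="tags"):
    """Hypothetical sketch: one output row per (entry_id, tag) pair.

    Assumes `uri_col` holds URLs containing a single integer id and
    `tags_col` holds a list of tag strings per entry.
    """
    table = pd.DataFrame(
        {
            # Pull the integer id out of each URL.
            "entry_id": df[uri_col].map(lambda uri: int(re.search(r"\d+", uri).group())),
            "tag": df[tags_col],
        }
    )
    # explode() repeats entry_id once for every element of the tag list.
    return table.explode("tag", ignore_index=True)


# Toy input, not real API data.
toy_entries = pd.DataFrame(
    {
        "URI": ["https://example.org/submissions/548", "https://example.org/submissions/1"],
        "tags": [["MHK", "Marine"], ["guide"]],
    }
)
toy_tags = construct_tags_table_sketch(toy_entries)
```

Note that `explode` turns an empty tag list into a single row with a missing tag, so entries without tags would need to be dropped or handled separately.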


In [7]:
entry_based_table_gen.construct_tags_table(tethys_dataframe)

Unnamed: 0,entry_id,tag
0,499,Environment
1,499,Human Dimensions
2,500,Environment
3,500,Environmental Impact Assessment
4,501,Environment
...,...,...
13908,2078958,EMF
13909,2078958,Habitat Change
13910,2078959,Environment
13911,2078959,Fish


In [8]:
entry_based_table_gen.construct_tags_table(tethys_e_dataframe)

Unnamed: 0,entry_id,tag
0,4,Engineering
1,4,Performance
2,4,Modeling
3,6,Engineering
4,6,Array Effects
...,...,...
30698,17438,Engineering
30699,17438,Mooring
30700,17526,Engineering
30701,17526,Substructure
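
Since this notebook is meant to confirm that the cleaned code still functions properly, the three tag tables above could be passed through a few basic sanity checks. The helper below is a sketch of such checks, not part of primrea. The name `check_tags_table` and the specific checks are assumptions about what "proper" output means here.

```python
import pandas as pd


def check_tags_table(tags):
    """Hypothetical sanity checks; returns a list of problem descriptions."""
    problems = []
    if list(tags.columns) != ["entry_id", "tag"]:
        problems.append(f"unexpected columns: {list(tags.columns)}")
        return problems  # the remaining checks assume these two columns
    if tags.isna().any().any():
        problems.append("table contains missing values")
    if tags.duplicated().any():
        problems.append("table contains duplicate (entry_id, tag) rows")
    return problems


# Toy tables, not real API data.
good_table = pd.DataFrame({"entry_id": [4, 4, 6], "tag": ["Engineering", "Modeling", "Engineering"]})
bad_table = pd.DataFrame({"entry_id": [4, 4], "tag": ["Engineering", "Engineering"]})

good_problems = check_tags_table(good_table)
bad_problems = check_tags_table(bad_table)
```

Returning a list of problems rather than raising on the first one makes it easy to run the same checks over the MHKDR, Tethys, and Tethys Engineering tables and compare the results side by side.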
