# Package Cleaning

The purpose of this notebook is to work through the current codebase of the primrea package and remove duplicate code. The codebase still includes the raw contents of the development notebooks used to create the first functionality.

1. This notebook will be complete once it has been used to confirm that the package functions properly and that the codebase no longer contains redundant code.
2. This notebook will inform two other notebooks:
   1. package_testing
      This notebook will continue the theme of testing, using the checks made in the package_cleaning notebook to implement tests in the package that ensure code reliability.
   2. package_integration
      This notebook will continue the main development trajectory of package_cleaning by composing the cleaned codebase into an integrated system. Here I will introduce the base class that creates the cleaned tables on initialization.

### Setup

In [1]:
import requests
import pandas as pd
import primrea.kh_table_gen.entry_based as entry_based_table_gen

In [2]:
mhkdr_api = 'https://mhkdr.openei.org/api?action=getSubmissionsForPRIMRE'
tethys_api = 'https://tethys.pnnl.gov/api/primre_export'
tethys_e_api = 'https://tethys-engineering.pnnl.gov/api/primre_export'

In [3]:
mhkdr_response = requests.get(mhkdr_api)
# Note: the Tethys API returns content specifically related to marine energy; there is a separate API for wind energy.
tethys_response = requests.get(tethys_api)
tethys_e_response = requests.get(tethys_e_api)

mhkdr_response_json = mhkdr_response.json()
tethys_response_json = tethys_response.json()
tethys_e_response_json = tethys_e_response.json()

print(
    f'Number of MHKDR Entries: {len(mhkdr_response_json)}\n'
    f'Number of Tethys Entries: {len(tethys_response_json)}\n'
    f'Number of Tethys Engineering Entries: {len(tethys_e_response_json)}'
)

Number of MHKDR Entries: 400
Number of Tethys Entries: 4244
Number of Tethys Engineering Entries: 8240
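
The requests in cell [3] are made without a timeout or a status check, so a slow or failing endpoint would only show up later as a JSON decoding error. The helper below is a minimal sketch of a more defensive fetch. The name `fetch_json`, the `get` parameter, and the 30-second default are assumptions made for illustration and are not part of primrea.

```python
def fetch_json(url, get=None, timeout=30):
    """Fetch `url` and return the decoded JSON body, raising on HTTP errors.

    `get` is a hypothetical injection point: any callable with the same
    signature as `requests.get` can be passed in, which allows the helper
    to be exercised without network access.
    """
    if get is None:
        # Imported lazily so the helper can be defined and tested offline.
        import requests
        get = requests.get

    response = get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of trying to parse an error page.
    response.raise_for_status()
    return response.json()


# Possible usage with the endpoint defined in cell [2]:
# mhkdr_response_json = fetch_json(mhkdr_api)
```

Passing the `get` callable explicitly keeps the sketch testable with a stand-in object; in the notebook itself the default would be used.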


In [4]:
mhkdr_dataframe = pd.DataFrame(mhkdr_response_json)
tethys_dataframe = pd.DataFrame(tethys_response_json)
tethys_e_dataframe = pd.DataFrame(tethys_e_response_json)

### Dev

In [5]:
help(entry_based_table_gen.find_entry_id)

Help on function find_entry_id in module primrea.kh_table_gen.entry_based:

find_entry_id(entry_uri)
    This function takes in the url of a MHKDR entry, and returns the entry_id of that page. 
    The 'entry_id' is the integer at the end of the url, which is unique to each MHKDR entry.
    The regex used in this function relies on the fact that the only number in the url is the id.
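
The docstring above describes the approach only in words. A minimal sketch of that idea follows, assuming the id should be returned as an `int` and that entry URLs look like the example in the comment. Both points are assumptions, and `find_entry_id_sketch` is a hypothetical name, not the package's implementation.

```python
import re


def find_entry_id_sketch(entry_uri: str) -> int:
    """Return the first run of digits in `entry_uri` as an integer.

    This mirrors the docstring's stated assumption that the id is the
    only number in the URL; a URL containing other digits would yield
    the first number rather than the trailing one.
    """
    match = re.search(r"\d+", entry_uri)
    if match is None:
        raise ValueError(f"no numeric id found in {entry_uri!r}")
    return int(match.group())


# Assumed URL shape, for illustration only.
example_id = find_entry_id_sketch("https://mhkdr.openei.org/submissions/548")
```

If URLs with additional digits ever appear, anchoring the pattern at the end of the string (for example `r"(\d+)/?$"`) would be a stricter alternative.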



#### MHKDR

In [6]:
entry_based_table_gen.construct_authors_table(mhkdr_dataframe)

Unnamed: 0,entry_id,author
0,548,Marcus Lehmann
1,548,Ryan Davidson
2,547,Thomas Boerner
3,547,Nigel Kojimoto
4,547,Marcus Lehmann
...,...,...
1012,2,Tyler Mayer
1013,1,Jon Weers
1014,1,Nicole Taverna
1015,1,Jay Huggins


In [7]:
entry_based_table_gen.construct_organizations_table(mhkdr_dataframe)

Unnamed: 0,entry_id,organization
0,548,CalWave Power Technologies Inc.
1,547,CalWave Power Technologies Inc.
2,545,Pacific Northwest National Laboratory
3,543,Ocean Renewable Power Company
4,542,Ocean Renewable Power Company
...,...,...
395,14,"Dehlsen Associates, LLC"
396,5,"Dehlsen Associates, LLC"
397,3,"Dehlsen Associates, LLC"
398,2,"Dehlsen Associates, LLC"


In [8]:
entry_based_table_gen.construct_tags_table(mhkdr_dataframe)

Unnamed: 0,entry_id,tag
0,548,MHK
1,548,Marine
2,548,Hydrokinetic
3,548,energy
4,548,power
...,...,...
11166,1,best practices
11167,1,guide
11168,1,API
11169,1,management


#### Tethys

In [9]:
entry_based_table_gen.construct_authors_table(tethys_dataframe)

Unnamed: 0,entry_id,author
0,499,"Soukissian, T."
1,499,"Denaxa, D."
2,499,"Karathanasi, F."
3,499,"Prospathopoulos, A."
4,499,"Sarantakos, K."
...,...,...
15873,2078986,"Li, M."
15874,2078986,"Wolf, J."
15875,2078986,"Williams, A."
15876,2078986,"Badoe, C"


In [10]:
entry_based_table_gen.construct_organizations_table(tethys_dataframe)

Unnamed: 0,entry_id,organization
0,499,Hellenic Centre for Marine Research (HCMR)
1,499,National Technical University of Athens
2,500,BioPower Systems
3,501,Marine Sensing and Acoustic Technologies (MarS...
4,501,University of Algarve
...,...,...
7948,2078973,Blue Economy Cooperative Research Centre (CRC)
7949,2078986,University of Liverpool
7950,2078986,Swansea University
7951,2078986,National Oceanography Centre (NOC)


In [11]:
entry_based_table_gen.construct_tags_table(tethys_dataframe)

Unnamed: 0,entry_id,tag
0,499,Environment
1,499,Human Dimensions
2,500,Environment
3,500,Environmental Impact Assessment
4,501,Environment
...,...,...
13929,2078986,Environment
13930,2078986,Physical Environment
13931,2078986,Sediment Transport
13932,2078986,Changes in Flow


#### Tethys Engineering

In [12]:
entry_based_table_gen.construct_authors_table(tethys_e_dataframe)

Unnamed: 0,entry_id,author
0,4,"Träsch, M."
1,4,"Déporte, A."
2,4,"Delacroix, S."
3,4,"Germain, G."
4,4,"Drevet, J."
...,...,...
31960,22471,"Xu, M."
31961,17432,PRIMRE
31962,17438,France Energies Marines
31963,17526,France Energies Marines


In [13]:
entry_based_table_gen.construct_organizations_table(tethys_e_dataframe)

Unnamed: 0,entry_id,organization
0,6,University of Algarve
1,6,University of Cadiz
2,7,University of Oxford
3,8,University of Oxford
4,9,Egypt National Research Center
...,...,...
13675,22468,Chinese Academy of Sciences
13676,22469,Yale University
13677,22469,Vanderbilt University
13678,22470,Scripps Institution of Oceanography


In [14]:
entry_based_table_gen.construct_tags_table(tethys_e_dataframe)

Unnamed: 0,entry_id,tag
0,4,Engineering
1,4,Performance
2,4,Modeling
3,6,Engineering
4,6,Array Effects
...,...,...
30745,17438,Engineering
30746,17438,Mooring
30747,17526,Engineering
30748,17526,Substructure
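
Since the goal of this notebook is to confirm that the cleaned functions still behave properly, the tables above could also be checked programmatically rather than by reading the output. The sketch below summarizes a long-format `(entry_id, value)` table. The helper name `summarize_long_table` and the choice of metrics are assumptions for illustration, and the toy data is made up.

```python
import pandas as pd


def summarize_long_table(table: pd.DataFrame, value_col: str) -> dict:
    """Return simple health metrics for an (entry_id, value) table."""
    missing = {"entry_id", value_col} - set(table.columns)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    return {
        "rows": len(table),
        "entries": int(table["entry_id"].nunique()),
        "null_values": int(table[value_col].isna().sum()),
        # Rows repeating an earlier (entry_id, value) pair.
        "duplicate_rows": int(
            table.duplicated(subset=["entry_id", value_col]).sum()
        ),
    }


# Toy table with one repeated (entry_id, author) pair.
toy = pd.DataFrame(
    {
        "entry_id": [2, 2, 1, 1],
        "author": ["Author A", "Author B", "Author C", "Author C"],
    }
)
summary = summarize_long_table(toy, "author")
```

The same call could be applied to each constructed table, for example `summarize_long_table(entry_based_table_gen.construct_tags_table(mhkdr_dataframe), "tag")`, and the resulting counts turned into assertions in the package_testing notebook.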
