# Projects Database Development

This development notebook captures the workflow that produced the table transformations generated from the ProjectsDB (calls), which are deployed and under continued development in the software package. The notebook will go out of date if the API changes, or if the codebase changes significantly from the table decomposition arrived at here. It may still be useful as a detailed look at what can be drawn from the API: a git checkout of this commit shows the development cycle behind the table composition. It may also be a helpful starting place for additional feature development using the API, since a git checkout of this commit gives a base for new branches that extend this development cycle.

For this set of relational tables, essentially all of the code applies to the 4 tables of this format. Because of this, the function below is designed to build all of these tables and must be run 4 times, once with the download link for each ask query of interest. It would be __very helpful__ to record the __link to the ask query interface alongside the variable for the download link__ of each table, so that it is easy to follow the link and interactively find errors in the source data that may be causing downstream failures.
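One possible way to keep the two links together is a small record per table. This is only a sketch: the `AskQuerySource` class and the `ASK_QUERY_SOURCES` dictionary are hypothetical names, and only the Devices test links that appear in this notebook are filled in. The other three tables would get their own entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AskQuerySource:
    """Pairs the two links for one relational table (hypothetical helper)."""
    interface_url: str  # interactive ask query page, for inspecting source data
    download_url: str   # CSV download link that is passed to the table function


# Only the Devices test links from this notebook are recorded here.
ASK_QUERY_SOURCES = {
    'Devices (test, limit=100)': AskQuerySource(
        interface_url=(
            'https://openei.org/w/index.php?title=Special:Ask&offset=0&limit=100'
            '&q=%5B%5BCategory%3APRIMRE+Projects+Database+Devices%5D%5D'
            '&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow'
            '%2Fsearchlabel%3D...-20further-20results'
            '%2Fclass%3Dsortable-20wikitable-20smwtable'
            '&po=%3FName%0A%3FOrganization%0A&eq=yes'
        ),
        download_url=(
            'https://openei.org/wiki/Special:Ask/'
            '-5B-5BCategory:PRIMRE-20Projects-20Database-20Devices-5D-5D/'
            '-3FName/-3FOrganization/mainlabel=/limit=100/format=csv'
        ),
    ),
}

# Print both links so a failing table can be inspected interactively.
for table_name, source in ASK_QUERY_SOURCES.items():
    print(table_name)
    print('  inspect: ', source.interface_url)
    print('  download:', source.download_url)
```

With this layout, a downstream failure in one table can be traced by opening `interface_url` for the same entry whose `download_url` was used.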

### Setup

In [1]:
import requests
import pandas as pd

In [2]:
ask_query = 'https://openei.org/wiki/Special:Ask/-5B-5BCategory:PRIMRE-20Projects-20Database-20Devices-5D-5D/-3FName/-3FOrganization/mainlabel=/limit=500/prettyprint=true/format=csv'

Test link to the ask query interface (limit=100):

https://openei.org/w/index.php?title=Special:Ask&offset=0&limit=100&q=%5B%5BCategory%3APRIMRE+Projects+Database+Devices%5D%5D&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow%2Fsearchlabel%3D...-20further-20results%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3FName%0A%3FOrganization%0A&eq=yes

In [3]:
test_query = 'https://openei.org/wiki/Special:Ask/-5B-5BCategory:PRIMRE-20Projects-20Database-20Devices-5D-5D/-3FName/-3FOrganization/mainlabel=/limit=100/format=csv'
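The download links above appear to follow a regular pattern: the query condition and each `?Property` printout are percent-encoded with `%` written as `-`, then joined with `/` together with the output options. The sketch below rebuilds the test link on that assumption. The helper names `dash_encode` and `build_ask_csv_url` are hypothetical, the encoding rule is inferred only from the links in this notebook, and values containing a literal `-` are rejected rather than guessed at.

```python
from urllib.parse import quote


def dash_encode(text):
    """Percent-encode text and write each '%' as '-' (rule inferred from the links above)."""
    if '-' in text:
        # How a literal '-' should be written is not shown in this notebook.
        raise ValueError('literal "-" is not handled by this sketch')
    return quote(text, safe=':').replace('%', '-')


def build_ask_csv_url(condition, printouts, limit,
                      base='https://openei.org/wiki/Special:Ask'):
    """Assemble a CSV download link from a condition and a list of properties."""
    parts = [dash_encode(condition)]
    parts += [dash_encode('?' + name) for name in printouts]
    parts += ['mainlabel=', 'limit={}'.format(limit), 'format=csv']
    return base + '/' + '/'.join(parts)


rebuilt_query = build_ask_csv_url(
    '[[Category:PRIMRE Projects Database Devices]]',
    ['Name', 'Organization'],
    100,
)
print(rebuilt_query)
```

Comparing `rebuilt_query` with `test_query` is a quick way to confirm the pattern before using it for another table.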

### Dev

In [4]:
ask_response = requests.get(test_query)
ask_response_str = str(ask_response.text)

In [5]:
ask_response_str[0:200]

',Name,Organization\n"PRIMRE/Databases/Projects Database/Devices/40 South Energy R115","40 South Energy R115","40 South Energy"\n"PRIMRE/Databases/Projects Database/Devices/650 kW ZJU turbine","650 kW ZJ'

In [6]:
ask_response_str_splt1 = ask_response_str.split('\n')
ask_response_str_splt1_2 = ask_response_str_splt1[1:-1]

In [7]:
ask_response_str_splt1_2[0:5]

['"PRIMRE/Databases/Projects Database/Devices/40 South Energy R115","40 South Energy R115","40 South Energy"',
 '"PRIMRE/Databases/Projects Database/Devices/650 kW ZJU turbine","650 kW ZJU turbine","Zhejiang University,"',
 '"PRIMRE/Databases/Projects Database/Devices/AMOG Wave Energy Converter","AMOG Wave Energy Converter","Australian Marine Offshore Group"',
 '"PRIMRE/Databases/Projects Database/Devices/ANDRITZ Hydro HS1000","ANDRITZ Hydro HS1000","ANDRITZ Hydro"',
 '"PRIMRE/Databases/Projects Database/Devices/ANDRITZ Hydro HS1500","ANDRITZ Hydro HS1500","ANDRITZ Hydro"']
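The second row above ends with a comma inside the quoted Organization field. A short check, using that row as sample input, shows why splitting each line on `,` is fragile and how the standard library `csv` module treats the same line. This is an illustration on one line, not a full parser.

```python
import csv

# Second row of the output above, copied as-is.
line = ('"PRIMRE/Databases/Projects Database/Devices/650 kW ZJU turbine",'
        '"650 kW ZJU turbine","Zhejiang University,"')

# Naive split: the comma inside the quoted field creates an extra piece.
naive = line.split(',')

# csv.reader respects the quotes and keeps the comma inside the field.
parsed = next(csv.reader([line]))

print(len(naive))   # 4
print(len(parsed))  # 3
print(parsed[2])    # Zhejiang University,
```

The naive split happens to work for this row when only pieces 1 and 2 are used, but a device name or organization with a comma in the middle would be cut short.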

### Cleaned

In [8]:
import csv
import io

def compose_ask_query_relational_table(ask_query):
    '''
    Download the CSV result of an ask query and return it as a
    two-column relational table (Device Name, Organization).

    ask_query: download link for the ask query (format=csv).
    '''
    # Retrieve data from ask query
    ask_response = requests.get(ask_query)
    ask_response.raise_for_status()          # Fail early on an HTTP error

    # Parse with the csv module so that quotes are removed and commas
    # inside quoted fields do not split a value across columns
    rows = csv.reader(io.StringIO(ask_response.text))
    next(rows, None)                         # Skip the header row

    device_nam_lst = list()
    device_org_lst = list()
    for row in rows:
        if len(row) < 3:                     # Skip blank or incomplete lines
            continue
        # Column 0 is the page path; keep only Name and Organization.
        # Some source values carry a stray trailing comma, so drop it.
        device_nam_lst.append(row[1].strip().rstrip(','))
        device_org_lst.append(row[2].strip().rstrip(','))

    table = pd.DataFrame({'Device Name':device_nam_lst, 'Organization':device_org_lst})

    return table

### Cleaned - Tests

In [12]:
compose_ask_query_relational_table(test_query)

,Device Name,Organization
0,40 South Energy R115,40 South Energy
1,650 kW ZJU turbine,Zhejiang University
2,AMOG Wave Energy Converter,Australian Marine Offshore Group
3,ANDRITZ Hydro HS1000,ANDRITZ Hydro
4,ANDRITZ Hydro HS1500,ANDRITZ Hydro
...,...,...
80,Energy Island Lilypad,Energy Island Ltd
81,Energy Island OTEC,Energy Island Ltd
82,EnviroGen,New Energy Corporation
83,Etymol WEC,Etymol Ocean Power SpA
