# P2F Python Client Library Example

Welcome to the Client Library Example Notebook.

First, we need to import two things: the P2F client library and the p2f_pydantic library. The client library is built on `requests` and sends and receives pydantic objects between your computer and the API.

## Pydantic?

Pydantic is a library that works on the back-end of the API to enforce data types and convert data structures to JSON.

So why do you need it? It keeps the clients and the server in sync, so both agree on the definition of the data. The library also encodes your local data into JSON, which `requests` then sends over the REST API in a fairly painless manner.
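To make the idea concrete without depending on the real models, here is a standard-library sketch of the two jobs described above: rejecting wrongly typed fields and encoding a record as JSON. `DatasetSketch` is a hypothetical stand-in, not a `p2f_pydantic` class, and Pydantic performs both steps automatically rather than by hand as shown here.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a dataset model (not a p2f_pydantic class)."""
    doi: str
    title: str
    publication_date: datetime
    is_new_p2f: bool = False

    def __post_init__(self):
        # Enforce the declared types by hand; Pydantic does this automatically.
        if not isinstance(self.doi, str):
            raise TypeError("doi must be a string")
        if not isinstance(self.publication_date, datetime):
            raise TypeError("publication_date must be a datetime")

    def to_json(self) -> str:
        # Dates are not JSON-native, so encode them as ISO 8601 strings.
        payload = asdict(self)
        payload["publication_date"] = self.publication_date.isoformat()
        return json.dumps(payload)


record = DatasetSketch(doi="10.1594/PANGAEA.920596",
                       title="Last Glacial Maximum SST proxy collection and data assimilation",
                       publication_date=datetime(2020, 7, 21))
encoded = record.to_json()
```

Because client and server would share the same model definition, a record that passes validation on your machine should also be accepted by the API.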

In [None]:
from p2f_client.p2f_client import P2F_Client
import p2f_pydantic
import datetime

First, let's initialize a client. This method will probably be updated in the future, but this is the current way to initialize it.

In [None]:
client = P2F_Client(hostname="localhost", port=8000, https=False)
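As a rough picture of what the three arguments control, the sketch below combines them into a base URL. `build_base_url` is a hypothetical helper written for this illustration; how `P2F_Client` builds its URLs internally is an assumption, not something this notebook confirms.

```python
def build_base_url(hostname: str, port: int, https: bool = False) -> str:
    """Hypothetical helper: combine connection settings into a base URL."""
    scheme = "https" if https else "http"
    return f"{scheme}://{hostname}:{port}"


# Same settings as the client above: a local server without TLS.
local_url = build_base_url("localhost", 8000, https=False)
```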

## Datasets

Below we start by uploading new datasets to the Portal. I create a pydantic `Datasets` object for each of my three datasets.

After that, there are two ways to upload. The one shown here is `add_dataset`, which places a dataset into a queue; the queued datasets can then all be uploaded at once with `upload_datasets`.
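The add-then-upload pattern can be pictured with a small in-memory queue. This is only a sketch: `UploadQueueSketch` and its `send` callable are hypothetical, and whether the real client clears its queue after uploading is an assumption.

```python
class UploadQueueSketch:
    """Hypothetical illustration of the add-then-upload pattern."""

    def __init__(self, send):
        self._send = send      # callable that uploads one item
        self._pending = []     # items waiting to be uploaded

    def add(self, item):
        # Nothing is sent yet; the item is only staged locally.
        self._pending.append(item)

    def upload_all(self):
        # Send every staged item in order, then clear the queue.
        results = [self._send(item) for item in self._pending]
        self._pending.clear()
        return results


sent = []  # records what the fake "server" received
queue = UploadQueueSketch(send=lambda item: sent.append(item) or item)
queue.add("dataset1")
queue.add("dataset2")
staged_before_upload = list(sent)   # still empty: nothing has been sent so far
uploaded = queue.upload_all()
```

Staging first is convenient when you build many objects in a loop and want to send them in one step at the end.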

In [None]:
# Create Pydantic Objects
dataset1 = p2f_pydantic.datasets.Datasets(doi="10.1594/PANGAEA.920596",
                                          title="Last Glacial Maximum SST proxy collection and data assimilation",
                                          publication_date=datetime.datetime(2020, 7, 21),
                                          is_new_p2f=False,
                                          is_sub_dataset=False)

dataset2 = p2f_pydantic.datasets.Datasets(doi="10.5194/gmd-12-3149-2019-supplement",
                                          title="The DeepMIP contribution to PMIP4: methodologies for selection, compilation and analysis of latest Paleocene and early Eocene climate proxy data, incorporating version 0.1 of the DeepMIP database",
                                          publication_date=datetime.datetime(2019, 7, 25),
                                          is_new_p2f=False,
                                          is_sub_dataset=False)

dataset3 = p2f_pydantic.datasets.Datasets(doi="10.1594/PANGAEA.911847", 
                                          title="Sea surface temperature anomalies for Pliocene interglacial KM5c (PlioVAR)", 
                                          publication_date=datetime.datetime(2020, 2, 7), 
                                          is_new_p2f=False, 
                                          is_sub_dataset=False)

# Add datasets to Queue
client.datasets.add_dataset(dataset1)
client.datasets.add_dataset(dataset2)
client.datasets.add_dataset(dataset3)

# Upload datasets in the Queue
client.datasets.upload_datasets()

You can also upload a single dataset directly with `upload_dataset()`.

In [None]:
dataset4 = p2f_pydantic.datasets.Datasets(doi="10.5194/gmd-12-3149-2019-supplement", 
                                          title="The DeepMIP contribution to PMIP4: methodologies for selection, compilation and analysis of latest Paleocene and early Eocene climate proxy data, incorporating version 0.1 of the DeepMIP database",
                                          sub_dataset_name="SDF02_Sites.xlsx",
                                          publication_date=datetime.datetime(2019, 7, 25), 
                                          is_new_p2f=False, 
                                          is_sub_dataset=True)
client.datasets.upload_dataset(dataset4)

Now that we have datasets uploaded, we can check datasets that exist on the server.

In [None]:
# Get the list of datasets on the server
client.datasets.list_remote_datasets()

You can also delete a dataset. Below I delete the last dataset in the server's list. **Please don't do this in real usage.**

In [None]:
client.datasets.delete_remote_dataset(client.datasets.list_remote_datasets()[-1].dataset_identifier)
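A safer habit than deleting by list position is to look up the identifier of the exact dataset you mean. The sketch below does this on stand-in objects; `find_dataset_identifier` is a hypothetical helper, the identifiers are made up, and it assumes the objects returned by the server expose `doi`, `sub_dataset_name` and `dataset_identifier` attributes.

```python
from types import SimpleNamespace


def find_dataset_identifier(datasets, doi, sub_dataset_name=None):
    """Return the identifier of the single dataset matching DOI and sub-dataset name."""
    matches = [d for d in datasets
               if d.doi == doi
               and getattr(d, "sub_dataset_name", None) == sub_dataset_name]
    if len(matches) != 1:
        # Refuse to guess when nothing, or more than one dataset, matches.
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0].dataset_identifier


# Stand-in for the list returned by the server (identifiers are made up).
remote = [
    SimpleNamespace(doi="10.1594/PANGAEA.920596", sub_dataset_name=None,
                    dataset_identifier="id-1"),
    SimpleNamespace(doi="10.5194/gmd-12-3149-2019-supplement", sub_dataset_name=None,
                    dataset_identifier="id-2"),
    SimpleNamespace(doi="10.5194/gmd-12-3149-2019-supplement",
                    sub_dataset_name="SDF02_Sites.xlsx", dataset_identifier="id-3"),
]
target = find_dataset_identifier(remote, "10.5194/gmd-12-3149-2019-supplement",
                                 "SDF02_Sites.xlsx")
```

The returned identifier could then be passed to `delete_remote_dataset` in place of the `[-1]` lookup.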

## Adding a Table

Adding a table is a bit more complex: we need to prepopulate our data types and then iterate through our table.

For this example we're going to look at one of the tables (sub-datasets) of the PlioVAR article from above. First let's add our dataset. 

In [None]:
pliovartab = p2f_pydantic.datasets.Datasets(doi="10.1594/PANGAEA.911847", 
                                            title="Sea surface temperature anomalies for Pliocene interglacial KM5c (PlioVAR)", 
                                            sub_dataset_name="PlioVAR-KM5c_T.tab", 
                                            publication_date=datetime.datetime(2020, 2, 7), 
                                            is_new_p2f=False, 
                                            is_sub_dataset=True)
pliovartab = client.datasets.upload_dataset(pliovartab)

This dataset has 20 columns, four of which can be grouped into a location and nine of which are related to Sea Surface Temperature. Let's check the API server for the data types that already exist for Sea Surface Temperatures.

In [None]:
client.harm_data_type.list_data_types()

For me, the developer, there are currently no data types on the P2F API server, so I need to create them.

From the dataset's documentation, I know I need to create a data type for each of the nine temperature columns.

In [None]:
# Base SSTs
SST_UK37 = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature", 
                                                       unit_of_measurement="°C",
                                                       method="UK37",
                                                       is_proxy=True)
SST_UK37_BAYSPLINE = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature", 
                                                       unit_of_measurement="°C",
                                                       method="UK37 with Bayspline calibration",
                                                       is_proxy=True)
SST_MgCa = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature", 
                                                       unit_of_measurement="°C",
                                                       method="Ratios Mg/Ca",
                                                       is_proxy=True)
SST_MgCa_BAYMAG = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature", 
                                                       unit_of_measurement="°C",
                                                       method="Ratios Mg/Ca with BAYMAG calibration",
                                                       is_proxy=True)
SST_FORWARD = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature", 
                                                       unit_of_measurement="°C",
                                                       method="Forward-modelled 'core-top' Mg/Ca sea surface temperature from BAYMAG",
                                                       is_proxy=True)

# Anomalous SSTs
SSTA_UK37 = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature Anomaly", 
                                                       unit_of_measurement="°C",
                                                       method="UK37",
                                                       is_proxy=True)
SSTA_UK37_BAYSPLINE = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature Anomaly", 
                                                       unit_of_measurement="°C",
                                                       method="UK37 with Bayspline calibration",
                                                       is_proxy=True)
SSTA_MgCa = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature Anomaly", 
                                                       unit_of_measurement="°C",
                                                       method="Ratios Mg/Ca",
                                                       is_proxy=True)
SSTA_MgCa_BAYMAG = p2f_pydantic.harm_data_types.harm_data_type(measure="Sea Surface Temperature Anomaly", 
                                                       unit_of_measurement="°C",
                                                       method="Ratios Mg/Ca with BAYMAG calibration",
                                                       is_proxy=True)
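Since the nine definitions differ only in `measure` and `method`, the keyword arguments can also be generated rather than typed out. The sketch below only builds plain dictionaries; the `MEASURES`, `METHODS` and `data_type_kwargs` names are introduced here for illustration.

```python
MEASURES = {
    "SST": "Sea Surface Temperature",
    "SSTA": "Sea Surface Temperature Anomaly",
}
METHODS = {
    "UK37": "UK37",
    "UK37_BAYSPLINE": "UK37 with Bayspline calibration",
    "MgCa": "Ratios Mg/Ca",
    "MgCa_BAYMAG": "Ratios Mg/Ca with BAYMAG calibration",
}

# One entry per (measure, method) combination, keyed like the variables above.
data_type_kwargs = {
    f"{measure_key}_{method_key}": {
        "measure": measure,
        "unit_of_measurement": "°C",
        "method": method,
        "is_proxy": True,
    }
    for measure_key, measure in MEASURES.items()
    for method_key, method in METHODS.items()
}

# The forward-modelled column exists only for the base SST measure.
data_type_kwargs["SST_FORWARD"] = {
    "measure": MEASURES["SST"],
    "unit_of_measurement": "°C",
    "method": "Forward-modelled 'core-top' Mg/Ca sea surface temperature from BAYMAG",
    "is_proxy": True,
}
```

Each dictionary could then be unpacked into the model, for example `p2f_pydantic.harm_data_types.harm_data_type(**data_type_kwargs["SST_UK37"])`.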

After we create the data types, we upload them individually to the API so we can get their `datatype_id` back for use with our numerical uploads later. 

In [None]:
SST_UK37 = client.harm_data_type.upload_data_type(SST_UK37)
SSTA_UK37 = client.harm_data_type.upload_data_type(SSTA_UK37)
SST_UK37_BAYSPLINE = client.harm_data_type.upload_data_type(SST_UK37_BAYSPLINE)
SSTA_UK37_BAYSPLINE = client.harm_data_type.upload_data_type(SSTA_UK37_BAYSPLINE)
SST_MgCa = client.harm_data_type.upload_data_type(SST_MgCa)
SSTA_MgCa = client.harm_data_type.upload_data_type(SSTA_MgCa)
SST_MgCa_BAYMAG = client.harm_data_type.upload_data_type(SST_MgCa_BAYMAG)
SSTA_MgCa_BAYMAG = client.harm_data_type.upload_data_type(SSTA_MgCa_BAYMAG)
SST_FORWARD = client.harm_data_type.upload_data_type(SST_FORWARD)

In [None]:
# Confirm upload
client.harm_data_type.list_data_types()

Next, let's load our dataset into pandas.

In [None]:
import pandas as pd

In [None]:
plio_df = pd.read_csv("Pliocene_SSTs/PlioVAR-KM5c_T.tab", skiprows=99, sep="\t")
plio_df
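The hard-coded `skiprows=99` depends on the length of the metadata header in this particular file. As a sketch, the header length can be counted instead, assuming the file opens with a comment block that is closed by a line containing only `*/` (the usual layout of PANGAEA `.tab` exports; check your file before relying on it). `count_header_lines` is a hypothetical helper and the sample lines are placeholders.

```python
def count_header_lines(lines):
    """Return how many leading lines belong to the /* ... */ comment block."""
    for index, line in enumerate(lines):
        if line.strip() == "*/":
            # Skip everything up to and including the closing marker.
            return index + 1
    # No closing marker found: assume there is no comment block to skip.
    return 0


# Placeholder lines standing in for the start of a .tab file.
sample = [
    "/* DATA DESCRIPTION:",
    "Citation:\t(placeholder metadata line)",
    "*/",
    "Site\tLatitude\tLongitude",
    "A\t1.0\t2.0",
]
skip = count_header_lines(sample)
```

With a real file, the same function can be applied to the open file handle and the result passed to `pd.read_csv` as `skiprows`.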

In [None]:
from pprint import pprint
import numpy as np

Here we will iterate through all of the records using `iterrows()` and then for each row of the table we will add in some information. 

**Note on row hash** - I wanted a universal, unique (UUID-like) way to refer to a row that does not rely on the row number alone but can still be recalculated over and over again. This may be pointless here, but it is how it's currently implemented.
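To illustrate what "recalculable" means here, the sketch below derives an identifier from a dataset id and a row number with SHA-256. `row_hash_sketch` is hypothetical; the hash that `calculate_hash` actually computes may use different inputs or a different algorithm.

```python
import hashlib


def row_hash_sketch(dataset_id: str, row_number: int) -> str:
    """Hypothetical: derive a repeatable identifier from dataset id and row number."""
    key = f"{dataset_id}:{row_number}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


first = row_hash_sketch("example-dataset-id", 0)
again = row_hash_sketch("example-dataset-id", 0)   # same inputs, same hash
other = row_hash_sketch("example-dataset-id", 1)   # different row, different hash
```

Because the inputs fully determine the result, the same row can be referred to again later without storing the hash locally.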

The steps as seen below are:

1. Calculate a row hash, and insert the row as a data record that will continue to be referred to by its `record_hash`.
2. Create (or find an existing) location record and add it to the database, then assign the location to the data record.
3. For each data type, add the numerical record as long as a value exists. If the value is null, in my opinion it is best not to add empty data (this can be a point for reconsideration).
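Step 3 can be tried out on a toy row before touching the API. The sketch below only decides which values would be uploaded; `plan_numerical_inserts`, the column positions and the type labels are placeholders for illustration, not the real table.

```python
import math


def plan_numerical_inserts(row, columns_to_types):
    """Return (datatype, value) pairs for the non-missing cells of one row."""
    planned = []
    for column, datatype in columns_to_types.items():
        value = row[column]
        # Skip missing measurements instead of storing empty values.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        planned.append((datatype, value))
    return planned


# Toy row: the middle measurement is missing and should be skipped.
toy_row = {5: 27.1, 6: float("nan"), 7: 26.4}
toy_types = {5: "SST_UK37", 6: "SST_UK37_BAYSPLINE", 7: "SST_MgCa"}
planned = plan_numerical_inserts(toy_row, toy_types)
```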

In [None]:
# Get the UUID of the dataset after it was created by the database above
dataset_id = pliovartab.dataset_identifier

# Iterate through the rows using iterrows, ix is index and row is the rest of the content of the row
for ix, row in plio_df.iterrows():
    # Calculate the row hash
    row_hash = client.harm_data_records.calculate_hash(dataset_id=dataset_id,
                                                       row_number=ix,
                                                       debugging=True)
    # Create the data record object for uploading
    row_record = p2f_pydantic.harm_data_record.harm_data_record(fk_dataset=dataset_id,
                                                                record_hash=row_hash)
    # Upload the data record
    client.harm_data_records.upload_data_record(row_record)
    # Create the location object for uploading
    location = p2f_pydantic.harm_data_metadata.harm_location(location_name=row.iloc[1],
                                                             latitude=row.iloc[2],
                                                             longitude=row.iloc[3],
                                                             elevation=row.iloc[4], 
                                                             location_age=0)
    # Upload the location to the database
    location = client.harm_location.upload_harm_location(location) 
    # Assign that newly created location to this data record
    client.harm_location.assign_location_to_record(location.location_identifier, row_hash)

    # Create two parallel lists so the same upload code can be reused for every column
    #     This first one has the data types from above so we can grab
    #      their datatype_id
    col_datatype = [SST_UK37, SST_UK37_BAYSPLINE, SST_MgCa,
                    SST_MgCa_BAYMAG, SST_FORWARD, SSTA_UK37,
                    SSTA_UK37_BAYSPLINE, SSTA_MgCa, SSTA_MgCa_BAYMAG]
    #     This one is the column number and this order matches the order
    #      of the data in the above list.
    cols = [5, 6, 7, 8, 10, 12, 13, 14, 15]
    # Iterate through the data types and their column numbers together
    for datatype, col in zip(col_datatype, cols):
        # Check that the value is not null
        if not np.isnan(row.iloc[col]):
            # Create the numerical insert object for uploading
            new_numerical_insert = p2f_pydantic.harm_data_numerical.insert_harm_numerical(fk_data_record=row_hash,
                                                                                          fk_data_type=datatype.datatype_id,
                                                                                          numerical_type="FLOAT",
                                                                                          value=row.iloc[col])
            # Upload the numerical object
            client.harm_numerical.upload_harm_numerical(new_numerical_insert)