## Publish a Pointset object

This example shows how to convert pointset data in CSV format into an Evo geoscience object using the Evo Python SDK.

### Requirements

You must have a Seequent account with the Evo entitlement to use this notebook.

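The following parameters must be provided:

- The client ID of your Evo application.
- The client secret of your Evo application, because the code below authenticates with the client credentials flow.
- The callback/redirect URL of your Evo application, only if you switch to the commented-out authorization code flow.
- The hub URL, organisation ID and workspace ID that the object will be published to.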

To obtain these app credentials, refer to the [Apps and tokens guide](https://developer.seequent.com/docs/guides/getting-started/apps-and-tokens) in the Seequent Developer Portal.
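
App credentials are sensitive, so one common approach is to read them from environment variables instead of typing them into the notebook. The following is a minimal sketch using only the standard library; the variable names `EVO_CLIENT_ID` and `EVO_CLIENT_SECRET` and the helper `load_app_credentials` are hypothetical and are not part of the Evo SDK.

```python
import os


def load_app_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; choose names that suit your setup.
    """
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env

    required = ("EVO_CLIENT_ID", "EVO_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    return env["EVO_CLIENT_ID"], env["EVO_CLIENT_SECRET"]
```

The returned values could then be assigned to `client_id` and `client_secret` in the cell below.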

In [None]:
import uuid

import pandas as pd
from evo_schemas.components import (
    BoundingBox_V1_0_1,
    CategoryAttribute_V1_1_0,
    ContinuousAttribute_V1_1_0,
    Crs_V1_0_1_EpsgCode,
    NanCategorical_V1_0_1,
    NanContinuous_V1_0_1,
)
from evo_schemas.elements import (
    FloatArray1_V1_0_1,
    FloatArray3_V1_0_1,
    IntegerArray1_V1_0_1,
    LookupTable_V1_0_1,
)
from evo_schemas.objects import Pointset_V1_2_0, Pointset_V1_2_0_Locations
from IPython.display import HTML, display

from evo.aio import AioTransport
from evo.common import APIConnector, Environment
from evo.common.utils.cache import Cache
from evo.notebooks import FeedbackWidget
from evo.oauth import ClientCredentialsAuthorizer, EvoScopes, OAuthConnector
from evo.objects import ObjectAPIClient

cache_location = "data"
input_path = f"{cache_location}/input"

# # Evo app credentials
# client_id = "daves-evo-client"
# redirect_url = "http://localhost:32369/auth/callback"

# manager = await ServiceManagerWidget.with_auth_code(
#     discovery_url="https://discover.api.seequent.com",
#     redirect_url=redirect_url,
#     client_id=client_id,
#     cache_location=cache_location,
# ).login()


cache = Cache(root=cache_location, mkdir=True)
org_id = "72748d4c-442f-4442-adf2-1d6dc8058f80"
workspace_id = "0e6fc58a-4673-4265-a0ae-bf9291279bce"
client_id = "service-WYZOIBAlEbvzZXv5UfU9CGqmD"
# Replace the placeholder with your own client secret. Do not commit real secrets to source control.
client_secret = "<your-client-secret>"
user_agent = "seequent-evo-app"
hub_url = "https://evo-demo.api.seequent.com"
user_id = "test-runner"

environment = Environment(hub_url=hub_url, org_id=uuid.UUID(org_id), workspace_id=uuid.UUID(workspace_id))

transport = AioTransport(user_agent=user_agent)
authorizer = ClientCredentialsAuthorizer(
    oauth_connector=OAuthConnector(
        transport=AioTransport(user_agent=user_agent),
        client_id=client_id,
        client_secret=client_secret,
    ),
    scopes=EvoScopes.all_evo,
)

await authorizer.authorize()

connector = APIConnector(base_url=environment.hub_url, transport=transport, authorizer=authorizer)
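# Optional check: print the access token to confirm that authorization succeeded.
# Treat the token as a secret and clear this cell's output before sharing the notebook.
async with authorizer._unwrap_token() as token:
    print(token.access_token)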

### Use the Evo Python SDK to create an object client and a data client

In [None]:
# The object client will manage your auth token and Geoscience Object API requests.
object_client = ObjectAPIClient(connector=connector, environment=environment)

# The data client will manage saving your data as Parquet and publishing your data to Evo storage.
data_client = object_client.get_data_client(cache=cache)

for obj in await object_client.list_all_objects():
    print(f"Found object: {obj.name} (ID: {obj.id})")

### Define helper functions

These functions help assemble the elements and components of geoscience objects and build a link for viewing the new object in the Evo portal.
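
To make the category lookup concrete, here is a small self-contained sketch of the same idea on toy data: every distinct string receives a 1-based integer key, and the data column is replaced by those keys. It uses `pd.factorize` rather than the helper defined below, and the sample hole IDs are invented for illustration.

```python
import pandas as pd

# Toy attribute column with one missing value (hypothetical data).
raw = pd.Series(["DH2", "DH1", None, "DH2"], name="data")

# Missing values become an empty string so that they receive their own category.
filled = raw.fillna("")

# factorize(sort=True) returns 0-based codes and the sorted unique values.
codes, uniques = pd.factorize(filled, sort=True)

# Lookup table: 1-based keys paired with the sorted unique values.
table_df = pd.DataFrame(
    {"key": list(range(1, len(uniques) + 1)), "value": list(uniques)}
)

# Data column: each original value replaced by its 1-based key.
values_df = pd.DataFrame({"data": codes + 1})
```

The lookup table and the key column are kept as two separate dataframes because, in this notebook, each one is saved as its own Parquet file.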

In [None]:
def create_category_lookup_and_values(attribute):
    """
    Create a category lookup table and the associated column of mapped key values.

    Args:
        attribute (pd.DataFrame): A dataframe with a single "data" column holding the attribute values.

    Returns:
        table_df (pd.DataFrame): The category lookup table.
        values_df (pd.DataFrame): The associated column with mapped key values.
    """

    # Replace NaN with an empty string so that missing values get their own category.
    # fillna returns a copy, so the caller's dataframe is left unmodified.
    attribute = attribute.fillna("")
    list_obj = sorted(set(attribute["data"]))
    num_unique_elements = len(list_obj)

    # Create lookup table
    table_df = pd.DataFrame([])
    table_df["key"] = list(range(1, num_unique_elements + 1))
    table_df["value"] = list_obj

    # Create data column
    values_df = pd.DataFrame([])
    values_df["data"] = attribute["data"].map(table_df.set_index("value")["key"])
    return table_df, values_df


def build_portal_url(object_metadata):
    """
    Build and display a link to view the geoscience object in the Evo Portal.

    Args:
        object_metadata: The metadata object returned after creating the geoscience object.

    Returns:
        None. Displays an HTML link to the Evo Portal for the created object.
    """

    hub_url = object_metadata.environment.hub_url
    hub_name = hub_url.split("://")[1].split(".")[0]
    org_id = object_metadata.environment.org_id
    workspace_id = object_metadata.environment.workspace_id
    object_id = object_metadata.id

    url = f"https://evo.seequent.com/{org_id}/workspaces/{hub_name}/{workspace_id}/viewer?id={object_id}"

    display(HTML(f'<a href="{url}" target="_blank">View object in the Evo Portal</a>'))

### Define object metadata

Geoscience object data must conform to a specific object schema. The `evo-schemas` package provides Pydantic models that make it easy to work with the equivalent JSON schemas. 
For this example we'll use v1.2.0 of the pointset schema, via the relevant Pydantic model.

Enter values for these parameters, which are required by the object schema.
- `object_name`: The name of the object.
- `object_path`: The folder path in the workspace where the object will be stored.
- `object_epsg_code`: (Optional) The EPSG code of the coordinate reference system that matches the location of your data. Leave as `None` if not required.
- `object_tags`: (Optional) A dictionary of additional tags to be assigned to the object. Leave as `None` if not required.

In [None]:
object_name = "Pointset_SDK_demo"
object_path = "Jupyter_Example"
object_epsg_code = 32650
object_tags = {"Source": "Jupyter Notebook", "Evo SDK": "0.1.5"}

# Define a coordinate reference system (CRS) for the object.
# If no EPSG code is provided, the CRS is left unspecified.
if object_epsg_code is None:
    coordinate_reference_system = "unspecified"
else:
    coordinate_reference_system = Crs_V1_0_1_EpsgCode(epsg_code=object_epsg_code)

# Define the input file path.
input_file = f"{input_path}/WP_assay.csv"

# Load the input CSV file.
input_df = pd.read_csv(input_file)

# Define the object path.
full_obj_path = f"{object_path}/{object_name}.json"

### Define object attributes and keys
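
Each attribute needs a key that is unique across the whole object. The sketch below generates UUID keys for a list of column names and verifies that no key is repeated; the helper name `make_attribute_keys` is hypothetical and is not part of the Evo SDK.

```python
import uuid


def make_attribute_keys(column_names):
    """Map each column name to a freshly generated UUID string."""
    keys = {name: str(uuid.uuid4()) for name in column_names}

    # Defensive check: a collision is extremely unlikely, but cheap to detect.
    if len(set(keys.values())) != len(keys):
        raise ValueError("Duplicate attribute keys generated")

    return keys


keys = make_attribute_keys(["Hole ID", "CU_pct", "AU_gpt", "DENSITY"])
```

Saving the resulting dictionary alongside your data makes it possible to refer to the same attributes later.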

In [None]:
# List all of the attributes to be included in the object. Every attribute must have a unique key associated with it.
# Keys must be unique across the entire object, and we recommend saving a reference to the keys for later use.
object_attributes = {
    "WP_assay": {
        "Hole ID": str(uuid.uuid4()),
        "CU_pct": str(uuid.uuid4()),
        "AU_gpt": str(uuid.uuid4()),
        "DENSITY": str(uuid.uuid4()),
    },
}

### Coordinates
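
As a quick illustration of the bounding box step, the sketch below computes the minimum and maximum of each axis for a toy coordinates dataframe. The sample values and the plain dictionary are stand-ins for illustration; the cell below passes the same six values to `BoundingBox_V1_0_1`.

```python
import pandas as pd

# Toy coordinates (hypothetical values).
coords = pd.DataFrame(
    {
        "X": [10.0, 12.5, 11.0],
        "Y": [200.0, 198.0, 205.5],
        "Z": [-5.0, 0.0, -12.0],
    }
)

# Collect the extent of each axis under the min_*/max_* names used by the bounding box.
bounds = {}
for axis in ("X", "Y", "Z"):
    bounds[f"min_{axis.lower()}"] = float(coords[axis].min())
    bounds[f"max_{axis.lower()}"] = float(coords[axis].max())
```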

In [None]:
# Create a dataframe for the coordinates.
coordinates_df = input_df[["X", "Y", "Z"]]

# Create a bounding box for the coordinates.
bounding_box = BoundingBox_V1_0_1(
    min_x=coordinates_df["X"].min(),
    max_x=coordinates_df["X"].max(),
    min_y=coordinates_df["Y"].min(),
    max_y=coordinates_df["Y"].max(),
    min_z=coordinates_df["Z"].min(),
    max_z=coordinates_df["Z"].max(),
)

# Save the coordinates dataframe to a parquet file.
coordinates = FloatArray3_V1_0_1.from_dict(data_client.save_dataframe(coordinates_df))

### Attribute columns
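
The loop in the cell below decides, per column, whether to build a category attribute or a continuous attribute based on the column's dtype. A minimal sketch of that decision on toy data follows; the helper name `classify_column` is hypothetical, and it uses `is_numeric_dtype` from pandas as one way to separate numeric columns from text columns.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def classify_column(series):
    """Return "continuous" for numeric columns and "category" for everything else."""
    return "continuous" if is_numeric_dtype(series) else "category"


# Toy input resembling an assay table (hypothetical values).
sample = pd.DataFrame(
    {
        "Hole ID": ["DH1", "DH2"],
        "CU_pct": [0.42, 1.10],
        "DENSITY": [2.7, 2.9],
    }
)

kinds = {name: classify_column(sample[name]) for name in sample.columns}
```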

In [None]:
attributes = []

for heading_name, heading_key in object_attributes["WP_assay"].items():
    values_df = pd.DataFrame()
    values_df["data"] = input_df[heading_name]
    value_dtype = values_df["data"].dtype

    # Treat any non-numeric column (for example, text) as a category.
    if not pd.api.types.is_numeric_dtype(value_dtype):
        print(f"Treating {heading_name} as a category type")
        table_df, values_df = create_category_lookup_and_values(values_df)

        table = LookupTable_V1_0_1.from_dict(data_client.save_dataframe(table_df))
        values = IntegerArray1_V1_0_1.from_dict(data_client.save_dataframe(values_df))

        attribute = CategoryAttribute_V1_1_0(
            name=heading_name,
            nan_description=NanCategorical_V1_0_1(values=[]),
            key=heading_key,
            table=table,
            values=values,
        )

        attributes.append(attribute)

    else:
        print(f"Treating {heading_name} as a scalar type")
        values = FloatArray1_V1_0_1.from_dict(data_client.save_dataframe(values_df))
        attribute = ContinuousAttribute_V1_1_0(
            name=heading_name,
            nan_description=NanContinuous_V1_0_1(values=[]),
            key=heading_key,
            values=values,
        )

        attributes.append(attribute)

# Define the pointset locations component.
locations = Pointset_V1_2_0_Locations(coordinates=coordinates, attributes=attributes)

### Create a new pointset and publish it to Evo

In [None]:
# Lastly, assemble the complete geoscience object by combining all previously defined components.
# - The name and UUID are used to identify the object.
# - The UUID is set to None because this is a new object. A new UUID will be assigned by the Evo service.
# - The bounding box defines the spatial extent of the object.
# - The tags provide metadata about the object.
# - The coordinate reference system defines the spatial reference for the object.
# - The locations component contains the coordinates and attributes.

pointset = Pointset_V1_2_0(
    name=object_name,
    uuid=None,
    bounding_box=bounding_box,
    tags=object_tags,
    coordinate_reference_system=coordinate_reference_system,
    locations=locations,
)

# Upload the Parquet data to Evo.
await data_client.upload_referenced_data(pointset.as_dict(), FeedbackWidget("Uploading data"))

# Create the geoscience object.
new_pointset_metadata = await object_client.create_geoscience_object(full_obj_path, pointset.as_dict())

### View the object in the Evo portal

In [None]:
build_portal_url(new_pointset_metadata)

Success! You now have a new geoscience object in Evo containing your pointset data.

## Summary

In this example, we've completed the following:
* Analysed the coordinates and constructed the elements and components required for the coordinates.
* Analysed the data columns and constructed the elements and components required for the attributes.
* Converted the input coordinate and attribute data into Parquet format and saved it to the local cache.
* Combined all of the elements, components and data references into the pointset schema format.
* Uploaded the Parquet files and the newly assembled object in JSON format to Evo.