In [1]:
import os
view_server = os.environ.get("VIEW_SERVER","view-server")
url = os.environ.get("EGERIA_VIEW_SERVER_URL","https://host.docker.internal:9443")
user_id = os.environ.get("EGERIA_USER", "peterprofile")
user_pwd = os.environ.get("EGERIA_USER_PASSWORD")

from pyegeria import AutomatedCuration, ServerOps
import asyncio
import nest_asyncio
nest_asyncio.apply()

import time

# GUIDs of the Unity Catalog templates referenced later in this notebook
ucServerTemplateGUID="dcca9788-b30f-4007-b1ac-ec634aff6879"
ucCatalogTemplateGUID="5ee006aa-a6d6-411b-9b8d-5f720c079cae"
ucSchemaTemplateGUID="5bf92b0f-3970-41ea-b0a3-aacfbf6fd92e"
ucVolumeTemplateGUID="92d2d2dc-0798-41f0-9512-b10548d312b7"
ucTableTemplateGUID="6cc1e5f5-4c1e-4290-a80e-e06643ffb13d"
ucFunctionTemplateGUID="a490ba65-6104-4213-9be9-524e16fed8aa"

s_client = ServerOps("integration-daemon",url,'erinoverview')

austinHostURL="http://egeria.pdr-associates.com"
austinHostPort=8070


![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/main/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria and Unity Catalog demo

# Cataloguing Unity Catalog (UC)

## Introduction

Both Unity Catalog and Egeria are open source projects hosted by the LF AI & Data Foundation.  The technologies differ in two ways:

 * Unity Catalog is responsible for governing access to data, whereas Egeria governs the exchange of metadata between tools and systems, such as Unity Catalog.

 * Unity Catalog maintains a metadata repository describing the data it is protecting.  In contrast, Egeria maintains a distributed network of metadata repositories containing metadata about the technology (systems, tools, data), the processes that are operating on them, and the people and organizations involved.

This demo shows the new Egeria connectors that synchronize metadata between Unity Catalog (UC) and the open metadata ecosystem.  The setup is shown below:

![systems](unity-catalog-demo-systems.png)

Starting on the left-hand side, you can see JupyterLab (which is running this notebook) with Unity Catalog above it.
Egeria's REST APIs are being called by the notebooks and commands running in JupyterLab. Egeria is, in turn, calling Unity Catalog via its REST API.  Egeria also produces events when metadata changes.  

Egeria's runtime is called the [OMAG Server Platform](https://egeria-project.org/concepts/omag-server-platform/).   It hosts OMAG Servers that are configured to perform certain tasks:

* Active Metadata Store manages the metadata repository (XTDB)
* View Server provides REST APIs for the python environment
* Integration Daemon synchronizes metadata with Unity Catalog
* Engine Host runs governance functions.

## Tell Egeria where Unity Catalog is

Egeria's metadata repository holds both the metadata it is synchronizing between systems and the configuration used to control its behaviour.  Therefore the first step to connect Unity Catalog and Egeria is to create a metadata element in Egeria that describes the Unity Catalog Server.

The code below uses a template to add details of the Unity Catalog Server to Egeria.  The template creates an [Asset](https://egeria-project.org/concepts/asset/) entity to represent the server linked to a [Connection](https://egeria-project.org/concepts/connection/) that is used by connectors in Egeria to call Unity Catalog.  It then connects this asset to the integration connector that is responsible for synchronizing metadata between Unity Catalog and Egeria.

---

In [2]:
egeria_client = AutomatedCuration(view_server, url, user_id, user_pwd)
token = egeria_client.create_egeria_bearer_token()

In [3]:
catalogUnityCatalogServerName="UnityCatalogGovernanceServices:catalog-unity-catalog-server"

requestParameters = {
    "hostURL" : "http://host.docker.internal",
    "portNumber" : "8080",
    "serverName" : "Unity Catalog 1",  
    "versionIdentifier" : "V1.0",
    "description" : "First instance of the Unity Catalog (UC) Server.",
    "serverUserId" : "uc1"
}

egeria_client.initiate_gov_action_type(catalogUnityCatalogServerName, None, None, None, requestParameters, None)


'fd47ca53-3856-4b3a-8506-8aeea973d95e'
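
The value printed above has the shape of a GUID, presumably identifying the action that was started (the notebook does not say). As a minimal sketch, a hypothetical helper `is_guid` (not part of pyegeria) can check that a returned value, or one of the hard-coded template GUIDs, is well formed before it is reused in a later cell:

```python
import uuid


def is_guid(value):
    """Return True if value is a string in canonical hyphenated GUID form."""
    try:
        # uuid.UUID accepts several spellings, so compare against the
        # canonical form to insist on the hyphenated layout used here.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_guid("fd47ca53-3856-4b3a-8506-8aeea973d95e"))
print(is_guid("not-a-guid"))
```

This only checks the format; it says nothing about whether the GUID refers to an element that exists in Egeria.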

----

The next code snippet adds the second Unity Catalog server, using the `austinHostURL` and `austinHostPort` values defined in the first cell.

----

In [4]:
requestParameters = {
    "hostURL" : austinHostURL,
    "portNumber" : str(austinHostPort),  # passed as a string, like "8080" in the first request
    "serverName" : "Unity Catalog 2",
    "versionIdentifier" : "V1.0",
    "description" : "Second instance of the Unity Catalog (UC) Server.",
    "serverUserId" : "uc2"
}

egeria_client.initiate_gov_action_type(catalogUnityCatalogServerName, None, None, None, requestParameters, None)

'710905f1-b0ad-401a-9904-caa5e9880efd'
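
The two requests above differ only in their values. A minimal sketch, using a hypothetical helper `uc_server_request_parameters` (not part of pyegeria), shows one way to build both dictionaries from a single list. It assumes that request parameters should all be strings, as `"portNumber" : "8080"` in the first request suggests:

```python
def uc_server_request_parameters(host_url, port_number, server_name,
                                 description, server_user_id,
                                 version_identifier="V1.0"):
    """Build the request parameters describing one Unity Catalog server."""
    params = {
        "hostURL": host_url,
        "portNumber": port_number,
        "serverName": server_name,
        "versionIdentifier": version_identifier,
        "description": description,
        "serverUserId": server_user_id,
    }
    # Assumption: every request parameter is sent as a string.
    return {key: str(value) for key, value in params.items()}


# Values copied from the two cells above.
servers = [
    ("http://host.docker.internal", 8080, "Unity Catalog 1",
     "First instance of the Unity Catalog (UC) Server.", "uc1"),
    ("http://egeria.pdr-associates.com", 8070, "Unity Catalog 2",
     "Second instance of the Unity Catalog (UC) Server.", "uc2"),
]

all_request_parameters = [uc_server_request_parameters(*server) for server in servers]

for request_parameters in all_request_parameters:
    print(request_parameters["serverName"], request_parameters["portNumber"])
```

Each dictionary could then be passed as the fifth argument of `egeria_client.initiate_gov_action_type(...)`, in the same position as `requestParameters` in the cells above.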

---

There is a lot more you can do, but here are some takeaways:

* The Server Survey allows you to generate a high level summary of the contents of a UC server.   This is useful for working out what is worth cataloguing.   A regularly scheduled run of this survey can show how the contents are changing over time.

* Cataloguing UC allows you to run searches over its content.  Although it is not shown here, you can also perform more detailed surveys on individual resources in UC, and receive notifications from Egeria each time something changes in UC.

* Notice how dynamically configurable Egeria is.  This means its actions can be coded in your pipelines or scripts.  Plus you can see the results and act on them too.

* Egeria is extremely scalable.  It runs on a Raspberry Pi, so it can manage sensor data at the edge.  We have shown it here in a small team setting, and it also scales to a multi-tenant cloud environment.  All of these deployments can be linked together to share metadata.  It can also run as its own deployment or be embedded in other projects/products.

* Other processes running in Egeria can augment the metadata from UC, enhancing the level of governance that can be offered.  We are going to show this in the next call.

---

# Onboarding data safely into Unity Catalog (UC)

## Introduction

Deciding whether Callie can process the data, and for what type of processing, requires more than access control.  She has access to the data (provided by UC) because she is on the project.  However, there are additional legal obligations, such as who the data can be shared with, when it must be deleted, and the type of data protection mechanisms that must be in place.  These are partly her responsibility and partly the responsibility of others, and this coordination across different professionals, departments and tools is difficult in a busy organization.

## Clinical Trials

Clinical trials are used to test that new treatments are both safe and effective.  They involve taking measurements from various patients both before and after they start the treatment.  Callie has to analyze these measurements as part of the package to submit to the regulators.

The data comes in from a variety of hospitals.  It is personally sensitive to the patients, of importance to the business and subject to regulatory control, so care is needed to ensure that:

* it has been collected correctly
* it is protected at all times
* the correct data sharing agreements are in place to provide legal cover, both for the hospitals and Coco Pharmaceuticals.

In this example, there are three hospitals supplying data:

![Onboarding Process](unity-catalog-onboarding-process.png)

## Setting up the clinical trial

The first step is to create the processes that will be used by the staff during the clinical trial.
The set-up takes generic process steps and creates processes for the clinical trial that are initialized with all of the correct values.
This reduces the chance that someone will use the wrong value by accident.

---

In [5]:
token = egeria_client.create_egeria_bearer_token()

In [7]:
setUpClinicalTrialName="ClinicalTrials@CocoPharmaceuticals:set-up-clinical-trial"

projectGUID="a2915132-9d9a-4449-846f-43a871b5a6a0"


action_targets = [{
      "class" : "NewActionTarget",
      "actionTargetName": "clinicalTrialProject",
      "actionTargetGUID": projectGUID
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "processOwner",
      "actionTargetGUID": "685b486b-d627-448a-9396-07b6c06da071"
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "custodian",
      "actionTargetGUID": "8fb16e47-77a1-4e31-9917-79d77d108b0c"
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "steward",
      "actionTargetGUID": "58e34d15-3043-4c1f-b148-3f0671b0a369"
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "onboardingPipeline",
      "actionTargetGUID": "58050a15-745a-4727-8257-074c19b02796"
    }]

requestParameters = {
    "dataLakeVolumeTemplateGUID" : ucVolumeTemplateGUID,
    "dataLakeSchemaTemplateGUID" : ucSchemaTemplateGUID,
    "dataLakeFileTemplateGUID" : "b2ec7c9d-3462-488a-897d-8e873658dded",
    "landingAreaDirectoryTemplateGUID" : "fbdd8efd-1b69-474c-bb6d-0a304b394146",
    "landingAreaFileTemplateGUID" : "5e5ffc97-237d-46c6-95c3-49405035dedc"
}

egeria_client.initiate_gov_action_type(setUpClinicalTrialName, None, action_targets, None, requestParameters, 'view-server')


InvalidParameterException: OMAG-GENERIC-HANDLERS-400-013 Unable to initiate an instance of the ClinicalTrials@CocoPharmaceuticals:set-up-clinical-trial governance action type because the name is not recognized

---

## Setting up the data lake resources in Unity Catalog (UC)

The first step in the demo is to create the schema and volume in the data lake as the destination for the files from the hospital.

----

In [None]:
token = egeria_client.create_egeria_bearer_token()

In [None]:
setUpDataLakeProcessName="ClinicalTrials:PROJ-CT-TBDF:set-up-data-lake"

dataLakeDirectoryPathName="/deployments/data/coco-data-lake/research/clinical-trials/drop-foot/weekly-measurements"
catalogGUID="1502f4ac-52d5-4f94-905d-64d6c8244579"

action_targets = [{
      "class" : "NewActionTarget",
      "actionTargetName": "dataLakeCatalog",
      "actionTargetGUID": catalogGUID
    }]

requestParameters = {
     "dataLakeSchemaName" : "teddy_bear_drop_foot",
     "dataLakeSchemaDescription" : "Data for the Teddy Bear Drop Foot Clinical Trial.",
     "dataLakeVolumeName" : "weekly_measurements",
     "dataLakeVolumeDescription" : "Weekly patient measurements",
     "dataLakeVolumeDirectoryPathName" : dataLakeDirectoryPathName
}

egeria_client.initiate_gov_action_type(setUpDataLakeProcessName, None, action_targets, None, requestParameters, None)

----

This process creates a description in Egeria of the required volume.  When the Integration Connectors next refresh, the volume is pushed into Unity Catalog (UC).
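
Because the volume only reaches Unity Catalog at the next refresh, a script that depends on it may need to wait. The following is a minimal sketch of a generic polling loop. The `wait_until` helper and the `volume_is_visible` stand-in are hypothetical; a real check would ask Unity Catalog or Egeria whether the `weekly_measurements` volume exists yet, which is not shown here:

```python
import time


def wait_until(check, timeout_seconds=60.0, poll_interval_seconds=1.0):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_seconds)


# Stand-in for a real check: it reports success on the third call.
attempts = {"count": 0}


def volume_is_visible():
    attempts["count"] += 1
    return attempts["count"] >= 3


found = wait_until(volume_is_visible, timeout_seconds=5.0, poll_interval_seconds=0.01)
print(found, attempts["count"])
```

A suitable timeout depends on how often the integration connector is configured to refresh, which this notebook does not state.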

## Creating the onboarding pipelines

Once the volume is in place, the next step is to create the pipelines for the three hospitals.

----

In [None]:

onboardHospitalName = "ClinicalTrials:PROJ-CT-TBDF:nominate-hospital"
newFileProcessName="Coco:GovernanceActionProcess:ClinicalTrials:WeeklyMeasurements:Onboarding"
genericOnboardingProcessGUID="508d3878-8eae-47e5-8507-ee936f33b418"

oakDeneHospitalGUID="7905f803-7b7e-47c4-8b35-d0a0cfa47469"
oakDeneContactPerson="80bf48b0-5ef2-4294-950d-0c6fd568a1b2"
oldMarketHospitalGUID="fe8f4065-6664-4739-9438-3330909e6b98"
oldMarketContactPerson="fabc88d6-d28e-4e2d-9086-6affc8c45a7a"
hamptonHospitalGUID="c596f5c4-0aee-4fdc-969b-69fa26b72529"
hamptonContactPerson="e2bcf56b-f822-47d8-82fa-94cd2a5a772c"

landingAreaRootDirectoryName="landing-area"
oakDeneLandingAreaDirectoryName="landing-area/hospitals/oak-dene/clinical-trials/drop-foot"
oldMarketLandingAreaDirectoryName="landing-area/hospitals/old-market/clinical-trials/drop-foot"
hamptonLandingAreaDirectoryName="landing-area/hospitals/hampton/clinical-trials/drop-foot"
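
The three landing area directory names follow one pattern. As a minimal sketch, a hypothetical helper `landing_area_directory` (not part of the demo) derives them from a short hospital name, which avoids typing each path by hand:

```python
from pathlib import PurePosixPath

LANDING_AREA_ROOT = "landing-area"


def landing_area_directory(hospital_slug, trial_slug="drop-foot"):
    """Return the landing area directory for one hospital and trial."""
    path = PurePosixPath(
        LANDING_AREA_ROOT, "hospitals", hospital_slug, "clinical-trials", trial_slug
    )
    return str(path)


landing_area_directories = {
    slug: landing_area_directory(slug)
    for slug in ("oak-dene", "old-market", "hampton")
}

for slug, directory in landing_area_directories.items():
    print(slug, directory)
```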



In [None]:
from datetime import datetime


actionTargets = [{
      "class" : "NewActionTarget",
      "actionTargetName": "hospital",
      "actionTargetGUID": oakDeneHospitalGUID
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "hospitalContactPerson",
      "actionTargetGUID": oakDeneContactPerson
    }]


egeria_client.initiate_gov_action_process(onboardHospitalName, None, actionTargets, datetime.now(), None, None, None, "view-server")

In [None]:
actionTargets = [{
      "class" : "NewActionTarget",
      "actionTargetName": "hospital",
      "actionTargetGUID": oldMarketHospitalGUID
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "hospitalContactPerson",
      "actionTargetGUID": oldMarketContactPerson
    }]


egeria_client.initiate_gov_action_process(onboardHospitalName, None, actionTargets, datetime.now(), None, None, None, "view-server")

In [None]:
actionTargets = [{
      "class" : "NewActionTarget",
      "actionTargetName": "hospital",
      "actionTargetGUID": hamptonHospitalGUID
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "hospitalContactPerson",
      "actionTargetGUID": hamptonContactPerson
    }]


egeria_client.initiate_gov_action_process(onboardHospitalName, None, actionTargets, datetime.now(), None, None, None, "view-server")
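
The three cells above differ only in the hospital and contact person GUIDs. A minimal sketch, using the hypothetical helpers `new_action_target` and `hospital_action_targets` (neither is part of pyegeria), builds the same action target lists from one table of GUIDs:

```python
def new_action_target(name, guid):
    """Return one action target dict in the shape used by the cells above."""
    return {
        "class": "NewActionTarget",
        "actionTargetName": name,
        "actionTargetGUID": guid,
    }


def hospital_action_targets(hospital_guid, contact_person_guid):
    """Build the two action targets passed for each hospital."""
    return [
        new_action_target("hospital", hospital_guid),
        new_action_target("hospitalContactPerson", contact_person_guid),
    ]


# (hospital GUID, contact person GUID) pairs copied from the cell that
# defines oakDeneHospitalGUID, oldMarketHospitalGUID and hamptonHospitalGUID.
hospitals = {
    "Oak Dene": ("7905f803-7b7e-47c4-8b35-d0a0cfa47469",
                 "80bf48b0-5ef2-4294-950d-0c6fd568a1b2"),
    "Old Market": ("fe8f4065-6664-4739-9438-3330909e6b98",
                   "fabc88d6-d28e-4e2d-9086-6affc8c45a7a"),
    "Hampton": ("c596f5c4-0aee-4fdc-969b-69fa26b72529",
                "e2bcf56b-f822-47d8-82fa-94cd2a5a772c"),
}

targets_by_hospital = {
    name: hospital_action_targets(hospital_guid, contact_guid)
    for name, (hospital_guid, contact_guid) in hospitals.items()
}

for name, targets in targets_by_hospital.items():
    print(name, [target["actionTargetName"] for target in targets])
```

Each list in `targets_by_hospital` could be passed in place of `actionTargets` in the `initiate_gov_action_process` calls above. The same `new_action_target` helper would also shorten the longer action target lists used elsewhere in this notebook.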

In [None]:
actionTargets2 = [
    {
      "class" : "NewActionTarget",
      "actionTargetName": "clinicalTrialProject",
      "actionTargetGUID": projectGUID
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "hospital",
      "actionTargetGUID": hamptonHospitalGUID
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "landingAreaConnector",
      "actionTargetGUID": "1b98cdac-dd0a-4621-93db-99ef5a1098bc"
    },
    {
      "class" : "NewActionTarget",
      "actionTargetName": "hospitalContactPerson",
      "actionTargetGUID": hamptonContactPerson
    }]

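# Template GUIDs, copied from the request parameters passed to set-up-clinical-trial above
landingAreaFolderTemplateGUID = "fbdd8efd-1b69-474c-bb6d-0a304b394146"
landingAreaTemplateGUID = "5e5ffc97-237d-46c6-95c3-49405035dedc"
dataLakeTemplateGUID = "b2ec7c9d-3462-488a-897d-8e873658dded"

requestParameters2 = {
       "landingAreaDirectoryTemplateGUID" : landingAreaFolderTemplateGUID,
       "landingAreaDirectoryPathName" : hamptonLandingAreaDirectoryName,
       "landingAreaFileTemplateGUID" : landingAreaTemplateGUID,
       "dataLakeFileTemplateGUID" : dataLakeTemplateGUID,
       "destinationDirectory" : dataLakeDirectoryPathName,
       "newFileProcessName" : newFileProcessName
    }

# Pass the request parameters built above along with the action targets
egeria_client.initiate_gov_action_type(onboardHospitalName, None, actionTargets2, None, requestParameters2, None)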