![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/master/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria Hands-On Lab
# Welcome to the Simple Cohort Demo Lab

## Introduction

Egeria is an open source project that provides open standards and implementation libraries to connect tools, catalogs and platforms together so they can share information (called metadata) about data and the technology that supports it.

In this hands-on lab you will set up four open metadata servers and connect them together via an [Open Metadata Repository Cohort](https://egeria-project.org/features/cohort-operation/overview/) (or cohort for short).  You will then issue different types of requests to retrieve and update metadata.  The aim is to demonstrate how the cohort works and explore the business value of this capability.

The Egeria team use the personas and scenarios from the fictitious company called [Coco Pharmaceuticals](https://egeria-project.org/practices/coco-pharmaceuticals/).
As part of the huge business transformation that Coco Pharmaceuticals has embarked on, they
have decided to use Egeria to manage their metadata across the enterprise.


## The Scenario

[Polly Tasker](https://egeria-project.org/practices/coco-pharmaceuticals/personas/polly-tasker/),
the lead of IT development, asked [Peter Profile](https://egeria-project.org/practices/coco-pharmaceuticals/personas/peter-profile/) and [Erin Overview](https://egeria-project.org/practices/coco-pharmaceuticals/personas/erin-overview/) from the governance team to give a series of talks and demos about Egeria to her team, who will be building integration code to connect various tools and applications into Egeria.


<figure style="margin-left: 7%; display:inline-block;">  
  <img src="https://raw.githubusercontent.com/odpi/egeria-docs/main/site/docs/practices/coco-pharmaceuticals/personas/polly-tasker.png">
  <figcaption style="margin-left: 7%;"><strong>Polly Tasker</strong></figcaption>
</figure>

<figure style="margin-left: 7%; display:inline-block;">  
  <img src="https://raw.githubusercontent.com/odpi/egeria-docs/main/site/docs/practices/coco-pharmaceuticals/personas/peter-profile.png">
  <figcaption style="margin-left: 7%;"><strong>Peter Profile</strong></figcaption>
</figure>

<figure style="margin-left: 7%; display:inline-block;">  
  <img src="https://raw.githubusercontent.com/odpi/egeria-docs/main/site/docs/practices/coco-pharmaceuticals/personas/erin-overview.png">
  <figcaption style="margin-left: 7%;"><strong>Erin Overview</strong></figcaption>
</figure>

Coco Pharmaceuticals has always been pretty open to allowing their teams to choose how they work.  The result is that the different teams have their own tools deployed with little commonality or integration.  There is also variation in the level of governance that is in place.  The teams that report to their regulators have their specific processes, but other teams rarely think about monitoring and improving their processes or how they can work more effectively across the organization.

The proposed shift to personalized medicine requires faster cycle times and greater collaboration between teams.  Egeria's role is to help them in both aspects.  Peter and Erin plan to use this session to show how Egeria can connect different tools together to share information and add governance capabilities over and above the support they provide.

They have been asked to set up the demo on the **Development [OMAG Server Platform](https://egeria-project.org/concepts/omag-server-platform/)** so that the development team can run it themselves at a later time.

----

Peter first checks that the Development OMAG Server Platform is running.

----

In [None]:
%run ../common/common-functions.ipynb

print(" ")
result = checkServerPlatform(devPlatformName, devPlatformURL)
print(" ")



----

If the platform is not running, follow [this link to set up and run the platform](https://egeria-project.org/education/open-metadata-labs/overview/).  Then re-run the previous step to ensure the platform is started.

----

Egeria provides a set of standards for representing the structure, operation, people and assets of an organization.  These standards are called the [Open Metadata Types](https://egeria-project.org/types/).  Egeria then provides technology that:

* Acts as an adapter for a tool, mapping the data held by the tool into the open metadata type format.
* Manages the synchronization of data between the different tools using the open metadata type format.
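
As a minimal sketch of the adapter role in the first bullet, the following Python maps a record from a hypothetical catalog tool into a simplified open-metadata-style element.  The native record layout, the `to_open_metadata` helper and the way the qualified name is built are illustrative assumptions; this is not Egeria's connector API.

```python
# Illustrative only: the native record format and this helper are hypothetical.
def to_open_metadata(native_record, type_name):
    """Map a tool-specific record into a simplified open-metadata-style element."""
    return {
        "type": type_name,
        "properties": {
            # Assumed naming scheme for the unique name of the element.
            "qualifiedName": native_record["system"] + "." + native_record["object_name"],
            "name": native_record["object_name"],
            "description": native_record.get("comment", ""),
        },
    }


native = {"system": "BRANCH", "object_name": "RETAILSCHEMA", "comment": "Retail schema"}
element = to_open_metadata(native, "DeployedDatabaseSchema")
print(element["properties"]["qualifiedName"])
```

Once every tool's content is expressed in the same shape, it can be exchanged between tools without each tool needing to understand the others' native formats.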

<center>
    <img src="../images/egeria-as-tool-integrator.png">
</center>

Peter sets up three [metadata access stores](https://egeria-project.org/concepts/metadata-access-store/) to represent three different types of catalog tools that Coco Pharmaceuticals runs.  Each server has sample metadata pre-loaded through an [open metadata archive backup file](https://egeria-project.org/concepts/open-metadata-archive/) that represents the data in its corresponding catalog tool.  The metadata they are using comes from retail data that they have been using in testing.  There is no real pharmaceutical data used in the demos.

<center>
    <img src="../images/simple-catalog-demo-independent-catalogs.png">
</center>

| Server Name | Description  |
| :----------- | :------------ |
| SimpleAPICatalog | API metadata typically found in an API catalog. |
| SimpleDataCatalog | Data source metadata typically found in a data catalog. |
| SimpleEventCatalog | Event metadata typically found in an event catalog. |

----

In [None]:
import os  # needed for os.environ below; harmless if common-functions has already imported it

simpleCohort = "simpleCohort"

def configureSimpleCohortCatalog(mdrServerName, mdrRepositoryType, metadataCollectionId, metadataCollectionName, archiveFileName):
    eventBusURLroot   = os.environ.get('eventBusURLroot', 'localhost:9092')
    eventBusBody      = {
        "producer": {
             "bootstrap.servers": eventBusURLroot
         },
         "consumer": {
             "bootstrap.servers": eventBusURLroot
         }
    }
    print("Configuring " + mdrServerName + "...")
    configurePlatformURL(devPlatformURL, adminUserId, mdrServerName, devPlatformURL)
    configureMaxPageSize(devPlatformURL, adminUserId, mdrServerName, '600')
    clearServerType(devPlatformURL, adminUserId, mdrServerName)
    configureOwningOrganization(devPlatformURL, adminUserId, mdrServerName, "Coco Pharmaceuticals Dev Systems")
    configureUserId(devPlatformURL, adminUserId, mdrServerName, "simpleMDSnpa")
    configurePassword(devPlatformURL, adminUserId, mdrServerName, "simpleMDSpassw0rd")
    configureEventBus(devPlatformURL, adminUserId, mdrServerName, eventBusBody)
    configureMetadataRepository(devPlatformURL, adminUserId, mdrServerName, mdrRepositoryType)
    configureDescriptiveName(devPlatformURL, adminUserId, mdrServerName, metadataCollectionName)
    configureMetadataCollectionId(devPlatformURL, adminUserId, mdrServerName, metadataCollectionId)
    removeAllStartupArchive(devPlatformURL, adminUserId, mdrServerName)
    addStartupArchive(devPlatformURL, adminUserId, mdrServerName, archiveFileName)
    deleteCohortMembership(devPlatformURL, adminUserId, mdrServerName, simpleCohort)
    configureAccessService(devPlatformURL, adminUserId, mdrServerName, 'data-manager', {})
    configureAccessService(devPlatformURL, adminUserId, mdrServerName, 'asset-owner', {})
    configureAccessService(devPlatformURL, adminUserId, mdrServerName, 'asset-consumer', {})


inMemoryRepositoryOption  = "in-memory-repository"
readOnlyRepositoryOption  = "read-only-repository"

simpleAPICatalog = "SimpleAPICatalog"
simpleDataCatalog = "SimpleDataCatalog"
simpleEventCatalog = "SimpleEventCatalog"

configureSimpleCohortCatalog(simpleAPICatalog, readOnlyRepositoryOption, "9e594f24-2494-4000-ac20-59f374eaa0e6", "Simple API Catalog", "../opt/content-packs/SimpleAPICatalog.omarchive")
configureSimpleCohortCatalog(simpleDataCatalog, readOnlyRepositoryOption, "2216ab62-176a-46c0-b889-9aa081754b54", "Simple Data Catalog", "../opt/content-packs/SimpleDataCatalog.omarchive")
configureSimpleCohortCatalog(simpleEventCatalog, readOnlyRepositoryOption, "e5114849-4341-4eab-b1b7-5a4b037363c4", "Simple Event Catalog", "../opt/content-packs/SimpleEventCatalog.omarchive")

print("\nDone.")


----

Now that the servers are configured, Peter starts them up to check that they are configured correctly ...

----

In [None]:
reActivatePlatform(devPlatformName, devPlatformURL, [simpleAPICatalog, simpleDataCatalog, simpleEventCatalog])

print("\nDone.")


----

Peter checks that the servers are running on the platform ...

----

In [None]:
queryActiveServers(devPlatformName, devPlatformURL)

----

## Reviewing the catalogs

The three catalogs are running independently, each providing their own variety of data to their users.


Erin begins with the data catalog.  She issues a query to list the assets in the data catalog.  It contains a database called *BRANCH*.  This has one database schema called *RETAILSCHEMA* that includes a *CUSTOMER* table.

The properties of the asset are displayed as follows:

<center>
    <img src="../images/simple-catalog-demo-asset-properties.png">
</center>

| Properties | Description  |
| :----------- | :------------ |
| Display Name | A simple name for display. |
| Unique Identifier (GUID) | Globally unique identifier assigned by the repository. |
| Unique Name  | A unique name assigned by the creator of the metadata. Known as the *qualifiedName*.|
| Type  | The open metadata type [DeployedDatabaseSchema](https://egeria-project.org/types/2/0224-Databases/) defines the specialized properties (attributes) that can be added to a metadata element of the type. |
| Super Types | The types that DeployedDatabaseSchema inherits from.  The attributes allowed in a metadata element are accumulated from its type and all of its super types. |
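
The way attributes accumulate through super types can be sketched as follows.  The type table below is a deliberately simplified assumption made for illustration (the attribute lists are not the full Egeria definitions); consult the Open Metadata Types pages for the real ones.

```python
# Simplified, illustrative type table: (super type, attributes defined by the type itself).
# The attribute lists are assumptions for this sketch, not the full Egeria definitions.
TYPES = {
    "Referenceable": (None, ["qualifiedName"]),
    "Asset": ("Referenceable", ["name", "description"]),
    "DataSet": ("Asset", []),
    "DeployedDatabaseSchema": ("DataSet", []),
}


def super_types(type_name):
    """Return the chain of super types, nearest first."""
    chain = []
    parent = TYPES[type_name][0]
    while parent is not None:
        chain.append(parent)
        parent = TYPES[parent][0]
    return chain


def allowed_attributes(type_name):
    """Accumulate attributes from the type and all of its super types."""
    attributes = []
    for name in [type_name] + super_types(type_name):
        attributes.extend(TYPES[name][1])
    return attributes


print(super_types("DeployedDatabaseSchema"))
print(allowed_attributes("DeployedDatabaseSchema"))
```

In this sketch *DeployedDatabaseSchema* defines no attributes of its own, yet an element of that type can still carry the attributes inherited from the types above it.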


----

In [None]:
assetOwnerPrintAssets(simpleDataCatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")


When the detail for one of these assets is retrieved, it can be seen that they are linked together.

This query retrieves details of *RETAILSCHEMA*.  You can see that *BRANCH* is shown as a related asset.  The name of the relationship is [DataContentForDataSet](https://egeria-project.org/types/2/0210-Data-Stores/) which indicates that the data returned for *RETAILSCHEMA* is stored in *BRANCH*.

----

In [None]:
printSelectiveAssetUniverse(simpleDataCatalog, devPlatformName, devPlatformURL, 'data-manager', erinsUserId, "4782e08b-043c-4017-9b2f-d63163f67fd8", False, False)

----

Erin then moves on to the event catalog.  It contains a description of an Apache Kafka Topic that handles events indicating that a customer status has changed.

---

In [None]:
assetOwnerPrintAssets(simpleEventCatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")


Finally, Erin queries the API catalog. It provides information about the APIs that are available for use, along with their schemas.  In this simple demo, there is one *Customer API* defined in the catalog.

----

In [None]:
assetOwnerPrintAssets(simpleAPICatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")


## Connecting the catalogs with a cohort

At this point in the demo, each catalog is isolated.  If someone wanted to find out about all of the customer assets, they would have to query each catalog and aggregate the results.

An [Open Metadata Repository Cohort](https://egeria-project.org/features/cohort-operation/overview/) connects catalogs together and allows them to share metadata.  This means that all of the assets can be retrieved with a single query.

<center>
    <img src="../images/simple-catalog-demo-connected-catalogs.png">
</center>

Peter links the three catalogs together and restarts their servers so that they pick up the new configuration.

----

In [None]:

configureCohortMembership(devPlatformURL, adminUserId, simpleAPICatalog, simpleCohort)
configureCohortMembership(devPlatformURL, adminUserId, simpleEventCatalog, simpleCohort)
configureCohortMembership(devPlatformURL, adminUserId, simpleDataCatalog, simpleCohort)

reActivatePlatform(devPlatformName, devPlatformURL, [simpleAPICatalog, simpleDataCatalog, simpleEventCatalog])

queryActiveServers(devPlatformName, devPlatformURL)

print("\nDone.")


----
Each server is set up to connect to the "simpleCohort" cohort.  When a server connects to a cohort, it broadcasts registration information about itself to the other members.  Provided that the existing members accept it, the server becomes a new member of the cohort.

This is the registration information for *SimpleAPICatalog*.  The [*metadata collection id* (and optional *metadata collection name*)](https://egeria-project.org/concepts/metadata-collection/) are used to identify the origin of data.  The *URL for metadata queries* provides the network address that other members should use when routing requests to it.

----

In [None]:
printLocalRegistration(simpleAPICatalog, devPlatformName, devPlatformURL)

----

Below is the registration information for the other two servers.  Notice that each server has a unique *metadata collection id*.

---

In [None]:
printLocalRegistration(simpleDataCatalog, devPlatformName, devPlatformURL)

In [None]:
printLocalRegistration(simpleEventCatalog, devPlatformName, devPlatformURL)

----

When the servers register with the cohort, they broadcast their registration information.
This information is received by the other servers and stored in their local [cohort registry](https://egeria-project.org/features/cohort-operation/overview/#cohort-registration).  It is possible to query a server's cohort registry.  Below is the information from *SimpleAPICatalog's* cohort registry.
It lists the information from the other cohort members and the time that they first connected.
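
The registration exchange described above can be modelled with a small sketch.  This is a toy model under stated assumptions: the `Member` class, the `register` function and the platform address are hypothetical, registration times are omitted, and the real exchange happens through events rather than direct calls.  The metadata collection ids are the ones configured earlier in this lab.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    server_name: str
    metadata_collection_id: str
    url: str  # address other members use when routing requests to this server
    # Remote registrations seen so far, keyed by metadata collection id.
    registry: dict = field(default_factory=dict)


def register(cohort, new_member):
    """Exchange registration details between the new member and existing members."""
    for member in cohort:
        # Existing members record the newcomer ...
        member.registry[new_member.metadata_collection_id] = (new_member.server_name, new_member.url)
        # ... and the newcomer learns about each existing member.
        new_member.registry[member.metadata_collection_id] = (member.server_name, member.url)
    cohort.append(new_member)


platform_url = "https://dev-platform.example"  # hypothetical address
api = Member("SimpleAPICatalog", "9e594f24-2494-4000-ac20-59f374eaa0e6", platform_url + "/SimpleAPICatalog")
data = Member("SimpleDataCatalog", "2216ab62-176a-46c0-b889-9aa081754b54", platform_url + "/SimpleDataCatalog")
event = Member("SimpleEventCatalog", "e5114849-4341-4eab-b1b7-5a4b037363c4", platform_url + "/SimpleEventCatalog")

cohort = []
for member in (api, data, event):
    register(cohort, member)

for collection_id, (server_name, url) in api.registry.items():
    print(server_name, collection_id, url)
```

Each member ends up holding the details of the other two members, but not its own, which matches the idea of a registry of *remote* registrations.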

----

In [None]:
printRemoteRegistrations(simpleAPICatalog, simpleCohort, devPlatformName, devPlatformURL)

----

With the cohort in place, Erin shows that a query at any server returns all four of the assets.  This is the query for all assets made to the data catalog.

----

In [None]:
assetOwnerPrintAssets(simpleDataCatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")


Erin also shows that it is possible to get the asset detail for the Apache Kafka topic via the data catalog.   Notice the *origin* of the results.  Although *SimpleDataCatalog* was queried, the result came from *SimpleEventCatalog*.

<center>
    <img src="../images/simple-catalog-metadata-origin.png">
</center>

----

In [None]:
printSelectiveAssetUniverse(simpleDataCatalog, devPlatformName, devPlatformURL, 'data-manager', erinsUserId, "baded87e-7fe2-4d50-963c-b87178afc452", True, False)

----

## Linking metadata

For the purposes of this demo, the three catalogs have been configured *read only*, which means any attempt to add
new metadata will fail.  For example, this command attempts to add a comment to *RETAILSCHEMA*.

In [None]:
commentType = "QUESTION"
commentText = "Why isn't the table for the CUSTCARDID column in the catalog?"
isPublic    = True

commentGUID = addCommentToAsset(simpleDataCatalog, devPlatformName, devPlatformURL, erinsUserId, "4782e08b-043c-4017-9b2f-d63163f67fd8", commentText, commentType, isPublic)

print (" ")
if commentGUID:
    print ('Erin\'s comment guid is: ' + commentGUID)

----

This is not an uncommon situation.  Tools often do not support extensions to their repository's data model and so cannot store information from other tools.  When Erin's earlier query to the data catalog returned all of the assets, the data catalog issued the query to each server in the cohort (using the URLs passed in the cohort registration information) and then aggregated the results.  This is called a *federated query*.
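
A federated query can be sketched in a few lines.  In this toy model a dictionary lookup stands in for the network call to each member, and the topic name `CustomerStatusTopic` is a placeholder because the lab text does not give the topic's actual name; the function name is also an assumption.

```python
import re

# Local content of each cohort member.  "CustomerStatusTopic" is a placeholder name.
COHORT = {
    "SimpleAPICatalog": ["Customer API"],
    "SimpleDataCatalog": ["BRANCH", "RETAILSCHEMA"],
    "SimpleEventCatalog": ["CustomerStatusTopic"],
}


def federated_find_assets(cohort, search_pattern):
    """Send the same search to every member and aggregate the matches."""
    results = []
    for server_name, assets in cohort.items():
        for asset_name in assets:
            if re.fullmatch(search_pattern, asset_name):
                # Keep the origin so the caller can see where each result came from.
                results.append((asset_name, server_name))
    return results


all_assets = federated_find_assets(COHORT, ".*")
print(len(all_assets))
for asset_name, origin in all_assets:
    print(asset_name, "from", origin)
```

Because the aggregation only reads from each member, it works even when every member is read only.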

Peter now configures a new catalog called *SimpleGovernanceCatalog* and adds it to the cohort.
This catalog uses the *In Memory* repository which is read/write so new metadata can be added.

<center>
    <img src="../images/simple-catalog-demo.png">
</center>

| Server Name | Description  |
| :----------- | :------------ |
| SimpleAPICatalog | API metadata typically found in an API catalog. |
| SimpleDataCatalog | Data source metadata typically found in a data catalog. |
| SimpleEventCatalog | Event metadata typically found in an event catalog. |
| SimpleGovernanceCatalog | Additional metadata to augment the other 3 catalogs. |

----

In [None]:
simpleGovernanceCatalog = "SimpleGovernanceCatalog"
configureSimpleCohortCatalog(simpleGovernanceCatalog, inMemoryRepositoryOption, "e915f2fa-aaac-4396-8bde-bcd65e642b1d", "Simple Governance Catalog", "../opt/content-packs/SimpleGovernanceCatalog.omarchive")
configureCohortMembership(devPlatformURL, adminUserId, simpleGovernanceCatalog, simpleCohort)

reActivatePlatform(devPlatformName, devPlatformURL, [simpleGovernanceCatalog])

queryActiveServers(devPlatformName, devPlatformURL)

print("\nDone.")

----

This is the registration information for the *SimpleGovernanceCatalog*: 

----

In [None]:
printLocalRegistration(simpleGovernanceCatalog, devPlatformName, devPlatformURL)

----

When Peter queries the cohort registry of the *SimpleDataCatalog*, the new *SimpleGovernanceCatalog* is shown as part of the cohort.

----

In [None]:
printRemoteRegistrations(simpleDataCatalog, simpleCohort, devPlatformName, devPlatformURL)

----

Erin shows that the *SimpleGovernanceCatalog* can retrieve all of the assets.

----

In [None]:
assetOwnerPrintAssets(simpleGovernanceCatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")

----

In addition to queries, the cohort is able to route create, update and delete requests to an appropriate server.  So with the *SimpleGovernanceCatalog* in the cohort, the command to add the comment to *RETAILSCHEMA* succeeds.

Notice that the command is issued to *SimpleDataCatalog*, not to *SimpleGovernanceCatalog*.
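
The routing of a write request can be illustrated with a toy model.  The classes and the fallback strategy below are assumptions made for this sketch; Egeria's actual routing logic may differ.  The point is only that a request made to a read-only member can succeed when another member of the cohort is able to store the new metadata.

```python
class ReadOnlyRepositoryError(Exception):
    """Raised by the toy read-only repository when asked to store something."""


class ToyRepository:
    def __init__(self, server_name, read_only):
        self.server_name = server_name
        self.read_only = read_only
        self.comments = []

    def add_comment(self, asset_guid, text):
        if self.read_only:
            raise ReadOnlyRepositoryError(self.server_name + " is read only")
        self.comments.append((asset_guid, text))
        return self.server_name


def add_comment_via_cohort(called_server, cohort, asset_guid, text):
    """Try the called server first, then fall back to the other cohort members."""
    others = [repository for repository in cohort if repository is not called_server]
    for repository in [called_server] + others:
        try:
            return repository.add_comment(asset_guid, text)
        except ReadOnlyRepositoryError:
            continue
    raise ReadOnlyRepositoryError("no member of the cohort can store the comment")


data = ToyRepository("SimpleDataCatalog", read_only=True)
api = ToyRepository("SimpleAPICatalog", read_only=True)
governance = ToyRepository("SimpleGovernanceCatalog", read_only=False)

stored_in = add_comment_via_cohort(
    data, [data, api, governance],
    "4782e08b-043c-4017-9b2f-d63163f67fd8",
    "Why isn't the table for the CUSTCARDID column in the catalog?")
print(stored_in)
```

With only the read-only members in the list, the same call raises `ReadOnlyRepositoryError`, which mirrors the failed attempt earlier in the lab.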

----

In [None]:
commentGUID = addCommentToAsset(simpleDataCatalog, devPlatformName, devPlatformURL, erinsUserId, "4782e08b-043c-4017-9b2f-d63163f67fd8", commentText, commentType, isPublic)

print (" ")
if commentGUID:
    print ('Erin\'s comment guid is: ' + commentGUID)
    print (" ")


----

Erin then queries *RETAILSCHEMA* and the comment is there.  Notice the origin of the comment shows it is stored in *SimpleGovernanceCatalog*.

<center>
    <img src="../images/simple-catalog-metadata-origin-comment.png">
</center>

----

In [None]:
printSelectiveAssetUniverse(simpleAPICatalog, devPlatformName, devPlatformURL, 'data-manager', erinsUserId, "4782e08b-043c-4017-9b2f-d63163f67fd8", True, False)

----

Erin also points out the appearance of the [*LatestChange* classification](https://egeria-project.org/features/anchor-management/overview/#latestchange-classification) that records the last change to the asset or any of its dependent objects.  This is part of Egeria's governance of the asset data it synchronizes.

The *SimpleGovernanceCatalog* also adds a glossary term called *UniqueCustomerIdentifier* to the cohort.  This glossary term describes the concept of a unique identifier for a customer.

----

In [None]:
meanings = findMeanings(simpleGovernanceCatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")

for meaning in meanings:
    meaningsProperties = meaning.get('meaningProperties')
    elementHeader = meaning.get('elementHeader')
    printName("", meaningsProperties.get('name'), elementHeader.get('guid'))
    propertyIndent = "    "
    printStringProperty(propertyIndent, "qualifiedName", meaningsProperties.get('qualifiedName'))
    printStringProperty(propertyIndent, "description", meaningsProperties.get('description'))
    elementType = elementHeader.get('type')
    printType(propertyIndent, elementType.get('typeName'), elementType.get('superTypeNames'))
    elementOrigin = elementHeader.get('origin')
    sourceServer = elementOrigin.get('sourceServer')
    metadataCollectionType = elementOrigin.get('originCategory')
    metadataCollectionId = elementOrigin.get('homeMetadataCollectionId')
    metadataCollectionName = elementOrigin.get('homeMetadataCollectionName')
    printMetadataCollection(propertyIndent, sourceServer, metadataCollectionType, metadataCollectionId, metadataCollectionName)
    


----

There are also [SemanticAssignment](https://egeria-project.org/types/3/0370-Semantic-Assignment/) relationships from this glossary term to each of the appropriate data fields of the assets.

![Data Map](../images/simple-catalog-demo-data-map.png)

----

The semantic assignment relationships make it possible to query which assets contain one or more data fields that are set to the unique customer identifier.
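
The idea behind this kind of query can be sketched as a lookup over data fields.  The field names, the topic name and the flat list structure below are placeholders chosen for illustration; they are not taken from the sample archives, and the real query navigates relationships in the repositories.

```python
# Hypothetical data fields: (field name, owning asset, assigned glossary term or None).
DATA_FIELDS = [
    ("customerId", "RETAILSCHEMA", "UniqueCustomerIdentifier"),
    ("customerName", "RETAILSCHEMA", None),
    ("customerId", "Customer API", "UniqueCustomerIdentifier"),
    ("customerId", "CustomerStatusTopic", "UniqueCustomerIdentifier"),
]


def assets_by_meaning(data_fields, term_name):
    """Return the assets with at least one field assigned to the glossary term."""
    assets = []
    for field_name, asset_name, assigned_term in data_fields:
        if assigned_term == term_name and asset_name not in assets:
            assets.append(asset_name)
    return assets


print(assets_by_meaning(DATA_FIELDS, "UniqueCustomerIdentifier"))
```

An asset with no data field assigned to the term never appears in the result, regardless of how it is related to the other assets.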

----

In [None]:
assetGUIDs = getAssetsByMeaning(simpleGovernanceCatalog, devPlatformName, devPlatformURL, erinsUserId, "4eecca25-b8b8-4d87-9f0d-31f8255b6c96")

if assetGUIDs:
    for assetGUID in assetGUIDs:
        assetResponse = getAssetUniverse(simpleGovernanceCatalog, devPlatformName, devPlatformURL, "data-manager", erinsUserId, assetGUID)
        if assetResponse:
            asset = assetResponse.get('asset')
            if asset:
                printName("", asset.get('name'), asset.get('guid'))
                propertyIndent = "    "
                printStringProperty(propertyIndent, "qualifiedName", asset.get('qualifiedName'))
                printStringProperty(propertyIndent, "description", asset.get('description'))
                elementType = asset.get('type')
                printType(propertyIndent, elementType.get('typeName'), elementType.get('superTypeNames'))


----

Erin points out that the *BRANCH* asset is not returned because it does not have any links to the glossary term.

## Unregistering from the cohort

The final part of the demo is to show how the cohort behaves when servers unregister.


Peter issues the request to unregister the data catalog from the cohort.

----

In [None]:
unregisterFromCohort(simpleDataCatalog, simpleCohort, devPlatformName, devPlatformURL)

print("\nDone.")

----

Peter queries the *SimpleAPICatalog* to check that the *SimpleDataCatalog* has unregistered.

----

In [None]:
printRemoteRegistrations(simpleAPICatalog, simpleCohort, devPlatformName, devPlatformURL)

----

With only the *SimpleEventCatalog*, *SimpleAPICatalog* and *SimpleGovernanceCatalog* connected, Erin asks her audience "how many assets are available in the cohort?".  Most people thought the answer was two - the API asset and the Topic asset.

Erin issues the search command ...

----

In [None]:
assetOwnerPrintAssets(simpleAPICatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")


There is obvious surprise at the result.  *BRANCH* and *RETAILSCHEMA* are still returned because the *SimpleGovernanceCatalog* is caching metadata from the other catalogs.  This ensures that the metadata remains available, even if the home repository is unavailable.

Erin shows the origin settings of the *SimpleDataCatalog*'s assets.  The provenance type has changed to DEREGISTERED_REPOSITORY to reflect that the originator is no longer a member of the cohort.  This is to warn consumers that the metadata may be out of date since the origin has gone.
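
The effect of unregistering on cached metadata can be modelled as follows.  This is a toy model: the `Instance` class and `unregister` function are hypothetical, and the starting provenance value `LOCAL_COHORT` is an assumption made for the sketch.  Only `DEREGISTERED_REPOSITORY` comes from the behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    home: str             # server whose metadata collection owns the instance
    origin_category: str  # provenance shown to consumers


def unregister(cohort, cache, leaving_server):
    """Remove a member from the cohort and flag cached copies of its metadata."""
    cohort.remove(leaving_server)
    for instance in cache:
        if instance.home == leaving_server:
            instance.origin_category = "DEREGISTERED_REPOSITORY"


cohort = ["SimpleAPICatalog", "SimpleDataCatalog", "SimpleEventCatalog", "SimpleGovernanceCatalog"]

# Copies of the data catalog's assets held by SimpleGovernanceCatalog.
governance_cache = [
    Instance("BRANCH", "SimpleDataCatalog", "LOCAL_COHORT"),        # assumed starting value
    Instance("RETAILSCHEMA", "SimpleDataCatalog", "LOCAL_COHORT"),  # assumed starting value
]

unregister(cohort, governance_cache, "SimpleDataCatalog")

for instance in governance_cache:
    print(instance.name, instance.origin_category)
```

The cached copies are kept rather than deleted, so queries can still return them, but their provenance tells consumers that the home repository has left the cohort.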

----

In [None]:
printSelectiveAssetUniverse(simpleAPICatalog, devPlatformName, devPlatformURL, 'data-manager', erinsUserId, "4782e08b-043c-4017-9b2f-d63163f67fd8", True, False)

----

Next, Peter unregisters the SimpleGovernanceCatalog from the cohort.

----

In [None]:

unregisterFromCohort(simpleGovernanceCatalog, simpleCohort, devPlatformName, devPlatformURL)

print("\nDone.")

----

Now, the other servers no longer see the *SimpleGovernanceCatalog* in the cohort.

----

In [None]:
printRemoteRegistrations(simpleAPICatalog, simpleCohort, devPlatformName, devPlatformURL)

----

Erin asks how many assets will be returned and this time everyone got the answer correct - there are only two because the reference copies stored in the *SimpleGovernanceCatalog* are no longer reachable from the cohort.

----

In [None]:
assetOwnerPrintAssets(simpleAPICatalog, devPlatformName, devPlatformURL, erinsUserId, ".*")

----

## Summary

This demo shows how Egeria is able to combine metadata from multiple tools into a single query as well as augment their content, enabling both collaboration and governance.  With Egeria, organizations are not restricted to a single tool suite.  They can use best-of-breed tools, suited to the work that people do, whilst still collaborating and sharing knowledge.


## Where to next?

* The [Cohort Operation](https://egeria-project.org/features/cohort-operation/overview/) web page describes the inner workings of the cohort.
* The [Representing metadata lab](metadata-representations.ipynb) shows how different types of data are represented in the [Open Metadata Types](https://egeria-project.org/types/).
* The [Governance server operation lab](governance-server-operation.ipynb) shows how [integration connectors](https://egeria-project.org/concepts/integration-connector/) and [governance action services](https://egeria-project.org/concepts/governance-action-service/) run on Egeria's [Governance Servers](https://egeria-project.org/concepts/governance-server/).
* The [Building a data catalog lab](../asset-management-labs/building-a-data-catalog.ipynb) shows more examples of governance metadata that can be added to assets.

----