# Interact with the Purview API using Python
In this demo, you will explore how to interact with Azure Purview's Atlas API by creating custom assets and custom lineage between them. 
You can find a [tutorial on Microsoft Docs](https://docs.microsoft.com/en-us/azure/purview/tutorial-using-rest-apis) that covers calling the REST API directly. This notebook simplifies the process by using the [PyApacheAtlas](https://github.com/wjohnson/pyapacheatlas) Python package.
![PyApacheAtlas](./img/pyapacheatlas.png)

In [None]:
# First, start by installing PyApacheAtlas
!pip install pyapacheatlas

## Authenticate with Purview
To authenticate with Purview, you can use a service principal that has been granted access to your Purview instance. If you don't have a service principal set up yet, follow the steps in the section [Configure your catalog to trust the service principal](https://docs.microsoft.com/en-us/azure/purview/tutorial-using-rest-apis#configure-your-catalog-to-trust-the-service-principal-application) of the REST API tutorial.

Before running the next cell, make sure to add your `tenant_id`, `client_id`, `client_secret` and `account_name`.
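If you would rather not paste these values into the notebook, one option is to read them from environment variables. The following is a minimal sketch, not part of PyApacheAtlas: the function name `load_purview_settings` and the `PURVIEW_*` variable names are assumptions chosen for illustration.

```python
import os

# Hypothetical environment variable names; rename them to fit your own setup.
ENV_VARS = {
    "tenant_id": "PURVIEW_TENANT_ID",
    "client_id": "PURVIEW_CLIENT_ID",
    "client_secret": "PURVIEW_CLIENT_SECRET",
    "account_name": "PURVIEW_ACCOUNT_NAME",
}


def load_purview_settings(environ=os.environ):
    """Return the four settings as a dict, failing early if any is missing or empty."""
    missing = [var for var in ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in ENV_VARS.items()}
```

With the variables set, `settings = load_purview_settings()` gives you `settings["tenant_id"]`, `settings["client_id"]`, `settings["client_secret"]` and `settings["account_name"]` to pass into the next cell in place of the empty strings.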

In [None]:
from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import PurviewClient

# Create authentication
# The parameters can be found in the 'Overview' and 'Certificates & secrets' pages of your service principal
auth = ServicePrincipalAuthentication(
    tenant_id = '', # Add directory (tenant) ID of your service principal (Overview page)
    client_id = '', # Add application (client) ID of your service principal (Overview page)
    client_secret = '' # Add client secret of your service principal (create on Certificates & secrets page)
)

# Note: In practice, store the secret in Azure Key Vault. It is hardcoded here for demo purposes only!

# Create a client to connect to your service
client = PurviewClient(
    account_name = '', # Add name of your purview resource (see Properties page)
    authentication = auth
)

## Creating our first asset
Files, SQL tables and partitioned datasets are all referred to within Purview as assets. Note that Purview stores the metadata of these assets, not the actual data they contain. Assets are usually created when Purview scans a resource, but you can also create them through the API.
With PyApacheAtlas you create an asset by first creating an `AtlasEntity` and then uploading it to Purview. When you create the entity, you specify its name, the type of the asset and its [qualified name](https://en.wikipedia.org/wiki/Fully_qualified_name). You can create your own type definition or select one of the built-in types, such as `azure_blob_object` or `azure_cosmosdb_database`. For now, you can just use the generic `DataSet` type definition.
In addition, you must provide a placeholder [GUID](https://en.wikipedia.org/wiki/Universally_unique_identifier), which can be any negative value. When you upload the entity, it will be assigned a valid GUID.
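Because every entity in this notebook needs its own negative placeholder, it can help to generate the values instead of typing them by hand. This is a minimal sketch using only the standard library; the helper name `next_placeholder_guid` is an assumption for illustration, and the starting value of -1000 simply mirrors the values used in this notebook.

```python
import itertools

# Count downward from -1000 so every placeholder is negative and unique.
_placeholder_guids = itertools.count(start=-1000, step=-1)


def next_placeholder_guid():
    """Return the next unused negative placeholder GUID."""
    return next(_placeholder_guids)


first = next_placeholder_guid()
second = next_placeholder_guid()
print(first, second)  # prints: -1000 -1001
```

You could then pass `guid = next_placeholder_guid()` when creating an entity, which avoids accidentally reusing a placeholder within one upload.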

In [None]:
from pyapacheatlas.core import AtlasEntity
import json

# Create a new entity
first_asset = AtlasEntity(
    name = "MyFirstCustomAsset", 
    typeName = "DataSet", 
    qualified_name = "demo://MyFirstCustomAsset",
    guid = -1000
)

# Upload that entity with the client
upload_results = client.upload_entities(first_asset)
print(json.dumps(upload_results, indent=2))

Check that your asset has been created: open Purview Studio, select **Browse assets**, and then select **Atlas Core** at the bottom. If everything worked, your asset will appear here.
![Our first custom asset in the Purview Studio](./img/check_first_asset.png)

## Creating custom data lineage
Purview can keep track of data lineage between assets, and you can create your own data lineage as well. To do so, first create two new assets. Then, connect them by creating an `AtlasProcess` and specifying your assets as its `inputs` and `outputs` respectively.

In [None]:
# Create two new entities
input01 = AtlasEntity(
    name = "Input01", 
    typeName = "DataSet", 
    qualified_name = "demo://input01",
    guid = -1001
)

output01 = AtlasEntity(
    name = "Output01", 
    typeName = "DataSet", 
    qualified_name = "demo://output01",
    guid = -1002
)

# Upload both entities with the client
upload_results = client.upload_entities([input01, output01])
print(json.dumps(upload_results, indent=2))

# Extract the assigned GUIDs from the upload results to be able to refer to our created assets
input01_guid = upload_results['guidAssignments']['-1001']
output01_guid = upload_results['guidAssignments']['-1002']
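The lookup in the cell above relies on `guidAssignments` being keyed by the placeholder GUID as a string. A small helper can make that conversion explicit and give a clearer error when a placeholder is missing. This is a sketch under that assumption; `assigned_guid` is a hypothetical name, and the sample response contains invented values in the shape the cell above depends on.

```python
def assigned_guid(upload_results, placeholder):
    """Look up the GUID that was assigned to a placeholder GUID."""
    assignments = upload_results.get("guidAssignments", {})
    key = str(placeholder)  # the cell above indexes with string keys such as '-1001'
    if key not in assignments:
        raise KeyError(f"No GUID was assigned for placeholder {key}")
    return assignments[key]


# Made-up response for illustration; the assigned GUID values are invented.
sample_results = {
    "guidAssignments": {
        "-1001": "guid-for-input01",
        "-1002": "guid-for-output01",
    }
}
print(assigned_guid(sample_results, -1001))  # prints: guid-for-input01
```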

In [None]:
from pyapacheatlas.core import AtlasProcess

# Create custom lineage process
# In addition to the parameters we use when creating an AtlasEntity, we also specify the inputs and outputs
process01 = AtlasProcess(
    name = 'Process01',
    typeName = 'Process',
    qualified_name = 'demo://process01',
    inputs = [{'guid': input01_guid}],
    outputs = [{'guid': output01_guid}],
    guid = -1003
)

# Upload process with the client
upload_results = client.upload_entities(process01)
print(json.dumps(upload_results, indent=2))

# Extract the assigned GUID from the upload result to be able to refer to our created process
process01_guid = upload_results['guidAssignments']['-1003']

Now, let's check your lineage in the Purview Studio. Refresh your assets page until you see **Process01**. After selecting the process, select the **Lineage** tab. You should now see a graph of your custom lineage.
![Custom lineage graph](./img/custom_lineage.png)
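The graph in the Lineage tab can be thought of as a set of directed edges: each input points to the process, and the process points to each output. The sketch below derives those edges from a plain dictionary. It assumes a process entity with a top-level `guid` and with `inputs` and `outputs` lists of `{'guid': ...}` references under `attributes`; `lineage_edges` is a hypothetical helper and the sample GUIDs are invented.

```python
def lineage_edges(process):
    """Yield (source_guid, target_guid) pairs for one process entity."""
    attributes = process.get("attributes", {})
    process_guid = process["guid"]
    # Each input flows into the process.
    for ref in attributes.get("inputs") or []:
        yield (ref["guid"], process_guid)
    # The process flows into each output.
    for ref in attributes.get("outputs") or []:
        yield (process_guid, ref["guid"])


# Invented sample data in the assumed shape.
sample_process = {
    "guid": "process01-guid",
    "attributes": {
        "inputs": [{"guid": "input01-guid"}],
        "outputs": [{"guid": "output01-guid"}],
    },
}
print(list(lineage_edges(sample_process)))
# prints: [('input01-guid', 'process01-guid'), ('process01-guid', 'output01-guid')]
```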

## Update the data lineage
When creating data lineage, you are not limited to just one input and output each. Next, update the data lineage and add another input.

In [None]:
# Create a new entity
input02 = AtlasEntity(
    name = 'Input02', 
    typeName = 'DataSet', 
    qualified_name = 'demo://input02',
    guid = -1004
)

# Create a placeholder process to carry the update; it reuses the name, type and qualified name of Process01
process01_update = AtlasProcess(
    name = 'Process01',
    typeName = 'Process',
    qualified_name = 'demo://process01',
    inputs = None,  # We will update this with .inputs below
    outputs = None, # Set to None so no update will occur
    guid = -1005
)

# Get the lineage process to update
process01 = client.get_entity(
    typeName="Process",
    guid=process01_guid
)["entities"][0]

In [None]:
# Get the list of existing inputs from the attributes.
existing_inputs = process01["attributes"]["inputs"]

# Add the new input to the process
process01_update.inputs = existing_inputs + [input02]

# Upload the new input and the updated process
upload_results = client.upload_entities([process01_update, input02])
print(json.dumps(upload_results, indent=2))

If you refresh the lineage of **Process01** in the Purview Studio, you should now see that the lineage includes two inputs.
![Updated lineage](./img/updated_lineage.png)
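If you rerun an update like this, the same input could end up in the list twice. One way to guard against that, when working with references to entities that already exist, is to merge the lists by GUID. This is a minimal sketch that assumes both lists contain plain `{'guid': ...}` dictionaries; `merge_references` is a hypothetical helper, not a PyApacheAtlas function.

```python
def merge_references(existing, new):
    """Combine two lists of {'guid': ...} references, dropping duplicate GUIDs."""
    merged, seen = [], set()
    # Treat None as an empty list so either argument can be omitted.
    for ref in list(existing or []) + list(new or []):
        if ref["guid"] not in seen:
            seen.add(ref["guid"])
            merged.append(ref)
    return merged


# Invented GUIDs for illustration.
existing_refs = [{"guid": "input01-guid"}]
new_refs = [{"guid": "input01-guid"}, {"guid": "input02-guid"}]
print(merge_references(existing_refs, new_refs))
# prints: [{'guid': 'input01-guid'}, {'guid': 'input02-guid'}]
```

The order of first appearance is preserved, so existing inputs stay ahead of newly added ones.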

## Deleting assets and processes
Now, clean up all of the custom assets you created. You can delete individual assets by passing their GUID to `client.delete_entity`. However, if you don't want to look for each asset's GUID, you can also look the assets up by their qualified name and then delete the entities that are returned.
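The lookup-then-delete pattern boils down to collecting a `guid` from each entity in the lookup responses. The sketch below does that on plain dictionaries and tolerates a response with no `entities` key, which is an assumption about what an empty lookup might look like. `collect_guids` is a hypothetical helper and the sample responses are invented.

```python
def collect_guids(*responses):
    """Gather the GUID of every entity found in the given lookup responses."""
    guids = []
    for response in responses:
        # Guard against a missing or None 'entities' value.
        for entity in (response or {}).get("entities") or []:
            guids.append(entity["guid"])
    return guids


# Invented sample responses in the assumed shape.
asset_response = {"entities": [{"guid": "asset-1"}, {"guid": "asset-2"}]}
process_response = {"entities": [{"guid": "process-1"}]}
empty_response = {}
print(collect_guids(asset_response, process_response, empty_response))
# prints: ['asset-1', 'asset-2', 'process-1']
```

Each GUID in the returned list could then be passed to `client.delete_entity(guid=...)`, as in the next cell.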

In [None]:
# Query for your custom assets by qualified name
assets = client.get_entity(
    qualifiedName=['demo://MyFirstCustomAsset', 'demo://input01', 'demo://input02', 'demo://output01'],
    typeName='DataSet'
).get('entities')

# Query for your lineage process
processes = client.get_entity(
    qualifiedName=['demo://process01'],
    typeName='Process'
).get('entities')

# Iterate over each entity in both the assets and processes and delete each
for entity in assets + processes:
    guid = entity['guid']
    delete_response = client.delete_entity(guid=guid)
    print(json.dumps(delete_response, indent=2))

Congratulations, you have now created your first custom assets and lineage in Azure Purview! You can find more code samples on the [PyApacheAtlas GitHub](https://github.com/wjohnson/pyapacheatlas/tree/master/samples).