# Create Entity Matching pipelines with SDK-experimental demo

This notebook shows how to create entity matching pipelines. 

## Accessing CDF
This tutorial assumes you have some basic knowledge of CDF and the Python SDK. If not, please follow the 'lab' tutorials first.

For this tutorial the 'contextualization' tenant is used. 

## Import modules
We need to import some Python modules to interact with CDF. We will use the Python SDK with experimental extensions; the object it provides is referred to below as the client.

In [None]:
from cognite.experimental import CogniteClient
from cognite.experimental.data_classes import EntityMatchingPipeline
from datetime import date
from getpass import getpass
import os
import requests

## Create a client

When you create the CogniteClient below, getpass will prompt for your API key in a password field. Paste your API key for the project and press Shift+Enter.

In [None]:
project = "contextualization"
api_key = getpass("Please enter API-KEY: ")
client = CogniteClient(project=project, api_key=api_key, client_name="dshub")
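If you rerun the notebook often, typing the key every time becomes tedious. A minimal sketch of one alternative: read the key from an environment variable and fall back to the getpass prompt only when it is unset. The helper name `resolve_api_key` and the variable name `COGNITE_API_KEY` are assumptions made for this illustration, not part of the tutorial.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass, var_name="COGNITE_API_KEY"):
    """Return the API key from the environment, or prompt for it if unset.

    `var_name` is a name chosen for this sketch; use whatever variable
    you have actually exported in your shell.
    """
    key = env.get(var_name)
    if key:
        return key
    # Fall back to an interactive prompt, as in the cell above.
    return prompt("Please enter API-KEY: ")
```

The result can be passed to the client in place of the prompted value, for example `api_key = resolve_api_key()`.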

## Create an entity matching pipeline

First define the pipeline

In [None]:
my_external_id = f"my_ds_hub_pipeline_{date.today()}"

my_pipeline = EntityMatchingPipeline(
    name="my_ds_hub_pipeline",
    description="Test pipeline created using dshub tutorial",
    external_id=my_external_id,
    # Sources: time series in one data set
    sources={"dataSetIds": [{"id": 4677214669402260}], "resource": "timeseries"},
    # Targets: assets in another data set
    targets={"dataSetIds": [{"id": 1181171615083226}], "resource": "assets"},
    # Compare the name fields of sources and targets using bigram features
    model_parameters={
        "featureType": "bigram",
        "matchFields": [{"source": "name", "target": "name"}],
    },
)
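The `sources` and `targets` arguments share the same shape: a list of data set ids plus a resource type. As a sketch, a small helper can build that dictionary so the two specifications stay consistent. The helper name `dataset_selection` is hypothetical; the keys and values are copied from the cell above.

```python
def dataset_selection(dataset_id, resource):
    """Build a sources/targets specification in the shape used above."""
    return {"dataSetIds": [{"id": dataset_id}], "resource": resource}


# Same values as in the pipeline definition above.
sources = dataset_selection(4677214669402260, "timeseries")
targets = dataset_selection(1181171615083226, "assets")
```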

Then create the pipeline

In [None]:
res = client.entity_matching.pipelines.create(pipeline=my_pipeline)
my_pipeline_id = res.dump()["id"]

Retrieve the pipeline

In [None]:
my_pipeline = client.entity_matching.pipelines.retrieve(id=my_pipeline_id)
my_pipeline

List all pipelines and find the one you created by filtering on the external_id

In [None]:
pipeline_list = client.entity_matching.pipelines.list(limit=None)

In [None]:
# Keep pipelines whose external_id equals ours; .get() skips pipelines without one
my_pipeline_id = [pipeline["id"] for pipeline in pipeline_list.dump()
                  if pipeline.get("external_id") == my_external_id][0]
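The list-comprehension lookup above raises an `IndexError` when no pipeline matches. A minimal sketch of a more forgiving variant returns `None` instead. It assumes each item of `pipeline_list.dump()` is a plain dictionary with an `id` key and an optional `external_id` key, as the cell above implies; the helper name and the sample data are made up for illustration.

```python
def find_pipeline_id(pipelines, external_id):
    """Return the id of the first pipeline with this external_id, else None."""
    return next(
        (p["id"] for p in pipelines if p.get("external_id") == external_id),
        None,
    )


# Made-up sample in the shape assumed for pipeline_list.dump()
sample_pipelines = [
    {"id": 1, "name": "pipeline without an external id"},
    {"id": 2, "external_id": "my_ds_hub_pipeline_2021-01-01"},
]
```

With the real data this would be called as `find_pipeline_id(pipeline_list.dump(), my_external_id)`.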

## Run the pipeline

In [None]:
res = client.entity_matching.pipelines.run(external_id=my_external_id)

### Look at the results

List all runs for the pipeline

In [None]:
client.entity_matching.pipelines.runs.list(id=my_pipeline_id)

Retrieve only the latest run

In [None]:
last_run = client.entity_matching.pipelines.runs.retrieve_latest(id=my_pipeline_id)
last_run_id = last_run.dump()['job_id']
last_run

In [None]:
# The run results cannot be retrieved with the SDK, so we call the API directly
headers = {
    'Content-Type': 'application/json', 'API-key': api_key
}

In [None]:
url = f"https://api.cognitedata.com/api/playground/projects/{project}/context/entitymatching/pipelines/run/{last_run_id}"

response_get_pipeline_run = requests.get(url=url, headers=headers)
run_results = response_get_pipeline_run.json()
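The request above does not check the HTTP status, so a failed call (for example, an invalid key) would only surface later as a missing `"matches"` key. Since the same request is repeated further down in the notebook, it can be wrapped in two small helpers, sketched below. The function names are hypothetical; the URL pattern is copied from the cell above, and `http_get` is expected to be `requests.get` or anything with the same call shape.

```python
BASE_URL = "https://api.cognitedata.com/api/playground/projects"


def pipeline_run_url(project, job_id):
    """Build the run-results URL used in the cell above."""
    return f"{BASE_URL}/{project}/context/entitymatching/pipelines/run/{job_id}"


def fetch_run_results(url, headers, http_get):
    """Fetch run results and decode the JSON body.

    Raises on HTTP error statuses via the response's raise_for_status()
    instead of silently returning an error payload.
    """
    response = http_get(url=url, headers=headers)
    response.raise_for_status()
    return response.json()
```

In the notebook this would be used as `run_results = fetch_run_results(pipeline_run_url(project, last_run_id), headers, requests.get)`.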

Look at the results

In [None]:
run_results["matches"]

In [None]:
run_results["generatedRules"]

## Update the pipeline with matches and rules

We looked through the first five matches and can confirm that they are correct, so we want to update the pipeline with these as confirmed matches.

In [None]:
confirmed_matches = [{"sourceId": match["source"]["id"],
                      "targetId": match["target"]["id"]} for match in run_results["matches"][0:5]]
update_request_body = {
  "items": [
    {
      "update": {
        "confirmedMatches": {
            "set": confirmed_matches
        }
      },
      "id": my_pipeline_id
    }
  ]
}

In [None]:
url_update = f"https://api.cognitedata.com/api/playground/projects/{project}/context/entitymatching/pipelines/update"
response_update = requests.post(url=url_update, headers=headers, json=update_request_body)
response_update.json()

In [None]:
# We also want to confirm the first two of the generated rules

In [None]:
confirmed_rules = [{"extractors":rule['extractors'], 
                    "conditions":rule['conditions'],
                    "priority":rule['priority']} for rule in run_results["generatedRules"][0:2]]
update_request_body = {
  "items": [
    {
      "update": {
        "rules": {
            "set": confirmed_rules
        }
      },
      "id": my_pipeline_id
    }
  ]
}

In [None]:
response_update = requests.post(url=url_update, headers=headers, json=update_request_body)
response_update.json()
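The two update requests above differ only in the field being set (`confirmedMatches` or `rules`) and the values assigned to it. As a sketch, one helper can build either body; the name `build_update_body` is hypothetical, and the structure is copied from the request bodies above.

```python
def build_update_body(pipeline_id, field, values):
    """Build an update request body that sets one field on one pipeline."""
    return {
        "items": [
            {
                "update": {field: {"set": values}},
                "id": pipeline_id,
            }
        ]
    }


# Made-up ids, only to show the resulting shape.
example_body = build_update_body(
    pipeline_id=42,
    field="confirmedMatches",
    values=[{"sourceId": 1, "targetId": 2}],
)
```

With the notebook's variables, the calls would be `build_update_body(my_pipeline_id, "confirmedMatches", confirmed_matches)` and `build_update_body(my_pipeline_id, "rules", confirmed_rules)`.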

## Run the pipeline again and look at the new results

In [None]:
res = client.entity_matching.pipelines.run(external_id=my_external_id)

In [None]:
pipeline_run_id = res.dump()["job_id"]

In [None]:
url = f"https://api.cognitedata.com/api/playground/projects/{project}/context/entitymatching/pipelines/run/{pipeline_run_id}"

response_get_pipeline_run = requests.get(url=url, headers=headers)
run_results = response_get_pipeline_run.json()

In [None]:
# The matches we confirmed above now have 'matchType': 'previously-confirmed' and score = 1.
# Matches created by one of the rules we confirmed have 'matchType': 'match-rules X' and score = 1.
run_results["matches"]
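Scanning the full list of matches by eye gets unwieldy. A minimal sketch for summarizing the second run is to count matches per `matchType`. It assumes each match is a dictionary with a `matchType` key, as the comments above indicate. The sample data is made up, and the `"model"` value in particular is a placeholder rather than a value documented here.

```python
from collections import Counter


def count_match_types(matches):
    """Count matches per matchType; matches lacking the key count as 'unknown'."""
    return Counter(match.get("matchType", "unknown") for match in matches)


# Made-up sample; "model" is a placeholder matchType for illustration only.
sample_matches = [
    {"matchType": "previously-confirmed", "score": 1},
    {"matchType": "previously-confirmed", "score": 1},
    {"matchType": "match-rules 1", "score": 1},
    {"matchType": "model", "score": 0.8},
]
sample_counts = count_match_types(sample_matches)
```

On the real results this would be `count_match_types(run_results["matches"])`.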

## Delete a pipeline

In [None]:
client.entity_matching.pipelines.delete(external_id=my_external_id)