## Read the Cognite Learn content before running code examples.

## 1. Environment Set Up

### Install the Cognite SDK package

If you receive errors such as:

`ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.`

`ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.`

You can disregard them and do not need to click "Restart Runtime".

In [None]:
!pip install cognite-sdk --quiet


### Install MSAL




In [None]:
!pip install msal --quiet


### Connect to Cognite Data Fusion
Import all the needed libraries and instantiate the Cognite Client.

This client object is how all queries will be sent to the Cognite API to retrieve data.

To obtain a working Cognite Client instance, you need to authenticate with MSAL in your browser. Run the cell below, copy the code from the output, enter it on the [site](https://microsoft.com/devicelogin), and authorize the Learn-participants application. If everything completes successfully, the output shows the token description (the projects and groups the token has access to).

In [None]:
from cognite.client import CogniteClient
from msal import PublicClientApplication
# Contact Project Administrator to get these
TENANT_ID = "48d5043c-cf70-4c49-881c-c638f5796997"
CLIENT_ID = "944fd93f-6705-49fe-8ba4-6f912d63ef5b"
CLIENT_SECRET = "" # Enter secret
CDF_CLUSTER = "westeurope-1" # api, westeurope-1 etc
COGNITE_PROJECT = "learn"
SCOPES = [f'https://{CDF_CLUSTER}.cognitedata.com/.default']
AUTHORITY_HOST_URI = 'https://login.microsoftonline.com'
AUTHORITY_URI = AUTHORITY_HOST_URI + '/' + TENANT_ID
app = PublicClientApplication(client_id=CLIENT_ID, authority=AUTHORITY_URI)

def authenticate_device_code(app):
  # First, check the cache to see if this end user has signed in before
  creds = None
  accounts = app.get_accounts()
  if accounts:
    creds = app.acquire_token_silent(SCOPES, account=accounts[0])
  if not creds:
    # No cached token (or the silent refresh returned nothing): use the device code flow
    device_flow = app.initiate_device_flow(scopes=SCOPES)
    print(device_flow['message']) # print device code to screen
    creds = app.acquire_token_by_device_flow(flow=device_flow)
  return creds

def get_token():
  return authenticate_device_code(app)['access_token']

client = CogniteClient(
  token_url=f'{AUTHORITY_URI}/v2.0',
  token=get_token,
  token_client_id=CLIENT_ID,
  project=COGNITE_PROJECT,
  base_url=f'https://{CDF_CLUSTER}.cognitedata.com',
  client_name='cognite-python-dev',
)

print(client.iam.token.inspect().projects)

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DTSVKXGYJ to authenticate.
[{
    "url_name": "learn-cdf",
    "groups": [
        3815372601471309,
        7438730174258846
    ]
}, {
    "url_name": "learn",
    "groups": [
        3380746474616222,
        3705350280326987
    ]
}, {
    "url_name": "ds-basics",
    "groups": [
        447085316193861
    ]
}, {
    "url_name": "ds-cognitefunctions",
    "groups": []
}]


Set the prefix for resource names.

In [None]:
PREFIX = "" # enter your name and birth year
DATASET_NAME = f"{PREFIX}-ds"
DATABASE_NAME = f"{PREFIX}-db"
ASSET_NAME = f"{PREFIX}-root"
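
Every resource name below is derived from `PREFIX`, so an empty prefix silently produces names such as `-ds`. A minimal sketch of a guard, assuming a hypothetical helper `build_resource_names` (it is not part of the Cognite SDK) and using the prefix that appears in this notebook's outputs:

```python
def build_resource_names(prefix: str) -> dict:
    """Derive the data set, database and root asset names from one prefix.

    Hypothetical helper for illustration; it only formats strings.
    """
    cleaned = prefix.strip()
    if not cleaned:
        # An empty prefix would yield names like "-ds" that clash between participants.
        raise ValueError("PREFIX must not be empty")
    return {
        "dataset": f"{cleaned}-ds",
        "database": f"{cleaned}-db",
        "root_asset": f"{cleaned}-root",
    }


names = build_resource_names("roman1984")  # prefix taken from the outputs in this notebook
print(names["database"])
```

Failing early here is cheaper than discovering a badly named database or data set after the transformations have run.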

### Create credentials for use in Transformations

To run a transformation, we need to set up OIDC credentials for the source and destination projects. These can be different CDF projects, but in our case there is only one.

In [None]:
from cognite.client.data_classes import OidcCredentials

creds = OidcCredentials(
        client_id = CLIENT_ID,
        client_secret = CLIENT_SECRET,
        scopes = " ".join(SCOPES),
        token_uri = f"{AUTHORITY_URI}/oauth2/v2.0/token",
        cdf_project_name = COGNITE_PROJECT
)

## 2. Prepare RAW resources

### Data description

We have three tables with weather data from [The Norwegian Meteorological Institute](https://www.met.no/). You can get an API key on the site and experiment with other kinds of meteorological data.

sources.csv - the list of [sources](https://frost.met.no/api.html#/sources) (Sensor Systems) with geospatial metadata,

elements.csv - the metadata about the weather and climate [elements](https://frost.met.no/api.html#!/elements/getElements) that are defined for use in the API,

observations.csv - the list of [observations](https://frost.met.no/api.html#!/observations/observations) for the particular sources and limited date range.

All the data from [FrostAPI](https://frost.met.no/index.html) were saved in CSV tables and provided for the Transformation via SDK course.

### Prepare database

Create a database for the RAW tables.

In [None]:
client.raw.databases.create(DATABASE_NAME)

Unnamed: 0,value
name,roman1984-db
created_time,1660142953851


List databases and check out our database

In [None]:
databases = client.raw.databases.list(limit=-1)
for db in databases:
  if db.name == DATABASE_NAME:
    print(db)

{
    "name": "roman1984-db",
    "created_time": "2022-08-10 14:49:13"
}


### Prepare tables

Create tables

In [None]:
client.raw.tables.create(DATABASE_NAME, "sources")
client.raw.tables.create(DATABASE_NAME, "elements")
client.raw.tables.create(DATABASE_NAME, "observations")

Unnamed: 0,value
name,observations
created_time,1660142977954


List the tables. We should see our three tables in the database.

In [None]:
client.raw.tables.list(db_name=DATABASE_NAME)

Unnamed: 0,name,created_time
0,sources,1660142977530
1,observations,1660142977954
2,elements,1660142977749


### Import rows

Load the CSV files into pandas DataFrames. Before running the cell, upload the files to the Colab environment.

In [None]:
import pandas as pd
sources_df = pd.read_csv('sources.csv', index_col=0).fillna('')
elements_df = pd.read_csv('elements.csv', index_col=0).fillna('')
observations_df = pd.read_csv('observations.csv', index_col=0).fillna('')

Check out DataFrames. Run the cells below to display the first five rows in each data frame and check which data you have in each table.

In [None]:
sources_df.head()

Unnamed: 0,@type,id,name,shortName,country,countryCode,geometry,masl,validFrom,county,countyId,municipality,municipalityId,stationHolders,externalIds,wigosId,wmoId,icaoCodes,shipCodes
0,SensorSystem,SN47230,ÅKRA UNGDOMSSKOLE,Åkra,Norge,NO,"{'@type': 'Point', 'coordinates': [5.1963, 59....",18.0,2013-10-29T00:00:00.000Z,ROGALAND,11.0,KARMØY,1149.0,['KARMØY KOMMUNE'],['506131077'],0-578-0-47230,,,
1,SensorSystem,SN23670,E16 RYFOSS,E16 Ryfoss,Norge,NO,"{'@type': 'Point', 'coordinates': [8.8175, 61....",406.0,2018-01-23T00:00:00.000Z,INNLANDET,34.0,VANG,3454.0,['STATENS VEGVESEN'],"['1755', '3000021']",0-578-0-23670,,,
2,SensorSystem,SN59450,STADLANDET,Stadlandet,Norge,NO,"{'@type': 'Point', 'coordinates': [5.2115, 62....",75.0,1923-01-01T00:00:00.000Z,VESTLAND,46.0,STAD,4649.0,['MET.NO'],['10.249.1.80'],0-578-0-59450,,,
3,SensorSystem,SN12590,E6 MJØSBRUA,E6 Mjøsbrua,Norge,NO,"{'@type': 'Point', 'coordinates': [10.6725, 60...",128.0,2011-01-01T00:00:00.000Z,INNLANDET,34.0,RINGSAKER,3411.0,['STATENS VEGVESEN'],"['149', '429003']",0-578-0-12590,,,
4,SensorSystem,SN26640,E134 DARBU,E134 Darbu,Norge,NO,"{'@type': 'Point', 'coordinates': [9.7773, 59....",155.0,2016-04-19T00:00:00.000Z,VIKEN,30.0,ØVRE EIKER,3048.0,['STATENS VEGVESEN'],"['1645', '629024']",0-578-0-26640,,,


In [None]:
elements_df.head()

Unnamed: 0,id,name,description,unit,status,calculationMethod,category,oldConvention,cfConvention,timeOffsets,sensorLevels,codeTable
0,accumulated(precipitation_amount),Precipitation in gauge,Total precipitation amount in gauge (accumulat...,mm,CF-name,"{'baseName': 'precipitation_amount', 'method':...",Precipitation,"{'elementCodes': ['RA', 'RACPLUV_ALGOR_1', 'RA...","{'standardName': 'precipitation_amount', 'cell...",,,
1,air_pressure,Air pressure as measured at sensor height,Air pressure as measured at sensor height with...,hPa,CF-name,{'baseName': 'air_pressure'},Air Pressure,"{'elementCodes': ['PA'], 'unit': 'hPa'}","{'standardName': 'air_pressure', 'unit': 'hPa'...",{'values': ['PT0H']},,
2,air_pressure_at_sea_level,Air pressure at sea level,Air pressure reduced to mean sea level. The pa...,hPa,CF-name,{'baseName': 'air_pressure_at_sea_level'},Air pressure,"{'elementCodes': ['PR', 'X1PR'], 'unit': 'hPa'}","{'standardName': 'air_pressure_at_sea_level', ...",,,
3,air_pressure_at_sea_level_qnh,Air pressure (QNH),Air pressure reduced to sea level by applying ...,hPa,METNO-name,{'baseName': 'air_pressure_at_sea_level_qnh'},Air pressure,"{'elementCodes': ['PH', 'QNH'], 'unit': 'hPa'}",,,,
4,air_temperature,Air temperature,"Air temperature (default 2 m above ground), pr...",degC,CF-name,{'baseName': 'air_temperature'},Temperature,"{'elementCodes': ['TA', 'TA0050', 'TA1', 'TA10...","{'standardName': 'air_temperature', 'unit': 'K...","{'values': ['PT0H', 'PT18H', 'PT20M']}","{'levelType': 'height_above_ground', 'unit': '...",


In [None]:
observations_df.head()

Unnamed: 0,sourceId,referenceTime,observations
0,SN10380:0,2022-07-01T00:00:00.000Z,"{'elementId': 'air_temperature', 'value': 14.1..."
1,SN10380:0,2022-07-01T01:00:00.000Z,"{'elementId': 'air_temperature', 'value': 12.6..."
2,SN10380:0,2022-07-01T02:00:00.000Z,"{'elementId': 'air_temperature', 'value': 11.6..."
3,SN10380:0,2022-07-01T03:00:00.000Z,"{'elementId': 'air_temperature', 'value': 12.3..."
4,SN10380:0,2022-07-01T04:00:00.000Z,"{'elementId': 'air_temperature', 'value': 13.9..."


Insert dataframes into RAW tables

In [None]:
client.raw.rows.insert_dataframe(DATABASE_NAME, "sources", sources_df)
client.raw.rows.insert_dataframe(DATABASE_NAME, "elements", elements_df)
client.raw.rows.insert_dataframe(DATABASE_NAME, "observations", observations_df)

### Create dataset

In [None]:
from cognite.client.data_classes import DataSet
client.data_sets.create(DataSet(external_id=DATASET_NAME, name=DATASET_NAME))

In [None]:
DATASET_ID = client.data_sets.retrieve_multiple(external_ids=[DATASET_NAME])[0].id

### Make a root asset

To organise our assets, we create a root asset, which will be the parent of the other assets we create.

In [None]:
from cognite.client.data_classes import Asset
client.assets.create(Asset(external_id=ASSET_NAME, name=ASSET_NAME, data_set_id=DATASET_ID))

Unnamed: 0,value
external_id,roman1984-root
name,roman1984-root
data_set_id,38334871179642
id,51713001985853
created_time,1660143025983
last_updated_time,1660143025983
root_id,51713001985853


## 3. Transformations

### Create and debug queries

#### Create an asset query


Let's take a look at the sources table and make a query to create assets. To create a transformation, you need to set the destination type. Each destination type has required and optional columns. You can check the schema for a particular destination type by running the following code.

In [None]:
from cognite.client.data_classes import TransformationDestination
client.transformations.schema.retrieve(destination=TransformationDestination.asset_hierarchy())

Unnamed: 0,name,sql_type,type,nullable
0,externalId,STRING,<cognite.client.data_classes.transformations.s...,False
1,parentExternalId,STRING,<cognite.client.data_classes.transformations.s...,False
2,source,STRING,<cognite.client.data_classes.transformations.s...,True
3,name,STRING,<cognite.client.data_classes.transformations.s...,False
4,description,STRING,<cognite.client.data_classes.transformations.s...,True
5,metadata,"MAP<STRING, STRING>",<cognite.client.data_classes.transformations.s...,True
6,dataSetId,BIGINT,<cognite.client.data_classes.transformations.s...,True
7,labels,ARRAY<STRING>,<cognite.client.data_classes.transformations.s...,True
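
The non-nullable columns in this schema are the ones a query must always produce. A minimal sketch of a pre-flight check, assuming the schema is reduced to plain `(name, nullable)` pairs copied from the output above; `missing_required_columns` is a hypothetical helper, not an SDK call:

```python
# (name, nullable) pairs copied from the asset_hierarchy schema output above.
ASSET_HIERARCHY_SCHEMA = [
    ("externalId", False),
    ("parentExternalId", False),
    ("source", True),
    ("name", False),
    ("description", True),
    ("metadata", True),
    ("dataSetId", True),
    ("labels", True),
]


def missing_required_columns(query_columns, schema):
    """Return the required (non-nullable) schema columns absent from query_columns."""
    produced = set(query_columns)
    return [name for name, nullable in schema if not nullable and name not in produced]


# Columns produced by a query that forgot to set the parent.
print(missing_required_columns(["name", "externalId", "dataSetId"], ASSET_HIERARCHY_SCHEMA))
```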


Considering the required and optional columns, we compose our query.

In [None]:
asset_query = f'''with 
countries as
(select 
  distinct country as name,
  concat('{PREFIX}-', country) as externalId,
  '{ASSET_NAME}' as parentExternalId,
  {DATASET_ID} as dataSetId
from `{DATABASE_NAME}`.sources
where country <> ''),
  
counties as
(select
  county as name,
  concat('{PREFIX}-', 'county-', county) as externalId,
  concat('{PREFIX}-', first(country)) as parentExternalId,
  {DATASET_ID} as dataSetId
from `{DATABASE_NAME}`.sources
where county <> ''
group by county),

municipalities as
(select
  municipality as name,
  concat('{PREFIX}-', 'municipality-', municipality) as externalId,
  (
    case
      when first(county) = '' then concat('{PREFIX}-', first(country))
      else concat('{PREFIX}-', 'county-', first(county))
    end
  ) as parentExternalId,
  {DATASET_ID} as dataSetId
from `{DATABASE_NAME}`.sources
where municipality <> ''
group by municipality),

sensors as
(select
  name as name,
  concat('{PREFIX}-', id) as externalId,
  (
    case
      when municipality <> '' then concat('{PREFIX}-', 'municipality-', municipality)
      when county <> '' then concat('{PREFIX}-', 'county-', county)
      else concat('{PREFIX}-', country)
    end
  ) as parentExternalId,
  {DATASET_ID} as dataSetId,
  to_metadata(*) as metadata
from `{DATABASE_NAME}`.sources
)
  
select * from sensors
union all select *, null as metadata from countries
union all select *, null as metadata from counties
union all select *, null as metadata from municipalities'''

Query explanation: 

As you can see in the source table, we have different source attributes, such as country, county, and municipality. Of course, we can model this as a flat structure of Sensor Systems, but it is way more convenient to use hierarchical structures. 

We use the _with_ clause to define the different levels of our asset hierarchy. For example, we select the distinct values from the _country_ column to get all the countries. The external id is the country name with our prefix added, and the root asset is the parent.

```
countries as
(select 
  distinct country as name,
  concat('{PREFIX}-', country) as externalId,
  '{ASSET_NAME}' as parentExternalId,
  {DATASET_ID} as dataSetId
from `{DATABASE_NAME}`.sources
where country <> '')
```
To get the list of counties, we use the _group by_ statement. We skip the rows with an empty county field and use the country as the parent asset.

```
counties as
(select
  county as name,
  concat('{PREFIX}-', 'county-', county) as externalId,
  concat('{PREFIX}-', first(country)) as parentExternalId,
  {DATASET_ID} as dataSetId
from `{DATABASE_NAME}`.sources
where county <> ''
group by county)
```

To get the municipalities, we again use the _group by_ statement and skip the rows without a municipality. To resolve the parentExternalId field we use the _case_ statement: if the municipality has a county, that county is the parent; otherwise we use the country.

```
municipalities as
(select
  municipality as name,
  concat('{PREFIX}-', 'municipality-', municipality) as externalId,
  (
    case
      when first(county) = '' then concat('{PREFIX}-', first(country))
      else concat('{PREFIX}-', 'county-', first(county))
    end
  ) as parentExternalId,
  {DATASET_ID} as dataSetId
from `{DATABASE_NAME}`.sources
where municipality <> ''
group by municipality),
```
Each row in the table represents a sensor, so we use the _case_ statement to find its parentExternalId: the municipality if the row has one, otherwise the county, otherwise the country. We also use the _to\_metadata_ function to create a column that maps all the original columns to metadata.
```
sensors as
(select
  name as name,
  concat('{PREFIX}-', id) as externalId,
  (
    case
      when municipality <> '' then concat('{PREFIX}-', 'municipality-', municipality)
      when county <> '' then concat('{PREFIX}-', 'county-', county)
      else concat('{PREFIX}-', country)
    end
  ) as parentExternalId,
  {DATASET_ID} as dataSetId,
  to_metadata(*) as metadata
from `{DATABASE_NAME}`.sources
)
```
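
The parent lookup for sensors in `asset_query` can be restated in Python to make the fallback order explicit. This is only an illustrative sketch: `resolve_parent_external_id` is a hypothetical function, and the prefix and sample values are taken from the outputs shown in this notebook.

```python
def resolve_parent_external_id(prefix: str, country: str, county: str, municipality: str) -> str:
    """Mirror the SQL case statement: municipality first, then county, then country."""
    if municipality != "":
        return f"{prefix}-municipality-{municipality}"
    if county != "":
        return f"{prefix}-county-{county}"
    return f"{prefix}-{country}"


# A sensor with a municipality hangs under that municipality ...
print(resolve_parent_external_id("roman1984", "Norge", "ROGALAND", "KARMØY"))
# ... while one without municipality or county falls back to the country.
print(resolve_parent_external_id("roman1984", "Sverige", "", ""))
```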
In the end, we combine all the asset levels using the _union all_ statement. Since only sensors have metadata, we add a null _metadata_ column for the other levels.

```
select * from sensors
union all select *, null as metadata from countries
union all select *, null as metadata from counties
union all select *, null as metadata from municipalities
```

To debug the transformation via SDK, you can use the _preview_ method as shown below.

In [None]:
transformation_preview = client.transformations.preview(asset_query)
preview_df = pd.DataFrame(transformation_preview.results)
preview_df.head()

Unnamed: 0,name,externalId,parentExternalId,dataSetId,metadata
0,LÅNGTRÄSK D,roman1984-SN214990,roman1984-municipality-VÄSTERBOTTENS LÄN,38334871179642,"{'@type': 'SensorSystem', 'country': 'Sverige'..."
1,RV7 RALLERUD,roman1984-SN24240,roman1984-municipality-RINGERIKE,38334871179642,"{'@type': 'SensorSystem', 'country': 'Norge', ..."
2,RV15 BREIDALEN,roman1984-SN15950,roman1984-municipality-SKJÅK,38334871179642,"{'@type': 'SensorSystem', 'country': 'Norge', ..."
3,KIRKENES LUFTHAVN,roman1984-SN99370,roman1984-municipality-SØR-VARANGER,38334871179642,"{'@type': 'SensorSystem', 'country': 'Norge', ..."
4,NORRKOPING,roman1984-SN257400,roman1984-Sverige,38334871179642,"{'@type': 'SensorSystem', 'country': 'Sverige'..."
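
A preview is also a good place to check that the hierarchy is closed, that is, every `parentExternalId` refers to another row in the result or to the root asset. A minimal sketch on plain dictionaries, assuming rows shaped like the preview output above; `find_orphans` is a hypothetical helper and the sample rows are abbreviated to the two relevant columns:

```python
def find_orphans(rows, root_external_id):
    """Return externalIds whose parent is neither another row nor the root asset."""
    known = {row["externalId"] for row in rows}
    known.add(root_external_id)
    return [row["externalId"] for row in rows if row["parentExternalId"] not in known]


rows = [
    {"externalId": "roman1984-Sverige", "parentExternalId": "roman1984-root"},
    {"externalId": "roman1984-SN257400", "parentExternalId": "roman1984-Sverige"},
    # This parent is missing from the sample rows, so the sensor is reported as an orphan.
    {"externalId": "roman1984-SN24240", "parentExternalId": "roman1984-municipality-RINGERIKE"},
]
print(find_orphans(rows, "roman1984-root"))
```

Because a preview returns only a limited number of rows, a reported orphan may simply have a parent that was cut off, so treat the result as a hint rather than proof of a broken query.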


### Create and run asset transformation job

Create the transformation using OIDC credentials defined above and _asset\_query_. The fields _name_ and _external\_id_ are mandatory when you create a transformation.

In [None]:
from cognite.client.data_classes import Transformation, TransformationDestination, OidcCredentials

transformations = [Transformation(name=f"{PREFIX}-assets",
                                  external_id=f"{PREFIX}-assets",
                                  source_oidc_credentials=creds,
                                  destination_oidc_credentials=creds, 
                                  destination=TransformationDestination.asset_hierarchy(), 
                                  conflict_mode="upsert", 
                                  query=asset_query)]

client.transformations.create(transformations)

Unnamed: 0,id,external_id,name,query,destination,conflict_mode,is_public,ignore_null_fields,has_source_api_key,has_destination_api_key,has_source_oidc_credentials,has_destination_oidc_credentials,created_time,last_updated_time,owner,owner_is_current_user
0,9801,roman1984-assets,roman1984-assets,with \ncountries as\n(select \n distinct coun...,<cognite.client.data_classes.transformations.c...,upsert,True,False,False,False,True,True,1660143091725,1660143091725,{'user': 'd5hkaona8xzavrty_7ppsz46nu7ji3il0vhf...,True


### Retrieve the asset transformation

There are several methods for retrieving the transformation. If you know its id or external_id, the easiest one is the retrieve method.

It has the following signature:
```
TransformationsAPI.retrieve(id: Optional[int] = None, external_id: Optional[str] = None) → Optional[cognite.client.data_classes.transformations.Transformation]
Retrieve a single transformation by id.

Parameters:	id (int, optional) – ID
Returns:	Requested transformation or None if it does not exist.
Return type:	Optional[Transformation]
```


In [None]:
asset_transformation = client.transformations.retrieve(external_id=f'{PREFIX}-assets')
asset_transformation

Unnamed: 0,value
id,9801
external_id,roman1984-assets
name,roman1984-assets
query,with \ncountries as\n(select \n distinct coun...
destination,<cognite.client.data_classes.transformations.c...
conflict_mode,upsert
is_public,True
ignore_null_fields,False
has_source_api_key,False
has_destination_api_key,False


If we need to get several transformations then it's better to use retrieve_multiple. 

```
TransformationsAPI.retrieve_multiple(ids: Sequence[int] = None, external_ids: Sequence[str] = None, ignore_unknown_ids: bool = False) → cognite.client.data_classes.transformations.TransformationList
Retrieve multiple transformations.

Parameters:	
ids (List[int]) – List of ids to retrieve.
external_ids (List[str]) – List of external ids to retrieve.
ignore_unknown_ids (bool) – Ignore IDs and external IDs that are not found rather than throw an exception.
Returns:	
Requested transformation or None if it does not exist.

Return type:	
TransformationList
```

In [None]:
asset_transformation = client.transformations.retrieve_multiple(external_ids=[f'{PREFIX}-assets'])[0]
asset_transformation

Unnamed: 0,value
id,9801
external_id,roman1984-assets
name,roman1984-assets
query,with \ncountries as\n(select \n distinct coun...
destination,<cognite.client.data_classes.transformations.c...
conflict_mode,upsert
is_public,True
ignore_null_fields,False
has_source_api_key,False
has_destination_api_key,False


Before creating the timeseries and datapoints transformations, we should run the asset transformation. You can run it by id or external_id. You can also set a timeout and a flag that controls whether the call waits until the transformation finishes. The run method signature:
```
TransformationsAPI.run(transformation_id: int = None, transformation_external_id: str = None, wait: bool = True, timeout: Optional[float] = None) → cognite.client.data_classes.transformations.jobs.TransformationJob
Run a transformation.

Parameters:	
transformation_id (int) – Transformation internal id
transformation_external_id (str) – Transformation external id
wait (bool) – Wait until the transformation run is finished. Defaults to True.
timeout (Optional[float]) – maximum time (s) to wait, default is None (infinite time). Once the timeout is reached, it returns with the current status. Won’t have any effect if wait is False.
Returns:	
Created transformation job
```

In [None]:
client.transformations.run(asset_transformation.id, wait=False)

Unnamed: 0,value
id,7546441
status,Created
transformation_id,9801
transformation_external_id,roman1984-assets
source_project,learn
destination_project,learn
destination,<cognite.client.data_classes.transformations.c...
conflict_mode,upsert
query,with \ncountries as\n(select \n distinct coun...
ignore_null_fields,False


After starting the run, we can list the jobs to check the status, or use the job object returned by the run method.

In [None]:
client.transformations.jobs.list(transformation_id=asset_transformation.id)

Unnamed: 0,id,status,transformation_id,transformation_external_id,source_project,destination_project,destination,conflict_mode,query,ignore_null_fields,created_time,started_time,finished_time,last_seen_time
0,7546441,Completed,9801,roman1984-assets,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with \ncountries as\n(select \n distinct coun...,False,1660654745468,1660654749624,1660654755296,1660654751499
1,7295117,Completed,9801,roman1984-assets,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with \ncountries as\n(select \n distinct coun...,False,1660143326494,1660143327682,1660143337338,1660143328354
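
Since the job was started with `wait=False`, its status has to be checked again later. The loop below sketches that polling on a plain callable. `wait_for_status` and `fetch_status` are hypothetical names; `Created`, `Completed` and `Failed` are the status strings that appear in this notebook's outputs, and `Running` is assumed as the intermediate status.

```python
import time


def wait_for_status(fetch_status, timeout_s=60.0, poll_interval_s=1.0):
    """Poll fetch_status() until the job is Completed or Failed, or the timeout expires.

    Returns the last status seen, which may still be non-final after a timeout.
    """
    deadline = time.monotonic() + timeout_s
    status = fetch_status()
    while status not in ("Completed", "Failed") and time.monotonic() < deadline:
        time.sleep(poll_interval_s)
        status = fetch_status()
    return status


# Simulated job that needs two more polls before it completes.
statuses = iter(["Created", "Running", "Completed"])
print(wait_for_status(lambda: next(statuses), timeout_s=5.0, poll_interval_s=0.01))
```

In the notebook itself, `fetch_status` could wrap whatever call you use to look the job up again, for example the jobs list shown above.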


A transformation object has a last_finished_job attribute, which is a convenient way to check through the API that your transformation ran and completed on time.

In [None]:
asset_transformation.last_finished_job

Unnamed: 0,value
id,7295117
status,Completed
transformation_id,9801
transformation_external_id,roman1984-assets
source_project,learn
destination_project,learn
destination,<cognite.client.data_classes.transformations.c...
conflict_mode,upsert
query,with \ncountries as\n(select \n distinct coun...
ignore_null_fields,False


### Create a time series query

To begin, let's check which fields we need to create a time series. Retrieve the schema of TransformationDestination.timeseries().

In [None]:
client.transformations.schema.retrieve(destination=TransformationDestination.timeseries())

Let's create a query taking into account the table data and schema fields.

In [None]:
ts_query = f'''with measurements as
(select
  split(sourceId, ":")[0] as sensor,  
  get_json_object(observations, "$.value") as temp,
  get_json_object(observations, "$.elementId") as measure_type,
  get_json_object(observations, "$.unit") as unit
from `{DATABASE_NAME}`.observations),

ts_data as
(select 
  unit, 
  sensor, 
  measure_type
from measurements
group by sensor, measure_type, unit),

assets as
(
select 
  * 
from 
  _cdf.assets
where
  dataSetId = {DATASET_ID}
)

select
  concat(sensor, '-', measure_type, '-', unit) as name,
  concat('{PREFIX}-', sensor, '-', measure_type, '-', unit) as externalId,
  unit as unit,
  assets.id as assetId,
  {DATASET_ID} as dataSetId
from ts_data
join assets on concat('{PREFIX}-', ts_data.sensor) = assets.externalId'''

Query explanation:

In the _observations_ table we only have three columns: sourceId, referenceTime and observations. The _observations_ column contains the JSON with the data. The _sourceId_ values follow the template _\<sensor\_id\>:\<number of the sensor\>_.
We use the _with_ clause to create the subqueries we need. 
The first one is _measurements_. We split the _sourceId_ values into two parts and only use _sensor\_id_, because (with our prefix added) it matches the external ids of our assets. To extract the values from the _observations_ column we use the _get\_json\_object_ function. 
```
measurements as
(select
  split(sourceId, ":")[0] as sensor,  
  get_json_object(observations, "$.value") as temp,
  get_json_object(observations, "$.elementId") as measure_type,
  get_json_object(observations, "$.unit") as unit
from `{DATABASE_NAME}`.observations)
```
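
For readers more familiar with Python than Spark SQL, the same extraction can be sketched on a single row. This is an illustration only: `parse_measurement` is a hypothetical function, and the sample `observations` string is an assumed, completed version of the truncated value shown in the DataFrame preview (the real cell may contain more fields). Because the CSV stores the value as a Python-style dictionary string, the sketch parses it with `ast.literal_eval`.

```python
import ast


def parse_measurement(source_id: str, observations: str) -> dict:
    """Python counterpart of the measurements subquery for one RAW row."""
    payload = ast.literal_eval(observations)  # parses a dict literal without executing code
    return {
        "sensor": source_id.split(":")[0],          # split(sourceId, ":")[0]
        "temp": payload.get("value"),               # $.value
        "measure_type": payload.get("elementId"),   # $.elementId
        "unit": payload.get("unit"),                # $.unit
    }


row = parse_measurement(
    "SN10380:0",
    "{'elementId': 'air_temperature', 'value': 14.1, 'unit': 'degC'}",  # assumed sample
)
print(row)
```

Unlike `get_json_object`, which returns strings, this sketch keeps the numeric type of the value.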
To create time series we need to aggregate the measurement data, so we group it by sensor, measure\_type and unit. In our case this is redundant, since the subset of data from MET Norway only contains air temperature in degrees Celsius, but it is useful to have in place for future measurements.
```
ts_data as
(select 
  unit, 
  sensor, 
  measure_type
from measurements
group by sensor, measure_type, unit)
```
We need a subquery of assets so that we don't create time series that are not connected to any asset. We only query assets from our data set.
```
assets as
(
select 
  * 
from 
  _cdf.assets
where
  dataSetId = {DATASET_ID}
)
```
Finally, we format the external id and set the name, unit, assetId and dataSetId to create the time series.
```
select
  concat(sensor, '-', measure_type, '-', unit) as name,
  concat('{PREFIX}-', sensor, '-', measure_type, '-', unit) as externalId,
  unit as unit,
  assets.id as assetId,
  {DATASET_ID} as dataSetId
from ts_data
join assets on concat('{PREFIX}-', ts_data.sensor) = assets.externalId
```
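
The effect of the grouping and the inner join can be sketched in plain Python: duplicates collapse into one time series per (sensor, measure_type, unit), and sensors without a matching asset are dropped. `build_time_series` is a hypothetical helper; the asset id and data set id are copied from this notebook's outputs, and `SN99999` is a made-up sensor with no asset.

```python
def build_time_series(measurements, asset_ids, prefix, data_set_id):
    """Build one time series row per distinct (sensor, measure_type, unit).

    asset_ids maps asset externalId -> asset id; rows without a match are
    skipped, which is what the inner join does in the SQL query.
    """
    distinct = sorted({(m["sensor"], m["measure_type"], m["unit"]) for m in measurements})
    rows = []
    for sensor, measure_type, unit in distinct:
        asset_id = asset_ids.get(f"{prefix}-{sensor}")
        if asset_id is None:
            continue  # no asset for this sensor: the join drops the row
        name = f"{sensor}-{measure_type}-{unit}"
        rows.append({
            "name": name,
            "externalId": f"{prefix}-{name}",
            "unit": unit,
            "assetId": asset_id,
            "dataSetId": data_set_id,
        })
    return rows


measurements = [
    {"sensor": "SN230800", "measure_type": "air_temperature", "unit": "degC"},
    {"sensor": "SN230800", "measure_type": "air_temperature", "unit": "degC"},  # duplicate
    {"sensor": "SN99999", "measure_type": "air_temperature", "unit": "degC"},   # made-up, no asset
]
asset_ids = {"roman1984-SN230800": 1440815695120849}
for ts in build_time_series(measurements, asset_ids, "roman1984", 38334871179642):
    print(ts["externalId"], ts["assetId"])
```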

In [None]:
tr = client.transformations.preview(ts_query, source_limit=500)
df = pd.DataFrame(tr.results)
df.head()

Unnamed: 0,name,externalId,unit,assetId,dataSetId
0,SN230800-air_temperature-degC,roman1984-SN230800-air_temperature-degC,degC,1440815695120849,38334871179642
1,SN218410-air_temperature-degC,roman1984-SN218410-air_temperature-degC,degC,1448174101736512,38334871179642
2,SN1615800-air_temperature-degC,roman1984-SN1615800-air_temperature-degC,degC,1224074457604222,38334871179642
3,SN13655-air_temperature-degC,roman1984-SN13655-air_temperature-degC,degC,163701295112207,38334871179642


### Create and run timeseries transformation job

We create and run the timeseries transformation in the same way we did before with the asset transformation.

In [None]:
from cognite.client.data_classes import Transformation, TransformationDestination, OidcCredentials

transformations = [Transformation(name=f"{PREFIX}-ts",
                                  external_id=f"{PREFIX}-ts",
                                  source_oidc_credentials=creds,
                                  destination_oidc_credentials=creds, 
                                  destination=TransformationDestination.timeseries(), 
                                  conflict_mode="upsert", 
                                  query=ts_query)]

client.transformations.create(transformations)

Unnamed: 0,id,external_id,name,query,destination,conflict_mode,is_public,ignore_null_fields,has_source_api_key,has_destination_api_key,has_source_oidc_credentials,has_destination_oidc_credentials,created_time,last_updated_time,owner,owner_is_current_user
0,9802,roman1984-ts,roman1984-ts,with measurements as\n(select\n split(sourceI...,<cognite.client.data_classes.transformations.c...,upsert,True,False,False,False,True,True,1660143100491,1660143100491,{'user': 'd5hkaona8xzavrty_7ppsz46nu7ji3il0vhf...,True


In [None]:
timeseries_transformation = client.transformations.retrieve(external_id=f'{PREFIX}-ts')

In [None]:
client.transformations.run(timeseries_transformation.id)

Unnamed: 0,value
id,7326656
status,Completed
transformation_id,9802
transformation_external_id,roman1984-ts
source_project,learn
destination_project,learn
destination,<cognite.client.data_classes.transformations.c...
conflict_mode,upsert
query,with measurements as\n(select\n split(sourceI...
ignore_null_fields,False


Then we can check the transformation jobs list to see the status of the timeseries transformation job.

In [None]:
client.transformations.jobs.list(transformation_id=timeseries_transformation.id)

### Create datapoints transformation query

Let's check out the schema of the datapoints destination. In CDF, you can use numeric and string data points. In our case, we want to have a numeric temperature value.

In [None]:
client.transformations.schema.retrieve(destination=TransformationDestination.datapoints())

Unnamed: 0,name,sql_type,type,nullable
0,id,BIGINT,<cognite.client.data_classes.transformations.s...,True
1,externalId,STRING,<cognite.client.data_classes.transformations.s...,True
2,timestamp,TIMESTAMP,<cognite.client.data_classes.transformations.s...,False
3,value,DOUBLE,<cognite.client.data_classes.transformations.s...,False


The query will be quite basic this time.

In [None]:
query_dp = f'''with measurements as
(select
  split(sourceId, ":")[0] as sensor,  
  cast(get_json_object(observations, "$.value") as double) as temperature,
  get_json_object(observations, "$.elementId") as measure_type,
  get_json_object(observations, "$.unit") as unit,
  to_timestamp(referenceTime) as reference_time
from `{DATABASE_NAME}`.observations)

select
  concat('{PREFIX}-', sensor, '-', measure_type, '-', unit) as externalId,
  reference_time as timestamp,
  temperature as value
from measurements'''

Query explanation: since the _observations_ table already holds the measurements, we could select the datapoints from it directly, but we use a temporary table (the _measurements_ subquery) for clarity. We extract all the needed data from the JSON column 'observations'.

```
with measurements as
(select
  split(sourceId, ":")[0] as sensor,  
  cast(get_json_object(observations, "$.value") as double) as temperature,
  get_json_object(observations, "$.elementId") as measure_type,
  get_json_object(observations, "$.unit") as unit,
  to_timestamp(referenceTime) as reference_time
from `{DATABASE_NAME}`.observations)
```
And then we format our data to fit the schema requirements.
```
select
  concat('{PREFIX}-', sensor, '-', measure_type, '-', unit) as externalId,
  reference_time as timestamp,
  temperature as value
from measurements
```
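
To make the type conversions explicit, here is a rough Python counterpart of one datapoint row: the timestamp string is parsed as UTC and the value is cast to a float. `to_datapoint` is a hypothetical helper, the timestamp format is assumed from the `referenceTime` values shown earlier, and the result uses epoch milliseconds, the unit of the `created_time` values in this notebook.

```python
from datetime import datetime, timezone


def to_datapoint(prefix, sensor, measure_type, unit, reference_time, value):
    """Format one measurement as (externalId, epoch milliseconds, float value)."""
    parsed = datetime.strptime(reference_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)  # the trailing Z marks UTC
    external_id = f"{prefix}-{sensor}-{measure_type}-{unit}"
    return external_id, int(parsed.timestamp() * 1000), float(value)


print(to_datapoint("roman1984", "SN10380", "air_temperature", "degC",
                   "2022-07-01T00:00:00.000Z", "14.1"))
```

The external id must be built exactly as in the time series query, otherwise the datapoints cannot be matched to an existing time series.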

You can run the query before the creation of the transformation in the same way we did for assets and time series to check if the result looks good.

In [None]:
tr = client.transformations.preview(query_dp)

In [None]:
df = pd.DataFrame(tr.results)
df.head()

Unnamed: 0,externalId,timestamp,value
0,SN13655-air_temperature-degC,2022-07-08 18:00:00,9.5
1,SN13655-air_temperature-degC,2022-07-08 21:00:00,7.0
2,SN13655-air_temperature-degC,2022-07-09 04:00:00,4.1
3,SN13655-air_temperature-degC,2022-07-09 07:00:00,6.7
4,SN13655-air_temperature-degC,2022-07-09 09:00:00,8.3


### Create and run datapoints transformation

To create and run the datapoints transformation we use all the same methods as before.

In [None]:
from cognite.client.data_classes import Transformation, TransformationDestination, OidcCredentials

transformations = [Transformation(name=f"{PREFIX}-dp",
                                  external_id=f"{PREFIX}-dp",
                                  source_oidc_credentials=creds,
                                  destination_oidc_credentials=creds, 
                                  destination=TransformationDestination.datapoints(), 
                                  conflict_mode="upsert", 
                                  query=query_dp)]

client.transformations.create(transformations)

Unnamed: 0,id,external_id,name,query,destination,conflict_mode,is_public,ignore_null_fields,has_source_api_key,has_destination_api_key,has_source_oidc_credentials,has_destination_oidc_credentials,created_time,last_updated_time,owner,owner_is_current_user
0,9803,roman1984-dp,roman1984-dp,with measurements as\n(select\n split(sourceI...,<cognite.client.data_classes.transformations.c...,upsert,True,False,False,False,True,True,1660143116350,1660143116350,{'user': 'd5hkaona8xzavrty_7ppsz46nu7ji3il0vhf...,True


In [None]:
datapoints_transformation = client.transformations.retrieve(external_id=f'{PREFIX}-dp')
client.transformations.run(datapoints_transformation.id)

Unnamed: 0,value
id,7588659
status,Completed
transformation_id,9803
transformation_external_id,roman1984-dp
source_project,learn
destination_project,learn
destination,<cognite.client.data_classes.transformations.c...
conflict_mode,upsert
query,with measurements as\n(select\n split(sourceI...
ignore_null_fields,False


We can also use the external ID to list the transformation's jobs.

In [None]:
client.transformations.jobs.list(transformation_external_id=f'{PREFIX}-dp')

Unnamed: 0,id,status,transformation_id,transformation_external_id,source_project,destination_project,destination,conflict_mode,query,ignore_null_fields,created_time,started_time,finished_time,last_seen_time,error
0,7588659,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660743370805,1660743384712,1660743394853,1660743000000.0,
1,7330748,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660214211674,1660214213112,1660214223331,1660214000000.0,
2,7330710,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660214159502,1660214171533,1660214176986,,SQL Transformations error. Please report this ...
3,7330174,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660213049467,1660213054686,1660213067343,1660213000000.0,
4,7326709,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660205992629,1660205993894,1660206002426,1660206000000.0,
5,7326706,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660205918440,1660205918612,1660205918612,,A job already runs for this transform (it has ...
6,7326705,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660205914831,1660205918190,1660205932143,1660206000000.0,
7,7326704,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660205883939,1660205885267,1660205899236,,
8,7326666,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660205849704,1660205852361,1660205867330,,
9,7326538,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,False,1660205682843,1660205686984,1660205688859,1660206000000.0,Request with id bfbf860f-6c17-9200-b592-4c8395...
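
The `created_time`, `started_time`, and `finished_time` columns are timestamps in milliseconds since the Unix epoch. As a small sketch, the hypothetical helper `job_timing` below converts them to a readable time and computes how long a job waited and ran, using the values of the first job in the list above.

```python
from datetime import datetime, timezone


def job_timing(created_ms: int, started_ms: int, finished_ms: int) -> dict:
    """Summarize job timing from epoch-millisecond timestamps."""
    return {
        # Creation time as a timezone-aware UTC datetime
        "created": datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc),
        # Time between creation and start, in milliseconds
        "queued_ms": started_ms - created_ms,
        # Time between start and finish, in milliseconds
        "run_ms": finished_ms - started_ms,
    }


timing = job_timing(1660743370805, 1660743384712, 1660743394853)
print(timing["created"].date(), timing["queued_ms"], timing["run_ms"])  # 2022-08-17 13907 10141
```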


### Cancel a transformation job

If a job runs for too long, you can cancel it, provided you have its job ID. Let's start a transformation job without waiting for it, cancel it, and then check the jobs list.

In [None]:
datapoints_job = client.transformations.run(datapoints_transformation.id, wait=False)
datapoints_job.cancel()
dp_jobs = client.transformations.jobs.list(transformation_external_id=f'{PREFIX}-dp')
dp_jobs

Unnamed: 0,id,status,transformation_id,transformation_external_id,source_project,destination_project,destination,conflict_mode,query,error,ignore_null_fields,created_time,started_time,finished_time,last_seen_time
0,7588664,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,Job cancelled by the user.,False,1660743452400,1660743452822,1660743452822,
1,7588660,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,Job cancelled by the user.,False,1660743398978,1660743399431,1660743400306,
2,7588659,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660743370805,1660743384712,1660743394853,1660743000000.0
3,7330748,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660214211674,1660214213112,1660214223331,1660214000000.0
4,7330710,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,SQL Transformations error. Please report this ...,False,1660214159502,1660214171533,1660214176986,
5,7330174,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660213049467,1660213054686,1660213067343,1660213000000.0
6,7326709,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660205992629,1660205993894,1660206002426,1660206000000.0
7,7326706,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,A job already runs for this transform (it has ...,False,1660205918440,1660205918612,1660205918612,
8,7326705,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660205914831,1660205918190,1660205932143,1660206000000.0
9,7326704,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660205883939,1660205885267,1660205899236,


A canceled job has the same status (`Failed`) as a job that actually failed, so check the error message to tell them apart.

In [None]:
dp_jobs[0].error

'Job cancelled by the user.'
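
Since the status alone does not separate canceled jobs from failed ones, a small helper can do it by comparing the error message. This is a sketch under assumptions: `JobRecord` is a hypothetical stand-in with only the `status` and `error` fields, `classify` is a hypothetical helper, the cancellation message is the one shown in the output above, and the other error text is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Message copied from the output above; it may differ in other API versions.
CANCELLED_MESSAGE = "Job cancelled by the user."


@dataclass
class JobRecord:
    """Hypothetical stand-in for a job, holding only the fields used here."""
    status: str
    error: Optional[str] = None


def classify(job) -> str:
    """Return 'Cancelled' for user-canceled jobs, otherwise the job's own status."""
    if job.status == "Failed" and job.error == CANCELLED_MESSAGE:
        return "Cancelled"
    return job.status


jobs = [
    JobRecord("Failed", CANCELLED_MESSAGE),
    JobRecord("Failed", "some other error"),  # placeholder error text
    JobRecord("Completed"),
]
labels = [classify(job) for job in jobs]
print(labels)  # ['Cancelled', 'Failed', 'Completed']
```

Because `classify` only reads `status` and `error`, it should also work on the items of `dp_jobs`, for example `[classify(job) for job in dp_jobs]`.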

You can also cancel a job by transformation ID or external ID, because each transformation can have only one running job at a time.

In [None]:
datapoints_job = client.transformations.run(datapoints_transformation.id, wait=False)
client.transformations.cancel(transformation_external_id=f'{PREFIX}-dp')
dp_jobs = client.transformations.jobs.list(transformation_external_id=f'{PREFIX}-dp')
dp_jobs

Unnamed: 0,id,status,transformation_id,transformation_external_id,source_project,destination_project,destination,conflict_mode,query,error,ignore_null_fields,created_time,started_time,finished_time,last_seen_time
0,7588786,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,Job cancelled by the user.,False,1660743645982,1660743646466,1660743646466,
1,7588664,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,Job cancelled by the user.,False,1660743452400,1660743452822,1660743454572,
2,7588660,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,Job cancelled by the user.,False,1660743398978,1660743399431,1660743400306,
3,7588659,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660743370805,1660743384712,1660743394853,1660743000000.0
4,7330748,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660214211674,1660214213112,1660214223331,1660214000000.0
5,7330710,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,SQL Transformations error. Please report this ...,False,1660214159502,1660214171533,1660214176986,
6,7330174,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660213049467,1660213054686,1660213067343,1660213000000.0
7,7326709,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660205992629,1660205993894,1660206002426,1660206000000.0
8,7326706,Failed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,A job already runs for this transform (it has ...,False,1660205918440,1660205918612,1660205918612,
9,7326705,Completed,9803,roman1984-dp,learn,learn,<cognite.client.data_classes.transformations.c...,upsert,with measurements as\n(select\n split(sourceI...,,False,1660205914831,1660205918190,1660205932143,1660206000000.0
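
Canceling can also be automated with a timeout. The following is a minimal sketch, not an SDK feature: `cancel_if_slow` is a hypothetical helper that assumes the job object exposes `update()` to refresh its state, a `status` attribute, and `cancel()`, and that finished jobs report `Completed` or `Failed` as in the tables above. `FakeJob` is an invented stand-in so the sketch runs without a CDF connection.

```python
import time


def cancel_if_slow(job, timeout_s, poll_s=1.0):
    """Poll the job until it finishes; cancel it once timeout_s has passed.

    Returns True if the job was canceled, False if it finished on its own.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job.update()  # assumed to refresh job.status
        if job.status in ("Completed", "Failed"):
            return False
        if time.monotonic() >= deadline:
            job.cancel()
            return True
        time.sleep(poll_s)


class FakeJob:
    """Invented stand-in for a job that never finishes on its own."""

    def __init__(self):
        self.status = "Running"  # assumed status name for an active job

    def update(self):
        pass  # a real job object would re-read its state from the API here

    def cancel(self):
        self.status = "Failed"  # canceled jobs show up as Failed


fake_job = FakeJob()
was_cancelled = cancel_if_slow(fake_job, timeout_s=0.05, poll_s=0.01)
print(was_cancelled, fake_job.status)  # True Failed
```

With a real job, the call would look like `cancel_if_slow(datapoints_job, timeout_s=600)`, where the ten-minute limit is an arbitrary example value.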


## Clean up resources

In [None]:
# Delete RAW tables
client.raw.tables.delete(db_name=DATABASE_NAME, name=['sources', 'elements', 'observations'])
# Delete RAW database
client.raw.databases.delete(name=DATABASE_NAME)
# Delete time series
tss = client.time_series.list(data_set_ids=[DATASET_ID], limit=-1)
client.time_series.delete([ts.id for ts in tss])
# Delete assets recursively
client.assets.delete(external_id=f"{PREFIX}-root", recursive=True)
# Delete transformations
client.transformations.delete(external_id=[f"{PREFIX}-assets", f"{PREFIX}-ts", f"{PREFIX}-dp"])
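
If one of the delete calls fails, for example because a resource was already removed, the calls after it never run. One way to make the cleanup more tolerant is sketched below; `run_cleanup` is a hypothetical helper, and the lambdas in the demo are invented stand-ins for the real delete calls.

```python
def run_cleanup(steps):
    """Run every (label, callable) step in order and collect failures instead of stopping."""
    failures = []
    for label, step in steps:
        try:
            step()
        except Exception as exc:  # deliberately broad: cleanup should keep going
            failures.append((label, str(exc)))
    return failures


def failing_step():
    """Invented stand-in for a delete call that raises."""
    raise ValueError("not found")


deleted = []
failures = run_cleanup([
    ("RAW tables", lambda: deleted.append("tables")),
    ("time series", failing_step),
    ("transformations", lambda: deleted.append("transformations")),
])
print(deleted, failures)  # ['tables', 'transformations'] [('time series', 'not found')]
```

In the notebook, each step would wrap one of the calls above, such as `("RAW database", lambda: client.raw.databases.delete(name=DATABASE_NAME))`. The order still matters: the RAW tables are deleted before their database.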