# Downloading assets from the language data space

In this notebook we show how to download assets from the language data space. The assets are already available in the data space, so we will negotiate a contract, start a transfer, and download an asset.

First we need to install the Keycloak Python client library.

In [1]:
!pip install python-keycloak



Here we input our credentials, including the One-Time Password (OTP).

In [2]:
USER = ""
PASSWORD = ""
OTP = ""

We log into the language data space using Keycloak. The login returns a token response that contains the access token, a refresh token, and the token lifetime in seconds.

In [3]:
from keycloak import KeycloakOpenID

keycloak_openid = KeycloakOpenID(
    server_url="https://auth.ds.inesdata-project.eu/",
    client_id="dataspace-users",
    realm_name="language"
)

token = keycloak_openid.token(USER, PASSWORD, totp=OTP)

The access token expires after a short time, so we use the following function to check whether it has expired and, if so, refresh it.

In [4]:
from time import time

expiry = time() + token['expires_in']

def get_valid_token():
    global token, expiry
    if time() >= expiry:
        token = keycloak_openid.refresh_token(token['refresh_token'])
        expiry = time() + token['expires_in']
    return token['access_token']
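
The function above refreshes the token only once it has already expired, so a request sent just before the deadline could still arrive with a stale token. A minimal sketch of the same idea with a safety margin follows. The class name `TokenCache`, the `margin` default of 30 seconds, and the injectable `clock` are illustrative assumptions, not part of the data space tooling.

```python
from time import time


class TokenCache:
    """Hold a token response and refresh it shortly before it expires."""

    def __init__(self, token, refresh, margin=30, clock=time):
        self._refresh = refresh  # callable: refresh token string -> new token dict
        self._margin = margin    # seconds of slack before the real expiry (assumed value)
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._store(token)

    def _store(self, token):
        self._token = token
        self._expiry = self._clock() + token["expires_in"]

    def access_token(self):
        # Refresh once we are within `margin` seconds of the expiry time.
        if self._clock() >= self._expiry - self._margin:
            self._store(self._refresh(self._token["refresh_token"]))
        return self._token["access_token"]
```

With the objects defined earlier, this could be used as `cache = TokenCache(token, keycloak_openid.refresh_token)` followed by `cache.access_token()` wherever `get_valid_token()` is called. If the token lifetime is shorter than the margin, every call triggers a refresh, so the margin should be chosen accordingly.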

In our case, we have used credentials for the "conn-consumer" connector. The first call queries the federated catalog to list all the assets available in the data space.

In [5]:
import requests
import json

url = "https://conn-consumer-language.ds.inesdata-project.eu/federatedcatalog/v1alpha/catalog/query"

payload = json.dumps({
  "@context": {
    "@vocab": "https://w3id.org/edc/v0.0.1/ns/"
  },
  "operandLeft": "",
  "operandRight": "",
  "operator": "",
  "Criterion": ""
})

headers = {
  'Authorization': f'Bearer {get_valid_token()}',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(json.dumps(response.json(),indent=2))

[
  {
    "@id": "73aff7d5-9c50-4251-83d6-897732aba8ca",
    "@type": "dcat:Catalog",
    "dcat:dataset": [],
    "dcat:catalog": [],
    "dcat:distribution": [],
    "dcat:service": {
      "@id": "8e3c4b2b-b5d1-43f5-8917-6d558601487f",
      "@type": "dcat:DataService",
      "dcat:endpointDescription": "dspace:connector",
      "dcat:endpointUrl": "https://conn-language-iic-language.ds.inesdata-project.eu/protocol",
      "dcat:endpointURL": "https://conn-language-iic-language.ds.inesdata-project.eu/protocol"
    },
    "dspace:participantId": "conn-language-iic",
    "originator": "https://conn-language-iic-language.ds.inesdata-project.eu/protocol",
    "@context": {
      "@vocab": "https://w3id.org/edc/v0.0.1/ns/",
      "edc": "https://w3id.org/edc/v0.0.1/ns/",
      "odrl": "http://www.w3.org/ns/odrl/2/",
      "dcat": "http://www.w3.org/ns/dcat#",
      "dct": "http://purl.org/dc/terms/",
      "dspace": "https://w3id.org/dspace/v0.8/"
    }
  },
  {
    "@id": "0d3976df-fba6-

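The full catalog response is long, so a compact overview can be easier to read. The sketch below collects the participant id, the protocol endpoint, and the dataset ids for each catalog entry. It relies on the compact keys visible in the output above (`dspace:participantId`, `dcat:service`, `dcat:endpointUrl`, `dcat:dataset`) and assumes that each dataset entry carries an `@id` and that a single dataset may be serialized as an object rather than a list. The function names are hypothetical.

```python
def as_list(value):
    """Normalize a JSON-LD value that may be missing, a single object, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_catalog(catalogs):
    """Return one (participant id, endpoint URL, dataset ids) tuple per catalog entry."""
    rows = []
    for catalog in catalogs:
        service = catalog.get("dcat:service", {})
        dataset_ids = [d.get("@id") for d in as_list(catalog.get("dcat:dataset"))]
        rows.append((
            catalog.get("dspace:participantId"),
            service.get("dcat:endpointUrl"),
            dataset_ids,
        ))
    return rows
```

Calling `summarize_catalog(response.json())` on the response above and printing each tuple gives one line per connector.
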
We can see that there are different connectors offering different assets. For this notebook we will download an asset of type "machineLearning"; specifically, the asset with id "llama-2-7b-insurance-5-sft". We query the catalog again, this time filtering by that id.

In [6]:
url = "https://conn-consumer-language.ds.inesdata-project.eu/management/federatedcatalog/request"

payload = json.dumps({
  "@context": {
    "@vocab": "https://w3id.org/edc/v0.0.1/ns/"
  },
  "offset": 0,
  "limit": 100,
  "filterExpression": [{"operandLeft":"genericSearch","operator":"=","operandRight":"llama-2-7b-insurance-5-sft"}]
  
})

headers = {
  'Authorization': f'Bearer {get_valid_token()}',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)
filtered_catalog = response.json()
print(json.dumps(filtered_catalog,indent=2))

[
  {
    "@id": "1b8d3b8f-bc99-4a05-888b-359784f76989",
    "@type": "http://www.w3.org/ns/dcat#Catalog",
    "http://www.w3.org/ns/dcat#catalog": [],
    "http://www.w3.org/ns/dcat#distribution": [],
    "http://www.w3.org/ns/dcat#service": {
      "@id": "5484651a-4417-48b9-9bd3-83c86f1b9952",
      "@type": "http://www.w3.org/ns/dcat#DataService",
      "http://www.w3.org/ns/dcat#endpointDescription": "dspace:connector",
      "http://www.w3.org/ns/dcat#endpointUrl": "https://conn-expert-language.ds.inesdata-project.eu/protocol",
      "http://www.w3.org/ns/dcat#endpointURL": "https://conn-expert-language.ds.inesdata-project.eu/protocol"
    },
    "https://w3id.org/dspace/v0.8/participantId": "conn-expert",
    "originator": "https://conn-expert-language.ds.inesdata-project.eu/protocol",
    "http://www.w3.org/ns/dcat#dataset": {
      "@id": "llama-2-7b-insurance-5-sft",
      "@type": "http://www.w3.org/ns/dcat#Dataset",
      "http://www.w3.org/ns/dcat#distribution": [
        

We can see that the asset has a policy attached. Let's negotiate a contract for this asset.

In [8]:
url = "https://conn-consumer-language.ds.inesdata-project.eu/management/v3/contractnegotiations"

# Dataset entry for the asset, taken from the filtered catalog above.
dataset = filtered_catalog[0]["http://www.w3.org/ns/dcat#dataset"]
# Access service advertised by the dataset's first distribution.
access_service = dataset['http://www.w3.org/ns/dcat#distribution'][0]['http://www.w3.org/ns/dcat#accessService']

payload = json.dumps({
  "@context": {
    "@vocab": "https://w3id.org/edc/v0.0.1/ns/",
    "odrl": "http://www.w3.org/ns/odrl.jsonld"
  },
  "@type": "ContractRequest",
  "counterPartyAddress": access_service['http://www.w3.org/ns/dcat#endpointUrl'],
  "protocol": "dataspace-protocol-http",
  "policy": {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    # Reuse the offer published with the asset, then set assigner and target.
    **dataset['odrl:hasPolicy']['offer'],
    "assigner": dataset['participantId'],
    "target": dataset['@id']
  }
})
headers = {
  'Authorization': f'Bearer {get_valid_token()}',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)


{"@type":"IdResponse","@id":"787d61e1-4b43-4ca9-beb3-0c2eba2c1b70","createdAt":1750844605140,"@context":{"@vocab":"https://w3id.org/edc/v0.0.1/ns/","edc":"https://w3id.org/edc/v0.0.1/ns/","odrl":"http://www.w3.org/ns/odrl/2/"}}


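The negotiation runs asynchronously, so we query its status. Once the state is FINALIZED, the response includes the contractAgreementId that we need to start the transfer.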

In [9]:
url = f"https://conn-consumer-language.ds.inesdata-project.eu/management/v3/contractnegotiations/{response.json()['@id']}"

payload = ""
headers = {
  'Authorization':  f'Bearer {get_valid_token()}'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)


{"@type":"ContractNegotiation","@id":"787d61e1-4b43-4ca9-beb3-0c2eba2c1b70","type":"CONSUMER","protocol":"dataspace-protocol-http","state":"FINALIZED","counterPartyId":"conn-expert","counterPartyAddress":"https://conn-expert-language.ds.inesdata-project.eu/protocol","callbackAddresses":[],"createdAt":1750844605140,"contractAgreementId":"248a2293-34ea-4f6a-8d29-98397c8aa367","@context":{"@vocab":"https://w3id.org/edc/v0.0.1/ns/","edc":"https://w3id.org/edc/v0.0.1/ns/","odrl":"http://www.w3.org/ns/odrl/2/"}}

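A single status check may return an intermediate state if it runs before the negotiation completes. The sketch below polls until the desired state is reached. The helper name `wait_for_state`, the timeout and interval defaults, and the treatment of `TERMINATED` as the failure state are assumptions made for illustration; `fetch` is any callable that returns the parsed status response.

```python
import time


def wait_for_state(fetch, done="FINALIZED", failed=("TERMINATED",),
                   timeout=60, interval=2, sleep=time.sleep):
    """Call fetch() until its 'state' equals `done`, then return that response dict."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch()
        state = status.get("state")
        if state == done:
            return status
        if state in failed:
            raise RuntimeError(f"process ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state} after {timeout} seconds")
        sleep(interval)  # wait before asking again
```

In this notebook, `fetch` could be a function that repeats the GET request from the cell above and returns `response.json()`. The same helper could be pointed at the transfer process status with `done="STARTED"`.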

We can now start the transfer.

In [10]:
url = "https://conn-consumer-language.ds.inesdata-project.eu/management/v3/transferprocesses"

payload = json.dumps({
  "@context": {
    "@vocab": "https://w3id.org/edc/v0.0.1/ns/"
  },
  "@type": "TransferRequestDto",
  "connectorId": "conn-consumer-language",
  "counterPartyAddress": response.json()["counterPartyAddress"],
  "contractId":response.json()["contractAgreementId"],
  "assetId": filtered_catalog[0]["http://www.w3.org/ns/dcat#dataset"]['@id'],
  "protocol": "dataspace-protocol-http",
  "transferType": "HttpData-PULL"
})
headers = {
  'Authorization': f'Bearer {get_valid_token()}',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

transferId = response.json()['@id']


{"@type":"IdResponse","@id":"5b32fa23-cad7-47ce-ba5d-5c9268ffc4e5","createdAt":1750844829782,"@context":{"@vocab":"https://w3id.org/edc/v0.0.1/ns/","edc":"https://w3id.org/edc/v0.0.1/ns/","odrl":"http://www.w3.org/ns/odrl/2/"}}


Let's check the transfer status.

In [11]:
url = f"https://conn-consumer-language.ds.inesdata-project.eu/management/v3/transferprocesses/{transferId}"

payload = ""
headers = {
  'Authorization':  f'Bearer {get_valid_token()}'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)


{"@id":"5b32fa23-cad7-47ce-ba5d-5c9268ffc4e5","@type":"TransferProcess","state":"STARTED","stateTimestamp":1750844832329,"type":"CONSUMER","callbackAddresses":[],"correlationId":"b374c40d-2a82-4907-aed8-629e83096365","assetId":"llama-2-7b-insurance-5-sft","contractId":"248a2293-34ea-4f6a-8d29-98397c8aa367","transferType":"HttpData-PULL","@context":{"@vocab":"https://w3id.org/edc/v0.0.1/ns/","edc":"https://w3id.org/edc/v0.0.1/ns/","odrl":"http://www.w3.org/ns/odrl/2/"}}


The transfer process is now in the STARTED state, so we can retrieve its data address, which contains the download endpoint and an authorization token.

In [None]:
url = f"https://conn-consumer-language.ds.inesdata-project.eu/management/v3/edrs/{transferId}/dataaddress"

headers = {
  'Authorization': f'Bearer {get_valid_token()}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)


Now we have everything needed to download the asset: the endpoint and the authorization token. We download it as a zip file.

In [14]:
url = response.json()["endpoint"]

payload = ""
headers = {
  'Authorization': response.json()["authorization"]
}

r = requests.request("GET", url, headers=headers, data=payload, stream=True)

with open("download.zip", 'wb') as fd:
    for chunk in r.iter_content(chunk_size=1024):
        fd.write(chunk)
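
Model archives can be large, and an interrupted download would leave a truncated `download.zip` behind. One possible safeguard, sketched below, is to write to a temporary file and rename it only once every chunk has been written. The helper name `save_stream` and the `.part` suffix are hypothetical choices.

```python
import os


def save_stream(chunks, path):
    """Write an iterable of byte chunks to `path` through a temporary '.part' file."""
    tmp_path = path + ".part"
    written = 0
    with open(tmp_path, "wb") as fd:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fd.write(chunk)
                written += len(chunk)
    # Only a complete download receives the final file name.
    os.replace(tmp_path, path)
    return written
```

With the response `r` from the cell above, this could be called as `save_stream(r.iter_content(chunk_size=1024 * 1024), "download.zip")`, ideally after `r.raise_for_status()` so that an HTTP error is not saved as if it were the archive.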


Once downloaded, the file can be extracted and the model can be used.

In [131]:
!unzip download.zip

Archive:  download.zip
   creating: llama-2-7b-insurance-5-sft/
  inflating: llama-2-7b-insurance-5-sft/adapter_model.safetensors  
  inflating: llama-2-7b-insurance-5-sft/tokenizer.json  
  inflating: llama-2-7b-insurance-5-sft/adapter_config.json  
  inflating: llama-2-7b-insurance-5-sft/training_args.bin  
  inflating: llama-2-7b-insurance-5-sft/README.md  
  inflating: llama-2-7b-insurance-5-sft/tokenizer_config.json  
  inflating: llama-2-7b-insurance-5-sft/special_tokens_map.json  
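
The `!unzip` command depends on the tool being installed on the host. As a portable alternative, the archive can be extracted with Python's standard `zipfile` module. This is a minimal sketch; the function name `extract_archive` is an assumption, and the integrity check is an optional extra.

```python
import zipfile


def extract_archive(zip_path, dest_dir="."):
    """Extract `zip_path` into `dest_dir` and return the list of member names."""
    with zipfile.ZipFile(zip_path) as archive:
        bad = archive.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling `extract_archive("download.zip")` would unpack the files listed above into the current directory.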
