Merged

43 commits
3f73738
added deploy script with uploading to given rclone remote
gg46ixav Jul 3, 2025
9edc0dc
added webdav-url argument
gg46ixav Jul 4, 2025
a56f01d
added deploying to the databus without upload to nextcloud
gg46ixav Jul 25, 2025
5fdf78b
Merge branch 'download-capabilities' into nextcloudclient
gg46ixav Oct 21, 2025
800256c
updated pyproject.toml and content-hash
gg46ixav Oct 21, 2025
66f1c8e
Merge branch 'main' into nextcloudclient
gg46ixav Oct 28, 2025
4259229
Merge remote-tracking branch 'origin/main' into nextcloudclient
gg46ixav Oct 28, 2025
b179f90
updated README.md
gg46ixav Oct 28, 2025
a504b9d
Merge remote-tracking branch 'origin/nextcloudclient' into nextcloudc…
gg46ixav Oct 28, 2025
0ce0c24
added checksum validation
gg46ixav Oct 28, 2025
6596cbc
updated upload_to_nextcloud function to accept list of source_paths
gg46ixav Oct 28, 2025
b9f9854
only add result if upload successful
gg46ixav Oct 28, 2025
2f8493d
use os.path.basename instead of .split("/")[-1]
gg46ixav Oct 28, 2025
07359cc
added __init__.py and updated README.md
gg46ixav Oct 28, 2025
8047968
changed append to extend (no nested list)
gg46ixav Oct 28, 2025
0172450
fixed windows separators and added rclone error message
gg46ixav Oct 28, 2025
f957512
moved deploy.py to cli upload_and_deploy
gg46ixav Nov 3, 2025
607f527
changed metadata to dict list
gg46ixav Nov 3, 2025
6cb7e11
removed python-dotenv
gg46ixav Nov 3, 2025
7651c31
small updates
gg46ixav Nov 3, 2025
df17a7c
refactored upload_and_deploy function
gg46ixav Nov 3, 2025
7492531
updated README.md
gg46ixav Nov 3, 2025
c985603
updated metadata_string for new metadata format
gg46ixav Nov 3, 2025
62a3611
updated README.md
gg46ixav Nov 3, 2025
22ac02f
updated README.md
gg46ixav Nov 3, 2025
3faaf4d
Changed context url back
gg46ixav Nov 3, 2025
5dfebe5
added check for known compressions
gg46ixav Nov 3, 2025
f9367c0
updated checksum to sha256
gg46ixav Nov 3, 2025
5d474db
updated README.md
gg46ixav Nov 3, 2025
bef78ef
size check
gg46ixav Nov 3, 2025
529f2ae
updated checksum validation
gg46ixav Nov 3, 2025
77dca5a
added doc
gg46ixav Nov 3, 2025
02b1873
- refactored deploy, upload_and_deploy and deploy_with_metadata to on…
gg46ixav Nov 4, 2025
04c0b6e
updated README.md
gg46ixav Nov 4, 2025
fb93bc9
fixed docstring
gg46ixav Nov 4, 2025
8e6167b
removed metadata.json
gg46ixav Nov 4, 2025
943e30b
moved COMPRESSION_EXTS out of loop
gg46ixav Nov 4, 2025
1274cbc
removed unnecessary f-strings
gg46ixav Nov 4, 2025
02481b3
set file_format and compression to None
gg46ixav Nov 4, 2025
a5ec24d
get file_format and compression from metadata file
gg46ixav Nov 4, 2025
f95155f
updated README.md
gg46ixav Nov 4, 2025
274f252
chores
Integer-Ctrl Nov 5, 2025
f22c71d
updated metadata format (removed filename - used url instead)
gg46ixav Nov 5, 2025
90 changes: 82 additions & 8 deletions README.md
@@ -163,13 +163,25 @@ databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHER
databusclient deploy --help
```
```
Usage: databusclient deploy [OPTIONS] DISTRIBUTIONS...
Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...

Arguments:
DISTRIBUTIONS... distributions in the form of List[URL|CV|fileext|compression|sha256sum:contentlength] where URL is the
download URL and CV the key=value pairs (_ separated)
content variants of a distribution, fileExt and Compression can be set, if not they are inferred from the path [required]
Flexible deploy to databus command:

- Classic dataset deployment

- Metadata-based deployment

- Upload & deploy via Nextcloud

Arguments:
DISTRIBUTIONS... Depending on mode:
- Classic mode: List of distributions in the form
URL|CV|fileext|compression|sha256sum:contentlength
(where URL is the download URL and CV the key=value pairs,
separated by underscores)
- Upload mode: List of local file or folder paths (must exist)
- Metadata mode: None

Options:
--version-id TEXT Target databus version/dataset identifier of the form <h
ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
@@ -179,24 +191,86 @@ Options:
--description TEXT Dataset description [required]
--license TEXT License (see dalicc.net) [required]
--apikey TEXT API key [required]
--metadata PATH Path to metadata JSON file (for metadata mode)
--webdav-url TEXT WebDAV URL (e.g.,
https://cloud.example.com/remote.php/webdav)
--remote TEXT rclone remote name (e.g., 'nextcloud')
--path TEXT Remote path on Nextcloud (e.g., 'datasets/mydataset')
--help Show this message and exit.

```
Examples of using deploy command
#### Examples of using deploy command
##### Mode 1: Classic Deploy (Distributions)
```
databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
```

```
databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
```

A few more notes for CLI usage:

* The content variants can be left out ONLY IF there is just one distribution
* For complete inferred: Just use the URL with `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml`
* If other parameters are used, you need to leave them empty like `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116`
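
The `sha256sum:contentlength` suffix of a distribution string can be computed locally before deploying. A minimal sketch (the helper name is illustrative, not part of the client):

```python
import hashlib
import os

def sha256_and_length(path: str) -> str:
    """Compute the 'sha256sum:contentlength' suffix for a distribution string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large dump files do not have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return f"{h.hexdigest()}:{os.path.getsize(path)}"
```

The returned string can be appended after the last `|` of a distribution argument.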


##### Mode 2: Deploy with Metadata File

Use a JSON metadata file to define all distributions.
The metadata.json should list all distributions and their metadata.
All files referenced there will be registered on the Databus.
```bash
databusclient deploy \
--metadata /home/metadata.json \
--version-id https://databus.org/user/dataset/version/1.0 \
--title "Metadata Deploy Example" \
--abstract "This is a short abstract of the dataset." \
--description "This dataset was uploaded using metadata.json." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY"
```
Metadata file structure (file_format and compression are optional):
```json
[
{
"checksum": "0929436d44bba110fc7578c138ed770ae9f548e195d19c2f00d813cca24b9f39",
"size": 12345,
"url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
"file_format": "ttl"
},
{
"checksum": "2238acdd7cf6bc8d9c9963a9f6014051c754bf8a04aacc5cb10448e2da72c537",
"size": 54321,
"url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.csv.gz",
"file_format": "csv",
"compression": "gz"
}
]

```
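
A metadata file of this shape can also be generated locally. A hedged sketch (the helper name and URL layout are assumptions, not part of the client):

```python
import hashlib
import os

def build_metadata(paths, base_url):
    """Build metadata entries (checksum/size/url) for a list of local files."""
    entries = []
    for path in paths:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        entries.append({
            "checksum": h.hexdigest(),          # SHA-256 hex digest
            "size": os.path.getsize(path),      # size in bytes
            "url": f"{base_url}/{os.path.basename(path)}",
        })
    return entries
```

Dumping the result with `json.dump(entries, fp, indent=2)` yields a file matching the structure above; `file_format` and `compression` can be added per entry where known.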


##### Mode 3: Upload & Deploy via Nextcloud

Upload local files or folders to a WebDAV/Nextcloud instance and automatically deploy to DBpedia Databus.
Rclone is required.

```bash
databusclient deploy \
--webdav-url https://cloud.example.com/remote.php/webdav \
--remote nextcloud \
--path datasets/mydataset \
--version-id https://databus.org/user/dataset/version/1.0 \
--title "Test Dataset" \
--abstract "Short abstract of dataset" \
--description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY" \
./localfile1.ttl \
./data_folder
```


#### Authentication with vault

@@ -221,8 +295,8 @@ If using vault authentication, make sure the token file is available in the cont
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23/fusion_props=all_subjectns=commons-wikimedia-org_vocab=all.ttl.gz --token vault-token.dat
```

## Module Usage

## Module Usage
### Step 1: Create lists of distributions for the dataset

```python
74 changes: 64 additions & 10 deletions databusclient/cli.py
@@ -1,8 +1,12 @@
#!/usr/bin/env python3
import json
import os

import click
from typing import List
from databusclient import client

from nextcloudclient import upload

@click.group()
def app():
@@ -22,18 +26,68 @@ def app():
@click.option("--description", required=True, help="Dataset description")
@click.option("--license", "license_url", required=True, help="License (see dalicc.net)")
@click.option("--apikey", required=True, help="API key")
@click.argument(
"distributions",
nargs=-1,
required=True,
)
def deploy(version_id, title, abstract, description, license_url, apikey, distributions: List[str]):

@click.option("--metadata", "metadata_file", type=click.Path(exists=True),
help="Path to metadata JSON file (for metadata mode)")
@click.option("--webdav-url", "webdav_url", help="WebDAV URL (e.g., https://cloud.example.com/remote.php/webdav)")
@click.option("--remote", help="rclone remote name (e.g., 'nextcloud')")
@click.option("--path", help="Remote path on Nextcloud (e.g., 'datasets/mydataset')")

@click.argument("distributions", nargs=-1)
def deploy(version_id, title, abstract, description, license_url, apikey,
metadata_file, webdav_url, remote, path, distributions: List[str]):
"""
Deploy a dataset version with the provided metadata and distributions.
Flexible deploy to Databus command supporting three modes:\n
- Classic deploy (distributions as arguments)\n
- Metadata-based deploy (--metadata <file>)\n
- Upload & deploy via Nextcloud (--webdav-url, --remote, --path)
"""
click.echo(f"Deploying dataset version: {version_id}")
dataid = client.create_dataset(version_id, title, abstract, description, license_url, distributions)
client.deploy(dataid=dataid, api_key=apikey)

# Sanity checks for conflicting options
if metadata_file and any([distributions, webdav_url, remote, path]):
raise click.UsageError("Invalid combination: when using --metadata, do not provide --webdav-url, --remote, --path, or distributions.")
if any([webdav_url, remote, path]) and not all([webdav_url, remote, path]):
raise click.UsageError("Invalid combination: when using WebDAV/Nextcloud mode, please provide --webdav-url, --remote, and --path together.")

# === Mode 1: Classic Deploy ===
if distributions and not (metadata_file or webdav_url or remote or path):
click.echo("[MODE] Classic deploy with distributions")
click.echo(f"Deploying dataset version: {version_id}")

dataid = client.create_dataset(version_id, title, abstract, description, license_url, distributions)
client.deploy(dataid=dataid, api_key=apikey)
return

# === Mode 2: Metadata File ===
if metadata_file:
click.echo(f"[MODE] Deploy from metadata file: {metadata_file}")
with open(metadata_file, 'r') as f:
metadata = json.load(f)
client.deploy_from_metadata(metadata, version_id, title, abstract, description, license_url, apikey)
return

# === Mode 3: Upload & Deploy (Nextcloud) ===
if webdav_url and remote and path:
if not distributions:
raise click.UsageError("Please provide files to upload when using WebDAV/Nextcloud mode.")

        # Check that all given input paths exist (files or directories).
invalid = [f for f in distributions if not os.path.exists(f)]
if invalid:
raise click.UsageError(f"The following input files or folders do not exist: {', '.join(invalid)}")

click.echo("[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")
click.echo(f"→ Uploading to: {remote}:{path}")
metadata = upload.upload_to_nextcloud(distributions, remote, path, webdav_url)
client.deploy_from_metadata(metadata, version_id, title, abstract, description, license_url, apikey)
return

raise click.UsageError(
"No valid input provided. Please use one of the following modes:\n"
" - Classic deploy: pass distributions as arguments\n"
" - Metadata deploy: use --metadata <file>\n"
" - Upload & deploy: use --webdav-url, --remote, --path, and file arguments"
)


@app.command()
Expand Down
100 changes: 99 additions & 1 deletion databusclient/client.py
@@ -7,7 +7,6 @@
from SPARQLWrapper import SPARQLWrapper, JSON
from hashlib import sha256
import os
import re

__debug = False

@@ -205,6 +204,56 @@ def create_distribution(

return f"{url}|{meta_string}"

def create_distributions_from_metadata(metadata: List[Dict[str, Union[str, int]]]) -> List[str]:
"""
Create distributions from metadata entries.

Parameters
----------
metadata : List[Dict[str, Union[str, int]]]
List of metadata entries, each containing:
- checksum: str - SHA-256 hex digest (64 characters)
- size: int - File size in bytes (positive integer)
- url: str - Download URL for the file
- file_format: str - File format of the file [optional]
- compression: str - Compression format of the file [optional]

Returns
-------
List[str]
List of distribution identifier strings for use with create_dataset
"""
distributions = []
counter = 0

for entry in metadata:
# Validate required keys
required_keys = ["checksum", "size", "url"]
missing_keys = [key for key in required_keys if key not in entry]
if missing_keys:
raise ValueError(f"Metadata entry missing required keys: {missing_keys}")

checksum = entry["checksum"]
size = entry["size"]
url = entry["url"]
if not isinstance(size, int) or size <= 0:
raise ValueError(f"Invalid size for {url}: expected positive integer, got {size}")
# Validate SHA-256 hex digest (64 hex chars)
if not isinstance(checksum, str) or len(checksum) != 64 or not all(
c in '0123456789abcdefABCDEF' for c in checksum):
raise ValueError(f"Invalid checksum for {url}")

distributions.append(
create_distribution(
url=url,
cvs={"count": f"{counter}"},
file_format=entry.get("file_format"),
compression=entry.get("compression"),
sha256_length_tuple=(checksum, size)
)
)
counter += 1
return distributions
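
The checksum and size rules enforced above can be exercised in isolation. A simplified re-implementation for illustration (not the library code itself):

```python
import string

def validate_entry(entry: dict) -> None:
    """Check one metadata entry the way create_distributions_from_metadata does."""
    # All three keys are mandatory; file_format/compression stay optional.
    for key in ("checksum", "size", "url"):
        if key not in entry:
            raise ValueError(f"missing required key: {key}")
    size = entry["size"]
    if not isinstance(size, int) or size <= 0:
        raise ValueError(f"invalid size: {size!r}")
    checksum = entry["checksum"]
    # A SHA-256 hex digest is exactly 64 hexadecimal characters.
    if not (isinstance(checksum, str) and len(checksum) == 64
            and all(c in string.hexdigits for c in checksum)):
        raise ValueError(f"invalid sha256 checksum: {checksum!r}")
```
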

def create_dataset(
version_id: str,
@@ -393,6 +442,55 @@ def deploy(
print(resp.text)


def deploy_from_metadata(
metadata: List[Dict[str, Union[str, int]]],
version_id: str,
title: str,
abstract: str,
description: str,
license_url: str,
apikey: str
) -> None:
"""
Deploy a dataset from metadata entries.

Parameters
----------
metadata : List[Dict[str, Union[str, int]]]
List of file metadata entries (see create_distributions_from_metadata)
version_id : str
Dataset version ID in the form $DATABUS_BASE/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION
title : str
Dataset title
abstract : str
Short description of the dataset
description : str
Long description (Markdown supported)
license_url : str
License URI
apikey : str
API key for authentication
"""
distributions = create_distributions_from_metadata(metadata)

dataset = create_dataset(
version_id=version_id,
title=title,
abstract=abstract,
description=description,
license_url=license_url,
distributions=distributions
)

print(f"Deploying dataset version: {version_id}")
deploy(dataset, apikey)

print(f"Successfully deployed to {version_id}")
print(f"Deployed {len(metadata)} file(s):")
for entry in metadata:
print(f" - {entry['url']}")


def __download_file__(url, filename, vault_token_file=None, auth_url=None, client_id=None) -> None:
"""
Download a file from the internet with a progress bar using tqdm.
Empty file added nextcloudclient/__init__.py
Empty file.