feat Nextcloudclient #29

gg46ixav · 2025-10-28T09:32:20Z

Summary by CodeRabbit

New Features
- Deploy now supports three modes: Classic, Metadata-file (JSON), and Nextcloud WebDAV upload.
- Nextcloud mode uploads files/directories, computes per-file SHA‑256 checksums and sizes, and returns remote URLs for deployment.
Improvements
- New CLI flags for mode-driven workflows with validation to prevent conflicting options.
- README updated with mode-specific usage, examples, metadata format, and authentication notes.

# Conflicts: # README.md # databusclient/client.py # poetry.lock # pyproject.toml # test.sh

…lient # Conflicts: # README.md # poetry.lock # pyproject.toml

gg46ixav · 2025-10-28T10:15:40Z

To review:

changed the dataset context url in databusclient/client.py
added "Deploy to Databus" and "Upload to Nextcloud and Deploy to Databus" to README.md

coderabbitai · 2025-10-28T15:39:29Z

📝 Walkthrough

Walkthrough

Adds a Nextcloud upload module, metadata-driven distribution creation and deploy helpers, and refactors the CLI and README to support three deployment modes: classic distributions, metadata-file deploy, and Nextcloud (WebDAV/rclone) upload → deploy.

Changes

Cohort / File(s)	Change Summary
Documentation `README.md`	Rewrote deploy docs to describe three modes (Classic, Metadata-file, Nextcloud/WebDAV), added mode-specific flags (`--metadata`, `--webdav-url`, `--remote`, `--path`), JSON metadata example, rclone/WebDAV guidance, and updated usage/examples.
Client library `databusclient/client.py`	Added `create_distributions_from_metadata(metadata)` for metadata validation and distribution creation; added `deploy_from_metadata(...)` (appears duplicated); removed unused `re` import.
CLI `databusclient/cli.py`	Expanded `deploy(...)` signature to accept `metadata_file`, `webdav_url`, `remote`, `path`; added imports (`json`, `os`, `nextcloudclient.upload`); implemented mode selection/validation and wiring for: classic distributions, metadata-file deploy, and Nextcloud upload → deploy.
Nextcloud upload module `nextcloudclient/upload.py`	New module with `compute_sha256_and_length(filepath)`, `get_all_files(path)`, and `upload_to_nextcloud(source_paths, remote_name, remote_path, webdav_url)` which enumerates files, computes per-file (filename, checksum, size, url), runs rclone subprocesses (`copy` / `copyto`), prints progress, and returns a flat metadata list.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant CLI as databusclient.cli.deploy()
    participant Upload as nextcloudclient.upload.upload_to_nextcloud()
    participant Rclone as rclone (subprocess)
    participant Client as databusclient.client

    rect rgb(220,240,255)
    Note over User,CLI: Mode — Nextcloud Upload & Deploy
    User->>CLI: --webdav-url, --remote, --path + inputs
    CLI->>Upload: source_paths, remote, path, webdav_url
    loop per source path
        Upload->>Upload: enumerate files, compute_sha256_and_length()
        Upload->>Rclone: rclone copy / copyto (subprocess)
        Rclone-->>Upload: success / error
    end
    Upload-->>CLI: list of (filename, checksum, size, url)
    CLI->>Client: deploy_from_metadata(metadata, version_id, ...)
    Client-->>CLI: deployment result
    CLI-->>User: summary
    end

sequenceDiagram
    participant User as User/CLI
    participant CLI as databusclient.cli.deploy()
    participant Client as databusclient.client

    rect rgb(220,255,230)
    Note over User,CLI: Mode — Metadata-file Deploy
    User->>CLI: --metadata metadata.json
    CLI->>CLI: load JSON metadata
    CLI->>Client: deploy_from_metadata(metadata, ...)
    Client-->>CLI: deployment result
    CLI-->>User: summary
    end

    rect rgb(255,245,220)
    Note over User,CLI: Mode — Classic Distributions
    User->>CLI: distributions (positional)
    CLI->>Client: create_dataset(distributions, ...)
    Client-->>CLI: dataset_id
    CLI->>Client: deploy(dataset_id, ...)
    Client-->>CLI: deployment result
    CLI-->>User: summary
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Inspect and remove the duplicate deploy_from_metadata() in databusclient/client.py.
Review metadata validation (required keys, checksum length/format, size positivity) in create_distributions_from_metadata().
Audit rclone subprocess construction/escaping in nextcloudclient/upload.py for injection and error handling; verify correct use of absolute paths and return behavior.
Verify CLI mode-selection logic, mutual-exclusion checks, and user-facing error messages in databusclient/cli.py.
Review README changes for accuracy with the implemented CLI flags and modes.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'feat Nextcloudclient' is vague and generic, using generic prefixes without clearly conveying the main change or purpose of the PR.	Provide a more descriptive title that clearly summarizes the main change, such as 'Add Nextcloud upload and metadata-based deployment modes' or 'Support multi-mode deployment: Classic, Metadata-based, and Nextcloud.'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch nextcloudclient

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

nextcloudclient/upload.py (1)

38-66: Good: append results only after successful upload.

This fixes the earlier issue where failed uploads polluted metadata. Nice improvement.

🧹 Nitpick comments (5)

nextcloudclient/upload.py (5)
1-5: Add imports for URL building and typing.

Required by the path/URL fixes and signature broadening.

Apply this diff:
 import hashlib
 import os
 import subprocess
 import posixpath
+from urllib.parse import urljoin, quote
+from typing import Iterable
6-16: Add typing and use a larger chunk size for throughput.

Return typing improves API clarity; 1 MiB chunks reduce syscall overhead.

Apply this diff:
-def compute_sha256_and_length(filepath):
+def compute_sha256_and_length(filepath: str) -> tuple[str, int]:
     sha256 = hashlib.sha256()
     total_length = 0
-    with open(filepath, 'rb') as f:
+    with open(filepath, "rb") as f:
         while True:
-            chunk = f.read(4096)
+            chunk = f.read(1024 * 1024)
             if not chunk:
                 break
             sha256.update(chunk)
             total_length += len(chunk)
     return sha256.hexdigest(), total_length
18-25: Type hints and deterministic ordering.

Sorted filenames produce stable results; typing clarifies contract.

Apply this diff:
-def get_all_files(path):
+def get_all_files(path: str) -> list[str]:
     if os.path.isfile(path):
         return [path]
     files = []
-    for root, _, filenames in os.walk(path):
-        for name in filenames:
+    for root, _, filenames in os.walk(path):
+        for name in sorted(filenames):
             files.append(os.path.join(root, name))
     return files
27-27: Broaden input type and annotate return.

Iterable[str] is more flexible and callers remain compatible; document the tuple shape.

Apply this diff:
-def upload_to_nextcloud(source_paths: list[str], remote_name: str, remote_path: str, webdav_url: str):
+def upload_to_nextcloud(
+    source_paths: Iterable[str],
+    remote_name: str,
+    remote_path: str,
+    webdav_url: str,
+) -> list[tuple[str, str, int, str]]:
27-70: Optional: switch prints to logging and add docstrings/NamedTuple.

Improves library ergonomics and readability without behavior changes.

I can provide a small logging wrapper and a NamedTuple like:

class UploadedFile(NamedTuple): filename: str; checksum: str; size: int; url: str
Would you like me to draft it?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07359cc and 8047968.

📒 Files selected for processing (1)

nextcloudclient/upload.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.14.2)

nextcloudclient/upload.py

63-63: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build
GitHub Check: build
GitHub Check: build

🔇 Additional comments (1)

nextcloudclient/upload.py (1)

61-63: S603 note (FYI): subprocess input is safe here.

You pass args as a list and don’t use shell=True; low injection risk. Keep validating inputs upstream.

Do you want me to scan call sites to ensure user-provided values for remote_name/remote_path aren’t sourced from untrusted input?

nextcloudclient/upload.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

nextcloudclient/upload.py (2)

43-43: Minor: Add space after comma for PEP 8 compliance.

Apply this diff:

-            checksum,size  = compute_sha256_and_length(file)
+            checksum, size = compute_sha256_and_length(file)

59-65: Optional refactor: Eliminate duplicate destination assignment.

The destination variable is set identically in both branches (lines 61 and 64). You can simplify by computing it once before the conditional.

Apply this diff:

         dest_subpath = posixpath.join(remote_path.lstrip("/"), basename)
+        destination = f"{remote_name}:{dest_subpath}"
         if os.path.isdir(path):
-            destination = f"{remote_name}:{dest_subpath}"
             command = ["rclone", "copy", abs_path, destination, "--progress"]
         else:
-            destination = f"{remote_name}:{dest_subpath}"
             command = ["rclone", "copyto", abs_path, destination, "--progress"]

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8047968 and 0172450.

📒 Files selected for processing (1)

nextcloudclient/upload.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.14.2)

nextcloudclient/upload.py

69-69: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (4)

nextcloudclient/upload.py (4)

1-5: LGTM! Well-chosen imports.

All imports are from the standard library and are appropriate for the file's functionality—hashing, file operations, subprocess calls, and URL handling.

8-27: LGTM! Helper functions are well-implemented.

Both compute_sha256_and_length and get_all_files are straightforward and correct. The chunked reading approach is memory-efficient for large files, and the recursive directory handling is appropriate.

69-69: Static analysis S603 is a false positive here.

The subprocess warning flags potential execution of untrusted input. However, this is safe because:

The command is built as a list (not shell string), preventing injection.

rclone is hardcoded; user inputs are passed as arguments to rclone.

abs_path is validated and normalized; remote_name and remote_path are interpreted by rclone as configuration and paths, not executed as commands.

68-77: LGTM! Upload execution and error handling are well-structured.

The implementation correctly:

Accumulates file metadata in tmp_results before upload

Only extends the final result list upon successful upload (line 70)

Handles both subprocess errors and missing rclone binary

Continues to the next path on failure, allowing partial success

This properly addresses the previous review concern about including metadata for failed uploads.

databusclient/client.py

README.md

databusclient/metadata.json

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

databusclient/client.py (2)
208-279: Well-implemented metadata validation and distribution creation.

The function correctly validates all required fields, handles edge cases in filename parsing (including compound extensions like .tar.gz), and generates distributions with unique content variants using a sequential counter. The type hints and docstring are clear and comprehensive.

Optional: Consider custom exception classes to address static analysis hints.

The static analysis tool flags lines 233, 239, and 244 for using long error messages directly in ValueError. While not critical, you could optionally create a custom exception class to follow the Ruff TRY003 guideline:
class InvalidMetadataError(ValueError):
    """Raised when metadata validation fails."""
    pass
Then replace ValueError with InvalidMetadataError in the validation checks. However, this is purely a style improvement and can be deferred.

469-516: LGTM! Clean metadata-driven deployment workflow.

The function provides a clear, straightforward interface for deploying datasets from metadata entries. The implementation correctly chains create_distributions_from_metadata, create_dataset, and deploy, with appropriate progress messages.

Optional: Consider returning deployment information.

For programmatic usage, you might consider returning the version_id or dataset information instead of None, allowing callers to use the deployment result:
-) -> None:
+) -> str:
     """
     Deploy a dataset from metadata entries.
     
     ...
+    
+    Returns
+    -------
+    str
+        The deployed dataset version ID
     """
     distributions = create_distributions_from_metadata(metadata)
     
     dataset = create_dataset(
         version_id=version_id,
         title=title,
         abstract=abstract,
         description=description,
         license_url=license_url,
         distributions=distributions
     )
     
     print(f"Deploying dataset version: {version_id}")
     deploy(dataset, apikey)
     
     print(f"Successfully deployed to {version_id}")
     print(f"Deployed {len(metadata)} file(s):")
     for entry in metadata:
         print(f"  - {entry['filename']}")
+    
+    return version_id
This is optional and depends on whether you expect programmatic consumers to need the deployment result.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 529f2ae and 77dca5a.

📒 Files selected for processing (1)

databusclient/client.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

databusclient/client.py (1)

databusclient/cli.py (1)

deploy (33-39)

🪛 Ruff (0.14.2)

databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)

239-239: Avoid specifying long messages outside the exception class

(TRY003)

244-244: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (1)

databusclient/client.py (1)

760-760: Good catch on the typo.

gg46ixav · 2025-11-03T14:45:46Z

I removed the deploy.py and added a upload_and_deploy and a deploy_with_metadata function to cli.py.
Furthermore I refactored some code and added help functions for these two functions to client.py.
Matching those changes I updated the README.md.
I also changed the context url back to the initial one.

Integer-Ctrl

I thought you were going to add the functionality to the existing databusclient deploy CLI command, but now we have three separate commands instead. Could you please merge the deploy-related options into a single command? Or reasons why not?

From what I can see, the following options are common across all deploy commands:
title, abstract, description, license, and apikey. You could check that the appropriate additional options for each deployment mode are provided, and not more or less, depending on the chosen deploy method/way.

The key requirement is that the existing deploy command remains correct.

Also, since the project will likely be refactored at some point, it would be great to start adding at least some tests. Would you be willing to include a few basic pytest tests for the new functionality? I understand that testing the current methods can be tricky, so feel free to decide whether you want to do this now or leave it.

TL;DR:
Open points:

What about metadata.json?
Merge deploy cli commands
Tests or no tests?

databusclient/metadata.json

…e single deploy command - added simple validate_distributions function

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

README.md (1)
204-210: Add language specifier to code blocks.

The fenced code blocks at lines 204 and 208 are missing language specifiers. This affects syntax highlighting and accessibility.

Apply this diff to add bash language specifiers:
-```
+```bash
 databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  
- +bash
databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
As per static analysis hints
databusclient/client.py (1)
208-279: Consider moving COMPRESSION_EXTS to module level.

The COMPRESSION_EXTS set is defined inside the loop (line 246), which means it's recreated for every metadata entry. While the performance impact is negligible for typical use cases, moving it to a module-level constant would be more idiomatic.

Define at module level (e.g., after line 11):
_COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip", "7z", "tar", "lz", "zst"}
Then reference it in the function:
-        # Known compression extensions
-        COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip", "7z", "tar", "lz", "zst"}
-
         parts = filename.split(".")
         if len(parts) == 1:
             file_format = "none"
             compression = "none"
         elif len(parts) == 2:
             file_format = parts[-1]
             compression = "none"
         else:
             # Check if last part is a known compression
 
-            if parts[-1] in COMPRESSION_EXTS:
+            if parts[-1] in _COMPRESSION_EXTS:
                 compression = parts[-1]
                 # Handle compound extensions like .tar.gz
-                if len(parts) > 2 and parts[-2] in COMPRESSION_EXTS:
+                if len(parts) > 2 and parts[-2] in _COMPRESSION_EXTS:
                     file_format = parts[-3] if len(parts) > 3 else "file"
Otherwise, the implementation correctly handles metadata validation, extension inference, and compound extensions like .tar.gz.
databusclient/cli.py (1)
47-61: Remove unnecessary f-string prefixes.

Lines 57, 81, and 86 have f-string prefixes but contain no placeholders. This is flagged by static analysis and should be cleaned up.

Apply this diff:
-        click.echo(f"[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")
+        click.echo("[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")
Repeat for lines 81 and 86.

As per static analysis hints

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77dca5a and 04c0b6e.

📒 Files selected for processing (3)

README.md (3 hunks)
databusclient/cli.py (2 hunks)
databusclient/client.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

databusclient/client.py (1)

databusclient/cli.py (1)

deploy (38-91)

databusclient/cli.py (2)

databusclient/client.py (4)

deploy (444-485)

deploy_from_metadata (488-534)

validate_distributions (282-298)

create_dataset (301-441)

nextcloudclient/upload.py (1)

upload_to_nextcloud (29-82)

🪛 markdownlint-cli2 (0.18.1)

README.md

204-204: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

208-208: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.3)

databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)

239-239: Avoid specifying long messages outside the exception class

(TRY003)

244-244: Avoid specifying long messages outside the exception class

(TRY003)

databusclient/cli.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)

55-55: Avoid specifying long messages outside the exception class

(TRY003)

57-57: f-string without any placeholders

Remove extraneous f prefix

(F541)

76-76: f-string without any placeholders

Remove extraneous f prefix

(F541)

81-81: f-string without any placeholders

Remove extraneous f prefix

(F541)

86-91: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (3)

databusclient/client.py (1)

488-534: LGTM!

The function correctly orchestrates metadata-based deployment with clear type hints, comprehensive documentation, and user-friendly output. The success message appropriately shows the deployed version and file list.

databusclient/cli.py (2)

63-91: LGTM! Mode detection logic is well-structured.

The three-mode deployment strategy is correctly implemented:

Mode 1 (Nextcloud): Requires all three parameters (webdav_url, remote, path) and validates input paths before upload

Mode 2 (Metadata): Loads metadata from file and deploys

Mode 3 (Classic): Validates distribution format before deployment

The mode priority ordering is logical (most specific to most general), validation occurs before any operations, and the fallback error message clearly guides users. Integration with client module functions is correct.

4-4: Remove the unused re import on line 4.

The re module is imported but never used anywhere in this file. Removing it will clean up the imports.

databusclient/client.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

databusclient/client.py (1)
233-244: Consider custom exception class to address style warnings.

The static analysis tool flags long exception messages at lines 233, 239, and 244. While the current approach provides good debugging context, you could optionally address this by creating a custom exception class:
class InvalidMetadataError(ValueError):
    """Raised when metadata validation fails."""
    
    def __init__(self, field: str, filename: str = None, details: str = None):
        self.field = field
        self.filename = filename
        msg = f"Invalid {field}"
        if filename:
            msg += f" for {filename}"
        if details:
            msg += f": {details}"
        super().__init__(msg)
Then replace the ValueError raises with InvalidMetadataError.

Based on learnings (past review comments suggest this pattern was discussed previously).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04c0b6e and 8e6167b.

📒 Files selected for processing (1)

databusclient/client.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

databusclient/client.py (1)

databusclient/cli.py (1)

deploy (38-91)

🪛 Ruff (0.14.3)

databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)

239-239: Avoid specifying long messages outside the exception class

(TRY003)

244-244: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (3)

databusclient/client.py (3)

282-299: LGTM - Basic validation is appropriate here.

The regex pattern provides a reasonable sanity check for distribution strings in the classic deploy mode. It ensures:

Valid HTTP/HTTPS URL prefix

At least one metadata field after the pipe separator

More detailed validation (CV structure, format, compression, checksum format) occurs in downstream functions like __get_file_info and create_distribution, which is the appropriate place for such checks.

488-535: LGTM - Well-structured orchestration function.

The function properly orchestrates the metadata-based deployment workflow:

Creates distributions with validation (via create_distributions_from_metadata)

Builds the dataset object

Deploys to the Databus

Provides clear user feedback with the version ID and file list

The error handling is appropriate—any failures in the called functions will propagate as exceptions, which is the expected behavior for CLI operations.

779-779: LGTM - Typo fix in comment.

Good catch fixing the typo "fwith" → "with" in the comment.

databusclient/client.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

databusclient/client.py (1)
259-268: Critical: Compound extension parsing still produces incorrect file_format.

Despite being marked as addressed in previous reviews, the compound extension logic still has the original bug. For archive.tar.gz:

parts = ["archive", "tar", "gz"] (length 3)

Line 263 evaluates parts[-3] if len(parts) > 3 else "file"

Since len(parts) == 3, this returns "file" instead of "tar"

Apply this diff:
             if parts[-1] in COMPRESSION_EXTS:
                 compression = parts[-1]
                 # Handle compound extensions like .tar.gz
                 if len(parts) > 2 and parts[-2] in COMPRESSION_EXTS:
-                    file_format = parts[-3] if len(parts) > 3 else "file"
+                    file_format = parts[-2]
                 else:
                     file_format = parts[-2]
This ensures .tar.gz correctly produces file_format="tar" and compression="gz".

🧹 Nitpick comments (4)

databusclient/client.py (1)
780-780: Fix typo in comment.

The comment contains a typo: "fwith" should be "with".

Apply this diff:
-    parts += [None] * (6 - len(parts))  # pad fwith None if less than 6 parts
+    parts += [None] * (6 - len(parts))  # pad with None if less than 6 parts
databusclient/cli.py (3)
31-61: Consider validating partial Nextcloud options to prevent confusion.

The current logic only activates Nextcloud mode when ALL three flags (--webdav-url, --remote, --path) are provided. If a user provides only some flags (e.g., --webdav-url without --path), the code silently falls through to other modes, which could be confusing.

Consider adding explicit validation after the options are parsed:
+    # Validate Nextcloud options
+    nextcloud_flags = [webdav_url, remote, path]
+    nextcloud_provided = [f for f in nextcloud_flags if f is not None]
+    if nextcloud_provided and len(nextcloud_provided) < 3:
+        raise click.UsageError(
+            "Nextcloud mode requires all three options: --webdav-url, --remote, and --path"
+        )
+
     # === Mode 1: Upload & Deploy (Nextcloud) ===
     if webdav_url and remote and path:
This provides clearer error messages when users partially specify Nextcloud options.

52-52: Fix comment formatting.

The comment has extra hash marks at the end.

Apply this diff:
-        #Check that all given paths exist and are files or directories.#
+        # Check that all given paths exist and are files or directories.
76-76: Remove unnecessary f-string prefix.

Line 76 uses an f-string but contains no placeholders.

Apply this diff:
-            raise click.UsageError(
-                f"The following distributions are not in a valid format:\n"
-                + "\n".join(invalid)
+            raise click.UsageError(
+                "The following distributions are not in a valid format:\n"
+                + "\n".join(invalid)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e6167b and 1274cbc.

📒 Files selected for processing (2)

databusclient/cli.py (2 hunks)
databusclient/client.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

databusclient/client.py (1)

databusclient/cli.py (1)

deploy (38-91)

databusclient/cli.py (2)

databusclient/client.py (4)

deploy (445-486)

deploy_from_metadata (489-535)

validate_distributions (283-299)

create_dataset (302-442)

nextcloudclient/upload.py (1)

upload_to_nextcloud (29-82)

🪛 Ruff (0.14.3)

databusclient/client.py

237-237: Avoid specifying long messages outside the exception class

(TRY003)

243-243: Avoid specifying long messages outside the exception class

(TRY003)

248-248: Avoid specifying long messages outside the exception class

(TRY003)

databusclient/cli.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)

55-55: Avoid specifying long messages outside the exception class

(TRY003)

76-76: f-string without any placeholders

Remove extraneous f prefix

(F541)

86-91: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (3)

databusclient/client.py (2)

283-300: LGTM!

The validation function is well-documented and correctly validates distribution strings against the expected URL-based pattern.

489-536: LGTM!

The function is well-documented with proper type hints and provides clear user feedback with the deployment summary.

databusclient/cli.py (1)

38-91: Well-structured multi-mode deployment flow!

The three-mode design (Nextcloud, metadata, classic) provides good flexibility. The error handling at the end clearly guides users on which mode to use.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

databusclient/client.py (1)

208-257: Well-implemented metadata-to-distributions converter.

The function correctly validates all required metadata fields (keys, size, checksum format) and creates distributions with sequential counter-based content variants. The validation logic is thorough, and letting create_distribution infer format/compression from URLs is a sensible design choice.

Note: Static analysis flags TRY003 about long error messages at lines 234, 240, 245. Creating custom exception classes (e.g., InvalidMetadataError) would follow Python style guidelines but is optional given the clear error messages.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1274cbc and 02481b3.

📒 Files selected for processing (1)

databusclient/client.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

databusclient/client.py (1)

databusclient/cli.py (1)

deploy (38-91)

🪛 Ruff (0.14.3)

databusclient/client.py

234-234: Avoid specifying long messages outside the exception class

(TRY003)

240-240: Avoid specifying long messages outside the exception class

(TRY003)

245-245: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (2)

databusclient/client.py (2)

260-276: LGTM: Basic validation is appropriate for this use case.

The function provides a quick sanity check for distribution format (URL followed by pipe and metadata). The intentionally basic validation catches obvious formatting errors early, while detailed validation happens later in create_dataset via __get_file_info. This two-tier approach balances early error detection with comprehensive validation.

466-512: LGTM: Clean orchestration of metadata-driven deployment.

The function properly orchestrates the deployment workflow (create distributions → create dataset → deploy) with clear type hints and documentation. The success message provides helpful feedback by listing deployed filenames. Good integration with the multi-mode CLI design.

databusclient/client.py

gg46ixav · 2025-11-04T14:42:34Z

I added a very simple validate_distributions function in client.py to validate the given distributions when calling the classic deploy command.

gg46ixav · 2025-11-04T14:51:54Z

I can add some tests. Are there specific things I definitely need to test? Otherwise I would implement what I think is useful/appropriate. And should I use a new branch for that?

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

README.md (1)

204-211: Add language specifiers to code blocks.

The fenced code blocks lack language specifiers, which reduces readability and prevents syntax highlighting.

Apply this diff:

-```
+```bash
 databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  
-```
+```bash

-```
+```bash
 databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  
-```
+```bash

databusclient/cli.py (1)

74-74: Fix comment formatting.

The comment has inconsistent spacing around the hash marks.
-        #Check that all given paths exist and are files or directories.#
+        # Check that all given paths exist and are files or directories.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02481b3 and 274f252.

📒 Files selected for processing (3)

README.md (3 hunks)
databusclient/cli.py (2 hunks)
databusclient/client.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

databusclient/client.py (1)

databusclient/cli.py (1)

deploy (37-90)

databusclient/cli.py (2)

databusclient/client.py (3)

deploy (401-442)

create_dataset (258-398)

deploy_from_metadata (445-491)

nextcloudclient/upload.py (1)

upload_to_nextcloud (29-82)

🪛 markdownlint-cli2 (0.18.1)

README.md

204-204: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

208-208: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.3)

databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)

239-239: Avoid specifying long messages outside the exception class

(TRY003)

244-244: Avoid specifying long messages outside the exception class

(TRY003)

databusclient/cli.py

48-48: Avoid specifying long messages outside the exception class

(TRY003)

50-50: Avoid specifying long messages outside the exception class

(TRY003)

72-72: Avoid specifying long messages outside the exception class

(TRY003)

77-77: Avoid specifying long messages outside the exception class

(TRY003)

85-90: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (3)

databusclient/client.py (2)

207-256: LGTM! Well-implemented metadata processing with comprehensive validation.

The function properly validates all required metadata fields including:

Required key presence

Positive integer size validation

SHA-256 hex checksum validation (64 chars, hex format)

Optional file_format and compression support

The implementation correctly constructs distribution identifiers with a counter-based content variant to ensure uniqueness.

Note: The static analysis TRY003 warnings about long error messages are style preferences and can be safely ignored—the detailed error messages aid debugging.

445-492: LGTM! Clean deployment orchestration.

The function effectively coordinates the metadata-to-deployment workflow:

Creates distributions from metadata

Builds the dataset

Deploys via the existing deploy function

Provides clear user feedback with version ID and file list

databusclient/cli.py (1)

37-90: Well-structured multi-mode deployment with clear separation of concerns.

The refactored deploy function effectively handles three distinct modes:

Mode 1 (Classic): Uses traditional distribution arguments—maintains backward compatibility.

Mode 2 (Metadata): Loads JSON metadata and delegates to deploy_from_metadata.

Mode 3 (Nextcloud): Uploads files via rclone, generates metadata, then deploys.

The conflict detection and validation logic correctly prevents invalid option combinations.

Minor: Line 74 has inconsistent comment formatting (#Check that...# should be # Check that...), but this is a trivial formatting issue.

Integer-Ctrl · 2025-11-05T10:21:11Z

I added a very simple validate_distributions function in client.py to validate the given distributions when calling the classic deploy command.

Not needed.

I can add some tests. Are there specific things I definitely need to test? Otherwise I would implement what I think is useful/appropriate. And should I use a new branch for that?

I don't have time right now to think about meaningful tests. It's not a must-have, but if you want to work on it, doing so in a new branch would be best. (Tests are important, but the whole project isn't well tested anyway..)

If there’s anything else you’d like to change, please comment. Otherwise, if your local tests of deploy work fine, just let me know and I’ll merge the PR.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

databusclient/client.py (1)
445-492: Prefer logging filenames when present.

When running the Nextcloud flow the metadata already carries filename; printing only the URL makes the summary harder to scan. Consider falling back to the URL only if no filename is available.
-    for entry in metadata:
-        print(f"  - {entry['url']}")
+    for entry in metadata:
+        label = entry.get("filename") or entry["url"]
+        print(f"  - {label}")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274f252 and f22c71d.

📒 Files selected for processing (1)

databusclient/client.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

databusclient/client.py (1)

databusclient/cli.py (1)

deploy (37-90)

🪛 Ruff (0.14.3)

databusclient/client.py

234-234: Avoid specifying long messages outside the exception class

(TRY003)

240-240: Avoid specifying long messages outside the exception class

(TRY003)

244-244: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: build

🔇 Additional comments (1)

databusclient/client.py (1)

207-256: Robust metadata validation looks great.

Centralizing the checksum/size checks here will save CLI callers from puzzling deploy failures later on.

gg46ixav · 2025-11-05T11:59:24Z

I added a very simple validate_distributions function in client.py to validate the given distributions when calling the classic deploy command.

Not needed.

I can add some tests. Are there specific things I definitely need to test? Otherwise I would implement what I think is useful/appropriate. And should I use a new branch for that?

I don't have time right now to think about meaningful tests. It's not a must-have, but if you want to work on it, doing so in a new branch would be best. (Tests are important, but the whole project isn't well tested anyway..)

If there’s anything else you’d like to change, please comment. Otherwise, if your local tests of deploy work fine, just let me know and I’ll merge the PR.

looking good. for me locally everything is working fine

gg46ixav added 9 commits July 3, 2025 15:34

added deploy script with uploading to given rclone remote

3f73738

added webdav-url argument

9edc0dc

added deploying to the databus without upload to nextcloud

a56f01d

Merge branch 'download-capabilities' into nextcloudclient

5fdf78b

updated pyproject.toml and content-hash

800256c

Merge branch 'main' into nextcloudclient

66f1c8e

Merge remote-tracking branch 'origin/main' into nextcloudclient

4259229

# Conflicts: # README.md # databusclient/client.py # poetry.lock # pyproject.toml # test.sh

updated README.md

b179f90

Merge remote-tracking branch 'origin/nextcloudclient' into nextcloudc…

a504b9d

…lient # Conflicts: # README.md # poetry.lock # pyproject.toml

gg46ixav marked this pull request as ready for review October 28, 2025 10:15

gg46ixav requested a review from Integer-Ctrl October 28, 2025 10:15

This comment was marked as resolved.

Sign in to view

gg46ixav added 5 commits October 28, 2025 16:00

added checksum validation

0ce0c24

updated upload_to_nextcloud function to accept list of source_paths

6596cbc

only add result if upload successful

b9f9854

use os.path.basename instead of .split("/")[-1]

2f8493d

added __init__.py and updated README.md

07359cc

This comment was marked as resolved.

Sign in to view

changed append to extend (no nested list)

8047968

dbpedia deleted a comment from coderabbitai bot Oct 28, 2025

coderabbitai bot reviewed Oct 28, 2025

View reviewed changes

nextcloudclient/upload.py Show resolved Hide resolved

nextcloudclient/upload.py Show resolved Hide resolved

nextcloudclient/upload.py Outdated Show resolved Hide resolved

fixed windows separators and added rclone error message

0172450

coderabbitai bot reviewed Oct 28, 2025

View reviewed changes

Integer-Ctrl requested changes Oct 30, 2025

View reviewed changes

databusclient/client.py Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

README.md Show resolved Hide resolved

Integer-Ctrl requested changes Oct 30, 2025

View reviewed changes

databusclient/metadata.json Outdated Show resolved Hide resolved

gg46ixav added 3 commits November 3, 2025 13:26

moved deploy.py to cli upload_and_deploy

f957512

changed metadata to dict list

607f527

removed python-dotenv

6cb7e11

coderabbitai bot reviewed Nov 3, 2025

View reviewed changes

gg46ixav requested a review from Integer-Ctrl November 3, 2025 14:46

Integer-Ctrl requested changes Nov 3, 2025

View reviewed changes

databusclient/metadata.json Outdated Show resolved Hide resolved

gg46ixav added 2 commits November 4, 2025 14:35

- refactored deploy, upload_and_deploy and deploy_with_metadata to on…

02b1873

…e single deploy command - added simple validate_distributions function

updated README.md

04c0b6e

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

databusclient/client.py Outdated Show resolved Hide resolved

gg46ixav added 4 commits November 4, 2025 14:49

fixed docstring

fb93bc9

removed metadata.json

8e6167b

moved COMPRESSION_EXTS out of loop

943e30b

removed unnecessary f-strings

1274cbc

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

databusclient/client.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

set file_format and compression to None

02481b3

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

databusclient/client.py Outdated Show resolved Hide resolved

gg46ixav added 2 commits November 4, 2025 15:28

get file_format and compression from metadata file

a5ec24d

updated README.md

f95155f

gg46ixav requested a review from Integer-Ctrl November 4, 2025 14:43

Integer-Ctrl self-assigned this Nov 5, 2025

chores

274f252

coderabbitai bot reviewed Nov 5, 2025

View reviewed changes

updated metadata format (removed filename - used url instead)

f22c71d

coderabbitai bot reviewed Nov 5, 2025

View reviewed changes

Integer-Ctrl merged commit 37f9e1a into main Nov 5, 2025
3 checks passed

coderabbitai bot mentioned this pull request Nov 11, 2025

Remove .vscode config, BACKLOG.md and move upload.py #33

Open

gg46ixav deleted the nextcloudclient branch November 11, 2025 15:16

feat Nextcloudclient #29

feat Nextcloudclient #29

Uh oh!

Conversation

gg46ixav commented Oct 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

gg46ixav commented Oct 28, 2025

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

coderabbitai bot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

gg46ixav commented Nov 3, 2025

Uh oh!

Integer-Ctrl left a comment • edited by gg46ixav Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gg46ixav commented Nov 4, 2025

Uh oh!

gg46ixav commented Nov 4, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Integer-Ctrl commented Nov 5, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

gg46ixav commented Nov 5, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

gg46ixav commented Oct 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 28, 2025 •

edited

Loading

Integer-Ctrl left a comment •

edited by gg46ixav

Loading