Skip to content

Conversation

@gg46ixav
Copy link
Collaborator

@gg46ixav gg46ixav commented Oct 28, 2025

Summary by CodeRabbit

  • New Features

    • Deploy now supports three modes: Classic, Metadata-file (JSON), and Nextcloud WebDAV upload.
    • Nextcloud mode uploads files/directories, computes per-file SHA‑256 checksums and sizes, and returns remote URLs for deployment.
  • Improvements

    • New CLI flags for mode-driven workflows with validation to prevent conflicting options.
    • README updated with mode-specific usage, examples, metadata format, and authentication notes.

@gg46ixav
Copy link
Collaborator Author

To review:

  • changed the dataset context url in databusclient/client.py
  • added "Deploy to Databus" and "Upload to Nextcloud and Deploy to Databus" to README.md

@gg46ixav gg46ixav marked this pull request as ready for review October 28, 2025 10:15
@gg46ixav gg46ixav requested a review from Integer-Ctrl October 28, 2025 10:15
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@dbpedia dbpedia deleted a comment from coderabbitai bot Oct 28, 2025
@coderabbitai
Copy link

coderabbitai bot commented Oct 28, 2025

📝 Walkthrough

Walkthrough

Adds a Nextcloud upload module, metadata-driven distribution creation and deploy helpers, and refactors the CLI and README to support three deployment modes: classic distributions, metadata-file deploy, and Nextcloud (WebDAV/rclone) upload → deploy.

Changes

Cohort / File(s) Change Summary
Documentation
README.md
Rewrote deploy docs to describe three modes (Classic, Metadata-file, Nextcloud/WebDAV), added mode-specific flags (--metadata, --webdav-url, --remote, --path), JSON metadata example, rclone/WebDAV guidance, and updated usage/examples.
Client library
databusclient/client.py
Added create_distributions_from_metadata(metadata) for metadata validation and distribution creation; added deploy_from_metadata(...) (appears duplicated); removed unused re import.
CLI
databusclient/cli.py
Expanded deploy(...) signature to accept metadata_file, webdav_url, remote, path; added imports (json, os, nextcloudclient.upload); implemented mode selection/validation and wiring for: classic distributions, metadata-file deploy, and Nextcloud upload → deploy.
Nextcloud upload module
nextcloudclient/upload.py
New module with compute_sha256_and_length(filepath), get_all_files(path), and upload_to_nextcloud(source_paths, remote_name, remote_path, webdav_url) which enumerates files, computes per-file (filename, checksum, size, url), runs rclone subprocesses (copy / copyto), prints progress, and returns a flat metadata list.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant CLI as databusclient.cli.deploy()
    participant Upload as nextcloudclient.upload.upload_to_nextcloud()
    participant Rclone as rclone (subprocess)
    participant Client as databusclient.client

    rect rgb(220,240,255)
    Note over User,CLI: Mode — Nextcloud Upload & Deploy
    User->>CLI: --webdav-url, --remote, --path + inputs
    CLI->>Upload: source_paths, remote, path, webdav_url
    loop per source path
        Upload->>Upload: enumerate files, compute_sha256_and_length()
        Upload->>Rclone: rclone copy / copyto (subprocess)
        Rclone-->>Upload: success / error
    end
    Upload-->>CLI: list of (filename, checksum, size, url)
    CLI->>Client: deploy_from_metadata(metadata, version_id, ...)
    Client-->>CLI: deployment result
    CLI-->>User: summary
    end
Loading
sequenceDiagram
    participant User as User/CLI
    participant CLI as databusclient.cli.deploy()
    participant Client as databusclient.client

    rect rgb(220,255,230)
    Note over User,CLI: Mode — Metadata-file Deploy
    User->>CLI: --metadata metadata.json
    CLI->>CLI: load JSON metadata
    CLI->>Client: deploy_from_metadata(metadata, ...)
    Client-->>CLI: deployment result
    CLI-->>User: summary
    end

    rect rgb(255,245,220)
    Note over User,CLI: Mode — Classic Distributions
    User->>CLI: distributions (positional)
    CLI->>Client: create_dataset(distributions, ...)
    Client-->>CLI: dataset_id
    CLI->>Client: deploy(dataset_id, ...)
    Client-->>CLI: deployment result
    CLI-->>User: summary
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect and remove the duplicate deploy_from_metadata() in databusclient/client.py.
  • Review metadata validation (required keys, checksum length/format, size positivity) in create_distributions_from_metadata().
  • Audit rclone subprocess construction/escaping in nextcloudclient/upload.py for injection and error handling; verify correct use of absolute paths and return behavior.
  • Verify CLI mode-selection logic, mutual-exclusion checks, and user-facing error messages in databusclient/cli.py.
  • Review README changes for accuracy with the implemented CLI flags and modes.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'feat Nextcloudclient' is vague and generic, using generic prefixes without clearly conveying the main change or purpose of the PR. Provide a more descriptive title that clearly summarizes the main change, such as 'Add Nextcloud upload and metadata-based deployment modes' or 'Support multi-mode deployment: Classic, Metadata-based, and Nextcloud.'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch nextcloudclient

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
nextcloudclient/upload.py (1)

38-66: Good: append results only after successful upload.

This fixes the earlier issue where failed uploads polluted metadata. Nice improvement.

🧹 Nitpick comments (5)
nextcloudclient/upload.py (5)

1-5: Add imports for URL building and typing.

Required by the path/URL fixes and signature broadening.

Apply this diff:

 import hashlib
 import os
 import subprocess
 import posixpath
+from urllib.parse import urljoin, quote
+from typing import Iterable

6-16: Add typing and use a larger chunk size for throughput.

Return typing improves API clarity; 1 MiB chunks reduce syscall overhead.

Apply this diff:

-def compute_sha256_and_length(filepath):
+def compute_sha256_and_length(filepath: str) -> tuple[str, int]:
     sha256 = hashlib.sha256()
     total_length = 0
-    with open(filepath, 'rb') as f:
+    with open(filepath, "rb") as f:
         while True:
-            chunk = f.read(4096)
+            chunk = f.read(1024 * 1024)
             if not chunk:
                 break
             sha256.update(chunk)
             total_length += len(chunk)
     return sha256.hexdigest(), total_length

18-25: Type hints and deterministic ordering.

Sorted filenames produce stable results; typing clarifies contract.

Apply this diff:

-def get_all_files(path):
+def get_all_files(path: str) -> list[str]:
     if os.path.isfile(path):
         return [path]
     files = []
-    for root, _, filenames in os.walk(path):
-        for name in filenames:
+    for root, _, filenames in os.walk(path):
+        for name in sorted(filenames):
             files.append(os.path.join(root, name))
     return files

27-27: Broaden input type and annotate return.

Iterable[str] is more flexible and callers remain compatible; document the tuple shape.

Apply this diff:

-def upload_to_nextcloud(source_paths: list[str], remote_name: str, remote_path: str, webdav_url: str):
+def upload_to_nextcloud(
+    source_paths: Iterable[str],
+    remote_name: str,
+    remote_path: str,
+    webdav_url: str,
+) -> list[tuple[str, str, int, str]]:

27-70: Optional: switch prints to logging and add docstrings/NamedTuple.

Improves library ergonomics and readability without behavior changes.

I can provide a small logging wrapper and a NamedTuple like:

  • class UploadedFile(NamedTuple): filename: str; checksum: str; size: int; url: str
    Would you like me to draft it?
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07359cc and 8047968.

📒 Files selected for processing (1)
  • nextcloudclient/upload.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.2)
nextcloudclient/upload.py

63-63: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
nextcloudclient/upload.py (1)

61-63: S603 note (FYI): subprocess input is safe here.

You pass args as a list and don’t use shell=True; low injection risk. Keep validating inputs upstream.

Do you want me to scan call sites to ensure user-provided values for remote_name/remote_path aren’t sourced from untrusted input?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
nextcloudclient/upload.py (2)

43-43: Minor: Add space after comma for PEP 8 compliance.

Apply this diff:

-            checksum,size  = compute_sha256_and_length(file)
+            checksum, size = compute_sha256_and_length(file)

59-65: Optional refactor: Eliminate duplicate destination assignment.

The destination variable is set identically in both branches (lines 61 and 64). You can simplify by computing it once before the conditional.

Apply this diff:

         dest_subpath = posixpath.join(remote_path.lstrip("/"), basename)
+        destination = f"{remote_name}:{dest_subpath}"
         if os.path.isdir(path):
-            destination = f"{remote_name}:{dest_subpath}"
             command = ["rclone", "copy", abs_path, destination, "--progress"]
         else:
-            destination = f"{remote_name}:{dest_subpath}"
             command = ["rclone", "copyto", abs_path, destination, "--progress"]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8047968 and 0172450.

📒 Files selected for processing (1)
  • nextcloudclient/upload.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.2)
nextcloudclient/upload.py

69-69: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (4)
nextcloudclient/upload.py (4)

1-5: LGTM! Well-chosen imports.

All imports are from the standard library and are appropriate for the file's functionality—hashing, file operations, subprocess calls, and URL handling.


8-27: LGTM! Helper functions are well-implemented.

Both compute_sha256_and_length and get_all_files are straightforward and correct. The chunked reading approach is memory-efficient for large files, and the recursive directory handling is appropriate.


69-69: Static analysis S603 is a false positive here.

The subprocess warning flags potential execution of untrusted input. However, this is safe because:

  • The command is built as a list (not shell string), preventing injection.
  • rclone is hardcoded; user inputs are passed as arguments to rclone.
  • abs_path is validated and normalized; remote_name and remote_path are interpreted by rclone as configuration and paths, not executed as commands.

68-77: LGTM! Upload execution and error handling are well-structured.

The implementation correctly:

  • Accumulates file metadata in tmp_results before upload
  • Only extends the final result list upon successful upload (line 70)
  • Handles both subprocess errors and missing rclone binary
  • Continues to the next path on failure, allowing partial success

This properly addresses the previous review concern about including metadata for failed uploads.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
databusclient/client.py (2)

208-279: Well-implemented metadata validation and distribution creation.

The function correctly validates all required fields, handles edge cases in filename parsing (including compound extensions like .tar.gz), and generates distributions with unique content variants using a sequential counter. The type hints and docstring are clear and comprehensive.


Optional: Consider custom exception classes to address static analysis hints.

The static analysis tool flags lines 233, 239, and 244 for using long error messages directly in ValueError. While not critical, you could optionally create a custom exception class to follow the Ruff TRY003 guideline:

class InvalidMetadataError(ValueError):
    """Raised when metadata validation fails."""
    pass

Then replace ValueError with InvalidMetadataError in the validation checks. However, this is purely a style improvement and can be deferred.


469-516: LGTM! Clean metadata-driven deployment workflow.

The function provides a clear, straightforward interface for deploying datasets from metadata entries. The implementation correctly chains create_distributions_from_metadata, create_dataset, and deploy, with appropriate progress messages.


Optional: Consider returning deployment information.

For programmatic usage, you might consider returning the version_id or dataset information instead of None, allowing callers to use the deployment result:

-) -> None:
+) -> str:
     """
     Deploy a dataset from metadata entries.
     
     ...
+    
+    Returns
+    -------
+    str
+        The deployed dataset version ID
     """
     distributions = create_distributions_from_metadata(metadata)
     
     dataset = create_dataset(
         version_id=version_id,
         title=title,
         abstract=abstract,
         description=description,
         license_url=license_url,
         distributions=distributions
     )
     
     print(f"Deploying dataset version: {version_id}")
     deploy(dataset, apikey)
     
     print(f"Successfully deployed to {version_id}")
     print(f"Deployed {len(metadata)} file(s):")
     for entry in metadata:
         print(f"  - {entry['filename']}")
+    
+    return version_id

This is optional and depends on whether you expect programmatic consumers to need the deployment result.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 529f2ae and 77dca5a.

📒 Files selected for processing (1)
  • databusclient/client.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
  • deploy (33-39)
🪛 Ruff (0.14.2)
databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)


239-239: Avoid specifying long messages outside the exception class

(TRY003)


244-244: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
databusclient/client.py (1)

760-760: Good catch on the typo.

@gg46ixav
Copy link
Collaborator Author

gg46ixav commented Nov 3, 2025

I removed the deploy.py and added a upload_and_deploy and a deploy_with_metadata function to cli.py.
Furthermore I refactored some code and added help functions for these two functions to client.py.
Matching those changes I updated the README.md.
I also changed the context url back to the initial one.

@gg46ixav gg46ixav requested a review from Integer-Ctrl November 3, 2025 14:46
Copy link
Contributor

@Integer-Ctrl Integer-Ctrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you were going to add the functionality to the existing databusclient deploy CLI command, but now we have three separate commands instead. Could you please merge the deploy-related options into a single command? Or reasons why not?

From what I can see, the following options are common across all deploy commands:
title, abstract, description, license, and apikey. You could check that the appropriate additional options for each deployment mode are provided, and not more or less, depending on the chosen deploy method/way.

The key requirement is that the existing deploy command remains correct.

Also, since the project will likely be refactored at some point, it would be great to start adding at least some tests. Would you be willing to include a few basic pytest tests for the new functionality? I understand that testing the current methods can be tricky, so feel free to decide whether you want to do this now or leave it.

TL;DR:
Open points:

  • What about metadata.json?
  • Merge deploy cli commands
  • Tests or no tests?

…e single deploy command

- added simple validate_distributions function
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
README.md (1)

204-210: Add language specifier to code blocks.

The fenced code blocks at lines 204 and 208 are missing language specifiers. This affects syntax highlighting and accessibility.

Apply this diff to add bash language specifiers:

-```
+```bash
 databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  

- +bash
databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'

As per static analysis hints

databusclient/client.py (1)

208-279: Consider moving COMPRESSION_EXTS to module level.

The COMPRESSION_EXTS set is defined inside the loop (line 246), which means it's recreated for every metadata entry. While the performance impact is negligible for typical use cases, moving it to a module-level constant would be more idiomatic.

Define at module level (e.g., after line 11):

_COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip", "7z", "tar", "lz", "zst"}

Then reference it in the function:

-        # Known compression extensions
-        COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip", "7z", "tar", "lz", "zst"}
-
         parts = filename.split(".")
         if len(parts) == 1:
             file_format = "none"
             compression = "none"
         elif len(parts) == 2:
             file_format = parts[-1]
             compression = "none"
         else:
             # Check if last part is a known compression
 
-            if parts[-1] in COMPRESSION_EXTS:
+            if parts[-1] in _COMPRESSION_EXTS:
                 compression = parts[-1]
                 # Handle compound extensions like .tar.gz
-                if len(parts) > 2 and parts[-2] in COMPRESSION_EXTS:
+                if len(parts) > 2 and parts[-2] in _COMPRESSION_EXTS:
                     file_format = parts[-3] if len(parts) > 3 else "file"

Otherwise, the implementation correctly handles metadata validation, extension inference, and compound extensions like .tar.gz.

databusclient/cli.py (1)

47-61: Remove unnecessary f-string prefixes.

Lines 57, 81, and 86 have f-string prefixes but contain no placeholders. This is flagged by static analysis and should be cleaned up.

Apply this diff:

-        click.echo(f"[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")
+        click.echo("[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")

Repeat for lines 81 and 86.

As per static analysis hints

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77dca5a and 04c0b6e.

📒 Files selected for processing (3)
  • README.md (3 hunks)
  • databusclient/cli.py (2 hunks)
  • databusclient/client.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/client.py (1)
databusclient/cli.py (1)
  • deploy (38-91)
databusclient/cli.py (2)
databusclient/client.py (4)
  • deploy (444-485)
  • deploy_from_metadata (488-534)
  • validate_distributions (282-298)
  • create_dataset (301-441)
nextcloudclient/upload.py (1)
  • upload_to_nextcloud (29-82)
🪛 markdownlint-cli2 (0.18.1)
README.md

204-204: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


208-208: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.3)
databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)


239-239: Avoid specifying long messages outside the exception class

(TRY003)


244-244: Avoid specifying long messages outside the exception class

(TRY003)

databusclient/cli.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)


55-55: Avoid specifying long messages outside the exception class

(TRY003)


57-57: f-string without any placeholders

Remove extraneous f prefix

(F541)


76-76: f-string without any placeholders

Remove extraneous f prefix

(F541)


81-81: f-string without any placeholders

Remove extraneous f prefix

(F541)


86-91: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (1)

488-534: LGTM!

The function correctly orchestrates metadata-based deployment with clear type hints, comprehensive documentation, and user-friendly output. The success message appropriately shows the deployed version and file list.

databusclient/cli.py (2)

63-91: LGTM! Mode detection logic is well-structured.

The three-mode deployment strategy is correctly implemented:

  1. Mode 1 (Nextcloud): Requires all three parameters (webdav_url, remote, path) and validates input paths before upload
  2. Mode 2 (Metadata): Loads metadata from file and deploys
  3. Mode 3 (Classic): Validates distribution format before deployment

The mode priority ordering is logical (most specific to most general), validation occurs before any operations, and the fallback error message clearly guides users. Integration with client module functions is correct.


4-4: Remove the unused re import on line 4.

The re module is imported but never used anywhere in this file. Removing it will clean up the imports.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
databusclient/client.py (1)

233-244: Consider custom exception class to address style warnings.

The static analysis tool flags long exception messages at lines 233, 239, and 244. While the current approach provides good debugging context, you could optionally address this by creating a custom exception class:

class InvalidMetadataError(ValueError):
    """Raised when metadata validation fails."""
    
    def __init__(self, field: str, filename: str = None, details: str = None):
        self.field = field
        self.filename = filename
        msg = f"Invalid {field}"
        if filename:
            msg += f" for {filename}"
        if details:
            msg += f": {details}"
        super().__init__(msg)

Then replace the ValueError raises with InvalidMetadataError.

Based on learnings (past review comments suggest this pattern was discussed previously).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04c0b6e and 8e6167b.

📒 Files selected for processing (1)
  • databusclient/client.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
  • deploy (38-91)
🪛 Ruff (0.14.3)
databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)


239-239: Avoid specifying long messages outside the exception class

(TRY003)


244-244: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (3)

282-299: LGTM - Basic validation is appropriate here.

The regex pattern provides a reasonable sanity check for distribution strings in the classic deploy mode. It ensures:

  • Valid HTTP/HTTPS URL prefix
  • At least one metadata field after the pipe separator

More detailed validation (CV structure, format, compression, checksum format) occurs in downstream functions like __get_file_info and create_distribution, which is the appropriate place for such checks.


488-535: LGTM - Well-structured orchestration function.

The function properly orchestrates the metadata-based deployment workflow:

  • Creates distributions with validation (via create_distributions_from_metadata)
  • Builds the dataset object
  • Deploys to the Databus
  • Provides clear user feedback with the version ID and file list

The error handling is appropriate—any failures in the called functions will propagate as exceptions, which is the expected behavior for CLI operations.


779-779: LGTM - Typo fix in comment.

Good catch fixing the typo "fwith" → "with" in the comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
databusclient/client.py (1)

259-268: Critical: Compound extension parsing still produces incorrect file_format.

Despite being marked as addressed in previous reviews, the compound extension logic still has the original bug. For archive.tar.gz:

  • parts = ["archive", "tar", "gz"] (length 3)
  • Line 263 evaluates parts[-3] if len(parts) > 3 else "file"
  • Since len(parts) == 3, this returns "file" instead of "tar"

Apply this diff:

             if parts[-1] in COMPRESSION_EXTS:
                 compression = parts[-1]
                 # Handle compound extensions like .tar.gz
                 if len(parts) > 2 and parts[-2] in COMPRESSION_EXTS:
-                    file_format = parts[-3] if len(parts) > 3 else "file"
+                    file_format = parts[-2]
                 else:
                     file_format = parts[-2]

This ensures .tar.gz correctly produces file_format="tar" and compression="gz".

🧹 Nitpick comments (4)
databusclient/client.py (1)

780-780: Fix typo in comment.

The comment contains a typo: "fwith" should be "with".

Apply this diff:

-    parts += [None] * (6 - len(parts))  # pad fwith None if less than 6 parts
+    parts += [None] * (6 - len(parts))  # pad with None if less than 6 parts
databusclient/cli.py (3)

31-61: Consider validating partial Nextcloud options to prevent confusion.

The current logic only activates Nextcloud mode when ALL three flags (--webdav-url, --remote, --path) are provided. If a user provides only some flags (e.g., --webdav-url without --path), the code silently falls through to other modes, which could be confusing.

Consider adding explicit validation after the options are parsed:

+    # Validate Nextcloud options
+    nextcloud_flags = [webdav_url, remote, path]
+    nextcloud_provided = [f for f in nextcloud_flags if f is not None]
+    if nextcloud_provided and len(nextcloud_provided) < 3:
+        raise click.UsageError(
+            "Nextcloud mode requires all three options: --webdav-url, --remote, and --path"
+        )
+
     # === Mode 1: Upload & Deploy (Nextcloud) ===
     if webdav_url and remote and path:

This provides clearer error messages when users partially specify Nextcloud options.


52-52: Fix comment formatting.

The comment has extra hash marks at the end.

Apply this diff:

-        #Check that all given paths exist and are files or directories.#
+        # Check that all given paths exist and are files or directories.

76-76: Remove unnecessary f-string prefix.

Line 76 uses an f-string but contains no placeholders.

Apply this diff:

-            raise click.UsageError(
-                f"The following distributions are not in a valid format:\n"
-                + "\n".join(invalid)
+            raise click.UsageError(
+                "The following distributions are not in a valid format:\n"
+                + "\n".join(invalid)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e6167b and 1274cbc.

📒 Files selected for processing (2)
  • databusclient/cli.py (2 hunks)
  • databusclient/client.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/client.py (1)
databusclient/cli.py (1)
  • deploy (38-91)
databusclient/cli.py (2)
databusclient/client.py (4)
  • deploy (445-486)
  • deploy_from_metadata (489-535)
  • validate_distributions (283-299)
  • create_dataset (302-442)
nextcloudclient/upload.py (1)
  • upload_to_nextcloud (29-82)
🪛 Ruff (0.14.3)
databusclient/client.py

237-237: Avoid specifying long messages outside the exception class

(TRY003)


243-243: Avoid specifying long messages outside the exception class

(TRY003)


248-248: Avoid specifying long messages outside the exception class

(TRY003)

databusclient/cli.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)


55-55: Avoid specifying long messages outside the exception class

(TRY003)


76-76: f-string without any placeholders

Remove extraneous f prefix

(F541)


86-91: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (2)

283-300: LGTM!

The validation function is well-documented and correctly validates distribution strings against the expected URL-based pattern.


489-536: LGTM!

The function is well-documented with proper type hints and provides clear user feedback with the deployment summary.

databusclient/cli.py (1)

38-91: Well-structured multi-mode deployment flow!

The three-mode design (Nextcloud, metadata, classic) provides good flexibility. The error handling at the end clearly guides users on which mode to use.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
databusclient/client.py (1)

208-257: Well-implemented metadata-to-distributions converter.

The function correctly validates all required metadata fields (keys, size, checksum format) and creates distributions with sequential counter-based content variants. The validation logic is thorough, and letting create_distribution infer format/compression from URLs is a sensible design choice.

Note: Static analysis flags TRY003 about long error messages at lines 234, 240, 245. Creating custom exception classes (e.g., InvalidMetadataError) would follow Python style guidelines but is optional given the clear error messages.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1274cbc and 02481b3.

📒 Files selected for processing (1)
  • databusclient/client.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
  • deploy (38-91)
🪛 Ruff (0.14.3)
databusclient/client.py

234-234: Avoid specifying long messages outside the exception class

(TRY003)


240-240: Avoid specifying long messages outside the exception class

(TRY003)


245-245: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (2)
databusclient/client.py (2)

260-276: LGTM: Basic validation is appropriate for this use case.

The function provides a quick sanity check for distribution format (URL followed by pipe and metadata). The intentionally basic validation catches obvious formatting errors early, while detailed validation happens later in create_dataset via __get_file_info. This two-tier approach balances early error detection with comprehensive validation.


466-512: LGTM: Clean orchestration of metadata-driven deployment.

The function properly orchestrates the deployment workflow (create distributions → create dataset → deploy) with clear type hints and documentation. The success message provides helpful feedback by listing deployed filenames. Good integration with the multi-mode CLI design.

@gg46ixav
Copy link
Collaborator Author

gg46ixav commented Nov 4, 2025

I added a very simple validate_distributions function in client.py to validate the given distributions when calling the classic deploy command.

@gg46ixav gg46ixav requested a review from Integer-Ctrl November 4, 2025 14:43
@gg46ixav
Copy link
Collaborator Author

gg46ixav commented Nov 4, 2025

I can add some tests. Are there specific things I definitely need to test? Otherwise I would implement what I think is useful/appropriate. And should I use a new branch for that?

@Integer-Ctrl Integer-Ctrl self-assigned this Nov 5, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
README.md (1)

204-211: Add language specifiers to code blocks.

The fenced code blocks lack language specifiers, which reduces readability and prevents syntax highlighting.

Apply this diff:

-```
+```bash
 databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  
-```
+```bash

-```
+```bash
 databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  
-```
+```bash
databusclient/cli.py (1)

74-74: Fix comment formatting.

The comment has inconsistent spacing around the hash marks.

-        #Check that all given paths exist and are files or directories.#
+        # Check that all given paths exist and are files or directories.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02481b3 and 274f252.

📒 Files selected for processing (3)
  • README.md (3 hunks)
  • databusclient/cli.py (2 hunks)
  • databusclient/client.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/client.py (1)
databusclient/cli.py (1)
  • deploy (37-90)
databusclient/cli.py (2)
databusclient/client.py (3)
  • deploy (401-442)
  • create_dataset (258-398)
  • deploy_from_metadata (445-491)
nextcloudclient/upload.py (1)
  • upload_to_nextcloud (29-82)
🪛 markdownlint-cli2 (0.18.1)
README.md

204-204: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


208-208: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.3)
databusclient/client.py

233-233: Avoid specifying long messages outside the exception class

(TRY003)


239-239: Avoid specifying long messages outside the exception class

(TRY003)


244-244: Avoid specifying long messages outside the exception class

(TRY003)

databusclient/cli.py

48-48: Avoid specifying long messages outside the exception class

(TRY003)


50-50: Avoid specifying long messages outside the exception class

(TRY003)


72-72: Avoid specifying long messages outside the exception class

(TRY003)


77-77: Avoid specifying long messages outside the exception class

(TRY003)


85-90: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (2)

207-256: LGTM! Well-implemented metadata processing with comprehensive validation.

The function properly validates all required metadata fields including:

  • Required key presence
  • Positive integer size validation
  • SHA-256 hex checksum validation (64 chars, hex format)
  • Optional file_format and compression support

The implementation correctly constructs distribution identifiers with a counter-based content variant to ensure uniqueness.

Note: The static analysis TRY003 warnings about long error messages are style preferences and can be safely ignored—the detailed error messages aid debugging.


445-492: LGTM! Clean deployment orchestration.

The function effectively coordinates the metadata-to-deployment workflow:

  • Creates distributions from metadata
  • Builds the dataset
  • Deploys via the existing deploy function
  • Provides clear user feedback with version ID and file list
databusclient/cli.py (1)

37-90: Well-structured multi-mode deployment with clear separation of concerns.

The refactored deploy function effectively handles three distinct modes:

Mode 1 (Classic): Uses traditional distribution arguments—maintains backward compatibility.

Mode 2 (Metadata): Loads JSON metadata and delegates to deploy_from_metadata.

Mode 3 (Nextcloud): Uploads files via rclone, generates metadata, then deploys.

The conflict detection and validation logic correctly prevents invalid option combinations.

Minor: Line 74 has inconsistent comment formatting (#Check that...# should be # Check that...), but this is a trivial formatting issue.

@Integer-Ctrl
Copy link
Contributor

I added a very simple validate_distributions function in client.py to validate the given distributions when calling the classic deploy command.

Not needed.

I can add some tests. Are there specific things I definitely need to test? Otherwise I would implement what I think is useful/appropriate. And should I use a new branch for that?

I don't have time right now to think about meaningful tests. It's not a must-have, but if you want to work on it, doing so in a new branch would be best. (Tests are important, but the whole project isn't well tested anyway..)

If there’s anything else you’d like to change, please comment. Otherwise, if your local tests of deploy work fine, just let me know and I’ll merge the PR.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
databusclient/client.py (1)

445-492: Prefer logging filenames when present.

When running the Nextcloud flow the metadata already carries filename; printing only the URL makes the summary harder to scan. Consider falling back to the URL only if no filename is available.

-    for entry in metadata:
-        print(f"  - {entry['url']}")
+    for entry in metadata:
+        label = entry.get("filename") or entry["url"]
+        print(f"  - {label}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274f252 and f22c71d.

📒 Files selected for processing (1)
  • databusclient/client.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
  • deploy (37-90)
🪛 Ruff (0.14.3)
databusclient/client.py

234-234: Avoid specifying long messages outside the exception class

(TRY003)


240-240: Avoid specifying long messages outside the exception class

(TRY003)


244-244: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
databusclient/client.py (1)

207-256: Robust metadata validation looks great.

Centralizing the checksum/size checks here will save CLI callers from puzzling deploy failures later on.

@gg46ixav
Copy link
Collaborator Author

gg46ixav commented Nov 5, 2025

I added a very simple validate_distributions function in client.py to validate the given distributions when calling the classic deploy command.

Not needed.

I can add some tests. Are there specific things I definitely need to test? Otherwise I would implement what I think is useful/appropriate. And should I use a new branch for that?

I don't have time right now to think about meaningful tests. It's not a must-have, but if you want to work on it, doing so in a new branch would be best. (Tests are important, but the whole project isn't well tested anyway..)

If there’s anything else you’d like to change, please comment. Otherwise, if your local tests of deploy work fine, just let me know and I’ll merge the PR.

looking good. for me locally everything is working fine

@Integer-Ctrl Integer-Ctrl merged commit 37f9e1a into main Nov 5, 2025
3 checks passed
@gg46ixav gg46ixav deleted the nextcloudclient branch November 11, 2025 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants