feat Nextcloudclient #29
Conversation
# Conflicts: # README.md # databusclient/client.py # poetry.lock # pyproject.toml # test.sh
…lient # Conflicts: # README.md # poetry.lock # pyproject.toml
📝 Walkthrough
Adds a Nextcloud upload module, metadata-driven distribution creation and deploy helpers, and refactors the CLI and README to support three deployment modes: classic distributions, metadata-file deploy, and Nextcloud (WebDAV/rclone) upload → deploy.
Sequence Diagram(s)

sequenceDiagram
participant User as User/CLI
participant CLI as databusclient.cli.deploy()
participant Upload as nextcloudclient.upload.upload_to_nextcloud()
participant Rclone as rclone (subprocess)
participant Client as databusclient.client
rect rgb(220,240,255)
Note over User,CLI: Mode — Nextcloud Upload & Deploy
User->>CLI: --webdav-url, --remote, --path + inputs
CLI->>Upload: source_paths, remote, path, webdav_url
loop per source path
Upload->>Upload: enumerate files, compute_sha256_and_length()
Upload->>Rclone: rclone copy / copyto (subprocess)
Rclone-->>Upload: success / error
end
Upload-->>CLI: list of (filename, checksum, size, url)
CLI->>Client: deploy_from_metadata(metadata, version_id, ...)
Client-->>CLI: deployment result
CLI-->>User: summary
end
sequenceDiagram
participant User as User/CLI
participant CLI as databusclient.cli.deploy()
participant Client as databusclient.client
rect rgb(220,255,230)
Note over User,CLI: Mode — Metadata-file Deploy
User->>CLI: --metadata metadata.json
CLI->>CLI: load JSON metadata
CLI->>Client: deploy_from_metadata(metadata, ...)
Client-->>CLI: deployment result
CLI-->>User: summary
end
rect rgb(255,245,220)
Note over User,CLI: Mode — Classic Distributions
User->>CLI: distributions (positional)
CLI->>Client: create_dataset(distributions, ...)
Client-->>CLI: dataset_id
CLI->>Client: deploy(dataset_id, ...)
Client-->>CLI: deployment result
CLI-->>User: summary
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 3
♻️ Duplicate comments (1)
nextcloudclient/upload.py (1)
38-66: Good: append results only after successful upload. This fixes the earlier issue where failed uploads polluted metadata. Nice improvement.
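For readers outside the diff, the pattern being praised looks roughly like this (a minimal sketch, not the actual module; build_rclone_command is a hypothetical helper standing in for the real argument construction):

```python
import subprocess


def upload_paths(paths, build_rclone_command):
    """Upload each path and return metadata only for uploads that succeeded."""
    results = []
    for path in paths:
        # Metadata is staged in a temporary list first ...
        tmp_results = [(path, "<sha256>", 0, "<remote-url>")]
        try:
            subprocess.run(build_rclone_command(path), check=True)
        except (subprocess.CalledProcessError, FileNotFoundError) as err:
            # ... and discarded if the rclone call fails, so `results`
            # never contains entries for files that were not uploaded.
            print(f"Upload failed for {path}: {err}")
            continue
        results.extend(tmp_results)
    return results
```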
🧹 Nitpick comments (5)
nextcloudclient/upload.py (5)
1-5: Add imports for URL building and typing. Required by the path/URL fixes and signature broadening.
Apply this diff:
import hashlib
import os
import subprocess
import posixpath
+from urllib.parse import urljoin, quote
+from typing import Iterable
6-16: Add typing and use a larger chunk size for throughput. Return typing improves API clarity; 1 MiB chunks reduce syscall overhead.
Apply this diff:
-def compute_sha256_and_length(filepath):
+def compute_sha256_and_length(filepath: str) -> tuple[str, int]:
     sha256 = hashlib.sha256()
     total_length = 0
-    with open(filepath, 'rb') as f:
+    with open(filepath, "rb") as f:
         while True:
-            chunk = f.read(4096)
+            chunk = f.read(1024 * 1024)
             if not chunk:
                 break
             sha256.update(chunk)
             total_length += len(chunk)
     return sha256.hexdigest(), total_length
18-25: Type hints and deterministic ordering. Sorted filenames produce stable results; typing clarifies the contract.
Apply this diff:
-def get_all_files(path):
+def get_all_files(path: str) -> list[str]:
     if os.path.isfile(path):
         return [path]
     files = []
-    for root, _, filenames in os.walk(path):
-        for name in filenames:
+    for root, _, filenames in os.walk(path):
+        for name in sorted(filenames):
             files.append(os.path.join(root, name))
     return files
27-27: Broaden input type and annotate return. Iterable[str] is more flexible and callers remain compatible; document the tuple shape.
Apply this diff:
-def upload_to_nextcloud(source_paths: list[str], remote_name: str, remote_path: str, webdav_url: str):
+def upload_to_nextcloud(
+    source_paths: Iterable[str],
+    remote_name: str,
+    remote_path: str,
+    webdav_url: str,
+) -> list[tuple[str, str, int, str]]:
27-70: Optional: switch prints to logging and add docstrings/NamedTuple. Improves library ergonomics and readability without behavior changes.
I can provide a small logging wrapper and a NamedTuple like:
- class UploadedFile(NamedTuple): filename: str; checksum: str; size: int; url: str
Would you like me to draft it?
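For reference, the offered result type could be as small as the following sketch (field names mirror the tuple shape returned today; the logging setup is illustrative):

```python
import logging
from typing import NamedTuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class UploadedFile(NamedTuple):
    filename: str
    checksum: str
    size: int
    url: str


# Attribute access replaces positional indexing, and logging replaces print:
uploaded = UploadedFile("data.csv", "ab" * 32, 1024, "https://cloud.example.org/data.csv")
logger.info("uploaded %s (%d bytes) to %s", uploaded.filename, uploaded.size, uploaded.url)
```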
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nextcloudclient/upload.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.2)
nextcloudclient/upload.py
63-63: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
nextcloudclient/upload.py (1)
61-63: S603 note (FYI): subprocess input is safe here. You pass args as a list and don't use shell=True; low injection risk. Keep validating inputs upstream.
Do you want me to scan call sites to ensure user-provided values for remote_name/remote_path aren’t sourced from untrusted input?
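The distinction behind that note can be shown in a generic sketch (not the code under review; the function and values are illustrative):

```python
import subprocess


def copy_to_remote(local_path: str, remote_name: str, dest_subpath: str) -> None:
    """Invoke rclone with list-form arguments (no shell involved)."""
    # Each element is handed to rclone as a literal argument, so shell
    # metacharacters in user-supplied names are never interpreted by a shell.
    subprocess.run(
        ["rclone", "copyto", local_path, f"{remote_name}:{dest_subpath}", "--progress"],
        check=True,
    )

    # The risky variant would be a single interpolated string with shell=True:
    # subprocess.run(f"rclone copyto {local_path} {remote_name}:{dest_subpath}", shell=True)
    # That form is what the S603 family of checks warns about; the reviewed code avoids it.
```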
Actionable comments posted: 0
🧹 Nitpick comments (2)
nextcloudclient/upload.py (2)
43-43: Minor: Add space after comma for PEP 8 compliance.
Apply this diff:
-    checksum,size = compute_sha256_and_length(file)
+    checksum, size = compute_sha256_and_length(file)
59-65: Optional refactor: Eliminate duplicate destination assignment. The destination variable is set identically in both branches (lines 61 and 64). You can simplify by computing it once before the conditional.
Apply this diff:
 dest_subpath = posixpath.join(remote_path.lstrip("/"), basename)
+destination = f"{remote_name}:{dest_subpath}"
 if os.path.isdir(path):
-    destination = f"{remote_name}:{dest_subpath}"
     command = ["rclone", "copy", abs_path, destination, "--progress"]
 else:
-    destination = f"{remote_name}:{dest_subpath}"
     command = ["rclone", "copyto", abs_path, destination, "--progress"]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nextcloudclient/upload.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.2)
nextcloudclient/upload.py
69-69: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (4)
nextcloudclient/upload.py (4)
1-5: LGTM! Well-chosen imports. All imports are from the standard library and are appropriate for the file's functionality—hashing, file operations, subprocess calls, and URL handling.
8-27: LGTM! Helper functions are well-implemented. Both compute_sha256_and_length and get_all_files are straightforward and correct. The chunked reading approach is memory-efficient for large files, and the recursive directory handling is appropriate.
69-69: Static analysis S603 is a false positive here. The subprocess warning flags potential execution of untrusted input. However, this is safe because:
- The command is built as a list (not shell string), preventing injection.
- rclone is hardcoded; user inputs are passed as arguments to rclone.
- abs_path is validated and normalized; remote_name and remote_path are interpreted by rclone as configuration and paths, not executed as commands.
68-77: LGTM! Upload execution and error handling are well-structured. The implementation correctly:
- Accumulates file metadata in tmp_results before upload
- Only extends the final result list upon successful upload (line 70)
- Handles both subprocess errors and missing rclone binary
- Continues to the next path on failure, allowing partial success
This properly addresses the previous review concern about including metadata for failed uploads.
Actionable comments posted: 0
🧹 Nitpick comments (2)
databusclient/client.py (2)
208-279: Well-implemented metadata validation and distribution creation. The function correctly validates all required fields, handles edge cases in filename parsing (including compound extensions like .tar.gz), and generates distributions with unique content variants using a sequential counter. The type hints and docstring are clear and comprehensive.
Optional: Consider custom exception classes to address static analysis hints.
The static analysis tool flags lines 233, 239, and 244 for using long error messages directly in ValueError. While not critical, you could optionally create a custom exception class to follow the Ruff TRY003 guideline:

class InvalidMetadataError(ValueError):
    """Raised when metadata validation fails."""
    pass

Then replace ValueError with InvalidMetadataError in the validation checks. However, this is purely a style improvement and can be deferred.
469-516: LGTM! Clean metadata-driven deployment workflow. The function provides a clear, straightforward interface for deploying datasets from metadata entries. The implementation correctly chains create_distributions_from_metadata, create_dataset, and deploy, with appropriate progress messages.
Optional: Consider returning deployment information.
For programmatic usage, you might consider returning the version_id or dataset information instead of None, allowing callers to use the deployment result:

-) -> None:
+) -> str:
     """
     Deploy a dataset from metadata entries.
     ...
+
+    Returns
+    -------
+    str
+        The deployed dataset version ID
     """
     distributions = create_distributions_from_metadata(metadata)
     dataset = create_dataset(
         version_id=version_id,
         title=title,
         abstract=abstract,
         description=description,
         license_url=license_url,
         distributions=distributions
     )
     print(f"Deploying dataset version: {version_id}")
     deploy(dataset, apikey)
     print(f"Successfully deployed to {version_id}")
     print(f"Deployed {len(metadata)} file(s):")
     for entry in metadata:
         print(f"  - {entry['filename']}")
+
+    return version_id

This is optional and depends on whether you expect programmatic consumers to need the deployment result.
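If that change were adopted, a programmatic caller might use the return value roughly as follows (a sketch under the assumption that the keyword arguments match the diff above; all values are placeholders):

```python
from databusclient.client import deploy_from_metadata

metadata = [
    {
        "filename": "data.csv",
        "checksum": "a" * 64,  # placeholder SHA-256 hex digest
        "size": 1024,
        "url": "https://cloud.example.org/remote.php/webdav/data.csv",
    }
]

version = deploy_from_metadata(
    metadata,
    version_id="https://databus.dbpedia.org/user1/group1/artifact1/2024-01-01",
    title="Example dataset",
    abstract="Example abstract",
    description="Example description",
    license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
    apikey="MYSTERIOUS",
)
print(f"Deployed: {version}")
```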
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
databusclient/client.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
deploy(33-39)
🪛 Ruff (0.14.2)
databusclient/client.py
233-233: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
244-244: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
databusclient/client.py (1)
760-760: Good catch on the typo.
I removed the deploy.py and added an upload_and_deploy and a deploy_with_metadata function to cli.py.
I thought you were going to add the functionality to the existing databusclient deploy CLI command, but now we have three separate commands instead. Could you please merge the deploy-related options into a single command? Or reasons why not?
From what I can see, the following options are common across all deploy commands:
title, abstract, description, license, and apikey. You could check that the appropriate additional options for each deployment mode are provided, and not more or less, depending on the chosen deploy method.
The key requirement is that the existing deploy command remains correct.
Also, since the project will likely be refactored at some point, it would be great to start adding at least some tests. Would you be willing to include a few basic pytest tests for the new functionality? I understand that testing the current methods can be tricky, so feel free to decide whether you want to do this now or leave it.
TL;DR:
Open points:
- What about metadata.json?
- Merge deploy CLI commands
- Tests or no tests?
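On the open testing question, a first pytest file could stay very small (a sketch only; the import path, required metadata keys, and the ValueError behavior are assumed from the diffs discussed in this PR):

```python
import pytest

# Assumed import path and function name, taken from the changes under review.
from databusclient.client import create_distributions_from_metadata


def _entry(**overrides):
    entry = {
        "filename": "data.csv",
        "checksum": "a" * 64,  # well-formed placeholder SHA-256 hex digest
        "size": 1024,
        "url": "https://cloud.example.org/remote.php/webdav/data.csv",
    }
    entry.update(overrides)
    return entry


def test_valid_entry_yields_one_distribution():
    assert len(create_distributions_from_metadata([_entry()])) == 1


def test_missing_checksum_is_rejected():
    bad = _entry()
    del bad["checksum"]
    with pytest.raises(ValueError):
        create_distributions_from_metadata([bad])
```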
…e single deploy command - added simple validate_distributions function
Actionable comments posted: 1
🧹 Nitpick comments (3)
README.md (1)
204-210: Add language specifier to code blocks. The fenced code blocks at lines 204 and 208 are missing language specifiers. This affects syntax highlighting and accessibility.
Apply this diff to add bash language specifiers:

-```
+```bash
 databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'

-```
+```bash
 databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'

As per static analysis hints
databusclient/client.py (1)
208-279: Consider moving COMPRESSION_EXTS to module level. The COMPRESSION_EXTS set is defined inside the loop (line 246), which means it's recreated for every metadata entry. While the performance impact is negligible for typical use cases, moving it to a module-level constant would be more idiomatic.
Define at module level (e.g., after line 11):
_COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip", "7z", "tar", "lz", "zst"}

Then reference it in the function:
-    # Known compression extensions
-    COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip", "7z", "tar", "lz", "zst"}
-
     parts = filename.split(".")
     if len(parts) == 1:
         file_format = "none"
         compression = "none"
     elif len(parts) == 2:
         file_format = parts[-1]
         compression = "none"
     else:
         # Check if last part is a known compression
-        if parts[-1] in COMPRESSION_EXTS:
+        if parts[-1] in _COMPRESSION_EXTS:
             compression = parts[-1]
             # Handle compound extensions like .tar.gz
-            if len(parts) > 2 and parts[-2] in COMPRESSION_EXTS:
+            if len(parts) > 2 and parts[-2] in _COMPRESSION_EXTS:
                 file_format = parts[-3] if len(parts) > 3 else "file"

Otherwise, the implementation correctly handles metadata validation, extension inference, and compound extensions like .tar.gz.

databusclient/cli.py (1)
47-61: Remove unnecessary f-string prefixes. Lines 57, 81, and 86 have f-string prefixes but contain no placeholders. This is flagged by static analysis and should be cleaned up.
Apply this diff:
-    click.echo(f"[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")
+    click.echo("[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")
As per static analysis hints
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README.md (3 hunks)
databusclient/cli.py (2 hunks)
databusclient/client.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/client.py (1)
databusclient/cli.py (1)
deploy(38-91)
databusclient/cli.py (2)
databusclient/client.py (4)
deploy (444-485)
deploy_from_metadata (488-534)
validate_distributions (282-298)
create_dataset (301-441)
nextcloudclient/upload.py (1)
upload_to_nextcloud(29-82)
🪛 markdownlint-cli2 (0.18.1)
README.md
204-204: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
208-208: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.3)
databusclient/client.py
233-233: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
244-244: Avoid specifying long messages outside the exception class
(TRY003)
databusclient/cli.py
50-50: Avoid specifying long messages outside the exception class
(TRY003)
55-55: Avoid specifying long messages outside the exception class
(TRY003)
57-57: f-string without any placeholders
Remove extraneous f prefix
(F541)
76-76: f-string without any placeholders
Remove extraneous f prefix
(F541)
81-81: f-string without any placeholders
Remove extraneous f prefix
(F541)
86-91: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (1)
488-534: LGTM! The function correctly orchestrates metadata-based deployment with clear type hints, comprehensive documentation, and user-friendly output. The success message appropriately shows the deployed version and file list.
databusclient/cli.py (2)
63-91: LGTM! Mode detection logic is well-structured. The three-mode deployment strategy is correctly implemented:
- Mode 1 (Nextcloud): Requires all three parameters (webdav_url, remote, path) and validates input paths before upload
- Mode 2 (Metadata): Loads metadata from file and deploys
- Mode 3 (Classic): Validates distribution format before deployment
The mode priority ordering is logical (most specific to most general), validation occurs before any operations, and the fallback error message clearly guides users. Integration with client module functions is correct.
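Stripped of the click plumbing, the dispatch order being reviewed can be pictured like this (a simplified sketch, not the actual cli.py):

```python
def pick_deploy_mode(webdav_url, remote, path, metadata, distributions):
    """Mirror the priority described above: Nextcloud first, then metadata, then classic."""
    if webdav_url and remote and path:
        return "nextcloud-upload-and-deploy"
    if metadata:
        return "metadata-file-deploy"
    if distributions:
        return "classic-distributions"
    raise ValueError(
        "No deploy mode selected: pass distributions, --metadata, "
        "or all of --webdav-url/--remote/--path"
    )


# Example: a metadata file alone selects the metadata-file deploy mode.
assert pick_deploy_mode(None, None, None, "metadata.json", ()) == "metadata-file-deploy"
```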
4-4: Remove the unused re import on line 4. The re module is imported but never used anywhere in this file. Removing it will clean up the imports.
Actionable comments posted: 1
🧹 Nitpick comments (1)
databusclient/client.py (1)
233-244: Consider custom exception class to address style warnings. The static analysis tool flags long exception messages at lines 233, 239, and 244. While the current approach provides good debugging context, you could optionally address this by creating a custom exception class:
class InvalidMetadataError(ValueError):
    """Raised when metadata validation fails."""

    def __init__(self, field: str, filename: str = None, details: str = None):
        self.field = field
        self.filename = filename
        msg = f"Invalid {field}"
        if filename:
            msg += f" for {filename}"
        if details:
            msg += f": {details}"
        super().__init__(msg)

Then replace the ValueError raises with InvalidMetadataError.
Based on learnings (past review comments suggest this pattern was discussed previously).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
databusclient/client.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
deploy(38-91)
🪛 Ruff (0.14.3)
databusclient/client.py
233-233: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
244-244: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (3)
282-299: LGTM - Basic validation is appropriate here. The regex pattern provides a reasonable sanity check for distribution strings in the classic deploy mode. It ensures:
- Valid HTTP/HTTPS URL prefix
- At least one metadata field after the pipe separator
More detailed validation (CV structure, format, compression, checksum format) occurs in downstream functions like __get_file_info and create_distribution, which is the appropriate place for such checks.
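A sanity check of that shape can be written in a few lines (an illustrative sketch; the exact pattern in client.py may differ):

```python
import re

_DISTRIBUTION_RE = re.compile(r"^https?://\S+\|\S+$")


def looks_like_distribution(value: str) -> bool:
    """Quick check: an http(s) URL, a pipe separator, and at least one metadata field."""
    return bool(_DISTRIBUTION_RE.match(value))


assert looks_like_distribution(
    "https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger"
)
assert not looks_like_distribution("ftp://example.org/file|type=x")
assert not looks_like_distribution("https://example.org/file-without-metadata")
```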
488-535: LGTM - Well-structured orchestration function. The function properly orchestrates the metadata-based deployment workflow:
- Creates distributions with validation (via create_distributions_from_metadata)
- Builds the dataset object
- Deploys to the Databus
- Provides clear user feedback with the version ID and file list
The error handling is appropriate—any failures in the called functions will propagate as exceptions, which is the expected behavior for CLI operations.
779-779: LGTM - Typo fix in comment. Good catch fixing the typo "fwith" → "with" in the comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
databusclient/client.py (1)
259-268: Critical: Compound extension parsing still produces incorrect file_format. Despite being marked as addressed in previous reviews, the compound extension logic still has the original bug. For archive.tar.gz:
- parts = ["archive", "tar", "gz"] (length 3)
- Line 263 evaluates parts[-3] if len(parts) > 3 else "file"
- Since len(parts) == 3, this returns "file" instead of "tar"
Apply this diff:
     if parts[-1] in COMPRESSION_EXTS:
         compression = parts[-1]
         # Handle compound extensions like .tar.gz
         if len(parts) > 2 and parts[-2] in COMPRESSION_EXTS:
-            file_format = parts[-3] if len(parts) > 3 else "file"
+            file_format = parts[-2]
         else:
             file_format = parts[-2]
This ensures .tar.gz correctly produces file_format="tar" and compression="gz".
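With the corrected branch, the intended rule can be summarized in a standalone form (an illustrative sketch of the behavior, not a copy of client.py):

```python
COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip", "7z", "tar", "lz", "zst"}


def split_format_and_compression(filename: str) -> tuple[str, str]:
    """Return (file_format, compression) inferred from a filename."""
    parts = filename.split(".")
    if len(parts) == 1:
        return "none", "none"
    if len(parts) == 2:
        return parts[-1], "none"
    if parts[-1] in COMPRESSION_EXTS:
        # Compound extensions such as .tar.gz keep the inner extension as the format.
        return parts[-2], parts[-1]
    return parts[-1], "none"


assert split_format_and_compression("archive.tar.gz") == ("tar", "gz")
assert split_format_and_compression("dump.nt.bz2") == ("nt", "bz2")
assert split_format_and_compression("data.csv") == ("csv", "none")
```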
🧹 Nitpick comments (4)
databusclient/client.py (1)
780-780: Fix typo in comment. The comment contains a typo: "fwith" should be "with".
Apply this diff:
-    parts += [None] * (6 - len(parts))  # pad fwith None if less than 6 parts
+    parts += [None] * (6 - len(parts))  # pad with None if less than 6 parts

databusclient/cli.py (3)
31-61: Consider validating partial Nextcloud options to prevent confusion. The current logic only activates Nextcloud mode when ALL three flags (--webdav-url, --remote, --path) are provided. If a user provides only some flags (e.g., --webdav-url without --path), the code silently falls through to other modes, which could be confusing.
Consider adding explicit validation after the options are parsed:
+    # Validate Nextcloud options
+    nextcloud_flags = [webdav_url, remote, path]
+    nextcloud_provided = [f for f in nextcloud_flags if f is not None]
+    if nextcloud_provided and len(nextcloud_provided) < 3:
+        raise click.UsageError(
+            "Nextcloud mode requires all three options: --webdav-url, --remote, and --path"
+        )
+
     # === Mode 1: Upload & Deploy (Nextcloud) ===
     if webdav_url and remote and path:

This provides clearer error messages when users partially specify Nextcloud options.
52-52: Fix comment formatting. The comment has extra hash marks at the end.
Apply this diff:
-    #Check that all given paths exist and are files or directories.#
+    # Check that all given paths exist and are files or directories.
76-76: Remove unnecessary f-string prefix. Line 76 uses an f-string but contains no placeholders.
Apply this diff:
-    raise click.UsageError(
-        f"The following distributions are not in a valid format:\n"
-        + "\n".join(invalid)
+    raise click.UsageError(
+        "The following distributions are not in a valid format:\n"
+        + "\n".join(invalid)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
databusclient/cli.py (2 hunks)
databusclient/client.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/client.py (1)
databusclient/cli.py (1)
deploy(38-91)
databusclient/cli.py (2)
databusclient/client.py (4)
deploy (445-486)
deploy_from_metadata (489-535)
validate_distributions (283-299)
create_dataset (302-442)
nextcloudclient/upload.py (1)
upload_to_nextcloud(29-82)
🪛 Ruff (0.14.3)
databusclient/client.py
237-237: Avoid specifying long messages outside the exception class
(TRY003)
243-243: Avoid specifying long messages outside the exception class
(TRY003)
248-248: Avoid specifying long messages outside the exception class
(TRY003)
databusclient/cli.py
50-50: Avoid specifying long messages outside the exception class
(TRY003)
55-55: Avoid specifying long messages outside the exception class
(TRY003)
76-76: f-string without any placeholders
Remove extraneous f prefix
(F541)
86-91: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (2)
283-300: LGTM! The validation function is well-documented and correctly validates distribution strings against the expected URL-based pattern.
489-536: LGTM! The function is well-documented with proper type hints and provides clear user feedback with the deployment summary.
databusclient/cli.py (1)
38-91: Well-structured multi-mode deployment flow! The three-mode design (Nextcloud, metadata, classic) provides good flexibility. The error handling at the end clearly guides users on which mode to use.
Actionable comments posted: 1
🧹 Nitpick comments (1)
databusclient/client.py (1)
208-257: Well-implemented metadata-to-distributions converter. The function correctly validates all required metadata fields (keys, size, checksum format) and creates distributions with sequential counter-based content variants. The validation logic is thorough, and letting create_distribution infer format/compression from URLs is a sensible design choice.
InvalidMetadataError) would follow Python style guidelines but is optional given the clear error messages.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
databusclient/client.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
deploy(38-91)
🪛 Ruff (0.14.3)
databusclient/client.py
234-234: Avoid specifying long messages outside the exception class
(TRY003)
240-240: Avoid specifying long messages outside the exception class
(TRY003)
245-245: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (2)
databusclient/client.py (2)
260-276: LGTM: Basic validation is appropriate for this use case. The function provides a quick sanity check for distribution format (URL followed by pipe and metadata). The intentionally basic validation catches obvious formatting errors early, while detailed validation happens later in create_dataset via __get_file_info. This two-tier approach balances early error detection with comprehensive validation.
466-512: LGTM: Clean orchestration of metadata-driven deployment. The function properly orchestrates the deployment workflow (create distributions → create dataset → deploy) with clear type hints and documentation. The success message provides helpful feedback by listing deployed filenames. Good integration with the multi-mode CLI design.
I added a very simple validate_distributions function in client.py to validate the given distributions when calling the classic deploy command.
I can add some tests. Are there specific things I definitely need to test? Otherwise I would implement what I think is useful/appropriate. And should I use a new branch for that? |
Actionable comments posted: 0
🧹 Nitpick comments (2)
README.md (1)
204-211: Add language specifiers to code blocks. The fenced code blocks lack language specifiers, which reduces readability and prevents syntax highlighting.
Apply this diff:
-```
+```bash
 databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'

-```
+```bash
 databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'

databusclient/cli.py (1)
74-74: Fix comment formatting. The comment has inconsistent spacing around the hash marks.
-    #Check that all given paths exist and are files or directories.#
+    # Check that all given paths exist and are files or directories.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README.md (3 hunks)
databusclient/cli.py (2 hunks)
databusclient/client.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/client.py (1)
databusclient/cli.py (1)
deploy(37-90)
databusclient/cli.py (2)
databusclient/client.py (3)
deploy (401-442)
create_dataset (258-398)
deploy_from_metadata (445-491)
nextcloudclient/upload.py (1)
upload_to_nextcloud(29-82)
🪛 markdownlint-cli2 (0.18.1)
README.md
204-204: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
208-208: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.3)
databusclient/client.py
233-233: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
244-244: Avoid specifying long messages outside the exception class
(TRY003)
databusclient/cli.py
48-48: Avoid specifying long messages outside the exception class
(TRY003)
50-50: Avoid specifying long messages outside the exception class
(TRY003)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Avoid specifying long messages outside the exception class
(TRY003)
85-90: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
databusclient/client.py (2)
207-256: LGTM! Well-implemented metadata processing with comprehensive validation. The function properly validates all required metadata fields including:
- Required key presence
- Positive integer size validation
- SHA-256 hex checksum validation (64 chars, hex format)
- Optional file_format and compression support
The implementation correctly constructs distribution identifiers with a counter-based content variant to ensure uniqueness.
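Put concretely, a metadata entry that passes those checks might look like this (field names and constraints taken from this review; all values are placeholders):

```python
metadata_entry = {
    "filename": "observations.csv",
    # 64-character hexadecimal SHA-256 digest of the uploaded file (placeholder value).
    "checksum": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    "size": 20480,  # positive integer, in bytes
    "url": "https://cloud.example.org/remote.php/webdav/observations.csv",
    # Optional fields; inferred from the filename/URL when omitted.
    "file_format": "csv",
    "compression": "none",
}
```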
Note: The static analysis TRY003 warnings about long error messages are style preferences and can be safely ignored—the detailed error messages aid debugging.
445-492: LGTM! Clean deployment orchestration. The function effectively coordinates the metadata-to-deployment workflow:
- Creates distributions from metadata
- Builds the dataset
- Deploys via the existing deploy function
- Provides clear user feedback with version ID and file list
databusclient/cli.py (1)
37-90: Well-structured multi-mode deployment with clear separation of concerns. The refactored deploy function effectively handles three distinct modes:
Mode 1 (Classic): Uses traditional distribution arguments—maintains backward compatibility.
Mode 2 (Metadata): Loads JSON metadata and delegates to deploy_from_metadata.
Mode 3 (Nextcloud): Uploads files via rclone, generates metadata, then deploys.
The conflict detection and validation logic correctly prevents invalid option combinations.
Minor: Line 74 has inconsistent comment formatting (#Check that... should be # Check that...), but this is a trivial formatting issue.
Not needed.
I don't have time right now to think about meaningful tests. It's not a must-have, but if you want to work on it, doing so in a new branch would be best. (Tests are important, but the whole project isn't well tested anyway.) If there's anything else you'd like to change, please comment. Otherwise, if your local tests of deploy work fine, just let me know and I'll merge the PR.
Actionable comments posted: 0
🧹 Nitpick comments (1)
databusclient/client.py (1)
445-492: Prefer logging filenames when present. When running the Nextcloud flow the metadata already carries filename; printing only the URL makes the summary harder to scan. Consider falling back to the URL only if no filename is available.

-    for entry in metadata:
-        print(f"  - {entry['url']}")
+    for entry in metadata:
+        label = entry.get("filename") or entry["url"]
+        print(f"  - {label}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
databusclient/client.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/client.py (1)
databusclient/cli.py (1)
deploy(37-90)
🪛 Ruff (0.14.3)
databusclient/client.py
234-234: Avoid specifying long messages outside the exception class
(TRY003)
240-240: Avoid specifying long messages outside the exception class
(TRY003)
244-244: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
databusclient/client.py (1)
207-256: Robust metadata validation looks great. Centralizing the checksum/size checks here will save CLI callers from puzzling deploy failures later on.
Looking good. For me, locally everything is working fine.
Summary by CodeRabbit
New Features
Improvements