fix: attachment sync not downloading from server #2

Merged
Go1c merged 1 commit into main from fix/attachment-sync
Apr 11, 2026

Conversation

Go1c (Owner) commented Apr 11, 2026

Summary

  • Fix FileSyncUpdate handler to request chunked download for attachments (images, etc.) instead of silently skipping them
  • Fix binary message prefix from "BC" to "00" to match server's VaultFileMsgType
  • Fix file_content_hash_binary to use djb2 byte-hash (matching plugin's hashArrayBuffer) instead of SHA-256
  • Send local file hashes in FileSync request for proper server-side diff
  • Wait for in-flight chunk downloads before declaring sync complete
  • Increase file sync timeout from 60s to 300s

Test plan

  • Run python -m fns_cli.main sync -c config.yaml against a vault with image attachments
  • Verify attachment files (e.g. .png, .jpg) are downloaded to the local vault
  • Verify notes referencing those attachments (e.g. ![[640.png]]) render correctly
  • Verify incremental sync still works (re-run sync, no duplicate downloads)
  • Verify upload of local attachments to server still works

Closes #1

Summary by Sourcery

Fix file synchronization so attachment binaries are correctly diffed, transferred in chunks, and awaited before marking sync complete.

New Features:

  • Include local non-note file metadata and content hashes in FileSync requests to enable server-side diffing of attachments.

Bug Fixes:

  • Request chunked downloads for files without inline content during FileSyncUpdate so attachments are downloaded instead of skipped.
  • Align binary chunk frame prefix and parsing with the server protocol to ensure file chunks are correctly handled.
  • Match binary file content hashing to the server/plugin djb2 byte-based hash to avoid spurious re-uploads or missed updates.
  • Extend file sync waiting logic to include in-flight chunk downloads before considering sync finished.

Enhancements:

  • Increase file sync timeout from 60s to 300s for large attachment sets and slower transfers.
  • Improve FileSync logging with counts of local files and server-side operation counts for better observability.
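The "include local non-note file metadata and content hashes" item can be pictured with a rough sketch like the one below. It is not the PR's `_collect_local_files`: the field names (`path`, `contentHash`, `mtime`, `size`), the `.md` note suffix, and the rule that hidden directories are excluded are all assumptions made for illustration, and the hash function is passed in as a parameter.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def collect_local_files(
    vault: Path,
    hash_fn: Callable[[Path], str],
    note_suffix: str = ".md",  # ASSUMPTION: notes are Markdown files
) -> List[Dict]:
    """Gather metadata for every non-note file under the vault."""
    files: List[Dict] = []
    for p in sorted(vault.rglob("*")):
        if not p.is_file() or p.suffix.lower() == note_suffix:
            continue
        rel = p.relative_to(vault)
        if any(part.startswith(".") for part in rel.parts):
            continue  # ASSUMPTION: hidden files and directories are excluded
        try:
            st = p.stat()
            files.append({
                "path": rel.as_posix(),        # hypothetical field names
                "contentHash": hash_fn(p),
                "mtime": int(st.st_mtime * 1000),
                "size": st.st_size,
            })
        except OSError:
            continue  # skip unreadable files, but let programming errors surface
    return files


# Usage with a throwaway vault and a trivial stand-in hash (file size as text).
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "note.md").write_text("# note")
    (vault / "img").mkdir()
    (vault / "img" / "640.png").write_bytes(b"\x89PNG")
    found = collect_local_files(vault, hash_fn=lambda p: str(p.stat().st_size))
    print([f["path"] for f in found])
```

Catching only `OSError` follows the review feedback on this PR; whether the merged code does so is not shown here.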

Three bugs prevented attachments from syncing:

1. FileSyncUpdate handler silently skipped files without inline content.
   Attachments are binary and never include inline content — the client
   must send a FileChunkDownload request to initiate chunked transfer.

2. Binary message prefix mismatch: server sends "00" (VaultFileMsgType)
   but client checked for "BC", so all download chunks were ignored.

3. file_content_hash_binary used SHA-256 instead of the djb2 byte-hash
   the server/plugin expect, causing perpetual hash mismatches.
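Item 3 can be sketched as follows. This assumes the classic djb2 recurrence (seed 5381, `hash * 33 + byte`) wrapped to 32 bits and printed as a signed integer; the plugin's actual `hashArrayBuffer` is not shown in this PR and may use a different seed or recurrence, so treat the constants as assumptions.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def djb2_signed32(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks and return a signed 32-bit integer string."""
    h = 5381  # ASSUMPTION: classic djb2 seed
    for chunk in chunks:
        for b in chunk:
            h = (h * 33 + b) & 0xFFFFFFFF  # keep the value within 32 bits
    if h >= 0x80000000:
        h -= 0x100000000  # reinterpret as signed 32-bit
    return str(h)


def file_content_hash_binary(file_path, chunk_size: int = 64 * 1024) -> str:
    """Stream the file in chunks so large attachments are never fully in memory."""
    def read_chunks():
        with open(file_path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    return djb2_signed32(read_chunks())


# Usage: hashing a file gives the same result as hashing its bytes directly.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "640.png"
    sample.write_bytes(b"abc" * 1000)
    file_hash = file_content_hash_binary(sample, chunk_size=7)
    print(file_hash == djb2_signed32([b"abc" * 1000]))
```

Because the running value is masked after every byte, the result does not depend on how the file is split into chunks.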

Also: send local file hashes in FileSync request so the server can
properly diff, wait for in-flight downloads before declaring sync
complete, and increase file sync timeout to 300s.

Closes #1

sourcery-ai Bot commented Apr 11, 2026

Reviewer's Guide

Implements end-to-end fixes for file/attachment sync by sending local file hashes in FileSync, correctly handling binary chunk protocol framing, aligning file hashing with the server/plugin, requesting chunked downloads for attachments, and extending/waiting for file sync completion including in-flight downloads.

Sequence diagram for attachment file sync with chunked downloads

sequenceDiagram
    actor User
    participant SyncEngine
    participant FileSync
    participant WSClient
    participant Server

    User->>SyncEngine: run_initial_sync
    SyncEngine->>FileSync: request_sync()
    FileSync->>FileSync: _collect_local_files()
    FileSync->>WSClient: send FileSync(context, vault, lastTime, files)
    WSClient-->>Server: FileSync request

    Server-->>WSClient: FileSyncUpdate(path, metadata, no content)
    WSClient->>FileSync: _on_sync_update(msg)
    FileSync->>FileSync: detect content is None
    FileSync->>WSClient: _request_chunk_download(path, pathHash)
    WSClient-->>Server: FileChunkDownload request

    Server-->>WSClient: binary chunk frames (prefix 00)
    WSClient->>WSClient: _handle_binary(raw)
    WSClient->>FileSync: binary_handler(session_id, chunk_index, data)
    FileSync->>FileSync: update _download_sessions

    Server-->>WSClient: FileSyncEnd(lastTime, needModify, needDelete, needUpload)
    WSClient->>FileSync: _on_sync_end(msg)
    FileSync->>FileSync: set _sync_complete = True

    SyncEngine->>SyncEngine: _wait_file_sync(timeout=300)
    SyncEngine->>FileSync: poll _sync_complete
    SyncEngine->>FileSync: wait until _download_sessions empty
    FileSync-->>SyncEngine: file sync finished with attachments downloaded

Updated class diagram for file sync, client, and protocol

classDiagram
    class SyncEngine {
        +config
        +file_sync
        +_initial_sync()
        +_wait_file_sync(timeout)
    }

    class FileSync {
        +engine
        +config
        +vault_path
        +_sync_complete
        +_download_sessions
        +request_sync()
        +_on_sync_update(msg)
        +_on_sync_end(msg)
        +_request_chunk_download(rel_path, data)
        +_collect_local_files() list~dict~
        +_try_remove_empty_parent(file_path)
    }

    class Client {
        +_binary_handler
        +_handle_text(raw)
        +_handle_binary(raw)
    }

    class Protocol {
        +build_binary_chunk(session_id, chunk_index, data) bytes
        +parse_binary_chunk(raw) tuple
    }

    class HashUtils {
        +file_content_hash_binary(file_path) str
        +content_hash(text) str
    }

    class WSClient {
        +send(msg)
    }

    class EngineState {
        +last_file_sync_time
        +save()
    }

    SyncEngine --> FileSync : owns
    SyncEngine --> Client : uses
    SyncEngine --> EngineState : uses
    FileSync --> WSClient : sends_messages_via_engine
    FileSync --> HashUtils : uses_file_content_hash_binary
    Client --> Protocol : uses_build_and_parse_binary_chunk
    Client --> FileSync : invokes_binary_handler
    Protocol ..> FileSync : chunk_payloads_for_downloads

File-Level Changes

Change: Include local non-note files with hashes in FileSync requests so the server can perform proper diffing.
Details:
  • Add _collect_local_files to traverse vault, filter non-note/non-excluded files, and gather path, hashes, timestamps, and size
  • Call _collect_local_files in request_sync and populate the FileSync message files payload
  • Update FileSync request logging to include the count of local files being sent
Files: fns_cli/file_sync.py

Change: Trigger chunked downloads for attachments that are advertised without inline content during FileSync.
Details:
  • Change _on_sync_update to request a FileChunkDownload when content is None instead of skipping the file
  • Log that a chunked download is being requested for such files
  • Introduce _request_chunk_download helper to send ACTION_FILE_CHUNK_DOWNLOAD with vault, path, and pathHash
Files: fns_cli/file_sync.py

Change: Mark sync completion after FileSyncEnd but ensure all chunk download sessions have finished, with a longer timeout.
Details:
  • Refine _on_sync_end to log server-side needModify/needDelete/needUpload counts and set _sync_complete after logging
  • Increase file sync wait timeout in _initial_sync from 60s to 300s
  • Extend _wait_file_sync to wait for pending file_sync._download_sessions after FileSyncEnd, with its own timeout and logging on timeout
Files: fns_cli/file_sync.py, fns_cli/sync_engine.py

Change: Align file content hashing with the plugin by switching binary file hash to djb2 over bytes.
Details:
  • Replace SHA-256-based file_content_hash_binary with a streaming djb2 byte-wise hash that matches hashArrayBuffer behavior
  • Return the hash as a signed 32-bit integer string, mirroring plugin output
Files: fns_cli/hash_utils.py

Change: Fix the binary file chunk framing to match server protocol and parsing expectations.
Details:
  • Change binary chunk prefix from 'BC' to '00' in build_binary_chunk to match server VaultFileMsgType
  • Adjust parse_binary_chunk to assume the 2-byte prefix is already stripped, updating offsets accordingly
  • Update client._handle_binary to check for '00' prefix and pass raw[2:] into parse_binary_chunk
Files: fns_cli/protocol.py, fns_cli/client.py
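The framing change can be illustrated with a minimal sketch. Only the two-byte `00` prefix and the "strip the prefix before parsing" behaviour come from this PR. The header layout below (a 36-character session id followed by a 4-byte big-endian chunk index) is an assumption, chosen only so the header totals 40 bytes, which is consistent with the `len(raw) > 42` check mentioned in the review feedback; the real layout in `fns_cli/protocol.py` may differ.

```python
import struct
import uuid
from typing import Optional, Tuple

PREFIX = b"00"        # from the PR: matches the server's VaultFileMsgType
SESSION_ID_LEN = 36   # ASSUMPTION: session id is a 36-char UUID string
INDEX_LEN = 4         # ASSUMPTION: chunk index is a 4-byte big-endian uint
HEADER_LEN = SESSION_ID_LEN + INDEX_LEN


def build_binary_chunk(session_id: str, chunk_index: int, data: bytes) -> bytes:
    """Frame a chunk as prefix + session id + chunk index + payload."""
    sid = session_id.encode("ascii")
    if len(sid) != SESSION_ID_LEN:
        raise ValueError("unexpected session id length")
    return PREFIX + sid + struct.pack(">I", chunk_index) + data


def parse_binary_chunk(body: bytes) -> Tuple[str, int, bytes]:
    """Parse a frame whose 2-byte prefix has already been stripped."""
    if len(body) < HEADER_LEN:
        raise ValueError("frame too short")
    session_id = body[:SESSION_ID_LEN].decode("ascii")
    (chunk_index,) = struct.unpack(">I", body[SESSION_ID_LEN:HEADER_LEN])
    return session_id, chunk_index, body[HEADER_LEN:]


def handle_binary(raw: bytes) -> Optional[Tuple[str, int, bytes]]:
    """Ignore frames with an unknown prefix, otherwise strip it and parse."""
    if raw[:2] != PREFIX:
        return None
    return parse_binary_chunk(raw[2:])


# Usage: a frame built with the new prefix round-trips; the old "BC" prefix is ignored.
sid = str(uuid.uuid4())
raw = build_binary_chunk(sid, 3, b"abc")
print(handle_binary(raw) == (sid, 3, b"abc"))
print(handle_binary(b"BC" + raw[2:]))
```

Keeping the length validation inside `parse_binary_chunk` and deriving it from named constants is one way to avoid a hardcoded frame length in the handler.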

Assessment against linked issues

Issue: #1
Objective: Enable synchronization of attachments (non-note files such as images) from the server to the local vault, so that they are actually downloaded during FileSync instead of only syncing note text.

@Go1c Go1c merged commit 44c3acd into main Apr 11, 2026
4 checks passed

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In SyncEngine._wait_file_sync, consider exposing download-session state via a public method/property on file_sync instead of reaching into the private _download_sessions attribute directly.
  • In _collect_local_files, the broad except Exception will also swallow programming errors; narrowing this to filesystem-related exceptions (e.g., OSError) and optionally logging exc_info would make failures more visible while still skipping unreadable files.
  • The hardcoded len(raw) > 42 check in _handle_binary is tightly coupled to the framing layout; consider using a named constant or moving the length validation into parse_binary_chunk so future changes to the binary frame format are less error-prone.
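The first point could look roughly like the sketch below. The `FileSync` class here is a stripped-down stand-in, and `has_pending_downloads` is a hypothetical property name, not something that exists in the PR.

```python
class FileSync:
    """Stand-in for the real FileSync; only the download-session state is modelled."""

    def __init__(self) -> None:
        self._download_sessions: dict = {}

    @property
    def has_pending_downloads(self) -> bool:
        # Hypothetical public accessor so callers never touch _download_sessions.
        return bool(self._download_sessions)


# Usage: the engine would poll the property instead of the private attribute.
file_sync = FileSync()
print(file_sync.has_pending_downloads)   # False
file_sync._download_sessions["session-1"] = object()
print(file_sync.has_pending_downloads)   # True
```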
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="fns_cli/file_sync.py" line_range="194-195" />
<code_context>

+    async def _request_chunk_download(self, rel_path: str, data: dict) -> None:
+        """Send FileChunkDownload request for a file that needs chunked transfer."""
+        msg = WSMessage(ACTION_FILE_CHUNK_DOWNLOAD, {
+            "vault": self.config.server.vault,
+            "path": rel_path,
+            "pathHash": data.get("pathHash", path_hash(rel_path)),
+        })
+        await self.engine.ws_client.send(msg)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using `data.get("pathHash", ...)` will pass through a `None` pathHash instead of recomputing it.

If the server sends a `pathHash` key with a falsy value (e.g., `null`/`None`), this code will use that value instead of recomputing `path_hash(rel_path)`. If a non-null hash is required, this can cause inconsistent or rejected requests. Consider `data.get("pathHash") or path_hash(rel_path)` so a missing/falsy value triggers recomputation.

```suggestion
    async def _request_chunk_download(self, rel_path: str, data: dict) -> None:
        """Send FileChunkDownload request for a file that needs chunked transfer."""
        msg = WSMessage(
            ACTION_FILE_CHUNK_DOWNLOAD,
            {
                "vault": self.config.server.vault,
                "path": rel_path,
                # Recompute the hash if it's missing or any falsy value (e.g. None, '').
                "pathHash": data.get("pathHash") or path_hash(rel_path),
            },
        )
        await self.engine.ws_client.send(msg)
```
</issue_to_address>
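The difference between the two lookups is easy to check in isolation. In this sketch `path_hash` is a stand-in, since the project's real implementation is not shown in this PR.

```python
def path_hash(rel_path: str) -> str:
    # Stand-in for the project's path_hash; the real hash is not shown in this PR.
    return f"hash({rel_path})"


data = {"pathHash": None}  # server sent the key, but with a null value

# dict.get only falls back to the default when the key is absent.
with_default = data.get("pathHash", path_hash("img/640.png"))

# `or` falls back for any falsy value, including None and ''.
with_or = data.get("pathHash") or path_hash("img/640.png")

print(with_default)  # None
print(with_or)       # hash(img/640.png)
```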

### Comment 2
<location path="fns_cli/sync_engine.py" line_range="225" />
<code_context>
                 break
             await asyncio.sleep(0.5)
+        # After FileSyncEnd, wait for any in-flight chunk downloads to finish
+        dl_deadline = loop.time() + timeout
+        while self.file_sync._download_sessions:
+            if loop.time() > dl_deadline:
</code_context>
<issue_to_address>
**suggestion (performance):** The additional download wait loop can extend total wait time to roughly 2× `timeout`.

Because the first loop may already wait up to `timeout` for `_sync_complete`, then this loop adds another `timeout` window using a fresh `loop.time() + timeout`, `_wait_file_sync(timeout=300)` can block for ~600 seconds in the worst case. If you want the total wait to be bounded by `timeout`, derive the second deadline from the original one (or reduce the second window).

Suggested implementation:

```python
        # After FileSyncEnd, wait for any in-flight chunk downloads to finish
        # Reuse the original deadline so total wait time does not exceed `timeout`
        while self.file_sync._download_sessions:
            if loop.time() > deadline:

```

This change assumes that earlier in `_wait_file_sync` you already compute a deadline for the first loop, e.g.:

```python
deadline = loop.time() + timeout
while not self.file_sync._sync_complete:
    ...
    if loop.time() > deadline:
        ...
```

If the existing code uses a different variable name (e.g. `end_time`, `deadline_ts`) for the first loop’s timeout, update `deadline` in the replacement block to match that variable name instead of introducing a new one.
</issue_to_address>
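To make the bounded-wait idea concrete, here is a minimal self-contained sketch. `wait_file_sync` and the `state` dict are hypothetical stand-ins for `SyncEngine._wait_file_sync` and the `FileSync` flags; the real method is not shown in full in this review.

```python
import asyncio


async def wait_file_sync(state: dict, timeout: float, poll: float = 0.01) -> bool:
    """Wait for sync completion, then for downloads, under one shared deadline."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout  # computed once, reused by both loops

    # Phase 1: wait for the FileSyncEnd flag.
    while not state["sync_complete"]:
        if loop.time() > deadline:
            return False
        await asyncio.sleep(poll)

    # Phase 2: wait for in-flight chunk downloads, still bounded by `deadline`.
    while state["download_sessions"]:
        if loop.time() > deadline:
            return False
        await asyncio.sleep(poll)
    return True


async def scenario() -> bool:
    """Simulate a server that finishes the sync, then the last download."""
    state = {"sync_complete": False, "download_sessions": {"s1": object()}}

    async def finish() -> None:
        await asyncio.sleep(0.02)
        state["sync_complete"] = True
        await asyncio.sleep(0.02)
        state["download_sessions"].clear()

    task = asyncio.create_task(finish())
    ok = await wait_file_sync(state, timeout=2.0)
    await task
    return ok


print(asyncio.run(scenario()))
```

With a single deadline the total wait is bounded by roughly `timeout` plus one poll interval, rather than twice `timeout`.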

Development

Successfully merging this pull request may close these issues.

Does FastNodeSync-CLI not support attachment sync?
