fix: attachment sync not downloading from server#2
Conversation
Three bugs prevented attachments from syncing: 1. FileSyncUpdate handler silently skipped files without inline content. Attachments are binary and never include inline content — the client must send a FileChunkDownload request to initiate chunked transfer. 2. Binary message prefix mismatch: server sends "00" (VaultFileMsgType) but client checked for "BC", so all download chunks were ignored. 3. file_content_hash_binary used SHA-256 instead of the djb2 byte-hash the server/plugin expect, causing perpetual hash mismatches. Also: send local file hashes in FileSync request so the server can properly diff, wait for in-flight downloads before declaring sync complete, and increase file sync timeout to 300s. Closes #1
Reviewer's GuideImplements end-to-end fixes for file/attachment sync by sending local file hashes in FileSync, correctly handling binary chunk protocol framing, aligning file hashing with the server/plugin, requesting chunked downloads for attachments, and extending/waiting for file sync completion including in-flight downloads. Sequence diagram for attachment file sync with chunked downloadssequenceDiagram
actor User
participant SyncEngine
participant FileSync
participant WSClient
participant Server
User->>SyncEngine: run_initial_sync
SyncEngine->>FileSync: request_sync()
FileSync->>FileSync: _collect_local_files()
FileSync->>WSClient: send FileSync(context, vault, lastTime, files)
WSClient-->>Server: FileSync request
Server-->>WSClient: FileSyncUpdate(path, metadata, no content)
WSClient->>FileSync: _on_sync_update(msg)
FileSync->>FileSync: detect content is None
FileSync->>WSClient: _request_chunk_download(path, pathHash)
WSClient-->>Server: FileChunkDownload request
Server-->>WSClient: binary chunk frames (prefix 00)
WSClient->>WSClient: _handle_binary(raw)
WSClient->>FileSync: binary_handler(session_id, chunk_index, data)
FileSync->>FileSync: update _download_sessions
Server-->>WSClient: FileSyncEnd(lastTime, needModify, needDelete, needUpload)
WSClient->>FileSync: _on_sync_end(msg)
FileSync->>FileSync: set _sync_complete = True
SyncEngine->>SyncEngine: _wait_file_sync(timeout=300)
SyncEngine->>FileSync: poll _sync_complete
SyncEngine->>FileSync: wait until _download_sessions empty
FileSync-->>SyncEngine: file sync finished with attachments downloaded
Updated class diagram for file sync, client, and protocolclassDiagram
class SyncEngine {
+config
+file_sync
+_initial_sync()
+_wait_file_sync(timeout)
}
class FileSync {
+engine
+config
+vault_path
+_sync_complete
+_download_sessions
+request_sync()
+_on_sync_update(msg)
+_on_sync_end(msg)
+_request_chunk_download(rel_path, data)
+_collect_local_files() list~dict~
+_try_remove_empty_parent(file_path)
}
class Client {
+_binary_handler
+_handle_text(raw)
+_handle_binary(raw)
}
class Protocol {
+build_binary_chunk(session_id, chunk_index, data) bytes
+parse_binary_chunk(raw) tuple
}
class HashUtils {
+file_content_hash_binary(file_path) str
+content_hash(text) str
}
class WSClient {
+send(msg)
}
class EngineState {
+last_file_sync_time
+save()
}
SyncEngine --> FileSync : owns
SyncEngine --> Client : uses
SyncEngine --> EngineState : uses
FileSync --> WSClient : sends_messages_via_engine
FileSync --> HashUtils : uses_file_content_hash_binary
Client --> Protocol : uses_build_and_parse_binary_chunk
Client --> FileSync : invokes_binary_handler
Protocol ..> FileSync : chunk_payloads_for_downloads
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
SyncEngine._wait_file_sync, consider exposing download-session state via a public method/property onfile_syncinstead of reaching into the private_download_sessionsattribute directly. - In
_collect_local_files, the broadexcept Exceptionwill also swallow programming errors; narrowing this to filesystem-related exceptions (e.g.,OSError) and optionally loggingexc_infowould make failures more visible while still skipping unreadable files. - The hardcoded
len(raw) > 42check in_handle_binaryis tightly coupled to the framing layout; consider using a named constant or moving the length validation intoparse_binary_chunkso future changes to the binary frame format are less error-prone.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `SyncEngine._wait_file_sync`, consider exposing download-session state via a public method/property on `file_sync` instead of reaching into the private `_download_sessions` attribute directly.
- In `_collect_local_files`, the broad `except Exception` will also swallow programming errors; narrowing this to filesystem-related exceptions (e.g., `OSError`) and optionally logging `exc_info` would make failures more visible while still skipping unreadable files.
- The hardcoded `len(raw) > 42` check in `_handle_binary` is tightly coupled to the framing layout; consider using a named constant or moving the length validation into `parse_binary_chunk` so future changes to the binary frame format are less error-prone.
## Individual Comments
### Comment 1
<location path="fns_cli/file_sync.py" line_range="194-195" />
<code_context>
+ async def _request_chunk_download(self, rel_path: str, data: dict) -> None:
+ """Send FileChunkDownload request for a file that needs chunked transfer."""
+ msg = WSMessage(ACTION_FILE_CHUNK_DOWNLOAD, {
+ "vault": self.config.server.vault,
+ "path": rel_path,
+ "pathHash": data.get("pathHash", path_hash(rel_path)),
+ })
+ await self.engine.ws_client.send(msg)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using `data.get("pathHash", ...)` will pass through a `None` pathHash instead of recomputing it.
If the server sends a `pathHash` key with a falsy value (e.g., `null` → `None`), this code will use that value instead of recomputing `path_hash(rel_path)`. If a non-null hash is required, this can cause inconsistent or rejected requests. Consider `data.get("pathHash") or path_hash(rel_path)` so a missing/falsy value triggers recomputation.
```suggestion
async def _request_chunk_download(self, rel_path: str, data: dict) -> None:
"""Send FileChunkDownload request for a file that needs chunked transfer."""
msg = WSMessage(
ACTION_FILE_CHUNK_DOWNLOAD,
{
"vault": self.config.server.vault,
"path": rel_path,
# Recompute the hash if it's missing or any falsy value (e.g. None, '').
"pathHash": data.get("pathHash") or path_hash(rel_path),
},
)
await self.engine.ws_client.send(msg)
```
</issue_to_address>
### Comment 2
<location path="fns_cli/sync_engine.py" line_range="225" />
<code_context>
break
await asyncio.sleep(0.5)
+ # After FileSyncEnd, wait for any in-flight chunk downloads to finish
+ dl_deadline = loop.time() + timeout
+ while self.file_sync._download_sessions:
+ if loop.time() > dl_deadline:
</code_context>
<issue_to_address>
**suggestion (performance):** The additional download wait loop can extend total wait time to roughly 2× `timeout`.
Because the first loop may already wait up to `timeout` for `_sync_complete`, then this loop adds another `timeout` window using a fresh `loop.time() + timeout`, `_wait_file_sync(timeout=300)` can block for ~600 seconds in the worst case. If you want the total wait to be bounded by `timeout`, derive the second deadline from the original one (or reduce the second window).
Suggested implementation:
```python
# After FileSyncEnd, wait for any in-flight chunk downloads to finish
# Reuse the original deadline so total wait time does not exceed `timeout`
while self.file_sync._download_sessions:
if loop.time() > deadline:
```
This change assumes that earlier in `_wait_note_sync` you already compute a deadline for the first loop, e.g.:
```python
deadline = loop.time() + timeout
while not self.file_sync._sync_complete:
...
if loop.time() > deadline:
...
```
If the existing code uses a different variable name (e.g. `end_time`, `deadline_ts`) for the first loop’s timeout, update `deadline` in the replacement block to match that variable name instead of introducing a new one.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| async def _request_chunk_download(self, rel_path: str, data: dict) -> None: | ||
| """Send FileChunkDownload request for a file that needs chunked transfer.""" |
There was a problem hiding this comment.
suggestion (bug_risk): Using data.get("pathHash", ...) will pass through a None pathHash instead of recomputing it.
If the server sends a pathHash key with a falsy value (e.g., null → None), this code will use that value instead of recomputing path_hash(rel_path). If a non-null hash is required, this can cause inconsistent or rejected requests. Consider data.get("pathHash") or path_hash(rel_path) so a missing/falsy value triggers recomputation.
| async def _request_chunk_download(self, rel_path: str, data: dict) -> None: | |
| """Send FileChunkDownload request for a file that needs chunked transfer.""" | |
| async def _request_chunk_download(self, rel_path: str, data: dict) -> None: | |
| """Send FileChunkDownload request for a file that needs chunked transfer.""" | |
| msg = WSMessage( | |
| ACTION_FILE_CHUNK_DOWNLOAD, | |
| { | |
| "vault": self.config.server.vault, | |
| "path": rel_path, | |
| # Recompute the hash if it's missing or any falsy value (e.g. None, ''). | |
| "pathHash": data.get("pathHash") or path_hash(rel_path), | |
| }, | |
| ) | |
| await self.engine.ws_client.send(msg) |
| break | ||
| await asyncio.sleep(0.5) | ||
| # After FileSyncEnd, wait for any in-flight chunk downloads to finish | ||
| dl_deadline = loop.time() + timeout |
There was a problem hiding this comment.
suggestion (performance): The additional download wait loop can extend total wait time to roughly 2× timeout.
Because the first loop may already wait up to timeout for _sync_complete, then this loop adds another timeout window using a fresh loop.time() + timeout, _wait_file_sync(timeout=300) can block for ~600 seconds in the worst case. If you want the total wait to be bounded by timeout, derive the second deadline from the original one (or reduce the second window).
Suggested implementation:
# After FileSyncEnd, wait for any in-flight chunk downloads to finish
# Reuse the original deadline so total wait time does not exceed `timeout`
while self.file_sync._download_sessions:
if loop.time() > deadline:This change assumes that earlier in _wait_note_sync you already compute a deadline for the first loop, e.g.:
deadline = loop.time() + timeout
while not self.file_sync._sync_complete:
...
if loop.time() > deadline:
...If the existing code uses a different variable name (e.g. end_time, deadline_ts) for the first loop’s timeout, update deadline in the replacement block to match that variable name instead of introducing a new one.
Summary
FileSyncUpdatehandler to request chunked download for attachments (images, etc.) instead of silently skipping them"BC"to"00"to match server'sVaultFileMsgTypefile_content_hash_binaryto use djb2 byte-hash (matching plugin'shashArrayBuffer) instead of SHA-256FileSyncrequest for proper server-side diffTest plan
python -m fns_cli.main sync -c config.yamlagainst a vault with image attachments.png,.jpg) are downloaded to the local vault![[640.png]]) render correctlyCloses #1
Summary by Sourcery
Fix file synchronization so attachment binaries are correctly diffed, transferred in chunks, and awaited before marking sync complete.
New Features:
Bug Fixes:
Enhancements: