Fix data-loss and silent-staleness bugs in download, sync filtering, and caching #150
Merged
Files added with name_clash_id=None (direct external links, direct Moodle
content files, and embedded videojs nodes) all hashed md5("None") when
disambiguating same-named siblings, so two distinct same-named files
resolved to one path and one was silently lost.
Add Node._clash_suffix(), which falls back to the node's URL when
name_clash_id is None. At the rename branch the siblings differ precisely
because their URLs differ, so the URL is a present, distinct key.
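The fallback can be sketched as follows. This is a minimal illustration, not the project's actual `Node._clash_suffix()`: the free-function form, the parameter names, and the 8-character truncation are assumptions made for the example.

```python
import hashlib
from typing import Optional


def clash_suffix(name_clash_id: Optional[str], url: Optional[str]) -> str:
    """Return a short hash used to disambiguate same-named siblings.

    Hypothetical stand-in for Node._clash_suffix(): when name_clash_id is
    None, key the hash on the URL instead of on the string "None".
    """
    key = name_clash_id if name_clash_id is not None else url
    # The 8-character truncation is an assumption for readability.
    return hashlib.md5(str(key).encode("utf-8")).hexdigest()[:8]


def old_suffix(name_clash_id: Optional[str]) -> str:
    """The buggy behaviour described above: every None hashes identically."""
    return hashlib.md5(str(name_clash_id).encode("utf-8")).hexdigest()[:8]
```

With the old scheme, two siblings without a `name_clash_id` get the same suffix and therefore the same path; with the URL fallback they differ whenever their URLs differ.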
…ding skip

Course filtering matched configured URLs with `str(course_id) in url`, a substring test: selecting course 12 also pulled in courses 1 and 2, and skipping 12 silently dropped 1 and 2. Add _course_id_in_filter(), which compares the parsed `id` query parameter (or a bare numeric entry) exactly.

Also make selected_courses a true allowlist that overrides skip_courses (and only_sync_semester), as documented; previously skip_courses was applied first and could not be overridden.

Harden shortname/idnumber access so a missing or empty field no longer aborts the sync or creates an empty-named semester folder.
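A rough sketch of the exact-match comparison and the allowlist precedence, assuming filter entries are either course URLs carrying an `id` query parameter or bare numeric IDs. The function names, signatures, and the `moodle.example` host are illustrative, not the project's real code, and the `only_sync_semester` check is left out.

```python
from typing import Iterable, Union
from urllib.parse import parse_qs, urlparse


def course_id_in_filter(course_id: int, entries: Iterable[Union[str, int]]) -> bool:
    """True if course_id matches an entry exactly (never as a substring)."""
    wanted = str(course_id)
    for entry in entries:
        text = str(entry).strip()
        if text.isdigit():
            # Bare numeric entry: compare the whole number.
            if text == wanted:
                return True
            continue
        # URL entry: compare only the parsed `id` query parameter.
        ids = parse_qs(urlparse(text).query).get("id", [])
        if wanted in ids:  # list membership, so "1" does not match "12"
            return True
    return False


def should_sync(course_id: int, selected: list, skipped: list) -> bool:
    """A non-empty allowlist wins; otherwise fall back to the skip list."""
    if selected:
        return course_id_in_filter(course_id, selected)
    return not course_id_in_filter(course_id, skipped)
```

For comparison, `str(1) in "https://moodle.example/course/view.php?id=12"` is `True`, which is the substring behaviour the commit removes.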
The SSO/TOTP login and wstoken parsing assumed Moodle's exact HTML, so a changed or error page raised AttributeError/TypeError instead of the intended clean exit. Null-check the sesskey <script> and the maintenance <body>; route every login form-field extraction (csrf_token, RelayState, SAMLResponse) through a require_input_value helper that exits with a clear message when a field is missing; and parse the mobile-launch token defensively (isolate the token value, guard base64 decode and the ":::" split). Also use sys.exit consistently.
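The defensive token parsing might look roughly like this. The sketch assumes the launch redirect carries a `token=<base64>` value whose decoded form is `:::`-separated with the web-service token in the second field; that layout, the function name, and the messages are assumptions for illustration and are not taken from the PR.

```python
import base64
import binascii
import sys


def parse_launch_token(location: str) -> str:
    """Extract the web-service token from a mobile-launch redirect.

    Hypothetical helper: exits with a clear message instead of raising
    AttributeError/TypeError when the response is not what was expected.
    """
    marker = "token="
    if marker not in location:
        sys.exit("Login failed: no token found in the launch redirect.")
    # Isolate the token value from anything that follows it.
    encoded = location.split(marker, 1)[1].split("&", 1)[0].strip()
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        sys.exit("Login failed: the launch token is not valid base64.")
    parts = decoded.split(":::")
    if len(parts) < 2 or not parts[1]:
        sys.exit("Login failed: unexpected launch token layout.")
    return parts[1]
```

Calling `sys.exit` with a string prints the message to stderr and exits with status 1, which matches the "clean exit" goal described above.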
…e caches

Rework download_file and the per-course cache so an update never destroys or silently abandons local data:

- Conflict "rename" mode now downloads the replacement to a temp file first and only moves the local file aside on success, so a failed/aborted download (an expired session returning an HTML error page) never empties the canonical path.
- Resume partial downloads only with a validating If-Range using a recorded etag sidecar, and use hidden, namespaced temp files, so a blind range request can no longer splice a newer version onto an old partial or clobber a user's own *.temp file.
- A failed ETag comparison is treated as "no cached etag" (fall back to the timestamp/HEAD heuristic) instead of silently overwriting the local file.
- Check filetype/name exclusions before any conflict handling so excluded files are never displaced.
- Trust cached version markers only when the previous run actually downloaded the file, and have cache_root_node preserve the on-disk version's markers for files that were not fetched this run, so a failed update is retried cleanly next run instead of being skipped forever or moved aside as a spurious conflict.
- Skip on matching timemodified only when it is meaningful, so Sciebo files (which have no timemodified) fall through to etag-based change detection and are actually re-downloaded when they change.
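The first two points can be illustrated with a small sketch. It is not the project's `download_file`: the `fetch` callback, the `.part` and `.old` naming, and both function names are hypothetical, and the real code also records the etag in a sidecar file, which is omitted here.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Optional


def resume_headers(partial_size: int, etag: Optional[str]) -> Dict[str, str]:
    """Ask for a byte range only when it can be validated with If-Range.

    Without a recorded etag, return no headers so the caller restarts the
    download from scratch instead of sending a blind Range request.
    """
    if partial_size > 0 and etag:
        return {"Range": f"bytes={partial_size}-", "If-Range": etag}
    return {}


def replace_with_rename(target: Path, fetch: Callable[[Path], None]) -> Path:
    """Download next to `target`, then displace the old file only on success."""
    # Hidden temp name (hypothetical scheme) so a user's own *.temp is untouched.
    tmp = target.with_name(f".{target.name}.part")
    fetch(tmp)  # if this raises, `target` has not been touched
    if target.exists():
        # Move the existing local file aside; ".old" is an illustrative name.
        os.replace(target, target.with_name(target.name + ".old"))
    os.replace(tmp, target)
    return target
```

Because the existing file is moved only after `fetch` returns, an exception during the download leaves the canonical path exactly as it was. With `If-Range`, a server whose current etag no longer matches answers with the full new body rather than a partial range, so an old partial is never extended with bytes from a newer version.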
- `rename` conflict mode moved the local file aside before downloading; a failed download (e.g. expired session -> HTML error page) left nothing. Now downloads to temp first, displaces only on success.
- `Range` could splice a new version onto an old partial, or clobber a user's `.temp` file. Now uses `If-Range` + hidden temp files.
- For files without `timemodified` (Sciebo), a `None == None` skip fired before the etag check; now uses etag-based change detection.
- Course filtering now matches `id` exactly, and `selected_courses` overrides `skip_courses`.
- Same-named files with no `name_clash_id` all hashed `md5("None")` and dropped one; now fall back to the URL.

Plus login/SSO/TOTP/wstoken hardening against unexpected page structure.