
feat: duplicate detection for uploads #56

@danmunz

Problem

Uploading the same image file multiple times creates duplicate entries on the Frame TV. Each upload gets a new content_id from the TV firmware, so Docent has no way to know the image already exists. This was reported by a user with a 2016 Frame who noticed duplicates accumulating despite their own deduplication logic.

The artwork_meta.json file stores original_filename, width, and height, but nothing checks these fields before uploading, and no content hash is stored or compared.

Current State

  • Manual uploads (/api/upload): No dedup at all. Same file uploaded twice → two entries on the TV.
  • Google Drive sync: Has dedup via drive_file_id in the sync file_map — but only prevents re-syncing the same Drive file. Duplicate images with different Drive file IDs still get through.
  • No content hashing: No hash, fingerprint, or perceptual similarity is computed or stored.
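
To make the Drive sync gap concrete, here is a minimal sketch. The shape of file_map (a dict keyed by drive_file_id), the helper name already_synced, and the placeholder IDs are assumptions for illustration, not Docent's actual code.

```python
# Hypothetical shape of the sync file_map: drive_file_id -> content_id on the TV.
# Both IDs below are placeholders.
file_map = {"drive-id-A": "content-1"}


def already_synced(drive_file_id: str, file_map: dict) -> bool:
    """ID-based check: only recognises a Drive file that was synced before."""
    return drive_file_id in file_map


# Re-syncing the same Drive file is caught.
same_file = already_synced("drive-id-A", file_map)

# A copy of the same image under a different Drive file ID is not caught,
# because nothing about the image content is compared.
copied_file = already_synced("drive-id-B", file_map)
```

The check never looks at the image bytes, which is why a content hash is needed to close the gap.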

Proposed Solution

Phase 1: Hash-based dedup (simple, catches exact duplicates)

  • Compute SHA-256 of the processed image bytes before uploading
  • Store the hash in artwork_meta.json alongside existing fields
  • Before upload, check if any existing artwork entry has the same hash
  • If a match is found, warn the user and let them choose to proceed or skip
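
A minimal sketch of the Phase 1 check, using only the standard library. It assumes artwork_meta.json loads as a dict keyed by content_id and that the hash lives in a new field named sha256; both the layout and the field name are hypothetical, as are the function names.

```python
import hashlib
from typing import Optional


def sha256_hex(image_bytes: bytes) -> str:
    """Hex SHA-256 digest of the processed image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def find_duplicate(meta: dict, digest: str) -> Optional[str]:
    """Return the content_id of an entry with the same hash, or None.

    `meta` is assumed to map content_id -> entry dict (hypothetical layout).
    Entries without a "sha256" field are simply skipped.
    """
    for content_id, entry in meta.items():
        if entry.get("sha256") == digest:
            return content_id
    return None


def record_hash(meta: dict, content_id: str, digest: str) -> None:
    """Store the hash alongside the existing fields after a successful upload."""
    meta.setdefault(content_id, {})["sha256"] = digest
```

Usage would be roughly: hash the processed bytes, call find_duplicate before the upload, and call record_hash once the TV returns a content_id. Hashing the processed bytes means the same source file processed with different settings would produce different hashes.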

Phase 2: Perceptual dedup (stretch goal, catches near-duplicates)

  • Compute a perceptual hash (e.g. with the imagehash library) for fuzzy matching
  • Surface "similar images already on your Frame" in the upload UI
  • Useful for different crops/resolutions of the same artwork
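
To show the idea behind Phase 2 without pulling in a dependency, here is a pure-Python stand-in for an average hash. It is not the imagehash API: it assumes the image has already been downscaled to a small grayscale grid (e.g. 8x8), and the max_distance threshold of 5 is an arbitrary placeholder that would need tuning.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_similar(a: int, b: int, max_distance: int = 5) -> bool:
    """Treat hashes within max_distance bits as 'similar' (threshold is a guess)."""
    return hamming_distance(a, b) <= max_distance


# 8x8 horizontal gradient, a slightly brighter copy, and an inverted copy.
base = [[x * 32 for x in range(8)] for _ in range(8)]
brighter = [[min(255, p + 10) for p in row] for row in base]
inverted = [[255 - p for p in row] for row in base]

base_vs_brighter = is_similar(average_hash(base), average_hash(brighter))
base_vs_inverted = is_similar(average_hash(base), average_hash(inverted))
```

A uniform brightness shift leaves the hash unchanged, while an unrelated image (here, the inverted gradient) lands far away. Unlike SHA-256, the comparison is a distance rather than an equality test, so matches would be surfaced as "similar" rather than "identical".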

Considerations

  • Dedup should be advisory, not blocking — users may intentionally upload the same image with different mattes
  • Existing artwork on the TV (not uploaded via Docent) won't have hashes until re-indexed
  • The hash check should run locally in Docent (fast), before the slow upload to the TV
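
One way the advisory behaviour could look, as a sketch under the same assumptions as above (metadata as a dict keyed by content_id, hash in a hypothetical sha256 field). The DedupAdvice type and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DedupAdvice:
    duplicate_of: Optional[str]  # content_id of a matching entry, if any
    unhashed_entries: int        # entries that could not be compared

    @property
    def needs_confirmation(self) -> bool:
        return self.duplicate_of is not None


def advise(meta: dict, digest: str) -> DedupAdvice:
    """Compare against stored hashes; never raises or blocks on a match."""
    duplicate_of = None
    unhashed = 0
    for content_id, entry in meta.items():
        stored = entry.get("sha256")
        if stored is None:
            # Artwork not uploaded via Docent, or uploaded before hashing existed.
            unhashed += 1
        elif stored == digest and duplicate_of is None:
            duplicate_of = content_id
    return DedupAdvice(duplicate_of, unhashed)


def should_upload(advice: DedupAdvice, user_confirmed: bool = False) -> bool:
    """Proceed unless a duplicate was found and the user has not confirmed."""
    return not advice.needs_confirmation or user_confirmed
```

Reporting unhashed_entries lets the UI say that some artwork could not be checked until it is re-indexed, and user_confirmed covers the case of intentionally uploading the same image with a different matte.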

Source

Reddit feedback (u/Mangolover112): "One issue I was facing on my 2016 Frame was duplicate images on the frame. Even though my python program maintained a local list of photos uploaded to prevent pushing duplicates, they seemed to replicate on the frame."

Metadata

Labels: enhancement (New feature or request)
