An Alfresco Content Services (ACS 26.1) Platform JAR extension that prevents duplicate file uploads by computing a SHA-256 hash of incoming content and comparing it against existing nodes in the target folder hierarchy before the upload is committed.
Built with aiup-alfresco — a Claude Code plugin for Alfresco extension development.
- Upload intercept: An `OnContentUpdatePolicy` behaviour fires within the same database transaction as the upload, before any commit.
- Scope check (opt-in): The behaviour walks the folder's ancestor chain looking for the `cdd:hierarchyCheckEnabled` marker aspect. If the aspect is not found anywhere in the chain, hashing is skipped entirely (zero overhead for unconfigured folders).
- SHA-256 hash: The content stream is hashed in-process using `java.security.MessageDigest`.
- Transactional search: A DB-backed AFTS query (`QueryConsistency.TRANSACTIONAL`) checks for existing nodes with the same hash within the configured scope, bypassing Solr indexing lag.
- Folder lock: A `LockService.WRITE_LOCK` on the parent folder serialises the check-and-write window, making the extension safe under concurrent uploads.
- Reject or accept: If a duplicate is found, a `DuplicateContentException` (unchecked) is thrown, rolling back the entire transaction. If no duplicate, the `cdd:hashable` aspect and `cdd:sha256Hash` property are stored on the new node.
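The hashing step above can be sketched with plain JDK classes. This is a minimal illustration, not the extension's actual code: `HashSketch` and `sha256Hex` are hypothetical names, and the input is assumed to be any `InputStream` (inside the repository it would presumably come from the node's content reader).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper: streams content through MessageDigest and returns lowercase hex. */
final class HashSketch {

    static String sha256Hex(InputStream in) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int read;
        try {
            // Feed the stream in chunks so large files are never held fully in memory.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // HexFormat.of() produces lowercase hex, matching the [a-f0-9]{64} constraint.
        return HexFormat.of().formatHex(digest.digest());
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha256Hex(new ByteArrayInputStream(content)));
    }
}
```

Streaming the digest keeps memory use constant regardless of file size, which matters because the behaviour runs inside the upload transaction.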
| Component | Version |
|---|---|
| Alfresco Content Services | 26.1 |
| Maven In-Process SDK (`alfresco-sdk-aggregator`) | 4.15.0 |
| Java | 17 |
| JUnit | 5.10.2 |
| Prefix | URI |
|---|---|
| `cdd` | `http://www.example.com/model/content-dedup/1.0` |
The `cdd:hashable` aspect is applied automatically to every document node after it passes the duplicate check.
| Property | Type | Mandatory | Constraints |
|---|---|---|---|
| `cdd:sha256Hash` | `d:text` | true | Exactly 64 lowercase hex characters (`[a-f0-9]{64}`) |
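For client code that wants to pre-validate a value before it reaches the repository, the constraint in the table can be mirrored in a few lines of Java. This is only a sketch: `HashFormatSketch` and `isValidHash` are hypothetical names, and the repository's own constraint checking remains the authority.

```java
import java.util.regex.Pattern;

/** Hypothetical client-side mirror of the cdd:sha256Hash format constraint. */
final class HashFormatSketch {

    // Same expression as the constraint listed for cdd:sha256Hash.
    private static final Pattern SHA256_HEX = Pattern.compile("[a-f0-9]{64}");

    static boolean isValidHash(String value) {
        // matches() requires the whole string to fit, so length 64 is implied.
        return value != null && SHA256_HEX.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidHash("a".repeat(64))); // lowercase hex, 64 chars
        System.out.println(isValidHash("A".repeat(64))); // uppercase is rejected
    }
}
```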
`cdd:hierarchyCheckEnabled` is a marker aspect with no properties. An administrator applies it to a folder to activate dedup scope for all uploads into that folder's subtree.
- Absent from all ancestors → zero overhead, no hashing.
- Present on the direct parent → folder-only duplicate check.
- Present on an ancestor folder → all folders from the upload target up to and including the marked folder are included in the duplicate search.
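The three cases above can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the extension's code: folders are plain strings, `parentOf` and `marked` are hypothetical stand-ins for the repository's parent associations and for the presence of `cdd:hierarchyCheckEnabled`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the scope rules; folders are identified by name. */
final class ScopeSketch {

    static List<String> buildScope(String uploadFolder,
                                   Map<String, String> parentOf,
                                   Set<String> marked) {
        List<String> scope = new ArrayList<>();
        String current = uploadFolder;
        while (current != null) {
            scope.add(current);
            if (marked.contains(current)) {
                return scope; // the marked folder is the inclusive ceiling
            }
            current = parentOf.get(current); // null once the root is passed
        }
        return List.of(); // no marker anywhere in the chain: dedup is skipped
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = Map.of("invoices", "finance", "finance", "root");
        System.out.println(buildScope("invoices", parentOf, Set.of("finance")));
        System.out.println(buildScope("invoices", parentOf, Set.of()));
    }
}
```

With the marker on `finance`, an upload into `invoices` is checked against both folders; with no marker, the empty result signals that no hashing is needed.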
Prerequisites: Java 17+, Maven 3.9+.
```bash
mvn clean package
```

The JAR is produced at `target/content-dedup-1.0.0-SNAPSHOT.jar`.

```yaml
# In your compose.yaml:
volumes:
  - ./target/content-dedup-1.0.0-SNAPSHOT.jar:/usr/local/tomcat/webapps/alfresco/WEB-INF/lib/content-dedup-1.0.0-SNAPSHOT.jar
```

```dockerfile
FROM alfresco/alfresco-content-repository-community:26.1.0
COPY target/content-dedup-1.0.0-SNAPSHOT.jar \
     /usr/local/tomcat/webapps/alfresco/WEB-INF/lib/
```

All properties are set in `alfresco-global.properties`. Defaults are safe without any customisation.
| Property | Default | Description |
|---|---|---|
| `content.dedup.enabled` | `true` | Master on/off switch; disables the behaviour without redeployment |
| `content.dedup.hash.algorithm` | `SHA-256` | `MessageDigest` algorithm name |
| `content.dedup.lock.timeoutSeconds` | `30` | Folder write-lock TTL; `UnableToAcquireLockException` propagates to `RetryingTransactionHelper` |
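To make the defaulting explicit, the sketch below resolves the three properties from a `java.util.Properties` object. It is an illustration only: `DedupConfigSketch` is a hypothetical name, and the extension itself most likely receives these values through Spring property placeholders rather than by parsing them this way.

```java
import java.util.Properties;

/** Hypothetical holder showing how the documented defaults apply when a key is absent. */
record DedupConfigSketch(boolean enabled, String hashAlgorithm, int lockTimeoutSeconds) {

    static DedupConfigSketch from(Properties props) {
        return new DedupConfigSketch(
                Boolean.parseBoolean(props.getProperty("content.dedup.enabled", "true")),
                props.getProperty("content.dedup.hash.algorithm", "SHA-256"),
                Integer.parseInt(props.getProperty("content.dedup.lock.timeoutSeconds", "30")));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("content.dedup.lock.timeoutSeconds", "10"); // override one key
        System.out.println(from(props));
    }
}
```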
Apply the cdd:hierarchyCheckEnabled aspect to any folder before uploading content. Files uploaded before the aspect is applied will not have hashes stored and cannot be detected as future duplicates.
```bash
# Fetch current aspects of the folder
curl -u admin:admin \
  "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1/nodes/{folder-id}?fields=aspectNames"

# Add the marker aspect (include all existing aspects in the list)
curl -u admin:admin -X PUT \
  "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1/nodes/{folder-id}" \
  -H "Content-Type: application/json" \
  -d '{"aspectNames": ["cm:auditable", "cm:titled", "cdd:hierarchyCheckEnabled"]}'
```

Alternatively, apply the aspect through the Alfresco Share UI or Node Browser (Admin Tools → Node Browser).
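The PUT call replaces the whole `aspectNames` list, so the existing aspects have to be merged with the marker first. A minimal Java sketch of that merge, using `java.net.http` as the integration tests do, might look like the following. `AspectRequestSketch` and its methods are hypothetical names; the code only builds the request and assumes aspect names contain no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical builder for the "add marker aspect" PUT request. */
final class AspectRequestSketch {

    static final String MARKER = "cdd:hierarchyCheckEnabled";

    // Merge existing aspects with the marker, keeping order and avoiding duplicates.
    static String aspectBody(List<String> existingAspects) {
        Set<String> merged = new LinkedHashSet<>(existingAspects);
        merged.add(MARKER);
        return merged.stream()
                .map(name -> "\"" + name + "\"")
                .collect(Collectors.joining(",", "{\"aspectNames\":[", "]}"));
    }

    static HttpRequest putAspects(String baseUrl, String folderId,
                                  String user, String password, List<String> existing) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl
                        + "/alfresco/api/-default-/public/alfresco/versions/1/nodes/" + folderId))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(aspectBody(existing)))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(aspectBody(List.of("cm:auditable", "cm:titled")));
    }
}
```

The request returned by `putAspects` can be sent with `HttpClient.newHttpClient().send(...)` once the existing aspects have been read from the GET response.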
Run against a live ACS instance:
```bash
mvn verify -Dacs.endpoint.path=http://localhost:8080 \
           -Dacs.username=admin \
           -Dacs.password=admin
```

The test class (`DuplicateContentCheckIT`) covers:
| Test | Requirement |
|---|---|
| First upload succeeds and stores correct SHA-256 hash | REQ-01 |
| Duplicate upload to same folder is rejected with descriptive error | REQ-01 / REQ-03 |
| Upload of different content to same folder succeeds | REQ-01 |
| Without hierarchy aspect, duplicate in parent folder is not detected | REQ-04 |
| With hierarchy aspect, duplicate in ancestor folder is rejected | REQ-02 |
```bash
chmod +x http-tests/content-dedup.sh

# Default (localhost:8080, admin/admin)
bash http-tests/content-dedup.sh

# Custom target
HOST=http://alfresco:8080 USERNAME=admin PASSWORD=secret \
  bash http-tests/content-dedup.sh
```

Covers 6 test cases (TC-01 through TC-06) including authentication, hash storage, duplicate rejection, different-content acceptance, scope boundary enforcement, and hierarchy detection.
```
content-dedup/
├── pom.xml
├── REQUIREMENTS.md
├── http-tests/
│   └── content-dedup.sh          # curl smoke tests
└── src/
    ├── main/
    │   ├── java/org/alfresco/contentdedup/
    │   │   ├── behaviour/
    │   │   │   └── DuplicateContentCheckBehaviour.java
    │   │   └── exception/
    │   │       └── DuplicateContentException.java
    │   └── resources/alfresco/module/content-dedup/
    │       ├── module.properties
    │       ├── module-context.xml
    │       ├── context/
    │       │   ├── bootstrap-context.xml
    │       │   └── service-context.xml
    │       └── model/
    │           └── content-model.xml
    └── test/
        └── java/org/alfresco/contentdedup/
            └── DuplicateContentCheckIT.java
```
This project was built in a single session using Claude Code with the aiup-alfresco plugin, which packages Alfresco extension development as slash commands. The full session — including three real runtime bugs discovered during testing and the architectural improvement that followed — is documented below.
The session started with a description of the desired behaviour:
> Implement duplicate content detection with SHA-256. Abort on duplicate. Check before persistence. Safe for concurrent uploads.
The /requirements command produced REQUIREMENTS.md, establishing:
- Architecture: single in-process Platform JAR (no async side-effects needed).
- Five user stories covering same-folder detection (US-01), hierarchy scope (US-02), descriptive error messages (US-03), per-folder configuration (US-04), and concurrency safety (US-05).
- Content model: namespace prefix `cdd`, aspects `cdd:hashable` and `cdd:hierarchyCheckEnabled`.
- Behaviour: `OnContentUpdatePolicy` on `cm:content` with `EVERY_EVENT`, throwing an unchecked exception to force a transaction rollback.
The /scaffold command generated the Maven project skeleton:
- `pom.xml`: parent `alfresco-sdk-aggregator 4.15.0`, Java 17, ACS 26.1.0, Alfresco BOM import, `alfresco-repository` / `alfresco-remote-api` / `spring-webscripts` as `provided` dependencies, JUnit 5.10.2 for tests.
- `module.properties`: `module.id=org.alfresco.content-dedup`, `module.repo.version.min=26.1`.
- `module-context.xml`: entry point that imports sub-contexts.
The /content-model command generated content-model.xml and bootstrap-context.xml.
Namespace prefix rename: the initial generation used prefix dc (Dublin Core conflict). The prefix was renamed to cdd (Content Dedup) across the model XML and all Java QName constants.
Key decisions captured in the model:
- `cdd:sha256Hash` uses `<mandatory>true</mandatory>` without `enforced="true"`, a deliberate choice explained in Bug fix 2 below.
- The property is indexed with `<tokenised>false</tokenised>` to support exact-match transactional AFTS queries.
- Two constraints guard the hash value: a `LENGTH` constraint (min/max 64) and a `REGEX` constraint (`[a-f0-9]{64}`).
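As an illustration of the exact-match lookup this indexing is meant to support, the sketch below assembles an AFTS term for a prefixed property. `ExactMatchQuerySketch` and `exactMatch` are hypothetical names. The leading `=` requests exact matching, which is the form the runtime fixes later in this document settle on, and the value is assumed to contain no double quotes (true for a hex hash).

```java
/** Hypothetical helper that builds an exact-match AFTS term such as =@cdd\:sha256Hash:"...". */
final class ExactMatchQuerySketch {

    static String exactMatch(String prefixedProperty, String value) {
        // The colon inside the prefixed name must be escaped in AFTS field syntax.
        String escapedProperty = prefixedProperty.replace(":", "\\:");
        return "=@" + escapedProperty + ":\"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("cdd:sha256Hash", "a".repeat(64)));
    }
}
```

The resulting string would be passed as the query of a `SearchParameters` object configured with `QueryConsistency.TRANSACTIONAL`.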
The /behaviours command generated DuplicateContentCheckBehaviour.java, DuplicateContentException.java, and service-context.xml.
Initial design bound the behaviour to OnContentUpdatePolicy on cm:content with EVERY_EVENT, so it fires within the upload transaction. The algorithm:
- Get the parent folder.
- Build the scope: immediate folder only, or all ancestors up to the first folder bearing `cdd:hierarchyCheckEnabled`.
- Compute SHA-256 via `MessageDigest`.
- Acquire a `WRITE_LOCK` on the parent folder (concurrency guard).
- Run a transactional AFTS query for existing nodes with the same hash.
- Throw `DuplicateContentException` if found, or store the hash via `addAspect`.
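The lock, search, and reject-or-store steps above can be modelled in memory to show why the lock matters: the check and the write must happen as one unit. This is a deliberately simplified sketch, not Alfresco API code. A `ReentrantLock` per folder stands in for the repository write lock, a map of hash sets stands in for the transactional query, and the nested exception is a stand-in for the real `DuplicateContentException`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical in-memory model of the check-and-write window. */
final class CheckAndStoreSketch {

    /** Stand-in for the extension's unchecked exception. */
    static final class DuplicateContentException extends RuntimeException {
        DuplicateContentException(String hash, String folder) {
            super("Content with hash " + hash + " already exists in " + folder);
        }
    }

    private final Map<String, Set<String>> hashesByFolder = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> folderLocks = new ConcurrentHashMap<>();

    void register(String parentFolder, List<String> scope, String hash) {
        ReentrantLock lock = folderLocks.computeIfAbsent(parentFolder, f -> new ReentrantLock());
        lock.lock(); // serialise check-and-write for uploads into the same folder
        try {
            for (String folder : scope) {
                if (hashesByFolder.getOrDefault(folder, Set.of()).contains(hash)) {
                    throw new DuplicateContentException(hash, folder); // reject
                }
            }
            // No duplicate in scope: record the hash against the upload folder.
            hashesByFolder.computeIfAbsent(parentFolder, f -> ConcurrentHashMap.newKeySet())
                    .add(hash);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        CheckAndStoreSketch store = new CheckAndStoreSketch();
        store.register("invoices", List.of("invoices"), "h1");
        try {
            store.register("invoices", List.of("invoices"), "h1");
        } catch (DuplicateContentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the real extension the rollback comes from the exception propagating out of the transaction; here the rejected upload simply never reaches the map.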
The /test command generated:
- `DuplicateContentCheckIT.java`: five ordered JUnit 5 tests using `java.net.http.HttpClient` (no extra test dependencies). Covers all five user stories.
- `http-tests/content-dedup.sh`: six `curl`-based smoke test cases (TC-01 through TC-06).
Deploying the JAR and running real uploads exposed three issues.
Bug fix 1 (AFTS query syntax)

Symptom: the first upload to a folder with `cdd:hierarchyCheckEnabled` threw:

```
QueryModelException: Analysis mode not supported for DB DEFAULT
```

Root cause: the AFTS query in `findDuplicate()` used bare quoted-phrase syntax:

```java
// WRONG: triggers DEFAULT analysis mode, rejected by the DB query engine
"@cdd\\:sha256Hash:\"" + hash + "\""
```

The DB transactional query engine (`DBFTSPhrase`) only supports IDENTIFIER (exact-match) mode for property lookups. The DEFAULT mode is a Solr-only analysis path.

Fix: prefix the property term with `=` to force IDENTIFIER mode:

```java
// CORRECT: IDENTIFIER mode, supported by the DB transactional query engine
"=@cdd\\:sha256Hash:\"" + hash + "\""
```

This applies to any `SearchParameters` with `QueryConsistency.TRANSACTIONAL` or `TRANSACTIONAL_IF_POSSIBLE`.
Bug fix 2 (mandatory property enforcement)

Symptom: the first upload failed at transaction commit:

```
IntegrityException: Mandatory property not set: cdd:sha256Hash on cdd:hashable
```

Root cause: the content model originally declared:

```xml
<mandatory enforced="true">true</mandatory>
```

With `enforced="true"`, ACS fires the `IntegrityChecker` immediately inside `OnAddAspectPolicy`, which runs before `NodeServiceImpl.addAspect()` has written the properties map to the database. Even though the behaviour passes a fully-populated properties map to `addAspect()`, the integrity check fires before those properties are visible, causing a spurious violation.

Fix: remove the `enforced` attribute:

```xml
<mandatory>true</mandatory>
```

Without `enforced`, the integrity check is deferred to `beforeCommit`, by which point `addAspect()` has written both the aspect and its properties to the database.

Note: `enforced="true"` is safe only on properties belonging to types (not aspects), where the value must be supplied at node creation time via the REST API.
Problem: the initial buildScope() implementation always returned at least [parentFolder], meaning every content upload anywhere in the repository triggered SHA-256 hashing — even in folders that were never configured for dedup.
Question raised: "Why calculate the hash for every document even if it's not under a folder configured for duplicates exclusion?"
Redesign: buildScope() was changed to walk the ancestor chain looking for cdd:hierarchyCheckEnabled. If the aspect is not found anywhere in the chain, the method returns an empty list. The caller checks for the empty list before calling computeHash() and returns immediately:
```java
// onContentUpdate: eligibility-first pattern
List<NodeRef> scope = buildScope(parentFolder);
if (scope.isEmpty()) {
    return; // no aspect in ancestor chain → zero overhead
}
String hash = computeHash(nodeRef); // only reached when dedup is configured
```

This means:
- Folders without the aspect anywhere in their ancestor chain — zero overhead: no hashing, no locking, no searching.
- Folders with the aspect — full dedup check, scoped from the upload folder up to and including the marked boundary folder.
The cdd:hierarchyCheckEnabled aspect now serves a dual role: it is both the scope ceiling (the topmost folder included in the hash search) and the opt-in gate (its absence anywhere in the chain disables dedup entirely for that subtree).
| Step | Command / Action | Output |
|---|---|---|
| Requirements | `/requirements` | `REQUIREMENTS.md`: 5 user stories, content model, behaviour spec |
| Scaffold | `/scaffold` | `pom.xml`, `module.properties`, `module-context.xml` |
| Content model | `/content-model` | `content-model.xml`, `bootstrap-context.xml` |
| Namespace rename | Manual | Prefix `dc` → `cdd` across model and Java |
| Behaviour | `/behaviours` | `DuplicateContentCheckBehaviour.java`, `DuplicateContentException.java`, `service-context.xml` |
| Tests | `/test` | `DuplicateContentCheckIT.java`, `http-tests/content-dedup.sh` |
| Bug fix 1 | Runtime | AFTS query syntax: @prop:"value" → =@prop:"value" |
| Bug fix 2 | Runtime | Content model: enforced="true" → removed from mandatory declaration |
| Architecture | Design review | buildScope() returns empty list when no aspect in ancestor chain |