From 822f292060dc3875f68cdf9e8aef77b7ef320aa5 Mon Sep 17 00:00:00 2001
From: Anni Wang
Date: Sun, 1 Jun 2025 21:23:45 -0400
Subject: [PATCH 1/3] starts with filter instructions added

---
 .../configuration/metadata-filtering.mdx      | 44 ++++++++++++++++++-
 .../docs/autorag/how-to/multitenancy.mdx      | 31 +++++++++++++
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/src/content/docs/autorag/configuration/metadata-filtering.mdx b/src/content/docs/autorag/configuration/metadata-filtering.mdx
index 339197e99b83f4f..b48b0707fa8f692 100644
--- a/src/content/docs/autorag/configuration/metadata-filtering.mdx
+++ b/src/content/docs/autorag/configuration/metadata-filtering.mdx
@@ -34,13 +34,13 @@ const answer = await env.AI.autorag("my-autorag").search({

 You can currently filter by the `folder` and `timestamp` of an R2 object. Currently, custom metadata attributes are not supported.

-### `folder`
+### Folder

 The directory to the object. For example, the `folder` of the object at `llama/logistics/llama-logistics.mdx` is `llama/logistics/`. Note that the `folder` does not include a leading `/`.

 Note that `folder` filter only includes files exactly in that folder, so files in subdirectories are not included. For example, specifying `folder: "llama/"` will match files in `llama/` but does not match files in `llama/logistics`.

-### `timestamp`
+### Timestamp

 The timestamp indicating when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values will be rounded to 10 digits (seconds). For example, `1735689600999` or `2025-01-01 00:00:00.999 UTC` will be rounded down to `1735689600000`, corresponding to `2025-01-01 00:00:00 UTC`.

@@ -91,6 +91,46 @@ Note the following limitations with the compound operators:
 - Only the `eq` operator is allowed.
 - All conditions must filter on the **same key** (for example, all on `folder`)

+### "Starts with" filter for folders
+
+You can use "starts with" filtering on the `folder` metadata attribute to search for all files and subfolders within a specific path.
+
+For example, consider this file structure:
+```
+customer-a/profile.md
+customer-a/contracts/property/contract-1.pdf
+```
+
+If you were to filter using an `eq` (equals) operator with `value: "customer-a/"`, it would only match files directly within that folder, like `profile.md`. It wouldn't include files in subfolders like `customer-a/contracts/`.
+
+To recursively filter for all items starting with the path `customer-a/`, you can use the following compound filter:
+
+```js
+filters: {
+    type: "and",
+    filters: [
+      {
+        type: "gt",
+        key: "folder",
+        value: "customer-a//",
+      },
+      {
+        type: "lte",
+        key: "folder",
+        value: "customer-a/z",
+      },
+    ],
+  },
+```
+
+This filter identifies paths starting with `customer-a/` by using:
+
+- The `and` condition to combine the effects of the `gt` and `lte` conditions.
+- The `gt` condition to include pathes greater than the `/` ASCII character.
+- The `lte` condition to include pathes less than and including the lower case `z` ASCII character.
+
+Together, these conditions effectively select paths that begin with the provided path value.
+
 ## Response

 You can see the metadata attributes of your retrieved data in the response under the property `attributes` for each retrieved chunk. For example:
diff --git a/src/content/docs/autorag/how-to/multitenancy.mdx b/src/content/docs/autorag/how-to/multitenancy.mdx
index b3541108033a788..084ff62c7547d7c 100644
--- a/src/content/docs/autorag/how-to/multitenancy.mdx
+++ b/src/content/docs/autorag/how-to/multitenancy.mdx
@@ -39,3 +39,34 @@ const response = await env.AI.autorag("my-autorag").search({
 ```

 To filter across multiple folders, or to add date-based filtering, you can use a compound filter with an array of [comparison filters](/autorag/configuration/metadata-filtering/#compound-filter).
+
+## Tip: Use "Starts with" filter
+
+While an `eq` filter matches only files directly in the specified folder, you will often want to retrieve all documents belonging to a tenant, including files in its subfolders. For example, all files in `customer-a/` with a structure like:
+
+```
+customer-a/profile.md
+customer-a/contracts/property/contract-1.pdf
+```
+
+To achieve this [starts with](/autorag/configuration/metadata-filtering/#starts-with-filter-for-folders) behavior, use a compound filter like:
+
+```js
+filters: {
+    type: "and",
+    filters: [
+      {
+        type: "gt",
+        key: "folder",
+        value: "customer-a//",
+      },
+      {
+        type: "lte",
+        key: "folder",
+        value: "customer-a/z",
+      },
+    ],
+  },
+```
+
+With this filter you would capture both files `profile.md` and `contract-1.pdf`.
\ No newline at end of file

From 092f58541eae1592788008c3743ec0b2f0dd542d Mon Sep 17 00:00:00 2001
From: Anni Wang
Date: Sun, 1 Jun 2025 21:27:00 -0400
Subject: [PATCH 2/3] added explanation

---
 src/content/docs/autorag/how-to/multitenancy.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/content/docs/autorag/how-to/multitenancy.mdx b/src/content/docs/autorag/how-to/multitenancy.mdx
index 084ff62c7547d7c..1ea3b71ea86a3b2 100644
--- a/src/content/docs/autorag/how-to/multitenancy.mdx
+++ b/src/content/docs/autorag/how-to/multitenancy.mdx
@@ -69,4 +69,10 @@ filters: {
   },
 ```

+This filter identifies paths starting with `customer-a/` by using:
+
+- The `and` condition to combine the effects of the `gt` and `lte` conditions.
+- The `gt` condition to include pathes greater than the `/` ASCII character.
+- The `lte` condition to include pathes less than and including the lower case `z` ASCII character.
+
 With this filter you would capture both files `profile.md` and `contract-1.pdf`.
\ No newline at end of file

From 8a13072fce59c72115801411e9b0810f0faf8d8b Mon Sep 17 00:00:00 2001
From: Jun Lee
Date: Mon, 2 Jun 2025 15:46:07 +0100
Subject: [PATCH 3/3] Using FileTree component and minor fixes

---
 .../configuration/metadata-filtering.mdx      | 22 ++++++++-----
 .../docs/autorag/how-to/multitenancy.mdx      | 33 +++++++++++--------
 2 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/src/content/docs/autorag/configuration/metadata-filtering.mdx b/src/content/docs/autorag/configuration/metadata-filtering.mdx
index b48b0707fa8f692..b934733dc7d73da 100644
--- a/src/content/docs/autorag/configuration/metadata-filtering.mdx
+++ b/src/content/docs/autorag/configuration/metadata-filtering.mdx
@@ -5,6 +5,8 @@ sidebar:
   order: 6
 ---

+import { FileTree } from "~/components"
+
 Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter narrows down results prior to retrieval, so that you only query the scope of documents that matter.

 Here is an example of metadata filtering using [Workers Binding](/autorag/usage/workers-binding/) but it can be easily adapted to use the [REST API](/autorag/usage/rest-api/) instead.
@@ -96,12 +98,16 @@ Note the following limitations with the compound operators:
 You can use "starts with" filtering on the `folder` metadata attribute to search for all files and subfolders within a specific path.

 For example, consider this file structure:
-```
-customer-a/profile.md
-customer-a/contracts/property/contract-1.pdf
-```

-If you were to filter using an `eq` (equals) operator with `value: "customer-a/"`, it would only match files directly within that folder, like `profile.md`. It wouldn't include files in subfolders like `customer-a/contracts/`.
+<FileTree>
+- customer-a
+	- profile.md
+	- contracts
+		- property
+			- contract-1.pdf
+</FileTree>
+
+If you were to filter using an `eq` (equals) operator with `value: "customer-a/"`, it would only match files directly within that folder, like `profile.md`. It would not include files in subfolders like `customer-a/contracts/`.

 To recursively filter for all items starting with the path `customer-a/`, you can use the following compound filter:

@@ -117,7 +123,7 @@ filters: {
       {
         type: "lte",
         key: "folder",
-        value: "customer-a/z",
+        value: "customer-a/z",
       },
     ],
   },
@@ -126,8 +132,8 @@ filters: {
 This filter identifies paths starting with `customer-a/` by using:

 - The `and` condition to combine the effects of the `gt` and `lte` conditions.
-- The `gt` condition to include pathes greater than the `/` ASCII character.
-- The `lte` condition to include pathes less than and including the lower case `z` ASCII character.
+- The `gt` condition to include `folder` values that sort after `customer-a//` in ASCII order, such as `customer-a/contracts/property/`.
+- The `lte` condition to include `folder` values that sort at or before `customer-a/z` in ASCII order.

 Together, these conditions effectively select paths that begin with the provided path value.

diff --git a/src/content/docs/autorag/how-to/multitenancy.mdx b/src/content/docs/autorag/how-to/multitenancy.mdx
index 1ea3b71ea86a3b2..0d5038403b66720 100644
--- a/src/content/docs/autorag/how-to/multitenancy.mdx
+++ b/src/content/docs/autorag/how-to/multitenancy.mdx
@@ -5,6 +5,8 @@ sidebar:
   order: 5
 ---

+import { FileTree } from "~/components"
+
 AutoRAG supports multitenancy by letting you segment content by tenant, so each user, customer, or workspace can only access their own data. This is typically done by organizing documents into per-tenant folders and applying [metadata filters](/autorag/configuration/metadata-filtering/) at query time.

 ## 1. Organize Content by Tenant
@@ -13,11 +15,13 @@ When uploading files to R2, structure your content by tenant using unique folder

 Example folder structure:

-```bash
-customer-a/logs/
-customer-a/contracts/
-customer-b/contracts/
-```
+<FileTree>
+- customer-a
+	- logs/
+	- contracts/
+- customer-b
+	- contracts/
+</FileTree>

 When indexing, AutoRAG will automatically store the folder path as metadata under the `folder` attribute. It is recommended to enforce folder separation during upload or indexing to prevent accidental data access across tenants.

@@ -44,10 +48,13 @@ To filter across multiple folders, or to add date-based filtering, you can use a

 While an `eq` filter matches only files directly in the specified folder, you will often want to retrieve all documents belonging to a tenant, including files in its subfolders. For example, all files in `customer-a/` with a structure like:

-```
-customer-a/profile.md
-customer-a/contracts/property/contract-1.pdf
-```
+<FileTree>
+- customer-a
+	- profile.md
+	- contracts
+		- property
+			- contract-1.pdf
+</FileTree>

 To achieve this [starts with](/autorag/configuration/metadata-filtering/#starts-with-filter-for-folders) behavior, use a compound filter like:

@@ -63,7 +70,7 @@ filters: {
       {
         type: "lte",
         key: "folder",
-        value: "customer-a/z",
+        value: "customer-a/z",
       },
     ],
   },
@@ -72,7 +79,7 @@ filters: {
 This filter identifies paths starting with `customer-a/` by using:

 - The `and` condition to combine the effects of the `gt` and `lte` conditions.
-- The `gt` condition to include pathes greater than the `/` ASCII character.
-- The `lte` condition to include pathes less than and including the lower case `z` ASCII character.
+- The `gt` condition to include `folder` values that sort after `customer-a//` in ASCII order, such as `customer-a/contracts/property/`.
+- The `lte` condition to include `folder` values that sort at or before `customer-a/z` in ASCII order.

-With this filter you would capture both files `profile.md` and `contract-1.pdf`.
\ No newline at end of file
+This filter captures both files `profile.md` and `contract-1.pdf`.
\ No newline at end of file
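The range bounds in this patch can be sanity-checked locally. The following is a minimal JavaScript sketch, not AutoRAG code: it assumes that the comparison types order `folder` strings lexicographically, the same way JavaScript's `<` and `>` compare strings. The `matches` and `prefixFilter` helpers are hypothetical, and the `gte` and `lt` types they use are assumed to be available alongside `gt` and `lte`. Under that assumption, the `customer-a//` / `customer-a/z` bounds match `customer-a/contracts/property/` but exclude the folder value `customer-a/` itself (a string sorts before any longer string it is a prefix of) and `customer-a/zebra/` (which sorts after `customer-a/z`). A half-open range from the prefix up to the prefix with its last character incremented (`customer-a0`, because `0` follows `/` in ASCII) matches exactly the values that start with `customer-a/`.

```javascript
// Sketch only: models how the "starts with" folder filter might evaluate,
// ASSUMING `folder` values are compared as plain strings in lexicographic
// order (JavaScript's built-in string comparison). Not AutoRAG's implementation.

// Comparison predicates; `gte` and `lt` are assumed, not confirmed by this patch.
const ops = {
  eq: (a, b) => a === b,
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
  lt: (a, b) => a < b,
  lte: (a, b) => a <= b,
};

// An "and" filter matches only when every comparison on `folder` matches.
function matches(folder, filter) {
  return filter.filters.every((f) => {
    const op = ops[f.type];
    if (!op) throw new Error(`unsupported comparison type: ${f.type}`);
    return op(folder, f.value);
  });
}

// The compound filter as written in the docs.
const docFilter = {
  type: "and",
  filters: [
    { type: "gt", key: "folder", value: "customer-a//" },
    { type: "lte", key: "folder", value: "customer-a/z" },
  ],
};

// Hypothetical helper: half-open range [prefix, prefix with last char + 1).
// For "customer-a/" the upper bound is "customer-a0", since "0" follows "/".
function prefixFilter(prefix) {
  if (prefix.length === 0) throw new Error("prefix must be non-empty");
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return {
    type: "and",
    filters: [
      { type: "gte", key: "folder", value: prefix },
      { type: "lt", key: "folder", value: upper },
    ],
  };
}

const sampleFolders = [
  "customer-a/",
  "customer-a/contracts/property/",
  "customer-a/zebra/",
  "customer-a-archive/",
  "customer-b/",
];

// Print, per folder, whether each filter would match under the assumed ordering.
for (const folder of sampleFolders) {
  console.log(folder, {
    docBounds: matches(folder, docFilter),
    halfOpenRange: matches(folder, prefixFilter("customer-a/")),
  });
}
```

Run with Node.js, the loop reports which sample folders each filter would select if the ordering assumption holds. Because the `customer-a/` case hinges entirely on that assumption, it is worth confirming against a real index which files stored directly in the tenant folder are returned.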