From ef693a41fb17e3d24b60945090aa4d10e039ba03 Mon Sep 17 00:00:00 2001
From: LJ
Date: Tue, 18 Mar 2025 23:31:37 -0700
Subject: [PATCH] Add documentation for the `GoogleDrive` source.

---
 docs/docs/ops/sources.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/docs/ops/sources.md b/docs/docs/ops/sources.md
index 68f444ec9..0a775be08 100644
--- a/docs/docs/ops/sources.md
+++ b/docs/docs/ops/sources.md
@@ -9,6 +9,8 @@ description: CocoIndex Built-in Sources
 
 The `LocalFile` source imports files from a local file system.
 
+### Spec
+
 The spec takes the following fields:
 * `path` (type: `str`, required): full path of the root directory to import files from
 * `binary` (type: `bool`, optional): whether reading files as binary (instead of text)
@@ -24,7 +26,45 @@ The spec takes the following fields:
 
 :::
 
+### Schema
 The output is a table with the following sub fields:
 * `filename` (key, type: `str`): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`
 * `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file
 
+
+## GoogleDrive
+
+The `GoogleDrive` source imports files from Google Drive.
+
+### Setup for Google Drive
+
+To access files in Google Drive, the `GoogleDrive` source needs to authenticate with a service account.
+
+1. Register for or log in to **Google Cloud**.
+2. In [**Google Cloud Console**](https://console.cloud.google.com/), search for *Service Accounts* to open the *IAM & Admin / Service Accounts* page.
+   - **Create a new service account**: Click *+ Create Service Account*. Follow the instructions to finish creating the service account.
+   - **Add a key and download the credential**: Under "Actions" for this new service account, click *Manage keys* → *Add key* → *Create new key* → *JSON*.
+     Download the key file to a safe place.
+3. In **Google Cloud Console**, search for *Google Drive API*. Enable this API.
+4. In **Google Drive**, share the folders containing the files to import with the service account's email address.
+   **Viewer permission** is sufficient.
+   - The email address can be found on the *IAM & Admin / Service Accounts* page (from Step 2), in the format `{service-account-id}@{gcp-project-id}.iam.gserviceaccount.com`.
+   - Copy the folder ID. The folder ID is the last path segment of the folder's URL, e.g. `https://drive.google.com/drive/u/0/folders/{folder-id}` or `https://drive.google.com/drive/folders/{folder-id}?usp=drive_link`.
+
+
+### Spec
+
+The spec takes the following fields:
+
+* `service_account_credential_path` (type: `str`, required): full path to the service account credential file in JSON format.
+* `root_folder_ids` (type: `list[str]`, required): a list of Google Drive folder IDs to import files from.
+* `binary` (type: `bool`, optional): whether to read files as binary (instead of text).
+
+### Schema
+
+The output is a table with the following sub fields:
+
+* `file_id` (key, type: `str`): the ID of the file in Google Drive.
+* `filename` (type: `str`): the filename of the file, without the path, e.g. `"file1.md"`.
+* `mime_type` (type: `str`): the MIME type of the file.
+* `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file.
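
Step 4 of the Google Drive setup needs two values: the service account's email address and the ID of each shared folder. The Python sketch here recovers both. It assumes the standard Google service-account JSON key layout (the email stored under `client_email`) and folder URLs shaped like the two examples in step 4. The helpers `folder_id_from_url` and `service_account_email` are hypothetical illustrations, not part of CocoIndex.

```python
import json
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Return the folder ID (last path segment) of a Google Drive folder URL.

    Query strings such as ``?usp=drive_link`` are ignored because
    urlparse() keeps them separate from the path.
    """
    path = urlparse(url).path.rstrip("/")
    segments = path.split("/")
    # Expect a path ending in .../folders/{folder-id}; reject anything else.
    if len(segments) < 2 or segments[-2] != "folders" or not segments[-1]:
        raise ValueError(f"not a Google Drive folder URL: {url!r}")
    return segments[-1]


def service_account_email(credential_path: str) -> str:
    """Read the service account's email address from its JSON key file.

    Assumption: the key file uses the standard service-account layout,
    with the email stored under the "client_email" field.
    """
    with open(credential_path, encoding="utf-8") as f:
        return json.load(f)["client_email"]
```

The IDs returned by `folder_id_from_url` are the values that go into `root_folder_ids`, and the key file read by `service_account_email` is the same file passed as `service_account_credential_path`.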