diff --git a/mint.json b/mint.json
index 7dd8714e..e298e4b8 100644
--- a/mint.json
+++ b/mint.json
@@ -474,6 +474,7 @@
         "platform/sources/confluence",
         "platform/sources/databricks-volumes",
         "platform/sources/google-cloud",
+        "platform/sources/google-drive",
         "platform/sources/mongodb",
         "platform/sources/onedrive",
         "platform/sources/outlook",
diff --git a/platform/connectors.mdx b/platform/connectors.mdx
index 9cba991d..204fa53a 100644
--- a/platform/connectors.mdx
+++ b/platform/connectors.mdx
@@ -15,6 +15,7 @@ The Unstructured Platform supports connecting to the following source and destin
 - [Confluence](/platform/sources/confluence)
 - [Databricks Volumes](/platform/sources/databricks-volumes)
 - [Google Cloud Storage](/platform/sources/google-cloud)
+- [Google Drive](/platform/sources/google-drive)
 - [MongoDB](/platform/sources/mongodb)
 - [OneDrive](/platform/sources/onedrive)
 - [Outlook](/platform/sources/outlook)
diff --git a/platform/sources/overview.mdx b/platform/sources/overview.mdx
index 5c91b937..e4ee5363 100644
--- a/platform/sources/overview.mdx
+++ b/platform/sources/overview.mdx
@@ -22,6 +22,7 @@ To create a source connector:
     - [Confluence](/platform/sources/confluence)
     - [Databricks Volumes](/platform/sources/databricks-volumes)
     - [Google Cloud Storage](/platform/sources/google-cloud)
+    - [Google Drive](/platform/sources/google-drive)
     - [MongoDB](/platform/sources/mongodb)
     - [OneDrive](/platform/sources/onedrive)
     - [Outlook](/platform/sources/outlook)
diff --git a/snippets/general-shared-text/google-drive-cli-api.mdx b/snippets/general-shared-text/google-drive-cli-api.mdx
index 29cc40be..debed405 100644
--- a/snippets/general-shared-text/google-drive-cli-api.mdx
+++ b/snippets/general-shared-text/google-drive-cli-api.mdx
@@ -11,5 +11,7 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
 The following environment variables:
 
 - `GOOGLE_DRIVE_FOLDER_ID` - The folder ID, represented by `--drive-id` (CLI) or `drive_id` (Python).
-- `GCP_SERVICE_ACCOUNT_KEY_FILEPATH` - The path to the `credentials.json` key file, represented by `--service-account-key-path` (CLI) or `service_account_key_path` (Python), or
-- `GCP_SERVICE_ACCOUNT_KEY_STRING` - The contents of the `credentials.json` key file as a string, represented by `--service-account-key` (CLI) or `service_account_key` (Python).
+- One of the following:
+
+  - `GCP_SERVICE_ACCOUNT_KEY_FILEPATH` - The path to the `credentials.json` key file, represented by `--service-account-key-path` (CLI) or `service_account_key_path` (Python).
+  - `GCP_SERVICE_ACCOUNT_KEY_STRING` - The contents of the `credentials.json` key file as a string, represented by `--service-account-key` (CLI) or `service_account_key` (Python).
diff --git a/snippets/general-shared-text/google-drive-platform.mdx b/snippets/general-shared-text/google-drive-platform.mdx
index 80695a07..ef0fa524 100644
--- a/snippets/general-shared-text/google-drive-platform.mdx
+++ b/snippets/general-shared-text/google-drive-platform.mdx
@@ -1,7 +1,8 @@
 Fill in the following fields:
 
 - **Name** (_required_): A unique name for this connector.
-- **Drive ID** (_required_): The folder ID.
-- **Service Account Key** (_required_): The `credentials.json` key file's contents in JSON format.
-- **Extension**: Any file extensions to be included in the ingestion process (such as `.jpg`, `.docx`, and so on), if filtering is needed.
-- **Recursive**: Check this box to recursively access files from subfolders within the drive.
\ No newline at end of file
+- **Drive ID** (_required_): The target folder's ID.
+- **Extensions**: A comma-separated list of any file extensions to be included in the ingestion process (such as `.jpg,.pdf`), if filtering is needed.
+  The default is to include all files, if not otherwise specified.
+- **Recursive**: Check this box to also access files from all subfolders within the folder.
+- **Account Key** (_required_): The contents of the `credentials.json` key file for the target service account. These contents must be expressed as a single-line string without line breaks.
diff --git a/snippets/general-shared-text/google-drive.mdx b/snippets/general-shared-text/google-drive.mdx
index c90b21ee..a45d3fd9 100644
--- a/snippets/general-shared-text/google-drive.mdx
+++ b/snippets/general-shared-text/google-drive.mdx
@@ -10,9 +10,22 @@ allowfullscreen
 >
 
-2. Note the local path to the `credentials.json` key file. Or, copy the key file's contents into a compatible string—including properly escaped quotes—as required.
-    To make converting a JSON object into a compatible string easier, you can search the Internet by using a search phrase such as "tools for converting a JSON object into a string."
-    Before using any tool, check to make sure that the tool does not share the key file's contents with anyone you do not trust.
+2. To ensure maximum compatibility across Unstructured service offerings, you should give the service account key information to Unstructured as
+    a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself).
+    To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
+    In this command, replace `` with the path to the `credentials.json` key file that you downloaded by following the preceding instructions.
+
+    - For macOS or Linux:
+
+      ```text
+      tr -d '\n' < 
+      ```
+
+    - For Windows:
+
+      ```text
+      (Get-Content -Path "" -Raw).Replace("`r`n", "").Replace("`n", "")
+      ```
 
 3. Give the service account's email address access to the Google Drive folder. [Learn more](https://www.googlecloudcommunity.com/gc/Workspace-Q-A/Can-i-give-access-to-document-of-google-drive-to-service-account/m-p/530106).
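
Review note: the two commands added in step 2 are OS-specific. As a possible cross-platform alternative, the following minimal Python sketch produces a single-line string from the key file. It is not part of this change; the function name `key_file_to_single_line` is hypothetical, and the approach differs slightly from `tr -d '\n'` in that it parses and re-serializes the JSON rather than stripping line-break characters, so the output is equivalent JSON but not necessarily byte-identical.

```python
import json
import sys
from pathlib import Path


def key_file_to_single_line(path: str) -> str:
    """Return the JSON contents of a key file as one line with no line breaks.

    `path` is assumed to point at a downloaded service account key file
    such as `credentials.json`.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Without `indent`, json.dumps emits a single line. Newlines inside string
    # values (for example, in the private key) remain escaped as \n.
    return json.dumps(data)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python key_to_line.py credentials.json
    print(key_file_to_single_line(sys.argv[1]))
```

Re-serializing also validates that the file is well-formed JSON before its contents are pasted into the **Account Key** field or the `GCP_SERVICE_ACCOUNT_KEY_STRING` environment variable.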