diff --git a/docs.json b/docs.json index 8a68da04..cbd94292 100644 --- a/docs.json +++ b/docs.json @@ -272,6 +272,7 @@ { "group": "Tool demos", "pages": [ + "examplecode/tools/google-drive-events", "examplecode/tools/jq", "examplecode/tools/firecrawl", "examplecode/tools/langflow", diff --git a/examplecode/tools/google-drive-events.mdx b/examplecode/tools/google-drive-events.mdx new file mode 100644 index 00000000..1fa2495a --- /dev/null +++ b/examplecode/tools/google-drive-events.mdx @@ -0,0 +1,172 @@ +--- +title: Google Drive event triggers +--- + +You can use Google Drive events, such as adding new files to—or updating existing files in—Google Drive shared folders or shared drives, to automatically run Unstructured ETL+ workflows +that rely on those folders or drives as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in Google Drive as they are added or updated. + +This example shows how to automate this process by adding a custom [Google Apps Script](https://developers.google.com/apps-script) project in your Google account. This project runs +a script on a regular time interval. This script automatically checks for new or updated files within the specified Google Drive shared folder or shared drive. If the script +detects at least one new or updated file, it then calls the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) to automatically run the +specified corresponding Unstructured ETL+ workflow in your Unstructured account. + + + This example uses a custom Google Apps Script that you create and maintain. + Any issues with file detection, timing, or script execution could be related to your custom script, + rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom + script's execution logs first for any informational and error messages. + + +## Requirements + +import GetStartedSimpleApiOnly from '/snippets/general-shared-text/get-started-simple-api-only.mdx' + +To use this example, you will need the following: + +- An Unstructured account, and an Unstructured API key for your account, as follows: + + + +- The Unstructured Workflow Endpoint URL for your account, as follows: + + 1. In the Unstructured UI, click **API Keys** on the sidebar.
+ 2. Note the value of the **Unstructured Workflow Endpoint** field. + +- A Google Drive source connector in your Unstructured account. [Learn how](/ui/sources/google-drive). +- Some available [destination connector](/ui/destinations/overview) in your Unstructured account. +- A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows). + +## Step 1: Create the Google Apps Script project + +1. Sign in to your Google account. +2. Go to [http://script.google.com/](http://script.google.com/). +3. Click **+ New project**. +4. Click the new project's default name (such as **Untitled project**), and change it to something more descriptive, such as **Unstructured ETL Scripts**. + +## Step 2: Add the script + +1. With the project still open, on the sidebar, click the **< >** (**Editor**) icon. +2. In the **Files** tab, click **Code.gs**. +3. Replace the contents of the `Code.gs` file with the following code instead: + + ```javascript + function checkForNewOrUpdatedFiles() { + const folder = DriveApp.getFolderById(FOLDER_ID); + const files = folder.getFiles(); + const now = new Date(); + const thresholdMillis = 5 * 60 * 1000; // 5 minutes (adjust as needed). + + while (files.hasNext()) { + const file = files.next(); + const created = file.getDateCreated(); + const lastUpdated = file.getLastUpdated(); + const fileName = file.getName(); + + // Calculate time differences. + const millisSinceCreated = now - created; + const createdWithinThreshold = millisSinceCreated < thresholdMillis; + const millisSinceUpdated = now - lastUpdated; + const updatedWithinThreshold = millisSinceUpdated < thresholdMillis; + + // Log file details and calculations. + console.log('File Name: ' + fileName); + console.log('Created: ' + created); + console.log('Last updated: ' + lastUpdated); + console.log('Milliseconds since created: ' + millisSinceCreated); + console.log('Milliseconds since last updated: ' + millisSinceUpdated); + console.log('Created within threshold of ' + thresholdMillis + ' milliseconds? ' + createdWithinThreshold); + console.log('Updated within threshold of ' + thresholdMillis + ' milliseconds? ' + updatedWithinThreshold); + console.log('-----') + + // If at least one file was created or updated within the last 5 minutes... + if ((createdWithinThreshold) || (updatedWithinThreshold)) { + // ...then make the HTTP POST request. + UrlFetchApp.fetch(UNSTRUCTURED_API_URL, { + method: 'post', + headers: { + 'accept': 'application/json', + 'unstructured-api-key': UNSTRUCTURED_API_KEY + } + }); + // Then stop the script after the first fetch (no need to check any more files). + console.log('At least one file created or updated within threshold of ' + thresholdMillis + ' milliseconds.') + console.log('Unstructured workflow request sent to ' + UNSTRUCTURED_API_URL) + return; + } + } + console.log('No files created or updated within threshold of ' + thresholdMillis + ' milliseconds. No Unstructured workflow request sent.') + } + ``` + +4. Click the **Save project to Drive** button. + +## Step 3: Customize the script for your workflow + +1. With the project still open, on the **Files** tab, click the **Add a file** button, and then click **Script**. +2. Name the new file `Constants`. The `.gs` extension is added automatically. +3. Replace the contents of the `Constants.gs` file with the following code instead: + + ```javascript + const FOLDER_ID = ''; + const UNSTRUCTURED_API_URL = '' + '/workflows//run'; + const UNSTRUCTURED_API_KEY = ''; + ``` + + Replace the following placeholders: + + - Replace `` with the ID of your Google Drive shared folder or shared drive. This is the same ID that you specified + when you created your Google Drive source connector in your Unstructured account. + - Replace `` with your Unstructured API URL value. + - Replace `` with the ID of your Unstructured workflow. + - Replace `` with your Unstructured API key value. + +4. Click the disk (**Save project to Drive**) icon. + +## Step 4: Create the script trigger + +1. With the project still open, on the sidebar, click the alarm clock (**Triggers**) icon. +2. Click the **+ Add Trigger** button. +3. Set the following values: + + - For **Choose which function to run**, select `checkForNewOrUpdatedFiles`. + - For **Choose which deployment should run**, select **Head**. + - For **Select event source**, select **Time-driven**. + - For **Select type of time based trigger**, select **Minutes timer**. + - For **Select minute interval**, select **Every 5 minutes**. + + + If you change **Minutes timer** or **Every 5 minutes** to a different interval, you should also go back and change the number `5` in the following + line of code in the `checkForNewOrUpdatedFiles` function. Change the number `5` to the number of minutes that correspond to the alternate interval you + selected: + + ```javascript + const thresholdMillis = 5 * 60 * 1000; + ``` + + + - For **Failure notification settings**, select an interval such as immediately, hourly, or daily. + +4. Click **Save**. + +## Step 5: View trigger results + +1. With the project still open, on the sidebar, click the three lines (**Executions**) icon. +2. As soon as the first script execution completes, you should see a corresponding message appear in the **Executions** list. If the **Status** column shows + **Completed**, then keep going with this procedure. + + If the **Status** column shows **Failed**, expand the message to + get any details about the failure. Fix the failure, and then wait for the next script execution to complete. + +3. When the **Status** column shows **Completed** then, in your Unstructured account, click **Jobs** on the sidebar to see if a new job + is running for that worklow. + + If no new job is running for that workflow, then add at least one new file to—or update at least one existing file in—the Google Drive shared folder or shared drive, + within 5 minutes of the next script execution. After the next script execution, check the **Jobs** list again. + +## Step 6 (Optional): Delete the trigger + +1. To stop the script from automatically executing on a regular basis, with the project still open, on the sidebar, click the alarm clock (**Triggers**) icon. +2. Rest your mouse pointer on the trigger you created in Step 4. +3. Click the ellipsis (three dots) icon, and then click **Delete trigger**. + +