From 7dc8352ecaa8580a11f90c057223ae88d0912734 Mon Sep 17 00:00:00 2001 From: Paul Cornell Date: Mon, 10 Feb 2025 10:14:25 -0800 Subject: [PATCH 1/5] Platform API operations added to the Unstructured Python SDK --- platform/api/jobs.mdx | 9 +- platform/api/overview.mdx | 700 +++++++++++++++++++++++++++++++++++-- platform/api/workflows.mdx | 134 ++++++- 3 files changed, 794 insertions(+), 49 deletions(-) diff --git a/platform/api/jobs.mdx b/platform/api/jobs.mdx index 0317059a..de382d9b 100644 --- a/platform/api/jobs.mdx +++ b/platform/api/jobs.mdx @@ -4,8 +4,11 @@ title: Jobs To use the [Unstructured Platform API](/platform/api/overview) to manage jobs, do the following: -- To get a list of available jobs, use the `GET` method to call the `/jobs` endpoint. [Learn more](/platform/api/overview#list-jobs). -- To get information about a job, use the `GET` method to call the `/jobs/` endpoint. [Learn more](/platform/api/overview#get-a-job). +- To get a list of available jobs, use the `UnstructuredClient` object's `jobs.list_jobs` function (for the Python SDK) or + the `GET` method to call the `/jobs` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#list-jobs). +- To get information about a job, use the `UnstructuredClient` object's `jobs.get_job` function (for the Python SDK) or + the `GET` method to call the `/jobs/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#get-a-job). - A job is created automatically whenever a workflow runs on a schedule; see [Create a workflow](/platform/api/workflows#create-a-workflow). A job is also created whenever you run a workflow manually; see [Run a workflow](/platform/api/overview#run-a-workflow). -- To cancel a running job, use the `POST` method to call the `/jobs//cancel` endpoint. [Learn more](/platform/api/overview#cancel-a-job). 
\ No newline at end of file +- To cancel a running job, use the `UnstructuredClient` object's `jobs.cancel_job` function (for the Python SDK) or + the `POST` method to call the `/jobs//cancel` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#cancel-a-job). \ No newline at end of file diff --git a/platform/api/overview.mdx b/platform/api/overview.mdx index a31174b2..7cf9e6ab 100644 --- a/platform/api/overview.mdx +++ b/platform/api/overview.mdx @@ -70,15 +70,32 @@ To use the Unstructured Platform API, you must have: API URL throughout the following examples. -The Unstructured Platform API is offered as a set of Representational State Transfer (REST) endpoints, which you can call through standard REST-enabled +The Unstructured Platform API is offered as follows: + +- As part of the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client), + which you can call through standard Python code. + + To install the Unstructured Python SDK, run the following command from within your Python virtual environment: + + ```bash + pip install unstructured-client + ``` + + If you already have the Unstructured Python SDK installed, upgrade to the latest version by running the following command instead: + + ```bash + pip install --upgrade unstructured-client + ``` + +- As a set of Representational State Transfer (REST) endpoints, which you can call through standard REST-enabled utilities, tools, programming languages, packages, and libraries. The following sections describe how to call the Unstructured Platform API with `curl` and Postman. You can adapt this information as needed for your preferred programming languages and libraries, for example by using the `requests` library with Python. - - You can also use the [Unstructured Platform API - Swagger UI](https://platform.unstructuredapp.io/docs) to call the REST endpoints - that are available through `https://platform.unstructuredapp.io`. 
- + + You can also use the [Unstructured Platform API - Swagger UI](https://platform.unstructuredapp.io/docs) to call the REST endpoints + that are available through `https://platform.unstructuredapp.io`. + The Unstructured Platform API is separate from [Unstructured Serverless API services](/api-reference/api-services/overview) and @@ -86,7 +103,6 @@ utilities, tools, programming languages, packages, and libraries. The following Because of this separation, the following Unstructured SDKs, tools, and libraries do _not_ work with the Unstructured Platform API: - - The [Unstructured Python SDK](/api-reference/api-services/sdk-python) - The [Unstructured JavaScript/TypeScript SDK](/api-reference/api-services/sdk-jsts) - [Local single-file POST requests](/api-reference/api-services/post-requests) to Unstructured Serverless API services - The [Unstructured open source Python library](/api-reference/api-services/partition-via-api) @@ -117,22 +133,23 @@ For general information about these objects, see: - [Workflows](/platform/workflows) - [Jobs](/platform/jobs) -The following sections provide examples, showing the use of `curl` or Postman, for all of the supported REST endpoints. +The following sections provide examples, showing the use of the Unstructured Python SDK for all of the supported API operations, +as well as `curl` and Postman for all of the supported REST endpoints. You can also use the [Unstructured Platform API - Swagger UI](https://platform.unstructuredapp.io/docs) to call the REST endpoints that are available through `https://platform.unstructuredapp.io`. 
-The following `curl` examples use environment variables, which you can set as follows: +The following Unstructured Python SDK and `curl` examples use environment variables, which you can set as follows: ```bash export UNSTRUCTURED_API_URL="https://platform.unstructuredapp.io/api/v1" export UNSTRUCTURED_API_KEY="" ``` -These environment variables enable you to more easily run the following `curl` examples and help prevent you from storing scripts -that contain sensitive URLs and API keys in public source code repositories. +These environment variables enable you to more easily run the following Unstructured Python SDK and `curl` examples and help prevent +you from storing scripts that contain sensitive URLs and API keys in public source code repositories. The following Postman examples use variables, which you can set as follows: @@ -172,14 +189,44 @@ For general information, see [Connectors](/platform/connectors). ### List source connectors -To list source connectors, use the `GET` method to call the `/sources` endpoint. +To list source connectors, use the `UnstructuredClient` object's `sources.list_sources` function (for the Python SDK) or +the `GET` method to call the `/sources` endpoint (for `curl` or Postman). -To filter the list of source connectors, use the query parameter `source_type=`, +To filter the list of source connectors, use the `ListSourcesRequest` object's `source_type` parameter (for the Python SDK) +or the query parameter `source_type=` (for `curl` or Postman), replacing `` with the source connector type's unique ID (for example, `s3` for the Amazon S3 source connector type). To get this ID, see [Sources](/platform/api/sources/overview). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import ListSourcesRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.sources.list_sources( + request=ListSourcesRequest( + source_type="" # Optional, list only for this source type. + ) + ) + + # Print the list in alphabetical order by connector name. + sorted_sources = sorted( + response.response_list_sources, + key=lambda source: source.name.lower() + ) + + for source in sorted_sources: + print(f"{source.name} ({source.id})") + ``` + ```bash curl --request 'GET' --location \ @@ -220,10 +267,38 @@ To get this ID, see [Sources](/platform/api/sources/overview). ### Get a source connector -To get information about a source connector, use the `GET` method to call the `/sources/` endpoint, replacing +To get information about a source connector, use the `UnstructuredClient` object's `sources.get_source` function (for the Python SDK) or +the `GET` method to call the `/sources/` endpoint (for `curl` or Postman), replacing `` with the source connector's unique ID. To get this ID, see [List source connectors](#list-source-connectors). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import GetSourceRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.sources.get_source( + request=GetSourceRequest( + source_id="" + ) + ) + + info = response.source_connector_information + + print(f"name: {info.name}") + + for key, value in info.config.items(): + print(f"{key}: {value}") + + ``` + ```bash curl --request 'GET' --location \ @@ -251,11 +326,52 @@ To get information about a source connector, use the `GET` method to call the `/ ### Create a source connector -To create a source connector, use the `POST` method to call the `/sources` endpoint. In the request body, +To create a source connector, use the `UnstructuredClient` object's `sources.create_source` function (for the Python SDK) or +the `POST` method to call the `/sources` endpoint (for `curl` or Postman). + +In the `CreateSourceConnector` object (for the Python SDK) or +the request body (for `curl` or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see [Sources](/platform/api/sources/overview). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import CreateSourceRequest + from unstructured_client.models.shared import CreateSourceConnector + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + source_connector = CreateSourceConnector( + name="", + type="", + config={ + # Specify the settings for the connector here. 
+ } + ) + + response = client.sources.create_source( + request=CreateSourceRequest( + create_source_connector=source_connector + ) + ) + + info = response.source_connector_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + + for key, value in info.config.items(): + print(f"{key}: {value}") + + ``` + ```bash curl --request 'POST' --location \ @@ -290,10 +406,12 @@ specify the settings for the connector. For the specific settings to include, wh ### Update a source connector -To update information about a source connector, use the `PUT` method to call the `/sources/` endpoint, replacing +To update information about a source connector, use the `UnstructuredClient` object's `sources.update_source` function (for the Python SDK) or +the `PUT` method to call the `/sources/` endpoint (for `curl` or Postman), replacing `` with the source connector's unique ID. To get this ID, see [List source connectors](#list-source-connectors). -In the request body, specify the settings for the connector. For the specific settings to include, which differ by connector, see +In the `UpdateSourceConnector` object (for the Python SDK) or +the request body (for `curl` or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see [Sources](/platform/api/sources/overview). You must specify all of the settings for the connector, even for settings that are not changing. @@ -301,6 +419,42 @@ You must specify all of the settings for the connector, even for settings that a You can change any of the connector's settings except for its `name` and `type`. 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import UpdateSourceRequest + from unstructured_client.models.shared import UpdateSourceConnector + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + source_connector = UpdateSourceConnector( + config={ + # Specify the settings for the connector here. + } + ) + + response = client.sources.update_source( + request=UpdateSourceRequest( + source_id="", + update_source_connector=source_connector + ) + ) + + info = response.source_connector_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + + for key, value in info.config.items(): + print(f"{key}: {value}") + + ``` + ```bash curl --request 'PUT' --location \ @@ -335,10 +489,32 @@ You can change any of the connector's settings except for its `name` and `type`. ### Delete a source connector -To delete a source connector, use the `DELETE` method to call the `/sources/` endpoint, replacing +To delete a source connector, use the `UnstructuredClient` object's `sources.delete_source` function (for the Python SDK) or +the `DELETE` method to call the `/sources/` endpoint (for `curl` or Postman), replacing `` with the source connector's unique ID. To get this ID, see [List source connectors](#list-source-connectors). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import DeleteSourceRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.sources.delete_source( + request=DeleteSourceRequest( + source_id="" + ) + ) + + print(response.raw_response) + ``` + ```bash curl --request 'DELETE' --location \ @@ -366,14 +542,44 @@ To delete a source connector, use the `DELETE` method to call the `/sources/`, +To filter the list of destination connectors, use the `ListDestinationsRequest` object's `destination_type` parameter (for the Python SDK) or +the query parameter `destination_type=` (for `curl` or Postman), replacing `` with the destination connector type's unique ID (for example, `s3` for the Amazon S3 destination connector type). To get this ID, see [Destinations](/platform/api/destinations/overview). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import ListDestinationsRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.destinations.list_destinations( + request=ListDestinationsRequest( + destination_type="" # Optional, list only for this destination type. + ) + ) + + # Print the list in alphabetical order by connector name. + sorted_destinations = sorted( + response.response_list_destinations, + key=lambda destination: destination.name.lower() + ) + + for destination in sorted_destinations: + print(f"{destination.name} ({destination.id})") + ``` + ```bash curl --request 'GET' --location \ @@ -414,10 +620,37 @@ To get this ID, see [Destinations](/platform/api/destinations/overview). 
 ### Get a destination connector -To get information about a destination connector, use the `GET` method to call the `/destinations/` endpoint, replacing +To get information about a destination connector, use the `UnstructuredClient` object's `destinations.get_destination` function (for the Python SDK) or +the `GET` method to call the `/destinations/` endpoint (for `curl` or Postman), replacing `` with the destination connector's unique ID. To get this ID, see [List destination connectors](#list-destination-connectors). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import GetDestinationRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.destinations.get_destination( + request=GetDestinationRequest( + destination_id="" + ) + ) + + info = response.destination_connector_information + + print(f"name: {info.name}") + + for key, value in info.config.items(): + print(f"{key}: {value}") + ``` + ```bash curl --request 'GET' --location \ @@ -445,11 +678,51 @@ To get information about a destination connector, use the `GET` method to call t ### Create a destination connector -To create a destination connectors, use the `POST` method to call the `/destinations` endpoint. In the request body, +To create a destination connector, use the `UnstructuredClient` object's `destinations.create_destination` function (for the Python SDK) or +the `POST` method to call the `/destinations` endpoint (for `curl` or Postman). + +In the `CreateDestinationConnector` object (for the Python SDK) or +the request body (for `curl` or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see [Destinations](/platform/api/destinations/overview). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import CreateDestinationRequest + from unstructured_client.models.shared import CreateDestinationConnector + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + destination_connector = CreateDestinationConnector( + name="", + type="", + config={ + # Specify the settings for the connector here. + } + ) + + response = client.destinations.create_destination( + request=CreateDestinationRequest( + create_destination_connector=destination_connector + ) + ) + + info = response.destination_connector_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + + for key, value in info.config.items(): + print(f"{key}: {value}") + ``` + ```bash curl --request 'POST' --location \ @@ -484,10 +757,12 @@ specify the settings for the connector. For the specific settings to include, wh ### Update a destination connector -To update information about a destination connector, use the `PUT` method to call the `/destinations/` endpoint, replacing +To update information about a destination connector, use the `UnstructuredClient` object's `destinations.update_destination` function (for the Python SDK) or +the `PUT` method to call the `/destinations/` endpoint (for `curl` or Postman), replacing `` with the destination connector's unique ID. To get this ID, see [List destination connectors](#list-destination-connectors). -In the request body, specify the settings for the connector. For the specific settings to include, which differ by connector, see +In the `UpdateDestinationConnector` object (for the Python SDK) or +the request body (for `curl` or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see [Destinations](/platform/api/destinations/overview). 
You must specify all of the settings for the connector, even for settings that are not changing. @@ -495,6 +770,41 @@ You must specify all of the settings for the connector, even for settings that a You can change any of the connector's settings except for its `name` and `type`. + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import UpdateDestinationRequest + from unstructured_client.models.shared import UpdateDestinationConnector + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + destination_connector = UpdateDestinationConnector( + config={ + # Specify the settings for the connector here. + } + ) + + response = client.destinations.update_destination( + request=UpdateDestinationRequest( + destination_id="", + update_destination_connector=destination_connector + ) + ) + + info = response.destination_connector_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + + for key, value in info.config.items(): + print(f"{key}: {value}") + ``` + ```bash curl --request 'PUT' --location \ @@ -529,10 +839,32 @@ You can change any of the connector's settings except for its `name` and `type`. ### Delete a destination connector -To delete a destination connector, use the `DELETE` method to call the `/destinations/` endpoint, replacing +To delete a destination connector, use the `UnstructuredClient` object's `destinations.delete_destination` function (for the Python SDK) or +the `DELETE` method to call the `/destinations/` endpoint (for `curl` or Postman), replacing `` with the destination connector's unique ID. To get this ID, see [List destination connectors](#list-destination-connectors). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import DeleteDestinationRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.destinations.delete_destination( + request=DeleteDestinationRequest( + destination_id="" + ) + ) + + print(response.raw_response) + ``` + ```bash curl --request 'DELETE' --location \ @@ -571,9 +903,11 @@ For general information, see [Workflows](/platform/workflows). ### List workflows -To list workflows, use the `GET` method to call the `/workflows` endpoint. +To list workflows, use the `UnstructuredClient` object's `workflows.list_workflows` function (for the Python SDK) or +the `GET` method to call the `/workflows` endpoint (for `curl` or Postman). -To filter the list of workflows, use one or more of the following query parameters: +To filter the list of workflows, use one or more of the following `ListWorkflowsRequest` parameters (for the Python SDK) or +query parameters (for `curl` or Postman): - `source_id=`, replacing `` with the source connector's unique ID. To get this ID, see [List source connectors](#list-source-connectors). @@ -584,6 +918,36 @@ To filter the list of workflows, use one or more of the following query paramete You can specify multiple query parameters, for example `?source_id=&status=`. + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import ListWorkflowsRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.workflows.list_workflows( + request=ListWorkflowsRequest( + destination_id="", # Optional, list only for this destination connector ID. + source_id="", # Optional, list only for this source connector ID. 
+ status="" # Optional, list only for this workflow status. + ) + ) + + # Print the list in alphabetical order by workflow name. + sorted_workflows = sorted( + response.response_list_workflows, + key=lambda workflow: workflow.name.lower() + ) + + for workflow in sorted_workflows: + print(f"{workflow.name} ({workflow.id})") + ``` + ```bash curl --request 'GET' --location \ @@ -644,10 +1008,51 @@ You can specify multiple query parameters, for example `?source_id=` endpoint, replacing +To get information about a workflow, use the `UnstructuredClient` object's `workflows.get_workflow` function (for the Python SDK) or +the `GET` method to call the `/workflows/` endpoint (for `curl` or Postman), replacing `` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import GetWorkflowRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.workflows.get_workflow( + request=GetWorkflowRequest( + workflow_id="" + ) + ) + + info = response.workflow_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + print(f"status: {info.status}") + print(f"type: {info.workflow_type}") + print("source(s):") + + for source in info.sources: + print(f" {source}") + + print("destination(s):") + + for destination in info.destinations: + print(f" {destination}") + + print("schedule(s):") + + for crontab_entry in info.schedule.crontab_entries: + print(f" {crontab_entry.cron_expression}") + ``` + ```bash curl --request 'GET' --location \ @@ -675,11 +1080,60 @@ To get information about a workflow, use the `GET` method to call the `/workflow ### Create a workflow -To create a workflow, use the `POST` method to call the `/workflows` endpoint. 
In the request body, +To create a workflow, use the `UnstructuredClient` object's `workflows.create_workflow` function (for the Python SDK) or +the `POST` method to call the `/workflows` endpoint (for `curl` or Postman). + +In the `CreateWorkflow` object (for the Python SDK) or +the request body (for `curl` or Postman), specify the settings for the workflow. For the specific settings to include, see [Create a workflow](/platform/api/workflows#create-a-workflow). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import CreateWorkflowRequest + from unstructured_client.models.shared import CreateWorkflow + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + workflow = CreateWorkflow( + # Specify the settings for the workflow here. + ) + + response = client.workflows.create_workflow( + request=CreateWorkflowRequest( + create_workflow=workflow + ) + ) + + info = response.workflow_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + print(f"status: {info.status}") + print(f"type: {info.workflow_type}") + print("source(s):") + + for source in info.sources: + print(f" {source}") + + print("destination(s):") + + for destination in info.destinations: + print(f" {destination}") + + print("schedule(s):") + + for crontab_entry in info.schedule.crontab_entries: + print(f" {crontab_entry.cron_expression}") + ``` + ```bash curl --request 'POST' --location \ @@ -714,10 +1168,32 @@ specify the settings for the workflow. 
For the specific settings to include, see ### Run a workflow -To run a workflow manually, use the `POST` method to call the `/workflows//run` endpoint, replacing +To run a workflow manually, use the `UnstructuredClient` object's `workflows.run_workflow` function (for the Python SDK) or +the `POST` method to call the `/workflows//run` endpoint (for `curl` or Postman), replacing `` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import RunWorkflowRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.workflows.run_workflow( + request=RunWorkflowRequest( + workflow_id="" + ) + ) + + print(response.raw_response) + ``` + ```bash curl --request 'POST' --location \ @@ -748,13 +1224,61 @@ workflow. See [Create a workflow](/platform/api/workflows#create-a-workflow) or ### Update a workflow -To update information about a workflow, use the `PUT` method to call the `/workflows/` endpoint, replacing +To update information about a workflow, use the `UnstructuredClient` object's `workflows.update_workflow` function (for the Python SDK) or +the `PUT` method to call the `/workflows/` endpoint (for `curl` or Postman), replacing `` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows). -In the request body, specify the settings for the workflow. For the specific settings to include, see +In the `UpdateWorkflow` object (for the Python SDK) or +the request body (for `curl` or Postman), specify the settings for the workflow. For the specific settings to include, see [Update a workflow](/platform/api/workflows#update-a-workflow). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import UpdateWorkflowRequest + from unstructured_client.models.shared import UpdateWorkflow + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + workflow = UpdateWorkflow( + # Specify the settings for the workflow here. + ) + + response = client.workflows.update_workflow( + request=UpdateWorkflowRequest( + workflow_id="", + update_workflow=workflow + ) + ) + + info = response.workflow_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + print(f"status: {info.status}") + print(f"type: {info.workflow_type}") + print("source(s):") + + for source in info.sources: + print(f" {source}") + + print("destination(s):") + + for destination in info.destinations: + print(f" {destination}") + + print("schedule(s):") + + for crontab_entry in info.schedule.crontab_entries: + print(f" {crontab_entry.cron_expression}") + ``` + ```bash curl --request 'PUT' --location \ @@ -789,10 +1313,32 @@ In the request body, specify the settings for the workflow. For the specific set ### Delete a workflow -To delete a workflow, use the `DELETE` method to call the `/workflows/` endpoint, replacing +To delete a workflow, use the `UnstructuredClient` object's `workflows.delete_workflow` function (for the Python SDK) or +the `DELETE` method to call the `/workflows/` endpoint (for `curl` or Postman), replacing `` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import DeleteWorkflowRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.workflows.delete_workflow( + request=DeleteWorkflowRequest( + workflow_id="" + ) + ) + + print(response.raw_response) + ``` + ```bash curl --request 'DELETE' --location \ @@ -831,17 +1377,48 @@ For general information, see [Jobs](/platform/jobs). ### List jobs -To list jobs, use the `GET` method to call the `/jobs` endpoint. +To list jobs, use the `UnstructuredClient` object's `jobs.list_jobs` function (for the Python SDK) or +the `GET` method to call the `/jobs` endpoint (for `curl` or Postman). -To filter the list of jobs, use one or both of the following query parameters: +To filter the list of jobs, use one or both of the following `ListJobsRequest` parameters (for the Python SDK) or +query parameters (for `curl` or Postman): - `workflow_id=`, replacing `` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows). - `status=`, replacing `` with one of the following job statuses: `failed`, `finished`, or `running`. -You can specify multiple query parameters, for example `?workflow_id=&status=`. +For `curl` or Postman, you can specify multiple query parameters as `?workflow_id=&status=`. + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import ListJobsRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.jobs.list_jobs( + request=ListJobsRequest( + workflow_id="", # Optional, list only for this workflow ID. + status="", # Optional, list only for this job status. + ) + ) + + # Print the list in alphabetical order by workflow name. 
+ sorted_jobs = sorted( + response.response_list_jobs, + key=lambda job: job.workflow_name.lower() + ) + + for job in sorted_jobs: + print(f"{job.id} (workflow name: {job.workflow_name}, id: {job.workflow_id})") + ``` + ```bash curl --request 'GET' --location \ @@ -892,10 +1469,37 @@ You can specify multiple query parameters, for example `?workflow_id=` endpoint, replacing +To get information about a job, use the `UnstructuredClient` object's `jobs.get_job` function (for the Python SDK) or +the `GET` method to call the `/jobs/` endpoint (for `curl` or Postman), replacing `` with the job's unique ID. To get this ID, see [List jobs](#list-jobs). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import GetJobRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.jobs.get_job( + request=GetJobRequest( + job_id="" + ) + ) + + info = response.job_information + + print(f"id: {info.id}") + print(f"status: {info.status}") + print(f"workflow name: {info.workflow_name}") + print(f"workflow id: {info.workflow_id}") + ``` + ```bash curl --request 'GET' --location \ @@ -923,10 +1527,32 @@ To get information about a job, use the `GET` method to call the `/jobs/ ### Cancel a job -To cancel a running job, use the `POST` method to call the `/jobs//cancel` endpoint, replacing +To cancel a running job, use the `UnstructuredClient` object's `jobs.cancel_job` function (for the Python SDK) or +the `POST` method to call the `/jobs//cancel` endpoint (for `curl` or Postman), replacing `` with the job's unique ID. To get this ID, see [List jobs](#list-jobs). 
+ + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import CancelJobRequest + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + response = client.jobs.cancel_job( + request=CancelJobRequest( + job_id="" + ) + ) + + print(response.raw_response) + ``` + ```bash curl curl --request 'POST' --location \ diff --git a/platform/api/workflows.mdx b/platform/api/workflows.mdx index 26298e11..7d345de3 100644 --- a/platform/api/workflows.mdx +++ b/platform/api/workflows.mdx @@ -4,22 +4,85 @@ title: Workflows To use the [Unstructured Platform API](/platform/api/overview) to manage workflows, do the following: -- To get a list of available workflows, use the `GET` method to call the `/workflows` endpoint. [Learn more](/platform/api/overview#list-workflows). -- To get information about a workflow, use the `GET` method to call the `/workflows/` endpoint. [Learn more](/platform/api/overview#get-a-workflow). -- To create a workflow, use the `POST` method to call the `/workflows` endpoint. [Learn more](#create-a-workflow). -- To run a workflow manually, use the `POST` method to call the `/workflows//run` endpoint. [Learn more](/platform/api/overview#run-a-workflow). -- To update a workflow, use the `PUT` method to call the `/workflows/` endpoint. [Learn more](#update-a-workflow). -- To delete a workflow, use the `DELETE` method to call the `/workflows/` endpoint. [Learn more](/platform/api/overview#delete-a-workflow). +- To get a list of available workflows, use the `UnstructuredClient` object's `workflows.list_workflows` function (for the Python SDK) or + the `GET` method to call the `/workflows` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#list-workflows). 
+- To get information about a workflow, use the `UnstructuredClient` object's `workflows.get_workflow` function (for the Python SDK) or + the `GET` method to call the `/workflows/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#get-a-workflow). +- To create a workflow, use the `UnstructuredClient` object's `workflows.create_workflow` function (for the Python SDK) or + the `POST` method to call the `/workflows` endpoint (for `curl` or Postman). [Learn more](#create-a-workflow). +- To run a workflow manually, use the `UnstructuredClient` object's `workflows.run_workflow` function (for the Python SDK) or + the `POST` method to call the `/workflows//run` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#run-a-workflow). +- To update a workflow, use the `UnstructuredClient` object's `workflows.update_workflow` function (for the Python SDK) or + the `PUT` method to call the `/workflows/` endpoint (for `curl` or Postman). [Learn more](#update-a-workflow). +- To delete a workflow, use the `UnstructuredClient` object's `workflows.delete_workflow` function (for the Python SDK) or + the `DELETE` method to call the `/workflows/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#delete-a-workflow). The following examples assume that you have already met the [requirements](/platform/api/overview#requirements) and understand the [basics](/platform/api/overview#basics) of working with the Unstructured Platform API. ## Create a workflow -To create a workflow, use the `POST` method to call the `/workflows` endpoint. In the request body, +To create a workflow, use the `UnstructuredClient` object's `workflows.create_workflow` function (for the Python SDK) or +the `POST` method to call the `/workflows` endpoint (for `curl` or Postman).
+ +In the `CreateWorkflow` object (for the Python SDK) or +the request body (for `curl` or Postman), specify the settings for the workflow, as follows: + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import CreateWorkflowRequest + from unstructured_client.models.shared import ( + CreateWorkflow, + WorkflowAutoStrategy, + CreateWorkflowSchedule + ) + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + workflow = CreateWorkflow( + name="", + source_id="", + destination_id="", + workflow_type=WorkflowAutoStrategy., + schedule=CreateWorkflowSchedule(value="") + ) + + response = client.workflows.create_workflow( + request=CreateWorkflowRequest( + create_workflow=workflow + ) + ) + + info = response.workflow_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + print(f"status: {info.status}") + print(f"type: {info.workflow_type}") + print("source(s):") + + for source in info.sources: + print(f" {source}") + + print("destination(s):") + + for destination in info.destinations: + print(f" {destination}") + + print("schedule(s):") + + for crontab_entry in info.schedule.crontab_entries: + print(f" {crontab_entry.cron_expression}") + ``` + ```bash curl --request 'POST' --location \ @@ -72,7 +135,9 @@ Replace the preceding placeholders as follows: use the `GET` method to call the `/sources` endpoint. [Learn more](/platform/api/overview#list-source-connectors). - `` (_required_) - The ID of the target destination connector. To get the ID, use the `GET` method to call the `/destinations` endpoint. [Learn more](/platform/api/overview#list-destination-connectors). -- `` (_required_) - The workflow optimization type. Available values include `advanced`, `basic`, and `platinum`. +- `` (_required_) - The workflow optimization type. 
Available values include + `ADVANCED`, `BASIC`, and `PLATINUM` (for the Python SDK) or + `advanced`, `basic`, and `platinum` (for `curl` or Postman). - `` - The repeating automatic run schedule, specified as a predefined phrase. The available predefined phrases are: - `every 15 minutes`: Every 15 minutes (cron expression: `*/15 * * * *`). @@ -91,13 +156,64 @@ Replace the preceding placeholders as follows: ## Update a workflow -To update information about a workflow, use the `PUT` method to call the `/workflows/` endpoint, replacing +To update information about a workflow, use the `UnstructuredClient` object's `workflows.update_workflow` function (for the Python SDK) or +the `PUT` method to call the `/workflows/` endpoint (for `curl` or Postman), replacing `` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows). In the request body, specify the settings for the workflow. For the specific settings to include, see [Create a workflow](/platform/api/workflows#create-a-workflow). + + ```python + import os + + from unstructured_client import UnstructuredClient + from unstructured_client.models.operations import UpdateWorkflowRequest + from unstructured_client.models.shared import ( + UpdateWorkflow, + WorkflowAutoStrategy, + CreateWorkflowSchedule + ) + + client = UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) + + workflow = UpdateWorkflow( + # Specify the settings for the workflow here. 
+ ) + + response = client.workflows.update_workflow( + request=UpdateWorkflowRequest( + workflow_id="", + update_workflow=workflow + ) + ) + + info = response.workflow_information + + print(f"name: {info.name}") + print(f"id: {info.id}") + print(f"status: {info.status}") + print(f"type: {info.workflow_type}") + print("source(s):") + + for source in info.sources: + print(f" {source}") + + print("destination(s):") + + for destination in info.destinations: + print(f" {destination}") + + print("schedule(s):") + + for crontab_entry in info.schedule.crontab_entries: + print(f" {crontab_entry.cron_expression}") + ``` + ```bash curl --request 'PUT' --location \ From fb27c7bd0782ac4cef71da5b6bd33196b248ed8b Mon Sep 17 00:00:00 2001 From: Paul Cornell Date: Mon, 10 Feb 2025 11:57:03 -0800 Subject: [PATCH 2/5] Add source and destination connector Python SDK code examples for Platform API --- platform/api/destinations/astradb.mdx | 2 + platform/api/destinations/azure-ai-search.mdx | 2 + platform/api/destinations/couchbase.mdx | 2 + .../destinations/databricks-delta-table.mdx | 2 + .../api/destinations/databricks-volumes.mdx | 2 + platform/api/destinations/delta-table.mdx | 2 + platform/api/destinations/elasticsearch.mdx | 2 + platform/api/destinations/google-cloud.mdx | 2 + platform/api/destinations/kafka.mdx | 2 + platform/api/destinations/milvus.mdx | 2 + platform/api/destinations/mongodb.mdx | 2 + platform/api/destinations/neo4j.mdx | 2 + platform/api/destinations/onedrive.mdx | 2 + platform/api/destinations/overview.mdx | 53 ++++++++++--------- platform/api/destinations/pinecone.mdx | 2 + platform/api/destinations/postgresql.mdx | 2 + platform/api/destinations/qdrant.mdx | 2 + platform/api/destinations/s3.mdx | 2 + platform/api/destinations/weaviate.mdx | 2 + platform/api/overview.mdx | 4 +- platform/api/sources/azure-blob-storage.mdx | 2 + platform/api/sources/confluence.mdx | 2 + platform/api/sources/couchbase.mdx | 2 + platform/api/sources/databricks-volumes.mdx | 2 + 
platform/api/sources/dropbox.mdx | 2 + platform/api/sources/elasticsearch.mdx | 2 + platform/api/sources/google-cloud.mdx | 2 + platform/api/sources/google-drive.mdx | 2 + platform/api/sources/kafka.mdx | 2 + platform/api/sources/mongodb.mdx | 2 + platform/api/sources/onedrive.mdx | 2 + platform/api/sources/outlook.mdx | 2 + platform/api/sources/overview.mdx | 49 +++++++++-------- platform/api/sources/postgresql.mdx | 2 + platform/api/sources/s3.mdx | 2 + platform/api/sources/salesforce.mdx | 2 + platform/api/sources/sharepoint.mdx | 2 + .../destination_connectors/astradb_sdk.mdx | 17 ++++++ .../azure_ai_search_sdk.mdx | 15 ++++++ .../destination_connectors/couchbase_sdk.mdx | 20 +++++++ .../databricks_delta_table_sdk.mdx | 23 ++++++++ .../databricks_volumes_sdk.mdx | 19 +++++++ .../delta_table_sdk.mdx | 16 ++++++ .../elasticsearch_sdk.mdx | 15 ++++++ snippets/destination_connectors/gcs_sdk.mdx | 14 +++++ snippets/destination_connectors/kafka_sdk.mdx | 19 +++++++ .../destination_connectors/milvus_sdk.mdx | 17 ++++++ .../destination_connectors/mongodb_sdk.mdx | 15 ++++++ snippets/destination_connectors/neo4j_sdk.mdx | 17 ++++++ .../destination_connectors/onedrive_sdk.mdx | 18 +++++++ .../destination_connectors/pinecone_sdk.mdx | 15 ++++++ .../destination_connectors/postgresql_sdk.mdx | 19 +++++++ .../destination_connectors/qdrant_sdk.mdx | 16 ++++++ snippets/destination_connectors/s3_sdk.mdx | 21 ++++++++ .../destination_connectors/weaviate_sdk.mdx | 15 ++++++ snippets/source_connectors/azure_sdk.mdx | 27 ++++++++++ snippets/source_connectors/confluence_sdk.mdx | 32 +++++++++++ snippets/source_connectors/couchbase_sdk.mdx | 19 +++++++ .../databricks_volumes_sdk.mdx | 18 +++++++ snippets/source_connectors/dropbox_sdk.mdx | 14 +++++ .../source_connectors/elasticsearch_sdk.mdx | 14 +++++ snippets/source_connectors/gcs_sdk.mdx | 14 +++++ .../source_connectors/google_drive_sdk.mdx | 18 +++++++ snippets/source_connectors/kafka_sdk.mdx | 18 +++++++ 
snippets/source_connectors/mongodb_sdk.mdx | 14 +++++ snippets/source_connectors/onedrive_sdk.mdx | 18 +++++++ snippets/source_connectors/outlook_sdk.mdx | 18 +++++++ snippets/source_connectors/postgresql_sdk.mdx | 23 ++++++++ snippets/source_connectors/s3_sdk.mdx | 24 +++++++++ snippets/source_connectors/salesforce_sdk.mdx | 18 +++++++ snippets/source_connectors/sharepoint_sdk.mdx | 19 +++++++ 71 files changed, 745 insertions(+), 48 deletions(-) create mode 100644 snippets/destination_connectors/astradb_sdk.mdx create mode 100644 snippets/destination_connectors/azure_ai_search_sdk.mdx create mode 100644 snippets/destination_connectors/couchbase_sdk.mdx create mode 100644 snippets/destination_connectors/databricks_delta_table_sdk.mdx create mode 100644 snippets/destination_connectors/databricks_volumes_sdk.mdx create mode 100644 snippets/destination_connectors/delta_table_sdk.mdx create mode 100644 snippets/destination_connectors/elasticsearch_sdk.mdx create mode 100644 snippets/destination_connectors/gcs_sdk.mdx create mode 100644 snippets/destination_connectors/kafka_sdk.mdx create mode 100644 snippets/destination_connectors/milvus_sdk.mdx create mode 100644 snippets/destination_connectors/mongodb_sdk.mdx create mode 100644 snippets/destination_connectors/neo4j_sdk.mdx create mode 100644 snippets/destination_connectors/onedrive_sdk.mdx create mode 100644 snippets/destination_connectors/pinecone_sdk.mdx create mode 100644 snippets/destination_connectors/postgresql_sdk.mdx create mode 100644 snippets/destination_connectors/qdrant_sdk.mdx create mode 100644 snippets/destination_connectors/s3_sdk.mdx create mode 100644 snippets/destination_connectors/weaviate_sdk.mdx create mode 100644 snippets/source_connectors/azure_sdk.mdx create mode 100644 snippets/source_connectors/confluence_sdk.mdx create mode 100644 snippets/source_connectors/couchbase_sdk.mdx create mode 100644 snippets/source_connectors/databricks_volumes_sdk.mdx create mode 100644 
snippets/source_connectors/dropbox_sdk.mdx create mode 100644 snippets/source_connectors/elasticsearch_sdk.mdx create mode 100644 snippets/source_connectors/gcs_sdk.mdx create mode 100644 snippets/source_connectors/google_drive_sdk.mdx create mode 100644 snippets/source_connectors/kafka_sdk.mdx create mode 100644 snippets/source_connectors/mongodb_sdk.mdx create mode 100644 snippets/source_connectors/onedrive_sdk.mdx create mode 100644 snippets/source_connectors/outlook_sdk.mdx create mode 100644 snippets/source_connectors/postgresql_sdk.mdx create mode 100644 snippets/source_connectors/s3_sdk.mdx create mode 100644 snippets/source_connectors/salesforce_sdk.mdx create mode 100644 snippets/source_connectors/sharepoint_sdk.mdx diff --git a/platform/api/destinations/astradb.mdx b/platform/api/destinations/astradb.mdx index 649976ff..eeca025c 100644 --- a/platform/api/destinations/astradb.mdx +++ b/platform/api/destinations/astradb.mdx @@ -12,10 +12,12 @@ import AstraDBPrerequisites from '/snippets/general-shared-text/astradb.mdx'; To create or change an Astra DB destination connector, see the following examples. +import AstraDBSDK from '/snippets/destination_connectors/astradb_sdk.mdx'; import AstraDBAPIRESTCreate from '/snippets/destination_connectors/astradb_rest_create.mdx'; import AstraDBAPIRESTChange from '/snippets/destination_connectors/astradb_rest_change.mdx'; + diff --git a/platform/api/destinations/azure-ai-search.mdx b/platform/api/destinations/azure-ai-search.mdx index 449ac96c..dcda3c6c 100644 --- a/platform/api/destinations/azure-ai-search.mdx +++ b/platform/api/destinations/azure-ai-search.mdx @@ -12,10 +12,12 @@ import AzureAIPrerequisites from '/snippets/general-shared-text/azure-ai-search. To create or change an Azure AI Search destination connector, see the following examples. 
+import AzureAISDK from '/snippets/destination_connectors/azure_ai_search_sdk.mdx'; import AzureAIAPIRESTCreate from '/snippets/destination_connectors/azure_ai_search_rest_create.mdx'; import AzureAIAPIRESTChange from '/snippets/destination_connectors/azure_ai_search_rest_change.mdx'; + diff --git a/platform/api/destinations/couchbase.mdx b/platform/api/destinations/couchbase.mdx index 337c4bbc..654910a9 100644 --- a/platform/api/destinations/couchbase.mdx +++ b/platform/api/destinations/couchbase.mdx @@ -12,10 +12,12 @@ import CouchbasePrerequisites from '/snippets/general-shared-text/couchbase.mdx' To create or change a Couchbase destination connector, see the following examples. +import CouchbaseSDK from '/snippets/destination_connectors/couchbase_sdk.mdx'; import CouchbaseAPIRESTCreate from '/snippets/destination_connectors/couchbase_rest_create.mdx'; import CouchbaseAPIRESTChange from '/snippets/destination_connectors/couchbase_rest_change.mdx'; + diff --git a/platform/api/destinations/databricks-delta-table.mdx b/platform/api/destinations/databricks-delta-table.mdx index ae8370c7..4f8d45f6 100644 --- a/platform/api/destinations/databricks-delta-table.mdx +++ b/platform/api/destinations/databricks-delta-table.mdx @@ -22,10 +22,12 @@ import DeltaTablesInDatabricksPrerequisites from '/snippets/general-shared-text/ To create or change a Delta Tables in Databricks destination connector, see the following examples. 
+import DeltaTablesInDatabricksSDK from '/snippets/destination_connectors/databricks_delta_table_sdk.mdx'; import DeltaTablesInDatabricksAPIRESTCreate from '/snippets/destination_connectors/databricks_delta_table_rest_create.mdx'; import DeltaTablesInDatabricksAPIRESTChange from '/snippets/destination_connectors/databricks_delta_table_rest_change.mdx'; + diff --git a/platform/api/destinations/databricks-volumes.mdx b/platform/api/destinations/databricks-volumes.mdx index c06ee3a6..947b33ab 100644 --- a/platform/api/destinations/databricks-volumes.mdx +++ b/platform/api/destinations/databricks-volumes.mdx @@ -19,10 +19,12 @@ import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databr To create or change a Databricks Volumes destination connector, see the following examples. +import DatabricksVolumesSDK from '/snippets/destination_connectors/databricks_volumes_sdk.mdx'; import DatabricksVolumesAPIRESTCreate from '/snippets/destination_connectors/databricks_volumes_rest_create.mdx'; import DatabricksVolumesAPIRESTChange from '/snippets/destination_connectors/databricks_volumes_rest_change.mdx'; + diff --git a/platform/api/destinations/delta-table.mdx b/platform/api/destinations/delta-table.mdx index becbab99..00cb3045 100644 --- a/platform/api/destinations/delta-table.mdx +++ b/platform/api/destinations/delta-table.mdx @@ -18,10 +18,12 @@ import DeltaTablePrerequisites from '/snippets/general-shared-text/delta-table.m To create or change a Delta Tables in Amazon S3 destination connector, see the following examples. 
+import DeltaTableSDK from '/snippets/destination_connectors/delta_table_sdk.mdx'; import DeltaTableAPIRESTCreate from '/snippets/destination_connectors/delta_table_rest_create.mdx'; import DeltaTableAPIRESTChange from '/snippets/destination_connectors/delta_table_rest_change.mdx'; + diff --git a/platform/api/destinations/elasticsearch.mdx b/platform/api/destinations/elasticsearch.mdx index 596edc9b..2a63fe8a 100644 --- a/platform/api/destinations/elasticsearch.mdx +++ b/platform/api/destinations/elasticsearch.mdx @@ -12,10 +12,12 @@ import ElasticsearchPrerequisites from '/snippets/general-shared-text/elasticsea To create or change an Elasticsearch destination connector, see the following examples. +import ElasticsearchSDK from '/snippets/destination_connectors/elasticsearch_sdk.mdx'; import ElasticsearchAPIRESTCreate from '/snippets/destination_connectors/elasticsearch_rest_create.mdx'; import ElasticsearchAPIRESTChange from '/snippets/destination_connectors/elasticsearch_rest_change.mdx'; + diff --git a/platform/api/destinations/google-cloud.mdx b/platform/api/destinations/google-cloud.mdx index 4bf1bc95..0473ae9e 100644 --- a/platform/api/destinations/google-cloud.mdx +++ b/platform/api/destinations/google-cloud.mdx @@ -12,10 +12,12 @@ import GCSPrerequisites from '/snippets/general-shared-text/gcs.mdx'; To create or change a Google Cloud Storage destination connector, see the following examples. 
+import GCSSDK from '/snippets/destination_connectors/gcs_sdk.mdx'; import GCSAPIRESTCreate from '/snippets/destination_connectors/gcs_rest_create.mdx'; import GCSAPIRESTChange from '/snippets/destination_connectors/gcs_rest_change.mdx'; + diff --git a/platform/api/destinations/kafka.mdx b/platform/api/destinations/kafka.mdx index 31de8e1c..8f1c0ee6 100644 --- a/platform/api/destinations/kafka.mdx +++ b/platform/api/destinations/kafka.mdx @@ -12,10 +12,12 @@ import KafkaPrerequisites from '/snippets/general-shared-text/kafka.mdx'; To create or change a Kafka destination connector, see the following examples. +import KafkaSDK from '/snippets/destination_connectors/kafka_sdk.mdx'; import KafkaAPIRESTCreate from '/snippets/destination_connectors/kafka_rest_create.mdx'; import KafkaAPIRESTChange from '/snippets/destination_connectors/kafka_rest_change.mdx'; + diff --git a/platform/api/destinations/milvus.mdx b/platform/api/destinations/milvus.mdx index d2d4c9ce..d524fb98 100644 --- a/platform/api/destinations/milvus.mdx +++ b/platform/api/destinations/milvus.mdx @@ -12,10 +12,12 @@ import MilvusPrerequisites from '/snippets/general-shared-text/milvus.mdx'; To create or change a Milvus destination connector, see the following examples. +import MilvusSDK from '/snippets/destination_connectors/milvus_sdk.mdx'; import MilvusAPIRESTCreate from '/snippets/destination_connectors/milvus_rest_create.mdx'; import MilvusAPIRESTChange from '/snippets/destination_connectors/milvus_rest_change.mdx'; + diff --git a/platform/api/destinations/mongodb.mdx b/platform/api/destinations/mongodb.mdx index fa4f4faf..bbccd6cc 100644 --- a/platform/api/destinations/mongodb.mdx +++ b/platform/api/destinations/mongodb.mdx @@ -12,10 +12,12 @@ import MongoDBPrerequisites from '/snippets/general-shared-text/mongodb.mdx'; To create or change a MongoDB destination connector, see the following examples. 
+import MongoDBSDK from '/snippets/destination_connectors/mongodb_sdk.mdx'; import MongoDBAPIRESTCreate from '/snippets/destination_connectors/mongodb_rest_create.mdx'; import MongoDBAPIRESTChange from '/snippets/destination_connectors/mongodb_rest_change.mdx'; + diff --git a/platform/api/destinations/neo4j.mdx b/platform/api/destinations/neo4j.mdx index fd0a42cb..59d0851a 100644 --- a/platform/api/destinations/neo4j.mdx +++ b/platform/api/destinations/neo4j.mdx @@ -18,10 +18,12 @@ import Neo4jGraphFormat from '/snippets/general-shared-text/neo4j-graph.mdx'; To create or change a Neo4j destination connector, see the following examples. +import Neo4jSDK from '/snippets/destination_connectors/neo4j_sdk.mdx'; import Neo4jAPIRESTCreate from '/snippets/destination_connectors/neo4j_rest_create.mdx'; import Neo4jAPIRESTChange from '/snippets/destination_connectors/neo4j_rest_change.mdx'; + diff --git a/platform/api/destinations/onedrive.mdx b/platform/api/destinations/onedrive.mdx index 092c62df..c5e70568 100644 --- a/platform/api/destinations/onedrive.mdx +++ b/platform/api/destinations/onedrive.mdx @@ -12,10 +12,12 @@ import OneDrivePrerequisites from '/snippets/general-shared-text/onedrive.mdx'; To create or change a OneDrive destination connector, see the following examples. 
+import OneDriveSDK from '/snippets/destination_connectors/onedrive_sdk.mdx'; import OneDriveAPIRESTCreate from '/snippets/destination_connectors/onedrive_rest_create.mdx'; import OneDriveAPIRESTChange from '/snippets/destination_connectors/onedrive_rest_change.mdx'; + diff --git a/platform/api/destinations/overview.mdx b/platform/api/destinations/overview.mdx index 09029556..959929f1 100644 --- a/platform/api/destinations/overview.mdx +++ b/platform/api/destinations/overview.mdx @@ -4,31 +4,36 @@ title: Overview To use the [Unstructured Platform API](/platform/api/overview) to manage destination connectors, do the following: -- To get a list of available destination connectors, use the `GET` method to call the `/destinations` endpoint. [Learn more](/platform/api/overview#list-destination-connectors). -- To get information about a destination connector, use the `GET` method to call the `/destinations/` endpoint. [Learn more](/platform/api/overview#get-a-destination-connector). -- To create a destination connector, use the `POST` method to call the `/destinations` endpoint. [Learn more](/platform/api/overview#create-a-destination-connector). -- To update a destination connector, use the `PUT` method to call the `/destinations/` endpoint. [Learn more](/platform/api/overview#update-a-destination-connector). -- To delete a destination connector, use the `DELETE` method to call the `/destinations/` endpoint. [Learn more](/platform/api/overview#delete-a-destination-connector). +- To get a list of available destination connectors, use the `UnstructuredClient` object's `destinations.list_destinations` function (for the Python SDK) or + the `GET` method to call the `/destinations` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#list-destination-connectors).
+- To get information about a destination connector, use the `UnstructuredClient` object's `destinations.get_destination` function (for the Python SDK) or + the `GET` method to call the `/destinations/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#get-a-destination-connector). +- To create a destination connector, use the `UnstructuredClient` object's `destinations.create_destination` function (for the Python SDK) or + the `POST` method to call the `/destinations` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#create-a-destination-connector). +- To update a destination connector, use the `UnstructuredClient` object's `destinations.update_destination` function (for the Python SDK) or + the `PUT` method to call the `/destinations/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#update-a-destination-connector). +- To delete a destination connector, use the `UnstructuredClient` object's `destinations.delete_destination` function (for the Python SDK) or + the `DELETE` method to call the `/destinations/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#delete-a-destination-connector). -To create a destination connector, you must also provide a request body that contains settings that are specific to that connector. +To create or update a destination connector, you must also provide settings that are specific to that connector. 
For the list of specific settings, see: -- [Astra DB](/platform/api/destinations/astradb) (`destination_type=astradb`) -- [Azure AI Search](/platform/api/destinations/azure-ai-search) (`destination_type=azure_ai_search`) -- [Couchbase](/platform/api/destinations/couchbase) (`destination_type=couchbase`) -- [Databricks Volumes](/platform/api/destinations/databricks-volumes) (`destination_type=databricks_volumes`) -- [Delta Tables in Amazon S3](/platform/api/destinations/delta-table) (`destination_type=delta_table`) -- [Delta Tables in Databricks](/platform/api/destinations/databricks-delta-table) (`destination_type=databricks_volume_delta_tables`) -- [Elasticsearch](/platform/api/destinations/elasticsearch) (`destination_type=elasticsearch`) -- [Google Cloud Storage](/platform/api/destinations/google-cloud) (`destination_type=gcs`) -- [Kafka](/platform/api/destinations/kafka) (`destination_type=kafka-cloud`) -- [Milvus](/platform/api/destinations/milvus) (`destination_type=milvus`) -- [MongoDB](/platform/api/destinations/mongodb) (`destination_type=mongodb`) -- [Neo4j](/platform/api/destinations/neo4j) (`destination_type=neo4j`) -- [OneDrive](/platform/api/destinations/onedrive) (`destination_type=onedrive`) -- [Pinecone](/platform/api/destinations/pinecone) (`destination_type=pinecone`) -- [PostgreSQL](/platform/api/destinations/postgresql) (`destination_type=postgres`) -- [Qdrant](/platform/api/destinations/qdrant) (`destination_type=qdrant-cloud`) -- [S3](/platform/api/destinations/s3) (`destination_type=s3`) -- [Weaviate](/platform/api/destinations/weaviate) (`destination_type=weaviate`) +- [Astra DB](/platform/api/destinations/astradb) (`astradb`) +- [Azure AI Search](/platform/api/destinations/azure-ai-search) (`azure_ai_search`) +- [Couchbase](/platform/api/destinations/couchbase) (`couchbase`) +- [Databricks Volumes](/platform/api/destinations/databricks-volumes) (`databricks_volumes`) +- [Delta Tables in Amazon S3](/platform/api/destinations/delta-table) 
(`delta_table`) +- [Delta Tables in Databricks](/platform/api/destinations/databricks-delta-table) (`databricks_volume_delta_tables`) +- [Elasticsearch](/platform/api/destinations/elasticsearch) (`elasticsearch`) +- [Google Cloud Storage](/platform/api/destinations/google-cloud) (`gcs`) +- [Kafka](/platform/api/destinations/kafka) (`kafka-cloud`) +- [Milvus](/platform/api/destinations/milvus) (`milvus`) +- [MongoDB](/platform/api/destinations/mongodb) (`mongodb`) +- [Neo4j](/platform/api/destinations/neo4j) (`neo4j`) +- [OneDrive](/platform/api/destinations/onedrive) (`onedrive`) +- [Pinecone](/platform/api/destinations/pinecone) (`pinecone`) +- [PostgreSQL](/platform/api/destinations/postgresql) (`postgres`) +- [Qdrant](/platform/api/destinations/qdrant) (`qdrant-cloud`) +- [S3](/platform/api/destinations/s3) (`s3`) +- [Weaviate](/platform/api/destinations/weaviate) (`weaviate`) diff --git a/platform/api/destinations/pinecone.mdx b/platform/api/destinations/pinecone.mdx index bfd2041c..9db36369 100644 --- a/platform/api/destinations/pinecone.mdx +++ b/platform/api/destinations/pinecone.mdx @@ -12,10 +12,12 @@ import PineconePrerequisites from '/snippets/general-shared-text/pinecone.mdx'; To create or change a Pinecone destination connector, see the following examples. +import PineconeSDK from '/snippets/destination_connectors/pinecone_sdk.mdx'; import PineconeAPIRESTCreate from '/snippets/destination_connectors/pinecone_rest_create.mdx'; import PineconeAPIRESTChange from '/snippets/destination_connectors/pinecone_rest_change.mdx'; + diff --git a/platform/api/destinations/postgresql.mdx b/platform/api/destinations/postgresql.mdx index 42cfb8fe..3bf68c10 100644 --- a/platform/api/destinations/postgresql.mdx +++ b/platform/api/destinations/postgresql.mdx @@ -12,10 +12,12 @@ import PostgreSQLPrerequisites from '/snippets/general-shared-text/postgresql.md To create or change a PostgreSQL destination connector, see the following examples. 
+import PostgreSQLSDK from '/snippets/destination_connectors/postgresql_sdk.mdx'; import PostgreSQLAPIRESTCreate from '/snippets/destination_connectors/postgresql_rest_create.mdx'; import PostgreSQLAPIRESTChange from '/snippets/destination_connectors/postgresql_rest_change.mdx'; + diff --git a/platform/api/destinations/qdrant.mdx b/platform/api/destinations/qdrant.mdx index db191450..52dfe310 100644 --- a/platform/api/destinations/qdrant.mdx +++ b/platform/api/destinations/qdrant.mdx @@ -12,10 +12,12 @@ import QdrantPrerequisites from '/snippets/general-shared-text/qdrant.mdx'; To create or change a Qdrant destination connector, see the following examples. +import QdrantSDK from '/snippets/destination_connectors/qdrant_sdk.mdx'; import QdrantAPIRESTCreate from '/snippets/destination_connectors/qdrant_rest_create.mdx'; import QdrantAPIRESTChange from '/snippets/destination_connectors/qdrant_rest_change.mdx'; + diff --git a/platform/api/destinations/s3.mdx b/platform/api/destinations/s3.mdx index f45ada57..0d3b91f4 100644 --- a/platform/api/destinations/s3.mdx +++ b/platform/api/destinations/s3.mdx @@ -12,10 +12,12 @@ import s3Prerequisites from '/snippets/general-shared-text/s3.mdx'; To create or change an S3 destination connector, see the following examples. +import s3SDK from '/snippets/destination_connectors/s3_sdk.mdx'; import s3APIRESTCreate from '/snippets/destination_connectors/s3_rest_create.mdx'; import s3APIRESTChange from '/snippets/destination_connectors/s3_rest_change.mdx'; + diff --git a/platform/api/destinations/weaviate.mdx b/platform/api/destinations/weaviate.mdx index 2fab0a43..4f08371b 100644 --- a/platform/api/destinations/weaviate.mdx +++ b/platform/api/destinations/weaviate.mdx @@ -12,10 +12,12 @@ import WeaviatePrerequisites from '/snippets/general-shared-text/weaviate.mdx'; To create or change a Weaviate destination connector, see the following examples. 
+import WeaviateSDK from '/snippets/destination_connectors/weaviate_sdk.mdx'; import WeaviateAPIRESTCreate from '/snippets/destination_connectors/weaviate_rest_create.mdx'; import WeaviateAPIRESTChange from '/snippets/destination_connectors/weaviate_rest_change.mdx'; + diff --git a/platform/api/overview.mdx b/platform/api/overview.mdx index 7cf9e6ab..8b41ccd6 100644 --- a/platform/api/overview.mdx +++ b/platform/api/overview.mdx @@ -348,7 +348,7 @@ specify the settings for the connector. For the specific settings to include, wh server_url=os.getenv("UNSTRUCTURED_API_URL") ) - source_connector = CreateSourceConnector( + destination_connector = CreateSourceConnector( name="", type="", config={ @@ -432,7 +432,7 @@ You can change any of the connector's settings except for its `name` and `type`. server_url=os.getenv("UNSTRUCTURED_API_URL") ) - source_connector = UpdateSourceConnector( + destination_connector = UpdateSourceConnector( config={ # Specify the settings for the connector here. } diff --git a/platform/api/sources/azure-blob-storage.mdx b/platform/api/sources/azure-blob-storage.mdx index 77978b5c..38920987 100644 --- a/platform/api/sources/azure-blob-storage.mdx +++ b/platform/api/sources/azure-blob-storage.mdx @@ -12,10 +12,12 @@ import AzurePrerequisites from '/snippets/general-shared-text/azure.mdx'; To create or change an Azure Blob Storage source connector, see the following examples. 
+import AzureSDK from '/snippets/source_connectors/azure_sdk.mdx'; import AzureAPIRESTCreate from '/snippets/source_connectors/azure_rest_create.mdx'; import AzureAPIRESTChange from '/snippets/source_connectors/azure_rest_change.mdx'; + diff --git a/platform/api/sources/confluence.mdx b/platform/api/sources/confluence.mdx index 0cc1485e..4c29ab56 100644 --- a/platform/api/sources/confluence.mdx +++ b/platform/api/sources/confluence.mdx @@ -12,10 +12,12 @@ import ConfluencePrerequisites from '/snippets/general-shared-text/confluence.md To create or change a Confluence source connector, see the following examples. +import ConfluenceSDK from '/snippets/source_connectors/confluence_sdk.mdx'; import ConfluenceAPIRESTCreate from '/snippets/source_connectors/confluence_rest_create.mdx'; import ConfluenceAPIRESTChange from '/snippets/source_connectors/confluence_rest_change.mdx'; + diff --git a/platform/api/sources/couchbase.mdx b/platform/api/sources/couchbase.mdx index 4d20f91f..41a16b1d 100644 --- a/platform/api/sources/couchbase.mdx +++ b/platform/api/sources/couchbase.mdx @@ -12,10 +12,12 @@ import CouchbasePrerequisites from '/snippets/general-shared-text/couchbase.mdx' To create or change a Couchbase source connector, see the following examples. +import CouchbaseSDK from '/snippets/source_connectors/couchbase_sdk.mdx'; import CouchbaseAPIRESTCreate from '/snippets/source_connectors/couchbase_rest_create.mdx'; import CouchbaseAPIRESTChange from '/snippets/source_connectors/couchbase_rest_change.mdx'; + diff --git a/platform/api/sources/databricks-volumes.mdx b/platform/api/sources/databricks-volumes.mdx index 057465bd..0e16f4dc 100644 --- a/platform/api/sources/databricks-volumes.mdx +++ b/platform/api/sources/databricks-volumes.mdx @@ -12,10 +12,12 @@ import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databr To create or change a Databricks Volumes source connector, see the following examples. 
+import DatabricksVolumesSDK from '/snippets/source_connectors/databricks_volumes_sdk.mdx'; import DatabricksVolumesAPIRESTCreate from '/snippets/source_connectors/databricks_volumes_rest_create.mdx'; import DatabricksVolumesAPIRESTChange from '/snippets/source_connectors/databricks_volumes_rest_change.mdx'; + diff --git a/platform/api/sources/dropbox.mdx b/platform/api/sources/dropbox.mdx index ae891976..2d768dd9 100644 --- a/platform/api/sources/dropbox.mdx +++ b/platform/api/sources/dropbox.mdx @@ -12,10 +12,12 @@ import DropboxPrerequisites from '/snippets/general-shared-text/dropbox.mdx'; To create or change a Dropbox source connector, see the following examples. +import DropboxSDK from '/snippets/source_connectors/dropbox_sdk.mdx'; import DropboxAPIRESTCreate from '/snippets/source_connectors/dropbox_rest_create.mdx'; import DropboxAPIRESTChange from '/snippets/source_connectors/dropbox_rest_change.mdx'; + diff --git a/platform/api/sources/elasticsearch.mdx b/platform/api/sources/elasticsearch.mdx index e6afed91..d4b49712 100644 --- a/platform/api/sources/elasticsearch.mdx +++ b/platform/api/sources/elasticsearch.mdx @@ -12,10 +12,12 @@ import ElasticsearchPrerequisites from '/snippets/general-shared-text/elasticsea To create or change a Elasticsearch source connector, see the following examples. +import ElasticsearchSDK from '/snippets/source_connectors/elasticsearch_sdk.mdx'; import ElasticsearchAPIRESTCreate from '/snippets/source_connectors/elasticsearch_rest_create.mdx'; import ElasticsearchAPIRESTChange from '/snippets/source_connectors/elasticsearch_rest_change.mdx'; + diff --git a/platform/api/sources/google-cloud.mdx b/platform/api/sources/google-cloud.mdx index 304680c7..9cfc561e 100644 --- a/platform/api/sources/google-cloud.mdx +++ b/platform/api/sources/google-cloud.mdx @@ -12,10 +12,12 @@ import GCSPrerequisites from '/snippets/general-shared-text/gcs.mdx'; To create or change a Google Cloud Storage source connector, see the following examples. 
+import GCSSDK from '/snippets/source_connectors/gcs_sdk.mdx'; import GCSAPIRESTCreate from '/snippets/source_connectors/gcs_rest_create.mdx'; import GCSAPIRESTChange from '/snippets/source_connectors/gcs_rest_change.mdx'; + diff --git a/platform/api/sources/google-drive.mdx b/platform/api/sources/google-drive.mdx index b68fb4f0..73c4d0ec 100644 --- a/platform/api/sources/google-drive.mdx +++ b/platform/api/sources/google-drive.mdx @@ -12,10 +12,12 @@ import GoogleDrivePrerequisites from '/snippets/general-shared-text/google-drive To create or change a Google Drive source connector, see the following examples. +import GoogleDriveSDK from '/snippets/source_connectors/google_drive_sdk.mdx'; import GoogleDriveAPIRESTCreate from '/snippets/source_connectors/google_drive_rest_create.mdx'; import GoogleDriveAPIRESTChange from '/snippets/source_connectors/google_drive_rest_change.mdx'; + diff --git a/platform/api/sources/kafka.mdx b/platform/api/sources/kafka.mdx index e76263b6..9892047f 100644 --- a/platform/api/sources/kafka.mdx +++ b/platform/api/sources/kafka.mdx @@ -12,10 +12,12 @@ import KafkaPrerequisites from '/snippets/general-shared-text/kafka.mdx'; To create or change a Kafka source connector, see the following examples. +import KafkaSDK from '/snippets/source_connectors/kafka_sdk.mdx'; import KafkaAPIRESTCreate from '/snippets/source_connectors/kafka_rest_create.mdx'; import KafkaAPIRESTChange from '/snippets/source_connectors/kafka_rest_change.mdx'; + diff --git a/platform/api/sources/mongodb.mdx b/platform/api/sources/mongodb.mdx index 503e5bfc..10af6aca 100644 --- a/platform/api/sources/mongodb.mdx +++ b/platform/api/sources/mongodb.mdx @@ -12,10 +12,12 @@ import MongoDBPrerequisites from '/snippets/general-shared-text/mongodb.mdx'; To create or change a MongoDB source connector, see the following examples. 
+import MongoDBSDK from '/snippets/source_connectors/mongodb_sdk.mdx'; import MongoDBAPIRESTCreate from '/snippets/source_connectors/mongodb_rest_create.mdx'; import MongoDBAPIRESTChange from '/snippets/source_connectors/mongodb_rest_change.mdx'; + diff --git a/platform/api/sources/onedrive.mdx b/platform/api/sources/onedrive.mdx index ae937fb9..d938db8b 100644 --- a/platform/api/sources/onedrive.mdx +++ b/platform/api/sources/onedrive.mdx @@ -12,10 +12,12 @@ import OneDrivePrerequisites from '/snippets/general-shared-text/onedrive.mdx'; To create or change a OneDrive source connector, see the following examples. +import OneDriveSDK from '/snippets/source_connectors/onedrive_sdk.mdx'; import OneDriveAPIRESTCreate from '/snippets/source_connectors/onedrive_rest_create.mdx'; import OneDriveAPIRESTChange from '/snippets/source_connectors/onedrive_rest_change.mdx'; + diff --git a/platform/api/sources/outlook.mdx b/platform/api/sources/outlook.mdx index 0b6f41f9..2a9d8fbe 100644 --- a/platform/api/sources/outlook.mdx +++ b/platform/api/sources/outlook.mdx @@ -12,10 +12,12 @@ import OutlookPrerequisites from '/snippets/general-shared-text/outlook.mdx'; To create or change an Outlook source connector, see the following examples. +import OutlookSDK from '/snippets/source_connectors/outlook_sdk.mdx'; import OutlookAPIRESTCreate from '/snippets/source_connectors/outlook_rest_create.mdx'; import OutlookAPIRESTChange from '/snippets/source_connectors/outlook_rest_change.mdx'; + diff --git a/platform/api/sources/overview.mdx b/platform/api/sources/overview.mdx index 5a972921..1eab66b8 100644 --- a/platform/api/sources/overview.mdx +++ b/platform/api/sources/overview.mdx @@ -4,29 +4,34 @@ title: Overview To use the [Unstructured Platform API](/platform/api/overview) to manage source connectors, do the following: -- To get a list of available source connectors, use the `GET` method to call the `/sources` endpoint. [Learn more](/platform/api/overview#list-source-connectors). 
-- To get information about a source connector, use the `GET` method to call the `/sources/` endpoint. [Learn more](/platform/api/overview#get-a-source-connector). -- To create a source connector, use the `POST` method to call the `/sources` endpoint. [Learn more](/platform/api/overview#create-a-source-connector). -- To update a source connector, use the `PUT` method to call the `/sources/` endpoint. [Learn more](/platform/api/overview#update-a-source-connector). -- To delete a source connector, use the `DELETE` method to call the `/sources/` endpoint. [Learn more](/platform/api/overview#delete-a-source-connector). +- To get a list of available source connectors, use the `UnstructuredClient` object's `sources.list_sources` function (for the Python SDK) or + the `GET` method to call the `/sources` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#list-source-connectors). +- To get information about a source connector, use the `UnstructuredClient` object's `sources.get_source` function (for the Python SDK) or + the `GET` method to call the `/sources/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#get-a-source-connector). +- To create a source connector, use the `UnstructuredClient` object's `sources.create_source` function (for the Python SDK) or + the `POST` method to call the `/sources` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#create-a-source-connector). +- To update a source connector, use the `UnstructuredClient` object's `sources.update_source` function (for the Python SDK) or + the `PUT` method to call the `/sources/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#update-a-source-connector). +- To delete a source connector, use the `UnstructuredClient` object's `sources.delete_source` function (for the Python SDK) or + the `DELETE` method to call the `/sources/` endpoint (for `curl` or Postman). [Learn more](/platform/api/overview#delete-a-source-connector). 
-To create a source connector, you must also provide a request body that contains settings that are specific to that connector. +To create or update a source connector, you must also provide settings that are specific to that connector. For the list of specific settings, see: -- [Azure](/platform/api/sources/azure-blob-storage) (`source_type=azure`) -- [Confluence](/platform/api/sources/confluence) (`source_type=confluence`) -- [Couchbase](/platform/api/sources/couchbase) (`source_type=couchbase`) -- [Databricks Volumes](/platform/api/sources/databricks-volumes) (`source_type=databricks_volumes`) -- [Dropbox](/platform/api/sources/dropbox) (`source_type=dropbox`) -- [Elasticsearch](/platform/api/sources/elasticsearch) (`source_type=elasticsearch`) -- [Google Cloud Storage](/platform/api/sources/google-cloud) (`source_type=gcs`) -- [Google Drive](/platform/api/sources/google-drive) (`source_type=google_drive`) -- [Kafka](/platform/api/sources/kafka) (`source_type=kafka-cloud`) -- [MongoDB](/platform/api/sources/mongodb) (`source_type=mongodb`) -- [OneDrive](/platform/api/sources/onedrive) (`source_type=onedrive`) -- [Outlook](/platform/api/sources/outlook) (`source_type=outlook`) -- [PostgreSQL](/platform/api/sources/postgresql) (`source_type=postgres`) -- [S3](/platform/api/sources/s3) (`source_type=s3`) -- [Salesforce](/platform/api/sources/salesforce) (`source_type=salesforce`) -- [SharePoint](/platform/api/sources/sharepoint) (`source_type=sharepoint`) +- [Azure](/platform/api/sources/azure-blob-storage) (`azure`) +- [Confluence](/platform/api/sources/confluence) (`confluence`) +- [Couchbase](/platform/api/sources/couchbase) (`couchbase`) +- [Databricks Volumes](/platform/api/sources/databricks-volumes) (`databricks_volumes`) +- [Dropbox](/platform/api/sources/dropbox) (`dropbox`) +- [Elasticsearch](/platform/api/sources/elasticsearch) (`elasticsearch`) +- [Google Cloud Storage](/platform/api/sources/google-cloud) (`gcs`) +- [Google 
Drive](/platform/api/sources/google-drive) (`google_drive`) +- [Kafka](/platform/api/sources/kafka) (`kafka-cloud`) +- [MongoDB](/platform/api/sources/mongodb) (`mongodb`) +- [OneDrive](/platform/api/sources/onedrive) (`onedrive`) +- [Outlook](/platform/api/sources/outlook) (`outlook`) +- [PostgreSQL](/platform/api/sources/postgresql) (`postgres`) +- [S3](/platform/api/sources/s3) (`s3`) +- [Salesforce](/platform/api/sources/salesforce) (`salesforce`) +- [SharePoint](/platform/api/sources/sharepoint) (`sharepoint`) diff --git a/platform/api/sources/postgresql.mdx b/platform/api/sources/postgresql.mdx index 40caacdd..6ca1f4de 100644 --- a/platform/api/sources/postgresql.mdx +++ b/platform/api/sources/postgresql.mdx @@ -12,10 +12,12 @@ import PostgreSQLPrerequisites from '/snippets/general-shared-text/postgresql.md To create or change an PostgreSQL source connector, see the following examples. +import PostgreSQLSDK from '/snippets/source_connectors/postgresql_sdk.mdx'; import PostgreSQLAPIRESTCreate from '/snippets/source_connectors/postgresql_rest_create.mdx'; import PostgreSQLAPIRESTChange from '/snippets/source_connectors/postgresql_rest_change.mdx'; + diff --git a/platform/api/sources/s3.mdx b/platform/api/sources/s3.mdx index 2eeea36b..82911aa1 100644 --- a/platform/api/sources/s3.mdx +++ b/platform/api/sources/s3.mdx @@ -12,10 +12,12 @@ import S3Prerequisites from '/snippets/general-shared-text/s3.mdx'; To create or change an S3 source connector, see the following examples. 
+import S3SDK from '/snippets/source_connectors/s3_sdk.mdx'; import S3APIRESTCreate from '/snippets/source_connectors/s3_rest_create.mdx'; import S3APIRESTChange from '/snippets/source_connectors/s3_rest_change.mdx'; + diff --git a/platform/api/sources/salesforce.mdx b/platform/api/sources/salesforce.mdx index 9c65fc86..c02f8b91 100644 --- a/platform/api/sources/salesforce.mdx +++ b/platform/api/sources/salesforce.mdx @@ -12,10 +12,12 @@ import SalesforcePrerequisites from '/snippets/general-shared-text/salesforce.md To create or change a Salesforce source connector, see the following examples. +import SalesforceSDK from '/snippets/source_connectors/salesforce_sdk.mdx'; import SalesforceAPIRESTCreate from '/snippets/source_connectors/salesforce_rest_create.mdx'; import SalesforceAPIRESTChange from '/snippets/source_connectors/salesforce_rest_change.mdx'; + diff --git a/platform/api/sources/sharepoint.mdx b/platform/api/sources/sharepoint.mdx index 89a860d1..0abdebca 100644 --- a/platform/api/sources/sharepoint.mdx +++ b/platform/api/sources/sharepoint.mdx @@ -12,10 +12,12 @@ import SharePointPrerequisites from '/snippets/general-shared-text/sharepoint.md To create or change a SharePoint source connector, see the following examples. +import SharePointSDK from '/snippets/source_connectors/sharepoint_sdk.mdx'; import SharePointAPIRESTCreate from '/snippets/source_connectors/sharepoint_rest_create.mdx'; import SharePointAPIRESTChange from '/snippets/source_connectors/sharepoint_rest_change.mdx'; + diff --git a/snippets/destination_connectors/astradb_sdk.mdx b/snippets/destination_connectors/astradb_sdk.mdx new file mode 100644 index 00000000..e94f6db9 --- /dev/null +++ b/snippets/destination_connectors/astradb_sdk.mdx @@ -0,0 +1,17 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="astradb", # Create only. 
+ config={ + "token": "", + "api_endpoint": "", + "collection_name": "", + "keyspace": "", + "batch_size": + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/azure_ai_search_sdk.mdx b/snippets/destination_connectors/azure_ai_search_sdk.mdx new file mode 100644 index 00000000..26c27e42 --- /dev/null +++ b/snippets/destination_connectors/azure_ai_search_sdk.mdx @@ -0,0 +1,15 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="azure_ai_search", # Create only. + config={ + "endpoint": "", + "index": "", + "azure_ai_search_key": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/couchbase_sdk.mdx b/snippets/destination_connectors/couchbase_sdk.mdx new file mode 100644 index 00000000..2bfd8362 --- /dev/null +++ b/snippets/destination_connectors/couchbase_sdk.mdx @@ -0,0 +1,20 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="couchbase", # Create only. + config={ + "username": "", + "bucket": "", + "connection_string": "", + "scope": "", + "collection": "", + "password": "", + "batch_size": , + "collection_id": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/databricks_delta_table_sdk.mdx b/snippets/destination_connectors/databricks_delta_table_sdk.mdx new file mode 100644 index 00000000..c9359af1 --- /dev/null +++ b/snippets/destination_connectors/databricks_delta_table_sdk.mdx @@ -0,0 +1,23 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="databricks_volume_delta_tables", # Create only. 
+ config={ + "server_hostname": "", + "http_path": "", + "token": "", + "client_id": "", + "client_secret": "", + "volume": "", + "catalog": "", + "volume_path": "", + "schema": "", + "database": "", + "table_name": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/databricks_volumes_sdk.mdx b/snippets/destination_connectors/databricks_volumes_sdk.mdx new file mode 100644 index 00000000..79594467 --- /dev/null +++ b/snippets/destination_connectors/databricks_volumes_sdk.mdx @@ -0,0 +1,19 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="databricks_volumes", # Create only. + config={ + "host": "", + "catalog": "", + "schema": "", + "volume": "", + "volume_path": "", + "client_secret": "", + "client_id": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/delta_table_sdk.mdx b/snippets/destination_connectors/delta_table_sdk.mdx new file mode 100644 index 00000000..de5a3b24 --- /dev/null +++ b/snippets/destination_connectors/delta_table_sdk.mdx @@ -0,0 +1,16 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="delta_table", # Create only. + config={ + "aws_region": "", + "table_uri": "", + "aws_access_key_id": "", + "aws_secret_access_key": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/elasticsearch_sdk.mdx b/snippets/destination_connectors/elasticsearch_sdk.mdx new file mode 100644 index 00000000..9e81c9c6 --- /dev/null +++ b/snippets/destination_connectors/elasticsearch_sdk.mdx @@ -0,0 +1,15 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... 
+destination_connector = DestinationConnector( + name="", # Create only. + type="elasticsearch", # Create only. + config={ + "hosts": [""], + "es_api_key": "", + "index_name": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/gcs_sdk.mdx b/snippets/destination_connectors/gcs_sdk.mdx new file mode 100644 index 00000000..ad7c8207 --- /dev/null +++ b/snippets/destination_connectors/gcs_sdk.mdx @@ -0,0 +1,14 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="gcs", # Create only. + config={ + "remote_url": "", + "service_account_key": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/kafka_sdk.mdx b/snippets/destination_connectors/kafka_sdk.mdx new file mode 100644 index 00000000..f1f8758c --- /dev/null +++ b/snippets/destination_connectors/kafka_sdk.mdx @@ -0,0 +1,19 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="kafka-cloud", # Create only. + config={ + "bootstrap_server": "", + "port": , + "group_id": "", + "kafka_api_key": "", + "secret": "", + "topic": "", + "num_messages_to_consume": + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/milvus_sdk.mdx b/snippets/destination_connectors/milvus_sdk.mdx new file mode 100644 index 00000000..44d9b1df --- /dev/null +++ b/snippets/destination_connectors/milvus_sdk.mdx @@ -0,0 +1,17 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="milvus", # Create only. + config={ + "user": "", + "uri": "", + "db_name": "", + "password": "", + "collection_name": "" + } +) +# ... 
+``` \ No newline at end of file diff --git a/snippets/destination_connectors/mongodb_sdk.mdx b/snippets/destination_connectors/mongodb_sdk.mdx new file mode 100644 index 00000000..a1925c1b --- /dev/null +++ b/snippets/destination_connectors/mongodb_sdk.mdx @@ -0,0 +1,15 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="mongodb", # Create only. + config={ + "database": "", + "collection": "", + "uri": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/neo4j_sdk.mdx b/snippets/destination_connectors/neo4j_sdk.mdx new file mode 100644 index 00000000..9265b50e --- /dev/null +++ b/snippets/destination_connectors/neo4j_sdk.mdx @@ -0,0 +1,17 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="neo4j", # Create only. + config={ + "uri": "", + "database": "", + "username": "", + "password": "", + "batch_size": + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/onedrive_sdk.mdx b/snippets/destination_connectors/onedrive_sdk.mdx new file mode 100644 index 00000000..d8693ae5 --- /dev/null +++ b/snippets/destination_connectors/onedrive_sdk.mdx @@ -0,0 +1,18 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="onedrive", # Create only. + config={ + "client_id": "", + "user_pname": "", + "tenant": "", + "authority_url": "", + "client_cred": "", + "remote_url": "" + } +) +# ... 
+``` \ No newline at end of file diff --git a/snippets/destination_connectors/pinecone_sdk.mdx b/snippets/destination_connectors/pinecone_sdk.mdx new file mode 100644 index 00000000..cefa9e19 --- /dev/null +++ b/snippets/destination_connectors/pinecone_sdk.mdx @@ -0,0 +1,15 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="pinecone", # Create only. + config={ + "index_name": "", + "api_key": "", + "batch_size": + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/postgresql_sdk.mdx b/snippets/destination_connectors/postgresql_sdk.mdx new file mode 100644 index 00000000..8367355f --- /dev/null +++ b/snippets/destination_connectors/postgresql_sdk.mdx @@ -0,0 +1,19 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="postgres", # Create only. + config={ + "host": "", + "database": "", + "port": "", + "username": "", + "password": "", + "table_name": "", + "batch_size": + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/qdrant_sdk.mdx b/snippets/destination_connectors/qdrant_sdk.mdx new file mode 100644 index 00000000..93168ca6 --- /dev/null +++ b/snippets/destination_connectors/qdrant_sdk.mdx @@ -0,0 +1,16 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="qdrant", # Create only. + config={ + "url": "", + "collection_name": "", + "batch_size": "", + "api_key": "" + } +) +# ...
+``` \ No newline at end of file diff --git a/snippets/destination_connectors/s3_sdk.mdx b/snippets/destination_connectors/s3_sdk.mdx new file mode 100644 index 00000000..a7421fee --- /dev/null +++ b/snippets/destination_connectors/s3_sdk.mdx @@ -0,0 +1,21 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="s3", # Create only. + config={ + # For AWS access key ID with AWS secret access key authentication: + "key": "", + "secret": "", + + # For AWS STS token authentication: + "token": "", + + "remote_url": "", + "endpoint_url": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/destination_connectors/weaviate_sdk.mdx b/snippets/destination_connectors/weaviate_sdk.mdx new file mode 100644 index 00000000..0e08721e --- /dev/null +++ b/snippets/destination_connectors/weaviate_sdk.mdx @@ -0,0 +1,15 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import DestinationConnector +# ... +destination_connector = DestinationConnector( + name="", # Create only. + type="weaviate", # Create only. + config={ + "host_url": "", + "class_name": "", + "api_key": "" + } +) +# ... +``` \ No newline at end of file diff --git a/snippets/source_connectors/azure_sdk.mdx b/snippets/source_connectors/azure_sdk.mdx new file mode 100644 index 00000000..97e00264 --- /dev/null +++ b/snippets/source_connectors/azure_sdk.mdx @@ -0,0 +1,27 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="azure", # Create only. + config={ + "remote_url": "az:///", + "recursive": , + + # For anonymous authentication, do not set any of the + # following fields. 
+ + # For SAS token authentication: + "account_name": "", + "sas_token": "", + + # For account key authentication: + "account_name": "", + "account_key": "", + + # For connection string authentication: + "connection_string": "" + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/confluence_sdk.mdx b/snippets/source_connectors/confluence_sdk.mdx new file mode 100644 index 00000000..8ef23f28 --- /dev/null +++ b/snippets/source_connectors/confluence_sdk.mdx @@ -0,0 +1,32 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="confluence", # Create only. + config={ + "url": "", + "max_num_of_spaces": , + "max_num_of_docs_from_each_space": , + "spaces": ["", ""], + + # For API token authentication: + + "username": "", + "token": "", + "cloud": "", + + # For personal access token (PAT) authentication: + + "token": "", + "cloud": "false", + + # For password authentication: + + "username": "", + "password": "", + "cloud": "" + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/couchbase_sdk.mdx b/snippets/source_connectors/couchbase_sdk.mdx new file mode 100644 index 00000000..18c588af --- /dev/null +++ b/snippets/source_connectors/couchbase_sdk.mdx @@ -0,0 +1,19 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="couchbase", # Create only.
+ config={ + "username": "", + "bucket": "", + "connection_string": "", + "scope": "", + "collection": "", + "password": "", + "batch_size": , + "collection_id": "" + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/databricks_volumes_sdk.mdx b/snippets/source_connectors/databricks_volumes_sdk.mdx new file mode 100644 index 00000000..c9169b80 --- /dev/null +++ b/snippets/source_connectors/databricks_volumes_sdk.mdx @@ -0,0 +1,18 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="databricks_volumes", # Create only. + config={ + "host": "", + "client_id": "", + "client_secret": "", + "catalog": "", + "schema": "", + "volume": "", + "volume_path": "" + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/dropbox_sdk.mdx b/snippets/source_connectors/dropbox_sdk.mdx new file mode 100644 index 00000000..ddb7649b --- /dev/null +++ b/snippets/source_connectors/dropbox_sdk.mdx @@ -0,0 +1,14 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="dropbox", # Create only. + config={ + "token": "", + "remote_url": "", + "recursive": + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/elasticsearch_sdk.mdx b/snippets/source_connectors/elasticsearch_sdk.mdx new file mode 100644 index 00000000..512e23ac --- /dev/null +++ b/snippets/source_connectors/elasticsearch_sdk.mdx @@ -0,0 +1,14 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="elasticsearch", # Create only.
+ config={ + "hosts": [""], + "es_api_key": "", + "index_name": "" + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/gcs_sdk.mdx b/snippets/source_connectors/gcs_sdk.mdx new file mode 100644 index 00000000..51f7582c --- /dev/null +++ b/snippets/source_connectors/gcs_sdk.mdx @@ -0,0 +1,14 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="gcs", # Create only. + config={ + "service_account_key": "", + "remote_url": "", + "recursive": + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/google_drive_sdk.mdx b/snippets/source_connectors/google_drive_sdk.mdx new file mode 100644 index 00000000..afa5d687 --- /dev/null +++ b/snippets/source_connectors/google_drive_sdk.mdx @@ -0,0 +1,18 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="google_drive", # Create only. + config={ + "drive_id": "", + "service_account_key": "", + "extensions": [ + "", + "" + ], + "recursive": + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/kafka_sdk.mdx b/snippets/source_connectors/kafka_sdk.mdx new file mode 100644 index 00000000..8816e723 --- /dev/null +++ b/snippets/source_connectors/kafka_sdk.mdx @@ -0,0 +1,18 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="kafka-cloud", # Create only. 
+ config={ + "bootstrap_server": "", + "port": , + "group_id": "", + "kafka_api_key": "", + "secret": "", + "topic": "", + "num_messages_to_consume": + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/mongodb_sdk.mdx b/snippets/source_connectors/mongodb_sdk.mdx new file mode 100644 index 00000000..1a6fe3e1 --- /dev/null +++ b/snippets/source_connectors/mongodb_sdk.mdx @@ -0,0 +1,14 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="mongodb", # Create only. + config={ + "uri": "", + "database": "", + "collection": "" + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/onedrive_sdk.mdx b/snippets/source_connectors/onedrive_sdk.mdx new file mode 100644 index 00000000..a796348f --- /dev/null +++ b/snippets/source_connectors/onedrive_sdk.mdx @@ -0,0 +1,18 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="onedrive", # Create only. + config={ + "client_id": "", + "user_pname": "", + "tenant": "", + "authority_url": "", + "client_cred": "", + "path": "", + "recursive": + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/outlook_sdk.mdx b/snippets/source_connectors/outlook_sdk.mdx new file mode 100644 index 00000000..5f208329 --- /dev/null +++ b/snippets/source_connectors/outlook_sdk.mdx @@ -0,0 +1,18 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="outlook", # Create only. 
+ config={ + "client_id": "", + "authority_url": "", + "tenant": "", + "client_cred": "", + "user_email": "", + "outlook_folders": ["",""], + "recursive": + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/postgresql_sdk.mdx b/snippets/source_connectors/postgresql_sdk.mdx new file mode 100644 index 00000000..40d421f2 --- /dev/null +++ b/snippets/source_connectors/postgresql_sdk.mdx @@ -0,0 +1,23 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="postgres", # Create only. + config={ + "host": "", + "database": "", + "port": "", + "username": "", + "password": "", + "table_name": "", + "batch_size": , + "id_column": "", + "fields": [ + "", + "" + ] + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/s3_sdk.mdx b/snippets/source_connectors/s3_sdk.mdx new file mode 100644 index 00000000..e9c5a906 --- /dev/null +++ b/snippets/source_connectors/s3_sdk.mdx @@ -0,0 +1,24 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="s3", # Create only. + config={ + # For anonymous authentication: + "anonymous": True, + + # For AWS access key ID with AWS secret access key authentication: + "key": "", + "secret": "", + + # For AWS STS token authentication: + "token": "", + + "remote_url": "", + "endpoint_url": "", + "recursive": + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/salesforce_sdk.mdx b/snippets/source_connectors/salesforce_sdk.mdx new file mode 100644 index 00000000..4238af0c --- /dev/null +++ b/snippets/source_connectors/salesforce_sdk.mdx @@ -0,0 +1,18 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. 
+ type="salesforce", # Create only. + config={ + "username": "", + "consumer_key": "", + "private_key": "", + "categories": [ + "", + "" + ] + } +) +``` \ No newline at end of file diff --git a/snippets/source_connectors/sharepoint_sdk.mdx b/snippets/source_connectors/sharepoint_sdk.mdx new file mode 100644 index 00000000..a6cec690 --- /dev/null +++ b/snippets/source_connectors/sharepoint_sdk.mdx @@ -0,0 +1,19 @@ +```python Python SDK +# ... +from unstructured_client.models.shared import SourceConnector +# ... +source_connector = SourceConnector( + name="", # Create only. + type="sharepoint", # Create only. + config={ + "client_id": "", + "site": "", + "tenant": "", + "authority_url": "", + "user_pname": "", + "client_cred": "", + "path": "", + "recursive": + } +) +``` \ No newline at end of file From 6faf628a86e4b0e173d3e6253561fb0b982b4c83 Mon Sep 17 00:00:00 2001 From: Paul Cornell Date: Wed, 12 Feb 2025 12:26:38 -0800 Subject: [PATCH 3/5] Platform: Python SDK - move server_url from UnstructuredClient constructor to individual function calls --- platform/api/overview.mdx | 122 ++++++++++++++++++------------------- platform/api/workflows.mdx | 12 ++-- 2 files changed, 67 insertions(+), 67 deletions(-) diff --git a/platform/api/overview.mdx b/platform/api/overview.mdx index 8b41ccd6..e9430b0d 100644 --- a/platform/api/overview.mdx +++ b/platform/api/overview.mdx @@ -72,19 +72,19 @@ To use the Unstructured Platform API, you must have: The Unstructured Platform API is offered as follows: -- As part of the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client), +- As part of the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) beginning with version 0.30.0, which you can call through standard Python code. 
To install the Unstructured Python SDK, run the following command from within your Python virtual environment: ```bash - pip install unstructured-client + pip install "unstructured-client>=0.30.0" ``` - If you already have the Unstructured Python SDK installed, upgrade to the latest version by running the following command instead: + If you already have the Unstructured Python SDK installed, upgrade to at least version 0.30.0 by running the following command instead: ```bash - pip install --upgrade unstructured-client + pip install --upgrade "unstructured-client>=0.30.0" ``` - As a set of Representational State Transfer (REST) endpoints, which you can call through standard REST-enabled @@ -207,14 +207,14 @@ To get this ID, see [Sources](/platform/api/sources/overview). from unstructured_client.models.operations import ListSourcesRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.sources.list_sources( request=ListSourcesRequest( source_type="" # Optional, list only for this source type. - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) # Print the list in alphabetical order by connector name. @@ -280,14 +280,14 @@ the `GET` method to call the `/sources/` endpoint (for `curl` or P from unstructured_client.models.operations import GetSourceRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.sources.get_source( request=GetSourceRequest( source_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.source_connector_information @@ -344,8 +344,7 @@ specify the settings for the connector. 
For the specific settings to include, wh from unstructured_client.models.shared import CreateSourceConnector client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) destination_connector = CreateSourceConnector( @@ -359,7 +358,8 @@ specify the settings for the connector. For the specific settings to include, wh response = client.sources.create_source( request=CreateSourceRequest( create_source_connector=source_connector - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.source_connector_information @@ -428,8 +428,7 @@ You can change any of the connector's settings except for its `name` and `type`. from unstructured_client.models.shared import UpdateSourceConnector client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) destination_connector = UpdateSourceConnector( @@ -442,7 +441,8 @@ You can change any of the connector's settings except for its `name` and `type`. request=UpdateSourceRequest( source_id="", update_source_connector=source_connector - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.source_connector_information @@ -502,14 +502,14 @@ the `DELETE` method to call the `/sources/` endpoint (for `curl` o from unstructured_client.models.operations import DeleteSourceRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.sources.delete_source( request=DeleteSourceRequest( source_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) print(response.raw_response) @@ -560,14 +560,14 @@ To get this ID, see [Destinations](/platform/api/destinations/overview). 
from unstructured_client.models.operations import ListDestinationsRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.destinations.list_destinations( request=ListDestinationsRequest( destination_type="" # Optional, list only for this destination type. - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) # Print the list in alphabetical order by connector name. @@ -633,14 +633,14 @@ the `GET` method to call the `/destinations/` endpoint (for `curl` from unstructured_client.models.operations import GetDestinationRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.destinations.get_destination( request=GetDestinationRequest( destination_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.destination_connector_information @@ -696,8 +696,7 @@ specify the settings for the connector. For the specific settings to include, wh from unstructured_client.models.shared import CreateDestinationConnector client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) destination_connector = CreateDestinationConnector( @@ -711,7 +710,8 @@ specify the settings for the connector. For the specific settings to include, wh response = client.destinations.create_destination( request=CreateDestinationRequest( create_destination_connector=destination_connector - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.destination_connector_information @@ -779,8 +779,7 @@ You can change any of the connector's settings except for its `name` and `type`. 
from unstructured_client.models.shared import UpdateDestinationConnector client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) destination_connector = UpdateDestinationConnector( @@ -793,7 +792,8 @@ You can change any of the connector's settings except for its `name` and `type`. request=UpdateDestinationRequest( destination_id="", update_destination_connector=destination_connector - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.destination_connector_information @@ -852,14 +852,14 @@ the `DELETE` method to call the `/destinations/` endpoint (for `cu from unstructured_client.models.operations import DeleteDestinationRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.destinations.delete_destination( request=DeleteDestinationRequest( destination_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) print(response.raw_response) @@ -926,8 +926,7 @@ You can specify multiple query parameters, for example `?source_id=` endpoint (for `curl` or from unstructured_client.models.operations import GetWorkflowRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.workflows.get_workflow( request=GetWorkflowRequest( workflow_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.workflow_information @@ -1098,8 +1098,7 @@ specify the settings for the workflow. 
For the specific settings to include, see from unstructured_client.models.shared import CreateWorkflow client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) workflow = CreateWorkflow( @@ -1109,7 +1108,8 @@ specify the settings for the workflow. For the specific settings to include, see response = client.workflows.create_workflow( request=CreateWorkflowRequest( create_workflow=workflow - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.workflow_information @@ -1181,14 +1181,14 @@ the `POST` method to call the `/workflows//run` endpoint (for `curl from unstructured_client.models.operations import RunWorkflowRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.workflows.run_workflow( request=RunWorkflowRequest( workflow_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) print(response.raw_response) @@ -1242,8 +1242,7 @@ the request body (for `curl` or Postman), specify the settings for the workflow. from unstructured_client.models.shared import UpdateWorkflow client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) workflow = UpdateWorkflow( @@ -1254,7 +1253,8 @@ the request body (for `curl` or Postman), specify the settings for the workflow. 
request=UpdateWorkflowRequest( workflow_id="", update_workflow=workflow - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.workflow_information @@ -1326,14 +1326,14 @@ the `DELETE` method to call the `/workflows/` endpoint (for `curl` from unstructured_client.models.operations import DeleteWorkflowRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.workflows.delete_workflow( request=DeleteWorkflowRequest( workflow_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) print(response.raw_response) @@ -1398,15 +1398,15 @@ For `curl` or Postman, you can specify multiple query parameters as `?workflow_i from unstructured_client.models.operations import ListJobsRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.jobs.list_jobs( request=ListJobsRequest( workflow_id="", # Optional, list only for this workflow ID. status="", # Optional, list only for this job status. - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) # Print the list in alphabetical order by workflow name. 
@@ -1482,14 +1482,14 @@ the `GET` method to call the `/jobs/` endpoint (for `curl` or Postman), from unstructured_client.models.operations import GetJobRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.jobs.get_job( request=GetJobRequest( job_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.job_information @@ -1540,14 +1540,14 @@ the `POST` method to call the `/jobs//cancel` endpoint (for `curl` or Po from unstructured_client.models.operations import CancelJobRequest client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) response = client.jobs.cancel_job( request=CancelJobRequest( job_id="" - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) print(response.raw_response) diff --git a/platform/api/workflows.mdx b/platform/api/workflows.mdx index 7d345de3..b0289f17 100644 --- a/platform/api/workflows.mdx +++ b/platform/api/workflows.mdx @@ -43,8 +43,7 @@ specify the settings for the workflow, as follows: ) client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) workflow = CreateWorkflow( @@ -58,7 +57,8 @@ specify the settings for the workflow, as follows: response = client.workflows.create_workflow( request=CreateWorkflowRequest( create_workflow=workflow - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.workflow_information @@ -177,8 +177,7 @@ In the request body, specify the settings for the workflow. 
For the specific set ) client = UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL") + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) workflow = UpdateWorkflow( @@ -189,7 +188,8 @@ In the request body, specify the settings for the workflow. For the specific set request=UpdateWorkflowRequest( workflow_id="", update_workflow=workflow - ) + ), + server_url=os.getenv("UNSTRUCTURED_API_URL") ) info = response.workflow_information From 4b45e2b3cd8c942edca6d91a781b3bbedbe2d81c Mon Sep 17 00:00:00 2001 From: Paul Cornell Date: Wed, 12 Feb 2025 15:41:50 -0800 Subject: [PATCH 4/5] Platform API: Different API URLs for SDK versus non-SDK uses --- api-reference/troubleshooting/api-key-url.mdx | 4 ++- platform/api/overview.mdx | 35 +++++++++++++++++-- 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/api-reference/troubleshooting/api-key-url.mdx b/api-reference/troubleshooting/api-key-url.mdx index 7bb0a1ea..91353db4 100644 --- a/api-reference/troubleshooting/api-key-url.mdx +++ b/api-reference/troubleshooting/api-key-url.mdx @@ -28,7 +28,9 @@ SDKError: API error occurred: Status 401 For the API URL, note the following: -- For the [Unstructured Platform API](/platform/api/overview), the API URL is typically `https://platform.unstructuredapp.io/api/v1`. +- For the [Unstructured Platform API](/platform/api/overview), the API URL is typically `https://platform.unstructuredapp.io`, which is unique to + the Unstructured Python SDK; and `https://platform.unstructuredapp.io/api/v1` for standard REST-enabled utilities (such as `curl`), + tools (such as Postman), programming languages, packages, and libraries. - For the [Unstructured Serverless API](/api-reference/api-services/saas-api-development-guide), the API URL is typically `https://api.unstructuredapp.io/general/v0/general` (be aware of the inclusion of `app` in this API URL). 
- For the [Free Unstructured API](/api-reference/api-services/free-api), the API URL is always `https://api.unstructured.io/general/v0/general` (be aware that there is no `app` in this API URL). diff --git a/platform/api/overview.mdx b/platform/api/overview.mdx index e9430b0d..763822fe 100644 --- a/platform/api/overview.mdx +++ b/platform/api/overview.mdx @@ -55,12 +55,17 @@ To use the Unstructured Platform API, you must have: 4. Enter some descriptive name for the API key, and then click **Save**. 5. Click the **Copy** icon for your new API key. The API key's value is copied to your system's clipboard. -- The Unstructured **Platform API URL**, which is `https://platform.unstructuredapp.io/api/v1` +- The Unstructured **Platform API URL**. This is typically `https://platform.unstructuredapp.io`, which is unique to + the Unstructured Python SDK; and `https://platform.unstructuredapp.io/api/v1` for standard REST-enabled utilities (such as `curl`), + tools (such as Postman), programming languages, packages, and libraries. ![Unstructured Platform API URL](/img/platform/PlatformAPIURL.png) - Do not use the Unstructured **Serverless API URL**, which is separate from the Unstructured Platform API URL. + **Important**: Do not use `https://platform.unstructuredapp.io/api/v1` with the Unstructured Python SDK, or else calls made by + the Python SDK will fail. Use `https://platform.unstructuredapp.io` instead. + + Do not use the Unstructured **Serverless API URL**, which is separate from the Unstructured Platform API URL. @@ -141,13 +146,32 @@ as well as `curl` and Postman for all of the supported REST endpoints. that are available through `https://platform.unstructuredapp.io`. 
-The following Unstructured Python SDK and `curl` examples use environment variables, which you can set as follows: +The following Unstructured Python SDK examples use the following environment variables, which you can set as follows: + +```bash +export UNSTRUCTURED_API_URL="https://platform.unstructuredapp.io" +export UNSTRUCTURED_API_KEY="" +``` + + + **Important**: Do not use `https://platform.unstructuredapp.io/api/v1` with the Python SDK, or else calls made by the Python SDK will fail. + Use `https://platform.unstructuredapp.io` instead. + + +The following `curl` and Postman examples use the following environment variables, which you can set as follows: ```bash export UNSTRUCTURED_API_URL="https://platform.unstructuredapp.io/api/v1" export UNSTRUCTURED_API_KEY="" ``` + + **Important**: For standard REST-enabled clients (such as `curl`), + do not use `https://platform.unstructuredapp.io` (which is unique to the + Unstructured Python SDK), or else calls made by these REST-enabled clients will fail. + Use `https://platform.unstructuredapp.io/api/v1` instead. + + These environment variables enable you to more easily run the following Unstructured Python SDK and `curl` examples and help prevent you from storing scripts that contain sensitive URLs and API keys in public source code repositories. @@ -161,6 +185,11 @@ The following Postman examples use variables, which you can set as follows: - **Type**: `default` - **Initial value**: `https://platform.unstructuredapp.io/api/v1` - **Current value**: `https://platform.unstructuredapp.io/api/v1` + + + **Important**: Do not use `https://platform.unstructuredapp.io` (which is unique to the + Unstructured Python SDK), or else calls made by Postman will fail. +
- **Variable**: `UNSTRUCTURED_API_URL` - **Type**: `secret` From fd9561668878cf1f65d36e24abc71a8724e74b7e Mon Sep 17 00:00:00 2001 From: Paul Cornell Date: Thu, 13 Feb 2025 12:08:59 -0800 Subject: [PATCH 5/5] Final Python SDK touches for Platform API --- .../accessing-unstructured-api.mdx | 4 +- api-reference/api-services/examples.mdx | 72 ++++++------ .../api-services/partition-via-api.mdx | 4 +- api-reference/api-services/post-requests.mdx | 4 +- api-reference/api-services/sdk-jsts.mdx | 4 +- api-reference/api-services/sdk-python.mdx | 105 ++++++++++++++---- api-reference/ingest/ingest-cli.mdx | 6 + api-reference/ingest/overview.mdx | 6 + api-reference/ingest/python-ingest.mdx | 6 + api-reference/troubleshooting/api-key-url.mdx | 65 +++++++---- ingestion/overview.mdx | 3 +- platform/api/destinations/redis.mdx | 2 + platform/api/destinations/snowflake.mdx | 2 + platform/api/sources/snowflake.mdx | 2 + platform/api/workflows.mdx | 65 +++++++---- snippets/destination_connectors/redis_sdk.mdx | 24 ++++ .../destination_connectors/snowflake_sdk.mdx | 23 ++++ .../use-ingest-instead.mdx | 9 -- .../use-ingest-or-platform-instead.mdx | 10 ++ .../extract_image_block_types.py.mdx | 8 +- .../how-to-api/extract_text_as_html.py.mdx | 8 +- .../how-to-api/get_chunked_elements.py.mdx | 8 +- snippets/source_connectors/snowflake_sdk.mdx | 26 +++++ 23 files changed, 345 insertions(+), 121 deletions(-) create mode 100644 snippets/destination_connectors/redis_sdk.mdx create mode 100644 snippets/destination_connectors/snowflake_sdk.mdx delete mode 100644 snippets/general-shared-text/use-ingest-instead.mdx create mode 100644 snippets/general-shared-text/use-ingest-or-platform-instead.mdx create mode 100644 snippets/source_connectors/snowflake_sdk.mdx diff --git a/api-reference/api-services/accessing-unstructured-api.mdx b/api-reference/api-services/accessing-unstructured-api.mdx index a8c0d954..b502cba3 100644 --- a/api-reference/api-services/accessing-unstructured-api.mdx +++ 
b/api-reference/api-services/accessing-unstructured-api.mdx @@ -14,9 +14,9 @@ Choose your preferred method: The API parameters for all these methods are documented on the [API parameters](/api-reference/api-services/api-parameters) page. -import UseIngestInstead from '/snippets/general-shared-text/use-ingest-instead.mdx'; +import UseIngestOrPlatformInstead from '/snippets/general-shared-text/use-ingest-or-platform-instead.mdx'; - + If you'd like to try out the Unstructured API interactively by using the Free Unstructured API to process a single file, you can do so by using the [Swagger UI](https://api.unstructured.io/general/docs#/default/pipeline_1_general_v0_general_post). diff --git a/api-reference/api-services/examples.mdx b/api-reference/api-services/examples.mdx index 74b3282e..021f630a 100644 --- a/api-reference/api-services/examples.mdx +++ b/api-reference/api-services/examples.mdx @@ -13,7 +13,7 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv -import UseIngestInstead from '/snippets/general-shared-text/use-ingest-instead.mdx'; +import UseIngestOrPlatformInstead from '/snippets/general-shared-text/use-ingest-or-platform-instead.mdx'; ### Changing partition strategy for a PDF @@ -82,7 +82,7 @@ The `hi_res` strategy supports different models, and the default is `layout_v1.1 ```
- + ```bash POST curl -X 'POST' $UNSTRUCTURED_API_URL \ -H 'accept: application/json' \ @@ -94,7 +94,7 @@ The `hi_res` strategy supports different models, and the default is `layout_v1.1 ``` - + ```python Python import asyncio import os @@ -103,8 +103,7 @@ The `hi_res` strategy supports different models, and the default is `layout_v1.1 from unstructured_client.models import shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) async def call_api(filename, input_dir, output_dir): @@ -123,7 +122,10 @@ The `hi_res` strategy supports different models, and the default is `layout_v1.1 } try: - res = await client.general.partition_async(request=req) + res = await client.general.partition_async( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] json_elements = json.dumps(element_dicts, indent=2) @@ -159,7 +161,7 @@ The `hi_res` strategy supports different models, and the default is `layout_v1.1 ``` - + ```typescript TypeScript import { UnstructuredClient } from "unstructured-client"; import * as fs from "fs"; @@ -300,7 +302,7 @@ For better OCR results, you can specify what languages your document is in using ``` - + ```bash POST curl -X 'POST' $UNSTRUCTURED_API_URL \ -H 'accept: application/json' \ @@ -312,7 +314,7 @@ For better OCR results, you can specify what languages your document is in using ``` - + ```python Python import asyncio import os @@ -321,8 +323,7 @@ For better OCR results, you can specify what languages your document is in using from unstructured_client.models import shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) async def call_api(filename, input_dir, output_dir): @@ -341,7 +342,10 
@@ For better OCR results, you can specify what languages your document is in using } try: - res = await client.general.partition_async(request=req) + res = await client.general.partition_async( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] json_elements = json.dumps(element_dicts, indent=2) @@ -377,7 +381,7 @@ For better OCR results, you can specify what languages your document is in using ``` - + ```typescript TypeScript import { UnstructuredClient } from "unstructured-client"; import * as fs from "fs"; @@ -515,7 +519,7 @@ Set the `coordinates` parameter to `true` to add this field to the elements in t ``` - + ```bash POST curl -X 'POST' $UNSTRUCTURED_API_URL \ -H 'accept: application/json' \ @@ -527,7 +531,7 @@ Set the `coordinates` parameter to `true` to add this field to the elements in t ``` - + ```python Python import asyncio import os @@ -536,8 +540,7 @@ Set the `coordinates` parameter to `true` to add this field to the elements in t from unstructured_client.models import shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) async def call_api(filename, input_dir, output_dir): @@ -556,7 +559,10 @@ Set the `coordinates` parameter to `true` to add this field to the elements in t } try: - res = await client.general.partition_async(request=req) + res = await client.general.partition_async( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] json_elements = json.dumps(element_dicts, indent=2) @@ -592,7 +598,7 @@ Set the `coordinates` parameter to `true` to add this field to the elements in t ``` - + ```typescript TypeScript import { UnstructuredClient } from "unstructured-client"; import * as fs from "fs"; @@ -734,7 +740,7 @@ This can be helpful if you'd like to use the IDs 
as a primary key in a database, ``` - + ```bash POST curl -X 'POST' $UNSTRUCTURED_API_URL \ -H 'accept: application/json' \ @@ -745,7 +751,7 @@ This can be helpful if you'd like to use the IDs as a primary key in a database, ``` - + ```python Python import asyncio import os @@ -754,8 +760,7 @@ This can be helpful if you'd like to use the IDs as a primary key in a database, from unstructured_client.models import shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) async def call_api(filename, input_dir, output_dir): @@ -774,7 +779,10 @@ This can be helpful if you'd like to use the IDs as a primary key in a database, } try: - res = await client.general.partition_async(request=req) + res = await client.general.partition_async( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] json_elements = json.dumps(element_dicts, indent=2) @@ -810,7 +818,7 @@ This can be helpful if you'd like to use the IDs as a primary key in a database, ``` - + ```typescript TypeScript import { UnstructuredClient } from "unstructured-client"; import * as fs from "fs"; @@ -956,7 +964,7 @@ By default, the `chunking_strategy` is set to `None`, and no chunking is perform ``` - + ```bash POST curl -X 'POST' $UNSTRUCTURED_API_URL \ -H 'accept: application/json' \ @@ -969,7 +977,7 @@ By default, the `chunking_strategy` is set to `None`, and no chunking is perform ``` - + ```python Python import asyncio import os @@ -978,8 +986,7 @@ By default, the `chunking_strategy` is set to `None`, and no chunking is perform from unstructured_client.models import shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) async def call_api(filename, 
input_dir, output_dir): @@ -999,7 +1006,10 @@ By default, the `chunking_strategy` is set to `None`, and no chunking is perform } try: - res = await client.general.partition_async(request=req) + res = await client.general.partition_async( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] json_elements = json.dumps(element_dicts, indent=2) @@ -1035,7 +1045,7 @@ By default, the `chunking_strategy` is set to `None`, and no chunking is perform ``` - + ```typescript TypeScript import { UnstructuredClient } from "unstructured-client"; import * as fs from "fs"; diff --git a/api-reference/api-services/partition-via-api.mdx b/api-reference/api-services/partition-via-api.mdx index 1284d794..5fbab4d2 100644 --- a/api-reference/api-services/partition-via-api.mdx +++ b/api-reference/api-services/partition-via-api.mdx @@ -8,9 +8,9 @@ would like to leverage the advanced capabilities of Unstructured API services, y Whether you're using the Free Unstructured API, the Unstructured Serverless API, the Unstructured API on Azure/AWS, or your local deployment of the Unstructured API, you can use the open source library to send an individual file through `partition_via_api` for processing with Unstructured API services. 
-import UseIngestInstead from '/snippets/general-shared-text/use-ingest-instead.mdx'; +import UseIngestOrPlatformInstead from '/snippets/general-shared-text/use-ingest-or-platform-instead.mdx'; - + To use the open source library, you'll also need: diff --git a/api-reference/api-services/post-requests.mdx b/api-reference/api-services/post-requests.mdx index bab601e4..fd8c447a 100644 --- a/api-reference/api-services/post-requests.mdx +++ b/api-reference/api-services/post-requests.mdx @@ -6,9 +6,9 @@ sidebarTitle: POST request Whether you're using the free Unstructured API, the Unstructured Serverless API, Unstructured API on Azure/AWS, or your local deployment of Unstructured API, you can work with the API by sending single-file POST requests to it. -import UseIngestInstead from '/snippets/general-shared-text/use-ingest-instead.mdx'; +import UseIngestOrPlatformInstead from '/snippets/general-shared-text/use-ingest-or-platform-instead.mdx'; - + To make POST requests, you will need: diff --git a/api-reference/api-services/sdk-jsts.mdx b/api-reference/api-services/sdk-jsts.mdx index 00b8b06a..8530d8d6 100644 --- a/api-reference/api-services/sdk-jsts.mdx +++ b/api-reference/api-services/sdk-jsts.mdx @@ -7,9 +7,9 @@ The [Unstructured JavaScript/TypeScript SDK](https://github.com/Unstructured-IO/ Free Unstructured API, the Unstructured Serverless API, the Unstructured API on Azure/AWS, or your local deployment of the Unstructured API, you can access the API using the JavaScript/TypeScript SDK. 
-import UseIngestInstead from '/snippets/general-shared-text/use-ingest-instead.mdx'; +import UseIngestOrPlatformInstead from '/snippets/general-shared-text/use-ingest-or-platform-instead.mdx'; - + To use the JavaScript/TypeScript SDK, you'll need: diff --git a/api-reference/api-services/sdk-python.mdx b/api-reference/api-services/sdk-python.mdx index 9e1f1c3a..20a3baad 100644 --- a/api-reference/api-services/sdk-python.mdx +++ b/api-reference/api-services/sdk-python.mdx @@ -3,13 +3,14 @@ title: Process an individual file by using the Unstructured Python SDK sidebarTitle: Python SDK --- -The [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) client allows you to send an individual file for processing by Unstructured API services. Whether you're using the -Free Unstructured API, the Unstructured Serverless API, the Unstructured API on Azure/AWS, or your local +The [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) client allows you to send an individual file for processing by +[Unstructured API services](/api-reference/api-services/overview). Whether you're using the +Free Unstructured API, the Unstructured Serverless API, the Unstructured API on Azure/AWS, or your local deployment of the Unstructured API, you can access the API using the Python SDK.
-import UseIngestInstead from '/snippets/general-shared-text/use-ingest-instead.mdx'; +import UseIngestOrPlatformInstead from '/snippets/general-shared-text/use-ingest-or-platform-instead.mdx'; - + To use the Python SDK, you'll need: @@ -45,8 +46,7 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv from unstructured_client.models import shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) filename = "PATH_TO_INPUT_FILE" @@ -66,7 +66,10 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv } try: - res = client.general.partition(request=req) + res = client.general.partition( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] # Print the processed data's first element only. @@ -87,8 +90,7 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv from unstructured_client.models import operations, shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) filename = "PATH_TO_INPUT_FILE" @@ -108,7 +110,10 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv ) try: - res = client.general.partition(request=req) + res = client.general.partition( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] # Print the processed data's first element only. 
@@ -137,8 +142,7 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv from unstructured_client.models import shared client = unstructured_client.UnstructuredClient( - api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), - server_url=os.getenv("UNSTRUCTURED_API_URL"), + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") ) async def call_api(filename, input_dir, output_dir): @@ -153,7 +157,10 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv } try: - res = await client.general.partition_async(request=req) + res = await client.general.partition_async( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) element_dicts = [element for element in res.elements] json_elements = json.dumps(element_dicts, indent=2) @@ -225,7 +232,10 @@ import NoURLForServerlessAPI from '/snippets/general-shared-text/no-url-for-serv split_pdf_concurrency_level=15 # Set the number of concurrent request to the maximum value: 15. ) ) - res = client.general.partition(req) + res = client.general.partition( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") + ) ``` ## Customizing the client @@ -299,11 +309,54 @@ the names used in the SDKs are the same across all methods. ## Migration guide -There are minor breaking changes in 0.26.0. If you encounter any errors when upgrading, please find the solution below. +There are major breaking changes in 0.30.0. If you encounter any errors when upgrading, please find the solution below. 
+ +**If you see the error: `404 Not Found`** + +Before 0.30.0, you could specify the following Unstructured API URL for the `server_url` parameter: + +- For the Unstructured Serverless API: `https://api.unstructuredapp.io/general/v0/general` + +Beginning with 0.30.0, the Unstructured API URLs to specify are as follows: + +- For the Unstructured Serverless API: `https://api.unstructuredapp.io` (remove `/general/v0/general`) +- (New beginning with 0.30.0) For the Unstructured Platform API: `https://platform.unstructuredapp.io` (remove `/api/v1`) + +Also, before 0.30.0, the `server_url` parameter was part of the `UnstructuredClient` constructor. Beginning with 0.30.0, the `server_url` +parameter has been moved into the `partition` and `partition_async` functions. + +```python +# Instead of: +client = unstructured_client.UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"), + server_url=os.getenv("UNSTRUCTURED_API_URL") +) + +# Switch to: +client = unstructured_client.UnstructuredClient( + api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") +) + +# And... + +# For partition: +res = client.general.partition( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") +) + +# For partition_async: +res = await client.general.partition_async( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") +) +``` + +There are minor breaking changes beginning with 0.26.0. If you encounter any errors when upgrading, please find the solution below. **If you see the error: `AttributeError: 'PartitionParameters' object has no attribute 'partition_parameters'`** -Previously, the SDK accepted a `PartitionParameters` object as input to the `sdk.general.partition` function. Now, this object must be wrapped in a `PartitionRequest` object. The old behavior was deprecated in 0.23.0 and removed in 0.26.0. +Before 0.26.0, the SDK accepted a `PartitionParameters` object as input to the `sdk.general.partition` function.
Beginning with 0.26.0, this object must be wrapped in a `PartitionRequest` object. The old behavior was deprecated in 0.23.0 and removed in 0.26.0. ```python # Instead of: @@ -313,8 +366,10 @@ req = shared.PartitionParameters( files=files, ) -resp = s.general.partition(request=req) - +resp = s.general.partition( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") # Beginning with 0.30.0 +) # Switch to: from unstructured_client.models import shared, operations @@ -325,12 +380,15 @@ req = operations.PartitionRequest( ) ) -resp = s.general.partition(request=req) +resp = s.general.partition( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") # Beginning with 0.30.0 +) ``` **If you see the error: `TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given`** -In 0.26.0, the `PartitionRequest` constructor no longer allows for positional arguments. You must specify `partition_parameters` by name. +Beginning with 0.26.0, the `PartitionRequest` constructor no longer allows for positional arguments. You must specify `partition_parameters` by name. ```python # Instead of: @@ -350,12 +408,15 @@ req = operations.PartitionRequest( **If you see the error: `TypeError: General.partition() takes 1 positional argument but 2 were given`** -In 0.26.0, the `partition` function no longer allows for positional arguments. You must specify `request` by name. +Beginning with 0.26.0, the `partition` function no longer allows for positional arguments. You must specify `request` by name. 
```python # Instead of: resp = s.general.partition(req) # Switch to: -resp = s.general.partition(request=req) +resp = s.general.partition( + request=req, + server_url=os.getenv("UNSTRUCTURED_API_URL") # Beginning with 0.30.0 +) ``` diff --git a/api-reference/ingest/ingest-cli.mdx b/api-reference/ingest/ingest-cli.mdx index f1f0fa52..f40ae508 100644 --- a/api-reference/ingest/ingest-cli.mdx +++ b/api-reference/ingest/ingest-cli.mdx @@ -5,6 +5,12 @@ sidebarTitle: Ingest CLI The Unstructured Ingest CLI enables you to use command-line scripts to send files in batches to Unstructured API services for processing, and to tell Unstructured API services where to deliver the processed data. [Learn more](/ingestion/overview#unstructured-ingest-cli). + + The Unstructured Ingest CLI does not work with the Unstructured Platform API. + + For information about the Unstructured Platform API, see the [Unstructured Platform API Overview](/platform/api/overview). + + ## Installation One approach to get started quickly with the Unstructured Ingest CLI is to install Python and then run the following command: diff --git a/api-reference/ingest/overview.mdx b/api-reference/ingest/overview.mdx index ddded057..48b155e7 100644 --- a/api-reference/ingest/overview.mdx +++ b/api-reference/ingest/overview.mdx @@ -10,6 +10,12 @@ You can send batches to Unstructured API services by using the following tools: - The [Unstructured Ingest CLI](/api-reference/ingest/ingest-cli) - The [Unstructured Ingest Python](/api-reference/ingest/python-ingest) library + + The Unstructured Ingest CLI and Unstructured Ingest Python library do not work with the Unstructured Platform API. + + For information about the Unstructured Platform API, see the [Unstructured Platform API Overview](/platform/api/overview). + + The following 3-minute video shows how to use the Unstructured Ingest Python library to send multiple PDFs from a local directory in batches to be ingested by Unstructured API services for processing: