Merged
2 changes: 1 addition & 1 deletion api-reference/ingest/destination-connector/astradb.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedAstraDB />
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:

import AstraDBAPISh from '/snippets/destination_connectors/astradb.sh.mdx';
import AstraDBAPIPyV2 from '/snippets/destination_connectors/astradb.v2.py.mdx';
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedAzureCS />
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:

import AzureCSAPISh from '/snippets/destination_connectors/azure_cognitive_search.sh.mdx';
import AzureCSAPIPyV2 from '/snippets/destination_connectors/azure_cognitive_search.v2.py.mdx';
27 changes: 14 additions & 13 deletions api-reference/ingest/source-connectors/airtable.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -2,23 +2,24 @@
title: Airtable
---

import SharedContentAirtable from '/snippets/sc-shared-text/airtable.mdx';
import NewDocument from '/snippets/general-shared-text/new-document.mdx';

<NewDocument />

import SharedContentAirtable from '/snippets/sc-shared-text/airtable-cli-api.mdx';
import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';

<SharedContentAirtable/>
<SharedAPIKeyURL/>

Finally, make sure to set the `--partition-by-api` flag and pass in your API key with `--api-key`:
Now call the Unstructured CLI or Python SDK. The destination connector can be any of the ones supported. This example uses the local destination connector:

import AirtableAPISh from '/snippets/source_connectors/airtable_api.sh.mdx';
import AirtableAPIPy from '/snippets/source_connectors/airtable_api.py.mdx';
import AirtableAPISh from '/snippets/source_connectors/airtable.sh.mdx';
import AirtableAPIPyV2 from '/snippets/source_connectors/airtable.v2.py.mdx';
import AirtableAPIPyV1 from '/snippets/source_connectors/airtable.v1.py.mdx';

<CodeGroup>

<AirtableAPISh />

<AirtableAPIPy />

</CodeGroup>


Additionally, if you're using Unstructured Serverless API, your locally deployed Unstructured API, or an Unstructured API
deployed on Azure or AWS, you also need to specify the API URL via the `--partition-endpoint` argument.
<AirtableAPIPyV2 />
<AirtableAPIPyV1 />
</CodeGroup>
2 changes: 1 addition & 1 deletion api-reference/ingest/source-connectors/astradb.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentAstraDB/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured CLI or Python SDK. The destination connector can be any of the ones supported. This example uses the local destination connector:

import AstraDBAPISh from '/snippets/source_connectors/astradb.sh.mdx';
import AstraDBAPIPyV1 from '/snippets/source_connectors/astradb.v1.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/source-connectors/dropbox.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentDropbox/>
<SharedAPIKeyURL/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

import DropboxAPISh from '/snippets/source_connectors/dropbox.sh.mdx';
import DropboxAPIPyV2 from '/snippets/source_connectors/dropbox.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/source-connectors/hubspot.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentHubSpot/>
<SharedAPIKeyURL/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

import HubSpotAPISh from '/snippets/source_connectors/hubspot.sh.mdx';
import HubSpotAPIPyV1 from '/snippets/source_connectors/hubspot.v1.py.mdx';
2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/astradb.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedAstraDB from '/snippets/dc-shared-text/astradb-cli-api.mdx';

<SharedAstraDB />

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.
Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedAzureCS from '/snippets/dc-shared-text/azure-cognitive-search-cli-a

<SharedAzureCS />

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.
Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/dropbox.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedDropbox from '/snippets/dc-shared-text/dropbox-cli-api.mdx';

<SharedDropbox />

Now call the Unstructured Ingest CLI or Unstructured Ingest Python. The source connector can be any of the ones supported. This example uses the local source connector.
Now call the Unstructured Ingest CLI or Unstructured Ingest Python. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/mongodb.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedMongoDB from '/snippets/dc-shared-text/mongodb-cli-api.mdx';

<SharedMongoDB />

Now call the Unstructured Ingest CLI or Unstructured Ingest Python. The source connector can be any of the ones supported. This example uses the local source connector.
Now call the Unstructured Ingest CLI or Unstructured Ingest Python. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/s3.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedS3 from '/snippets/dc-shared-text/s3-cli-api.mdx';

<SharedS3 />

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/sftp.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedSFTP from '/snippets/dc-shared-text/sftp-cli-api.mdx';

<SharedSFTP />

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/singlestore.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedSingleStore from '/snippets/dc-shared-text/singlestore-cli-api.mdx'

<SharedSingleStore />

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/weaviate.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedWeaviate from '/snippets/dc-shared-text/weaviate-cli-api.mdx';

<SharedWeaviate />

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

21 changes: 14 additions & 7 deletions open-source/ingest/source-connectors/airtable.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,28 @@
title: Airtable
---

import SharedContentAirtable from '/snippets/sc-shared-text/airtable.mdx';
import NewDocument from '/snippets/general-shared-text/new-document.mdx';

<NewDocument />

import SharedContentAirtable from '/snippets/sc-shared-text/airtable-cli-api.mdx';

<SharedContentAirtable/>

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.

import AirtableSh from '/snippets/source_connectors/airtable.sh.mdx';
import AirtablePy from '/snippets/source_connectors/airtable.py.mdx';
import AirtablePyV2 from '/snippets/source_connectors/airtable.v2.py.mdx';
import AirtablePyV1 from '/snippets/source_connectors/airtable.v1.py.mdx';

<CodeGroup>

<AirtableSh />

<AirtablePy />

<AirtablePyV2 />
<AirtablePyV1 />
</CodeGroup>

import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';

For a full list of the options that the Unstructured Ingest CLI accepts check `unstructured-ingest airtable --help`.
<SharedPartitionByAPIOSS/>
2 changes: 1 addition & 1 deletion open-source/ingest/source-connectors/astradb.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedContentAstraDB from '/snippets/sc-shared-text/astradb-cli-api.mdx';

<SharedContentAstraDB/>

Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector.
Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/source-connectors/azure.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedContentAzure from '/snippets/sc-shared-text/azure-cli-api.mdx';

<SharedContentAzure/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/source-connectors/dropbox.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedContentDropbox from '/snippets/sc-shared-text/dropbox-cli-api.mdx';

<SharedContentDropbox/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/source-connectors/mongodb.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedContentMongoDB from '/snippets/sc-shared-text/mongodb-cli-api.mdx';

<SharedContentMongoDB/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.

2 changes: 1 addition & 1 deletion open-source/ingest/source-connectors/one-drive.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import SharedContentOneDrive from '/snippets/sc-shared-text/onedrive-cli-api.mdx

<SharedContentOneDrive/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.

14 changes: 14 additions & 0 deletions snippets/general-shared-text/airtable-cli-api.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
The Airtable connector dependencies:

```bash CLI, Python
pip install "unstructured-ingest[airtable]"
```

import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';

<AdditionalIngestDependencies />

These environment variables:

- `AIRTABLE_TOKEN` - The Airtable personal access token, represented by `--personal-access-token` (CLI) or `personal_access_token` (Python).
- `AIRTABLE_PATHS` - The list of Airtable paths to process, represented by `--list-of-paths` (CLI) or `list_of_paths` (Python).
27 changes: 27 additions & 0 deletions snippets/general-shared-text/airtable.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
The Airtable connector prerequisites:

- An [Airtable](https://www.airtable.com/) account. [Create a free Airtable account](https://airtable.com/signup).
- An Airtable personal access token. [Create a personal access token](https://support.airtable.com/docs/creating-and-using-api-keys-and-access-tokens).
- The ID of the Airtable base to access. [Create a base](https://www.airtable.com/guides/build/create-a-base). [Get a base's ID](https://support.airtable.com/docs/finding-airtable-ids#finding-base-url-ids).
- The ID of the table to access in the base. [Create a table](https://www.airtable.com/guides/build/create-a-table). [Get a table's ID](https://support.airtable.com/docs/finding-airtable-ids#finding-base-url-ids).
- The ID of the view to access in the table. [Create a view](https://www.airtable.com/guides/build/create-custom-views-of-data). [Get a view's ID](https://support.airtable.com/docs/finding-airtable-ids#finding-base-url-ids).

By default, Unstructured processes all tables from all bases within an Airtable organization. You can limit the
tables that Unstructured ingests data from within Airtable by specifying a list of Airtable paths.
An Airtable path uses the following structure, in which only the base ID is required: `base_id/table_id(optional)/view_id(optional)`.

For example, given the following example Airtable URL:

```text
https://airtable.com/appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit/viwDcpzf9dP0Gqz5J
```

- The base's ID is `appr9nKeXLAtg6bgn`. The base's path is `appr9nKeXLAtg6bgn`.
- The table's ID is `tblZ8uT1GY7NLbWit`. The table's path is `appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit`.
- The view's ID is `viwDcpzf9dP0Gqz5J`. The view's path is `appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit/viwDcpzf9dP0Gqz5J`.
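The mapping from URL to paths above can be sketched in a few lines of Python. This is an illustration only, not part of the connector: the helper name `airtable_paths_from_url` is hypothetical, and it assumes the URL path holds the base, table, and view IDs in that order, as in the example URL.

```python
from typing import Dict
from urllib.parse import urlparse


def airtable_paths_from_url(url: str) -> Dict[str, str]:
    """Return the base, table, and view paths found in an Airtable URL.

    Hypothetical helper: it assumes the URL path segments are
    base ID, then table ID, then view ID, and does not validate them.
    """
    # Drop empty segments caused by leading or trailing slashes.
    segments = [part for part in urlparse(url).path.split("/") if part]
    paths = {}
    for level, depth in (("base", 1), ("table", 2), ("view", 3)):
        if len(segments) >= depth:
            # A path at a given level is every ID down to that level, joined by "/".
            paths[level] = "/".join(segments[:depth])
    return paths


url = "https://airtable.com/appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit/viwDcpzf9dP0Gqz5J"
paths = airtable_paths_from_url(url)
print(paths["view"])  # prints appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit/viwDcpzf9dP0Gqz5J
```

A URL that stops at the table level yields only the `base` and `table` entries, which matches the optional parts of the path structure.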

You can call the Airtable API to get lists of available IDs for Airtable bases, tables, and views in bulk, as follows:

- [Base IDs](https://airtable.com/developers/web/api/list-bases)
- [Table and view IDs](https://airtable.com/developers/web/api/get-base-schema)
- [Base, table, and view IDs](https://pyairtable.readthedocs.io/en/latest/metadata.html)
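As a rough sketch of the bulk lookup, the following Python uses only the standard library to call the two Airtable metadata endpoints linked above. The helper names are hypothetical, and the sketch makes several assumptions to check against the Airtable API reference: the `https://api.airtable.com/v0/meta/bases` URLs, the `bases` key in the response, and a token in `AIRTABLE_TOKEN` that is allowed to read base schemas.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed root of the Airtable metadata API; verify against the linked reference.
META_ROOT = "https://api.airtable.com/v0/meta/bases"


def build_request(token: str, base_id: Optional[str] = None) -> urllib.request.Request:
    """Build a request that lists bases, or the tables and views of one base."""
    url = META_ROOT if base_id is None else f"{META_ROOT}/{base_id}/tables"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    token = os.environ.get("AIRTABLE_TOKEN")
    if token:
        # Assumes the response lists bases under a "bases" key.
        for base in fetch_json(build_request(token)).get("bases", []):
            print(base.get("id"), base.get("name"))
    else:
        print("Set AIRTABLE_TOKEN to list the bases this token can read.")
```

Keeping request construction separate from the network call makes it possible to inspect the URL and headers without contacting Airtable.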
9 changes: 9 additions & 0 deletions snippets/sc-shared-text/airtable-cli-api.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
Connect Airtable to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your data and store structured outputs locally on your filesystem.

You will need:

import AirtableShared from '/snippets/general-shared-text/airtable.mdx';
import AirtableSharedCLIAPI from '/snippets/general-shared-text/airtable-cli-api.mdx';

<AirtableShared />
<AirtableSharedCLIAPI />
22 changes: 0 additions & 22 deletions snippets/sc-shared-text/airtable.mdx

This file was deleted.

30 changes: 0 additions & 30 deletions snippets/source_connectors/airtable.py.mdx

This file was deleted.

7 changes: 4 additions & 3 deletions snippets/source_connectors/airtable.sh.mdx
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
```bash Shell
```bash CLI
#!/usr/bin/env bash

unstructured-ingest \
airtable \
--metadata-exclude filename,file_directory,metadata.data_source.date_processed \
--personal-access-token $AIRTABLE_PERSONAL_ACCESS_TOKEN \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--num-processes 2 \
--reprocess \
--strategy hi_res
--partition-by-api \
--api-key $UNSTRUCTURED_API_KEY \
--partition-endpoint $UNSTRUCTURED_API_URL
```
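The script above reads four environment variables, so a small pre-flight check can catch an unset one before the CLI runs. The following Python is a sketch only: the variable names are copied from the script, and the helper `missing_vars` is hypothetical rather than part of `unstructured-ingest`.

```python
import os
from typing import List, Mapping

# Environment variables referenced by the CLI script above.
REQUIRED_VARS = (
    "AIRTABLE_PERSONAL_ACCESS_TOKEN",
    "LOCAL_FILE_OUTPUT_DIR",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
)


def missing_vars(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.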