Merged
26 changes: 14 additions & 12 deletions api-reference/ingest/source-connectors/slack.mdx
Original file line number Diff line number Diff line change
@@ -2,22 +2,24 @@
title: Slack
---

import SharedContentSlack from '/snippets/sc-shared-text/slack.mdx';
import NewDocument from '/snippets/general-shared-text/new-document.mdx';

<NewDocument />

import SharedContentSlack from '/snippets/sc-shared-text/slack-cli-api.mdx';
import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';

<SharedContentSlack/>
<SharedAPIKeyURL/>

Make sure to set the `--partition-by-api` flag and pass in your API key with `--api-key`:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. You can use any supported destination connector. This example uses the local destination connector:

import SlackAPISh from '/snippets/source_connectors/slack_api.sh.mdx';
import SlackAPIPy from '/snippets/source_connectors/slack_api.py.mdx';
import SlackAPISh from '/snippets/source_connectors/slack.sh.mdx';
import SlackAPIPyV2 from '/snippets/source_connectors/slack.v2.py.mdx';
import SlackAPIPyV1 from '/snippets/source_connectors/slack.v1.py.mdx';

<CodeGroup>

<SlackAPISh />

<SlackAPIPy />

</CodeGroup>

Additionally, if you're using the Unstructured Serverless API, a locally deployed Unstructured API, or an Unstructured API
deployed on Azure or AWS, you also need to specify the API URL with the `--partition-endpoint` argument.
<SlackAPIPyV2 />
<SlackAPIPyV1 />
</CodeGroup>
23 changes: 16 additions & 7 deletions open-source/ingest/source-connectors/slack.mdx
@@ -2,19 +2,28 @@
title: Slack
---

import SharedContentSlack from '/snippets/sc-shared-text/slack.mdx';
import NewDocument from '/snippets/general-shared-text/new-document.mdx';

<NewDocument />

import SharedContentSlack from '/snippets/sc-shared-text/slack-cli-api.mdx';

<SharedContentSlack/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. You can use any supported destination connector. This example uses the local destination connector.

This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.

import SlackSh from '/snippets/source_connectors/slack.sh.mdx';
import SlackPy from '/snippets/source_connectors/slack.py.mdx';
import SlackPyV2 from '/snippets/source_connectors/slack.v2.py.mdx';
import SlackPyV1 from '/snippets/source_connectors/slack.v1.py.mdx';

<CodeGroup>

<SlackSh />

<SlackPy />

<SlackPyV2 />
<SlackPyV1 />
</CodeGroup>

For a full list of the options that the Unstructured Ingest CLI accepts, run `unstructured-ingest slack --help`.
import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';

<SharedPartitionByAPIOSS/>
25 changes: 25 additions & 0 deletions snippets/general-shared-text/slack-cli-api.mdx
@@ -0,0 +1,25 @@
The Slack connector dependencies:

```bash
pip install "unstructured-ingest[slack]"
```

import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';

<AdditionalIngestDependencies />

These environment variables:

- `SLACK_BOT_USER_OAUTH_TOKEN` - The OAuth token for the Slack app, represented by `--token` (CLI) or `token` (Python).

To specify the starting and ending date and time range for the channels to be processed:

- For the CLI, use one of the following supported formats:

- `YYYY-MM-DD`
- `YYYY-MM-DDTHH:MM:SS`
- `YYYY-MM-DDTHH:MM:SSZ`
- `YYYY-MM-DD+HH:MM:SS`
- `YYYY-MM-DD-HH:MM:SS`

- For Python, pass a `datetime.datetime` object.
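
As a rough illustration of how the first three CLI formats listed above correspond to `datetime.datetime` values, here is a minimal sketch. The helper name `parse_slack_date` is hypothetical and is not part of the Unstructured Ingest library, and this is not the parser the CLI itself uses. The two offset-style formats are left out on purpose.

```python
from datetime import datetime, timezone

# Hypothetical helper for illustration only; not part of unstructured-ingest.
_FORMATS = (
    "%Y-%m-%d",            # YYYY-MM-DD
    "%Y-%m-%dT%H:%M:%S",   # YYYY-MM-DDTHH:MM:SS
    "%Y-%m-%dT%H:%M:%SZ",  # YYYY-MM-DDTHH:MM:SSZ (treated as UTC)
)


def parse_slack_date(value: str) -> datetime:
    """Try each known format in turn and return the first successful parse."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if fmt.endswith("Z"):
            # A trailing Z denotes UTC, so attach the UTC time zone explicitly.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    raise ValueError(f"Unsupported date format: {value!r}")
```

A value produced this way could then be passed wherever the Python examples expect `start_date` or `end_date`.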
15 changes: 15 additions & 0 deletions snippets/general-shared-text/slack.mdx
@@ -0,0 +1,15 @@
The Slack prerequisites:

- A Slack app. Create a Slack app by following [Step 1: Creating an app](https://api.slack.com/quickstart#creating).
- The app must have the `channels:history` OAuth scope. Give the app this scope by following [Step 2: Requesting scopes](https://api.slack.com/quickstart#scopes).
- The app must be installed and authorized for the target Slack workspace. Install and authorize the app by following [Step 3: Installing and authorizing the app](https://api.slack.com/quickstart#installing).
- The app's access token. Get this token by following [Step 3: Installing and authorizing the app](https://api.slack.com/quickstart#installing).
- Add the app to the target channels in the Slack workspace. To do this from the channel, open the channel's details page, click the **Integrations** tab, click **Add apps**, and follow the on-screen directions to install the app.
- The channel ID for each target channel. To get this ID, open the channel's details page, and look for the **Channel ID** field on the **About** tab.
- The starting and ending date and time range for the channels to be processed. Supported formats include:

- `YYYY-MM-DD`
- `YYYY-MM-DDTHH:MM:SS`
- `YYYY-MM-DDTHH:MM:SSZ`
- `YYYY-MM-DD+HH:MM:SS`
- `YYYY-MM-DD-HH:MM:SS`
9 changes: 9 additions & 0 deletions snippets/sc-shared-text/slack-cli-api.mdx
@@ -0,0 +1,9 @@
Connect Slack to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

You will need:

import SharedSlack from '/snippets/general-shared-text/slack.mdx';
import SharedSlackCLIAPI from '/snippets/general-shared-text/slack-cli-api.mdx';

<SharedSlack />
<SharedSlackCLIAPI />
17 changes: 0 additions & 17 deletions snippets/sc-shared-text/slack.mdx

This file was deleted.

18 changes: 12 additions & 6 deletions snippets/source_connectors/slack.sh.mdx
@@ -1,13 +1,19 @@
```bash Shell
```bash CLI
#!/usr/bin/env bash

# Chunking and embedding are optional.

unstructured-ingest \
slack \
--channels 12345678 \
--token 12345678 \
--token $SLACK_BOT_USER_OAUTH_TOKEN \
--channels C03FVNHR70A,C03FVNRG43D \
--start-date 2024-10-22 \
--end-date 2024-10-23 \
--download-dir $LOCAL_FILE_DOWNLOAD_DIR \
--chunking-strategy by_title \
--embedding-provider huggingface \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--start-date 2023-04-01T01:00:00-08:00 \
--end-date 2023-04-02 \
--strategy hi_res
--partition-by-api \
--api-key $UNSTRUCTURED_API_KEY \
--partition-endpoint $UNSTRUCTURED_API_URL
```
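
If you prefer to launch the CLI from a Python script, the following sketch assembles the same arguments as a list. The function name `build_slack_ingest_command` is hypothetical, and only flags that appear in the command above are used. The optional chunking and embedding flags are omitted. The sketch only builds the argument list and does not run anything.

```python
import os


# Hypothetical helper for illustration only; not part of unstructured-ingest.
def build_slack_ingest_command(channels, start_date, end_date, env=None):
    """Return the CLI invocation as an argument list, reading secrets and paths from env."""
    env = os.environ if env is None else env
    return [
        "unstructured-ingest",
        "slack",
        "--token", env["SLACK_BOT_USER_OAUTH_TOKEN"],
        "--channels", ",".join(channels),  # The CLI takes a comma-separated list.
        "--start-date", start_date,
        "--end-date", end_date,
        "--download-dir", env["LOCAL_FILE_DOWNLOAD_DIR"],
        "--output-dir", env["LOCAL_FILE_OUTPUT_DIR"],
        "--partition-by-api",
        "--api-key", env["UNSTRUCTURED_API_KEY"],
        "--partition-endpoint", env["UNSTRUCTURED_API_URL"],
    ]
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting issues with the token and paths.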
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
```python Python
```python Python Ingest v1
import os
from datetime import datetime

from unstructured_ingest.connector.slack import SimpleSlackConfig, SlackAccessConfig
from unstructured_ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
@@ -18,11 +19,11 @@ if __name__ == "__main__":
        ),
        connector_config=SimpleSlackConfig(
            access_config=SlackAccessConfig(
                token=os.getenv("SLACK_TOKEN"),
                token=os.getenv("SLACK_BOT_USER_OAUTH_TOKEN"),
            ),
            channels=["12345678"],
            start_date="2023-04-01T01:00:00-08:00",
            end_date="2023-04-02,",
            channels=["C03FVNHR70A", "C03FVNRG43D"],
            start_date=datetime(year=2024, month=10, day=22),
            end_date=datetime(year=2024, month=10, day=23)
        ),
    )
    runner.run()
48 changes: 48 additions & 0 deletions snippets/source_connectors/slack.v2.py.mdx
@@ -0,0 +1,48 @@
```python Python Ingest v2
import os
from datetime import datetime

from unstructured_ingest.v2.pipeline.pipeline import Pipeline
from unstructured_ingest.v2.interfaces import ProcessorConfig

from unstructured_ingest.v2.processes.connectors.slack import (
    SlackIndexerConfig,
    SlackDownloaderConfig,
    SlackConnectionConfig,
    SlackAccessConfig
)

from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
from unstructured_ingest.v2.processes.embedder import EmbedderConfig
from unstructured_ingest.v2.processes.connectors.local import LocalUploaderConfig

# Chunking and embedding are optional.

if __name__ == "__main__":
    Pipeline.from_configs(
        context=ProcessorConfig(),
        indexer_config=SlackIndexerConfig(
            channels=["C03FVNHR70A", "C03FVNRG43D"],
            start_date=datetime(year=2024, month=10, day=22),
            end_date=datetime(year=2024, month=10, day=23)
        ),
        downloader_config=SlackDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
        source_connection_config=SlackConnectionConfig(
            access_config=SlackAccessConfig(token=os.getenv("SLACK_BOT_USER_OAUTH_TOKEN"))
        ),
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
            additional_partition_args={
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        ),
        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
        embedder_config=EmbedderConfig(embedding_provider="huggingface"),
        uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
    ).run()
```
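
The example above reads every setting with `os.getenv`, which returns `None` when a variable is unset, so a missing value may only surface later as a less obvious error. A small pre-flight check along these lines could catch that early. The names `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical and not part of the Unstructured Ingest library.

```python
import os

# Environment variables read by the Python Ingest v2 example.
REQUIRED_ENV_VARS = (
    "SLACK_BOT_USER_OAUTH_TOKEN",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
    "LOCAL_FILE_DOWNLOAD_DIR",
    "LOCAL_FILE_OUTPUT_DIR",
)


# Hypothetical helper for illustration only; not part of unstructured-ingest.
def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before building the pipeline and raising an error when the returned list is non-empty keeps the failure close to its cause.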
31 changes: 0 additions & 31 deletions snippets/source_connectors/slack_api.py.mdx

This file was deleted.

15 changes: 0 additions & 15 deletions snippets/source_connectors/slack_api.sh.mdx

This file was deleted.