7 changes: 5 additions & 2 deletions snippets/general-shared-text/sharepoint-api-placeholders.mdx
@@ -1,6 +1,9 @@
- `<name>` (_required_) - A unique name for this connector.
- `<client-id>` (_required_) - The client ID provided by SharePoint for the app registration.
- `<site>` (_required_) - The base URL of the SharePoint site to connect to.
- `<client-cred>` (_required_) - The client secret associated with the client ID.
- `<tenant>` (_required_) - The **Directory (tenant) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
- `<authority-url>` - The authentication token provider URL for the Entra ID app registration. The default is `https://login.microsoftonline.com`.
- `<user-pname>` (_required_) - The UPN for the OneDrive account in the Entra ID tenant.
- `<client-cred>` (_required_) - The **Client secret** for the Entra ID app registration.
- `<path>` - The path from which to start parsing files. The default is `Shared Documents` if not otherwise specified.
- For `recursive` (source connector only), set to `true` to recursively process data from subfolders within the specified path. The default is `false` if not otherwise specified.
- For `recursive`, set to `true` to recursively process data from subfolders within the specified path. The default is `false` if not otherwise specified.
17 changes: 7 additions & 10 deletions snippets/general-shared-text/sharepoint-cli-api.mdx
@@ -10,13 +10,10 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d

The following environment variables:

- `SHAREPOINT_APP_CLIENT_ID` - The application (client) ID for the SharePoint app principal, represented by `--client-id` (CLI) or `client_id` (Python).
- `SHAREPOINT_APP_CLIENT_SECRET` - The client secret for the SharePoint app principal, represented by `--client-cred` (CLI) or `client_cred` (Python).
- `SHAREPOINT_SITE` - The SharePoint site URL, represented by `--site` (CLI) or `site` (Python).
- `SHAREPOINT_PATH` - The path in the SharePoint site from which to start parsing files, represented by `--path` (CLI) or `path` (Python).

{/*
- `SHAREPOINT_APP_PERMISSIONS_CLIENT_ID` - The associated Azure application (client) ID, represented by `--permissions-application-id` (CLI) or `permissions_application_id` (Python).
- `SHAREPOINT_APP_PERMISSIONS_CLIENT_SECRET` - The client secret for the Azure application, represented by `--permissions-client-cred` (CLI) or `permissions_client_cred` (Python).
- `SHAREPOINT_APP_PERMISSIONS_TENANT` - The domain name of the tenant for the Azure application, which is typically `<organization-name>.onmicrosoft.com`, and which is represented by `--permissions-tenant` (CLI) or `permissions_tenant` (Python).
*/}
- `ENTRA_ID_USER_PRINCIPAL_NAME` - The User Principal Name (UPN) for the target OneDrive account in the Microsoft Entra ID tenant, represented by `--user-pname` (CLI) or `user_pname` (Python).
- `SHAREPOINT_SITE_URL` - The SharePoint site URL, represented by `--site` (CLI) or `site` (Python).
- `SHAREPOINT_SITE_PATH` - The path in the SharePoint site from which to start parsing files, represented by `--path` (CLI) or `path` (Python).
- `ENTRA_ID_APP_CLIENT_ID` - The **Application (client) ID** value for the Microsoft Entra ID app registration, represented by `--client-id` (CLI) or `client_id` (Python).
- `ENTRA_ID_APP_TENANT_ID` - The **Directory (tenant) ID** value for the Entra ID app registration, represented by `--tenant` (CLI) or `tenant` (Python).
- `ENTRA_ID_APP_CLIENT_SECRET` - The **Client secret** value for the Entra ID app registration, represented by `--client-cred` (CLI) or `client_cred` (Python).
- `ENTRA_ID_TOKEN_AUTHORITY_URL` - The token authority URL for the Entra ID app registration (which is typically `https://login.microsoftonline.com`), represented by `--authority-url` (CLI) or `authority_url` (Python).
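As a rough illustration of how these variables fit together, the following Python sketch collects them from the environment and fails early if a required one is missing. The helper name `load_sharepoint_settings` is hypothetical and is not part of the connector. Treating the path and the authority URL as optional is an assumption based on the defaults stated in the placeholders snippet (`Shared Documents` and `https://login.microsoftonline.com`).

```python
import os

# Variables that have no documented default (assumption: all are required).
REQUIRED = (
    "ENTRA_ID_USER_PRINCIPAL_NAME",
    "SHAREPOINT_SITE_URL",
    "ENTRA_ID_APP_CLIENT_ID",
    "ENTRA_ID_APP_TENANT_ID",
    "ENTRA_ID_APP_CLIENT_SECRET",
)

# Variables with a default taken from the placeholders snippet.
DEFAULTS = {
    "SHAREPOINT_SITE_PATH": "Shared Documents",
    "ENTRA_ID_TOKEN_AUTHORITY_URL": "https://login.microsoftonline.com",
}


def load_sharepoint_settings(env=None):
    """Return a dict of connector settings, or raise KeyError listing what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing required environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # Fall back to the documented default when the variable is unset or empty.
        settings[name] = env.get(name) or default
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in isolation before running the connector.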
9 changes: 6 additions & 3 deletions snippets/general-shared-text/sharepoint-platform.mdx
@@ -3,6 +3,9 @@ Fill in the following fields:
- **Name** (_required_): A unique name for this connector.
- **Site URL** (_required_): The base URL of the SharePoint site to connect to.
- **Path** (_required_): The path from which to start parsing files, for example `Shared Documents`.
- **Recursive** (source connector only): Check this box to recursively process data from subfolders within the specified path.
- **Client ID** (_required_): The client ID provided by SharePoint for the app principal.
- **Client Credentials** (_required_): The client secret associated with the client ID.
- **Recursive**: Check this box to recursively process data from subfolders within the specified path.
- **Client ID** (_required_): The **Application (client) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
- **Tenant ID** (_required_): The **Directory (tenant) ID** for the Entra ID app registration.
- **User Principal Name (UPN)** (_required_): The UPN for the OneDrive account in the Entra ID tenant.
- **Client Credentials** (_required_): The **Client secret** for the Entra ID app registration.
- **Authority URL** (_required_): The authentication token provider URL for the Entra ID app registration. The default is `https://login.microsoftonline.com`.
214 changes: 103 additions & 111 deletions snippets/general-shared-text/sharepoint.mdx
@@ -1,117 +1,109 @@
<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/HHCV7rV8fS0"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

- The SharePoint site URL.
<Note>
If you are setting up the SharePoint connector for the first time, you can skip past this note.

Previous versions of the SharePoint connector relied on SharePoint app principals for authentication. Current versions of the
SharePoint connector no longer support these SharePoint app principals. Microsoft deprecated support for SharePoint app principals on November 27, 2023.
SharePoint app principals will no longer work for SharePoint tenants that were created on or after November 1, 2024, and they will stop working
for all SharePoint tenants as of April 2, 2026. [Learn more](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/retirement-announcement-for-azure-acs).

Current versions of the SharePoint connector now rely on Microsoft Entra ID app registrations for authentication.

To migrate from SharePoint app principals to Entra ID app registrations, replace the following settings in your existing SharePoint connector,
as listed in the requirements following this note:

- Replace the deprecated SharePoint app principal's application client ID value with your replacement Entra ID app registration's **Application (client) ID** value.
- Replace the deprecated SharePoint app principal's client secret value with your replacement Entra ID app registration's **Client secret** value.
- Add your replacement Entra ID app registration's **Directory (tenant) ID** value, token authority URL value, and the correct set of Microsoft Graph access permissions for SharePoint Online.

If you need migration help, get assistance from our [Slack community](https://short.unstructured.io/pzw05l7) or [contact us](https://unstructured.io/contact) directly.
</Note>

- A SharePoint Online plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online.
[Learn more](https://www.microsoft.com/en-us/microsoft-365/SharePoint/compare-SharePoint-plans).
[Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
[Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
- A OneDrive for Business plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes OneDrive.
(Even if you only plan to use SharePoint Online, you still need a plan that includes OneDrive, because the SharePoint connector is built on OneDrive technology.)
[Learn more](https://www.microsoft.com/microsoft-365/onedrive/compare-onedrive-plans).
[Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
[Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
OneDrive personal accounts, and Microsoft 365 Free, Basic, Personal, and Family plans are not supported.
- The SharePoint Online and OneDrive plans must share the same Microsoft Entra ID tenant.
[Learn more](https://learn.microsoft.com/microsoft-365/enterprise/subscriptions-licenses-accounts-and-tenants-for-microsoft-cloud-offerings?view=o365-worldwide).
- The User Principal Name (UPN) for the OneDrive account in the Microsoft Entra ID tenant. This is typically the OneDrive account user's email address. To find a UPN:

1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
2. In the **Users** section, click **Active users**.
3. Locate the user account in the list of active users.
4. The UPN is displayed in the **Username** column.

The following video shows how to get a UPN:

<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/H0yYfhfyCE0"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

- The SharePoint Online site URL.

- Site collection-level URLs typically have the format `https://<tenant>.sharepoint.com/sites/<site-collection-name>`.
- Root site collection-level URLs typically have the format `https://<tenant>.sharepoint.com`.
- To process all sites within a tenant, use a site URL of `https://<tenant>-admin.sharepoint.com`.
- To process all sites within a SharePoint tenant, use a site URL of `https://<tenant>-admin.sharepoint.com`.

[Learn more](https://learn.microsoft.com/microsoft-365/community/query-string-url-tricks-sharepoint-m365).

- The path in the SharePoint site from which to start parsing files, for example `"Shared Documents"`. If the connector is to process all sites within the tenant, this filter will be applied to all site document libraries.
- A SharePoint app principal with its application (client) ID, client secret, and the appropriate access permissions.

Complete the steps in the following sections, depending on whether you want to access sites at the site collection level, the
root site collection level, or all sites within a tenant.

<Note>
Two of the main factors in the following sections are the scope of access
and the level of administrative permissions required to create the app principal. Tenant-wide app principals offer the broadest access
but require the highest level of administrative rights, while site collection app principals are more restricted but can be created by users
with lower-level permissions.
</Note>

## Tenant-wide SharePoint app principals

Create a tenant-wide SharePoint app principal when you want the power and flexibility of a principal that can process all sites within a tenant.

SharePoint app principals that are created in the SharePoint admin center have tenant-wide scope and can potentially access all sites within the tenant.
Only global or SharePoint administrators typically have access to the following URLs.

1. To create a tenant-wide SharePoint app principal and then get its client ID and client secret, go to the following URL:

`https://<tenant>-admin.sharepoint.com/_layouts/15/appregnew.aspx`

2. To add access permissions to a tenant-wide SharePoint app principal and then get its client ID and client secret, go to the following URL:

`https://<tenant>.sharepoint.com/_layouts/15/appinv.aspx`

3. Apply the following permissions XML to the tenant-wide SharePoint app principal:

```xml
<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
</AppPermissionRequests>
```
Available `Right` settings include `Read`, `Write`, `Manage`, and `FullControl`. To learn more, see
[Add-in permissions in SharePoint](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint).

[Learn how to complete these preceding steps](https://github.com/vgrem/Office365-REST-Python-Client/wiki/How-to-connect-to-SharePoint-Online-and-and-SharePoint-2013-2016-2019-on-premises--with-app-principal).
Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.

## Root site collection-level SharePoint app principals

Create a root site collection-level SharePoint app principal when you want a principal that can only access a root site collection, for example with a URL
that has the format `https://<tenant>.sharepoint.com`.

SharePoint app principals that are created at the root site collection level have a scope limited to the root site collection. Site collection administrators can usually access the following URLs.

1. To create a root site collection-level SharePoint app principal and then get its client ID and client secret, go to the following URL:

`https://<tenant>.sharepoint.com/_layouts/15/appregnew.aspx`

2. To add access permissions to a root site collection-level SharePoint app principal, go to the following URL:

`https://<tenant>.sharepoint.com/_layouts/15/appinv.aspx`

3. Apply the following permissions XML to the root site collection-level SharePoint app principal:

```xml
<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
</AppPermissionRequests>
```

Available `Right` settings include `Read`, `Write`, `Manage`, and `FullControl`. To learn more, see
[Add-in permissions in SharePoint](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint).

[Learn how to complete these preceding steps](https://github.com/vgrem/Office365-REST-Python-Client/wiki/How-to-connect-to-SharePoint-Online-and-and-SharePoint-2013-2016-2019-on-premises--with-app-principal).
Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.

## Site collection-level SharePoint app principals

Create a site collection-level SharePoint app principal when you want a principal that can only access a specific site collection, for example with a URL
that has or starts with the format `https://<tenant>.sharepoint.com/sites/<site-collection-name>`.

SharePoint app principals that are created at the site collection level have the most limited scope, restricted to the specific subsite and its subsites.
Site owners or those with appropriate permissions on the subsite can access the following URLs.

1. To create a site collection-level SharePoint app principal, go to the following URL:

`https://<tenant>.sharepoint.com/sites/<site-collection-name>/_layouts/15/appregnew.aspx`

2. To add access permissions to a site collection-level SharePoint app principal, go to the following URL:

`https://<tenant>.sharepoint.com/sites/<site-collection-name>/_layouts/15/appinv.aspx`

3. Apply the following permissions XML to the site collection-level SharePoint app principal:

```xml
<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
</AppPermissionRequests>
```

Available `Right` settings include `Read`, `Write`, `Manage`, and `FullControl`. To learn more, see
[Add-in permissions in SharePoint](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint).

[Learn how to complete these preceding steps](https://github.com/vgrem/Office365-REST-Python-Client/wiki/How-to-connect-to-SharePoint-Online-and-and-SharePoint-2013-2016-2019-on-premises--with-app-principal).
Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.
- The path in the SharePoint Online site from which to start parsing files, for example `"Shared Documents"`. If the SharePoint connector is to process all sites within the tenant, this filter will be applied to all site document libraries.

The following video shows how to get the site URL and a path within the site:

<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/E3fRwJU-KTc"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

- The **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Microsoft Entra ID app registration with
the correct set of Microsoft Graph access permissions. These permissions include:

- `Sites.ReadWrite.All` (if both reading and writing are needed)
- `User.Read.All`
[Learn more](https://learn.microsoft.com/answers/questions/2116616/service-principal-access-to-sharepoint-online).
1. [Create an Entra ID app registration](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app?pivots=portal).
2. [Add Graph access permissions to an app registration](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#add-permissions-to-an-application).
3. [Grant consent for the added Graph permissions](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#grant-consent-for-the-added-permissions-for-the-enterprise-application).

The following video shows how to create an Entra ID app registration:

<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/aBAY-LKLPSo"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

The following video shows how to add the correct set of Graph access permissions to the Entra ID app registration:

<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/X7fnRYyxy0Q"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

- The token authority URL for your Microsoft Entra ID app registration. This is typically `https://login.microsoftonline.com`.
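The site URL formats listed in the requirements above can be told apart with a small check. The following Python sketch is only a heuristic built on the "typical" formats given there. The function name `classify_site_url` and the returned labels are hypothetical, and real tenants may use URLs that this sketch does not recognize.

```python
from urllib.parse import urlparse

SUFFIX = ".sharepoint.com"


def classify_site_url(url: str) -> str:
    """Guess the scope implied by a SharePoint Online site URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(SUFFIX):
        raise ValueError(f"Not a SharePoint Online URL: {url}")
    tenant_label = host[: -len(SUFFIX)]
    segments = [part for part in parsed.path.split("/") if part]

    # https://<tenant>-admin.sharepoint.com processes all sites within the tenant.
    if tenant_label.endswith("-admin"):
        return "all sites in tenant"
    # https://<tenant>.sharepoint.com is the root site collection.
    if not segments:
        return "root site collection"
    # https://<tenant>.sharepoint.com/sites/<site-collection-name>
    if len(segments) >= 2 and segments[0] == "sites":
        return "site collection"
    raise ValueError(f"Unrecognized site URL format: {url}")
```

For example, `classify_site_url("https://contoso.sharepoint.com/sites/hr")` falls into the site collection case, where `contoso` and `hr` are placeholder names.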
20 changes: 10 additions & 10 deletions snippets/source_connectors/sharepoint.sh.mdx
@@ -3,19 +3,19 @@

unstructured-ingest \
sharepoint \
--client-id $SHAREPOINT_APP_CLIENT_ID \
--client-cred $SHAREPOINT_APP_CLIENT_SECRET \
--site $SHAREPOINT_SITE \
--path $SHAREPOINT_PATH \
--no-omit-files \
--omit-pages \
--omit-lists \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--num-processes 2 \
--verbose \
--client-cred $ENTRA_ID_APP_CLIENT_SECRET \
--client-id $ENTRA_ID_APP_CLIENT_ID \
--user-pname $ENTRA_ID_USER_PRINCIPAL_NAME \
--tenant $ENTRA_ID_APP_TENANT_ID \
--authority-url $ENTRA_ID_TOKEN_AUTHORITY_URL \
--site $SHAREPOINT_SITE_URL \
--path $SHAREPOINT_SITE_PATH \
--recursive \
--download-dir $LOCAL_FILE_DOWNLOAD_DIR \
--partition-by-api \
--api-key $UNSTRUCTURED_API_KEY \
--partition-endpoint $UNSTRUCTURED_API_URL \
--strategy hi_res \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
```
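For callers who prefer to launch the CLI from a script, the updated command above can be assembled as an argument list. This is a minimal sketch: the helper name `build_ingest_command` is hypothetical, and the flags and environment variable names are copied from the updated shell snippet rather than from any Python API of the connector. Serializing the partition arguments with `json.dumps` avoids the hand-escaped quotes in the shell version.

```python
import json
import os


def build_ingest_command(env=None):
    """Return the unstructured-ingest SharePoint command as a list of arguments."""
    env = os.environ if env is None else env
    # Same values as the --additional-partition-args string in the shell snippet.
    partition_args = {
        "split_pdf_page": "true",
        "split_pdf_allow_failed": "true",
        "split_pdf_concurrency_level": 15,
    }
    return [
        "unstructured-ingest",
        "sharepoint",
        "--client-cred", env["ENTRA_ID_APP_CLIENT_SECRET"],
        "--client-id", env["ENTRA_ID_APP_CLIENT_ID"],
        "--user-pname", env["ENTRA_ID_USER_PRINCIPAL_NAME"],
        "--tenant", env["ENTRA_ID_APP_TENANT_ID"],
        "--authority-url", env["ENTRA_ID_TOKEN_AUTHORITY_URL"],
        "--site", env["SHAREPOINT_SITE_URL"],
        "--path", env["SHAREPOINT_SITE_PATH"],
        "--recursive",
        "--download-dir", env["LOCAL_FILE_DOWNLOAD_DIR"],
        "--partition-by-api",
        "--api-key", env["UNSTRUCTURED_API_KEY"],
        "--partition-endpoint", env["UNSTRUCTURED_API_URL"],
        "--strategy", "hi_res",
        "--output-dir", env["LOCAL_FILE_OUTPUT_DIR"],
        "--additional-partition-args=" + json.dumps(partition_args),
    ]
```

The resulting list can be passed to `subprocess.run(build_ingest_command(), check=True)`. Because no shell is involved, values that contain spaces, such as `Shared Documents`, need no extra quoting.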