Skip to content

Commit

Permalink
Docs: update Google Sheets docs (#29492)
Browse files Browse the repository at this point in the history
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
  • Loading branch information
ChristoGrab and sherifnada committed Aug 21, 2023
1 parent 5393943 commit 39b7da5
Show file tree
Hide file tree
Showing 2 changed files with 110 additions and 37 deletions.
39 changes: 32 additions & 7 deletions docs/integrations/sources/google-sheets.inapp.md
Original file line number Diff line number Diff line change
@@ -1,22 +1,47 @@
## Prerequisites

- Access to a Google Sheet
- Access to a Google spreadsheet

:::info
The Google Sheets source connector pulls data from a single Google Sheets spreadsheet. To replicate multiple spreadsheets, set up multiple Google Sheets source connectors in your Airbyte instance.
:::

## Setup guide

1. Enter a name for the Google Sheets connector.
2. Authenticate your Google account via OAuth or Service Account Key Authentication.
- **(Recommended)** To authenticate your Google account via OAuth, click **Sign in with Google** and complete the authentication workflow.
- To authenticate your Google account via Service Account Key Authentication, enter your [Google Cloud service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) in JSON format. Make sure the Service Account has the Project Viewer permission. If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your Service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ%22).
1. For **Source name**, enter a name to help you identify this source.
2. Select your authentication method:

<!-- env:cloud -->

#### For Airbyte Cloud

- **(Recommended)** Select **Authenticate via Google (OAuth)** from the Authentication dropdown, click **Sign in with Google** and complete the authentication workflow.

<!-- /env:cloud -->
<!-- env:oss -->

#### For Airbyte Open Source

- **(Recommended)** Select **Service Account Key Authentication** from the dropdown and enter your Google Cloud service account key in JSON format:

```js
{ "type": "service_account", "project_id": "YOUR_PROJECT_ID", "private_key_id": "YOUR_PRIVATE_KEY", ... }
```

- To authenticate your Google account via OAuth, select **Authenticate via Google (OAuth)** from the dropdown and enter your Google application's client ID, client secret, and refresh token.

For detailed instructions on how to generate a service account key or OAuth credentials, refer to the [full documentation](https://docs.airbyte.io/integrations/sources/google-sheets#setup-guide).

<!-- /env:oss -->

3. For **Spreadsheet Link**, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**.
4. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. The default value is 200.
4. (Optional) You may enable the option to **Convert Column Names to SQL-Compliant Format**. Enabling this option will allow the connector to convert column names to a standardized, SQL-friendly format. For example, a column name of `Café Earnings 2022` will be converted to `cafe_earnings_2022`. We recommend enabling this option if your target destination is SQL-based (ie Postgres, MySQL). Set to false by default.
5. (Optional) For **Row Batch Size**, you may specify the number of records you want to fetch per request to the Google API. By adjusting this value, you can balance the efficiency of the data retrieval process with [Google's request quotas](#performance-consideration). The default value of 200 should suffice for most use cases.
6. Click **Set up source** and wait for the tests to complete.

### Google Sheets format requirements
- Sheet names and column headers must only contain alphanumeric characters or `_`, as specified in the [**Airbyte Protocol**](../../understanding-airbyte/airbyte-protocol.md). For example, if your sheet or column header is named `the data`, rename it to `the_data`. This restriction does not apply to non-header cell values.

- When replicating data to a SQL-based destination (such as Postgres or MySQL), sheet names and column headers must contain **only alphanumeric characters or `_`**. For example, if your sheet or column header is named `The Data 2022`, it should be renamed to `the_data_2022`. To automatically convert column names to this format, enable the **Convert Column Names to SQL-Compliant Format** option. Please note that this option only converts column names and not sheet names. This naming restriction does not apply to non-header cell values.
- Airbyte only supports replicating [Grid](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#SheetType) sheets.

For detailed information on supported sync modes, supported streams, performance considerations, refer to the full documentation for [Google Sheets](https://docs.airbyte.com/integrations/sources/google-sheets/).
108 changes: 78 additions & 30 deletions docs/integrations/sources/google-sheets.md
Original file line number Diff line number Diff line change
@@ -1,60 +1,105 @@
# Google Sheets

This page guides you through the process of setting up the Google Sheets source connector.
This page contains the setup guide and reference information for the Google Sheets source connector.

:::info
The Google Sheets source connector pulls data from a single Google Sheets spreadsheet. To replicate multiple spreadsheets, set up multiple Google Sheets source connectors in your Airbyte instance.
The Google Sheets source connector pulls data from a single Google Sheets spreadsheet. Each sheet (tab) within a spreadsheet can be replicated. To replicate multiple spreadsheets, set up multiple Google Sheets source connectors in your Airbyte instance. No other files in your Google Drive are accessed.
:::

## Prerequisites

- Access to a Google spreadsheet

## Setup guide

The Google Sheets source connector supports authentication via either OAuth or Service Account Key Authentication.

<!-- env:cloud -->
For **Airbyte Cloud** users, we highly recommend using OAuth, as it significantly simplifies the setup process and allows you to authenticate [directly from the Airbyte UI](#set-up-the-google-sheets-source-connector-in-airbyte).

<!-- /env:cloud -->

<!-- env:oss -->

For **Airbyte Open Source** users, we recommend using Service Account Key Authentication. Follow the steps below to create a service account, generate a key, and enable the Google Sheets API.

:::note
If you prefer to use OAuth for authentication with **Airbyte Open Source**, you can follow [Google's OAuth instructions](https://developers.google.com/identity/protocols/oauth2) to create an authentication app. Be sure to set the scopes to `https://www.googleapis.com/auth/spreadsheets.readonly`. You will need to obtain your client ID, client secret, and refresh token for the connector setup.
:::

### Set up the service account key (Airbyte Open Source)

#### Create a service account

1. Open the [Service Accounts page](https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts) in your Google Cloud console.
2. Select an existing project, or create a new project.
3. At the top of the page, click **+ Create service account**.
4. Enter a name and description for the service account, then click **Create and Continue**.
5. Under **Service account permissions**, select the roles to grant to the service account, then click **Continue**. We recommend the **Viewer** role.

#### Generate a key

1. Go to the [API Console/Credentials](https://console.cloud.google.com/apis/credentials) page and click on the email address of the service account you just created.
2. In the **Keys** tab, click **+ Add key**, then click **Create new key**.
3. Select **JSON** as the Key type. This will generate and download the JSON key file that you'll use for authentication. Click **Continue**.

#### Enable the Google Sheets API

1. Go to the [API Console/Library](https://console.cloud.google.com/apis/library) page.
2. Make sure you have selected the correct project from the top.
3. Find and select the **Google Sheets API**.
4. Click **ENABLE**.

If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your Service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ%22).

**For Airbyte Cloud:**
<!-- /env:oss -->

### Set up the Google Sheets source connector in Airbyte

To set up Google Sheets as a source in Airbyte Cloud:

1. [Log into your Airbyte Cloud](https://cloud.airbyte.com/workspaces) account.
1. [Log in to your Airbyte Cloud](https://cloud.airbyte.com/workspaces) or Airbyte Open Source account.
2. In the left navigation bar, click **Sources**. In the top-right corner, click **+ New source**.
3. On the Set up the source page, select **Google Sheets** from the **Source type** dropdown.
4. Enter a name for the Google Sheets connector.
5. Authenticate your Google account via OAuth or Service Account Key Authentication.
- **(Recommended)** To authenticate your Google account via OAuth, click **Sign in with Google** and complete the authentication workflow.
- To authenticate your Google account via Service Account Key Authentication, enter your [Google Cloud service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) in JSON format. Make sure the Service Account has the Project Viewer permission. If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your Service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ%22).
6. For **Spreadsheet Link**, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**.
7. For **Row Batch Size**, define the number of records you want the Google API to fetch at a time. The default value is 200.
<!-- /env:cloud -->
3. Find and select **Google Sheets** from the list of available sources.
4. For **Source name**, enter a name to help you identify this source.
5. Select your authentication method:

<!-- env:cloud -->

#### For Airbyte Cloud

- **(Recommended)** Select **Authenticate via Google (OAuth)** from the Authentication dropdown, click **Sign in with Google** and complete the authentication workflow.

<!-- /env:cloud -->
<!-- env:oss -->

**For Airbyte Open Source:**
#### For Airbyte Open Source

To set up Google Sheets as a source in Airbyte Open Source:
- **(Recommended)** Select **Service Account Key Authentication** from the dropdown and enter your Google Cloud service account key in JSON format:

1. [Enable the Google Cloud Platform APIs for your personal or organization account](https://support.google.com/googleapi/answer/6158841?hl=en).
```js
{ "type": "service_account", "project_id": "YOUR_PROJECT_ID", "private_key_id": "YOUR_PRIVATE_KEY", ... }
```

:::info
The connector only finds the spreadsheet you want to replicate; it does not access any of your other files in Google Drive.
:::
- To authenticate your Google account via OAuth, select **Authenticate via Google (OAuth)** from the dropdown and enter your Google application's client ID, client secret, and refresh token.

<!-- /env:oss -->

2. Go to the Airbyte UI and in the left navigation bar, click **Sources**. In the top-right corner, click **+ New source**.
3. On the Set up the source page, select **Google Sheets** from the Source type dropdown.
4. Enter a name for the Google Sheets connector.
5. Authenticate your Google account via OAuth or Service Account Key Authentication:
- To authenticate your Google account via OAuth, enter your Google application's [client ID, client secret, and refresh token](https://developers.google.com/identity/protocols/oauth2).
- To authenticate your Google account via Service Account Key Authentication, enter your [Google Cloud service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) in JSON format. Make sure the Service Account has the Project Viewer permission. If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your Service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ%22).
6. For **Spreadsheet Link**, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**.
7. (Optional) You may enable the option to **Convert Column Names to SQL-Compliant Format**. Enabling this option will allow the connector to convert column names to a standardized, SQL-friendly format. For example, a column name of `Café Earnings 2022` will be converted to `cafe_earnings_2022`. We recommend enabling this option if your target destination is SQL-based (ie Postgres, MySQL). Set to false by default.
8. (Optional) For **Row Batch Size**, you may specify the number of records you want to fetch per request to the Google API. By adjusting this value, you can balance the efficiency of the data retrieval process with [Google's request quotas](#performance-consideration). The default value of 200 should suffice for most use cases.
9. Click **Set up source** and wait for the tests to complete.

### Output schema

Each sheet in the selected spreadsheet is synced as a separate stream. Each selected column in the sheet is synced as a string field.

**Note: Sheet names and column headers must contain only alphanumeric characters or `_`, as specified in the** [**Airbyte Protocol**](../../understanding-airbyte/airbyte-protocol.md). For example, if your sheet or column header is named `the data`, rename it to `the_data`. This restriction does not apply to non-header cell values.
:::caution
When replicating data to a SQL-based destination (such as Postgres or MySQL), sheet names and column headers must contain **only alphanumeric characters or `_`**. For example, if your sheet or column header is named `The Data 2022`, it should be renamed to `the_data_2022`. To automatically convert column names to this format, enable the **Convert Column Names to SQL-Compliant Format** option. Please note that this option only converts column names and not sheet names. This naming restriction does not apply to non-header cell values.
:::

Airbyte only supports replicating [Grid](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#SheetType) sheets.

<!-- env:oss -->

## Supported sync modes

The Google Sheets source connector supports the following sync modes:
Expand All @@ -70,7 +115,12 @@ The Google Sheets source connector supports the following sync modes:

## Performance consideration

The [Google API rate limit](https://developers.google.com/sheets/api/limits) is 100 requests per 100 seconds per user and 500 requests per 100 seconds per project. Airbyte batches requests to the API in order to efficiently pull data and respects these rate limits. We recommended not using the same service user for more than 3 instances of the Google Sheets source connector to ensure high transfer speeds.
The [Google API rate limits](https://developers.google.com/sheets/api/limits) are:

- 300 read requests per minute per project
- 60 requests per minute per user per project

Airbyte batches requests to the API in order to efficiently pull data and respect these rate limits. We recommend not using the same user or service account for more than 3 instances of the Google Sheets source connector to ensure high transfer speeds.

## Changelog

Expand Down Expand Up @@ -116,5 +166,3 @@ The [Google API rate limit](https://developers.google.com/sheets/api/limits) is
| 0.1.6 | 2021-01-27 | [1668](https://github.com/airbytehq/airbyte/pull/1668) | Adopt connector best practices |
| 0.1.5 | 2020-12-30 | [1438](https://github.com/airbytehq/airbyte/pull/1438) | Implement backoff |
| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file |

<!-- /env:oss -->

0 comments on commit 39b7da5

Please sign in to comment.