doc: Add performance tuning section (#4639)

I started this PR mainly as a way to document the new wildcard options, but ultimately updated a few things relating to the CloudQuery docs:

- added brief notes about wildcards to the source plugin reference sections for `tables` and `skip_tables`
- placed detailed information about wildcards in a `Performance tuning` page under `Advanced Topics`. The idea is that we will expand this page over time.
- updated the docs relating to `concurrency` (`resource_concurrency` and `table_concurrency` were deprecated)
- some other misc fixes
cloudquery · Nov 15, 2022 · b90ff96
1 parent 75162dd
commit b90ff96

Showing 9 changed files with 75 additions and 18 deletions.
diff --git a/website/components/mdx/_configure.mdx b/website/components/mdx/_configure.mdx
@@ -91,7 +91,7 @@ spec:
 ```
 
 - All general options for source spec you can find under [references/source-spec](/docs/reference/source-spec).
-- All options for `postgresql` destination plugin spec you can find [here](https://github.com/cloudquery/cloudquery/blob/main/plugins/source/aws/docs/configuration.md)
+- All options for `aws` source plugin spec you can find [here](https://github.com/cloudquery/cloudquery/blob/main/plugins/source/aws/docs/configuration.md)
 
 <Callout>
 

diff --git a/website/pages/docs/advanced-topics/_meta.json b/website/pages/docs/advanced-topics/_meta.json
@@ -1,8 +1,9 @@
 {
   "environment-variable-substitution": "Environment Variable Substitution",
-  "running-cloudquery-in-parallel": "Running CloudQuery in Parallel",
-  "proxy-configuration": "Proxy Configuration",
   "docker": "Docker",
-  "security": "Security",
-  "rate-limiting": "Rate Limiting"
+  "proxy-configuration": "Proxy Configuration",
+  "performance-tuning": "Performance Tuning",
+  "rate-limiting": "Rate Limiting",
+  "running-cloudquery-in-parallel": "Running CloudQuery in Parallel",
+  "security": "Security"
 }
diff --git a/website/pages/docs/advanced-topics/performance-tuning.md b/website/pages/docs/advanced-topics/performance-tuning.md
@@ -0,0 +1,58 @@
+---
+title: Performance Tuning
+---
+
+# Performance Tuning
+
+This page contains a number of tips and tricks for improving the performance of `cloudquery sync` for large cloud estates.
+
+## Wildcard Matching
+
+import { Callout } from 'nextra-theme-docs'
+
+Sometimes the easiest way to improve the performance of the `sync` command is to limit the number of tables that get synced. The `tables` and `skip_tables` source config options both support wildcard matching. This means that you can use `*` anywhere in a name to match multiple tables.
+
+For example, when using the `aws` source plugin, it is possible to use a wildcard pattern to match all tables related to AWS EC2:
+
+```yaml
+tables:
+ - aws_ec2_*
+```
+
+This can also be combined with `skip_tables`. For example, let's say we want to include all EC2 tables, but not EBS-related ones:
+
+```yaml
+tables: 
+- "aws_ec2_*"
+skip_tables:
+- "aws_ec2_ebs_*"
+```
+
+<Callout> 
+
+The CloudQuery CLI will warn if a wildcard pattern does not match any known tables.
+
+</Callout>
+
+## Improving Performance by Skipping Relations
+
+Some tables require many API calls to sync. This is especially true of tables that depend on other tables, because often multiple API calls need to be made for every row in the parent table. This can lead to thousands of API calls, increasing the time it takes to sync. If you know that some child tables are not strictly necessary, you can improve sync performance by skipping them with the `skip_tables` setting.
+
+Let's say we have three tables: `A`, `B` and `C`. `A` is the top-level table. `B` depends on it, and `C` depends on `B`:
+
+```text
+A 
+↳ B
+  ↳ C
+```
+
+We might want table `A`, but not need the information in table `B`. We can then write our source config as:
+
+```yaml
+tables:
+ - A
+skip_tables:
+ - B
+```
+
+By skipping table `B`, we are automatically skipping its dependent table `C` as well. Likewise, by including table `A`, we are automatically including its dependent tables `B` and `C`, unless they are explicitly skipped in the `skip_tables` section (as in the example above).
diff --git a/website/pages/docs/advanced-topics/rate-limiting.md b/website/pages/docs/advanced-topics/rate-limiting.md
@@ -4,12 +4,8 @@ title: Rate Limiting
 
 # Rate Limiting
 
-There are two main levers to control the rate at which CloudQuery fetches resources from cloud providers. These are the `table_concurrency` and `resource_concurrency` options that can be specified as [part of the source spec](/docs/reference/source-spec). Note that these options were introduced in CloudQuery CLI v1.0.8.
+There is currently one main lever to control the rate at which CloudQuery fetches resources from cloud providers. This setting is called `concurrency`, and it can be specified as [part of the source spec](/docs/reference/source-spec). Note that this option was introduced in CloudQuery CLI v1.4.1.
 
-## Table Concurrency
+## Concurrency
 
-`table_concurrency` controls the number of concurrent tables that will be processed while performing a sync. Setting this to a low number will reduce the number of concurrent requests, making it less likely to hit rate limits. The trade-off is that syncs will take longer to complete.
-
-## Resource Concurrency
-
-`resource_concurrency` is an approximate global limit on how many concurrent requests will be made to fetch details about the initial rows returned by a table's resolver. This limit applies only to top-level tables, and child relations will not be limited. Setting this to a lower number will also reduce the number of concurrent requests made, regardless of how many tables are being synced at any one time. As with `table_concurrency`, the trade-off is that syncs will take longer to complete.
+`concurrency` provides rough control over the number of concurrent requests that will be made while performing a sync. Setting this to a low number will reduce the number of concurrent requests, reducing the memory used and making the sync less likely to hit rate limits. The trade-off is that syncs will take longer to complete.
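
As an illustration of the `concurrency` setting described above, a source spec might look like the following sketch. The value `1000` is a hypothetical example, not a documented default or a recommendation. The `kind: source` header and the placement of `concurrency` at the top level of the source `spec` are assumptions based on the description of the option as part of the source spec. The remaining fields mirror the `aws` example in the source spec reference.

```yaml
# Hedged sketch of a source spec that limits concurrency.
kind: source          # assumption: standard header for a source spec file
spec:
  name: "aws"
  path: "cloudquery/aws"
  version: "v6.0.0"
  # Hypothetical value: lower it to reduce memory use and the chance of
  # hitting rate limits, at the cost of longer syncs.
  concurrency: 1000
  tables: ["aws_ec2_*"]
  destinations: ["postgresql"]
```

A reasonable way to tune this is to start from a value that syncs without rate-limit errors and raise it gradually while watching sync duration and memory use.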
diff --git a/website/pages/docs/quickstart/linux.mdx b/website/pages/docs/quickstart/linux.mdx
@@ -1,5 +1,5 @@
 ---
-title: Linux
+title: Quickstart - Linux
 ---
 
 import Intro from '../../../components/mdx/_intro.mdx'

diff --git a/website/pages/docs/quickstart/macOS.mdx b/website/pages/docs/quickstart/macOS.mdx
@@ -1,5 +1,5 @@
 ---
-title: macOS
+title: Quickstart - macOS
 ---
 
 import Intro from '../../../components/mdx/_intro.mdx'

diff --git a/website/pages/docs/quickstart/windows.mdx b/website/pages/docs/quickstart/windows.mdx
@@ -1,5 +1,5 @@
 ---
-title: Windows
+title: Quickstart - Windows
 ---
 
 import Intro from '../../../components/mdx/_intro.mdx'

diff --git a/website/pages/docs/reference/cli/_meta.json b/website/pages/docs/reference/cli/_meta.json
@@ -1,4 +1,5 @@
 {
   "cloudquery": "cloudquery",
-  "cloudquery_sync": "cloudquery sync"
+  "cloudquery_sync": "cloudquery sync",
+  "cloudquery_migrate": "cloudquery migrate"
 }
diff --git a/website/pages/docs/reference/source-spec.md b/website/pages/docs/reference/source-spec.md
@@ -14,6 +14,7 @@ spec:
   name: "aws"
   path: "cloudquery/aws"
   version: "v6.0.0" # latest version of aws plugin
+  tables: ["*"]
   destinations: ["postgresql"]
 
   spec:
@@ -57,13 +58,13 @@ Configures how to retrieve the plugin. The contents depend on the value of `regi
 
 (`[]string`, optional, default: `["*"]`)
 
-Tables to sync from the source plugin.
+Tables to sync from the source plugin. It accepts wildcards. For example, to match all EC2-related tables, use `aws_ec2_*`. Matched tables will also sync all their descendant tables, unless these are skipped in `skip_tables`.
 
 ### skip_tables
 
 (`[]string`, optional, default: `[]`)
 
-Useful when using glob in `tables`, specify which tables to skip when syncing the source plugin.
+Useful when using wildcards in `tables`. Specify which tables to skip when syncing the source plugin. Note that if a table with dependencies is skipped, all its dependent tables will also be skipped.
 
 ### destinations