docs: Anchor link checker (#15624)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
vtlim and 317brian committed Jan 8, 2024
1 parent df5bcd1 commit 52313c5
Showing 18 changed files with 126 additions and 123 deletions.
1 change: 1 addition & 0 deletions .github/workflows/static-checks.yml
@@ -168,6 +168,7 @@ jobs:
(cd website && npm install)
cd website
npm run build
+npm run link-lint
npm run spellcheck
- name: web console
101 changes: 0 additions & 101 deletions docs/_bin/broken-link-check.py

This file was deleted.
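
With the Python-based checker in `docs/_bin` removed, anchor and link checking now runs through the npm script invoked by the workflow above. A rough sketch of running the same checks locally (assuming the `link-lint` and `spellcheck` scripts are defined in `website/package.json`, as the workflow and the contributing guide below suggest) is:

```bash
# From the repository root: install the docs dependencies once,
# build the site, then run the link and spelling checks.
cd website
npm install        # first run only
npm run build
npm run link-lint
npm run spellcheck
```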

4 changes: 2 additions & 2 deletions docs/data-management/delete.md
@@ -24,12 +24,12 @@ title: "Data deletion"

## By time range, manually

-Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) and supports
+Apache Druid stores data [partitioned by time chunk](../design/storage.md) and supports
deleting data for time chunks by dropping segments. This is a fast, metadata-only operation.

Deletion by time range happens in two steps:

-1. Segments to be deleted must first be marked as ["unused"](../design/architecture.md#segment-lifecycle). This can
+1. Segments to be deleted must first be marked as ["unused"](../design/storage.md#segment-lifecycle). This can
happen when a segment is dropped by a [drop rule](../operations/rule-configuration.md) or when you manually mark a
segment unused through the Coordinator API or web console. This is a soft delete: the data is not available for
querying, but the segment files remain in deep storage, and the segment records remain in the metadata store.
2 changes: 1 addition & 1 deletion docs/data-management/index.md
@@ -23,7 +23,7 @@ sidebar_label: "Overview"
~ under the License.
-->

-Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) in immutable
+Apache Druid stores data [partitioned by time chunk](../design/storage.md) in immutable
files called [segments](../design/segments.md). Data management operations that involve replacing or deleting
these segments include:

2 changes: 1 addition & 1 deletion docs/data-management/schema-changes.md
@@ -28,7 +28,7 @@ title: "Schema changes"
Apache Druid allows you to provide a new schema for new data without the need to update the schema of any existing data.
It is sufficient to update your supervisor spec, if using [streaming ingestion](../ingestion/index.md#streaming), or to
provide the new schema the next time you do a [batch ingestion](../ingestion/index.md#batch). This is made possible by
-the fact that each [segment](../design/architecture.md#datasources-and-segments), at the time it is created, stores a
+the fact that each [segment](../design/segments.md), at the time it is created, stores a
copy of its own schema. Druid reconciles all of these individual segment schemas automatically at query time.

## For existing data
2 changes: 1 addition & 1 deletion docs/data-management/update.md
@@ -24,7 +24,7 @@ title: "Data updates"

## Overwrite

-Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) and supports
+Apache Druid stores data [partitioned by time chunk](../design/storage.md) and supports
overwriting existing data using time ranges. Data outside the replacement time range is not touched. Overwriting of
existing data is done using the same mechanisms as [batch ingestion](../ingestion/index.md#batch).

10 changes: 8 additions & 2 deletions docs/development/docs-contribute.md
@@ -101,7 +101,8 @@ Now you're up to date, and you can make your changes.
git checkout -b MY-BRANCH
```

-Provide a name for your feature branch in `MY-BRANCH`.
+Provide a name for your feature branch in `MY-BRANCH`.

2. Find the file that you want to make changes to. All the source files for the docs are written in Markdown and located in the `docs` directory. The URL for the page includes the subdirectory the source file is in. For example, the SQL-based ingestion tutorial found at `https://druid.apache.org/docs/latest/tutorials/tutorial-msq-extern.html` is in the `tutorials` subdirectory.

If you're adding a page, create a new Markdown file in the appropriate subdirectory. Then, copy the front matter and Apache license from an existing file. Update the `title` and `id` fields. Don't forget to add it to `website/sidebars.json` so that your new page shows up in the navigation.
@@ -111,6 +112,11 @@ Provide a name for your feature branch in `MY-BRANCH`.
5. Use the following commands to run the link and spellcheckers locally:

```bash
cd website
# You only need to install once
npm install
npm run build

npm run spellcheck
npm run link-lint
```
@@ -216,4 +222,4 @@ Before publishing new content or updating an existing topic, you can audit your
* When American spelling is different from Commonwealth/"British" spelling, use the American spelling.
* Don’t use terms considered disrespectful. Refer to a list like Google’s [Word list](https://developers.google.com/style/word-list) for guidance and alternatives.
* Use straight quotation marks and straight apostrophes instead of the curly versions.
-* Introduce a list, a table, or a procedure with an introductory sentence that prepares the reader for what they're about to read.
+* Introduce a list, a table, or a procedure with an introductory sentence that prepares the reader for what they're about to read.
2 changes: 1 addition & 1 deletion docs/development/experimental-features.md
@@ -47,7 +47,7 @@ Note that this document does not track the status of contrib extensions, all of

- [Configuration reference](../configuration/index.md#overlord-operations)
- [Task reference](../ingestion/tasks.md#locking)
-- [Design](../design/architecture.md#availability-and-consistency)
+- [Design](../design/storage.md#availability-and-consistency)

## Front coding

2 changes: 1 addition & 1 deletion docs/ingestion/index.md
@@ -24,7 +24,7 @@ sidebar_label: Overview
-->

Loading data in Druid is called _ingestion_ or _indexing_. When you ingest data into Druid, Druid reads the data from
-your source system and stores it in data files called [_segments_](../design/architecture.md#datasources-and-segments).
+your source system and stores it in data files called [_segments_](../design/segments.md).
In general, segment files contain a few million rows each.

For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes or the
8 changes: 4 additions & 4 deletions docs/ingestion/ingestion-spec.md
@@ -149,7 +149,7 @@ An example `dataSchema` is:
### `dataSource`

The `dataSource` is located in `dataSchema` → `dataSource` and is simply the name of the
-[datasource](../design/architecture.md#datasources-and-segments) that data will be written to. An example
+[datasource](../design/storage.md) that data will be written to. An example
`dataSource` is:

```
@@ -304,7 +304,7 @@ An example `metricsSpec` is:
The `granularitySpec` is located in `dataSchema` → `granularitySpec` and is responsible for configuring
the following operations:

-1. Partitioning a datasource into [time chunks](../design/architecture.md#datasources-and-segments) (via `segmentGranularity`).
+1. Partitioning a datasource into [time chunks](../design/storage.md) (via `segmentGranularity`).
2. Truncating the timestamp, if desired (via `queryGranularity`).
3. Specifying which time chunks of segments should be created, for batch ingestion (via `intervals`).
4. Specifying whether ingestion-time [rollup](./rollup.md) should be used or not (via `rollup`).
@@ -329,7 +329,7 @@ A `granularitySpec` can have the following components:
| Field | Description | Default |
|-------|-------------|---------|
| type |`uniform`| `uniform` |
-| segmentGranularity | [Time chunking](../design/architecture.md#datasources-and-segments) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk which can be optionally further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.<br /><br />Avoid `WEEK` granularity for data partitioning because weeks don't align neatly with months and years, making it difficult to change partitioning by coarser granularity. Instead, opt for other partitioning options such as `DAY` or `MONTH`, which offer more flexibility.| `day` |
+| segmentGranularity | [Time chunking](../design/storage.md) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk which can be optionally further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.<br /><br />Avoid `WEEK` granularity for data partitioning because weeks don't align neatly with months and years, making it difficult to change partitioning by coarser granularity. Instead, opt for other partitioning options such as `DAY` or `MONTH`, which offer more flexibility.| `day` |
| queryGranularity | The resolution of timestamp storage within each segment. This must be equal to, or finer, than `segmentGranularity`. This will be the finest granularity that you can query at and still receive sensible results, but note that you can still query at anything coarser than this granularity. E.g., a value of `minute` will mean that records will be stored at minutely granularity, and can be sensibly queried at any multiple of minutes (including minutely, 5-minutely, hourly, etc).<br /><br />Any [granularity](../querying/granularities.md) can be provided here. Use `none` to store timestamps as-is, without any truncation. Note that `rollup` will be applied if it is set even when the `queryGranularity` is set to `none`. | `none` |
| rollup | Whether to use ingestion-time [rollup](./rollup.md) or not. Note that rollup is still effective even when `queryGranularity` is set to `none`. Your data will be rolled up if rows have exactly the same timestamp. | `true` |
| intervals | A list of intervals defining time chunks for segments. Specify interval values using ISO8601 format. For example, `["2021-12-06T21:27:10+00:00/2021-12-07T00:00:00+00:00"]`. If you omit the time, the time defaults to "00:00:00".<br /><br />Druid breaks the list up and rounds off the list values based on the `segmentGranularity`.<br /><br />If `null` or not provided, batch ingestion tasks generally determine which time chunks to output based on the timestamps found in the input data.<br /><br />If specified, batch ingestion tasks may be able to skip a determining-partitions phase, which can result in faster ingestion. Batch ingestion tasks may also be able to request all their locks up-front instead of one by one. Batch ingestion tasks throw away any records with timestamps outside of the specified intervals.<br /><br />Ignored for any form of streaming ingestion. | `null` |
@@ -529,4 +529,4 @@ You can enable front coding with all types of ingestion. For information on defi
:::

Beyond these properties, each ingestion method has its own specific tuning properties. See the documentation for each
-[ingestion method](./index.md#ingestion-methods) for details.
+[ingestion method](./index.md#ingestion-methods) for details.
2 changes: 1 addition & 1 deletion docs/multi-stage-query/concepts.md
@@ -34,7 +34,7 @@ sidebar_label: "Key concepts"
The `druid-multi-stage-query` extension adds a multi-stage query (MSQ) task engine that executes SQL statements as batch
tasks in the indexing service, which execute on [Middle Managers](../design/architecture.md#druid-services).
[INSERT](reference.md#insert) and [REPLACE](reference.md#replace) tasks publish
-[segments](../design/architecture.md#datasources-and-segments) just like [all other forms of batch
+[segments](../design/storage.md) just like [all other forms of batch
ingestion](../ingestion/index.md#batch). Each query occupies at least two task slots while running: one controller task,
and at least one worker task. As an experimental feature, the MSQ task engine also supports running SELECT queries as
batch tasks. The behavior and result format of plain SELECT (without INSERT or REPLACE) is subject to change.
2 changes: 1 addition & 1 deletion docs/operations/basic-cluster-tuning.md
@@ -123,7 +123,7 @@ Be sure to check out [segment size optimization](./segment-optimization.md) to h

The biggest contributions to heap usage on Brokers are:
- Partial unmerged query results from Historicals and Tasks
-- The segment timeline: this consists of location information (which Historical/Task is serving a segment) for all currently [available](../design/architecture.md#segment-lifecycle) segments.
+- The segment timeline: this consists of location information (which Historical/Task is serving a segment) for all currently [available](../design/storage.md#segment-lifecycle) segments.
- Cached segment metadata: this consists of metadata, such as per-segment schemas, for all currently available segments.

The Broker heap requirements scale based on the number of segments in the cluster, and the total data size of the segments.
2 changes: 1 addition & 1 deletion docs/operations/web-console.md
@@ -26,7 +26,7 @@ Druid includes a web console for loading data, managing datasources and tasks, a
You can also run SQL and native Druid queries in the console.

Enable the following cluster settings to use the web console. Note that these settings are enabled by default.
-- Enable the Router's [management proxy](../design/router.md#enabling-the-management-proxy).
+- Enable the Router's [management proxy](../design/router.md#enable-the-management-proxy).
- Enable [Druid SQL](../configuration/index.md#sql) for the Broker processes in the cluster.

The [Router](../design/router.md) service hosts the web console.
2 changes: 1 addition & 1 deletion docs/querying/sql-data-types.md
@@ -77,7 +77,7 @@ When `druid.generic.useDefaultValueForNull = true` (legacy mode), Druid instead

## Arrays

-Druid supports [`ARRAY` types](arrays.md), which behave as standard SQL arrays, where results are grouped by matching entire arrays. The [`UNNEST` operator](./sql-array-functions.md#unn) can be used to perform operations on individual array elements, translating each element into a separate row.
+Druid supports [`ARRAY` types](arrays.md), which behave as standard SQL arrays, where results are grouped by matching entire arrays. The [`UNNEST` operator](./sql.md#unnest) can be used to perform operations on individual array elements, translating each element into a separate row.

`ARRAY` typed columns can be stored in segments with JSON-based ingestion using the 'auto' typed dimension schema shared with [schema auto-discovery](../ingestion/schema-design.md#schema-auto-discovery-for-dimensions) to detect and ingest arrays as ARRAY typed columns. For [SQL based ingestion](../multi-stage-query/index.md), the query context parameter `arrayIngestMode` must be specified as `"array"` to ingest ARRAY types. In Druid 28, the default mode for this parameter is `"mvd"` for backwards compatibility, which instead can only handle `ARRAY<STRING>` which it stores in [multi-value string columns](#multi-value-strings).
