docs: Anchor link checker (#15624)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
vtlim and 317brian committed Jan 8, 2024
1 parent df5bcd1 commit 52313c5
Showing 18 changed files with 126 additions and 123 deletions.
1 change: 1 addition & 0 deletions .github/workflows/static-checks.yml
@@ -168,6 +168,7 @@ jobs:
(cd website && npm install)
cd website
npm run build
+npm run link-lint
npm run spellcheck
- name: web console
101 changes: 0 additions & 101 deletions docs/_bin/broken-link-check.py

This file was deleted.
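
With the Python-based checker in `docs/_bin` removed, anchor and link checking now runs through the npm script invoked by the workflow above. A rough sketch of running the same checks locally (assuming the `link-lint` and `spellcheck` scripts are defined in `website/package.json`, as the workflow and the contributing guide below suggest) is:

```bash
# From the repository root: install the docs dependencies once,
# build the site, then run the link and spelling checks.
cd website
npm install        # first run only
npm run build
npm run link-lint
npm run spellcheck
```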

4 changes: 2 additions & 2 deletions docs/data-management/delete.md
@@ -24,12 +24,12 @@ title: "Data deletion"

## By time range, manually

-Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) and supports
+Apache Druid stores data [partitioned by time chunk](../design/storage.md) and supports
deleting data for time chunks by dropping segments. This is a fast, metadata-only operation.

Deletion by time range happens in two steps:

-1. Segments to be deleted must first be marked as ["unused"](../design/architecture.md#segment-lifecycle). This can
+1. Segments to be deleted must first be marked as ["unused"](../design/storage.md#segment-lifecycle). This can
happen when a segment is dropped by a [drop rule](../operations/rule-configuration.md) or when you manually mark a
segment unused through the Coordinator API or web console. This is a soft delete: the data is not available for
querying, but the segment files remain in deep storage, and the segment records remain in the metadata store.
2 changes: 1 addition & 1 deletion docs/data-management/index.md
@@ -23,7 +23,7 @@ sidebar_label: "Overview"
~ under the License.
-->

-Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) in immutable
+Apache Druid stores data [partitioned by time chunk](../design/storage.md) in immutable
files called [segments](../design/segments.md). Data management operations that involve replacing or deleting
these segments include:

2 changes: 1 addition & 1 deletion docs/data-management/schema-changes.md
@@ -28,7 +28,7 @@ title: "Schema changes"
Apache Druid allows you to provide a new schema for new data without the need to update the schema of any existing data.
It is sufficient to update your supervisor spec, if using [streaming ingestion](../ingestion/index.md#streaming), or to
provide the new schema the next time you do a [batch ingestion](../ingestion/index.md#batch). This is made possible by
-the fact that each [segment](../design/architecture.md#datasources-and-segments), at the time it is created, stores a
+the fact that each [segment](../design/segments.md), at the time it is created, stores a
copy of its own schema. Druid reconciles all of these individual segment schemas automatically at query time.

## For existing data
2 changes: 1 addition & 1 deletion docs/data-management/update.md
@@ -24,7 +24,7 @@ title: "Data updates"

## Overwrite

-Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) and supports
+Apache Druid stores data [partitioned by time chunk](../design/storage.md) and supports
overwriting existing data using time ranges. Data outside the replacement time range is not touched. Overwriting of
existing data is done using the same mechanisms as [batch ingestion](../ingestion/index.md#batch).

10 changes: 8 additions & 2 deletions docs/development/docs-contribute.md
@@ -101,7 +101,8 @@ Now you're up to date, and you can make your changes.
git checkout -b MY-BRANCH
```

-Provide a name for your feature branch in `MY-BRANCH`.
+Provide a name for your feature branch in `MY-BRANCH`.

2. Find the file that you want to make changes to. All the source files for the docs are written in Markdown and located in the `docs` directory. The URL for the page includes the subdirectory the source file is in. For example, the SQL-based ingestion tutorial found at `https://druid.apache.org/docs/latest/tutorials/tutorial-msq-extern.html` is in the `tutorials` subdirectory.

If you're adding a page, create a new Markdown file in the appropriate subdirectory. Then, copy the front matter and Apache license from an existing file. Update the `title` and `id` fields. Don't forget to add it to `website/sidebars.json` so that your new page shows up in the navigation.
@@ -111,6 +112,11 @@ Provide a name for your feature branch in `MY-BRANCH`.
5. Use the following commands to run the link and spellcheckers locally:

```bash
cd website
# You only need to install once
npm install
npm run build

npm run spellcheck
npm run link-lint
```
@@ -216,4 +222,4 @@ Before publishing new content or updating an existing topic, you can audit your
* When American spelling is different from Commonwealth/"British" spelling, use the American spelling.
* Don’t use terms considered disrespectful. Refer to a list like Google’s [Word list](https://developers.google.com/style/word-list) for guidance and alternatives.
* Use straight quotation marks and straight apostrophes instead of the curly versions.
-* Introduce a list, a table, or a procedure with an introductory sentence that prepares the reader for what they're about to read.
+* Introduce a list, a table, or a procedure with an introductory sentence that prepares the reader for what they're about to read.
2 changes: 1 addition & 1 deletion docs/development/experimental-features.md
@@ -47,7 +47,7 @@ Note that this document does not track the status of contrib extensions, all of

- [Configuration reference](../configuration/index.md#overlord-operations)
- [Task reference](../ingestion/tasks.md#locking)
-- [Design](../design/architecture.md#availability-and-consistency)
+- [Design](../design/storage.md#availability-and-consistency)

## Front coding

2 changes: 1 addition & 1 deletion docs/ingestion/index.md
@@ -24,7 +24,7 @@ sidebar_label: Overview
-->

Loading data in Druid is called _ingestion_ or _indexing_. When you ingest data into Druid, Druid reads the data from
-your source system and stores it in data files called [_segments_](../design/architecture.md#datasources-and-segments).
+your source system and stores it in data files called [_segments_](../design/segments.md).
In general, segment files contain a few million rows each.

For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes or the
8 changes: 4 additions & 4 deletions docs/ingestion/ingestion-spec.md
@@ -149,7 +149,7 @@ An example `dataSchema` is:
### `dataSource`

The `dataSource` is located in `dataSchema` → `dataSource` and is simply the name of the
-[datasource](../design/architecture.md#datasources-and-segments) that data will be written to. An example
+[datasource](../design/storage.md) that data will be written to. An example
`dataSource` is:

```
@@ -304,7 +304,7 @@ An example `metricsSpec` is:
The `granularitySpec` is located in `dataSchema` → `granularitySpec` and is responsible for configuring
the following operations:

-1. Partitioning a datasource into [time chunks](../design/architecture.md#datasources-and-segments) (via `segmentGranularity`).
+1. Partitioning a datasource into [time chunks](../design/storage.md) (via `segmentGranularity`).
2. Truncating the timestamp, if desired (via `queryGranularity`).
3. Specifying which time chunks of segments should be created, for batch ingestion (via `intervals`).
4. Specifying whether ingestion-time [rollup](./rollup.md) should be used or not (via `rollup`).
@@ -329,7 +329,7 @@ A `granularitySpec` can have the following components:
| Field | Description | Default |
|-------|-------------|---------|
| type |`uniform`| `uniform` |
-| segmentGranularity | [Time chunking](../design/architecture.md#datasources-and-segments) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk which can be optionally further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.<br /><br />Avoid `WEEK` granularity for data partitioning because weeks don't align neatly with months and years, making it difficult to change partitioning by coarser granularity. Instead, opt for other partitioning options such as `DAY` or `MONTH`, which offer more flexibility.| `day` |
+| segmentGranularity | [Time chunking](../design/storage.md) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk which can be optionally further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.<br /><br />Avoid `WEEK` granularity for data partitioning because weeks don't align neatly with months and years, making it difficult to change partitioning by coarser granularity. Instead, opt for other partitioning options such as `DAY` or `MONTH`, which offer more flexibility.| `day` |
| queryGranularity | The resolution of timestamp storage within each segment. This must be equal to, or finer, than `segmentGranularity`. This will be the finest granularity that you can query at and still receive sensible results, but note that you can still query at anything coarser than this granularity. E.g., a value of `minute` will mean that records will be stored at minutely granularity, and can be sensibly queried at any multiple of minutes (including minutely, 5-minutely, hourly, etc).<br /><br />Any [granularity](../querying/granularities.md) can be provided here. Use `none` to store timestamps as-is, without any truncation. Note that `rollup` will be applied if it is set even when the `queryGranularity` is set to `none`. | `none` |
| rollup | Whether to use ingestion-time [rollup](./rollup.md) or not. Note that rollup is still effective even when `queryGranularity` is set to `none`. Your data will be rolled up if rows have exactly the same timestamp. | `true` |
| intervals | A list of intervals defining time chunks for segments. Specify interval values using ISO8601 format. For example, `["2021-12-06T21:27:10+00:00/2021-12-07T00:00:00+00:00"]`. If you omit the time, the time defaults to "00:00:00".<br /><br />Druid breaks the list up and rounds off the list values based on the `segmentGranularity`.<br /><br />If `null` or not provided, batch ingestion tasks generally determine which time chunks to output based on the timestamps found in the input data.<br /><br />If specified, batch ingestion tasks may be able to skip a determining-partitions phase, which can result in faster ingestion. Batch ingestion tasks may also be able to request all their locks up-front instead of one by one. Batch ingestion tasks throw away any records with timestamps outside of the specified intervals.<br /><br />Ignored for any form of streaming ingestion. | `null` |
@@ -529,4 +529,4 @@ You can enable front coding with all types of ingestion. For information on defi
:::

Beyond these properties, each ingestion method has its own specific tuning properties. See the documentation for each
-[ingestion method](./index.md#ingestion-methods) for details.
+[ingestion method](./index.md#ingestion-methods) for details.
2 changes: 1 addition & 1 deletion docs/multi-stage-query/concepts.md
@@ -34,7 +34,7 @@ sidebar_label: "Key concepts"
The `druid-multi-stage-query` extension adds a multi-stage query (MSQ) task engine that executes SQL statements as batch
tasks in the indexing service, which execute on [Middle Managers](../design/architecture.md#druid-services).
[INSERT](reference.md#insert) and [REPLACE](reference.md#replace) tasks publish
-[segments](../design/architecture.md#datasources-and-segments) just like [all other forms of batch
+[segments](../design/storage.md) just like [all other forms of batch
ingestion](../ingestion/index.md#batch). Each query occupies at least two task slots while running: one controller task,
and at least one worker task. As an experimental feature, the MSQ task engine also supports running SELECT queries as
batch tasks. The behavior and result format of plain SELECT (without INSERT or REPLACE) is subject to change.
2 changes: 1 addition & 1 deletion docs/operations/basic-cluster-tuning.md
@@ -123,7 +123,7 @@ Be sure to check out [segment size optimization](./segment-optimization.md) to h

The biggest contributions to heap usage on Brokers are:
- Partial unmerged query results from Historicals and Tasks
-- The segment timeline: this consists of location information (which Historical/Task is serving a segment) for all currently [available](../design/architecture.md#segment-lifecycle) segments.
+- The segment timeline: this consists of location information (which Historical/Task is serving a segment) for all currently [available](../design/storage.md#segment-lifecycle) segments.
- Cached segment metadata: this consists of metadata, such as per-segment schemas, for all currently available segments.

The Broker heap requirements scale based on the number of segments in the cluster, and the total data size of the segments.
2 changes: 1 addition & 1 deletion docs/operations/web-console.md
@@ -26,7 +26,7 @@ Druid includes a web console for loading data, managing datasources and tasks, a
You can also run SQL and native Druid queries in the console.

Enable the following cluster settings to use the web console. Note that these settings are enabled by default.
-- Enable the Router's [management proxy](../design/router.md#enabling-the-management-proxy).
+- Enable the Router's [management proxy](../design/router.md#enable-the-management-proxy).
- Enable [Druid SQL](../configuration/index.md#sql) for the Broker processes in the cluster.

The [Router](../design/router.md) service hosts the web console.
2 changes: 1 addition & 1 deletion docs/querying/sql-data-types.md
@@ -77,7 +77,7 @@ When `druid.generic.useDefaultValueForNull = true` (legacy mode), Druid instead

## Arrays

-Druid supports [`ARRAY` types](arrays.md), which behave as standard SQL arrays, where results are grouped by matching entire arrays. The [`UNNEST` operator](./sql-array-functions.md#unn) can be used to perform operations on individual array elements, translating each element into a separate row.
+Druid supports [`ARRAY` types](arrays.md), which behave as standard SQL arrays, where results are grouped by matching entire arrays. The [`UNNEST` operator](./sql.md#unnest) can be used to perform operations on individual array elements, translating each element into a separate row.

`ARRAY` typed columns can be stored in segments with JSON-based ingestion using the 'auto' typed dimension schema shared with [schema auto-discovery](../ingestion/schema-design.md#schema-auto-discovery-for-dimensions) to detect and ingest arrays as ARRAY typed columns. For [SQL based ingestion](../multi-stage-query/index.md), the query context parameter `arrayIngestMode` must be specified as `"array"` to ingest ARRAY types. In Druid 28, the default mode for this parameter is `"mvd"` for backwards compatibility, which instead can only handle `ARRAY<STRING>` which it stores in [multi-value string columns](#multi-value-strings).
