2 changes: 1 addition & 1 deletion docs/best-practices/json_type.md
@@ -9,7 +9,7 @@ show_related_blogs: true
doc_type: 'reference'
---

- ClickHouse now offers a native JSON column type designed for semi-structured and dynamic data. It's important to clarify that **this is a column type, not a data format**—you can insert JSON into ClickHouse as a string or via supported formats like [JSONEachRow](/docs/interfaces/formats/JSONEachRow), but that does not imply using the JSON column type. Users should only use the JSON type when the structure of their data is dynamic, not when they simply happen to store JSON.
+ ClickHouse now offers a native JSON column type designed for semi-structured and dynamic data. It's important to clarify that **this is a column type, not a data format**—you can insert JSON into ClickHouse as a string or via supported formats like [JSONEachRow](/interfaces/formats/JSONEachRow), but that does not imply using the JSON column type. Users should only use the JSON type when the structure of their data is dynamic, not when they simply happen to store JSON.
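
To make the distinction concrete, here is a minimal sketch (table and column names are illustrative, not from the original doc; depending on your ClickHouse version, the JSON type may need to be enabled first):

```sql
-- JSON *data* into ordinary typed columns: no JSON column type involved
CREATE TABLE events_typed (path String, hits UInt32) ENGINE = MergeTree ORDER BY path;
INSERT INTO events_typed FORMAT JSONEachRow {"path": "Bob_Dolman", "hits": 245}

-- The JSON *column type*, reserved for genuinely dynamic structure
CREATE TABLE events_dynamic (payload JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO events_dynamic FORMAT JSONEachRow {"payload": {"a": 1, "b": {"c": "x"}}}
```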

## When to use the JSON type {#when-to-use-the-json-type}

2 changes: 1 addition & 1 deletion docs/chdb/guides/querying-s3-bucket.md
@@ -49,7 +49,7 @@
If you pass just the bucket name, it will throw an exception.
:::

- We're also going to use the [`One`](/interfaces/formats#data-format-one) input format so that the file isn't parsed, instead a single row is returned per file and we can access the file via the `_file` virtual column and the path via the `_path` virtual column.
+ We're also going to use the [`One`](/interfaces/formats/One) input format so that the file isn't parsed; instead, a single row is returned per file, and we can access the file via the `_file` virtual column and the path via the `_path` virtual column.

⚠️ vale · ClickHouse.FutureTense — docs/chdb/guides/querying-s3-bucket.md, line 52: Instead of future tense 'going to use', use present tense.

```python
import chdb
# … (rest of the example elided in this diff)
```
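
The query that example runs is roughly of the following shape; this is a sketch only, with a placeholder bucket URL rather than the one used in this guide:

```sql
-- One row per file and no parsing; _file and _path are virtual columns
-- populated by the s3 table function.
SELECT _file, _path
FROM s3('https://mybucket.s3.amazonaws.com/data/*.parquet', One)
```
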
2 changes: 1 addition & 1 deletion docs/integrations/data-ingestion/clickpipes/kinesis.md
@@ -86,7 +86,7 @@ You have familiarized yourself with the [ClickPipes intro](./index.md) and setup
## Supported data formats {#supported-data-formats}

The supported formats are:
- - [JSON](../../../interfaces/formats.md/#json)
+ - [JSON](/interfaces/formats/JSON)

## Supported data types {#supported-data-types}

@@ -15,7 +15,7 @@ Apache has released multiple data formats actively used in analytics environment

ClickHouse supports reading and writing [Apache Avro](https://avro.apache.org/) data files, which are widely used in Hadoop systems.

- To import from an [avro file](assets/data.avro), we should use [Avro](/interfaces/formats.md/#data-format-avro) format in the `INSERT` statement:
+ To import from an [Avro file](assets/data.avro), we use the [Avro](/interfaces/formats/Avro) format in the `INSERT` statement:

```sql
INSERT INTO sometable
-- … (remainder elided in diff)
```

@@ -70,7 +70,7 @@ LIMIT 3;
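
A plausible full form of the truncated statement above, sketched from the surrounding text (the file name comes from the link earlier in this section):

```sql
INSERT INTO sometable
FROM INFILE 'data.avro'
FORMAT Avro
```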

### Avro messages in Kafka {#avro-messages-in-kafka}

- When Kafka messages use Avro format, ClickHouse can read such streams using [AvroConfluent](/interfaces/formats.md/#data-format-avro-confluent) format and [Kafka](/engines/table-engines/integrations/kafka.md) engine:
+ When Kafka messages use the Avro format, ClickHouse can read such streams using the [AvroConfluent](/interfaces/formats/AvroConfluent) format and the [Kafka](/engines/table-engines/integrations/kafka.md) engine:

```sql
CREATE TABLE some_topic_stream
-- … (remainder elided in diff)
```

@@ -87,7 +87,7 @@ kafka_format = 'AvroConfluent';
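
Filled out, such a table definition might look like the following sketch; the broker, topic, consumer group, columns, and Schema Registry URL are all assumptions for illustration:

```sql
CREATE TABLE some_topic_stream
(
    field1 UInt32,
    field2 String
)
ENGINE = Kafka()
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'some_topic',
         kafka_group_name = 'some_group',
         kafka_format = 'AvroConfluent',
         format_avro_schema_registry_url = 'http://localhost:8081';
```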

## Working with Arrow format {#working-with-arrow-format}

- Another columnar format is [Apache Arrow](https://arrow.apache.org/), also supported by ClickHouse for import and export. To import data from an [Arrow file](assets/data.arrow), we use the [Arrow](/interfaces/formats.md/#data-format-arrow) format:
+ Another columnar format is [Apache Arrow](https://arrow.apache.org/), also supported by ClickHouse for import and export. To import data from an [Arrow file](assets/data.arrow), we use the [Arrow](/interfaces/formats/Arrow) format:

```sql
INSERT INTO sometable
-- … (remainder elided in diff)
```

@@ -107,7 +107,7 @@ Also, check [data types matching](/interfaces/formats/Arrow#data-types-matching)
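
A plausible completion of that statement (the file name comes from the link above):

```sql
INSERT INTO sometable
FROM INFILE 'data.arrow'
FORMAT Arrow
```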

### Arrow data streaming {#arrow-data-streaming}

- The [ArrowStream](/interfaces/formats.md/#data-format-arrow-stream) format can be used to work with Arrow streaming (used for in-memory processing). ClickHouse can read and write Arrow streams.
+ The [ArrowStream](/interfaces/formats/ArrowStream) format can be used to work with Arrow streaming (used for in-memory processing). ClickHouse can read and write Arrow streams.

To demonstrate how ClickHouse can stream Arrow data, let's pipe it to the following Python script (it reads an input stream in Arrow streaming format and outputs the result as a Pandas table):

@@ -140,7 +140,7 @@ We've used `arrow-stream` as a possible source of Arrow streaming data.
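
On the ClickHouse side, the stream fed to that script is produced by an ordinary query emitting `ArrowStream`; a sketch, with the table and column names assumed from earlier examples:

```sql
SELECT path, month, hits
FROM sometable
FORMAT ArrowStream
```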

## Importing and exporting ORC data {#importing-and-exporting-orc-data}

- [Apache ORC](https://orc.apache.org/) format is a columnar storage format typically used for Hadoop. ClickHouse supports importing as well as exporting [Orc data](assets/data.orc) using [ORC format](/interfaces/formats.md/#data-format-orc):
+ The [Apache ORC](https://orc.apache.org/) format is a columnar storage format typically used for Hadoop. ClickHouse supports importing as well as exporting [ORC data](assets/data.orc) using the [ORC format](/interfaces/formats/ORC):

```sql
SELECT *
-- … (remainder elided in diff)
```
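
The truncated export probably resembles this sketch (output file name assumed):

```sql
SELECT * FROM sometable
INTO OUTFILE 'export.orc'
FORMAT ORC
```
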
18 changes: 9 additions & 9 deletions docs/integrations/data-ingestion/data-formats/binary.md
@@ -16,7 +16,7 @@

## Exporting in a Native ClickHouse format {#exporting-in-a-native-clickhouse-format}

- The most efficient data format to export and import data between ClickHouse nodes is [Native](/interfaces/formats.md/#native) format. Exporting is done using `INTO OUTFILE` clause:
+ The most efficient data format to export and import data between ClickHouse nodes is the [Native](/interfaces/formats/Native) format. Exporting is done using the `INTO OUTFILE` clause:

```sql
SELECT * FROM some_data
-- … (remainder elided in diff)
```

@@ -74,7 +74,7 @@
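
A plausible completion (output file name assumed):

```sql
SELECT * FROM some_data
INTO OUTFILE 'data.native'
FORMAT Native
```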

## Exporting to RowBinary {#exporting-to-rowbinary}

- Another binary format supported is [RowBinary](/interfaces/formats.md/#rowbinary), which allows importing and exporting data in binary-represented rows:
+ Another supported binary format is [RowBinary](/interfaces/formats/RowBinary), which allows importing and exporting data in binary-represented rows:

```sql
SELECT * FROM some_data
-- … @@ -101,7 +101,7 @@ (lines elided in diff; sample output follows)
└────────────────────────────────┴────────────┴──────┘
```

- Consider using [RowBinaryWithNames](/interfaces/formats.md/#rowbinarywithnames), which also adds a header row with a columns list. [RowBinaryWithNamesAndTypes](/interfaces/formats.md/#rowbinarywithnamesandtypes) will also add an additional header row with column types.
+ Consider using [RowBinaryWithNames](/interfaces/formats/RowBinaryWithNames), which also adds a header row with a column list. [RowBinaryWithNamesAndTypes](/interfaces/formats/RowBinaryWithNamesAndTypes) will also add an additional header row with column types.

⚠️ vale · ClickHouse.FutureTense — docs/integrations/data-ingestion/data-formats/binary.md, line 104: Instead of future tense 'will also', use present tense.
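
For example, exporting with both header rows might look like this sketch (file name assumed):

```sql
SELECT * FROM some_data
INTO OUTFILE 'data_with_headers.binary'
FORMAT RowBinaryWithNamesAndTypes
```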

### Importing from RowBinary files {#importing-from-rowbinary-files}
To load data from a RowBinary file, we can use a `FROM INFILE` clause:
@@ -115,7 +115,7 @@
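
The elided example presumably resembles this sketch (file name assumed):

```sql
INSERT INTO some_data
FROM INFILE 'data.binary'
FORMAT RowBinary
```
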
## Importing single binary value using RawBLOB {#importing-single-binary-value-using-rawblob}

Suppose we want to read an entire binary file and save it into a field in a table.
- This is the case when the [RawBLOB format](/interfaces/formats.md/#rawblob) can be used. This format can be directly used with a single-column table only:
+ This is the case when the [RawBLOB format](/interfaces/formats/RawBLOB) can be used. This format can only be used directly with a single-column table:

```sql
CREATE TABLE images(data String) ENGINE = Memory
-- … (remainder elided in diff)
```

@@ -152,7 +152,7 @@
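
Importing a file into that single-column table might then look like this sketch (file name assumed):

```sql
INSERT INTO images
FROM INFILE 'image.jpg'
FORMAT RawBLOB
```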

## MessagePack {#messagepack}

- ClickHouse supports importing and exporting to [MessagePack](https://msgpack.org/) using the [MsgPack](/interfaces/formats.md/#msgpack). To export to MessagePack format:
+ ClickHouse supports importing and exporting to [MessagePack](https://msgpack.org/) using the [MsgPack](/interfaces/formats/MsgPack) format. To export to MessagePack format:

```sql
SELECT *
-- … (remainder elided in diff)
```

@@ -173,7 +173,7 @@
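
A plausible completion (output file name assumed):

```sql
SELECT * FROM some_data
INTO OUTFILE 'data.msgpk'
FORMAT MsgPack
```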

<CloudNotSupportedBadge/>

- To work with [Protocol Buffers](/interfaces/formats.md/#protobuf) we first need to define a [schema file](assets/schema.proto):
+ To work with [Protocol Buffers](/interfaces/formats/Protobuf), we first need to define a [schema file](assets/schema.proto):

```protobuf
syntax = "proto3";
// … @@ -185,7 +185,7 @@ (lines elided in diff)
};
```

- Path to this schema file (`schema.proto` in our case) is set in a `format_schema` settings option for the [Protobuf](/interfaces/formats.md/#protobuf) format:
+ The path to this schema file (`schema.proto` in our case) is set in the `format_schema` settings option for the [Protobuf](/interfaces/formats/Protobuf) format:

```sql
SELECT * FROM some_data
-- … @@ -194,7 +194,7 @@ (lines elided in diff)
SETTINGS format_schema = 'schema:MessageType'
```
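
Restoring the elided middle lines, the full statement plausibly reads as follows, consistent with the `proto.bin` file mentioned just below:

```sql
SELECT * FROM some_data
INTO OUTFILE 'proto.bin'
FORMAT Protobuf
SETTINGS format_schema = 'schema:MessageType'
```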

- This saves data to the [proto.bin](assets/proto.bin) file. ClickHouse also supports importing Protobuf data as well as nested messages. Consider using [ProtobufSingle](/interfaces/formats.md/#protobufsingle) to work with a single Protocol Buffer message (length delimiters will be omitted in this case).
+ This saves data to the [proto.bin](assets/proto.bin) file. ClickHouse also supports importing Protobuf data as well as nested messages. Consider using [ProtobufSingle](/interfaces/formats/ProtobufSingle) to work with a single Protocol Buffer message (length delimiters will be omitted in this case).

⚠️ vale · ClickHouse.FutureTense — docs/integrations/data-ingestion/data-formats/binary.md, line 197: Instead of future tense 'will be', use present tense.
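
A sketch of exporting exactly one message that way (output file name assumed; the schema reference is reused from above):

```sql
SELECT * FROM some_data
LIMIT 1
INTO OUTFILE 'message.bin'
FORMAT ProtobufSingle
SETTINGS format_schema = 'schema:MessageType'
```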

## Cap'n Proto {#capn-proto}

@@ -212,7 +212,7 @@
```capnp
// … (schema body elided in diff)
}
```

- Now we can import and export using [CapnProto](/interfaces/formats.md/#capnproto) format and this schema:
+ Now we can import and export using the [CapnProto](/interfaces/formats/CapnProto) format and this schema:

```sql
SELECT
-- … (remainder elided in diff)
```
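
A plausible completion; the struct name in the schema reference is an assumption:

```sql
SELECT
    path,
    month,
    hits
FROM some_data
FORMAT CapnProto
SETTINGS format_schema = 'schema:PathStats'
```
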
18 changes: 9 additions & 9 deletions docs/integrations/data-ingestion/data-formats/csv-tsv.md
@@ -31,7 +31,7 @@ To import data from the [CSV file](assets/data_small.csv) to the `sometable` tab
clickhouse-client -q "INSERT INTO sometable FORMAT CSV" < data_small.csv
```

- Note that we use [FORMAT CSV](/interfaces/formats.md/#csv) to let ClickHouse know we're ingesting CSV formatted data. Alternatively, we can load data from a local file using the [FROM INFILE](/sql-reference/statements/insert-into.md/#inserting-data-from-a-file) clause:
+ Note that we use [FORMAT CSV](/interfaces/formats/CSV) to let ClickHouse know we're ingesting CSV-formatted data. Alternatively, we can load data from a local file using the [FROM INFILE](/sql-reference/statements/insert-into.md/#inserting-data-from-a-file) clause:

```sql
INSERT INTO sometable
-- … (remainder elided in diff)
```

@@ -59,7 +59,7 @@ head data-small-headers.csv
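
A plausible full form of that truncated statement (the file name comes from the surrounding text):

```sql
INSERT INTO sometable
FROM INFILE 'data_small.csv'
FORMAT CSV
```
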
"Aegithina_tiphia","2018-02-01",34
```

- To import data from this file, we can use [CSVWithNames](/interfaces/formats.md/#csvwithnames) format:
+ To import data from this file, we can use the [CSVWithNames](/interfaces/formats/CSVWithNames) format:

```bash
clickhouse-client -q "INSERT INTO sometable FORMAT CSVWithNames" < data_small_headers.csv
Expand Down Expand Up @@ -153,17 +153,17 @@ SELECT * FROM file('nulls.csv')

## TSV (tab-separated) files {#tsv-tab-separated-files}

- Tab-separated data format is widely used as a data interchange format. To load data from a [TSV file](assets/data_small.tsv) to ClickHouse, the [TabSeparated](/interfaces/formats.md/#tabseparated) format is used:
+ The tab-separated data format is widely used as a data interchange format. To load data from a [TSV file](assets/data_small.tsv) into ClickHouse, the [TabSeparated](/interfaces/formats/TabSeparated) format is used:

```bash
clickhouse-client -q "INSERT INTO sometable FORMAT TabSeparated" < data_small.tsv
```

- There's also a [TabSeparatedWithNames](/interfaces/formats.md/#tabseparatedwithnames) format to allow working with TSV files that have headers. And, like for CSV, we can skip the first X lines using the [input_format_tsv_skip_first_lines](/operations/settings/settings-formats.md/#input_format_tsv_skip_first_lines) option.
+ There's also a [TabSeparatedWithNames](/interfaces/formats/TabSeparatedWithNames) format for working with TSV files that have headers. And, as with CSV, we can skip the first N lines using the [input_format_tsv_skip_first_lines](/operations/settings/settings-formats.md/#input_format_tsv_skip_first_lines) option, as sketched below.
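
For example, a sketch that skips a one-line header on import; the file name is an assumption, and `file()` reads from the server's user files directory:

```sql
INSERT INTO sometable
SELECT * FROM file('data_small_headers.tsv', TabSeparated)
SETTINGS input_format_tsv_skip_first_lines = 1
```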

### Raw TSV {#raw-tsv}

- Sometimes, TSV files are saved without escaping tabs and line breaks. We should use [TabSeparatedRaw](/interfaces/formats.md/#tabseparatedraw) to handle such files.
+ Sometimes, TSV files are saved without escaping tabs and line breaks. We should use [TabSeparatedRaw](/interfaces/formats/TabSeparatedRaw) to handle such files.

## Exporting to CSV {#exporting-to-csv}

@@ -183,7 +183,7 @@ FORMAT CSV
"2016_Greater_Western_Sydney_Giants_season","2017-05-01",86
```

- To add a header to the CSV file, we use the [CSVWithNames](/interfaces/formats.md/#csvwithnames) format:
+ To add a header to the CSV file, we use the [CSVWithNames](/interfaces/formats/CSVWithNames) format:

```sql
SELECT *
-- … (remainder elided in diff)
```

@@ -273,7 +273,7 @@ All column types will be treated as a `String` in this case.
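
A sketch of that export (output file name assumed):

```sql
SELECT * FROM sometable
INTO OUTFILE 'out.csv'
FORMAT CSVWithNames
```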

### Exporting and importing CSV with explicit column types {#exporting-and-importing-csv-with-explicit-column-types}

- ClickHouse also allows explicitly setting column types when exporting data using [CSVWithNamesAndTypes](/interfaces/formats.md/#csvwithnamesandtypes) (and other *WithNames formats family):
+ ClickHouse also allows explicitly setting column types when exporting data using [CSVWithNamesAndTypes](/interfaces/formats/CSVWithNamesAndTypes) (and other formats in the `*WithNames` family):

```sql
SELECT *
-- … (remainder elided in diff)
```

@@ -308,7 +308,7 @@ Now ClickHouse identifies column types based on a (second) header row instead of
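
A sketch (output file name assumed); the second header row carries the column types:

```sql
SELECT * FROM sometable
INTO OUTFILE 'data_with_types.csv'
FORMAT CSVWithNamesAndTypes
```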

## Custom delimiters, separators, and escaping rules {#custom-delimiters-separators-and-escaping-rules}

- In sophisticated cases, text data can be formatted in a highly custom manner but still have a structure. ClickHouse has a special [CustomSeparated](/interfaces/formats.md/#format-customseparated) format for such cases, which allows setting custom escaping rules, delimiters, line separators, and starting/ending symbols.
+ In sophisticated cases, text data can be formatted in a highly custom manner but still have a structure. ClickHouse has a special [CustomSeparated](/interfaces/formats/CustomSeparated) format for such cases, which allows setting custom escaping rules, delimiters, line separators, and starting/ending symbols.

Suppose we have the following data in the file:

@@ -341,7 +341,7 @@ LIMIT 3
```
└───────────────────────────┴────────────┴─────┘
```
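
The elided query presumably resembles this sketch; the delimiter values are illustrative assumptions, supplied through the `format_custom_*` settings:

```sql
SELECT * FROM file('data.txt', CustomSeparated)
LIMIT 3
SETTINGS format_custom_field_delimiter = ';',
         format_custom_row_before_delimiter = 'row: (',
         format_custom_row_after_delimiter = ')',
         format_custom_escaping_rule = 'Quoted'
```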

- We can also use [CustomSeparatedWithNames](/interfaces/formats.md/#customseparatedwithnames) to get headers exported and imported correctly. Explore [regex and template](templates-regex.md) formats to deal with even more complex cases.
+ We can also use [CustomSeparatedWithNames](/interfaces/formats/CustomSeparatedWithNames) to get headers exported and imported correctly. Explore [regex and template](templates-regex.md) formats to deal with even more complex cases.

## Working with large CSV files {#working-with-large-csv-files}

@@ -8,7 +8,7 @@

# Exporting JSON

- Almost any JSON format used for import can be used for export as well. The most popular is [`JSONEachRow`](/interfaces/formats.md/#jsoneachrow):
+ Almost any JSON format used for import can be used for export as well. The most popular is [`JSONEachRow`](/interfaces/formats/JSONEachRow):

```sql
SELECT * FROM sometable FORMAT JSONEachRow
-- … @@ -19,7 +19,7 @@ (lines elided in diff; sample output follows)
{"path":"Ahmadabad-e_Kalij-e_Sofla","month":"2017-01-01","hits":3}
```

- Or we can use [`JSONCompactEachRow`](/interfaces/formats#jsoncompacteachrow) to save disk space by skipping column names:
+ Or we can use [`JSONCompactEachRow`](/interfaces/formats/JSONCompactEachRow) to save disk space by skipping column names:

```sql
SELECT * FROM sometable FORMAT JSONCompactEachRow
-- … (output elided in diff)
```

@@ -32,7 +32,7 @@

## Overriding data types as strings {#overriding-data-types-as-strings}

- ClickHouse respects data types and will export JSON accordingly to standards. But in cases where we need to have all values encoded as strings, we can use the [JSONStringsEachRow](/interfaces/formats.md/#jsonstringseachrow) format:
+ ClickHouse respects data types and will export JSON according to standards. But in cases where we need all values encoded as strings, we can use the [JSONStringsEachRow](/interfaces/formats/JSONStringsEachRow) format:

⚠️ vale · ClickHouse.FutureTense — docs/integrations/data-ingestion/data-formats/json/exporting.md, line 35: Instead of future tense 'will export', use present tense.

```sql
SELECT * FROM sometable FORMAT JSONStringsEachRow
-- … (output elided in diff)
```

@@ -56,7 +56,7 @@
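
The elided output would look roughly like this; note that every value, numbers included, is rendered as a string (the row is borrowed from the earlier example):

```
{"path":"Ahmadabad-e_Kalij-e_Sofla","month":"2017-01-01","hits":"3"}
```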

## Exporting metadata together with data {#exporting-metadata-together-with-data}

- General [JSON](/interfaces/formats.md/#json) format, which is popular in apps, will export not only resulting data but column types and query stats:
+ The general [JSON](/interfaces/formats/JSON) format, which is popular in apps, will export not only the resulting data but also column types and query stats:

⚠️ vale · ClickHouse.FutureTense — docs/integrations/data-ingestion/data-formats/json/exporting.md, line 59: Instead of future tense 'will export', use present tense.

```sql
SELECT * FROM sometable FORMAT JSON
-- … @@ -93,7 +93,7 @@ (lines elided in diff)
}
```

- The [JSONCompact](/interfaces/formats.md/#jsoncompact) format will print the same metadata but use a compacted form for the data itself:
+ The [JSONCompact](/interfaces/formats/JSONCompact) format will print the same metadata but use a compacted form for the data itself:

⚠️ vale · ClickHouse.FutureTense — docs/integrations/data-ingestion/data-formats/json/exporting.md, line 96: Instead of future tense 'will print', use present tense.

```sql
SELECT * FROM sometable FORMAT JSONCompact
-- … @@ -127,11 +127,11 @@ (lines elided in diff)
}
```

- Consider [`JSONStrings`](/interfaces/formats.md/#jsonstrings) or [`JSONCompactStrings`](/interfaces/formats.md/#jsoncompactstrings) variants to encode all values as strings.
+ Consider the [`JSONStrings`](/interfaces/formats/JSONStrings) or [`JSONCompactStrings`](/interfaces/formats/JSONCompactStrings) variants to encode all values as strings.

## Compact way to export JSON data and structure {#compact-way-to-export-json-data-and-structure}

- A more efficient way to have data, as well as it's structure, is to use [`JSONCompactEachRowWithNamesAndTypes`](/interfaces/formats.md/#jsoncompacteachrowwithnamesandtypes) format:
+ A more efficient way to export both the data and its structure is to use the [`JSONCompactEachRowWithNamesAndTypes`](/interfaces/formats/JSONCompactEachRowWithNamesAndTypes) format:

```sql
SELECT * FROM sometable FORMAT JSONCompactEachRowWithNamesAndTypes
-- … (output elided in diff)
```
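
Its first two output rows carry the column names and types, followed by the data rows; a sketch of the shape, with the types assumed:

```
["path", "month", "hits"]
["String", "Date", "UInt64"]
["Ahmadabad-e_Kalij-e_Sofla", "2017-01-01", 3]
```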