ClickHouse · Blargian · Sep 19, 2025 · Sep 19, 2025
@@ -224,7 +224,7 @@ ORDER BY doc.update_date
 We provide a type hint for the `update_date` column in the JSON definition, as we use it in the ordering/primary key. This helps ClickHouse to know that this column won't be null and ensures it knows which `update_date` sub-column to use (there may be multiple for each type, so this is ambiguous otherwise).
 :::
 
-We can insert into this table and view the subsequently inferred schema using the [`JSONAllPathsWithTypes`](/sql-reference/functions/json-functions#jsonallpathswithtypes) function and [`PrettyJSONEachRow`](/interfaces/formats/PrettyJSONEachRow) output format:
+We can insert into this table and view the subsequently inferred schema using the [`JSONAllPathsWithTypes`](/sql-reference/functions/json-functions#JSONAllPathsWithTypes) function and [`PrettyJSONEachRow`](/interfaces/formats/PrettyJSONEachRow) output format:
 
 ```sql
 INSERT INTO arxiv FORMAT JSONAsObject 

@@ -101,7 +101,7 @@ input_format_parquet_case_insensitive_column_matching = 1 -- Column matching bet
 :::note Note on nested column structures
 The `VARIANT` and `OBJECT` columns in the original Snowflake table schema will be output as JSON strings by default, forcing us to cast these when inserting them into ClickHouse.
 
-Nested structures such as `some_file` are converted to JSON strings on copy by Snowflake. Importing this data requires us to transform these structures to Tuples at insert time in ClickHouse, using the [JSONExtract function](/sql-reference/functions/json-functions#jsonextract) as shown above.
+Nested structures such as `some_file` are converted to JSON strings on copy by Snowflake. Importing this data requires us to transform these structures to Tuples at insert time in ClickHouse, using the [JSONExtract function](/sql-reference/functions/json-functions#JSONExtract) as shown above.
 :::
 
 ## Test successful data export {#3-testing-successful-data-export}

@@ -42,6 +42,7 @@ by https://github.com/ClickHouse/clickhouse-docs/blob/main/scripts/autogenerate-
 | [Foursquare places](/getting-started/example-datasets/foursquare-places) | Dataset with over 100 million records containing information about places on a map, such as shops, restaurants, parks, playgrounds, and monuments. |
 | [GitHub Events Dataset](/getting-started/example-datasets/github-events) | Dataset containing all events on GitHub from 2011 to Dec 6 2020, with a size of 3.1 billion records. |
 | [Hacker News dataset](/getting-started/example-datasets/hacker-news) | Dataset containing 28 million rows of hacker news data. |
+| [Hacker News Vector Search dataset](/getting-started/example-datasets/hackernews-vector-search-dataset) | Dataset containing 28+ million Hacker News postings & their vector embeddings |
 | [LAION 5B dataset](/getting-started/example-datasets/laion-5b-dataset) | Dataset containing 100 million vectors from the LAION 5B dataset |
 | [Laion-400M dataset](/getting-started/example-datasets/laion-400m-dataset) | Dataset containing 400 million images with English image captions |
 | [New York Public Library "What's on the Menu?" Dataset](/getting-started/example-datasets/menus) | Dataset containing 1.3 million records of historical data on the menus of hotels, restaurants and cafes with the dishes along with their prices. |

@@ -70,7 +70,7 @@ SELECT JSONExtractString(tags, 'holidays') AS holidays FROM people
 1 row in set. Elapsed: 0.002 sec.
 ```
 
-Notice how the functions require both a reference to the `String` column `tags` and a path in the JSON to extract. Nested paths require functions to be nested e.g. `JSONExtractUInt(JSONExtractString(tags, 'car'), 'year')` which extracts the column `tags.car.year`. The extraction of nested paths can be simplified through the functions [`JSON_QUERY`](/sql-reference/functions/json-functions#json_query) and [`JSON_VALUE`](/sql-reference/functions/json-functions#json_value).
+Notice how the functions require both a reference to the `String` column `tags` and a path in the JSON to extract. Nested paths require functions to be nested e.g. `JSONExtractUInt(JSONExtractString(tags, 'car'), 'year')` which extracts the column `tags.car.year`. The extraction of nested paths can be simplified through the functions [`JSON_QUERY`](/sql-reference/functions/json-functions#JSON_QUERY) and [`JSON_VALUE`](/sql-reference/functions/json-functions#JSON_VALUE).
 
 Consider the extreme case with the `arxiv` dataset where we consider the entire body to be a `String`.
 

@@ -266,6 +266,7 @@ if [ -f "$FUNCTION_SQL_FILE" ]; then
       "Encryption"
       "Hash"
       "Introspection"
+      "JSON"
     )
 
     for CATEGORY in "${FUNCTION_CATEGORIES[@]}"; do
@@ -376,6 +377,7 @@ insert_src_files=(
   "encryption-functions.md"
   "hash-functions.md"
   "introspection-functions.md"
+  "json-functions.md"
 )
 
 insert_dest_files=(
@@ -394,6 +396,7 @@ insert_dest_files=(
     "docs/sql-reference/functions/encryption-functions.md"
     "docs/sql-reference/functions/hash-functions.md"
     "docs/sql-reference/functions/introspection.md"
+    "docs/sql-reference/functions/json-functions.md"
 )
 
 echo "[$SCRIPT_NAME] Inserting generated markdown content between AUTOGENERATED_START and AUTOGENERATED_END tags"