Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document all ClickHouse functions. #1833

Open
johnnymatthews opened this issue Jan 9, 2024 · 4 comments
Open

Document all ClickHouse functions. #1833

johnnymatthews opened this issue Jan 9, 2024 · 4 comments
Assignees
Labels
functions-project Issues specific to the functions documentation project

Comments

@johnnymatthews
Copy link
Contributor

johnnymatthews commented Jan 9, 2024

Problem

There are many functions in ClickHouse, but not all are documented. See #1682 #1512 as examples. This issue serves as the main holding-issue for many child/sub-issues. Essentially, if you want to know which functions have recently had documentation added, you should treat this issue as the source-of-truth.

Solution

Document 100% of the functions in ClickHouse.

Process

  1. List all the functions that exist in ClickHouse.
  2. List all the functions that have docs.
  3. Create a list of functions that don't have docs.
  4. Loop through this new list to write a description and syntax example for each function.
    1. To get this info, check the tests that each function was written with.
    2. Check in with the author of each function to get a background on why it was created.
  5. Repeat step 4 until all the functions have docs.

Scope

List each function, what it does, and a syntax example.

Example

Here is a markdown example of what a function should look like the following (a copy of this is available in the clickhouse/clickhouse repository at /docs/en/sql-reference/functions/_function-template.md):

## Example

This function serves as an example of what function documentation should look like.

**Syntax**

```sql
Example(input, secondary_input)
```

**Parameters**

- `input`: This is an example input. [String literal](../syntax#syntax-string-literal)
- `secondary_input`: This is another example input. [Expression](../syntax#syntax-expressions)

**Returned value**

This function doesn't exists. But if it did, they it would return a bool.

**Implementation details**

Any additional information pertaining to this function that users might need to know about. Things like known-bugs or edge cases would fit into this section. For example, if a function used _bankers rounding_ instead of standard rounding.

**Example**

Query:

```sql
CREATE TABLE example_table
(
    id UUID,
    name String,
) ENGINE = MergeTree
ORDER BY id;

INSERT INTO example_table VALUES (generateUUIDv4(), Example());
SELECT * FROM example_table;
```

```response
9e22db8c-d343-4bc5-8a95-501f635c0fc4	Example
```

Things to note:

  1. The arguments must end in a valid type and link to that type within the docs.
  2. The example query must be valid and complete; a user should be able to copy and paste the query directly into fiddle.clickhouse.com and have the query run without any modifications.
  3. The example result must be similar to what the user can expect to see when running it in fiddle.clickhouse.com. Some functions, such as generateUUIDv4(), will return a random string; as such, you shouldn't try to have your example response match what the user might see.

Out of scope

We will not:

  • Create a fancy new way of searching through functions.
  • Expand on or update existing function docs.
  • Create in-depth guides or tutorials for each function.
  • Add docs directly into clickhouse/clickhouse code.
  • Showcase specific functions.

These items are valid points to improve the docs, but they're entirely separate projects that require planning and setup of their own.

Timeline

This project should be completed in Q1 2024. So, by the end of March 2024.

@johnnymatthews johnnymatthews self-assigned this Jan 9, 2024
@johnnymatthews
Copy link
Contributor Author

johnnymatthews commented Jan 23, 2024

After going through all the available ClickHouse functions, and searching through the docs for appropriate documentation, I've concluded that we have 449 functions without adequate documentation. See this Google Sheet. Check the Info tab for details on how the existing docs were judged.

449 function is a lot, especially given the size of the docs team. Without significant crowd-sourcing methods, it's unlikely that we'll get all of these missing functions documented by the end of March 2024.

@johnnymatthews
Copy link
Contributor Author

johnnymatthews commented Jan 23, 2024

Here's a complete list of the missing functions:

  • FROM_UNIXTIME

See #1919.


  • JSON_QUERY
  • JSON_VALUE

See #1920.


  • L2SquaredNorm

See #1921.


  • TimeDiff

See #1922.


  • UTC_timestamp

See #1923.


  • VAR_POP
  • VAR_SAMP

See #1924.


  • YEAR

See #1925.


  • __bitBoolMaskAnd
  • __bitBoolMaskOr
  • __bitSwapLastTwo
  • __bitWrapperFunc

See #1926.


  • __getScalar

See #1927.


  • addTupleOfIntervals

See #1928.


  • aggThrow

See #1929.


  • analysisOfVariance

See #1930.


  • and

See #1931.


  • anova

See #1932.


  • anyLast_respect_nulls
  • any_respect_nulls
  • any_value
  • any_value_respect_nulls

See #1933.


  • array
  • arrayDotProduct
  • arrayEnumerateDenseRanked
  • arrayEnumerateUniqRanked
  • arrayFirstOrNull
  • arrayFlatten
  • arrayLastOrNull
  • arrayPartialShuffle
  • arrayShuffle

See #1934.


  • caseWithExpr
  • caseWithExpression
  • caseWithoutExpr
  • caseWithoutExpression

See #1935.


  • connectionId
  • connection_id

See #1936.


  • corrMatrix
  • corrStable

See #1937.


  • countMatchesCaseInsensitive
  • countSubstringsCaseInsensitive
  • countSubstringsCaseInsensitiveUTF8

See #1938.


  • covarPopMatrix
  • covarPopStable
  • covarSampMatrix
  • covarSampStable

  • cramersVBiasCorrected

See #1940.


  • currentSchemas
  • current_schemas

See #1941.


  • cutToFirstSignificantSubdomainCustomRFC
  • cutToFirstSignificantSubdomainCustomWithWWWRFC
  • cutToFirstSignificantSubdomainRFC
  • cutToFirstSignificantSubdomainWithWWWRFC

See #1942.


  • dense_rank

See #1943.


  • detectProgrammingLanguage
  • detectTonality

See #1944.


  • dictGet
  • dictGetDate
  • dictGetDateOrDefault
  • dictGetDateTime
  • dictGetDateTimeOrDefault
  • dictGetFloat32
  • dictGetFloat32OrDefault
  • dictGetFloat64
  • dictGetFloat64OrDefault
  • dictGetIPv4
  • dictGetIPv4OrDefault
  • dictGetIPv6
  • dictGetIPv6OrDefault
  • dictGetInt16
  • dictGetInt16OrDefault
  • dictGetInt32
  • dictGetInt32OrDefault
  • dictGetInt64
  • dictGetInt64OrDefault
  • dictGetInt8
  • dictGetInt8OrDefault
  • dictGetString
  • dictGetStringOrDefault
  • dictGetUInt16
  • dictGetUInt16OrDefault
  • dictGetUInt32
  • dictGetUInt32OrDefault
  • dictGetUInt64
  • dictGetUInt64OrDefault
  • dictGetUInt8
  • dictGetUInt8OrDefault
  • dictGetUUID
  • dictGetUUIDOrDefault

See #1918.


  • displayName

See #1945.


  • domainRFC
  • domainWithoutWWWRFC

See #1946.


  • e

See #1947.


  • emptyArrayFloat32
  • emptyArrayFloat64
  • emptyArrayInt16
  • emptyArrayInt32
  • emptyArrayInt64
  • emptyArrayInt8
  • emptyArrayString
  • emptyArrayToSingle
  • emptyArrayUInt16
  • emptyArrayUInt32
  • emptyArrayUInt64
  • emptyArrayUInt8

See #1948.


  • exp
  • exponentialTimeDecayedAvg
  • exponentialTimeDecayedCount
  • exponentialTimeDecayedMax
  • exponentialTimeDecayedSum

See #1949.


  • extract
  • extractGroups

See #1950.


  • filesystemUnreserved

See #1951.


  • firstSignificantSubdomainCustomRFC
  • firstSignificantSubdomainRFC
  • first_value_respect_nulls

See #1953.


  • flameGraph

See #1954.


  • flattenTuple

See #1955.


  • formatQueryOrNull
  • formatQuerySingleLineOrNull

See #1956.


  • fromUnixTimestamp64Micro
  • fromUnixTimestamp64Milli

See #1957.


  • fullHostName

See #1958.


  • getSubcolumn
  • getTypeSerializationStreams

See #1959.


  • globalIn
  • globalInIgnoreSet
  • globalNotIn
  • globalNotInIgnoreSet
  • globalNotNullIn
  • globalNotNullInIgnoreSet
  • globalNullIn
  • globalNullInIgnoreSet
  • globalVariable

See #1960.


  • greater

See #1961.


  • hasColumnInTable
  • hasSubsequenceCaseInsensitive
  • hasSubsequenceCaseInsensitiveUTF8
  • hasSubsequenceUTF8
  • hasThreadFuzzer
  • hasToken
  • hasTokenCaseInsensitive
  • hasTokenCaseInsensitiveOrNull
  • hasTokenOrNull

See #1962.


  • ignore

See #1963.


  • in
  • inIgnoreSet

See #1964.


  • initcap

See #1965.


  • intHash32
  • intHash64

See #1966.


  • isNotDistinctFrom
  • isNullable

See #1967.


  • joinGetOrNull

See #1968.


  • kostikConsistentHash

See #1969.


  • kql_array_sort_asc
  • kql_array_sort_desc

See #1970.


  • lagInFrame

See #1971.


  • last_value_respect_nulls

See #1972.


  • lcase

See #1973.


  • leadInFrame

See #1974.


  • left
  • leftUTF8

See #1975.


  • lengthUTF8

See #1976.


  • less
  • lessOrEquals

See #1977.


  • lowCardinalityIndices
  • lowCardinalityKeys

See #1978.


  • makeDate32
  • makeDateTime64

See #1979.


  • mapPartialReverseSort
  • mapPartialSort

See #1980.


  • materialize

See #1981.


  • maxMappedArrays
  • minMappedArrays

See #1982.


  • moduloLegacy

See #1983.


  • mortonDecode
  • mortonEncode

See #1984.


  • multiSearchAllPositionsCaseInsensitive
  • multiSearchAllPositionsCaseInsensitiveUTF8
  • multiSearchAllPositionsUTF8
  • multiSearchAnyCaseInsensitive
  • multiSearchAnyCaseInsensitiveUTF8
  • multiSearchAnyUTF8
  • multiSearchFirstIndexCaseInsensitive
  • multiSearchFirstIndexCaseInsensitiveUTF8
  • multiSearchFirstIndexUTF8
  • multiSearchFirstPositionCaseInsensitive
  • multiSearchFirstPositionCaseInsensitiveUTF8
  • multiSearchFirstPositionUTF8

See #1985.


  • ngramDistanceCaseInsensitive
  • ngramDistanceCaseInsensitiveUTF8
  • ngramDistanceUTF8
  • ngramSearchCaseInsensitive
  • ngramSearchCaseInsensitiveUTF8
  • ngramSearchUTF8

See #1986.


  • nonNegativeDerivative

See #1987.


  • normL2Squared

See #1988.


  • normalizeQueryKeepNames
  • normalizedQueryHashKeepNames

See #1989.


  • notInIgnoreSet
  • notNullIn
  • notNullInIgnoreSet

See #1990.


  • nth_value

See #1991.


  • nullIn
  • nullInIgnoreSet

See #1992.


  • or

See #1993.


  • partitionId

See #1994.


  • portRFC

See #1995.


  • proportionsZTest

See #1996.


  • protocol

See #1997.


  • rand
  • rand32
  • rand64
  • randCanonical

See #1998.


  • rank

See #1999.


  • readWKTPoint
  • readWKTRing

See #2000.


  • regionHierarchy
  • regionIn
  • regionToArea
  • regionToCity
  • regionToContinent
  • regionToCountry
  • regionToDistrict
  • regionToName
  • regionToPopulation
  • regionToTopContinent

See #2001.


  • reinterpretAsDate
  • reinterpretAsDateTime
  • reinterpretAsFixedString
  • reinterpretAsFloat32
  • reinterpretAsFloat64
  • reinterpretAsInt128
  • reinterpretAsInt16
  • reinterpretAsInt256
  • reinterpretAsInt32
  • reinterpretAsInt64
  • reinterpretAsInt8
  • reinterpretAsString
  • reinterpretAsUInt128
  • reinterpretAsUInt16
  • reinterpretAsUInt256
  • reinterpretAsUInt32
  • reinterpretAsUInt64
  • reinterpretAsUInt8
  • reinterpretAsUUID

See #2002.


  • reverseDNSQuery

See #2003.


  • revision

See #2004.


  • right
  • rightUTF8

See #2005.


  • roundAge
  • roundDown
  • roundDuration
  • roundToExp2

See #2006.


  • rowNumberInAllBlocks
  • rowNumberInBlock

See #2007.


  • row_number

See #2008.


  • showCertificate

See #2009.


  • sigmoid

See #2010.


  • simpleJSONExtractBool
  • simpleJSONExtractFloat
  • simpleJSONExtractInt
  • simpleJSONExtractRaw
  • simpleJSONExtractString
  • simpleJSONExtractUInt
  • simpleJSONHas

See #2011.


  • sin

See #2012.


  • singleValueOrNull

See #2013.


  • sleep
  • sleepEachRow

See #2014.


  • stddevPop
  • stddevPopStable
  • stddevSamp
  • stddevSampStable

See #2015.


  • substringIndexUTF8
  • substringUTF8

See #2016.


  • subtractDays
  • subtractHours
  • subtractInterval
  • subtractMicroseconds
  • subtractMilliseconds
  • subtractMinutes
  • subtractMonths
  • subtractNanoseconds
  • subtractQuarters
  • subtractSeconds
  • subtractTupleOfIntervals
  • subtractWeeks
  • subtractYears

See #2017.


  • sum
  • sumMapFiltered
  • sumMapFilteredWithOverflow
  • sumMapWithOverflow
  • sumWithOverflow

See #2018.


  • svg

See #2019.


  • tanh

See #2020.


  • timeSlot
  • timeSlots

See #2021.


  • toBool
  • toDateTime64OrDefault
  • toDateTime64OrNull
  • toDateTime64OrZero
  • toDecimal128
  • toDecimal128OrDefault
  • toDecimal128OrNull
  • toDecimal128OrZero
  • toDecimal256
  • toDecimal256OrDefault
  • toDecimal256OrNull
  • toDecimal256OrZero
  • toDecimal32
  • toDecimal32OrDefault
  • toDecimal32OrNull
  • toDecimal32OrZero
  • toDecimal64
  • toDecimal64OrDefault
  • toDecimal64OrNull
  • toDecimal64OrZero
  • toDecimalString
  • toFixedString
  • toFloat32
  • toFloat32OrDefault
  • toFloat32OrNull
  • toFloat32OrZero
  • toFloat64
  • toFloat64OrDefault
  • toFloat64OrNull
  • toFloat64OrZero
  • toIPv4OrDefault
  • toIPv4OrNull
  • toIPv4OrZero
  • toIPv6OrDefault
  • toIPv6OrNull
  • toIPv6OrZero
  • toISOWeek
  • toISOYear
  • toInt128
  • toInt128OrDefault
  • toInt128OrNull
  • toInt128OrZero
  • toInt16
  • toInt16OrDefault
  • toInt16OrNull
  • toInt16OrZero
  • toInt256
  • toInt256OrDefault
  • toInt256OrNull
  • toInt256OrZero
  • toInt32
  • toInt32OrDefault
  • toInt32OrNull
  • toInt32OrZero
  • toInt64
  • toInt64OrDefault
  • toInt64OrNull
  • toInt64OrZero
  • toInt8
  • toInt8OrDefault
  • toInt8OrNull
  • toInt8OrZero
  • toIntervalDay
  • toIntervalHour
  • toIntervalMicrosecond
  • toIntervalMillisecond
  • toIntervalMinute
  • toIntervalMonth
  • toIntervalNanosecond
  • toIntervalQuarter
  • toIntervalSecond
  • toIntervalWeek
  • toIntervalYear
  • toRelativeDayNum
  • toRelativeHourNum
  • toRelativeMinuteNum
  • toRelativeMonthNum
  • toRelativeQuarterNum
  • toRelativeSecondNum
  • toRelativeWeekNum
  • toRelativeYearNum
  • toStartOfMicrosecond
  • toStartOfMillisecond
  • toStartOfNanosecond
  • toTime
  • toTimezone
  • toTypeName
  • toUInt128
  • toUInt128OrNull
  • toUInt128OrZero
  • toUInt16
  • toUInt16OrDefault
  • toUInt16OrNull
  • toUInt16OrZero
  • toUInt256
  • toUInt256OrDefault
  • toUInt256OrNull
  • toUInt256OrZero
  • toUInt32
  • toUInt32OrDefault
  • toUInt32OrNull
  • toUInt32OrZero
  • toUInt64
  • toUInt64OrDefault
  • toUInt64OrNull
  • toUInt64OrZero
  • toUInt8
  • toUInt8OrDefault
  • toUInt8OrNull
  • toUInt8OrZero
  • toUnixTimestamp64Micro
  • toUnixTimestamp64Milli

See #2022.


  • today

See #2023.


  • topLevelDomainRFC

See #2024.


  • transactionID
  • transactionLatestSnapshot
  • transactionOldestSnapshot

See #2025.


  • translateUTF8

See #2026.


  • trunc
  • truncate

See #2027.


  • tryBase58Decode
  • tryBase64Decode

See #2028.


  • tumbleEnd
  • tumbleStart

See #2029.


  • tupleIntDiv
  • tupleIntDivByNumber
  • tupleIntDivOrZero
  • tupleIntDivOrZeroByNumber
  • tupleModulo
  • tupleModuloByNumber

See #2030.


  • ucase

See #2031.


  • uniqCombined64

See #2032.


  • upper
  • upperUTF8

See #2033.


  • uptime

See #2034.


  • validateNestedArraySizes

See #2035.


  • varPopStable
  • varSampStable

See #2036.


  • version

See #2037.


  • visitParamExtractBool
  • visitParamExtractFloat
  • visitParamExtractInt
  • visitParamExtractRaw
  • visitParamExtractString
  • visitParamExtractUInt
  • visitParamHas

See #2038.


  • windowID

See #2039.


  • wkt

See #2040.


  • wyHash64

See #2041.


  • yandexConsistentHash

See #2042.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions-project Issues specific to the functions documentation project
Projects
None yet
Development

No branches or pull requests

2 participants