Merge pull request #54097 from bharatnc/ncb/decode-html-component
add function decodeHTMLComponent
robot-clickhouse-ci-1 committed Sep 4, 2023
2 parents 476e15c + 8ecbdd2 commit ec628ee
Showing 11 changed files with 20,511 additions and 0 deletions.
36 changes: 36 additions & 0 deletions docs/en/sql-reference/functions/string-functions.md
@@ -1230,6 +1230,42 @@ Result:
< Σ >
```

## decodeHTMLComponent

Un-escapes HTML named character references, that is, substrings with special meaning in HTML such as `&hbar;`, `&gt;`, `&diamondsuit;`, `&heartsuit;`, and `&lt;`, into the characters they represent.
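
A minimal C++ sketch of named-reference decoding, for illustration only. The table `kNamedRefs` and the helper `decodeNamedRefs` are hypothetical and are not the ClickHouse implementation, which covers the full HTML reference list. The CMake changes in this commit suggest the real lookup is a gperf-generated perfect hash; this sketch swaps in a binary search over a small sorted array. It handles only the `&name;` form with a trailing semicolon.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical subset of the HTML named character reference table, sorted by
// name so it can be binary-searched. Values are UTF-8 byte sequences.
constexpr std::array<std::pair<std::string_view, std::string_view>, 8> kNamedRefs = {{
    {"amp", "&"},
    {"apos", "'"},
    {"diamondsuit", "\xE2\x99\xA6"},  // U+2666 ♦
    {"gt", ">"},
    {"hbar", "\xE2\x84\x8F"},         // U+210F ℏ
    {"heartsuit", "\xE2\x99\xA5"},    // U+2665 ♥
    {"lt", "<"},
    {"quot", "\""},
}};

// Replaces every "&name;" whose name is in kNamedRefs with its UTF-8 value.
// Anything else, including unknown names and '&' without a ';', is copied unchanged.
std::string decodeNamedRefs(std::string_view in)
{
    // Binary search requires the table to be sorted by name (checked in debug builds).
    assert(std::is_sorted(kNamedRefs.begin(), kNamedRefs.end(),
                          [](const auto & a, const auto & b) { return a.first < b.first; }));

    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size())
    {
        if (in[i] == '&')
        {
            std::size_t semi = in.find(';', i + 1);
            if (semi != std::string_view::npos)
            {
                std::string_view name = in.substr(i + 1, semi - i - 1);
                auto it = std::lower_bound(
                    kNamedRefs.begin(), kNamedRefs.end(), name,
                    [](const auto & entry, std::string_view key) { return entry.first < key; });
                if (it != kNamedRefs.end() && it->first == name)
                {
                    out += it->second;  // append the decoded UTF-8 bytes
                    i = semi + 1;       // resume after the ';'
                    continue;
                }
            }
        }
        out += in[i++];  // not a known reference: copy the byte as-is
    }
    return out;
}
```

With this sketch, `decodeNamedRefs("a &lt; b")` yields `a < b`, while `&unknown;` passes through untouched, so unrecognized input is never corrupted.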

This function also replaces numeric character references with the Unicode characters they denote. Both decimal (for example, `&#10003;`) and hexadecimal (for example, `&#x2713;`) forms are supported; both of these examples denote ✓ (U+2713).
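
The numeric forms can be sketched in C++ as follows. The helpers `appendUtf8` and `decodeNumericRefs` are hypothetical and not taken from the ClickHouse source. The sketch parses the decimal or hexadecimal digits between `&#` (or `&#x`) and `;`, then writes the code point as UTF-8. As a simplification of this sketch only, references that are malformed, equal to 0, in the surrogate range, or above U+10FFFF are copied through unchanged; the real function's handling of those cases may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Appends the UTF-8 encoding of a Unicode scalar value to `out`.
void appendUtf8(std::string & out, char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));  // caller validates
    if (cp < 0x80)
        out += static_cast<char>(cp);
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes "&#NNN;" (decimal) and "&#xHHH;" / "&#XHHH;" (hexadecimal) references.
std::string decodeNumericRefs(std::string_view in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size())
    {
        if (in[i] == '&' && i + 1 < in.size() && in[i + 1] == '#')
        {
            std::size_t j = i + 2;
            bool hex = j < in.size() && (in[j] == 'x' || in[j] == 'X');
            if (hex)
                ++j;
            std::uint32_t base = hex ? 16 : 10;
            std::uint32_t cp = 0;
            std::size_t digits = 0;
            // Stop accumulating once cp exceeds the Unicode range (avoids overflow).
            while (j < in.size() && cp <= 0x10FFFF)
            {
                char c = in[j];
                std::uint32_t d;
                if (c >= '0' && c <= '9') d = c - '0';
                else if (hex && c >= 'a' && c <= 'f') d = c - 'a' + 10;
                else if (hex && c >= 'A' && c <= 'F') d = c - 'A' + 10;
                else break;
                cp = cp * base + d;
                ++digits;
                ++j;
            }
            bool valid = digits > 0 && j < in.size() && in[j] == ';' && cp != 0
                && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
            if (valid)
            {
                appendUtf8(out, static_cast<char32_t>(cp));
                i = j + 1;  // resume after the ';'
                continue;
            }
        }
        out += in[i++];  // not a valid numeric reference: copy the byte as-is
    }
    return out;
}
```

Both `&#10003;` and `&#x2713;` decode to the same three UTF-8 bytes, `E2 9C 93`.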

**Syntax**

``` sql
decodeHTMLComponent(x)
```

**Arguments**

- `x` — An input string. [String](../../sql-reference/data-types/string.md).

**Returned value**

- The un-escaped string.

Type: [String](../../sql-reference/data-types/string.md).

**Example**

``` sql
SELECT decodeHTMLComponent('&apos;CH&apos;');
SELECT decodeHTMLComponent('I&heartsuit;ClickHouse');
```
Result:
```result
'CH'
I♥ClickHouse
```

## extractTextFromHTML

This function extracts plain text from HTML or XHTML.
21 changes: 21 additions & 0 deletions src/Functions/CMakeLists.txt
@@ -124,6 +124,27 @@ if (ENABLE_FUZZING)
    add_compile_definitions(FUZZING_MODE=1)
endif ()

if (USE_GPERF)
    # Only for regenerating
    add_custom_target(generate-html-char-ref-gperf ./HTMLCharacterReference.sh
        SOURCES ./HTMLCharacterReference.sh
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        BYPRODUCTS "${CMAKE_CURRENT_SOURCE_DIR}/HTMLCharacterReference.gperf"
    )
    add_custom_target(generate-html-char-ref ${GPERF} -t HTMLCharacterReference.gperf --output-file=HTMLCharacterReference.generated.cpp
        && clang-format -i HTMLCharacterReference.generated.cpp
        # for clang-tidy, since string.h is deprecated
        && sed -i 's/\#include <string.h>/\#include <cstring>/g' HTMLCharacterReference.generated.cpp
        SOURCES HTMLCharacterReference.gperf
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
    )
    add_dependencies(generate-html-char-ref generate-html-char-ref-gperf)
    if (NOT TARGET generate-source)
        add_custom_target(generate-source)
    endif ()
    add_dependencies(generate-source generate-html-char-ref)
endif ()

target_link_libraries(clickhouse_functions_obj PUBLIC ${PUBLIC_LIBS} PRIVATE ${PRIVATE_LIBS})

# Used to forward the linking information to the final binaries such as clickhouse / unit_tests_dbms,