DOCSUP-3824: Documented the Regex and Raw formats #16790

Merged
merged 8 commits into from
Dec 24, 2020

Conversation

olgarev
Contributor

@olgarev olgarev commented Nov 8, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category:

@robot-clickhouse robot-clickhouse added the pr-documentation and submodule changed labels Nov 8, 2020
@robot-clickhouse robot-clickhouse removed the submodule changed label Nov 8, 2020
@@ -1290,6 +1291,89 @@ $ cat filename.orc | clickhouse-client --query="INSERT INTO some_table FORMAT OR

To exchange data with Hadoop, you can use [HDFS table engine](../engines/table-engines/integrations/hdfs.md).

## LineAsString {#lineasstring}

In this format, a sequence of string objects separated by a newline character is interpreted as a single value. This format can only be parsed for a table with a single field of type [String](../sql-reference/data-types/string.md). The remaining columns must be set to [DEFAULT](../sql-reference/statements/create/table.md#default) or [MATERIALIZED](../sql-reference/statements/create/table.md#materialized), or omitted.
Contributor

I don't see newline chars in the example. Maybe we talk about any string sequence including strings terminated with newlines?


**Usage**

The regular expression from the `format_regexp` setting is applied to every line of imported data. The number of subpatterns in the regular expression must equal the number of columns in the imported dataset.
Contributor

@BayoNet BayoNet Nov 9, 2020

It looks like the first sentence of the whole description should read like this one, but in less detail.

@olgarev olgarev marked this pull request as ready for review November 10, 2020 13:56
@olgarev olgarev requested a review from BayoNet December 6, 2020 02:06
@Avogar Avogar self-assigned this Dec 16, 2020

The regular expression from the `format_regexp` setting is applied to every line of imported data. The number of subpatterns in the regular expression must equal the number of columns in the imported dataset.
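
The rule described here (one expression applied to each line, one subpattern per column) can be illustrated with a small Python sketch. This is not ClickHouse code: `parse_lines`, the sample pattern, and the column names are hypothetical, and requiring the whole line to match is an assumption made for the illustration.

```python
import re


def parse_lines(pattern, lines, columns):
    """Apply one regular expression to every line and map its groups to columns.

    Hypothetical helper for illustration only; it mimics the documented rule,
    not the actual ClickHouse implementation.
    """
    regex = re.compile(pattern)
    # The number of capturing groups (subpatterns) must equal the number of columns.
    if regex.groups != len(columns):
        raise ValueError(
            f"pattern has {regex.groups} subpatterns, expected {len(columns)}"
        )
    rows = []
    for line in lines:
        # Assumption: the whole line has to match the expression.
        match = regex.fullmatch(line)
        if match is None:
            raise ValueError(f"line does not match the pattern: {line!r}")
        # Pair each captured value with its column name, in order.
        rows.append(dict(zip(columns, match.groups())))
    return rows


rows = parse_lines(
    r"id: (.+?) string: (.+?) date: (.+?)",
    [
        "id: 1 string: str1 date: 2020-01-01",
        "id: 2 string: str2 date: 2020-01-02",
    ],
    ["id", "string", "date"],
)
```

Each captured value is still a string at this point; converting it to the column type is a separate step that the sketch leaves out.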

Lines of the imported data must be separated by newline character `'\n'` or DOS-style newline `"\r\n"` (except the `Raw` format, which does not support any escaping characters).
Member

@Avogar Avogar Dec 16, 2020

except the Raw format, which does not support any escaping characters

That's not correct: the Raw escaping rule doesn't affect how lines are separated at all. Just remove the part about Raw.

Contributor Author

@Avogar The problem is that I had to re-fork the repository after this PR was published, so I can no longer reach the branch to update it. Could you please merge these changes into master? I will then immediately create a new branch with the fixes.

Member

@olgarev I can't merge this PR because it conflicts with current master, and I can't resolve the conflicts because the repository containing this branch no longer exists. You can create a new branch, copy the changes from this PR, update them, and open a new PR. After that, close this one and leave a link to the new PR here.

@Avogar Avogar mentioned this pull request Dec 24, 2020
@Avogar Avogar merged commit bfda5e7 into ClickHouse:master Dec 24, 2020
@Avogar
Member

Avogar commented Dec 24, 2020

@olgarev Hi! I figured out how to resolve conflicts and merge these changes. So, now you can add fixes.

@olgarev
Contributor Author

olgarev commented Dec 27, 2020

@Avogar Thank you! It looks like you've already made the necessary changes (removed the wrong phrase), so there is nothing left for me to do. Thanks again )

Labels
pr-documentation Documentation PRs for the specific code PR
4 participants