[Enhancement](Load) stream tvf support json #23752

Merged
yiguolei merged 2 commits into apache:master from zzzzzzzs:http_stream_json
Sep 1, 2023

zzzzzzzs (Contributor) commented Sep 1, 2023

Proposed changes

Issue Number: close #23678

Further comments

The http_stream TVF now supports JSON input. Sample file contents (one JSON array per line):

[{"id":1, "name":"ftw", "age":18}]
[{"id":2, "name":"xxx", "age":17}]
[{"id":3, "name":"yyy", "age":19}]

Example request:

curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select id, name from http_stream(\"format\" = \"json\", \"strip_outer_array\" = \"true\", \"read_json_by_line\" = \"true\")" -T /root/json_file.json http://127.0.0.1:8030/api/_http_stream
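
To make the two JSON options in the request concrete, here is a minimal Python sketch of how the sample file would be interpreted, assuming `read_json_by_line` treats every line as a separate JSON document and `strip_outer_array` expands each top-level array into one row per element. It models the semantics only and is not Doris code; `parse_json_lines` and `project` are hypothetical helper names.

```python
import json

# Sample file contents from the PR description.
SAMPLE = """\
[{"id":1, "name":"ftw", "age":18}]
[{"id":2, "name":"xxx", "age":17}]
[{"id":3, "name":"yyy", "age":19}]
"""


def parse_json_lines(text, strip_outer_array=True):
    """Yield one row per JSON object found in line-delimited JSON text."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        doc = json.loads(line)
        if strip_outer_array and isinstance(doc, list):
            # Assumed behaviour: each array element becomes its own row.
            yield from doc
        else:
            yield doc


def project(rows, columns):
    """Mimic `select id, name`: keep only the requested keys, in order."""
    return [tuple(row.get(col) for col in columns) for row in rows]


rows = project(parse_json_lines(SAMPLE), ["id", "name"])
print(rows)
```

Under these assumptions the three input lines produce three rows, and the `age` field is dropped by the projection, mirroring `select id, name` in the request.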

zzzzzzzs (Contributor Author) commented Sep 1, 2023

run buildall

hello-stephen (Contributor) commented

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 48.29 seconds
stream load tsv: 538 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162058540 Bytes
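
The throughput figures in this report appear to be the byte count divided by the elapsed seconds and by 2^20, rounded down, so "MB/s" here behaves like MiB/s. That reading is inferred from the numbers, not from the pipeline's source. A quick Python check under that assumption:

```python
MIB = 1024 * 1024

# (label, seconds, bytes) copied from the report above.
runs = [
    ("tsv", 538, 74807831229),
    ("json", 20, 2358488459),
    ("orc", 64, 1101869774),
    ("parquet", 30, 861443392),
]


def throughput_mib_per_s(seconds, nbytes):
    """Whole MiB per second, rounded down (assumed rounding rule)."""
    return nbytes // (seconds * MIB)


rates = {label: throughput_mib_per_s(sec, size) for label, sec, size in runs}

# "insert into select": rows per second, in thousands, rounded down.
insert_kops = int(10_000_000 / 29.3 / 1000)

print(rates, insert_kops)
```

With decimal megabytes (10^6 bytes) the tsv figure would come out near 139 rather than 132, which is why the binary unit seems the likelier reading.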

if (getTFileType() == TFileType.FILE_STREAM && (formatString.equals("csv_with_names")
|| formatString.equals("csv_with_names_and_types")
|| formatString.equals("parquet")
|| formatString.equals("avro")
Review comment (Contributor):

        throw new AnalysisException("current http_stream does not yet support **json**, parquet and orc");

}
}

qt_sql1 "select id, city, code from ${tableName1}"
Review comment (Contributor):

add order by

}
}

qt_sql2 "select id, city, code from ${tableName2}"
Review comment (Contributor):

add order by

}
}

qt_sql4 "select id, code from ${tableName4}"
Review comment (Contributor):

add order by

}
}

qt_sql3 "select id, city, code from ${tableName3}"
Review comment (Contributor):

add order by
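
The "add order by" requests are presumably about determinism: SQL gives no guarantee on row order without ORDER BY, so a regression query whose output is compared against a recorded result can fail spuriously. A small sketch of the idea, using Python's built-in sqlite3 as a stand-in engine (not Doris); the table name `t` and its rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, city TEXT, code INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(3, "shenzhen", 3), (1, "beijing", 1), (2, "shanghai", 2)],
)

# With ORDER BY the row order is part of the query's contract,
# so the result can be compared position by position.
ordered = conn.execute(
    "SELECT id, city, code FROM t ORDER BY id"
).fetchall()

# Without ORDER BY the engine may return rows in any order,
# so only a sorted (multiset) comparison is safe.
unordered = conn.execute("SELECT id, city, code FROM t").fetchall()

print(ordered)
print(sorted(unordered) == ordered)
conn.close()
```

In a distributed engine the unordered result can change from run to run, which is why fixing the order in the test query is the simpler option.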

@zzzzzzzs
zzzzzzzs (Contributor Author) commented Sep 1, 2023

run buildall

hello-stephen (Contributor) commented

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.3 seconds
stream load tsv: 539 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17161868837 Bytes

yiguolei (Contributor) left a comment

LGTM

github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Sep 1, 2023
github-actions bot (Contributor) commented Sep 1, 2023

PR approved by at least one committer and no changes requested.

github-actions bot (Contributor) commented Sep 1, 2023

PR approved by anyone and no changes requested.

yiguolei merged commit 6630f92 into apache:master on Sep 1, 2023

Labels: approved (indicates a PR has been approved by one committer), reviewed

Linked issue (may be closed by this pull request): [Enhancement] Improve the functionality of http_stream tvf

4 participants