Skip to content

[GLUTEN-5414] [VL] Support arrow csv option and schema#5850

Merged
zhztheplayer merged 19 commits into
apache:mainfrom
jinchengchenghh:option
Jun 3, 2024
Merged

[GLUTEN-5414] [VL] Support arrow csv option and schema#5850
zhztheplayer merged 19 commits into
apache:mainfrom
jinchengchenghh:option

Conversation

@jinchengchenghh
Copy link
Copy Markdown
Contributor

@jinchengchenghh jinchengchenghh commented May 23, 2024

Support basic option now, will support more options after arrow patch merged.
apache/arrow#41646
Before this patch, if the required schema is different with file schema, csv read will fallback.
And changed to use index in file instead of check the file column name considering case sensitive.
Add a new common test function when the rule applies to Logical plan.

Compile arrow with version 15.0.0-gluten, upgrade arrow-dataset and arrow-c-data version from 15.0.0 to 15.0.0-gluten

@github-actions
Copy link
Copy Markdown

#5414

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

8 similar comments
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

9 similar comments
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Can you help review this PR and rerun the failed test? Thanks! @zhztheplayer

Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits. Thanks!

Comment thread .github/workflows/velox_docker.yml Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why quoting the name? Since we didn't do it on other names.

Comment thread .github/workflows/velox_docker.yml Outdated
Comment on lines 62 to 64
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we modify velox_docker_cache.yml as well? Did you check that file already?

Comment on lines 1 to 4
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The placing folder can be renamed from resources/datasource/csv to resource/arrow-datasource/csv or something.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The csv file is not relevant to arrow datasource, it is just used to test Arrow.
It is just a normal CSV file, so I place it here.

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

zhztheplayer
zhztheplayer previously approved these changes May 29, 2024
@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

jinchengchenghh commented May 31, 2024

Can you help merge this one? Thanks! @zhztheplayer

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2024

Run Gluten Clickhouse CI

Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zhztheplayer zhztheplayer merged commit ad817ed into apache:main Jun 3, 2024
@GlutenPerfBot
Copy link
Copy Markdown
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5850_time.csv log/native_master_06_02_2024_26ff58d3b_time.csv difference percentage
q1 33.45 33.06 -0.393 98.83%
q2 23.59 26.68 3.092 113.11%
q3 36.86 36.70 -0.152 99.59%
q4 30.73 35.50 4.771 115.52%
q5 68.58 69.83 1.252 101.83%
q6 7.13 5.99 -1.144 83.97%
q7 80.03 81.70 1.677 102.10%
q8 84.86 84.47 -0.388 99.54%
q9 120.61 119.12 -1.482 98.77%
q10 47.37 45.39 -1.982 95.82%
q11 20.14 20.18 0.042 100.21%
q12 25.06 25.30 0.241 100.96%
q13 37.33 37.41 0.075 100.20%
q14 18.89 20.07 1.177 106.23%
q15 31.14 30.21 -0.923 97.04%
q16 13.95 14.14 0.192 101.37%
q17 101.75 101.99 0.247 100.24%
q18 147.57 147.16 -0.405 99.73%
q19 13.44 17.14 3.701 127.53%
q20 28.93 26.85 -2.087 92.79%
q21 259.41 259.27 -0.141 99.95%
q22 12.33 12.39 0.057 100.46%
total 1243.15 1250.57 7.427 100.60%

Comment on lines +288 to +289
mvn clean install -am \
-DskipTests -Drat.skip -Dmaven.gitcommitid.skip -Dcheckstyle.skip
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need compile whole project? it's a time consuming command...
I wonder which depends module can not be compiled by -am options?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow-bom module includes many modules, so we need to compile whole project

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants