create external table should fail to parse if syntax is incorrect #4262

Closed
3 tasks done
andygrove opened this issue Nov 18, 2022 · 6 comments · Fixed by #4590
Labels
bug Something isn't working good first issue Good for newcomers

Comments

andygrove (Member) commented Nov 18, 2022

Describe the bug
When using create external table with CSV files, we can specify with header row to indicate that the file contains headers.

I forgot the syntax and omitted the word row. The statement did not process the header row, but it also did not fail or tell me that the syntax was incorrect.

To Reproduce

❯ create external table taxi_zone stored as csv with header location "/mnt/bigdata/nyctaxi/taxi+_zone_lookup.csv";
0 rows in set. Query took 0.008 seconds.
❯ select * from taxi_zone limit 10;
+------------+---------------+-------------------------+--------------+
| column_1   | column_2      | column_3                | column_4     |
+------------+---------------+-------------------------+--------------+
| LocationID | Borough       | Zone                    | service_zone |
| 1          | EWR           | Newark Airport          | EWR          |
| 2          | Queens        | Jamaica Bay             | Boro Zone    |
| 3          | Bronx         | Allerton/Pelham Gardens | Boro Zone    |
| 4          | Manhattan     | Alphabet City           | Yellow Zone  |
| 5          | Staten Island | Arden Heights           | Boro Zone    |
| 6          | Staten Island | Arrochar/Fort Wadsworth | Boro Zone    |
| 7          | Queens        | Astoria                 | Boro Zone    |
| 8          | Queens        | Astoria Park            | Boro Zone    |
| 9          | Queens        | Auburndale              | Boro Zone    |
+------------+---------------+-------------------------+--------------+

Expected behavior
SQL should have failed to parse.
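
As a rough illustration of the expected behavior, here is a minimal Rust sketch of a strict check on the words that appear between stored as csv and location. The function name parse_header_clause and the pre-split word list are hypothetical and are not DataFusion's or sqlparser's actual code. The sketch only assumes that with header row is the single valid form of the header option and that anything else should be an error.

```rust
/// Hypothetical helper (not DataFusion's parser): validates the words found
/// between `STORED AS CSV` and `LOCATION`.
///
/// Returns Ok(true) for `WITH HEADER ROW`, Ok(false) when no option is given,
/// and Err for any other sequence, instead of silently ignoring it.
fn parse_header_clause(words: &[&str]) -> Result<bool, String> {
    // Normalize case so that `with header row` and `WITH HEADER ROW` match.
    let owned: Vec<String> = words.iter().map(|w| w.to_ascii_uppercase()).collect();
    let upper: Vec<&str> = owned.iter().map(String::as_str).collect();

    match upper.as_slice() {
        [] => Ok(false),
        ["WITH", "HEADER", "ROW"] => Ok(true),
        other => Err(format!(
            "unexpected tokens before LOCATION: {}",
            other.join(" ")
        )),
    }
}
```

With this shape, the statement from the reproduction (with header, missing row) would yield an Err rather than creating a table with the header option silently dropped.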

Additional context
None

@andygrove andygrove added bug Something isn't working good first issue Good for newcomers labels Nov 18, 2022
HaoYang670 (Contributor) commented

These queries are also accepted even though their syntax is invalid:

❯ create external table t2 stored as csv with location "../datafusion/core/tests/example.csv";
❯ select * from t2;
+----------+----------+----------+
| column_1 | column_2 | column_3 |
+----------+----------+----------+
| a        | b        | c        |
| 1        | 2        | 3        |
+----------+----------+----------+

❯ create external table t3 stored as csv header location "../datafusion/core/tests/example.csv";
❯ select * from t3;
+----------+----------+----------+
| column_1 | column_2 | column_3 |
+----------+----------+----------+
| a        | b        | c        |
| 1        | 2        | 3        |
+----------+----------+----------+

❯ create external table t4 stored as csv row location "../datafusion/core/tests/example.csv";
❯ select * from t4;
+----------+----------+----------+
| column_1 | column_2 | column_3 |
+----------+----------+----------+
| a        | b        | c        |
| 1        | 2        | 3        |
+----------+----------+----------+

❯ create external table t5 stored as csv header row location "../datafusion/core/tests/example.csv";
❯ select * from t5;
+----------+----------+----------+
| column_1 | column_2 | column_3 |
+----------+----------+----------+
| a        | b        | c        |
| 1        | 2        | 3        |
+----------+----------+----------+

HaoYang670 (Contributor) commented Nov 18, 2022

The same problem occurs when parsing the compression type and partitioned by clauses:

DataFusion CLI v14.0.0
❯ create external table t5 stored as csv with header row compression location "../datafusion/core/tests/example.csv";
0 rows in set. Query took 0.007 seconds.
❯ select * from t5;
+---+---+---+
| a | b | c |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
DataFusion CLI v14.0.0
❯ create external table t5 stored as csv with header row partitioned location "../datafusion/core/tests/example.csv";
0 rows in set. Query took 0.013 seconds.
❯ select * from t5;
+---+---+---+
| a | b | c |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
1 row in set. Query took 0.009 seconds.

HaoYang670 (Contributor) commented

By the way, DataFusion currently implements its own consume_token. Should we update the consume_token function in sqlparser (by which I mean making it ignore case, #1237) so that we can remove the duplicate implementation in DataFusion?
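
To make the idea of a case-insensitive consume_token concrete, here is a small self-contained Rust sketch. The WordCursor type, its whitespace-based splitting, and this consume_token signature are all hypothetical. They are not sqlparser's real API and not the implementation currently in DataFusion.

```rust
/// Hypothetical cursor over whitespace-separated words (illustration only).
struct WordCursor<'a> {
    words: Vec<&'a str>,
    pos: usize,
}

impl<'a> WordCursor<'a> {
    fn new(input: &'a str) -> Self {
        Self {
            words: input.split_whitespace().collect(),
            pos: 0,
        }
    }

    /// If the next word equals `expected` (ignoring ASCII case), advance past
    /// it and return true. Otherwise leave the cursor where it is and return
    /// false, so the caller can report an error or try another keyword.
    fn consume_token(&mut self, expected: &str) -> bool {
        match self.words.get(self.pos) {
            Some(word) if word.eq_ignore_ascii_case(expected) => {
                self.pos += 1;
                true
            }
            _ => false,
        }
    }
}
```

A caller could chain the calls, for example requiring consume_token("header") and then consume_token("row") after a successful consume_token("with"), and return a parse error as soon as one of them returns false.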

HaoYang670 (Contributor) commented

I've filed an issue in sqlparser to add COMPRESSION as a keyword: sqlparser-rs/sqlparser-rs#718

@HaoYang670 HaoYang670 added the waiting-on-upstream PR is waiting on an upstream dependency to be updated label Nov 21, 2022
HaoYang670 (Contributor) commented

Removing the waiting-on-upstream label because sqlparser-rs has been updated to version 28.0.0.
