Contributor

@aprimadi aprimadi commented May 7, 2023

Which issue does this PR close?

Closes #6147
Closes #1736

Rationale for this change

N/A

What changes are included in this PR?

  • Modify ListingTableFactory to derive the file extension setting from the external table's location instead of using the file format's default extension
  • Add test
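To illustrate the idea in the first bullet, here is a minimal sketch of deriving a file extension from a table location. It is not the PR's actual code: the helper name `extension_from_location` and its return convention (a leading dot, or an empty string when the location has no extension, such as a directory) are assumptions made for this example.

```rust
use std::path::Path;

/// Hypothetical helper (name and behavior assumed for illustration only):
/// returns the extension of `location` with a leading dot, or an empty
/// string when the location has no extension (for example, a directory).
fn extension_from_location(location: &str) -> String {
    match Path::new(location).extension().and_then(|ext| ext.to_str()) {
        // "customer.tbl" yields ".tbl"
        Some(ext) => format!(".{ext}"),
        // No extension: return an empty string so no file is filtered out by suffix
        None => String::new(),
    }
}
```

With a helper like this, a location ending in `customer.tbl` would yield `.tbl`, while a directory location such as `/tmp/test` would yield an empty string, so a listing that filters by suffix would accept every file in that directory.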

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the core Core DataFusion crate label May 7, 2023
Contributor Author

aprimadi commented May 7, 2023

Nvm, this isn't as easy as I thought it would be. Will work more on this tomorrow if time allows.

@github-actions github-actions bot removed the core Core DataFusion crate label May 8, 2023
@github-actions github-actions bot added the core Core DataFusion crate label May 8, 2023
@aprimadi aprimadi changed the title Change CsvReadOptions default file_extension to None Fix CREATE EXTERNAL TABLE don't work with non-standard extension May 8, 2023
@aprimadi aprimadi changed the title Fix CREATE EXTERNAL TABLE don't work with non-standard extension Fix CREATE EXTERNAL TABLE don't work with non-standard file ext May 8, 2023
@aprimadi aprimadi changed the title Fix CREATE EXTERNAL TABLE don't work with non-standard file ext Fix CREATE EXTERNAL TABLE doesn't work with non-standard file ext May 8, 2023
@aprimadi aprimadi marked this pull request as ready for review May 8, 2023 15:22
Contributor

@alamb alamb left a comment

Thank you @aprimadi !

I tested this out with the use case from #6147 and it worked great.

❯ CREATE EXTERNAL TABLE customer STORED AS CSV DELIMITER '|' LOCATION  '/Users/alamb/Software/arrow-datafusion/benchmarks/data/customer.tbl';
0 rows in set. Query took 0.272 seconds.
❯ select * from customer limit 10;
+----------+--------------------+---------------------------------------+----------+-----------------+----------+------------+-------------------------------------------------------------------------------------------------------------------+----------+
| column_1 | column_2           | column_3                              | column_4 | column_5        | column_6 | column_7   | column_8                                                                                                          | column_9 |
+----------+--------------------+---------------------------------------+----------+-----------------+----------+------------+-------------------------------------------------------------------------------------------------------------------+----------+
| 1        | Customer#000000001 | IVhzIApeRb ot,c,E                     | 15       | 25-989-741-2988 | 711.56   | BUILDING   | to the even, regular platelets. regular, ironic epitaphs nag e                                                    |          |
| 2        | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak        | 13       | 23-768-687-3665 | 121.65   | AUTOMOBILE | l accounts. blithely ironic theodolites integrate boldly: caref                                                   |          |
| 3        | Customer#000000003 | MG9kdTD2WBHm                          | 1        | 11-719-748-3364 | 7498.12  | AUTOMOBILE |  deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov            |          |
| 4        | Customer#000000004 | XxVSJsLAGtn                           | 4        | 14-128-190-5944 | 2866.83  | MACHINERY  |  requests. final, regular ideas sleep final accou                                                                 |          |
| 5        | Customer#000000005 | KvpyuHCplrB84WgAiGV6sYpZq7Tj          | 3        | 13-750-942-6364 | 794.47   | HOUSEHOLD  | n accounts will have to unwind. foxes cajole accor                                                                |          |
| 6        | Customer#000000006 | sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn  | 20       | 30-114-968-4951 | 7638.57  | AUTOMOBILE | tions. even deposits boost according to the slyly bold packages. final accounts cajole requests. furious          |          |
| 7        | Customer#000000007 | TcGe5gaZNgVePxU5kRrvXBfkasDTea        | 18       | 28-190-982-9759 | 9561.95  | AUTOMOBILE | ainst the ironic, express theodolites. express, even pinto beans among the exp                                    |          |
| 8        | Customer#000000008 | I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5 | 17       | 27-147-574-9335 | 6819.74  | BUILDING   | among the slyly regular theodolites kindle blithely courts. carefully even theodolites haggle slyly along the ide |          |
| 9        | Customer#000000009 | xKiAFTjUsCuxfeleNqefumTrjS            | 8        | 18-338-906-3675 | 8324.07  | FURNITURE  | r theodolites according to the requests wake thinly excuses: pending requests haggle furiousl                     |          |
| 10       | Customer#000000010 | 6LrEaV6KR6PLVcgl2ArL Q3rqzLzcT1 v2    | 5        | 15-741-346-9870 | 2753.54  | HOUSEHOLD  | es regular deposits haggle. fur                                                                                   |          |
+----------+--------------------+---------------------------------------+----------+-----------------+----------+------------+-------------------------------------------------------------------------------------------------------------------+----------+
10 rows in set. Query took 0.059 seconds.

I also verified it worked on a partitioned table:

$ cp /Users/alamb/Software/arrow-datafusion/benchmarks/data/customer.tbl /tmp/test/
❯ CREATE EXTERNAL TABLE customer STORED AS CSV DELIMITER '|' LOCATION  '/tmp/test';
0 rows in set. Query took 0.295 seconds.
❯ select * from customer limit 1;
+----------+--------------------+-------------------+----------+-----------------+----------+----------+----------------------------------------------------------------+----------+
| column_1 | column_2           | column_3          | column_4 | column_5        | column_6 | column_7 | column_8                                                       | column_9 |
+----------+--------------------+-------------------+----------+-----------------+----------+----------+----------------------------------------------------------------+----------+
| 1        | Customer#000000001 | IVhzIApeRb ot,c,E | 15       | 25-989-741-2988 | 711.56   | BUILDING | to the even, regular platelets. regular, ironic epitaphs nag e |          |
+----------+--------------------+-------------------+----------+-----------------+----------+----------+----------------------------------------------------------------+----------+
1 row in set. Query took 0.065 seconds.

Contributor

alamb commented May 8, 2023

cc @andygrove
