forked from apache/asterixdb
Commit
[ASTERIXDB-2713][EXT] CSV & TSV support for external dataset p3
- user model changes: no
- storage format changes: no
- interface changes: yes
  IRecordDataParser, IRecordReader, IRecordConverter

Details:
- record parser:
  - delimited-data (CSV/TSV) parser: ignore and warn for invalid records.
  - other parsers: continue to use their existing behaviour.
- stream parsers: continue to use their existing behaviour.
- fixes:
  - fixed S3 stream read() to properly advance to the next file and to
    notify consumers so that they can handle properties like header properly.
  - fixed localfs stream read() when it reaches the end of the current file
    and notifies of a new file source.
  - extracted the read() of both streams since they are now identical.
- report file, record number and field number in parser warnings
- propagate the stream name to parsers that need to report it
- add test cases

Change-Id: Ie1ba545d753d8afef9cef4e290e058019a465201
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/5926
Reviewed-by: Ali Alsuliman <ali.al.solaiman@gmail.com>
Reviewed-by: Murtadha Hubail <mhubail@apache.org>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
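The stream fix in this commit message (advance to the next file at end of input, and tell consumers that a new source has started) can be illustrated with a minimal sketch. This is not the AsterixDB implementation: `MultiSourceReaderSketch`, its constructor, and the `onNewSource` callback are hypothetical names chosen for illustration, and plain `java.io.Reader` objects stand in for S3 objects or local files.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: presents several named sources as one character stream.
final class MultiSourceReaderSketch extends Reader {
    private final Iterator<Map.Entry<String, Reader>> sources;
    private final Consumer<String> onNewSource; // assumed callback, receives the source name
    private Reader current;

    MultiSourceReaderSketch(Map<String, Reader> namedSources, Consumer<String> onNewSource) {
        // Iteration order of the map decides the read order (use a LinkedHashMap).
        this.sources = namedSources.entrySet().iterator();
        this.onNewSource = onNewSource;
    }

    @Override
    public int read(char[] buffer, int offset, int length) throws IOException {
        if (length == 0) {
            return 0;
        }
        while (true) {
            if (current == null) {
                if (!sources.hasNext()) {
                    return -1; // every source is exhausted
                }
                Map.Entry<String, Reader> next = sources.next();
                current = next.getValue();
                // Notify before any data of the new source is returned,
                // so a consumer can reset per-file state such as header handling.
                onNewSource.accept(next.getKey());
            }
            int count = current.read(buffer, offset, length);
            if (count != -1) {
                return count;
            }
            // End of the current source: close it and loop to the next one.
            // Empty sources are skipped the same way.
            current.close();
            current = null;
        }
    }

    @Override
    public void close() throws IOException {
        // Only the source currently being read is closed here; a fuller
        // version would also close sources that were never reached.
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

A single `read()` loop serves any kind of source, which mirrors the note above that the two stream implementations became identical and were extracted. The callback fires for empty sources as well, so a consumer sees every file name in order.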
1 parent 36ae101, commit 2bbcdd8
Showing 145 changed files with 1,930 additions and 254 deletions.
@@ -1,5 +0,0 @@
@@ -0,0 +1,5 @@
@@ -0,0 +1,4 @@
f1,f2,f3,f4
1,2,3,"str"
4,5,6,"rts"
7,8,9,"srt"
@@ -0,0 +1,4 @@
f1,f2,f3,f4
1,2,3,"str"
4,5,6,"rts"
7,8,9,"srt"
@@ -0,0 +1,2 @@
f1,f2,f3,f4
1,2,3,"str"
@@ -0,0 +1,2 @@
f1,f2,f3,f4
1,2,3,"str"
@@ -0,0 +1 @@
f1,f2,f3,f4
@@ -0,0 +1 @@
f1,f2,f3,f4
3 changes: 3 additions & 0 deletions
asterixdb/asterix-app/data/csv/no_header/no_h_missing_fields.csv
@@ -0,0 +1,3 @@
1,2,3,"str"
4,5,6
7,8,9,"srt"
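To make the "ignore and warn for invalid records" behaviour concrete for a file like this one, here is a minimal sketch, assuming that a record with the wrong number of fields counts as invalid. It is not the AsterixDB parser: the class `LenientDelimitedSketch`, its `parse` method, and the warning text are hypothetical, and the naive split does not handle delimiters inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: keeps valid delimited records, skips invalid ones,
// and records a warning that names the file, record number and field number.
final class LenientDelimitedSketch {

    /** Accepted records plus the warnings raised for skipped records. */
    static final class Result {
        final List<String[]> records = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();
    }

    static Result parse(String fileName, List<String> lines, char delimiter, int expectedFields) {
        Result result = new Result();
        String separator = Pattern.quote(String.valueOf(delimiter));
        int recordNumber = 0;
        for (String line : lines) {
            recordNumber++;
            // Limit -1 keeps trailing empty fields so they are still counted.
            String[] fields = line.split(separator, -1);
            if (fields.length != expectedFields) {
                // First field position that is missing or unexpected (1-based).
                int fieldNumber = Math.min(fields.length, expectedFields) + 1;
                result.warnings.add(String.format(
                        "%s: record %d, field %d: expected %d fields but found %d; record skipped",
                        fileName, recordNumber, fieldNumber, expectedFields, fields.length));
                continue; // ignore the record instead of failing the whole read
            }
            result.records.add(fields);
        }
        return result;
    }
}
```

Calling `parse` with the three lines of `no_h_missing_fields.csv`, `','` as the delimiter and four expected fields keeps the first and third records and produces one warning for record 2.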
@@ -0,0 +1,3 @@
1,2,3,"str"
4,5,6,"rts"
7,8,9,"srt"
3 changes: 3 additions & 0 deletions
asterixdb/asterix-app/data/csv/no_header/no_h_mul_rec_with_ln.csv
@@ -0,0 +1,3 @@
1,2,3,"str"
4,5,6,"rts"
7,8,9,"srt"
@@ -0,0 +1 @@
1,2,3,"5
@@ -0,0 +1 @@
1,2,3,"str"
1 change: 1 addition & 0 deletions
asterixdb/asterix-app/data/csv/no_header/no_h_one_rec_with_ln.csv
@@ -0,0 +1 @@
1,2,3,"str"
Empty file.
@@ -0,0 +1,5 @@
@@ -0,0 +1,4 @@
f1 f2 f3 f4
1 2 3 "str"
4 5 6 "rts"
7 8 9 "srt"
@@ -0,0 +1,4 @@
f1 f2 f3 f4
1 2 3 "str"
4 5 6 "rts"
7 8 9 "srt"
@@ -0,0 +1,2 @@
f1 f2 f3 f4
1 2 3 "str"
@@ -0,0 +1,2 @@
f1 f2 f3 f4
1 2 3 "str"
@@ -0,0 +1 @@
f1 f2 f3 f4
@@ -0,0 +1 @@
f1 f2 f3 f4
3 changes: 3 additions & 0 deletions
asterixdb/asterix-app/data/tsv/no_header/no_h_missing_fields.tsv
@@ -0,0 +1,3 @@
1 2 3 "str"
4 5 6
7 8 9 "srt"
@@ -0,0 +1,3 @@
1 2 3 "str"
4 5 6 "rts"
7 8 9 "srt"
3 changes: 3 additions & 0 deletions
asterixdb/asterix-app/data/tsv/no_header/no_h_mul_rec_with_ln.tsv
@@ -0,0 +1,3 @@
1 2 3 "str"
4 5 6 "rts"
7 8 9 "srt"
@@ -0,0 +1 @@
1 2 3 "str"
1 change: 1 addition & 0 deletions
asterixdb/asterix-app/data/tsv/no_header/no_h_one_rec_with_ln.tsv
@@ -0,0 +1 @@
1 2 3 "str"
55 changes: 55 additions & 0 deletions
...va/org/apache/asterix/test/external_dataset/aws/AwsS3ExternalDatasetOnePartitionTest.java
@@ -0,0 +1,55 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.asterix.test.external_dataset.aws;

import java.util.Collection;

import org.apache.asterix.test.runtime.LangExecutionUtil;
import org.apache.asterix.testframework.context.TestCaseContext;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.junit.FixMethodOrder;
import org.junit.runner.RunWith;
import org.junit.runners.MethodSorters;
import org.junit.runners.Parameterized;

/**
 * Runs an AWS S3 mock server and test it as an external dataset using one node one partition.
 */
@RunWith(Parameterized.class)
@FixMethodOrder(MethodSorters.NAME_ASCENDING)
public class AwsS3ExternalDatasetOnePartitionTest extends AwsS3ExternalDatasetTest {

    private static final Logger LOGGER = LogManager.getLogger();
    private static final String SUITE_PATH = "testsuite_external_dataset_one_partition.xml";

    @Parameterized.Parameters(name = "SqlppExecutionTest {index}: {0}")
    public static Collection<Object[]> tests() throws Exception {
        TEST_CONFIG_FILE_NAME = "src/test/resources/cc-single.conf";
        PREPARE_S3_BUCKET = AwsS3ExternalDatasetOnePartitionTest::prepareS3Bucket;
        return LangExecutionUtil.tests("only_external_dataset.xml", SUITE_PATH);
    }

    public AwsS3ExternalDatasetOnePartitionTest(TestCaseContext tcCtx) {
        super(tcCtx);
    }

    private static void prepareS3Bucket() {
    }
}
20 changes: 20 additions & 0 deletions
...ntimets/queries_sqlpp/external-dataset/aws/s3/csv-header/query-dataset.000.s3bucket.sqlpp
@@ -0,0 +1,20 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
// create S3 bucket with data
playground data_dir data/csv/empty.csv
37 changes: 37 additions & 0 deletions
...es/runtimets/queries_sqlpp/external-dataset/aws/s3/csv-header/query-dataset.001.ddl.sqlpp
@@ -0,0 +1,37 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

DROP DATAVERSE test IF EXISTS;
CREATE DATAVERSE test;
USE test;

DROP TYPE t1 IF EXISTS;
CREATE TYPE t1 AS {f1: int, f2: int, f3: int, f4: string};

DROP DATASET ds1 IF EXISTS;
CREATE EXTERNAL DATASET ds1(t1) USING S3 (
("accessKey"="dummyAccessKey"),
("secretKey"="dummySecretKey"),
("region"="us-west-2"),
("serviceEndpoint"="http://localhost:8001"),
("container"="playground"),
("definition"="data_dir"),
("format"="CSV"),
("header"="true")
);
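The `("header"="true")` setting in this DDL matters once `data_dir` holds more than one file, because every file then begins with its own header line. The sketch below shows one way a consumer could apply the setting per file, assuming the header is the first line of each non-empty file. It is illustrative only: `HeaderSkipSketch`, `dataLines`, and the file names used with it are hypothetical and are not taken from the AsterixDB code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: drops the header line of every file when header=true.
final class HeaderSkipSketch {

    /**
     * Returns the data lines of all files in map iteration order.
     * When header is true, the first line of each non-empty file is skipped.
     */
    static List<String> dataLines(Map<String, List<String>> files, boolean header) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            List<String> lines = file.getValue();
            // An empty file has no header to skip.
            int start = (header && !lines.isEmpty()) ? 1 : 0;
            out.addAll(lines.subList(start, lines.size()));
        }
        return out;
    }
}
```

Skipping the header once per file, instead of once per dataset, is what depends on the stream announcing each new source; without that signal the header lines of the second and later files would be handed to the parser as data.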