DRILL-7978: Fixed Width Format Plugin #2282

Draft
wants to merge 46 commits into
base: master
8fd4018
Start of fixed width format plugin
Jul 15, 2021
5d1ea8b
Work in Progress. Producing Rows. Currently complains about buffer no…
Jul 22, 2021
7ef5cd3
First working version
Jul 26, 2021
6f4a2e7
Added more data types, refactored code
Jul 29, 2021
f59d4e4
Checkstyle fixes
Jul 29, 2021
9f2648c
Removed println statement from Batch Reader, Simplified logic
Jul 29, 2021
8c3f6eb
Modified format, fixed maxRecords in next(), modified Exception handl…
Aug 6, 2021
7c3b5a2
Addressing Review Comments.
Sep 13, 2021
57d49db
Added Serialization/Deserialization test, added blank row test file, …
Oct 15, 2021
1a4818e
Fixed Serialization/Deserialization test
Oct 20, 2021
1f1051e
Added another constructor to enable user to not have to enter dateTim…
Nov 4, 2021
cb1b932
Added method to validate field name input and verify there are no dup…
Nov 5, 2021
8419862
Added two getters to FixedwidthFormatConfig to prep for offset verifi…
Nov 16, 2021
31e1549
Added a check for overlapping fields
estherbuchwalter Nov 17, 2021
07edbde
Updated check for overlapping fields
estherbuchwalter Nov 18, 2021
fa47a14
Added field validation for data types, indices, width. Includes creat…
Nov 23, 2021
523366a
Modified validation for field width and field index. Added comments t…
Nov 24, 2021
1a91592
Added to field validation for field names. Checks for valid length an…
estherbuchwalter Nov 24, 2021
4b221b5
WIP converting to EVF v2. Pushing to repo for troubleshooting purposes.
Dec 10, 2021
4875367
Start of fixed width format plugin
Jul 15, 2021
ef0bc82
Work in Progress. Producing Rows. Currently complains about buffer no…
Jul 22, 2021
be14e25
First working version
Jul 26, 2021
c9014d2
Added more data types, refactored code
Jul 29, 2021
6d7a2a5
Checkstyle fixes
Jul 29, 2021
056df13
Removed println statement from Batch Reader, Simplified logic
Jul 29, 2021
d2097b3
Modified format, fixed maxRecords in next(), modified Exception handl…
Aug 6, 2021
f2920a0
Addressing Review Comments.
Sep 13, 2021
7a68da5
Added Serialization/Deserialization test, added blank row test file, …
Oct 15, 2021
e0110da
Fixed Serialization/Deserialization test
Oct 20, 2021
f01f1aa
Added another constructor to enable user to not have to enter dateTim…
Nov 4, 2021
5da7a77
Added method to validate field name input and verify there are no dup…
Nov 5, 2021
22ccbcb
Added two getters to FixedwidthFormatConfig to prep for offset verifi…
Nov 16, 2021
ecf6fb8
Added a check for overlapping fields
estherbuchwalter Nov 17, 2021
1978e14
Updated check for overlapping fields
estherbuchwalter Nov 18, 2021
a79f8a5
Added field validation for data types, indices, width. Includes creat…
Nov 23, 2021
32e5312
Modified validation for field width and field index. Added comments t…
Nov 24, 2021
4d534df
Added to field validation for field names. Checks for valid length an…
estherbuchwalter Nov 24, 2021
aa74ec5
WIP converting to EVF v2. Pushing to repo for troubleshooting purposes.
Dec 10, 2021
1972fb9
Updating pom.xml with new drill snapshot version
Mar 22, 2022
f498123
Merge branch 'apache:master' into format-fixedwidth
MFoss19 Mar 22, 2022
1e75757
Renamed classes
tswagger Mar 22, 2022
a134619
Merge branch 'master' into format-fixedwidth
tswagger Mar 22, 2022
dfe894e
Merge remote-tracking branch 'megan/format-fixedwidth' into format-fi…
tswagger Mar 22, 2022
28df7b2
Merge branch 'format-fixedwidth' of github.com:MFoss19/drill into for…
Mar 22, 2022
b68c54d
Merge branch 'format-fixedwidth' of github.com:MFoss19/drill into for…
Mar 22, 2022
bf6a16c
Updated pom.xml
tswagger Mar 22, 2022
85 changes: 85 additions & 0 deletions contrib/format-fixedwidth/pom.xml
@@ -0,0 +1,85 @@
<?xml version="1.0"?>
<!--

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>drill-contrib-parent</artifactId>
<groupId>org.apache.drill.contrib</groupId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>drill-format-fixedwidth</artifactId>
<name>Drill : Contrib : Format : FixedWidth</name>

<dependencies>
<dependency>
<groupId>org.apache.drill.exec</groupId>
<artifactId>drill-java-exec</artifactId>
<version>${project.version}</version>
</dependency>
<!-- <dependency>-->
<!-- <groupId>com.epam</groupId>-->
<!-- <artifactId>parso</artifactId>-->
<!-- <version>2.0.14</version>-->
<!-- </dependency>-->

<!-- Test dependencies -->
<dependency>
<groupId>org.apache.drill.exec</groupId>
<artifactId>drill-java-exec</artifactId>
<classifier>tests</classifier>
<version>${project.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.drill</groupId>
<artifactId>drill-common</artifactId>
<classifier>tests</classifier>
<version>${project.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<executions>
<execution>
<id>copy-java-sources</id>
<phase>process-sources</phase>
<goals>
<goal>copy-resources</goal>
</goals>
<configuration>
<outputDirectory>${basedir}/target/classes/org/apache/drill/exec/store/fixedwidth</outputDirectory>
<resources>
<resource>
<directory>src/main/java/org/apache/drill/exec/store/fixedwidth</directory>
<filtering>true</filtering>
</resource>
</resources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>
@@ -0,0 +1,99 @@
package org.apache.drill.exec.store.fixedwidth;

import org.apache.drill.common.AutoCloseables;
import org.apache.drill.common.exceptions.CustomErrorContext;
import org.apache.drill.common.exceptions.UserException;
import org.apache.drill.common.types.TypeProtos;
import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator;
import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
import org.apache.drill.exec.record.metadata.SchemaBuilder;
import org.apache.drill.exec.record.metadata.TupleMetadata;
import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
import org.apache.hadoop.mapred.FileSplit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.BufferedReader;
//import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class FixedWidthBatchReader implements ManagedReader {

private final int maxRecords; // Do we need this?
private final FixedWidthFormatConfig config;
private InputStream fsStream;
private ResultSetLoader loader;
private FileSplit split;
private CustomErrorContext errorContext;
private static final Logger logger = LoggerFactory.getLogger(FixedWidthBatchReader.class);
private BufferedReader reader;

public FixedWidthBatchReader(FileSchemaNegotiator negotiator, FixedWidthFormatConfig config, int maxRecords) {
// Assign config first: open() calls buildSchema(), which reads config.getFields().
this.config = config;
this.maxRecords = maxRecords;
this.loader = open(negotiator);
}

@Override
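public boolean next() {
// TODO: placeholder. Returning true unconditionally never signals end of input;
// this should load rows from the reader and return false once the file is exhausted.
return true;
}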

@Override
public void close() {
if (fsStream != null){
AutoCloseables.closeSilently(fsStream);
fsStream = null;
}
}

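private ResultSetLoader open(FileSchemaNegotiator negotiator) {
// In EVF v3 the split is reached through file(); it is needed for the error message below.
this.split = negotiator.file().split();
this.errorContext = negotiator.parentErrorContext();

try {
negotiator.tableSchema(buildSchema(), true);
this.loader = negotiator.build();
// Open the stream before wrapping it; fsStream was previously still null at this point.
this.fsStream = negotiator.file().fileSystem().openPossiblyCompressedStream(this.split.getPath());
this.reader = new BufferedReader(new InputStreamReader(this.fsStream, Charsets.UTF_8));
} catch (Exception e) {
throw UserException
.dataReadError(e)
// message() uses String.format placeholders, so %s rather than {}.
.message("Failed to open input file: %s", this.split.getPath().toString())
.addContext(this.errorContext)
.addContext(e.getMessage())
.build(FixedWidthBatchReader.logger);
}
return this.loader;
}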

// private void openFile(FileSchemaNegotiator negotiator) {
// try {
// this.fsStream = negotiator.file().fileSystem().openPossiblyCompressedStream(this.split.getPath());
// sasFileReader = new SasFileReaderImpl(this.fsStream);
// firstRow = sasFileReader.readNext();
// } catch (IOException e) {
// throw UserException
// .dataReadError(e)
// .message("Unable to open Fixed Width File %s", this.split.getPath())
// .addContext(e.getMessage())
// .addContext(this.errorContext)
// .build(FixedWidthBatchReader.logger);
// }
// }

private TupleMetadata buildSchema() {
SchemaBuilder builder = new SchemaBuilder();
for (FixedWidthFieldConfig field : config.getFields()) {
if (field.getType() == TypeProtos.MinorType.VARDECIMAL) {
// TODO: revisit; precision 38 and scale 4 are hard-coded.
builder.addNullable(field.getName(), TypeProtos.MinorType.VARDECIMAL, 38, 4);
} else {
builder.addNullable(field.getName(), field.getType());
}
}
return builder.buildSchema();
}
}
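
The `next()` method in the reader above is still a stub, so the row-producing logic does not appear in this diff. As a rough illustration only, here is a self-contained sketch in plain Java of how one fixed-width line could be sliced using each field's `index` and `width`. `FieldSpec`, `FixedWidthLineSketch`, `parseLine`, and `convert` are hypothetical names, not part of the PR or of Drill. The sketch assumes that `index` is a 1-based start column and that `dateTimeFormat` is a `java.time` pattern; the code shown in this PR confirms neither.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for FixedWidthFieldConfig; not the PR's class.
final class FieldSpec {
  final String name;
  final int index;             // assumed to be a 1-based start column
  final int width;             // number of characters in the field
  final String type;           // "INT", "DATE", or anything else for text
  final String dateTimeFormat; // java.time pattern, used only for "DATE"

  FieldSpec(String name, int index, int width, String type, String dateTimeFormat) {
    this.name = name;
    this.index = index;
    this.width = width;
    this.type = type;
    this.dateTimeFormat = dateTimeFormat;
  }
}

final class FixedWidthLineSketch {

  // Slices one line into named values; blank or missing fields become null.
  static Map<String, Object> parseLine(String line, List<FieldSpec> fields) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (FieldSpec field : fields) {
      int start = field.index - 1;
      int end = Math.min(start + field.width, line.length());
      // A line shorter than the field's start column yields an empty slice.
      String raw = start < end ? line.substring(start, end).trim() : "";
      row.put(field.name, convert(raw, field));
    }
    return row;
  }

  // Converts the trimmed text according to the field's declared type.
  static Object convert(String raw, FieldSpec field) {
    if (raw.isEmpty()) {
      return null;
    }
    switch (field.type) {
      case "INT":
        return Integer.parseInt(raw);
      case "DATE":
        return LocalDate.parse(raw, DateTimeFormatter.ofPattern(field.dateTimeFormat, Locale.ROOT));
      default:
        return raw;
    }
  }
}
```

Returning `null` for blank or truncated fields mirrors the nullable columns that `buildSchema()` declares. A real reader would write values into the `ResultSetLoader` instead of a map, and would need to decide how to report malformed numbers or dates.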
@@ -0,0 +1,111 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.drill.exec.store.fixedwidth;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonTypeName;
import org.apache.drill.common.PlanStringBuilder;
import org.apache.drill.common.types.TypeProtos;

import java.util.Objects;


@JsonTypeName("fixedwidthReaderFieldDescription")
@JsonInclude(JsonInclude.Include.NON_DEFAULT)
public class FixedWidthFieldConfig implements Comparable<FixedWidthFieldConfig> {

private final String name;
private final int index;
private int width;
private TypeProtos.MinorType type;
private final String dateTimeFormat;

public FixedWidthFieldConfig(@JsonProperty("name") String name,
@JsonProperty("index") int index,
@JsonProperty("width") int width,
@JsonProperty("type") TypeProtos.MinorType type) {
this(name, index, width, type, null);
}

@JsonCreator
public FixedWidthFieldConfig(@JsonProperty("name") String name,
@JsonProperty("index") int index,
@JsonProperty("width") int width,
@JsonProperty("type") TypeProtos.MinorType type,
@JsonProperty("dateTimeFormat") String dateTimeFormat) {
this.name = name;
this.index = index;
this.width = width;
this.type = type;
this.dateTimeFormat = dateTimeFormat;
}

public String getName() {return name;}

public int getIndex() {return index;}

public int getWidth() {return width;}

public TypeProtos.MinorType getType() {return type;}

public void setType() {
this.type = TypeProtos.MinorType.VARCHAR;
}

public String getDateTimeFormat() {return dateTimeFormat;}

@Override
public int hashCode() {
return Objects.hash(name, index, width, type, dateTimeFormat);
}

@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null || getClass() != obj.getClass()) {
return false;
}
FixedWidthFieldConfig other = (FixedWidthFieldConfig) obj;
return Objects.equals(name, other.name)
&& index == other.index
&& width == other.width
&& Objects.equals(type, other.type)
&& Objects.equals(dateTimeFormat, other.dateTimeFormat);
}

@Override
public String toString() {
return new PlanStringBuilder(this)
.field("name", name)
.field("index", index)
.field("width", width)
.field("type", type)
.field("dateTimeFormat", dateTimeFormat)
.toString();
}

@Override
public int compareTo(FixedWidthFieldConfig o) {
return Integer.compare(this.getIndex(), o.getIndex());
}
}
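
Several commits in this PR mention a check for overlapping fields, and `compareTo` above orders fields by `index`, but the validation code itself (in `FixedWidthFormatConfig`) is not part of this excerpt. The following is a minimal sketch of how such a check could use that ordering, not the PR's actual implementation. `FieldSpan`, `OverlapCheckSketch`, and `findOverlaps` are hypothetical names, and the sketch assumes a field occupies the columns from `index` up to but not including `index + width`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for FixedWidthFieldConfig; not the PR's class.
final class FieldSpan {
  final String name;
  final int index; // start column
  final int width; // number of characters

  FieldSpan(String name, int index, int width) {
    this.name = name;
    this.index = index;
    this.width = width;
  }
}

final class OverlapCheckSketch {

  // Returns one message per field that starts before an earlier field has ended.
  static List<String> findOverlaps(List<FieldSpan> fields) {
    List<FieldSpan> sorted = new ArrayList<>(fields);
    // Same ordering as FixedWidthFieldConfig.compareTo: ascending by index.
    sorted.sort(Comparator.comparingInt((FieldSpan f) -> f.index));

    List<String> problems = new ArrayList<>();
    FieldSpan furthest = null; // earlier field whose end column is the largest so far
    for (FieldSpan current : sorted) {
      if (furthest != null && current.index < furthest.index + furthest.width) {
        problems.add(furthest.name + " overlaps " + current.name);
      }
      if (furthest == null
          || current.index + current.width > furthest.index + furthest.width) {
        furthest = current;
      }
    }
    return problems;
  }
}
```

Tracking the furthest end column seen so far, rather than comparing only adjacent fields, lets the check also report a narrow field that sits entirely inside a much wider earlier one.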