This repository was archived by the owner on Nov 11, 2022. It is now read-only.

Conversation

@sammcveety
Contributor

@sammcveety
Contributor Author

Rebased

/**
* Returns the base output filename for this file based sink.
*/
public String getBaseOutputFilename() {
Contributor

Need to keep this in Dataflow -- removal is a breaking change.

Contributor Author

Done
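For readers following the thread, here is a minimal sketch of how a getter can be kept for existing callers while the filename becomes a deferred value. It assumes a simplified, hypothetical `ValueProvider` interface and a hypothetical class name `SinkCompatSketch`; it is not the code that was merged.

```java
// Hypothetical sketch: keep the String getter while storing a deferred value.
final class SinkCompatSketch {
  /** Hypothetical stand-in for the SDK's ValueProvider type. */
  interface ValueProvider<T> {
    T get();
  }

  private final ValueProvider<String> baseOutputFilename;

  /** Accepts a deferred filename. */
  SinkCompatSketch(ValueProvider<String> baseOutputFilename) {
    this.baseOutputFilename = baseOutputFilename;
  }

  /** Convenience constructor for existing callers that pass a plain String. */
  SinkCompatSketch(String filename) {
    this(() -> filename);
  }

  /** Existing public getter, retained so current callers keep compiling. */
  public String getBaseOutputFilename() {
    return baseOutputFilename.get();
  }

  /** Hypothetical additional accessor that exposes the deferred value itself. */
  public ValueProvider<String> getBaseOutputFilenameProvider() {
    return baseOutputFilename;
  }
}
```

Keeping the original signature and adding a second accessor avoids a source-incompatible change for code that already calls `getBaseOutputFilename()`.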

* implies {@code #getMaxEndOffSet()}.
*/
public FileBasedSource(String fileName, long minBundleSize,
long startOffset, long endOffset) {
Contributor

Noting that there is an API gap here -- you can't specify a known offset with a ValueProvider (VP) filename.

This is probably okay, because if you don't know what file you're reading how could you know what positions you need to read?

Contributor Author

Agree
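The gap discussed above can be pictured with a small sketch: one constructor takes a deferred file pattern and always covers the whole range, while the other takes a concrete file name plus explicit offsets. The class `SourceConstructorsSketch`, the simplified `ValueProvider` interface, and the use of `Long.MAX_VALUE` as the "read to the end" marker are all assumptions made for illustration.

```java
// Hypothetical sketch of the two construction paths; not the SDK's FileBasedSource.
final class SourceConstructorsSketch {
  /** Hypothetical stand-in for the SDK's ValueProvider type. */
  interface ValueProvider<T> {
    T get();
    boolean isAccessible();
  }

  private final ValueProvider<String> fileOrPattern;
  private final long startOffset;
  private final long endOffset;

  /** Deferred pattern: the files are unknown up front, so the full range is assumed. */
  SourceConstructorsSketch(ValueProvider<String> fileOrPattern) {
    this.fileOrPattern = fileOrPattern;
    this.startOffset = 0L;
    this.endOffset = Long.MAX_VALUE; // assumed marker for "read to the end"
  }

  /** Known single file: explicit offsets are meaningful here. */
  SourceConstructorsSketch(final String fileName, long startOffset, long endOffset) {
    this.fileOrPattern = new ValueProvider<String>() {
      @Override public String get() { return fileName; }
      @Override public boolean isAccessible() { return true; }
    };
    this.startOffset = startOffset;
    this.endOffset = endOffset;
  }

  ValueProvider<String> getFileOrPattern() { return fileOrPattern; }
  long getStartOffset() { return startOffset; }
  long getEndOffset() { return endOffset; }
}
```

There is deliberately no constructor combining a deferred name with offsets, which matches the reasoning in the comment: offsets only make sense once the file is known.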

}

if (validate) {
checkState(filepattern.isAccessible(), "Cannot validate with a RVP.");
Contributor

I do not think this is a useful error message -- how does a user know what a RVP is?

Contributor Author

Done
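As an illustration of the point about the error message, the sketch below spells out what went wrong and what the user can do about it instead of using the abbreviation. The wording of the message, the class `ValidationMessageSketch`, and the simplified `ValueProvider` interface are assumptions; the thread does not show the message that was eventually merged.

```java
// Hypothetical sketch of a more descriptive validation failure.
final class ValidationMessageSketch {
  /** Hypothetical stand-in for the SDK's ValueProvider; only the methods used here. */
  interface ValueProvider<T> {
    T get();
    boolean isAccessible();
  }

  /** A provider whose value is known at pipeline construction time. */
  static <T> ValueProvider<T> staticValue(final T value) {
    return new ValueProvider<T>() {
      @Override public T get() { return value; }
      @Override public boolean isAccessible() { return true; }
    };
  }

  /** A provider whose value only becomes available once the job is running. */
  static <T> ValueProvider<T> runtimeValue() {
    return new ValueProvider<T>() {
      @Override public T get() {
        throw new IllegalStateException("Value is only available at runtime.");
      }
      @Override public boolean isAccessible() { return false; }
    };
  }

  /** Fails with an explanatory message when validation is requested too early. */
  static void validate(ValueProvider<String> filepattern, boolean validate) {
    if (validate && !filepattern.isAccessible()) {
      throw new IllegalStateException(
          "Cannot validate the file pattern at construction time because its value is only "
              + "available at runtime. Disable validation or supply a static file pattern.");
    }
  }

  public static void main(String[] args) {
    validate(staticValue("gs://bucket/input-*.txt"), true); // accessible, so no exception
    try {
      validate(ValidationMessageSketch.<String>runtimeValue(), true);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```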

@dhalperi
Contributor

(Did a base review, but now I will also start diffing)

Contributor Author

@sammcveety left a comment

Done

Contributor

@dhalperi left a comment

Most diffs look good. (This review is against the original PR, without any fixups since my first comments.)

One minor change to improve diffs.

/**
* Creates a {@code CompressedSource} from a delegate file based source and a decompressing
* channel factory.
*/
Contributor

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;

import com.google.cloud.dataflow.sdk.coders.Coder;
Contributor

Lots of divergence here, but LGETM: https://www.diffchecker.com/kEkSAaQW

package com.google.cloud.dataflow.sdk.io;

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkState;
Contributor

* the License.
*/

package com.google.cloud.dataflow.sdk.io;
Contributor

/**
* Returns an XmlSink that writes objects of the class specified as XML elements.
*
* <p>The specified class must be able to be used to create a JAXB context.
Contributor

import static com.google.cloud.dataflow.sdk.util.Structs.addLong;

import com.google.api.services.dataflow.model.SourceMetadata;
import com.google.cloud.dataflow.sdk.io.FileBasedSource;
Contributor

import com.google.cloud.dataflow.sdk.coders.VoidCoder;
import com.google.cloud.dataflow.sdk.io.BoundedSource.BoundedReader;
import com.google.cloud.dataflow.sdk.io.TextIO.CompressionType;
import com.google.cloud.dataflow.sdk.io.TextIO.TextSource;
Contributor

XmlSink.write()
.toFilenamePrefix(testFilePrefix)
.ofRecordClass(testClass)
.withRootElement(testRootElement);
Contributor

p.run();
}

/** Options for testing. */
Contributor

Can you move this to where it is in Beam? https://www.diffchecker.com/e0gOweZQ

Contributor

L317/after testRun

import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.io.PubsubIO.PubsubTopic;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
Contributor

@sammcveety
Contributor Author

Done

@dhalperi merged commit c737a68 into GoogleCloudPlatform:master on Dec 13, 2016

3 participants