Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-7821] [table] Deprecate Table.limit() and replace it by Table.offset() and Table.fetch(). #4813

Closed
wants to merge 1 commit into from

Conversation

fhueske
Copy link
Contributor

@fhueske fhueske commented Oct 12, 2017

What is the purpose of the change

  • Table.limit(n) has unexpected semantics of returning all but the first n rows.
  • Returning the first n rows from a sorted result requires to specify a 0 offset (table.orderBy(...).limit(0, n)) which is more complicated than necessary.

Brief change log

  • deprecate Table.limit() and replace it by Table.offset() and Table.fetch()

Verifying this change

  • existing tests for limit were adapted for offset and fetch.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no

Documentation

  • Documentation was adapted.

@@ -27,10 +27,26 @@ import org.junit._
class SortValidationTest extends TableTestBase {
Copy link

@hustfxj hustfxj Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It had better add the following two validation tests:

 ds.orderBy('a.asc).offset(10).offset(10)
 ds.orderBy('a.asc).fetch(5)

Now fetch can only be used after offset, maybe the fetch can be used after orderBy. Then ds.orderBy('a.asc).fetch(5) is equivalent with ds.orderBy('a.asc).offset(0)fetch(5)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'll add a test that validates that table.orderBy('a.asc).offset(10).offset(10) is rejected.

table.orderBy('a.asc).fetch(5) is already supported as you suggested (see the modified SortITCase).

Copy link
Member

@xccui xccui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @fhueske. It looks good. I just left some duplicated minor comments about the semantics.

Thanks, Xingcan

// returns all records beginning with the 4th record from the sorted result
Table result2 = in.orderBy("a.asc").offset(3);

// returns the first 5 records beginning with the 10th record from the sorted result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be 11th?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it must be 10th, because SQL indices start at 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @xccui is right. An offset of 10 means that the first 10 rows are omitted from the result.
Otherwise offset 1 would not offset.
I'll update the docs.

// returns all records beginning with the 4th record from the sorted result
val result2: Table = in.orderBy('a.asc).offset(3)

// returns the first 5 records beginning with the 10th record from the sorted result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be 11th?

* thus must be preceded by it.
*
* [[Table.offset(o)]] can be combined with a subsequent [[Table.fetch(n)]] call to return the
* first n rows starting from the offset position o.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, "from the offset position o" means "from the (o+1)th record", right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can add a notice about the SQL indices here, as it might confuses users. Or you add skips 3 rows and begins with the 4th record.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

* {{{
* // returns unlimited number of records beginning with the 4th record
* tab.orderBy('name.desc).offset(3)
* // return the first 5 records starting from the 10th record
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11th?

* {{{
* // returns the first 3 records.
* tab.orderBy('name.desc).fetch(3)
* // return the first 5 records starting from the 10th record
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11th?

*
* @param fetch the number of records to return
*/
def fetch(fetch: Int): Table = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since fetch: Int < 0 is allowed, maybe we should explicitly define the semantics of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks @xccui

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fhueske. I also added some comments.

*/
@deprecated(message = "Deprecated in favor of Table.offset() and Table.fetch()", since = "1.4.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of deprecation does not work for Java. Please add the Java annotation as well.

// returns all records beginning with the 4th record from the sorted result
Table result2 = in.orderBy("a.asc").offset(3);

// returns the first 5 records beginning with the 10th record from the sorted result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it must be 10th, because SQL indices start at 1.

* thus must be preceded by it.
*
* [[Table.offset(o)]] can be combined with a subsequent [[Table.fetch(n)]] call to return the
* first n rows starting from the offset position o.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can add a notice about the SQL indices here, as it might confuses users. Or you add skips 3 rows and begins with the 4th record.

@fhueske
Copy link
Contributor Author

fhueske commented Oct 24, 2017

Thanks for the reviews @xccui and @twalthr.
I'll address the feedback and merge this PR.

fhueske added a commit to fhueske/flink that referenced this pull request Oct 24, 2017
@asfgit asfgit closed this in e1c7e49 Oct 24, 2017
@fhueske fhueske deleted the tableOffsetFetch branch October 24, 2017 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants