Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-6951: Row set based mock data source #1809

Merged
merged 1 commit into from Jul 15, 2019

Conversation

paul-rogers
Copy link
Contributor

@paul-rogers paul-rogers commented Jun 19, 2019

The mock data source is used in several tests to generate a large volume of sample data, such as when testing spilling. The mock data source also lets us try new plugin featues in a very simple context. During the development of the row set framework, the mock data source was converted to use the new framework to verify functionality. This commit upgrades the mock data source with that work.

The work changes non of the functionality. It does, however, improve memory usage. Batches are limited, by default, to 10 MB in size. The row set framework minimizes internal fragmentation in the largest vector. (Previously, internal fragmentation averaged 25% but could be as high as 50%.)

As it turns out, the hash aggregate tests depended on the internal fragmentation: without it, the hash agg no longer spilled for the same row count. Adjusted the generated row counts to recreate a data volume that caused spilling.

One test in particular always failed due to assertions in the hash agg code. These seem true bugs and are described in DRILL-7301. After multiple failed attempts to get the test to work, it ws disabled until DRILL-7301 is fixed.

Added a new unit test to sanity check the mock data source. (No test existed for this functionality except as verified via other unit tests.)

@paul-rogers paul-rogers changed the title Drill 6951 DRILL-6951: Merge row set based mock data source Jun 19, 2019
@paul-rogers paul-rogers changed the title DRILL-6951: Merge row set based mock data source DRILL-6951: Row set based mock data source Jun 20, 2019
@arina-ielchiieva
Copy link
Member

@paul-rogers looks good, please squash the commits.

@paul-rogers
Copy link
Contributor Author

Squashed commits and rebased on latest master.

@paul-rogers
Copy link
Contributor Author

Rebased onto master.

@arina-ielchiieva
Copy link
Member

@paul-rogers please resolve the conflicts.

@paul-rogers
Copy link
Contributor Author

Rebased on master. Tests fail with two unrelated failures (these failures occur about half the time I've run full tests over the last few months):

[ERROR] Failures: 
[ERROR]   TestDynamicUDFSupport.testDropFunction:555 Binary should exist in local udf directory
[ERROR] Errors: 
[ERROR]   TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc

The mock data source is used in several tests to generate a large volume
of sample data, such as when testing spilling. The mock data source also
lets us try new plugin featues in a very simple context. During the
development of the row set framework, the mock data source was converted
to use the new framework to verify functionality. This commit upgrades
the mock data source with that work.

The work changes non of the functionality. It does, however, improve
memory usage. Batchs are limited, by default, to 10 MB in size. The row
set framework minimizes internal fragmentation in the largest vector.
(Previously, internal fragmentation averaged 25% but could be as high as
50%.)

As it turns out, the hash aggregate tests depended on the internal
fragmentation: without it, the hash agg no longer spilled for the same
row count. Adjusted the generated row counts to recreate a data volume
that caused spilling.

One test in particular always failed due to assertions in the hash agg
code. These seem true bugs and are described in DRILL-7301. After
multiple failed attempts to get the test to work, it ws disabled until
DRILL-7301 is fixed.

Added a new unit test to sanity check the mock data source. (No test
already existed for this functionality except as verified via other unit
tests.)
@arina-ielchiieva
Copy link
Member

+1

@arina-ielchiieva arina-ielchiieva merged commit 3599dfd into apache:master Jul 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants