Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit #16648

Closed
wants to merge 6 commits into from

Conversation

bdrillard
Copy link

@bdrillard bdrillard commented Jan 19, 2017

[class_splitting] Code Generation: Constant Pool Limit

What changes were proposed in this pull request?

Supports code generation for large structures that would previously trigger a Constant Pool limit exception, as noted in SPARK-18016. In this fix, when the volume of generated code for the class would exceed 1600k bytes, a new private nested class is declared, and any new functions that would have been inlined to the outer class with an addNewFunction call are inlined to the new nested class instead. addNewFunction also would now return the name of the function registered (class-qualified, if it would be inlined to a nested class), so that the caller of the function can call it even if inlined to a different class. Additional nested classes are generated if the threshold is met subsequent times. These nested classes are instantiated and declared at the bottom of the generated outer class.

Because private nested classes have access to the outer class's global state, but their functions and local state do not count towards the outer class's Constant Pool, and that they can be instantiated in the same outer class without the need to declare additional classes and handle the dependency injection, they seem to be a good candidate to solve this particular issue.

One key quality of this patch is that the common path for code generation remains unaffected. The 1600k threshold necessary to split of a nested class should only be exceeded in scenarios where the schema is extremely large. Generated code for most use cases will still be inlined entirely to the single outer class.

One other feature of this patch is the compaction of primitive declarations and simply-assigned object declarations (objects initialized to null, or not initialized at all) into arrays. Code generation can sometimes produce an excess of global state, so we reduce the number of declarations that would count towards the Constant Limit by creating arrays such that for a given variable of common type and declaration, it can be mapped to a specific index of an array. Using arrays of mutable state, the number of variables in global state is less constrained by the constant pool limit, but by the maximum size of an array.

This patch splits code (only code registered through the addNewFunction call) among the outer class and nested classes, and creates arrays for compactable mutable state like below:

     /* 6 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
     /* 7 */    // Global "mutable" state
     /* 8 */    private Object[] references;
     /* 9 */    private UnsafeRow result;
    /* 10 */    private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
    /* 11 */    private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
    /* 12 */    UTF8String[] mutableStateArray1 = new UTF8String[24003]; // compacted states
    /* 13 */    java.lang.STring[] mutableStateArray = new java.lang.String[8001];
    /* 14 */    int[] mutableStateArray3 = new int[8000];
    /* 15 */    boolean[] mutableStateArray2 = new boolean[32004];
    /* 16 */
    /* 17 */    public SpecificUnsafeProjection(Object[] references) {
    /* 18 */        this.references = references;
    /* 19 */        result = new UnsafeRow(8001);
    /* 20 */        holder = new org.apache.spark.sql.catalyst.expressions.codegen.BuffereHolder(result, 256032);
    /* 21 */        rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 8001);
    /* 22 */        // array initializations for compacted states
    /* 23 */        for (int i = 0; i < mutableStateArray1.length; i++) {
    /* 24 */            mutableStateArray1[i] = null;
    /* 25 */        }
    /* 26 */
    /* 27 */        for (int i = 0; i < mutableStateArray2.length; i++) {
    /* 28 */            mutableStateArray2[i] = false;
    /* 29 */        }
    /* 30 */     }
    /* 31 */     // method inlined to outer class
    /* 32 */     private void evalIfCondExpr50(InternalRow i) {
    /* 33 */         boolean isNull703 = i.isNullAt(0);
    /* 34 */         org.apache.spark.sql.Row value703 = isNull703 ? null : 
    /* 35 */             ((org.apache.spark.sql.Row)i.get(0, null));
    /* 36 */
    /* 37 */         if (isNull703) {
    /* 38 */             throw new NullPointerException(((java.lang.String) references[200]));
    /* 39 */         }
    /* 40 */         boolean isNull701 = true;
    /* 41 */         boolean value701 = false;
    /* 42 */         if (!false) {
    /* 43 */
    /* 44 */             mutableStateArray3[49] = 50;
    /* 45 */
    /* 46 */             isNull701 = false;
    /* 47 */             if (!isNull701) {
    /* 48 */                 value701 = value703.isNullAt(mutableStateArray3[49]);
    /* 49 */             }
    /* 50 */         }  // examples of mutableStateArray aliases for primitive global state
    /* 51 */         mutableStateArray2[200] = isNull701;
    /* 52 */         mutableStateArray2[201] = value701;
    /* 53 */     }
     ...              
 /* 70344 */     public UnsafeRow apply(InternalRow i) {
 /* 70345 */         nestedClassInstance5.apply589_0(i); // referencing a nested class method
     ...
 /* 70398 */         return result;
 /* 70399 */     }
 /* 70400 */     // Instantiation of nested classes
 /* 70401 */     private NestedClass5 nestedClassInstance5 = new NestedClass();
 /* 70402 */     private NestedClass4 nestedClassInstance4 = new NestedClass();
     ...
 /* 70406 */     private NestedClass nestedClassInstance = new NestedClass();
 /* 70407 */     // Declaration of a nested class
 /* 70408 */     private NestedClass5 {
 /* 70409 */         // Code inlined to a nested class
 /* 70410 */         private void apply519_0(InternalRow fieldName1105) {
     ...
/* 340829 */     } // end of last nested class
/* 348030 */ } // end of the outer class

How was this patch tested?

Added a new test to the DataframeComplexTypeSuite that tests converting a large structure to a dataset. Ran full regression tests across every module.

@hvanhovell
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Jan 20, 2017

Test build #71730 has finished for PR 16648 at commit 85e81ed.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@mkiedys
Copy link

mkiedys commented Feb 23, 2017

This patch doesn't seem to be working:

org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection$NestedClass8 has grown past JVM limit of 0xFFFF

@mkiedys
Copy link

mkiedys commented Feb 23, 2017

Steps to replicate:

val schema = StructType(
  (0 to 8000).map(n  StructField(s"column_$n", StringType))
)

val values = schema.map(_  null)
val rows = spark.sparkContext.parallelize(Seq(Row(values:_*)))
val frame = spark.sqlContext.createDataFrame(rows, schema)

frame
  .write
  .format("parquet")
  .save("/tmp/test")

@bdrillard
Copy link
Author

bdrillard commented Feb 23, 2017

Thanks for that other test case. The one you provide I would say falls in the same class of error, however, this patch is still capable of addressing some others that still exist. While class-splitting is capable of handling more complex schemas (ones that are reliant on object creation like for JavaBean's and Avro), there are still instances where the shear number of variables can still blow the constant pool limit. In particular, if an enormous amount of mutable state is kicked up into the outer class. In spark 2.0.x, it was previously the case that global mutable state was used more sparingly, however, (as is more directly the case for your test case) there are instances where conditional expressions produce an enormous amount of mutable state (see SPARK-18091 for a recent change that can produce a great degree of mutable state). In your test case, the shear amount of mutable state generated for conditional null-checks is already over 65,536.

One strategy might be to create a cache of excess mutable state only when the volume of mutable state threatens to breach the constant pool limit. Some type of cache (perhaps even just a simple array in the outer class) still accessible to the outer and nested classes would allow us to both keep code between classes within limits, and also keep the amount of mutable state in the outer class manageable. I thought such a caching scheme was a bit out of the scope of this look into class-splitting.

Other strategies may exist for addressing the constant pool limit for the existing code-generation scheme, but I don't see how they fit quite as well given Catalysts proclivity for generating a single large class for each of its operations.

I'd be glad to look into an implementation of such an excess-mutable state caching scheme if it'd help this PR along by addressing a wider range of use cases.

Thanks again for the case, I'm glad to hear thoughts on this issue.

@vitillo
Copy link
Contributor

vitillo commented Mar 9, 2017

@bdrillard Is there a particular reason why this patch hasn't been looked at yet? I think you should CC some of the authors of the code you have changed to speed things up.

@SparkQA
Copy link

SparkQA commented Mar 10, 2017

Test build #74341 has finished for PR 16648 at commit abfd06f.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

val frame = spark.sqlContext.createDataFrame(rows, schema)

frame.show()
// .write
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to remove these commented lines before the merge.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I had just cleaned up that test to match automated the Scala stylechecks.

@SparkQA
Copy link

SparkQA commented Mar 10, 2017

Test build #74345 has finished for PR 16648 at commit 187d5c3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 10, 2017

Test build #74346 has finished for PR 16648 at commit 2bf183e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bdrillard
Copy link
Author

bdrillard commented Mar 10, 2017

I've made some changes to this PR to address @mkiedys comments, and I'm using their test case, as it sets a higher bar for both class splitting and management of mutable state. Mutable state and its initialization seems to create a significant potential limitation for the size of schemas that can be marshaled to datasets. Not only is it possible for the amount of private variables required by mutable state to themselves grow beyond 2^16, but the initialization functions, which include references to that state, when inlined to the main outerclass, also puts significant strain towards the Constant Pool limit. The strategy I attempt to implement, including class splitting, as already mentioned above, is to 'compact' mutable state of primitives and simply-assigned objects into bounded arrays that can be initialized with simple loops rather than large init functions.

final ${ctx.javaType(dataType)} ${ev.value} = $partitionMaskTerm + $countTerm;
$countTerm++;""", isNull = "false")
final ${ctx.javaType(dataType)} ${ev.value} = $partitionMaskTermAccessor + $countTermAccessor;
$countTermAccessor++;""", isNull = "false")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Accessor suffixes to variable names add quite a bit of noise in this PR. What value do they add from your perspective?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having addMutableState return an accessor string is an important part of addressing the manner in which mutable state can contribute to Constant Pool errors. Code that creates mutable state usually takes for granted that the symbol used to declare the state will be inlined as a private member variable to the class. However, for sufficiently complicated schemas, mutable state and its initialization alone can breach the Constant Pool limit. The strategy I settled on was to have mutable state potentially be compacted into arrays of like type and initialization, this way, we can reduce the number of references that would count to the constant pool limit. Of course, if the mutable state is stored in an array, rather than in a private variable named after the symbol, we need to return back the accessor for that index in the compacted mutable state array, hence the 'accessor' suffixes. I had also tried a class-based approach, in which excess mutable state could become static members of nested classes, initialization functions for the state could still exceed the constant pool limit.

This PR can be condensed to two core components to approach a solution to the (hard-and-fast) Constant Pool limit:

  • split excess code among classes
  • compact excess mutable state into arrays

I should mention, not all mutable state is compacted into arrays. Only primitives and collections of simply-assigned objects (null assigned, or no assignment). But this array compaction strategy reduces references enough to allow even complex schemas in which we would potentially generate much more state than 2^16 to still be converted to datasets.

@SparkQA
Copy link

SparkQA commented Mar 11, 2017

Test build #74348 has finished for PR 16648 at commit 635535e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 30, 2017

Test build #75399 has finished for PR 16648 at commit 7928853.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 7, 2017

Test build #75585 has finished for PR 16648 at commit 320db91.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@ethanyxu
Copy link

I encountered this Exception when handling a data frame with 3000+ columns. I hope this patch got resolved soon.

@robert3005
Copy link
Contributor

@bdrillard if you don't have time to finish this up I am happy to update this to latest. I would really like to see this fixed since it's silly that you can't have more than 3k columns

@ethanyxu
Copy link

ethanyxu commented May 6, 2017

Just wanted to mention this is a blocker for using most of the pipeline transformers for wide data frames, which is sad since 3000 columns (my use case) is not really very large.

@HyukjinKwon
Copy link
Member

gentle ping @bdrillard

@bdrillard
Copy link
Author

@HyukjinKwon @robert3005 I'll have some time soon to update this PR for the latest master. Thanks for the interest. It is a non-trivial change and would require a comprehensive code review.

@HyukjinKwon
Copy link
Member

Yea, I just wanted to make sure this is on progress in any way.

@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76816 has finished for PR 16648 at commit f5f8f5c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76818 has finished for PR 16648 at commit f35ef24.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76822 has finished for PR 16648 at commit fa95364.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 16, 2017

Test build #76978 has started for PR 16648 at commit 8ce47fa.

@kiszk
Copy link
Member

kiszk commented May 19, 2017

As written in the comment, this PR enables the following two features. Current generated code in the description seem to show only feature 1. Would it be possible to update code to include features 1 and 2?

  1. split excess code among classes
  2. compact excess mutable state into arrays

@bdrillard
Copy link
Author

bdrillard commented May 19, 2017

@kiszk, I've updated the pull-request description to include example code generation for mutable state compaction as well (which comes from inspecting the generated code for the test case for SPARK-18016).

@SparkQA
Copy link

SparkQA commented May 19, 2017

Test build #77102 has finished for PR 16648 at commit 5a4a39c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* variable is inlined to the class, or an array access if the variable is to be stored
* in an array of variables of the same type and initialization.
*/
def addMutableState(
Copy link
Member

@kiszk kiszk May 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating generated code. It makes clear.

When I see this method, addMutableState seems to return an array element if a variable is primitive or simple object even if the number of mutable states are small. Does to always use array element lead to performance overhead compared to using instance variables?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiszk That's a good question. My initial reaction would be to say that any differences would be negligible, since Java will store both member variables of an Object and values of an array in heap, and one usually assumes O(1) access time for an array item (naturally we'd assume O(1) access for a member variable).

But I've tried to take time and do analysis of the bytecode generated under both scenarios, where all data is inlined as member variables, and where data is compacted into arrays. The conservative conclusion is that the difference in performance is negligible. The results and the testing classes can be seen here. Some folks with more knowledge of JVM architecture may catch a flaw in my experiment design, this is my shot at an informative benchmark.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that both of order is the same. Beyond that, I am interested in the difference of coefficients. Do you know them?

When I see the benchmark program, I have one question. Is there any reason to measure the performance by only one trial. IIUC, in general, Java performance is measured after warmup run due to the existence of Just-In-Time compiler.
Here is a good example.

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the example on creating a more representative timing profile. I'll rework the experiment to programmatically run several trials and account for warmup, etc. I had run several trials each with similar results, but we may as well make a more robust profiling driver that can run far more trials.

As for calculating the coefficients of the algorithm directly, I have not yet done so. I'll have to perform analysis of the bytecode I generated (and included in the repo) to provide a more satisfying answer.

@kiszk
Copy link
Member

kiszk commented May 21, 2017

@bdrillard Can we split this PR into two smaller PRs?

  1. split excess code among classes
  2. compact excess mutable state into arrays

IIUC, addMutableState() does return a new variable in a String only when 1. is applied. It would make easier to review changes. Since to return a String by addMutableState() make diff complicated. Am I correct?

What do you think?

@bdrillard
Copy link
Author

@kiszk We could do that, definitely. Changes in Feature 1 (splitting excess code among classes) are limited to the CodeGeneration class, and the few Generate... classes included with CodeGeneration. The changes of Feature 2 begin in the CodeGeneration class, but then impact any class that calls addMutableState.

I can either do two separate PRs, or I can refactor the commit history such that there are two commits that can be viewed separately in this same PR, one for each feature.

The only reason I'd hesitate to open two separate PRs (and so try to get the commit history to separate the two major features as two commits), is that code generation and state proliferation are tightly coupled: the test case that I include, SPARK-18016, cannot be resolved with just splitting into nested sub-classes or by compacting mutable state -- both features are required. Any class that would require sub-classes would probably also require mutable state compaction (and vice-versa).

So I'd suggest the strategy where we have a commit for Feature 1 and a commit for Feature 2. That way in the PR we can choose to only view one commit at a time during review, but the code is still tested against the more representative test case. Would that be alright?

@kiszk
Copy link
Member

kiszk commented May 23, 2017

I see. I understand two facts.

  1. We can split this into two changes from the implementation view.
  2. We cannot fix the test case by only one change.

In my personal opinion, it would be good to split this into two PRs while I understand Fact 2. IIUC, the review is performed based on each PR, not each commit. To split it into two PRs would make review easier.
What do you think?

If you make the first PR, you could put the test case with smaller columns. Then, when you will make another PR, you could update it with larger columns.

@bdrillard
Copy link
Author

@kiszk Sure, I'm glad to help make this change easier to review. I'll first make a PR that focuses on code splitting into nested classes. There should be a test case with a number of columns that is high enough to induce at least one instance of generating a nested class, but still low enough to not trigger failure based on the amount of global mutable state. If that PR is reviewed and is considered acceptable, we can open a second to address global mutable state.

@bdrillard
Copy link
Author

I've created the first part of a pair of PRs to help make this review easier. Please see #18075 for a PR of the first feature (class splitting of excess code into nested sub-classes). If that PR is considered acceptable, we can address the second feature.

attn: @kiszk

@cloud-fan
Copy link
Contributor

So this PR introduces 2 approaches to work around the Constant Pool Limit: 1. put member variables to inner class 2: compact primitive declarations into arrays.

It looks to me that either of them can solve the problem, do we really need both of them?

@bdrillard
Copy link
Author

bdrillard commented Jun 1, 2017

@cloud-fan Good question, and I think we can resolve it by using different values for N in the testcase I have in the other PR (which will translate to a number of string columns deeper in the test). At N = 4000, we have a threshold where the amount of local state + global state would trigger a JaninoRuntimeException on the constant pool. #18075 can fix that issue at N = 4000 by beginning to inline functions to nested classes, thus reducing the amount of items counting towards the constant pool, but we note that #18075 does nothing to address global state.

We should also note that #18075 does slightly more than putting just member variables into nested classes. While it is true that a significant degree of local state alone that would get inlined to the Outer Class gets inlined instead to nested classes instead, the patch leads to even more reductions in the size of the constant pool, since there are additional items that get inlined to nested classes that also count towards the limit (e.g. field references, method references, variable types, method types, etc, see Java class file, The Constant Pool).

The second feature (included here, but not in #18075), is precisely as you describe: it takes simply-declared fields that would be inlined globally and compacts them into like-typed and like-declared arrays.

However, if we set N = 8000 (even assuming the patch in #18075), we can trigger yet another JaninoRuntimeException, this time because the amount of global state (plus any local state that was inlined to the Outer Class and not any single subclass) is sufficiently great to cause the exception. If we include mutable state compaction and class splitting, we can set N to a value greater than 10,000 (I had success for the test still at N = 12000, but at 16,000 my machine began to thrash).

Conversely, if we only include mutable state compaction at N = 8000, and exclude class splitting, there are instances where we actually end up with very little global state, but the amount of local state and functions inlined to the Outer Class is still sufficient to exceed the constant pool limit and trigger the same exception. This can occur if we have a great number of primitive columns, like N = 8000 integer columns.

Looking at both #18075 and this pull-request, I think the takeaway is that even if all we do for the moment is split excess code among nested classes, we can still make a significant gain in the number of columns a Dataset can hold, which gives #18075 merit on its own. If we want to increase that limit even more though, we'll have to address proliferation of global state as well, perhaps by opening a follow-up PR that focuses on it more closely, maybe using the compaction strategy I've attempted here in #16648, or by exploring another method.

Thoughts on this?

@cloud-fan
Copy link
Contributor

thanks, this makes a lot sense! I'll review #18075

asfgit pushed a commit that referenced this pull request Jun 15, 2017
…lass Splitting

## What changes were proposed in this pull request?

This pull-request exclusively includes the class splitting feature described in #16648. When code for a given class would grow beyond 1600k bytes, a private, nested sub-class is generated into which subsequent functions are inlined. Additional sub-classes are generated as the code threshold is met subsequent times. This code includes 3 changes:

1. Includes helper maps, lists, and functions for keeping track of sub-classes during code generation (included in the `CodeGenerator` class). These helper functions allow nested classes and split functions to be initialized/declared/inlined to the appropriate locations in the various projection classes.
2. Changes `addNewFunction` to return a string to support instances where a split function is inlined to a nested class and not the outer class (and so must be invoked using the class-qualified name). Uses of `addNewFunction` throughout the codebase are modified so that the returned name is properly used.
3. Removes instances of the `this` keyword when used on data inside generated classes. All state declared in the outer class is by default global and accessible to the nested classes. However, if a reference to global state in a nested class is prepended with the `this` keyword, it would attempt to reference state belonging to the nested class (which would not exist), rather than the correct variable belonging to the outer class.

## How was this patch tested?

Added a test case to the `GeneratedProjectionSuite` that increases the number of columns tested in various projections to a threshold that would previously have triggered a `JaninoRuntimeException` for the Constant Pool.

Note: This PR does not address the second Constant Pool issue with code generation (also mentioned in #16648): excess global mutable state. A second PR may be opened to resolve that issue.

Author: ALeksander Eskilson <alek.eskilson@cerner.com>

Closes #18075 from bdrillard/class_splitting_only.
dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
…lass Splitting

## What changes were proposed in this pull request?

This pull-request exclusively includes the class splitting feature described in apache#16648. When code for a given class would grow beyond 1600k bytes, a private, nested sub-class is generated into which subsequent functions are inlined. Additional sub-classes are generated as the code threshold is met subsequent times. This code includes 3 changes:

1. Includes helper maps, lists, and functions for keeping track of sub-classes during code generation (included in the `CodeGenerator` class). These helper functions allow nested classes and split functions to be initialized/declared/inlined to the appropriate locations in the various projection classes.
2. Changes `addNewFunction` to return a string to support instances where a split function is inlined to a nested class and not the outer class (and so must be invoked using the class-qualified name). Uses of `addNewFunction` throughout the codebase are modified so that the returned name is properly used.
3. Removes instances of the `this` keyword when used on data inside generated classes. All state declared in the outer class is by default global and accessible to the nested classes. However, if a reference to global state in a nested class is prepended with the `this` keyword, it would attempt to reference state belonging to the nested class (which would not exist), rather than the correct variable belonging to the outer class.

## How was this patch tested?

Added a test case to the `GeneratedProjectionSuite` that increases the number of columns tested in various projections to a threshold that would previously have triggered a `JaninoRuntimeException` for the Constant Pool.

Note: This PR does not address the second Constant Pool issue with code generation (also mentioned in apache#16648): excess global mutable state. A second PR may be opened to resolve that issue.

Author: ALeksander Eskilson <alek.eskilson@cerner.com>

Closes apache#18075 from bdrillard/class_splitting_only.
@kiszk
Copy link
Member

kiszk commented Jul 28, 2017

ping @bdrillard for the 2nd part of this PR

@bdrillard
Copy link
Author

Thanks @kiszk, I'll work on preparing a PR for the second half of this issue.

@kiszk
Copy link
Member

kiszk commented Aug 15, 2017

kindly ping @bdrillard

1 similar comment
@kiszk
Copy link
Member

kiszk commented Sep 8, 2017

kindly ping @bdrillard

@kiszk
Copy link
Member

kiszk commented Sep 28, 2017

@bdrillard gentle ping

@bdrillard
Copy link
Author

I'm blocking out time to prepare the part 2 PR for this issue starting today over this week, regarding compaction of excess primitive state.

cc: @kiszk

@kiszk
Copy link
Member

kiszk commented Oct 10, 2017

@bdrillard Thank you very much

@bdrillard
Copy link
Author

@kiszk please see #19518 for part 2 of this original PR, and thanks!

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@bdrillard
Copy link
Author

This PR was addressed in #18075.

@bdrillard bdrillard closed this May 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet