[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit #16648

bdrillard · 2017-01-19T16:11:57Z

[class_splitting] Code Generation: Constant Pool Limit

What changes were proposed in this pull request?

Supports code generation for large structures that would previously trigger a Constant Pool limit exception, as noted in SPARK-18016. With this fix, when the volume of generated code for a class would exceed 1600k bytes, a new private nested class is declared, and any new functions that would have been inlined to the outer class by an addNewFunction call are inlined to the new nested class instead. addNewFunction now also returns the name of the registered function (class-qualified if it was inlined to a nested class), so that the caller can invoke the function even when it lives in a different class. Additional nested classes are generated each time the threshold is reached again. These nested classes are declared and instantiated at the bottom of the generated outer class.
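The bookkeeping described above can be sketched roughly as follows. This is a simplified illustration, not the code in this patch: the class `ClassSplittingSketch`, its fields, the `NestedClass1`/`nestedClassInstance1` numbering, and the use of string length as a stand-in for generated code size are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the code generator's bookkeeping.
class ClassSplittingSketch {
    // Threshold quoted in the description: 1600k bytes of generated code per class.
    static final int DEFAULT_THRESHOLD = 1_600_000;

    private final int threshold;
    // Class name -> function bodies inlined to that class, in creation order.
    private final Map<String, List<String>> functionsByClass = new LinkedHashMap<>();
    private String currentClass = "OuterClass";
    private String currentInstance = ""; // empty while still inlining to the outer class
    private int currentClassSize = 0;
    private int nestedClassCount = 0;

    ClassSplittingSketch(int threshold) {
        this.threshold = threshold;
        functionsByClass.put(currentClass, new ArrayList<>());
    }

    ClassSplittingSketch() {
        this(DEFAULT_THRESHOLD);
    }

    /** Registers a function and returns the name callers must use to invoke it. */
    String addNewFunction(String funcName, String funcCode) {
        // Start a new nested class when this function would push the current class
        // over the threshold (string length approximates the code size here).
        if (currentClassSize + funcCode.length() > threshold) {
            nestedClassCount++;
            currentClass = "NestedClass" + nestedClassCount;
            currentInstance = "nestedClassInstance" + nestedClassCount;
            currentClassSize = 0;
            functionsByClass.put(currentClass, new ArrayList<>());
        }
        functionsByClass.get(currentClass).add(funcCode);
        currentClassSize += funcCode.length();
        // Functions in the outer class keep their plain name; functions in a
        // nested class are returned qualified by the nested class instance.
        return currentInstance.isEmpty() ? funcName : currentInstance + "." + funcName;
    }

    Map<String, List<String>> functionsByClass() {
        return functionsByClass;
    }
}
```

The point of the sketch is the return value: callers cannot assume the function ends up in the outer class, so they must use whatever name `addNewFunction` hands back.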

Private nested classes seem to be a good candidate to solve this particular issue: they have access to the outer class's global state, their functions and local state do not count towards the outer class's Constant Pool, and they can be instantiated in the same outer class without the need to declare additional classes and handle dependency injection.
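As a small plain-Java illustration of that property (all names here are hypothetical and unrelated to the generated code), a private inner class can read and write private state of its enclosing instance, while its methods are compiled into a separate class file with its own constant pool:

```java
// Illustrative only: class, field, and method names are hypothetical.
class OuterProjection {
    // "Global" mutable state, declared once in the outer class.
    private int rowsSeen = 0;

    // The nested class is instantiated by the outer class itself.
    private final NestedClass1 nestedClassInstance1 = new NestedClass1();

    int apply(int value) {
        // The outer class reaches split-off code through the nested instance.
        return nestedClassInstance1.applySplit(value);
    }

    int rowsSeen() {
        return rowsSeen;
    }

    private class NestedClass1 {
        // Compiled into the nested class's own class file, yet still able to
        // update the private field of the enclosing OuterProjection instance.
        private int applySplit(int value) {
            rowsSeen++;
            return value * 2;
        }
    }
}
```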

One key quality of this patch is that the common path for code generation remains unaffected. The 1600k threshold necessary to split off a nested class should only be exceeded in scenarios where the schema is extremely large. Generated code for most use cases will still be inlined entirely to the single outer class.

One other feature of this patch is the compaction of primitive declarations and simply-assigned object declarations (objects initialized to null, or not initialized at all) into arrays. Code generation can sometimes produce an excess of global state, so we reduce the number of declarations that would count towards the Constant Pool limit by creating arrays such that each variable of a common type and declaration is mapped to a specific index of an array. Using arrays of mutable state, the number of variables in global state is constrained less by the constant pool limit than by the maximum size of an array.
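A minimal sketch of that mapping, under simplifying assumptions: the class `MutableStateCompactionSketch` and its method signatures are hypothetical, arrays are keyed by Java type only, and state counts as compactable only when it has no custom initialization code. The patch itself groups state by type and initialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the compaction bookkeeping, not the patch's implementation.
class MutableStateCompactionSketch {
    // Java type -> name of the array that holds compacted state of that type.
    private final Map<String, String> arrayNames = new LinkedHashMap<>();
    // Java type -> number of slots handed out so far.
    private final Map<String, Integer> arraySizes = new LinkedHashMap<>();
    private final StringBuilder inlinedDeclarations = new StringBuilder();

    /** Returns the expression callers must use to read or write the state. */
    String addMutableState(String javaType, String variableName, String initCode) {
        // Assumption for this sketch: only state without custom initialization is compacted.
        boolean compactable = initCode == null || initCode.isEmpty();
        if (!compactable) {
            inlinedDeclarations.append("private ").append(javaType)
                .append(' ').append(variableName).append(";\n");
            return variableName;
        }
        String arrayName = arrayNames.get(javaType);
        if (arrayName == null) {
            arrayName = "mutableStateArray" + arrayNames.size();
            arrayNames.put(javaType, arrayName);
        }
        // merge returns the updated count, so the new slot index is count - 1.
        int index = arraySizes.merge(javaType, 1, Integer::sum) - 1;
        return arrayName + "[" + index + "]";
    }

    /** Emits one array declaration per compacted type. */
    String declareCompactedState() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : arrayNames.entrySet()) {
            String type = e.getKey();
            sb.append("private ").append(type).append("[] ").append(e.getValue())
                .append(" = new ").append(type)
                .append('[').append(arraySizes.get(type)).append("];\n");
        }
        return sb.toString();
    }

    String declareInlinedState() {
        return inlinedDeclarations.toString();
    }
}
```

However many variables of one type are registered, the generated class then declares a single field for them, and each variable becomes an index into that field.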

This patch splits code (only code registered through the addNewFunction call) among the outer class and nested classes, and creates arrays for compactable mutable state like below:

    class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
        // Global "mutable" state
        private Object[] references;
        private UnsafeRow result;
        private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
        private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
        UTF8String[] mutableStateArray1 = new UTF8String[24003]; // compacted states
        java.lang.String[] mutableStateArray = new java.lang.String[8001];
        int[] mutableStateArray3 = new int[8000];
        boolean[] mutableStateArray2 = new boolean[32004];

        public SpecificUnsafeProjection(Object[] references) {
            this.references = references;
            result = new UnsafeRow(8001);
            holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 256032);
            rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 8001);
            // array initializations for compacted states
            for (int i = 0; i < mutableStateArray1.length; i++) {
                mutableStateArray1[i] = null;
            }

            for (int i = 0; i < mutableStateArray2.length; i++) {
                mutableStateArray2[i] = false;
            }
        }

        // method inlined to outer class
        private void evalIfCondExpr50(InternalRow i) {
            boolean isNull703 = i.isNullAt(0);
            org.apache.spark.sql.Row value703 = isNull703 ? null :
                ((org.apache.spark.sql.Row)i.get(0, null));

            if (isNull703) {
                throw new NullPointerException(((java.lang.String) references[200]));
            }
            boolean isNull701 = true;
            boolean value701 = false;
            if (!false) {

                mutableStateArray3[49] = 50;

                isNull701 = false;
                if (!isNull701) {
                    value701 = value703.isNullAt(mutableStateArray3[49]);
                }
            }
            // examples of mutableStateArray aliases for primitive global state
            mutableStateArray2[200] = isNull701;
            mutableStateArray2[201] = value701;
        }
        ...
        public UnsafeRow apply(InternalRow i) {
            nestedClassInstance5.apply589_0(i); // referencing a nested class method
            ...
            return result;
        }

        // Instantiation of nested classes
        private NestedClass5 nestedClassInstance5 = new NestedClass5();
        private NestedClass4 nestedClassInstance4 = new NestedClass4();
        ...
        private NestedClass nestedClassInstance = new NestedClass();

        // Declaration of a nested class
        private class NestedClass5 {
            // Code inlined to a nested class
            private void apply519_0(InternalRow fieldName1105) {
            ...
        } // end of last nested class
    } // end of the outer class

How was this patch tested?

Added a new test to the DataFrameComplexTypeSuite that tests converting a large structure to a dataset. Ran full regression tests across every module.

hvanhovell · 2017-01-20T17:20:12Z

ok to test

SparkQA · 2017-01-20T20:24:43Z

Test build #71730 has finished for PR 16648 at commit 85e81ed.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

mkiedys · 2017-02-23T01:39:37Z

This patch doesn't seem to be working:

org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection$NestedClass8 has grown past JVM limit of 0xFFFF

mkiedys · 2017-02-23T02:01:10Z

Steps to replicate:

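// Assumes a spark-shell session in which `spark` is the active SparkSession.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(
  (0 to 8000).map(n ⇒ StructField(s"column_$n", StringType))
)

val values = schema.map(_ ⇒ null)
val rows = spark.sparkContext.parallelize(Seq(Row(values:_*)))
val frame = spark.sqlContext.createDataFrame(rows, schema)

frame
  .write
  .format("parquet")
  .save("/tmp/test")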

bdrillard · 2017-02-23T15:21:47Z

Thanks for that other test case. I would say the one you provide falls in the same class of error; however, this patch is still capable of addressing other instances of it that exist. While class splitting is capable of handling more complex schemas (ones that rely on object creation, as for JavaBeans and Avro), there are still instances where the sheer number of variables can blow the constant pool limit, in particular if an enormous amount of mutable state is kicked up into the outer class. In Spark 2.0.x, global mutable state was used more sparingly; however (as is more directly the case for your test case), there are instances where conditional expressions produce an enormous amount of mutable state (see SPARK-18091 for a recent change that can produce a great deal of mutable state). In your test case, the sheer amount of mutable state generated for conditional null-checks is already over 65,536.

One strategy might be to create a cache of excess mutable state only when the volume of mutable state threatens to breach the constant pool limit. Some type of cache (perhaps even just a simple array in the outer class) still accessible to the outer and nested classes would allow us to both keep code between classes within limits, and also keep the amount of mutable state in the outer class manageable. I thought such a caching scheme was a bit out of the scope of this look into class-splitting.

Other strategies may exist for addressing the constant pool limit for the existing code-generation scheme, but I don't see how they fit quite as well given Catalyst's proclivity for generating a single large class for each of its operations.

I'd be glad to look into an implementation of such an excess-mutable state caching scheme if it'd help this PR along by addressing a wider range of use cases.

Thanks again for the case, I'm glad to hear thoughts on this issue.

vitillo · 2017-03-09T10:38:38Z

@bdrillard Is there a particular reason why this patch hasn't been looked at yet? I think you should CC some of the authors of the code you have changed to speed things up.

SparkQA · 2017-03-10T21:39:35Z

Test build #74341 has finished for PR 16648 at commit abfd06f.

This patch fails Scala style tests.
This patch does not merge cleanly.
This patch adds no public classes.

mrksmb · 2017-03-10T21:47:54Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala

+    val frame = spark.sqlContext.createDataFrame(rows, schema)
+
+    frame.show()
+//      .write


Might be good to remove these commented lines before the merge.

Yep, I had just cleaned up that test to match the automated Scala style checks.

SparkQA · 2017-03-10T21:49:23Z

Test build #74345 has finished for PR 16648 at commit 187d5c3.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-10T21:54:27Z

Test build #74346 has finished for PR 16648 at commit 2bf183e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

bdrillard · 2017-03-10T21:58:19Z

I've made some changes to this PR to address @mkiedys's comments, and I'm using their test case, as it sets a higher bar for both class splitting and management of mutable state. Mutable state and its initialization seem to create a significant potential limitation on the size of schemas that can be marshaled to datasets. Not only is it possible for the number of private variables required by mutable state to grow beyond 2^16 on its own, but the initialization functions, which include references to that state, also put significant strain on the Constant Pool limit when inlined to the main outer class. The strategy I attempt to implement, in addition to the class splitting already mentioned above, is to 'compact' mutable state of primitives and simply-assigned objects into bounded arrays that can be initialized with simple loops rather than large init functions.
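A back-of-the-envelope check of the "beyond 2^16" point, using only the compacted array sizes shown in the generated code in the pull-request description. The class name is hypothetical, and the calculation relies on one fact of the class-file format: every distinctly named field needs at least its own name entry in the constant pool, so the slot count is a lower bound on the entries that individual fields would have required.

```java
// Rough arithmetic only; the array sizes are copied from the PR description's generated code.
class ConstantPoolBudgetSketch {
    // Limit quoted in the Janino error message in this thread.
    static final int CONSTANT_POOL_LIMIT = 0xFFFF; // 65,535

    static int sum(int... sizes) {
        int total = 0;
        for (int size : sizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // UTF8String[24003], String[8001], int[8000], boolean[32004]
        int stateSlots = sum(24003, 8001, 8000, 32004);
        System.out.println("compacted state slots: " + stateSlots);
        System.out.println("constant pool limit:   " + CONSTANT_POOL_LIMIT);
        // Declared as individual fields, each slot would need at least one
        // constant pool entry for its name; as arrays, four field names suffice.
        System.out.println("over the limit as individual fields: "
            + (stateSlots > CONSTANT_POOL_LIMIT));
    }
}
```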

mrksmb · 2017-03-10T22:15:52Z

...yst/src/main/scala/org/apache/spark/sql/catalyst/expressions/MonotonicallyIncreasingID.scala

-      final ${ctx.javaType(dataType)} ${ev.value} = $partitionMaskTerm + $countTerm;
-      $countTerm++;""", isNull = "false")
+      final ${ctx.javaType(dataType)} ${ev.value} = $partitionMaskTermAccessor + $countTermAccessor;
+      $countTermAccessor++;""", isNull = "false")


The Accessor suffixes to variable names add quite a bit of noise in this PR. What value do they add from your perspective?

Having addMutableState return an accessor string is an important part of addressing the manner in which mutable state can contribute to Constant Pool errors. Code that creates mutable state usually takes for granted that the symbol used to declare the state will be inlined as a private member variable of the class. However, for sufficiently complicated schemas, mutable state and its initialization alone can breach the Constant Pool limit. The strategy I settled on was to have mutable state potentially be compacted into arrays of like type and initialization; this way, we can reduce the number of references that count towards the constant pool limit. Of course, if the mutable state is stored in an array, rather than in a private variable named after the symbol, we need to return the accessor for that index in the compacted mutable state array, hence the 'accessor' suffixes. I had also tried a class-based approach, in which excess mutable state could become static members of nested classes, but the initialization functions for the state could still exceed the constant pool limit.
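To make the call-site consequence concrete, here is a hedged sketch of a code template that is built from accessors rather than from the declared symbol names. `AccessorCallSiteSketch` and `genCode` are hypothetical helpers written for this illustration; the emitted snippet mirrors the shape of the MonotonicallyIncreasingID template in the diff above, with `long` assumed as the value type.

```java
// Hypothetical helper: builds generated Java source from accessor strings.
class AccessorCallSiteSketch {
    /**
     * Emits code that adds a counter to a partition mask and then increments
     * the counter. The accessors may be plain field names or array elements;
     * the template does not need to know which.
     */
    static String genCode(String valueName, String partitionMaskAccessor, String countAccessor) {
        return "final long " + valueName + " = "
            + partitionMaskAccessor + " + " + countAccessor + ";\n"
            + countAccessor + "++;";
    }

    public static void main(String[] args) {
        // State inlined as member variables: the accessor is the variable name.
        System.out.println(genCode("value1", "partitionMask", "count"));
        // State compacted into an array: the accessor is an indexed element.
        System.out.println(genCode("value1", "mutableStateArray[0]", "mutableStateArray[1]"));
    }
}
```

The same template works in both cases, which is why the returned accessor, rather than the original symbol, has to flow through every call site.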

This PR can be condensed to two core components to approach a solution to the (hard-and-fast) Constant Pool limit:

1. split excess code among classes
2. compact excess mutable state into arrays

I should mention that not all mutable state is compacted into arrays: only primitives and collections of simply-assigned objects (null-assigned, or with no assignment) are. But this array compaction strategy reduces references enough to allow even complex schemas, in which we would potentially generate much more state than 2^16, to still be converted to datasets.

SparkQA · 2017-03-11T00:03:06Z

Test build #74348 has finished for PR 16648 at commit 635535e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-30T20:47:26Z

Test build #75399 has finished for PR 16648 at commit 7928853.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2017-04-07T00:24:43Z

Test build #75585 has finished for PR 16648 at commit 320db91.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

ethanyxu · 2017-04-19T18:51:35Z

I encountered this exception when handling a data frame with 3000+ columns. I hope this patch gets resolved soon.

robert3005 · 2017-04-27T22:09:02Z

@bdrillard if you don't have time to finish this up I am happy to update this to latest. I would really like to see this fixed since it's silly that you can't have more than 3k columns

ethanyxu · 2017-05-06T16:18:14Z

Just wanted to mention this is a blocker for using most of the pipeline transformers for wide data frames, which is sad since 3000 columns (my use case) is not really very large.

HyukjinKwon · 2017-05-11T14:13:26Z

gentle ping @bdrillard

bdrillard · 2017-05-11T14:18:03Z

@HyukjinKwon @robert3005 I'll have some time soon to update this PR for the latest master. Thanks for the interest. It is a non-trivial change and would require a comprehensive code review.

HyukjinKwon · 2017-05-11T14:19:51Z

Yea, I just wanted to make sure this is in progress in some way.

SparkQA · 2017-05-11T16:14:19Z

Test build #76816 has finished for PR 16648 at commit f5f8f5c.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-11T16:34:23Z

Test build #76818 has finished for PR 16648 at commit f35ef24.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-11T20:07:09Z

Test build #76822 has finished for PR 16648 at commit fa95364.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-16T18:24:24Z

Test build #76978 has started for PR 16648 at commit 8ce47fa.

kiszk · 2017-05-19T13:38:10Z

As written in the comment, this PR enables the following two features. The current generated code in the description seems to show only feature 1. Would it be possible to update the code to include features 1 and 2?

1. split excess code among classes
2. compact excess mutable state into arrays

bdrillard · 2017-05-19T18:45:23Z

@kiszk, I've updated the pull-request description to include example code generation for mutable state compaction as well (which comes from inspecting the generated code for the test case for SPARK-18016).

SparkQA · 2017-05-19T21:37:54Z

Test build #77102 has finished for PR 16648 at commit 5a4a39c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2017-05-20T05:45:56Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

+   *         variable is inlined to the class, or an array access if the variable is to be stored
+   *         in an array of variables of the same type and initialization.
+   */
+  def addMutableState(


Thank you for updating the generated code. It makes things clear.

Looking at this method, addMutableState seems to return an array element if a variable is a primitive or a simple object, even if the number of mutable states is small. Does always using an array element lead to performance overhead compared to using instance variables?

@kiszk That's a good question. My initial reaction would be to say that any differences would be negligible, since Java will store both member variables of an Object and values of an array on the heap, and one usually assumes O(1) access time for an array item (naturally we'd assume O(1) access for a member variable).

But I've tried to take the time to analyze the bytecode generated under both scenarios: where all data is inlined as member variables, and where data is compacted into arrays. The conservative conclusion is that the difference in performance is negligible. The results and the testing classes can be seen here. Some folks with more knowledge of JVM architecture may catch a flaw in my experiment design; this is my shot at an informative benchmark.

I agree that the order of both is the same. Beyond that, I am interested in the difference in the coefficients. Do you know them?

Looking at the benchmark program, I have one question. Is there any reason to measure the performance with only one trial? IIUC, in general, Java performance is measured after a warmup run because of the Just-In-Time compiler.
Here is a good example.

What do you think?

Thanks for the example on creating a more representative timing profile. I'll rework the experiment to programmatically run several trials and account for warmup, etc. I had run several trials each with similar results, but we may as well make a more robust profiling driver that can run far more trials.

As for calculating the coefficients of the algorithm directly, I have not yet done so. I'll have to perform analysis of the bytecode I generated (and included in the repo) to provide a more satisfying answer.
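A minimal sketch of what a driver with warmup and repeated trials could look like. This is not the benchmark linked in the thread: the class name, workload, and iteration counts are all made up for illustration, and a hand-rolled timing loop like this is still crude compared with a dedicated harness such as JMH.

```java
import java.util.Arrays;

// Hypothetical micro-benchmark sketch: same arithmetic on fields vs. on an array.
class FieldVsArrayBenchmarkSketch {
    // State kept as individual member variables.
    private long f0, f1, f2, f3;
    // The same state compacted into one array.
    private final long[] state = new long[4];

    long runFields(int iterations) {
        f0 = 0; f1 = 0; f2 = 0; f3 = 0;
        for (int i = 0; i < iterations; i++) {
            f0 += i;
            f1 += f0;
            f2 ^= f1;
            f3 += f2;
        }
        return f3;
    }

    long runArray(int iterations) {
        Arrays.fill(state, 0L);
        for (int i = 0; i < iterations; i++) {
            state[0] += i;
            state[1] += state[0];
            state[2] ^= state[1];
            state[3] += state[2];
        }
        return state[3];
    }

    public static void main(String[] args) {
        FieldVsArrayBenchmarkSketch bench = new FieldVsArrayBenchmarkSketch();
        int iterations = 5_000_000;
        int warmups = 5;
        int trials = 10;
        long sink = 0; // consume results so the loops cannot be optimized away

        // Warmup runs give the JIT compiler a chance to compile both paths.
        for (int w = 0; w < warmups; w++) {
            sink += bench.runFields(iterations);
            sink += bench.runArray(iterations);
        }

        long fieldNanos = 0;
        long arrayNanos = 0;
        for (int t = 0; t < trials; t++) {
            long t0 = System.nanoTime();
            sink += bench.runFields(iterations);
            long t1 = System.nanoTime();
            sink += bench.runArray(iterations);
            long t2 = System.nanoTime();
            fieldNanos += t1 - t0;
            arrayNanos += t2 - t1;
        }

        System.out.printf("fields: %.2f ms/trial%n", fieldNanos / (double) trials / 1_000_000.0);
        System.out.printf("array:  %.2f ms/trial%n", arrayNanos / (double) trials / 1_000_000.0);
        System.out.println("checksum: " + sink);
    }
}
```

Averaging over several post-warmup trials gives a rough estimate of the constant-factor difference asked about above; the absolute numbers will vary by JVM and machine.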

kiszk · 2017-05-21T14:00:29Z

@bdrillard Can we split this PR into two smaller PRs?

1. split excess code among classes
2. compact excess mutable state into arrays

IIUC, addMutableState() returns a new variable as a String only when 2. is applied. Splitting would make the changes easier to review, since having addMutableState() return a String makes the diff complicated. Am I correct?

What do you think?

bdrillard · 2017-05-22T21:06:18Z

@kiszk We could do that, definitely. Changes in Feature 1 (splitting excess code among classes) are limited to the CodeGenerator class, and the few Generate... classes included with CodeGenerator. The changes of Feature 2 begin in the CodeGenerator class, but then impact any class that calls addMutableState.

I can either do two separate PRs, or I can refactor the commit history such that there are two commits that can be viewed separately in this same PR, one for each feature.

The only reason I'd hesitate to open two separate PRs (and so try to get the commit history to separate the two major features as two commits), is that code generation and state proliferation are tightly coupled: the test case that I include, SPARK-18016, cannot be resolved with just splitting into nested sub-classes or by compacting mutable state -- both features are required. Any class that would require sub-classes would probably also require mutable state compaction (and vice-versa).

So I'd suggest the strategy where we have a commit for Feature 1 and a commit for Feature 2. That way in the PR we can choose to only view one commit at a time during review, but the code is still tested against the more representative test case. Would that be alright?

kiszk · 2017-05-23T16:47:17Z

I see. I understand two facts.

1. We can split this into two changes from the implementation view.
2. We cannot fix the test case by only one change.

In my personal opinion, it would be good to split this into two PRs, even though I understand Fact 2. IIUC, review is performed per PR, not per commit. Splitting it into two PRs would make review easier.
What do you think?

If you make the first PR, you could include the test case with fewer columns. Then, when you make another PR, you could update it with more columns.

bdrillard · 2017-05-23T17:10:28Z

@kiszk Sure, I'm glad to help make this change easier to review. I'll first make a PR that focuses on code splitting into nested classes. There should be a test case with a number of columns that is high enough to induce at least one instance of generating a nested class, but still low enough to not trigger failure based on the amount of global mutable state. If that PR is reviewed and is considered acceptable, we can open a second to address global mutable state.

bdrillard · 2017-05-23T20:44:40Z

I've created the first part of a pair of PRs to help make this review easier. Please see #18075 for a PR of the first feature (class splitting of excess code into nested sub-classes). If that PR is considered acceptable, we can address the second feature.

attn: @kiszk

cloud-fan · 2017-06-01T06:18:16Z

So this PR introduces 2 approaches to work around the Constant Pool Limit: 1. put member variables into inner classes; 2. compact primitive declarations into arrays.

It looks to me that either of them can solve the problem, do we really need both of them?

bdrillard · 2017-06-01T18:39:08Z

@cloud-fan Good question, and I think we can resolve it by using different values for N in the test case I have in the other PR (which will translate to a number of string columns deeper in the test). At N = 4000, we have a threshold where the amount of local state + global state would trigger a JaninoRuntimeException on the constant pool. #18075 can fix that issue at N = 4000 by beginning to inline functions to nested classes, thus reducing the number of items counting towards the constant pool, but we note that #18075 does nothing to address global state.

We should also note that #18075 does slightly more than putting just member variables into nested classes. While it is true that a significant degree of local state that would otherwise get inlined to the Outer Class gets inlined to nested classes instead, the patch leads to even more reductions in the size of the constant pool, since there are additional items that get inlined to nested classes that also count towards the limit (e.g. field references, method references, variable types, method types, etc.; see Java class file, The Constant Pool).

The second feature (included here, but not in #18075), is precisely as you describe: it takes simply-declared fields that would be inlined globally and compacts them into like-typed and like-declared arrays.

However, if we set N = 8000 (even assuming the patch in #18075), we can trigger yet another JaninoRuntimeException, this time because the amount of global state (plus any local state that was inlined to the Outer Class and not any single subclass) is sufficiently great to cause the exception. If we include mutable state compaction and class splitting, we can set N to a value greater than 10,000 (I had success for the test still at N = 12000, but at 16,000 my machine began to thrash).

Conversely, if we only include mutable state compaction at N = 8000, and exclude class splitting, there are instances where we actually end up with very little global state, but the amount of local state and functions inlined to the Outer Class is still sufficient to exceed the constant pool limit and trigger the same exception. This can occur if we have a great number of primitive columns, like N = 8000 integer columns.

Looking at both #18075 and this pull-request, I think the takeaway is that even if all we do for the moment is split excess code among nested classes, we can still make a significant gain in the number of columns a Dataset can hold, which gives #18075 merit on its own. If we want to increase that limit even more though, we'll have to address proliferation of global state as well, perhaps by opening a follow-up PR that focuses on it more closely, maybe using the compaction strategy I've attempted here in #16648, or by exploring another method.

Thoughts on this?

cloud-fan · 2017-06-01T18:52:08Z

thanks, this makes a lot of sense! I'll review #18075

…lass Splitting

## What changes were proposed in this pull request?

This pull-request exclusively includes the class splitting feature described in #16648. When code for a given class would grow beyond 1600k bytes, a private, nested sub-class is generated into which subsequent functions are inlined. Additional sub-classes are generated as the code threshold is met subsequent times. This code includes 3 changes:

1. Includes helper maps, lists, and functions for keeping track of sub-classes during code generation (included in the `CodeGenerator` class). These helper functions allow nested classes and split functions to be initialized/declared/inlined to the appropriate locations in the various projection classes.
2. Changes `addNewFunction` to return a string to support instances where a split function is inlined to a nested class and not the outer class (and so must be invoked using the class-qualified name). Uses of `addNewFunction` throughout the codebase are modified so that the returned name is properly used.
3. Removes instances of the `this` keyword when used on data inside generated classes. All state declared in the outer class is by default global and accessible to the nested classes. However, if a reference to global state in a nested class is prepended with the `this` keyword, it would attempt to reference state belonging to the nested class (which would not exist), rather than the correct variable belonging to the outer class.

## How was this patch tested?

Added a test case to the `GeneratedProjectionSuite` that increases the number of columns tested in various projections to a threshold that would previously have triggered a `JaninoRuntimeException` for the Constant Pool.

Note: This PR does not address the second Constant Pool issue with code generation (also mentioned in #16648): excess global mutable state. A second PR may be opened to resolve that issue.

Author: Aleksander Eskilson <alek.eskilson@cerner.com>

Closes #18075 from bdrillard/class_splitting_only.
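Change 3 in the commit message above follows from ordinary Java name resolution, which can be shown with a small example. The class and field names below are hypothetical and only mimic the shape of the generated code.

```java
// Illustrative only: shows why `this.` must be dropped once code moves into a nested class.
class ThisKeywordSketch {
    // State declared in the outer class.
    private int result = 0;

    private final NestedClass1 nestedClassInstance1 = new NestedClass1();

    int apply(int value) {
        nestedClassInstance1.store(value);
        return result;
    }

    private class NestedClass1 {
        void store(int value) {
            // this.result = value;
            // The line above would not compile here: inside NestedClass1, `this`
            // refers to the nested instance, which has no field named `result`.

            // The unqualified name resolves outward to the enclosing class's field.
            result = value;
        }
    }
}
```

Code written for the outer class with `this.result` therefore breaks as soon as it is relocated into a nested class, whereas the unqualified form works in both places.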


kiszk · 2017-07-28T01:24:54Z

ping @bdrillard for the 2nd part of this PR

bdrillard · 2017-07-28T13:59:35Z

Thanks @kiszk, I'll work on preparing a PR for the second half of this issue.

kiszk · 2017-08-15T17:46:15Z

kindly ping @bdrillard

kiszk · 2017-09-08T18:10:32Z

kindly ping @bdrillard

kiszk · 2017-09-28T13:53:46Z

@bdrillard gentle ping

bdrillard · 2017-10-09T14:29:21Z

I'm blocking out time to prepare the part 2 PR for this issue starting today over this week, regarding compaction of excess primitive state.

cc: @kiszk

kiszk · 2017-10-10T10:59:22Z

@bdrillard Thank you very much

bdrillard · 2017-10-17T18:24:50Z

@kiszk please see #19518 for part 2 of this original PR, and thanks!

AmplabJenkins · 2017-12-14T21:04:54Z

Can one of the admins verify this patch?

bdrillard · 2018-05-17T00:15:00Z

This PR was addressed in #18075.

bdrillard force-pushed the class_splitting branch from 85e81ed to abfd06f Compare March 10, 2017 21:35

bdrillard force-pushed the class_splitting branch from abfd06f to 187d5c3 Compare March 10, 2017 21:43

mrksmb reviewed Mar 10, 2017

bdrillard force-pushed the class_splitting branch from a1e5937 to 2bf183e Compare March 10, 2017 21:51

bdrillard force-pushed the class_splitting branch from 51ee31c to 635535e Compare March 10, 2017 21:55

mrksmb reviewed Mar 10, 2017

bdrillard force-pushed the class_splitting branch from 320db91 to f5f8f5c Compare May 11, 2017 16:07

glentakahashi mentioned this pull request May 13, 2017

Fix 3000 column limit palantir/spark#182

Closed

bdrillard force-pushed the class_splitting branch from 1c08d1c to 8ce47fa Compare May 16, 2017 18:19

[class_splitting] adding more explicit mutableStateArray comments

5a4a39c

kiszk reviewed May 20, 2017

bdrillard mentioned this pull request May 23, 2017

[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - Class Splitting #18075

Closed

kiszk mentioned this pull request Oct 19, 2017

[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - State Compaction #19518

Closed

bdrillard closed this May 17, 2018
