-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flink-5734] code generation for normalizedkey sorter #3511
Changes from 50 commits
6fa431b
e302747
321acdc
8675db8
e1abaad
0c765b1
fd67e14
2f0e082
64d73c9
ade1d60
f1398ba
850930f
ccc0ed0
6c28a63
2e0fc90
5cb9945
576bcb0
6ad0669
dc3bf6e
5037cfb
602543c
60c776c
5ec5197
3265aef
a643eb9
48f46ad
5b0bae0
41b244f
af35766
b6d6462
c723ffd
fd0a6f7
530a6ff
79c4b76
1028f5a
5ccb169
196f42f
7346ae8
82ef38e
77d5e06
2275538
7341e9e
b41f71f
fec0278
33cbc36
54b3fcb
b8f1e53
36419be
84cb9a8
db14ac5
41fd844
d12c55f
1c9830f
909b59e
cce40c5
5e31cf0
b2bf565
9016cce
ff3a35e
157cd4c
c9705fa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -204,6 +204,18 @@ under the License. | |
<artifactId>reflections</artifactId> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>org.freemarker</groupId> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use latest version?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
<artifactId>freemarker</artifactId> | ||
<version>2.3.26-incubating</version> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>org.codehaus.janino</groupId> | ||
<artifactId>janino</artifactId> | ||
<version>${janino.version}</version> | ||
</dependency> | ||
|
||
</dependencies> | ||
|
||
<build> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.flink.runtime.codegeneration; | ||
|
||
import freemarker.template.Version; | ||
import org.apache.flink.api.common.ExecutionConfig; | ||
import org.apache.flink.api.common.typeutils.TypeComparator; | ||
import org.apache.flink.api.common.typeutils.TypeSerializer; | ||
import org.apache.flink.core.memory.MemorySegment; | ||
import org.apache.flink.runtime.operators.sort.FixedLengthRecordSorter; | ||
import org.apache.flink.runtime.operators.sort.InMemorySorter; | ||
import org.apache.flink.runtime.operators.sort.NormalizedKeySorter; | ||
|
||
import freemarker.template.Configuration; | ||
import freemarker.template.Template; | ||
import freemarker.template.TemplateException; | ||
import freemarker.template.TemplateExceptionHandler; | ||
import org.codehaus.commons.compiler.CompileException; | ||
import org.codehaus.janino.SimpleCompiler; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
import java.io.IOException; | ||
import java.io.StringWriter; | ||
import java.lang.reflect.Constructor; | ||
import java.lang.reflect.InvocationTargetException; | ||
import java.util.HashMap; | ||
import java.util.List; | ||
|
||
/** | ||
* {@link SorterFactory} is a singleton class that provides functionalities to create the most suitable sorter | ||
* for underlying data based on {@link TypeComparator}. | ||
* Note: the generated code can be inspected by configuring Janino to write the code that is being compiled | ||
* to a file, see http://janino-compiler.github.io/janino/#debugging | ||
*/ | ||
public class SorterFactory { | ||
// ------------------------------------------------------------------------ | ||
// Constants | ||
// ------------------------------------------------------------------------ | ||
private static final Logger LOG = LoggerFactory.getLogger(SorterFactory.class); | ||
|
||
/** Fixed length records with a length below this threshold will be in-place sorted, if possible. */ | ||
private static final int THRESHOLD_FOR_IN_PLACE_SORTING = 32; | ||
|
||
// ------------------------------------------------------------------------ | ||
// Singleton Attribute | ||
// ------------------------------------------------------------------------ | ||
private static SorterFactory sorterFactory; | ||
|
||
// ------------------------------------------------------------------------ | ||
// Attributes | ||
// ------------------------------------------------------------------------ | ||
private SimpleCompiler classCompiler; | ||
private HashMap<String, Constructor> constructorCache; | ||
private final Template template; | ||
|
||
/** | ||
* This is only for testing. If an error occurs, we want to fail the test, instead of falling back | ||
* to a non-generated sorter. | ||
*/ | ||
public boolean forceCodeGeneration = false; | ||
|
||
/** | ||
* Constructor. | ||
*/ | ||
private SorterFactory() { | ||
this.classCompiler = new SimpleCompiler(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should set the classloader of the compiler as the usercode classloader to ensure that the generated classes are cleaned up when the job terminates. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for raising this issue. Actually it's a bit more complicated than just setting the parent classloader, because of the caching of the generated classes. The problem is that the
This will ensure that we don't try to reuse generated classes from a previous job, since the classloader will be different across jobs, so we will have different keys. And the cache won't keep anything alive, since it will only have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was a bit tricky, I hope I haven't messed it up. I will think about this a bit more in the next few days, and maybe do some more testing, to see that we are not keeping alive anything from past jobs. Unfortunately, I don't know how this could be tested in an automated way. I think I will test it manually by just submitting hundreds of jobs, and watching in a profiler that object counts are not growing. |
||
this.constructorCache = new HashMap<>(); | ||
Configuration templateConf; | ||
templateConf = new Configuration(new Version(2,3,26)); | ||
templateConf.setClassForTemplateLoading(SorterFactory.class, "/templates"); | ||
templateConf.setDefaultEncoding("UTF-8"); | ||
templateConf.setTemplateExceptionHandler(TemplateExceptionHandler.RETHROW_HANDLER); | ||
try { | ||
template = templateConf.getTemplate(SorterTemplateModel.TEMPLATE_NAME); | ||
} catch (IOException e) { | ||
throw new RuntimeException("Couldn't read sorter template.", e); | ||
} | ||
} | ||
|
||
/** | ||
* A method to get a singleton instance | ||
* or create one if it hasn't been created yet. | ||
* @return | ||
*/ | ||
public static synchronized SorterFactory getInstance() { | ||
if (sorterFactory == null){ | ||
sorterFactory = new SorterFactory(); | ||
} | ||
|
||
return sorterFactory; | ||
} | ||
|
||
|
||
/** | ||
* Create a sorter for the given type comparator and | ||
* assign serializer, comparator and memory to the sorter. | ||
* @param serializer | ||
* @param comparator | ||
* @param memory | ||
* @return | ||
*/ | ||
public <T> InMemorySorter<T> createSorter(ExecutionConfig config, TypeSerializer<T> serializer, | ||
TypeComparator<T> comparator, List<MemorySegment> memory) { | ||
if (config.isCodeGenerationForSortersEnabled()){ | ||
try { | ||
return createCodegenSorter(serializer, comparator, memory); | ||
} catch (IOException | TemplateException | ClassNotFoundException | IllegalAccessException | | ||
InstantiationException | NoSuchMethodException | InvocationTargetException | CompileException e) { | ||
|
||
String msg = "Serializer: " + serializer + | ||
"[" + serializer + "], comparator: [" + comparator + "], exception: " + e.toString(); | ||
if (!forceCodeGeneration) { | ||
LOG.warn("An error occurred while trying to create a code-generated sorter. " + | ||
"Using non-codegen sorter instead. " + msg); | ||
return createNonCodegenSorter(serializer, comparator, memory); | ||
} else { | ||
throw new RuntimeException("An error occurred while trying to create a code-generated sorter. " + | ||
"Failing the job, because forceCodeGeneration is true. " + msg); | ||
} | ||
} | ||
} else { | ||
return createNonCodegenSorter(serializer, comparator, memory); | ||
} | ||
} | ||
|
||
private <T> InMemorySorter<T> createCodegenSorter(TypeSerializer<T> serializer, TypeComparator<T> comparator, | ||
List<MemorySegment> memory) | ||
throws IOException, TemplateException, ClassNotFoundException, IllegalAccessException, | ||
InstantiationException, NoSuchMethodException, InvocationTargetException, CompileException { | ||
SorterTemplateModel sorterModel = new SorterTemplateModel(comparator); | ||
|
||
Constructor sorterConstructor; | ||
|
||
synchronized (this){ | ||
if (constructorCache.getOrDefault(sorterModel.getSorterName(), null) != null) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now that we use a |
||
sorterConstructor = constructorCache.get(sorterModel.getSorterName()); | ||
} else { | ||
String sorterName = sorterModel.getSorterName(); | ||
|
||
StringWriter generatedCodeWriter = new StringWriter(); | ||
template.process(sorterModel.getTemplateVariables(), generatedCodeWriter); | ||
|
||
this.classCompiler.cook(generatedCodeWriter.toString()); | ||
|
||
sorterConstructor = this.classCompiler.getClassLoader().loadClass(sorterName).getConstructor( | ||
TypeSerializer.class, TypeComparator.class, List.class | ||
); | ||
|
||
constructorCache.put(sorterName, sorterConstructor); | ||
} | ||
} | ||
|
||
@SuppressWarnings("unchecked") | ||
InMemorySorter<T> sorter = (InMemorySorter<T>) sorterConstructor.newInstance(serializer, comparator, memory); | ||
|
||
if (LOG.isInfoEnabled()){ | ||
LOG.info("Using a custom sorter : " + sorter.toString()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. -> There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
} | ||
|
||
return sorter; | ||
} | ||
|
||
private <T> InMemorySorter<T> createNonCodegenSorter(TypeSerializer<T> serializer, TypeComparator<T> comparator, | ||
List<MemorySegment> memory) { | ||
InMemorySorter<T> sorter; | ||
// instantiate a fix-length in-place sorter, if possible, otherwise the out-of-place sorter | ||
if (comparator.supportsSerializationWithKeyNormalization() && | ||
serializer.getLength() > 0 && serializer.getLength() <= THRESHOLD_FOR_IN_PLACE_SORTING && | ||
comparator.isNormalizedKeyPrefixOnly(comparator.getNormalizeKeyLen())) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So the additional condition is fixing a bug? Shouldn't the condition should be inverted, i.e., There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, but currently the
Yes, sorry, I've fixed the condition in b2bf565 . |
||
// Note about the last part of the condition: | ||
// FixedLengthRecordSorter doesn't do an additional check after the bytewise comparison, so | ||
// we cannot choose that if the normalized key doesn't always determine the order. | ||
// (cf. the part of NormalizedKeySorter.compare after the if) | ||
sorter = new FixedLengthRecordSorter<>(serializer, comparator.duplicate(), memory); | ||
} else { | ||
sorter = new NormalizedKeySorter<>(serializer, comparator.duplicate(), memory); | ||
} | ||
return sorter; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good to enable this here but just for curiosity, how much does this increase the build time for
flink-gelly
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I'm not sure whether this is a good way to get this estimation.
I estimated the build time by running all tests from
flink-gelly-examples
inside IntelliJ IDEA.With
CLUSTER_WITH_CODEGENERATION_ENABLED
:2m 20s
Without
CLUSTER_WITH_CODEGENERATION_ENABLED
:1m 27s
Patch for disabling
CLUSTER_WITH_CODEGENERATION_ENABLED
case : https://gist.github.com/heytitle/89961fcaabcf326eadee190b9d6085a6