Performance improvements #1251

Open
wants to merge 171 commits into
base: master
7f8fe38
infer anonymous object type when possible
Oct 14, 2022
7a9d5f9
Spotless
Oct 14, 2022
14e5b77
anonymous type when arity one; refactor method; update tests
Oct 20, 2022
c9b5f74
get sequenceType of LetClause as inferred in InferTypeVisitor
Oct 20, 2022
299867e
native query for cast to float
Oct 20, 2022
2226ae6
remove anonymous type from test to avoid parse failure
Oct 21, 2022
3456d21
add field staticType to LetClause with inferred type
Oct 26, 2022
4427546
invert if statement for cast native query
Oct 27, 2022
8bac0f6
arity of group by variables is not increased
Oct 27, 2022
ce8c426
set castedSequenceType to One when cast input is exactly One
Oct 27, 2022
0e79bef
function inlining visitors (wip)
Nov 1, 2022
76a5aea
function inlining visitors (wip)
Nov 1, 2022
49c2608
variable inlining
Nov 2, 2022
193ce1b
explicit field reference
Nov 2, 2022
498a4ff
refactoring, correct recursion detection
Nov 2, 2022
387f0ff
no variable inlining
Nov 3, 2022
902e99f
inlining add missing visit method
Nov 6, 2022
9e78a46
make namespaces globally accessible
Nov 8, 2022
c4a1a39
type promotion for function inlining
Nov 8, 2022
acc2132
cleanup, spotless
Nov 8, 2022
74358da
cleanup, spotless
Nov 9, 2022
c404cfb
if this == Zero, then check other is variation of zero
Nov 10, 2022
ee2f6ce
abs, size and count spark sql
Nov 15, 2022
180e497
infer item type for array unboxing
Nov 15, 2022
96b3b30
spotless
Nov 15, 2022
ba396c3
complete implementation of FunctionInliningVisitor
Nov 15, 2022
48f2f34
SimpleMapExpression doesn't implement DataFrames
Nov 15, 2022
7832ed6
TypeSwitch only Dataframe if all children are Dataframe
Nov 16, 2022
fb54f53
spotless
Nov 16, 2022
53874c6
spotless
Nov 16, 2022
686e78a
Merge branch 'native-cast-float' into 'master-dominik'
Nov 16, 2022
c3d2377
only change arity when not in varToExclude
Nov 16, 2022
164dc89
Merge branch 'letclausesparkiterator-set-sequencetype' into 'master-d…
Nov 16, 2022
969e0c1
Merge branch 'bugfix-arity' into 'master-dominik'
Nov 16, 2022
f50a2e3
Merge branch 'infer-array-type' into 'master-dominik'
Nov 16, 2022
cf4a852
Merge branch 'bugfix-groupby-arity' into 'master-dominik'
Nov 16, 2022
05e73ae
change dates in test
Nov 16, 2022
3d2b198
change dates in test
Nov 16, 2022
3317c57
Merge branch 'rumble-optimizations' into 'master-dominik'
Nov 16, 2022
1618398
Merge branch 'function-inlining' into 'master-dominik'
Nov 16, 2022
b7e8f5b
3 cast as integer? can infer its type
Nov 16, 2022
28948a7
Merge branch 'master-dominik' into native-spark-queries
Nov 16, 2022
a024064
Merge branch 'casting-infer-arity-one' into 'master-dominik'
Nov 17, 2022
a344ef1
set arity of array type to ZeroOrMore, undo test change
Nov 17, 2022
314d474
Merge branch 'infer-array-type' into 'master-dominik'
Nov 17, 2022
1d13128
if arity one, use value comparison
Nov 17, 2022
ab88f5b
Merge branch 'group-comparison-to-value-comparison' into 'master-domi…
Nov 17, 2022
5f67d77
Merge branch 'master-dominik' into native-spark-queries
Nov 17, 2022
020bef8
unary operation iterator
Nov 18, 2022
e4fedd6
do not apply type promotion for object and array types
Nov 18, 2022
f5e79d6
spotless
Nov 18, 2022
563e19c
size function with input arity one has type integer
Nov 18, 2022
207a847
Merge branch 'master' into master-dominik
Nov 24, 2022
5ed71b4
Merge branch 'master-dominik' into native-spark-queries
Nov 24, 2022
a775cac
native Spark SQL implementations
Nov 24, 2022
fb22abf
use DataFrame for exists
Nov 24, 2022
7d43da9
native array constructor
Nov 25, 2022
5ef10a8
filter over array with short-circuit array count function
Nov 25, 2022
8f95e66
spotless
Nov 25, 2022
5bf6bd3
remove redundant parantheses
Nov 28, 2022
f484a56
use JSONAssert for tests
Nov 28, 2022
82b0bf1
only use temp variables if referenced by param expression
Nov 29, 2022
8514862
Merge branch 'duplicated-parameters' into 'master-dominik'
Nov 29, 2022
4d88eca
Merge branch 'master-dominik' into native-spark-queries
Nov 29, 2022
58bb174
Merge branch 'master' into master-dominik
Nov 30, 2022
de36ab6
Merge branch 'master-dominik' into native-spark-queries
Nov 30, 2022
7f56b71
native implementation of clauses (wip)
Dec 2, 2022
d8ef004
native implementation of clauses (wip)
Dec 19, 2022
8543a21
native implementation of clauses (broken)
Dec 20, 2022
1a18ac6
native implementation of clauses
Dec 23, 2022
664e736
bugfix: duplicate entry for string
Jan 4, 2023
2610e82
native implementation of object creation
Jan 4, 2023
09754f7
native implementation of object creation for merged objects
Jan 4, 2023
9c7ce6c
only assign function params if they have different names
Jan 5, 2023
0d7ebba
allow referencing sequences in native queries
Jan 5, 2023
d1867a8
check for array type
Jan 5, 2023
06acf7b
native empty function iterator
Jan 9, 2023
e0e170e
native object constructor add brackets
Jan 9, 2023
e7ccbb0
native comma expression for objects of same type
Jan 10, 2023
d9a3750
for clause use outer explode
Jan 10, 2023
f4e8ee4
spotless
Jan 10, 2023
5bdd581
improve static typing
Jan 10, 2023
9bc345a
only filter in return clause
Jan 10, 2023
e228f71
use positional value for where clause
Jan 10, 2023
672ada5
ForClause does not need sequence
Jan 12, 2023
63091fc
improve typing and order by
Jan 12, 2023
f0064de
filter null values in dataframes
Jan 13, 2023
ebdb6ce
spotless
Jan 13, 2023
1c24d47
sorting (wip)
Jan 16, 2023
908d324
sorting with partitions
Jan 16, 2023
abaaf68
native min and max of sequence
Jan 16, 2023
2f0dbc5
bugfix in multiply, cast double
Jan 17, 2023
cdcd70f
bugfix in multiply
Jan 17, 2023
3b2bdc7
comma expression child queries not implemented for flwor
Jan 17, 2023
0d13f8a
set clause type to FOR
Jan 18, 2023
4c8f51f
implementation of empty object
Jan 18, 2023
92460dc
rename variables in sql queries
Jan 19, 2023
099d32a
string length native implementation
Jan 19, 2023
29ab0af
native group by
Jan 19, 2023
0cb5395
ignore sort when grouping
Jan 20, 2023
17306c4
native sql count clause
Jan 20, 2023
f34a19e
sum, count columns not implemented in native queries
Jan 20, 2023
4dcdacf
native implementation of count aggregation for group by
Jan 20, 2023
5f4effe
sequences in spark objects
Jan 23, 2023
621716e
Merge branch 'native-spark-clauses-return-sequences' into 'native-spa…
Jan 25, 2023
5d7fdaa
Merge branch 'master' into master-dominik
Jan 25, 2023
7ef0a56
Merge branch 'master-dominik' into native-spark-clauses
Jan 25, 2023
7b2d37e
Merge branch 'master-dominik' into native-spark-clauses
Jan 25, 2023
d33f3cd
allow flwor as subquery of other expressions, propagate view
Jan 26, 2023
4d749e2
rename variable
Jan 31, 2023
a5a04c4
do not apply where after grouping
Feb 1, 2023
c8ff51e
bugfix: use correct view in forclause
Feb 2, 2023
d2a9d06
bugfix: select only columns of dataframe
Feb 2, 2023
cb45cdd
native SQL for range iterator
Feb 3, 2023
3e25d44
correct type for cast to double
Feb 3, 2023
bc0517f
sum function and type promotion native sql
Feb 3, 2023
d62b18d
decide comparison operation based on sequence type
Feb 7, 2023
d28b53d
use a sequence instead of an array, simplify sequence lookup native q…
Feb 9, 2023
5cd1e39
generalize optimization pipelines
Feb 9, 2023
33e9253
allow general expressions in predicates
Feb 9, 2023
636eaa2
allow general visitors for optimizations
Feb 13, 2023
5c26213
only execute code if variable in output projection
Feb 13, 2023
c2b78bc
undo breaking optimization
Feb 13, 2023
1338435
Merge branch 'rewrite-comparisons' into 'native-spark-clauses'
Feb 13, 2023
59f8fc8
Merge branch 'native-spark-clauses' into 'master-dominik'
Feb 14, 2023
f84dd1f
find and remove unused code
Feb 14, 2023
1cdedea
remove duplicate code
Feb 14, 2023
d5ecae1
fix: use result as param
Feb 14, 2023
df4d0d5
undo change of test
Feb 14, 2023
4b74598
floor function allow any numeric value
Feb 15, 2023
55d6a4b
bugfix: count clause support
Feb 15, 2023
9d5f2f7
Merge branch 'bugfix-floor-function' into 'master-dominik'
Feb 15, 2023
2741e02
Merge branch 'dead-code-detection' into 'master-dominik'
Feb 15, 2023
184ece1
only create flwor queries when view is set
Feb 16, 2023
177a729
include library modules in comparison rewriting
Feb 16, 2023
10eace9
Merge branch 'extended-comparison-rewriting' into 'master-dominik'
Feb 16, 2023
cce9598
Validate Type with no-op validator
Feb 16, 2023
c80f13c
cast and type promotion native sql
Feb 16, 2023
e521584
spotless.
Feb 16, 2023
79aa10f
Merge branch 'native-type-promotion' into 'master-dominik'
Feb 16, 2023
0c1dd27
extend validate expression with optional validation
Feb 23, 2023
d908a00
rename visitor
Feb 23, 2023
150d46a
Merge branch 'rename-visitor' into 'master-dominik'
Feb 23, 2023
491a796
spotless.
Feb 23, 2023
1b1256f
propagate type for sum function
Feb 28, 2023
158a48f
Merge branch 'infer-type-for-sum' into 'master-dominik'
Feb 28, 2023
a726126
rename annotated to annotate
Mar 1, 2023
046ee76
Merge branch 'extend-validate-expression' into 'master-dominik'
Mar 1, 2023
6c104b7
add repartition parameter to ParquetFileFunctionIterator
Mar 9, 2023
d2a9c27
spotless.
Mar 9, 2023
055c36e
Merge branch 'parquet-file-repartition' into 'master-dominik'
Mar 9, 2023
ae02eca
Merge from master.
Mar 23, 2023
bc7637b
Reduce diff.
Mar 23, 2023
6a99ebd
Merge branch 'master' of github.com:RumbleDB/rumble into master-domin…
Apr 5, 2023
9e90d7b
Fix build.
Apr 5, 2023
3424e1b
Take over.
Apr 5, 2023
703d984
Take over.
Apr 5, 2023
ccd7ba3
Solve conflict.
Apr 25, 2023
d1363c1
Merge master back.
May 4, 2023
7dc94b4
Merge back.
May 4, 2023
6ec5b16
Revert file.
May 4, 2023
a19514c
Merge branch 'master' of github.com:RumbleDB/rumble into master-domin…
May 17, 2023
a8f5180
Merge branch 'master' of github.com:RumbleDB/rumble into master-domin…
Jul 7, 2023
06c7519
Merge back.
Jul 7, 2023
b1e427e
Merge back.
Feb 28, 2024
ac1f5b8
Fix concurrent modification.
Feb 28, 2024
5974106
Merge branch 'master' into master-dominik-merge
ghislainfourny Jul 9, 2024
95e383b
Merge branch 'master' into master-dominik-merge
ghislainfourny Jul 10, 2024
0b97abf
Merge master.
Jul 12, 2024
0fea179
Merge.
Jul 12, 2024
8c0f5de
Fix build.
Jul 12, 2024
6 changes: 6 additions & 0 deletions pom.xml
@@ -267,6 +267,12 @@
             <artifactId>httpclient</artifactId>
             <version>4.5.13</version>
         </dependency>
+        <dependency>
+            <groupId>org.skyscreamer</groupId>
+            <artifactId>jsonassert</artifactId>
+            <version>1.5.1</version>
+            <scope>test</scope>
+        </dependency>
         <!--<dependency>
             <groupId>edu.vanderbilt.accre</groupId>
             <artifactId>laurelin</artifactId>
146 changes: 119 additions & 27 deletions src/main/java/org/rumbledb/compiler/InferTypeVisitor.java
@@ -1,5 +1,14 @@
 package org.rumbledb.compiler;
 
+import java.net.URI;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
 import org.apache.spark.sql.AnalysisException;
 import org.apache.spark.sql.types.StructType;
 import org.rumbledb.config.RumbleRuntimeConfiguration;
@@ -112,13 +121,6 @@
 import org.rumbledb.types.SequenceType;
 import sparksoniq.spark.SparkSessionManager;
 
-import java.net.URI;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
 
 
 /**
@@ -257,8 +259,41 @@ public StaticContext visitCommaExpression(CommaExpression expression, StaticCont
                 if (inferredType.isEmptySequence()) {
                     inferredType = childExpressionInferredType;
                 } else {
-                    ItemType resultingItemType = inferredType.getItemType()
-                        .findLeastCommonSuperTypeWith(childExpressionInferredType.getItemType());
+                    ItemType resultingItemType;
+                    if (
+                        inferredType.getItemType().isObjectItemType()
+                            && childExpressionInferredType.getItemType().isObjectItemType()
+                    ) {
+                        final Map<String, FieldDescriptor> currentItemTypeObject = inferredType.getItemType()
+                            .getObjectContentFacet();
+                        resultingItemType = (currentItemTypeObject.keySet().size() == childExpressionInferredType
+                            .getItemType()
+                            .getObjectContentFacet()
+                            .keySet()
+                            .size()
+                            && currentItemTypeObject
+                                .keySet()
+                                .stream()
+                                .allMatch(
+                                    key -> childExpressionInferredType.getItemType()
+                                        .getObjectContentFacet()
+                                        .containsKey(key)
+                                        && currentItemTypeObject
+                                            .get(key)
+                                            .getType()
+                                            .equals(
+                                                childExpressionInferredType.getItemType()
+                                                    .getObjectContentFacet()
+                                                    .get(key)
+                                                    .getType()
+                                            )
+                                ))
+                                    ? inferredType.getItemType()
+                                    : BuiltinTypesCatalogue.objectItem;
+                    } else {
+                        resultingItemType = inferredType.getItemType()
+                            .findLeastCommonSuperTypeWith(childExpressionInferredType.getItemType());
+                    }
                     SequenceType.Arity resultingArity =
                         ((inferredType.getArity() == SequenceType.Arity.OneOrZero
                             || inferredType.getArity() == SequenceType.Arity.ZeroOrMore)
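Note on the `visitCommaExpression` hunk above: the added condition keeps the specific object type only when both object types have the same number of fields and every field of the first also appears in the second with an equal type; otherwise it falls back to the generic object type. The following is a minimal sketch of that rule, not the PR's code. It assumes field types can be modelled as plain strings, and `ObjectTypeMergeSketch`, `sameFields`, `mergedTypeName` and `GENERIC_OBJECT` are hypothetical names introduced only for illustration.

```java
import java.util.Map;

public class ObjectTypeMergeSketch {
    // Hypothetical placeholder for the generic object item type.
    static final String GENERIC_OBJECT = "object";

    // True when both field maps have the same size and every key of `left`
    // is present in `right` with an equal field type.
    static boolean sameFields(Map<String, String> left, Map<String, String> right) {
        return left.size() == right.size()
            && left.keySet()
                .stream()
                .allMatch(key -> right.containsKey(key) && left.get(key).equals(right.get(key)));
    }

    // Keeps the left type name when the field maps agree, otherwise widens to the generic object.
    static String mergedTypeName(String leftName, Map<String, String> left, Map<String, String> right) {
        return sameFields(left, right) ? leftName : GENERIC_OBJECT;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("id", "integer", "name", "string");
        Map<String, String> b = Map.of("id", "integer", "name", "string");
        Map<String, String> c = Map.of("id", "integer", "name", "integer");
        System.out.println(mergedTypeName("anonymous", a, b));
        System.out.println(mergedTypeName("anonymous", a, c));
    }
}
```

For maps without null values, the size check plus the per-key comparison amounts to `left.equals(right)`, which may be a more compact way to express the same condition.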
@@ -380,7 +415,33 @@ public StaticContext visitObjectConstructor(ObjectConstructorExpression expressi
                 }
             }
         }
-        expression.setStaticSequenceType(new SequenceType(BuiltinTypesCatalogue.objectItem));
+        if (
+            expression.getKeys() != null
+                && expression.getKeys()
+                    .stream()
+                    .allMatch(key -> key instanceof StringLiteralExpression)
+                && expression.getValues()
+                    .stream()
+                    .map(Expression::getStaticSequenceType)
+                    .allMatch(type -> type.getArity() == SequenceType.Arity.One)
+        ) {
+            expression.setStaticSequenceType(
+                new SequenceType(
+                    ItemTypeFactory.createAnonymousObjectType(
+                        expression.getKeys()
+                            .stream()
+                            .map(key -> ((StringLiteralExpression) key).getValue())
+                            .collect(Collectors.toList()),
+                        expression.getValues()
+                            .stream()
+                            .map(value -> value.getStaticSequenceType().getItemType())
+                            .collect(Collectors.toList())
+                    )
+                )
+            );
+        } else {
+            expression.setStaticSequenceType(new SequenceType(BuiltinTypesCatalogue.objectItem));
+        }
         return argument;
     }
 
@@ -480,6 +541,29 @@ private boolean tryAnnotateSpecificFunctions(FunctionCallExpression expression,
             expression.setStaticSequenceType(args.get(0).getStaticSequenceType());
             return true;
         }
+        // handle 'size' function
+        if (
+            functionName.equals(Name.createVariableInDefaultFunctionNamespace("size"))
+                && args.get(0).getStaticSequenceType().getArity() == SequenceType.Arity.One
+        ) {
+            // set output type to 'Integer' if inputType is 'Array'
+            expression.setStaticSequenceType(
+                new SequenceType(BuiltinTypesCatalogue.integerItem, SequenceType.Arity.One)
+            );
+            return true;
+        }
+
+        if (functionName.equals(Name.createVariableInDefaultFunctionNamespace("sum"))) {
+            expression.setStaticSequenceType(
+                new SequenceType(
+                    args.get(0).getStaticSequenceType().getItemType(),
+                    args.get(0).getStaticSequenceType().getArity() == SequenceType.Arity.OneOrMore
+                        ? SequenceType.Arity.One
+                        : SequenceType.Arity.OneOrZero
+                )
+            );
+            return true;
+        }
 
         return false;
     }
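Note on the `sum` branch in the hunk above: the result keeps the item type of the first argument, and its arity is `One` only when the input arity is `OneOrMore`; every other input arity yields `OneOrZero`. The sketch below restates just the arity rule under the assumption that arities can be modelled by a small enum. `SumAritySketch`, `Arity` and `sumResultArity` are hypothetical names and not part of RumbleDB.

```java
public class SumAritySketch {
    // Hypothetical stand-in for SequenceType.Arity; only the four values used here.
    enum Arity { ONE, ONE_OR_ZERO, ONE_OR_MORE, ZERO_OR_MORE }

    // Mirrors the ternary in the hunk: OneOrMore maps to One, anything else to OneOrZero.
    static Arity sumResultArity(Arity inputArity) {
        return inputArity == Arity.ONE_OR_MORE ? Arity.ONE : Arity.ONE_OR_ZERO;
    }

    public static void main(String[] args) {
        for (Arity arity : Arity.values()) {
            System.out.println(arity + " -> " + sumResultArity(arity));
        }
    }
}
```

Read literally, the rule also maps an input of arity exactly `One` to `OneOrZero`. Whether that is intended is not stated in the diff.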
@@ -657,7 +741,9 @@ public StaticContext visitCastExpression(CastExpression expression, StaticContex
                 expression.getMetadata()
             );
         }
-
+        if (expressionSequenceType.getArity() == SequenceType.Arity.One) {
+            castedSequenceType = new SequenceType(castedSequenceType.getItemType(), SequenceType.Arity.One);
+        }
         expression.setStaticSequenceType(castedSequenceType);
         return argument;
     }
@@ -701,6 +787,9 @@ public StaticContext visitTreatExpression(TreatExpression expression, StaticCont
             );
         }
 
+        if (SequenceType.ITEM_STAR.equals(treatedSequenceType)) {
+            treatedSequenceType = expressionSequenceType;
+        }
         expression.setStaticSequenceType(treatedSequenceType);
         return argument;
     }
@@ -880,6 +969,8 @@ public StaticContext visitAdditiveExpr(AdditiveExpression expression, StaticCont
     private ItemType resolveNumericType(ItemType left, ItemType right) {
         if (left.equals(BuiltinTypesCatalogue.doubleItem) || right.equals(BuiltinTypesCatalogue.doubleItem)) {
             return BuiltinTypesCatalogue.doubleItem;
+        } else if (left.equals(BuiltinTypesCatalogue.floatItem) || right.equals(BuiltinTypesCatalogue.floatItem)) {
+            return BuiltinTypesCatalogue.floatItem;
         } else if (left.equals(BuiltinTypesCatalogue.decimalItem) || right.equals(BuiltinTypesCatalogue.decimalItem)) {
             return BuiltinTypesCatalogue.decimalItem;
         } else {
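Note on the `resolveNumericType` hunk above: with the added branch, the result type is picked by precedence, double first, then float, then decimal. The sketch below illustrates that ordering on its own. `NumericPromotionSketch`, `NumericType` and `resolve` are hypothetical names, and the integer fallback is an assumption because the final `else` branch is cut off in the hunk.

```java
public class NumericPromotionSketch {
    // Hypothetical stand-in for the built-in numeric item types; not RumbleDB's ItemType.
    enum NumericType { INTEGER, DECIMAL, FLOAT, DOUBLE }

    // Checks the candidates in the same order as the branches in the hunk:
    // double first, then float, then decimal.
    static NumericType resolve(NumericType left, NumericType right) {
        if (left == NumericType.DOUBLE || right == NumericType.DOUBLE) {
            return NumericType.DOUBLE;
        } else if (left == NumericType.FLOAT || right == NumericType.FLOAT) {
            return NumericType.FLOAT;
        } else if (left == NumericType.DECIMAL || right == NumericType.DECIMAL) {
            return NumericType.DECIMAL;
        } else {
            // Assumption: the last branch is not visible in the hunk; integer is a guess.
            return NumericType.INTEGER;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(NumericType.FLOAT, NumericType.DECIMAL));
        System.out.println(resolve(NumericType.INTEGER, NumericType.DOUBLE));
    }
}
```

Without the float branch, a float operand paired with a decimal operand would have resolved to decimal, which appears to be the case this change addresses.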
@@ -1619,7 +1710,7 @@ public StaticContext visitObjectLookupExpression(ObjectLookupExpression expressi
             : SequenceType.Arity.ZeroOrMore;
 
         ItemType inferredType = BuiltinTypesCatalogue.item;
-        // if we have a specific object type and a string literal as key try perform better inference
+        // if we have a specific object type and a string literal as key try to perform better inference
         if (
             mainType.getItemType().isObjectItemType()
                 && (expression.getLookupExpression() instanceof StringLiteralExpression)
@@ -1669,7 +1760,14 @@ public StaticContext visitArrayUnboxingExpression(ArrayUnboxingExpression expres
                 expression.getMetadata()
             );
         }
-
+        if (mainType.getItemType().isArrayItemType()) {
+            SequenceType sequenceType = new SequenceType(
+                    mainType.getItemType().getArrayContentFacet(),
+                    SequenceType.Arity.ZeroOrMore
+            );
+            expression.setStaticSequenceType(sequenceType);
+            return argument;
+        }
         expression.setStaticSequenceType(SequenceType.createSequenceType("item*"));
         return argument;
     }
@@ -1889,18 +1987,12 @@ public StaticContext visitForClause(ForClause expression, StaticContext argument
         basicChecks(inferredType, expression.getClass().getSimpleName(), true, false, expression.getMetadata());
         if (inferredType.isEmptySequence()) {
             if (!expression.isAllowEmpty()) {
-                if (
-                    !expression.getVariableName().equals(Name.TEMP_VAR1)
-                        && !expression.getVariableName().equals(Name.TEMP_VAR2)
-                ) {
-                    // for sure we will not have any tuple to process and return the empty sequence
-                    throwStaticTypeException(
-                        "In for clause Inferred type is empty sequence, empty is not allowed, so the result returned is for sure () and this is not a CommaExpression",
-                        ErrorCode.StaticallyInferredEmptySequenceNotFromCommaExpression,
-                        expression.getMetadata()
-                    );
-                }
-                inferredType = new SequenceType(BuiltinTypesCatalogue.atomicItem);
+                // for sure we will not have any tuple to process and return the empty sequence
+                throwStaticTypeException(
+                    "In for clause Inferred type is empty sequence, empty is not allowed, so the result returned is for sure () and this is not a CommaExpression",
+                    ErrorCode.StaticallyInferredEmptySequenceNotFromCommaExpression,
+                    expression.getMetadata()
+                );
             }
         } else {
             // we take the single arity version of the inferred type or optional arity if we allow empty and the
@@ -1943,7 +2035,7 @@ public StaticContext visitLetClause(LetClause expression, StaticContext argument
             expression.getVariableName(),
             expression.getMetadata()
         );
-
+        expression.setStaticType(inferredType);
         return argument;
     }
 
@@ -2022,7 +2114,7 @@ public StaticContext visitGroupByClause(GroupByClause expression, StaticContext
             groupingVars.add(groupByVar.getVariableName());
         }
 
-        // finally if there was a for clause we need to change the arity of the variables binded so far in the flowr
+        // finally if there was a for clause we need to change the arity of the variables bound so far in the flwor
         // expression, from ? to * and from 1 to +
         // excluding the grouping variables
         StaticContext nextClauseStaticContext = expression.getNextClause().getStaticContext();