[SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api #37843
Will add more similar cases.
Looks OK; see if there are more very similar cases. Do you have any rough benchmarks that show it's faster?
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java
Looks good, yeah
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java
Outdated
Yes, there are other cases. I am sorting out the test data and hope to fix them all in this PR.
@@ -44,7 +44,16 @@ public interface Expression {
   * List of fields or columns that are referenced by this expression.
   */
  default NamedReference[] references() {
Compare
public static TestValue[] distinctUseStreamApi(TestObj[] input) {
return Arrays.stream(input).map(s -> s.values)
.flatMap(Arrays::stream).distinct().toArray(TestValue[]::new);
}
and
public static TestValue[] distinctUseLoopApi(TestObj[] input) {
List<TestValue> list = new ArrayList<>();
Set<TestValue> uniqueValues = new HashSet<>();
for (TestObj s : input) {
TestValue[] values = s.values;
for (TestValue testValue : values) {
if (uniqueValues.add(testValue)) {
list.add(testValue);
}
}
}
return list.toArray(new TestValue[0]);
}
TestValue and TestObj are defined as follows:
public static class TestObj {
TestValue[] values;
public TestObj(int size, int range) {
values = new TestValue[size];
for (int i = 0; i < values.length; i++) {
values[i] = new TestValue(RandomUtils.nextInt(0, range));
}
}
}
public static class TestValue {
private int value;
public TestValue(int value) {
this.value = value;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
TestValue testValue = (TestValue) o;
return value == testValue.value;
}
@Override
public int hashCode() {
return Objects.hashCode(value);
}
}
Use the following method to build the test objects:
public static TestObj[] objs(int length, int size, int range) {
TestObj[] objects = new TestObj[length];
for (int i = 0; i < length; i++) {
objects[i] = new TestObj(size, range);
}
return objects;
}
and test with these (length, size, range) combinations:
- 1, 5, 100
- 5, 5, 100
- 10, 5, 100
- 20, 5, 100
- 50, 5, 100
- 100, 5, 100
- 500, 5, 100
- 1000, 5, 100
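The benchmark setup above can be approximated with a self-contained program. This is a rough sketch under stated assumptions, not the harness used for the results in this thread: values are plain `Integer`s instead of `TestValue`, `java.util.Random` with a fixed seed replaces `RandomUtils.nextInt`, the class and method names are hypothetical, and timing is a bare `System.nanoTime()` loop with a crude warm-up, so its absolute numbers are not comparable to Spark's benchmark output. Both variants return the distinct values in first-occurrence order (`Stream.distinct()` is stable for ordered streams, and `HashSet.add` returns `false` for a value already seen), so their results can be compared element by element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Function;

class DistinctBench {
  // Written after every timed run so the JIT cannot treat the results as unused.
  static volatile long blackhole;

  // Hypothetical stand-in for objs(...): `length` rows of `size` random values in [0, range).
  static Integer[][] objs(int length, int size, int range, Random rnd) {
    Integer[][] objects = new Integer[length][size];
    for (int i = 0; i < length; i++) {
      for (int j = 0; j < size; j++) {
        objects[i][j] = rnd.nextInt(range);
      }
    }
    return objects;
  }

  // Stream variant: flatten, keep distinct values in encounter order.
  static Integer[] viaStream(Integer[][] input) {
    return Arrays.stream(input).flatMap(Arrays::stream).distinct().toArray(Integer[]::new);
  }

  // Loop variant: the HashSet filters duplicates, the list keeps first-occurrence order.
  static Integer[] viaLoop(Integer[][] input) {
    List<Integer> list = new ArrayList<>();
    Set<Integer> seen = new HashSet<>();
    for (Integer[] values : input) {
      for (Integer v : values) {
        if (seen.add(v)) {
          list.add(v);
        }
      }
    }
    return list.toArray(new Integer[0]);
  }

  // Elapsed nanoseconds for `iters` calls of `f` on the same input.
  static long time(Function<Integer[][], Integer[]> f, Integer[][] in, int iters) {
    long sink = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iters; i++) {
      sink += f.apply(in).length;
    }
    long elapsed = System.nanoTime() - start;
    blackhole = sink;
    return elapsed;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42); // fixed seed so every run sees the same data
    int[] lengths = {1, 5, 10, 20, 50, 100, 500, 1000};
    for (int length : lengths) {
      Integer[][] in = objs(length, 5, 100, rnd);
      if (!Arrays.equals(viaStream(in), viaLoop(in))) {
        throw new IllegalStateException("variants disagree for length " + length);
      }
      int iters = 2_000;
      time(DistinctBench::viaStream, in, iters); // crude warm-up only
      time(DistinctBench::viaLoop, in, iters);
      long s = time(DistinctBench::viaStream, in, iters);
      long l = time(DistinctBench::viaLoop, in, iters);
      System.out.printf("length=%d stream=%.1f ms loop=%.1f ms%n", length, s / 1e6, l / 1e6);
    }
  }
}
```

For numbers worth comparing across JVMs, a harness with proper warm-up and dead-code protection (such as JMH) is the safer choice.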
Java 8

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz

| Input size | Case | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|---|
| 1 | Use Arrays.stream api | 35 | 35 | 1 | 2.8 | 351.9 | 1.0X |
| 1 | Use Loop api | 18 | 18 | 0 | 5.5 | 180.6 | 1.9X |
| 5 | Use Arrays.stream api | 129 | 130 | 1 | 0.8 | 1288.7 | 1.0X |
| 5 | Use Loop api | 82 | 83 | 1 | 1.2 | 824.4 | 1.6X |
| 10 | Use Arrays.stream api | 228 | 229 | 1 | 0.4 | 2280.0 | 1.0X |
| 10 | Use Loop api | 160 | 161 | 1 | 0.6 | 1599.7 | 1.4X |
| 20 | Use Arrays.stream api | 430 | 431 | 1 | 0.2 | 4301.0 | 1.0X |
| 20 | Use Loop api | 311 | 312 | 1 | 0.3 | 3109.9 | 1.4X |
| 50 | Use Arrays.stream api | 860 | 862 | 2 | 0.1 | 8597.6 | 1.0X |
| 50 | Use Loop api | 701 | 702 | 1 | 0.1 | 7013.1 | 1.2X |
| 100 | Use Arrays.stream api | 1454 | 1456 | 3 | 0.1 | 14540.1 | 1.0X |
| 100 | Use Loop api | 1317 | 1318 | 2 | 0.1 | 13168.9 | 1.1X |
| 500 | Use Arrays.stream api | 5584 | 5586 | 2 | 0.0 | 55841.2 | 1.0X |
| 500 | Use Loop api | 5784 | 5786 | 3 | 0.0 | 57839.1 | 1.0X |
| 1000 | Use Arrays.stream api | 10727 | 10728 | 2 | 0.0 | 107266.4 | 1.0X |
| 1000 | Use Loop api | 10534 | 10535 | 1 | 0.0 | 105342.5 | 1.0X |
Java 11

OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz

| Input size | Case | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|---|
| 1 | Use Arrays.stream api | 41 | 42 | 1 | 2.4 | 408.5 | 1.0X |
| 1 | Use Loop api | 22 | 23 | 1 | 4.5 | 224.4 | 1.8X |
| 5 | Use Arrays.stream api | 159 | 160 | 1 | 0.6 | 1594.5 | 1.0X |
| 5 | Use Loop api | 86 | 87 | 0 | 1.2 | 864.7 | 1.8X |
| 10 | Use Arrays.stream api | 275 | 276 | 2 | 0.4 | 2748.0 | 1.0X |
| 10 | Use Loop api | 167 | 169 | 3 | 0.6 | 1673.5 | 1.6X |
| 20 | Use Arrays.stream api | 511 | 513 | 2 | 0.2 | 5113.5 | 1.0X |
| 20 | Use Loop api | 315 | 317 | 2 | 0.3 | 3151.8 | 1.6X |
| 50 | Use Arrays.stream api | 1012 | 1014 | 2 | 0.1 | 10118.2 | 1.0X |
| 50 | Use Loop api | 675 | 677 | 2 | 0.1 | 6747.0 | 1.5X |
| 100 | Use Arrays.stream api | 1665 | 1667 | 3 | 0.1 | 16645.2 | 1.0X |
| 100 | Use Loop api | 1253 | 1254 | 1 | 0.1 | 12528.3 | 1.3X |
| 500 | Use Arrays.stream api | 6305 | 6308 | 5 | 0.0 | 63046.3 | 1.0X |
| 500 | Use Loop api | 5375 | 5376 | 1 | 0.0 | 53751.0 | 1.2X |
| 1000 | Use Arrays.stream api | 12081 | 12083 | 3 | 0.0 | 120806.6 | 1.0X |
| 1000 | Use Loop api | 10463 | 10467 | 5 | 0.0 | 104634.7 | 1.2X |
Java 17

OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz

| Input size | Case | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|---|
| 1 | Use Arrays.stream api | 33 | 36 | 2 | 3.1 | 325.2 | 1.0X |
| 1 | Use Loop api | 16 | 18 | 2 | 6.1 | 164.4 | 2.0X |
| 5 | Use Arrays.stream api | 103 | 111 | 5 | 1.0 | 1032.9 | 1.0X |
| 5 | Use Loop api | 75 | 80 | 3 | 1.3 | 746.4 | 1.4X |
| 10 | Use Arrays.stream api | 202 | 210 | 5 | 0.5 | 2022.3 | 1.0X |
| 10 | Use Loop api | 152 | 164 | 8 | 0.7 | 1522.6 | 1.3X |
| 20 | Use Arrays.stream api | 345 | 362 | 14 | 0.3 | 3446.2 | 1.0X |
| 20 | Use Loop api | 283 | 299 | 15 | 0.4 | 2827.3 | 1.2X |
| 50 | Use Arrays.stream api | 729 | 767 | 33 | 0.1 | 7295.0 | 1.0X |
| 50 | Use Loop api | 581 | 598 | 12 | 0.2 | 5811.8 | 1.3X |
| 100 | Use Arrays.stream api | 1370 | 1381 | 16 | 0.1 | 13700.8 | 1.0X |
| 100 | Use Loop api | 1107 | 1114 | 10 | 0.1 | 11070.0 | 1.2X |
| 500 | Use Arrays.stream api | 6541 | 6545 | 7 | 0.0 | 65405.0 | 1.0X |
| 500 | Use Loop api | 4694 | 4782 | 124 | 0.0 | 46939.4 | 1.4X |
| 1000 | Use Arrays.stream api | 11999 | 12185 | 263 | 0.0 | 119990.3 | 1.0X |
| 1000 | Use Loop api | 9282 | 9366 | 118 | 0.0 | 92822.1 | 1.3X |
For Java 11 and 17, using a loop looks better.
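The Relative column in these results can be reproduced as the Arrays.stream case's time divided by the other case's time, printed with one decimal. This is an assumption inferred from the numbers rather than from the benchmark source, but it matches every row of the three tables when the Per Row(ns) values are used (both cases in a test process the same rows, so their ratio equals the ratio of the unrounded best times, while Best Time(ms) is rounded). One consequence worth noticing: in the Java 8 run with input size 500, the loop is about 3.6% slower (57839.1 vs 55841.2 ns per row), yet Relative still prints 1.0X. The class name is hypothetical.

```java
import java.util.Locale;

class Relative {
  // Assumed definition: baseline (Arrays.stream) time / this case's time, one decimal + "X".
  // Locale.ROOT keeps the decimal separator a '.' regardless of the default locale.
  static String relative(double baselinePerRowNs, double casePerRowNs) {
    return String.format(Locale.ROOT, "%.1fX", baselinePerRowNs / casePerRowNs);
  }

  public static void main(String[] args) {
    System.out.println(relative(351.9, 180.6));     // Java 8, size 1    -> 1.9X
    System.out.println(relative(408.5, 224.4));     // Java 11, size 1   -> 1.8X
    System.out.println(relative(55841.2, 57839.1)); // Java 8, size 500  -> 1.0X
    System.out.println(relative(65405.0, 46939.4)); // Java 17, size 500 -> 1.4X
  }
}
```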
...on/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
Outdated
spark/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java Lines 225 to 230 in 0996a15
Test mapToIntAndSum vs … with Java 8, Java 11 and Java 17.
Test encode vs … with Java 8, Java 11 and Java 17.
There is an improvement only when the input size is small, so I won't change these in this PR.
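The code at AuthEngine.java lines 225 to 230 is not reproduced in this thread, so the following only illustrates the general shape of the `mapToInt(...).sum()` pattern versus an equivalent loop. It is a sketch under an assumed, hypothetical use case (summing the lengths of several `byte[]` parts); the class and method names are made up. Both versions return the same total (including the same silent `int` overflow behavior); the loop simply avoids building a stream pipeline, which matters mostly when the input is small and that fixed setup cost dominates.

```java
import java.util.Arrays;

class MapToIntSum {
  // Stream form: map each part to its length, then sum.
  static int totalLengthStream(byte[][] parts) {
    return Arrays.stream(parts).mapToInt(p -> p.length).sum();
  }

  // Loop form: same result without creating a stream.
  static int totalLengthLoop(byte[][] parts) {
    int total = 0;
    for (byte[] p : parts) {
      total += p.length;
    }
    return total;
  }

  public static void main(String[] args) {
    byte[][] parts = {new byte[3], new byte[0], new byte[7]};
    System.out.println(totalLengthStream(parts)); // 10
    System.out.println(totalLengthLoop(parts));   // 10
  }
}
```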
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomSumMetric.java
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomAvgMetric.java
Is the summary that it's basically always a win? If so, I'm convinced; just let me know when you have made all the changes you want to make.
This PR is complete and waiting for CI.
Yes, in most cases the loop is better than the Arrays.stream API, only …
@srowen If you have time, please help me review this. Thanks ~
    return true;
  }

  private boolean isAnyBlockNotStartWithShuffleBlockPrefix(String[] blockIds) {
Can these be static methods?
Change all of them to static, or only the ones in OneForOneBlockFetcher?
9c979c4 changes all the possible ones.
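A helper such as `isAnyBlockNotStartWithShuffleBlockPrefix` reads no instance state, which is what allows it to be `static`. The sketch below shows that shape as a loop with an early return next to the equivalent `anyMatch` stream form. The method body in `OneForOneBlockFetcher` is not shown in this thread; the prefix value `"shuffle_"`, the sample block ids, and the class name are assumptions for illustration only.

```java
import java.util.Arrays;

class BlockIdChecks {
  // Assumed prefix value, for illustration only.
  static final String SHUFFLE_BLOCK_PREFIX = "shuffle_";

  // Loop form: static because it uses no instance fields; returns on the first mismatch.
  static boolean isAnyBlockNotStartWithShuffleBlockPrefix(String[] blockIds) {
    for (String blockId : blockIds) {
      if (!blockId.startsWith(SHUFFLE_BLOCK_PREFIX)) {
        return true;
      }
    }
    return false;
  }

  // Equivalent stream form, for comparison; anyMatch also short-circuits.
  static boolean isAnyBlockNotStartWithShuffleBlockPrefixStream(String[] blockIds) {
    return Arrays.stream(blockIds).anyMatch(id -> !id.startsWith(SHUFFLE_BLOCK_PREFIX));
  }

  public static void main(String[] args) {
    String[] ids = {"shuffle_0_1_2", "rdd_3_4"};
    System.out.println(isAnyBlockNotStartWithShuffleBlockPrefix(ids));       // true
    System.out.println(isAnyBlockNotStartWithShuffleBlockPrefixStream(ids)); // true
  }
}
```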
        }
      }
    }
    return list.toArray(new NamedReference[0]);
Do you need to build the List too? Why not just call .toArray on the Set? Is it because ordering is important? LinkedHashSet could help there.
Let me check this.
1ecf017 changes this to use LinkedHashSet; let me check the performance and wait for CI.
OK that's fine, thanks for checking. Whatever seems most efficient
Friendly ping @huaxingao: could you help confirm whether the ordering is important for the result of the Expression#references method?
The order shouldn't matter. Thanks for checking with me @LuciferYang
Thanks @huaxingao
OK to use HashSet if the result order is not important.
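The `references()` pattern discussed here (union of the children's references, without duplicates) can be sketched with a hypothetical `RefNode` tree standing in for `Expression`, with string names standing in for `NamedReference`. `LinkedHashSet` returns references in first-seen order, while `HashSet` gives the same set of elements with no ordering guarantee; since the order of `Expression#references` does not matter, the cheaper `HashSet` is sufficient. The class and its factory methods are illustrative assumptions, not Spark API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for an expression tree: leaves carry references, inner nodes have children.
class RefNode {
  private final String[] refs;
  private final RefNode[] children;

  private RefNode(String[] refs, RefNode[] children) {
    this.refs = refs;
    this.children = children;
  }

  static RefNode leaf(String... refs) {
    return new RefNode(refs, new RefNode[0]);
  }

  static RefNode of(RefNode... children) {
    return new RefNode(new String[0], children);
  }

  // Leaves return their own references; inner nodes return the de-duplicated union of
  // their children's references. keepOrder selects LinkedHashSet (first-seen order)
  // over HashSet (no order guarantee).
  String[] references(boolean keepOrder) {
    if (children.length == 0) {
      return refs.clone();
    }
    Set<String> set = keepOrder ? new LinkedHashSet<String>() : new HashSet<String>();
    for (RefNode child : children) {
      Collections.addAll(set, child.references(keepOrder));
    }
    return set.toArray(new String[0]);
  }

  public static void main(String[] args) {
    RefNode tree = of(leaf("b", "a"), leaf("a", "c"));
    System.out.println(String.join(",", tree.references(true))); // b,a,c
  }
}
```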
...on/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
      }
    }
    return false;
  }
Given this is not a perf critical path, I would recommend to keep this as is.
ok
Looks OK
@@ -17,7 +17,7 @@

package org.apache.spark.sql.connector.expressions;

import java.util.Arrays;
import java.util.*;
Nit: I think we've avoided wildcard imports, just enumerate them?
done
@@ -62,8 +61,7 @@ public String build(Expression expr) {
      String name = e.name();
      switch (name) {
        case "IN": {
          List<String> children =
            Arrays.stream(e.children()).map(c -> build(c)).collect(Collectors.toList());
          List<String> children = expressionsToStringList(e.children());
nit: Given this is called only here, why not avoid the subList? (Give start offset and len params to expressionsToStringList.)
Is e00330f OK?
But .subList only wraps a SubList object and does not trigger operations such as a memory copy, so the performance gap may be small.
Agreed, subList is fairly optimal ... but it can be avoided here if possible. It is just a nit comment :-)
Just a minor nit, looks good to me.
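The reviewer's suggestion (pass a start offset and a length instead of converting everything and calling `subList`) can be sketched as a generic helper. The real signature introduced in e00330f is not shown in this thread, so `toStringList`, its parameters, and the `"c" + c` formatting are hypothetical; the point is only that the caller gets exactly the range it needs in one pass, for example the first child on its own and the remaining children as a list, as an IN expression needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ToStringListSketch {
  // Converts items[offset, offset + len) to strings in one pass, so the caller
  // does not build a full list first and then take a subList view of it.
  static <T> List<String> toStringList(T[] items, int offset, int len, Function<? super T, String> build) {
    if (offset < 0 || len < 0 || offset + len > items.length) {
      throw new IndexOutOfBoundsException(
          "offset=" + offset + ", len=" + len + ", length=" + items.length);
    }
    List<String> out = new ArrayList<>(len);
    for (int i = offset; i < offset + len; i++) {
      out.add(build.apply(items[i]));
    }
    return out;
  }

  public static void main(String[] args) {
    Integer[] children = {1, 2, 3, 4};
    String head = toStringList(children, 0, 1, c -> "c" + c).get(0);
    List<String> rest = toStringList(children, 1, children.length - 1, c -> "c" + c);
    System.out.println(head + " IN " + rest); // c1 IN [c2, c3, c4]
  }
}
```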
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2PredicateSuite.scala
    for (Expression e : children()) {
      Collections.addAll(set, e.references());
    }
    return set.toArray(new NamedReference[0]);
I have one last tiny suggestion - either pass an array of size set.size(), or make a static final empty array and pass it here, to avoid allocating an empty array. It's tiny but hey we are micro-optimizing.
I chose to make a static final empty array and pass it here, because passing an empty array is the more commonly recommended approach, and the empty array is only used to get the target array's class type.
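A minimal sketch of the chosen approach, with `String` standing in for `NamedReference` and hypothetical class and constant names. Per the `Collection.toArray(T[])` contract, a correctly sized array is allocated whenever the set does not fit in the argument, and the argument itself is returned when it does fit; for an empty set that means the shared constant comes back. This is harmless, since a zero-length array has no elements that anyone could modify.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class EmptyArrayConstant {
  // One shared zero-length array; toArray uses it for its runtime component type.
  static final String[] EMPTY_STRING_ARRAY = new String[0];

  static String[] namesToArray(Set<String> names) {
    return names.toArray(EMPTY_STRING_ARRAY);
  }

  public static void main(String[] args) {
    Set<String> names = new LinkedHashSet<>();
    names.add("x");
    names.add("y");
    System.out.println(namesToArray(names).length);                          // 2
    System.out.println(namesToArray(new HashSet<>()) == EMPTY_STRING_ARRAY); // true
  }
}
```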
The "Install dependencies for documentation generation" step failed; it is not related to this PR, so I re-ran it.
"Install dependencies for documentation generation" still failed:
https://github.com/LuciferYang/spark/actions/runs/3064658259/jobs/4947983207
× Building wheel for pyzmq (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [186 lines of output]
/tmp/pip-build-env-812k46kb/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py:262: UserWarning: Unknown distribution option: 'cffi_modules'
warnings.warn(msg)
running bdist_wheel
running build
running build_py
copying zmq/error.py -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/asyncio.py -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/_future.py -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/_typing.py -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/decorators.py -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/constants.py -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq
creating build/lib.linux-x86_64-cpython-39/zmq/log
copying zmq/log/__main__.py -> build/lib.linux-x86_64-cpython-39/zmq/log
copying zmq/log/handlers.py -> build/lib.linux-x86_64-cpython-39/zmq/log
copying zmq/log/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/log
creating build/lib.linux-x86_64-cpython-39/zmq/green
copying zmq/green/device.py -> build/lib.linux-x86_64-cpython-39/zmq/green
copying zmq/green/poll.py -> build/lib.linux-x86_64-cpython-39/zmq/green
copying zmq/green/core.py -> build/lib.linux-x86_64-cpython-39/zmq/green
copying zmq/green/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/green
creating build/lib.linux-x86_64-cpython-39/zmq/green/eventloop
copying zmq/green/eventloop/ioloop.py -> build/lib.linux-x86_64-cpython-39/zmq/green/eventloop
copying zmq/green/eventloop/zmqstream.py -> build/lib.linux-x86_64-cpython-39/zmq/green/eventloop
copying zmq/green/eventloop/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/green/eventloop
creating build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_imports.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_future.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_asyncio.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_win32_shim.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_etc.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_draft.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_message.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_monqueue.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_multipart.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_error.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_constants.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_poll.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/conftest.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_monitor.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_security.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_context.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_mypy.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_ssh.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_z85.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_cffi_backend.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_ioloop.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_auth.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_pair.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_device.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_includes.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_socket.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_cython.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_log.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_pubsub.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_ext.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_zmqstream.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_reqrep.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_proxy_steerable.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_version.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_retry_eintr.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
copying zmq/tests/test_decorators.py -> build/lib.linux-x86_64-cpython-39/zmq/tests
creating build/lib.linux-x86_64-cpython-39/zmq/devices
copying zmq/devices/proxydevice.py -> build/lib.linux-x86_64-cpython-39/zmq/devices
copying zmq/devices/monitoredqueuedevice.py -> build/lib.linux-x86_64-cpython-39/zmq/devices
copying zmq/devices/proxysteerabledevice.py -> build/lib.linux-x86_64-cpython-39/zmq/devices
copying zmq/devices/monitoredqueue.py -> build/lib.linux-x86_64-cpython-39/zmq/devices
copying zmq/devices/basedevice.py -> build/lib.linux-x86_64-cpython-39/zmq/devices
copying zmq/devices/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/devices
creating build/lib.linux-x86_64-cpython-39/zmq/auth
copying zmq/auth/asyncio.py -> build/lib.linux-x86_64-cpython-39/zmq/auth
copying zmq/auth/certs.py -> build/lib.linux-x86_64-cpython-39/zmq/auth
copying zmq/auth/ioloop.py -> build/lib.linux-x86_64-cpython-39/zmq/auth
copying zmq/auth/base.py -> build/lib.linux-x86_64-cpython-39/zmq/auth
copying zmq/auth/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/auth
copying zmq/auth/thread.py -> build/lib.linux-x86_64-cpython-39/zmq/auth
creating build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/interop.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/garbage.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/win32.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/jsonapi.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/monitor.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/z85.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/strtypes.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/utils
creating build/lib.linux-x86_64-cpython-39/zmq/backend
copying zmq/backend/select.py -> build/lib.linux-x86_64-cpython-39/zmq/backend
copying zmq/backend/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/backend
creating build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cython/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
creating build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/error.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/context.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/message.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/_poll.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/devices.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/socket.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/utils.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/backend/cffi/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
creating build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/version.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/tracker.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/context.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/attrsettr.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/poll.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/frame.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/stopwatch.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/socket.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
copying zmq/sugar/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/sugar
creating build/lib.linux-x86_64-cpython-39/zmq/ssh
copying zmq/ssh/tunnel.py -> build/lib.linux-x86_64-cpython-39/zmq/ssh
copying zmq/ssh/forward.py -> build/lib.linux-x86_64-cpython-39/zmq/ssh
copying zmq/ssh/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/ssh
creating build/lib.linux-x86_64-cpython-39/zmq/eventloop
copying zmq/eventloop/ioloop.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop
copying zmq/eventloop/zmqstream.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop
copying zmq/eventloop/future.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop
copying zmq/eventloop/_deprecated.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop
copying zmq/eventloop/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop
creating build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado
copying zmq/eventloop/minitornado/log.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado
copying zmq/eventloop/minitornado/stack_context.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado
copying zmq/eventloop/minitornado/concurrent.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado
copying zmq/eventloop/minitornado/ioloop.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado
copying zmq/eventloop/minitornado/util.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado
copying zmq/eventloop/minitornado/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado
creating build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado/platform
copying zmq/eventloop/minitornado/platform/interface.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado/platform
copying zmq/eventloop/minitornado/platform/auto.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado/platform
copying zmq/eventloop/minitornado/platform/common.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado/platform
copying zmq/eventloop/minitornado/platform/windows.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado/platform
copying zmq/eventloop/minitornado/platform/posix.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado/platform
copying zmq/eventloop/minitornado/platform/__init__.py -> build/lib.linux-x86_64-cpython-39/zmq/eventloop/minitornado/platform
copying zmq/__init__.pxd -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/__init__.pyi -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/py.typed -> build/lib.linux-x86_64-cpython-39/zmq
copying zmq/devices/monitoredqueue.pxd -> build/lib.linux-x86_64-cpython-39/zmq/devices
copying zmq/utils/buffers.pxd -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/zmq_compat.h -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/pyversion_compat.h -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/mutex.h -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/ipcmaxlen.h -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/getpid_compat.h -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/compiler.json -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/utils/config.json -> build/lib.linux-x86_64-cpython-39/zmq/utils
copying zmq/backend/__init__.pyi -> build/lib.linux-x86_64-cpython-39/zmq/backend
copying zmq/backend/cython/context.pxd -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cython/socket.pxd -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cython/__init__.pxd -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cython/message.pxd -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cython/checkrc.pxd -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cython/libzmq.pxd -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cython/constant_enums.pxi -> build/lib.linux-x86_64-cpython-39/zmq/backend/cython
copying zmq/backend/cffi/_cdefs.h -> build/lib.linux-x86_64-cpython-39/zmq/backend/cffi
copying zmq/sugar/__init__.pyi -> build/lib.linux-x86_64-cpython-39/zmq/sugar
running build_ext
running configure
Using bundled libzmq
already have bundled/zeromq
already have platform.hpp
checking for timer_create
************************************************
************************************************
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.9 -c /tmp/timer_createwdqm_dcl.c -o tmp/timer_createwdqm_dcl.o
/tmp/timer_createwdqm_dcl.c: In function ‘main’:
/tmp/timer_createwdqm_dcl.c:2:5: warning: implicit declaration of function ‘timer_create’ [-Wimplicit-function-declaration]
2 | timer_create();
| ^~~~~~~~~~~~
x86_64-linux-gnu-gcc -pthread tmp/timer_createwdqm_dcl.o -L/usr/lib/x86_64-linux-gnu -o a.out
/usr/bin/ld: tmp/timer_createwdqm_dcl.o: in function `main':
/tmp/timer_createwdqm_dcl.c:2: undefined reference to `timer_create'
collect2: error: ld returned 1 exit status
no timer_create, linking librt
************************************************
building 'zmq.libzmq' extension
creating build/temp.linux-x86_64-cpython-39/buildutils
creating build/temp.linux-x86_64-cpython-39/bundled
creating build/temp.linux-x86_64-cpython-39/bundled/zeromq
creating build/temp.linux-x86_64-cpython-39/bundled/zeromq/src
x86_64-linux-gnu-g++ -pthread -std=c++11 -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DZMQ_HAVE_CURVE=1 -DZMQ_USE_TWEETNACL=1 -DZMQ_USE_EPOLL=1 -DZMQ_IOTHREADS_USE_EPOLL=1 -DZMQ_POLL_BASED_ON_POLL=1 -Ibundled/zeromq/include -Ibundled -I/usr/include/python3.9 -c buildutils/initlibzmq.cpp -o build/temp.linux-x86_64-cpython-39/buildutils/initlibzmq.o
buildutils/initlibzmq.cpp:10:10: fatal error: Python.h: No such file or directory
10 | #include "Python.h"
| ^~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-g++' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyzmq
ERROR: Could not build wheels for pyzmq, which is required to install pyproject.toml-based projects
Failed to build pyzmq
Error: Process completed with exit code 1.
Friendly ping @HyukjinKwon @Yikun, could you help check this?
I guess this is already addressed in #37904.
Mind rebasing?
Rebased, thank you @Yikun.
Now the Python linter check fails instead:
https://github.com/LuciferYang/spark/actions/runs/3065275580/jobs/4949221030
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/pandas/window.py:112: error: Module has no attribute "lit" [attr-defined]
Found 1 error in 1 file (checked 340 source files)
1
GA passed
Rebase and re-run GA
Merged to master
Thanks @srowen @mridulm @huaxingao for the review!
### What changes were proposed in this pull request?
This PR replaces `Arrays.stream` API calls with plain loops where doing so improves performance.

### Why are the changes needed?
Minor performance improvement.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes apache#37843 from LuciferYang/ExpressionArrayToStrings.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
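Below is a minimal sketch of the kind of rewrite described above, using simplified stand-in data. The class `StreamVsLoop` and its methods are hypothetical and are not Spark's code. Each pair computes the same result twice: once with `Arrays.stream` and once with a plain loop. The sketch does not measure performance; any speedup should be confirmed with a real benchmark (for example JMH).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration class, not part of Spark.
public class StreamVsLoop {

  // Stream version: convert each element to its string form and collect into an array.
  static String[] toStringsWithStream(Object[] values) {
    return Arrays.stream(values).map(v -> String.valueOf(v)).toArray(String[]::new);
  }

  // Loop version: same result, written directly into a pre-sized array
  // without building a stream pipeline.
  static String[] toStringsWithLoop(Object[] values) {
    String[] result = new String[values.length];
    for (int i = 0; i < values.length; i++) {
      result[i] = String.valueOf(values[i]);
    }
    return result;
  }

  // Stream version: flatten nested arrays and keep distinct values.
  static String[] distinctWithStream(String[][] groups) {
    return Arrays.stream(groups).flatMap(Arrays::stream).distinct().toArray(String[]::new);
  }

  // Loop version: LinkedHashSet drops duplicates and keeps first-seen order,
  // matching Stream.distinct() on an ordered stream.
  static String[] distinctWithLoop(String[][] groups) {
    Set<String> unique = new LinkedHashSet<>();
    for (String[] group : groups) {
      for (String value : group) {
        unique.add(value);
      }
    }
    return unique.toArray(new String[0]);
  }

  public static void main(String[] args) {
    Object[] values = {1, "b", 2.5, null};
    System.out.println(Arrays.toString(toStringsWithStream(values)));
    System.out.println(Arrays.toString(toStringsWithLoop(values)));

    String[][] groups = {{"a", "b"}, {"b", "c"}, {"a"}};
    System.out.println(Arrays.toString(distinctWithStream(groups)));
    System.out.println(Arrays.toString(distinctWithLoop(groups)));
  }
}
```

The loop versions avoid creating a stream pipeline and lambda objects on every call. The array-to-strings loop can also size its output up front. Both versions of each pair return identical arrays, so a change like this leaves behavior unchanged.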