[LIVY-572] Avoid usage of spark classes in ColumnBuffer #162
Conversation
@@ -43,7 +63,13 @@ public void addRow(Object[] fields) {
    }

    for (int i = 0; i < fields.length; i++) {
      columns[i].add(fields[i]);
      Object value;
      if (columnIsString[i]) {
Checking `columns[i].getType().equals(DataType.STRING)` here avoids the extra boolean array. (It's also an enum, so you could use `==`.)
Yes, the reason I did it this way was to avoid performing the check for every field of each row, and hence to avoid introducing overhead in a performance-critical part of the code. Anyway, if you feel strongly about this I can change it: it is also true that the overhead is probably very small compared to all the other operations we need to perform, e.g. transferring the data over the network.
Using `==` there's actually no extra overhead at all. In fact, if you consider CPU caches and the like, it's probably faster, since there's less risk of a cache miss.
ok, done, thanks.
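The final approach from the thread above can be sketched as follows. This is a simplified stand-in, not the actual Livy code: `DataType`, the inner `Column` class, and the string conversion in `addRow` are illustrative assumptions. The point it demonstrates is the reviewer's suggestion: because `DataType` is an enum, a per-field `==` comparison is a single reference check, so no cached `boolean[]` is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the pattern discussed above; the real Livy
// ColumnBuffer differs in structure and storage.
public class ColumnBufferSketch {
  enum DataType { STRING, INTEGER }

  static class Column {
    final DataType type;
    final List<Object> values = new ArrayList<>();
    Column(DataType type) { this.type = type; }
    DataType getType() { return type; }
    void add(Object v) { values.add(v); }
  }

  final Column[] columns;

  ColumnBufferSketch(DataType[] types) {
    columns = new Column[types.length];
    for (int i = 0; i < types.length; i++) {
      columns[i] = new Column(types[i]);
    }
  }

  void addRow(Object[] fields) {
    for (int i = 0; i < fields.length; i++) {
      Object value = fields[i];
      // Enum constants are singletons, so == is a cheap identity check;
      // doing it per field avoids keeping a parallel boolean[] in sync.
      if (columns[i].getType() == DataType.STRING && value != null) {
        // Hypothetical normalization step: coerce to a plain String so
        // no Spark-specific string class leaks into the buffer.
        value = value.toString();
      }
      columns[i].add(value);
    }
  }
}
```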
Codecov Report
@@ Coverage Diff @@
## master #162 +/- ##
============================================
+ Coverage 68.59% 68.73% +0.14%
+ Complexity 905 904 -1
============================================
Files 100 100
Lines 5662 5662
Branches 848 848
============================================
+ Hits 3884 3892 +8
+ Misses 1225 1216 -9
- Partials 553 554 +1
Continue to review full report at Codecov.
Merging to master.

Thanks @vanzin! What about backporting to 0.6?

If there's a need for another RC I'll do it, but we have never released a .z release before; I'd rather just do 0.7 instead.
What changes were proposed in this pull request?

`ColumnBuffer`s can be created both inside Spark jobs and in the Livy server. The latter happens when operation logs are returned, and before this patch it failed because this code used Spark classes after the refactor in LIVY-503. Unfortunately, we do not have test coverage for operation log retrieval, which is why this wasn't spotted earlier. Since beeline retrieves operation logs for each query, every query run through beeline fails unless `livy.server.thrift.logging.operation.enabled` is set to `false`.

How was this patch tested?

Manual tests using beeline.
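As a workaround until the fix is deployed, operation logging can be disabled via the property named in the description. A minimal sketch of the server configuration (the property name is from the text above; the file location and surrounding settings depend on the deployment):

```properties
# livy.conf — disable thrift operation log retrieval as a workaround
livy.server.thrift.logging.operation.enabled = false
```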