resize hash table before building #9069
Closed
Conversation
What is the hash table build time before and after your PR?
zbtzbtzbt
reviewed
Apr 18, 2022
yiguolei
previously approved these changes
May 23, 2022
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
apache#9180) * Avoid a corrupt image file when an image.ckpt with non-zero size already exists. Currently, saveImage writes data to image.ckpt via an append-mode FileOutputStream, so if a non-empty file named image.ckpt already exists, the result is a corrupt image file. Even worse, the FE keeps only the latest image file and removes the others. Besides, the image file should be synced to disk. Keeping only the latest image file is dangerous, because an image file is only validated when the next image file is generated; we would then keep an unvalidated image file while removing validated ones. So I will issue a PR that keeps at least 2 image files. * Append the other data after the MetaHeader. * Use channel.force instead of sync.
Co-authored-by: Rongqian Li <rongqian_li@idgcapital.com>
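The commit message above describes the fix; this is only a minimal sketch of the idea (class and method names are hypothetical, not Doris code): open the checkpoint file in truncate mode rather than append mode so a stale non-empty image.ckpt cannot corrupt the new image, and force the channel so the bytes reach disk.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ImageWriter {
    // Hypothetical sketch of the fix described above: append = false
    // truncates any stale image.ckpt instead of appending to it, and
    // channel.force() syncs the data to disk before the file is later
    // renamed to the final image.
    public static void writeImage(String ckptPath, byte[] header, byte[] body) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(ckptPath, /* append = */ false)) {
            fos.write(header);   // MetaHeader goes first
            fos.write(body);     // the other data is appended after it
            FileChannel channel = fos.getChannel();
            channel.force(true); // flush file data and metadata to disk
        }
    }
}
```

Writing the same image twice leaves the file at the size of one image, which is exactly the property the append-mode bug violated.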
…n mode (apache#9195) Co-authored-by: yiguolei <yiguolei@gmail.com>
* Rename ImageSeq to LatestImageSeq in Storage * Keep at least one validated image file
…he#9011) * Load the newly generated image file as soon as it is generated, to check whether it is valid. * Delete the latest invalid image file. * Get filePath from saveImage() to ensure the correct file is deleted when an exception happens. * Fixes. Co-authored-by: wuhangze <wuhangze@jd.com>
… spark load. (apache#9136) Buffer flip was used incorrectly. When the hash key is a string type, the hash value was always zero. The reason is that the buffer for string types is obtained via wrap, which does not need to be flipped; if we flip it anyway, the buffer's read limit becomes zero.
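The flip bug described above is easy to reproduce with plain java.nio (a standalone illustration, not the Doris/Spark-load code itself): `ByteBuffer.wrap()` already leaves the buffer ready for reading, so an extra `flip()` sets the limit to 0 and a hash function sees no bytes at all.

```java
import java.nio.ByteBuffer;

public class FlipDemo {
    // ByteBuffer.wrap() returns a buffer with position = 0 and
    // limit = capacity, i.e. already readable. flip() sets limit to the
    // current position (0), leaving nothing to read -- which is why the
    // hash of every string key came out as zero.
    public static int readableBytes(byte[] key, boolean flip) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        if (flip) {
            buf.flip(); // limit = position (0): zero readable bytes
        }
        return buf.remaining();
    }
}
```

`flip()` is only needed after *writing* into a buffer, to switch it from write mode to read mode; a wrapped array starts in read mode.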
…ge back to SSD (apache#9158) 1. fix bug described in apache#9159 2. fix a `fill_tuple` bug introduced from apache#9173
start_fe.sh: line 174: [: -eq: unary operator expected
…hread (apache#9472) * add ArrowReaderProperties to parquet::arrow::FileReader * support prefetch batch
…alDictValue` exceeds integer range (apache#9436)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
…e it consistent with Hive and Trino behavior. (apache#9190) Hive and Trino/Presto automatically trim trailing spaces, but Doris does not, which causes query results to differ from Hive. Add a new session variable "trim_tailing_spaces_for_external_table_query". If set to true, trailing spaces of the column will be trimmed when reading CSV from a broker scan node.
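The behavior gap described above can be sketched in a few lines (a hypothetical helper, not the Doris implementation): Hive and Trino drop trailing spaces before comparing, so "abc  " and "abc" match there, while an untrimmed comparison sees two distinct values.

```java
public class TrimDemo {
    // Hypothetical helper illustrating the trim-on-read behavior: strip
    // only trailing ASCII spaces, leaving leading spaces and inner
    // whitespace untouched.
    public static String trimTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }
}
```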
…vparquet/vbroker scanner (apache#9666) * [Refactor][Bug-Fix][Load Vec] Refactor code of BaseScanner and the vjson/vparquet/vbroker scanners 1. Fix a bug where the vjson scanner did not support `range_from_file_path` 2. Fix a core dump in the vjson/vbroker scanners when the src/dest slot nullability differs 3. Fix a bug where a column reference count in vparquet's filter_block was not 1 4. Refactor the code to simplify it. Only vectorized load is changed, not the original row-based load. Co-authored-by: lihaopeng <lihaopeng@baidu.com>
Enhance Java style. The checkstyle rules about code order are now on this page: Class and Interface Declarations. This PR lets IDEA auto-rearrange code.
Add insert best practices
Currently, the libhdfs3 library integrated into the Doris BE does not support accessing clusters with Kerberos authentication enabled, because the Kerberos-related dependencies (gsasl and krb5) were not added when building libhdfs3. This PR enables Kerberos support and rebuilds libhdfs3 with the gsasl and krb5 dependencies: - gsasl version: 1.8.0 - krb5 version: 1.19
select column from table where column is null
Disabled by default because the current checksum logic has some bugs, and it also brings some overhead.
…pache#9703) Due to the current architecture, predicate derivation at rewrite time cannot satisfy all cases, because rewrite processes the ON clause first and then WHERE, and when there are subqueries not all cases can be derived. So keep the predicate pushdown method here. E.g. for `select * from t1 left join t2 on t1 = t2 where t1 = 1;`, InferFiltersRule can't infer t2 = 1 because that is out of its scope, but the expression (t2 = 1) can actually be deduced and pushed down to the scan node.
Labels
area/sql/function
Issues or PRs related to the SQL functions
area/vectorization
kind/docs
Categorizes issue or PR as related to documentation.
Proposed changes
Initialize the hash table size from the tuple number instead of the fixed number 1024 to reduce BuildTableExpanseTime.
After initializing the table size, the total build time decreased by 8.9% on TPC-H 10G for:
select count(*) from lineitem join orders on l_orderkey = o_orderkey
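The idea behind this change can be sketched as follows. The PR itself modifies the C++ hash table in the Doris BE; this standalone Java sketch (names are illustrative, not Doris code) shows the same principle: sizing the build-side table from the known build-row count up front means inserts never trigger an incremental rehash, which is the time BuildTableExpanseTime measures.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedBuild {
    // Illustrative sketch: size the build-side hash table from the row
    // count instead of a fixed small default (1024 in the BE), so no
    // resize/rehash happens while inserting the build rows.
    public static Map<Long, Integer> buildTable(long[] keys) {
        // initialCapacity = rows / defaultLoadFactor (0.75), rounded up,
        // so the table never grows during the build phase
        int initialCapacity = (int) Math.ceil(keys.length / 0.75);
        Map<Long, Integer> table = new HashMap<>(initialCapacity);
        for (int i = 0; i < keys.length; i++) {
            table.put(keys[i], i); // join key -> build-row index
        }
        return table;
    }
}
```

With the default constructor, a table grown to millions of rows rehashes every entry many times on the way up; reserving once amortizes that cost to zero.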
Issue Number: close #xxx
Problem Summary:
Describe the overview of changes.
Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...