ORC-1121: Predicate pushdown does not work #1053

Closed
PengleiShi opened this issue Mar 3, 2022 · 8 comments

@PengleiShi
Contributor

Hi, I have run into a problem. My test data is TPC-DS 1 GB, with Spark 3.2 and ORC 1.6.11.
The test query is: select count(1) from call_center_orc where cc_call_center_sk > 100;
cc_call_center_sk is the first column in call_center_orc, and predicate pushdown takes effect:
[image]
But when I run: select count(1) from call_center_orc where cc_company > 100;
cc_company is not the first column, and predicate pushdown does not work:
[image]

I debugged the code and found that the problem is in SchemaEvolution.ppdSafeConversion:
[image]
In my case, result.size is 2, but in pickRowGroups
[image]
columnIx is the column index in the ORC file metadata, which is 19 for cc_company. Because 19 is outside the 2-element array, ORC does not evaluate the pushdown filter against the row group statistics and cannot skip the row group.
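
The mismatch described above can be reproduced in a toy model. This is only a sketch, not ORC's actual code: the class name PpdIndexMismatch and the helper isSafe are hypothetical, the bounds check is an assumption about how a too-large index ends up being treated as "not safe", and the value 1 for the first column's id assumes the root struct has id 0. Only the array size 2 and the column index 19 come from the report.

```java
class PpdIndexMismatch {
  // Hypothetical stand-in for a per-column "PPD safe" lookup.
  // The flag array is sized to the pruned reader schema, but the
  // caller passes a column index taken from the file metadata.
  static boolean isSafe(boolean[] ppdSafeConversion, int colId) {
    return colId >= 0
        && colId < ppdSafeConversion.length
        && ppdSafeConversion[colId];
  }

  public static void main(String[] args) {
    // Size 2, as reported in the issue.
    boolean[] flags = {true, true};

    // Assumed id of the first column: in range, so pushdown is allowed.
    System.out.println(isSafe(flags, 1));   // prints "true"

    // Reported file column index of cc_company: out of range,
    // so the lookup reports "not safe" and the filter is skipped.
    System.out.println(isSafe(flags, 19));  // prints "false"
  }
}
```

In this model the lookup never throws; it quietly answers "not safe", which would match the reported symptom of pushdown being silently skipped rather than failing.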

@dongjoon-hyun
Member

Thank you for reporting, @PengleiShi . Could you make a PR for that?

@dongjoon-hyun
Member

cc @pgaref , @williamhyun , @guiyanakuang

@guiyanakuang
Member

I'll investigate this later.

@PengleiShi
Contributor Author

> Thank you for reporting, @PengleiShi . Could you make a PR for that?

I'm willing to try.

@guiyanakuang
Member

I reviewed SchemaEvolution.java and confirmed that this is a bug.
I think we can build a map from fileTypeId to readTypeId using readerFileTypes.

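// Field: maps a file type id to the index of the matching reader type.
private final Map<Integer, Integer> typeIdsMap = new HashMap<>();

// In the constructor, once readerFileTypes has been populated:
for (int i = 0; i < this.readerFileTypes.length; i++) {
  // Entries may be null when a reader column has no counterpart in the file.
  if (this.readerFileTypes[i] != null) {
    this.typeIdsMap.put(this.readerFileTypes[i].getId(), i);
  }
}

public boolean isPPDSafeConversion(final int colId) {
  if (hasConversion()) {
    // Translate the file column id before indexing the reader-sized array.
    Integer readTypeId = typeIdsMap.get(colId);
    return readTypeId != null &&
        ppdSafeConversion[readTypeId];
  }

  // when there is no schema evolution PPD is safe
  return true;
}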
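
For readers who want to try the idea outside of ORC, here is a minimal self-contained sketch of the same lookup. It is an illustration under stated assumptions, not the change that was eventually merged: the class TypeIdMapSketch, its constructor, and the use of -1 for "no matching file type" are all hypothetical, and the hasConversion() branch is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class TypeIdMapSketch {
  // file type id -> index into the reader-sized flag array
  private final Map<Integer, Integer> typeIdsMap = new HashMap<>();
  private final boolean[] ppdSafeConversion;

  // fileTypeIds[i] is the file type id matched to reader type i,
  // or -1 when there is no match (hypothetical encoding).
  TypeIdMapSketch(int[] fileTypeIds, boolean[] ppdSafeConversion) {
    this.ppdSafeConversion = ppdSafeConversion;
    for (int i = 0; i < fileTypeIds.length; i++) {
      if (fileTypeIds[i] >= 0) {
        typeIdsMap.put(fileTypeIds[i], i);
      }
    }
  }

  boolean isPPDSafeConversion(int fileColId) {
    // Translate the file column id first, then index the flag array.
    Integer readTypeId = typeIdsMap.get(fileColId);
    return readTypeId != null && ppdSafeConversion[readTypeId];
  }

  public static void main(String[] args) {
    // Reader schema of size 2: root (file id 0) and cc_company (file id 19).
    TypeIdMapSketch s =
        new TypeIdMapSketch(new int[] {0, 19}, new boolean[] {true, true});
    System.out.println(s.isPPDSafeConversion(19)); // prints "true"
    System.out.println(s.isPPDSafeConversion(5));  // prints "false"
  }
}
```

With the translation step in place, file column 19 resolves to reader index 1, so the lookup stays inside the 2-element array.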

@PengleiShi, you are welcome to make this PR. 🍻

@PengleiShi
Contributor Author

@guiyanakuang Thanks, I'll do it. Should I also open a new issue in Jira?

@guiyanakuang
Member

Yes, the PR title needs a prefix that associates it with the Jira issue.

@dongjoon-hyun
Member

This is resolved via #1055

@dongjoon-hyun dongjoon-hyun changed the title Predicate pushdown does not work ORC-1121: Predicate pushdown does not work Mar 8, 2022