New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-3268 Add AUTO_INCREMENT column in GenerateTableFetch to benefit index #1376

Open
wants to merge 19 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@qfdk

qfdk commented Dec 30, 2016

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Feb 14, 2017

Contributor

@qfdk Thanks for your contribution! Excuse us for taking so long to respond.

Please note that when you submit a PR for a JIRA, push the "submit patch" button to move JIRA status forward to "Patch Available". This ensure other developers to know it's ready for PR.
https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide#ContributorGuide-CodeReviewProcess

I like the idea of using indexed column instead of offset.
I wanted to test this PR. However, the same GenerateTableFetch class is recently updated by #1407 and this PR is now conflicted to merge.

Would you update this PR by rebasing with the latest master branch and fix conflicts? And also add Expression support to the "AUTO_INCREMENT(index) column name" you added?

Contributor

ijokarumawak commented Feb 14, 2017

@qfdk Thanks for your contribution! Excuse us for taking so long to respond.

Please note that when you submit a PR for a JIRA, push the "submit patch" button to move JIRA status forward to "Patch Available". This ensure other developers to know it's ready for PR.
https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide#ContributorGuide-CodeReviewProcess

I like the idea of using indexed column instead of offset.
I wanted to test this PR. However, the same GenerateTableFetch class is recently updated by #1407 and this PR is now conflicted to merge.

Would you update this PR by rebasing with the latest master branch and fix conflicts? And also add Expression support to the "AUTO_INCREMENT(index) column name" you added?

@ijokarumawak

I took a look on the code and commented feedback. Please add some unit testing code to illustrate and ensure how this is supposed to work by adding new test methods into the existing TestGenerateTableFetch.

Thanks!

@patricker

This comment has been minimized.

Show comment
Hide comment
@patricker

patricker Feb 16, 2017

Contributor

How database specific is this? I hadn't heard of it before, in looking around online it looks like it's for MySQL/MariaDB?

It sounds like a great idea for the systems that can support it, I'm just wondering where it belongs. Is this something that should be in the DatabaseAdapter layer?

Contributor

patricker commented Feb 16, 2017

How database specific is this? I hadn't heard of it before, in looking around online it looks like it's for MySQL/MariaDB?

It sounds like a great idea for the systems that can support it, I'm just wondering where it belongs. Is this something that should be in the DatabaseAdapter layer?

qfdk added some commits Feb 23, 2017

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Feb 27, 2017

Contributor

@patricker Thank you for pointing that out. It can vary among databases. Peter, What RDB do you usually use? @qfdk I thought you're using PostgreSQL by reading the HCC question you posted. It would be great if we can review it together to see whether this PR can work with different databases.

@qfdk I saw you've committed several updates, is it ready to another review cycle?

Contributor

ijokarumawak commented Feb 27, 2017

@patricker Thank you for pointing that out. It can vary among databases. Peter, What RDB do you usually use? @qfdk I thought you're using PostgreSQL by reading the HCC question you posted. It would be great if we can review it together to see whether this PR can work with different databases.

@qfdk I saw you've committed several updates, is it ready to another review cycle?

@qfdk

This comment has been minimized.

Show comment
Hide comment
@qfdk

qfdk Feb 27, 2017

@ijokarumawak yes you are right, It's my post :) Because the new master can't passer travis-ci, I can't understand that. So I tried to compile it with travis-ci then i committed several updates to launch travis-ci job :( I'm not sure if it works with different databases. Now i use the first version to test but i got that in image. I think we could improve it :)

qfdk commented Feb 27, 2017

@ijokarumawak yes you are right, It's my post :) Because the new master can't passer travis-ci, I can't understand that. So I tried to compile it with travis-ci then i committed several updates to launch travis-ci job :( I'm not sure if it works with different databases. Now i use the first version to test but i got that in image. I think we could improve it :)

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Feb 28, 2017

Contributor

@qfdk I understand. It's ok to have Travis test failure with other USER_LANGUAGE than English for now. Since the test has passed with English locale, we can move forward. We're aware of several tests can fail with other locale, and actively fixing those tests right now, for example, NIFI-3466.

In order to add new capability to this generic processor, we need to test with variety Database engines as many user use it with different databases. If you verified it with PostgreSQL, then I'd try reviewing this with other database such as Oracle, SQL server or MySQL.

Contributor

ijokarumawak commented Feb 28, 2017

@qfdk I understand. It's ok to have Travis test failure with other USER_LANGUAGE than English for now. Since the test has passed with English locale, we can move forward. We're aware of several tests can fail with other locale, and actively fixing those tests right now, for example, NIFI-3466.

In order to add new capability to this generic processor, we need to test with variety Database engines as many user use it with different databases. If you verified it with PostgreSQL, then I'd try reviewing this with other database such as Oracle, SQL server or MySQL.

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Mar 28, 2017

Contributor

Hi @qfdk , excuse me for not providing further review comments. Today, I wanted to test this updated PR but unfortunately it is conflicted. Would you rebase it with the latest master, address merge conflict and squash commits into one? Please let us know when it gets ready to resume review.

It may not be ideal, but I think it will take time to get this PR fully reviewed with various DBMS to see if it works as expected and how it can improve performance. On the other hand, the NiFi community is discussing about releasing 1.2.0. To make this effort along with the release cycles, can we remove 1.2.0 as Fix Version from JIRA NIFI-3268? The Fix Version is used to track which version the issue is fixed, and should be empty until it's resolved.

Thanks for your contribution.

Contributor

ijokarumawak commented Mar 28, 2017

Hi @qfdk , excuse me for not providing further review comments. Today, I wanted to test this updated PR but unfortunately it is conflicted. Would you rebase it with the latest master, address merge conflict and squash commits into one? Please let us know when it gets ready to resume review.

It may not be ideal, but I think it will take time to get this PR fully reviewed with various DBMS to see if it works as expected and how it can improve performance. On the other hand, the NiFi community is discussing about releasing 1.2.0. To make this effort along with the release cycles, can we remove 1.2.0 as Fix Version from JIRA NIFI-3268? The Fix Version is used to track which version the issue is fixed, and should be empty until it's resolved.

Thanks for your contribution.

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Mar 29, 2017

Contributor

@qfdk I updated the JIRA and emptied Fix Version for now.

Please let me request one more thing, is it possible for you to provide test result with PostgreSQL (or whatever DBMS you're comfortable with) to illustrate how much this enhancement can improve performance? It'd be really helpful to shorten review cycle. Thanks!

Contributor

ijokarumawak commented Mar 29, 2017

@qfdk I updated the JIRA and emptied Fix Version for now.

Please let me request one more thing, is it possible for you to provide test result with PostgreSQL (or whatever DBMS you're comfortable with) to illustrate how much this enhancement can improve performance? It'd be really helpful to shorten review cycle. Thanks!

@qfdk

This comment has been minimized.

Show comment
Hide comment
@qfdk

qfdk Mar 29, 2017

@ijokarumawak Hey, i will take a look as soon as possible because i found some new modifications in the master so i will try to do that. PS: nifi-1.1.0.2.1.1.0-2 works with my modification, we have 26,360,516 documents (5.6 Gb), in 5 minutes we could finish reading et avron2json. To provide this you want a vedio or just an image if i have some free time

qfdk commented Mar 29, 2017

@ijokarumawak Hey, i will take a look as soon as possible because i found some new modifications in the master so i will try to do that. PS: nifi-1.1.0.2.1.1.0-2 works with my modification, we have 26,360,516 documents (5.6 Gb), in 5 minutes we could finish reading et avron2json. To provide this you want a vedio or just an image if i have some free time

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Mar 30, 2017

Contributor

@qfdk "26,360,516 documents (5.6 Gb), in 5 minutes" is amazing! A video would be very helpful, but a image or two might be enough. I'd like to know the difference between the time to read all those documents (records) with this proposed index column, and without it. Thank you!

Contributor

ijokarumawak commented Mar 30, 2017

@qfdk "26,360,516 documents (5.6 Gb), in 5 minutes" is amazing! A video would be very helpful, but a image or two might be enough. I'd like to know the difference between the time to read all those documents (records) with this proposed index column, and without it. Thank you!

@qfdk

This comment has been minimized.

Show comment
Hide comment
@qfdk

qfdk Mar 30, 2017

@ijokarumawak I made a shot video :) https://youtu.be/ZD-CXgfJ6Xg
this video works for nifi-1.1.0.2.1.1.0-2 & PostgreSql 9.5 & Postgis 2.2, I will re-check the latest version of nifi. In this video, i don't wait the end of "OFFSET" method because i dont want to speand too much time on it after 10 000 documents it became very slowly ...

qfdk commented Mar 30, 2017

@ijokarumawak I made a shot video :) https://youtu.be/ZD-CXgfJ6Xg
this video works for nifi-1.1.0.2.1.1.0-2 & PostgreSql 9.5 & Postgis 2.2, I will re-check the latest version of nifi. In this video, i don't wait the end of "OFFSET" method because i dont want to speand too much time on it after 10 000 documents it became very slowly ...

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Mar 31, 2017

Contributor

@qfdk Thank you very much for recording and sharing the video. I can see how it improves execution time of generated SQL. This enhancement is significant for the first execution of GenerateTableFetch processor if there are many records to fetch. The performance difference is huge!

For the 2nd time or later, GenerateTableFetch uses 'where' clause using 'Max value columns' so it will not be a problem after 1st run if the max value columns are properly indexed. But if there are so many records inserted or updated, between GenerateTableFetch runs, it will have the same issue if we use limit offset.

I understand how it should work now. Please let me know when you finish updating PR. I will review it as soon as possible. Thanks!

Contributor

ijokarumawak commented Mar 31, 2017

@qfdk Thank you very much for recording and sharing the video. I can see how it improves execution time of generated SQL. This enhancement is significant for the first execution of GenerateTableFetch processor if there are many records to fetch. The performance difference is huge!

For the 2nd time or later, GenerateTableFetch uses 'where' clause using 'Max value columns' so it will not be a problem after 1st run if the max value columns are properly indexed. But if there are so many records inserted or updated, between GenerateTableFetch runs, it will have the same issue if we use limit offset.

I understand how it should work now. Please let me know when you finish updating PR. I will review it as soon as possible. Thanks!

@qfdk

This comment has been minimized.

Show comment
Hide comment
@qfdk

qfdk Mar 31, 2017

@ijokarumawak I'm very glad to work in this code in my free time. Now there was some issues that i want to debug the prossor lunched but nothing happend ... nifi-1.1.0.2.1.1.0-2 works verry well but the lastest version it can't work correctly .. I would change some code in this week-end and try to fix them.
QUESTION: I can't debug the code with remote debug in Intellij i made a break point but it can't arrive at this point, do you have somme trucs for this ?

qfdk commented Mar 31, 2017

@ijokarumawak I'm very glad to work in this code in my free time. Now there was some issues that i want to debug the prossor lunched but nothing happend ... nifi-1.1.0.2.1.1.0-2 works verry well but the lastest version it can't work correctly .. I would change some code in this week-end and try to fix them.
QUESTION: I can't debug the code with remote debug in Intellij i made a break point but it can't arrive at this point, do you have somme trucs for this ?

@ijokarumawak

This comment has been minimized.

Show comment
Hide comment
@ijokarumawak

ijokarumawak Apr 3, 2017

Contributor

@qfdk Sometimes, I encounter issue while debugging, such as forgetting terminate old IntelliJ debugger or having other thread being stopped at different break point, then NiFi jetty thread can't respond. If NiFi UI keeps showing a little running circle and doesn't update, probably that is the case.
Otherwise, I've been able to debug it with IntelliJ fine.

Contributor

ijokarumawak commented Apr 3, 2017

@qfdk Sometimes, I encounter issue while debugging, such as forgetting terminate old IntelliJ debugger or having other thread being stopped at different break point, then NiFi jetty thread can't respond. If NiFi UI keeps showing a little running circle and doesn't update, probably that is the case.
Otherwise, I've been able to debug it with IntelliJ fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment