
fix: stack-overflow caused by BQ recursion #251

Merged: 5 commits merged on Sep 15, 2023


Shahroz16
Contributor

closes: #248

Shahroz16 self-assigned this Sep 8, 2023
github-actions bot commented Sep 8, 2023

Pull request title looks good 👍!

If this pull request gets merged, it will cause a new release of the software. For example, if this project's latest release version is 1.0.0, the next release after merging this pull request will be 1.0.1. This pull request is not a breaking change.

All merged pull requests will eventually get deployed, but only some types of pull requests (such as features and bug fixes) trigger a deployment; others wait to be deployed until a later time.
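The release rule the bot describes can be sketched as a function from the pull request title's type prefix to the next version. This is a minimal sketch assuming the common semantic-release defaults (a `!` marks a breaking change and bumps major, `feat` bumps minor, `fix` and `perf` bump patch, and other types don't release on their own); `nextVersion` is a hypothetical helper, not this project's release tooling.

```kotlin
// Hypothetical helper: next semantic version for a conventional-format PR title,
// ASSUMING semantic-release-style defaults. Returns null when the type alone
// does not trigger a release.
fun nextVersion(current: String, prTitle: String): String? {
    val (major, minor, patch) = current.split(".").map { it.toInt() }
    // The type is everything before the first ':' (e.g. "fix" or "refactor!").
    val rawType = prTitle.substringBefore(":").trim()
    val breaking = rawType.endsWith("!")
    val type = rawType.removeSuffix("!")
    return when {
        breaking -> "${major + 1}.0.0"
        type == "feat" -> "$major.${minor + 1}.0"
        type == "fix" || type == "perf" -> "$major.$minor.${patch + 1}"
        else -> null // docs, chore, ci, ... wait for a later release
    }
}
```

With the version from this thread, `nextVersion("3.6.5", "fix: stack-overflow caused by BQ recursion")` returns `"3.6.6"`, which matches the 3.6.6 release this PR produced.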


This project uses a special format for pull request titles. Don't worry, it's easy!

This pull request title should be in this format:

<type>: short description of change being made

If your pull request introduces breaking changes to the code, use this format:

<type>!: short description of breaking change

where <type> is one of the following:

  • feat: - A feature is being added or modified by this pull request. Use this if you made any changes to any of the features of the project.

  • fix: - A bug is being fixed by this pull request. Use this if you made any fixes to bugs in the project.

  • docs: - This pull request is making documentation changes, only.

  • refactor: - A change was made that doesn't fix a bug or add a feature.

  • test: - Adds missing tests or fixes broken tests.

  • style: - Changes that do not affect the meaning of the code (whitespace, linting, formatting, semi-colons, etc.)

  • perf: - Changes that improve performance of the code.

  • build: - Changes to the build system (maven, npm, gulp, etc)

  • ci: - Changes to the CI build system (Travis, GitHub Actions, Circle, etc)

  • chore: - Other changes to project that don't modify source code or test files.

  • revert: - Reverts a previous commit that was made.

Examples:

feat: edit profile photo
refactor!: remove deprecated v1 endpoints
build: update npm dependencies
style: run formatter 

Need more examples? Want to learn more about this format? Check out the official docs.

Note: If your pull request does multiple things, such as adding a feature, changing the CI server, and fixing some bugs, you might want to consider splitting it into multiple smaller pull requests.
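A check for the title format above can be sketched with a regular expression. This is an illustrative sketch, not the bot's actual implementation; `isValidPrTitle`, `allowedTypes`, and the exact pattern (no scopes, exactly one space after the colon) are assumptions.

```kotlin
// ASSUMED list of allowed types, copied from the bot comment above.
val allowedTypes = setOf(
    "feat", "fix", "docs", "refactor", "test", "style",
    "perf", "build", "ci", "chore", "revert"
)

// Matches "<type>: description" or "<type>!: description" (breaking change).
val titlePattern = Regex("""^([a-z]+)(!)?: \S.*$""")

// Hypothetical validator: the prefix must match the pattern AND be an allowed type.
fun isValidPrTitle(title: String): Boolean {
    val match = titlePattern.matchEntire(title) ?: return false
    return match.groupValues[1] in allowedTypes
}
```

Under this sketch, `fix: stack-overflow caused by BQ recursion` and `refactor!: remove deprecated v1 endpoints` pass, while a title like `fixed the crash` is rejected.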

github-actions bot commented Sep 8, 2023

Sample app builds 📱

Below you will find the list of the latest versions of the sample apps. It's recommended to always download the latest builds of the sample apps to accurately test the pull request.


  • java_layout: shahroz/fix-bq-outofstack (1694540814)
  • kotlin_compose: shahroz/fix-bq-outofstack (1694540809)

Shahroz16 requested a review from a team September 8, 2023 10:59
sdk/src/main/java/io/customer/sdk/queue/QueueRunRequest.kt

    logger.debug("queue task $taskStorageId run failed $error")

    when (error as? CustomerIOError) {
        is CustomerIOError.HttpRequestsPaused, is CustomerIOError.NoHttpRequestMade -> {
Contributor Author

Added the CustomerIOError.NoHttpRequestMade case because if there is no connection, there is no point in trying every request and using resources; we might as well wait until the next run.
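The reasoning behind this change can be sketched as a loop that stops the whole run, instead of attempting every remaining task, when a request fails because requests are paused or no HTTP request could be made. This is a simplified model: `QueueError`, `runQueue`, and the `send` callback are hypothetical stand-ins for the SDK's `CustomerIOError` and `QueueRunRequest`, not their real signatures.

```kotlin
// Hypothetical stand-ins for the SDK's CustomerIOError cases.
sealed class QueueError {
    object HttpRequestsPaused : QueueError()
    object NoHttpRequestMade : QueueError()
    data class Other(val reason: String) : QueueError()
}

// Hypothetical runner. `send` returns null on success or the error for that task.
// Returns the ids of the tasks that were actually attempted in this run.
fun runQueue(taskIds: List<String>, send: (String) -> QueueError?): List<String> {
    val attempted = mutableListOf<String>()
    for (id in taskIds) {
        attempted += id
        when (send(id)) {
            // Every remaining task would fail the same way: stop, retry next run.
            is QueueError.HttpRequestsPaused, is QueueError.NoHttpRequestMade -> return attempted
            // Success or a task-specific failure: move on to the next task.
            else -> Unit
        }
    }
    return attempted
}
```

If task `b` fails with `NoHttpRequestMade`, task `c` is never attempted in that run, which is the resource saving the comment describes; a task-specific failure still lets the loop continue.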

Contributor

levibostian Sep 8, 2023

🚧 (blocker comment)

CustomerIOError.NoHttpRequestMade can be used for more than no network connection. I can see us accidentally introducing a bug in the future, because the name NoHttpRequestMade doesn't imply that it should only be used when no network connection is established.

I do agree that there is no point in the BQ running when no network connection exists. However, could you revert it from this PR and introduce it in another one? There, we can figure out if we need to add a new CustomerIOError.NoNetwork, for example.
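One way to act on this suggestion is to give "no network" its own error case, so that the stop-the-run branch cannot be triggered by unrelated future uses of `NoHttpRequestMade`. This is a hypothetical sketch of the idea only: `SdkError`, its `NoNetwork` case, and `shouldStopRun` do not exist in the SDK.

```kotlin
// Hypothetical error hierarchy with a dedicated NoNetwork case.
sealed class SdkError {
    object HttpRequestsPaused : SdkError()
    // Stays generic: may be used for any request that was never sent.
    object NoHttpRequestMade : SdkError()
    // Reserved for "the device has no network connection".
    object NoNetwork : SdkError()
}

// Hypothetical helper: only errors known to affect every queued task end the run early.
fun shouldStopRun(error: SdkError): Boolean = when (error) {
    is SdkError.HttpRequestsPaused, is SdkError.NoNetwork -> true
    is SdkError.NoHttpRequestMade -> false
}
```

Because the `when` is exhaustive over a sealed class, adding another error case later forces a decision about whether it should stop the run.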

Contributor Author

Shahroz16 Sep 11, 2023


> CustomerIOError.NoHttpRequestMade can be used for more than only no HTTP requests being made.

Can you please mention what it is used for other than when a connection can't go through? From the code, it is only set when the network response returns null, i.e. no error code at all, just a plain null due to no connection.

The tests confirm this as well: when you mock no connection, it returns NoHttpRequestMade.
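The behavior described here, where `NoHttpRequestMade` is produced only when the network layer returns no response at all, can be modeled as a mapping from an optional response to a result. A minimal sketch; `HttpResponse`, `RequestResult`, and `toRequestResult` are hypothetical names, not the SDK's HTTP client API.

```kotlin
// Hypothetical response model: a null response means no status code came back at all.
data class HttpResponse(val code: Int)

sealed class RequestResult {
    object Success : RequestResult()
    object NoHttpRequestMade : RequestResult()
    data class HttpError(val code: Int) : RequestResult()
}

// Hypothetical mapping: only a missing response becomes NoHttpRequestMade.
fun toRequestResult(response: HttpResponse?): RequestResult = when {
    response == null -> RequestResult.NoHttpRequestMade // e.g. no connection
    response.code in 200..299 -> RequestResult.Success
    else -> RequestResult.HttpError(response.code)
}
```

Any HTTP status, including a server error, maps to something other than `NoHttpRequestMade` in this model.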

Contributor

I would really prefer if this change were removed and a ticket were made for it. We may want to add the same behavior to the iOS SDK, we may want to rename data types, and we may want to add more tests for this logic. There is enough work here that I am not comfortable putting it into this PR.

This PR looks great besides this and is ready to merge.

Contributor Author

I don't think we should hold back an improvement unless we have a strong reason to. It's a very small change with a big performance improvement. I am quite comfortable making this change because NoHttpRequestMade is only set when the network responds with null.

If you can go through the code and point out what worries you, I can respond to that concern. Otherwise, renaming data types and mirroring the same behavior in iOS shouldn't be reasons to block an improvement from going out. The risk if this error shows up is that the queue skips the current run until the next time the connection goes through.

I can ask the other squad members to check out the update as well if that makes you feel more confident.

Contributor Author

Shahroz16 Sep 14, 2023


@mrehan27 since it's Android, could you also go through this change when you get a chance, for more 👁️?

Contributor

I don't see any other usage of NoHttpRequestMade at the moment. So the change looks good; there's no reason to block unless we find a use case that fails and can be verified by tests.

Contributor Author

That's my point as well: this is a serious performance concern, I am comfortable with the check, and the code backs it. But I would still add more tests for it to improve confidence in later releases.

codecov bot commented Sep 8, 2023

Codecov Report

Merging #251 (80abdd7) into main (2d32664) will increase coverage by 0.03%.
Report is 2 commits behind head on main.
The diff coverage is 97.56%.

@@             Coverage Diff              @@
##               main     #251      +/-   ##
============================================
+ Coverage     50.80%   50.84%   +0.03%     
  Complexity      249      249              
============================================
  Files           108      108              
  Lines          2781     2779       -2     
  Branches        364      361       -3     
============================================
  Hits           1413     1413              
+ Misses         1250     1249       -1     
+ Partials        118      117       -1     

| Files Changed | Coverage Δ |
|---|---|
| ...main/java/io/customer/sdk/queue/QueueRunRequest.kt | 98.00% <97.56%> (+7.61%) ⬆️ |

... and 1 file with indirect coverage changes
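The headline numbers in this report can be reproduced from the hit, miss, and partial counts. A minimal sketch, assuming Codecov's usual definition (coverage = hits / (hits + misses + partials), with partially covered lines counted as not covered) and its default of rounding down to two decimals; `rawCoverage` and `displayed` are illustrative helpers, not part of any Codecov tooling.

```kotlin
import java.math.BigDecimal
import java.math.RoundingMode

// Illustrative helper: unrounded coverage percentage, hits / (hits + misses + partials) * 100.
// ASSUMES partials count as not covered, per Codecov's usual definition.
fun rawCoverage(hits: Int, misses: Int, partials: Int): BigDecimal {
    val lines = hits + misses + partials
    return BigDecimal(hits * 100).divide(BigDecimal(lines), 10, RoundingMode.HALF_EVEN)
}

// Illustrative helper: display value, rounded DOWN to two decimals
// (ASSUMED to match Codecov's default precision/rounding settings).
fun displayed(value: BigDecimal): String =
    value.setScale(2, RoundingMode.DOWN).toPlainString()
```

With the counts above, this gives 50.80% for main (1413 / 2781) and 50.84% for #251 (1413 / 2779); the unrounded difference of about 0.037 is consistent with the reported +0.03% once rounded down.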

github-actions bot commented Sep 8, 2023

Build available to test
Version: shahroz-fix-bq-outofstack-SNAPSHOT
Repository: https://s01.oss.sonatype.org/content/repositories/snapshots/

Contributor

levibostian left a comment

So far, this looks like a good change, but I am not confident that it works yet.

GitHub annotations highlight a couple of places where code does not get executed by tests (meaning possible missing tests), and I think some of the code in the loop isn't correct.

This is a critical part of the code base, execution of the BQ. I strongly suggest that more tests are written for this run function to cover all the possible edge cases.

Review thread on sdk/src/main/java/io/customer/sdk/queue/QueueRunRequest.kt (outdated, resolved)
levibostian changed the title from "fix: stack-overflow memory issues" to "fix: stack-overflow caused by BQ recursion" Sep 8, 2023
Contributor Author

Shahroz16 commented Sep 11, 2023

@levibostian Thanks for looking into it.

> GitHub annotations highlight a couple places where there is code that does not get executed by tests (meaning possible missing tests)

The only highlights seem to be on log statements.

> This is a critical part of the code base, execution of the BQ. I strongly suggest that more tests are written for this run function to cover all the possible edge cases.

Can you explain what specific tests you are looking for? I have added some more, but I'm asking because we didn't change any logic except moving from a recursive to an iterative approach. So what gave you confidence in the previous implementation with the current test suite that now makes you feel less confident?
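The refactor described here, replacing recursion with a loop, can be sketched as follows. In the recursive shape, handling one task calls the runner again, so the call stack grows with the number of queued tasks and a long enough queue throws `StackOverflowError`; Kotlin only turns self-recursion into a loop when a function is marked `tailrec`. The loop keeps stack depth constant while preserving order and per-task logic. This is a simplified model with hypothetical names (`runRecursive`, `runIterative`, `process`), not the actual `QueueRunRequest` code.

```kotlin
// Hypothetical recursive shape: one extra stack frame per task,
// so a long queue can exhaust the thread's stack.
fun runRecursive(tasks: List<Int>, index: Int = 0, process: (Int) -> Unit) {
    if (index >= tasks.size) return
    process(tasks[index])
    runRecursive(tasks, index + 1, process)
}

// Hypothetical iterative shape: same order, same per-task logic, constant stack depth.
fun runIterative(tasks: List<Int>, process: (Int) -> Unit) {
    for (task in tasks) {
        process(task)
    }
}
```

Because both versions visit tasks in the same order with the same per-task handling, existing behavioral tests should apply unchanged, which is the point being made in this comment.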

@levibostian
Contributor

@Shahroz16 Great questions.

The latest test that you added for when a task cannot be found in storage was 1 missing test case. That's a great test case to add, thank you.

GitHub annotations tell me that there is 1 more missing branch for this conditional statement. Adding a test for that scenario would be great.

You do have a good point in that this is a refactor so existing tests should be all that we need. In my original comment I could have explained better that even though this is a refactor, I don't want to ship this code until the test suite is improved upon. If there are missing tests, I wanted to make sure that we add tests to the existing suite to feel more confident in the code.

I recently found a missing test case in this exact same code block in the iOS SDK. When I added the missing test case, I actually found a bug. This makes me want to double check our test suite around this code in the Android SDK to make sure we trust it. I believe this PR is a good opportunity to do that.

I reviewed the current test suite and it looks good to me except:

  • this conditional statement missing a test
  • Non-blocker for this PR, QueueIntegrationTests - could we add a test to reproduce this stackoverflow error? Then make sure that this refactor fixes the issue?

@Shahroz16
Contributor Author

@levibostian added more tests.

Regarding,

> could we add a test to reproduce this stackoverflow error? Then make sure that this refactor fixes the issue?

I did try, but I wasn't able to, because it depends on multiple factors like the thread's stack size, and limiting that isn't straightforward in unit tests.
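One possible approach to such a reproduction test, offered as a sketch rather than what the project does: the depth at which `StackOverflowError` occurs is governed by the thread's stack size, and `java.lang.Thread` has a constructor that accepts a requested `stackSize`, so a test can run recursive code on a thread with a deliberately small stack. The JDK documents that value as a hint that some platforms may ignore, so such a test can still be environment-dependent. `recurse` and `overflowsWithStack` are hypothetical helpers.

```kotlin
// Hypothetical deliberately recursive helper standing in for the old queue runner.
fun recurse(remaining: Int): Int =
    if (remaining == 0) 0 else 1 + recurse(remaining - 1)

// Hypothetical helper: runs `block` on a thread with the requested stack size
// and reports whether it failed with StackOverflowError.
fun overflowsWithStack(stackSizeBytes: Long, block: () -> Unit): Boolean {
    var overflowed = false
    val thread = Thread(null, {
        try {
            block()
        } catch (e: StackOverflowError) {
            overflowed = true
        }
    }, "small-stack", stackSizeBytes)
    thread.start()
    thread.join() // join() makes the write to `overflowed` visible here
    return overflowed
}
```

A test could then assert that the recursive runner overflows on a small stack with a large queue, while the iterative runner completes on the same stack.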

@levibostian
Contributor

Tests are great! Thanks!

Oh, I just realized there is 1 more conversation to resolve. I do consider it a blocker for merging this PR.

@Shahroz16
Contributor Author

@levibostian I just noticed my reply to it was never posted, so I posted it again. It shows as 2 days ago for me, which is when I originally added it. Just making sure you can see it now.

Contributor

levibostian left a comment

I approve if the no network request logic is removed from this PR.

Shahroz16 merged commit 365a5b6 into main Sep 15, 2023
30 checks passed
Shahroz16 deleted the shahroz/fix-bq-outofstack branch September 15, 2023 12:43
github-actions bot pushed a commit that referenced this pull request Sep 15, 2023
### [3.6.6](3.6.5...3.6.6) (2023-09-15)

### Bug Fixes

* stack-overflow caused by BQ recursion ([#251](#251)) ([365a5b6](365a5b6))
Development

Successfully merging this pull request may close these issues.

Crash on 1 user but he seems blocked on this crash
3 participants