Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework #2608 with #2661 for testing on physical systems #2686

Closed

Conversation

abraunegg
Copy link
Owner

  • Rework Refactor OneDriveApi implementation #2608 with OneDrive Client Version v2.5.0-rc1 #2661 for testing on physical systems
  • Specifically testing these issues from RC1:
    • Re-connect to WiFi and the client will crash with the below error messages for each thread
    • Multiple functions handle errors incorrectly that do not return after retrying the function. As a result, the validation check fails when returning from the recovering retry in the original failed case, even though the operation is actually successful. (Bug: Incorrect error handling #2601)
  • General usability against all other 100+ test use cases

* Initial add of PR2608
* Align to original  PR2608 work
* Update with working state
@abraunegg
Copy link
Owner Author

abraunegg commented Mar 30, 2024

Test Case 1: Disable WiFi whilst 'Processing API Response Bundle: ...' message occurs

  • Start application (either -s or -m .. does not matter)
  • When 'Processing API Response Bundle: ...' message occurs, disable WiFi
  • Wait for API timeout messages to occur & retry events
  • Reconnect WiFi

Expected Outcome:

  • Application continues on with fetching API bundle responses and continues application processing

Result:

  • Curl debugging is being printed, this needs to be negated and only printed in debug mode or --debug-https mode
  • When WiFi is re-connected, the next 'bundle' is fetched , however the application appears to 'hang' .. given that Internet access has been restored, the client should understand this and continue without issue. The client still thinks that there is a connectivity problem.

Screenshot from 2024-03-31 09-34-07

Screenshot from 2024-03-31 09-31-53

Post ef4e990

Screenshot from 2024-04-01 05-29-50

Screenshot from 2024-04-01 05-31-53

@abraunegg
Copy link
Owner Author

abraunegg commented Mar 30, 2024

Test Case 2: Flood API with requests to trigger a 429 event

  • Start application (either -s or -m .. does not matter)
  • Flood API with requests to trigger event
  • Determine the application handling

Expected Outcome:

  • Only that specific thread that caused / triggered the 429 should be held / paused

Result:

  • It appears that all threads are being paused, not the thread that triggered the 429 event

Screenshot from 2024-03-31 09-42-03

Screenshot from 2024-03-31 09-47-54

Need to compare this with RC1 and how this is handled

* Correct output that should be just debug only
@abraunegg
Copy link
Owner Author

Test Case 3: Disable WiFi whilst 'Downloading file: ' is occurring

  • Start application (either -s or -m .. does not matter)
  • When 'Downloading file: ' message occurs, disable WiFi
  • Wait for API timeout messages to occur & retry events
  • Reconnect WiFi

Expected Outcome

  • Each thread should detail that access has been interrupted
  • On re-connect, each thread should download the file and application should continue

Result:

  • System acts as expected

@abraunegg
Copy link
Owner Author

abraunegg commented Mar 30, 2024

Test Case 4: Use ^C to cancel application operations whilst 'Waiting for any existing threads to complete'

  • Start application (either -s or -m .. does not matter)
  • When 'Downloading file: ' message occurs, disable WiFi
  • Wait for API timeout messages to occur & retry events
  • Reconnect WiFi
  • Use ^C to cancel multiple times

Expected Outcome:

  • Application exits in the correct manner

Result:

  • Application segfaults

Screenshot from 2024-03-31 10-57-19

Fixed

* Change 'defaultDataTimeout' from 240 to 60 seconds
@abraunegg
Copy link
Owner Author

abraunegg commented Apr 3, 2024

Test Case 5: Deliberately edit 'refresh_token' to make it malformed

  • Edit 'refresh_token' and make it malformed by adding just a single new character into the string anywhere

Expected Outcome:

  • Application should handle malformed item and advise to perform a --reauth

Result:

PR2608
Screenshot from 2024-04-03 18-35-32

PR2661
Screenshot from 2024-04-03 18-41-32

Application crash due to malformed 'refresh_token' fixed via 9fa7494

* Update PR based on testing various scenarios
@abraunegg
Copy link
Owner Author

abraunegg commented Apr 5, 2024

Test Case 6: Run for extended period, deleting | creating new local data randomly in --monitor mode

  • Run in --monitor mode
  • Run script to generate random data at random intervals

Expected Outcome:

  • Memory should remain fairly stable for the process

Result:

  • Memory is leaking somewhere, fairly badly ... will need to run another 'valgrind' process against this PR to determine where this is leaking from in comparison to RC1
  • Memory usage with RC1 is much better (lower, no noticeable|significant memory leakage due to prior vlagrind work)

Fixed

Fixed via:

@abraunegg
Copy link
Owner Author

abraunegg commented Apr 5, 2024

Test Case 7: Run for extended period, deleting | creating new local data randomly in --monitor mode

  • Run in --monitor mode
  • Run script to generate random data at random intervals

Expected Outcome:

  • Process should not hang / crash and continue operating during the extended run

Result:

  • Client stopped processing inotify events / appeared hung after a few hours of operation. Not related to memory leak, plenty of memory (physical and swap left on system). A single CTRL^C and the client exited to the terminal, meaning that something internally crashed, without an error message or anything.
  • Need to validate and verify with RC1 as to what occurs there as this was not seen in RC1 extended (>24hrs) constant operational running

General Flags: Flags that are always needed, such as warnings (-w) and directory inclusion (-J.), are added outside of the conditionals.
Debug Flags:
For DMD: When debugging is enabled (DEBUG=yes), the -debug flag for including debug code and -gs for generating standalone debug symbols are added.
For LDC or other compilers: -d-debug for debugging and -gc for generating debugging information are added similarly when debugging is enabled.
Optimization Flag:
The -O flag is only added when debugging is not enabled. This ensures that the program is compiled with optimizations only when it is not in debug mode.
* Merge latest RC1 into this PR
* Refine code changes from #2608 based on memory allocation and usage
* Update code to align to v2.4.25 exception.httpStatusCode handling
* Correct ^C handling
* Correct ^C exit value
Add specific example around using 'sync_list' and only including a child of a parent in the 'sync_dir'
* fix up spacing
* Rework entire #2608 due to errors, memory leaks and handling
* Fix 429 'retry-after' header searching to use actual value, not 120 all the time
* General cleanup of code from #2608
* Make sure we return any used memory back to the OS on exit
* Remove thread_joinAll()
* Remove temp logging output
* Update PR
* Fix resetSyncFailures()
* Fix functions to display memory usage pre|post garbage collection
* Add function to query Resident Set Size (RSS)
* OneDrive API Instance Cleanup - Shutdown API, free curl object and memory
* OneDrive API Instance Cleanup - Shutdown API, free curl object and memory
* Update when an API instance is released and when it is destroyed
* Add logging to determine curl pool usage
@abraunegg
Copy link
Owner Author

@bpozdena

If possible - please can you test this PR in your environment in the next few days ?

This PR will become the basis for v2.5.0-rc2

Specifically, looking for you to test and evaluate:

  • socket usage / reuse
  • curl engine usage / reuse
  • anything else socket / curl / dns related which was causing you exhaustion in your environment
  • looking for also usage | performance difference between onedrive v2.5.0-rc1-36-g0f012b9 and this PR that you might determine or uncover. For me, with all my soak testing (>24 hrs) running + usage I am not seeing a performance detriment between the two, but would like your feedback.

Notes

  • When using --monitor at the end of each 'loop' currently all curl engines are cleared to ensure a clean state for the next loop.
  • I would also add display_memory = "true" to your config file so that you can see what memory is being used
  • There is currently extra logging output going on regarding pool size etc

* Update so that the available processPool worker threads equals the number of threads set in the application configuration
* Reinstate variable cleanup post testing
* Update ^C Handling during upload|download operations
abraunegg added a commit that referenced this pull request Apr 28, 2024
* Release files for 2.5.0-rc2
* Code changes from 2.5.0-rc1 --> 2.5.0-rc2 (#2686)
* Update docs regarding Ubuntu 24.04
@abraunegg
Copy link
Owner Author

Closed due to #2709

@abraunegg abraunegg closed this Apr 28, 2024
@abraunegg
Copy link
Owner Author

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Repository owner locked and limited conversation to collaborators May 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant