Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post to Groups API can crash IIS server #5259

Closed
1 task
SteveSwaringen opened this issue Jan 4, 2023 · 7 comments
Closed
1 task

Post to Groups API can crash IIS server #5259

SteveSwaringen opened this issue Jan 4, 2023 · 7 comments
Labels
Fixed in v16.3 Type: Discussion General discussion more than bug reports or feature requests. Type: Enhancement Feature requests.

Comments

@SteveSwaringen
Copy link
Contributor

SteveSwaringen commented Jan 4, 2023

Prerequisites

A Picture Is worth a Thousand Words

Description

Under certain circumstances (related to the complexity of the web of group type associations involved in the request), using the POST /api/Groups endpoint can crash the IIS server.

Steps to Reproduce

Running stock Rock 1.13.7 (hotfix-1.13.7)
Start with a virgin database (create empty database, let Rock initialize it when it spins up)
Log in as admin/admin
Let all the dust settle
Admin Tools -> Power Tools -> API Docs
Select the Groups API, POST /api/Groups endpoint (second one down from the top)
Create a group of type 31 ("Fundraising Opportunity") under parent group 41 (General Groups). TIME THIS EXECUTION from when you click "Try it out!" until the response comes back.
{"IsSystem":false,"ParentGroupId":41,"GroupTypeId":31,"Name":"Test 1","Description":"Initial Configuration"}
In my test environment (Rock running on a conventional laptop under Visual Studio, database hosted on a remote SQL server) this took about 2 seconds.

Note that the type of group we created here is not supposed to be valid as a child of a General type group (as the General Groups parent is). Ideally, the API would reject this request (the API is, after all, considered "inside the castle walls"). (I don't have a problem with this behavior. The API should only be used by developers familiar with the how Rock works. Pointing this out because it is important to the next step of the test scenario.)

Make Fundraising Opportunity a valid child group type of Weekly Service Check-in group type. (This group type is chosen for the test because it has a lot of allowed child group types. From testing in a sandbox derived from our production system (with a much more complex web of associated group types), the problem we're studying here compounds exponentially with the size of the web of associated groups that the type of group we're creating is tied into. The Weekly Service Check-in type has the largest (though still relatively small) web of types in the default structure.)
Clear Cache. It is not obvious why this is necessary or exactly how it interacts with the scenario, but if you don't clear cache the configuration won't yet be broken. Later, when you remove this group type association to repair things, if you don't clear cache, the configuration won't yet be fixed. Feel free to test this yourself. I expect that this is an important symptom to understand in isolating the root cause.
Create a group of type 31 ("Fundraising Opportunity") under parent group 41 (General Groups) (same as before). TIME THIS EXECUTION from when you click "Try it out!" until the response comes back.
{"IsSystem":false,"ParentGroupId":41,"GroupTypeId":31,"Name":"Test 3","Description":"Fundraising Opportunity now a valid child of Weekly Service Check-in groups"}
In my test environment (Rock running on a conventional laptop under Visual Studio, database hosted on a remote SQL server) this took 32 seconds, 16x longer than the original execution.

In our production environment, when we added Fundraising Opportunity as a valid child type of Event Registration (which has a large number of valid child types, as well as several valid parents (both seem to be important to the issue)), it took longer than the three minute timeout our third-party vendor (Focus Missions) was allowing for the API to complete. IIS Manager showed that the pool thread in fact never terminated. Each time they retried the request (every three minutes), another pool thread was consumed and never terminated. Within an hour, our IIS server cratered--becoming largely non-response well before that.

For completeness, reverse the process.
Remove Fundraising Opportunity as a valid child group type of the Weekly Service Check-in group type.
Clear cache.
Create a group of type 31 ("Fundraising Opportunity") under parent group 41 (General Groups) (same as before). TIME THIS EXECUTION from when you click "Try it out!" until the response comes back.
{"IsSystem":false,"ParentGroupId":41,"GroupTypeId":31,"Name":"Test 4","Description":"Fundraising Opportunity no longer a valid child of Weekly Service Check-in groups"}
In my test environment, it again takes only about 2 seconds to complete successfully.

After many hours stepping through executing code and logging SQL events, I suspect this is not a bug in the traditional sense. It feels more like a gross inefficiency resulting from a reliance on Entity Framework for complex data structures. Nevertheless, when we can do something that is considered normal and appropriate within the Rock schema (allowing a group of one type to be a child of a group of another type), if that can crash our server that's an inefficiency that needs to be addressed.

When the Rock API gets the request to create the group, it starts by exploring the requested group type. At this point, code execution is lost in untraceable .Net Entity Framework code. Tracing through the SQL queries shows that Rock is exploring the web of group type associations, but it does not appear to get lost in any kind of recursion in this process. One interesting symptom that I observed is that after a point (not clear on what triggers this), Rock starts asking first for the details on group type 0 (there is no group type 0) and then asking for a count of groups of type 0 (there are no groups of type 0). It will often repeat this pair of requests in sequences of four or eight pairs, interspersed with queries continuing to explore the web of associated group types until it finally finishes trolling the web. Then it will continue with the pair of group type 0 queries ad infinitim. Until it takes the server down.

I did find some code that looks like it would construct an entity framework query (the SQL is definitely being constructed by EF) to count groups of a particular type but a breakpoint set on this method never triggers. (To be clear, I don't regard these queries necessarily as a bug--I did observe they also happen in the case where the command succeeds--but their proliferation in the failure case may be a useful symptom.)

Expected behavior:

Group should be promptly created.

Actual behavior:

Takes a long time to create group, and in cases of complex group type structures may time out and even crash IIS server.

Versions

  • Rock Version: [1.13.7]
  • Client Culture Setting: [en-US]
@ethan-sparkdevnetwork
Copy link
Contributor

Hello @SteveSwaringen I am having a problem reproducing this issue in 13.7, 14.1, and the future 15.0 release. Given your instructions the first group insert takes 222 ms, subsequent ones averaged 30ms. I used System.Diagnostics.Stopwatch to get precise timing.

ApiController POST for Type Rock.Model.Group took 222ms
ApiController POST for Type Rock.Model.Group took 28ms
ApiController POST for Type Rock.Model.Group took 32ms
ApiController POST for Type Rock.Model.Group took 29ms
ApiController POST for Type Rock.Model.Group took 27ms
ApiController POST for Type Rock.Model.Group took 29ms
ApiController POST for Type Rock.Model.Group took 33ms

Can you check if "Enable Indexing" for that group type is checked? If it is enabled can you disable it and run the check again? Also do you have any plugins installed?

@SteveSwaringen
Copy link
Contributor Author

SteveSwaringen commented Jan 11, 2023 via email

@ethan-sparkdevnetwork
Copy link
Contributor

ethan-sparkdevnetwork commented Jan 31, 2023

Hello @SteveSwaringen. Sorry this fell off my radar. The Enable Indexing I referred to had to do with search indexing. This is configured on the Group Type. In the GroupTypeDetail block under the General Category there is a checkbox "Enable Indexing". In v15.0 I changed this to say "Enable Universal Search Indexing" to clarify its function.

This issue sounds like a search indexing problem we resolved in the upcoming v15.0 release. The issue we initially noticed was very high resource usage and lingering threads when running the Data Automation job. We found the logic that starts search indexing for an entity was not efficient and could reindex the same entity many times. I refactored that logic in commit cb06cdd. This affected all models that implement IRockIndexable, but I only noticed problems with Group and GroupMember.

This behavior seems to line up with what you described in this issue if the GroupTypes you are using have that option enabled.

@SteveSwaringen
Copy link
Contributor Author

SteveSwaringen commented Jan 31, 2023 via email

@ethan-sparkdevnetwork
Copy link
Contributor

@SteveSwaringen No, I was suggesting the opposite, if you are running search indexing for the GroupType(s) turning it off might mitigate the problem.

I spent a lot of time yesterday trying to reproduce this and running SQL Profiler to see what could be happening. I was not able to get the problem, even with a very large number of Group Types, Groups, and a lot of Associations.

Since you can't justify spending more time on this and have a workaround, I'll table this issue for now. There are a lot of performance improvements coming in v15.0 those might help with what you are seeing with adding a Group.

@sparkdevnetwork-service sparkdevnetwork-service added Type: Enhancement Feature requests. Type: Discussion General discussion more than bug reports or feature requests. labels Apr 27, 2023
@jonedmiston
Copy link
Member

Closing as there seems to be a work around and we're not able to reproduce.

@jasonhendee
Copy link
Contributor

We finally got to the bottom of this. It has do do with some automatic model "validation" that ASP.NET does behind the scenes against the Group object that's passed to the API. This validation can ultimately lead to an infinite loop when it gets to the Group's Group Type.

Thanks for reporting, @SteveSwaringen. Please let us know if you see this issue occur in your environment after applying this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Fixed in v16.3 Type: Discussion General discussion more than bug reports or feature requests. Type: Enhancement Feature requests.
Projects
None yet
Development

No branches or pull requests

5 participants