
Fixes #36771 - introduce sqlite timeout tuning #25

Merged (1 commit) on Sep 26, 2023

Conversation

ianballou (Member)

Adds sqlite DB timeout tuning and bumps the default up to 30000 milliseconds. The Sequel library's default is 5000 milliseconds.

The goal is to stop SQLite3::BusyException errors from happening so easily.

30000 is somewhat arbitrary -- 30 seconds should be long enough to serve massive sqlite transactions, while still being short enough that genuine lockups surface as errors.
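For reference, a minimal sketch of how a busy timeout is passed to Sequel's SQLite adapter (the DB path and the way the setting is read are illustrative here, not this PR's actual diff):

```ruby
require "sequel"

# Sequel's SQLite adapter accepts :timeout in milliseconds and uses it as
# the connection's busy timeout; the adapter's own default is 5000 ms.
# The path and the setting lookup below are hypothetical.
sqlite_timeout = 30_000
DB = Sequel.sqlite("container_gateway.db", timeout: sqlite_timeout)
```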

To test:

1. In Katello, sync 100 container repositories:

   ```bash
   #!/bin/bash

   for i in {1..100}
   do
     hammer repository create --name "container-$i" --content-type docker --url https://quay.io --docker-upstream-name prometheus/busybox --download-policy on_demand --product-id 311
   done
   ```

2. Sync those repositories to a smart proxy.

3. Open up two Foreman consoles and try overloading the database by constantly uploading new container repo information to the smart proxy:

   ```ruby
   loop do
     ::SmartProxy.find(2).sync_container_gateway
   end
   ```

4. Try turning `sqlite_timeout` down to 1 to reproduce the error (see the standalone sketch after this list).

5. Try turning `sqlite_timeout` up to 30000 to see the error go away.
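For steps 4 and 5, a standalone sketch of how a tiny busy timeout surfaces the error (the file and table names are hypothetical, not the smart proxy's own code):

```ruby
require "sequel"

# Two connections to the same file with a 1 ms busy timeout: concurrent
# writers contend for SQLite's single write lock faster than they can
# wait it out.
db1 = Sequel.sqlite("busy_demo.db", timeout: 1)
db2 = Sequel.sqlite("busy_demo.db", timeout: 1)
db1.create_table?(:events) { Integer :n }

threads = [db1, db2].map do |db|
  Thread.new { 1_000.times { |n| db[:events].insert(n: n) } }
end

begin
  threads.each(&:join)
rescue Sequel::DatabaseError => e
  puts e.message  # "database is locked" -- the wrapped SQLite3::BusyException
end
```

With `timeout: 30_000` on both connections, the same loop should complete without the exception.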

@theforeman-bot

Issues: #36771

@ianballou (Member, Author)

Related installer PR: theforeman/puppet-foreman_proxy#813

@ehelms (Member) commented Sep 22, 2023

If you don't mind, remind me:

1. Why does the container gateway need a database?
2. What is the root cause of the timeouts?

Are users hitting resource constraints? Are users exceeding the performance that SQLite can supply?

@ianballou (Member, Author)

> Why does the container gateway need a database?

The container gateway caches auth information. It knows which users are logged in and the container repositories that they have access to. It also knows which container repositories have no auth at all.

> What is the root cause of the timeouts?

If the container gateway is getting frequent requests while a user has a very large number of container repositories, there can be times when SQLite operations conflict and an incoming request has to wait longer than the default 5 seconds.

One example of a conflicting request might be multiple users logging in at the same time. When a user logs in, we wipe out their allowed repo list (if any) and replace it. So SQLite might throw a busy exception when userB's list needs to be cleared and updated while userA's update is still in progress.

These operations should be very fast since it's not a lot of data, but I suppose it could be an issue if many users are logging in at once (especially for an automation scenario).

I wondered whether we needed some sort of locking mechanism, but the SQLite busy mechanism seemed to take care of that for me.
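To make the contention concrete, here is a sketch of the wipe-and-replace step described above (the table and column names are assumed for illustration; they are not the plugin's actual schema):

```ruby
require "sequel"

# Hypothetical cache DB and schema for illustration.
db = Sequel.sqlite("container_gateway.db", timeout: 30_000)
db.create_table?(:allowed_repositories) do
  Integer :user_id
  String  :repository
end

# Each login wipes and replaces the user's allowed-repo list in one
# transaction. Two users logging in at once contend for SQLite's single
# write lock; the later writer waits up to the busy timeout before
# SQLite3::BusyException is raised.
def replace_allowed_repos(db, user_id, repo_names)
  db.transaction do
    db[:allowed_repositories].where(user_id: user_id).delete
    repo_names.each do |name|
      db[:allowed_repositories].insert(user_id: user_id, repository: name)
    end
  end
end

replace_allowed_repos(db, 1, ["container-1", "container-2"])
```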

@ehelms (Member) commented Sep 25, 2023

Since we are talking about caching, was Redis ever considered originally? Does the need for a locking mechanism push the requirement for a SQL database? If so, did we consider reusing the PostgreSQL instance that is always present in our setups?

These questions are less of a "you must redo everything to solve this problem" -- this short-term fix is just fine. They are more of a "should we consider any architectural changes beyond it?"

@ianballou (Member, Author)

> Since we are talking about caching, was Redis ever considered originally? Does the need for a locking mechanism push the requirement for a SQL database? If so, did we consider reusing the PostgreSQL instance that is always present in our setups?

Redis wasn't considered as far as I can remember. I think the first thing that was considered was just writing to a file, but the need to handle concurrent requests made that seem like a poor choice.

I remember voicing an opinion to use Postgres back when we designed this, but there was pushback on having the smart proxy rely on a big complex database rather than some small file (sqlite DB) that, if deleted, could be pretty easily regenerated. Postgres is guaranteed to be there since Pulp is a requirement.

@ehelms (Member) commented Sep 26, 2023

Thanks for the background; looks good to me to merge and release.

ianballou merged commit adc6195 into Katello:main on Sep 26, 2023 (2 checks passed), and deleted the 36771-sqlite-timeout branch on September 26, 2023 at 14:48.