Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure this app is thread-safe #2

Closed
bernardopires opened this issue Jul 7, 2012 · 19 comments
Closed

Make sure this app is thread-safe #2

bernardopires opened this issue Jul 7, 2012 · 19 comments

Comments

@bernardopires
Copy link
Owner

This is being used right now in production on a small project and I have made an attempt to make it thread-safe, but I'm a complete beginner at this subject. Any help on this would be HIGHLY appreciated. Can someone please check if the custom postgresql_backend is thread-safe? If there is a way to write a test for this, it would be awesome.

@ghost ghost assigned bernardopires Jul 7, 2012
@kissgyorgy
Copy link
Contributor

Ok, so I don't know threads very deeply yet, but I asked this on #django-dev IRC channel:

19:00 < walkman> hi guys!
19:00 < walkman> I'm learning about threading in python and am interested if DatabaseWrappers are thread locals in Django?
19:01 < walkman> so if I set something in the DatabaseWrapper init, it will different for every new request?
19:03 < walkman> I want to set extra attributes on my custom backend (for postgresql schemas) and want to make sure if I set 'schema' in the DatabaseWrapper init, the same schema wont appear for somebody else :)

and got this response:

20:09 < akaariai> walkman: when you get a connection from django.db.connections each thread will get different connection. Technically you can then pass that connection to different thread, but nothing in Django does that (and 3rd party apps shouldn't ever do that either).
20:09 < akaariai> so, assigning schema to DatabaseWrapper should be safe.

(akaariai is one of the core developers) So I guess it will be safe after I send the patch for #93 !

@kissgyorgy
Copy link
Contributor

I have been thinking about his a bit:

  • Modifying request (attaching attributes to it) is obviously thread-safe, it's trivial 😄
  • Attaching attributes to the custom DatabaseWrapper is thread-safe, see comment above from core developer.
  • We don't get and pass around connections to another thread.
  • We don't use any global variable to store or modify any connection- or tenant data. Every tenant, schema data is stored on the DatabaseWrapper, which is fine.
  • sync_schemas and migrate_schemas operates and stores data on their own objects, and tinkering with connection in a thread safe manner, but I don't know the consequences of changing INSTALLED_APPS at runtime. This might be a problem!
    Imagine if two users want to create tenants at the same exact time! if settings are global within a Django process, they will rewrite each others settings!!! Do you know something about this @bernardopires ?

@bernardopires
Copy link
Owner Author

Hi Walkman,

yes, it is most likely problematic changing INSTALLED_APPS at runtime! The scenario where two users create a tenant at the same time may be somewhat unlikely, but there's a very likely scenario: a user is using your app the same time a tenant is being created. The behavior for the user using your app would be unpredictable (only TENANT_APPS will be available and he might have been using the public part of your app, which needs SHARED_APPS). This is why we avoided directly changing INSTALLED_APPS. Instead, what we do is set to an attribute (model._meta.managed = False) on the models to trick Django into thinking that this model doesn't have to be synced. This shouldn't have any side effects on people using the app at the same as a tenant is being created. I am however not entirely sure what happens if two persons try to create a tenant at the same time. Do you know if model._meta.managed is global? Doesn't look like it though...

Cheers

@kissgyorgy
Copy link
Contributor

Unfortunately, South doesn't use this (It just modifies INSTALLED_APPS), so if migrations are involved, this is very much a problem!

@bernardopires
Copy link
Owner Author

Well, I'm not sure if we can do anything about it...maybe some sort of global lock when a tenant is being created and south is used? This lock would only block new tenants from being created, users will still be allowed to navigate freely

@kissgyorgy
Copy link
Contributor

Yeah, I thought about the same! But to make it totally safe, it should block every new request to the Django process until that migration is ready... because a migration is modifying the global state of the whole Django process! Maybe redirect the request to a static template which tells something is going on, try again, but it would not be very nice.
If this is an unacceptable problem for somebody, a new tenant signup should only create the tenant model and do the DB syncing process manually.
Please reopen so others can see either.

@bernardopires bernardopires reopened this Nov 15, 2013
@kissgyorgy
Copy link
Contributor

I have an idea: attach a patched South version next to this app so it would work without problems. It should be easy because the app modification method already written in sync_schemas so it could be applied to South very easily. Keeping it up to date is not a problem at all, because that part of South probably is not changing very much, so it would be a small change with the necessary benefits! And in Django 1.7 south will be totally integrated, and South2 (the backport for Django 1.6) will probably solve this in a better way.

@kissgyorgy
Copy link
Contributor

I discussed this on the #django IRC channel and the only good solution for this is to put the syncing process into a worker!

19:49 < walkman> anybody knows answer to this? What are the consequences of changing settings.INSTALLED_APPS at runtime? is settings global in Django? so if one request change it, the change will be seen in other request?
19:49 <+apollo13> walkman: completelly unsupported
19:49 <+apollo13> walkman: as in don't do; if you have to ask here
19:49 < walkman> I know, but why South does it? :D
19:49 <+apollo13> south does?
19:49 < walkman> yes
19:50 < walkman> changing INSTALLED_APPS setting before syncdb
19:50 < walkman> and after does the migrations
19:50 < tbaxter> walkman: I've never messed with installed_apps, but I have modified other settings on the fly, and it was not a good idea.
19:50 <+apollo13> walkman: well how does that relate to other requests than :)
19:51 < walkman> I want to make runtime migrations
19:51 <+apollo13> no you don't
19:51 < walkman> yes, I want :) because we have a postgresql schema-aware application
19:51 < tbaxter> walkman: maybe tell us the problem you're trying to solve. Maybe there's a better way.
19:51 < walkman> and we can create new schemas on the fly
19:51 < walkman> and apply migrations on the fly
19:52 < walkman> when creating a new schema
19:52 <+apollo13> walkman: oO
19:52 <+apollo13> walkman: I don't even wanna know why
19:52 < walkman> multi-tenant application with postgresql schemas
19:52 <+apollo13> still doesn't explain it
19:52 < walkman> so every new client will have separate everything
19:53 < walkman> separate auth, separate users, etc
19:53 < walkman> so it has to call syncdb and migrations
19:53 < tbaxter> so what you're really after is mult-tenancy?
19:53 <+apollo13> walkman: so? use a worker which creates that stuff
19:53 <+apollo13> but not inside the web process/request
19:53 < walkman> does that make any difference?
19:53 <+apollo13> syncdb + migrations can easily take more than a second; and you just lost 50% of your customers
19:54 <+apollo13> walkman: well south just won't work sensibly inside a request
19:54 <+apollo13> it was never ment to work there
19:54 <+apollo13> and it will break

@kissgyorgy
Copy link
Contributor

I suggest removing the database syncing part (create_schema from TenantMixin) and document this behavior instead! As django pros says it can pretty fuck up things, and even in Django documentation clearly states "never change settings on the fly"

@bernardopires
Copy link
Owner Author

Hey Walkman,

thanks for doing all this research about the topic! Syncing the tenant on a separate worker seems like a very good idea.

I think we don't have to remove anything, but rather ensure that the relevant parts (editing the settings and syncing the tenant) happen in a new process. It'd be nice if we can somehow start and wait until this new process is done, so that behavior stays pretty much the same. That is, tenant creation still happens synchronously -- we could add an option so that it happens asynchronously.

It'd be nice if we could also guarantee that this new change doesn't break anyone's code -- I myself have used django-tenant-schemas on shared hosting and it worked perfectly. I'm not sure if this would work on such an environment.

Thanks again for everything, you have been a big help! 👍

@kissgyorgy
Copy link
Contributor

I looked into this even more and I almost sure even changing Model._meta.managed is not safe, because Django uses the Borg pattern for the appcache, which means every instance of the models will have the same state in the same process!

@bernardopires
Copy link
Owner Author

Nice catch! I guess we won't have to worry about that anymore when we start syncing tenants on a separate process. I apologize that I haven't been able to review your and the other patches, I'm a little short on time until the end of the year. I promise I'll catch up as soon as possible. Thanks for all the help!

@owais
Copy link
Contributor

owais commented May 7, 2014

@bernardopires @walkman Any progress on this? I'm available to help out if needed.

Also, I was thinking about another approach to setting schemas on connection using a connection pool. We could ship a connection pool that would map and pool connections to domain url or customer ID. So basically, all tenant sites will have their own (or set of) dedicated connections to talk to the DB and we won't be doing set/unset tenant dance before/after every request. This way there will NEVER be any risk of accidentally changing the schema on the connection. I don't know how much of an advantage this bring to the table but I guess it was worth a shout. S

@bernardopires
Copy link
Owner Author

Hey owais,

I'm not familiar with how connection pools work, but wouldn't this mean there'd need to be at least one open connection per tenant? How would this work with servers that have multiple processes and multiple threads?

I think the biggest issue right now is that tenants are being created (and therefore INSTALLED_APPS is being altered) on normal web processes / requests, which can result in unpredictable behavior if another request is made simultaneously while INSTALLED_APPS is altered. I haven't had the time to fix this, but the solution, as @walkman has posted here, is to process the tenant creation in a separate worker. It could even be done synchronously and this issue would be solved. This shouldn't be too complicated and I encourage you to take a stab at this if you feel like so! If you have any doubts please let me know. Thanks for your help!

@klynch
Copy link
Contributor

klynch commented Jun 4, 2014

Aside from the syncdb issues, are there any known issues related to using this with gunicorn + gevent? Has anyone attempted this yet?

I am trying to get a more precise understanding of how gevent operates, but from the surface it appears that the DatabaseWrapper is still unique to each thread, so there shouldn't be an inherent threading issue. Does anyone have any more insight regarding this?

@bernardopires
Copy link
Owner Author

I used it in production for a long time with apache and had no problems at all. There were some weird bugs like search_path persisting after the request is done, but that has been fixed (it gets sets to public at the beginning of the request). This issue is about calling syncdb (for example when a tenant is being created) live on production which could result on undefined behaviour if someone else is using the website simultaneously.

@cliffremote
Copy link

I'm trying to follow this thread and from what I read it appears that two Clients can not be created at the same time. If that is correct I don't know how this solution could be used on even a moderately busy site. Is the reason that a new Client creation alters the database to add a schema and that is not a safe operation?

@bernardopires
Copy link
Owner Author

Is a Client an user or a tenant? If its just an user than there is no issue at all, it's completely safe. As mentioned above, the current problem is that when creating a new tenant, INSTALLED_APPS is being altered inside a regular web request process, meaning that if any other thread inside the process is used for serving another user, the behavior is undefined. If you set your server to use multiprocessors but single threaded on each one of them, this should be a non-issue. A better solution would be to run the creation of a tenant on a separate process. I might work on this soon.

@cliffremote
Copy link

Thanks again! Yes, that would be ideal to create tenants on a separate process.

bernardopires added a commit that referenced this issue Nov 3, 2014
Use old style indexing in format strings #2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants