
Channels #6419

Closed
wants to merge 32 commits into base: master

Conversation

6 participants
Member

andrewgodwin commented Apr 6, 2016

Adds Channels into Django as django.channels, including tests and documentation ported from the standalone channels repository.

@andrewgodwin andrewgodwin force-pushed the andrewgodwin:channels branch from 02914a0 to d3fdf29 Apr 6, 2016

while True:
    try:
        with transaction.atomic():
            message = self.channel_model.objects.filter(channel__in=channels).order_by("id").first()

akaariai Apr 6, 2016

Member

Maybe a select_for_update() should be added here? As is, multiple consumers are likely to fetch the same message. I'm not sure what guarantees that the same message isn't fetched multiple times.

We could make the db channel implementation better (skip locked in PostgreSQL for example), but it is probably enough to just make it slow but correct.

andrewgodwin Apr 6, 2016

Member

My impression was that transaction.atomic() would be enough to provide the isolation here (without it, I was definitely seeing multiple-fetch, and with it, that's never happened).

akaariai Apr 6, 2016

Member

Nope, that isn't enough. There is nothing blocking selecting the same message multiple times. On PostgreSQL you can test with begin; select id from test_table order by id limit 1; delete from test_table where id = <fetched_id>; in multiple psql sessions. When the first session commits, the second session unblocks and deletes nothing.

akaariai Apr 6, 2016

Member

Hmmh, if you are using REPEATABLE READ or SERIALIZABLE isolation, then the second session is likely going to fail instead of returning nothing. I believe select_for_update is still the right thing to do.

andrewgodwin Apr 6, 2016

Member

Agreed; added it.

akaariai Apr 6, 2016

Member

I tested this, and on PostgreSQL if using REPEATABLE READ with select_for_update() you'll get a serialization error on the select instead of the delete (which is slightly better). If using READ COMMITTED, then you'll get the next row in the second session, which is actually the wanted behavior.

It is possible (though unlikely) that even MySQL behaves sanely with select_for_update().

with transaction.atomic():
    message = self.channel_model.objects.filter(channel__in=channels).order_by("id").first()
    if message:
        self.channel_model.objects.filter(pk=message.pk).delete()

akaariai Apr 6, 2016

Member

Here we should assert that exactly one message was deleted.
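The "exactly one deleted" check is cheap, because the delete reports how many rows it removed. A stdlib sketch of the consume step (sqlite3 standing in for the ORM; in the real code the count would come from `QuerySet.delete()`, and the names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, channel TEXT, content TEXT)")
conn.execute("INSERT INTO message (channel, content) VALUES ('http.request', 'hello')")

def receive_one(conn, channels):
    # Fetch the oldest message on any of the listened channels...
    placeholders = ",".join("?" * len(channels))
    row = conn.execute(
        "SELECT id, content FROM message WHERE channel IN (%s) "
        "ORDER BY id LIMIT 1" % placeholders,
        channels,
    ).fetchone()
    if row is None:
        return None
    # ...and insist that *we* were the one to consume it.
    deleted = conn.execute("DELETE FROM message WHERE id = ?", (row[0],)).rowcount
    assert deleted == 1, "message was consumed by another reader"
    return row[1]

print(receive_one(conn, ["http.request"]))  # → hello
```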

models.py as we don't want to make it for most installs.
"""
# Make the model class
class Message(models.Model):

akaariai Apr 6, 2016

Member

Could we have a separate app for this model? If you want to use the DB channel, include that app in your settings.

Alternatively we could define this in models.py with:

if db_channel_in_use(settings):
    class Message(models.Model):
        ...

andrewgodwin Apr 6, 2016

Member

Honestly, maybe - initially I wanted to make this as easy to use as possible, but at this point the database backend is almost shipped as a fallback (I was even tempted to remove it entirely as it's so slow, but I figure we still have the sqlite3 backend for databases for ease-of-use reasons)

Not sure how we could dynamically surface it in models.py based on settings, as you can technically instantiate the layers without having them done in a setting (e.g. during tests).

Member

akaariai commented Apr 6, 2016

I quickly checked the db layer, and I think it is good enough for development after a couple of changes.

I've been wondering if it would be possible to have a bypass of the channel layer for huge responses. Using the channels when the reader of the message is fixed seems like wasted resources. Some ideas are:

  • Directly connect to Daphne and stream the response over HTTP.
  • Send back a message stating "you can read the response from this URL".

This is premature optimization without benchmark results... Any ETA for those?
Member

andrewgodwin commented Apr 6, 2016

How would you bypass, exactly? As the Django worker, you have no idea where the protocol server is on the network (and if your opsec is done right, you can't talk to it unless you explicitly open up the ports), and the two programs don't share a filesystem or local shared memory necessarily.

In my limited testing, the in-memory channel actually outperformed WSGI (mostly due to larger chunking and because it interleaved the processes better). I haven't done any benchmarks of the database layer, as it's not intended for any serious use, and the redis layer I've only done latency benchmarks of (where it tops out at around 150 req/sec with just one Daphne and one Django process).

if django.VERSION[1] > 9:
    django.setup(set_prefix=False)
else:
    django.setup()

charettes Apr 6, 2016

Member

You can remove this version check.

class Meta:
    apps = Apps()
    app_label = "channels"
    db_table = "django_channel_groups"

charettes Apr 6, 2016

Member

'django_channels_group'?

andrewgodwin Apr 6, 2016

Member

Renamed, and the channel model too for consistency

Member

akaariai commented Apr 6, 2016

Yes, if the protocol server can't talk to the worker server directly, then there can be no bypass.

The interesting case is how will redis perform with large responses.

Member

andrewgodwin commented Apr 6, 2016

Yes, this approach will never be as good as WSGI for large responses, but then neither of them is as good as raw C code either. My initial tests showed throughput to be pretty good, but I haven't yet tried shoving a few gigabytes down several responses at once and seeing what happens (I suspect Redis might run out of memory and just dump messages, depending on how it's configured).

Member

akaariai commented Apr 6, 2016

It would be really nice to have some docs about what to do when you need large responses.

It might be that a message containing "read response from this URL" approach might actually work. For instance, you could stream large static files that way if the protocol server has access to the static files. Or, you could have a proxy server in between the worker and the protocol server dedicated to streaming data between the servers.

Member

andrewgodwin commented Apr 6, 2016

It's certainly possible you could have nginx do the thing where you send a header in the response and it uses that as a path to the file to actually stream.

I don't see a lot changing around our general advice though - don't stream huge files from Python, do it from a proxy server in front of gunicorn/uwsgi/daphne/etc. Small static files from things like Whitenoise are fine, we just don't want (and never have wanted) people reading through several hundred megabytes of video when other software/CDNs are much better at that.
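The nginx mechanism alluded to is X-Accel-Redirect: the application returns a response whose header names an internal nginx location, and nginx streams the file itself so Python never touches the bytes. A framework-agnostic sketch (the header names are real; the helper function, paths, and nginx config are hypothetical):

```python
def x_accel_headers(internal_path, content_type):
    """Build response headers asking nginx to serve the file from an
    `internal` location instead of streaming it through Python.

    Corresponding nginx config (sketch):
        location /protected/ {
            internal;
            alias /srv/media/;
        }
    """
    return {
        "X-Accel-Redirect": internal_path,
        "Content-Type": content_type,
    }

headers = x_accel_headers("/protected/videos/big.mp4", "video/mp4")
print(headers["X-Accel-Redirect"])  # → /protected/videos/big.mp4
```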

"""
# If the routing was a string, import it
if isinstance(routing, six.string_types):
    module_name, variable_name = routing.rsplit(".", 1)

return None, None

def new_channel(self, pattern):
    assert isinstance(pattern, six.text_type)

berkerpeksag Apr 7, 2016

Contributor

If these asserts are important in runtime, I guess replacing them with TypeErrors and ValueErrors would be better.

andrewgodwin Apr 7, 2016

Member

They're not important, they're more just last-chance checks. I still code with the view that asserts are something optional people can turn off with the -O flag, even though nobody actually uses that...
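The -O behaviour is easy to verify: CPython compiles away `assert` statements entirely under `python -O`, so these last-chance checks really are optional. A small demonstration:

```python
import subprocess
import sys

# Without -O the assert fires (non-zero exit); with -O it is stripped
# and the script continues to the print.
snippet = "assert False, 'stripped under -O'; print('ok')"
plain = subprocess.run([sys.executable, "-c", snippet],
                       capture_output=True, text=True)
optimized = subprocess.run([sys.executable, "-O", "-c", snippet],
                           capture_output=True, text=True)
print(plain.returncode, optimized.stdout.strip())  # → 1 ok
```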


def send(self, content):
    if not isinstance(content, dict):
        raise ValueError("You can only send dicts as content on channels.")

berkerpeksag Apr 7, 2016

Contributor

TypeError would be better since line 69 checks the type of content.
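For reference, the convention being suggested: a wrong *type* is a TypeError, while ValueError is reserved for a value of the right type that is otherwise unacceptable. A sketch of the check rewritten accordingly (the function name and message are taken from the diff above; this is illustrative, not the merged code):

```python
def send(content):
    # isinstance check failing means the *type* is wrong, so raise
    # TypeError rather than ValueError.
    if not isinstance(content, dict):
        raise TypeError("You can only send dicts as content on channels.")
    return content

try:
    send(["not", "a", "dict"])
except TypeError as exc:
    print(exc)  # → You can only send dicts as content on channels.
```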

@andrewgodwin andrewgodwin force-pushed the andrewgodwin:channels branch from b3f2e54 to f76e0fd Apr 27, 2016

@@ -47,6 +47,10 @@
    entry_points={'console_scripts': [
        'django-admin = django.core.management:execute_from_command_line',
    ]},
    install_requires=[
        'asgiref>=0.10',
        'daphne>=0.11',

mlavin May 4, 2016

Contributor

Both of these packages have little to no test coverage. It concerns me that they are going to be core dependencies to Django.

andrewgodwin May 4, 2016

Member

asgiref has extensive test coverage on the parts we are using (in fact, one of its main jobs is to provide the test conformance suite). Daphne is less so - see my post to django-developers.

mlavin May 4, 2016

Contributor

Yes this discussion is better held on django-developers.

Deploying
=========

Deploying applications using Channels requires a few more steps than a normal

timgraham May 4, 2016

Member

Do you want to treat "Channels" as a proper noun or not? The capitalization seems inconsistent throughout the docs but maybe I missed some distinction. I think I'd say no given other precedent like "model" and "expression". Happy to make these changes while reviewing the docs but wanted to ask first.

andrewgodwin May 4, 2016

Member

I think we should probably lower-case it; I'll modify the auto-patcher that translates this all over from Channels source code to make the change.

Getting Started with Channels
=============================

Now, let's get to writing some consumers. If you've not read it already,

timgraham May 4, 2016

Member

I think this transition/intro could be better. Maybe we could call the doc something like: "Advanced tutorial: Intro to channels" and then give a better overview about the goals of this tutorial.

@@ -0,0 +1,7 @@

timgraham May 4, 2016

Member

This file seems unused.

@@ -10,3 +10,5 @@ pytz > dev
selenium
sqlparse
tblib
asgiref
daphne

timgraham May 4, 2016

Member

If these are going in setup.cfg, then I guess I should modify the Jenkins build script to pip install -e path/to/checkout so we don't need to duplicate them here. Agreed?

andrewgodwin May 4, 2016

Member

If you mean setup.py, then yes, let's not duplicate them. I had to get this in there to get the PR test runner working.


Default: Value of :setting:`SESSION_ENGINE`

Allows you to specify an alternative session backend to use for per-channel

mlavin May 4, 2016

Contributor

This should note that django.contrib.sessions.backends.signed_cookies cannot be used. Relatedly, if SESSION_ENGINE is set to django.contrib.sessions.backends.signed_cookies, then this setting must be set to a different engine. Not sure if this is the right place for that note.

andrewgodwin added some commits May 4, 2016

@@ -69,7 +82,9 @@ def handle(self, *args, **options):
if not settings.DEBUG and not settings.ALLOWED_HOSTS:
    raise CommandError('You must set settings.ALLOWED_HOSTS if DEBUG is False.')

self.use_ipv6 = options['use_ipv6']
self.verbosity = options.get("verbosity", 1)

timgraham May 10, 2016

Member

Please cleanup options usage as done in 1845bc1 throughout the new management commands and in the edits.


class ChannelLayerManager(object):
"""
Takes a settings dictionary of backends and initialises them on request.

timgraham May 10, 2016

Member

For new code, we're using PEP257 verb style "Take....",
initialises -> initializes

"""
Transfers user from HTTP session to channel session.
"""
if auth.BACKEND_SESSION_KEY in from_session and \

timgraham May 10, 2016

Member

we prefer using parentheses to avoid backslashes
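The style being asked for, as a sketch (the session key values are the real django.contrib.auth constants, but hard-coded here so the snippet stands alone):

```python
BACKEND_SESSION_KEY = "_auth_user_backend"  # django.contrib.auth constants
SESSION_KEY = "_auth_user_id"
from_session = {BACKEND_SESSION_KEY: "ModelBackend", SESSION_KEY: "1"}

# Preferred: implicit line continuation inside parentheses,
# rather than a trailing backslash.
transferred = False
if (BACKEND_SESSION_KEY in from_session and
        SESSION_KEY in from_session):
    transferred = True
print(transferred)  # → True
```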

from django.channels import DEFAULT_CHANNEL_LAYER, channel_layers
from django.utils import six


timgraham May 10, 2016

Member

You might want @python_2_unicode_compatible on some of these classes that have __str__().
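For context, a rough stdlib-only sketch of what @python_2_unicode_compatible does (the real decorator lives in django.utils.encoding and six; this simplified version omits their error handling):

```python
import sys

def python_2_unicode_compatible(cls):
    # On Python 2, move __str__ to __unicode__ and make __str__ return
    # UTF-8 bytes; on Python 3, the class is left untouched.
    if sys.version_info[0] == 2:
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
    return cls

@python_2_unicode_compatible
class Route(object):
    def __str__(self):
        return "route <http.request>"

print(str(Route()))  # → route <http.request>
```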

return routing

@classmethod
def normalise_re_arg(cls, value):

timgraham May 10, 2016

Member

ise -> ize ;-)

Member

andrewgodwin commented May 10, 2016

As per my post to django-developers (https://groups.google.com/forum/#!topic/django-developers/QRd4v7OErT8), this patch is withdrawn for now.
