Fixed #21179 -- Added a small section in the "Outputting CSV with Django... #2358

Closed
wants to merge 8 commits into from

2 participants

@zedr

This is my attempt to fix #21179.

The example shows how generators can be used with the csv.writer class to stream large CSV files.

I have added a few comments on this example on the ticket's page: https://code.djangoproject.com/ticket/21179

zedr added some commits
@zedr zedr Fixed #21179 -- Added a small section in the "Outputting CSV with Dja…
…ngo" page that suggests using the StreamingHttpResponse class

The example shows how generators can be used with the csv.writer class to stream large CSV files.
9473565
@zedr zedr Merge branch 'master' of git://github.com/django/django
3d26680
docs/howto/outputting-csv.txt
((11 lines not shown))
+
+    import csv
+
+    from django.http import StreamingHttpResponse
+
+    class Echo(object):
+        """An object that implements just the write method of the file-like
+        interface.
+        """
+        def write(self, value):
+            """Write the value by returning it, instead of storing in a buffer."""
+            return value
+
+    def some_streaming_csv_view(request):
+        """A view that streams a large CSV file."""
+        rows = (["Row {0}".format(idx), str(idx)] for idx in xrange(100))
@bmispelon Collaborator

xrange is Python2 only. I believe our documentation has started to transition to Python3 by default so this should be changed. You could also use django.utils.six which provides a compatibility layer.

@bmispelon Collaborator

100 items is not very impressive. How about a billion instead?

@zedr
zedr added a note

I changed this to 65536, which is the maximum number of rows for many popular spreadsheet programs on 32 bit systems.

docs/howto/outputting-csv.txt
@@ -54,6 +54,36 @@ mention:
about escaping strings with quotes or commas in them. Just pass
``writerow()`` your raw strings, and it'll do the right thing.
+Streaming large files
+~~~~~~~~~~~~~~~~~~~~~
+If you need to work with very large files, you might want to consider using Django's
@bmispelon Collaborator

When dealing with large static files, you should actually not be using Django in the first place.

I'd reword it to something like "When working with views that can generate big responses, ..."

docs/howto/outputting-csv.txt
@@ -54,6 +54,36 @@ mention:
about escaping strings with quotes or commas in them. Just pass
``writerow()`` your raw strings, and it'll do the right thing.
+Streaming large files
+~~~~~~~~~~~~~~~~~~~~~
+If you need to work with very large files, you might want to consider using Django's
+:class:`django.http.StreamingHttpResponse` objects instead.
+
+In this example, we want to make full use of Python generators to efficiently
+handle the assembly and transmission of a large CSV files::
@bmispelon Collaborator

It should either be "a large CSV file" or "large CSV files"

@zedr zedr Updated and improved fix for #21179
I've updated the text following several suggestions by bmispelon, and also made the code Python 3 compatible.
642d9b3
@zedr

I've updated the pull request.

docs/howto/outputting-csv.txt
@@ -54,6 +54,37 @@ mention:
about escaping strings with quotes or commas in them. Just pass
``writerow()`` your raw strings, and it'll do the right thing.
+Streaming large files
+~~~~~~~~~~~~~~~~~~~~~
+When dealing with views that generate very big responses, you might want to consider using Django's
+:class:`django.http.StreamingHttpResponse` objects instead.
+
+In this example, we want to make full use of Python generators to efficiently
+handle the assembly and transmission of a large CSV file::
+
+    import csv
+
+    from django.utils.six.moves import xrange
@zedr
zedr added a note

Python 3 compat

@bmispelon Collaborator

Since Python3 is the default, using from django.utils.six.moves import range would be better.

docs/howto/outputting-csv.txt
((12 lines not shown))
+    import csv
+
+    from django.utils.six.moves import xrange
+    from django.http import StreamingHttpResponse
+
+    class Echo(object):
+        """An object that implements just the write method of the file-like
+        interface.
+        """
+        def write(self, value):
+            """Write the value by returning it, instead of storing in a buffer."""
+            return value
+
+    def some_streaming_csv_view(request):
+        """A view that streams a large CSV file."""
+        rows = (["Row {0}".format(idx), str(idx)] for idx in xrange(65536))
@zedr
zedr added a note

65536 is the maximum number of rows allowed for a sheet by most 32 bit spreadsheet applications.

@bmispelon Collaborator

A comment as to why this number was chosen would be good to have.

@zedr
zedr added a note

Good point. I'll add it. Thanks for suggesting it.

zedr added some commits
@zedr zedr Update outputting-csv.txt
Further improved the fix for #21179, by adding a comment that explains why the number 65536 (2^16, the number of distinct values a 16 bit unsigned integer can represent) was chosen for the example.
cd5c020
@zedr zedr Update outputting-csv.txt
Improved the fix for #21179, by switching from xrange() to range(), and restating the import as a Python 2 compatibility import.
88a9588
@zedr zedr Fixed #22085 - Add a feature for setting non-expiring keys as the def…
…ault.

This feature allows the default `TIMEOUT` Cache argument to be set to `None`,
so that Cache instances can set a non-expiring key as the default,
instead of using the default value of 5 minutes.

Previously, this was possible only by passing `None` as an argument to
the set() method of objects of type `BaseCache` (and subtypes).
9a00e4a
@zedr zedr Removed the import of `DEFAULT_CACHE_ALIAS` that redefines a previous…
… import

The GetCacheTests test case imports the constant `DEFAULT_CACHE_ALIAS`
inside one of its test methods, redefining a previous import. Removing
this second import does not break the test.
f4432b9
@zedr zedr Merge branch 'master' of https://github.com/zedr/django into t22117
c77f3eb
@bmispelon
Collaborator

There's some commits in there that belonged to another pull request of yours (#2365).

The easiest way to fix this would probably be to close this pull request and open a new one based off a clean branch (see https://docs.djangoproject.com/en/1.6/internals/contributing/writing-code/working-with-git/#working-on-a-ticket if you need some pointers).

@zedr

Sorry about that. I'll cherry pick the right commits and re-submit a pull request

@bmispelon
Collaborator

No worries. Now you know first-hand why we always recommend to work on a branch :)

@bmispelon bmispelon closed this
@zedr

Resubmitted as a new pull request: #2397

Commits on Feb 22, 2014
  1. @zedr

    Fixed #21179 -- Added a small section in the "Outputting CSV with Dja…

    zedr authored
    …ngo" page that suggests using the StreamingHttpResponse class
    
    The example shows how generators can be used with the csv.writer class to stream large CSV files.
  2. @zedr
  3. @zedr

    Updated and improved fix for #21179

    zedr authored
    I've updated the text following several suggestions by bmispelon, and also made the code Python 3 compatible.
Commits on Feb 23, 2014
  1. @zedr

    Update outputting-csv.txt

    zedr authored
    Further improved the fix for #21179, by adding a comment that explains why the number 65536 (2^16, the number of distinct values a 16 bit unsigned integer can represent) was chosen for the example.
  2. @zedr

    Update outputting-csv.txt

    zedr authored
    Improved the fix for #21179, by switching from xrange() to range(), and restating the import as a Python 2 compatibility import.
  3. @zedr

    Fixed #22085 - Add a feature for setting non-expiring keys as the def…

    zedr authored
    …ault.
    
    This feature allows the default `TIMEOUT` Cache argument to be set to `None`,
    so that Cache instances can set a non-expiring key as the default,
    instead of using the default value of 5 minutes.
    
    Previously, this was possible only by passing `None` as an argument to
    the set() method of objects of type `BaseCache` (and subtypes).
  4. @zedr

    Removed the import of `DEFAULT_CACHE_ALIAS` that redefines a previous…

    zedr authored
    … import
    
    The GetCacheTests test case imports the constant `DEFAULT_CACHE_ALIAS`
    inside one of its test methods, redefining a previous import. Removing
    this second import does not break the test.
  5. @zedr
9 django/core/cache/backends/base.py
@@ -52,10 +52,11 @@ def get_key_func(key_func):
 class BaseCache(object):
     def __init__(self, params):
         timeout = params.get('timeout', params.get('TIMEOUT', 300))
-        try:
-            timeout = int(timeout)
-        except (ValueError, TypeError):
-            timeout = 300
+        if timeout is not None:
+            try:
+                timeout = int(timeout)
+            except (ValueError, TypeError):
+                timeout = 300
         self.default_timeout = timeout
         options = params.get('OPTIONS', {})
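The effect of the change above can be sketched in isolation. This is a minimal illustration, not Django code: `resolve_default_timeout` is a hypothetical helper that copies the patched logic from `BaseCache.__init__` into a plain function so the handling of `None` is easy to see.

```python
def resolve_default_timeout(params):
    """Hypothetical stand-alone copy of the patched timeout handling."""
    timeout = params.get('timeout', params.get('TIMEOUT', 300))
    if timeout is not None:
        # Anything other than None is still coerced to an int,
        # falling back to 300 seconds when the value is unusable.
        try:
            timeout = int(timeout)
        except (ValueError, TypeError):
            timeout = 300
    return timeout


print(resolve_default_timeout({}))                    # 300
print(resolve_default_timeout({'TIMEOUT': '60'}))     # 60
print(resolve_default_timeout({'TIMEOUT': 'soon'}))   # 300
print(resolve_default_timeout({'TIMEOUT': None}))     # None
```

Before the patch, `int(None)` raised `TypeError`, so a `TIMEOUT` of `None` silently fell back to 300 seconds.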
34 docs/howto/outputting-csv.txt
@@ -54,6 +54,40 @@ mention:
about escaping strings with quotes or commas in them. Just pass
``writerow()`` your raw strings, and it'll do the right thing.
+Streaming large files
+~~~~~~~~~~~~~~~~~~~~~
+When dealing with views that generate very big responses, you might want to consider using Django's
+:class:`django.http.StreamingHttpResponse` objects instead.
+
+In this example, we want to make full use of Python generators to efficiently
+handle the assembly and transmission of a large CSV file::
+
+    import csv
+
+    from django.utils.six.moves import range
+    from django.http import StreamingHttpResponse
+
+    class Echo(object):
+        """An object that implements just the write method of the file-like
+        interface.
+        """
+        def write(self, value):
+            """Write the value by returning it, instead of storing in a buffer."""
+            return value
+
+    def some_streaming_csv_view(request):
+        """A view that streams a large CSV file."""
+        # Generate a sequence of rows. The range is based on the maximum number of
+        # rows that can be handled by a single sheet in most spreadsheet
+        # applications.
+        rows = (["Row {0}".format(idx), str(idx)] for idx in range(65536))
+        pseudo_buffer = Echo()
+        writer = csv.writer(pseudo_buffer)
+        response = StreamingHttpResponse((writer.writerow(row) for row in rows),
+                                         content_type="text/csv")
+        response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
+        return response
+
Handling Unicode
~~~~~~~~~~~~~~~~
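The documentation example above relies on `csv.writer.writerow()` returning whatever the underlying `write()` call returned. A minimal Django-free sketch of that mechanism follows, assuming Python 3 (where the built-in `range` is lazy). The helper name `csv_lines` is made up for illustration and is not part of the patch.

```python
import csv
from itertools import islice


class Echo:
    """File-like object whose write() returns the value instead of storing it."""
    def write(self, value):
        return value


def csv_lines(rows):
    """Lazily yield one formatted CSV line per input row (hypothetical helper)."""
    writer = csv.writer(Echo())
    # writerow() hands the formatted line to Echo.write() and returns its result.
    return (writer.writerow(row) for row in rows)


# A billion rows are described here, but none is built until it is consumed.
rows = (["Row {0}".format(idx), str(idx)] for idx in range(10 ** 9))
first_three = list(islice(csv_lines(rows), 3))
print(first_three)  # ['Row 0,0\r\n', 'Row 1,1\r\n', 'Row 2,2\r\n']
```

Because every stage is a generator, only the rows actually pulled by the consumer (here `islice`, in the view `StreamingHttpResponse`) are ever formatted.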
82 tests/cache/tests.py
@@ -6,6 +6,7 @@
import os
import re
+import copy
import shutil
import tempfile
import threading
@@ -15,7 +16,8 @@
from django.conf import settings
from django.core import management
-from django.core.cache import cache, caches, CacheKeyWarning, InvalidCacheBackendError
+from django.core.cache import (cache, caches, CacheKeyWarning,
+ InvalidCacheBackendError, DEFAULT_CACHE_ALIAS)
from django.db import connection, router, transaction
from django.core.cache.utils import make_template_fragment_key
from django.http import HttpResponse, StreamingHttpResponse
@@ -1175,7 +1177,7 @@ def test_custom_key_validation(self):
 class GetCacheTests(IgnorePendingDeprecationWarningsMixin, TestCase):
     def test_simple(self):
-        from django.core.cache import caches, DEFAULT_CACHE_ALIAS, get_cache
+        from django.core.cache import caches, get_cache
         self.assertIsInstance(
             caches[DEFAULT_CACHE_ALIAS],
             get_cache('default').__class__
@@ -1204,6 +1206,82 @@ def test_close_deprecated(self):
         self.assertTrue(cache.closed)
+DEFAULT_MEMORY_CACHES_SETTINGS = {
+    'default': {
+        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
+        'LOCATION': 'unique-snowflake',
+    }
+}
+NEVER_EXPIRING_CACHES_SETTINGS = copy.deepcopy(DEFAULT_MEMORY_CACHES_SETTINGS)
+NEVER_EXPIRING_CACHES_SETTINGS['default']['TIMEOUT'] = None
+
+
+class DefaultNonExpiringCacheKeyTests(TestCase):
+    """Tests that verify that settings having Cache arguments with a TIMEOUT
+    set to `None` will create Caches that will set non-expiring keys.
+
+    This fixes ticket #22085.
+    """
+    def setUp(self):
+        # The 5 minute (300 seconds) default expiration time for keys is
+        # defined in the implementation of the initializer method of the
+        # BaseCache type.
+        self.DEFAULT_TIMEOUT = caches[DEFAULT_CACHE_ALIAS].default_timeout
+
+    def tearDown(self):
+        del(self.DEFAULT_TIMEOUT)
+
+    def test_default_expiration_time_for_keys_is_5_minutes(self):
+        """The default expiration time of a cache key is 5 minutes.
+
+        This value is defined inside the __init__() method of the
+        :class:`django.core.cache.backends.base.BaseCache` type.
+        """
+        self.assertEquals(300, self.DEFAULT_TIMEOUT)
+
+    def test_caches_with_unset_timeout_has_correct_default_timeout(self):
+        """Caches that have the TIMEOUT parameter undefined in the default
+        settings will use the default 5 minute timeout.
+        """
+        cache = caches[DEFAULT_CACHE_ALIAS]
+        self.assertEquals(self.DEFAULT_TIMEOUT, cache.default_timeout)
+
+    @override_settings(CACHES=NEVER_EXPIRING_CACHES_SETTINGS)
+    def test_caches_set_with_timeout_as_none_has_correct_default_timeout(self):
+        """Memory caches that have the TIMEOUT parameter set to `None` in the
+        default settings will have `None` as the default timeout.
+
+        This means "no timeout".
+        """
+        cache = caches[DEFAULT_CACHE_ALIAS]
+        self.assertIs(None, cache.default_timeout)
+        self.assertEquals(None, cache.get_backend_timeout())
+
+    @override_settings(CACHES=DEFAULT_MEMORY_CACHES_SETTINGS)
+    def test_caches_with_unset_timeout_set_expiring_key(self):
+        """Memory caches that have the TIMEOUT parameter unset will set cache
+        keys having the default 5 minute timeout.
+        """
+        key = "my-key"
+        value = "my-value"
+        cache = caches[DEFAULT_CACHE_ALIAS]
+        cache.set(key, value)
+        cache_key = cache.make_key(key)
+        self.assertNotEquals(None, cache._expire_info[cache_key])
+
+    @override_settings(CACHES=NEVER_EXPIRING_CACHES_SETTINGS)
+    def test_caches_set_with_timeout_as_none_set_non_expiring_key(self):
+        """Memory caches that have the TIMEOUT parameter set to `None` will set
+        a non expiring key by default.
+        """
+        key = "another-key"
+        value = "another-value"
+        cache = caches[DEFAULT_CACHE_ALIAS]
+        cache.set(key, value)
+        cache_key = cache.make_key(key)
+        self.assertEquals(None, cache._expire_info[cache_key])
+
+
 @override_settings(
     CACHE_MIDDLEWARE_KEY_PREFIX='settingsprefix',
     CACHE_MIDDLEWARE_SECONDS=1,
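
The test settings in the diff above derive `NEVER_EXPIRING_CACHES_SETTINGS` with `copy.deepcopy()` before setting `TIMEOUT`. A short sketch of why a shallow copy would likely not be enough for a nested settings dict. `make_base` is a hypothetical helper used only to get a fresh dict for each case.

```python
import copy


def make_base():
    """Return a fresh nested settings dict (hypothetical helper)."""
    return {
        'default': {
            'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
            'LOCATION': 'unique-snowflake',
        }
    }


# Shallow copy: the inner 'default' dict is shared with the original.
base = make_base()
shallow = dict(base)
shallow['default']['TIMEOUT'] = None
shallow_leaked = 'TIMEOUT' in base['default']

# Deep copy: the inner dict is duplicated, so the original stays untouched.
base = make_base()
deep = copy.deepcopy(base)
deep['default']['TIMEOUT'] = None
deep_leaked = 'TIMEOUT' in base['default']

print(shallow_leaked, deep_leaked)  # True False
```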