ModelPaginator and current version of SQLAlchemy #116

ods · 2013-03-28T13:57:38Z

It's a pity, but now Query.count() in SQLAlchemy produces very inefficient SQL. See https://bitbucket.org/zzzeek/sqlalchemy/commits/9e7714afcaf5adc4100bce25416c0651ed13644f for details. We have to find a solution how to deal with it. Using patched SA doesn't seem a good way.

The text was updated successfully, but these errors were encountered:

ods · 2013-03-28T14:15:58Z

Temporal solution is to patch query.py:

--- a/sqlalchemy/orm/query.py   Wed Mar 27 21:07:15 2013 +0400
+++ b/sqlalchemy/orm/query.py   Thu Mar 28 17:47:05 2013 +0400
@@ -2542,44 +2542,116 @@ class Query(object):
                 kwargs.get('offset') is not None or
                 kwargs.get('distinct', False))

+    #def count(self):
+    #    """Return a count of rows this Query would return.
+
+    #    This generates the SQL for this Query as follows::
+
+    #        SELECT count(1) AS count_1 FROM (
+    #            SELECT <rest of query follows...>
+    #        ) AS anon_1
+
+    #    .. versionchanged:: 0.7
+    #        The above scheme is newly refined as of 0.7b3.
+
+    #    For fine grained control over specific columns
+    #    to count, to skip the usage of a subquery or
+    #    otherwise control of the FROM clause,
+    #    or to use other aggregate functions,
+    #    use :attr:`~sqlalchemy.sql.expression.func` expressions in conjunction
+    #    with :meth:`~.Session.query`, i.e.::
+
+    #        from sqlalchemy import func
+
+    #        # count User records, without
+    #        # using a subquery.
+    #        session.query(func.count(User.id))
+
+    #        # return count of user "id" grouped
+    #        # by "name"
+    #        session.query(func.count(User.id)).\\
+    #                group_by(User.name)
+
+    #        from sqlalchemy import distinct
+
+    #        # count distinct "name" values
+    #        session.query(func.count(distinct(User.name)))
+
+    #    """
+    #    col = sql.func.count(sql.literal_column('*'))
+    #    return self.from_self(col).scalar()
+
+    # XXX Old count() implementation from SA revision 6135:032e956e982e
     def count(self):
         """Return a count of rows this Query would return.

-        This generates the SQL for this Query as follows::
-
-            SELECT count(1) AS count_1 FROM (
-                SELECT <rest of query follows...>
-            ) AS anon_1
-
-        .. versionchanged:: 0.7
-            The above scheme is newly refined as of 0.7b3.
-
-        For fine grained control over specific columns
-        to count, to skip the usage of a subquery or
-        otherwise control of the FROM clause,
-        or to use other aggregate functions,
-        use :attr:`~sqlalchemy.sql.expression.func` expressions in conjunction
-        with :meth:`~.Session.query`, i.e.::
-
-            from sqlalchemy import func
-
-            # count User records, without
-            # using a subquery.
-            session.query(func.count(User.id))
-
-            # return count of user "id" grouped
-            # by "name"
-            session.query(func.count(User.id)).\\
-                    group_by(User.name)
-
-            from sqlalchemy import distinct
-
-            # count distinct "name" values
-            session.query(func.count(distinct(User.name)))
+        For simple entity queries, count() issues
+        a SELECT COUNT, and will specifically count the primary
+        key column of the first entity only.  If the query uses 
+        LIMIT, OFFSET, or DISTINCT, count() will wrap the statement 
+        generated by this Query in a subquery, from which a SELECT COUNT
+        is issued, so that the contract of "how many rows
+        would be returned?" is honored.
+
+        For queries that request specific columns or expressions, 
+        count() again makes no assumptions about those expressions
+        and will wrap everything in a subquery.  Therefore,
+        ``Query.count()`` is usually not what you want in this case.
+        To count specific columns, often in conjunction with 
+        GROUP BY, use ``func.count()`` as an individual column expression
+        instead of ``Query.count()``.  See the ORM tutorial
+        for an example.

         """
-        col = sql.func.count(sql.literal_column('*'))
-        return self.from_self(col).scalar()
+        should_nest = [self._should_nest_selectable]
+        def ent_cols(ent):
+            if isinstance(ent, _MapperEntity):
+                return ent.mapper.primary_key
+            else:
+                should_nest[0] = True
+                return [ent.column]
+
+        return self._col_aggregate(sql.literal_column('1'), sql.func.count,
+            nested_cols=chain(*[ent_cols(ent) for ent in self._entities]),
+            should_nest = should_nest[0]
+        )
+
+    def _col_aggregate(self, col, func, nested_cols=None, should_nest=False):
+        context = QueryContext(self)
+
+        for entity in self._entities:
+            entity.setup_context(self, context)
+
+        if context.from_clause:
+            from_obj = list(context.from_clause)
+        else:
+            from_obj = context.froms
+
+        if self._enable_single_crit:
+            self._adjust_for_single_inheritance(context)
+
+        whereclause  = context.whereclause
+
+        if should_nest:
+            if not nested_cols:
+                nested_cols = [col]
+            else:
+                nested_cols = list(nested_cols)
+            s = sql.select(nested_cols, whereclause, 
+                        from_obj=from_obj, use_labels=True,
+                        **self._select_args)
+            s = s.alias()
+            s = sql.select(
+                [func(s.corresponding_column(col) or col)]).select_from(s)
+        else:
+            s = sql.select([func(col)], whereclause, from_obj=from_obj,
+            **self._select_args)
+
+        if self._autoflush and not self._populate_existing:
+            self.session._autoflush()
+        return self.session.scalar(s, params=self._params,
+            mapper=self._mapper_zero())
+    # XXX End of old count() implementation

     def delete(self, synchronize_session='evaluate'):
         """Perform a bulk delete query.

riffm · 2013-03-28T14:37:47Z

@ods can we subclass instead of patching?

ods · 2013-03-29T08:05:26Z

@riffm The only difference I see is that subclassing gives us false feeling that we can safely update library with new version.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ModelPaginator and current version of SQLAlchemy #116

ModelPaginator and current version of SQLAlchemy #116

ods commented Mar 28, 2013

ods commented Mar 28, 2013

riffm commented Mar 28, 2013

ods commented Mar 29, 2013

ModelPaginator and current version of SQLAlchemy #116

ModelPaginator and current version of SQLAlchemy #116

Comments

ods commented Mar 28, 2013

ods commented Mar 28, 2013

riffm commented Mar 28, 2013

ods commented Mar 29, 2013