New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy model primary key column attribute name must match column name #78

Closed
hartror opened this Issue Aug 6, 2013 · 7 comments

Comments

Projects
None yet
5 participants
@hartror

hartror commented Aug 6, 2013

So we've got a set of SQLAlchemy models like so (don't ask why, it makes me sad).

class Test(Base):
    __tablename__ = "test"

    id = sqlalchemy.Column('id_', sqlalchemy.Integer, primary_key=True)

When using a factory for the class we get:

Traceback (most recent call last):
  File "<pyshell#0>", line 16, in <module>
    TestFactory.build()
  File "/usr/local/lib/python2.7/dist-packages/factory/base.py", line 434, in build
    attrs = cls.attributes(create=False, extra=kwargs)
  File "/usr/local/lib/python2.7/dist-packages/factory/base.py", line 316, in attributes
    force_sequence=force_sequence,
  File "/usr/local/lib/python2.7/dist-packages/factory/containers.py", line 263, in build
    sequence = self.factory._generate_next_sequence()
  File "/usr/local/lib/python2.7/dist-packages/factory/base.py", line 287, in _generate_next_sequence
    cls._next_sequence = cls._setup_next_sequence()
  File "/usr/local/lib/python2.7/dist-packages/factory/alchemy.py", line 37, in _setup_next_sequence
    pk = getattr(model, model.__mapper__.primary_key[0].name)
AttributeError: type object 'Test' has no attribute 'id_'

A full example:

from sqlalchemy.ext.declarative import declarative_base
import sqlalchemy
import factory.alchemy

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"

    id = sqlalchemy.Column('id_', sqlalchemy.Integer, primary_key=True)

class TestFactory(factory.alchemy.SQLAlchemyModelFactory):
    FACTORY_FOR = Test
    FACTORY_SESSION = None

TestFactory.build()
@hartror

This comment has been minimized.

Show comment
Hide comment
@hartror

hartror Aug 27, 2013

SHA: 738e673 removes the primary key functionality and the id from the factory in the tests. This means the database is given responsibility for setting automatic/sequence keys. This avoids the case I reported above as well as other cases where there are multi-column primary keys which would confuse the code I removed. Cases such as where there are multiple integer primary keys where only one of them is serial or where one of them is an integer and the other a non-integer.

A couple of the tests are broken in my example but that is because the intended behaviour has now changed so the assumptions these test make seem wrong to me. However I left them broken because I suspect there might be some concept in factory boy these are covering, specifically the use case of StandardFactory.reset_sequence().

hartror commented Aug 27, 2013

SHA: 738e673 removes the primary key functionality and the id from the factory in the tests. This means the database is given responsibility for setting automatic/sequence keys. This avoids the case I reported above as well as other cases where there are multi-column primary keys which would confuse the code I removed. Cases such as where there are multiple integer primary keys where only one of them is serial or where one of them is an integer and the other a non-integer.

A couple of the tests are broken in my example but that is because the intended behaviour has now changed so the assumptions these test make seem wrong to me. However I left them broken because I suspect there might be some concept in factory boy these are covering, specifically the use case of StandardFactory.reset_sequence().

@rbarrois

This comment has been minimized.

Show comment
Hide comment
@rbarrois

rbarrois Aug 27, 2013

Member

Thanks for the PR.

I see two issues around this whole setup_next_sequence code:

  1. It tends to perform SQL queries while calling build(), which is not exactly expected behavior: a build() shouldn't touch the database at all
  2. Removing it altogether increases the risk of conflicts when setting unique fields from factory.Sequence

I'll try to find a way to introspect the schema so that factory_boy can find out:

  • If there's an auto-increment column, use it
  • Otherwise, if there is a single column, numerical PK, use it
  • In all other cases, don't even try to find a Sequence.

Actually, I have the same kind of issues with DjangoModelFactory.

Member

rbarrois commented Aug 27, 2013

Thanks for the PR.

I see two issues around this whole setup_next_sequence code:

  1. It tends to perform SQL queries while calling build(), which is not exactly expected behavior: a build() shouldn't touch the database at all
  2. Removing it altogether increases the risk of conflicts when setting unique fields from factory.Sequence

I'll try to find a way to introspect the schema so that factory_boy can find out:

  • If there's an auto-increment column, use it
  • Otherwise, if there is a single column, numerical PK, use it
  • In all other cases, don't even try to find a Sequence.

Actually, I have the same kind of issues with DjangoModelFactory.

@hartror

This comment has been minimized.

Show comment
Hide comment
@hartror

hartror Aug 28, 2013

What are the reasons that this should be factory_boy's responsibility? Shouldn't users be responsible for ensuring they craft and use factories in a way that doesn't create conflicts with unique fields?

hartror commented Aug 28, 2013

What are the reasons that this should be factory_boy's responsibility? Shouldn't users be responsible for ensuring they craft and use factories in a way that doesn't create conflicts with unique fields?

@steder

This comment has been minimized.

Show comment
Hide comment
@steder

steder Jan 7, 2014

A colleague and I just ran into this same issue (we have aliased primary keys) and it seems like we're able to work around it with a simple subclass of SQLAlchemyModelFactory, namely:

class OurFactory(SQLAlchemyModelFactory):    
    @classmethod
    def _setup_next_sequence(cls, *args, **kwargs):
        max_id = getattr(cls, "max_id", 0)
        cls.max_id = max_id + 1
        return cls.max_id

This seems to satisfy the requirement in the common case (either an auto-increment or single numerical PK exists).

This seems to mitigate issues 1 and 2 by not issuing unnecessary queries and it helps prevent conflicts.

The downside of this is that it isn't thread-safe, but I don't think the old implementation was thread-safe either so perhaps that's not an issue. That said, wrapping this is a mutex could be easy enough.

If this is a reasonable fix I'd happily create a PR including it.

steder commented Jan 7, 2014

A colleague and I just ran into this same issue (we have aliased primary keys) and it seems like we're able to work around it with a simple subclass of SQLAlchemyModelFactory, namely:

class OurFactory(SQLAlchemyModelFactory):    
    @classmethod
    def _setup_next_sequence(cls, *args, **kwargs):
        max_id = getattr(cls, "max_id", 0)
        cls.max_id = max_id + 1
        return cls.max_id

This seems to satisfy the requirement in the common case (either an auto-increment or single numerical PK exists).

This seems to mitigate issues 1 and 2 by not issuing unnecessary queries and it helps prevent conflicts.

The downside of this is that it isn't thread-safe, but I don't think the old implementation was thread-safe either so perhaps that's not an issue. That said, wrapping this is a mutex could be easy enough.

If this is a reasonable fix I'd happily create a PR including it.

@amoffat

This comment has been minimized.

Show comment
Hide comment
@amoffat

amoffat Jan 7, 2014

@steder and i poked at it some more, this looks more like what you were trying to achieve originally, but this way, it works for columns that have been aliased. @steder and i talked more about the previous solution, and it doesn't really make sense to have the class keep track of max_id, in case records get inserted from somewhere other than factory_boy.

from sqlalchemy.sql import column
from sqlalchemy import func


class BetterThanMikes(SQLAlchemyModelFactory):    
    @classmethod
    def _setup_next_sequence(cls, *args, **kwargs):
        session = cls.FACTORY_SESSION
        model = cls.FACTORY_FOR
        pk = column(model.__mapper__.primary_key[0].name)
        max_pk = session.query(model, func.max(pk)).one()[0]

        if isinstance(max_pk, int):
            return max_pk + 1 if max_pk else 1
        else:
            return 1

amoffat commented Jan 7, 2014

@steder and i poked at it some more, this looks more like what you were trying to achieve originally, but this way, it works for columns that have been aliased. @steder and i talked more about the previous solution, and it doesn't really make sense to have the class keep track of max_id, in case records get inserted from somewhere other than factory_boy.

from sqlalchemy.sql import column
from sqlalchemy import func


class BetterThanMikes(SQLAlchemyModelFactory):    
    @classmethod
    def _setup_next_sequence(cls, *args, **kwargs):
        session = cls.FACTORY_SESSION
        model = cls.FACTORY_FOR
        pk = column(model.__mapper__.primary_key[0].name)
        max_pk = session.query(model, func.max(pk)).one()[0]

        if isinstance(max_pk, int):
            return max_pk + 1 if max_pk else 1
        else:
            return 1
@airdrummingfool

This comment has been minimized.

Show comment
Hide comment
@airdrummingfool

airdrummingfool Apr 29, 2014

I started using @amoffat's solution above (thanks!), but ran into an error where sometimes func.max(pk) would be ambiguous:

OperationalError: (OperationalError) (1052, "Column 'id' in field list is ambiguous") [... large SELECT statement ...] 

I modified the function to explicitly specify the model's table along with the pk column, and it seems to work:

from factory.alchemy import SQLAlchemyModelFactory
from sqlalchemy.sql import column
from sqlalchemy import func


class PKFixedSQLAlchemyModelFactory(SQLAlchemyModelFactory):
    @classmethod
    def _setup_next_sequence(cls, *args, **kwargs):
        session = cls.FACTORY_SESSION
        model = cls.FACTORY_FOR
        pk = column(model.__mapper__.primary_key[0].name)
        max_pk = session.query(model, func.max('{}.{}'.format(model.__tablename__, pk))).one()[0]

        if isinstance(max_pk, int):
            return max_pk + 1 if max_pk else 1
        else:
            return 1

airdrummingfool commented Apr 29, 2014

I started using @amoffat's solution above (thanks!), but ran into an error where sometimes func.max(pk) would be ambiguous:

OperationalError: (OperationalError) (1052, "Column 'id' in field list is ambiguous") [... large SELECT statement ...] 

I modified the function to explicitly specify the model's table along with the pk column, and it seems to work:

from factory.alchemy import SQLAlchemyModelFactory
from sqlalchemy.sql import column
from sqlalchemy import func


class PKFixedSQLAlchemyModelFactory(SQLAlchemyModelFactory):
    @classmethod
    def _setup_next_sequence(cls, *args, **kwargs):
        session = cls.FACTORY_SESSION
        model = cls.FACTORY_FOR
        pk = column(model.__mapper__.primary_key[0].name)
        max_pk = session.query(model, func.max('{}.{}'.format(model.__tablename__, pk))).one()[0]

        if isinstance(max_pk, int):
            return max_pk + 1 if max_pk else 1
        else:
            return 1

rbarrois added a commit that referenced this issue Nov 16, 2014

Remove automagic pk-based sequence setup
Related to issues #78, #92, #103, #111, #153, #170

The default value of all sequences is now 0; the automagic
``_setup_next_sequence`` behavior of Django/SQLAlchemy has been removed.

This feature's only goal was to allow the following scenario:

1. Run a Python script that uses MyFactory.create() a couple of times
   (with a unique field based on the sequence counter)
2. Run the same Python script a second time

Without the magical ``_setup_next_sequence``, the Sequence counter would be set
to 0 at the beginning of each script run, so both runs would generate objects
with the same values for the unique field ; thus conflicting and crashing.

The above behavior having only a very limited use and bringing various
issues (hitting the database on ``build()``, problems with non-integer
or composite primary key columns, ...), it has been removed.

It could still be emulated through custom ``_setup_next_sequence``
methods, or by calling ``MyFactory.reset_sequence()``.
@rbarrois

This comment has been minimized.

Show comment
Hide comment
@rbarrois

rbarrois Mar 26, 2015

Member

This issue has been fixed in 13d310f, and will be part of the 2.5.0 release.

Member

rbarrois commented Mar 26, 2015

This issue has been fixed in 13d310f, and will be part of the 2.5.0 release.

@rbarrois rbarrois closed this Mar 26, 2015

skytreader added a commit to skytreader/alexandria that referenced this issue Jul 23, 2015

Make more asserts pass.
Had to give up on getting the factory_boy-generated object to give me an ID and
I just had to query the whole thing again. :|

See also:
- FactoryBoy/factory_boy@738e673
- FactoryBoy/factory_boy#78
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment