Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #1991 causes DoesNotExist error during row iteration #2115

Closed
brandond opened this issue Feb 27, 2020 · 12 comments
Closed

Fix for #1991 causes DoesNotExist error during row iteration #2115

brandond opened this issue Feb 27, 2020 · 12 comments

Comments

@brandond
Copy link

I have a rather large cross-model join query that started throwing an exception after updating from 3.9.6 to 3.13.1. This appears to be caused by the fix for #1991 .

The query is:

        for inc in (self.vulnerabilities
                        .select(fn.COALESCE(Team.guid, Vulnerability.guid).alias('guid'),
                                fn.COALESCE(Team.remedy_group, self.name).alias('name'),
                                Plugin.id,
                                fn.COUNT(Vulnerability.host_id).alias('count'),
                                fn.JSON_GROUP_ARRAY(Vulnerability.host_id).python_value(str).alias('host_ids'))
                        .join(Plugin, src=Vulnerability)
                        .join(Query, src=Vulnerability)
                        .join(Host, src=Vulnerability)
                        .join(Team, join_type=JOIN.LEFT_OUTER, on=(Team.name == team_name), src=Host)
                        .where(Query.process << INCIDENT_PROCESS_PLUGIN)
                        .where(Host.state != HOST_STATE_TERMINATED)
                        .group_by(fn.COALESCE(Team.guid, Vulnerability.guid),
                                  fn.COALESCE(Team.remedy_group, self.name),
                                  Plugin.id)
                        .order_by(fn.COALESCE(Team.guid, Vulnerability.guid),
                                  fn.COALESCE(Team.remedy_group, self.name),
                                  Plugin.id)):
            # do something

The resulting query SQL that gets logged is:

('SELECT COALESCE("t1"."guid", "t2"."guid") AS "guid", COALESCE("t1"."remedy_group", ?) AS "name", "t3"."id", COUNT("t2"."host_id") AS "count", JSON_GROUP_ARRAY("t2"."host_id") AS "host_ids" FROM "vulnerability" AS "t2" INNER JOIN "plugin" AS "t3" ON ("t2"."plugin_id" = "t3"."id") INNER JOIN "query" AS "t4" ON ("t2"."query_id" = "t4"."id") INNER JOIN "host" AS "t5" ON ("t2"."host_id" = "t5"."ip_address") LEFT OUTER JOIN "team" AS "t1" ON ("t1"."name" = (SELECT "t1"."name" FROM "team" AS "t1" WHERE (("t1"."name" = TRIM(json_extract("t5"."tags", ?))) OR ("t1"."remedy_group" = TRIM(json_extract("t5"."tags", ?)))) LIMIT ?)) WHERE ((("t2"."remediation_group_id" = ?) AND ("t4"."process" IN (?, ?))) AND ("t5"."state" != ?)) GROUP BY COALESCE("t1"."guid", "t2"."guid"), COALESCE("t1"."remedy_group", ?), "t3"."id" ORDER BY COALESCE("t1"."guid", "t2"."guid"), COALESCE("t1"."remedy_group", ?), "t3"."id"', [u'Server Operations', u'$.TechOwnerName', u'$.RemedyGroup', 1, 764, u'standard', u'compliance', u'terminated', u'Server Operations', u'Server Operations'])

The exception is:

  File "/var/task/ticketlib/remediation.py", line 232, in get_known_tickets
    Plugin.id)):
  File "/var/task/peewee.py", line 4287, in next
    self.cursor_wrapper.iterate()
  File "/var/task/peewee.py", line 4206, in iterate
    result = self.process_row(row)
  File "/var/task/peewee.py", line 7450, in process_row
    if instance not in set_keys and dest not in set_keys \
  File "/var/task/peewee.py", line 6483, in __hash__
    return hash((self.__class__, self._pk))
  File "/var/task/peewee.py", line 6374, in get_id
    return getattr(self, self._meta.primary_key.safe_name)
  File "/var/task/peewee.py", line 5382, in __get__
    for field_name in self.field_names])
  File "/var/task/peewee.py", line 4332, in __get__
    return self.get_rel_instance(instance)
  File "/var/task/peewee.py", line 4327, in get_rel_instance
    raise self.rel_model.DoesNotExist
HostDoesNotExist  

It's failing in the iterator, without any iterations of the for loop running. I am guessing I somehow have a Vulnerability with a host_id set to a value that doesn't have a corresponding PK in the Host table, but I can't even try to track that down because it's blowing up the iterator immediately.

@coleifer
Copy link
Owner

SELECT
  COALESCE("t1"."guid", "t2"."guid") AS "guid",
  COALESCE("t1"."remedy_group", ?) AS "name",
  "t3"."id",
  COUNT("t2"."host_id") AS "count",
  JSON_GROUP_ARRAY("t2"."host_id") AS "host_ids" 
FROM "vulnerability" AS "t2" 
INNER JOIN "plugin" AS "t3" ON ("t2"."plugin_id" = "t3"."id") 
INNER JOIN "query" AS "t4" ON ("t2"."query_id" = "t4"."id") 
INNER JOIN "host" AS "t5" ON ("t2"."host_id" = "t5"."ip_address") 
LEFT OUTER JOIN "team" AS "t1" ON ("t1"."name" = (
  SELECT "t1"."name" FROM "team" AS "t1" 
  WHERE (
    ("t1"."name" = TRIM(json_extract("t5"."tags", ?))) OR 
    ("t1"."remedy_group" = TRIM(json_extract("t5"."tags", ?)))) LIMIT ?
)) 
WHERE ((
  ("t2"."remediation_group_id" = ?) AND 
  ("t4"."process" IN (?, ?))) AND
   ("t5"."state" != ?)) 
GROUP BY 
  COALESCE("t1"."guid", "t2"."guid"), 
  COALESCE("t1"."remedy_group", ?), 
  "t3"."id" 
ORDER BY COALESCE("t1"."guid", "t2"."guid"), COALESCE("t1"."remedy_group", ?), "t3"."id"

@coleifer
Copy link
Owner

Does one of those models have a primary key that is a foreign-key to a different table?

Is there any way you can provide a minimal way to reproduce this issue? At the very least subtract from this example until you can isolate the minimal query that reproduces the problem?

@coleifer
Copy link
Owner

I am guessing I somehow have a Vulnerability with a host_id set to a value that doesn't have a corresponding PK in the Host table

How would this happen? The whole thing with referential integrity is that this should not occur.

@coleifer
Copy link
Owner

coleifer commented Feb 29, 2020

Going through the stack trace, I'll show you my thoughts:

We're checking if an instance belongs to a set/dict, so we need its hash, which consists of the class and primary-key value:

  File "/var/task/peewee.py", line 6483, in __hash__
    return hash((self.__class__, self._pk))

The _pk accessor is a property that gets the value of the primary-key field. Note that it uses primary_key.safe_name. When the primary-key is also a foreign-key, this accessor will the be "object_id_name", which allows us to access the raw value of the foreign-key, rather than potentially issuing a query to resolve the foreign-key to an instance.

  File "/var/task/peewee.py", line 6374, in get_id
    return getattr(self, self._meta.primary_key.safe_name)

It looks like you've got a composite primary-key...Why do people do this?!

  File "/var/task/peewee.py", line 5382, in __get__
    for field_name in self.field_names])

One part of the composite pk is also a foreign-key?

  File "/var/task/peewee.py", line 4332, in __get__
    return self.get_rel_instance(instance)

Missing the part we need:

  File "/var/task/peewee.py", line 4327, in get_rel_instance
    raise self.rel_model.DoesNotExist

@coleifer
Copy link
Owner

In your composite primary-key that also contains a foreign-key, try changing the foreign-key part to be the "safe-name" of the FK rather than the FK field itself:

# Change this:
primary_key = CompositeKey('host', 'other_field')

# To this:
primary_key = CompositeKey('host_id', 'other_field')

Try that out and let me know if it fixes the problem and works.

using composite primary keys ever

Active record ORMs do not work well with composite pks. Try using a regular autoincrement / uuid in the future and just put a unique constraint on the columns you need to guarantee are unique. You'll have a much better time.

Repository owner deleted a comment from AnthonyAgazzi-artal Mar 2, 2020
@coleifer
Copy link
Owner

coleifer commented Mar 2, 2020

@brandond - can you please take a look at my latest comment and provide additional info?

@Nurbel
Copy link

Nurbel commented Mar 2, 2020

I have been able to reproduce the issue by adding the following code to tests/models.py:

class Neighbourhood(TestModel):
    name = CharField()
    city = ForeignKeyField(City)
    class Meta:
        primary_key = CompositeKey('name', 'city')

class TestCompositePKwithFK(ModelTestCase):
    @requires_models(City, Neighbourhood)
    def test_join_query(self):
        city = City.create(name='city')
        neighbourhood = Neighbourhood.create(name='n1',city=city)

        query = (Neighbourhood
          .select(
              Neighbourhood.name,
              City.name
          )
            .join(City).where(
                Neighbourhood.name == 'n1'
         ) )
        
        res = query.get()
        self.assertEqual(res.name,'n1')
        self.assertEqual(res.city.name, 'city')

I am not sure of what you mean with the safe-name of the FK. Replacing "city" by "city_id" in the CompositeKey declaration causes a KeyError at database creation time

@brandond
Copy link
Author

brandond commented Mar 2, 2020

You're probably going to hate me for this, but here's a subsection of the model code. I have the business logic in a subclass of RemediationGroupBase, so self.vulnerabilities hits the backref to the Vulnerability model which does indeed have a composite pk.

class RemediationGroupBase(ModelBase):
    """Peewee model class representing persistent data associated with a Tenable asset group"""
    id = IntegerField(verbose_name='Group ID', primary_key=True)
    name = CharField(verbose_name='Group Name')
    guid = CharField(verbose_name='Template GUID')
    mid_tier = CharField(verbose_name='Remedy Mid-Tier')
    process = CharField(verbose_name='Process')

    class Meta:
        without_rowid = True
        table_name = 'remediation_group'

class Vulnerability(ModelBase):
    """Peewee model class representing a single Vulnerability finding"""
    last_seen = DateTimeField(verbose_name='Last Seen Date')
    first_seen = DateTimeField(verbose_name='First Seen Date')
    plugin_text = CompressedField(verbose_name='Plugin Text', compression_level=9)
    netbios_name = CharField(verbose_name='NetBIOS Name')
    mac_address = CharField(verbose_name='MAC Address')
    repository = CharField(verbose_name='Repository')
    port = IntegerField(verbose_name='Port')
    dns_name = CharField(verbose_name='DNS Name')
    protocol = CharField(verbose_name='Protocol')
    host = ForeignKeyField(verbose_name='Host', model=Host, backref='vulnerabilities')
    plugin = ForeignKeyField(verbose_name='Plugin', model=Plugin, backref='vulnerabilities')
    remediation_group = ForeignKeyField(verbose_name='Remediation Group', model=RemediationGroupBase, backref='vulnerabilities')
    query = ForeignKeyField(verbose_name='Query ID', model=Query, backref='vulnerabilities')
    guid = CharField(verbose_name='Query Template GUID')
    patched = BooleanField(verbose_name='Patched', default=False)

    class Meta:
        without_rowid = True
        primary_key = CompositeKey('host', 'port', 'protocol', 'plugin', 'first_seen')

I'm using a composite PK because I'm bulk-loading denormalized data from a 3rd party API and just use insert().on_conflict_replace() to deduplicate rows.

I'll try replacing 'host' with 'host_id', etc in the CompositeKey and see if that makes it happier.

@brandond
Copy link
Author

brandond commented Mar 2, 2020

As @Nurbel mentioned, using the safe name causes a KeyError on table creation:

  File "/home/ec2-user/.local/lib/python2.7/site-packages/ticketlib/models.py", line 178, in <module>
    Vulnerability.create_table(safe=True)
  File "/home/ec2-user/.local/lib/python2.7/site-packages/peewee.py", line 6529, in create_table
    cls._schema.create_all(safe, **options)
  File "/home/ec2-user/.local/lib/python2.7/site-packages/peewee.py", line 5679, in create_all
    self.create_table(safe, **table_options)
  File "/home/ec2-user/.local/lib/python2.7/site-packages/peewee.py", line 5534, in create_table
    self.database.execute(self._create_table(safe=safe, **options))
  File "/home/ec2-user/.local/lib/python2.7/site-packages/peewee.py", line 5492, in _create_table
    for field_name in meta.primary_key.field_names]
KeyError: 'plugin_id'

@coleifer
Copy link
Owner

coleifer commented Mar 2, 2020

Thanks for the info - I'll get to the bottom of this.

@brandond
Copy link
Author

brandond commented Mar 2, 2020

I'll give that change a try, but the tests have all failed?

coleifer added a commit that referenced this issue Mar 2, 2020
@coleifer
Copy link
Owner

coleifer commented Mar 2, 2020

I'll fix the tests, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants