Skip to content

Python: Inconsistent behaviour of the getAMember and getMember predicates #19297

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tanguySnoeck opened this issue Apr 13, 2025 · 8 comments
Open
Labels
question Further information is requested

Comments

@tanguySnoeck
Copy link

tanguySnoeck commented Apr 13, 2025

Trying to access all nodes that represent an attribute of a Django subclass does not work as expected in https://github.com/HumanSignal/label-studio.

I am trying to detect a call to AsyncMigrationStatus.objects in the following block of code (https://github.com/HumanSignal/label-studio/blob/develop/label_studio/core/management/commands/show_async_migrations.py#L19):

def handle(self, *args, **options):
        org = options['organization']
        logger.debug(f"===> AsyncMigrationStatus for Organization {org if org > -1 else 'ALL'}")
        if org == -1:
            migrations = AsyncMigrationStatus.objects.all().order_by('project_id')
        else:
            migrations = AsyncMigrationStatus.objects.filter(project__organization_id=org)

        for m in migrations:
            logger.debug(f'{m.name} \t {m.created_at} \t Project <{m.project}> \t {m.status} \t {m.meta}')

        logger.debug(f"===> AsyncMigrationStatus for Organization {org if org > -1 else 'ALL'} printed")

Using the following query:

import python
import semmle.python.frameworks.Django
import semmle.python.ApiGraphs


from API::Node n 
where
n = PrivateDjango::DjangoImpl::DB::Models::Model::subclassRef().getAMember() and
select n, "this is a model attribute"

AsyncMigrationStatus is defined below (https://github.com/HumanSignal/label-studio/blob/develop/label_studio/core/models.py#L10):

class AsyncMigrationStatus(models.Model):
    meta = JSONField(
        'meta',
        null=True,
        default=dict,
        help_text='Meta is for any params for migrations, e.g.: project, filter or error message.',
    )

    project = models.ForeignKey(
        'projects.Project',
        related_name='asyncmigrationstatus',
        on_delete=models.CASCADE,
        null=True,
        help_text='Project ID for this migration',
    )

    name = models.TextField('migration_name', help_text='Migration name')

    STATUS_STARTED = 'STARTED'
    STATUS_IN_PROGRESS = 'IN PROGRESS'
    STATUS_FINISHED = 'FINISHED'
    STATUS_ERROR = 'ERROR'
    STATUS_CHOICES = (
        (STATUS_STARTED, 'Migration is started or queued.'),
        (STATUS_IN_PROGRESS, 'Migration is in progress. Check meta for job_id or status.'),
        (STATUS_FINISHED, 'Migration completed successfully.'),
        (STATUS_ERROR, 'Migration completed with errors. Check meta for more info.'),
    )
    status = models.CharField(max_length=100, choices=STATUS_CHOICES, null=True, default=None)

    created_at = models.DateTimeField(_('created at'), auto_now_add=True, help_text='Creation time')
    updated_at = models.DateTimeField(_('updated at'), auto_now=True, help_text='Last updated time')

    def __str__(self):
        return f'(id={self.id}) ' + self.name + (' at project ' + str(self.project) if self.project else ''))

This query returns many results across the codebase but does not flag this specific statement, along with others.

I've tried many different approaches to understand the root cause of the issue without success. Please let me know if I am somehow misusing getAMember.

Below are my specs:

  • codeQL CLI 2.21.0
  • codeQL VSCode extension 1.17.2
@tanguySnoeck tanguySnoeck added the question Further information is requested label Apr 13, 2025
@aibaars
Copy link
Contributor

aibaars commented Apr 14, 2025

The query looks sensible at first sight. You say you'd like to find AsyncMigrationStatus.objects, however, I don't see objects defined in class AsyncMigrationStatus, so perhaps that is the problem.

Have you checked that PrivateDjango::DjangoImpl::DB::Models::Model::subclassRef() returns the results you expect? It's usually quite helpful to select a sub-expression in a query and use the right click: Quick evaluate feature in the VSCode IDE.

@tanguySnoeck
Copy link
Author

tanguySnoeck commented Apr 14, 2025

Hi @aibaars , Thanks for the quick reply!
The objects attribute is the default manager added by Django to every Django model class (https://docs.djangoproject.com/en/5.2/topics/db/managers/), so codeQL should be able to flag it. Especially because it also does flag other locations in the codebase where this objects attribute is being accessed from a subclass of Django's Model.

I can also confirm that retrieving all subclasses of Django Model with

import python
import semmle.python.frameworks.Django
import semmle.python.ApiGraphs


from API::Node n 
where
n = PrivateDjango::DjangoImpl::DB::Models::Model::subclassRef() 
select n, "this is a model subclass"

correctly returns AsyncMigrationStatus in its ouput.

@aibaars
Copy link
Contributor

aibaars commented Apr 14, 2025

You're right.

You might want to use the following method, although I expect it to have the same problem as your query because it has the same code as what you wrote.

/**
* Gets a reference to the Manager (django.db.models.Manager) for the django Model `modelClass`,
* accessed by `<modelClass>.objects`.
*/
API::Node manager(API::Node modelClass) {
modelClass = Model::subclassRef() and
result = modelClass.getMember("objects")
}

@tanguySnoeck
Copy link
Author

This is the approach I started with, but it didn't work. After some troubleshooting, I was able to narrow down the issue to getMember and getAMember predicates.

Just as a sanity check, I ran the following query again, and can confirm it still does not work:

import python
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.frameworks.Django
import semmle.python.ApiGraphs

API::Node modelClass() {
    result = PrivateDjango::DjangoImpl::DB::Models::Model::subclassRef()
}

from API::Node n 
where
n = PrivateDjango::DjangoImpl::DB::Models::manager(modelClass())
select n, "this is a manager"

@yoff
Copy link
Contributor

yoff commented Apr 22, 2025

I think this is an instance of an internal issue we're tracking (https://github.com/github/codeql-team/issues/3836), since the import happens from inside 'core'.
This would then be the third user report, so we should probably prioritize that issue soon...

@tanguySnoeck
Copy link
Author

@yoff since this is an internal issue, I am unable to access it, would you mind sharing more details? I ran some experiments on my side, and codeQL is sometimes able to flag the pattern I am describing even if the import happens from inside a package called 'core'.
If you provide me with a description of the issue, I can try to recreate it and check if it applies to my project.

@tanguySnoeck tanguySnoeck changed the title Python: Inconsistent behaviour of the getAMember predicate Python: Inconsistent behaviour of the getAMember and getMember predicates Apr 23, 2025
@yoff
Copy link
Contributor

yoff commented Apr 23, 2025

@yoff since this is an internal issue, I am unable to access it

Sorry, I did not realize that, I am used to working "in the open". The issue description is this:

For a module m located inside A.B.C, we compute the "name to use when importing m" as A.B.C.m.
However, if the import statement is happening from inside A.B.Other, it will instead use the name C.m.
As such, we need to alter the "name to use when importing m" based on the location of the import statement.

So it is "relative imports from inside a sub-package" that will likely fail.

@tanguySnoeck
Copy link
Author

tanguySnoeck commented Apr 24, 2025

Thanks for the explanation. Even though I can feel like this is related to what I'm observing, this explanation unfortunately doesn't fit my issue perfectly.
What I'm seeing is that in some cases using C.m from inside A.B.Other works, but in others it doesn't.

I will detail below my experiment to show you what I mean - all test cases below are done on the show_async_migrations.py module from label-studio (https://github.com/HumanSignal/label-studio/blob/develop/label_studio/core/management/commands/show_async_migrations.py):

  1. I replace the content of the module with the following:
from webhooks.models import Webhook
choices=Webhook.objects

I then run the following query:

from API::Node n 
where
n = modelClass().getAMember()
and PrivateDjango::DjangoImpl::DB::Models::Model::subclassRef()= "show_async_migrations.py"
select n, "this is a model class"

Result: This doesn't work and the query shows no result

  1. I replace the content of the file with the following:
from organizations.models import Organization
choices=Organization.objects

and run the same query as above.

Result: This works, and the statement is correctly flagged

The module show_async_migrations.py is located under core.management.commands. In example 1 it is importing the Organization class from the module organizations.models.py, while in example 2 it is importing the Webhook class from the module webhook.models.py. Both modules are located at the root of the project so, under the label-studio folder.

This therefore goes against the description you provided as both modules are located on the same folder level, and yet, when imported from the same location, codeQL doesn't process them equally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants