Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow forcing a constant in query filters #25630

Open
stevendarby opened this issue Aug 20, 2021 · 19 comments
Open

Allow forcing a constant in query filters #25630

stevendarby opened this issue Aug 20, 2021 · 19 comments

Comments

@stevendarby
Copy link
Contributor

I've been looking for a way to not parameterise values used in query filters. One use-case for this could be to add TenantId filters to entities in a multi-tenant DB. Tenant datasets may differ wildly and parameter sniffing can make a query plan generated for one tenant not suitable for another. A query plan for each tenant could be achieved via non-parameterised filters.

Using this example use-case, I tried passing the tenant ID value used in the filters through this custom function that uses [NotParameterized], but the expression that comes into the HasTranslation function is still SqlParameterExpression and so still produces a parameter.

public int DeParameterize([NotParameterized] int tenantId) =>
    throw new NotSupportedException("For SQL translation only");

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder
        .HasDbFunction(GetType().GetMethod(nameof(DeParameterize))!)
        .HasTranslation(x => x.First());
}

I confirmed the above approach works if I use it in a manual filter at query time, just not when used in a query filter at model building time. It would be really handy to be able to do this. Is there another way?

Tried on EF Core 6.0 preview 7.

Full example
using System;
using System.Diagnostics;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Query;
using Microsoft.Extensions.Logging;

var tenantId = 1;
using (var db = new BloggingContext())
{
    //db.Database.EnsureCreated();
    _ = db.Blogs.FirstOrDefault(x => x.TenantId == db.DeParameterize(tenantId));
}

using (var db = new BloggingContextWithFilter(tenantId))
{
    _ = db.Blogs.FirstOrDefault();
}


public class BloggingContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }

    public int DeParameterize([NotParameterized] int tenantId) =>
        throw new NotSupportedException("For SQL translation only");

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseSqlServer(@"Server=.;Database=DeParameterize;Integrated Security=True")
            .EnableDetailedErrors()
            .EnableSensitiveDataLogging()
            .LogTo(x => Debug.WriteLine(x), LogLevel.Information);
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder
            .HasDbFunction(GetType().GetMethod(nameof(DeParameterize))!)
            .HasTranslation(x => x.First());
    }
}

public class BloggingContextWithFilter : BloggingContext
{
    private readonly int _tenantId;

    public BloggingContextWithFilter(int tenantId)
    {
        _tenantId = tenantId;
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>().HasQueryFilter(x => x.TenantId == DeParameterize(_tenantId));

        base.OnModelCreating(modelBuilder);
    }
}

public class Blog
{
    public int Id { get; set; }
    public int TenantId { get; set; }
}
@roji
Copy link
Member

roji commented Aug 20, 2021

Tenant datasets may differ wildly and parameter sniffing can make a query plan generated for one tenant not suitable for another. A query plan for each tenant could be achieved via non-parameterised filters.

Can you provide more concrete information on this? Assuming a simple integer tenant ID, it's hard to imagine a case where reusing the same query plan across tenants wouldn't be desirable, no?

@stevendarby
Copy link
Contributor Author

stevendarby commented Aug 20, 2021

@roji Here is some more info on parameter sniffing. The answer to this SO post - in particular section 4 and within that the "TenantID first" bullet point - talks about it specifically in the case of tenant IDs.

Essentially the plan may be optimised for the dataset of one tenant which may not be suitable for another. In general parameterisation is good, but when the cardinality of each table can greatly differ per tenant, disabling it for tenant filters can be beneficial.

There are also times when parameterisation has to be disabled because a server function can only take a literal, e.g. JSON_VALUE in older versions of SQL Server. This post walks through how to set up EF Core to call it without parameterising the 2nd value. I admit it's quite unlikely someone might want to use that in a global query filter using a DbContext instance value, but if they did it currently wouldn't work. It'd be nice to have parity there, if possible :)

@roji
Copy link
Member

roji commented Aug 20, 2021

Thanks, that makes sense.

@smitpatel EF.Constant?

@smitpatel
Copy link
Member

No. This requires something beyond NotParameterizedAttribute. If we don't create parameter then we cannot cache the query plan.

@roji
Copy link
Member

roji commented Aug 20, 2021

I think that could be a reasonable expected behavior for a restricted number of tenant IDs, especially given that something better could be a lot more difficult to implement. But anyway.

@smitpatel
Copy link
Member

We want to put constant for tenant id but the suggested API is not something we need. Those are 2 different things.

@stevendarby
Copy link
Contributor Author

stevendarby commented Aug 22, 2021

I had a play around with replacing the RelationalParameterBasedSqlProcessor with a custom processor to achieve this and was successful - I think? I wouldn't use this in production without a close review and much more testing, but could be a potential basis for a workaround for some. I still think it'd be neat to make the ability to intentionally 'deparameterize' certain values in query filters easier to do with more public APIs.

Code
public class BloggingContext : DbContext
{
    // No .NET ConcurrentHashSet so dictionary with dummy value?
    public static ConcurrentDictionary<SqlParameterExpression, bool> ParametersToMakeConstant { get; } = new();

    private readonly int _tenantId;

    public DbSet<Blog> Blogs { get; set; }

    public BloggingContext(int tenantId)
    {
        _tenantId = tenantId;
    }

    public int MakeConstant([NotParameterized] int value) =>
        throw new NotSupportedException("For SQL translation only");

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseSqlServer(@"Server=.;Database=DeParameterize;Integrated Security=True")
            .EnableDetailedErrors()
            .EnableSensitiveDataLogging()
            .ReplaceService<IRelationalParameterBasedSqlProcessorFactory, CustomParameterBasedSqlProcessorFactory>()
            .LogTo(x => Debug.WriteLine(x), LogLevel.Information);
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>().HasQueryFilter(x => x.TenantId == MakeConstant(_tenantId));

        modelBuilder
            .HasDbFunction(GetType().GetMethod(nameof(MakeConstant))!)
            .HasTranslation(x =>
            {
                var sqlExpression = x[0];
                if (sqlExpression is SqlParameterExpression sqlParameter)
                {
                    ParametersToMakeConstant[sqlParameter] = true;
                }

                return sqlExpression;
            });
    }
}

public class Blog
{
    public int Id { get; set; }
    public int TenantId { get; set; }
}

public class CustomParameterBasedSqlProcessorFactory : SqlServerParameterBasedSqlProcessorFactory
{
    private readonly RelationalParameterBasedSqlProcessorDependencies _dependencies;

    public override RelationalParameterBasedSqlProcessor Create(bool useRelationalNulls)
        => new CustomParameterBasedSqlProcessor(_dependencies, useRelationalNulls);

    public CustomParameterBasedSqlProcessorFactory(RelationalParameterBasedSqlProcessorDependencies dependencies)
        : base(dependencies)
        => _dependencies = dependencies; // Stored in protected property on the base in newer version so won't need this in future
}

public class CustomParameterBasedSqlProcessor : SqlServerParameterBasedSqlProcessor
{
    public CustomParameterBasedSqlProcessor(
        RelationalParameterBasedSqlProcessorDependencies dependencies,
        bool useRelationalNulls) : base(dependencies, useRelationalNulls)
    {
    }

    protected override SelectExpression ExpandFromSqlParameter(
        SelectExpression selectExpression,
        IReadOnlyDictionary<string, object> parametersValues,
        out bool canCache)
    {
        selectExpression = base.ExpandFromSqlParameter(selectExpression, parametersValues, out canCache);

        var visitor = new ParameterToConstantExpressionVisitor(parametersValues);
        var updatedSelectExpression = (SelectExpression) visitor.Visit(selectExpression);

        // Don't cache if we made a parameter a constant. Not sure of performance implications of this...
        canCache = canCache && Equals(selectExpression, updatedSelectExpression);

        return updatedSelectExpression;
    }
}

public class ParameterToConstantExpressionVisitor : ExpressionVisitor
{
    private readonly IReadOnlyDictionary<string, object> _parameterValues;

    public ParameterToConstantExpressionVisitor(IReadOnlyDictionary<string, object> parameterValues)
    {
        _parameterValues = parameterValues;
    }

    public override Expression Visit(Expression node)
    {
        if (node is SqlParameterExpression sqlParameter &&
            // Could put these somewhere else to decouple from the context
            BloggingContext.ParametersToMakeConstant.ContainsKey(sqlParameter) &&
            _parameterValues.TryGetValue(sqlParameter.Name, out var value))
        {
            return new SqlConstantExpression(Expression.Constant(value), sqlParameter.TypeMapping);
        }

        return base.Visit(node);
    }
}

@ajcvickers ajcvickers added this to the Backlog milestone Aug 25, 2021
@stevendarby
Copy link
Contributor Author

stevendarby commented May 3, 2022

Re: the specific multitenant use case I gave, the information in #27922 suggests query plans in SQL Server are cached per user if the tables aren't fully qualified. So I may 'accidentally' already have what I want (as I have a user per tenant).

Just sharing this information in case it's useful for anyone else reading, but the general enhancement request might still be useful for other scenarios or providers.

@stevendarby
Copy link
Contributor Author

Just a quick note, I tried the new EF.Constant in a query filter and that doesn't work either, i.e. it's ignored and parameterised anyway.

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using System;
using System.Linq;

using var context = new MyContext { TenantIds = [42] };
await context.Database.EnsureCreatedAsync();
_ = await context.Set<Blog>().ToListAsync();

public class MyContext : DbContext
{
    public required int[] TenantIds { get; init; } 

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer("Server=(localdb)\\mssqllocaldb;Database=AdHoc;Trusted_Connection=True;Encrypt=False")
            .LogTo(Console.WriteLine, LogLevel.Information);

    protected override void OnModelCreating(ModelBuilder modelBuilder)
        => modelBuilder.Entity<Blog>()
            .ToTable("Blogs")
            .HasQueryFilter(x => EF.Constant(TenantIds).Contains(x.TenantId));
}

public class Blog
{
    public int Id { get; set; }
    public int TenantId { get; set; }
}
SELECT [b].[Id], [b].[TenantId]
FROM [Blogs] AS [b]
WHERE [b].[TenantId] IN (
    SELECT [e].[value]
    FROM OPENJSON(@__ef_filter__TenantIds_0) WITH ([value] int '$') AS [e]
)

@stevendarby stevendarby changed the title Allow [NotParameterized] to work for query filters Allow forcing a constant in query filters Mar 11, 2024
@roji roji self-assigned this Mar 12, 2024
@roji
Copy link
Member

roji commented Mar 12, 2024

@stevendarby yes, this is a indeed a limitation of EF.Constant()...

When processing a normal query (not a query filter), EF.Constant() gets applied by the funcletizer, which does its work up-front, before the query cache lookup is performed. That means that the constant is integrated into the query tree, and would cause a cache miss and therefore the query would be compiled as a completely new/separate query (as if you actually put a constant in the tree instead of EF.Constant(whatever).

However, query filters are integrated into the tree later - after we've already gone through the query cache (in NavigationExpandingExpressionVisitor during preprocessing, to be precise). If EF.Constant() actually worked, the query tree - with the whatever constant value happened to be there on first execution - would get cached, and later executions would use the same constant value. Effectively query filters are integrated too late into the query tree (but unfortunately they can't be applied earlier, since query filters require knowledge of entities/the model, which we don't have in such early stages). To make this work would require another kind of mechanism which we don't currently have.

We should at least block usage of EF.Constant() (and [NotParamerized]) from query filters, so this is more obvious.

@roji
Copy link
Member

roji commented Mar 12, 2024

Note: we could change the implementation of EF.Constant(), so that it allows a parameter to be embedded in the query tree, but marks it to be constantized in RelationalParameterBasedSqlProcessor (this implements @stevendarby's suggestion above in a more correct). Since we'd be sniffing parameters, we'd have to disable the 2nd query cache (but that's generally acceptable).

Aside from allowing EF.Constant() to be used in query filters and precompiled queries, this would make it considerably more efficient, as we'd only be rerunning the 2nd part of the query pipeline for each execution, and not the entire pipeline (including shaper generation).

@roji
Copy link
Member

roji commented Mar 12, 2024

@stevendarby are you interested in taking a stab at the above?

@stevendarby
Copy link
Contributor Author

@roji I'll have a stab at it sometime over the next couple of weeks. Will post back if it's not working out 😄

@roji
Copy link
Member

roji commented Mar 17, 2024

@stevendarby OK 👍 This isn't trivial.

One thing that comes to mind, is that this would also need to be implemented in each provider (e.g. Cosmos), compared to the current implementation which works universally. I still think it makes a lot of sense though.

@stevendarby
Copy link
Contributor Author

@roji sorry, not had much time to look at this and the bit of time I have, like you said, it's not trivial- so bowing out 🙂

@roji
Copy link
Member

roji commented Apr 12, 2024

@stevendarby no problem, thanks for posting back. I've been thinking about modifying the EF.Constant implementation so as to make it a better per-query opt out of OPENJSON (for #32394), and this may connect to some other ideas as well. I (or someone else from the team) may end up implementing this for 9.0, we'll see...

@Karlssn

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@roji
Copy link
Member

roji commented May 3, 2024

Note: split out the reimplementation of EF.Constant to #33674.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants