Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overfetching problem when querying related entities and using intermediary projections #2373

Open
aradalvand opened this issue Sep 24, 2020 · 67 comments
Assignees
Labels
Area: Data Issue is related to filtering, sorting, pagination or projections Area: Projections 🌶️ hot chocolate 🔍 investigate Indicates that an issue or pull request needs more information. 📌 pinned
Projects
Milestone

Comments

@aradalvand
Copy link
Contributor

aradalvand commented Sep 24, 2020

Following #2365 (comment)
I have Book and Author domain classes + BookDto and AuthorDto classes, I want to expose an IQueryable<BookDto> for the books field on my query root type.
So here is the GetBooks method on the Query class: It simply projects Book into BookDto and returns an IQueryable<BookDto> so that Hot Chocolate could then do its magic on it.

[UseSelection]
public IQueryable<BookDto> GetBooks([Service]AppDbContext dbContext)
{
    return dbContext.Books.Select(book => new BookDto
    {
        Name = book.Name,
        ...
        Author = new AuthorDto
        {
            Id = book.Author.Id,
            Name = book.Author.Name,
        }
    });
}

Now, when you run the following query:

{
  books {
    name
    author {
      name
    }
  }
}

You would probably expect the following SQL to ultimately get generated:

SELECT [b].[Name], [a].[Name]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

However, instead, Hot Chocolate causes this one to get generated by EF Core:

SELECT [b].[Name], [a].[Id], [a].[Name], [a].[Age]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

Which retrieves all the fields on the author (including Age and Id in this case), even though the query requested only the Name.

At first, I thought this must be an EF Core issue, but then I executed the following code:

var queryableDto = dbContext.Books.Select(book => new BookDto
{
    Name = book.Name,
    Author = new AuthorDto
    {
        Id = book.Author.Id,
        Name = book.Author.Name,
        Age = book.Author.Age
    }
});
// The code that you would expect Hot Chocolate will generate for the above GraphQL query:
queryableDto.Select(bDto => new
{
    Name = bDto.Name,
    Author = new
    {
        Name = bDto.Author.Name,
    }
}).ToList();

And I realized it works as expected, and generates the following SQL which only selects the requested column (author name in this case):

SELECT [b].[Name], [a].[Name]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

Therefore, this must be related to Hot Chocolate, I believe.
This is really a serious problem for situations like mine when we don't want to expose IQueryable<TheEntityFrameworkModelClass> but rather a different type.

I'm using Hot Chocolate version 10.5.2
And I've tried this with both EF Core version 3.1.8 and 5.0.0 RC.

@aradalvand aradalvand changed the title Fetching unneccsary columns when using DTOs. Fetching unneccsary columns when projecting to a new class Sep 24, 2020
@PascalSenn
Copy link
Member

i will track this down in the new implementation for 11.

It should just work, because we prjoect like this:

{
  books {
    name
    author {
      name
    }
  }
}
.Select(x=> new BookDto
            {
                Name = x.Name,
                Author = new AuthorDto
                { 
                    Name = x.Author.Name,
                }
            })

@PascalSenn PascalSenn self-assigned this Sep 24, 2020
@PascalSenn PascalSenn added 🔍 investigate Indicates that an issue or pull request needs more information. 🌶 hot chocolate labels Sep 24, 2020
@PascalSenn PascalSenn added this to the HC-11.0.0 milestone Sep 24, 2020
@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 24, 2020

@PascalSenn Ok, great to hear. Then we'll have to wait for v11 before we can deploy our app so that this issue is fixed until then. Thanks.

It should just work, because we prjoect like this:

Well, as I mentioned here, running the code that you said that Hot Chocolate would generate from the GraphQL query:

queryableDto.Select(bDto => new
{
    Name = bDto.Name,
    Author = new
    {
        Name = bDto.Author.Name,
    }
}).ToList();

would NOT generate the SQL that Hot Chocolate causes to be generated. I talked about this in the original comment and also here. So I think that's probably not exactly how the projection works in Hot Chocolate.
And I feel like this might be a little mistake or something somewhere, it's probably not that big of a deal on your side, I don't know, I'm just guessing! But it is, however, a very big deal for our side! Since all the data of a related entity gets fetched unnecessarily when you request even just one column. So, I'd appreciate it if you look into this in the coming days! Thank you so much.

@PascalSenn
Copy link
Member

Does this query generate the desired result?

queryableDto.Select(x=> new BookDto
            {
                Name = x.Name,
                Author = new AuthorDto
                { 
                    Name = x.Author.Name,
                }
            }).ToList();

@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 24, 2020

@PascalSenn Yes, that query would generate the following SQL:

SELECT [b].[Name], [a].[Name]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

which is perfect. It only retrieves the name of the book and the name of the author, as requested.

However, sending this GraphQL query to our Hot Chocolate server:

{
  books {
    name
    author {
      name
    }
  }
}

would generate this SQL instead:

SELECT [b].[Name], [a].[Id], [a].[Name], [a].[Age]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

which as you can see is unnecessarily fetching the Id and Age of the author, while they were not queried.

I can also create a public GitHub repo and push this project into it so you can clone it and look into it and into the generated queries yourself. A super-simple project with less than 15 classes in total.

@PascalSenn
Copy link
Member

Yes a repro would be great

Are there any custom resolvers on author?

@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 24, 2020

@PascalSenn Sure! I'll do it and let you know.
And no, nothing, it's a just class with a handful of properties. I've added nothing HotChocolate-specific to it at all.
I'll create the repo shortly ;)

@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 24, 2020

@PascalSenn Here you go, sir: https://github.com/AradAral/AspNetCoreGraphQL

Edit: I added some seed data for the database so if you want to clone the project you can just adjust the connection string and then run one initial migration, as you know, and then you can use the GraphQL endpoint and inspect the generated SQL queries. I also added a MyController, which manually does the querying; you can check the query that that generates too.
Thanks in advance.

@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 24, 2020

@PascalSenn: :)

Untitled Project

@PascalSenn
Copy link
Member

@AradAral cool, the repro is definitely helpful.
Checked the configuration didn't see anything obvious wrong with it.

@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 24, 2020

@PascalSenn I just noticed something else which I think might be important:
I removed the UseSelection middleware from the GetBooks method, but it still gave me the author data if I queried it, which normally wouldn't happen unless you use the UseSelection middleware.
Do you think that can tell us anything about the issue?

@PascalSenn
Copy link
Member

Did you also remove it from the resolver declared in the QueryType?

It is probably always requested in this case, because Select statement that maps from object to dto projects it already

@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 24, 2020

Yes, I actually removed the whole QueryType class and it's now just the Query class, without the UseSelection attribute on top of the resolver.
Now that the UseSelection middleware is not present irrespective of what field I query everything (including the author data) is retrieved.

@aradalvand
Copy link
Contributor Author

aradalvand commented Sep 26, 2020

@PascalSenn
I think I just found out what's causing the problem:
By building an expression visitor and inspecting the expression that Hot Chocolate passes to the Select method, I just realized that it (the expression Hot Chocolate generates) looks like this:

e => new BookDto()
    {
        Name = e.Name,
        Author = e.Author == null ? default(AuthorDto) : new AuthorDto() { Name = e.Author.Name }
    }

The important bit is where it's checking whether Author is null or not (e.Author == null ? ...), that, as it turns out, is exactly what's causing the whole author object to be fetched from the database. This works properly with EF Core domain classes, in other words, if e.Author was an entity class, but not with DTO classes like this.

So, now, what you do think can be done about this? Does Hot Chocolate allow us to disable null checks so that it leaves it up to me to check for nulls, or perhaps do the null check differently so as to avoid this, or really anything that could solve this problem somehow?
Thank you in advance!

@aradalvand
Copy link
Contributor Author

aradalvand commented Nov 10, 2020

Based on the analysis above, I'll close this issue.
For anyone curious, I eventually ended up writing a custom expression visitor that replaces any occurrence of e.Author == null that Hot Chocolate generates to e.Author.Id == null, so as to avoid the aforementioned over-fetching problem.

@aradalvand
Copy link
Contributor Author

@PascalSenn Hi!
I didn't know that, what change did you make to fix this? I thought there was nothing you could do about it.

@PascalSenn
Copy link
Member

Sorry i messed something up.
This could still be an issue

@PascalSenn
Copy link
Member

we just fixed UseProjections to do this:
Author = e.Author == null ? default(AuthorDto) : new AuthorDto() { Name = e.Author.Name }

I need to check what it does with reprojection

@aradalvand
Copy link
Contributor Author

@PascalSenn
But isn't Author = e.Author == null ? default(AuthorDto)... what HC already did in the previous versions? What has changed exactly?

@PascalSenn
Copy link
Member

PascalSenn commented Nov 10, 2020

Yes, but UseProjection did something different.
It returned null values.
I was thinking this issue was related to that, then i read more into it :)
sorry for the messup

@aradalvand
Copy link
Contributor Author

@PascalSenn Aha! Okay, got it!
No problem ;)

@gojanpaolo
Copy link

gojanpaolo commented Jan 6, 2021

It looks like there's still an issue on deeply nested entities.

[UseProjection]
public IQueryable<BookDto> GetBooks([Service]AppDbContext dbContext)
{
    return dbContext.Books.Select(book => new BookDto
    {
        Name = book.Name,
        ...
        Author = new AuthorDto
        {
            Id = book.Author.Id,
            Name = book.Author.Name,
            Address = new AddressDto
            {
                Street = book.Author.Address.Street
            }
        }
    });
}

where the Address is still joined when we only query like so

{
  books {
    name
    author {
      name
    }
  }
}

@aradalvand
Copy link
Contributor Author

aradalvand commented Jan 6, 2021

@gojanpaolo Hi!
It's actually not just about deeply nested objects. Whenever you have something like [Something] = new [Something]Dto { } in your projection that you then return in your resolvers, even one field getting requested by the GraphQL client on that [Something]Dto causes the whole object to be retrieved from the database. I explained why that happens here.

The reason why in your case Address is being retrieved is because, again, you're querying Author.Name and consequently the whole Author object is getting retrieved, which means everything inside of it is getting retrieved, including every nested object, like Address in your case.

@gojanpaolo
Copy link

I was expecting the projection to behave similar to

var dto = 
   dbContext.Books.Select(book => new BookDto
    {
        Name = book.Name,
        ...
        Author = new AuthorDto
        {
            Id = book.Author.Id,
            Name = book.Author.Name,
            Address = new AddressDto
            {
                Street = book.Author.Address.Street
            }
        }
    });

var result =
    dto.Select(bookDto => new
    {
        Name = bookDto.Name,
        Author = new
        {
            Name = bookDto.Author.Name
        }
    })
    .ToList();

which EF correctly translates to the SQL query I expected, i.e. no join on the Address.

I believe this is something we can fix either in hotchocolate/UseProjection itself, or in my graphql layer code.

@jannikbuschke
Copy link

Im also experiencing overfetching:

the query

users { id }

works fine and produces the following SQL

SELECT [u].[Id]
FROM [User] AS [u]

whereas this query

users {
  id
  author { name }
}

unecessary loads more data:

SELECT [u].[Id], [a].[Id], [a].[Name], [a].[UserId], [b].[Id], [b].[AuthorId], [b].[Title]
FROM [User] AS [u]
LEFT JOIN [Author] AS [a] ON [u].[Id] = [a].[UserId]
LEFT JOIN [Book] AS [b] ON [a].[Id] = [b].[AuthorId]
ORDER BY [u].[Id], [a].[Id], [b].[Id]
[UseProjection]
public IQueryable<User> Users([Service] DataContext ctx)
{
     return ctx.User
         .Include(v => v.Author)
         .ThenInclude(v => v.Books);
}

Using efcore 5 and hot chocolate 11

@aradalvand
Copy link
Contributor Author

aradalvand commented Apr 19, 2022

@PascalSenn

Because it doesnt solve the issue?

It does. That's what I'm currently doing — via a custom expression visitor, as I explained before — and it's working.

This will never be true e.Author.Id == null. Unless the ID of author is null and i doubt that this is the intend

This is just a way to tell EF Core to check whether or not an Author exists or not, since it doesn't understand e.Author itself, if you do the null check on that, it will just fetch the whole thing. But if you do .Id, it will look at the intermediate projection and find out what the Id property maps to, and consequently generate the right SQL query.
It does work.

@aradalvand
Copy link
Contributor Author

aradalvand commented Apr 19, 2022

@PascalSenn I can create a quick repro for you that demonstrates this, if you want. Let me know.

@PascalSenn
Copy link
Member

@aradalvand I added a unittests with a small playground in the pr #4970. Can you add it there?

@aradalvand aradalvand changed the title Fetching unneccsary columns when projecting to a new class Overfetching problem when querying related entities and using intermediary projections Apr 23, 2022
@stale stale bot added the ⌛ stale Nothing happened with this issue in quite a while label Jun 22, 2022
@stale stale bot removed the ⌛ stale Nothing happened with this issue in quite a while label Jun 22, 2022
@ChilliCream ChilliCream deleted a comment from stale bot Jun 22, 2022
@ChilliCream ChilliCream deleted a comment from aradalvand Jun 22, 2022
@kresimirlesic
Copy link

@PascalSenn @michaelstaib has there been any progress on this issue?

@dotarj
Copy link

dotarj commented Oct 15, 2022

@aradalvand In your comment on 10 November 2020 you mentioned that you wrote a custom expression visitor replacing occurrences of e.Author == null. I have a related problem in which I also need to modify the expression tree. Could you share a snippet of how you wrote this expression visitor and where you registered it?

@aradalvand
Copy link
Contributor Author

aradalvand commented Oct 15, 2022

@dotarj See https://gist.github.com/aradalvand/9b70e8bb455f5398affba610f2a25625

You would then need to do .AsTranslatable() before returning your IQueryable from your resolvers.

@kresimirlesic
Copy link

@michaelstaib any plans to get this one on the roadmap now that the v13 is live? :)

@tobias-tengler tobias-tengler added 🌶️ hot chocolate Area: Data Issue is related to filtering, sorting, pagination or projections labels Feb 9, 2023
@aradalvand
Copy link
Contributor Author

Hey, any updates on this? Won't you consider it for v14?

@michaelstaib
Copy link
Member

@aradalvand I added a unittests with a small playground in the pr #4970. Can you add it there?

I think this was the latest state ...

From our side this has no traction at the moment.

From my perspective on the GraphQL core I do not want a Key descriptor. I also think its not necessary ....

Can you follow up on pascals question?

@aradalvand
Copy link
Contributor Author

aradalvand commented Mar 29, 2023

@michaelstaib

I think this was the latest state ...

Not really, this was the latest state — you said:

This is exactly why we are so opposed to the key solution. I will talk to Shay from the EF team about this. HotChocolate.Data will get a major update for 13 and we have these issues on our list to solve.

I'd be curious to know if you actually did speak to Shay about this, and if so then what was the result?

I also wanted to point out #2365, if it's implemented it likely solves this whole class of problems, please consider that too. That was the very first issue I opened about this sort of thing, and I suspect it may be simpler to actually implement.

@michaelstaib
Copy link
Member

We did at some point discuss projections. But the focus was more on Keyset pagination and nested queries. Overall we did not revamp projections for Hot Chocolate 13 and postponed it for some point later down the road.

There is a new project from Microsoft that does true binding to the database. Called Azure Data API Builder which under the hood uses Hot Chocolate but does full projections.

Might this be an alternative for you?

For us projections at this point has no priority for this year as we are busy tackling other big topics like AOT, Fusion, Authorization, and related topics.

We still have the larger overhaul of projections on our mind but pushed it back to later this year beginning of next year. Not sure yet. Depends also always on the need.

@michaelstaib
Copy link
Member

Here is the link of to the project:
https://github.com/Azure/data-api-builder

@michaelstaib
Copy link
Member

michaelstaib commented Mar 29, 2023

You also could start a community project bringing in the missing parts... with type interceptors HC is highly extendable.
If you do so we can help guide you to get this working.

@aradalvand
Copy link
Contributor Author

aradalvand commented Mar 29, 2023

You also could start a community project bringing in the missing parts... with type interceptors HC is highly extendable.
If you do so we can help guide you to get this working.

Yeah I'm thinking about giving that a try.

Though HC's lower-level components are generally not documented that well so I need some help as you said, where can I ask questions? The Slack channel?

@michaelstaib
Copy link
Member

Slack channel is a good place ... we can have a chat together with pascal and charter a way forward.

@kresimirlesic
Copy link

Hey everyone, was any progress made on this topic?

@aradalvand
Copy link
Contributor Author

aradalvand commented Aug 29, 2023

Hi @kresimirlesic, see the latest dicussion at #6002 and specifically this comment of mine.
I managed to build a plugin-esque thing that accomplishes all of this, which I'm actually really satisfied with — you can check out the prototype here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Data Issue is related to filtering, sorting, pagination or projections Area: Projections 🌶️ hot chocolate 🔍 investigate Indicates that an issue or pull request needs more information. 📌 pinned
Projects
On-Deck
Pascal