Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GraphQL session MARS/DataReader is open problem #9330

Closed
tropcicstefan opened this issue Apr 30, 2021 · 26 comments · Fixed by #11536
Closed

GraphQL session MARS/DataReader is open problem #9330

tropcicstefan opened this issue Apr 30, 2021 · 26 comments · Fixed by #11536
Labels
Milestone

Comments

@tropcicstefan
Copy link
Contributor

tropcicstefan commented Apr 30, 2021

Describe the bug

Resolving graphQLFields of content part through DataLoader causes MARS or Datareader is open exception.

To Reproduce

Latest dev branch
Make content Part with two taxonomy fields(JournalistCategory, ArticleCategory). Make custom GraphQLObjectType that exposes those data as two Field<ListGraphType<ContentItemInterface>, ContentItem[]> and resolve those fields querying DB for linked taxonomy and parsing it to get that specific ContentItem. It appears as having two or more graphQLFields when resolving through ISession is too much

`

public class CustomCategoryObjectType : ObjectGraphType<CustomCategoryPart>
{
    public CustomCategoryObjectType(IStringLocalizer<CustomCategoryObjectType> S)
    {
        Name = "CustomCategory";
        Description = S["Alternative part definition"];
        Field<ListGraphType<ContentItemInterface>, ContentItem[]>()
            .Name("journalistMetadata")
            .Description("Exposed journalist taxonomy metadata")
            .PagingArguments()
            .ResolveAsync(context =>
            {
                var serviceProvider = context.ResolveServiceProvider();
                var accessor = serviceProvider.GetRequiredService<IDataLoaderContextAccessor>();
                var session = serviceProvider.GetService<ISession>();

                var contentItemLoader = accessor.Context.GetOrAddBatchLoader<string, ContentItem>("GetJournalistsById",
                    ci => LoadTaxContentItems(ci, session, Const.Taxonomy.Journalist));
                return contentItemLoader.LoadAsync(context.Page(context.Source.JournalistCategory.TermContentItemIds));
            });
        Field<ListGraphType<ContentItemInterface>, ContentItem[]>()
            .Name("categoryMetadata")
            .Description("Exposed category taxonomy metadata")
            .PagingArguments()
            .ResolveAsync(context =>
            {
                var serviceProvider = context.ResolveServiceProvider();

                var accessor = serviceProvider.GetRequiredService<IDataLoaderContextAccessor>();
                var session = serviceProvider.GetService<ISession>();

                var contentItemLoader = accessor.Context.GetOrAddBatchLoader<string, ContentItem>("GetCategoryById",
                    ci => LoadTaxContentItems(ci, session, Const.Taxonomy.Category));
                return contentItemLoader.LoadAsync(context.Page(context.Source.ArticleCategory.TermContentItemIds));
            });
    }

    private async Task<IDictionary<string, ContentItem>> LoadTaxContentItems(
        IEnumerable<string> contentItemIds,
        ISession session,
        string taxItemId)
    {
        if (contentItemIds is null || !contentItemIds.Any())
        {
            return new Dictionary<string, ContentItem>();
        }

        var tax = await session
            .Query<ContentItem, ContentItemIndex>(y =>
                y.ContentItemId == taxItemId
                && y.Published)
            .FirstOrDefaultAsync();

        if (tax is null)
        {
            return new Dictionary<string, ContentItem>();
        }

        var contentItemsLoaded = tax.As<TaxonomyPart>().Terms.Where(x => contentItemIds.Contains(x.ContentItemId)).ToList();

        return contentItemsLoaded.ToDictionary(k => k.ContentItemId, v => v);
    }
}

`
...call GraphQL with both fields in query

query MyQuery { article(first: 10) { customCategory { categoryMetadata { displayText } journalistMetadata { displayText } } } }

Expected behavior

#4844 if we should use locking resolver why aren't content picker field and list part using it. probably type with list part and cpf can cause the same exception

Splitting fields in separate parts makes this issue occur less frequently

@carlwoodhouse
Copy link
Member

carlwoodhouse commented Apr 30, 2021 via email

@tropcicstefan
Copy link
Contributor Author

But LockedAsyncFieldResolver cannot be used in combination with DataLoader, right?

@carlwoodhouse
Copy link
Member

it should be able too, all it does is force a lock to stop parallel db calls (so eliminating the mars issue) - however looking through the code as you say the list parts should use it.

Honestly its a long time since i looked at it so memory is a bit hazy .... and im away atm so cant look properly on phone. but can investigate when im back.

(again it will be fixed anyway by the graphqldotnet uplift we're working on which has much better first class di support)

@tropcicstefan
Copy link
Contributor Author

tropcicstefan commented Apr 30, 2021

it should be able too, all it does is force a lock to stop parallel db calls (so eliminating the mars issue)

Changing ResolveAsync to ResolveLockedAsync with DataLoaders appears to cause a deadlock because request just "hangs".

...looking forward for upgrade

@carlwoodhouse
Copy link
Member

carlwoodhouse commented Apr 30, 2021 via email

@tropcicstefan
Copy link
Contributor Author

tropcicstefan commented May 1, 2021

I tried passing context or service provider and in both cases error is still occurring.
I'll switch back to LockedAsyncFieldResolver but I had similar problem with it back in october with this piece of code, when querying content type that has two resolvers like below (with different query). Do you see any problem here?
It occurred only in production in busy period of they day, for news portal.

.ResolveLockedAsync(async g =>
      {
          var serviceProvider = g.ResolveServiceProvider();
          var queryService = serviceProvider.GetRequiredService<IQueryService>();
      
          var queryParametersDict = new Dictionary<string, object>() {
              { "articleAlias", g.GetArgument<string>("articleAlias") }
          };
          var queryParam = JsonConvert.SerializeObject(queryParametersDict);
      
          return await queryService.GetItemsQueryAsync(Query.SummarySections, queryParam);
      })
public class QueryService : IQueryService
    {
        private readonly IQueryManager _queryManager;

        public QueryService(IQueryManager queryManager)
        {
            _queryManager = queryManager;
        }

        public async Task<IEnumerable<ContentItem>> GetItemsQueryAsync(string queryName, string queryParam)
        {
            var query = await _queryManager.GetQueryAsync(queryName);

            if (query == null)
            {
                throw new ArgumentException("There is no query with that name");
            }

            var queryParametersDict = queryParam != null ?
                JsonConvert.DeserializeObject<Dictionary<string, object>>(queryParam)
                : new Dictionary<string, object>();

            var result = await _queryManager.ExecuteQueryAsync(query, queryParametersDict);
            var items = result.Items
                .Cast<ContentItem>();

            return items;
        }
    }

@sebastienros
Copy link
Member

Try to lock on the session object by yourself, and also use GetAwaiter().GetResult() in the LoadTaxContentItems method though this might cause a thread pool exhaustion on high loads. This looks like this method is not awaited by GraphQL.NEt which is something we had to deal with by the past, so hopefully the new version fixes it.

@Skrypt
Copy link
Contributor

Skrypt commented Oct 29, 2021

@tropcicstefan Is this issue still relevant?

@MikeKry
Copy link
Contributor

MikeKry commented Mar 20, 2022

So if this issue is still active and cant be fixed by adding LockedAsyncFieldResolver, won't we finish the PR for GQL4 #9087 which should address this?

Big problem is, that this issue can occur unexpectedly - it happens only in combination with some field which is using dataloader (e.g. TaxonomyField), so it may not be found in development of custom field and then it is complicated to find root cause of error, because error usually does not happen in your custom field, but points to TaxonomyField.. how should you find it when you have several custom fields..

I would love to help finish that, but I lack enough information about dataloaders etc. My primary focus is to use OC as headless and I feel that this should be resolved, not sure if most people use OC as monolith instead of headless?

@Skrypt
Copy link
Contributor

Skrypt commented Mar 20, 2022

Feel free to open a Pull Request that targets ag/graphql4 branch.

@MikeKry
Copy link
Contributor

MikeKry commented Mar 21, 2022

@Skrypt

I did some work on that PR too, but I am stuck, because there are some problems with dataloaders (n+1 issue, this problem..) that I do not have information about (how to test them e.g.) and seems like nobody will provide info - I do not know how to fix something I have no info about. Is there a way to contact someone who knows more about it directly, so I could potentially fix it?

@tropcicstefan
Copy link
Contributor Author

tropcicstefan commented Mar 21, 2022

@MikeKry
If you are referencing @carlwoodhouse about n+1 problem in your pr, I think he just said that dataloaders should be tested. You can test dataloaders with content item(articles) with content picker field(linked articles) and startup SQL Server profiler. In there you should see only one query regarding content picker field. So if you query for 10 articles there should be 1 query for linked articles with long where statement, and not 10 queries with short where :D

Regarding this issue:
My assumption was I need ResolveLockedAsync when using ISession. Truth is it depends, when you want to resolve field by field(introducing n+1), without dataloader, you should use ResolveLockedAsync, but with dataloaders you should use ResolveAsync and lock inside dataloader, in delegate when you use ISession.

So way to resolve this issue is either by:
a) updating docs and saying hey when writing custom graphql fields with dataloader lock on ISession, update current uses of dataloader i.e. ContentPickerField with locking.
b) write framework implementation of IDataLoader binded to YesSQL with lock on ISession

...would like to have another pair of eyes on my take

@carlwoodhouse
Copy link
Member

carlwoodhouse commented Mar 21, 2022 via email

@carlwoodhouse
Copy link
Member

carlwoodhouse commented Mar 21, 2022 via email

@MikeKry
Copy link
Contributor

MikeKry commented Mar 29, 2022

I'll try to look at it, in meantime I found another problem which should be looked upon while resolving this issue (but maybe I just need some lock here? trying to solve for now by using concurrent collections):

When calling ResolveAsync and using dataloder with casting to part:
image

I get concurrent exception

System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. 
   A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
    at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
         at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
            at OrchardCore.ContentManagement.ContentExtensions.Get(ContentElement contentElement, Type contentElementType, String name)
               at OrchardCore.ContentManagement.ContentExtensions.Get[TElement](ContentElement contentElement, String name)
                 at OrchardCore.ContentManagement.ContentItemExtensions.As[TPart](ContentItem contentItem)

It happens inside .As<> extension method,,

Edit:
This will be resolved by removing dataloaders

@sebastienros
Copy link
Member

Are data loaders invoked in parallel threads? That would explain the issue. If it's configurable it should be "serial". This is a common issue we had years ago with the defaults in GrapQL.NET and we explicitly use serial mode for some settings since then.

@sebastienros sebastienros added this to the backlog milestone Apr 7, 2022
@sebastienros
Copy link
Member

Moving to backlog since there is some work going on with upgrading the library.

@MikeKry
Copy link
Contributor

MikeKry commented Apr 8, 2022

@sebastienros

I looked into this during analysis while doing changes in GQL4 update

I think it might be also caused by cache - dataloaders by default return cached result for same key, so it is possible that it is being used by multiple functions.

There could be two solutions
a/ create full dataloader for GetOrAddPublishedContentItemByIdDataLoader and disable cache
b/ use some concurrent collection instead of loopuk in DataLoaderExtensions.LoadPublishedContentItemsAsync

If you agree, I can try one (or both) of them in #11499

@MikeKry
Copy link
Contributor

MikeKry commented Apr 11, 2022

I ran into another issue, which seems that it might not be related directly to dataloaders.. Happens only sometimes on production env. maybe when there are multiple requests at same time.

It is again related to .As<> (could there be some bug? I am running on OC 1.3), but no dataloaders involved inside LayerQueryObjectType. There are content pickers inside widgets, so it may be same issue, just hidden under this error message.

  "errors": [
    {
      "message": "GraphQL.ExecutionError: Error trying to resolve widgets.\r\n ---> System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.\r\n   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)\r\n   at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)\r\n   at OrchardCore.ContentManagement.ContentExtensions.Get(ContentElement contentElement, Type contentElementType, String name)\r\n   at OrchardCore.ContentManagement.ContentExtensions.Get[TElement](ContentElement contentElement, String name)\r\n   at OrchardCore.ContentManagement.ContentItemExtensions.As[TPart](ContentItem contentItem)\r\n   at OrchardCore.Layers.GraphQL.LayerQueryObjectType.<>c__DisplayClass0_0.<.ctor>b__5(ContentItem item)\r\n   at System.Linq.Enumerable.WhereListIterator`1.MoveNext()\r\n   at GraphQL.Execution.ExecutionStrategy.SetArrayItemNodes(ExecutionContext context, ArrayExecutionNode parent)\r\n   at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)\r\n   --- End of inner exception stack trace ---",
      "locations": [
        {
          "line": 61,
          "column": 5
        }
      ],
      "path": [
        "siteLayers",
        "0",
        "widgets"
      ],
      "extensions": {
        "code": "INVALID_OPERATION"
      }
    }
  ]

@jtkech
Copy link
Member

jtkech commented Apr 12, 2022

Yes if shared objects are mutated in concurrent requests they would need to be thread safe or would need e.g. to retrieve clones from the cache, but still possible that it happens in the same request scope if there are non awaited tasks, in that case scope objects can be still concurrently mutated.

As mentioned in other issues, most of the time we have MARS issues it is due to non awaited tasks. Related to GraphQl just saw the following using a Task.WhenAll(tasks), a WhenAll() waits that all tasks complete but tasks are not waiting for each others. So if related, would need to use e.g. a simple foreach loop.

public static Task<T[]> LoadAsync<TKey, T>(this IDataLoader<TKey, T> dataLoader, IEnumerable<TKey> keys)
{
var tasks = new List<Task<T>>();
foreach (var key in keys)
{
tasks.Add(dataLoader.LoadAsync(key));
}
return Task.WhenAll(tasks);
}

The above being used by ContentPickerFieldQueryObjectType, do you have ContentPickerFields?

Maybe related

@jtkech
Copy link
Member

jtkech commented Apr 12, 2022

Hmm, maybe a good chance that the usage of Task.WhenAll(tasks) in ContentPickerFieldQueryObjectType explains the MARS and collection accesses issues, this is the only place where I found a remaining Task.WhenAll(tasks).

I think it is worth to try, I may suggest a PR when I will have time

@MikeKry
Copy link
Contributor

MikeKry commented Apr 13, 2022

@jtkech

yes, for now I have seen it only in queries that contain (more than one) ContentPickerFields (or custom fields with same dataloader), so it might be this.

@tropcicstefan
Copy link
Contributor Author

That is not a solution because how dataloaders work and that method isn't called anywhere.
Look at #9769

The solution for this issue can be resolved in graphql version update branch with SerialDocumentExecuter like this

public class SerialDocumentExecuter : DocumentExecuter
{
    private static IExecutionStrategy ParallelExecutionStrategy = new ParallelExecutionStrategy(); 
    private static IExecutionStrategy SerialExecutionStrategy = new SerialExecutionStrategy();
    private static IExecutionStrategy SubscriptionExecutionStrategy = new SubscriptionExecutionStrategy();

    protected override IExecutionStrategy SelectExecutionStrategy(ExecutionContext context)
    {
        return context.Operation.OperationType switch
        {
            OperationType.Query => SerialExecutionStrategy,
            OperationType.Mutation => SerialExecutionStrategy,
            OperationType.Subscription => SubscriptionExecutionStrategy,
            _ => throw new InvalidOperationException($"Unexpected OperationType {context.Operation.OperationType}"),
        };
    }
}

@jtkech
Copy link
Member

jtkech commented Apr 15, 2022

@tropcicstefan

Yes the method I changed isn't called anywhere, this is the one defined just before that uses a Task.WhenAll() and that is called in ContentPickerFieldQueryObjectType, was too tired ;)

In fact I updated the wrong method, will update it tomorrow.

@jtkech
Copy link
Member

jtkech commented Apr 16, 2022

@tropcicstefan Okay I better understand now, thanks.

@tropcicstefan @MikeKry So I will revert #11536, see my comment #11548 (comment), also about a possible solution.

@MikeKry
Copy link
Contributor

MikeKry commented Apr 16, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants