Pagination is broken in presence of generators or multiple providers #68

@Torvin

RequestHelper.QueryWithContinuation() doesn't implement pagination correctly: it retains the continuation parameters from previous requests. For example, suppose the API returns these continuation objects in 2 consecutive responses:

"continue": {
    "clcontinue": "aaa",
    "continue": "||"
}
"continue": {
    "gcmcontinue": "bbb",
    "continue": "gcmcontinue||"
}

The next request sent by the library will have

"clcontinue": "aaa"
"gcmcontinue": "bbb",
"continue": "gcmcontinue||"

That is, the stale "clcontinue": "aaa" is still sent, even though the latest response no longer contains it. This results in incomplete data.
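For comparison, here is a minimal sketch of the expected behavior. It assumes each follow-up request should be the original query parameters plus only the values from the most recent "continue" object. `ContinuationSketch` and `BuildNextRequest` are hypothetical names used for illustration; they are not part of the library, and the actual fix in RequestHelper may look different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the library.
static class ContinuationSketch
{
    // Builds the parameters for the next request from the ORIGINAL query
    // parameters and the continuation object of the LATEST response only.
    // Continuation values from earlier responses are never carried over.
    public static Dictionary<string, string> BuildNextRequest(
        IReadOnlyDictionary<string, string> baseParameters,
        IReadOnlyDictionary<string, string> latestContinuation)
    {
        var next = new Dictionary<string, string>();

        // Start from a fresh copy of the original parameters.
        foreach (var pair in baseParameters)
            next[pair.Key] = pair.Value;

        // Overlay only the most recent continuation values.
        foreach (var pair in latestContinuation)
            next[pair.Key] = pair.Value;

        return next;
    }
}
```

With the two continuation objects above, the request built after the second response would contain "gcmcontinue": "bbb" and "continue": "gcmcontinue||" but no "clcontinue", because the result is rebuilt from the original parameters each time instead of accumulating earlier continuation values.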

I couldn't easily reproduce this issue with the standard providers, but it is reproducible with this CategoryPropertyProvider, which I wrote to retrieve the categories of a page (the original CategoryPropertyProvider is marked as internal, I don't know why):

class CategoryPropertyProvider : WikiPagePropertyProvider<CategoryPropertyGroup>
{
    public override string PropertyName => "categories";
    public int PaginationSize { get; set; }

    public override IEnumerable<KeyValuePair<string, object>> EnumParameters(MediaWikiVersion version)
    {
        yield return KeyValuePair.Create("clshow", "!hidden" as object);
        yield return KeyValuePair.Create("cllimit", PaginationSize as object);
    }

    public override CategoryPropertyGroup ParsePropertyGroup(JObject json)
    {
        return new CategoryPropertyGroup(json[PropertyName]?.Select(x => x.Value<string>("title")).ToArray());
    }
}

class CategoryPropertyGroup : WikiPagePropertyGroup
{
    public CategoryPropertyGroup(IReadOnlyList<string> categories)
    {
        Categories = categories ?? Array.Empty<string>();
    }

    public IReadOnlyList<string> Categories { get; }
}

static async Task Main()
{
    using var client = new WikiClient();
    var site = new WikiSite(client, "https://en.wikipedia.org/w/api.php");
    await site.Initialization;

    // Set both limits to 1 so the bug manifests more easily.
    var result = await new CategoryMembersGenerator(site)
    {
        CategoryTitle = "Category:Oceanian_cuisine",
        MemberTypes = CategoryMemberTypes.Page,
        PaginationSize = 1,
    }.EnumPagesAsync(new WikiPageQueryProvider
    {
        Properties =
        {
            new CategoryPropertyProvider() { PaginationSize = 1 }
        }
    }).Select(item => Tuple.Create(item.Title, item.GetPropertyGroup<CategoryPropertyGroup>())).ToArrayAsync();

    var test = result.Where(x => x.Item1 == "Australian cuisine").SelectMany(x => x.Item2.Categories).Count();
}

Here test will be 0 even though the real article has 2 categories.
