Examinex escape incorrectly value for -1 #100

bielu · 2024-04-08T13:32:11Z

Hi @Shazwazza ,
When working on project where examinex was requested by client, I found similar bug to Novicell/Novicell.Examine.ElasticSearch#17
When passing my field value as -1, it is the lucene query retrieve by examinex:
+(+x__IndexType:content) +x__Published:y -hideFromSearch:1 +spath:1
which cause incorrect results as my spath has indexed -1!
In my elastic search provider i resolved it by escaping negative ints, not sure if that is correct approach for this however.

Edit: Forgot mentioned versions:
Umbraco 13.2.2
Examinex 6.0.0-beta.2

Shazwazza · 2024-04-08T14:59:21Z

The - is a reserved character in Lucene so the lucene parser will remove it unless you escape it for searching with prefixed backslashes.

Can you please advise on what field type you are searching on and what query you are using to search along with what data is in that field? I need this to try to replicate.

There are several unit tests in ExamineX for testing paths with -1 so need to know specifically what and how you are searching.

Thanks!

bielu · 2024-04-08T15:41:00Z

It is example what I do with query:

and there is field type:

and there is how it is index:

it is interesting that it works with normal examine on lucene and generate correctly -1 😂 but as soon it comes here it doesnt

Shazwazza · 2024-04-08T15:58:35Z

So spath is being indexed as a normal full text field type with a standard analyzer, so all stop words and punctuation, etc... are stripped out - is this what you are expecting?

Similarly, because the field type is full text and the analyzer is standard for that field, when it parses the query, it will strip off reserved words and punctuation (like the minus symbol). If you want to retain the minus symbol in your search query then you would need to escape the value, like request.ListingId.ToString().Escape()

But I don't know what the value for ListingId is, can you please specify? Is it always a specific number or is it a path value, etc... ? Perhaps standard analyzer isn't what you should be using on this field, it really depends on what this field is.

bielu · 2024-04-08T16:09:50Z

Listing id is always int, so values would be spath can be index as keyword, but it still not explain why switching to azure search makes difference between how query looks 🤔
i mean here, based on docs, the indexing and querying should be 1:1 with lucene examine. So i would expect both to return same lucene query :)

Shazwazza · 2024-04-08T16:43:36Z

I am assuming that the ToString() for your query, regardless of whether ExamineX is enabled or not will result in this:

+(+x__IndexType:content) +x__Published:y -hideFromSearch:1 +spath:1 ?

With the - character stripped out? It is the same query parser used in both so I would expect the query to be generated the same way.

I can help in investigating why this would work with Lucene as opposed to Azure Search if I can replicate.

Based on what your requirements are, I would suggest trying to escape the value passed to the query (i.e. request.ListingId.ToString().Escape() and see if that works in both engines, else I would think whitespace analyzer would work well. Another alternative, would be to use a keyword analyzer (i.e. Raw examine field) which is what the normal Umbraco Path field is indexed as and then use path queries that are escaped (that is ... if this field is actually like an Umbraco path).

bielu · 2024-04-08T17:04:22Z

@Shazwazza yes I didnt change anything around the code of this, when installing examinex, but I will need play around little mroe to figure out little more or change analyzer :)

Shazwazza · 2024-04-08T18:27:16Z

What is the exact value that is being sent to the index for spath? I see that it ends up as "-1 -20 1250" in the Azure Search index, but what is the value being sent there? Is it "-1,-20,1250"?

bielu · 2024-04-08T18:30:13Z

the exact submited value is "-1 -20 1250", what I need check if it is just a string or collection as I was sure I am submitting collection 🤔

Shazwazza · 2024-04-08T19:37:03Z

I put a couple of rudimentary tests together and the results are the same between Examine and ExamineX with Azure Search:

Examine Test:

[Test]
public void Can_Search_On_Negative_Numbers_With_Standard_Analyzer()
{
    var analyzer = new StandardAnalyzer(LuceneInfo.CurrentVersion);
    using (var luceneDir = new RandomIdRAMDirectory())
    using (var indexer = GetTestIndex(luceneDir, analyzer))
    {
        indexer.IndexItems(new[] {
            ValueSet.FromObject(1.ToString(), "content",
                new { spath = "-1 -20 1250" }),
            ValueSet.FromObject(2.ToString(), "content",
                new { spath = "-1 -20 1251" }),
            ValueSet.FromObject(3.ToString(), "content",
                new { spath = "-1 -20 1252" }),
            ValueSet.FromObject(4.ToString(), "content",
                new { spath = "-1 -20 1253" }),
        });

        var searcher = (BaseLuceneSearcher)indexer.Searcher;

        var query = searcher
            .CreateQuery(IndexTypes.Content)
            .Field("spath", "-1");

        Console.WriteLine(query);
        var results = query.Execute();

        Assert.AreEqual(4, results.TotalItemCount);
    }
}

ExamineX Test:

[Test]
public void Can_Search_On_Negative_Numbers_With_Standard_Analyzer()
{
    using (AzureIndex.ProcessNonAsync())
    {
        AzureIndex.IndexItems(new[] {
            ValueSet.FromObject(NewId(), "content",
                new { spath = "-1 -20 1250" }),
            ValueSet.FromObject(NewId(), "content",
                new { spath = "-1 -20 1251" }),
            ValueSet.FromObject(NewId(), "content",
                new { spath = "-1 -20 1252" }),
            ValueSet.FromObject(NewId(), "content",
                new { spath = "-1 -20 1253" }),
        });
    }

    var s = GetDelayedSearcher();

    var query = s
        .CreateQuery(IndexTypes.Content)
        .Field("spath", "-1");

    Console.WriteLine(query);
    var results = query.Execute();

    Assert.AreEqual(4, results.TotalItemCount);
}

In both cases, the query is parsed exactly the same: { Category: content, LuceneQuery: +spath:1 }

which is expected because the field is using StandardAnalyzer in both scenarios.

I'm not sure if this helps at all, but might help show what is different between these tests vs how your query/index is put together.

bielu · 2024-04-17T09:43:46Z

I debugged it and it is weird one, but found workaround so for now. It seems like for some reason it works when you index values as string, but not when you index them as collection 🤔

Shazwazza · 2024-04-17T13:26:00Z

Ok thanks, I will try to replicate indexing the values that way.

Shazwazza added the question Further information is requested label Apr 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Examinex escape incorrectly value for -1 #100

Examinex escape incorrectly value for -1 #100

bielu commented Apr 8, 2024 •

edited

Loading

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024 •

edited

Loading

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024 •

edited

Loading

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024

Shazwazza commented Apr 8, 2024

bielu commented Apr 17, 2024

Shazwazza commented Apr 17, 2024

Examinex escape incorrectly value for -1 #100

Examinex escape incorrectly value for -1 #100

Comments

bielu commented Apr 8, 2024 • edited Loading

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024 • edited Loading

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024 • edited Loading

Shazwazza commented Apr 8, 2024

bielu commented Apr 8, 2024

Shazwazza commented Apr 8, 2024

bielu commented Apr 17, 2024

Shazwazza commented Apr 17, 2024

bielu commented Apr 8, 2024 •

edited

Loading

bielu commented Apr 8, 2024 •

edited

Loading

bielu commented Apr 8, 2024 •

edited

Loading