Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search SDK: Cannot merge null values using Index<T>/IndexAsync<T> #1804

Closed
jsgoupil opened this issue Feb 18, 2016 · 12 comments
Closed

Search SDK: Cannot merge null values using Index<T>/IndexAsync<T> #1804

jsgoupil opened this issue Feb 18, 2016 · 12 comments
Assignees
Labels
Search Service Attention This issue is responsible by Azure service team.

Comments

@jsgoupil
Copy link

After more investigation, I found that the client serializer is overriding my NullValueHandling variable by setting it to Ignore.
This is highly problematic as if I try to do "IndexAsync" with null values, they simply do not go to the index! This is for the case or MergeOrUpload.

Basically, an upload with property set to null not being sent is fine. But the merge is not. I need to send the null!

@brjohnstmsft brjohnstmsft changed the title NullValueHandling is overriden to Ignore and should not Search SDK: NullValueHandling is overriden to Ignore and should not Feb 18, 2016
@brjohnstmsft brjohnstmsft self-assigned this Feb 18, 2016
@brjohnstmsft brjohnstmsft changed the title Search SDK: NullValueHandling is overriden to Ignore and should not Search SDK: Cannot merge null values using Index<T>/IndexAsync<T> Feb 18, 2016
@brjohnstmsft
Copy link
Member

One workaround would be to use Document instead of your own model type to merge changes. Not ideal, but if you're really stuck it is an alternative. In the meantime we'll think about how best to fix this.

@jsgoupil
Copy link
Author

@brjohnstmsft I think using a Document will do the same.
It ends up calling CreateSerializerSettings which will override the NullValueHandling.

@brjohnstmsft
Copy link
Member

It should work; we have test coverage for the Document case. See CanMergeDynamicDocuments in this file: https://github.com/Azure/azure-sdk-for-net/blob/AutoRest/src/Search/Search.Tests/Tests/IndexingTests.cs

@jsgoupil
Copy link
Author

@brjohnstmsft My apology, it is working yes. I thought Document was using a dynamic, but it is a Dictionary so it's treated a little different.

Here is some code for people who wants to convert their <T> into a Document

private Document GetDocument(T obj)
{
    var doc = new Document();
    foreach (var m in typeof(T).GetProperties(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Instance))
    {
        doc.Add(m.Name, m.GetValue(obj, null));
    }
    return doc;
}

@brjohnstmsft brjohnstmsft self-assigned this Apr 12, 2016
brjohnstmsft added a commit to brjohnstmsft/azure-sdk-for-net that referenced this issue Apr 20, 2016
…odel

This is in response to the following issue:

Azure#1804

The new test shows how to set NullValueHandling on a per-property basis to
allow for merging of null values.
@brjohnstmsft
Copy link
Member

I did some testing and found a way to get the right merge semantics, at least on a per-field basis. The trick is to specify [JsonProperty(NullValueHandling = NullValueHandling.Include)] on any properties for which this should apply. See the referenced PR for examples.

We looked at more comprehensive ways of fixing this (e.g. -- introducing some type of Delta<T> class that knows when its properties have been set) and they all had fairly ugly tradeoffs. We're closing this for now unless we get more feedback that this is causing a lot of problems for people.

@jsgoupil
Copy link
Author

@brjohnstmsft Thanks for the update. This is exactly how I wanted to treat my null values, but back when I opened the bug, your serializer was somehow overriding my settings... maybe this changed, or I wasn't aware of this contract serializer.

@brjohnstmsft
Copy link
Member

@jsgoupil Nothing changed in the implementation, I just found a way to override NullValueHandling on a per-property basis instead of globally. It does require changes to the model class to add the [JsonProperty] attribute though.

@jsgoupil
Copy link
Author

@brjohnstmsft I found it finally, I'm not familiar with your code structure on github (it seems that most of the code is generated), but I found this by decompiling the DLL.

// Microsoft.Azure.Search.DocumentsOperations
public Task<AzureOperationResponse> IndexWithHttpMessagesAsync(IndexBatch batch, SearchRequestOptions searchRequestOptions = null, Dictionary<string, List> customHeaders = null, CancellationToken cancellationToken = default(CancellationToken))
{
...

JsonSerializerSettings settings = JsonUtility.CreateDocumentSerializerSettings(this.Client.SerializationSettings);

And this line overrides the NullValueHandler.
So instead of having to set this JsonProperty everywhere, if this code was not overriding the SerializationSettings, we could have it set at only one spot.

@brjohnstmsft
Copy link
Member

Yes, the behavior of CreateDocumentSerializerSettings is intentional and we need it to be the default for a couple reasons:

  1. For very wide indexes, we don't want to serialize many, many nulls. This would greatly increase the size of batches, consuming bandwidth and server resources.
  2. For merges where you forget to set a nullable property, it is much safer for the default behavior to be "ignore this field" instead of "overwrite this field with null".

Point 2 is especially important. That's why it's better to explicitly opt in to NullValueHandling.Include. I looked for a way to do this globally via SerializationSettings, but that isn't possible because Include is the default, so CreateDocumentSerializerSettings has no way of knowing whether it was set explicitly or not. That leaves the current workaround of setting it on a per-property basis via [JsonProperty].

@jsgoupil
Copy link
Author

jsgoupil commented Apr 21, 2016

@brjohnstmsft I am not sure I see the scenario... Let me explain this scenario.

Imagine you have this saved in Azure Search

class A {  
    public int? Z  
    public int? Y  
    public string X  
    public string V  
}

If I load a bunch of A from the search, do some manipulation on the properties, including setting some to NULL, my intent is definitely to propagate the correct data to the service. In your current case, you are saying I should have this heavy JsonProperty on every single nullable properties. I would never set a property to NULL because I don't want it to be updated on the server or improve performance. It makes no sense.

Now the scenario I think you are trying to cover is if I do a "Select" in the Search Parameter. In this case, if I request Z and Y, obviously I don't have X and V so them being sent with NULL would be incorrect. However, if I'm about to request only Z and Y, I should have another class so I don't carry null properties around in my app. So I would have

class APrime {  
    public int? Z  
    public int? Y  
}

In this case, once again, if I change Z to be null, and Y to be another integer, my intent is pretty clear what I'm trying to do.

This is why I'm proposing that if I set this NullValueHandling.Include globally, I am well aware of what I am doing. I can handle your points 1 and 2 by having my prime classes. I don't think the SDK should forbid me from doing this.

This null value handling is such a big deal for your SDK, you should not rely solely on JSON.net for deciding of this behavior. I propose that you should have your own setting for controlling such thing.

All properties on Azure Search are Nullable. Which means if I decide to have all my properties Nullable in my concrete class, it means I expect that I can set these values to NULL. I can't code a class with Nullable that I give to my developer and expect them to know that they are NOT ALLOWED to set it to NULL, because NULL actually means "performance". If my intent is to NOT have nullable properties, then I'll have a normal "int". Then if I want to control what is being sent to Azure, I will have prime classes like above.

Sure you can keep NullValueHandling.Ignore being the default, but I should be able to override it. The SDK is suppose to help not hinder.

@brjohnstmsft
Copy link
Member

@jsgoupil Your analysis is ignoring one important dimension: Merge versus Upload. If your intent is to set every single property to some value including possibly null, then you can do that today: Just use Upload instead of Merge.

The problem with Merge and NullValueHandling.Include would happen when somebody does this:

// I want to change the values of Z and X, but leave Y and V unchanged.
doc.Z = "new value";
doc.X = 123;

client.Documents.Index(IndexBatch.Merge(new[] { doc }));

// If we use NullValueHandling.Include, then Y and V get clobbered and data is lost.

To recap, if you want to set all the properties, use Upload. If you want to set only some properties, use Merge. If you need to set some of those properties explicitly to null, either use the untyped overload of Index/IndexAsync or use [JsonProperty] on your model class.

I agree that these options are not perfect, and that some way to override the NullValueHandling setting globally would be better. However, every scenario is at least possible today.

Other than this thread, we haven't had any customers report this as an issue. Right now we're focused on shipping a new preview version of the SDK that will include new features; It's unlikely we're going to add a global way to override NullValueHandling any time soon. That said, this is open source, so if you want you can contribute a PR. Let me know if you're interested in doing that, and if so I'll re-open the issue.

Likewise, if more customers let us know that this is a real pain point, we'll re-open this.

@jsgoupil
Copy link
Author

@brjohnstmsft I appreciate your time. I will start using Upload starting from now then. In my case, I was mistakenly using MergeOrUpload... I didn't know that MergeOrUpload would behave like Merge and not Upload (since it has both word in it.) My thought was that Upload would fail if it's already in the index. Not true... This is why I was using MergeOrUpload.

In the example that you wrote above. This is performing only "faster" if you actually loaded the doc without V and X. It is a bit confusing, but I see that all scenarios seem to be supported.

Thanks again 👍

@bsiegel bsiegel added the Service Attention This issue is responsible by Azure service team. label Sep 26, 2018
@github-actions github-actions bot locked and limited conversation to collaborators Mar 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Search Service Attention This issue is responsible by Azure service team.
Projects
None yet
Development

No branches or pull requests

3 participants