Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finding documents with empty string as value #7515

Closed
paulovictorv opened this issue Aug 29, 2014 · 9 comments
Closed

Finding documents with empty string as value #7515

paulovictorv opened this issue Aug 29, 2014 · 9 comments

Comments

@paulovictorv
Copy link

Hello,

I've been trying to find all documents that contains a field with an empty string value (_textContent) inside my index using the missing filter. But I think that this filter doesn't treat empty strings as a null value.

Is this a bug, or the intended behavior for this filter? If it is like this by design, may I suggest adding an empty filter, if that's even possible?

As a plus, I'll post the mapping that I'm currently using, the document that contains the empty string field and the query that I'm trying to run:

Mapping:

{
  "documents": {
    "mappings": {
      "document": {
        "properties": {
          "_contratante": {
            "type": "string"
          },
          "_dateFields": {
            "type": "nested",
            "properties": {
              "id": {
                "type": "string",
                "index": "not_analyzed"
              },
              "value": {
                "type": "date",
                "format": "dateOptionalTime"
              }
            }
          },
          "_indexadoPor": {
            "type": "string"
          },
          "_textContent": {
            "type": "string"
          },
          "_textFields": {
            "type": "nested",
            "properties": {
              "id": {
                "type": "string",
                "index": "not_analyzed"
              },
              "value": {
                "type": "string"
              }
            }
          },
          "_tipoDocumento": {
            "type": "string"
          }
        }
      }
    }
  }
}

Document:

{
  "_index": "documents",
  "_type": "document",
  "_id": "xxx",
  "_version": 1,
  "found": true,
  "_source": {
    "_id": "xxx",
    "_contratante": "xxx",
    "_tipoDocumento": "xxx",
    "_indexadoPor": "xxx",
    "_dateFields": [
      {
        "id": "538730ece4b0d13600208d7a:2",
        "value": 1404183600000
      }
    ],
    "_textFields": [
      {
        "id": "538730ece4b0d13600208d7a:0",
        "value": "xxx"
      },
      {
        "id": "538730ece4b0d13600208d7a:1",
        "value": "xxx"
      },
      {
        "id": "538730ece4b0d13600208d7a:3",
        "value": ""
      },
      {
        "id": "538730ece4b0d13600208d7a:4",
        "value": "xxxx"
      }
    ],
    "_textContent": ""
  }
}

Query:

{
  "query": {
    "filtered": {
      "filter": {
        "missing": {
          "field": "_textContent"
        }
      }
    }
  }
}

Thanks a lot for the amazing tool developed here!

@hxuanji
Copy link

hxuanji commented Sep 1, 2014

Hi Paulo and everyone,

I just ran a few tests and also have same questions about it.
The situation can be produced by the gist: https://gist.github.com/hxuanji/d941c21fc75648ce7ba4

On the ES 1.3.2, the empty string cannot be find and the gist above can only find one document {"name": null}. And its explanation is
ConstantScore(cache(NotFilter(cache(BooleanFilter(_field_names:name)))))

But on the ES 1.0.1, the final result of the gist can both find the empty-string and null-value documents. Also its explanation is
ConstantScore(cache(NotFilter(cache(BooleanFilter(name:[* TO *]))))).

It seems the parsing rule changed, _field_names seems not consider some special null cases on the ES 1.3.2 ?!
And this _field_names added from #5659.

Any ideas?

@jpountz
Copy link
Contributor

jpountz commented Sep 1, 2014

I think this is a duplicate of #7348 ?

@hxuanji
Copy link

hxuanji commented Sep 1, 2014

Hi @jprante,

Yes, I think so.

@paulovictorv
Copy link
Author

Does adding an empty filter sounds silly? I think there's some good use cases for this.

@clintongormley
Copy link

Here's a workaround:

PUT t/t/1
{
  "textContent": ""
}

PUT t/t/2
{
  "textContent": "foo"
}

GET t/t/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "textContent"
          }
        }
      ],
      "must_not": [
        {
          "wildcard": {
            "textContent": "*"
          }
        }
      ]
    }
  }
}

Not optimal, but it works

@davidhesson
Copy link

You're a life saver @clintongormley

@JnBrymn-EB
Copy link

I want to do the inverse of this:

"query": {
  "bool": {
    "filter": {
      "wildcard": {
      "name": "*"
    }
  }
}

And it appears to work. Any document with a name matches and any doc with null or '' name does not match. But I don't know why it works! I thought that the wildcard filter expands * to some number of terms so that if I have, say, 9999 terms in the name field, if I index a document with name = "zzzzzzzzzzzzebra" then that will be well past the term expansion and it shouldn't match. But it does. Why?

Somehow I guess it's doing the constant_score thing here and it's using the variant that doesn't hit the 1024 clause error. But how does that work? How can you look up anything in the index w/o having a should clause with all the terms inside of it?

@clintongormley ?

@apuppy
Copy link

apuppy commented Sep 25, 2020

Now it's 2020. Still not found a better solution for "field is not the empty string".
Used the following solution, but is this the really good practice?

{
"wildcard":{"field_name":"*"}
}

@ViktorMasnyi
Copy link

ViktorMasnyi commented Jan 24, 2022

hi, bool query with regexp works well for me
for example:

"query":{
    "bool":{
      "must_not": [
        {
          "regexp": {
            "yourFiledName.keyword": ""
          }
        }
      ]
    }
  },

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants