Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow search templates stored in an index to be retrieved and used at search time #5921

Closed
wants to merge 22 commits into from

Conversation

GaelTadh
Copy link
Contributor

Allow running search templates stored in an index at search time

This change allows search templates stored in an index to be used at search time.
You can run an indexed template by

GET /_search/template
{
    'templatename' : '/index/language/id',
    'params' : {
        'param1' : 'foo'
    }
}

Running the template will fail if the template cannot be found, if the templatename
starts with a '/' but does not conform to the /index/type/id triple
If the named template cannot be found an error will be raised. Currently the
templates are NOT cached. We also use TransportGetAction directly instead of using
Client to avoid a circular dependency between Client and ScriptService

See #5637, #5484

@s1monw s1monw added the review label Apr 23, 2014
@dadoonet
Copy link
Member

Thinking about it loud: wondering if we should not store templates under a .template type as we do for percolation?

WDYT?

@GaelTadh
Copy link
Contributor Author

Aren't precolates bounded to a single index ? Templates operate across indices. I think perhaps storing templates in a .template index might be better. This also allows us to maintain specifying the language of the template script in the type.

@s1monw
Copy link
Contributor

s1monw commented Apr 24, 2014

well at this point it doesn't matter which type you store it under. I was thinking about this too to get rid of the /index/type/id entirely and just us the id but the query parser has no notion of an index at this point. If we store it under .template what other than consistency with the percolator are we gaining? I don't think this perfect what we have here but before I start restricting it I want to understand what we gain. I'd love to simplify it though!

@s1monw
Copy link
Contributor

s1monw commented Apr 24, 2014

Aren't precolates bounded to a single index ? Templates operate across indices. I think perhaps storing templates in a .template index might be better. This also allows us to maintain specifying the language of the template script in the type.

no percolates are bound to a type since 1.0 you can have it in any index. Yet I think you raised a good point here we need to know which lang the script is for this to be really useful. I think we should ask for a certain format here like:

{
  'script' : {} | ""
  'lang' : "optional"
}

that way we get information in the right place?

@GaelTadh
Copy link
Contributor Author

So embed the language in the template itself ? Doesn't this then distance us from on disk templates, whose types are defined by the file extension (analogous to a type) ?

@s1monw
Copy link
Contributor

s1monw commented Apr 24, 2014

I don't see any relation to on-disk templates here to be honest. I mean we can go that path but I wonder what happens to scripts that are not JSON but just a String? You also need to know the lang when you do the request which is odd IMO.

given that it might be a good thing to use .script as the type and let folks only specify the index & the actual ID

@GaelTadh
Copy link
Contributor Author

If we mask the type from the templates does that mean we will need specialized infrastructure to store, update and delete templates ? Or will we just document that they need to be indexed with a .script type ? I can see a use case around versioning templates, is the right place to version in the type or the id ?

@clintongormley
Copy link

Few points here:

  • templates (which are a sub-class of script) span indices, so should be stored in an index, rather than in a type
  • we need to indicate what type of script/template a document contains, so the _type seems a natural place for that
  • users shouldn't be able to insert an arbitrary document into any index and use that as a script - instead, scripts should only be executable if they come from a particular index. that way an admin could prevent unauthorized users from creating scripts

I would suggest having a special index called .scripts, and using the _type to indicate the type of script that a document contains, eg:

PUT /.scripts/mvel/hello_world
{
    "script": "return \"hello world\""
}

PUT /.scripts/template/hello_world
{
    "template": { "query": { "match": { "title": "hello world" }}}
}

The template field may contain a JSON object (as per the above example) or a string, but the contents of the field does not need to be indexed. We should have a default mapping for the .scripts index like this:

PUT /.scripts
{
  "mappings": {
    "_default_": {
      "properties": {
        "script": { "enabled": false },
        "template": { "enabled": false }
      }
    }
  }
}

That way, script or template would accept a string or an object but wouldn't try to parse it any further. The contents of that field wouldn't be searchable, but that's fine. Users can always add metadata in the same way they do for percolator docs.

Then there is the question of how to refer to these templates or scripts. Currently, we can store scripts or templates on disk and just pass the name of the file in the template/script parameter, eg:

GET /_search/template
{
    "template": "storedTemplate" ,
    "params": {
        "query_string": "search for these words"
    }
}

While this can work with a small set of files stored on disk, it won't work when we can have potentially unlimited scripts/templates in an index. Especially with scripts, it could be tricky to distinguish between a filename, a type/id and a dynamic script.

We have some precedents for this, but the precedents are not consistent, eg we have the terms lookup mechanism:

"terms": {
    "some_field": {
        "index": "some_index",
        "type": "some_type",
        "id": "some_id",
        "path": "some.path"
    }
}

Then with geoshapes, we have:

"geo_shape": {
    "some_field": {
        "indexed_shape": {
            "index": "some_index",
            "type": "some_type",
            "id": "some_id",
            "path": "some.path"
        }
    }
}

(I think the geo_shape syntax could be changed to conform with the terms syntax, to make it consistent).

The equivalent for scripts could be:

GET /_search
{
  "script_fields": {
    "inline_script": {
      "script": "return 'foo'"
    },
    "indexed_script": {
      "lang": "mvel",     # uses "mvel" as the _type
      "id": "hello_world"
    },
    "script_in_file": {
      "lang": "mvel",
      "file": "hello_world"  # or "hello_world.mvel"
    }
  }
}

And for templates, we have inline templates:

# as string
GET /_search/template
{
  "template": "{\"query\":{\"match_all\":{}}}"
}

# as JSON object
GET /_search/template
{
  "template": {
    "query": {
      "match_all": {}
    }
  }
}

Templates stored in a file:

GET /_search/template
{
  "file": "some_template"
}

Or templates stored in the .scripts index:

GET /_search/template
{
  "id": "some_template"
}

# or just:
GET /_search/template/some_template

@GaelTadh
Copy link
Contributor Author

++ to storing into a .scripts index and using the types to define the type of script/template. Do we have a list of allowed types that we can validate against ?

@s1monw
Copy link
Contributor

s1monw commented Apr 24, 2014

@GaelTadh what do you mean by allowed types?

@GaelTadh
Copy link
Contributor Author

@s1monw nm I found it. I meant supported script engines.

@GaelTadh
Copy link
Contributor Author

GaelTadh commented May 6, 2014

@clintongormley do you think we should refuse to delete the .scripts index ? Always create it ? Or just create it on demand when a user creates an indexed template for the first time ?

@clintongormley
Copy link

@GaelTadh I'd say we treat it like a normal index that we create on demand and can be deleted.

Possibly we should have a built-in index template with defaults of (eg) 1 primary shard and 0-all replicas?

@s1monw
Copy link
Contributor

s1monw commented May 13, 2014

++ for the index template

}
}
}
URL defaultMappingUrl = getMappingUrl(indexSettings, environment, defaultMappingLocation,"default-mapping.json","org/elasticsearch/index/mapper/default-mapping.json");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the fact that you move that boilerplate code out. Yet I think you can simply keep the refactoring into getMappingUrl and drop all the rest here. Instead you can just use the script template as the default template like:

final URL defaultMappingUrl;
if (this.index.getName().equals(ScriptService.SCRIPT_INDEX) {
 defaultMappingUrl = getMappingUrl(indexSettings, environment, defaultMappingLocation,"script-index-defaults.json","org/elasticsearch/index/mapper/script-index-defaults.json");
} else {
  defaultMappingUrl = getMappingUrl(indexSettings, environment, defaultMappingLocation,"default-mapping.json","org/elasticsearch/index/mapper/default-mapping.json");
}

since we effectively change the default template we can just overload it here. That means we don't need to duplicate the mapping in the code below neither we need to add the if in line 411

Add ParseFields to ScriptService and use them to parse script json fields in requests.
Remove templateId from SearchRequest and use templateName with a ScriptType.
Add readTo and writeTo to ScriptService.ScriptType.
Add set<ScriptType>Script to UpdateRequestBuilder instead of passing in ScriptType in setScript.
Various cleanups and small fixes.

See elastic#5921
Make TermsStatsFacetParser use the same ParseFields as the other parsers.
This was missed in the intial pass of the conversion to ParseFields.

See elastic#5921
…on on PUT

Indexed scripts are now cached in the same manner as inline scripts. This does mean that we will hit the index
everytime an indexed script is requested, but it also means that it doesn't matter how the script gets into the
index, either via the script/template API or via the indexing API the cache will be updated.
When scripts/templates are added via the API validation occurs and only scripts that are compilable will be indexed.
Fixed some bugs around non-json templates.
Added some API tests to verify validation.

See elastic#5921
Some interfaces had changed in master. Needed to update the Rest actions.

See elastic#5921
Mvel has been removed and replaced by groovy. Needed to update the tests that relied on mvel to use groovy instead.

See elastic#5921
Added Java Client interfaces to mirror the REST endpoints and switched the rest endpoints to
use the java client interfaces.
Script languages are now validated at the endpoint/request level.
Moved all script services to ScriptService, validation now happens here.

See elastic#5921
Implemented some changes after review. Notably created DelegatingActionListener to
allow easier wrapping of responses and listeners.

See elastic#5921
Throw exception when/if private ctor is called.

See elastic#5921
This commit creates two new base classes.
DelegatingActionListener
This class is used when a transport action wants to issue a new request but delagate the responses to its Listener.
HandledTransportAction
This class is implements the common pattern of subclassing BaseTransportRequestHandler and registering the handler into the
TransportService.

See elastic#5921
Some clean up to remove read/write(sharedString).

See elastic#5921
TransportDeleteIndexScriptAction was referencing DeleteRequest instead of
DeleteIndexedScriptRequest. This made many tests angry.

See elastic#5921
Integrating changes from review process.
Removing sharedString usages.
Adding some javadoc strings.
Removing dead imports and some un-used member variables in the RestEndpoints.

See elastic#5921
@GaelTadh GaelTadh closed this in e79b708 Jul 14, 2014
GaelTadh added a commit that referenced this pull request Jul 14, 2014
…cripts/template from an index.

This change allow elasticsearch users to store scripts and templates in an index for use at search time.
Scripts/Templates are stored in the .scripts index. The type of the events is set to the script language.
Templates use the mustache language so their type is be "mustache".
Adds the concept of a script type to calls to the ScriptService types are INDEXED,INLINE,FILE.
If a script type of INDEXED is supplied the script will be attempted to be loaded from the indexed, FILE will
look in the file cache and INLINE will treat the supplied script argument as the literal script.
REST endpoints are provided to do CRUD operations as is a java client library.
All query dsl points have been upgraded to allow passing in of explicit script ids and script file names.
Backwards compatible behavior has been preserved so this shouldn't break any existing querys that expect to
pass in a filename as the script/template name. The ScriptService will check the disk cache before parsing the
script.

Closes #5921 #5637 #5484
@clintongormley clintongormley changed the title Allow search templates stored in an index to be retrieved and used at se... Scripting: Allow search templates stored in an index to be retrieved and used at search time Jul 16, 2014
karmi added a commit to elastic/elasticsearch-ruby that referenced this pull request Jul 22, 2014
@clintongormley clintongormley changed the title Scripting: Allow search templates stored in an index to be retrieved and used at search time Allow search templates stored in an index to be retrieved and used at search time Jun 6, 2015
@clintongormley clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Indexed Scripts/Templates labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants