Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling all users #44

Closed
sksamuel opened this issue Dec 3, 2013 · 16 comments
Closed

Calling all users #44

sksamuel opened this issue Dec 3, 2013 · 16 comments
Assignees
Labels
Milestone

Comments

@sksamuel
Copy link
Collaborator

sksamuel commented Dec 3, 2013

With version 1 due out soon, I want to lock down the DSL for our own 1.0 release.
Does anyone have any suggestions for improvements, things they'd like to change, breaking backwards compatibility.

@ghost ghost assigned sksamuel Dec 3, 2013
@sksamuel
Copy link
Collaborator Author

One thing I'm not sure which I prefer is whether to have generic execute methods, or overloaded specifically named ones.

Eg,

client.execute {
      ...
    }

vs

client.update {
    ....
    }

The latter is shorter, but the problem is structuring the DSL to support things where you only have an id. eg, in the former

client.execute {
      update(14).in("scifi/starwars").doc(anyOldObject).docAsUpsert
    }

the update is the keyword that is used to begin the update DSL. If it was just

client.update {
      14.in("scifi/starwars").doc(anyOldObject).docAsUpsert
    }

Then you have to overload an int with multiple implicits (for other DSL types) and then it becomes ambiguous.

@caoilte
Copy link

caoilte commented Dec 15, 2013

As discussed in #51 the current indexing DSL uses the "source" operator to allow you to index json (and other) documents directly but is a little confusing because in Elastic Search parlance "_source" is a field that stores a full copy of the underlying document (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html).

It would be less confusing (for me anyway) if elastic4s renamed "source" to be "doc" as this is closer to the way elastic search talks about the process of indexing (which is covered in the "document APIs" section of the documentation).

On client.update vs client.execute the only points I would add is,

  1. it would be good to keep the single op API consistent with the bulk api. "execute" does this as you can drop in replace it with "batch" I assume.
  2. i think IDs are strings so you would actually need to overload that (or both). Another option is
client.update {
      id(14).in("scifi/starwars").doc(anyOldObject).docAsUpsert
    }

@sksamuel
Copy link
Collaborator Author

The problem with overloading the strings, or providing an id method is that you'd use the same id thing for delete, update, get, etc and the compiler won't know which one you want (it would need to support some king of crazy method call lookahead).

@ppat
Copy link
Contributor

ppat commented Jan 30, 2014

One thing I'd like to see is to be able to work with data structures that may or may not have values. Options, empty collections, etc.

If we have:

        val userId: Long = 111111111L
        val since: Option[DateTime] = None
        val exclusions: Array[Long] = Array()
        bool {
          must ( termFilter("user_id", userId) )
          must ( rangeFilter("date").gte(since)) )
          not ( termsFilter("record_id", exclusions: _*) )
        }

So if we have an empty array for exclusions, it won't add the not ( termsFilter ) as it doesn't make sense. Similary, the optional date is None so the must ( rangeFilter ) does not need to be added. Also it would be nice if we can automatically convert to ES standard datetime from JodaTime or java.util.Date (but that I'm not very interested in). Without this functionality, it clutters up an otherwise really nice and easy to read DSL with too many conditionals. Thoughts?

Also AND/OR filters should be nice as they have different performance characteristics where sometimes they make more sense than BOOL. I'd be happy to help out and submit a pull request for any/all of these, but want to hear your thoughts.

@sksamuel
Copy link
Collaborator Author

I think that's a great idea. I think the empty array stuff should be mostly handled by ES already, but I could be wrong on that. Definitely support for Option and Dates isn't supported. Similarly most Scala collections aren't supported, as at the moment we tend to force people to use var args through : _*

@ppat
Copy link
Contributor

ppat commented Mar 20, 2014

@sksamuel, I'd be happy to take a crack at this. Sorry it took a while to get around to it (mostly because we're only upgrading to 1.x now from 0.90).

@sksamuel
Copy link
Collaborator Author

Yeah that would be wicked thanks.

On 20 March 2014 18:13, ppat notifications@github.com wrote:

@sksamuel https://github.com/sksamuel, I'd be happy to take a crack at
this. Sorry it took a while to get around to it (mostly because we're only
upgrading to 1.x now from 0.90).

Reply to this email directly or view it on GitHubhttps://github.com//issues/44#issuecomment-38202060
.

@ppat
Copy link
Contributor

ppat commented Mar 20, 2014

There are some quirks we should discuss though. We should happen if an optional value is not present? Should that filter or query segment not be added? Does that make sense in all scenarios? What are the exceptions?

  must (
    termFilter("user_id", userId),
    termFilter("name", name)    
    termFilter("email", email)
  )

In this case, if userId is None, we can remove that termFilter. But what happens all three values are None, should the parent level must filter be also be not included? Does that even make sense at that point query wise?

I'll think more about it but if you or anyone else has ideas, feel free to share.

@sksamuel
Copy link
Collaborator Author

I think the java client will ignore it if you put in an empty filter. Eg if
all 3 terms were none we'd be setting an empty array, or shouldn't set
anything at all.

On 20 March 2014 18:17, ppat notifications@github.com wrote:

There are some quirks we should discuss though. We should happen if an
optional value is not present? Should that filter or query segment not be
added? Does that make sense in all scenarios? What are the exceptions?

must (
termFilter("user_id", userId),
termFilter("name", name)
termFilter("email", email)
)

In this case, if userId is None, we can remove that termFilter. But what
happens all three values are None, should the parent level must filter be
also be not included? Does that even make sense at that point query wise?

I'll think more about it but if you or anyone else has ideas, feel free to
share.

Reply to this email directly or view it on GitHubhttps://github.com//issues/44#issuecomment-38202586
.

@k0ala
Copy link

k0ala commented May 27, 2014

If the call for improvements is still open, the thing that I would like the most is more functionality on the result handling side of things. i.e., once my operation (written in beautiful dsl) has completed, it would be great to be able to inspect the results just as easily, without having to deal with the ES Java API too much.

@sksamuel
Copy link
Collaborator Author

The reason I never did responses in Scala originally was because I found
the Java ones to be fairly good already. Although that was back in release
0.19. I think it was release 1.0 when a lot of the names started to be
getXXX rather than just xxx like Scala. Which ones in particular are you
unhappy with ?

On 27 May 2014 10:44, k0ala notifications@github.com wrote:

If the call for improvements is still open, the thing that I would like
the most is more functionality on the result handling side of things. i.e.,
once my operation (written in beautiful dsl) has completed, it would be
great to be able to inspect the results just as easily, without having to
deal with the ES Java API too much.


Reply to this email directly or view it on GitHubhttps://github.com//issues/44#issuecomment-44254632
.

@k0ala
Copy link

k0ala commented May 27, 2014

For instance, I find the results from aggregation searches a bit cumbersome to inspect.

@sksamuel
Copy link
Collaborator Author

I don't use them personally in my projects so have not ran across your frustrations. Can you give me a concrete example of before and your ideal after?

@k0ala
Copy link

k0ala commented May 27, 2014

Frustrations is too big a word :) I just think it reduces the beauty of the dsl a bit.

For instance, consider the following nested aggregation, getting for each country the most frequent terms in city descriptions:

val freqWords = client.execute { 
    search in "places/cities" aggs { 
        aggregation terms "by_country" field "country" aggs {
            aggregation terms "frequent_words" field "content"
        }
    } 
} 

Currently I typically have to find out about elasticsearch internals, and do some casting:

freqWords onComplete {
    case Success(value) => { 
        val byCountry = value.getAggregations.get("by_country").asInstanceOf[Terms] 
        byCountry.getBuckets.asScala foreach (country => {
            val countryWords = country.getAggregations.get("frequent_words").asInstanceOf[Terms] 
            print(country.getKey + ": ")
            countryWords.getBuckets.asScala foreach (countryWord => print(countryWord.getKey + " "))
            println("")
        })  
    }
    case Failure(excep) => { println("Query failed"); excep.printStackTrace }
}

The following would have a nicer feel:

freqWords onComplete {
    case Success(value) => { 
        value.aggs("by_country").buckets foreach (country => {
            print(country.key + ": ")
            country.aggs("frequent_words").buckets foreach (countryWord => print(" " + countryWord.key)
            println("")
        })  
    }
    case Failure(excep) => { println("Query failed"); excep.printStackTrace }
}

Although perhaps I am doing something wrong (relatively new to Scala and very new to elasticsearch).

@sksamuel
Copy link
Collaborator Author

I'll have a play.

@sksamuel sksamuel reopened this May 27, 2014
@sksamuel
Copy link
Collaborator Author

Actually you can help by making a new ticket with details of what you'd like to see like this and then I'll add it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants