Full text searching support on the documents? #39

Open
jeremydmiller opened this Issue Oct 21, 2015 · 14 comments

@jeremydmiller
Contributor

jeremydmiller commented Oct 21, 2015

More research on this one.

jeffdoolittle Oct 21, 2015

Contributor

Part 2 of the @robconery series on Postgres as a document store may be relevant to this issue:

http://rob.conery.io/category/postgres-document-api/

danbarua Nov 11, 2015

Contributor

You could do this the same way you've done searchable fields, just persist it as a tsvector.

jeremydmiller Nov 11, 2015

Contributor

@danbarua Yeah, but I haven't thought about how you might use or expose that from the client. You always have the ability to go straight at SQL I guess. Any suggestions on that angle?

danbarua Nov 11, 2015

Contributor

I'll have a play with it and see if I can come up with anything.

danbarua Nov 11, 2015

Contributor

Just leaving this here as a reminder:

class Foo
{
  [FullText]
  public string Name {get; set;}
}

db.Query<Foo>().Where(x => x.Name.MatchesTextQuery("your_full_text_query"))
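
For reference, a query like this would presumably end up as a tsvector match in SQL. The following is only a sketch of that idea: the `foo_docs` table, its `search` column, and the index name are hypothetical and are not Marten's generated schema.

```sql
-- Hypothetical table: one row per document, with a tsvector kept beside the JSON.
CREATE TABLE foo_docs (
    id     serial PRIMARY KEY,
    data   jsonb NOT NULL,
    search tsvector
);

-- A GIN index lets the @@ match operator avoid a sequential scan.
CREATE INDEX foo_docs_search_idx ON foo_docs USING gin (search);

-- plainto_tsquery turns free-form user text into a tsquery.
SELECT data
FROM foo_docs
WHERE search @@ plainto_tsquery('english', 'your_full_text_query');
```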

robconery Nov 11, 2015

You have to persist the values at save; unfortunately triggers won't work on documents (which is a bummer). What we did for Moebius (an Elixir db tool) was to define a function that would index a set of fields:

db(:users)
  |> searchable([:name, :email])
  |> save(...)

I would imagine here you could do the same with an attribute on the document itself:

[SearchableDocument("name","email")]
class Foo {
  //...
}

Then, when save is called, check the attributes and update them in a separate call after save:

-- spin up this SQL string based on the attributes
update foos set search = to_tsvector(concat(name, ' ', email));

Anyway - that's what we do :).

jeremydmiller Nov 11, 2015

Contributor

@robconery Thanks for the input Rob.

In Marten's internals, we've got a class called DocumentMapping that models everything about how to persist a document type (what the id is, which fields are duplicated for searching, etc.). From that model Marten code-generates the db table, the upsert sproc, and even the C# glue code that goes from a document object to calling the right sproc.

In this case I think we'd add the full text searchable fields to the DocumentMapping model and enhance the codegen stuff to pass that along to upsert sprocs that also set up the full text mapping.
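
As a rough illustration of what a generated upsert that also maintains the full text column might look like: this is a sketch only, not Marten's actual generated SQL. The names `doc_foo`, `upsert_foo`, and the `Name`/`Email` JSON fields are hypothetical, and `ON CONFLICT` assumes PostgreSQL 9.5 or later.

```sql
-- Hypothetical document table with an extra tsvector column.
CREATE TABLE doc_foo (
    id     uuid PRIMARY KEY,
    data   jsonb NOT NULL,
    search tsvector
);

-- Hypothetical upsert: stores the document and recomputes the
-- search vector from the configured fields in the same statement.
CREATE FUNCTION upsert_foo(doc_id uuid, doc jsonb) RETURNS void AS $$
    INSERT INTO doc_foo (id, data, search)
    VALUES (
        doc_id,
        doc,
        to_tsvector('english',
            coalesce(doc ->> 'Name', '') || ' ' || coalesce(doc ->> 'Email', ''))
    )
    ON CONFLICT (id) DO UPDATE
        SET data   = EXCLUDED.data,
            search = EXCLUDED.search;
$$ LANGUAGE sql;
```

Because the vector is computed inside the upsert, no trigger or second round trip is needed after save.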

robconery Nov 11, 2015

Sounds like it fits right in - nice job :).

danbarua Nov 12, 2015

Contributor

That's pretty much the approach I'm trying 👍

@jeremydmiller jeremydmiller modified the milestone: v0.9 Mar 8, 2016

jakejscott Mar 19, 2016

Here's another approach that uses trigrams and the pg_trgm extension: http://www.postgresql.org/docs/current/static/pgtrgm.html

GitLab is using it to speed up LIKE matches:
https://about.gitlab.com/2016/03/18/fast-search-using-postgresql-trigram-indexes/
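
A minimal sketch of the trigram approach applied to a JSON document column, assuming a hypothetical `photo_docs` table with a `Name` field inside the document. The table and index names are made up for illustration.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical document table with the JSON body in a jsonb column.
CREATE TABLE photo_docs (
    id   serial PRIMARY KEY,
    data jsonb NOT NULL
);

-- Expression index over one JSON field; gin_trgm_ops comes from pg_trgm.
CREATE INDEX photo_docs_name_trgm_idx
    ON photo_docs USING gin ((data ->> 'Name') gin_trgm_ops);

-- The planner can use the trigram index for this unanchored pattern.
SELECT data
FROM photo_docs
WHERE data ->> 'Name' ILIKE '%sunset%';
```

Unlike tsvector search, this matches substrings rather than stemmed words, so it suits "contains" style lookups more than ranked text search.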

jakejscott commented Mar 19, 2016

Here's another approach that uses Trigram's and the pg_trgm extension http://www.postgresql.org/docs/current/static/pgtrgm.html

Gitlab are using it to speed up like matches
https://about.gitlab.com/2016/03/18/fast-search-using-postgresql-trigram-indexes/

@jeremydmiller jeremydmiller modified the milestone: v0.9 Apr 4, 2016

phillip-haydon Jun 18, 2016

Contributor

WIP - API Design:

User implements

    public class Photo
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Description { get; set; }
        public string[] Tags { get; set; }

        //...
    }

    public class PhotoSearch : Searchable<Photo>
    {
        public string ByNameAndDescription { get; set; }

        public string ByTag { get; set; }
    }

Marten implementation

    public class Searchable<T> where T : class, new()
    {
        public IList<SearchResult<T>> Results { get; set; }

        public Func<SearchResult<T>, object> OrderBy { get; set; } = x => x.Rank;

        public Directions Direction { get; set; } = Directions.Descending;
    }

    public abstract class SearchResult<T>
    {
        public T Row { get; set; }

        public decimal Rank { get; set; }
    }

    public enum Directions
    {
        Ascending,
        Descending
    }

Schema configuration

_.Schema.For<Photo>()
 .Searchable<PhotoSearch>(x =>
 {
    x.Vector(f => f.ByNameAndDescription)
     .With(f => f.Name, Weighting.High)
     .With(f => f.Description, Weighting.Low);

    x.Vector(f => f.ByTag)
     .With(f => f.Tags);
 });

Example query

var result = session.Search<PhotoSearch>()
                    .Include<Member>(...)
                    .Against(x => x.ByNameAndDescription, term)
                    .And(x => x.ByTag, term);


var result = session.Search<PhotoSearch>()
                    .Include<Member>(...)
                    .Against(x => x.ByNameAndDescription, term)
                    .Or(x => x.ByTag, term)
                    .OrderBy(x => x.Rank, Directions.Ascending);

Example usage of result

foreach (var photo in result)
{
    photo.Row.Name ...
    photo.Row.Tags ...

    photo.Rank // search result rank
}
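
For orientation, the weighted vectors and the rank in this design could map onto PostgreSQL's `setweight` and `ts_rank` roughly as follows. This is a sketch under assumptions: the `photo_search` table and its column names are hypothetical, mapping `Weighting.High`/`Weighting.Low` to the labels `'A'`/`'D'` is a guess, and flattening the `Tags` array through `->>` is a crude shortcut.

```sql
-- Hypothetical table: one tsvector column per configured search vector.
CREATE TABLE photo_search (
    id                      integer PRIMARY KEY,
    data                    jsonb NOT NULL,
    by_name_and_description tsvector,
    by_tag                  tsvector
);

-- Name gets the highest weight label (A), Description the lowest (D).
UPDATE photo_search
SET by_name_and_description =
        setweight(to_tsvector('english', coalesce(data ->> 'Name', '')), 'A') ||
        setweight(to_tsvector('english', coalesce(data ->> 'Description', '')), 'D'),
    by_tag = to_tsvector('simple', coalesce(data ->> 'Tags', ''));

-- Equivalent of .Against(...).And(...): both vectors must match; rank drives ordering.
SELECT data,
       ts_rank(by_name_and_description, query) AS rank
FROM photo_search,
     plainto_tsquery('english', 'beach sunset') AS query
WHERE by_name_and_description @@ query
  AND by_tag @@ plainto_tsquery('simple', 'beach sunset')
ORDER BY rank DESC;
```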

mdissel Jun 18, 2016

Contributor

Nice! Can you still query on other non-fulltext fields in the Photo class (in your sample)?

phillip-haydon Jun 18, 2016

Contributor

Hmmm good point, I didn't consider that.

The problem I'm trying to solve is:

You can assign weighting against different terms as part of the tsvector index, but you can't specify a property in the tsvector

Meaning you cannot say:

Search against the Name field within the tsvector

So by creating a searchable doc we can have multiple tsvector columns for different search queries. The tsvectors can be rather small so this shouldn't be an issue to have a few extra columns.

But to have the API searchable I've tried to string it together. I guess it's now a case of exposing T so that we can also include normal criteria, something like:

var result = session.Search<PhotoSearch>()
                    .Include<Member>(...)
                    .Where(x => x.Create > DateTime.UtcNow.AddDays(-10))
                    .Against(x => x.ByNameAndDescription, term)
                    .Or(x => x.ByTag, term)
                    .OrderBy(x => x.Rank, Directions.Ascending);
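
In SQL terms, mixing an ordinary criterion with two tsvector columns might look like the sketch below. The `photos_searchable` table, its `created` column, and the index names are hypothetical. Taking the larger of the two ranks is just one possible way to combine them.

```sql
-- Hypothetical table with a regular column next to the two search vectors.
CREATE TABLE photos_searchable (
    id                      integer PRIMARY KEY,
    data                    jsonb NOT NULL,
    created                 timestamptz NOT NULL DEFAULT now(),
    by_name_and_description tsvector,
    by_tag                  tsvector
);

CREATE INDEX photos_searchable_nd_idx  ON photos_searchable USING gin (by_name_and_description);
CREATE INDEX photos_searchable_tag_idx ON photos_searchable USING gin (by_tag);

-- .Where(...) becomes a plain predicate; .Against(...).Or(...) becomes an OR of two matches.
SELECT data,
       greatest(ts_rank(by_name_and_description, q),
                ts_rank(by_tag, q)) AS rank
FROM photos_searchable,
     plainto_tsquery('english', 'beach') AS q
WHERE created > now() - interval '10 days'
  AND (by_name_and_description @@ q OR by_tag @@ q)
ORDER BY rank ASC;
```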

mdissel Jun 18, 2016

Contributor

Another option I'm missing is stemming. Where can we configure which stemming configuration should be used (including 'simple', to disable stemming)?

A nice to have is the ts_headline option ;)
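
For context, both of these are controlled in PostgreSQL through the text search configuration argument. A short sketch using only built-in functions; the sample strings are made up:

```sql
-- 'english' applies stemming and stop-word removal;
-- 'simple' only lowercases the tokens.
SELECT to_tsvector('english', 'Running photos') AS stemmed,
       to_tsvector('simple',  'Running photos') AS unstemmed;

-- ts_headline returns a snippet of the text with matching terms
-- wrapped in <b>...</b> by default.
SELECT ts_headline('english',
                   'A photo of runners running along the beach at sunset',
                   plainto_tsquery('english', 'running'));
```

The same configuration name has to be used when building the tsvector and when parsing the query, otherwise stemmed and unstemmed lexemes will not match.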

@jeremydmiller jeremydmiller modified the milestone: 1.1 Jun 20, 2016

@jeremydmiller jeremydmiller modified the milestones: 1.1, 1.2 Oct 4, 2016

@jeremydmiller jeremydmiller modified the milestone: 1.2 Oct 18, 2016

@jeremydmiller jeremydmiller added this to the 3.0 milestone Aug 31, 2018
