Full text searching support on the documents? #39

Closed
jeremydmiller opened this issue Oct 21, 2015 · 15 comments

8 participants
@jeremydmiller
Contributor

commented Oct 21, 2015

More research on this one.

@jeffdoolittle

Contributor

commented Oct 21, 2015

Part 2 of the @robconery series on Postgres as a document store may be relevant to this issue:

http://rob.conery.io/category/postgres-document-api/

@danbarua

Contributor

commented Nov 11, 2015

You could do this the same way you've done searchable fields: just persist it as a tsvector.

@jeremydmiller

Contributor Author

commented Nov 11, 2015

@danbarua Yeah, but I haven't thought about how you might use or expose that from the client. You always have the ability to go straight to SQL, I guess. Any suggestions on that angle?

@danbarua

Contributor

commented Nov 11, 2015

I'll have a play with it and see if I can come up with anything.

@danbarua

Contributor

commented Nov 11, 2015

Just leaving this here as a reminder:

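class Foo
{
  [FullText]  // proposed marker attribute
  public string Name { get; set; }
}

// proposed extension method, translated to a full text match in SQL
db.Query<Foo>().Where(x => x.Name.MatchesTextQuery("your_full_text_query"));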
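A minimal C# sketch of how a mapping layer might discover properties marked with the proposed `[FullText]` attribute via reflection. `FullTextAttribute`, `FullTextFields`, and the `Notes` property are hypothetical names for illustration; this is not an existing Marten API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute, mirroring the [FullText] proposal above.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FullTextAttribute : Attribute { }

public class Foo
{
    [FullText]
    public string Name { get; set; }

    // Not marked, so it would not be part of the full text vector.
    public string Notes { get; set; }
}

public static class FullTextFields
{
    // Returns the names of public instance properties marked with [FullText],
    // ordered by name so any generated SQL is stable between runs.
    public static IReadOnlyList<string> For(Type documentType) =>
        documentType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetCustomAttribute<FullTextAttribute>() != null)
            .Select(p => p.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
}
```

The resulting list is what a codegen step would feed into the tsvector expression for the document's table.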
@robconery


commented Nov 11, 2015

You have to persist the values at save time; unfortunately, triggers won't work on documents (which is a bummer). What we did for Moebius (an Elixir db tool) was define a function that indexes a set of fields:

db(:users)
  |> searchable([:name, :email])
  |> save(...)

I would imagine here you could do the same with an attribute on the document itself:

[SearchableDocument("name","email")]
class Foo {
  //...
}

Then, when save is called, check the attributes and update the search column in a separate call after the save:

-- spin up this SQL string based on the attributes
update foos set search = to_tsvector(concat(name, ' ', email));

Anyway - that's what we do :).
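A rough C# sketch of the "spin up this SQL string based on the attributes" step. It assumes a hypothetical `SearchableDocumentAttribute`, a table named by pluralising the type name, and a `search` tsvector column; `concat_ws` joins the fields with single spaces, and `@id` stands in for whatever parameter syntax the driver uses. The field names come from the attribute rather than user input, which is the only reason string interpolation is tolerable here.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute from the comment above: lists the columns to index.
[AttributeUsage(AttributeTargets.Class)]
public sealed class SearchableDocumentAttribute : Attribute
{
    public SearchableDocumentAttribute(params string[] fields) => Fields = fields;
    public string[] Fields { get; }
}

[SearchableDocument("name", "email")]
public class Foo { }

public static class SearchSql
{
    // Builds the follow-up UPDATE that fills the tsvector column for one document
    // after it has been saved. Table naming ("foo" -> "foos") is an assumption.
    public static string UpdateFor(Type documentType)
    {
        var attr = documentType.GetCustomAttribute<SearchableDocumentAttribute>()
            ?? throw new InvalidOperationException($"{documentType.Name} is not searchable");

        var table = documentType.Name.ToLowerInvariant() + "s";
        var columns = string.Join(", ", attr.Fields);
        return $"update {table} set search = to_tsvector(concat_ws(' ', {columns})) where id = @id;";
    }
}
```

For `Foo` this yields `update foos set search = to_tsvector(concat_ws(' ', name, email)) where id = @id;`, i.e. a per-document update rather than a whole-table one.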

@jeremydmiller

Contributor Author

commented Nov 11, 2015

@robconery Thanks for the input Rob.

In Marten's internals, we've got a class called DocumentMapping that models everything about how to persist a document type (what the id is, which fields are duplicated for searching, etc.), and from that model Marten generates the db table, the upsert sproc, and even the C# glue code that goes from a document object to calling the right sproc.

In this case, I think we'd add the full text searchable fields to the DocumentMapping model and enhance the code generation to pass them along to upsert sprocs that also set up the full text mapping.
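To make the idea concrete, here is a heavily simplified C# stand-in for a mapping model that carries full text fields and emits both the schema DDL and the tsvector expression an upsert function could evaluate. `FullTextMapping`, the `search` column, and the `mt_doc_user` table name used when exercising it are assumptions for illustration; this is not Marten's real `DocumentMapping`. Fields are read out of a jsonb document with the `->>` operator.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, heavily simplified stand-in for a document mapping model;
// it is NOT Marten's DocumentMapping class.
public class FullTextMapping
{
    public FullTextMapping(string tableName, string regConfig, params string[] jsonFields)
    {
        TableName = tableName;
        RegConfig = regConfig;
        JsonFields = jsonFields;
    }

    public string TableName { get; }
    public string RegConfig { get; }
    public IReadOnlyList<string> JsonFields { get; }

    // Expression an upsert function could evaluate against the incoming jsonb document.
    public string VectorExpression(string docParameter) =>
        $"to_tsvector('{RegConfig}', concat_ws(' ', " +
        string.Join(", ", JsonFields.Select(f => $"{docParameter} ->> '{f}'")) +
        "))";

    // DDL for the extra tsvector column plus a GIN index so @@ queries can use it.
    public IEnumerable<string> SchemaDdl()
    {
        yield return $"alter table {TableName} add column search tsvector;";
        yield return $"create index {TableName}_search_idx on {TableName} using gin (search);";
    }
}
```

Storing the vector in its own column, rather than indexing an expression, also sidesteps PostgreSQL's rule that index expressions must be immutable, since `concat_ws` is marked STABLE.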

@robconery


commented Nov 11, 2015

Sounds like it fits right in - nice job :).

@danbarua

Contributor

commented Nov 12, 2015

That's pretty much the approach I'm trying 👍

@jeremydmiller jeremydmiller modified the milestone: v0.9 Mar 8, 2016

@jakejscott


commented Mar 19, 2016

Here's another approach that uses trigrams and the pg_trgm extension: http://www.postgresql.org/docs/current/static/pgtrgm.html

GitLab is using it to speed up LIKE matches:
https://about.gitlab.com/2016/03/18/fast-search-using-postgresql-trigram-indexes/
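For intuition about how pg_trgm compares strings, here is a simplified C# re-implementation of its trigram extraction and similarity ratio, following the rules in the PostgreSQL docs: each word is lowercased, padded with two leading spaces and one trailing space, and cut into 3-character windows, so "cat" yields "  c", " ca", "cat", and "at ". Treating only ASCII letters and digits as word characters is a simplification; in practice it is the extension itself, via a GIN index with `gin_trgm_ops`, that speeds up `LIKE '%term%'` queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Simplified illustration of pg_trgm's trigram rules (ASCII-only word characters).
public static class Trigrams
{
    public static HashSet<string> Of(string text)
    {
        var result = new HashSet<string>(StringComparer.Ordinal);
        foreach (var word in Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+"))
        {
            if (word.Length == 0) continue;

            // Two leading spaces and one trailing space, as pg_trgm does per word.
            var padded = "  " + word + " ";
            for (var i = 0; i + 3 <= padded.Length; i++)
                result.Add(padded.Substring(i, 3));
        }
        return result;
    }

    // Shared trigrams divided by all distinct trigrams of both strings,
    // mirroring the ratio pg_trgm's similarity() reports.
    public static double Similarity(string a, string b)
    {
        var ta = Of(a);
        var tb = Of(b);
        var union = ta.Union(tb).Count();
        return union == 0 ? 0 : (double)ta.Intersect(tb).Count() / union;
    }
}
```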

@jeremydmiller jeremydmiller modified the milestone: v0.9 Apr 4, 2016

@phillip-haydon

Contributor

commented Jun 18, 2016

WIP - API Design:

User implements

    public class Photo
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Description { get; set; }
        public string[] Tags { get; set; }

        //...
    }

    public class PhotoSearch : Searchable<Photo>
    {
        public string ByNameAndDescription { get; set; }

        public string ByTag { get; set; }
    }

Marten implementation

    public class Searchable<T> where T : class, new()
    {
        public IList<SearchResult<T>> Results { get; set; }

        public Func<SearchResult<T>, object> OrderBy { get; set; } = x => x.Rank;

        public Directions Direction { get; set; } = Directions.Descending;
    }

    public abstract class SearchResult<T>
    {
        public T Row { get; set; }

        public decimal Rank { get; set; }
    }

    public enum Directions
    {
        Ascending,
        Descending
    }

Schema configuration

_.Schema.For<Photo>()
 .Searchable<PhotoSearch>(x =>
 {
    x.Vector(f => f.ByNameAndDescription)
     .With(f => f.Name, Weighting.High)
     .With(f => f.Description, Weighting.Low);

    x.Vector(f => f.ByTag)
     .With(f => f.Tags);
 });
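One way the `.With(..., Weighting.High)` calls could translate to SQL is PostgreSQL's `setweight`, which labels lexemes A through D, with A ranking highest under the default `ts_rank` weights and D being the label unweighted lexemes get. The C# sketch below assumes a hypothetical `Weighting` enum mapped onto those four labels and plain text columns; array properties such as `Tags` would need converting to text first (for example with `array_to_string`).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical weighting levels mapped onto PostgreSQL's tsvector weight labels.
public enum Weighting { High = 'A', Medium = 'B', Low = 'C', Lowest = 'D' }

public class WeightedVector
{
    private readonly List<(string Column, Weighting Weight)> _parts = new();
    private readonly string _regConfig;

    public WeightedVector(string regConfig = "english") => _regConfig = regConfig;

    public WeightedVector With(string column, Weighting weight = Weighting.Lowest)
    {
        _parts.Add((column, weight));
        return this;
    }

    // One setweight(...) term per column, concatenated with the tsvector || operator.
    // coalesce() stops a single NULL column from making the whole vector NULL.
    public string ToSql() =>
        string.Join(" || ", _parts.Select(p =>
            $"setweight(to_tsvector('{_regConfig}', coalesce({p.Column}, '')), '{(char)p.Weight}')"));
}
```

Because the weights live inside a single tsvector, this also shows the limitation discussed further down: the vector remembers a lexeme's weight label, not which property it came from.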

Example query

var result = session.Search<PhotoSearch>()
                    .Include<Member>(...)
                    .Against(x => x.ByNameAndDescription, term)
                    .And(x => x.ByTag, term);


var result = session.Search<PhotoSearch>()
                    .Include<Member>(...)
                    .Against(x => x.ByNameAndDescription, term)
                    .Or(x => x.ByTag, term)
                    .OrderBy(x => x.Rank, Directions.Ascending);
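A sketch of the SQL the `.Against(...).And(...)` / `.Or(...)` chain might produce if each search vector lived in its own tsvector column: every vector is matched with `@@` against the same `plainto_tsquery`, the combinator joins the predicates, and `ts_rank` supplies the `Rank`. The table and column names, ranking against only the first vector, and the `@term` parameter are all assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical translation of the search chain into a single SQL statement.
public static class SearchQuerySql
{
    private const string Query = "plainto_tsquery('english', @term)";

    public static string Build(string table, string combinator, bool descending, params string[] vectorColumns)
    {
        if (combinator != "and" && combinator != "or")
            throw new ArgumentException("combinator must be \"and\" or \"or\"", nameof(combinator));
        if (vectorColumns.Length == 0)
            throw new ArgumentException("at least one vector column is required", nameof(vectorColumns));

        // One "column @@ query" predicate per vector, joined by AND or OR.
        var where = string.Join($" {combinator} ", vectorColumns.Select(c => $"{c} @@ {Query}"));
        var direction = descending ? "desc" : "asc";

        // Rank against the first vector only, treating Against(...) as the primary search.
        return $"select data, ts_rank({vectorColumns[0]}, {Query}) as rank from {table} " +
               $"where {where} order by rank {direction};";
    }
}
```

Summing or weighting the ranks of several vectors would be an equally valid design; ranking on one vector just keeps the sketch short.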

Example usage of result

foreach (var photo in result)
{
    var name = photo.Row.Name;   // the matched Photo document
    var tags = photo.Row.Tags;

    var rank = photo.Rank;       // search result rank
}
@mdissel

Contributor

commented Jun 18, 2016

Nice! Can you still query on other, non-full-text fields of the Photo class (in your sample)?

@phillip-haydon

Contributor

commented Jun 18, 2016

Hmmm good point, I didn't consider that.

The problem I'm trying to solve is:

You can assign weights to different terms as part of the tsvector index, but you can't target a specific property within the tsvector.

Meaning you cannot say:

Search against the Name field within the tsvector

So by creating a searchable doc, we can have multiple tsvector columns for different search queries. The tsvectors can be rather small, so having a few extra columns shouldn't be an issue.

To keep the API searchable, I've tried to chain it all together. I guess it's now a case of exposing T so that we can also include normal criteria, something like:

var result = session.Search<PhotoSearch>()
                    .Include<Member>(...)
                    .Where(x => x.Create > DateTime.UtcNow.AddDays(-10))
                    .Against(x => x.ByNameAndDescription, term)
                    .Or(x => x.ByTag, term)
                    .OrderBy(x => x.Rank, Directions.Ascending);
@mdissel

Contributor

commented Jun 18, 2016

Another option I'm missing is stemming. Where can we configure which text search configuration (stemmer) should be used, including the 'simple' configuration to disable stemming?

A nice-to-have would be ts_headline support ;)
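Both requests map onto the text search configuration argument PostgreSQL already accepts: `english` stems words (so `running` is indexed as `run`), while `simple` only lowercases and does no stemming. The hypothetical C# helper below shows where such a setting could flow into generated SQL, including `ts_headline`, which works on the original text rather than on the tsvector. The helper names, the identifier check, and the `<b>` markup options are illustrative choices.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the configuration name is inlined into SQL, so it is
// validated against a plain identifier pattern first (schema-qualified names
// such as pg_catalog.english are rejected, a deliberate simplification).
public static class TextSearchSql
{
    private static readonly Regex Identifier = new("^[a-z_][a-z0-9_]*$");

    private static string Config(string regConfig) =>
        Identifier.IsMatch(regConfig)
            ? regConfig
            : throw new ArgumentException($"invalid text search configuration: {regConfig}");

    // 'english' stems, 'simple' does not; the choice is just this first argument.
    public static string Vector(string regConfig, string column) =>
        $"to_tsvector('{Config(regConfig)}', {column})";

    // ts_headline returns fragments of the original text with matches wrapped
    // in the StartSel/StopSel markers.
    public static string Headline(string regConfig, string column) =>
        $"ts_headline('{Config(regConfig)}', {column}, plainto_tsquery('{Config(regConfig)}', @term), " +
        "'StartSel=<b>, StopSel=</b>')";
}
```

The same configuration should be used for the stored vector and for the query; mixing `english` and `simple` would make stemmed and unstemmed lexemes fail to match.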

@jeremydmiller jeremydmiller modified the milestone: 1.1 Jun 20, 2016

@jeremydmiller jeremydmiller modified the milestones: 1.1, 1.2 Oct 4, 2016

@jeremydmiller jeremydmiller modified the milestone: 1.2 Oct 18, 2016

@jeremydmiller jeremydmiller added this to the 3.0 milestone Aug 31, 2018

@jeremydmiller jeremydmiller removed this from the 3.0 milestone Sep 26, 2018

@oskardudycz

Collaborator

commented Dec 1, 2018

I'm closing this issue as it was solved by #1098. Full text search is available as of Marten 3.2.0.

@oskardudycz oskardudycz closed this Dec 1, 2018
