Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bleve v2.0.0 proposal PR #1494

Merged
merged 29 commits into from
Jan 12, 2021
Merged

bleve v2.0.0 proposal PR #1494

merged 29 commits into from
Jan 12, 2021

Conversation

mschoch
Copy link
Member

@mschoch mschoch commented Nov 10, 2020

This PR currently includes the bare minimum set of changes to avoid circular dependencies with zap.

This is accomplished by refactoring the blevesearch/bleve/index and blevesearch/index/scorch/segment packages out into a separate modules blevesearch/bleve_index_api and blevesearch/scorch_segment_api respectively.

Further, a new module blevesearch/zapx is introduced. This repository is a duplicate of blevesearch/zap up to today, with all major version branches (v11.x, v12.x, v13.x, v14.x and master) updated to no longer depend on bleve. This solution was chose we intend for the major version number to mostly strongly indicate file format compatibility. Compatibility with major version of bleve (or lack of such a dependency) is therefore another dimension, and we felt most clearly represented with a different package name.

Along the way a few other minor changes were required, in all such cases the focus was to keep these as small as possible.

@mschoch
Copy link
Member Author

mschoch commented Nov 10, 2020

Keep in mind this is a large change, so please review carefully. I attempted to clean it up as best I could, but in a change this large sometimes things are missed.

@mschoch
Copy link
Member Author

mschoch commented Nov 10, 2020

Links to new repos:

NOTE: we should finalize v1.0.0 of the api interfaces, and update accordingly before merging...

@mschoch mschoch mentioned this pull request Nov 10, 2020
3 tasks
@mschoch
Copy link
Member Author

mschoch commented Nov 10, 2020

So, one of the wrinkles you'll see upon closer review is that we have a few places where we type assert that a index.Document is really type *document.Document. As implemented in this PR, this is safe for all known index implementations, because both upsidedown and scorch are still in the bleve repo, and use document.Document directly.

However, one of my additional refactors I'd like to consider is moving upsidedown out, and this means it can no longer use document.Document as that brings circular dependency back. I'll continue the experiment, but it most likely will mean new methods on the index.Document type, and changing those type assertions to go through the interface...

@mschoch
Copy link
Member Author

mschoch commented Nov 10, 2020

OK, now looks like moving upsidedown out is not possible without substantially more refactoring, as it currently uses the registry package to instantiate the key/value store implementations. That seems like too big a change for this round.

Instead I'll focus on trying to remove any upsidedown specific things from the bleve_index_api interfaces.

analysis/freq.go Outdated Show resolved Hide resolved
analysis/freq.go Outdated Show resolved Hide resolved
review comments from Sreekanth
Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly OK to me. Couple things though -

  • I'm guessing you'll set up release tag(s) for bleve_index_api and scorch_segment_api - think we should keep their go versions at 1.13 as well (rather than 1.15).
  • Is there a reason why you included go.sum into this change, I don't remember your reason for not committing it in earlier?

index/upsidedown/upsidedown.go Outdated Show resolved Hide resolved
@mschoch
Copy link
Member Author

mschoch commented Nov 16, 2020

I'm guessing you'll set up release tag(s) for bleve_index_api and scorch_segment_api - think we should keep their go versions at 1.13 as well (rather than 1.15).

I expect once these interfaces packages are finalized, they will be tagged v1.0.0. What reason is there for them to be something else?

Is there a reason why you included go.sum into this change, I don't remember your reason for not committing it in earlier?

Yes this is intentional. Best practices is to include go.sum. In the past we did not because our go.mod was committed after manual edits (forward declarations of module versions that did not yet exist). Therefore everyone's go.sum would change after doing a basic build locally, which meant every PR was going to propose a change to go.sum. I wished to avoid that. However, now that that problem has gone away, we should adopt the best practices.

Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me.

Logic inside segment implementations has been
moved up the stack into scorch, so that fewer
methods are required in the API.

- Iterator() becomes AutomatonIterator(nil, nil, nil)
- RangeIterator(start, end) now computes an end key
  compatible with traditional inclusive start, exclusive
  end keys.  Then becomes:
  AutomatonIterator(nil, start, endExclusive)
- PrefixIterator(prefix) now computes an end key
  and becomes:
  AutomatonIterator(nil, prefix, prefixEnd)

NumericRangeSearcher updated to no longer require
the IndexReaderOnly interface.  This interface
was only used for non-scorch indexes, but only
scorch actually implemented it.  Removing this
also meant we can remove the OnlyIterator method
from the segment dictionary API.
this code was originally placed in the segment package,
however it is only used by scorch, and makes more sense
to be placed at this level
the empty postings iterator is now only used
by scorch's optimize.go, so it is relocated here

the uvarint ascending encode/decode is only used
by scorch, so it too is moved inside this package
the unadorned postings lists are only used by scorch
so they are relocated to this package
note: while doing this it was observed that 'go mod tidy' was
failing with an error.  this relates to the kvstores in the
blevex package which will needed to be updated separately.

this change addresses this by removing the config_* and
upsidedown/benchmark_* files which caused the problem.

this does not remove support for these kvstores, just
the build tag support.
also fixup some places we missed during refactor
* move the plugin interface into scorch

* rename scorch Plugin to SegmentPlugin

and related functions

* update to make all analysis take place in func() (#1508)

* update to make all analysis take place in func()

by making analysis take place inside a func() a
closure can be used to encapsulate all the index
specific details.  this allows the index API to not
know about scorch vs upsidedown analysis structures

* rewrite variable declaration for clarity

also update comment, per review feedback

* update to latest versions

selects latest versions compatible with these changes
* update Advanced() method

at the top-level of bleve, Advanced no longer knows about
K/V stores, instead it only returns reference to the internal
index implementation.

this will allow us to remove the Advanced() method from the
internal index implementation.

NOTE: users who previously needed to access the underlying
K/V store, will still be able to do so, but must first
type assert that the internal index impl is
*UpsideDownCouch and if it is, they can use that Advanced()
method directly

* relocate store pkg inside upsidedown (#1510)

* relocate store pkg inside upsidedown

since kvstores are a concept unique to the upsidedown index
implementation, the store folder should be relocated inside
of that package

* remove Advanced() method from scorch (#1511)

* remove Advanced() method from scorch

after the API is updated, scorch no longer requires
this useless method to satisfy the interface

* update to latest apis
also, place shared kvstore test code in this
module, not in upsidedown (avoids circular deps)
previous the FieldCache type was in the index package,
but it is only used by upsidedown, so we relocate it there.
this removes the top-level bleve err ErrUnknownIndexType
instead, now the upsidedown specific error is returned.

users who actually care and are checking for this error type
still can, only now it is defined in the upsidedown package.
this interface is only used by the searcher package, thus
it makes sense to define it there
the dump methods were only ever supposed by upsidedown
now, users of this method must type-assert that their reader
is from the upsidedown package, where these methods remain,
unchanged.
mschoch and others added 8 commits December 16, 2020 10:03
these cause circular dependencies and are easy for applications
to do themselves
* use the moved indexing options

* update analysis.TokenFrequency to use options (#1522)

* update analysis.TokenFrequency to use options

previously it used boolean, and future changes would
require additional booleans.  using options directly
keeps the function signatures cleaner.

* update to latest apis
* update to work with removed items

* update to latest api
instead use the interface exposed by index api
* update for latest renaming

* update to latest apis
one example required updating as it's custom sort contained a
tie (both fields with age 0).  upsidedown always returned the
results in a particular order, which was not guaranteed by scorch.
to address this, the exmaple has been updated to include the
doc id as an additional sort criteria (ensuring the original
order expected is kept by scorch)
* Adding configurable freq/norm processing

Useful for use cases where score relevancy isn't needed
for the search requests.

A special frequency value of zero/0 is used for skipping
the freq/norm.

* update to work with removed items (#1524)

* update to work with removed items

* update to latest api

* do not type assert on document or field (#1525)

instead use the interface exposed by index api

* update for latest renaming (#1527)

* update for latest renaming

* update to latest apis

* switch default index to scorch with latest zap (#1528)

one example required updating as it's custom sort contained a
tie (both fields with age 0).  upsidedown always returned the
results in a particular order, which was not guaranteed by scorch.
to address this, the exmaple has been updated to include the
doc id as an additional sort criteria (ensuring the original
order expected is kept by scorch)

* update to release candidate apis (#1531)

Co-authored-by: Marty Schoch <marty.schoch@gmail.com>
@mschoch mschoch changed the title initial refactor to remove circular deps bleve v2.0.0 proposal PR Dec 31, 2020
mschoch and others added 5 commits January 5, 2021 11:54
it appears that while moving files around I mistakenly
changed a copyright header, which was not my intention
Right now `MatchOperator` constants are ints and when you're trying to
assign them to a variable you get an int. If you try to use those
variables as operator you get the following error:

```
cannot use operator (type int) as type MatchQueryOperator in argument to q.SetOperator

```

So, to fix this problem we just need to create `MatchOperator` constants
* update to v1 apis and zapx versions

* update to v1.0.1 upsidedown_store_api

* make sort integration tests more precise

the upsidedown implementation would break tie
sort order in a predictable way, and some tests
inadvertely relied on this behavior.

these tests have been adjusted to use a more
specific sort order, breaking all ties, such that
the expected results no longer depend on
implementation details.
@mschoch
Copy link
Member Author

mschoch commented Jan 12, 2021

@abhinavdangeti @sreekanth-cb last call before we merge to master, requesting final re-review.

Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, rebase.squash.merge away.

@mschoch mschoch merged commit 89234a6 into master Jan 12, 2021
@mschoch mschoch deleted the fix-circular-deps branch January 12, 2021 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants