This repository has been archived by the owner on Sep 24, 2019. It is now read-only.

Support wildcard search for annotation values and namespace values #47

Closed
abargnesi opened this issue Feb 25, 2015 · 4 comments

Comments

@abargnesi
Member

Derived from #45.

Steps

  • Index suffixes.
  • Support compression/uncompression of FTS column values for SQLite. This reduces database size but requires connection-specific functions to be registered on each database connection.
abargnesi added a commit to OpenBEL/rdf-misc that referenced this issue Feb 25, 2015
- no longer indexing uri, concept_type, identifier, pref_label, title,
  or alt_labels columns
- the text column contains divided suffixes for identifier, pref_label,
  title, and alt_labels. For example, the pref_label "AKT1" will be
  divided into "1 T1 KT1 AKT1". This allows wildcards to match the
  prefix, infix, and postfix of a token.

refs OpenBEL/openbel-api#47
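
The suffix division described in this commit can be sketched in a few lines of Ruby. This is only an illustration: `divided_suffixes` is a hypothetical helper name, not the function used in rdf-misc.

```ruby
# Hypothetical helper (not from rdf-misc): returns every suffix of a token,
# shortest first, joined by single spaces.
def divided_suffixes(token)
  (1..token.length).map { |n| token[-n, n] }.join(' ')
end

puts divided_suffixes('AKT1') # prints "1 T1 KT1 AKT1"
```

Because every suffix is stored as its own token, a prefix query against the text column can land anywhere inside the original label.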
abargnesi added a commit that referenced this issue Feb 25, 2015
for example the search "glio neuron" will convert to "*glio* *neuron"
and can match a term like "ganglion interneuron"

refs #47
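
To see why the indexed suffixes make this kind of match possible, here is a minimal in-memory sketch. It assumes nothing about the real query code: `suffixes` and `infix_match?` are hypothetical names, and the arrays stand in for the SQLite FTS table.

```ruby
# Hypothetical stand-in for the suffix index; the real search runs
# against the SQLite FTS table.
def suffixes(word)
  (1..word.length).map { |n| word[-n, n] }
end

# A label matches when every search term is a prefix of some indexed
# suffix, which means the term occurs somewhere inside one of its words.
def infix_match?(label, search)
  indexed = label.downcase.split.flat_map { |word| suffixes(word) }
  search.downcase.split.all? do |term|
    indexed.any? { |suffix| suffix.start_with?(term) }
  end
end

puts infix_match?('ganglion interneuron', 'glio neuron') # prints "true"
puts infix_match?('ganglion interneuron', 'glia neuron') # prints "false"
```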
@abargnesi
Member Author

Data is deployed to next.belframework.org.

@abargnesi
Member Author

This produced a significant decrease in wall-clock time for single-value wildcard matches like A*.

@abargnesi
Member Author

Evaluating the use of the prefix option to index prefix matches more efficiently.

@abargnesi
Member Author

The prefix option did not decrease wall-clock time for queries like A*.

abargnesi pushed a commit that referenced this issue Apr 29, 2016
abargnesi pushed a commit that referenced this issue Jul 5, 2016
Squashed commit of the following:

commit 804c313
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 8 05:35:52 2016 -0400

    bump versions; published 1.0.1

commit a01f56f
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 8 03:29:17 2016 -0400

    bumped bel to version 1.0.0

commit c28b74d
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Jun 7 15:35:11 2016 -0400

    set language version as configured in OpenBEL API

commit d023769
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jun 6 11:24:05 2016 -0400

    /api/version route; exposes API semantic version

commit 12af9ce
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jun 6 10:42:55 2016 -0400

    refactored /api/language routes into one class

commit 5005429
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 1 14:44:46 2016 -0400

    remove explicit statement parse for nanopub

    statement parsing is encapsulated within Nanopub state

commit 335a982
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jun 1 14:38:09 2016 -0400

    create Annotation model before unification

commit 24f3cdf
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue May 31 10:57:11 2016 -0400

    json-format filters; thanks @wshayes!

commit d15f0e3
Merge: b36876e be3bba1
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 27 20:19:06 2016 -0400

    Merge branch 'next' of github.com:OpenBEL/openbel-api into next

commit b36876e
Author: Nick <nick@>
Date:   Fri May 27 14:15:18 2016 -0400

    change nanopubs_store to nanopub_store

    The latter is what is used in code.

commit be3bba1
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 27 19:45:49 2016 -0400

    Fixed some typos

commit 9e15b4f
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 27 19:37:09 2016 -0400

    Updating configuration and API documentation

commit 5452f09
Author: Nick <nick@>
Date:   Fri May 27 14:15:18 2016 -0400

    change nanopubs_store to nanopub_store

    The latter is what is used in code.

commit 13fe4d2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri May 27 02:05:42 2016 -0400

    fix reference to BELParser default resources

    refs OpenBEL/bel_parser#44

commit 07ee8d5
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri May 27 02:00:58 2016 -0400

    functional validation API for expressions

    closes OpenBEL/bel_parser#44

commit 38dad57
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri May 27 01:57:27 2016 -0400

    added validation API doc within /api/expressions

commit e0aa6fb
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Thu May 26 21:12:04 2016 -0400

    Added /api back to all routes

commit 69e07c2
Merge: 1d90827 8dc3089
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Wed May 25 13:30:16 2016 -0400

    Merge branch 'next' of github.com:OpenBEL/openbel-api into next

commit 1d90827
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Wed May 25 13:30:10 2016 -0400

    Updated RAML file - schemas and examples are now embedded

commit 8dc3089
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 25 01:34:23 2016 -0400

    [wip] Result for expression validation.

commit 7ab1680
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 25 00:47:47 2016 -0400

    config the default URI reader to ref TDB directory

    The default URI reader is established as the TDB directory that the
    biological concepts come from.

    The default URL reader will be ResourceURLReader and will only be used
    when the URI cannot be determined for a resource.

commit f710ac5
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 25 00:20:52 2016 -0400

    pluralize the "nanopubs" route; /api/nanopubs/...

    renamed route file, route class name, paths, and references

commit f109a3c
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue May 24 09:50:43 2016 -0400

    datasetload; serialize statement from hash

    The bel_statement is serialized after hash conversion in order to be
    saved to Mongo.

commit d0a71c0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 23 16:16:00 2016 -0400

    refactor generate_uuid as instance method in mixin

commit 2217192
Merge: 7e454a3 fcb8d52
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Mon May 23 11:29:49 2016 -0400

    Merge branch 'next' of github.com:OpenBEL/openbel-api into next

commit 7e454a3
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Mon May 23 11:29:41 2016 -0400

    Fixed nanopub renaming issue

commit fcb8d52
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 16 13:11:45 2016 -0400

    refactor expression components api for bel_parser

    use the BELParser::Expression::Model as parsed objects

    removed unused classes that leveraged libbel APIs; the libbel API
    will be removed from bel.rb when bel_parser is fully integrated.

commit 81a79db
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Sat May 14 15:07:06 2016 -0400

    TmpFix for BEL language version (text/plain) issue

    It would only return the text/plain version, never the application/json version. I changed it to return only the JSON-formatted data and commented out the Accept header option code.

commit f2066aa
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 13 20:22:39 2016 -0400

    Missed a nanopub -> Nanopub edit

commit 07edd9d
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri May 13 10:40:00 2016 -0400

    Refactor naming and language paths

    Refactored naming:  evidence to nanopub, summary text to support
    Moved /api/{functions|relations|version} to /api/language/...

commit dda76e9
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed May 11 15:11:08 2016 -0400

    rename for Nanopub model; refs OpenBEL/bel.rb#121

commit a1dafde
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue May 10 15:54:56 2016 -0400

    set bel & bel plugins to version, ~> 1.0.0.beta

commit 9e60c51
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Tue May 3 11:22:43 2016 -0400

    Remove sinatra reloader - no longer needed

commit b0a6058
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:50:11 2016 -0400

    return first for annotation/namespace properties

commit 27ce1e4
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:35:09 2016 -0400

    guard when item does not respond to match_text

    annotation_value/namespace_value resources

commit 937b3f2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:27:47 2016 -0400

    correct inScheme (in_scheme accessor) in namespace

commit 665f18a
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 15:20:12 2016 -0400

    fix fromSpecies accessor (from_species)

    refs OpenBEL/bel.rb#120

commit 9446578
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon May 2 14:14:29 2016 -0400

    bumped bel.rb dependency to version 1.0.0

    1.0.0 is the version of bel.rb on the next branch. This will be the next
    major release of bel.rb. OpenBEL API needs version 1.0.0 in order to get
    bel_parser and translator plugin changes.

    refs OpenBEL/bel_parser#43

commit e57b936
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 12:21:41 2016 -0400

    remove return_type from relationship resource

    included some cleanup in route

    closes #48

commit d790081
Author: William Hayes <william.s.hayes@gmail.com>
Date:   Fri Apr 29 12:12:14 2016 -0400

    Partial update for /api/relationships

    Waiting on https://waffle.io/OpenBEL/bel_parser/cards/572386c9d39509b000f2b31b

commit 0da2c0e
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 11:25:36 2016 -0400

    fix vocab references due to rdf/rdf-vocab upgrade

commit b52355a
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 11:18:35 2016 -0400

    fix pref_label accessor in routes/resources

    closes #47

commit 01e3060
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 02:41:43 2016 -0400

    bumped bel-rdf-jena plugin version to 0.4.3.beta

    Transitively includes 0.4.0.beta version of rdf-jena.

commit e696785
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 29 02:10:15 2016 -0400

    pass configured BEL version to Completion API

    update RDF serialization gems to version 2.0.0

    remove dependency on 'rdf' gem; already a dependency for bel.rb

    closes OpenBEL/bel_parser#45

commit 041174e
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 14:15:31 2016 -0400

    don't check cookie form if not using jwt=

commit 0e837bc
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 10:42:48 2016 -0400

    spec test auth capabilities

commit ec6a143
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 09:20:07 2016 -0400

    cleanup auth lint warnings

commit 863a3de
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Apr 19 09:19:45 2016 -0400

    fix token query string access in auth middleware

commit b2607e9
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 8 13:38:15 2016 -0400

    refactored /api/functions for BEL 1.0 / 2.0

    The functions route now uses the configured BEL specification to
    return functions. So far the short, long, description, and return type
    are provided.

    Updated functions resources to match object model.

    refs OpenBEL/bel_parser#33

commit 2dfe73f
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 8 13:35:51 2016 -0400

    added "bel.version" setting to configuration

    added bel_parser gem as runtime dependency in .gemspec

    validate bel.version is set in configuration and that it is a defined
    BEL specification (BELParser::Language.defines_version?)
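
A minimal sketch of this kind of configuration check follows. It does not call bel_parser: `KNOWN_BEL_VERSIONS` is a stand-in for `BELParser::Language.defines_version?`, and both the nested config layout and the version list are assumptions made for illustration.

```ruby
# Stand-in for BELParser::Language.defines_version?; the version list is
# an assumption based on the BEL 1.0 / 2.0 support mentioned in this log.
KNOWN_BEL_VERSIONS = %w[1.0 2.0].freeze

# Returns a list of error messages; an empty list means the setting is valid.
# The nested 'bel' => { 'version' => ... } layout is assumed.
def validate_bel_version(config)
  version = config.dig('bel', 'version')
  return ['bel.version is not set'] if version.nil?
  return [] if KNOWN_BEL_VERSIONS.include?(version.to_s)

  ["bel.version #{version} is not a defined BEL specification"]
end

p validate_bel_version('bel' => { 'version' => '2.0' }) # prints []
p validate_bel_version({})
```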

commit c9c29f5
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Fri Apr 8 12:48:45 2016 -0400

    bumped version to 1.0.0; prepped CHANGELOG

    1.0.0 will be a major version bump to support a configurable BEL
    specification. This will bring support for BEL 2.0 into OpenBEL API.

commit 74517e2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 30 20:41:01 2016 -0400

    bumped version to 0.6.3; added changelog item

    refs #108

commit 31a27b9
Merge: 22eed27 29eb920
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 30 20:33:06 2016 -0400

    Merge branch 'master' into next

commit 22eed27
Merge: 386c2ea 8d79b26
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Wed Mar 30 20:28:27 2016 -0400

    Merge pull request #108 from nbargnesi/param_auth

    look for tokens as parameters as well

commit 8d79b26
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Wed Mar 30 16:51:58 2016 -0400

    look for tokens as parameters as well

commit 386c2ea
Merge: b2abcdf ca2c733
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 23 09:21:37 2016 -0400

    Merge branch 'master' into next

    fixed conflicts in CHANGELOG.md, UPGRADING.md, and VERSION by keeping
    master's changes.

commit b2abcdf
Merge: be2e6e1 85cd7a3
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 22 22:21:41 2016 -0400

    Merge pull request #106 from nbargnesi/issue105

    fixes #105

commit 85cd7a3
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Mar 22 18:17:44 2016 -0400

    fixes #105

commit be2e6e1
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 15:07:24 2016 -0400

    replace method for BEL.keys_to_symbols

    additional style alignment

commit fbf5368
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 09:25:06 2016 -0400

    return 404 when translating empty evidence results

    refs #44

commit ac61baf
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:32:37 2016 -0400

    added storage.engine note for UPGRADING to 0.6.0

commit 3f4f700
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:27:14 2016 -0400

    added UPGRADING guide

commit 29f86e8
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 08:05:01 2016 -0400

    added document for 0.6.0 mongodb migration

commit 0e22354
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 06:30:26 2016 -0400

    add configuration check for MongoDB 3.2

    The check will prevent OpenBEL API from starting if MongoDB is < 3.2
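
A version guard like this can be sketched with Ruby's `Gem::Version`, which compares dotted versions numerically rather than as strings. The helper name `mongodb_supported?` is hypothetical, and how the server version string is obtained is left out.

```ruby
require 'rubygems' # Gem::Version; loaded by default in current Ruby

MINIMUM_MONGODB = Gem::Version.new('3.2')

# Hypothetical helper: true when the reported server version is 3.2 or later.
def mongodb_supported?(server_version)
  Gem::Version.new(server_version) >= MINIMUM_MONGODB
end

puts mongodb_supported?('3.2.4')  # prints "true"
puts mongodb_supported?('3.0.9')  # prints "false"
puts mongodb_supported?('3.10.0') # prints "true"
```

The last case is the reason for not comparing the raw strings: "3.10.0" sorts before "3.2" lexically.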

commit 45e5e39
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 15 06:17:57 2016 -0400

    added missing arg to render evidence collection

commit 1edb037
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 14:45:43 2016 -0400

    set mongo operation timeouts to unbounded

    The operation timeout is the number of seconds that can pass before
    subsequent reads from a mongo operation. This change makes this read
    timeout unbounded in order to satisfy long evidence and facet creation
    queries.

commit 39524ca
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 13:46:25 2016 -0400

    remove cache facets during dataset load

    Cached facets were removed at the end of a dataset load. Now they are
    additionally removed at the start of the load as well as every increment
    of 10k nanopubs loaded.

commit 68c2107
Merge: de9a500 61a291d
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 14 12:50:35 2016 -0400

    Merge branch 'next' into rewrite_references

commit 61a291d
Merge: 1b4dbb7 1bdf14e
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Mon Mar 14 12:20:40 2016 -0400

    Merge pull request #101 from nbargnesi/issue100

    Issue100

commit 1bdf14e
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Mon Mar 14 12:05:43 2016 -0400

    document auth.enabled, auth.secret

commit 0e900f6
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:56:15 2016 -0500

    include only auth enabled/secret in default config

    for #100

commit fbb8b06
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:55:54 2016 -0500

    simplify authenticate route to enabled/disabled

commit fe724ff
Author: Nick Bargnesi <nbargnesi@selventa.com>
Date:   Tue Feb 2 13:54:30 2016 -0500

    remove rest-client dependency

commit de9a500
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Mar 10 14:29:16 2016 -0500

    set mongo connection pool size to 30

    This number was chosen in order to have at most 30 long-running queries
    simultaneously executing. This would then fail the 31st query unless a
    connection could be obtained with a timeout of 5 seconds.

commit 8d46fc1
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Mar 9 14:54:15 2016 -0500

    do not index value of experiment_context/metadata

    Annotation values can be large amounts of text that will not fit into an
    index key of 1024 bytes. If indexing is attempted you may see an error:

      WiredTigerIndex::insert: key too large to index...

commit 4426582
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 23:01:46 2016 -0500

    flatten translator arrays so we return one, if any

commit 4d42c35
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 20:38:41 2016 -0500

    bump puma to 3.1.0

commit 5081567
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Mar 8 20:36:41 2016 -0500

    remove unnecessary local variables

commit 32c5e56
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 8 16:59:38 2016 -0500

    Update README.md

commit 53ea95f
Author: Tony Bargnesi <abargnesi@gmail.com>
Date:   Tue Mar 8 16:51:59 2016 -0500

    Update README.md

commit 53653c0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Mar 7 23:06:27 2016 -0500

    correct references when serializing evidence

    using rewrite references work in bel.rb

commit 1b4dbb7
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 16:11:02 2016 -0500

    convert /api/evidence to BEL using translators

    factored out rendering of evidence_resource_collection to evidence
    helper

    refs #44

commit 3500811
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:20:01 2016 -0500

    factored out filters validation into helper

    functional decomposition of filter validation for better
    understanding and maintenance; now reporting multiple JSON errors when
    responding with 400.

commit 83935aa
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:18:27 2016 -0500

    added doc for opening ::Sinatra::Helpers::Stream

    It is important to convey why methods were added to this class. The
    methods are a convenience so RDF.rb's writers can expect to call them.

commit c984f8a
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Feb 2 15:08:44 2016 -0500

    bump version dependencies for bel-rdf-jena / rdf

    rdf bumped to 1.99.1

    bel-rdf-jena bumped to 0.4.2

commit e4eb5dd
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Feb 1 14:50:34 2016 -0500

    dataset serialization to all bel.rb translators

    updated dependencies to support all bel.rb translators

    refs #99

commit b1243d8
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Tue Jan 26 15:57:16 2016 -0500

    aggregate on full-text search; avoids Mongo limits

    A full-text search filter to /api/evidence with a sort on bel_statement
    only used the text index. This means that the bel_statement sort had to
    be done in memory.

    This reaches the 32 MB sort limit with only several tens of thousands of
    documents.

    The solution employed here was to use cursored aggregation allowing disk
    use for sort stages.

    The solution was introduced as an alternative code path if a FTS filter
    was included in the HTTP request. Although this did minimize the risk of
    regression, there is a fair bit to clean up in the mongo
    access layer.

    closes #96
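
The shape of such an aggregation can be sketched as plain Ruby data. The stages below are an assumption about what a full-text match followed by a bel_statement sort might look like, not the pipeline from this commit, and `fts_sorted_pipeline` is a hypothetical name.

```ruby
# Builds aggregation stages for a full-text search sorted by bel_statement.
# Assumed shape only; the actual pipeline in the commit is not shown here.
def fts_sorted_pipeline(search_text)
  [
    { '$match' => { '$text' => { '$search' => search_text } } },
    { '$sort'  => { 'bel_statement' => 1 } }
  ]
end

pipeline = fts_sorted_pipeline('apoptosis')
p pipeline

# With the mongo Ruby driver this would be run roughly as follows, where
# allow_disk_use lets the sort stage spill to disk instead of hitting the
# in-memory sort limit:
#   collection.aggregate(pipeline, allow_disk_use: true).each { |doc| ... }
```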

commit 5d44fd0
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 21:48:12 2016 -0500

    return annotation/namespace defs in BEL Script

    removed normalization of experiment_context annotation keywords. The
    normalized names were inconsistent with references.annotations
    definitions.

    integrate next version of bel.rb (0.4.3) to get fixes for
    annotation/namespace formats.

    refs #95

commit 92f7e7e
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 15:51:14 2016 -0500

    require MongoDB 3.2; closes #98

commit 0507714
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 25 14:57:28 2016 -0500

    added 0.6.0 mongo migration helper, details follow

    The clear_evidence_facets_cache.rb mongo migration will clear out new
    evidence facet cache storage in case searches were built before
    migrating all documents in the "evidence" collection.

commit 7707a92
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 14:16:24 2016 -0500

    fix /api/datasets/{id}/evidence for facet changes

    Faceting now works correctly in light of the evidence facet changes and
    respects "max_values_per_facet".

commit 19eedef
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 13:10:57 2016 -0500

    add scripts for Mongo data migrations in 0.6.0

    - Drops evidence_facets since it has been replaced by
      evidence_facet_cache plus individual "evidence_facet_cache_{UUID}"
      collections.
    - Updates each evidence document to have "facets" field contain JSON
      objects instead of JSON strings.
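
The per-document part of this migration, turning JSON strings in "facets" into JSON objects, could look roughly like the sketch below. It works on a plain hash rather than a Mongo collection, and `migrate_facets` is a hypothetical name rather than the script shipped with the release.

```ruby
require 'json'

# Hypothetical per-document step: parse any facet still stored as a JSON
# string and leave already-migrated objects untouched.
def migrate_facets(document)
  facets = document.fetch('facets', []).map do |facet|
    facet.is_a?(String) ? JSON.parse(facet) : facet
  end
  document.merge('facets' => facets)
end

old_doc = {
  'bel_statement' => 'p(HGNC:AKT1) increases bp(GOBP:"apoptotic process")',
  'facets' => ['{"category":"experiment_context","name":"Species","value":"9606"}']
}
p migrate_facets(old_doc)['facets']
```

Skipping values that are already objects keeps the step safe to run more than once.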

commit 21a7bc4
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Thu Jan 14 13:08:32 2016 -0500

    bumped next version to 0.6.0

    Minor release looking to include:
    - New evidence facet storage in mongo.
    - Improve dataset import for large documents (occasional OOM).
    - Evidence streaming.
    - Evidence export to multiple formats.

commit bb2ac16
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jan 13 16:44:47 2016 -0500

    facet cache collection creation and removal

    This design builds individual facet_cache collections based on the
    filters applied to the evidence collection. Each filtered evidence
    collection will get its own "evidence_facet_cache_{UUID}" mongo
    collection. The facets values are grouped by category, name so it's
    trivial to cursor out the facets (still need to set the filter string
    though).

    This alleviates the max document size issue for large evidence
    collections. A max of 1000 facet values can be added to each category,
    name pair in order to stay within the size limit.

    Facet cache eviction isn't great here:

    - Individual evidence changes require removal of facet caches for the
      empty filter search as well as any overlapping filter/facet.
    - Creation or removal of a dataset will remove all facet caches. The
      thought is that for large dataset imports it is more effective to
      regenerate the cache than to try to synchronize it with new data.

    This includes a breaking change to evidence document schema. The
    evidence "facets" array stores the full category, name, value json
    objects instead of flat strings. This is done to make it possible to
    separate values into category, name groupings. We should include an
    upgrade note for this and possibly a script.
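
The grouping described above can be sketched in plain Ruby. The field names category, name, and value come from the commit message; the `values` key, the `group_facets` helper, and the in-memory approach are assumptions for illustration, since the real work happens in Mongo.

```ruby
MAX_VALUES_PER_GROUP = 1000

# Groups facet objects by their (category, name) pair and caps the number
# of distinct values kept for each pair.
def group_facets(facets, limit = MAX_VALUES_PER_GROUP)
  facets
    .group_by { |facet| [facet['category'], facet['name']] }
    .map do |(category, name), members|
      {
        'category' => category,
        'name'     => name,
        'values'   => members.map { |facet| facet['value'] }.uniq.first(limit)
      }
    end
end

facets = [
  { 'category' => 'experiment_context', 'name' => 'Species', 'value' => '9606' },
  { 'category' => 'experiment_context', 'name' => 'Species', 'value' => '10090' },
  { 'category' => 'citation', 'name' => 'type', 'value' => 'PubMed' }
]
p group_facets(facets)
```

Storing facets as objects is what makes this grouping possible without re-parsing flat strings.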

commit f5a08a3
Merge: f038be2 a515587
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Wed Jan 13 16:42:24 2016 -0500

    Merge branch 'master' into next

commit f038be2
Author: Anthony Bargnesi <abargnesi@selventa.com>
Date:   Mon Jan 11 22:58:47 2016 -0500

    batch evidence to an array, avoid JRuby enumerator

    The JRuby enumerator uses a thread per next object in an enumerator
    which proves costly. Hundreds of threads are created (tested with
    yourkit) when batch-creating evidence due to the "each_slice(500)" of
    the enumerator.

    This issue is logged in JRuby:
    jruby/jruby#2577

    The solution employed was to yield each evidence directly to the block
    and batch 500 into an array at a time. This should avoid the OOM
    exception received:

    java.lang.OutOfMemoryError: unable to create new native thread

    Indeed the thread count was observed to be lower in yourkit.
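
The batching approach described in this commit can be sketched as below. `each_batch` is a hypothetical name and the range stands in for the evidence stream; the point is that items are pushed through a block into a plain array, so no external enumerator is pulled with `next`.

```ruby
BATCH_SIZE = 500

# Collects items yielded by the source into plain arrays of up to
# batch_size and hands each full array to the caller's block.
def each_batch(source, batch_size = BATCH_SIZE)
  batch = []
  source.each do |item|
    batch << item
    if batch.size == batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty? # flush the final partial batch
end

sizes = []
each_batch(1..1200) { |batch| sizes << batch.size }
p sizes # prints [500, 500, 200]
```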