Highlight search refactoring (#227)
* boost filtered by grammar score

* results language mapping on grammar search

* SearchGrammarsV2 fix

* grammar fixes

* grammars recover fix

* VAR_TEXT log

* grammar $text fixes and refactoring

* Completed plus

* content_type filter

* VariableMapToFilterValues fix

* remove spaces

* NewFilteredResultsSearchRequest error handling

* fix retrieveTextVarValues

* remove Explain from NewResultsSuggestGrammarV2CompletionRequest

* critical fix on assignment of values to variables in grammar

* fix ES_SRC_PARENTS_FOR_CHAPTER_POSITION_INDEX_LETTER

* synonym for Rabash Assorted Notes

* fix synonym (set tab instead space)

* Incr. filtered by grammar MaxScore

* grammar filter search - progress and fixes

* meal var value

* rt for sourceRequests

* remove tweets search from filter grammar

* grammar engine refactoring

* highlight for grammar filtered search

* fix grammar suggest search

* grammar - heb test for hey letter

* restore prev commit

* refactoring

* fix sources index test

* map geresh symbols to apostrophe

* filter.grammar rules on russian and spanish

* content_type var in spanish and russian

* Ignore grammar for known source titles

* grammar filtered scores logic

* fix landing page intents

* CT vars: virtual_lessons, women_lessons (no spanish)

* small refactoring

* fix when we do not have regular results

* ver. incr.

* search without synonyms for grammar percolate

* FILTERED_BY_GRAMMAR_SCORE_INCREMENT = 100

* add title suggest weight = 40

* add title suggest weight = 40

* grammar filtered scores logic

* cancel PerculateQuery on exact terms

* additional variable values for sources and meals in hebrew

* books_titles + he fixes in content_type.variable

* new grammar boost logic

* content_type variables file fixes

* move suggest to right place

* suggest weight for ptiha

* Save deb parameter

* Correct

* consider global max score from filtered results

* remove space

* incr pticha autocomplete weight to 250

* undo changes

* comment

* remove commented out code

* comment

* comment

* comment

* version incr.

* comment

* comment

* whitelist indexing CT_LESSONS_SERIES

* Add grammar search latency to log

* avoid cases with more than 1 filter intent

* fix error message

* version incr.

* support of grammar filtered search in intents engine

* separate grammars logic to allow the return of filter intents before searching for filtered results

* typo suggest - consider grammar free text

* filter intents small refactoring and fixes

* fixes

* consider grammar CT for intent types

* Intents carousel according to grammar - fixes

* INTENTS_SEARCH_BY_FILTER_GRAMMAR_COUNT

* more content_type var values for he,lessons

* FILTER_GRAMMAR_INCREMENT_FOR_MATCH_CT_AND_FULL_TERM only above 5

* avoid searching full term in grammar filter search if CT is articles

* filtered results scores tweak

* landing page heb daily kabbalah lesson value

* temp remove of content_type heb daily kabbalah lesson value

* fix GrammarVariablesMatch validation for filter intents

* update hebrew vars

* IntentSearchOptions (refactoring)

* comments

* ver incr

* grammar filter by source - progress

* filter by source progress

* sources grammar - carousel support WIP

* assign zero score for filtered hits that duplicate carousels results

* includeTypedUidsFromContentUnits

* adding he definite article 'the'

* adding 'writings' to books_titles

* remove the word 'peace' from Rabash synonym

* Revert "remove the word 'peace' from Rabash synonym"

This reverts commit 3054def.

* commented out log for debug

* Treat double single quotes as double quotes.

* Bring better landing pages by filtering duplications by collection

* comment

* comment

* ver. incr.

* LoadSourceNameTranslationsFromDB and combine results with translations from file

* set 'field' for percolatorQuery

* by_source rule progress

* additional source variables

* classification grammar by source

* cancel carousel for source+free text

* GRAMMAR_INTENT_CLASSIFICATION_BY_CONTENT_TYPE_AND_SOURCE progress

* GRAMMAR_INTENT_CLASSIFICATION_BY_CONTENT_TYPE_AND_SOURCE in GRAMMAR_INTENTS_TO_FILTER_VALUES

* assignedRulesSuggest no Suffixes

* GrammarVariablesMatch check for by_content_type_and_source rule

* fix GrammarVariablesMatch for GRAMMAR_INTENT_CLASSIFICATION_BY_CONTENT_TYPE_AND_SOURCE

* GRAMMAR_PERCULATE_SIZE = 5

* zohar source variables

* grammar scoring fixes for classification intents and misc

* disable intents engine if classification intents are returned from grammar

* avoid duplicates in classification intents from grammar engine

* classification.grammar fixes

* escaping q. in grammar index

* select intent with max score

* remove duplicate

* source variables

* article variable with :

* more source grammar filter variables and rules

* GRAMMAR_INTENT_FILTER_BY_SOURCE rule is not triggered with section filters

* Classification intents - combine and normalize results from GrammarEngine and IntentsEngine

* boostClassificationScore changes

* volume synonym

* volume synonym fix

* explanation to ClassificationIntent

* disable GRAMMAR_INTENT_CLASSIFICATION_BY_CONTENT_TYPE_AND_SOURCE

* avoid setting currentLang of empty results

* spanish grammar and variable data for by_source rule

* remove consts.GRAMMAR_INTENT_CLASSIFICATION_BY_CONTENT_TYPE_AND_SOURCE

* fix the removal of hits from 'filter grammar' that duplicate the carousel's source

* Introduction to The Study of the Ten Sefirot

* CONTENT_TYPE_INTENTS_BOOST

* GRAMMAR_INTENT_BY_POSITION WIP

* GRAMMAR_INTENT_FILTER_BY_SOURCE_AND_POSITION WIP

* grammar for position and position type - progress

* grammar small refactoring

* rename variable $PositionType to $DivisionType

* position var fixes

* Load source translations - Filter out Rabash Assorted Notes

* fix source variables SQL query

* remove by_source_and_position

* add heb. volume variable value

* fix typo

* disable suggest for GRAMMAR_INTENT_SOURCE_POSITION_WITHOUT_TERM

* sourcePathFromSql

* RB+BS Eng in source variables

* var values for articles and letters

* language fix in source.variable

* Rabash assorted notes source variable values

* remove 'note' from division type

* volume ru division_type

* GRAMMAR_INTENT_SOURCE_POSITION_WITHOUT_TERM progress

* fix source variable

* GRAMMAR_INTENT_SOURCE_POSITION_WITHOUT_TERM source fix

* article word in Ukrainian

* remove Rabash articles from source.variable

* Remove Rabash Letters from source.variable

* sourcePathFromSql with leafPrefixType

* GRAMMAR_INTENT_SOURCE_POSITION_WITHOUT_TERM fix

* list of sources that will not be included in SourcesByPositionAndParent map

* fix for grammar filter by source

* Disable 'by content type' priority boost if the query contains a number

* Tfilat Rabim article more common spelling

* fix loadSourcesByPositionAndParent

* sourcePathFromSql - attempt with default language (He)

* Allow only single sourcesPositionWithoutTerm

* do not trigger grammar if the query equals a value from source variables

* remove loadSourceTitlesWithMoreThanOeWord

* getSingleHitIntentsBySource when query term == source name

* allow grammar if term is author

* Return Library Landing Page for author names

* improve author recognition in SearchGrammarsV2

* check if term identical to source without quotes

* Fix logic of 'by content type' priority boost

* If term is identical to an author name, search only for Landing Pages grammar

* Allow LP intents for terms that are identical to source names (not only authors)

* fix log message

* fix log

* GRAMMAR_INTENT_SOURCE_POSITION_WITHOUT_TERM fix

* New TES value for RU source variable

* Allow New Life result to be in 4th place

* lower classification intents score in getSingleHitIntentsBySource to display the source above them

* naming

* CONTENT_TYPE_INTENTS_BOOST = 4

* getSingleHitIntentsBySource according to section filter

* fix source name translations query

* source var fixes

* fix LoadSourceNameTranslationsFromDB SQL query

* remove commented code

* section filter support for "source position without term" rule

* add SRC_CONNECTING_TO_THE_SOURCE to SOURCE_PARENTS_NOT_TO_INCLUDE_IN_VARIABLE_VALUES

* change var names

* fix error func name in message

* comment

* comment

* better comment

* remove comment

* using consts for source types

* remove commented code

* comments

* comments

* grammar request refactoring

* fix NJ variable value

* remove page in TAAS vol. 1 exp. link

* remove 'time' part from film date

* version incr.

* filtered search with sections

* fix SearchByFilterIntents param

* NewFilteredResultsSearchRequest fixes and refactoring

* peace article value according to autocomplete

* Use timeoutForHighlight in ES API instead of request context

* ver. incr.

* const name

* comment about scores logic

* fix comment typo

* comment fix

* highlight search refactoring for performance

* fix for bad result of searching some sources with suggested full_title

* matan tora source variable

* version incr.

Co-authored-by: LAPTOP-NFLD56CB\Yuri <yurihechter@gmai.com>
Co-authored-by: Evgeny_v <gen.vinnikov@gmail.com>
Co-authored-by: davgur <gur28davaravut>
Co-authored-by: bbfsdev <bbfsdev@gmail.com>
4 people committed Mar 16, 2021
1 parent 3c1bb42 commit bf0c094
Showing 5 changed files with 50 additions and 20 deletions.
2 changes: 1 addition & 1 deletion api/handlers.go
@@ -462,7 +462,7 @@ func SearchHandler(c *gin.Context) {
// temporarily disable typo suggestion for interface languages other than English, Russian and Hebrew
(c.Query("language") == consts.LANG_ENGLISH || c.Query("language") == consts.LANG_RUSSIAN || c.Query("language") == consts.LANG_HEBREW)

timeoutForHighlight := viper.GetString("elasticsearch.timeout-for-highlight")
timeoutForHighlight := viper.GetDuration("elasticsearch.timeout-for-highlight")

res, err := se.DoSearch(
context.TODO(),
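Note on the handlers.go change: switching from viper.GetString to viper.GetDuration means the elasticsearch.timeout-for-highlight setting is parsed straight into a time.Duration (values like "500ms" or "2s"), which can then be handed to context.WithTimeout. A minimal sketch of the call, where the default value is illustrative and only the key name follows the diff:

```go
package main

import (
	"fmt"

	"github.com/spf13/viper"
)

func main() {
	// Illustrative default; in the project the value comes from the config file.
	viper.SetDefault("elasticsearch.timeout-for-highlight", "750ms")

	// GetDuration parses strings like "750ms" or "2s" into a time.Duration,
	// returning 0 if the key is missing or the value cannot be parsed.
	timeout := viper.GetDuration("elasticsearch.timeout-for-highlight")
	fmt.Println(timeout) // 750ms
}
```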
57 changes: 40 additions & 17 deletions search/engine.go
@@ -564,7 +564,7 @@ func (e *ESEngine) timeTrack(start time.Time, operation string) {
e.ExecutionTimeLog.Store(operation, elapsed)
}

func (e *ESEngine) DoSearch(ctx context.Context, query Query, sortBy string, from int, size int, preference string, checkTypo bool, timeoutForHighlight string) (*QueryResult, error) {
func (e *ESEngine) DoSearch(ctx context.Context, query Query, sortBy string, from int, size int, preference string, checkTypo bool, timeoutForHighlight time.Duration) (*QueryResult, error) {
defer e.timeTrack(time.Now(), consts.LAT_DOSEARCH)

// Initializing all channels.
@@ -880,8 +880,10 @@ func (e *ESEngine) DoSearch(ctx context.Context, query Query, sortBy string, fro
if ret != nil && ret.Hits != nil && ret.Hits.Hits != nil {

// Preparing highlights search.
mssHighlights := e.esc.MultiSearch()
highlightRequestAdded := false
// Since some highlight queries act as bottlenecks (when scanning large documents)
// and can hold up the overall search for tens of seconds,
// we prefer to execute several ES calls in parallel, with a timeout limit for each call.
highlightRequests := []*elastic.SearchRequest{}

highlightsLangs := query.LanguageOrder
if !shouldMergeResults {
@@ -909,9 +911,7 @@ func (e *ESEngine) DoSearch(ctx context.Context, query Query, sortBy string, fro
if err != nil {
return nil, errors.Wrap(err, "ESEngine.DoSearch - Error creating tweets request in multisearch Do.")
}
mssHighlights.Add(req)

highlightRequestAdded = true
highlightRequests = append(highlightRequests, req)
}
}
continue
@@ -945,28 +945,51 @@ func (e *ESEngine) DoSearch(ctx context.Context, query Query, sortBy string, fro
size: 1,
preference: preference,
useHighlight: true,
partialHighlight: true,
Timeout: &timeoutForHighlight})
partialHighlight: true})
if err != nil {
return nil, errors.Wrap(err, "ESEngine.DoSearch - Error creating highlight request in multisearch Do.")
}
mssHighlights.Add(req)

highlightRequestAdded = true
highlightRequests = append(highlightRequests, req)
}

if highlightRequestAdded {
if len(highlightRequests) > 0 {

log.Debug("Searching for highlights and replacing original results with highlighted results.")

var wg sync.WaitGroup
wg.Add(len(highlightRequests))
mhErrors := make([]error, len(highlightRequests))
mhResults := make([]*elastic.MultiSearchResult, len(highlightRequests))

beforeHighlightsDoSearch := time.Now()
mr, err := mssHighlights.Do(context.TODO())
for i, hr := range highlightRequests {
go func(req *elastic.SearchRequest, idx int) {
highlightCtx, cancelFn := context.WithTimeout(context.TODO(), timeoutForHighlight)
defer cancelFn()
mssHighlights := e.esc.MultiSearch().Add(req)
mr, err := mssHighlights.Do(highlightCtx)
if highlightCtx.Err() != nil {
mhErrors[idx] = highlightCtx.Err()
} else {
mhErrors[idx] = err
}
mhResults[idx] = mr
wg.Done()
}(hr, i)
}
wg.Wait()
e.timeTrack(beforeHighlightsDoSearch, consts.LAT_DOSEARCH_MULTISEARCHHIGHLIGHTSDO)
if err != nil {
return nil, errors.Wrap(err, "ESEngine.DoSearch - Error mssHighlights Do.")
responses := []*elastic.SearchResult{}
for i, mhResult := range mhResults {
if mhErrors[i] == context.DeadlineExceeded {
continue
}
if mhErrors[i] != nil {
return nil, errors.Wrap(mhErrors[i], "ESEngine.DoSearch - Error mssHighlights Do.")
}
responses = append(responses, mhResult.Responses...)
}

for _, highlightedResults := range mr.Responses {
for _, highlightedResults := range responses {
if highlightedResults.Error != nil {
log.Warnf("%+v", highlightedResults.Error)
return nil, errors.New(fmt.Sprintf("Failed multi get highlights: %+v", highlightedResults.Error))
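The engine.go change above replaces one blocking MultiSearch call covering all highlight requests with one goroutine per request, each bounded by its own context.WithTimeout; a request that exceeds the deadline is skipped rather than failing the whole search. A condensed, self-contained sketch of that pattern with the Elasticsearch call stubbed out (runOne and the sample values are illustrative, not from the repository):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// runOne stands in for a single highlight search call (illustrative stub).
func runOne(ctx context.Context, idx int) (string, error) {
	select {
	case <-time.After(time.Duration(idx*300) * time.Millisecond): // simulate work
		return fmt.Sprintf("highlighted result %d", idx), nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	const timeoutForHighlight = 500 * time.Millisecond
	requests := []int{0, 1, 2} // stand-ins for per-language highlight requests

	var wg sync.WaitGroup
	wg.Add(len(requests))
	results := make([]string, len(requests))
	errs := make([]error, len(requests))

	for i := range requests {
		go func(idx int) {
			defer wg.Done()
			// Each call gets its own deadline so one slow document
			// cannot hold up the overall search response.
			ctx, cancel := context.WithTimeout(context.Background(), timeoutForHighlight)
			defer cancel()
			results[idx], errs[idx] = runOne(ctx, idx)
		}(i)
	}
	wg.Wait()

	for i := range requests {
		if errors.Is(errs[i], context.DeadlineExceeded) {
			// Timed-out highlights are skipped; the original, non-highlighted hit is kept.
			continue
		}
		if errs[i] != nil {
			panic(errs[i])
		}
		fmt.Println(results[i])
	}
}
```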
2 changes: 1 addition & 1 deletion search/models.go
@@ -51,6 +51,6 @@ type SearchRequestOptions struct {
// We use this data for further filtering out of hits received from 'grammar filter' search that duplicate the carousel items.
// Since the search for 'grammar filter' is async to the classification intents (carousel) search, we don't yet have the data for the filterOutCUSources field.
includeTypedUidsFromContentUnits bool
// If not nil, set how long a search is allowed to take, e.g. "1s" or "500ms".
// If not nil, set how long a search is allowed to take, e.g. "1s" or "500ms". Note: Not always respected by ES.
Timeout *string
}
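The new comment on Timeout reflects that Elasticsearch treats its request-level timeout as best-effort, which is why the highlight refactoring also enforces a hard client-side cutoff via a context deadline. A hedged, stdlib-only sketch of turning a "1s"/"500ms" style string into such a cutoff (contextForSearch is an illustrative helper, not the project's API):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// contextForSearch mirrors the optional Timeout field above: a string like "1s" or "500ms".
func contextForSearch(parent context.Context, timeout *string) (context.Context, context.CancelFunc) {
	if timeout == nil {
		return context.WithCancel(parent)
	}
	d, err := time.ParseDuration(*timeout)
	if err != nil {
		// Fall back to no client-side deadline if the string is malformed.
		return context.WithCancel(parent)
	}
	return context.WithTimeout(parent, d)
}

func main() {
	t := "500ms"
	ctx, cancel := contextForSearch(context.Background(), &t)
	defer cancel()
	deadline, ok := ctx.Deadline()
	fmt.Println(ok, time.Until(deadline) <= 500*time.Millisecond) // true true
}
```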
7 changes: 7 additions & 0 deletions search/variables/source.variable
@@ -20,6 +20,8 @@ en,DVSS0xAR => Baal HaSulam Letters

en,qMUUn22b => Baal HaSulam Shamati

en,2bscFWf4 => matan tora

es,xtKmrbb9 => Estudio de las diez Sefirot
es,xtKmrbb9 => Estudio de las eser Sefirot
es,xtKmrbb9 => Estudio de las diez Hasefirot
@@ -83,6 +85,11 @@ he,qMUUn22b => בעל הסולם שמעתי
he,E9tXXYJv => תפילת רבים
he,E9tXXYJv => רבש תפילת רבים

he,qMUUn22b => בעל הסולם > שמעתי
he,DVSS0xAR => בעל הסולם > אגרות
he,xtKmrbb9 => בעל הסולם > תלמוד עשר הספירות
he,b8SHlrfH => הרב"ש > אגרות

ru,xtKmrbb9 => Учение Десяти Сфирот
ru,xtKmrbb9 => Талмуд Десяти Сфирот
ru,xtKmrbb9 => ТЕС
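Each line added to source.variable maps an interface language and a source UID to one accepted spelling of that source's title, in the form lang,UID => value (e.g. en,2bscFWf4 => matan tora). A small illustrative parser for that format, not the project's actual loader:

```go
package main

import (
	"fmt"
	"strings"
)

// entry is an illustrative representation of one "lang,UID => value" line.
type entry struct {
	Lang, UID, Value string
}

func parseLine(line string) (entry, bool) {
	line = strings.TrimSpace(line)
	if line == "" {
		return entry{}, false // skip blank lines
	}
	parts := strings.SplitN(line, "=>", 2)
	if len(parts) != 2 {
		return entry{}, false
	}
	key := strings.SplitN(strings.TrimSpace(parts[0]), ",", 2)
	if len(key) != 2 {
		return entry{}, false
	}
	return entry{Lang: key[0], UID: key[1], Value: strings.TrimSpace(parts[1])}, true
}

func main() {
	e, ok := parseLine("en,2bscFWf4 => matan tora")
	fmt.Println(ok, e) // true {en 2bscFWf4 matan tora}
}
```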
2 changes: 1 addition & 1 deletion version/version.go
@@ -6,7 +6,7 @@ import "fmt"
var (
Major = 1
Minor = 11
Patch = 1
Patch = 2
PreRelease = "dev"
)

