-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New highlight controller and es highlight functionality #1207
Conversation
This means that the developers will need to keep track of which corpora are compatible with the new highlighter, i.e. those that have been indexed with the term vector 'positions_offsets' on the top level of the main content field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the new look, this looks really nice!
The frontend logic does not seem to work well:
- When I make any change to the filters, the highlight setting
highlight=200
gets added to the URL. Highlighting should only be specified in the URL if it is a non-default setting. - Interestingly, I cannot use the date filter in your branch - it always resets to the default value.
backend/addcorpus/corpus.py
Outdated
TODO: remove this property and its references when all corpora are reindexed using the | ||
current definitions (with the top-level term vector for speech) | ||
''' | ||
return True if self.es_index in NEW_HIGHLIGHT_CORPORA else False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return True if self.es_index in NEW_HIGHLIGHT_CORPORA else False | |
return self.es_index in NEW_HIGHLIGHT_CORPORA |
backend/addcorpus/es_mappings.py
Outdated
@@ -1,16 +1,22 @@ | |||
def main_content_mapping(token_counts = True, stopword_analysis = False, stemming_analysis = False): | |||
def main_content_mapping(token_counts = True, stopword_analysis = False, stemming_analysis = False, positions_offsets = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It is good to support the legacy setup, but the default should be the new one.
positions_offsets
is not really a clear name.
I would suggest legacy_highlighting = False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called it updated_highlighting
, since the True
state would mean that it has updated highlighting capabilities.
# list of corpora that have been re-indexed using the top-level term vector | ||
# for main content fields, needed for the new highlighter | ||
NEW_HIGHLIGHT_CORPORA = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this implementation of the term vector setting. Whether a corpus supports character positions for term vectors is now set in settings.py as well as the corpus definition. Why both? The addition to the settings file seems a fine way to handle a temporary transition, but it is now also hard-coded in the field definition.
I would suggest using only one of those, so either:
- Use the
settings.py
configuration to infer the mapping when indexing - Specify the mapping in the corpus definition. If you need a top-level corpus property to serialise to the frontend, infer it from the field mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for implementing it this way, on two location is as follows: the corpus definition declares how corpora should be indexed, whereas the settings reflect which corpora have been re-indexed using those settings. It is only for the corpora that have been reindexed that the new highlighter should be turned on. In time, we can get rid of the list in settings.py
. I realize it is an ugly solution but it is all I could come up with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense; I agree that this works as a temporary solution.
I think it is this line which confused me:
https://github.com/UUDigitalHumanitieslab/I-analyzer/blob/3f9b4594bf231a15c0a1509ea50751310011dac9/backend/corpora/parliament/utils/field_defaults.py#L296
Which suggests that you would still be hard-coding this in individual corpus definitions.
<button class="button is-light" (click)="updateHighlightSize('more')"> | ||
<i class="fa fa-plus" aria-hidden="true"></i> | ||
</button> | ||
<button class="button is-light" (click)="updateHighlightSize('less')"> | ||
<i class="fa fa-minus" aria-hidden="true"></i> | ||
</button> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these buttons have only an icon, add aria-label
for accessibility
<button class="button is-light" (click)="updateHighlightSize('more')"> | |
<i class="fa fa-plus" aria-hidden="true"></i> | |
</button> | |
<button class="button is-light" (click)="updateHighlightSize('less')"> | |
<i class="fa fa-minus" aria-hidden="true"></i> | |
</button> | |
<button class="button is-light" (click)="updateHighlightSize('more')" aria-label="increase preview-size"> | |
<i class="fa fa-plus" aria-hidden="true"></i> | |
</button> | |
<button class="button is-light" (click)="updateHighlightSize('less')" aria-label="decrease preview size"> | |
<i class="fa fa-minus" aria-hidden="true"></i> | |
</button> |
text-align: right; | ||
} | ||
|
||
.field.has-addons { | ||
justify-content: flex-end; | ||
|
||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are also available as bulma classes has-text-right
and is-justify-content-flex-end
respectively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has-text-right
works, but is-justify-content-flex-end
does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The highlights-subtitle
class can go.
<tr> | ||
<div ngPreserveWhitespaces style="word-break:break-word" [innerHtml]="highlight"></div> | ||
<!-- insert ellipsis [...] in between snippets or at the end of a single snippet --> | ||
<hr *ngIf="highlight != document.highlight[field.name][document.highlight[field.name].length-1]"> | ||
</tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not your addition, but please remove the table row wrapper:
<tr> | |
<div ngPreserveWhitespaces style="word-break:break-word" [innerHtml]="highlight"></div> | |
<!-- insert ellipsis [...] in between snippets or at the end of a single snippet --> | |
<hr *ngIf="highlight != document.highlight[field.name][document.highlight[field.name].length-1]"> | |
</tr> | |
<div ngPreserveWhitespaces style="word-break:break-word" [innerHtml]="highlight"></div> | |
<!-- insert ellipsis [...] in between snippets or at the end of a single snippet --> | |
<hr *ngIf="highlight != document.highlight[field.name][document.highlight[field.name].length-1]"> |
This does not affect the visuals but it means the document has a normal table structure, which helps with accessibility.
</div> | ||
</div> | ||
<div class="control"> | ||
<button class="button highlight-toggle" [class.is-active]="highlight!=0" [disabled]="queryModel.queryText==null||queryModel.queryText==''" (click)="updateHighlightSize(highlight!=0 ? 'off' : 'on')"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the query text could also be undefined
, not sure. [disabled]=!queryModel.queryText
would the the quicker way to do this check, I think, but you could also add a hasQueryText
property to the query model.
If the highlight is disabled, can you add a balloon explaining why?
@@ -2,7 +2,7 @@ import { Component, Input, OnChanges, OnDestroy, SimpleChanges } from '@angular/ | |||
import { QueryModel } from '../models'; | |||
|
|||
|
|||
const HIGHLIGHT = 200; | |||
const HIGHLIGHT = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this an unreachable value? Why is it the default? Do you need a default value at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it.
frontend/src/app/utils/es-query.ts
Outdated
})) | ||
fields: highlightFields.map( function(field) { | ||
return field.displayType == "text_content" && field.positionsOffsets && corpus.new_highlight ? // add matched_fields for stemmed highlighting ({ [field.name]: {"type": "fvh", "matched_fields": ["speech", "speech.stemmed"] }}): | ||
({ [field.name]: {"type": "fvh", "matched_fields": ["speech", "speech.stemmed"] }}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hard-coding the parliament fields.
@@ -41,7 +41,7 @@ | |||
</div> | |||
<div *ngIf="tabIndex==0"> | |||
<ng-container *ngFor="let field of contentFields"> | |||
<div class="content" *ngIf="document.fieldValues[field.name]" [innerHtml]="document.fieldValues[field.name] | highlight:query"></div> | |||
<div class="content" *ngIf="document.fieldValues[field.name]" [innerHtml]="highlightedInnerHtml(field)"></div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you did not adapt the highlight pipe for this? If so, I think the pipe is unused now and can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The highlight pipe is still used as a fallback option if es provides no highlight options (such as for the legacy corpora), as well as for the manual.
trim redundancies in highlight selector
make highlight selector consistent with other selectors make highlight switches not trigger ParamsHaveChanged make the highlight controller reactive to queries
Revert "default highlight overhaul" This reverts commit 2e5d016c0626471b9058e6b03a6c0778394c5a9d. Revert "Revert "default highlight overhaul"" This reverts commit 569dcaa8768eebd81acf5527a6a1a4f3c11a65e4. improve placement of checkHighlight
3f9b459
to
816a490
Compare
Alright I think I addressed all your comments. Please let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One css class can go, otherwise all seems sensible!
This branch contains a new highlight controller that is clickable instead of an input field and the highlight functionality is taken directly from elasticsearch for corpora that have been indexed using the term_vector: 'positions_offsets' setting on the top layer of the main content field.
A new option has been added to the main content field mapping in the backend, which lets users select whether they want the main content field to be indexed as such. The new highlight functionality is disabled for corpora that have not been indexed in this way, but because there is no way (that I could find) to query elasticsearch directly to check the term_vector value of the
_source
field, for now the developers will need to manually declare which corpora are compatible with the new highlight functionality in the local_settings. In the future, when all corpora have been indexed in this way, this variable can be removed from the settings. All the files that were changed for this conditionality can be found in commit5f09943021a292c2342912f86a98739eacaa9b44
Feel free to suggest a more elegant solution!The elasticsearch highlight functionality allows for the highlighting of stemmed matches and removes the ugly whitespaces in the highlight.
The button has been styled according to Luka's feedback, but I share the feeling that it is not really in its 'final form'. I think some user input could be useful here, and we might want to do a review of the UI at some point where we address this in a more global way.