Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New highlight controller and es highlight functionality #1207

Merged
merged 20 commits into from
Aug 23, 2023

Conversation

Meesch
Copy link
Contributor

@Meesch Meesch commented Jul 21, 2023

This branch contains a new highlight controller that is clickable instead of an input field and the highlight functionality is taken directly from elasticsearch for corpora that have been indexed using the term_vector: 'positions_offsets' setting on the top layer of the main content field.

A new option has been added to the main content field mapping in the backend, which lets users select whether they want the main content field to be indexed as such. The new highlight functionality is disabled for corpora that have not been indexed in this way, but because there is no way (that I could find) to query elasticsearch directly to check the term_vector value of the _source field, for now the developers will need to manually declare which corpora are compatible with the new highlight functionality in the local_settings. In the future, when all corpora have been indexed in this way, this variable can be removed from the settings. All the files that were changed for this conditionality can be found in commit 5f09943021a292c2342912f86a98739eacaa9b44 Feel free to suggest a more elegant solution!

The elasticsearch highlight functionality allows for the highlighting of stemmed matches and removes the ugly whitespaces in the highlight.

The button has been styled according to Luka's feedback, but I share the feeling that it is not really in its 'final form'. I think some user input could be useful here, and we might want to do a review of the UI at some point where we address this in a more global way.

@Meesch Meesch requested a review from lukavdplas July 21, 2023 12:27
@lukavdplas lukavdplas mentioned this pull request Jul 27, 2023
Copy link
Contributor

@lukavdplas lukavdplas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the new look, this looks really nice!

The frontend logic does not seem to work well:

  • When I make any change to the filters, the highlight setting highlight=200 gets added to the URL. Highlighting should only be specified in the URL if it is a non-default setting.
  • Interestingly, I cannot use the date filter in your branch - it always resets to the default value.

TODO: remove this property and its references when all corpora are reindexed using the
current definitions (with the top-level term vector for speech)
'''
return True if self.es_index in NEW_HIGHLIGHT_CORPORA else False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return True if self.es_index in NEW_HIGHLIGHT_CORPORA else False
return self.es_index in NEW_HIGHLIGHT_CORPORA

@@ -1,16 +1,22 @@
def main_content_mapping(token_counts = True, stopword_analysis = False, stemming_analysis = False):
def main_content_mapping(token_counts = True, stopword_analysis = False, stemming_analysis = False, positions_offsets = False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It is good to support the legacy setup, but the default should be the new one.
  • positions_offsets is not really a clear name.

I would suggest legacy_highlighting = False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called it updated_highlighting, since the True state would mean that it has updated highlighting capabilities.

Comment on lines +92 to +94
# list of corpora that have been re-indexed using the top-level term vector
# for main content fields, needed for the new highlighter
NEW_HIGHLIGHT_CORPORA = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this implementation of the term vector setting. Whether a corpus supports character positions for term vectors is now set in settings.py as well as the corpus definition. Why both? The addition to the settings file seems a fine way to handle a temporary transition, but it is now also hard-coded in the field definition.

I would suggest using only one of those, so either:

  • Use the settings.py configuration to infer the mapping when indexing
  • Specify the mapping in the corpus definition. If you need a top-level corpus property to serialise to the frontend, infer it from the field mappings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for implementing it this way, on two location is as follows: the corpus definition declares how corpora should be indexed, whereas the settings reflect which corpora have been re-indexed using those settings. It is only for the corpora that have been reindexed that the new highlighter should be turned on. In time, we can get rid of the list in settings.py. I realize it is an ugly solution but it is all I could come up with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense; I agree that this works as a temporary solution.

I think it is this line which confused me:
https://github.com/UUDigitalHumanitieslab/I-analyzer/blob/3f9b4594bf231a15c0a1509ea50751310011dac9/backend/corpora/parliament/utils/field_defaults.py#L296

Which suggests that you would still be hard-coding this in individual corpus definitions.

Comment on lines 10 to 17
<button class="button is-light" (click)="updateHighlightSize('more')">
<i class="fa fa-plus" aria-hidden="true"></i>
</button>
<button class="button is-light" (click)="updateHighlightSize('less')">
<i class="fa fa-minus" aria-hidden="true"></i>
</button>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these buttons have only an icon, add aria-label for accessibility

Suggested change
<button class="button is-light" (click)="updateHighlightSize('more')">
<i class="fa fa-plus" aria-hidden="true"></i>
</button>
<button class="button is-light" (click)="updateHighlightSize('less')">
<i class="fa fa-minus" aria-hidden="true"></i>
</button>
<button class="button is-light" (click)="updateHighlightSize('more')" aria-label="increase preview-size">
<i class="fa fa-plus" aria-hidden="true"></i>
</button>
<button class="button is-light" (click)="updateHighlightSize('less')" aria-label="decrease preview size">
<i class="fa fa-minus" aria-hidden="true"></i>
</button>

Comment on lines 11 to 19
text-align: right;
}

.field.has-addons {
justify-content: flex-end;

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also available as bulma classes has-text-right and is-justify-content-flex-end respectively

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has-text-right works, but is-justify-content-flex-end does not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The highlights-subtitle class can go.

Comment on lines 50 to 54
<tr>
<div ngPreserveWhitespaces style="word-break:break-word" [innerHtml]="highlight"></div>
<!-- insert ellipsis [...] in between snippets or at the end of a single snippet -->
<hr *ngIf="highlight != document.highlight[field.name][document.highlight[field.name].length-1]">
</tr>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not your addition, but please remove the table row wrapper:

Suggested change
<tr>
<div ngPreserveWhitespaces style="word-break:break-word" [innerHtml]="highlight"></div>
<!-- insert ellipsis [...] in between snippets or at the end of a single snippet -->
<hr *ngIf="highlight != document.highlight[field.name][document.highlight[field.name].length-1]">
</tr>
<div ngPreserveWhitespaces style="word-break:break-word" [innerHtml]="highlight"></div>
<!-- insert ellipsis [...] in between snippets or at the end of a single snippet -->
<hr *ngIf="highlight != document.highlight[field.name][document.highlight[field.name].length-1]">

This does not affect the visuals but it means the document has a normal table structure, which helps with accessibility.

</div>
</div>
<div class="control">
<button class="button highlight-toggle" [class.is-active]="highlight!=0" [disabled]="queryModel.queryText==null||queryModel.queryText==''" (click)="updateHighlightSize(highlight!=0 ? 'off' : 'on')">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the query text could also be undefined, not sure. [disabled]=!queryModel.queryText would the the quicker way to do this check, I think, but you could also add a hasQueryText property to the query model.

If the highlight is disabled, can you add a balloon explaining why?

@@ -2,7 +2,7 @@ import { Component, Input, OnChanges, OnDestroy, SimpleChanges } from '@angular/
import { QueryModel } from '../models';


const HIGHLIGHT = 200;
const HIGHLIGHT = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this an unreachable value? Why is it the default? Do you need a default value at all?

Copy link
Contributor Author

@Meesch Meesch Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it.

}))
fields: highlightFields.map( function(field) {
return field.displayType == "text_content" && field.positionsOffsets && corpus.new_highlight ? // add matched_fields for stemmed highlighting ({ [field.name]: {"type": "fvh", "matched_fields": ["speech", "speech.stemmed"] }}):
({ [field.name]: {"type": "fvh", "matched_fields": ["speech", "speech.stemmed"] }}):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard-coding the parliament fields.

@@ -41,7 +41,7 @@
</div>
<div *ngIf="tabIndex==0">
<ng-container *ngFor="let field of contentFields">
<div class="content" *ngIf="document.fieldValues[field.name]" [innerHtml]="document.fieldValues[field.name] | highlight:query"></div>
<div class="content" *ngIf="document.fieldValues[field.name]" [innerHtml]="highlightedInnerHtml(field)"></div>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you did not adapt the highlight pipe for this? If so, I think the pipe is unused now and can be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The highlight pipe is still used as a fallback option if es provides no highlight options (such as for the legacy corpora), as well as for the manual.

trim redundancies in highlight selector
make highlight selector consistent with other selectors

make highlight switches not trigger ParamsHaveChanged

make the highlight controller reactive to queries
add filter subscription

add and improve documentation
Revert "default highlight overhaul"

This reverts commit 2e5d016c0626471b9058e6b03a6c0778394c5a9d.

Revert "Revert "default highlight overhaul""

This reverts commit 569dcaa8768eebd81acf5527a6a1a4f3c11a65e4.

improve placement of checkHighlight
@Meesch Meesch force-pushed the feature/new_highlight_improvements branch from 3f9b459 to 816a490 Compare August 14, 2023 16:27
@Meesch Meesch requested a review from lukavdplas August 14, 2023 16:27
@Meesch
Copy link
Contributor Author

Meesch commented Aug 14, 2023

Alright I think I addressed all your comments. Please let me know what you think!

Copy link
Contributor

@BeritJanssen BeritJanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One css class can go, otherwise all seems sensible!

@Meesch Meesch merged commit 57419e4 into develop Aug 23, 2023
1 check passed
@Meesch Meesch deleted the feature/new_highlight_improvements branch August 23, 2023 11:46
@lukavdplas lukavdplas added the affects-elasticsearch-index changes that require re-indexing elasticsearch data label Sep 26, 2023
@lukavdplas lukavdplas added enhancement improvements to user functionality affects-deployment changes that require an update in the deployment module labels Sep 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-deployment changes that require an update in the deployment module affects-elasticsearch-index changes that require re-indexing elasticsearch data enhancement improvements to user functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants