Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/pass-query-to-event-page #137

Merged

Conversation

tohuynh
Copy link
Collaborator

@tohuynh tohuynh commented Dec 4, 2021

Link to Relevant Issue

This pull request resolves #123

Description of Changes

  • Highlight the gram with the highest value and/or user query in the exceprt portion of the event card.
  • Pass the user query to the event page (after the user click on an event card from a search results page). And populate the user query in the search bar in "Search transcript".
  • The "searching" is done through stem matching. A sentence is a search result if the stems of the sentence and the stems of the user query have at least one common element.
  • The searching now doesn't happen instantly as the user types in the search bar, they have to press enter to get search results.
  • The highlighting appears on the search results and will highlight any stems or tokens of the user query.

Link to Forked Storybook Site

https://tohuynh.github.io/cdp-frontend/?path=/story/library-cards-event--meeting-search-result (highlight even card)

@tohuynh tohuynh added the enhancement New feature or request label Dec 4, 2021
@tohuynh tohuynh self-assigned this Dec 4, 2021
@evamaxfield
Copy link
Member

So this all looks very good. I have a few comments:

  1. Any idea why the highlight appears on some of the results but not all:
    image
    (using seattle staging data, query = "Renters rights and rent control")

  2. It may be out of the scope of this PR, but because we are changing the behavior of the search transcript here, I am tempted to ask for another feature which is "strict match" -- if the user puts their search in quotes, "renters rights", return only the sentences that have that substring exactly. Google does this but to ensure users know we can do it too, we should probably have a little <sub><em> string help text underneath the search transcript that says: "Use quotes ("") for exact match"? -- I would love this for the general event search too so I am going to work on that problem right now.

@tohuynh
Copy link
Collaborator Author

tohuynh commented Dec 4, 2021

Any idea why the highlight appears on some of the results but not all: (using seattle staging data, query = "Renters rights and rent control")

The selected gram was "rent control" and the selected context span was ""We know 25% rent increases are Unconsciousable and Seattle needs...". For highlighting in the event card, it will highlight "rent control" or "Renters rights and rent control". (I didn't tokenize these two phrases for highlighting)

It may be out of the scope of this PR, but because we are changing the behavior of the search transcript here, I am tempted to ask for another feature which is "strict match" -- if the user puts their search in quotes, "renters rights", return only the sentences that have that substring exactly. Google does this but to ensure users know we can do it too, we should probably have a little string help text underneath the search transcript that says: "Use quotes ("") for exact match"? -- I would love this for the general event search too so I am going to work on that problem right now.

Agree. I think the strict match feature is very common amongst searchers. You might not even need the helper text.

I think you only need to change something in here:

const visibleSentences = useMemo(() => {
   
    if (!searchedTerm.trim()) {
      return sentences;
    }
   //if is strict match, use /utils/isSubstring to return visible sentences

    const cleanedQuery = cleanText(searchedTerm);
    const tokenizedQuery = removeStopwords(cleanedQuery.split(" "));
    if (!cleanedQuery || tokenizedQuery.length === 0) {
      // empty query or no valid tokens to search
      return [];
    }
    const stemmedQuery = tokenizedQuery.map((token) => stem(token).toLowerCase());
    return sentences.filter((_, i) => stemmedQuery.some((q) => stemmedSentences[i].has(q)));
  }, [sentences, stemmedSentences, searchedTerm]);

@tohuynh
Copy link
Collaborator Author

tohuynh commented Dec 4, 2021

Also should we use weight datetme relevance here?:

//use user's selected ranking algo or the default value ?
const matchingGramWithHighestValue = maxBy(matchingIndexedEventGrams, "value");

we might need to pass an optional sorting option to eventSearchService.searchEvents

@evamaxfield
Copy link
Member

Made a new issue for search operators for general search. #138

The selected gram was "rent control" and the selected context span was ""We know 25% rent increases are Unconsciousable and Seattle needs...". For highlighting in the event card, it will highlight "rent control" or "Renters rights and rent control". (I didn't tokenize these two phrases for highlighting)

Hmmmm dang. I think it would be better if rent to was highlighted there. So maybe just highlight any of the search terms in the context span ignore casing and punctuation?


Yes to strict matching. Agree it should be easy to do for the intra-event search but the general search is a bit of a different challenge.

@evamaxfield
Copy link
Member

Also should we use weight datetme relevance here?:

//use user's selected ranking algo or the default value ? const matchingGramWithHighestValue = maxBy(matchingIndexedEventGrams, "value");

we might need to pass an optional sorting option to eventSearchService.searchEvents

So the datetime weighted is useful for search but that function is asking for "what is the most valuable search gram used in the search for this document".. Keep it as it is, its useful for context not search if that makes sense?

@tohuynh
Copy link
Collaborator Author

tohuynh commented Dec 4, 2021

Hmmmm dang. I think it would be better if rent to was highlighted there. So maybe just highlight any of the search terms in the context span ignore casing and punctuation?

Sure, but the real fix would be to get the selected gram and context span in sync right?

@evamaxfield
Copy link
Member

Sure, but the real fix would be to get the selected gram and context span in sync right?

Yes and no... it's a hack because it's hard to find the right part of the context span to pull from: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/pipeline/event_index_pipeline.py#L237

The full sentence found is:

We know 25% rent increases are Unconsciousable and Seattle needs rent control with no corporate loopholes which is why my office alongside Renters rights advocates has prepared rent control legislation that if passed will go into effect the moment state legislators tend 40 year ban on any regulation of rent and if this Council does pass rent control then the Democrats in Lim who are in a majority in both houses and occupy the government will no longer be able to ignore the rent control ban they have maintained in place for decades.

Which has rent control in it but just later and the similarity match for getting the context span simply found the wrong section of the sentence.

@tohuynh
Copy link
Collaborator Author

tohuynh commented Dec 4, 2021

New commit does this:

So maybe just highlight any of the search terms in the context span ignore casing and punctuation?

Copy link
Member

@evamaxfield evamaxfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for making the change.

Want to include the strict match in transcript search in this PR or no?

@tohuynh
Copy link
Collaborator Author

tohuynh commented Dec 4, 2021

Looks good to me. Thanks for making the change.

Want to include the strict match in transcript search in this PR or no?

Let's wait for the other PR.

@tohuynh tohuynh merged commit a51a70d into CouncilDataProject:main Dec 4, 2021
@tohuynh tohuynh deleted the feature/pass-query-to-event-page branch December 29, 2021 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants