Create alternative mechanism for school search API #21717

drewsamnick · 2018-04-06T03:33:48Z

I've been experimenting with ways to improve the school search API and this is what I came up with. It allows matches on any subset of the search terms and matches against the school name, city, and zip code. The results are sorted by the number of terms that matched. In some informal testing it seemed to give reasonable results. Even so, I didn't want to replace the current behavior. Instead I parameterized the API so that you can selectively use the old or the new approach. We may want to do an A/B test to see if the new version results in fewer 'school not found' selections without a negative impact on other metrics.
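To make the ranking idea concrete, here is a minimal in-memory sketch in Ruby. It is not the code in this PR, which runs the search in MySQL. The `School` struct, the `matched_term_count` and `rank_schools` helpers, and the sample rows are all made up for illustration, and whole-word matching is an assumption about how terms are compared.

```ruby
# Hypothetical illustration only; the PR itself performs the search in SQL.
School = Struct.new(:name, :city, :zip)

# Count how many distinct query terms appear as whole words in the
# school's name, city, or zip. Each term counts once, even if it
# appears in more than one field.
def matched_term_count(school, terms)
  words = [school.name, school.city, school.zip].join(' ').upcase.split(/\W+/)
  terms.uniq.count { |term| words.include?(term) }
end

# Keep schools that match at least one term, most matched terms first.
# Order among schools with the same count is not specified.
def rank_schools(schools, terms)
  schools
    .map { |school| [school, matched_term_count(school, terms)] }
    .reject { |_, count| count.zero? }
    .sort_by { |_, count| -count }
    .map(&:first)
end

# Made-up sample rows, not taken from the real schools table.
schools = [
  School.new('ROOSEVELT HIGH SCHOOL', 'SEATTLE', '98115'),
  School.new('LINCOLN ELEMENTARY', 'TACOMA', '98405'),
  School.new('LINCOLN HIGH SCHOOL', 'SEATTLE', '98103')
]

ranked = rank_schools(schools, %w[LINCOLN SEATTLE])
puts ranked.first.name # prints "LINCOLN HIGH SCHOOL" (the only row matching both terms)
```

Counting each term at most once per school is the property that distinguishes this from summing per-column scores, where a term found in both name and city would count twice.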

I haven't had a chance to investigate the performance impact of this change. That should be investigated before any significant use in prod.

I tried a few other things before landing here. The call to MATCH(...) AGAINST ... returns the MySQL relevance score for the match. Initially, I tried just removing the +'s from the query to allow any terms to match and sorting by relevance. That didn't work well since it would double count terms. This was especially problematic when the same term appears in both the school name and the city, which is very common. That term gets overweighted and the resulting sort order is poor. I tried mitigating this by creating indexes on just name and just city and matching those separately. In the end I determined that the relevance score isn't going to do what we want. In particular, it often gives different weights to different terms.

islemaster · 2018-04-06T18:54:50Z

dashboard/lib/autocomplete_helper.rb

@@ -15,6 +15,12 @@ def self.format_limit(limit)
    return [MIN_LIMIT, [limit.to_i, MAX_LIMIT].min].max
  end

+  def self.get_query_terms(query)
+    query.strip.split(/\s+/).map do |w|
+      w.gsub(/\W/, '').upcase.presence


You're stripping word characters from query terms here but I don't see a matching operation on the other side of the query - are all the names we're matching against also limited to word characters?

My concern is that we won't match, or will at least be biased against, schools with non-word characters in their names:

Hyphens (Jonesboro-Hodge High School)

Periods (St. Martin's Episcopal School)

Apostrophes (Waiʻanae High School)

Based on a little reading it looks like extended-set characters will be handled properly by the \W class, but probably worth having a test for. (Nānākuli High)

Ah, I also see you just moved this functionality, didn't add it. Still interested in your take though.

drewsamnick · 2018-04-08

I had not thought about this. There are 17,744 schools with a non-word, non-whitespace character in the name or city. These are mostly - or ' with a few others thrown in (",", "&", "@", "/", etc.). Poking around, it seems like these are inconsistent in the data. For example, sometimes we have "WINSTON-SALEM" and sometimes "WINSTON SALEM", and sometimes I see "CHILDREN'S" and sometimes "CHILDRENS". That means that we won't get the right results if we just leave those non-word characters as literals in the search. Instead, it seems reasonable to split on non-word characters as well as on whitespace. That means that if you search for "WINSTON-SALEM" we'll look for "WINSTON" and for "SALEM", which will match both "WINSTON-SALEM" and "WINSTON SALEM". Since this is an issue even in the original search, I am changing it in both and adding a few tests.
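A minimal sketch of the splitting described above, assuming a single regex split is acceptable. The helper name `split_query_terms` is hypothetical and this is not necessarily how `get_query_terms` ended up being written in the PR.

```ruby
# Hypothetical helper, not the PR's actual implementation.
# Upcases the query and splits it on runs of non-word characters,
# which covers whitespace, hyphens, apostrophes, and similar punctuation.
def split_query_terms(query)
  query.upcase.split(/\W+/).reject(&:empty?)
end

split_query_terms('Winston-Salem')  # => ["WINSTON", "SALEM"]
split_query_terms("  children's  ") # => ["CHILDREN", "S"]
```

Two caveats for the tests: an apostrophe leaves a one-letter term such as "S" behind, and Ruby documents `\w` as `[a-zA-Z0-9_]`, so a name like Nānākuli is worth a dedicated test case.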

drewsamnick · 2018-04-11T18:41:08Z

Any other comments on this?

islemaster

Nope. LGTM!

Create alternative mechanism for school search API

2294670

drewsamnick requested review from ewjordan, islemaster, tanyaparker, sureshc and poorvasingal April 6, 2018 03:33

islemaster reviewed Apr 6, 2018

Drew Samnick added 2 commits April 8, 2018 06:49

Use parsed terms in where clause, not raw query input

efb4e9e

Split school search query on non-word charaters

a238ee1

islemaster approved these changes Apr 11, 2018

drewsamnick merged commit 6337783 into staging Apr 12, 2018

drewsamnick deleted the school-search-improvement branch April 12, 2018 03:44

breville mentioned this pull request Apr 12, 2018

Revert "Create alternative mechanism for school search API" #21823

Merged
