Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][#30] Fix i14y querying #43

Closed
wants to merge 1 commit into from
Closed

[WIP][#30] Fix i14y querying #43

wants to merge 1 commit into from

Conversation

viclim
Copy link
Contributor

@viclim viclim commented Apr 29, 2018

Fix #30.

  • Use /search/docs as the path for I14ySearch.
  • Includes tbs and sort_by into @search_options. This will be used to send to i14y service.
  • Disable re-recording of application_document_title cassettes. Why do these need to be rerecorded every 2 months anyway?

@nickmarden

@viclim viclim changed the title [Issue 30] Fix i14y querying [WIP] [Issue 30] Fix i14y querying Apr 29, 2018
@viclim viclim changed the title [WIP] [Issue 30] Fix i14y querying [#30] Fix i14y querying Apr 29, 2018
@@ -2,12 +2,14 @@ class I14ySearch < FilterableSearch
include SearchInitializer
include Govboxable
I14Y_SUCCESS = 200
attr_reader :collection
attr_reader :collection,
:dc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MothOnMars does the i14y search interface support the dc attribute? I'm concerned that @viclim might be sending an attribute that will be silently dropped on the floor without the expected behavior.

Copy link
Contributor Author

@viclim viclim Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickmarden I think you're right about dc being a useless attribute. As shown here, all the affiliate's handles are being sent to the i14y service.

So, what's the relationship between a DocumentCollection and an I14yDrawer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viclim a DocumentCollection is an override for the default search scope set in the domains list. The folders specified in a DocumentCollection may or may not be held in an i14y drawer, they may be over in the SearchGov index.

The DocumentCollection id can be sent in as a param, to tell the system which override to use....

search.usa.gov/search?affiliate=foo&query=bar -> this uses the Domains list as the scope for the query
search.usa.gov/search/docs?affiliate=foo&dc=12345&query=bar -> this uses the list of URL path prefixes listed in the DocumentCollection 12345

In either of these cases, the source index may be
BingV6 (if search_engine=BingV6)
OR
SearchGov + i14y Drawer(s) (if search_engine=SearchGov AND gets_i14y_results=true)
OR
SearchGov (if search_engine=SearchGov AND gets_i14y_results=false)
OR
i14y Drawer(s) (if search_engine=BingV6 AND gets_i14y_results=true)

SearchGov is itself an i14y drawer, but related to the affiliate as a search engine, not as an i14y drawer.

@@ -105,6 +105,9 @@
options[:re_record_interval] = nil if /bing.?v5/i === name
options[:re_record_interval] = nil if /google|gss/i === name

# disable re-recording application_document_title cassettes which relies on tika server.
options[:re_record_interval] = nil if /application_document_title/ === name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MothOnMars could you weigh on whether you want this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickmarden @viclim , the short answer is yes, but we should do this a different way.

The backstory on re-recording cassettes is that we originally decided to re-record everything monthly in case there were any external API changes that we weren't aware of. That was later bumped to bi-monthly, and I recently turned off re-recording entirely for cucumber tests. At this point, I'm inclined to turn off re-recording for everything but BingV6-related specs, but I would like to do that in a separate commit (on its way).

@nickmarden
Copy link
Contributor

@viclim I'm not sure that this PR fixes the issues that #30 describes. In that issue it is stated that the two issues to be solved are:

So there are actually two bugs here:

one in our construction of the filter links for docs
one preventing the time filter params from being used in collection searches.

To make sure I understand this PR better, could you test the following and report the results here (or fix your PR as needed):

  1. Whether the "path bug" (/search instead of /search/docs) also occurs for non-i14y searches as well? (the initial bug report seemed to indicated that this path bug was not i14y-specific)
  2. Whether the new parameters you are passing to i14y in this PR are actually used by i14y?
  3. Whether your PR branch preserved time filter params for both i14y and non-i14y docs searches?

@viclim
Copy link
Contributor Author

viclim commented Apr 30, 2018

@nickmarden

  1. The only time when the time filter bar is visible under /search/docs is when search is a FilterableSearch, and I14ySearch is the only FilterableSearch used in /search/docs. So I don't think the path bug occurs for non-i14y searches.

  2. It's not being used. Will need to correct that. We need a way to know which handle to use for which document collection though.

  3. Only i14y doc search preserved time filter params. Non-i14y docs search has its own link generator, which doesn't include time filter params in the output.

@viclim viclim changed the title [#30] Fix i14y querying [WIP][#30] Fix i14y querying Apr 30, 2018
I14ySearch should return docs_search_path, include :dc in search params, :sort_by and :tbh in filter params.
Disable recording of application_document_title cassettes
@dawnpm
Copy link
Collaborator

dawnpm commented Apr 30, 2018

We definitely need the dc parameter to be retained and used. Otherwise there won’t be a way for us to scope searches within the searchgov index for a given collection.

@dawnpm
Copy link
Collaborator

dawnpm commented Apr 30, 2018

Regarding which drawer handle to use, couldn’t we leverage the method that @MothOnMars established when we introduced i14y-supported collection searches?

@nickmarden
Copy link
Contributor

We definitely need the dc parameter to be retained and used. Otherwise there won’t be a way for us to scope searches within the searchgov index for a given collection.

@dawnpm what is that comment in response to?

@nickmarden
Copy link
Contributor

nickmarden commented Apr 30, 2018

Only i14y doc search preserved time filter params. Non-i14y docs search has its own link generator, which doesn't include time filter params in the output.

@viclim I am having a verb problem with "[o]nly i14y doc search preserved time filter params". Did you mean "only i14y doc search should preserve time filter params", or did you really mean (in the past tense) "before this issue was reported, i14y doc search was already preserving time filter params"?

The first interpretation seems reasonable; the second one seems impossible because we wouldn't be having this discussion if i14y doc search already preserved these params.

Sorry for being a grammar nit but I'm trying to make sure I understand your point 3.

@dawnpm
Copy link
Collaborator

dawnpm commented Apr 30, 2018

@nickmarden it was in response to the comments in #43 (comment), but I was trying to reply on my phone, and the comment threading didn't work the way it looked like it would.

@viclim
Copy link
Contributor Author

viclim commented Apr 30, 2018

Only i14y doc search preserved time filter params. Non-i14y docs search has its own link generator, which doesn't include time filter params in the output.

@viclim I am having a verb problem with "[o]nly i14y doc search preserved time filter params". Did you mean "only i14y doc search should preserve time filter params", or did you really mean (in the past tense) "before this issue was reported, i14y doc search was already preserving time filter params"?

The first interpretation seems reasonable; the second one seems impossible because we wouldn't be having this discussion if i14y doc search already preserved these params.

Sorry for being a grammar nit but I'm trying to make sure I understand your point 3.

The first interpretation is what I meant.

@nickmarden
Copy link
Contributor

@MothOnMars after a bunch of discussion I believe we arrived at the following:

  • The only /search/docs search type that supports filtering (i.e. is_a? FilterableSearch) is I14ySearch, so @viclim added support to remember since_date, until_date, sort_by and tbs to those searches
  • The path for filterable i14y searches is now/search/docs instead of /search

Could you take a look at the PR? Let me know if you want me to deploy it somewhere for smoke testing.

@viclim
Copy link
Contributor Author

viclim commented Apr 30, 2018

@nickmarden @dawnpm

Regarding which drawer handle to use, couldn’t we leverage the method that @MothOnMars established when we introduced i14y-supported collection searches?

You're absolute right about this. I didn't notice this, which append the URL prefixes from DocumentCollection to the query params, and send it to the i14y API. This means the scoping by DocumentCollection is already in place without any further changes.

@dawnpm
Copy link
Collaborator

dawnpm commented Apr 30, 2018

The path for filterable i14y searches is now /search/docs instead of /search

Does this mean that i14y searches on /search will not be filterable? We want i14y searches to be filterable on both /search and /search/docs.

@nickmarden
Copy link
Contributor

nickmarden commented Apr 30, 2018

@dawnpm I think we've gotten to the heart of this issue and why the path bug exists to begin with.

@viclim I think that the change you made to force i14y searches to link to /search/docs is actually incorrect. We want the link to i14y searches to be /search/docs only if the previous i14y search that we are currently handling was a /search/docs search. 🍝

In other words asking "what path should we generate for the search?" (e.g. path_for_i14y_search) cannot be answered in the helper only by looking at the type of search that is being made, but also by looking at the context of the current SERP path (/search vs /search/docs) in which the results are being shown.

I think that's the case anyway. @MothOnMars your thoughts?

@viclim if I'm right, I look forward to an elegant solution to that problem :trollface:

@MothOnMars
Copy link
Contributor

@nickmarden @viclim , I'm going to dive a bit deeper into this today to make sure we're not breaking anything. I agree with the statement that "want the link to i14y searches to be /search/docs only if the previous i14y search that we are currently handling was a /search/docs search." I also plan to put in a cucumber test to cover this scenario, because our collections/filtered searches and SERP views are brittle enough to need plenty of integration coverage.

@MothOnMars
Copy link
Contributor

Closing this up. I have a different fix for this.

@MothOnMars MothOnMars closed this May 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants