Add ADR for surfacing Primo CDI records in results #219
Why these changes are being introduced:
The proposed unified search interface would display records from Primo CDI (via Primo Search API) and Alma (via TIMDEX API) in the same results list. TIMDEX UI does not currently have a means to combine results from multiple APIs in this way.
Relevant ticket(s):
N/A
How this addresses that need:
This adds an ADR that outlines a proposed solution to this problem, by introducing a search orchestration layer that will handle API calls and results normalization.
Side effects of this change:
There are additional decisions to be made around the architecture of the search orchestrator, such as how to manage relevance normalization. These decisions are noted in the ADR and will be explored in future ADRs.
Pull Request Test Coverage Report for Build 16882158882 (Coveralls)
I'm not in a position right now to indicate a vote or approval/change request on this text - I think we're in a discussion phase prior to concluding anything, so that's what I'm focusing on for now...
- While I agree with the prospect of building an orchestrator and proceeding with a simple interleaving logic for merging the two sets of results, I think formally I'd like this document to describe multiple options prior to stating the decision we're going with. The one clear alternative I see right now would be to harvest CDI records into TIMDEX, although I think others are implicitly referenced (continuing to display the two result sets separately without merging, building an external orchestrator in Python, building an internal orchestrator in Rails).
- The sequence diagram feels like it could be cleaner in its placement of the "User submits search query" phase - maybe that comes in from a User participant to the frontend, which then delegates the search to both TACOS and the orchestrator?
I'm also not sure whether any of the interactions in the diagram are blocking each other - it feels like the basic phases are meant to be:
1. User submits search query to TIMDEX UI
2. TIMDEX UI starts processing, doing internal things
3. TIMDEX UI issues the search to three targets: TACOS, TIMDEX API, and the CDI (these latter two may be managed by the Search Orchestrator, while some other model in the application handles TACOS?)
4. As responses are received from the three external systems, some (TACOS) are displayed immediately, in parallel to the other streams. The Search Orchestrator, however, waits to respond until it has received responses from both TIMDEX API and the CDI. It interleaves those responses on its own before returning them for rendering (which is another difference from TACOS, whose response is not modified in any way)
5. As the responses are processed in step 4, they are added to the DOM and shown to the user.
I think this is pretty much the same sequence you're proposing - so I think I'm ultimately wanting to have a diagram that makes clear what the dependencies are, and what might be happening in parallel. This might be my misunderstanding of how to read a sequence diagram, however.
- I'm not sure whether the TIMDEX UI is the most appropriate place to consider how to facilitate computational access to CDI records. While it is definitely a concern for the Libraries, I think it makes sense to discuss that use case elsewhere? The choice to build an orchestrator or adopt a separate approach feels completely separate from that use case (although I might be missing something, so I'm not yet at a point of blocking it being here)
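The phases in the second point above, where TACOS renders independently while the orchestrator blocks on both TIMDEX API and CDI, can be sketched with Python's `asyncio` as a minimal illustration. Everything here is hypothetical: the `fetch_*` functions are stand-ins with simulated latency rather than real API clients, the alternating merge is only a placeholder, and Python is chosen for brevity, not as a recommendation for where the orchestrator should live.

```python
import asyncio
from itertools import zip_longest

# Hypothetical stand-ins for the external calls; the sleeps simulate latency.
async def fetch_tacos(query: str) -> str:
    await asyncio.sleep(0.01)
    return f"tacos suggestion for {query!r}"

async def fetch_timdex(query: str) -> list[str]:
    await asyncio.sleep(0.02)
    return ["timdex-1", "timdex-2", "timdex-3"]

async def fetch_cdi(query: str) -> list[str]:
    await asyncio.sleep(0.05)  # assume CDI is the slowest source
    return ["cdi-1", "cdi-2"]

async def orchestrate(query: str) -> list[str]:
    """Hypothetical orchestrator: wait for BOTH sources, then merge them."""
    timdex, cdi = await asyncio.gather(fetch_timdex(query), fetch_cdi(query))
    # Placeholder merge: alternate by position; zip_longest pads the shorter list with None.
    return [r for pair in zip_longest(timdex, cdi) for r in pair if r is not None]

async def handle_search(query: str, rendered: list) -> None:
    """Hypothetical UI handler: each stream is 'rendered' as soon as it is ready."""

    async def render_tacos() -> None:
        rendered.append(("tacos", await fetch_tacos(query)))  # passed through unmodified

    async def render_results() -> None:
        rendered.append(("results", await orchestrate(query)))

    # TACOS and the orchestrator run concurrently; neither waits on the other.
    await asyncio.gather(render_tacos(), render_results())

if __name__ == "__main__":
    events: list = []
    asyncio.run(handle_search("open access", events))
    for name, payload in events:
        print(name, payload)
```

Running it records the TACOS entry before the merged results, because the orchestrator cannot return until the slower CDI call finishes, while TACOS never waits on either.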
I think those are the thoughts which are most prominently occurring to me at the moment. I may have other comments in the next day or so, prior to our discussing this as a team.
Thanks for putting this together! I appreciate the work it took to assemble.
@matt-bernhardt Thank you so much for the prompt and thoughtful feedback! These are my initial thoughts as I mull it over this morning.
Love this idea, especially because the inclusion of an 'alternative options' list is consistent with how we write more complex ADRs. I'll restructure the document to formalize that.
Great point. As currently diagrammed, it looks like the user is submitting the query directly to the orchestrator. What I meant to convey was, as you said, the query handoff from UI to orchestrator.
Yeah, I see what you mean. I tried to indicate asynchronous interactions using dotted lines, but generally speaking, I wasn't sure about the clarity of the sequence diagram, so it's helpful to have that instinct affirmed. I wonder if a flow diagram would be more readable/useful?
When I wrote that, I was thinking less about TIMDEX UI and more about the whole TIMDEX ecosystem. Ostensibly, TIMDEX API is intended to facilitate computational access, and integrating CDI records into TIMDEX feels like an opportunity to consider if we can provide such access to CDI. However, I agree that computational access to CDI is out of scope for this ADR. As I consider it further, I might argue that it's out of scope of TIMDEX altogether, since TIMDEX API is primarily focused on our local collections. (Though, that has changed somewhat since GDT.)
@matt-bernhardt I just pushed up a revision to incorporate some of your initial feedback. Let me know what you think.
I'd love to meet as a team to discuss these options now that we have this document as a starting point.
## Context

The Libraries' unified search strategy calls for a discovery interface that surfaces results from both Primo Central Discovery Index (CDI) and Alma (via TIMDEX), replacing the current [Bento UI](https://github.com/MITLibraries/bento).
We call it Primo CDI, but it's actually Ex Libris Central Discovery Index (CDI)
## Decision

This approach aligns with the unified search strategy's goal to display all known results from
I don't believe you stated which approach was decided.
I mean, it does make things easier on me if I just leave that up to the reader...
Relevance normalization is a critical issue. We can begin with rank-based interleaving, but we should not assume this to be a long-term solution.
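A minimal sketch of rank-based interleaving, in Python and under assumed names (`Result`, `interleave`, and the `"timdex"`/`"cdi"` labels are all hypothetical): records are merged purely by their position in each source's own list, so no score normalization is needed, which is also why it is a starting point rather than a long-term answer.

```python
from dataclasses import dataclass
from itertools import zip_longest

@dataclass(frozen=True)
class Result:
    source: str       # hypothetical label: "timdex" or "cdi"
    source_rank: int  # 1-based position in that source's own ranking
    record_id: str

def interleave(timdex_ids: list[str], cdi_ids: list[str]) -> list[Result]:
    """Merge two ranked lists by rank alone, ignoring each API's native scores."""
    merged: list[Result] = []
    for rank, (t, c) in enumerate(zip_longest(timdex_ids, cdi_ids), start=1):
        # At each rank TIMDEX goes first; this tie-break is an arbitrary choice.
        if t is not None:
            merged.append(Result("timdex", rank, t))
        # Once the shorter list runs out, the longer one simply continues.
        if c is not None:
            merged.append(Result("cdi", rank, c))
    return merged

if __name__ == "__main__":
    for r in interleave(["alma-a", "alma-b", "alma-c"], ["cdi-x"]):
        print(r.source, r.source_rank, r.record_id)
```

Because position is all it uses, the second CDI record always lands fourth (when both lists have at least two results), however weak a match it is. Keeping `source` and `source_rank` on each result leaves room to swap in a score-based strategy once normalization is worked out.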
We should connect with the MIT research community to determine whether computational access to CDI
I'm fairly certain there is no legal way to obtain these without violating our contract to access the records in the first place. It would be better for us to look at what our user needs are more broadly and understand whether there is a CDI-alternative that would better meet their needs (either as part of TIMDEX or as a recommendation to use instead of TIMDEX)
Yeah, that's reasonable. I can zoom this out a bit.
### Cons

- Requires runtime integration with Primo Search API, which may introduce latency or complexity. (We can mitigate this by implementing a caching strategy similar to that in Bento.)
`which may` should probably be `which will`. We already know Primo API is extremely slow.
Good call. Probably wishful thinking on my part. Latency in Bento feels less awful since results are rendered independently.
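The caching mitigation mentioned in the first Cons bullet above could look roughly like the following. This is a minimal Python sketch, not Bento's actual strategy: `TTLCache`, `cache_key`, and `search_primo` are hypothetical names, the query normalization assumes Primo treats case and repeated whitespace as insignificant, and in a Rails app a framework cache (for example `Rails.cache.fetch` with `expires_in:`) would more likely fill this role. It only helps repeated queries; a first-time search still pays the full Primo latency.

```python
import time
from typing import Any, Callable

class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def fetch(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the slow API call
        value = compute()    # miss or expired: call the API and store the result
        self._entries[key] = (now + self.ttl, value)
        return value

def cache_key(query: str) -> str:
    # Assumes case and repeated whitespace do not change Primo's results.
    return " ".join(query.lower().split())

if __name__ == "__main__":
    calls = []

    def search_primo(q: str) -> list[str]:  # hypothetical stand-in for the API call
        calls.append(q)
        return [f"cdi result for {q}"]

    cache = TTLCache(ttl_seconds=300)
    for q in ["Open  Access", "open access"]:
        cache.fetch(cache_key(q), lambda: search_primo(cache_key(q)))
    print(len(calls))  # 1: the second query was served from the cache
```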
- **Data availability**: Because Primo does not expose CDI records via OAI-PMH, we would need to harvest using the Primo Search API, making the process needlessly complex and perhaps impossible.
- **Licensing**: Harvesting CDI records for TIMDEX likely has licensing implications. Ex Libris seems to discourage the practice, as Primo does not provide OAI-PMH support, and the Search API hard-codes a limit of 5,000 on the [`offset` parameter](https://developers.exlibrisgroup.com/primo/apis/docs/primoSearch/R0VUIC9wcmltby92MS9zZWFyY2g=/#output:~:text=Note%3A%20The%20Primo%20search%20API%20has%20a%20hardcoded%20offset%20limitation%20parameter%20of%205000.), so no single query can be paged beyond roughly its first 5,000 results.
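To see why a 5,000 offset cap makes a full harvest impractical, a rough upper bound on how many records one query can yield is sketched below in Python. It assumes offsets up to and including 5,000 are accepted and that each page returns `page_size` records; `reachable_records`, `page_size`, and the hit count in the example are hypothetical, not Primo's documented limits.

```python
def reachable_records(total_hits: int, page_size: int, max_offset: int = 5000) -> int:
    """Upper bound on records one query can return when paging by offset.

    The deepest allowed request starts at max_offset and returns page_size
    records, so nothing past max_offset + page_size can ever be fetched.
    """
    if page_size <= 0 or total_hits < 0:
        raise ValueError("page_size must be positive and total_hits non-negative")
    return min(total_hits, max_offset + page_size)

if __name__ == "__main__":
    hits, page = 1_000_000, 50  # hypothetical query size and page size
    reachable = reachable_records(hits, page)
    print(reachable, f"{reachable / hits:.3%}")
```

With those example numbers the bound is 5,050 records out of a million, so a harvester would have to split CDI into many narrow queries just to reach most of it.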
### Display separate result streams in tabbed views
I believe there is a desire from the UX side to have tabs for "primo" and "timdex" even if we figure out how to merge all the results. We may want to update our decision to reflect having multiple tabs: "everything" (which is what I believe the Orchestrator approach would provide), "primo" (probably Alma + CDI), and "timdex" (everything not in the primo tab).
docs/architecture-decisions/0003-surface-primo-cdi-records-in-results.md
While arguably an improvement on Bento, this design does not deliver the combined Alma/CDI results view as envisioned in the unified UI.

### Implement external search orchestrator
Not for this document, but we'll likely need a separate ADR for the orchestrator if we agree that is the best path forward, as I suspect there are a few solid directions to go (Rails, lambda, lambda feeding OpenSearch, etc.)
Agreed. I talked a bit in Slack about deferred architectural decisions for the orchestrator, but I'm not sure it translated into the ADR.
I hadn't caught up on Slack (still haven't fully!). We don't need to update this ADR to note there will be a different ADR with more details on whatever we choose... I was just saying it out loud :)
Got you. It just felt useful to me to note it in the ADR, because I think at times we start developing before we're done documenting decisions.
Ah... that's a very good point.
We've decided to postpone this decision pending further exploration of the problem space.
Additional context needed to review
There's some hand-waving regarding implementation details. From my perspective, it seemed adequate to delay these decisions until we start working on the orchestrator, but I would be happy to modify the ADR to include more detail if that would be useful.