
@jazairi (Contributor) commented Aug 7, 2025

Why these changes are being introduced:

The proposed unified search interface would display records from Primo CDI (via Primo Search API) and Alma (via TIMDEX API) in the same results list. TIMDEX UI does not currently have a means to combine results from multiple APIs in this way.

Relevant ticket(s):

N/A

How this addresses that need:

This adds an ADR that outlines a proposed solution to this problem, by introducing a search orchestration layer that will handle API calls and results normalization.
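To make "results normalization" concrete, here is a minimal Python sketch (not part of the ADR) of mapping one record from each API into a single shape the UI could render. The `NormalizedResult` type and the `normalize_*` functions are hypothetical, and the input field names (`timdexRecordId` and `title` for TIMDEX; `pnx.control.recordid` and `pnx.display.title` for the Primo Search API) are assumptions about the response payloads that would need to be checked against the real APIs.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class NormalizedResult:
    """Hypothetical common record shape returned by the orchestrator."""
    id: str
    title: str
    source: str  # "timdex" or "cdi"
    rank: int    # 1-based position in the originating API's result list


def normalize_timdex(record: Dict[str, Any], rank: int) -> NormalizedResult:
    # Assumed TIMDEX fields: "timdexRecordId" and "title".
    return NormalizedResult(
        id=record["timdexRecordId"],
        title=record.get("title") or "[No title]",
        source="timdex",
        rank=rank,
    )


def _first(values: Any, default: str) -> str:
    # PNX fields are assumed to be lists of strings; take the first value.
    return values[0] if isinstance(values, list) and values else default


def normalize_primo(doc: Dict[str, Any], rank: int) -> NormalizedResult:
    # Assumed Primo doc shape: doc["pnx"]["control"]["recordid"] and
    # doc["pnx"]["display"]["title"], each a list of strings.
    pnx = doc.get("pnx", {})
    return NormalizedResult(
        id=_first(pnx.get("control", {}).get("recordid"), ""),
        title=_first(pnx.get("display", {}).get("title"), "[No title]"),
        source="cdi",
        rank=rank,
    )
```

Carrying each API's own `rank` through normalization keeps a rank-based merge possible later without requiring the two APIs' relevance scores to be comparable.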

Side effects of this change:

There are additional decisions to be made around the architecture of the search orchestrator, such as how to manage relevance normalization. These decisions are noted in the ADR and will be explored in future ADRs.

Developer

Accessibility
  • ANDI or WAVE has been run in accordance with our guide.
  • This PR contains no changes to the view layer.
  • New issues flagged by ANDI or WAVE have been resolved.
  • New issues flagged by ANDI or WAVE have been ticketed (link in the Pull Request details above).
  • No new accessibility issues have been flagged.
New ENV
  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.
Approval beyond code review
  • UXWS/stakeholder approval has been confirmed.
  • UXWS/stakeholder review will be completed retroactively.
  • UXWS/stakeholder review is not needed.
Additional context needed to review

There's some hand-waving regarding implementation details. From my perspective, it seemed adequate to delay these decisions until we start working on the orchestrator, but I would be happy to modify the ADR to include more detail if that would be useful.

Code Reviewer

Code
  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.
Documentation
  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.
Testing
  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.

@mitlib mitlib temporarily deployed to timdex-ui-pi-cdi-adr-fi74pmjx7 August 7, 2025 14:50 Inactive
@coveralls commented Aug 7, 2025

Pull Request Test Coverage Report for Build 16882158882

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.2%) to 98.49%

Files with Coverage Reduction
  • app/controllers/search_controller.rb: 1 new missed line (98.11% covered)
Totals
Change from base Build 16027462749: -0.2%
Covered Lines: 587
Relevant Lines: 596


@matt-bernhardt (Member) left a comment

I'm not in a position right now to indicate a vote or approval/change request on this text - I think we're in a discussion phase prior to concluding anything, so that's what I'm focusing on for now...

  • While I agree with the prospect of building an orchestrator and proceeding with a simple interleaving logic for merging the two sets of results, I think formally I'd like this document to describe multiple options prior to stating the decision we're going with. The one clear alternative I see right now would be to harvest CDI records into TIMDEX, although I think others are implicitly referenced (continuing to display the two results separately without merging, building an external orchestrator in Python, building an internal orchestrator in Rails).

  • The sequence diagram feels like it could be cleaner in its placement of the "User submits search query" phase - maybe that comes in from a User participant to the frontend, which then delegates the search to both TACOS and the orchestrator?

I'm also not sure whether any of the interactions in the diagram are blocking each other - it feels like the basic phases are meant to be:

  1. User submits search query to TIMDEX UI

  2. TIMDEX UI starts processing, doing internal things

  3. TIMDEX UI issues the search to three targets: TACOS, TIMDEX API, and the CDI (these latter two may be managed by the Search Orchestrator, while some other model in the application handles TACOS?)

  4. As responses are received from the three external systems, some (TACOS) are displayed immediately, in parallel to the other streams. The Search Orchestrator, however, waits to respond until it has received responses from both TIMDEX API and the CDI. It interleaves those responses on its own before returning them for rendering (which is also a difference to TACOS, whose response is not modified in any way)

  5. As the responses are processed in step 4, they are added to the DOM and shown to the user.
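The five phases above can be sketched with Python's `asyncio`. This is a minimal sketch, not a proposed implementation: every fetch function is a hypothetical stand-in with simulated latency, and the simple alternating merge is a placeholder for whatever interleaving logic is eventually chosen. TACOS and the orchestrator start at the same time; the TACOS response is "rendered" as soon as it arrives, while the orchestrator waits for both TIMDEX API and the CDI before returning one merged list.

```python
import asyncio
from itertools import chain, zip_longest

_MISSING = object()  # sentinel used to pad the shorter result list


async def fetch_tacos(query: str) -> tuple:
    # Hypothetical TACOS call: fast, and its response is rendered unmodified.
    await asyncio.sleep(0.01)
    return ("tacos", f"suggestion for {query}")


async def fetch_timdex(query: str) -> list:
    # Hypothetical TIMDEX API call.
    await asyncio.sleep(0.05)
    return [f"timdex:{query}:{i}" for i in range(1, 4)]


async def fetch_cdi(query: str) -> list:
    # Hypothetical Primo Search API call, simulated as the slowest source.
    await asyncio.sleep(0.1)
    return [f"cdi:{query}:{i}" for i in range(1, 3)]


async def orchestrate(query: str) -> tuple:
    # Both backends are queried concurrently, but nothing is returned until
    # both have answered, because the merge needs both lists.
    timdex, cdi = await asyncio.gather(fetch_timdex(query), fetch_cdi(query))
    pairs = zip_longest(timdex, cdi, fillvalue=_MISSING)
    merged = [hit for hit in chain.from_iterable(pairs) if hit is not _MISSING]
    return ("results", merged)


async def search(query: str) -> list:
    # The UI fans out to TACOS and the orchestrator in parallel and
    # "renders" each response in the order it completes.
    tasks = [
        asyncio.create_task(fetch_tacos(query)),
        asyncio.create_task(orchestrate(query)),
    ]
    rendered = []
    for next_done in asyncio.as_completed(tasks):
        rendered.append(await next_done)
    return rendered
```

Running `asyncio.run(search("climate"))` yields the TACOS event first and the merged results second: the orchestrator's response can never arrive before its slowest backend (here the simulated CDI call), which is the dependency the diagram needs to show.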

I think this is pretty much the same sequence you're proposing - so I think I'm ultimately wanting to have a diagram that makes clear what the dependencies are, and what might be happening in parallel. This might be my misunderstanding of how to read a sequence diagram, however.


  • I'm not sure whether the TIMDEX UI is the most appropriate place to consider how to facilitate computational access to CDI records. While it is definitely a concern for the Libraries, I think it makes sense to discuss that use case elsewhere? The choice to build an orchestrator or adopt a separate approach feels completely separate from that use case (although I might be missing something, so I'm not yet at a point of blocking it being here).

I think those are the thoughts which are most prominently occurring to me at the moment. I may have other comments in the next day or so, prior to our discussing this as a team.

Thanks for putting this together! I appreciate the work it took to assemble.

@jazairi (Contributor, Author) commented Aug 8, 2025

@matt-bernhardt Thank you so much for the prompt and thoughtful feedback! These are my initial thoughts as I mull it over this morning.

> While I agree with the prospect of building an orchestrator and proceeding with a simple interleaving logic for merging the two sets of results, I think formally I'd like this document to describe multiple options prior to stating the decision we're going with.

Love this idea, especially because the inclusion of an 'alternative options' list is consistent with how we write more complex ADRs. I'll restructure the document to formalize that.

> The sequence diagram feels like it could be cleaner in its placement of the "User submits search query" phase - maybe that comes in from a User participant to the frontend, which then delegates the search to both TACOS and the orchestrator?

Great point. As currently diagrammed, it looks like the user is submitting the query directly to the orchestrator. What I meant to convey was, as you said, the query handoff from UI to orchestrator.

> I think this is pretty much the same sequence you're proposing - so I think I'm ultimately wanting to have a diagram that makes clear what the dependencies are, and what might be happening in parallel.

Yeah, I see what you mean. I tried to indicate asynchronous interactions using dotted lines, but generally speaking, I wasn't sure about the clarity of the sequence diagram, so it's helpful to have that instinct affirmed. I wonder if a flow diagram would be more readable/useful?

> I'm not sure whether the TIMDEX UI is the most appropriate place to consider how to facilitate computational access to CDI records.

When I wrote that, I was thinking less about TIMDEX UI and more about the whole TIMDEX ecosystem. Ostensibly, TIMDEX API is intended to facilitate computational access, and integrating CDI records into TIMDEX feels like an opportunity to consider if we can provide such access to CDI. However, I agree that computational access to CDI is out of scope for this ADR. As I consider it further, I might argue that it's out of scope of TIMDEX altogether, since TIMDEX API is primarily focused on our local collections. (Though, that has changed somewhat since GDT.)

@jazairi jazairi temporarily deployed to timdex-ui-pi-cdi-adr-fi74pmjx7 August 11, 2025 13:05 Inactive
@jazairi (Contributor, Author) commented Aug 11, 2025

@matt-bernhardt I just pushed up a revision to incorporate some of your initial feedback. Let me know what you think.

@JPrevost (Member) left a comment

I'd love to meet as a team to discuss these options now that we have this document as a starting point.

## Context

The Libraries' unified search strategy calls for a discovery interface that surfaces results from
both Primo Central Discovery Index (CDI) and Alma (via TIMDEX), replacing the current [Bento UI](https://github.com/MITLibraries/bento).
Member

We call it Primo CDI, but it's actually Ex Libris Central Discovery Index (CDI).


## Decision

This approach aligns with the unified search strategy's goal to display all known results from
Member

I don't believe you stated which approach was decided.

Contributor Author

I mean, it does make things easier on me if I just leave that up to the reader...

Relevance normalization is a critical issue. We can begin with rank-based interleaving, but we
should not assume this to be a long-term solution.
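Rank-based interleaving sidesteps the relevance-normalization problem by ignoring raw scores entirely and using only each hit's position in its own result list. A minimal sketch, assuming each hit carries a record id, its source, and a 1-based rank; the `Hit` type and `rank_interleave` function are hypothetical, and deduplicating on `id` assumes the two sources can share an identifier, which is itself an assumption.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    id: str      # record identifier, assumed comparable across sources
    source: str  # e.g. "timdex" or "cdi"
    rank: int    # 1-based position in that source's own result list


def rank_interleave(*result_lists: List[Hit]) -> List[Hit]:
    """Merge ranked lists by position only, never by relevance score.

    At each rank, hits are taken in argument order, so the first list
    "leads". If the same id appears more than once, the earliest wins.
    """
    candidates = [
        (hit.rank, list_index, hit)
        for list_index, hits in enumerate(result_lists)
        for hit in hits
    ]
    # Sort on (rank, list_index) only, so Hit objects are never compared.
    candidates.sort(key=lambda c: (c[0], c[1]))
    merged = []
    seen = set()
    for _, _, hit in candidates:
        if hit.id not in seen:
            seen.add(hit.id)
            merged.append(hit)
    return merged
```

This also illustrates why it should not be treated as a long-term solution: each source's second-best hit always lands next to the other source's second-best hit, however strong or weak either match actually is.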

We should connect with the MIT research community to determine whether computational access to CDI
Member

I'm fairly certain there is no legal way to obtain these without violating our contract to access the records in the first place. It would be better for us to look at what our user needs are more broadly and understand whether there is a CDI-alternative that would better meet their needs (either as part of TIMDEX or as a recommendation to use instead of TIMDEX).

Contributor Author

Yeah, that's reasonable. I can zoom this out a bit.


### Cons

- Requires runtime integration with Primo Search API, which may introduce latency or complexity. (We can mitigate this by implementing a caching strategy similar to that in Bento.)
Member

`which may` should probably be `which will`. We already know Primo API is extremely slow.

Contributor Author

Good call. Probably wishful thinking on my part. Latency in Bento feels less awful since results are rendered independently.
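As a rough illustration of the caching mitigation, here is a minimal time-to-live cache keyed on a normalized query string. It is a sketch under stated assumptions, not Bento's actual caching code: the `TTLCache` class, the `cache_key` normalization, and the injectable clock (used so expiry can be tested) are all hypothetical. It also shows the limit of the mitigation: only repeated queries benefit, and the first request for any query still pays the full Primo latency.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """In-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, T]] = {}

    def fetch(self, key: str, compute: Callable[[], T]) -> T:
        """Return the cached value, calling compute() on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh hit: no API call
        value = compute()    # miss or expired: pay the full API latency
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


def cache_key(query: str) -> str:
    """Hypothetical normalization so trivially different queries share an entry."""
    return "primo:" + " ".join(query.lower().split())
```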

- **Data availability**: Because Primo does not expose CDI records in OAI-PMH, we would need to harvest using the Primo Search API, making the process needlessly complex and perhaps impossible.
- **Licensing**: Harvesting CDI records for TIMDEX likely has licensing implications. Ex Libris seems to discourage the practice: Primo does not provide OAI-PMH support, and the Search API has a hard-coded limit of 5,000 on the [`offset` parameter](https://developers.exlibrisgroup.com/primo/apis/docs/primoSearch/R0VUIC9wcmltby92MS9zZWFyY2g=/#output:~:text=Note%3A%20The%20Primo%20search%20API%20has%20a%20hardcoded%20offset%20limitation%20parameter%20of%205000.), so no single query can be paged much beyond its first 5,000 results.
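The practical effect of the offset limit can be worked out with a small calculation. A minimal sketch, assuming a harvester pages through one query's results with a fixed page size and that offsets up to and including 5,000 are accepted; `reachable_records` is a hypothetical helper, and the page size is left as a parameter rather than assuming Primo's maximum `limit`.

```python
def reachable_records(total_hits: int, page_size: int,
                      max_offset: int = 5000) -> int:
    """Number of a single query's hits a harvester can page through.

    Assumes offsets are multiples of page_size and that any offset
    <= max_offset is accepted (both are assumptions, not documented facts).
    """
    if total_hits <= 0 or page_size <= 0:
        return 0
    # Last page the API will serve vs. last page that actually has results.
    last_allowed = (max_offset // page_size) * page_size
    last_needed = ((total_hits - 1) // page_size) * page_size
    last_offset = min(last_allowed, last_needed)
    return min(total_hits, last_offset + page_size)
```

For example, with a page size of 50, a query matching millions of CDI records still yields at most `reachable_records(3_000_000, 50) == 5050` of them, so a full harvest would presumably require splitting the index into a very large number of narrow queries.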

### Display separate result streams in tabbed views
Member

I believe there is a desire from the UX side to have tabs for "primo" and "timdex" even if we figure out how to merge all the results. We may want to update our decision to reflect having multiple tabs: "everything" (which is what I believe the Orchestrator approach would provide), "primo" (probably Alma + CDI), and "timdex" (everything not in the primo tab).

While arguably an improvement on Bento, this design does not deliver the combined Alma/CDI results
view as envisioned in the unified UI.

### Implement external search orchestrator
Member

Not for this document, but we'll likely need a separate ADR for the orchestrator if we agree that is the best path forward, as I suspect there are a few solid directions to go (Rails, Lambda, Lambda feeding OpenSearch, etc.)

Contributor Author

Agreed. I talked a bit in Slack about deferred architectural decisions for the orchestrator, but I'm not sure it translated into the ADR.

Member

I hadn't caught up on Slack (still haven't fully!). We don't need to update this ADR to note there will be a different ADR with more details on whatever we choose... I was just saying it out loud :)

Contributor Author

Got you. It just felt useful to me to note it in the ADR, because I think at times we start developing before we're done documenting decisions.

Member

Ah... that's a very good point.

@jazairi jazairi temporarily deployed to timdex-ui-pi-cdi-adr-fi74pmjx7 August 11, 2025 13:53 Inactive
@jazairi jazairi temporarily deployed to timdex-ui-pi-cdi-adr-fi74pmjx7 August 11, 2025 13:55 Inactive
@jazairi (Contributor, Author) commented Aug 12, 2025

We've decided to postpone this decision pending further exploration of the problem space.

@jazairi jazairi closed this Aug 12, 2025
Labels
None yet
Projects
None yet
5 participants