EPIC - ADR about future CASEI system when transitioned to ESDS #546

Closed
1 task
heidimok opened this issue Aug 10, 2023 · 4 comments

@heidimok
Contributor

heidimok commented Aug 10, 2023

Context

ADMG and CASEI are transitioning to ESDS. On the technical side, we aren't yet sure what this implies for development goals in the next fiscal year (starting next PI), because it depends on ESDS's future vision for CASEI. From early discussions, we believe ESDS would ideally like to use CMR as the backend for the CASEI front end, since they already maintain CMR. In sprint FY23.4.1, @naomatheus and @praveenphatate did some initial research into CMR (#539), resulting in a casei-cmr-explorer and a document going over some of the direct CMR curl/HTTP queries. We now want to turn that learning into high-level technical direction.

What we know:

  • ESDS would ideally like to use CMR as the CASEI backend
  • ADMG would want ESDS to ideally continue to provide all the unique metadata and front end that exists today
  • Future metadata curators (whether it's trained DAACs in the future or the current curation team) still need some interface or non-technical way to contribute their data that goes through an approval workflow.

Epic Acceptance Criteria

  • An ADR that weighs options for a future CASEI system maintained by ESDS

The ADR should:

  • Include possible options for a future CASEI system with pros and cons. Considerations:
    • Changes to the current system. Should we still invest development time in refactoring any parts of the existing system? (See CASEI assessed for potential refactoring #529.)
    • Any new systems that need to be built, such as middleware between the current system and CMR. Would we benefit from further prototyping (Prototype of CASEI using CMR as backend #543), or should we close that out?
    • Tradeoffs to the curation process - E.g. Support for the complex approval workflow
    • Cost - developer time and effort
    • Maintainability of whatever we propose
  • Validate our research with a technical person at ESDS (Heidi can help connect with the right person via Stephanie)
  • Include recommendations from our current team's perspective (the ESDS team may take this and make other decisions but at least they will understand the tradeoffs).
  • Be written for an audience of both non-technical project leads on ADMG and ESDS + technical leads and developers on ESDS who are familiar with CMR but not CASEI

Deadline

This should be completed and shared by Friday, Aug 25, 2023 (mid-sprint) so that Heidi can discuss it with Stephanie during Deep Dive the following week and include it in the ESDS<>ADMG transition wiki #538.

@heidimok
Contributor Author

Here's an example ADR from the front end on a different topic that I think could be a good template: https://github.com/NASA-IMPACT/admg-casei/blob/develop/docs/adr/0004-visual-testing.md

@naomatheus
Collaborator

There's an unanswered question of how Gatsby GraphQL runs its queries.

TL;DR: Even Gatsby v5 does not use a Relay compiler newer than v12, which is what would support "component by component" GraphQL queries. The only support we have for this in CASEI is GraphQL fragments. Since CASEI would need to query CMR many times during a build, there is a concern that CMR would not respond fully to many concurrent requests from CASEI's build environment, hobbling CASEI's build process.

Long version:
In Gatsby v5, the Relay compiler being used to process GraphQL queries appears to be limited to a version no newer than Relay 12.0.0. Consequently, Gatsby v5 does not support local reasoning. In other words, this version of Gatsby will run all of its GraphQL queries at once, instead of on a component-by-component basis.

This poses a concern that during CASEI's build time, CMR may not respond with the data being requested by CASEI if all requests for data from CMR are sent simultaneously.

Here's an excerpt from Relay's docs describing the compilation process:

  1. GraphQL text is extracted from source files and "parsed" into an intermediate representation (IR) using information from the schema.
  2. The set of IR documents forms a CompilerContext, which is then transformed and optimized.
  3. Finally, GraphQL is printed (e.g., to files, saved to a database, etc.), and any artifacts are generated.

More about Relay V12 used in GatsbyV5 can be found here.

Relay Compiler in v12: As of Relay v12, there has been a focus on "local reasoning," where queries are associated more closely with individual components. This is designed to improve performance by allowing more granular control over data fetching.

Gatsby's Behavior: If Gatsby v5 is using a version of Relay older than v12, or if it's not taking advantage of local reasoning features, then it's possible that all GraphQL queries would be run at once.

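Impact on Build Time: If the server (CMR in this case) has limitations on concurrent requests, or if the queries are interdependent, then running all queries at once could leave some requests without a complete response and slow or break the build.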

CASEI and CMR Specifics: The real impact would be dependent on the specific queries being made, the amount of data being requested, the CMR's ability to handle concurrent requests, and any rate limiting that might be in place. If CASEI is making a large number of complex queries simultaneously, and there's no way to control that processing flow, and CMR has limitations on how many requests it can handle at once, then there would be a concern.

It might be beneficial to conduct some testing with the actual build process, queries, and environment to see if this concern is borne out in practice.
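One rough way to run such a test outside the Gatsby build is sketched below: send the same set of requests at a chosen concurrency level and count how many succeed. This is only an illustrative sketch. The names `probe` and `fetch_status` are hypothetical, the URLs to test would have to come from the queries CASEI actually makes, and nothing here reflects known CMR rate limits.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url):
    """Request one URL and return its HTTP status.

    urllib raises an exception on timeouts and on 4xx/5xx responses,
    so a rate-limited or failed request surfaces as an error.
    """
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.status


def probe(urls, max_workers, fetch=fetch_status):
    """Send all requests with at most `max_workers` in flight; count outcomes.

    `fetch` is injectable so the probe can be exercised without a network.
    """
    def attempt(url):
        try:
            fetch(url)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, urls))

    ok = sum(results)
    return {"concurrency": max_workers, "ok": ok, "failed": len(results) - ok}
```

Running `probe` over the same URL list with increasing `max_workers` values (for example 1, 4, 16) would show whether failures start to appear as concurrency rises, which is the behavior the concern above depends on.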

@edkeeble
Contributor

I'm assuming Gatsby could also build using local JSON files as source data, so if CMR can't handle the concurrent requests, we could add a step before gatsby build that downloads all the necessary data from CMR into local JSON files, with appropriate delays between requests.
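A minimal sketch of what that pre-build step might look like, assuming the data can be fetched with plain HTTP GET requests that return JSON. The function names, the name-to-URL mapping, the output directory, and the one-second default delay are all hypothetical choices for illustration, not part of CASEI or CMR.

```python
import json
import time
import urllib.request
from pathlib import Path


def fetch_json(url):
    """Fetch one URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def prefetch(queries, out_dir, delay_seconds=1.0, fetch=fetch_json, sleep=time.sleep):
    """Download each query result to <out_dir>/<name>.json, one at a time.

    `queries` maps a file name (hypothetical, chosen by the caller) to a URL.
    Requests run sequentially, with a pause between them, so the server
    never sees concurrent requests from this step.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (name, url) in enumerate(queries.items()):
        if index > 0:
            sleep(delay_seconds)  # pause between requests, not before the first
        data = fetch(url)
        path = out_dir / f"{name}.json"
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

The build would then read from the output directory instead of calling CMR directly. How those files get wired into Gatsby's data layer is left open here, since it depends on which source plugin the project chooses.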

@naomatheus
Collaborator

Plan to review the ADR and then complete the merge

@heidimok heidimok closed this as completed Sep 1, 2023

4 participants