Add storage finder facets and update CI workflow permissions #281
s-sajid-ali merged 49 commits into main
The initial implementation of the storage finder used data from the CSV sheet, but the generated JSON file was later edited by hand (for instance in commit d23fe45). Per Claude, here's the summary of changes that need to be made in the CSV file: https://gist.github.com/s-sajid-ali/33ce8a6488db28582d7ccba462e46bff
Force-pushed from 3af04ac to 7d91a7b
Add synchronous access and alumni access facets to the storage finder configuration. Update workflow to include explicit permissions for improved security. Regenerate data files and update dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…et tree Improved clarity of access permission descriptions in config and reorganized the facet tree with numeric IDs, added contextual descriptions for risk classification and affiliation questions, and reordered questions for better user flow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restore descriptions from main branch for "What is the risk classification of your data?" and "What is your University affiliation?" facets to improve user guidance in the storage finder UI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 7d91a7b to 14978d6
Deleted the
Per analysis by @Amanda-dong: 7fdcefd removed
Adds a new "From where will the data be accessed?" question with four choices (VPN, Public Cloud, Off Campus, Browser GUI), driven by the new "Access locations" CSV column. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The access-location facet was missing a corresponding field definition, so access location data was not included in service records' field_data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace loose word-boundary patterns with patterns that require each keyword to appear as a standalone comma-separated item, preventing "VPN" from matching within embedded text and ensuring any combination of access locations is handled correctly without hardcoding. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
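The difference between the two matching styles in the commit above can be sketched as follows. This is an illustration in Python, not the project's own matcher code: `loose_pattern` and `item_pattern` are hypothetical helper names, and the sample cell values are invented.

```python
import re


def loose_pattern(keyword: str) -> re.Pattern:
    """Word-boundary match: fires wherever the keyword appears as a word."""
    return re.compile(rf"\b{re.escape(keyword)}\b")


def item_pattern(keyword: str) -> re.Pattern:
    """Match only when the keyword is a whole comma-separated item."""
    return re.compile(rf"(?:^|,)\s*{re.escape(keyword)}\s*(?:,|$)")


# Hypothetical cell values, for illustration only.
list_cell = "VPN, Off Campus, Browser GUI"
prose_cell = "Reachable without VPN from campus networks"

# The loose pattern matches "VPN" embedded in prose; the item pattern does not.
print(bool(loose_pattern("VPN").search(prose_cell)))
print(bool(item_pattern("VPN").search(prose_cell)))

# Each keyword is tested independently, so any combination of items works.
print(bool(item_pattern("Off Campus").search(list_cell)))
```

Because every keyword gets its own pattern, no particular combination of access locations has to be spelled out in advance.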
The Google Sheet export includes a line break inside the column header "Access locations (VPN, Public Cloud, \nOff Campus, Browser GUI)", causing row lookups to return undefined for every service and triggering the fallback: "all" for all access-location facets regardless of actual data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
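One way to guard against this kind of header mismatch is to normalize whitespace in column names before any lookup. The sketch below is in Python and is not the project's actual fix: `normalize_header` and `normalize_row` are hypothetical names, the sample row is invented, and it assumes the lookup key is the single-line form of the header. Python's `None` plays the role of the `undefined` mentioned in the commit message.

```python
import re


def normalize_header(header: str) -> str:
    """Collapse any run of whitespace (including line breaks) to one space."""
    return re.sub(r"\s+", " ", header).strip()


def normalize_row(row: dict) -> dict:
    """Return a copy of a parsed CSV row keyed by normalized header names."""
    return {normalize_header(key): value for key, value in row.items()}


# Header as exported (with an embedded line break) and as the lookup expects it.
exported = "Access locations (VPN, Public Cloud, \nOff Campus, Browser GUI)"
expected = "Access locations (VPN, Public Cloud, Off Campus, Browser GUI)"

# Invented row, for illustration only.
raw_row = {"Service": "Example Storage", exported: "VPN, Off Campus"}
row = normalize_row(raw_row)

print(raw_row.get(expected))  # None: the key with the line break does not match
print(row.get(expected))      # VPN, Off Campus
```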
Removed the option
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Correct, it is not indexed by the client-side search engine we currently use, and I don't think there's a way to add non-local URLs for that search index to crawl. We could do that if we switch to Algolia (in #298) by adding the source URL for the Google Sheet. Or we convert that sheet to markdown and add a new page at
I'm okay with that. @genericdata: What was this column originally meant to indicate?
Allowed that facet for Ceph for consistency and yes, we'll have to point to the access policy somewhere.
…alues Update anchor-based regex matchers to handle leading/trailing whitespace and newlines in spreadsheet cell values. Adds multiline flag (m) so ^/$ match line boundaries, and adds \s* around anchors to absorb surrounding whitespace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
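The effect of the multiline flag and the `\s*` padding can be sketched as below. This is a Python illustration rather than the project's matcher code (`re.MULTILINE` corresponds to the `m` flag named in the commit message), and the keyword and cell values are invented.

```python
import re

# Strict form: anchors apply to the whole value and no padding is tolerated.
strict = re.compile(r"^Off Campus$")

# Tolerant form: re.MULTILINE lets ^ and $ match at line boundaries,
# and \s* absorbs whitespace around the keyword.
tolerant = re.compile(r"^\s*Off Campus\s*$", re.MULTILINE)

# Invented cell values, for illustration only.
padded_cell = "  Off Campus \n"
multiline_cell = "VPN\n Off Campus \n"
longer_value = "Off Campus Extended"

for cell in (padded_cell, multiline_cell, longer_value):
    print(repr(cell), bool(strict.search(cell)), bool(tolerant.search(cell)))
```

The tolerant pattern still rejects `longer_value`, since the keyword has to fill its whole line apart from surrounding whitespace.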
…e suffixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…capacity matchers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I mean that you can't find this page with the search, not the Google sheet. "data finder" or "storage" should probably point to the storage finder. What does "alumni access" mean? That it is possible for an alumnus to have access if they get a researcher sponsor? In that case the answer is "yes" for a lot more options (S3, HPC RPS, Data Lake, probably Google Shared Drive and Research Workspace too).
I'll take a look at indexing that page.
I agree and have removed it now. I was so focused on moving to ingesting the data from the Google sheet that I didn't really think about which data made sense to move.
I think something happened with the risk rating. The table now only shows "Storable Files: High", which is not as clear as the previous "Storable Files: High, Moderate, & Low Risk". The word "risk" should be present.
A lot of other facets lost details. For example, "backup" changed for Box from "Retains up to 100 previous versions of a single file" to "yes" (lost details) and for S3 from "available for additional cost" to "yes" (incorrect).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cation to include all lower tiers
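Expanding a single risk tier into an explicit list of that tier and every lower one can be sketched as follows. This is a Python illustration under assumptions: the tier names and the label format are taken from the "Storable Files: High, Moderate, & Low Risk" text quoted in the review comments, and `RISK_TIERS`, `storable_tiers`, and `storable_label` are hypothetical names, not the project's code.

```python
# Assumed tier order, from highest to lowest risk.
RISK_TIERS = ["High", "Moderate", "Low"]


def storable_tiers(max_tier: str) -> list[str]:
    """Return the given tier plus every lower tier."""
    index = RISK_TIERS.index(max_tier)  # raises ValueError for an unknown tier
    return RISK_TIERS[index:]


def storable_label(max_tier: str) -> str:
    """Build an explicit label that always includes the word 'Risk'."""
    tiers = storable_tiers(max_tier)
    if len(tiers) == 1:
        joined = tiers[0]
    elif len(tiers) == 2:
        joined = " & ".join(tiers)
    else:
        joined = ", ".join(tiers[:-1]) + ", & " + tiers[-1]
    return f"{joined} Risk"


print(storable_label("High"))      # High, Moderate, & Low Risk
print(storable_label("Moderate"))  # Moderate & Low Risk
```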
Restored all the text for the backup options that had been removed, and updated the descriptions for storable files to be explicit rather than implicit. Thanks for catching these issues!
Merging this as it is in a usable state with updated information (removed deprecated and retired services, added new offerings). Remaining issues will be fixed in subsequent PRs.
This PR updates the implementation that populates the datafinder data to account for all facets and updates the CI workflow permissions. On a related note, the source Google Sheet URL has also been updated.