
SOLR-16825: Migrate v2 definitions to 'api' module, pt 6 #1978

Merged: gerlowskija merged 6 commits into apache:main from gerlowskija:SOLR-16825-get-schema-api-final on Oct 11, 2023

@gerlowskija (Contributor) commented Oct 4, 2023

https://issues.apache.org/jira/browse/SOLR-16825

Description

SOLR-16825 added a new Gradle module, 'api', which holds v2 API definitions as interfaces. This allows us to generate an OpenAPI specification (OAS), and SolrRequest implementations from that, as part of the SolrJ build.

But these artifacts (the OAS and the generated Java code) only cover the v2 APIs that have interfaces in the 'api' module. We need to extract an interface into 'api' for each v2 API in 'core' that doesn't already have one.

Solution

This PR creates 'api' interfaces for a number of v2 APIs, allowing SolrRequest implementations to be generated for them. The following APIs are covered in this PR:

  • get (entire) schema
  • get schema version
  • get schema name
  • get schema similarity
  • get schema uniquekey
  • get schema zkversion
  • delete shard

Tests

This PR is a refactor, so it doesn't add any new tests. Manual testing has been done to make sure the affected v2 APIs continue to work, and existing tests continue to pass.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.

@gerlowskija (Contributor, Author)

❗ ❗ Naming Emergency ❗ ❗

Many v2 APIs are currently offered at two paths, depending on whether the user provides a core name or a collection name. For example, schema APIs are available at both /collections/techproducts/schema and /cores/techproducts_shard1_replica_n1/schema.

Trying to define these APIs concisely puts us in the awkward place of needing a generic term for either a core or a collection.

So far in this PR I've made do with "index", but I'm not terribly happy with it. This choice appears in a few places:

  • JAX-RS annotations e.g. @Path("/{indexType:collections|cores}/{indexName}/schema")
  • Committed Java code (mostly constants) e.g. public static final String INDEX_NAME_PATH_PARAMETER = "indexName";
  • Generated SolrRequest classes e.g. public GetSchemaName(IndexType indexType, String indexName) { ... }
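For readers less familiar with JAX-RS path templates: a variable written as {name:regex} only matches path segments that satisfy that regex, while a bare {name} defaults to matching a single segment ([^/]+). The sketch below approximates what the @Path template above accepts, using plain java.util.regex. It is an illustration under that assumption, not Jersey's actual matching code, and the class name SchemaPathMatcher is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of the JAX-RS template
// @Path("/{indexType:collections|cores}/{indexName}/schema").
// This is NOT Jersey's matcher; it only mimics the template's constraints.
class SchemaPathMatcher {
    // {indexType:collections|cores} -> a group limited to the two literals.
    // {indexName} has no explicit regex, so it matches one segment: [^/]+
    private static final Pattern SCHEMA_PATH =
            Pattern.compile("^/(collections|cores)/([^/]+)/schema$");

    /** Returns {indexType, indexName}, or null if the path does not match. */
    static String[] match(String path) {
        Matcher m = SCHEMA_PATH.matcher(path);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }
}
```

Under this approximation, both /collections/techproducts/schema and /cores/techproducts_shard1_replica_n1/schema match, with the first segment captured as indexType and the second as indexName, while a path such as /nodes/foo/schema is rejected.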

It's a somewhat consequential naming decision, as it'll recur in the generated SolrJ code for most of our per-core-or-collection APIs: /schema, /config, /select, /update, etc.

Do we already have some generic term to refer to a core or collection? Does anyone have any good ideas?

@epugh (Contributor) commented Oct 4, 2023

corllection? Or, could we use the word collection everywhere, and then under the covers do something smart if it's a core? Or, do we actually need separate routes? Would there be some value in /collection/bob/schema being a totally different URL from /core/bob/schema? Would having them be different somehow open the door to better/cleaner routes?

@epugh (Contributor) commented Oct 4, 2023

You know, if you want a single term in place of "core" and "collection" in Apache Solr's RESTful URLs, here are some potential terms you could consider:

Index
Repository
Dataset
Store
Source
Resource
Data
Container
Unit
Entity

These terms can convey the idea of a logical grouping or container for data within Solr, similar to the concept of a core or collection.

@epugh (Contributor) commented Oct 4, 2023

^ above was generated ;-). I never write like that. I think I like anything BUT "index", because that is an overloaded term that takes on a specific meaning the closer to disk you get. A collection is NEVER an index. Resource? Container?

@epugh (Contributor) commented Oct 4, 2023

Index: An index is a common term used in information retrieval systems to represent a structured collection of data for efficient searching and retrieval. It implies a storage structure that organizes and facilitates quick access to the data.

Repository: A repository is a centralized storage location that holds a collection of data or resources. It suggests a place where data is stored and managed, implying a sense of organization and structure.

Dataset: A dataset refers to a collection of related data that is treated as a single unit. It implies a cohesive and self-contained set of data that can be queried and analyzed collectively.

Store: A store is a generic term that signifies a place where items or data are kept. It implies a central location where data is stored and accessed.

Source: Source implies the origin or provider of data. It suggests a place where data is obtained, managed, and made available for querying and retrieval.

Resource: Resource is a broad term that encompasses data or information that can be accessed, utilized, or manipulated. It conveys the idea of a pool of data that can be interacted with.

Data: Data is a straightforward term that represents information or facts. It conveys the notion of a collection of structured or unstructured information that can be stored and processed.

Container: A container is a metaphorical term indicating a logical enclosure or compartment that holds and organizes items. It suggests a grouping mechanism that holds data in a cohesive manner.

Unit: Unit implies a self-contained entity or component. It suggests a discrete and independent part that can be managed and accessed as a single entity.

Entity: Entity refers to a distinct and individual object or unit. It implies an identifiable element that can be stored, queried, and manipulated within the system.

@gerlowskija (Contributor, Author)

Haha, someone's enjoying their LLMs this morning 😛

"Store" is a tad generic for my tastes, but I like it better than "index" I think? We would have "store type" (which could either be 'collection' or 'core') and "store name" (which would be the actual name of the collection/core).
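To make the proposed naming concrete, here is a hypothetical sketch of how a generated request for the get-schema-name API might read with "store type" and "store name" in place of "index type" and "index name". The StoreType enum, the GetSchemaNameRequestSketch class, and the /schema/name path suffix are assumptions for illustration only, not the code this PR generates.

```java
import java.util.Objects;

// Hypothetical: which kind of "store" a per-core-or-collection request targets.
enum StoreType {
    COLLECTION("collections"),
    CORE("cores");

    // The URL path segment used for this store type.
    private final String pathSegment;

    StoreType(String pathSegment) {
        this.pathSegment = pathSegment;
    }

    String pathSegment() {
        return pathSegment;
    }
}

// Hypothetical stand-in for a generated "get schema name" request class.
class GetSchemaNameRequestSketch {
    private final StoreType storeType; // 'collection' or 'core'
    private final String storeName;    // actual name of the collection/core

    GetSchemaNameRequestSketch(StoreType storeType, String storeName) {
        this.storeType = Objects.requireNonNull(storeType, "storeType");
        this.storeName = Objects.requireNonNull(storeName, "storeName");
        if (storeName.isEmpty()) {
            throw new IllegalArgumentException("storeName must not be empty");
        }
    }

    // Builds e.g. "/collections/techproducts/schema/name" (assumed path layout).
    String path() {
        return "/" + storeType.pathSegment() + "/" + storeName + "/schema/name";
    }
}
```

In this sketch the enum keeps the URL segment ("collections" vs "cores") out of callers' hands, so the only free-form input is the store name itself.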

@HoustonPutman (Contributor)

I'm also cool with resourceType and resourceName, but store is cool too.

@epugh (Contributor) commented Oct 5, 2023

On balance, I'm leaning towards "store", because "resource" could mean many things... "dataStoreType" and "dataStoreName"?

Or, we could just rip the bandaid off and get rid of /core/ ;-).

What if we had /collection/{collectionName}/core/{coreName} for cores? I.e., if we can't come up with a great solution to the naming problem, let's just change the problem statement to avoid a crummy answer... In SolrCloud, why would I ever go directly to a core? I would always navigate through a collection. Heck, maybe it gives nicer answers. And maybe in the future I could get data about a core regardless of which physical node I'm talking to. In legacy, old, should-be-retired-and-never-spoken-of-again Solr, you'd just do /collection/standalone/core/{coreName}. <--- I'm liking this more...

@epugh (Contributor) commented Oct 5, 2023

These APIs are all experimental and new, so introducing /collection/standalone/ into the "sorta classic" core URLs is actually allowed ;-). And think about how it would help folks in the future with their standalone-to-cloud migrations...

Also merges GetSchemaAPI and GetSchemaZkVersionAPI, since there's no real need for them to be separate.
@gerlowskija (Contributor, Author)

I think I'll go with "store" as the term for now. I kind of agree with Eric that "resource" may be too generic. It's also an overloaded term, as JAX-RS/Jersey calls API classes "resources" too. We'll be able to change this later if folks find it doesn't work well in practice.

(Also, I suspected it from his comments above, but confirmed with @epugh offline that some of the other suggestions, e.g. /collections/standalone, were tongue-in-cheek 😛)

Will aim to update this PR shortly with the new name, and merge. Thanks for the help guys!

@gerlowskija gerlowskija merged commit b74fcaf into apache:main Oct 11, 2023
@gerlowskija gerlowskija deleted the SOLR-16825-get-schema-api-final branch October 11, 2023 18:14
gerlowskija added a commit that referenced this pull request Oct 11, 2023
This commit covers various GET /schema APIs as well as delete-shard.

Extracting annotated interfaces for these APIs includes them in the SolrRequest-generation we now do in SolrJ.