SOLR-14726: Initial draft of a new quickstart guide #2594

Draft
wants to merge 2 commits into
base: branch_8x
4 changes: 3 additions & 1 deletion solr/solr-ref-guide/src/getting-started.adoc
@@ -22,7 +22,9 @@ Solr makes it easy for programmers to develop sophisticated, high-performance se

This section introduces you to the basic Solr architecture and features to help you get up and running quickly. It covers the following topics:

-<<solr-tutorial.adoc#,Solr Tutorial>>: This tutorial covers getting Solr up and running
+<<quickstart.adoc#,Quickstart Guide>>: Copy-and-paste commands to start Solr, create a collection, and index documents.
+
+<<solr-tutorial.adoc#,Solr Tutorial>>: A more detailed tutorial than the Quickstart Guide.

<<a-quick-overview.adoc#,A Quick Overview>>: A high-level overview of how Solr works.

140 changes: 140 additions & 0 deletions solr/solr-ref-guide/src/quickstart.adoc
@@ -0,0 +1,140 @@
= Quickstart Guide
:experimental:
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

This guide shows how to start Solr, create a collection, index some documents, and run some basic searches.

== Starting Solr

Start a Solr node in SolrCloud (cluster) mode:

[source,subs="verbatim,attributes+"]
----
$ bin/solr -c

Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=34942). Happy searching!
----

To start a second node on port 8984 and have it join the same cluster, point it at the first node's ZooKeeper with `-z`:

[source,subs="verbatim,attributes+"]
----
$ bin/solr -c -z localhost:9983 -p 8984
----

When the first node started in SolrCloud mode without a `-z` option, it also started an embedded instance of ZooKeeper, the cluster coordination service, on port 9983 (the Solr port plus 1000). To run ZooKeeper separately, please refer to XXXX.
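
To confirm that the two nodes started above are up, a small shell function can probe each node's system-info endpoint (`/solr/admin/info/system`). This is a minimal sketch. The function name `check_nodes` is hypothetical, and the ports are the ones used in this guide. The function only checks that each node answers HTTP requests. It does not check that the node has joined the cluster.

```shell
# Hypothetical helper: report whether a Solr node answers HTTP requests
# on each given port. It does not check cluster membership.
check_nodes() {
  for port in "$@"; do
    # --fail: non-zero exit on HTTP errors as well as connection failures.
    # --max-time: give up after 5 seconds instead of hanging.
    if curl --silent --fail --max-time 5 --output /dev/null \
        "http://localhost:${port}/solr/admin/info/system"; then
      echo "port ${port}: up"
    else
      echo "port ${port}: not reachable"
    fi
  done
}

# The two nodes started above.
check_nodes 8983 8984
```

Once both nodes have finished starting, each line should read `up`.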

== Creating a collection

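Just as a database system holds data in tables, Solr holds data in collections. The following request creates a collection named `techproducts` with one shard and one replica. Because no configset is specified, Solr bases the collection's configuration on its `_default` configset: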

[source,subs="verbatim,attributes+"]
----
$ curl --request POST \
--url http://localhost:8983/api/collections \
--header 'Content-Type: application/json' \
--data '{
"create": {
"name": "techproducts",
"numShards": 1,
"replicationFactor": 1
}
}'
----
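
To make the configset explicit instead of relying on the default, add a `config` attribute to the create command. The sketch below assumes a fresh cluster, so run it instead of the command above, because creating `techproducts` a second time fails. It saves the request body to a file so the body can be edited and re-sent; the file name `create-techproducts.json` is arbitrary.

With `"config": "_default"`, the collection uses the `_default` configset as is. As a result, any configuration changes made later through the Config API, such as the auto-commit setting below, affect every collection that uses `_default`.

```shell
# Save the request body to a file (name chosen for this example) so it
# can be edited and re-sent.
cat > create-techproducts.json <<'EOF'
{
  "create": {
    "name": "techproducts",
    "config": "_default",
    "numShards": 1,
    "replicationFactor": 1
  }
}
EOF

# Send it to the v2 collections endpoint; print a hint if Solr is down.
curl --silent --show-error --request POST \
  --url http://localhost:8983/api/collections \
  --header 'Content-Type: application/json' \
  --data-binary @create-techproducts.json \
  || echo "request failed: is Solr running on port 8983?" >&2
```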

== Indexing documents

Index a single document:

[source,subs="verbatim,attributes+"]
----
$ curl --request POST \
--url 'http://localhost:8983/api/collections/techproducts/update' \
--header 'Content-Type: application/json' \
--data ' {
"id" : "978-0641723445",
"cat" : ["book","hardcover"],
"name" : "The Lightning Thief",
"author" : "Rick Riordan",
"series_t" : "Percy Jackson and the Olympians",
"sequence_i" : 1,
"genre_s" : "fantasy",
"inStock" : true,
"price" : 12.50,
"pages_i" : 384
}'
----

To index several documents in one request, send a JSON array:

[source,subs="verbatim,attributes+"]
----
$ curl --request POST \
--url 'http://localhost:8983/api/collections/techproducts/update' \
--header 'Content-Type: application/json' \
--data ' [
{
"id" : "978-0641723445",
"cat" : ["book","hardcover"],
"name" : "The Lightning Thief",
"author" : "Rick Riordan",
"series_t" : "Percy Jackson and the Olympians",
"sequence_i" : 1,
"genre_s" : "fantasy",
"inStock" : true,
"price" : 12.50,
"pages_i" : 384
}
,
{
"id" : "978-1423103349",
"cat" : ["book","paperback"],
"name" : "The Sea of Monsters",
"author" : "Rick Riordan",
"series_t" : "Percy Jackson and the Olympians",
"sequence_i" : 2,
"genre_s" : "fantasy",
"inStock" : true,
"price" : 6.49,
"pages_i" : 304
}
]'
----

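To index documents stored in a file, such as the `books.json` file bundled with Solr, post the file with an explicit JSON content type from the Solr installation directory. Use `--data-binary` so that curl sends the file unchanged; with `-d`, curl strips the file's newlines and defaults to a form-encoded content type.

[source,subs="verbatim,attributes+"]
----
$ curl -X POST -H 'Content-Type: application/json' --data-binary @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update
----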

== Committing documents

Indexed documents are not immediately available for searching. To make them searchable, a commit is needed; the equivalent operation is called a `refresh` in Elasticsearch and OpenSearch. Commits can be scheduled at periodic intervals using auto-commits, as follows.

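In the `_default` configset, the hard auto-commit (`updateHandler.autoCommit`) already runs every 15 seconds. However, its `openSearcher` setting is `false`, so it persists documents without making them visible to searches. Visibility is controlled by the soft auto-commit, which is disabled by default. The following request makes new documents searchable within 15 seconds:

[source,subs="verbatim,attributes+"]
----
$ curl -X POST -H 'Content-type: application/json' -d '{"set-property":{"updateHandler.autoSoftCommit.maxTime":15000}}' http://localhost:8983/api/collections/techproducts/config
----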

Alternatively, add `commit=true` to any of the `/update` requests above to commit immediately after indexing. Avoid committing after every document or every small batch of documents: each such commit opens a new searcher, which is relatively expensive. To send a standalone commit:

[source,subs="verbatim,attributes+"]
----
$ curl -X POST 'http://localhost:8983/api/collections/techproducts/update?commit=true'
----
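
Solr's update requests also accept a `commitWithin` parameter, given in milliseconds. It asks Solr to make the update searchable within that time, without forcing an immediate commit. The sketch below assumes the `techproducts` collection from above and re-sends one of the example documents with a 10-second bound. Re-indexing an existing `id` replaces that document. The file name `sea-of-monsters.json` is arbitrary.

```shell
# One of the example documents from above, saved to a file.
cat > sea-of-monsters.json <<'EOF'
[
  {
    "id" : "978-1423103349",
    "cat" : ["book","paperback"],
    "name" : "The Sea of Monsters",
    "author" : "Rick Riordan",
    "series_t" : "Percy Jackson and the Olympians",
    "sequence_i" : 2,
    "genre_s" : "fantasy",
    "inStock" : true,
    "price" : 6.49,
    "pages_i" : 304
  }
]
EOF

# commitWithin=10000: make this update searchable within 10 s (10000 ms).
curl --silent --show-error --request POST \
  --url 'http://localhost:8983/api/collections/techproducts/update?commitWithin=10000' \
  --header 'Content-Type: application/json' \
  --data-binary @sea-of-monsters.json \
  || echo "request failed: is Solr running on port 8983?" >&2
```

Updates that arrive within the same window can be covered by a single commit. This makes `commitWithin` a gentler option than `commit=true` for frequent small updates.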

== Basic search queries

... TODO