SOLR-13259: Add new section on Reindexing in Solr (#594)
Add new reindexing.adoc page; standardize on "reindex" vs "re-index"
ctargett committed Mar 19, 2019
1 parent c5685d6 commit aa643af
Showing 10 changed files with 218 additions and 19 deletions.
4 changes: 4 additions & 0 deletions solr/solr-ref-guide/src/collections-api.adoc
@@ -139,11 +139,15 @@ Set to `true` to clear all stored completed and failed async request responses.

*Input: Valid Request ID*

// tag::createalias-simple-example[]

[source,text]
----
http://localhost:8983/solr/admin/collections?action=DELETESTATUS&requestid=foo&wt=xml
----

//end::createalias-simple-example[]

*Output*

[source,xml]
4 changes: 2 additions & 2 deletions solr/solr-ref-guide/src/docvalues.adoc
@@ -38,7 +38,7 @@ Enabling a field for docValues only requires adding `docValues="true"` to the fi
----

[IMPORTANT]
If you have already indexed data into your Solr index, you will need to completely re-index your content after changing your field definitions in `schema.xml` in order to successfully use docValues.
If you have already indexed data into your Solr index, you will need to completely reindex your content after changing your field definitions in `schema.xml` in order to successfully use docValues.

DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are:

@@ -79,7 +79,7 @@ If `docValues="true"` for a field, then DocValues will automatically be used any

Field values retrieved during search queries are typically returned from stored values. However, non-stored docValues fields will be also returned along with other stored fields when all fields (or pattern matching globs) are specified to be returned (e.g., "`fl=*`") for search queries depending on the effective value of the `useDocValuesAsStored` parameter for each field. For schema versions >= 1.6, the implicit default is `useDocValuesAsStored="true"`. See <<field-type-definitions-and-properties.adoc#field-type-definitions-and-properties,Field Type Definitions and Properties>> & <<defining-fields.adoc#defining-fields,Defining Fields>> for more details.

When `useDocValuesAsStored="false"`, non-stored DocValues fields can still be explicitly requested by name in the <<common-query-parameters.adoc#fl-field-list-parameter,fl param>>, but will not match glob patterns (`"*"`). Note that returning DocValues along with "regular" stored fields at query time has performance implications that stored fields may not because DocValues are column-oriented and may therefore incur additional cost to retrieve for each returned document. Also note that while returning non-stored fields from DocValues, the values of a multi-valued field are returned in sorted order rather than insertion order and may have duplicates removed, see above. If you require the multi-valued fields to be returned in the original insertion order, then make your multi-valued field as stored (such a change requires re-indexing).
When `useDocValuesAsStored="false"`, non-stored DocValues fields can still be explicitly requested by name in the <<common-query-parameters.adoc#fl-field-list-parameter,fl param>>, but will not match glob patterns (`"*"`). Note that returning DocValues along with "regular" stored fields at query time has performance implications that stored fields may not because DocValues are column-oriented and may therefore incur additional cost to retrieve for each returned document. Also note that while returning non-stored fields from DocValues, the values of a multi-valued field are returned in sorted order rather than insertion order and may have duplicates removed, see above. If you require the multi-valued fields to be returned in the original insertion order, then make your multi-valued field as stored (such a change requires reindexing).

In cases where the query is returning _only_ docValues fields performance may improve since returning stored fields requires disk reads and decompression whereas returning docValues fields in the fl list only requires memory access.

@@ -1,5 +1,5 @@
= Indexing and Basic Data Operations
:page-children: introduction-to-solr-indexing, post-tool, uploading-data-with-index-handlers, uploading-data-with-solr-cell-using-apache-tika, uploading-structured-data-store-data-with-the-data-import-handler, updating-parts-of-documents, detecting-languages-during-indexing, de-duplication, content-streams
:page-children: introduction-to-solr-indexing, post-tool, uploading-data-with-index-handlers, uploading-data-with-solr-cell-using-apache-tika, uploading-structured-data-store-data-with-the-data-import-handler, updating-parts-of-documents, detecting-languages-during-indexing, de-duplication, content-streams, reindexing
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
@@ -39,6 +39,8 @@ This section describes how Solr adds data to its index. It covers the following
* *<<content-streams.adoc#content-streams,Content Streams>>*: Information about streaming content to Solr Request Handlers.
* *<<reindexing.adoc#reindexing,Reindexing>>*: Details about when reindexing is required or recommended, and some strategies for completely reindexing your documents.
== Indexing Using Client APIs

Using client APIs, such as <<using-solrj.adoc#using-solrj,SolrJ>>, from your applications is an important option for updating Solr indexes. See the <<client-apis.adoc#client-apis,Client APIs>> section for more information.
8 changes: 4 additions & 4 deletions solr/solr-ref-guide/src/major-changes-in-solr-7.adoc
@@ -26,9 +26,9 @@ There are many hundreds of changes in Solr 7, however, so a thorough review of t

You should also consider all changes that have been made to Solr in any version you have not upgraded to already. For example, if you are currently using Solr 6.2, you should review changes made in all subsequent 6.x releases in addition to changes for 7.0.

Re-indexing your data is considered the best practice and you should try to do so if possible. However, if re-indexing is not feasible, keep in mind you can only upgrade one major version at a time. Thus, Solr 6.x indexes will be compatible with Solr 7 but Solr 5.x indexes will not be.
<<reindexing.adoc#upgrades,Reindexing>> your data is considered the best practice and you should try to do so if possible. However, if reindexing is not feasible, keep in mind you can only upgrade one major version at a time. Thus, Solr 6.x indexes will be compatible with Solr 7 but Solr 5.x indexes will not be.

If you do not re-index now, keep in mind that you will need to either re-index your data or upgrade your indexes before you will be able to move to Solr 8 when it is released in the future. See the section <<indexupgrader-tool.adoc#indexupgrader-tool,IndexUpgrader Tool>> for more details on how to upgrade your indexes.
If you do not reindex now, keep in mind that you will need to either reindex your data or upgrade your indexes before you will be able to move to Solr 8 when it is released in the future. See the section <<indexupgrader-tool.adoc#indexupgrader-tool,IndexUpgrader Tool>> for more details on how to upgrade your indexes.

See also the section <<upgrading-a-solr-cluster.adoc#upgrading-a-solr-cluster,Upgrading a Solr Cluster>> for details on how to upgrade a SolrCloud cluster.

@@ -131,7 +131,7 @@ The `qt` parameter is still used as a SolrJ special parameter that specifies the
=== Point Fields Are Default Numeric Types
Solr has implemented \*PointField types across the board, to replace Trie* based numeric fields. All Trie* fields are now considered deprecated, and will be removed in Solr 8.

If you are using Trie* fields in your schema, you should consider moving to PointFields as soon as feasible. Changing to the new PointField types will require you to re-index your data.
If you are using Trie* fields in your schema, you should consider moving to PointFields as soon as feasible. Changing to the new PointField types will require you to reindex your data.

=== Spatial Fields

@@ -187,7 +187,7 @@ Note again that this is not a complete list of all changes that may impact your

* The Solr contribs map-reduce, morphlines-core and morphlines-cell have been removed.
* JSON Facet API now uses hyper-log-log for numBuckets cardinality calculation and calculates cardinality before filtering buckets by any `mincount` greater than 1.
* If you use historical dates, specifically on or before the year 1582, you should re-index for better date handling.
* If you use historical dates, specifically on or before the year 1582, you should reindex for better date handling.
* If you use the JSON Facet API (json.facet) with `method=stream`, you must now set `sort='index asc'` to get the streaming behavior; otherwise it won't stream. Reminder: `method` is a hint that doesn't change defaults of other parameters.
* If you use the JSON Facet API (json.facet) to facet on a numeric field and if you use `mincount=0` or if you set the prefix, you will now get an error as these options are incompatible with numeric faceting.
* Solr's logging verbosity at the INFO level has been greatly reduced, and you may need to update the log configs to use the DEBUG level to see all the logging messages you used to see at INFO level before.
2 changes: 1 addition & 1 deletion solr/solr-ref-guide/src/managed-resources.adoc
@@ -218,7 +218,7 @@ However, the intent of this API implementation is that changes will be applied u

[IMPORTANT]
====
Changing things like stop words and synonym mappings typically require re-indexing existing documents if being used by index-time analyzers. The RestManager framework does not guard you from this, it simply makes it possible to programmatically build up a set of stop words, synonyms, etc.
Changing things like stop words and synonym mappings typically require reindexing existing documents if being used by index-time analyzers. The RestManager framework does not guard you from this, it simply makes it possible to programmatically build up a set of stop words, synonyms, etc. See the section <<reindexing.adoc#reindexing,Reindexing>> for more information about reindexing your documents.
====

== RestManager Endpoint

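The upgrade note in the `major-changes-in-solr-7.adoc` hunk says that indexes can only be carried forward one major version at a time (Solr 6.x indexes work with Solr 7, Solr 5.x indexes do not). A minimal sketch of that rule follows. The function name `index_is_readable` is hypothetical and is not part of Solr, and the check encodes only the rule as worded in the note, not Solr's actual compatibility logic.

```python
def index_is_readable(index_major: int, solr_major: int) -> bool:
    """Return True if an index written by Solr `index_major`.x is expected
    to be usable by Solr `solr_major`.x without reindexing or upgrading.

    Assumption taken from the upgrade note in this commit: only the same
    major version and the immediately preceding one are compatible.
    Hypothetical helper for illustration; not a Solr API.
    """
    return solr_major - 1 <= index_major <= solr_major


if __name__ == "__main__":
    # Walk a few index versions against a Solr 7 target.
    for index_major in (5, 6, 7):
        ok = index_is_readable(index_major, 7)
        action = "usable as-is" if ok else "reindex or upgrade first"
        print(f"Solr {index_major}.x index on Solr 7: {action}")
```

Under the same assumption, a Solr 6.x index that is neither reindexed nor upgraded would fail this check against Solr 8, which matches the note's warning about moving to Solr 8 later.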