SOLR-17023: Add documentation and tutorial to new ONNX model feature #3663

epugh · 2025-09-16T15:55:37Z

https://issues.apache.org/jira/browse/SOLR-17023

This PR adds comprehensive documentation and tutorial content for the new ONNX model feature that enables sentiment analysis using OpenNLP in Solr. It introduces a complete tutorial explaining how to use document categorization with machine learning models.

Adds a detailed tutorial on sentiment analysis with OpenNLP and ONNX models
Integrates the new tutorial into existing documentation navigation
Improves code quality in related Java files with IntelliJ suggestions

epugh · 2025-09-16T15:56:06Z

@jzonthemtn would you give this a pass?

Copilot

Pull Request Overview

This PR adds comprehensive documentation and tutorial content for the new ONNX model feature that enables sentiment analysis using OpenNLP in Solr. It introduces a complete tutorial explaining how to use document categorization with machine learning models.

Adds a detailed tutorial on sentiment analysis with OpenNLP and ONNX models
Integrates the new tutorial into existing documentation navigation
Improves code quality in related Java files with IntelliJ suggestions

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


File	Description
tutorial-opennlp.adoc	New comprehensive tutorial for sentiment analysis using OpenNLP and ONNX models
solr-tutorial.adoc	Updated to reference the new OpenNLP tutorial
getting-started-nav.adoc	Added navigation entry for the new tutorial
update-request-processors.adoc	Added documentation for DocumentCategorizerUpdateProcessorFactory
OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java	Code quality improvements and comment fixes
DocumentCategorizerUpdateProcessorFactory.java	Added comprehensive JavaDoc and code improvements
README.md	Updated to mention Huggingface model support

Comments suppressed due to low confidence (1)

solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:1

Unused field analyzerFieldType should be removed as it's declared but never used in the class.

/*


Copilot · 2025-09-16T15:57:11Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+
+The model we're using is a multilingual BERT model fine-tuned for sentiment classification and quantized for better performance. It produces classifications on a 5-point scale from "very bad" to "very good".
+
+Learn more about ONNX at https://onnx.ai[onnx.ai^, role="external", window="_blank"].


Duplicate line: this exact ONNX reference link appears both on line 80 and line 84. Remove the duplicate on line 84.

Suggested change

Learn more about ONNX at https://onnx.ai[onnx.ai^, role="external", window="_blank"].

rzo1 · 2025-09-17T12:45:12Z

solr/solr-ref-guide/modules/configuration-guide/pages/update-request-processors.adoc

 {solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]::: Update document(s) to be indexed with named entities extracted using an OpenNLP NER model.
 Note that in order to use model files larger than 1MB on SolrCloud, you must xref:deployment-guide:zookeeper-ensemble#increasing-the-file-size-limit[configure both ZooKeeper server and clients].

+{solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.html[DocumentCategorizerUpdateProcessorFactory]::: Classify text in fields using models.  These models can be sourced from Huggingface and run directly in Solr using OpenNLP via {onnx}[ONNX].


These models must be in ONNX format and can be ...

rzo1 · 2025-09-17T12:45:54Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+This tutorial demonstrates how to enhance Solr with advanced Natural Language Processing (NLP) capabilities through Apache OpenNLP and ONNX. 
+You'll learn how to set up a sentiment analysis pipeline that automatically classifies documents during indexing.
+
+We are going to use the https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model in the tutorial, however there are many others you can use.


Can this be a link reference in adoc instead of the plain text link?

rzo1 · 2025-09-17T12:46:16Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled


Maybe describe how to do that or point to a doc?

We actually cover this below, so I think this is an example of Claude being overly verbose. This is all what you are GOING to do, so I'm stripping it out.

rzo1 · 2025-09-17T12:46:35Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
+* Packages enabled in Solr (to allow you to upload the model files to the cluster)


Maybe describe how to do that or point to a doc?

rzo1 · 2025-09-17T12:47:30Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+[,console]
+----
+$ export SOLR_SECURITY_MANAGER_ENABLED=false
+$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true


Ah, it is done via properties ;-) (maybe just add the properties to the prerequisites with inline code highlighting?)

rzo1 · 2025-09-17T12:47:54Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+
+[NOTE]
+====
+We temporarily disable the security manager to allow loading of the ONNX runtime. In production environments, you would configure appropriate security policies instead.


Would it make sense to link to docs about security policies? Something like: To learn more about it, follow the link...

We are at a weird time where you don't have it if you are on newer Java versions, and we are soon going to remove all mention of it, so I'm rewording.

rzo1 · 2025-09-17T12:49:14Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+
+Let's create a directory for our models and download them:
+
+[,console]


That will only work on Linux, not on macOS (no wget) or Windows (no wget). Maybe state in the prerequisites that this tutorial was written on Linux and that readers might need to adjust the commands?
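
A minimal sketch of one way to make the download step less platform-dependent, assuming a POSIX shell is available: a small helper that uses `curl` when it is installed and falls back to `wget`. The function name `fetch` and its two arguments are hypothetical and are not part of the tutorial.

```shell
# fetch URL DEST: download URL to DEST with whichever tool is installed.
# "fetch" is a hypothetical helper name, not something the tutorial defines.
fetch() {
  url="$1"
  dest="$2"
  if command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP errors, -L: follow redirects, -o: output file
    curl -fL -o "$dest" "$url"
  elif command -v wget >/dev/null 2>&1; then
    # -O: write the download to the given file
    wget -O "$dest" "$url"
  else
    echo "Neither curl nor wget found; please install one of them." >&2
    return 1
  fi
}
```

It would be called as `fetch "$MODEL_URL" model.onnx`, where `MODEL_URL` is a placeholder for whichever model file URL the tutorial lists.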

rzo1 · 2025-09-17T12:50:03Z

solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc

+
+[,console]
+----
+$ mkdir -p models/sentiment/


It is not really clear in which directory the folders need to be created. Maybe define a variable at the beginning, so people can reuse it throughout the tutorial?
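
A minimal sketch of what such a variable could look like, assuming the tutorial is run from a POSIX shell. The name `MODEL_DIR` is hypothetical, and the default simply mirrors the `models/sentiment/` path from the snippet above, resolved against the current working directory.

```shell
# Hypothetical variable name; the tutorial itself does not define it.
# Defaults to models/sentiment under the current working directory,
# unless MODEL_DIR is already set in the environment.
MODEL_DIR="${MODEL_DIR:-$PWD/models/sentiment}"

# Create the directory (and any missing parents) once, up front.
mkdir -p "$MODEL_DIR"

echo "Model files will be stored in: $MODEL_DIR"
```

Later steps could then refer to `"$MODEL_DIR"` instead of a relative path, so the commands work regardless of the directory they are run from.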

epugh · 2025-09-18T17:03:50Z

Thanks @rzo1 for all your edits and feedback. I responded to them, did a fresh test, and am going to add a changelog entry, and then this is ready for Solr 10!


epugh added 5 commits September 16, 2025 10:23

First pass of tutorial

dc1415b

Add a tutorial on this new capability.

8a0cb92

A bit o' advertising

b58779d

Add docs and respond to code suggestions.

59f0c88

lint and clean up

08e06d9

github-actions bot added the documentation and cat:index labels Sep 16, 2025

epugh requested review from Copilot and cpoerschke September 16, 2025 15:56

Copilot AI reviewed Sep 16, 2025


Remove duplicate link

6eda9ea

rzo1 reviewed Sep 17, 2025


respond to feedback on content

8acce9d

epugh added 3 commits September 18, 2025 13:07

Document new feature

2d561fc

Merge remote-tracking branch 'upstream/main' into SOLR-17023-document…

7319f49

…ation

document new feature

bdff1bb

epugh merged commit 23afb28 into apache:main Sep 18, 2025
2 of 3 checks passed

