@epugh epugh commented Sep 16, 2025

https://issues.apache.org/jira/browse/SOLR-17023

This PR adds comprehensive documentation and tutorial content for the new ONNX model feature that enables sentiment analysis using OpenNLP in Solr. It introduces a complete tutorial explaining how to use document categorization with machine learning models.

  • Adds a detailed tutorial on sentiment analysis with OpenNLP and ONNX models
  • Integrates the new tutorial into existing documentation navigation
  • Improves code quality in related Java files with IntelliJ suggestions

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) and cat:index labels Sep 16, 2025
Contributor Author

epugh commented Sep 16, 2025

@jzonthemtn would you give this a pass?

@epugh epugh requested review from Copilot and cpoerschke September 16, 2025 15:56
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive documentation and tutorial content for the new ONNX model feature that enables sentiment analysis using OpenNLP in Solr. It introduces a complete tutorial explaining how to use document categorization with machine learning models.

  • Adds a detailed tutorial on sentiment analysis with OpenNLP and ONNX models
  • Integrates the new tutorial into existing documentation navigation
  • Improves code quality in related Java files with IntelliJ suggestions

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tutorial-opennlp.adoc | New comprehensive tutorial for sentiment analysis using OpenNLP and ONNX models |
| solr-tutorial.adoc | Updated to reference the new OpenNLP tutorial |
| getting-started-nav.adoc | Added navigation entry for the new tutorial |
| update-request-processors.adoc | Added documentation for DocumentCategorizerUpdateProcessorFactory |
| OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java | Code quality improvements and comment fixes |
| DocumentCategorizerUpdateProcessorFactory.java | Added comprehensive JavaDoc and code improvements |
| README.md | Updated to mention Huggingface model support |

Comments suppressed due to low confidence (1)

solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:1

  • Unused field analyzerFieldType should be removed as it's declared but never used in the class.
/*

The model we're using is a multilingual BERT model fine-tuned for sentiment classification and quantized for better performance. It produces classifications on a 5-point scale from "very bad" to "very good".
Learn more about ONNX at https://onnx.ai[onnx.ai^, role="external", window="_blank"].

Copilot AI Sep 16, 2025

Duplicate line: this exact ONNX reference link appears both on line 80 and line 84. Remove the duplicate on line 84.

Suggested change
Learn more about ONNX at https://onnx.ai[onnx.ai^, role="external", window="_blank"].

{solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]::: Update document(s) to be indexed with named entities extracted using an OpenNLP NER model.
Note that in order to use model files larger than 1MB on SolrCloud, you must xref:deployment-guide:zookeeper-ensemble#increasing-the-file-size-limit[configure both ZooKeeper server and clients].

{solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.html[DocumentCategorizerUpdateProcessorFactory]::: Classify text in fields using models. These models can be sourced from Huggingface and run directly in Solr using OpenNLP via {onnx}[ONNX].
Contributor

These models must be in ONNX format and can be ...

This tutorial demonstrates how to enhance Solr with advanced Natural Language Processing (NLP) capabilities through Apache OpenNLP and ONNX.
You'll learn how to set up a sentiment analysis pipeline that automatically classifies documents during indexing.

We are going to use the https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model in the tutorial, however there are many others you can use.
Contributor

Can this be a link reference in adoc instead of the plain text link?

Before starting this tutorial, you'll need:

* Apache Solr (version 10 or later)
* The `analysis-extras` module enabled
Contributor

Maybe describe how to do that or point to a doc?

Contributor Author

We actually cover this below, so I think this is an example of Claude being overly verbose. This is all what you are GOING to do, so I am stripping it out.


* Apache Solr (version 10 or later)
* The `analysis-extras` module enabled
* Packages enabled in Solr (to allow you to upload the model files to the cluster)
Contributor

Maybe describe how to do that or point to a doc?

[,console]
----
$ export SOLR_SECURITY_MANAGER_ENABLED=false
$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true
Contributor

Ah, it is done via properties ;-) (maybe just add the properties to the prerequisites with inline code highlighting?)


[NOTE]
====
We temporarily disable the security manager to allow loading of the ONNX runtime. In production environments, you would configure appropriate security policies instead.
Contributor

Would it make sense to link to docs about security policies? Something like: To learn more about it, follow the link...

Contributor Author

We are at a weird time where you don't have it if you are on newer Java versions, and we are soon going to remove all mention of it, so I am rewording.


Let's create a directory for our models and download them:

[,console]
Contributor

That will only work on Linux, not on OSX (no wget) or Windows (no wget). Maybe state in the prerequisites that this tutorial was done on Linux and that readers might need to adjust the commands?
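
To illustrate the portability concern, here is a minimal sketch of a download helper that uses `curl` when it is installed and falls back to `wget` otherwise. The function name `download` is hypothetical and not part of the tutorial; the URL and destination are passed in as arguments because the exact model file locations are not shown in this thread.

```shell
# Hypothetical helper, not part of the tutorial under review.
# download URL DEST: fetch URL into DEST with curl if present, else wget.
download() {
  url="$1"
  dest="$2"
  if command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP errors, -L: follow redirects, -o: output file
    curl -fL -o "$dest" "$url"
  elif command -v wget >/dev/null 2>&1; then
    # -O: write the download to the given file
    wget -O "$dest" "$url"
  else
    echo "Neither curl nor wget is installed" >&2
    return 1
  fi
}
```

A reader would call it as `download <model-url> <target-file>`, which avoids hard-coding `wget` in each step. This still assumes a POSIX-style shell, so native Windows shells would need their own adjustment.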


[,console]
----
$ mkdir -p models/sentiment/
Contributor

It is not really clear in which directory the folders need to be created. Maybe define a variable at the beginning so people can reuse it throughout the tutorial?
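
One possible reading of this suggestion, sketched under assumptions: the variable name `MODEL_DIR` is hypothetical and does not appear in the tutorial, and the default simply mirrors the `models/sentiment/` path from the quoted command, resolved against the current working directory.

```shell
# MODEL_DIR is a hypothetical name chosen for illustration only.
# Default to models/sentiment under the current directory unless already set.
MODEL_DIR="${MODEL_DIR:-$PWD/models/sentiment}"

# Create the directory (and any missing parents) once, up front.
mkdir -p "$MODEL_DIR"

echo "Models will be stored in: $MODEL_DIR"
```

Later steps could then refer to `"$MODEL_DIR"` instead of a relative path, so the commands behave the same regardless of where the reader runs them.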

Contributor Author

epugh commented Sep 18, 2025

Thanks @rzo1 for all your edits/feedback. I responded to them, did a fresh test, and am going to add a changelog entry, and then this is ready for Solr 10!

@epugh epugh merged commit 23afb28 into apache:main Sep 18, 2025
2 of 3 checks passed