SOLR-17023: Add documentation and tutorial to new ONNX model feature #3663
@jzonthemtn would you give this a pass?
Pull Request Overview
This PR adds comprehensive documentation and tutorial content for the new ONNX model feature that enables sentiment analysis using OpenNLP in Solr. It introduces a complete tutorial explaining how to use document categorization with machine learning models.
- Adds a detailed tutorial on sentiment analysis with OpenNLP and ONNX models
- Integrates the new tutorial into existing documentation navigation
- Improves code quality in related Java files with IntelliJ suggestions
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tutorial-opennlp.adoc | New comprehensive tutorial for sentiment analysis using OpenNLP and ONNX models |
| solr-tutorial.adoc | Updated to reference the new OpenNLP tutorial |
| getting-started-nav.adoc | Added navigation entry for the new tutorial |
| update-request-processors.adoc | Added documentation for DocumentCategorizerUpdateProcessorFactory |
| OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java | Code quality improvements and comment fixes |
| DocumentCategorizerUpdateProcessorFactory.java | Added comprehensive JavaDoc and code improvements |
| README.md | Updated to mention Huggingface model support |
Comments suppressed due to low confidence (1)
solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:1
- Unused field `analyzerFieldType` should be removed, as it is declared but never used in the class.
The model we're using is a multilingual BERT model fine-tuned for sentiment classification and quantized for better performance. It produces classifications on a 5-point scale from "very bad" to "very good".
Learn more about ONNX at https://onnx.ai[onnx.ai^, role="external", window="_blank"].
Copilot (AI), Sep 16, 2025
Duplicate line: this exact ONNX reference link appears both on line 80 and line 84. Remove the duplicate on line 84.
{solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]::: Update document(s) to be indexed with named entities extracted using an OpenNLP NER model.
Note that in order to use model files larger than 1MB on SolrCloud, you must xref:deployment-guide:zookeeper-ensemble#increasing-the-file-size-limit[configure both ZooKeeper server and clients].

{solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.html[DocumentCategorizerUpdateProcessorFactory]::: Classify text in fields using models. These models can be sourced from Huggingface and run directly in Solr using OpenNLP via {onnx}[ONNX].
These models must be in ONNX format and can be ...
This tutorial demonstrates how to enhance Solr with advanced Natural Language Processing (NLP) capabilities through Apache OpenNLP and ONNX.
You'll learn how to set up a sentiment analysis pipeline that automatically classifies documents during indexing.

We are going to use the https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model in the tutorial, however there are many others you can use.
Can this be a link reference in adoc instead of the plain text link?
Before starting this tutorial, you'll need:

* Apache Solr (version 10 or later)
* The `analysis-extras` module enabled
Maybe describe how to do that or point to a doc?
We actually cover this below, so I think this is an example of Claude being overly verbose; this all describes what you are GOING to do. Stripping it out.
* Apache Solr (version 10 or later)
* The `analysis-extras` module enabled
* Packages enabled in Solr (to allow you to upload the model files to the cluster)
Maybe describe how to do that or point to a doc?
[,console]
----
$ export SOLR_SECURITY_MANAGER_ENABLED=false
$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true
Ah, it is done via properties ;-) (maybe just add the properties to the prerequisites with single-line code highlighting?)
[NOTE]
====
We temporarily disable the security manager to allow loading of the ONNX runtime. In production environments, you would configure appropriate security policies instead.
Would it make sense to link to docs about security policies? Something like: To learn more about it, follow the link...
We are at a weird time: you don't have it if you are on newer Java versions, and we are going to remove all mention of it soon, so I'm rewording.
Let's create a directory for our models and download them:

[,console]
That will only work on Linux: macOS (OSX) and Windows don't ship with wget. Maybe state in the prerequisites that this tutorial was done on Linux and that readers might need to adjust the commands?
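One way to address the portability concern is a small wrapper that uses wget when it is present and falls back to curl otherwise. This is a minimal sketch, assuming a POSIX shell (on Windows, for example WSL or Git Bash); the function name `fetch` is hypothetical and is not part of the tutorial.

```shell
# Hypothetical helper: download URL ($1) to file ($2) with whichever tool exists.
# wget is common on Linux; curl ships with macOS and recent Windows releases.
fetch() {
  url="$1"
  out="$2"
  if command -v wget >/dev/null 2>&1; then
    # -q: quiet, -O: write the download to the given file
    wget -q -O "$out" "$url"
  elif command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
    curl -fsSL -o "$out" "$url"
  else
    echo "Neither wget nor curl is installed" >&2
    return 1
  fi
}
```

Usage would look like `fetch "$MODEL_URL" models/sentiment/model.onnx`, where `MODEL_URL` and the output file name are placeholders for whatever the tutorial actually downloads.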
[,console]
----
$ mkdir -p models/sentiment/
It is not really clear in which directory the folders need to be created. Maybe define a variable at the beginning so people can reuse it throughout the tutorial?
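A minimal sketch of the variable-based approach suggested above, assuming the commands are run from the Solr installation directory; the variable name `MODELS_DIR` is hypothetical and not something Solr reads.

```shell
# Hypothetical variable: resolve the model directory to an absolute path once,
# so every later command refers to the same location regardless of cwd changes.
MODELS_DIR="$(pwd)/models/sentiment"

# -p creates missing parent directories and does not fail if the path exists.
mkdir -p "$MODELS_DIR"

echo "Place the model files in: $MODELS_DIR"
```

Later steps could then reference `"$MODELS_DIR"` instead of repeating a relative path.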
Thanks @rzo1 for all your edits/feedback. I responded to them, did a fresh test, and am going to add a changelog entry; then this is ready for Solr 10!
https://issues.apache.org/jira/browse/SOLR-17023