updates for solr based search #1091

Merged: 9 commits merged on Dec 12, 2019

fdefalco
Contributor

No description provided.

pavgra
pavgra previously requested changes May 15, 2019
src/main/resources/application.properties (outdated, resolved)
@anthonysena
Collaborator

This issue aims to address #580

@anthonysena
Collaborator

anthonysena commented Dec 5, 2019

Setup notes updated in #1091 (comment)

@anthonysena
Collaborator

anthonysena commented Dec 5, 2019

A few other notes after discussing with @fdefalco:

  • We should provide some guidance on how to run SOLR in a durable process (e.g. as a Windows Service) if possible.
  • It is desirable to expose the SOLR settings through the /info endpoint. The elements would be:
    • SOLR enabled
    • List of the cores available
  • The vocabulary search endpoint needs to be backwards compatible - this branch makes a breaking change to the endpoint.
  • Add the SOLR resources that are mentioned earlier in this PR to the \resources folder for reference.

Nice to haves:

  • Expose the InitializeFullTextIndexCache through a secured end-point to enable a refresh of SOLR cores without the need to restart WebAPI
  • Provide some admin functionality for building cores via Atlas/WebAPI

pom.xml (resolved)
@anthonysena
Collaborator

@fdefalco - this is back to you for review. I've addressed the notes mentioned in #1091 (comment) as well as feedback from @pavgra regarding the use of solrj, IoC approach and change to the configuration settings. From my side, this is working well as I've tested it with SOLR enabled/disabled.

We can look at creating a new issue for Atlas 3.0 for the "nice to haves" mentioned above; let me know if you agree with that.

Still outstanding is:

  • Guidance on how to set up & operate SOLR in a durable process - Tomcat should be suitable for this purpose from what I've been reading.

@@ -46,6 +46,7 @@
<person.viewDates>false</person.viewDates>
<!-- Full Text Search With SOLR Settings -->
<solr.endpoint></solr.endpoint>
<solr.query.prefix>{!complexphrase inOrder=true}</solr.query.prefix>

Updating the default query behavior to use the SOLR ComplexPhraseQueryParser: https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser. This will allow for searches that are closer in behavior to the SQL Wildcard search.

@@ -305,12 +305,14 @@
<filter class="solr.FlattenGraphFilterFactory"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>

@anthonysena
Collaborator

anthonysena commented Dec 12, 2019

Final Setup Notes for Wiki

These instructions focus on Windows, since Solr deployment on Linux is covered in the Solr install guide.

Install the Solr Windows Service

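We will use Apache Procrun to wrap Solr 8.3.1 in a Windows service so that startup and shutdown can be controlled like any other service. To do this, save the following script as E:\solr\solr-8.3.1\bin\service.bat; it expects the Procrun executable prunsrv.exe to be in the same bin folder: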

@echo off
set SERVICE_NAME=Solr
set SERVICE_HOME=E:\solr\solr-8.3.1
set PR_INSTALL=%SERVICE_HOME%\bin\prunsrv.exe

@REM Service Log Configuration
set PR_LOGPREFIX=%SERVICE_NAME%
set PR_LOGPATH=%SERVICE_HOME%\logs
set PR_STDOUTPUT=auto
set PR_STDERROR=auto
set PR_LOGLEVEL=Debug

set PR_STARTUP=auto
set PR_STARTMODE=exe
set PR_STARTIMAGE=%SERVICE_HOME%\bin\solr.cmd
set PR_STARTPARAMS=start

@REM Shutdown Configuration
set PR_STOPMODE=exe
set PR_STOPIMAGE=%SERVICE_HOME%\bin\solr.cmd
set PR_STOPPARAMS=stop;-all

%PR_INSTALL% //IS/%SERVICE_NAME% ^
  --Description="Apache Solr 8.3.1" ^
  --DisplayName="%SERVICE_NAME%" ^
  --Install="%PR_INSTALL%" ^
  --Startup="%PR_STARTUP%" ^
  --LogPath="%PR_LOGPATH%" ^
  --LogPrefix="%PR_LOGPREFIX%" ^
  --LogLevel="%PR_LOGLEVEL%" ^
  --StdOutput="%PR_STDOUTPUT%" ^
  --StdError="%PR_STDERROR%" ^
  --StartMode="%PR_STARTMODE%" ^
  --StartImage="%PR_STARTIMAGE%" ^
  ++StartParams="%PR_STARTPARAMS%" ^
  --StopMode="%PR_STOPMODE%" ^
  --StopImage="%PR_STOPIMAGE%" ^
  ++StopParams="%PR_STOPPARAMS%"

if not errorlevel 1 goto installed
echo Failed to install "%SERVICE_NAME%" service.  Refer to log in %PR_LOGPATH%
exit /B 1

:installed
echo The Service "%SERVICE_NAME%" has been installed
exit /B 0

NOTE: Adjust the SERVICE_HOME setting to match your install location.

Run a Windows Command Prompt in Administrator mode and then run E:\solr\solr-8.3.1\bin\service.bat. The service will be installed in the list of Windows Services as "Solr". Before moving forward, confirm the service is created but not running.

Creating the Solr core for WebAPI vocabulary search

NOTE: The name of the Solr core must match the vocabulary version you plan to use in ATLAS & WebAPI, with spaces replaced by underscores. For this example, the vocabulary version is "v5.0 17-JUN-19" and the corresponding core folder name is "v5.0_17-JUN-19". Verify your vocabulary version by running the following query on the CDM(s) you plan to use with WebAPI:

select vocabulary_version from vocabulary where vocabulary_id = 'None';

If your vocabulary version and core do not match, WebAPI will not find the Solr core and it will continue to use the DB when querying the vocabulary.
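The naming rule above can be sketched in a few lines of Python. This is only an illustration and assumes the sole transformation is replacing spaces with underscores, as in the "v5.0 17-JUN-19" example; the function name solr_core_name is hypothetical and is not part of WebAPI.

```python
def solr_core_name(vocabulary_version: str) -> str:
    """Derive the Solr core (and folder) name from a vocabulary version string.

    Assumption: the only difference between the two is that spaces in the
    vocabulary version become underscores in the core name.
    """
    return vocabulary_version.strip().replace(" ", "_")


if __name__ == "__main__":
    # Value as returned by the vocabulary_version query above.
    print(solr_core_name("v5.0 17-JUN-19"))
```

Comparing this value with the name in core.properties before starting WebAPI is a quick way to catch a mismatch early.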

Solr core creation

  • Verify that the JDBC driver JAR files for your RDBMS are located in E:\solr\solr-8.3.1\server\lib; otherwise you will face issues when attempting to build the SOLR core.
  • Create 2 directories for the core:
    • E:\solr\solr-8.3.1\server\solr\v5.0_17-JUN-19
    • E:\solr\solr-8.3.1\server\solr\v5.0_17-JUN-19\data
  • Copy the contents of WebAPI\src\main\resources\solr into E:\solr\solr-8.3.1\server\solr\v5.0_17-JUN-19. Next make the following edits to the files in E:\solr\solr-8.3.1\server\solr\v5.0_17-JUN-19:
    • data-config.xml: Edit the data source & references in the query to match the database holding your vocabulary
    • core.properties: Edit the name to match the directory: v5.0_17-JUN-19
    • conf\solrconfig.xml: Edit the connection information in the <requestHandler name="/dataimport" class="solr.DataImportHandler"> block to provide the details needed to connect to the database holding your vocabulary.

Building the Solr core

Next, start the Solr Windows service and verify that you can connect to http://localhost:8983. If there are problems starting the service, review the logs found in E:\solr\solr-8.3.1\server\logs.

  • From the SOLR Admin screen (http://localhost:8983/solr/#/), build the core by using the 'Core Selector' dropdown in the left-hand menu. Select the v5.0_17-JUN-19 core from the dropdown, and then in the sub-menu that appears, select Dataimport and use the 'Execute' button. The service will build the core from the concepts returned by the query in data-config.xml.
  • Once the execution of the core indexing is complete, you can use the Solr "Query" tool under the core sub-menu to make sure the core is working properly before moving to WebAPI. A sample query you can use is: query:metformin
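The same check can also be run outside the Admin UI. The sketch below assumes Solr's standard /select request handler and reuses the example endpoint, core name and sample query from these notes; the function names are hypothetical, and the fetch helper needs a running Solr instance.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def solr_select_url(base_url: str, core: str, term: str, rows: int = 5) -> str:
    """Build a /select URL that searches the `query` field of a core."""
    params = urlencode({"q": f"query:{term}", "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{quote(core)}/select?{params}"


def fetch_num_found(url: str, timeout: float = 10.0) -> int:
    """Run the query and return the number of matching documents."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return int(body["response"]["numFound"])


if __name__ == "__main__":
    url = solr_select_url("http://localhost:8983/solr", "v5.0_17-JUN-19", "metformin")
    print(url)
    # Uncomment once the core has been built:
    # print(fetch_num_found(url))
```

A count of zero after a completed import usually points at the data-config.xml query or the database connection rather than at WebAPI.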

WebAPI Configuration

In the settings.xml for WebAPI, add the following XML to your profile:

  <solr.endpoint>http://localhost:8983/solr</solr.endpoint>

Recompile and deploy WebAPI.war. Once deployed, verify that the SOLR service is available for search by going to the endpoint WebAPI/info. The output will look similar to this:

{
	"version": "2.8.0",
	"buildInfo": {
		"artifactVersion": "WebAPI 2.8.0-SNAPSHOT",
		"build": "NA",
		"timestamp": "Thu Dec 12 16:53:22 UTC 2019"
	},
	"configuration": {
		"security": {
			"enabled": true
		},
		"vocabulary": {
			"cores": [
				"v5.0 17-JUN-19"
			],
			"solrEnabled": true
		},
		"person": {
			"viewDatesPermitted": false
		},
		"heracles": {
			"smallCellCount": "5"
		}
	}
}

Note that in the JSON above, the vocabulary section shows that Solr is enabled and lists the core(s) that are available for search.
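For scripted deployments, the relevant part of the /info response can be checked programmatically. This is a minimal sketch that assumes the response has the shape shown above (configuration.vocabulary.solrEnabled and configuration.vocabulary.cores); the function name solr_status is hypothetical.

```python
import json
from typing import List, Tuple


def solr_status(info_json: str) -> Tuple[bool, List[str]]:
    """Return (solr_enabled, cores) from the body of a WebAPI /info response.

    Missing keys are treated as "Solr not enabled, no cores" rather than
    raising, so the same check also works against older WebAPI versions.
    """
    info = json.loads(info_json)
    vocabulary = info.get("configuration", {}).get("vocabulary", {})
    return bool(vocabulary.get("solrEnabled", False)), list(vocabulary.get("cores", []))


if __name__ == "__main__":
    sample = """
    {"configuration": {"vocabulary": {"cores": ["v5.0 17-JUN-19"], "solrEnabled": true}}}
    """
    enabled, cores = solr_status(sample)
    print(enabled, cores)
```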

@anthonysena anthonysena dismissed pavgra’s stale review December 12, 2019 20:03

All review items addressed.

@anthonysena anthonysena merged commit 78c0132 into master Dec 12, 2019
@anthonysena anthonysena deleted the vocabulary-performance branch December 12, 2019 20:03
@anthonysena anthonysena mentioned this pull request Dec 12, 2019
8 tasks
@alondhe
Contributor

alondhe commented Apr 22, 2021

Updated to use SOLR 8.11.1 due to log4j vulnerability

Install the Solr Service on RedHat (tested on v7)

Download the binary and install (sudo is needed)

cd /opt
# Download the binary (https://solr.apache.org/downloads.html) -- v8.11.1 tgz file

tar xzf solr-8.11.1.tgz solr-8.11.1/bin/install_solr_service.sh --strip-components=2
bash ./install_solr_service.sh solr-8.11.1.tgz

Solr core creation

  1. Verify that the JDBC driver JAR files for your RDBMS are located in /opt/solr/server/lib; otherwise you will face issues when attempting to build the SOLR core.

  2. Create 2 directories for the core:

  • /var/solr/data/v5.0_12-FEB-21
  • /var/solr/data/v5.0_12-FEB-21/data

  3. Copy the contents of WebAPI/src/main/resources/solr into /var/solr/data/v5.0_12-FEB-21.

  4. Next make the following edits to the files in /var/solr/data/v5.0_12-FEB-21:

  • data-config.xml: Edit the data source & references in the query to match the database holding your vocabulary
  • core.properties: Edit the name to match the directory (v5.0_12-FEB-21) and remove the "conf\\" prefix from the config and schema values.
  • conf/solrconfig.xml: Edit the connection information in the <requestHandler name="/dataimport" class="solr.DataImportHandler"> block to provide the details needed to connect to the database holding your vocabulary.
    • If using Spark, add this to the /dataimport block: <str name="autoCommit">true</str>
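The core.properties edit described above can be sketched in Python. This is an illustration only: it assumes the file uses plain key=value lines with the keys name, config and schema mentioned in the step, and the sample file content in the usage section is made up rather than copied from WebAPI. The function name is hypothetical.

```python
# Prefix variants to strip: escaped double backslash, single backslash, forward slash.
CONF_PREFIXES = ("conf\\\\", "conf\\", "conf/")


def update_core_properties(text: str, core_name: str) -> str:
    """Set `name` to the core name and drop the conf prefix from config/schema."""
    updated = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key == "name":
            line = f"name={core_name}"
        elif sep and key in ("config", "schema"):
            value = value.strip()
            for prefix in CONF_PREFIXES:
                if value.startswith(prefix):
                    value = value[len(prefix):]
                    break
            line = f"{key}={value}"
        updated.append(line)
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = "name=placeholder\nconfig=conf\\\\solrconfig.xml\nschema=conf\\\\managed-schema\n"
    print(update_core_properties(sample, "v5.0_12-FEB-21"))
```

Lines with other keys are passed through unchanged, so the function is safe to run on a file that contains additional settings.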

Building the Solr core

Next, start the Solr service (systemctl start solr) and verify that you can connect to http://localhost:8983. If there are problems starting the service, review the logs found in /var/solr/logs.

  • From the SOLR Admin screen (http://localhost:8983/solr/#/), build the core by using the 'Core Selector' dropdown in the left-hand menu. Select the v5.0_12-FEB-21 core from the dropdown, and then in the sub-menu that appears, select Dataimport and use the 'Execute' button. The service will build the core from the concepts returned by the query in data-config.xml.
  • Once the execution of the core indexing is complete, you can use the Solr "Query" tool under the core sub-menu to make sure the core is working properly before moving to WebAPI. A sample query you can use is to enter this in the q field: query:metformin

Then, follow the WebAPI Configuration steps in the Windows instructions above.

@alondhe
Contributor

alondhe commented Jun 18, 2021

@anthonysena - the query for the core only uses the concept table, should it also union STCM records? I know STCM is deprecated (ish), but there are site-specific mappings we (and I believe others) use there.

@alondhe
Contributor

alondhe commented Jul 9, 2021

Updated to use SOLR 8.11.1 due to log4j vulnerability

Install SOLR using Docker

On Windows/Mac:

Download Docker desktop app (https://www.docker.com/products/docker-desktop).

On Debian/Ubuntu Linux:

sudo apt-get install docker.io

On RHEL:

sudo yum install docker

Solr core creation

  1. Create a directory to store the Dockerfile and configuration files. This directory is known as the docker build context folder. Navigate to it in your command line interface.

  2. Create a file named "Dockerfile" in this build context folder and use this as the content:

FROM solr:8.11.1

# argument variables to define
ARG vocabulary_version
ARG jdbc_file_name

# copy the solr configset from WebAPI
COPY --chown=solr /WebAPI/src/main/resources/solr /var/solr/data/$vocabulary_version

# copy your JDBC file
COPY --chown=solr $jdbc_file_name /opt/solr-8.11.1/server/lib/$jdbc_file_name

  3. Clone the WebAPI Git repo (using your desired commit or release) into the docker build context folder:
git clone https://github.com/OHDSI/WebAPI.git

  4. Copy your JDBC jar file (for connecting to your vocabulary's database platform) into the build context folder.

  5. Next make the following edits to the files in WebAPI/src/main/resources/solr (inside the build context folder):

  • data-config.xml: Edit the data source & references in the query to match the database holding your vocabulary
  • core.properties: Edit the name to match the vocabulary version (e.g. v5.0_20-MAY-21) and remove the "conf\\" prefix from the config and schema values.
  • conf/solrconfig.xml: Edit the connection information in the <requestHandler name="/dataimport" class="solr.DataImportHandler"> block to provide the details needed to connect to the database holding your vocabulary.
    • If using Spark, add this to the /dataimport block: <str name="autoCommit">true</str>

Build/create/start the docker container

  1. Run the image build step, specifying the vocabulary version and the name of the JDBC file needed for connecting to your database platform:
docker build --build-arg vocabulary_version=v5.0_20-MAY-21 --build-arg jdbc_file_name=SparkJDBC41.jar --no-cache -t solr .
  2. Run the create step, which creates the container:
docker create --restart=always --name=solr -p 8983:8983 -t solr
  3. Start the container:
docker start solr
  4. Test that the container has started by going to http://localhost:8983/solr/#/ (substitute the server name for localhost)
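Solr can take a little while to come up after the container starts, so a scripted version of the last check may want to poll rather than test once. The sketch below is one way to do that with the Python standard library; the function names, attempt count and delay are assumptions, and the URL is the Admin address used in these notes.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_up(url, attempts=30, delay=2.0, probe=http_ok, sleep=time.sleep) -> bool:
    """Call probe(url) up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Substitute the server name for localhost when Solr runs on another host.
    print(wait_until_up("http://localhost:8983/solr/"))
```

The probe and sleep functions are parameters so the loop can be exercised without a running container.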

Building the Solr core

  • From the SOLR Admin screen (http://localhost:8983/solr/#/), build the core by using the 'Core Selector' dropdown in the left-hand menu. Select the vocabulary core from the dropdown, and then in the sub-menu that appears, select Dataimport and use the 'Execute' button. The service will build the core from the concepts returned by the query in data-config.xml.
  • Once the execution of the core indexing is complete, you can use the Solr "Query" tool under the core sub-menu to make sure the core is working properly before moving to WebAPI. A sample query you can use is to enter this in the q field: query:metformin

Then, follow the rest of the WebAPI configuration instructions in the Windows instructions above.

@alondhe
Contributor

alondhe commented Oct 15, 2022

With Atlas 2.12 adding invalid_start_date and invalid_end_date, the SOLR core creation now needs to handle these fields.

Edit: never mind, handled already
