Term Explorer

View the GitHub project here or download the latest release here.

Overview

Written By: Jason Wells

Use this script to help you explore the terms present in a Nuix case.

Begin by providing one or more single term wild card expressions and/or fuzzy expressions. Separate multiple expressions with spaces. The provided expressions are then used to find related terms present in the current Nuix case.

Wild card expressions support * (0 or more characters) or ? (1 character).

Expression	Example Term Matches
`cat*`	`cat`, `catalina`, `catch`, `catching`, `category`, `caterpillar`
`*th`	`depth`, `path`, `width`, `with`
`ca?`	`car`, `cat`, `can`
`ca??`	`call`, `card`, `case`, `cats`
`h???th`	`health`, `hearth`

Fuzzy terms expressions take the form TERM~SIMILARITY, with SIMILARITY being a value between 0.0 (not similar) and 1.0 (exactly the same). Providing a blank value for SIMILARITY (i.e. car~) is the same as providing 0.5 (i.e. car~0.5).

Expression	Example Term Matches
`cat~0.5`	`act`, `at`, `bat`, `can`, `car`, `cat`, `cats`, `coat`, `eat`, `hat`, `nat`, `pat`, `rat`, `sat`
`jason~0.5`	`bison`, `jackson`, `jadon`, `jalyn`, `jaron`, `jason`, `jast`, `jevon`, `json`, `larson`, `olson`, `saison`, `samson`, `wasn`
`jason~0.8`	`jadon`, `jaron`, `jasen`, `jason`, `javon`, `jayson`, `mason`

Fuzzy similarity can be calculated using several difference methods:

Nuix: Determines what terms Nuix resolved the given fuzzy search term to. Running the given fuzzy query expression in the Nuix search bar is expected to return the same results as when you take the matched terms found by this script and join them into an OR query (i.e. jadon OR jaron OR jasen OR javon OR ... == jason~0.8). Note that this approach most accurately represents how Nuix would resolve the given fuzzy search to terms, but also takes longer because it is resolved against each responsive item individually. That means the more items responsive to the given fuzzy search, the longer this approach can take to resolve!
Levenstein Distance - Uses Lucene's built in Levenshstein distance string comparison to filter provided fuzzy term against case terms list. Levenshstein distance compares 2 strings by determining the number of edits (insertions, deletions and substitutions) needed to change one string to the other. The Lucene method scales the resulting edit distance to a range between 0.0 and 1.0. The script resolves this type of fuzzy search against the case terms list.
Jaro-Winkler - Uses Lucene's built in Jaro-Winkler string comparison method.
NGram Distance - Uses Lucene's built in NGram distance string comparison method. The script resolves this type of fuzzy search against the case terms list.

Resulting matched terms can be exported to a CSV or added to a running collection of terms you build up (the right hand table). This collection of terms you have selected can then be used in different ways including generating a query from the terms or saving them to a file.

Expression Matches Table

The Expression Matches table is populated with terms that match your expression against the specified locations (content and/or properties) and fall within items responsive to your scope query. The table, and CSVs exported from it, have the following columns.

Column	Description
`Original Expressions`	What expression or expressions you provided led to this term being a match.
`Matched Term`	A term which matched one or more of your provided single term expressions.
`Similarity`	When the provided expression is a fuzzy expression, this column will contain the highest similarity value of all the fuzzy expressions that contributed to matching this term.
`Occurrences`	How many times does this term occur in the scope. A single item may have multiple occurrences of any given term and therefore contribute more than 1 occurence to this count.
`Scope Responsive Items`	Based the scope query and whether you chose to resolve matches against `content` and/or `properties` this will be the count of items within those contraints that have the given matched term. See below for more detail.

When determining the Scope Responsive Items count column, Case.count(String queryString) is used to determine responsive item count, using queries built with the following logic.

Matched Term	Fields	Scope Query	Count Query
`catalina`	`content`	`kind:email`	`(kind:email) AND (content:catalina)`
`catalina`	`properties`	`kind:email`	`(kind:email) AND (properties:catalina)`
`catalina`	`content` and `properties`	`kind:email`	`(kind:email) AND (content:catalina OR properties:catalina)`

Getting Started

Setup

Begin by downloading the latest release of this code. Extract the contents of the archive into your Nuix scripts directory. In Windows the script directory is likely going to be either of the following:

%appdata%\Nuix\Scripts - User level script directory
%programdata%\Nuix\Scripts - System level script directory

Cloning this Repository

This script relies on 2 JAR files, not included in this repository (but is included in releases).

TermExplorerGUI.jar provides the user interface of the script. The source code of this JAR file is included in this repository in the Java sub-directory.

The other JAR file is SuperUtilities.jar which includes the class TermExpander which does the work of taking a given term expression and resolving it to the appropriate related terms. You can head over to the SuperUtilities repository and either download a copy of the source and build it yourself or download an already built release JAR.

License

Copyright 2020 Nuix

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
Java		Java
Ruby/TermExplorer.nuixscript		Ruby/TermExplorer.nuixscript
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE.TXT		LICENSE.TXT
README.MD		README.MD

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Term Explorer

Overview

Expression Matches Table

Getting Started

Setup

Cloning this Repository

License

About

Releases 5

Packages

Languages

License

Nuix/Term-Explorer

Folders and files

Latest commit

History

Repository files navigation

Term Explorer

Overview

Expression Matches Table

Getting Started

Setup

Cloning this Repository

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 5

Packages 0

Languages

Packages