Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opensearch module #1011

Merged
merged 10 commits into from
Dec 10, 2022
Merged

Opensearch module #1011

merged 10 commits into from
Dec 10, 2022

Conversation

jnioche
Copy link
Contributor

@jnioche jnioche commented Dec 9, 2022

Port of Elasticsearch resources to OpenSearch.

Most of the content has been preserved, apart from the CollapsingSpout and ScrollSpout. The former's performance was systematically worse than the aggregation based ones, the latter was never used much in practice.

This contains an improvement in that the index schemas are created automatically if they are not already there. It is of course still possible to specify a custom mapping for a given index.

This has been tested with Opensearch 2.4 running with docker-compose.

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
…s indices + port tests from ES module

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
…or explicit generation of index mappings

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Julien Nioche <julien@digitalpebble.com>
@jnioche jnioche added this to the 2.7 milestone Dec 9, 2022
@@ -0,0 +1,29 @@
OSHOST=${1:-"http://localhost:9200"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2148: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
request.requests().stream()
.map(DocWriteRequest::id)
.collect(Collectors.toUnmodifiableSet());
waitAckLock.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LockNotBeforeTry: Prefer locking immediately before the try block which releases the lock to avoid the possibility of any intermediate statements throwing.


Suggested change
waitAckLock.lock();
waitAckLock.lock();try {

ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

request.requests().stream()
.map(DocWriteRequest::id)
.collect(Collectors.toUnmodifiableSet());
waitAckLock.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LockNotBeforeTry: Prefer locking immediately before the try block which releases the lock to avoid the possibility of any intermediate statements throwing.


Suggested change
waitAckLock.lock();
waitAckLock.lock();try {

ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

return response.getVersion();
}

public <T extends DocWriteResponse> T getResponse() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeParameterUnusedInFormals: Declaring a type parameter that is only used in the return type is a misuse of generics: operations on the type parameter are unchecked, it hides unsafe casts at invocations of the method, and it interacts badly with method overload resolution.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

if (client != null)
try {
client.close();
} catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EmptyCatch: Caught exceptions should not be ignored


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

if (client != null)
try {
client.close();
} catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EmptyCatch: Caught exceptions should not be ignored


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

"Filter " + parsefilterclass + " does not extend ParseFilter");
}

delegatedParseFilter = (ParseFilter) filterClass.newInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClassNewInstance: Class.newInstance() bypasses exception checking; prefer getDeclaredConstructor().newInstance()


Suggested change
delegatedParseFilter = (ParseFilter) filterClass.newInstance();
delegatedParseFilter = (ParseFilter) filterClass.getDeclaredConstructor().newInstance();

ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

"Filter " + urlfilterclass + " does not extend URLFilter");
}

delegatedURLFilter = (URLFilter) filterClass.newInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClassNewInstance: Class.newInstance() bypasses exception checking; prefer getDeclaredConstructor().newInstance()


Suggested change
delegatedURLFilter = (URLFilter) filterClass.newInstance();
delegatedURLFilter = (URLFilter) filterClass.getDeclaredConstructor().newInstance();

ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Map<String, String[]> keyVals = filterMetadata(metadata);

for (String fieldName : keyVals.keySet()) {
String[] values = keyVals.get(fieldName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INEFFICIENT_KEYSET_ITERATOR: Accessing a value using a key that was retrieved from a keySet iterator. It is more efficient to use an iterator on the entrySet of the map, avoiding the extra HashMap.get(key) lookup.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

public void onRemoval(
@Nullable String key, @Nullable List<Tuple> value, @NotNull RemovalCause cause) {
if (!cause.wasEvicted()) return;
LOG.error("Purged from waitAck {} with {} values", key, value.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULL_DEREFERENCE: object value could be null and is dereferenced at line 277.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

protected void populateBuffer() {

if (queryDate == null) {
queryDate = new Date();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 4 similar findings have been found in this PR


JavaUtilDate: Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.


🔎 Expand here to view all instances of this finding
File Path Line Number
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/metrics/MetricsConsumer.java 101
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/HybridSpout.java 101
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/AggregationSpout.java 102
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/HybridSpout.java 97

Visit the Lift Web Console to find more details in your report.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Map<String, Object> keyValues = hit.getSourceAsMap();
String url = (String) keyValues.get("url");
// is already being processed - skip it!
if (beingProcessed.containsKey(url)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 46 similar findings have been found in this PR


THREAD_SAFETY_VIOLATION: Read/Write race. Non-private method AbstractSpout.addHitToBuffer(...) reads without synchronization from this.beingProcessed. Potentially races with write in method AbstractSpout.open(...).
Reporting because another access to the same memory occurs on a background thread, although this access may not.


🔎 Expand here to view all instances of this finding
File Path Line Number
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/IndexerBolt.java 208
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/IndexerBolt.java 237
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/StatusUpdaterBolt.java 270
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/AbstractSpout.java 104
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/IndexerBolt.java 213
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/IndexerBolt.java 287
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/StatusUpdaterBolt.java 255
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/AbstractSpout.java 163
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/IndexerBolt.java 244
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/persistence/AbstractSpout.java 209

Showing 10 of 46 findings. Visit the Lift Web Console to see all.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]


for name in status content
do
curl $OSCREDENTIALS -s -XDELETE "$OSHOST/$name/" > /dev/null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 2 similar findings have been found in this PR


SC2086: Double quote to prevent globbing and word splitting.


🔎 Expand here to view all instances of this finding
File Path Line Number
external/opensearch/OS_IndexInit.sh 8
external/opensearch/OS_IndexInit.sh 8

Visit the Lift Web Console to find more details in your report.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

jnioche and others added 3 commits December 9, 2022 17:37
…er/opensearch/persistence/AggregationSpout.java

Co-authored-by: sonatype-lift[bot] <37194012+sonatype-lift[bot]@users.noreply.github.com>
…er/opensearch/persistence/AbstractSpout.java

Co-authored-by: sonatype-lift[bot] <37194012+sonatype-lift[bot]@users.noreply.github.com>
…it had introduced

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
do
curl $OSCREDENTIALS -s -XDELETE "$OSHOST/$name/" > /dev/null
echo "Deleted '$name' index, now recreating it..."
curl $OSCREDENTIALS -s -XPUT "$OSHOST/$name" -H 'Content-Type: application/json' --upload-file src/main/resources/$name.mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2086: Double quote to prevent globbing and word splitting.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

echo "Deleted 'metrics' index, now recreating it..."

# http://localhost:9200/metrics/_mapping/status?pretty
curl $OSCREDENTIALS -s -XPOST "$OSHOST/_template/metrics-template" -H 'Content-Type: application/json' --upload-file src/main/resources/metrics.mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2086: Double quote to prevent globbing and word splitting.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]


final Metadata metadata = (Metadata) tuple.getValueByField("metadata");

if (!filterDocument(metadata)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 2 similar findings have been found in this PR


THREAD_SAFETY_VIOLATION: Read/Write race. Non-private method IndexerBolt.execute(...) indirectly reads without synchronization from this.filterKeyValue. Potentially races with write in method IndexerBolt.prepare(...).
Reporting because this access may occur on a background thread.


🔎 Expand here to view all instances of this finding
File Path Line Number
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/IndexerBolt.java 180
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/IndexerBolt.java 213

Visit the Lift Web Console to find more details in your report.


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sonatype-lift ignoreall

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ignoreall command is active on this PR, all the existing Lift issues are ignored.

@jnioche jnioche merged commit 45537d9 into master Dec 10, 2022
@jnioche jnioche deleted the opensearch branch December 10, 2022 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant