Create DeletionBolt.java for Solr. #1050 #1073

Merged: 3 commits, May 15, 2023

syefimov

Bug in storm-crawler-solr: the DeletionBolt code is missing. See #1050.

storm-crawler-solr bug. Missing DeletionBolt bolt code. apache#1050
License header added

sonatype-lift bot commented May 13, 2023

🛠 Lift Auto-fix

Some of the Lift findings in this PR can be automatically fixed. You can download and apply these changes in your local project directory of your branch to review the suggestions before committing. [1]

# Download the patch
curl https://lift.sonatype.com/api/patch/github.com/DigitalPebble/storm-crawler/1073.diff -o lift-autofixes.diff

# Apply the patch with git
git apply lift-autofixes.diff

# Review the changes
git diff

Want it all in a single command? Open a terminal in your project's directory and copy and paste the following command:

curl https://lift.sonatype.com/api/patch/github.com/DigitalPebble/storm-crawler/1073.diff | git apply

Once you're satisfied, commit and push your changes in your project.

Footnotes

  1. You can preview the patch by opening the patch URL in the browser.

jnioche commented May 14, 2023

Could you please add the bolt to the SolrCrawlTopology so that people can see how to connect it to the other components?

import org.slf4j.LoggerFactory;

public class DeletionBolt extends BaseRichBolt {
/** */

Remove the empty comment and the serialVersionUID field.

private SolrConnection connection;

public DeletionBolt() {
/* empty */

remove comment

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DeletionBolt extends BaseRichBolt {

Maybe add a comment explaining how it should be connected to the status updater bolt
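
As a rough illustration of the wiring this review comment asks about, the sketch below is a toy model in plain Java, not the Storm or StormCrawler API. It assumes the status updater publishes URLs to be removed on a separate named stream and that the deletion bolt subscribes only to that stream. The component name "status", the stream name "deletion", and the ToyTopology class are all hypothetical. In an actual topology the subscription would presumably be expressed with one of Storm's grouping calls that take a component id and a stream id, such as localOrShuffleGrouping(componentId, streamId).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of stream-based wiring. This is NOT the Storm or StormCrawler API. */
public class DeletionWiringSketch {

    /** Hypothetical stream name standing in for whatever the status updater uses. */
    static final String DELETION_STREAM = "deletion";

    /** Minimal stand-in for a topology: maps "component/stream" to its subscribers. */
    static class ToyTopology {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        /** Registers a bolt (modelled as a URL consumer) on one stream of one component. */
        void subscribe(String component, String stream, Consumer<String> bolt) {
            subscribers
                    .computeIfAbsent(component + "/" + stream, k -> new ArrayList<>())
                    .add(bolt);
        }

        /** Delivers a URL to every bolt subscribed to that component and stream. */
        void emit(String component, String stream, String url) {
            List<Consumer<String>> targets =
                    subscribers.getOrDefault(component + "/" + stream, List.of());
            for (Consumer<String> bolt : targets) {
                bolt.accept(url);
            }
        }
    }

    /** Wires a deleter to the "status" deletion stream and returns the URLs it received. */
    static List<String> run() {
        ToyTopology topology = new ToyTopology();
        List<String> deleted = new ArrayList<>();

        // The deleter listens only to the deletion stream of the status component.
        topology.subscribe("status", DELETION_STREAM, deleted::add);

        // A regular status update goes out on another stream and never reaches the deleter.
        topology.emit("status", "default", "http://example.com/kept");
        // A URL flagged for removal goes out on the deletion stream.
        topology.emit("status", DELETION_STREAM, "http://example.com/gone");

        return deleted;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [http://example.com/gone]
    }
}
```

Only the URL sent on the deletion stream reaches the deleter, which is why the subscription has to name both the source component and the stream rather than the component alone.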

@jnioche jnioche merged commit 9920e6b into apache:master May 15, 2023
4 checks passed

jnioche commented May 15, 2023

thanks @syefimov

@jnioche jnioche added the SOLR label May 15, 2023
@jnioche jnioche added this to the 2.9 milestone May 15, 2023
michaeldinzinger pushed a commit to michaeldinzinger/storm-crawler that referenced this pull request May 22, 2023
* Create DeletionBolt.java

storm-crawler-solr bug. Missing DeletionBolt bolt code. apache#1050

* Update DeletionBolt.java

License header added

* Update DeletionBolt.java

formatting

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
jnioche added a commit that referenced this pull request May 23, 2023
* Remove injection from crawl topologies in *Search archetypes, fixes #1065

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* BasicURLNormalizer .unmangleQueryString() returns invalid results if "&" symbol in a parents path #1059 (#1062)

* Fix unmangleQueryString filter.

Fix unmangleQueryString filter. Do not analyze the full URL path, just the last child.

* formatting

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Removed remaining references to ES in OpenSearch module

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Dependency upgrades, fixes #1066 (#1067)

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Automatic creation of index definitions should use the bolt type (#1069)

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Maven plugin upgrades + better handling of plugin versions

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* bugfix: test jar not attached

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Update maven.yml

v3 version of actions

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* mechanism to retrieve more generic value of configuration (#1071)

* mechanism to retrieve more generic value of configuration if a specific one is not found, fixes #1070

Signed-off-by: Julien Nioche <julien@digitalpebble.com>

* minor javadoc fix

Signed-off-by: Julien Nioche <julien@digitalpebble.com>

---------

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Batch requests in DeleterBolt, fixes #1072

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Update README.md

link to docker project

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Create DeletionBolt.java for Solr. #1050 (#1073)

* Create DeletionBolt.java

storm-crawler-solr bug. Missing DeletionBolt bolt code. #1050

* Update DeletionBolt.java

License header added

* Update DeletionBolt.java

formatting

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* SOLR: suppress warnings + minor changes and Javadoc + added deletion to default topology

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Tika 2.8.0, fixes 1066

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Increase the number of redirects to 5 for Robots.txt fetching (#1074)

* Issue #1058: Allow 5 redirects for Robots.txt fetching

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Minor variable renaming

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Add test coverage reports with JaCoCo and Coveralls, fixes #1075

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* #1075 - Add test coverage reports with JaCoCo

Signed-off-by: Richard Zowalla <richard.zowalla@hs-heilbronn.de>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* #1075 - Update GH workflow to reduce log spam by adding -B and --no-transfer-progress maven options

Signed-off-by: Richard Zowalla <richard.zowalla@hs-heilbronn.de>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Rebase - Issue #1042: Forbid all rules by default

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Modify Robots.txt parsing logic and add test cases

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Parse robots txt rules only for status code 200

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Trying to resolve merge conflicts

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Merge HttpRobotRulesParserTest

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

---------

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Signed-off-by: Richard Zowalla <richard.zowalla@hs-heilbronn.de>
Co-authored-by: Julien Nioche <julien@digitalpebble.com>
Co-authored-by: syefimov <syefimov@ptfs.com>
Co-authored-by: Richard Zowalla <richard.zowalla@hs-heilbronn.de>