Skip to content

Add unified way of initializing classes via string and configuring them. #943

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 9, 2022

Conversation

FelixEngl
Copy link
Contributor

@FelixEngl FelixEngl commented Feb 3, 2022

Hello @jnioche,

I extracted the code for initializing classes from string and configuring them from my fork https://github.com/FelixEngl/storm-crawler/tree/local_version. (This branch contains all fixes/changes that i made. I'll try to extract PR after PR until both are on a same level.)

I don't know how far you got with #937, but I this one has a more reasonable size with the trade-off that it introduces some new warnings due to missing @Contract, @NotNull and @Nullable annotations in various sub-classes and sub-interfaces. These warnings will be fixed by either #937 or a PR in the future.

Best Regards

Felix

Signed-off-by: Felix Engl felix.engl@hotmail.com

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

Signed-off-by: Felix Engl <felix.engl@hotmail.com>

/** Replace with URLBufferUtil.createInstance */
@Deprecated
public static URLBuffer getInstance(Map<String, Object> stormConf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InlineMeSuggester: This deprecated API looks inlineable. If you'd like the body of the API to be inlined to its callers, please annotate it with @InlineMe. (details)
(at-me in a reply with help or ignore)

*/
@SuppressWarnings("rawtypes")
/** @deprecated Replace with ConfigurableUtil.configure */
@Deprecated
public static <T extends Configurable> List<T> configure(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InlineMeSuggester: This deprecated API looks inlineable. If you'd like the body of the API to be inlined to its callers, please annotate it with @InlineMe. (details)
(at-me in a reply with help or ignore)


/** @see ConfigurableUtil for more information */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MissingSummary: A summary fragment is required; consider using the value of the @see block as a summary fragment instead. (details)
(at-me in a reply with help or ignore)

@jnioche
Copy link
Contributor

jnioche commented Feb 4, 2022

hi @FelixEngl did you see my comments in #937? Some of the changes you are proposing here would meet the same objections e.g. use of org.jetbrains annotations.

@FelixEngl
Copy link
Contributor Author

FelixEngl commented Feb 4, 2022

hi @FelixEngl did you see my comments in #937? Some of the changes you are proposing here would meet the same objections e.g. use of org.jetbrains annotations.

No, where are these comments? I didn't see them in that PR-Conversation and I didn't get any kind of notification. Or are there any other tabs that I have to check for notifications like that?
That is my first time working with other people soely with GitHub, usually I'm working alone or in little groups of 5-7 people in the same room. So using GitHub for Code-Reviews is quite new to me.
(Maybe I messed up with my notification-settings? I'll check them up later.)

@jnioche
Copy link
Contributor

jnioche commented Feb 5, 2022

No, where are these comments? I didn't see them in that PR-Conversation and I didn't get any kind of notification. Or are there any other tabs that I have to check for notifications like that?

They were pending, you should be able to see them now

@jnioche
Copy link
Contributor

jnioche commented Feb 8, 2022

No rest for the wicked ;-)

Could you please explain what you are trying to achieve in the description of this PR and why you think it is necessary?

@FelixEngl
Copy link
Contributor Author

No rest for the wicked ;-)

Could you please explain what you are trying to achieve in the description of this PR and why you think it is necessary?

Yes, the next one will be a little bit better with the description. I didn't know it better until recently.
But I learned a lot from our discussions in #937. Thanks again for that opportunity and your patience! :)

Basically:
A lot of classes of SC use some kind of initialisation where they initialize a class by string name. But all of them just copy-paste the very same code or a very similar one. Some of the copied codes are fautly or at least error prone without any useful information for the user/developer when they fail like "failed to initialize".

This PR proposes a unified toolkit for initializing classes from qualified names. The core of this PR is the InitialisationUtil-class.

Proposing a universal way has multiple advantages:

  • No more copy paste code.
  • A clean and direct approach that works for everyone.
  • If the way how a class is initialized by string changes, we can simply adapt the change at a single place and not everywhere.
  • The InitialisationUtil is even useable from developers using SC, therefore writing clean code/extensions by utilizing this for initializing from string names leads to a more secure and clean code in the future.
  • Due to the importance of that feature I decided to create fine grained exceptions with easy to understand messages, making the development/debugging for and with SC much more easier.
  • The new InitialisationUtil even allows for additional checks to assert, that the initialized class does not just implement the said superclass but also some other interface.

@@ -156,19 +144,27 @@ boolean isMatch(final Metadata metadata) {
private final LinkedList<FilteredProtocol> protocols = new LinkedList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JdkObsolete: It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not. (details)
(at-me in a reply with help or ignore)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FelixEngl could we replace the LinkedList with an ArrayList or ArrayDeque as suggested by Lift?

@FelixEngl
Copy link
Contributor Author

@jnioche Ah, I forgot something:
The other notable change is at core/src/main/java/com/digitalpebble/stormcrawler/protocol/ProtocolFactory.java where I introduce double checket locking for a more secure access of the singleton in a multithreaded context.
Otherwise there could be cases where multiple instances of the singleton are created.

*
* @throws IOException
*/
/** loads the filters from a JSON configuration file */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might break end user code (removal of a checked exception), so we need to be clear in the changelog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this break end user code?
It still throws that IOException but i removed the empty comment.

} catch (Exception e) {
throw new RuntimeException("Can't instanciate " + className);
throw new RuntimeException("Can't instanciate " + className, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -85,29 +89,4 @@ default void acked(String url) {
}

default void configure(Map<String, Object> stormConf) {}

/** Returns a URLBuffer instance based on the configuration * */
public static URLBuffer getInstance(Map stormConf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might break end user code (removal of a public method), so we need to be clear in the changelog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deprecated it at line 42.
I just moved it up. Strange that git doesn't recognize this. :/

@@ -0,0 +1,128 @@
package com.digitalpebble.stormcrawler.util;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misses a license header

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes every new class needs a license header

@@ -0,0 +1,7 @@
package com.digitalpebble.stormcrawler.util.exceptions.initialisation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License? wdyt @jnioche

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes every new class needs a license header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnioche What about Tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, test classes as well please.

@@ -0,0 +1,3 @@
package com.digitalpebble.stormcrawler.helper.initialisation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License? wdyt @jnioche

@@ -0,0 +1,21 @@
package com.digitalpebble.stormcrawler.helper.initialisation.base;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License? wdyt @jnioche

@@ -0,0 +1,3 @@
package com.digitalpebble.stormcrawler.helper.initialisation.base;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License? wdyt @jnioche

@@ -0,0 +1,4 @@
package com.digitalpebble.stormcrawler.helper.initialisation.base;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License? wdyt @jnioche

@@ -0,0 +1,168 @@
package com.digitalpebble.stormcrawler.util;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License? wdyt @jnioche

@jnioche
Copy link
Contributor

jnioche commented Feb 8, 2022

Guys, apologies in advance if I come across as being a bit blunt.
I am totally in favour of improving aspect of the code but this will not come at the cost of adding I-don't-know-how-many classes for various exceptions + tons if utility classes and so on, for what is not a new functionality that users will benefit from or a major improvement.
One of the things that I am pleased to have achieved with SC is (I think) a good trade-off between code simplicity and richness of features. I intend to keep it that way.
Can this be done in a way that doesn't involve the addition of so many new classes?
Thanks

@FelixEngl
Copy link
Contributor Author

Guys, apologies in advance if I come across as being a bit blunt. I am totally in favour of improving aspect of the code but this will not come at the cost of adding I-don't-know-how-many classes for various exceptions + tons if utility classes and so on, for what is not a new functionality that users will benefit from or a major improvement. One of the things that I am pleased to have achieved with SC is (I think) a good trade-off between code simplicity and richness of features. I intend to keep it that way. Can this be done in a way that doesn't involve the addition of so many new classes? Thanks

Don't worry about being blunt, I prefer it when someone speaks his/her mind directly instead of dancing around the problem or finding some straw arguments. :D

I could remove the exception classes and only throw runtime exceptions. That would reduce the number of classes.
Would that be enough or are there other classes that are too much too?

@rzo1
Copy link
Contributor

rzo1 commented Feb 8, 2022

I think, that it looks like a lot of classes. Basically, I see 2 (new) utility classes + a lot of test-related classes + (runtime) exception sugar. Guess, that the new runtime exceptions can be dropped.

@FelixEngl
Copy link
Contributor Author

I think, that it looks like a lot of classes. Basically, I see 2 (new) utility classes + a lot of test-related classes + (runtime) exception sugar. Guess, that the new runtime exceptions can be dropped.

Then I'll drop them.
And you are right. Basically we have just 2 new Utility Classes and a lot of tests.
I wanted to test that feature very thoroughly because it's very often used.

Signed-off-by: Felix Engl <felix.engl@hotmail.com>
@FelixEngl
Copy link
Contributor Author

FelixEngl commented Feb 8, 2022

@jnioche I removed the Exceptions. We could also remove the URLBufferUtil if you want.
Then we would basically only introduce two new classes in the code.

But we need the other new classes for testing the initialisation feature, so we can't remove the new classes created in the test source.

Signed-off-by: Felix Engl <felix.engl@hotmail.com>
Signed-off-by: Felix Engl <felix.engl@hotmail.com>
# Conflicts:
#	core/src/main/java/com/digitalpebble/stormcrawler/persistence/urlbuffer/URLBuffer.java
@FelixEngl
Copy link
Contributor Author

FelixEngl commented Feb 8, 2022

@jnioche I removed the Exceptions. We could also remove the URLBufferUtil if you want. Then we would basically only introduce one new class in the code.

But we need the other new classes for testing the initialisation feature, so we can't remove the new classes created in the test source.

Removed the URLBufferUtil too, now we have only two new classes. InitialisationUtil and ConfigurableUtil.

Signed-off-by: Felix Engl <felix.engl@hotmail.com>

/** Replace with {@link URLBuffer#createInstance(Map)} */
@Deprecated
static URLBuffer getInstance(Map<String, Object> stormConf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InlineMeSuggester: This deprecated API looks inlineable. If you'd like the body of the API to be inlined to its callers, please annotate it with @InlineMe. (details)
(at-me in a reply with help or ignore)

return filterLists;
/** @deprecated Replace with ConfigurableUtil.configure */
@Deprecated
static <T extends Configurable> List<T> configure(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InlineMeSuggester: This deprecated API looks inlineable. If you'd like the body of the API to be inlined to its callers, please annotate it with @InlineMe. (details)
(at-me in a reply with help or ignore)

@jnioche jnioche added this to the 2.3 milestone Feb 9, 2022
@jnioche jnioche self-assigned this Feb 9, 2022
@jnioche
Copy link
Contributor

jnioche commented Feb 9, 2022

@jnioche I removed the Exceptions. We could also remove the URLBufferUtil if you want. Then we would basically only introduce one new class in the code.
But we need the other new classes for testing the initialisation feature, so we can't remove the new classes created in the test source.

Removed the URLBufferUtil too, now we have only two new classes. InitialisationUtil and ConfigurableUtil.

Great! That's way better

InitialisationUtil has no equivalent in the current code, so we definitely need it. ConfigurableUtil is a bit different. I see that you removed the utility code from the interface and kept the latter as a strict interface. I can understand why but to be honest I quite liked having the utility code within the interface class - all in one place. Probably not orthodox, I agree but I thought it was quite elegant in a way. How open would you be to keeping it that way? This is just an extra class, I can also be swayed.

And @rzo1 is right, we need license headers on every new file.

Thanks for your open-mindedness BTW.


public static ProtocolFactory getInstance(Config conf) {

if (single_instance != null) return single_instance;
// https://en.wikipedia.org/wiki/Double-checked_locking#Usage_in_Java
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

such a great bugfix! thanks for that

@FelixEngl
Copy link
Contributor Author

FelixEngl commented Feb 9, 2022

@jnioche I removed the Exceptions. We could also remove the URLBufferUtil if you want. Then we would basically only introduce one new class in the code.
But we need the other new classes for testing the initialisation feature, so we can't remove the new classes created in the test source.

Removed the URLBufferUtil too, now we have only two new classes. InitialisationUtil and ConfigurableUtil.

Great! That's way better

InitialisationUtil has no equivalent in the current code, so we definitely need it. ConfigurableUtil is a bit different. I see that you removed the utility code from the interface and kept the latter as a strict interface. I can understand why but to be honest I quite liked having the utility code within the interface class - all in one place. Probably not orthodox, I agree but I thought it was quite elegant in a way. How open would you be to keeping it that way? This is just an extra class, I can also be swayed.

And @rzo1 is right, we need license headers on every new file.

Thanks for your open-mindedness BTW.

I didn't have a problem with the static function (tbh I like putting creation calls for interfaces there too) but i had a problem with the static LOG reference in the interface. That was used by one of subclasses, and that is quite confusing when logging and you only see the interface-reference.

What about hiding the ConfigUtil in package scope and keeping the original creation method in the interface?
That way we can assure that the Logger for the configure is only called by the Configure and nowhere else.

…tionUtil to ParserBolt. Use ArrayList in DelegatorProtocol.

Signed-off-by: Felix Engl <felix.engl@hotmail.com>
@FelixEngl
Copy link
Contributor Author

FelixEngl commented Feb 9, 2022

@jnioche I implemented the hidden-util-class approach, I think this should make both of us happy.
You are/were right, it feels more natural when you use a static function in the interface for that instead of an util-class. :)
Furthermore I added to all files the license and replaced the LinkedList.

Copy link
Contributor

@jnioche jnioche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one minor point about returning an immutable List

@@ -366,7 +363,7 @@ public void declareOutputFields(OutputFieldsDeclarer declarer) {
// components check whether the URL is valid
LOG.error("MalformedURLException on {}", parentURL);
eventCounter.scope("error_invalid_source_url").incrBy(1);
return new LinkedList<Outlink>();
return Collections.emptyList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the list can get things added to it later. it must not be immutable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Felix Engl <felix.engl@hotmail.com>
@jnioche jnioche merged commit fd2d4dc into apache:master Feb 9, 2022
@FelixEngl FelixEngl deleted the configurable branch February 9, 2022 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants