
Refactor pluginservice #12367

Status: Closed (wanted to merge 32 commits)
Conversation

rmuir (Contributor) commented Jul 21, 2015

Today plugins are all dumped into the main classloader and can easily have jar hell dependencies with each other (a great example is the python scripting plugin). Also, pluginservice is messy: there is way too much leniency, and most logic is accomplished with heuristics.

Instead we need mandatory metadata that is explicit: enough to easily iterate/manipulate/validate plugins from the filesystem. This simplifies the code and we can move forward to make this stuff really reliable. There are currently also a ton of ways to do things, I removed any esoteric options/unnecessary stuff from this code here: back to basics guys.

Instead of dumping all code into one classloader, plugins can just have a standard URLClassLoader with ES loader as the parent. We don't need any special classloader because we simply don't allow conflicts with the parent (jarhell will simply fail).
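The isolated-loader idea above can be sketched in plain Java. This is a hypothetical illustration, not the actual PluginsService code: the class and method names here (`PluginLoaderSketch`, `loaderFor`) are invented for the example. The key point is that a stock `URLClassLoader` with the Elasticsearch loader as parent gives parent-first delegation, so a plugin cannot shadow core classes; conflicting jars are meant to be rejected up front by the jar hell check rather than papered over by a custom loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class PluginLoaderSketch {
    // Hypothetical: build an isolated classloader for one plugin from its
    // jar URLs, with the Elasticsearch (application) classloader as parent.
    // Normal parent-first delegation applies, so core classes always come
    // from the parent and a plugin cannot redefine them.
    static ClassLoader loaderFor(URL[] pluginJars, ClassLoader esLoader) {
        return new URLClassLoader(pluginJars, esLoader);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = PluginLoaderSketch.class.getClassLoader();
        ClassLoader plugin = loaderFor(new URL[0], parent);
        // Parent-first delegation: a core class resolves to the exact same
        // Class object that the parent loader sees.
        Class<?> c = plugin.loadClass("java.lang.String");
        System.out.println(c == String.class); // true
    }
}
```

Because delegation is strictly parent-first, two isolated plugins can carry different versions of the same third-party jar without seeing each other at all, which is exactly what the shared-classloader model could not guarantee.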

Unfortunately some plugins might not be so clean, and share hard dependencies with each other. For now those plugins can set isolated to false until they clean their act up. All plugins that do this just share a loader with each other. We should consider this deprecated, since it makes things more complicated than they need to be. Loose coupling, please.

In all cases we aggressively jarhell check, and all plugin manipulation is underneath securitymanager protection.

With this branch all plugin integration tests are passing, and moreover the python script that loads all of them at once finally passes:

[INFO ][plugins                  ] [smoke_tester] loaded [lang-python, analysis-kuromoji, analysis-smartcn, cloud-gce, analysis-stempel, delete-by-query, cloud-aws, analysis-phonetic, cloud-azure, lang-javascript, analysis-icu], sites []

I disabled most of the unit tests, but they can be fixed / new ones written after sleep. There are also a lot of checks we should add in the future, and PluginManager should reuse some of this logic to prevent installing a plugin that isn't going to work or somehow screw the installation.

Closes #11917

@rmuir rmuir added the WIP label Jul 21, 2015
uschindler (Contributor) commented:

+1 Nice work!

rjernst and others added 26 commits July 21, 2015 14:42
…ransportInstanceSingleOperationAction and rely on transport service to execute locally. (forking thread etc.)

Change TransportInstanceSingleOperationAction to have shardActionHandler too, so we can execute locally without endless spinning.
In order to unify the handling and reuse the CLITool infrastructure
the plugin manager should make use of this as well.

This obsoletes the -i and --install options but requires the user
to use `install` as the first argument of the CLI.

This is basically just a port of the existing functionality, which
is also the reason why this is not a refactoring of the plugin manager,
which will come in a separate commit.
The new plugin manager parser was not called correctly in the scripts.
In addition the plugin manager now creates a plugins/ directory in case
it does not exist.

Also the integration tests called the plugin manager in the deprecated way.
The issue asks to double-check thread pool rejection. I did, and don't see problems with it.
…lled from the update cluster state thread

This is important for correct handling of the joining thread. This causes assertions to trip in our test runs. See http://build-us-00.elastic.co/job/es_g1gc_master_metal/11653/ as an example

Closes elastic#12372
The release and smoke test python scripts used to install
plugins in the old fashion.

Also the BATS testing suite installed/removed plugins in that
way. Here the marvel tests have been removed, as marvel currently
does not work with the master branch.

In addition documentation has been updated as well, where it was
still missing.
There is no need to maintain additional state on the index routing table as to whether a primary was allocated after API creation; we already hold all this information in the UnassignedInfo class.
closes elastic#12374
- field names cannot be mapped with `.` in them
- fixed asciidoc issue where the list was not recognized as a list
…ge path

In preparation for the move to building the core zip, tar.gz, rpm, and deb as separate modules, refactored check_license_and_sha.pl to:

* accept a license dir and path to the package to check on the command line
* to be able to extract zip, tar.gz, deb, and rpm
* all packages except rpm will work on Windows
The help files use a Unix-based file separator, whereas the test
relies on the help being based on the file-system separator.

This commit fixes the test to remove all `\r` characters before
comparing strings.

The test has also been moved into its own CliToolTestCase, as it does
not need to be an integration test.
instead fail the failsafe plugin, which means the external cluster will still get shut down
With the `recover_on_any_node` setting, these unit tests check that the
correct node list and versions are returned.
…and do not unnecessarily duplicate Lucene segments in Azure Storage

This commit addresses an issue that was leading to snapshot corruption for snapshots stored as blobs in Azure Storage.

The underlying issue is that in cases when multiple snapshots of an index were taken and persisted into Azure Storage, snapshots subsequent
to the first would repeatedly overwrite the snapshot files. This issue renders all snapshots except the final one useless.

The root cause of this is due to String concatenation involving null. In particular, to list all of the blobs in a snapshot directory in
Azure the code would use the method listBlobsByPrefix where the prefix is null. In the listBlobsByPrefix method, the path keyPath + prefix
is constructed. However, per 5.1.11, 5.4 and 15.18.1 of the Java Language Specification, the reference null is first converted to the string
"null" before performing the concatenation. This leads to no blobs being returned and therefore the snapshot mechanism would operate as if
it were writing the first snapshot of the index. The fix is simply to check if prefix is null and handle the concatenation accordingly.
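The null-concatenation bug described above is easy to reproduce in isolation. The sketch below is a hypothetical reconstruction (the helper names `buggyPrefix`/`fixedPrefix` are invented for illustration; the real code builds the prefix inside `listBlobsByPrefix`), but the language behavior it demonstrates is exactly the one cited from the Java Language Specification: concatenating `null` yields the literal string "null".

```java
public class NullPrefixDemo {
    // Hypothetical reconstruction of the bug: the blob-listing prefix was
    // built as keyPath + prefix. When prefix is null, Java string
    // concatenation converts the null reference to the four characters
    // "null" (JLS 5.1.11 / 15.18.1), so the listing matched no blobs and
    // every snapshot behaved like the first one.
    static String buggyPrefix(String keyPath, String prefix) {
        return keyPath + prefix; // null becomes "null"
    }

    // The fix: only append the prefix when it is non-null.
    static String fixedPrefix(String keyPath, String prefix) {
        return prefix == null ? keyPath : keyPath + prefix;
    }

    public static void main(String[] args) {
        System.out.println(buggyPrefix("indices/myindex/0/", null)); // indices/myindex/0/null
        System.out.println(fixedPrefix("indices/myindex/0/", null)); // indices/myindex/0/
    }
}
```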

Upon fixing this issue so that subsequent snapshots would no longer overwrite earlier snapshots, it was discovered that the snapshot metadata
returned by the listBlobsByPrefix method was not sufficient for the snapshot layer to detect whether or not the Lucene segments had already
been copied to the Azure storage layer in an earlier snapshot. This led the snapshot layer to unnecessarily duplicate these Lucene segments
in Azure Storage.

The root cause of this is due to known behavior in the CloudBlobContainer.getBlockBlobReference method in the Azure API. Namely, this method
does not fetch blob attributes from Azure. As such, the lengths of all the blobs appeared to the snapshot layer to be of length zero and
therefore they would compare as not equal to any new blobs that the snapshot layer is going to persist. To remediate this, the method
CloudBlockBlob.downloadAttributes must be invoked. This will fetch the attributes from Azure Storage so that a proper comparison of the
blobs can be performed.

Closes elastic/elasticsearch-cloud-azure#51, closes elastic/elasticsearch-cloud-azure#99
First batch of unit tests to verify the behavior of replica allocator
When performing an operation on a primary, the state is captured and the
operation is performed on the primary shard. The original request is
then modified to increment the version of the operation as preparation
for it to be sent to the replicas.

If the request first fails on the primary during the translog sync
(because the Engine is already closed due to shadow primaries closing
the engine on relocation), then the operation is retried on the new primary
after being modified for the replica shards. It will then fail due to the
version being incorrect (the document does not yet exist but the request
expects a version of "1").

Order of operations:

- Request is executed against primary
- Request is modified (version incremented) so it can be sent to replicas
- Engine's translog is fsync'd if necessary (failing, and throwing an exception)
- Modified request is retried against new primary

This change ignores the exception where the engine is already closed
when syncing the translog (similar to how we ignore exceptions when
refreshing the shard if the ?refresh=true flag is used).
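The shape of that fix can be sketched as follows. This is a simplified stand-in, not the actual Elasticsearch code: `SyncSketch`, its `Translog` interface, and the local `AlreadyClosedException` class are all invented for the example (the real exception is Lucene's `org.apache.lucene.store.AlreadyClosedException`). The point is only the control flow: swallow the already-closed case during the fsync so the already-modified request is never retried against the new primary with the wrong version.

```java
public class SyncSketch {
    // Hypothetical stand-in for Lucene's AlreadyClosedException.
    static class AlreadyClosedException extends RuntimeException {}

    // Minimal stand-in for the translog's sync operation.
    interface Translog {
        void sync();
    }

    // Attempt the translog fsync, but ignore the case where the engine was
    // already closed (e.g. a shadow primary closing its engine on
    // relocation). Returns whether the sync actually ran to completion.
    static boolean syncIgnoringClosed(Translog translog) {
        try {
            translog.sync();
            return true;
        } catch (AlreadyClosedException e) {
            // Engine already closed: treat as a no-op instead of failing
            // the request, mirroring how ?refresh=true failures are ignored.
            return false;
        }
    }
}
```

Any other exception still propagates; only the benign already-closed race is suppressed.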
rjernst (Member) commented Jul 22, 2015:

LGTM!

@rmuir rmuir added v2.0.0-beta1 and removed WIP labels Jul 22, 2015
@rmuir rmuir closed this in 4040f19 Jul 22, 2015
@clintongormley clintongormley added >breaking :Core/Infra/Plugins Plugin API and infrastructure labels Jul 23, 2015
@clintongormley clintongormley changed the title refactor pluginservice Refactor pluginservice Jul 23, 2015
imotov added a commit to imotov/elasticsearch-native-script-example that referenced this pull request Jul 23, 2015
imotov added a commit to imotov/elasticsearch-native-script-example that referenced this pull request Jul 28, 2015
imotov added a commit to imotov/elasticsearch-analysis-morphology that referenced this pull request Jul 28, 2015
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Aug 4, 2015
Successfully merging this pull request may close these issues:

- bin/elasticsearch should include plugins in classpath